## Project Overview: An Ensemble Bayesian Neural Network for Recommendation Systems

This project implements a recommendation system that leverages the power of **Bayesian Neural Networks (BNNs)** and **ensemble learning**. The core idea is to build a robust collaborative filtering model that not only provides accurate recommendations but also quantifies the uncertainty associated with its predictions. This is achieved by combining multiple, diverse BNN models into an ensemble whose collective predictions are aggregated by a meta-learning component.

The system is designed to work with standard recommendation datasets such as MovieLens and is evaluated on ranking-based metrics such as Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).

---

## Architectural Breakdown

The project is logically structured into three main Python files, each with a distinct responsibility.

### **File 1: `BNN.py` — The Bayesian Building Blocks**

This script defines the fundamental components required to construct Bayesian Neural Networks. Instead of learning fixed point-estimate weights like standard neural networks, BNNs learn probability distributions over their weights. This file provides the classes needed to manage these distributions.

**Key Components:**

* **Distribution Classes:**
    * `Gaussian`: Implements a Gaussian distribution for the weights and biases of the BNN layers. It uses the reparameterization trick (`μ + σ * ε`) for efficient sampling during training. The standard deviation `σ` is derived from a learnable parameter `ρ` to ensure it remains positive.
    * `ScaleMixtureGaussian`: Defines a more flexible prior distribution for weights, constructed as a mixture of two Gaussian distributions with different variances. This allows the model to distinguish between highly important weights and those that can be pruned, effectively encouraging sparsity.
    * `LaplacePrior` & `IsotropicGaussian`: Provide alternative, simpler prior distributions (Laplace and standard Gaussian, respectively) that can be used for regularization and experimentation.
* **Core BNN Layer:**
    * `BayesianLinear`: This is the cornerstone module, acting as a drop-in replacement for a standard `torch.nn.Linear` layer. It maintains learnable parameters (`mu` and `rho`) for the distributions of its weights and biases. During training, it calculates the **log-prior** (how well the sampled weights fit the prior distribution) and the **log-variational-posterior** (how probable the sampled weights are under their learned distribution). These two values are essential for computing the KL-divergence, a key component of the BNN's loss function. A minimal sketch of this pattern is shown below.
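To make the reparameterization trick and the two log-probability terms concrete, here is a minimal, self-contained sketch of a Bayesian linear layer in PyTorch. It is not the project's exact code: the initialization values, the softplus transform for `σ`, and the simple isotropic Gaussian prior are assumptions (the project also supports scale-mixture and Laplace priors).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Gaussian:
    """Diagonal Gaussian posterior over a weight tensor, parameterized by mu and rho."""
    def __init__(self, mu, rho):
        self.mu, self.rho = mu, rho

    @property
    def sigma(self):
        # softplus keeps the standard deviation strictly positive
        return torch.log1p(torch.exp(self.rho))

    def sample(self):
        # reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1)
        return self.mu + self.sigma * torch.randn_like(self.mu)

    def log_prob(self, w):
        return (-math.log(math.sqrt(2 * math.pi))
                - torch.log(self.sigma)
                - (w - self.mu) ** 2 / (2 * self.sigma ** 2)).sum()

class BayesianLinear(nn.Module):
    """Drop-in replacement for nn.Linear whose weights are re-sampled on every forward pass."""
    def __init__(self, in_features, out_features, prior_sigma=1.0):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))
        self.weight = Gaussian(self.weight_mu, self.weight_rho)
        self.bias = Gaussian(self.bias_mu, self.bias_rho)
        self.prior = torch.distributions.Normal(0.0, prior_sigma)  # assumed isotropic Gaussian prior
        self.log_prior = 0.0
        self.log_variational_posterior = 0.0

    def forward(self, x):
        w, b = self.weight.sample(), self.bias.sample()
        # the two terms described above: log p(w) under the prior and log q(w|theta) under the posterior
        self.log_prior = self.prior.log_prob(w).sum() + self.prior.log_prob(b).sum()
        self.log_variational_posterior = self.weight.log_prob(w) + self.bias.log_prob(b)
        return F.linear(x, w, b)
```

Summing `log_variational_posterior - log_prior` over all such layers gives the (sampled) KL term used in the ELBO loss described later.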
### **File 2: `DataSet.py` — Data Handling and Preprocessing**

This script is dedicated to loading, parsing, and preparing the dataset for training and evaluation. It handles the specifics of the MovieLens datasets and transforms the raw data into a format suitable for the PyTorch model.

**Key Class: `DataSet`**

* **Initialization & Data Loading:** The constructor, via the `getData` method, loads the specified MovieLens dataset (`ml-1m` or `ml-100k`) from file, parsing user IDs, item IDs, and ratings.
* **Train-Test Splitting:** The `getTrainTest` method implements a temporal split. For each user, their last interaction is held out for the test set, while all preceding interactions form the training set. This is a standard and realistic evaluation protocol in recommendation systems.
* **Negative Sampling:**
    * `getInstances`: For each positive user-item interaction in the training set, this method samples a specified number of "negative" items (items the user has not interacted with). This is crucial for training the model to discriminate between relevant and irrelevant items.
    * `getTestNeg`: Prepares the test set for ranking evaluation. For each user's true positive item in the test set, it samples 99 negative items, creating a list of 100 items to rank.
* **Embedding Matrix:** The `getEmbedding` method constructs a full user-item interaction matrix, which is used to initialize the input embeddings for the model.

### **File 3: `main.py` — Model Architecture, Training, and Evaluation**

This is the main driver script that assembles the components from the other files into a complete system, defines the training loop, and manages the ensemble logic.

**Key Classes:**

* **`Model`:** This class defines the architecture of a single BNN-based recommendation model.
    * **Initialization:** It sets up separate user and item processing streams. Each stream consists of a stack of `BayesianLinear` layers. It also initializes an attention mechanism.
    * **Embeddings:** It uses the user-item interaction matrix from `DataSet.py` as a non-trainable input embedding.
    * **Forward Pass:** A user and an item are passed through their respective BNN towers. The resulting latent representations are then combined element-wise and fed into a `MultiheadAttention` layer to capture complex interactions. A final MLP block (`interaction_layer`) produces the predicted interaction probability.
    * **Loss Calculation:** Includes helper methods (`log_prior`, `log_variational_posterior`, `sample_elbo`) to compute the total model loss, which is a combination of the standard Binary Cross-Entropy (BCE) loss and the KL-divergence term (the "complexity cost") from the Bayesian layers.
* **`SuperModel`:** This class defines the meta-learner for the ensemble.
    * **Architecture:** It is a small neural network that takes the concatenated prediction scores from all individual models in the ensemble as input.
    * **Function:** It learns to weigh the predictions from the different base models to produce a final, more accurate prediction, effectively learning the strengths and weaknesses of each ensemble member.

**Execution Flow:**

1. **Ensemble Initialization:** The `main` function begins by creating a diverse ensemble of `Model` instances. Diversity is achieved by randomly assigning different network architectures (layer depths/widths) and prior distributions (`ScaleMixtureGaussian`, `Laplace`, etc.) to each model. Each model is also trained on a different **bootstrap sample** of the training data.
2. **Individual Model Training:** Each model in the ensemble is trained independently using the `run_epoch` function.
    * The loss function is the **Evidence Lower Bound (ELBO)**, which balances the BCE loss (fitting the data) with the KL divergence (regularizing the model complexity); see the sketch after this list.
    * After each epoch, the model is evaluated using the `evaluate` function, which calculates HR@10 and NDCG@10. The best-performing checkpoint for each model is saved.
3. **Super Model Training:** After all base models are trained, the saved best checkpoints are loaded. A `SuperModel` is then instantiated and trained via the `train_super_model` function. It learns to combine the predictions of the frozen base models.
4. **Final Evaluation:** The entire ensemble, aggregated by the trained `SuperModel`, is evaluated on the test set using the `ensemble_eval` function to report the final HR and NDCG scores.
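To make the ELBO objective in step 2 concrete, here is a minimal sketch of how the data-fit and complexity terms are typically combined for a mini-batch. The names and signatures (`model.log_prior()`, `model.log_variational_posterior()`, the `num_batches` scaling) follow the description above but are assumptions, not the project's exact code.

```python
import torch.nn.functional as F

def sample_elbo(model, users, items, labels, num_samples=3, num_batches=1):
    """Mini-batch ELBO sketch: KL "complexity cost" plus BCE "data fit".

    Each forward pass through the BayesianLinear layers draws a fresh weight
    sample, so averaging over num_samples passes gives a Monte Carlo estimate.
    The KL term is divided by the number of mini-batches so the full-dataset
    complexity cost is spread evenly across an epoch.
    """
    bce, kl = 0.0, 0.0
    for _ in range(num_samples):
        preds = model(users, items)  # assumed to return probabilities in (0, 1)
        bce = bce + F.binary_cross_entropy(preds, labels)
        kl = kl + (model.log_variational_posterior() - model.log_prior())
    return kl / (num_samples * num_batches) + bce / num_samples
```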
---

## Summary of Key Features

* **Uncertainty Quantification:** The use of `BayesianLinear` layers allows the model to capture uncertainty in its weights, leading to more robust predictions.
* **Ensemble Diversity:** The system actively promotes diversity through architectural heterogeneity and data bootstrapping (bagging), which is key to a successful ensemble.
* **Advanced Interaction Modeling:** A `MultiheadAttention` mechanism is used to effectively model the complex, non-linear interactions between user and item latent features.
* **Meta-Learning for Aggregation:** Instead of simple averaging, a dedicated `SuperModel` learns the optimal way to combine predictions from the ensemble members.
* **Principled Loss Function:** Training relies on optimizing the ELBO, a standard and theoretically grounded objective for variational inference in Bayesian models.
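For reference, the ranking metrics reported throughout (HR@10 and NDCG@10) can be computed per test user from the 100-item candidate list produced by `getTestNeg`. The sketch below assumes the common convention that index 0 of each list is the held-out positive item; the helper name is illustrative and does not correspond to the project's `evaluate` or `ensemble_eval` functions.

```python
import math
import numpy as np

def hr_ndcg_at_k(scores, k=10):
    """HR@k and NDCG@k for a single test user.

    `scores` holds the model's predictions for 100 candidates, where (by the
    convention assumed here) index 0 is the held-out positive item and
    indices 1..99 are the sampled negatives.
    """
    top_k = np.argsort(scores)[::-1][:k]        # indices of the k highest-scored items
    if 0 in top_k:
        rank = int(np.where(top_k == 0)[0][0])  # 0-based position of the positive item
        return 1.0, 1.0 / math.log2(rank + 2)   # hit, plus the standard DCG discount
    return 0.0, 0.0

# Averaging these per-user values over all test users yields the reported HR@10 / NDCG@10.
```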