## Project Overview: An Ensemble Bayesian Neural Network for Recommendation Systems

This project implements a recommendation system that leverages the power of **Bayesian Neural Networks (BNNs)** and **ensemble learning**. The core idea is to build a robust collaborative filtering model that not only provides accurate recommendations but also quantifies the uncertainty associated with its predictions. This is achieved by combining multiple, diverse BNN models into an ensemble whose collective predictions are aggregated by a meta-learning component.

The system is designed to work with standard recommendation datasets such as MovieLens and is evaluated on ranking-based metrics such as Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).

---

## Architectural Breakdown

The project is logically structured into three main Python files, each with a distinct responsibility.

### **File 1: `BNN.py` — The Bayesian Building Blocks**

This script defines the fundamental components required to construct Bayesian Neural Networks. Instead of learning fixed point-estimate weights like standard neural networks, BNNs learn probability distributions over their weights. This file provides the classes needed to manage these distributions.

**Key Components:**

* **Distribution Classes:**
    * `Gaussian`: Implements a Gaussian distribution for the weights and biases of the BNN layers. It uses the reparameterization trick (`μ + σ * ε`) for efficient sampling during training. The standard deviation `σ` is derived from a learnable parameter `ρ` to ensure it remains positive.
    * `ScaleMixtureGaussian`: Defines a more flexible prior distribution for weights, constructed as a mixture of two Gaussian distributions with different variances. This allows the model to distinguish between highly important weights and those that can be pruned, effectively encouraging sparsity.
    * `LaplacePrior` & `IsotropicGaussian`: Provide alternative, simpler prior distributions (Laplace and standard Gaussian, respectively) that can be used for regularization and experimentation.
* **Core BNN Layer:**
    * `BayesianLinear`: This is the cornerstone module, acting as a drop-in replacement for a standard `torch.nn.Linear` layer. It maintains learnable parameters (`mu` and `rho`) for the distributions of its weights and biases. During training, it calculates the **log-prior** (how well the sampled weights fit the prior distribution) and the **log-variational-posterior** (how probable the sampled weights are under their learned distribution). These two values are essential for computing the KL-divergence, a key component of the BNN's loss function. A minimal sketch of this pattern is shown below.
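To make the reparameterization trick and the two log-probability terms concrete, here is a minimal, self-contained sketch of a Bayesian linear layer in PyTorch. It is not the project's exact code: the initialization values, the softplus transform for `σ`, and the simple isotropic Gaussian prior are assumptions (the project also supports scale-mixture and Laplace priors).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Gaussian:
    """Diagonal Gaussian posterior over a weight tensor, parameterized by mu and rho."""
    def __init__(self, mu, rho):
        self.mu, self.rho = mu, rho

    @property
    def sigma(self):
        # softplus keeps the standard deviation strictly positive
        return torch.log1p(torch.exp(self.rho))

    def sample(self):
        # reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1)
        return self.mu + self.sigma * torch.randn_like(self.mu)

    def log_prob(self, w):
        return (-math.log(math.sqrt(2 * math.pi))
                - torch.log(self.sigma)
                - (w - self.mu) ** 2 / (2 * self.sigma ** 2)).sum()

class BayesianLinear(nn.Module):
    """Drop-in replacement for nn.Linear whose weights are re-sampled on every forward pass."""
    def __init__(self, in_features, out_features, prior_sigma=1.0):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))
        self.weight = Gaussian(self.weight_mu, self.weight_rho)
        self.bias = Gaussian(self.bias_mu, self.bias_rho)
        self.prior = torch.distributions.Normal(0.0, prior_sigma)  # assumed isotropic Gaussian prior
        self.log_prior = 0.0
        self.log_variational_posterior = 0.0

    def forward(self, x):
        w, b = self.weight.sample(), self.bias.sample()
        # the two terms described above: log p(w) under the prior and log q(w|theta) under the posterior
        self.log_prior = self.prior.log_prob(w).sum() + self.prior.log_prob(b).sum()
        self.log_variational_posterior = self.weight.log_prob(w) + self.bias.log_prob(b)
        return F.linear(x, w, b)
```

Summing `log_variational_posterior - log_prior` over all such layers gives the (sampled) KL term used in the ELBO loss described later.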
### **File 2: `DataSet.py` — Data Handling and Preprocessing**

This script is dedicated to loading, parsing, and preparing the dataset for training and evaluation. It handles the specifics of the MovieLens datasets and transforms the raw data into a format suitable for the PyTorch model.

**Key Class: `DataSet`**

* **Initialization & Data Loading:** The constructor, via the `getData` method, loads the specified MovieLens dataset (`ml-1m` or `ml-100k`) from file, parsing user IDs, item IDs, and ratings.
* **Train-Test Splitting:** The `getTrainTest` method implements a temporal split. For each user, their last interaction is held out for the test set, while all preceding interactions form the training set. This is a standard and realistic evaluation protocol in recommendation systems.
* **Negative Sampling:**
    * `getInstances`: For each positive user-item interaction in the training set, this method samples a specified number of "negative" items (items the user has not interacted with). This is crucial for training the model to discriminate between relevant and irrelevant items.
    * `getTestNeg`: Prepares the test set for ranking evaluation. For each user's true positive item in the test set, it samples 99 negative items, creating a list of 100 items to rank.
* **Embedding Matrix:** The `getEmbedding` method constructs a full user-item interaction matrix, which is used to initialize the input embeddings for the model.

### **File 3: `main.py` — Model Architecture, Training, and Evaluation**

This is the main driver script that assembles the components from the other files into a complete system, defines the training loop, and manages the ensemble logic.

**Key Classes:**

* **`Model`:** This class defines the architecture of a single BNN-based recommendation model.
    * **Initialization:** It sets up separate user and item processing streams. Each stream consists of a stack of `BayesianLinear` layers. It also initializes an attention mechanism.
    * **Embeddings:** It uses the user-item interaction matrix from `DataSet.py` as a non-trainable input embedding.
    * **Forward Pass:** A user and an item are passed through their respective BNN towers. The resulting latent representations are then combined element-wise and fed into a `MultiheadAttention` layer to capture complex interactions. A final MLP block (`interaction_layer`) produces the predicted interaction probability.
    * **Loss Calculation:** Includes helper methods (`log_prior`, `log_variational_posterior`, `sample_elbo`) to compute the total model loss, which is a combination of the standard Binary Cross-Entropy (BCE) loss and the KL-divergence term (the "complexity cost") from the Bayesian layers.
* **`SuperModel`:** This class defines the meta-learner for the ensemble.
    * **Architecture:** It is a small neural network that takes the concatenated prediction scores from all individual models in the ensemble as input.
    * **Function:** It learns to weigh the predictions from the different base models to produce a final, more accurate prediction, effectively learning the strengths and weaknesses of each ensemble member.

**Execution Flow:**

1. **Ensemble Initialization:** The `main` function begins by creating a diverse ensemble of `Model` instances. Diversity is achieved by randomly assigning different network architectures (layer depths/widths) and prior distributions (`ScaleMixtureGaussian`, `Laplace`, etc.) to each model. Each model is also trained on a different **bootstrap sample** of the training data.
2. **Individual Model Training:** Each model in the ensemble is trained independently using the `run_epoch` function.
    * The loss function is the **Evidence Lower Bound (ELBO)**, which balances the BCE loss (fitting the data) with the KL divergence (regularizing the model complexity); see the sketch after this list.
    * After each epoch, the model is evaluated using the `evaluate` function, which calculates HR@10 and NDCG@10. The best-performing checkpoint for each model is saved.
3. **Super Model Training:** After all base models are trained, the saved best checkpoints are loaded. A `SuperModel` is then instantiated and trained via the `train_super_model` function. It learns to combine the predictions of the frozen base models.
4. **Final Evaluation:** The entire ensemble, aggregated by the trained `SuperModel`, is evaluated on the test set using the `ensemble_eval` function to report the final HR and NDCG scores.
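To make the ELBO objective in step 2 concrete, here is a minimal sketch of how the data-fit and complexity terms are typically combined for a mini-batch. The names and signatures (`model.log_prior()`, `model.log_variational_posterior()`, the `num_batches` scaling) follow the description above but are assumptions, not the project's exact code.

```python
import torch.nn.functional as F

def sample_elbo(model, users, items, labels, num_samples=3, num_batches=1):
    """Mini-batch ELBO sketch: KL "complexity cost" plus BCE "data fit".

    Each forward pass through the BayesianLinear layers draws a fresh weight
    sample, so averaging over num_samples passes gives a Monte Carlo estimate.
    The KL term is divided by the number of mini-batches so the full-dataset
    complexity cost is spread evenly across an epoch.
    """
    bce, kl = 0.0, 0.0
    for _ in range(num_samples):
        preds = model(users, items)  # assumed to return probabilities in (0, 1)
        bce = bce + F.binary_cross_entropy(preds, labels)
        kl = kl + (model.log_variational_posterior() - model.log_prior())
    return kl / (num_samples * num_batches) + bce / num_samples
```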
---

## Summary of Key Features

* **Uncertainty Quantification:** The use of `BayesianLinear` layers allows the model to capture uncertainty in its weights, leading to more robust predictions.
* **Ensemble Diversity:** The system actively promotes diversity through architectural heterogeneity and data bootstrapping (bagging), which is key to a successful ensemble.
* **Advanced Interaction Modeling:** A `MultiheadAttention` mechanism is used to effectively model the complex, non-linear interactions between user and item latent features.
* **Meta-Learning for Aggregation:** Instead of simple averaging, a dedicated `SuperModel` learns the optimal way to combine predictions from the ensemble members.
* **Principled Loss Function:** Training relies on optimizing the ELBO, a standard and theoretically grounded objective for variational inference in Bayesian models.
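For reference, the ranking metrics reported throughout (HR@10 and NDCG@10) can be computed per test user from the 100-item candidate list produced by `getTestNeg`. The sketch below assumes the common convention that index 0 of each list is the held-out positive item; the helper name is illustrative and does not correspond to the project's `evaluate` or `ensemble_eval` functions.

```python
import math
import numpy as np

def hr_ndcg_at_k(scores, k=10):
    """HR@k and NDCG@k for a single test user.

    `scores` holds the model's predictions for 100 candidates, where (by the
    convention assumed here) index 0 is the held-out positive item and
    indices 1..99 are the sampled negatives.
    """
    top_k = np.argsort(scores)[::-1][:k]        # indices of the k highest-scored items
    if 0 in top_k:
        rank = int(np.where(top_k == 0)[0][0])  # 0-based position of the positive item
        return 1.0, 1.0 / math.log2(rank + 2)   # hit, plus the standard DCG discount
    return 0.0, 0.0

# Averaging these per-user values over all test users yields the reported HR@10 / NDCG@10.
```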