Bayesian Deep Ensemble Collaborative Filtering

Project Overview: An Ensemble Bayesian Neural Network for Recommendation Systems

This project implements a recommendation system built on Bayesian Neural Networks (BNNs) and ensemble learning. The core idea is to build a collaborative filtering model that not only provides accurate recommendations but also quantifies the uncertainty associated with its predictions. This is achieved by combining multiple, diverse BNN models into an ensemble whose collective predictions are aggregated by a meta-learning component.

The system is designed to work with standard recommendation datasets like MovieLens and is evaluated on ranking-based metrics such as Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).
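
To make the evaluation protocol concrete, the sketch below computes HR@10 and NDCG@10 for a single test case in which one held-out positive item is ranked against sampled negatives. The function names and toy scores are illustrative only and are not taken from the repository's code.

```python
# Minimal sketch of leave-one-out ranking metrics, assuming one held-out positive
# item is ranked against sampled negatives (100 candidates in this project).
import math

def hit_ratio_at_k(rank: int, k: int = 10) -> float:
    """1.0 if the held-out positive item lands in the top-k, else 0.0 (rank is 0-based)."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank: int, k: int = 10) -> float:
    """With a single relevant item, NDCG reduces to 1 / log2(rank + 2) inside the top-k."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

# Toy example: scores for a few candidate items, index 0 is the true positive.
scores = [0.91, 0.12, 0.87, 0.95, 0.33]
rank = sorted(scores, reverse=True).index(scores[0])  # 0-based rank of the positive item
print(hit_ratio_at_k(rank), ndcg_at_k(rank))          # -> 1.0, ~0.63
```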


Architectural Breakdown

The project is logically structured into three main Python files, each with a distinct responsibility.

File 1: BNN.py — The Bayesian Building Blocks

This script defines the fundamental components required to construct Bayesian Neural Networks. Instead of learning fixed point estimates of their weights, as standard neural networks do, BNNs learn probability distributions over their weights. This file provides the classes needed to represent and manage these distributions.

Key Components:

  • Distribution Classes:

    • Gaussian: Implements a Gaussian distribution for the weights and biases of the BNN layers. It uses the reparameterization trick (μ + σ * ε) for efficient sampling during training. The standard deviation σ is derived from a learnable parameter ρ to ensure it remains positive.
    • ScaleMixtureGaussian: Defines a more flexible prior distribution for weights, constructed as a mixture of two Gaussian distributions with different variances. This allows the model to distinguish between highly important weights and those that can be pruned, effectively encouraging sparsity.
    • LaplacePrior & IsotropicGaussian: Provide alternative, simpler prior distributions (Laplace and standard Gaussian, respectively) that can be used for regularization and experimentation.
  • Core BNN Layer:

    • BayesianLinear: This is the cornerstone module, acting as a drop-in replacement for a standard torch.nn.Linear layer. It maintains learnable parameters (mu and rho) for the distributions of its weights and biases. During training, it calculates the log-prior (how well the sampled weights fit the prior distribution) and the log-variational-posterior (how probable the sampled weights are under their learned distribution). These two values are essential for computing the KL-divergence, a key component of the BNN’s loss function.
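
For orientation, here is a minimal sketch of how such a layer can be written. It is not the repository's implementation: the class and attribute names, the softplus link from ρ to σ, and the use of a plain standard-normal prior (instead of the mixture and Laplace priors described above) are all assumptions made for the sake of a short, self-contained example.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class Gaussian:
    """Variational posterior N(mu, sigma^2) with sigma = softplus(rho) > 0."""

    def __init__(self, mu: torch.Tensor, rho: torch.Tensor):
        self.mu, self.rho = mu, rho

    @property
    def sigma(self) -> torch.Tensor:
        return F.softplus(self.rho)          # log(1 + exp(rho)) keeps sigma positive

    def sample(self) -> torch.Tensor:
        eps = torch.randn_like(self.mu)      # reparameterization trick: w = mu + sigma * eps
        return self.mu + self.sigma * eps

    def log_prob(self, w: torch.Tensor) -> torch.Tensor:
        return (-math.log(math.sqrt(2 * math.pi))
                - torch.log(self.sigma)
                - (w - self.mu) ** 2 / (2 * self.sigma ** 2)).sum()


class BayesianLinear(nn.Module):
    """Drop-in replacement for nn.Linear that samples its weights on every forward pass."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -3.0))
        self.log_prior = torch.tensor(0.0)
        self.log_variational_posterior = torch.tensor(0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_dist = Gaussian(self.weight_mu, self.weight_rho)
        b_dist = Gaussian(self.bias_mu, self.bias_rho)
        w, b = w_dist.sample(), b_dist.sample()
        # A standard-normal prior stands in here for the mixture/Laplace priors described above.
        prior = torch.distributions.Normal(0.0, 1.0)
        self.log_prior = prior.log_prob(w).sum() + prior.log_prob(b).sum()
        self.log_variational_posterior = w_dist.log_prob(w) + b_dist.log_prob(b)
        return F.linear(x, w, b)
```

The log_prior and log_variational_posterior values are refreshed on every forward pass, so a model can sum them over its layers to obtain the KL term of its loss.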

File 2: DataSet.py — Data Handling and Preprocessing

This script is dedicated to loading, parsing, and preparing the dataset for training and evaluation. It handles the specifics of the MovieLens datasets and transforms the raw data into a format suitable for the PyTorch model.

Key Class: DataSet

  • Initialization & Data Loading: The constructor, via the getData method, loads the specified MovieLens dataset (ml-1m or ml-100k) from file, parsing user IDs, item IDs, and ratings.
  • Train-Test Splitting: The getTrainTest method implements a temporal (leave-one-out) split. For each user, the last interaction is held out for the test set, while all preceding interactions form the training set. This is a standard and realistic evaluation protocol in recommendation systems; a minimal sketch of this split, together with negative sampling, follows this list.
  • Negative Sampling:
    • getInstances: For each positive user-item interaction in the training set, this method samples a specified number of “negative” items (items the user has not interacted with). This is crucial for training the model to discriminate between relevant and irrelevant items.
    • getTestNeg: Prepares the test set for ranking evaluation. For each user’s true positive item in the test set, it samples 99 negative items, creating a list of 100 items to rank.
  • Embedding Matrix: The getEmbedding method constructs a full user-item interaction matrix, which is used to initialize the input embeddings for the model.
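
The sketch below illustrates the splitting and negative-sampling logic described above. It is a simplified stand-in, not the code in DataSet.py: the function names, the data layout (lists of (user, item, timestamp) tuples), and the default of four negatives per positive are assumptions.

```python
import random
from collections import defaultdict


def train_test_split_leave_one_out(interactions):
    """interactions: iterable of (user, item, timestamp). Hold out each user's last item."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, test = [], {}
    for user, events in by_user.items():
        events.sort()                                     # temporal order
        *history, (_, last_item) = events
        train += [(user, item) for _, item in history]    # everything before the last interaction
        test[user] = last_item                            # the single held-out positive
    return train, test


def sample_negatives(train, num_items, num_neg=4):
    """Pair every observed (user, item) with num_neg items the user never interacted with."""
    seen = defaultdict(set)
    for user, item in train:
        seen[user].add(item)
    instances = []
    for user, item in train:
        instances.append((user, item, 1))                 # observed interaction -> label 1
        for _ in range(num_neg):
            neg = random.randrange(num_items)
            while neg in seen[user]:                      # resample until the item is unseen
                neg = random.randrange(num_items)
            instances.append((user, neg, 0))              # sampled non-interaction -> label 0
    return instances
```

The same idea extends to getTestNeg: sample 99 unseen items per user and rank them together with the held-out positive.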

File 3: main.py — Model Architecture, Training, and Evaluation

This is the main driver script that assembles the components from the other files into a complete system, defines the training loop, and manages the ensemble logic.

Key Classes:

  • Model: This class defines the architecture of a single BNN-based recommendation model.

    • Initialization: It sets up separate user and item processing streams. Each stream consists of a stack of BayesianLinear layers. It also initializes an attention mechanism.
    • Embeddings: It uses the user-item interaction matrix from DataSet.py as a non-trainable input embedding.
    • Forward Pass: A user and an item are passed through their respective BNN towers. The resulting latent representations are then combined element-wise and fed into a MultiheadAttention layer to capture complex interactions. A final MLP block (interaction_layer) produces the predicted interaction probability.
    • Loss Calculation: Includes helper methods (log_prior, log_variational_posterior, sample_elbo) to compute the total model loss, which is a combination of the standard Binary Cross-Entropy (BCE) loss and the KL-divergence term (the “complexity cost”) from the Bayesian layers. A sketch of this loss, together with the SuperModel, appears after this list.
  • SuperModel: This class defines the meta-learner for the ensemble.

    • Architecture: It is a small neural network that takes the concatenated prediction scores from all individual models in the ensemble as input.
    • Function: It learns to weigh the predictions from the different base models to produce a final, more accurate prediction, effectively learning the strengths and weaknesses of each ensemble member.
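
The following sketch shows the shape of both pieces: an ELBO-style loss that combines the data-fit and complexity terms, and a small meta-learner over the base models' scores. The function signatures, the number of weight samples, and the hidden size of the meta-learner are assumptions made for illustration; the actual definitions live in main.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_elbo(model, users, items, labels, num_batches, num_samples=2):
    """Negative ELBO: average BCE (data fit) plus a scaled KL term (complexity cost)."""
    bce_terms, kl_terms = [], []
    for _ in range(num_samples):                        # each pass resamples the weights
        preds = model(users, items)
        bce_terms.append(F.binary_cross_entropy(preds, labels))
        kl_terms.append(model.log_variational_posterior() - model.log_prior())
    bce = torch.stack(bce_terms).mean()
    kl = torch.stack(kl_terms).mean() / num_batches     # spread the KL cost over the epoch
    return bce + kl


class SuperModel(nn.Module):
    """Meta-learner: maps the concatenated base-model scores to one final score."""

    def __init__(self, num_models, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_models, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, base_scores):
        # base_scores: (batch, num_models) predictions from the frozen base models
        return self.net(base_scores).squeeze(-1)
```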

Execution Flow:

  1. Ensemble Initialization: The main function begins by creating a diverse ensemble of Model instances. Diversity is achieved by randomly assigning different network architectures (layer depths/widths) and prior distributions (ScaleMixtureGaussian, Laplace, etc.) to each model. Each model is also trained on a different bootstrap sample of the training data.

  2. Individual Model Training: Each model in the ensemble is trained independently using the run_epoch function.

    • The loss function is the Evidence Lower Bound (ELBO), which balances the BCE loss (fitting the data) with the KL divergence (regularizing the model complexity).
    • After each epoch, the model is evaluated using the evaluate function, which calculates HR@10 and NDCG@10. The best-performing checkpoint for each model is saved.
  3. Super Model Training: After all base models are trained, the saved best checkpoints are loaded. A SuperModel is then instantiated and trained via the train_super_model function. It learns to combine the predictions of the frozen base models.

  4. Final Evaluation: The entire ensemble, aggregated by the trained SuperModel, is evaluated on the test set using the ensemble_eval function to report the final HR and NDCG scores.
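
As a rough picture of this final step, the sketch below scores one user's candidate list with every base model and lets the trained SuperModel produce the final ranking scores. The name ensemble_scores and the tensor shapes are illustrative assumptions, not the repository's exact API.

```python
import torch


@torch.no_grad()
def ensemble_scores(base_models, super_model, users, items):
    """Score a user's candidate items by feeding all base predictions to the meta-learner."""
    per_model = [m(users, items) for m in base_models]   # each: (num_candidates,)
    stacked = torch.stack(per_model, dim=-1)             # (num_candidates, num_models)
    return super_model(stacked)                          # (num_candidates,) final scores

# For each test user: score the 100 candidates (1 positive + 99 negatives), rank them
# by the returned scores, and compute HR@10 / NDCG@10 as in the earlier metric sketch.
```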


Summary of Key Features

  • Uncertainty Quantification: The use of BayesianLinear layers allows the model to capture uncertainty in its weights, leading to more robust predictions.
  • Ensemble Diversity: The system actively promotes diversity through architectural heterogeneity and data bootstrapping (bagging), which is key to a successful ensemble.
  • Advanced Interaction Modeling: A MultiheadAttention mechanism is used to effectively model the complex, non-linear interactions between user and item latent features.
  • Meta-Learning for Aggregation: Instead of simple averaging, a dedicated SuperModel learns the optimal way to combine predictions from the ensemble members.
  • Principled Loss Function: The training relies on optimizing the ELBO, a standard and theoretically grounded objective for variational inference in Bayesian models.