This project implements a recommendation system built on Bayesian Neural Networks (BNNs) and ensemble learning. The core idea is to build a collaborative filtering model that provides recommendations and also quantifies the uncertainty of its predictions. It does this by combining multiple, diverse BNN models into an ensemble whose predictions are aggregated by a meta-learning component.
The system is designed to work with standard recommendation datasets like MovieLens and is evaluated on ranking-based metrics such as Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG).
The project is logically structured into three main Python files, each with a distinct responsibility.
BNN.py: The Bayesian Building Blocks
This script defines the components needed to construct Bayesian Neural Networks. Instead of learning a single point estimate for each weight, as standard neural networks do, BNNs learn a probability distribution over their weights. This file provides the classes that manage these distributions.
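As a rough illustration of what "a distribution over a weight" means, the sketch below represents one weight by a mean and an unconstrained spread parameter, draws samples from it, and evaluates the two log-densities a Bayesian layer typically tracks. It is plain Python, not code from BNN.py: the function names, the softplus mapping from rho to sigma, and the standard normal prior are assumptions made for the example.

```python
import math
import random


def softplus(rho: float) -> float:
    """Map an unconstrained value to a strictly positive one: log(1 + exp(rho))."""
    # Rearranged so that exp() never receives a large positive argument.
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))


def sample_weight(mu: float, rho: float, rng: random.Random) -> float:
    """Draw one weight as mu + sigma * eps, with eps ~ N(0, 1)."""
    sigma = softplus(rho)  # assumed parameterisation of the standard deviation
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps


def gaussian_log_prob(x: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) evaluated at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


rng = random.Random(0)
mu, rho = 0.3, -2.0  # hypothetical learned parameters for a single weight

# Many draws scatter around mu with spread softplus(rho).
draws = [sample_weight(mu, rho, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)

# For one draw, compare its density under the learned distribution
# and under an assumed N(0, 1) prior.
w = sample_weight(mu, rho, rng)
log_q = gaussian_log_prob(w, mu, softplus(rho))  # log-variational-posterior
log_p = gaussian_log_prob(w, 0.0, 1.0)           # log-prior
kl_sample = log_q - log_p  # single-sample estimate of the KL term
```

Because the randomness enters only through eps, gradients can flow to mu and rho when the same computation is written with tensors.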
Key Components:
Distribution Classes:
Gaussian: Implements a Gaussian distribution for the weights and biases of the BNN layers. It uses the reparameterization trick (μ + σ * ε) for efficient sampling during training. The standard deviation σ is derived from a learnable parameter ρ to ensure it remains positive.
ScaleMixtureGaussian: Defines a more flexible prior distribution for weights, constructed as a mixture of two Gaussian distributions with different variances. This allows the model to distinguish between highly important weights and those that can be pruned, effectively encouraging sparsity.
LaplacePrior & IsotropicGaussian: Provide alternative, simpler prior distributions (Laplace and standard Gaussian, respectively) that can be used for regularization and experimentation.
Core BNN Layer:
BayesianLinear: This is the cornerstone module, acting as a drop-in replacement for a standard torch.nn.Linear layer. It maintains learnable parameters (mu and rho) for the distributions of its weights and biases. During training, it calculates the log-prior (how well the sampled weights fit the prior distribution) and the log-variational-posterior (how probable the sampled weights are under their learned distribution). These two values are essential for computing the KL-divergence, a key component of the BNN’s loss function.
DataSet.py: Data Handling and Preprocessing
This script is dedicated to loading, parsing, and preparing the dataset for training and evaluation. It handles the specifics of the MovieLens datasets and transforms the raw data into a format suitable for the PyTorch model.
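Two of the preparation steps this file is responsible for, holding out each user's most recent interaction and sampling items the user has not seen, could be sketched roughly as follows. This is plain Python rather than the repository's code: the names leave_last_out and sample_negatives, the (user, item, timestamp) tuple layout, and the assumption that item ids run from 0 to num_items - 1 are all illustrative.

```python
import random
from collections import defaultdict


def leave_last_out(interactions):
    """Split (user, item, timestamp) triples into train and test pairs.

    Each user's most recent interaction goes to test; the rest go to train.
    """
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # oldest first
        *history, (_, last_item) = events
        train.extend((user, item) for _, item in history)
        test.append((user, last_item))
    return train, test


def sample_negatives(user_items, num_items, count, rng):
    """Pick `count` distinct item ids that are not in `user_items`.

    Raises ValueError if fewer than `count` unseen items exist.
    """
    candidates = [i for i in range(num_items) if i not in user_items]
    return rng.sample(candidates, count)


# Toy data: user 0 has three interactions, user 1 has two.
interactions = [(0, 10, 3), (0, 11, 1), (0, 12, 2), (1, 10, 5), (1, 13, 4)]
train, test = leave_last_out(interactions)

# Four unseen items for user 0, out of a hypothetical catalogue of 20 items.
negatives = sample_negatives({10, 11, 12}, 20, 4, random.Random(0))
```

Sampling from the list of unseen items keeps the sketch short; for a large catalogue, rejection sampling against the user's item set avoids building that list for every user.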
Key Class: DataSet
getData: Loads the specified MovieLens dataset (ml-1m or ml-100k) from file, parsing user IDs, item IDs, and ratings.
getTrainTest: Implements a temporal split. For each user, their last interaction is held out for the test set, while all preceding interactions form the training set. This is a standard and realistic evaluation protocol in recommendation systems.
getInstances: For each positive user-item interaction in the training set, this method samples a specified number of “negative” items (items the user has not interacted with). This is crucial for training the model to discriminate between relevant and irrelevant items.
getTestNeg: Prepares the test set for ranking evaluation. For each user’s true positive item in the test set, it samples 99 negative items, creating a list of 100 items to rank.
getEmbedding: Constructs a full user-item interaction matrix, which is used to initialize the input embeddings for the model.
main.py: Model Architecture, Training, and Evaluation
This is the main driver script that assembles the components from the other files into a complete system, defines the training loop, and manages the ensemble logic.
Key Classes:
Model: This class defines the architecture of a single BNN-based recommendation model.
The network is built from BayesianLinear layers. It also initializes an attention mechanism.
The interaction matrix prepared by DataSet.py serves as a non-trainable input embedding.
The resulting features pass through a MultiheadAttention layer to capture complex interactions. A final MLP block (interaction_layer) produces the predicted interaction probability.
Helper methods (log_prior, log_variational_posterior, sample_elbo) compute the total model loss, which combines the standard Binary Cross-Entropy (BCE) loss with the KL-divergence term (the “complexity cost”) from the Bayesian layers.
SuperModel: This class defines the meta-learner for the ensemble.
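This README does not spell out the SuperModel architecture, so the following is only a stand-in for the general idea of a meta-learner: a small logistic-regression stacker, in plain Python, that takes the scores of already-trained base models as input and learns one weight per model. The names train_meta_learner and combine, the learning rate, and the toy scores are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combine(weights, bias, row):
    """Turn one row of base-model scores into a single probability."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, row)))


def train_meta_learner(base_scores, labels, lr=0.5, epochs=2000):
    """Fit one weight per base model plus a bias by gradient descent on BCE.

    base_scores: list of rows, each holding one score per (frozen) base model.
    labels: 1 for an observed interaction, 0 for a sampled negative.
    """
    n_models = len(base_scores[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_scores, labels):
            err = combine(weights, bias, row) - y  # dBCE/dlogit
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy scores from three base models; the third is deliberately uninformative.
base_scores = [[0.9, 0.8, 0.5], [0.8, 0.7, 0.4], [0.2, 0.3, 0.6], [0.1, 0.2, 0.5]]
labels = [1, 1, 0, 0]
weights, bias = train_meta_learner(base_scores, labels)
```

Only the meta-learner's parameters change here; the base-model scores are treated as fixed inputs, which mirrors training on top of frozen ensemble members.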
Execution Flow:
Ensemble Initialization: The main function begins by creating a diverse ensemble of Model instances. Diversity is achieved by randomly assigning different network architectures (layer depths/widths) and prior distributions (ScaleMixtureGaussian, Laplace, etc.) to each model. Each model is also trained on a different bootstrap sample of the training data.
Individual Model Training: Each model in the ensemble is trained independently using the run_epoch function.
During training, each model is scored with the evaluate function, which calculates HR@10 and NDCG@10. The best-performing checkpoint for each model is saved.
Super Model Training: After all base models are trained, the saved best checkpoints are loaded. A SuperModel is then instantiated and trained via the train_super_model function. It learns to combine the predictions of the frozen base models.
Final Evaluation: The entire ensemble, aggregated by the trained SuperModel, is evaluated on the test set using the ensemble_eval function to report the final HR and NDCG scores.
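When each user has exactly one held-out item ranked against sampled negatives, HR@10 and NDCG@10 reduce to simple per-user formulas. The sketch below shows one way to compute them in plain Python; the function names and the toy scores are illustrative and are not taken from evaluate or ensemble_eval.

```python
import math


def rank_candidates(scores):
    """Sort candidate item ids by predicted score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def hit_ratio_at_k(ranked_items, positive_item, k=10):
    """1.0 if the held-out item appears in the top k, else 0.0."""
    return 1.0 if positive_item in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, positive_item, k=10):
    """NDCG for a single relevant item.

    The ideal DCG is 1 (relevant item in first place), so the score is
    1 / log2(rank + 1) when the item is in the top k, and 0 otherwise.
    """
    for position, item in enumerate(ranked_items[:k]):
        if item == positive_item:
            return 1.0 / math.log2(position + 2)  # position is 0-based
    return 0.0


# Toy example: item 42 is the held-out positive and is ranked third.
scores = {7: 0.9, 3: 0.8, 42: 0.7, 5: 0.1}
ranked = rank_candidates(scores)
hr = hit_ratio_at_k(ranked, 42)
ndcg = ndcg_at_k(ranked, 42)
```

The reported metric is then the average of these per-user values over all test users.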
BayesianLinear layers allow the model to capture uncertainty in its weights, leading to more robust predictions.
A MultiheadAttention mechanism models the complex, non-linear interactions between user and item latent features.
The SuperModel learns how to combine predictions from the ensemble members.
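One common way to turn weight uncertainty into prediction uncertainty is to run several stochastic forward passes and look at the spread of the outputs. The sketch below does this for a toy one-weight model in plain Python. It is an assumption about how such an estimate could be obtained, not a description of this project's code, and all names in it are hypothetical.

```python
import math
import random
import statistics


def stochastic_predict(x, w_mu, w_sigma, rng):
    """One forward pass of a toy model, using a freshly sampled weight."""
    w = rng.gauss(w_mu, w_sigma)
    return 1.0 / (1.0 + math.exp(-w * x))


def predict_with_uncertainty(x, w_mu, w_sigma, n_samples=200, seed=0):
    """Return the mean and standard deviation of repeated predictions."""
    rng = random.Random(seed)
    preds = [stochastic_predict(x, w_mu, w_sigma, rng) for _ in range(n_samples)]
    return statistics.fmean(preds), statistics.stdev(preds)


# Same mean weight, different learned spread.
confident = predict_with_uncertainty(1.0, w_mu=2.0, w_sigma=0.05)
uncertain = predict_with_uncertainty(1.0, w_mu=2.0, w_sigma=1.5)
```

A narrow weight distribution yields nearly identical predictions across passes, while a wide one yields a larger standard deviation, which can be read as lower confidence in that recommendation.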