The “Crawled Images (Original Size)” path contains the original ingredient images obtained from Google. These images were later resized to 384×384 to make them easier to transfer between servers. Since our image models accept input images of at most 300 pixels per side, the larger originals are unnecessary, so this size reduction is convenient for us. The original data folder is over 200 GB, and the resized folder still holds over 20 GB, so only their paths are provided here. Neither folder is required to run the final model, because their features are extracted beforehand with the pre-trained models described below.
The FastText model is a non-contextual embedding model used to extract embeddings of ingredient names. Since the model is approximately 1 GB in size, only its path is provided here. This model is required to run the final training code.
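For reference, here is a minimal sketch of loading such a model and embedding an ingredient name, assuming the file is a gensim FastText `.model` (the project's own helpers live in `utils/fasttext_embedding.py`; the path and example word below are hypothetical):

```python
# Minimal sketch, assuming the .model file is a gensim FastText model;
# the project's actual helpers live in utils/fasttext_embedding.py.
from gensim.models import FastText

model = FastText.load("path/to/fasttext.model")  # hypothetical path

# FastText builds vectors from subword n-grams, so even unseen
# ingredient names still receive an embedding.
vector = model.wv["tomato"]  # hypothetical ingredient name
print(vector.shape)  # dimensionality depends on how the model was trained
```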
The “Extracted Image Features” path refers to the folder containing the features extracted from ingredient images with pre-trained image models. These image features are necessary to run the main training code.
The “Extracted Text Features” path refers to the folder containing the features extracted from recipes with the BERT model. These features are also required to run the main training code.
This folder contains the following files:
image_dict_ings.json: a list of crawled image names.
The following list explains the files in the utils folder:
fasttext_embedding.py: utility functions for getting embeddings from FastText.
io.py: utility functions for loading and saving the config.
bypass_bn.py: functions for handling Batch Normalization layers.
recipedb_dataset.py: an implementation of the RecipeDB dataset using the PyTorch Dataset class.
sam.py: an implementation of the SAM (Sharpness-Aware Minimization) optimizer, which is used in this project; see the sketch after this list.
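As an illustration, here is a hypothetical SAM training step, assuming `utils/sam.py` follows the common `first_step`/`second_step` interface of the reference SAM implementation (the model, data, and hyperparameter values are placeholders):

```python
# Hypothetical SAM training step; assumes utils/sam.py exposes the common
# first_step/second_step interface. Model and data are placeholders.
import torch
from utils.sam import SAM

model = torch.nn.Linear(16, 4)
criterion = torch.nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD,
                rho=0.05, lr=0.1)  # rho corresponds to sam_rho in the config

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))

# First forward-backward pass: perturb the weights toward the local worst case.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)

# Second forward-backward pass: update using gradients at the perturbed weights.
criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)
```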
extract_image_vector.py: extracts ingredient visual embeddings.
extract_recipe_vector.py: extracts recipe text embeddings.
network.py: implementation of the PyTorch image-text transformer model used to solve the problem.
train.py: code for loading the data, creating the model, and feeding the data to the model.
best_config.yml: YAML config file specifying the hyperparameters of the model.
Text features can be extracted from RecipeDB's recipes using the extract_recipe_vector.py script. At the top of the file, it defines the paths to the JSON data files (Data/train.json and Data/val.json) from which the embeddings are extracted, as well as an output path where the final embeddings are saved. The command for running this script is:

```
python3 extract_recipe_vector.py
```
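The sketch below illustrates the kind of extraction this script performs: encoding each recipe with bert-base-uncased (the text_model named in the config) and saving the embeddings. The JSON field names ("id", "instructions") and the output path are assumptions, not necessarily what extract_recipe_vector.py actually uses:

```python
# Illustrative only: encode each recipe with bert-base-uncased and save
# the embeddings. Field names and the output path are assumptions.
import json
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

with open("Data/train.json") as f:
    recipes = json.load(f)

embeddings = {}
with torch.no_grad():
    for recipe in recipes:
        inputs = tokenizer(recipe["instructions"], truncation=True,
                           return_tensors="pt")
        # Take the [CLS] token's last hidden state as the recipe embedding.
        embeddings[recipe["id"]] = bert(**inputs).last_hidden_state[:, 0].squeeze(0)

torch.save(embeddings, "text-features/train.pt")
```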
Image features can be extracted using the extract_image_vector.py script, which specifies the following fields at its beginning:

```python
input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'
```

These fields define the path to the input image folder (which contains images of different ingredients resized to 384×384) and the output root directory where the embeddings will be saved.
This script loads five pretrained image models, runs them on the input data, and saves their embeddings in the output folder. Keep in mind that the output embedding for an ingredient is the average of all the embeddings extracted from its corresponding images.
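The following sketch shows this per-ingredient averaging with one of the five models named in the config (resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3); the one-sub-directory-per-ingredient folder layout and the output filename are assumptions:

```python
# Illustrative only: extract features with one pretrained model and average
# them per ingredient. Folder layout and output filename are assumptions.
from pathlib import Path
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # keep the pooled features, drop the classifier
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

features = {}
with torch.no_grad():
    for ing_dir in Path("crawled-images-full-384").iterdir():
        vecs = [model(preprocess(Image.open(p).convert("RGB")).unsqueeze(0))
                for p in ing_dir.glob("*.jpg")]
        if vecs:
            # The ingredient's final embedding is the mean over all its images.
            features[ing_dir.name] = torch.cat(vecs).mean(dim=0)

torch.save(features, "image-features-full/resnet18.pt")
```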
The training code takes only a configuration file as input and can be run solely from that configuration. The command for running the training code is:

```
python3 train.py --config best_config.yml
```
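As a rough sketch of how such a config can be consumed (the project's actual loading helpers live in utils/io.py, so the real code may differ):

```python
# Rough sketch of loading the YAML config; utils/io.py holds the real helpers.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

# Nested keys mirror the sections documented below.
print(config["optim"]["epochs"], config["data"]["target"])
```

The configuration options, along with their expected types, are listed below: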
```yaml
optim:
  epochs: "Number of epochs for training" :> int
  batch_size: "Batch size for the dataset" :> int
  max_lr: "Max learning rate to pass to the scheduler" :> float
  weight_decay: "Weight decay value to pass to the optimizer" :> float
  device: "Device to use for training, either 'cuda' or 'cpu'" :> str
  num_workers: "Number of workers for the dataloader" :> int
  sam_rho: "Hyperparameter for the SAM optimizer" :> float
text_model: "bert-base-uncased" :> str
image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
image_features_path: "Path to the extracted image features" :> str
text_features_path: "Path to the extracted text features" :> str
use_recipe_text: "Should the model use recipe embeddings?" :> bool
use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
model:
  ingredient_feature_extractor:
    layers: "Arrangement of transformer blocks, e.g. T or TTTT" :> str
    H: "Embedding size for the transformer" :> int
    transformer:
      L: "Number of layers in each transformer block" :> int
      n_heads: "Number of heads in each transformer block" :> int
    final_ingredient_feature_size: "Ingredient feature size after the transformer output" :> int
  image_feature_size: "Size the image features are reduced to at the beginning" :> int
  text_feature_size: "Size the recipe text features are reduced to" :> int
  final_classes: "Replaced automatically in the code; just set it to -1" :> int
data:
  embedding_size: "Embedding size of the ingredient text features" :> int
  dataset_path: "Path to the RecipeDB dataset" :> str
  fasttext_path: "Path to the FastText .model file" :> str
  target: "Type of target; should be 'region' for this project" :> str
```
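For orientation, here is a hypothetical filled-in best_config.yml with illustrative values only; the actual shipped configuration may differ:

```yaml
# Hypothetical example values; the shipped best_config.yml may differ.
optim:
  epochs: 50
  batch_size: 64
  max_lr: 0.001
  weight_decay: 0.0001
  device: cuda
  num_workers: 4
  sam_rho: 0.05
text_model: bert-base-uncased
image_model: efficientnet_b3
image_features_path: image-features-full
text_features_path: text-features
use_recipe_text: true
use_image_ingredients: true
use_text_ingredients: true
model:
  ingredient_feature_extractor:
    layers: TTTT
    H: 512
    transformer:
      L: 2
      n_heads: 8
    final_ingredient_feature_size: 512
  image_feature_size: 512
  text_feature_size: 512
  final_classes: -1
data:
  embedding_size: 300
  dataset_path: Data
  fasttext_path: fasttext.model
  target: region
```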