# Required data for running on other servers

- Crawled Images (Resized to 384): (172.27.50.254):/home/dml/food/CuisineAdaptation/crawled-images-full-384
- FastText Model Folder: (172.27.50.254):/home/dml/food/CuisineAdaptation/fasttext
- Extracted Image Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full
- Extracted Text Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features
- Crawled Images (Original Size): (172.27.49.27):/media/external_10TB/10TB/Behnamnia/HPC-BACKUP_01.02.07/food/ACM-MM/IngredientsCrawling/[crawled_images_full]|[crawled_images_full_v2]

The "Crawled Images (Original Size)" path contains the original ingredient images obtained from Google. These images were later resized to 384 pixels to make transferring them between servers easier; since our image models take inputs of at most 300 pixels, the larger originals are unnecessary, so the resized set is all we use. The original folder is over 200 GB and the resized folder is still over 20 GB, which is why only their paths are given here. These folders **are not required** to run the final model, because their features have already been extracted with the pre-trained models we use, and only those features are consumed by our model.

The FastText model is a non-contextual model used to extract embeddings of ingredient names. It is approximately 1 GB in size, so only its path is given here. This model **is required** to run the final training code.

The "Extracted Image Features" path points to the folder of features extracted from ingredient images with pre-trained image models. These image features **are required** to run the main training code.

The "Extracted Text Features" path points to the folder of features extracted from recipes with the BERT model. These features **are also required** to run the main training code.

# Structure of the Available Files

## Data

This folder contains the following files:

- train.json: train split of the RecipeDB dataset
- val.json: validation split of the RecipeDB dataset
- region.json: a JSON file listing all regions and assigning a number to each of them
- ingredient_counts.json: a JSON file listing every ingredient in the RecipeDB dataset together with its count over the whole dataset
- image_dict_ings.json: a list of crawled image names

## Utils

The following list explains the files in the utils folder:

- fasttext_embedding.py: utility functions for getting embeddings from FastText (a short usage sketch follows this section)
- io.py: utility functions for loading and saving the config
- bypass_bn.py: functions for handling Batch Normalization layers
- recipedb_dataset.py: an implementation of the RecipeDB dataset using the PyTorch Dataset class
- sam.py: an implementation of the SAM optimizer, which is used in this project

## Others

- extract_image_vector.py: extracts visual embeddings for ingredients
- extract_recipe_vector.py: extracts text embeddings for recipes
- network.py: implementation of the PyTorch image-text Transformer model used to solve the problem
- train.py: loads the data, creates the model, and feeds the data to the model
- best_config.yml: YAML config file specifying the hyperparameters of the model
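As a quick illustration of the FastText utilities mentioned above, the snippet below is a minimal sketch, assuming the model was trained and saved with gensim, of loading it from the FastText folder and querying ingredient-name vectors. The actual helpers live in `utils/fasttext_embedding.py`, and the `fasttext.model` filename is an assumption.

```python
# Minimal sketch, not the project's exact API: load a gensim FastText model and
# average word vectors to embed a (possibly multi-word) ingredient name.
from gensim.models import FastText

# The folder comes from "FastText Model Folder" above; the filename is an assumption.
model = FastText.load("/home/dml/food/CuisineAdaptation/fasttext/fasttext.model")

def ingredient_embedding(name: str):
    """Return the mean of the word vectors of the tokens in the ingredient name."""
    words = name.lower().split()
    return sum(model.wv[w] for w in words) / len(words)

vec = ingredient_embedding("olive oil")
print(vec.shape)  # dimensionality depends on how the FastText model was trained
```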
# How to extract features

## Text features

Text features for RecipeDB's recipes are extracted with the `extract_recipe_vector.py` script. At the top of the file it defines the paths to the JSON data files (`Data/train.json` and `Data/val.json`) from which embeddings are extracted, as well as an output path where the final embeddings are saved. Run it with:

```bash
python3 extract_recipe_vector.py
```

## Image features

Image features can be extracted with `extract_image_vector.py`, which specifies the following fields at its beginning:

```python
input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'
```

These fields set the path to the input image folder (ingredient images resized to 384x384) and the output root directory where the embeddings are saved. The script loads five pretrained models, runs them on the input data, and saves their embeddings in the output folder. Keep in mind that the output embedding for an ingredient is the average of the embeddings extracted from all of its images.

# How to run the train code

The training code takes only a configuration file as input and is run entirely from that configuration:

```bash
python3 train.py --config best_config.yml
```

# Config

Each entry below is given as `key: "description" :> type`.

```yml
optim:
  epochs: "Number of epochs for training" :> int
  batch_size: "Batch size for the dataset" :> int
  max_lr: "Max learning rate to pass to the scheduler" :> float
  weight_decay: "Weight decay value to pass to the optimizer" :> float
  device: "Device to use for training, either 'cuda' or 'cpu'" :> str
  num_workers: "Number of workers for the dataloader" :> int
  sam_rho: "Hyperparameter for the SAM optimizer" :> float
text_model: "bert-base-uncased" :> str
image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
image_features_path: "Path to the extracted image features" :> str
text_features_path: "Path to the extracted text features" :> str
use_recipe_text: "Should the model use recipe embeddings?" :> bool
use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
model:
  ingredient_feature_extractor:
    layers: "Number of transformer blocks, e.g. T or TTTT" :> str
    H: "Embedding size for the transformer" :> int
    transformer:
      L: "Number of layers for each transformer block" :> int
      n_heads: "Number of heads for each transformer block" :> int
    final_ingredient_feature_size: "Ingredient feature size after taking the output from the transformer" :> int
  image_feature_size: "Size the image features are reduced to at the beginning" :> int
  text_feature_size: "Size the text features from recipes are reduced to" :> int
  final_classes: "This will be replaced in the code. Just set it to -1." :> int
data:
  embedding_size: "Embedding size of the ingredient text features" :> int
  dataset_path: "Path to the RecipeDB dataset" :> str
  fasttext_path: "Path to the fasttext .model file" :> str
  target: "Type of target. Should be 'region' for this project." :> str
```
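For orientation, a filled-in config following this schema might look like the sketch below. All values are illustrative placeholders, not the tuned hyperparameters from `best_config.yml`, the nesting simply mirrors the schema as laid out above, and the relative paths and the `fasttext.model` filename are assumptions.

```yml
# Illustrative example only: placeholder values, not the tuned hyperparameters.
optim:
  epochs: 50
  batch_size: 64
  max_lr: 0.001
  weight_decay: 0.0001
  device: "cuda"
  num_workers: 4
  sam_rho: 0.05
text_model: "bert-base-uncased"
image_model: "efficientnet_b0"
image_features_path: "IngredientsEncoding/image-features-full"
text_features_path: "IngredientsEncoding/text-features"
use_recipe_text: true
use_image_ingredients: true
use_text_ingredients: true
model:
  ingredient_feature_extractor:
    layers: "TTTT"
    H: 512
    transformer:
      L: 2
      n_heads: 8
    final_ingredient_feature_size: 512
  image_feature_size: 512
  text_feature_size: 512
  final_classes: -1
data:
  embedding_size: 100
  dataset_path: "Data"
  fasttext_path: "fasttext/fasttext.model"  # assumed filename
  target: "region"
```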