BSc project of Parham Saremi. The goal of the project was to detect the geographical region of a food using textual and visual features extracted from its recipe and ingredients.

# Required data for running on other servers

- Crawled Images (Resized to 384): (172.27.50.254):/home/dml/food/CuisineAdaptation/crawled-images-full-384
- FastText Model Folder: (172.27.50.254):/home/dml/food/CuisineAdaptation/fasttext
- Extracted Image Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full
- Extracted Text Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features
- Crawled Images (Original Size): (172.27.49.27) /media/external_10TB/10TB/Behnamnia/HPC-BACKUP_01.02.07/food/ACM-MM/IngredientsCrawling/[crawled_images_full]|[crawled_images_full_v2]
  7. The "Crawled Images (Original Size)" path contains the original images of ingredients that were obtained from Google. Afterwards, these images were resized to 384 to facilitate their transfer between servers. Since our image models use a maximum size of 300 for input images, a larger size is not necessary, making this reduction in size very convenient for us. The original data folder is over 200 GB in size, while the resized folder still contains over 20 GB of data, therefore, only their paths are provided here. These folders **are not required** to run the final model since their features are extracted using pre-trained models that we use to run our model.
  8. The FastText model is a non-contextual model used to extract embeddings of ingredient names. The model is approximately 1 GB in size, so only its path is provided here. This model **is required** to run the final training code.
  9. The "Extracted Image Features" path refers to the folder containing the extracted features from ingredient images using pre-trained image models. These image features **are necessary** to run the main training code.
  10. The "Extracted Text Features" path refers to the folder containing the extracted features from recipes using the BERT model. These features **are also required** to run the main training code.
# Structure of the Available Files

## Data

This folder contains the following files:

- train.json: the train split of the RecipeDB dataset
- val.json: the validation split of the RecipeDB dataset
- region.json: a JSON file listing all of the regions and assigning a number to each one
- ingredient_counts.json: a JSON file listing every ingredient in the RecipeDB dataset along with its count over the whole dataset
- image_dict_ings.json: a list of the crawled image names
## Utils

The following list explains the files in the utils folder:

- fasttext_embedding.py: a Python util file that provides functions for getting embeddings from FastText (see the sketch after this list).
- io.py: utility functions that help in loading and saving the config.
- bypass_bn.py: a file containing functions to handle Batch Normalization layers.
- recipedb_dataset.py: an implementation of the RecipeDB dataset using the PyTorch Dataset class.
- sam.py: an implementation of the SAM optimizer, which is used in this project.
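
For illustration, the sketch below shows how ingredient embeddings could be read from a FastText model saved with gensim (the `.model` file referenced by `fasttext_path` in the config). The helper names and the model path here are assumptions; the real interface lives in `fasttext_embedding.py`:

```python
from gensim.models import FastText

# Hypothetical usage; the real helpers in utils/fasttext_embedding.py may differ.
# The path below is a placeholder for the FastText .model file listed above.
model = FastText.load("fasttext/fasttext.model")

def ingredient_embedding(name: str):
    # FastText composes word vectors from character n-grams, so even
    # ingredient names unseen during training still receive an embedding.
    return model.wv[name]

vec = ingredient_embedding("ground cumin")
print(vec.shape)  # dimensionality depends on how the model was trained
```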
## Others

- extract_image_vector.py: used for extracting ingredient visual embeddings.
- extract_recipe_vector.py: used for extracting recipe text embeddings.
- network.py: the PyTorch implementation of the Image-Text-Transformer model used to solve the problem.
- train.py: code for loading the data, creating the model, and feeding the data to the model.
- best_config.yml: the YAML config file that specifies the hyperparameters of the model.
# How to extract features

## Text features

Text features are extracted from RecipeDB's recipes with the `extract_recipe_vector.py` script. At the top of the file, the script defines the paths to the JSON data files (`Data/train.json` and `Data/val.json`) from which it extracts embeddings, as well as the output path where the final embeddings are saved. Below is the command for running this script:
```bash
python3 extract_recipe_vector.py
```
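
As a rough, self-contained illustration of the kind of extraction this script performs, the sketch below embeds one recipe text with Hugging Face `transformers` and `bert-base-uncased` (the text model named in the config). The mean pooling shown here is an assumption; the actual pooling and file handling are defined inside `extract_recipe_vector.py`:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical, standalone illustration; extract_recipe_vector.py reads
# Data/train.json and Data/val.json and hard-codes its own input/output paths.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

recipe_text = "Saute the onions, add the lentils and cumin, then simmer for 20 minutes."
inputs = tokenizer(recipe_text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # shape: (1, seq_len, 768)
embedding = hidden.mean(dim=1).squeeze(0)       # mean-pooled 768-d recipe vector
print(embedding.shape)                          # torch.Size([768])
```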
## Image features

Image features can be extracted with the `extract_image_vector.py` script, which defines the following path variables at its beginning:
```python
input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'
```
These variables set the path to the input image folder (which contains images of the different ingredients, resized to 384x384) and the output root directory where the embeddings will be saved.

The script loads five pretrained models, runs them on the input data, and saves their embeddings in the output folder. Keep in mind that the output embedding for an ingredient is the average of all the embeddings extracted from its corresponding images.
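
The sketch below illustrates this averaging for a single ingredient folder, using a torchvision ResNet-50 (one of the model families listed in the config) with its classification head removed. The preprocessing and folder layout shown are assumptions; the real script runs five pretrained models with its own pipeline:

```python
import torch
from pathlib import Path
from PIL import Image
from torchvision import models, transforms

# Hypothetical sketch of per-ingredient averaging; extract_image_vector.py
# runs five pretrained models and defines its own preprocessing.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()   # keep the 2048-d pooled features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def ingredient_feature(folder: str) -> torch.Tensor:
    vectors = []
    for path in Path(folder).glob("*.jpg"):
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            vectors.append(model(image).squeeze(0))
    # The ingredient embedding is the mean of its image embeddings.
    return torch.stack(vectors).mean(dim=0)
```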
# How to run the train code

The training code takes only a configuration file as input; everything else is controlled through that configuration. The command for running the training code is:
```bash
python3 train.py --config best_config.yml
```
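
Training uses the SAM optimizer from `utils/sam.py` (together with the Batch Normalization helpers in `bypass_bn.py`). As a rough illustration only, the sketch below shows the characteristic two-pass SAM update, following the widely used open-source SAM interface; the constructor and method names in the project's own `sam.py` may differ:

```python
import torch
from sam import SAM  # utils/sam.py; interface assumed to follow the common open-source SAM

model = torch.nn.Linear(768, 20)   # stand-in for the real network
criterion = torch.nn.CrossEntropyLoss()
# rho corresponds to the sam_rho hyperparameter in the config.
optimizer = SAM(model.parameters(), torch.optim.SGD, rho=0.05, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 768), torch.randint(0, 20, (8,))

# First forward-backward pass: compute gradients and perturb the weights.
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# Second forward-backward pass at the perturbed weights, then the actual update.
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)
```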
# Config

Each field of the config file is described below, together with its expected type:

```yml
optim:
  epochs: "Number of epochs for training" :> int
  batch_size: "Batch size for the dataset" :> int
  max_lr: "Max learning rate to pass to the scheduler" :> float
  weight_decay: "Weight decay value to pass to the optimizer" :> float
  device: "Device to use for training, either 'cuda' or 'cpu'" :> str
  num_workers: "Number of workers for the dataloader" :> int
  sam_rho: "Hyperparameter for the SAM optimizer" :> float
  text_model: "Name of the text model used for recipe embeddings (e.g. bert-base-uncased)" :> str
  image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
  image_features_path: "Path to the extracted image features" :> str
  text_features_path: "Path to the extracted text features" :> str
  use_recipe_text: "Should the model use recipe embeddings?" :> bool
  use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
  use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
model:
  ingredient_feature_extractor:
    layers: "Number of transformer blocks, e.g. T or TTTT" :> str
    H: "Embedding size for the transformer" :> int
    transformer:
      L: "Number of layers for each transformer block" :> int
      n_heads: "Number of heads for each transformer block" :> int
    final_ingredient_feature_size: "Ingredient feature size after the transformer output" :> int
  image_feature_size: "Size the image features are reduced to at the beginning" :> int
  text_feature_size: "Size the recipe text features are reduced to" :> int
  final_classes: "This will be replaced in the code. Just set it to -1." :> int
data:
  embedding_size: "Embedding size of the ingredient text features" :> int
  dataset_path: "Path to the RecipeDB dataset" :> str
  fasttext_path: "Path to the FastText .model file" :> str
  target: "Type of target. Should be 'region' for this project." :> str
```
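
For reference, a minimal sketch of loading such a config with PyYAML and reading a few fields is shown below; `train.py` and the helpers in `utils/io.py` may handle this differently:

```python
import yaml

# Minimal sketch, assuming PyYAML; the project's own config loader lives in utils/io.py.
with open("best_config.yml") as f:
    cfg = yaml.safe_load(f)

epochs = cfg["optim"]["epochs"]
device = cfg["optim"]["device"]            # "cuda" or "cpu"
n_heads = cfg["model"]["ingredient_feature_extractor"]["transformer"]["n_heads"]
print(epochs, device, n_heads)
```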