1 year ago · b68169505b
--- a/README.md
+++ b/README.md
@@ -0,0 +1,96 @@
 # Required data for running on other servers
 - Crawled Images (Resized to 384): (172.27.50.254):/home/dml/food/CuisineAdaptation/crawled-images-full-384
 - FastText Model Folder: (172.27.50.254):/home/dml/food/CuisineAdaptation/fasttext
 - Extracted Image Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full
 - Extracted Text Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features
 - Crawled Images (Original Size): 

 The "Crawled Images (Original Size)" path contains the original images of ingredients that were obtained from Google. Afterwards, these images were resized to 384 to facilitate their transfer between servers. Since our image models use a maximum size of 300 for input images, a larger size is not necessary, making this reduction in size very convenient for us. The original data folder is over 200 GB in size, while the resized folder still contains over 20 GB of data, therefore, only their paths are provided here. These folders **are not required** to run the final model since their features are extracted using pre-trained models that we use to run our model.

 The FastText model is a non-contextual model used to extract embeddings of ingredient names. The model is approximately 1 GB in size, so only its path is provided here. This model **is required** to run the final training code.

 The "Extracted Image Features" path refers to the folder containing the extracted features from ingredient images using pre-trained image models. These image features **are necessary** to run the main training code.

 The "Extracted Text Features" path refers to the folder containing the extracted features from recipes using the BERT model. These features **are also required** to run the main training code.

 # Structure of the Available Files
 ## Data
 This folder contains the following files:

 - train.json: train split of RecipeDB dataset
 - val.json: validation split of RecipeDB dataset
 - region.json: a JSON file listing all of the regions and assigning a number to each one of them
 - ingredient_counts.json: a JSON file showing a list of all of the ingredients in RecipeDB dataset and their count in the whole dataset.
 - image_dict_ings.json: a list of crawled image names.
 ## Utils
 The following list explains all the files that appear in the utils folder:

 - fasttext_embedding.py: a Python util file that provides functions to get embeddings from FastText.
 - io.py: utility functions that help in loading and saving the config.
 - bypass_bn.py: a file that contains functions to handle Batch Normalizations.
 - recipedb_dataset.py: an implementation of RecipeDB dataset using PyTorch Dataset class.
 - sam.py: an implementation of SAM optimizer, which is used in this project.
 ## Others
 - extract_image_vector.py: used for ingredient visual embedding extraction.
 - extract_recipe_vector.py: used for recipe text embedding extraction.
 - network.py: implementation of PyTorch Image-Text-Transformer model that is required for solving the problem.
 - train.py: code used for loading the data, creating the model, and feeding the data to the model.
 - best_config.yml: YAML config file used for specifying the hyperparameters of the model.

 # How to extract features
 ## Text features
 We can extract text features from RecipeDB's recipes using the `extract_recipe_vector.py` python file. This file defines a path to the JSON data file. The JSON files (`Data/train.json` and `Data/val.json`) are the files that this script uses to extract embeddings. This script also defines an output path at the beginning of the file. The output path is the location where the final embeddings will be saved. Below is the command for running this script:
 ```bash
 python3 extract_recipe_vector.py
 ```

 ## Image features
 Image features can be extracted using the `extract_image_vector.py` code, which specifies the following text fields at its beginning:

 ```python
 input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
 output_root_dir = 'image-features-full'
 ```
 Using these fields, we can define the path to the input image folder (which contains images of different ingredients resized to 384x384) and the output root directory, which indicates where the embeddings will be saved.

 This script will load and run five pretrained models on the input data and save their embeddings in the output folder. Keep in mind that the output embedding for an ingredient is the average of all the embeddings extracted from its corresponding images.

 # How to run the train code
 The code only takes a configuration file as input and can be run solely using the configuration. The command for running the training code is as follows:
 ```bash
 python3 train.py --config best_config.yml
 ```
 # Config
 ```yml
 optim:
    epochs: "Number of epochs for training" :> int
    batch_size: "Batch size for the dataset" :> int
    max_lr: "Max learning rate to pass to the scheduler" :> float
    weight_decay: "Weight decay value to pass to optimizer" :> float
    device: "Device to use for training, either 'cuda' or 'cpu'" :> str
    num_workers: "Number of workers for dataloader" :> int
    sam_rho: "Hyperparameter for SAM optimizer" :> float
 text_model: "bert-base-uncased" :> str
 image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
 image_features_path: "Path for the extracted image features" :> str
 text_features_path: "Path for the extracted text features" :> str
 use_recipe_text: "Should the model use recipe embeddings?" :> bool
 use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
 use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
 model:
    ingredient_feature_extractor:
        layers: "Number of transformer blocks like T or TTTT" :> str
        H: "Embedding size for the transformer" :> int
        transformer:
            L: "Number of layers for each transformer block" :> int
            n_heads: "Number of heads for each transformer block" :> int
        final_ingredient_feature_size: "What is the ingredient feature size after we get the output from the transformer?" :> int
    image_feature_size: "What is the size of the image features reduced to in the beginning?" :> int
    text_feature_size: "What is the size of the text features from recipes reduced to?" :> int
    final_classes: "This will be replaced in the code. Just set it to -1." :> int
 data:
    embedding_size: "What is the embedding size of ingredient text features?" :> int
    dataset_path: "Path for the RecipeDB dataset" :> str
    fasttext_path: "Path to the fasttext .model file" :> str
    target: "Type of target. Should be 'region' for this project." :> str
 ```
--- a/best_config.yml
+++ b/best_config.yml
@@ -0,0 +1,31 @@
 optim:
    epochs: 50
    batch_size: 256
    max_lr: 0.001
    weight_decay: 0.0005
    device: "cuda:0"
    num_workers: 4
    sam_rho: 0.1
 text_model: "bert-base-uncased"
 image_model: "efficientnet_b0"
 image_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
 text_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
 use_recipe_text: True
 use_image_ingredients: True
 use_text_ingredients: True
 model:
    ingredient_feature_extractor:
        layers: "TTTTT"
        H: 384
        transformer:
            L: 4
            n_heads: 4
        final_ingredient_feature_size: 200
    image_feature_size: 200
    text_feature_size: 200
    final_classes: -1
 data:
    embedding_size: 100
    dataset_path: "Data"
    fasttext_path: "trained_1m.model"
    target: "region"
--- a/extract_image_vector.py
+++ b/extract_image_vector.py
@@ -0,0 +1,119 @@
 import os
 import numpy as np
 import torch
 import torchvision.models as models
 import torchvision.transforms as transforms
 from PIL import Image
 from tqdm import tqdm


 import warnings
 warnings.filterwarnings("ignore")

 models_names = [
    'efficientnet_t0', 
    'resnet50', 
    'resnet101', 
    'efficientnet_b0', 
    'efficientnet_b3'
 ]

 input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
 output_root_dir = 'image-features-full'

 image_size = {
    'resnet18': 224,
    'resnet50': 224,
    'resnet101': 224,
    'efficientnet_b0': 224,
    'efficientnet_t0': 224,
    'efficientnet_b3': 300
 }

 normalize = {
    'resnet18': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    'resnet50': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    'resnet101': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    'efficientnet_t0': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    'efficientnet_b3': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 }

 transform = {
    'resnet18': transforms.Compose([
        transforms.Resize(image_size['resnet18']),
        transforms.CenterCrop(image_size['resnet18']),
        transforms.ToTensor(),
        normalize['resnet18']
    ]),
    'resnet50': transforms.Compose([
        transforms.Resize(image_size['resnet50']),
        transforms.CenterCrop(image_size['resnet50']),
        transforms.ToTensor(),
        normalize['resnet50']
    ]),
    'resnet101': transforms.Compose([
        transforms.Resize(image_size['resnet101']),
        transforms.CenterCrop(image_size['resnet101']),
        transforms.ToTensor(),
        normalize['resnet101']
    ]),
    'efficientnet_t0': transforms.Compose([
        transforms.Resize(image_size['efficientnet_t0']),
        transforms.CenterCrop(image_size['efficientnet_t0']),
        transforms.ToTensor(),
        normalize['efficientnet_t0']
    ]),
    'efficientnet_b3': transforms.Compose([
        transforms.Resize(image_size['efficientnet_b3']),
        transforms.CenterCrop(image_size['efficientnet_b3']),
        transforms.ToTensor(),
        normalize['efficientnet_b3']
    ])
 }

 device = torch.device("cuda")
 counter = 0

 for model_name in models_names:
    if 'resnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.fc.in_features
        model.fc = torch.nn.Identity()
    elif 'efficientnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.classifier[1].in_features
        model.classifier = torch.nn.Identity()
    else:
        print('Unknown model name: {}'.format(model_name))
        continue
    num_classes = num_features
    model = model.eval().to(device)
    output_dir = os.path.join(output_root_dir, model_name)
    os.makedirs(output_dir, exist_ok=True)

    for folder_name in tqdm(os.listdir(input_dir)):
        folder_dir = os.path.join(input_dir, folder_name)
        if not os.path.isdir(folder_dir):
            continue

        image_tensors = []
        for image_filename in os.listdir(folder_dir):
            if not image_filename.lower().endswith(".png") and not image_filename.lower().endswith(".jpg"):
                continue
            counter += 1
            image_path = os.path.join(folder_dir, image_filename)

            image = Image.open(image_path).convert('RGB')
            image_tensor = transform[model_name](image).unsqueeze(0).to(device)
            image_tensors.append(image_tensor)
        
        if len(image_tensors) > 0:
            input_tensors = torch.cat(image_tensors)
            with torch.no_grad():
                avg_features = model(input_tensors).squeeze(0).mean(dim=0).cpu().numpy()
        else:
            avg_features = np.zeros(num_features)

        output_filename = '{}.npy'.format(folder_name)
        output_path = os.path.join(output_dir, output_filename)
        np.save(output_path, avg_features)
--- a/extract_recipe_vector.py
+++ b/extract_recipe_vector.py
@@ -0,0 +1,50 @@
 import json
 import numpy as np
 import os
 import torch
 from tqdm import tqdm

 from transformers import (BertTokenizer, BertModel,
                          GPT2Tokenizer, GPT2Model,
                          RobertaTokenizer, RobertaModel,
                          ElectraTokenizer, ElectraModel,
                          DistilBertTokenizer, DistilBertModel)

 models = {
    'bert-base-uncased': (BertTokenizer, BertModel),
 }

 #json_path = 'Data/val.json'
 json_path = 'Data/train.json'
 output_dir = 'text-features'

 with open(json_path, 'r') as f:
    data = json.load(f)

 counter = 0
 for model_name, (Tokenizer, Model) in models.items():
    tokenizer = Tokenizer.from_pretrained(model_name)
    max_size = tokenizer.max_model_input_sizes[model_name]
    model = Model.from_pretrained(model_name)
    model.to("cuda")
    
    for datapoint in tqdm(data):
        instructions = " ".join(datapoint['instructions'])
        if "gpt" in model_name:
            tokenized_instructions = tokenizer.encode(instructions, add_special_tokens=True)[:max_size]
        else:
            tokenized_instructions = [tokenizer.encode(instructions, add_special_tokens=True)[:max_size]]
            
        input_ids = torch.tensor(tokenized_instructions)
        # print(input_ids.shape)
        with torch.no_grad():
            outputs = model(input_ids.to("cuda"))
            if "gpt" in model_name:
                embeddings = outputs.last_hidden_state[0, :].detach().cpu().numpy()
            else:
                embeddings = outputs.last_hidden_state[:, 0, :].detach().cpu().numpy()
            # print(embeddings.shape)
        output_filename = '{}.npy'.format(datapoint['id'])
        output_path = os.path.join(output_dir, model_name, output_filename)
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        np.save(output_path, embeddings)
--- a/network.py
+++ b/network.py
@@ -0,0 +1,155 @@
 from torch.nn import Module
 from torch import nn
 import torch
 import torch.nn.functional as F
 from einops import reduce
 from gensim.models import FastText
 import numpy as np
 import json
 epsilon = 1e-8
 import pickle
 VECTORIZER_SIZE = 1500

 class EmbedderFasttext():
    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'sFastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')
    
    def has(self, word):
        if word == "":
            return False
        return True
    
    def get(self, word):
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)

 class Transformer(Module):
    def __init__(self, input_size, nhead, num_layers, dim_feedforward, num_classes, aggregate = True):
        super(Transformer, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_size, dim_feedforward=dim_feedforward,nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.aggregate = aggregate
        if self.aggregate:
            self.linear = nn.Linear(input_size, num_classes, True)
        
    def forward(self, x, padding_mask):
        out = self.transformer_encoder(x, src_key_padding_mask=padding_mask)
        if self.aggregate:
            out = (out* ~padding_mask.unsqueeze(-1)).sum(dim=1)
            out = self.linear(torch.relu(out))
        return out
        

 class ImageTextTransformer(Module):
    def __init__(self, config):
        super(ImageTextTransformer, self).__init__()
        self.embedding_size = config.data.embedding_size
        self.custom_embed = False
        self.layers = config.model.ingredient_feature_extractor.layers
        
        if "G" in config.model.ingredient_feature_extractor.layers:
            assert False, "No GNN for this model"
            
        self.use_recipe_text = config.use_recipe_text
        self.use_text_ingredients = config.use_text_ingredients
        self.use_image_ingredients = config.use_image_ingredients
        
        if not self.use_recipe_text and not self.use_text_ingredients and not self.use_image_ingredients:
            raise Exception("The model can't work without any features")
        
        
        if self.use_text_ingredients or self.use_image_ingredients:
            transformer_input_feature_size = 0
            
            if self.use_image_ingredients:
                transformer_input_feature_size += config.model.image_feature_size
            if self.use_text_ingredients:
                transformer_input_feature_size += self.embedding_size
            
            blocks = [
                    Transformer(
                        input_size=transformer_input_feature_size, 
                        nhead=config.model.ingredient_feature_extractor.transformer.n_heads,
                        num_layers=config.model.ingredient_feature_extractor.transformer.L, 
                        dim_feedforward=config.model.ingredient_feature_extractor.H, 
                        num_classes=config.model.ingredient_feature_extractor.final_ingredient_feature_size if i==len(config.model.ingredient_feature_extractor.layers)-1 else None, 
                        aggregate = (i==len(config.model.ingredient_feature_extractor.layers)-1)
                    ) for i, m in enumerate(config.model.ingredient_feature_extractor.layers)
                ]
            self.ingredient_feature_module = nn.ModuleList(blocks)
        
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536,
            'bert-base-uncased': 768,
        }
        
        if self.use_image_ingredients:
            self.image_feature_extractor = torch.nn.Linear(feature_size[config.image_model], config.model.image_feature_size)

        if self.use_recipe_text:
            self.text_feature_extractor = torch.nn.Linear(feature_size[config.text_model], config.model.text_feature_size)
            
        classifier_input_size = 0
        if self.use_image_ingredients or self.use_text_ingredients:
            classifier_input_size += config.model.ingredient_feature_extractor.final_ingredient_feature_size
        if self.use_recipe_text:
            classifier_input_size += config.model.text_feature_size
        
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(classifier_input_size, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, config.model.final_classes)
        )
             
    def forward(self, embeddings, mask, image_ingredients, recipe_embeddings):
        if self.use_recipe_text:
            text_features = self.text_feature_extractor(recipe_embeddings)
        
        if self.use_image_ingredients:
            image_features = self.image_feature_extractor(image_ingredients)
            
        if self.use_image_ingredients or self.use_text_ingredients:    
            if self.use_text_ingredients and self.use_image_ingredients:
                ingredient_features = torch.cat([embeddings, image_features], dim = 2)
            elif self.use_text_ingredients:
                ingredient_features = embeddings
            else:
                ingredient_features = image_features
            out = ingredient_features
            
            for i, m in enumerate(self.layers):
                if m == "T":
                    out = self.ingredient_feature_module[i](out, ~mask)
                else:
                    raise Exception("Invalid module")
                
            aggregated_ingredient_features = out
            
            if self.use_recipe_text:
                recipe_features = torch.cat([text_features, aggregated_ingredient_features], dim=1)
            else:
                recipe_features = aggregated_ingredient_features
        else:
            recipe_features = text_features
        
        final_result = self.classifier(torch.nn.functional.relu(recipe_features))
        return final_result

    def freeze_features(self):
        self.feature_extractor.eval()
    
    def freeze_function(self):
        self.classifier.eval()
--- a/train.py
+++ b/train.py
@@ -0,0 +1,203 @@
 from datetime import datetime
 import os

 experiment_name = 'Parham BS Project Region Prediction'
 experiment_code = experiment_name.replace(' - ', '.').replace(' ', '_').lower()


 import nltk


 # nltk.download('wordnet')
 # nltk.download('omw-1.4')
 # nltk.download('punkt')


 import json
 import numpy as np

 from torch.utils.data import Dataset, DataLoader
 from torch.nn import Module
 import torch
 import json
 from tqdm import tqdm
 from gensim.models import FastText
 from utils.sam import SAM
 from utils.bypass_bn import enable_running_stats, disable_running_stats
 from einops import reduce
 from utils.recipedb_dataset import RecipeDBDataset

 import logging
 import argparse
 from tqdm import tqdm
 import mlflow
 import mlflow.pytorch
 logging.basicConfig(level=logging.WARN)
 logger = logging.getLogger(__name__)

 from network import ImageTextTransformer
 from utils.io import load_config, save_config
 print("here")

 mlflow.set_experiment(experiment_name)

 parser = argparse.ArgumentParser()
 parser.add_argument('--config', type=str)


 args = parser.parse_args()
 config = load_config(args.config)

 epochs = config.optim.epochs
 batch_size = config.optim.batch_size
 learning_rate = config.optim.max_lr
 weight_decay = config.optim.weight_decay
 embedding_size = config.data.embedding_size
 num_classes = config.model.final_classes
 sam_rho = config.optim.sam_rho
 num_workers = config.optim.num_workers
 data_path = config.data.dataset_path
 target = config.data.target
 target_dictionary = json.load(open(os.path.join(data_path, f'{target}.json'), 'r'))
 if 'entropy' in config.optim:
    entropy_weight = config.optim.entropy
 else:
    entropy_weight = 0
 config.model.final_classes= len(target_dictionary)
 epsilon = 1e-8
 print(target)
 print(target_dictionary)
 output_dir =  f'parham-models_image_taext_transformer/{target}/{datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}'
 if not os.path.isdir(output_dir):
    os.makedirs(output_dir, exist_ok=True)

 class EmbedderFasttext():
    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'sFastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')
    
    def has(self, word):
        if word == "":
            return False
        return True
    
    def get(self, word):
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)
 embedder = EmbedderFasttext(config.data.fasttext_path)

 datasets = {
    "train": RecipeDBDataset(os.path.join(data_path, 'train.json'), 
            cousine_dict=target_dictionary,
            extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'), 
            embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
            mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model = config.image_model),
    
    "val": RecipeDBDataset(os.path.join(data_path, "val.json"), 
            cousine_dict=target_dictionary,
            extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'), 
            embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
            mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model = config.image_model)
 }
 print('Dataset constructed.')
 print(len(datasets['train']), len(datasets['val']))
 print(f'target: {target}')
 print(f'number of classes: {len(target_dictionary)}')

 device = config.optim.device

 dataloaders = {
    "train":DataLoader(datasets["train"], batch_size=batch_size, collate_fn=datasets['train'].rdb_collate, shuffle=True, num_workers=num_workers),
    "val":DataLoader(datasets["val"], batch_size=batch_size, collate_fn=datasets['val'].rdb_collate, shuffle=False,num_workers=num_workers)
 }
 loss_fn = torch.nn.CrossEntropyLoss().to(device)
 print('Dataloader constructed.')

 model = ImageTextTransformer(config)
 print(model)
 model = model.to(device)
 optimizer = SAM(model.parameters(), rho=sam_rho, base_optimizer=torch.optim.Adam, lr=learning_rate/10, weight_decay=weight_decay)
 scheduler = torch.optim.lr_scheduler.OneCycleLR(max_lr = learning_rate, epochs=epochs, steps_per_epoch=len(dataloaders["train"]), optimizer=optimizer.base_optimizer)

 def stable_log_sigmoid(x):
    max_value = torch.maximum(x, torch.zeros(*x.shape, dtype=torch.float32, device=x.device))
    return -max_value - torch.log(torch.exp(-max_value) + torch.exp(x - max_value))

 def argtopk(tensor, k, dim):
    indices = torch.argsort(tensor, dim=dim, descending=True)
    topk_indices = indices.narrow(dim, 0, k)
    return topk_indices

 with mlflow.start_run():
    mlflow.log_params(dict(config))
    result = None
    best_val_acc = 0
    best_val_top3 = 0
    best_val_top5 = 0
    for epoch in range(epochs):
        for mode in ["train", "val"]:
            if mode == 'train':
                model.train()       
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            top_5_corrects = 0
            top_3_corrects = 0 
            num_samples = 0
            s = 0
            for data_batch in tqdm(dataloaders[mode]):
                embeddings= data_batch['ingredients'].to(device)
                masks = data_batch['masks'].to(device)
                targets = data_batch['cousines'].to(device) if 'cousines' in data_batch else data_batch['targets'].to(device)
                image_ingredients = data_batch['image_ingredients'].to(device)
                recipe_embeddings = data_batch['recipe_embeddings'].to(device)
                with torch.set_grad_enabled(mode == 'train'):
                    enable_running_stats(model)
                    out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                    entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                    loss = loss_fn(out, targets) + entropy_weight * entropy
                    if mode == 'train':
                        loss.backward()
                        optimizer.first_step(zero_grad=True)
                        disable_running_stats(model)
                        out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                        entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                        (loss_fn(out, targets) + entropy_weight * entropy).backward()       
                        optimizer.second_step(zero_grad=True)
                        scheduler.step()

                running_loss+=loss.item()*embeddings.shape[0]
                running_corrects += (out.argmax(dim=1) == targets).sum().item()
                num_samples+=embeddings.shape[0]
                top_5_corrects += (argtopk(out, k=5, dim=1) == targets.unsqueeze(1)).sum().item()
                top_3_corrects += (argtopk(out, k=3, dim=1) == targets.unsqueeze(1)).sum().item()
            print(f"epoch: {epoch}, loss: {running_loss/num_samples}, acc: {running_corrects/num_samples}, top3: {top_3_corrects/num_samples}, top5: {top_5_corrects/num_samples}")
            if mode=="val":
                best_val_acc = running_corrects/num_samples*100 if running_corrects/num_samples*100 > best_val_acc else best_val_acc
                best_val_top3 = top_3_corrects/num_samples*100 if top_3_corrects/num_samples*100 > best_val_top3 else best_val_top3
                best_val_top5 = top_5_corrects/num_samples*100 if top_5_corrects/num_samples*100 > best_val_top5 else best_val_top5
            metrics = {
                '{}_loss'.format(mode): running_loss/num_samples,
                '{}_acc'.format(mode): running_corrects/num_samples*100,
                '{}_acc3'.format(mode): top_3_corrects/num_samples*100,
                '{}_acc5'.format(mode): top_5_corrects/num_samples*100
                }
            if mode == 'val': 
                metrics["best_val_acc"] = best_val_acc
                metrics["best_val_acc3"] = best_val_top3
                metrics["best_val_acc5"] = best_val_top5
                
                result = running_corrects/num_samples*100
            mlflow.log_metrics(metrics)
    os.makedirs(output_dir, exist_ok=True)
    mlflow.pytorch.log_model(model, 'model')
    config.result = result
    torch.save(model.state_dict(), os.path.join(output_dir, "checkpoint.pth"))
    save_config(config, os.path.join(output_dir, "config.yml"))
--- a/utils/__init__.py
+++ b/utils/__init__.py
--- a/utils/bypass_bn.py
+++ b/utils/bypass_bn.py
@@ -0,0 +1,17 @@
 import torch
 import torch.nn as nn

 def disable_running_stats(model):
    def _disable(module):
        if isinstance(module, nn.BatchNorm2d):
            module.backup_momentum = module.momentum
            module.momentum = 0

    model.apply(_disable)

 def enable_running_stats(model):
    def _enable(module):
        if isinstance(module, nn.BatchNorm2d) and hasattr(module, "backup_momentum"):
            module.momentum = module.backup_momentum

    model.apply(_enable)
--- a/utils/fasttext_embedding.py
+++ b/utils/fasttext_embedding.py
@@ -0,0 +1,27 @@
 from gensim.corpora.dictionary import Dictionary
 import logging
 from pyemd import emd
 from nltk.corpus import stopwords
 import fasttext
 import json
 import numpy as np

 logger = logging.getLogger(__name__)

 class FasttextEmbedding:
  def __init__(self, model_path):
    if model_path.endswith('.bin'):
      self.model = fasttext.load_model(model_path)
      self.full = True
    else:
      self.model = np.load(model_path)
      self.full = False
    self.stopwords = stopwords.words('english')

  def __getitem__(self, idx):
    if self.full:
      return self.model.get_word_vector(idx)
    else:
      if idx not in self.model:
        raise ValueError('Word not available.')
      return self.model[idx]
--- a/utils/io.py
+++ b/utils/io.py
@@ -0,0 +1,12 @@
 import yaml
 from easydict import EasyDict as edict
 import json 

 def load_config(path):
    with open(path, 'r', encoding='utf8') as f:
        return edict(yaml.safe_load(f))
    
 def save_config(config, path):
    x = json.loads(json.dumps(config))
    with open(path, 'w', encoding='utf8') as f:
        yaml.dump(x, f, default_flow_style=False, allow_unicode=True)
--- a/utils/recipedb_dataset.py
+++ b/utils/recipedb_dataset.py
@@ -0,0 +1,185 @@
 from typing import Any
 import torch
 from torch.utils.data import Dataset
 import json
 import numpy as np
 from torch.nn.utils.rnn import pad_sequence
 import warnings
 import os
 warnings.filterwarnings(action='ignore',category=UserWarning,module='gensim')  
 warnings.filterwarnings(action='ignore',category=FutureWarning,module='gensim') 
 def mask_count(num):
    return num//5

 def generate_ing_dict(path, threshold):
    assert path != None
    with open(path, "r") as json_file:
        full_ing_count_list:dict = json.load(json_file)
        filtered_ing_list = {}
        counter = 0
        for ing, count in full_ing_count_list.items():
            if count > threshold:
                filtered_ing_list[ing] = counter
                counter += 1
        return filtered_ing_list

 def get_ingredient_frequencies(occr_path):
    occr = None
    with open(occr_path, "r") as json_file:
        occr = json.load(json_file)
    if '' in occr:
        del occr[''] 
    return occr
    
 class RecipeDBDataset(Dataset):
    def __init__(self, json_path, cousine_dict=None,
                 extract_ingredients=False, extract_recipes=False, extract_cousine=False,
                 embedder=None, include_id=False, mask_threshold=1000, mask_path=None, 
                 occr_path = None, target='country',
                 image_model="resnet18") -> None:
        super(RecipeDBDataset, self).__init__()

        with open(json_path, "r") as json_file:
            data = json.load(json_file)

        if occr_path is not None:
            self.freqs = get_ingredient_frequencies(occr_path)
            self.all_ingredients, self.all_ingredient_probs = zip(*sorted(self.freqs.items()))
            self.all_ingredients = list(self.all_ingredients)
            self.all_ingredient_probs = np.array(self.all_ingredient_probs, dtype=np.float32)
            self.all_ingredient_probs /= np.sum(self.all_ingredient_probs)
        self.ing_dict:dict = generate_ing_dict(mask_path, mask_threshold)
        self.len_mask_ing = len(self.ing_dict)
        self.data = []
        self.embedder = embedder
        self.extract_ingredients = extract_ingredients
        self.extract_recipes = extract_recipes
        self.extract_cousine = extract_cousine
        self.ingredient_set = set()
        
        self.image_path = "Data/image_dict_ings.json"
        with open(self.image_path, 'r') as jf:
            self.image_ing_dict = json.load(jf)
            
        self.image_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536,
            'efficientnet_t0': 1280
        }
        self.image_model = image_model
        self.image_feature_size = feature_size[self.image_model]
        self.not_found_ings = set()
        
        self.text_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
        self.text_feature_model = "bert-base-uncased"
        
        failed_ing_count = 0
        for recipe in data:
            temp_data = {}
            if extract_ingredients:
                temp_data["ingredients"] = []
                for ing in recipe["ingredients"]:
                    if ing["Ingredient Name"] != "":
                        temp_data["ingredients"].append(ing["Ingredient Name"])
                
                if len(temp_data["ingredients"]) == 0:
                    failed_ing_count += 1
                    continue
            if extract_cousine:
                temp_data["cousine"] = cousine_dict[recipe[target]]
            if include_id:
                temp_data["id"] = recipe["id"]
            self.data.append(temp_data)
        self.cousine_dict = cousine_dict

        print(f"failed ings count: {failed_ing_count}")
        
    def __getitem__(self, index: Any):
        d = self.data[index]
        out = {}
        ings = []
        if self.extract_ingredients:
            for ing in d["ingredients"]:
                if self.embedder.has(ing):
                    ings.append(self.embedder.get(ing))
            ings = torch.tensor(ings, dtype=torch.float32)
            
            image_ingredients = []
            
            for ing in d["ingredients"]:
                npy_path = ""
                if ing in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing}.npy")
                elif ing.replace(" ", "_") in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing.replace(' ', '_')}.npy")
                else:
                    for ing_part in ing.split():
                        if ing_part in self.image_ing_dict:
                            npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing_part}.npy")
                            break
                    else:
                        self.not_found_ings.add(ing)
                if npy_path == "":
                    image_ingredients.append(np.zeros(self.image_feature_size))
                else:
                    image_ingredients.append(np.load(npy_path))
                    
            image_ingredients = torch.tensor(image_ingredients, dtype=torch.float32)
            out["ingredients"] = ings
            out["image_ingredients"] = image_ingredients

        if self.extract_recipes:
            out["recipe_embedding"] = torch.tensor(np.load(os.path.join(self.text_feature_path, self.text_feature_model, f'{d["id"]}.npy')), dtype=torch.float32)
        if self.extract_cousine:
            out["cousine"] = d["cousine"]
        return out


    def __len__(self):
        return self.data.__len__()


    def rdb_collate(self, batch): 
        cousines = []
        ingredients = []
        masks = []
        image_ingredients = []
        recipe_embeddings  = []
        for data in batch:
            if "cousine" in data:
                cousines.append(data["cousine"])
            if "recipe_embedding" in data:
                recipe_embeddings.append(data["recipe_embedding"])
            if "ingredients" in data:
                ingredients.append(data["ingredients"])
                masks.append(torch.ones(data["ingredients"].shape[0]))
                image_ingredients.append(data["image_ingredients"])

        outs = {}
        if "ingredients" in data:
            masks = pad_sequence(masks, batch_first=True, padding_value=0).type(torch.bool)

            ingredients = pad_sequence(ingredients, batch_first=True, padding_value=0)
            image_ingredients = pad_sequence(image_ingredients, batch_first=True, padding_value=0)
            outs["masks"] = masks
            outs["ingredients"] = ingredients
            outs["image_ingredients"] = image_ingredients
        if "recipe_embedding" in data:
            outs["recipe_embeddings"] = torch.cat(recipe_embeddings, dim=0)
        if "cousine" in data:
            cousines = torch.LongTensor(cousines)
            outs["cousines"] = cousines
        return outs

 def dict_to_device(data:dict, device, return_new_dict=False):
    new_dict = {}
    for k, v in data.items():
        if not return_new_dict:
            data[k] = v.to(device)
        else:
            new_dict[k] = v.to(device)
    return new_dict if return_new_dict else data
--- a/utils/sam.py
+++ b/utils/sam.py
@@ -0,0 +1,62 @@
 import torch


 class SAM(torch.optim.Optimizer):
    def __init__(self, params, base_optimizer, rho=0.05, adaptive=False, **kwargs):
        assert rho >= 0.0, f"Invalid rho, should be non-negative: {rho}"

        defaults = dict(rho=rho, adaptive=adaptive, **kwargs)
        super(SAM, self).__init__(params, defaults)

        self.base_optimizer = base_optimizer(self.param_groups, **kwargs)
        self.param_groups = self.base_optimizer.param_groups

    @torch.no_grad()
    def first_step(self, zero_grad=False):
        grad_norm = self._grad_norm()
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)

            for p in group["params"]:
                if p.grad is None: continue
                self.state[p]["old_p"] = p.data.clone()
                e_w = (torch.pow(p, 2) if group["adaptive"] else 1.0) * p.grad * scale.to(p)
                p.add_(e_w)  # climb to the local maximum "w + e(w)"

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def second_step(self, zero_grad=False):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None: continue
                p.data = self.state[p]["old_p"]  # get back to "w" from "w + e(w)"

        self.base_optimizer.step()  # do the actual "sharpness-aware" update

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def step(self, closure=None):
        assert closure is not None, "Sharpness Aware Minimization requires closure, but it was not provided"
        closure = torch.enable_grad()(closure)  # the closure should do a full forward-backward pass

        self.first_step(zero_grad=True)
        closure()
        self.second_step()

    def _grad_norm(self):
        shared_device = self.param_groups[0]["params"][0].device  # put everything on the same device, in case of model parallelism
        norm = torch.norm(
                    torch.stack([
                        ((torch.abs(p) if group["adaptive"] else 1.0) * p.grad).norm(p=2).to(shared_device)
                        for group in self.param_groups for p in group["params"]
                        if p.grad is not None
                    ]),
                    p=2
               )
        return norm

    def load_state_dict(self, state_dict):
        super().load_state_dict(state_dict)
        self.base_optimizer.param_groups = self.param_groups