# Required data for running on other servers
- Crawled Images (Resized to 384): (172.27.50.254):/home/dml/food/CuisineAdaptation/crawled-images-full-384
- FastText Model Folder: (172.27.50.254):/home/dml/food/CuisineAdaptation/fasttext
- Extracted Image Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full
- Extracted Text Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features
- Crawled Images (Original Size):
The "Crawled Images (Original Size)" path contains the original images of ingredients that were obtained from Google. Afterwards, these images were resized to 384 to facilitate their transfer between servers. Since our image models use a maximum size of 300 for input images, a larger size is not necessary, making this reduction in size very convenient for us. The original data folder is over 200 GB in size, while the resized folder still contains over 20 GB of data, therefore, only their paths are provided here. These folders **are not required** to run the final model since their features are extracted using pre-trained models that we use to run our model. | |||||
The FastText model is a non-contextual model used to extract embeddings of ingredient names. The model is approximately 1 GB in size, so only its path is provided here. This model **is required** to run the final training code. | |||||
The "Extracted Image Features" path refers to the folder containing the extracted features from ingredient images using pre-trained image models. These image features **are necessary** to run the main training code. | |||||
The "Extracted Text Features" path refers to the folder containing the extracted features from recipes using the BERT model. These features **are also required** to run the main training code. | |||||
# Structure of the Available Files
## Data
This folder contains the following files (a short inspection snippet follows the list):
- `train.json`: the training split of the RecipeDB dataset.
- `val.json`: the validation split of the RecipeDB dataset.
- `region.json`: a JSON file listing all of the regions and assigning a number to each one.
- `ingredient_counts.json`: a JSON file listing all of the ingredients in the RecipeDB dataset together with their counts over the whole dataset.
- `image_dict_ings.json`: a list of crawled image names.
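As a quick sanity check, these JSON files can be inspected directly. A minimal sketch, assuming the `Data` folder sits next to the scripts (the key/value structure follows how `train.py` and `utils/recipedb_dataset.py` read these files):

```python
import json

# region.json maps each region name to an integer class id; its length becomes the number of classes.
with open("Data/region.json") as f:
    region_dict = json.load(f)
print(len(region_dict), "regions")

# ingredient_counts.json maps each ingredient name to its number of occurrences in RecipeDB.
with open("Data/ingredient_counts.json") as f:
    counts = json.load(f)
print(len(counts), "distinct ingredients")
```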
## Utils
The following list explains the files in the `utils` folder:
- `fasttext_embedding.py`: utility functions for obtaining embeddings from FastText.
- `io.py`: utility functions for loading and saving the config.
- `bypass_bn.py`: functions that handle Batch Normalization running statistics (used together with SAM in `train.py`).
- `recipedb_dataset.py`: an implementation of the RecipeDB dataset using the PyTorch `Dataset` class.
- `sam.py`: an implementation of the SAM optimizer, which is used in this project.
## Others
- `extract_image_vector.py`: extracts visual embeddings for ingredients.
- `extract_recipe_vector.py`: extracts text embeddings for recipes.
- `network.py`: the PyTorch implementation of the Image-Text Transformer model used in this project.
- `train.py`: loads the data, creates the model, and feeds the data to the model.
- `best_config.yml`: the YAML config file that specifies the hyperparameters of the model.
# How to extract features
## Text features
Text features are extracted from RecipeDB's recipes with `extract_recipe_vector.py`. At the top of the file, the script defines the path to the JSON data file (`Data/train.json` or `Data/val.json`), from which it extracts the embeddings, as well as the output path where the final embeddings are saved. The command for running the script is below; a short example of loading the saved embeddings follows it.
```bash
python3 extract_recipe_vector.py
```
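The script writes one `.npy` file per recipe, named after the recipe id, under `<output_dir>/bert-base-uncased/`. A minimal loading sketch (the recipe id here is a placeholder):

```python
import numpy as np

# [CLS] embedding of the recipe instructions; shape is (1, 768) for bert-base-uncased.
emb = np.load("text-features/bert-base-uncased/12345.npy")  # "12345" is a hypothetical recipe id
print(emb.shape)
```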
## Image features
Image features are extracted with `extract_image_vector.py`, which defines the following fields at the top of the file:
```python
input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'
```
These fields specify the path to the input image folder (which contains images of different ingredients resized to 384x384) and the output root directory where the embeddings will be saved.
The script loads five pretrained image models, runs them on the input data, and saves their embeddings in the output folder. Keep in mind that the output embedding for an ingredient is the average of the embeddings extracted from all of its images; a short loading sketch follows.
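The result is one subfolder per image model, each holding one averaged feature vector per ingredient. A minimal sketch of reading a feature back (the ingredient name is a hypothetical example):

```python
import numpy as np

# One .npy file per ingredient, e.g. image-features-full/<model_name>/<ingredient>.npy
feat = np.load("image-features-full/efficientnet_b0/tomato.npy")  # "tomato" is a hypothetical folder name
print(feat.shape)  # (1280,) for efficientnet_b0; 512 for resnet18, 2048 for resnet50/101, 1536 for efficientnet_b3
```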
# How to run the train code
The training code takes only a configuration file as input; everything else is controlled through that configuration. The command for running it is:
```bash
python3 train.py --config best_config.yml
```
# Config
```yml
optim:
  epochs: "Number of epochs for training" :> int
  batch_size: "Batch size for the dataset" :> int
  max_lr: "Max learning rate to pass to the scheduler" :> float
  weight_decay: "Weight decay value to pass to the optimizer" :> float
  device: "Device to use for training, either 'cuda' or 'cpu'" :> str
  num_workers: "Number of workers for the dataloader" :> int
  sam_rho: "Hyperparameter for the SAM optimizer" :> float
text_model: "Name of the text model; 'bert-base-uncased' is used in this project" :> str
image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
image_features_path: "Path to the extracted image features" :> str
text_features_path: "Path to the extracted text features" :> str
use_recipe_text: "Should the model use recipe embeddings?" :> bool
use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
model:
  ingredient_feature_extractor:
    layers: "Sequence of transformer blocks, written as a string such as 'T' or 'TTTT'" :> str
    H: "Feed-forward (hidden) size of the transformer" :> int
    transformer:
      L: "Number of layers in each transformer block" :> int
      n_heads: "Number of attention heads in each transformer block" :> int
    final_ingredient_feature_size: "Size of the ingredient feature after the transformer output" :> int
  image_feature_size: "Size that the image features are reduced to at the beginning" :> int
  text_feature_size: "Size that the recipe text features are reduced to" :> int
  final_classes: "Replaced in the code with the number of classes; just set it to -1" :> int
data:
  embedding_size: "Embedding size of the ingredient text features" :> int
  dataset_path: "Path to the RecipeDB dataset" :> str
  fasttext_path: "Path to the FastText .model file" :> str
  target: "Type of target; should be 'region' for this project" :> str
```
For reference, the contents of `best_config.yml`:
```yml
optim:
  epochs: 50
  batch_size: 256
  max_lr: 0.001
  weight_decay: 0.0005
  device: "cuda:0"
  num_workers: 4
  sam_rho: 0.1
text_model: "bert-base-uncased"
image_model: "efficientnet_b0"
image_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
text_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
use_recipe_text: True
use_image_ingredients: True
use_text_ingredients: True
model:
  ingredient_feature_extractor:
    layers: "TTTTT"
    H: 384
    transformer:
      L: 4
      n_heads: 4
    final_ingredient_feature_size: 200
  image_feature_size: 200
  text_feature_size: 200
  final_classes: -1
data:
  embedding_size: 100
  dataset_path: "Data"
  fasttext_path: "trained_1m.model"
  target: "region"
```
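The config is loaded into an `EasyDict`, so every key can be accessed with attribute syntax. A minimal sketch using the project's own `utils/io.py` helper:

```python
from utils.io import load_config

config = load_config("best_config.yml")
print(config.optim.epochs)                               # 50
print(config.image_model)                                # "efficientnet_b0"
print(config.model.ingredient_feature_extractor.layers)  # "TTTTT"
```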
# Source Files
For reference, the full source of the scripts and utility files described above is included below.

## extract_image_vector.py
```python
import os
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
from tqdm import tqdm
import warnings

warnings.filterwarnings("ignore")

# The five pre-trained image models used for ingredient feature extraction.
models_names = [
    'resnet18',
    'resnet50',
    'resnet101',
    'efficientnet_b0',
    'efficientnet_b3'
]

input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'

# Input resolution expected by each model.
image_size = {
    'resnet18': 224,
    'resnet50': 224,
    'resnet101': 224,
    'efficientnet_b0': 224,
    'efficientnet_b3': 300
}

# All models use the standard ImageNet normalization.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform = {
    name: transforms.Compose([
        transforms.Resize(image_size[name]),
        transforms.CenterCrop(image_size[name]),
        transforms.ToTensor(),
        normalize
    ])
    for name in models_names
}

device = torch.device("cuda")

for model_name in models_names:
    # Load the pre-trained model and strip its classification head so that it
    # outputs the penultimate feature vector.
    if 'resnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.fc.in_features
        model.fc = torch.nn.Identity()
    elif 'efficientnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.classifier[1].in_features
        model.classifier = torch.nn.Identity()
    else:
        print('Unknown model name: {}'.format(model_name))
        continue
    model = model.eval().to(device)

    output_dir = os.path.join(output_root_dir, model_name)
    os.makedirs(output_dir, exist_ok=True)

    # Each sub-folder of input_dir contains the crawled images of one ingredient.
    for folder_name in tqdm(os.listdir(input_dir)):
        folder_dir = os.path.join(input_dir, folder_name)
        if not os.path.isdir(folder_dir):
            continue
        image_tensors = []
        for image_filename in os.listdir(folder_dir):
            if not image_filename.lower().endswith((".png", ".jpg")):
                continue
            image_path = os.path.join(folder_dir, image_filename)
            image = Image.open(image_path).convert('RGB')
            image_tensors.append(transform[model_name](image).unsqueeze(0).to(device))
        if len(image_tensors) > 0:
            input_tensors = torch.cat(image_tensors)
            with torch.no_grad():
                # The ingredient embedding is the average of the embeddings of all its images.
                avg_features = model(input_tensors).mean(dim=0).cpu().numpy()
        else:
            # No usable images for this ingredient: store a zero vector of the right size.
            avg_features = np.zeros(num_features)
        output_filename = '{}.npy'.format(folder_name)
        output_path = os.path.join(output_dir, output_filename)
        np.save(output_path, avg_features)
```
## extract_recipe_vector.py
```python
import json
import numpy as np
import os
import torch
from tqdm import tqdm
from transformers import (BertTokenizer, BertModel,
                          GPT2Tokenizer, GPT2Model,
                          RobertaTokenizer, RobertaModel,
                          ElectraTokenizer, ElectraModel,
                          DistilBertTokenizer, DistilBertModel)

# Only BERT is used in the final pipeline; the other model classes are imported for experimentation.
models = {
    'bert-base-uncased': (BertTokenizer, BertModel),
}

# Switch between the train and validation splits here.
# json_path = 'Data/val.json'
json_path = 'Data/train.json'
output_dir = 'text-features'

with open(json_path, 'r') as f:
    data = json.load(f)

for model_name, (Tokenizer, Model) in models.items():
    tokenizer = Tokenizer.from_pretrained(model_name)
    max_size = tokenizer.max_model_input_sizes[model_name]
    model = Model.from_pretrained(model_name)
    model.to("cuda")
    for datapoint in tqdm(data):
        instructions = " ".join(datapoint['instructions'])
        # Tokenize and truncate to the model's maximum input length.
        if "gpt" in model_name:
            tokenized_instructions = tokenizer.encode(instructions, add_special_tokens=True)[:max_size]
        else:
            tokenized_instructions = [tokenizer.encode(instructions, add_special_tokens=True)[:max_size]]
        input_ids = torch.tensor(tokenized_instructions)
        with torch.no_grad():
            outputs = model(input_ids.to("cuda"))
        # Use the first-token ([CLS]) embedding as the recipe representation.
        if "gpt" in model_name:
            embeddings = outputs.last_hidden_state[0, :].detach().cpu().numpy()
        else:
            embeddings = outputs.last_hidden_state[:, 0, :].detach().cpu().numpy()
        output_filename = '{}.npy'.format(datapoint['id'])
        output_path = os.path.join(output_dir, model_name, output_filename)
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        np.save(output_path, embeddings)
```
## network.py
```python
from torch.nn import Module
from torch import nn
import torch
from gensim.models import FastText
import numpy as np


class EmbedderFasttext():
    """Wraps a trained gensim FastText model for ingredient-name embeddings."""

    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'FastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')

    def has(self, word):
        # FastText can embed any non-empty token through its subword units.
        return word != ""

    def get(self, word):
        # Multi-word ingredients are joined with '_'; return the average of the word vectors.
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)


class Transformer(Module):
    def __init__(self, input_size, nhead, num_layers, dim_feedforward, num_classes, aggregate=True):
        super(Transformer, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_size, dim_feedforward=dim_feedforward, nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.aggregate = aggregate
        if self.aggregate:
            self.linear = nn.Linear(input_size, num_classes, True)

    def forward(self, x, padding_mask):
        # padding_mask is True at padded positions, as expected by src_key_padding_mask.
        out = self.transformer_encoder(x, src_key_padding_mask=padding_mask)
        if self.aggregate:
            # Sum over the non-padded positions, then project to the output size.
            out = (out * ~padding_mask.unsqueeze(-1)).sum(dim=1)
            out = self.linear(torch.relu(out))
        return out


class ImageTextTransformer(Module):
    def __init__(self, config):
        super(ImageTextTransformer, self).__init__()
        self.embedding_size = config.data.embedding_size
        self.custom_embed = False
        self.layers = config.model.ingredient_feature_extractor.layers
        if "G" in config.model.ingredient_feature_extractor.layers:
            assert False, "No GNN for this model"
        self.use_recipe_text = config.use_recipe_text
        self.use_text_ingredients = config.use_text_ingredients
        self.use_image_ingredients = config.use_image_ingredients
        if not self.use_recipe_text and not self.use_text_ingredients and not self.use_image_ingredients:
            raise Exception("The model can't work without any features")

        if self.use_text_ingredients or self.use_image_ingredients:
            # Each ingredient token is the concatenation of its text and/or image features.
            transformer_input_feature_size = 0
            if self.use_image_ingredients:
                transformer_input_feature_size += config.model.image_feature_size
            if self.use_text_ingredients:
                transformer_input_feature_size += self.embedding_size
            blocks = [
                Transformer(
                    input_size=transformer_input_feature_size,
                    nhead=config.model.ingredient_feature_extractor.transformer.n_heads,
                    num_layers=config.model.ingredient_feature_extractor.transformer.L,
                    dim_feedforward=config.model.ingredient_feature_extractor.H,
                    # Only the last block aggregates and projects to the final ingredient feature size.
                    num_classes=config.model.ingredient_feature_extractor.final_ingredient_feature_size if i == len(config.model.ingredient_feature_extractor.layers) - 1 else None,
                    aggregate=(i == len(config.model.ingredient_feature_extractor.layers) - 1)
                ) for i, m in enumerate(config.model.ingredient_feature_extractor.layers)
            ]
            self.ingredient_feature_module = nn.ModuleList(blocks)

        # Output dimensionality of each supported pre-trained backbone.
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536,
            'bert-base-uncased': 768,
        }
        if self.use_image_ingredients:
            self.image_feature_extractor = torch.nn.Linear(feature_size[config.image_model], config.model.image_feature_size)
        if self.use_recipe_text:
            self.text_feature_extractor = torch.nn.Linear(feature_size[config.text_model], config.model.text_feature_size)

        classifier_input_size = 0
        if self.use_image_ingredients or self.use_text_ingredients:
            classifier_input_size += config.model.ingredient_feature_extractor.final_ingredient_feature_size
        if self.use_recipe_text:
            classifier_input_size += config.model.text_feature_size
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(classifier_input_size, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, config.model.final_classes)
        )

    def forward(self, embeddings, mask, image_ingredients, recipe_embeddings):
        if self.use_recipe_text:
            text_features = self.text_feature_extractor(recipe_embeddings)
        if self.use_image_ingredients:
            image_features = self.image_feature_extractor(image_ingredients)
        if self.use_image_ingredients or self.use_text_ingredients:
            if self.use_text_ingredients and self.use_image_ingredients:
                ingredient_features = torch.cat([embeddings, image_features], dim=2)
            elif self.use_text_ingredients:
                ingredient_features = embeddings
            else:
                ingredient_features = image_features
            out = ingredient_features
            for i, m in enumerate(self.layers):
                if m == "T":
                    # mask is True for real ingredients, so invert it for src_key_padding_mask.
                    out = self.ingredient_feature_module[i](out, ~mask)
                else:
                    raise Exception("Invalid module")
            aggregated_ingredient_features = out
            if self.use_recipe_text:
                recipe_features = torch.cat([text_features, aggregated_ingredient_features], dim=1)
            else:
                recipe_features = aggregated_ingredient_features
        else:
            recipe_features = text_features
        final_result = self.classifier(torch.nn.functional.relu(recipe_features))
        return final_result

    def freeze_features(self):
        # Put the feature-extraction linear layers into eval mode.
        if self.use_image_ingredients:
            self.image_feature_extractor.eval()
        if self.use_recipe_text:
            self.text_feature_extractor.eval()

    def freeze_function(self):
        self.classifier.eval()
```
## train.py
```python
from datetime import datetime
import os
import json
import argparse
import logging

import nltk
# nltk.download('wordnet')
# nltk.download('omw-1.4')
# nltk.download('punkt')
import numpy as np
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from gensim.models import FastText
import mlflow
import mlflow.pytorch

from utils.sam import SAM
from utils.bypass_bn import enable_running_stats, disable_running_stats
from utils.recipedb_dataset import RecipeDBDataset
from utils.io import load_config, save_config
from network import ImageTextTransformer

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

experiment_name = 'Parham BS Project Region Prediction'
experiment_code = experiment_name.replace(' - ', '.').replace(' ', '_').lower()
mlflow.set_experiment(experiment_name)

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str)
args = parser.parse_args()
config = load_config(args.config)

epochs = config.optim.epochs
batch_size = config.optim.batch_size
learning_rate = config.optim.max_lr
weight_decay = config.optim.weight_decay
embedding_size = config.data.embedding_size
sam_rho = config.optim.sam_rho
num_workers = config.optim.num_workers
data_path = config.data.dataset_path
target = config.data.target

# The target dictionary (e.g. region.json) maps each class name to an integer id.
target_dictionary = json.load(open(os.path.join(data_path, f'{target}.json'), 'r'))
if 'entropy' in config.optim:
    entropy_weight = config.optim.entropy
else:
    entropy_weight = 0
config.model.final_classes = len(target_dictionary)
epsilon = 1e-8
print(target)
print(target_dictionary)

output_dir = f'parham-models_image_text_transformer/{target}/{datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}'
if not os.path.isdir(output_dir):
    os.makedirs(output_dir, exist_ok=True)


class EmbedderFasttext():
    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'FastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')

    def has(self, word):
        return word != ""

    def get(self, word):
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)


embedder = EmbedderFasttext(config.data.fasttext_path)

datasets = {
    "train": RecipeDBDataset(os.path.join(data_path, 'train.json'),
                             cousine_dict=target_dictionary,
                             extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'),
                             embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
                             mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model=config.image_model),
    "val": RecipeDBDataset(os.path.join(data_path, "val.json"),
                           cousine_dict=target_dictionary,
                           extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'),
                           embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
                           mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model=config.image_model)
}
print('Dataset constructed.')
print(len(datasets['train']), len(datasets['val']))
print(f'target: {target}')
print(f'number of classes: {len(target_dictionary)}')

device = config.optim.device
dataloaders = {
    "train": DataLoader(datasets["train"], batch_size=batch_size, collate_fn=datasets['train'].rdb_collate, shuffle=True, num_workers=num_workers),
    "val": DataLoader(datasets["val"], batch_size=batch_size, collate_fn=datasets['val'].rdb_collate, shuffle=False, num_workers=num_workers)
}
loss_fn = torch.nn.CrossEntropyLoss().to(device)
print('Dataloader constructed.')

model = ImageTextTransformer(config)
print(model)
model = model.to(device)

# SAM wraps Adam; the OneCycleLR scheduler drives the learning rate of the base optimizer.
optimizer = SAM(model.parameters(), rho=sam_rho, base_optimizer=torch.optim.Adam, lr=learning_rate / 10, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.OneCycleLR(max_lr=learning_rate, epochs=epochs, steps_per_epoch=len(dataloaders["train"]), optimizer=optimizer.base_optimizer)


def stable_log_sigmoid(x):
    max_value = torch.maximum(x, torch.zeros(*x.shape, dtype=torch.float32, device=x.device))
    return -max_value - torch.log(torch.exp(-max_value) + torch.exp(x - max_value))


def argtopk(tensor, k, dim):
    indices = torch.argsort(tensor, dim=dim, descending=True)
    topk_indices = indices.narrow(dim, 0, k)
    return topk_indices


with mlflow.start_run():
    mlflow.log_params(dict(config))
    result = None
    best_val_acc = 0
    best_val_top3 = 0
    best_val_top5 = 0
    for epoch in range(epochs):
        for mode in ["train", "val"]:
            if mode == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            top_5_corrects = 0
            top_3_corrects = 0
            num_samples = 0
            for data_batch in tqdm(dataloaders[mode]):
                embeddings = data_batch['ingredients'].to(device)
                masks = data_batch['masks'].to(device)
                targets = data_batch['cousines'].to(device) if 'cousines' in data_batch else data_batch['targets'].to(device)
                image_ingredients = data_batch['image_ingredients'].to(device)
                recipe_embeddings = data_batch['recipe_embeddings'].to(device)
                with torch.set_grad_enabled(mode == 'train'):
                    enable_running_stats(model)
                    out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                    entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                    loss = loss_fn(out, targets) + entropy_weight * entropy
                    if mode == 'train':
                        # SAM needs two forward/backward passes: ascend to "w + e(w)", then update at "w".
                        loss.backward()
                        optimizer.first_step(zero_grad=True)
                        disable_running_stats(model)
                        out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                        entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                        (loss_fn(out, targets) + entropy_weight * entropy).backward()
                        optimizer.second_step(zero_grad=True)
                        scheduler.step()
                running_loss += loss.item() * embeddings.shape[0]
                running_corrects += (out.argmax(dim=1) == targets).sum().item()
                num_samples += embeddings.shape[0]
                top_5_corrects += (argtopk(out, k=5, dim=1) == targets.unsqueeze(1)).sum().item()
                top_3_corrects += (argtopk(out, k=3, dim=1) == targets.unsqueeze(1)).sum().item()
            print(f"epoch: {epoch}, loss: {running_loss / num_samples}, acc: {running_corrects / num_samples}, top3: {top_3_corrects / num_samples}, top5: {top_5_corrects / num_samples}")
            if mode == "val":
                best_val_acc = max(best_val_acc, running_corrects / num_samples * 100)
                best_val_top3 = max(best_val_top3, top_3_corrects / num_samples * 100)
                best_val_top5 = max(best_val_top5, top_5_corrects / num_samples * 100)
            metrics = {
                '{}_loss'.format(mode): running_loss / num_samples,
                '{}_acc'.format(mode): running_corrects / num_samples * 100,
                '{}_acc3'.format(mode): top_3_corrects / num_samples * 100,
                '{}_acc5'.format(mode): top_5_corrects / num_samples * 100
            }
            if mode == 'val':
                metrics["best_val_acc"] = best_val_acc
                metrics["best_val_acc3"] = best_val_top3
                metrics["best_val_acc5"] = best_val_top5
                result = running_corrects / num_samples * 100
            mlflow.log_metrics(metrics)
    os.makedirs(output_dir, exist_ok=True)
    mlflow.pytorch.log_model(model, 'model')
    config.result = result
    torch.save(model.state_dict(), os.path.join(output_dir, "checkpoint.pth"))
    save_config(config, os.path.join(output_dir, "config.yml"))
```
## utils/bypass_bn.py
```python
import torch
import torch.nn as nn


def disable_running_stats(model):
    # Freeze BatchNorm running statistics during SAM's second forward pass.
    def _disable(module):
        if isinstance(module, nn.BatchNorm2d):
            module.backup_momentum = module.momentum
            module.momentum = 0

    model.apply(_disable)


def enable_running_stats(model):
    # Restore BatchNorm momentum before the first forward pass.
    def _enable(module):
        if isinstance(module, nn.BatchNorm2d) and hasattr(module, "backup_momentum"):
            module.momentum = module.backup_momentum

    model.apply(_enable)
```
## utils/fasttext_embedding.py
```python
import logging

import fasttext
import numpy as np
from nltk.corpus import stopwords

logger = logging.getLogger(__name__)


class FasttextEmbedding:
    def __init__(self, model_path):
        # A .bin path loads a full fasttext model; anything else is treated as a saved numpy vector file.
        if model_path.endswith('.bin'):
            self.model = fasttext.load_model(model_path)
            self.full = True
        else:
            self.model = np.load(model_path)
            self.full = False
        self.stopwords = stopwords.words('english')

    def __getitem__(self, idx):
        if self.full:
            return self.model.get_word_vector(idx)
        else:
            if idx not in self.model:
                raise ValueError('Word not available.')
            return self.model[idx]
```
## utils/io.py
```python
import yaml
from easydict import EasyDict as edict
import json


def load_config(path):
    with open(path, 'r', encoding='utf8') as f:
        return edict(yaml.safe_load(f))


def save_config(config, path):
    # Round-trip through JSON to turn the EasyDict back into plain dicts before dumping YAML.
    x = json.loads(json.dumps(config))
    with open(path, 'w', encoding='utf8') as f:
        yaml.dump(x, f, default_flow_style=False, allow_unicode=True)
```
## utils/recipedb_dataset.py
```python
from typing import Any
import torch
from torch.utils.data import Dataset
import json
import numpy as np
from torch.nn.utils.rnn import pad_sequence
import warnings
import os

warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
warnings.filterwarnings(action='ignore', category=FutureWarning, module='gensim')


def mask_count(num):
    return num // 5


def generate_ing_dict(path, threshold):
    # Map every ingredient whose count exceeds the threshold to a consecutive index.
    assert path is not None
    with open(path, "r") as json_file:
        full_ing_count_list: dict = json.load(json_file)
    filtered_ing_list = {}
    counter = 0
    for ing, count in full_ing_count_list.items():
        if count > threshold:
            filtered_ing_list[ing] = counter
            counter += 1
    return filtered_ing_list


def get_ingredient_frequencies(occr_path):
    with open(occr_path, "r") as json_file:
        occr = json.load(json_file)
    if '' in occr:
        del occr['']
    return occr


class RecipeDBDataset(Dataset):
    def __init__(self, json_path, cousine_dict=None,
                 extract_ingredients=False, extract_recipes=False, extract_cousine=False,
                 embedder=None, include_id=False, mask_threshold=1000, mask_path=None,
                 occr_path=None, target='country',
                 image_model="resnet18") -> None:
        super(RecipeDBDataset, self).__init__()
        with open(json_path, "r") as json_file:
            data = json.load(json_file)
        if occr_path is not None:
            self.freqs = get_ingredient_frequencies(occr_path)
            self.all_ingredients, self.all_ingredient_probs = zip(*sorted(self.freqs.items()))
            self.all_ingredients = list(self.all_ingredients)
            self.all_ingredient_probs = np.array(self.all_ingredient_probs, dtype=np.float32)
            self.all_ingredient_probs /= np.sum(self.all_ingredient_probs)
        self.ing_dict: dict = generate_ing_dict(mask_path, mask_threshold)
        self.len_mask_ing = len(self.ing_dict)
        self.data = []
        self.embedder = embedder
        self.extract_ingredients = extract_ingredients
        self.extract_recipes = extract_recipes
        self.extract_cousine = extract_cousine
        self.ingredient_set = set()

        # Pre-extracted image and text features (see the extraction scripts above).
        self.image_path = "Data/image_dict_ings.json"
        with open(self.image_path, 'r') as jf:
            self.image_ing_dict = json.load(jf)
        self.image_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536
        }
        self.image_model = image_model
        self.image_feature_size = feature_size[self.image_model]
        self.not_found_ings = set()
        self.text_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
        self.text_feature_model = "bert-base-uncased"

        failed_ing_count = 0
        for recipe in data:
            temp_data = {}
            if extract_ingredients:
                temp_data["ingredients"] = []
                for ing in recipe["ingredients"]:
                    if ing["Ingredient Name"] != "":
                        temp_data["ingredients"].append(ing["Ingredient Name"])
                if len(temp_data["ingredients"]) == 0:
                    failed_ing_count += 1
                    continue
            if extract_cousine:
                temp_data["cousine"] = cousine_dict[recipe[target]]
            if include_id:
                temp_data["id"] = recipe["id"]
            self.data.append(temp_data)
        self.cousine_dict = cousine_dict
        print(f"failed ings count: {failed_ing_count}")

    def __getitem__(self, index: Any):
        d = self.data[index]
        out = {}
        ings = []
        if self.extract_ingredients:
            for ing in d["ingredients"]:
                if self.embedder.has(ing):
                    ings.append(self.embedder.get(ing))
            ings = torch.tensor(ings, dtype=torch.float32)
            image_ingredients = []
            for ing in d["ingredients"]:
                # Look up the pre-extracted image feature for the ingredient,
                # falling back to its underscored form or any of its words.
                npy_path = ""
                if ing in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing}.npy")
                elif ing.replace(" ", "_") in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing.replace(' ', '_')}.npy")
                else:
                    for ing_part in ing.split():
                        if ing_part in self.image_ing_dict:
                            npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing_part}.npy")
                            break
                    else:
                        self.not_found_ings.add(ing)
                if npy_path == "":
                    image_ingredients.append(np.zeros(self.image_feature_size))
                else:
                    image_ingredients.append(np.load(npy_path))
            image_ingredients = torch.tensor(np.array(image_ingredients), dtype=torch.float32)
            out["ingredients"] = ings
            out["image_ingredients"] = image_ingredients
        if self.extract_recipes:
            out["recipe_embedding"] = torch.tensor(np.load(os.path.join(self.text_feature_path, self.text_feature_model, f'{d["id"]}.npy')), dtype=torch.float32)
        if self.extract_cousine:
            out["cousine"] = d["cousine"]
        return out

    def __len__(self):
        return len(self.data)

    def rdb_collate(self, batch):
        cousines = []
        ingredients = []
        masks = []
        image_ingredients = []
        recipe_embeddings = []
        for data in batch:
            if "cousine" in data:
                cousines.append(data["cousine"])
            if "recipe_embedding" in data:
                recipe_embeddings.append(data["recipe_embedding"])
            if "ingredients" in data:
                ingredients.append(data["ingredients"])
                masks.append(torch.ones(data["ingredients"].shape[0]))
                image_ingredients.append(data["image_ingredients"])

        outs = {}
        if "ingredients" in data:
            # Pad variable-length ingredient lists; the mask is True at real (non-padded) positions.
            masks = pad_sequence(masks, batch_first=True, padding_value=0).type(torch.bool)
            ingredients = pad_sequence(ingredients, batch_first=True, padding_value=0)
            image_ingredients = pad_sequence(image_ingredients, batch_first=True, padding_value=0)
            outs["masks"] = masks
            outs["ingredients"] = ingredients
            outs["image_ingredients"] = image_ingredients
        if "recipe_embedding" in data:
            outs["recipe_embeddings"] = torch.cat(recipe_embeddings, dim=0)
        if "cousine" in data:
            cousines = torch.LongTensor(cousines)
            outs["cousines"] = cousines
        return outs


def dict_to_device(data: dict, device, return_new_dict=False):
    new_dict = {}
    for k, v in data.items():
        if not return_new_dict:
            data[k] = v.to(device)
        else:
            new_dict[k] = v.to(device)
    return new_dict if return_new_dict else data
```
## utils/sam.py
```python
import torch


class SAM(torch.optim.Optimizer):
    def __init__(self, params, base_optimizer, rho=0.05, adaptive=False, **kwargs):
        assert rho >= 0.0, f"Invalid rho, should be non-negative: {rho}"
        defaults = dict(rho=rho, adaptive=adaptive, **kwargs)
        super(SAM, self).__init__(params, defaults)
        self.base_optimizer = base_optimizer(self.param_groups, **kwargs)
        self.param_groups = self.base_optimizer.param_groups

    @torch.no_grad()
    def first_step(self, zero_grad=False):
        grad_norm = self._grad_norm()
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None: continue
                self.state[p]["old_p"] = p.data.clone()
                e_w = (torch.pow(p, 2) if group["adaptive"] else 1.0) * p.grad * scale.to(p)
                p.add_(e_w)  # climb to the local maximum "w + e(w)"
        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def second_step(self, zero_grad=False):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None: continue
                p.data = self.state[p]["old_p"]  # get back to "w" from "w + e(w)"
        self.base_optimizer.step()  # do the actual "sharpness-aware" update
        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def step(self, closure=None):
        assert closure is not None, "Sharpness Aware Minimization requires closure, but it was not provided"
        closure = torch.enable_grad()(closure)  # the closure should do a full forward-backward pass
        self.first_step(zero_grad=True)
        closure()
        self.second_step()

    def _grad_norm(self):
        shared_device = self.param_groups[0]["params"][0].device  # put everything on the same device, in case of model parallelism
        norm = torch.norm(
            torch.stack([
                ((torch.abs(p) if group["adaptive"] else 1.0) * p.grad).norm(p=2).to(shared_device)
                for group in self.param_groups for p in group["params"]
                if p.grad is not None
            ]),
            p=2
        )
        return norm

    def load_state_dict(self, state_dict):
        super().load_state_dict(state_dict)
        self.base_optimizer.param_groups = self.param_groups
```