
Add code for BSc proj

master
Parham 10 months ago
commit
b68169505b
12 changed files with 957 additions and 0 deletions
  1. README.md (+96, -0)
  2. best_config.yml (+31, -0)
  3. extract_image_vector.py (+119, -0)
  4. extract_recipe_vector.py (+50, -0)
  5. network.py (+155, -0)
  6. train.py (+203, -0)
  7. utils/__init__.py (+0, -0)
  8. utils/bypass_bn.py (+17, -0)
  9. utils/fasttext_embedding.py (+27, -0)
  10. utils/io.py (+12, -0)
  11. utils/recipedb_dataset.py (+185, -0)
  12. utils/sam.py (+62, -0)

+ 96
- 0
README.md

@@ -0,0 +1,96 @@
# Required data for running on other servers
- Crawled Images (Resized to 384): (172.27.50.254):/home/dml/food/CuisineAdaptation/crawled-images-full-384
- FastText Model Folder: (172.27.50.254):/home/dml/food/CuisineAdaptation/fasttext
- Extracted Image Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full
- Extracted Text Features: (172.27.50.254):/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features
- Crawled Images (Original Size):

The "Crawled Images (Original Size)" path contains the original images of ingredients that were obtained from Google. Afterwards, these images were resized to 384 to facilitate their transfer between servers. Since our image models use a maximum size of 300 for input images, a larger size is not necessary, making this reduction in size very convenient for us. The original data folder is over 200 GB in size, while the resized folder still contains over 20 GB of data, therefore, only their paths are provided here. These folders **are not required** to run the final model since their features are extracted using pre-trained models that we use to run our model.

The FastText model is a non-contextual model used to extract embeddings of ingredient names. The model is approximately 1 GB in size, so only its path is provided here. This model **is required** to run the final training code.
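
For reference, the training code loads this model with gensim and averages the word vectors of a multi-word ingredient name (see `EmbedderFasttext` in `network.py`). A minimal sketch, assuming the model file is `trained_1m.model` as in `best_config.yml`; `olive_oil` is just an example ingredient name:

```python
from gensim.models import FastText
import numpy as np

# Load the trained FastText model (path taken from best_config.yml).
model = FastText.load("trained_1m.model")

def ingredient_vector(name):
    # Multi-word ingredients are joined with underscores; the embedding is the
    # average of the per-word vectors.
    words = name.split("_")
    return np.mean([model.wv.get_vector(w) for w in words], axis=0)

print(ingredient_vector("olive_oil").shape)  # (100,) if the model uses 100-dimensional vectors
```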

The "Extracted Image Features" path refers to the folder containing the extracted features from ingredient images using pre-trained image models. These image features **are necessary** to run the main training code.

The "Extracted Text Features" path refers to the folder containing the extracted features from recipes using the BERT model. These features **are also required** to run the main training code.

# Structure of the Available Files
## Data
This folder contains the following files:

- train.json: the training split of the RecipeDB dataset.
- val.json: the validation split of the RecipeDB dataset.
- region.json: a JSON file listing all regions and assigning a number to each of them.
- ingredient_counts.json: a JSON file listing every ingredient in the RecipeDB dataset together with its count over the whole dataset.
- image_dict_ings.json: a list of crawled image names.
## Utils
The following list explains all the files that appear in the utils folder:

- fasttext_embedding.py: utility functions for obtaining embeddings from FastText.
- io.py: utility functions for loading and saving the config.
- bypass_bn.py: functions for handling Batch Normalization layers.
- recipedb_dataset.py: an implementation of the RecipeDB dataset using the PyTorch `Dataset` class.
- sam.py: an implementation of the SAM optimizer, which is used in this project.
## Others
- extract_image_vector.py: extracts visual embeddings for ingredients.
- extract_recipe_vector.py: extracts text embeddings for recipes.
- network.py: implementation of the PyTorch image-text transformer model used to solve the task.
- train.py: loads the data, builds the model, and feeds the data to the model for training.
- best_config.yml: YAML config file specifying the hyperparameters of the model.

# How to extract features
## Text features
Text features are extracted from RecipeDB recipes with `extract_recipe_vector.py`. At the top of the file, the script defines the path to the input JSON file (`Data/train.json` or `Data/val.json`) and the output directory where the embeddings will be saved. Run it with:
```bash
python3 extract_recipe_vector.py
```
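
Internally, the script joins each recipe's instruction steps into one string, tokenizes it with the BERT tokenizer, and saves the last hidden state of the `[CLS]` token as the recipe embedding. A condensed sketch of that step (the full loop lives in `extract_recipe_vector.py`; the 512-token cut-off is BERT's input limit):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

def recipe_embedding(instructions):
    # instructions: list of instruction strings for one recipe
    text = " ".join(instructions)
    ids = tokenizer.encode(text, add_special_tokens=True)[:512]
    with torch.no_grad():
        out = model(torch.tensor([ids]))
    # (1, 768) array: hidden state of the [CLS] token
    return out.last_hidden_state[:, 0, :].numpy()
```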

## Image features
Image features can be extracted with `extract_image_vector.py`, which defines the following variables at the top of the file:

```python
input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'
```
These set the path to the input image folder (ingredient images resized to 384x384) and the root directory where the embeddings will be saved.

The script loads the pretrained image models listed at the top of the file, runs each of them on the input data, and saves its embeddings in the output folder (one subfolder per model). Keep in mind that the output embedding for an ingredient is the average of the embeddings extracted from all of its images.
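
The averaging step itself is small; a simplified sketch of what the script does per ingredient folder (here `model` is a torchvision backbone whose classification head has been replaced with `Identity`):

```python
import torch

@torch.no_grad()
def average_ingredient_embedding(model, image_tensors):
    # image_tensors: list of (1, 3, H, W) tensors for one ingredient folder
    batch = torch.cat(image_tensors)           # (N, 3, H, W)
    features = model(batch)                    # (N, feature_dim)
    return features.mean(dim=0).cpu().numpy()  # one (feature_dim,) vector per ingredient
```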

# How to run the training code
The training code takes only a configuration file as input; all hyperparameters and paths are specified there. Run it with:
```bash
python3 train.py --config best_config.yml
```
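
The config file is parsed by `load_config` in `utils/io.py`, which returns an `EasyDict`, so all values are accessed with attribute syntax in `train.py` and `network.py`:

```python
from utils.io import load_config

config = load_config("best_config.yml")
print(config.optim.epochs)                                # 50
print(config.model.ingredient_feature_extractor.layers)   # "TTTTT"
print(config.data.target)                                 # "region"
```
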
# Config
```yml
optim:
  epochs: "Number of epochs for training" :> int
  batch_size: "Batch size for the dataset" :> int
  max_lr: "Max learning rate to pass to the scheduler" :> float
  weight_decay: "Weight decay value to pass to the optimizer" :> float
  device: "Device to use for training, either 'cuda' or 'cpu'" :> str
  num_workers: "Number of workers for the dataloader" :> int
  sam_rho: "Hyperparameter rho for the SAM optimizer" :> float
text_model: "Name of the text model, e.g. 'bert-base-uncased'" :> str
image_model: "Name of the image model. Available values: resnet18, resnet50, resnet101, efficientnet_b0, efficientnet_b3" :> str
image_features_path: "Path to the extracted image features" :> str
text_features_path: "Path to the extracted text features" :> str
use_recipe_text: "Should the model use recipe embeddings?" :> bool
use_image_ingredients: "Should the model use ingredient image embeddings?" :> bool
use_text_ingredients: "Should the model use ingredient text embeddings?" :> bool
model:
  ingredient_feature_extractor:
    layers: "Sequence of transformer blocks, e.g. T or TTTT" :> str
    H: "Embedding size for the transformer" :> int
    transformer:
      L: "Number of layers for each transformer block" :> int
      n_heads: "Number of heads for each transformer block" :> int
    final_ingredient_feature_size: "Size of the ingredient feature vector after the transformer output" :> int
  image_feature_size: "Size that the ingredient image features are reduced to at the input" :> int
  text_feature_size: "Size that the recipe text features are reduced to" :> int
  final_classes: "Number of output classes. This is overwritten in the code; just set it to -1." :> int
data:
  embedding_size: "Embedding size of the ingredient text features" :> int
  dataset_path: "Path to the RecipeDB dataset" :> str
  fasttext_path: "Path to the FastText .model file" :> str
  target: "Type of target. Should be 'region' for this project." :> str
```

+ 31
- 0
best_config.yml

@@ -0,0 +1,31 @@
optim:
  epochs: 50
  batch_size: 256
  max_lr: 0.001
  weight_decay: 0.0005
  device: "cuda:0"
  num_workers: 4
  sam_rho: 0.1
text_model: "bert-base-uncased"
image_model: "efficientnet_b0"
image_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
text_features_path: "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
use_recipe_text: True
use_image_ingredients: True
use_text_ingredients: True
model:
  ingredient_feature_extractor:
    layers: "TTTTT"
    H: 384
    transformer:
      L: 4
      n_heads: 4
    final_ingredient_feature_size: 200
  image_feature_size: 200
  text_feature_size: 200
  final_classes: -1
data:
  embedding_size: 100
  dataset_path: "Data"
  fasttext_path: "trained_1m.model"
  target: "region"

+ 119
- 0
extract_image_vector.py

@@ -0,0 +1,119 @@
import os
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
from tqdm import tqdm


import warnings
warnings.filterwarnings("ignore")

models_names = [
    'efficientnet_t0',
    'resnet50',
    'resnet101',
    'efficientnet_b0',
    'efficientnet_b3'
]

input_dir = '/home/dml/food/CuisineAdaptation/crawled-images-full-384'
output_root_dir = 'image-features-full'

image_size = {
    'resnet18': 224,
    'resnet50': 224,
    'resnet101': 224,
    'efficientnet_b0': 224,
    'efficientnet_t0': 224,
    'efficientnet_b3': 300
}

# Every backbone uses the same ImageNet normalization; the per-model transforms
# are built from image_size so that each listed model gets a matching entry.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

transform = {
    name: transforms.Compose([
        transforms.Resize(size),
        transforms.CenterCrop(size),
        transforms.ToTensor(),
        normalize
    ])
    for name, size in image_size.items()
}

device = torch.device("cuda")
counter = 0

for model_name in models_names:
    if not hasattr(models, model_name):
        # Skip names that are not available in torchvision (e.g. 'efficientnet_t0').
        print('Unknown model name: {}'.format(model_name))
        continue
    if 'resnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.fc.in_features
        model.fc = torch.nn.Identity()
    elif 'efficientnet' in model_name:
        model = getattr(models, model_name)(pretrained=True)
        num_features = model.classifier[1].in_features
        model.classifier = torch.nn.Identity()
    else:
        print('Unknown model name: {}'.format(model_name))
        continue
    num_classes = num_features
    model = model.eval().to(device)
    output_dir = os.path.join(output_root_dir, model_name)
    os.makedirs(output_dir, exist_ok=True)

    for folder_name in tqdm(os.listdir(input_dir)):
        folder_dir = os.path.join(input_dir, folder_name)
        if not os.path.isdir(folder_dir):
            continue

        image_tensors = []
        for image_filename in os.listdir(folder_dir):
            if not image_filename.lower().endswith(".png") and not image_filename.lower().endswith(".jpg"):
                continue
            counter += 1
            image_path = os.path.join(folder_dir, image_filename)

            image = Image.open(image_path).convert('RGB')
            image_tensor = transform[model_name](image).unsqueeze(0).to(device)
            image_tensors.append(image_tensor)
        if len(image_tensors) > 0:
            input_tensors = torch.cat(image_tensors)
            with torch.no_grad():
                # Average the per-image feature vectors into a single ingredient embedding.
                avg_features = model(input_tensors).mean(dim=0).cpu().numpy()
        else:
            avg_features = np.zeros(num_features)

        output_filename = '{}.npy'.format(folder_name)
        output_path = os.path.join(output_dir, output_filename)
        np.save(output_path, avg_features)

+ 50
- 0
extract_recipe_vector.py

@@ -0,0 +1,50 @@
import json
import numpy as np
import os
import torch
from tqdm import tqdm

from transformers import (BertTokenizer, BertModel,
                          GPT2Tokenizer, GPT2Model,
                          RobertaTokenizer, RobertaModel,
                          ElectraTokenizer, ElectraModel,
                          DistilBertTokenizer, DistilBertModel)

models = {
    'bert-base-uncased': (BertTokenizer, BertModel),
}

#json_path = 'Data/val.json'
json_path = 'Data/train.json'
output_dir = 'text-features'

with open(json_path, 'r') as f:
    data = json.load(f)

counter = 0
for model_name, (Tokenizer, Model) in models.items():
    tokenizer = Tokenizer.from_pretrained(model_name)
    max_size = tokenizer.max_model_input_sizes[model_name]
    model = Model.from_pretrained(model_name)
    model.to("cuda")
    for datapoint in tqdm(data):
        instructions = " ".join(datapoint['instructions'])
        if "gpt" in model_name:
            tokenized_instructions = tokenizer.encode(instructions, add_special_tokens=True)[:max_size]
        else:
            tokenized_instructions = [tokenizer.encode(instructions, add_special_tokens=True)[:max_size]]
        input_ids = torch.tensor(tokenized_instructions)
        # print(input_ids.shape)
        with torch.no_grad():
            outputs = model(input_ids.to("cuda"))
        if "gpt" in model_name:
            embeddings = outputs.last_hidden_state[0, :].detach().cpu().numpy()
        else:
            embeddings = outputs.last_hidden_state[:, 0, :].detach().cpu().numpy()
        # print(embeddings.shape)
        output_filename = '{}.npy'.format(datapoint['id'])
        output_path = os.path.join(output_dir, model_name, output_filename)
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        np.save(output_path, embeddings)

+ 155
- 0
network.py

@@ -0,0 +1,155 @@
from torch.nn import Module
from torch import nn
import torch
import torch.nn.functional as F
from einops import reduce
from gensim.models import FastText
import numpy as np
import json
import pickle

epsilon = 1e-8
VECTORIZER_SIZE = 1500


class EmbedderFasttext():
    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'FastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')

    def has(self, word):
        if word == "":
            return False
        return True

    def get(self, word):
        # Multi-word ingredients are joined with '_'; return the average of the word vectors.
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)


class Transformer(Module):
    def __init__(self, input_size, nhead, num_layers, dim_feedforward, num_classes, aggregate=True):
        super(Transformer, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_size, dim_feedforward=dim_feedforward, nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.aggregate = aggregate
        if self.aggregate:
            self.linear = nn.Linear(input_size, num_classes, True)

    def forward(self, x, padding_mask):
        out = self.transformer_encoder(x, src_key_padding_mask=padding_mask)
        if self.aggregate:
            # Sum over the non-padded positions, then project to the output size.
            out = (out * ~padding_mask.unsqueeze(-1)).sum(dim=1)
            out = self.linear(torch.relu(out))
        return out


class ImageTextTransformer(Module):
    def __init__(self, config):
        super(ImageTextTransformer, self).__init__()
        self.embedding_size = config.data.embedding_size
        self.custom_embed = False
        self.layers = config.model.ingredient_feature_extractor.layers
        if "G" in config.model.ingredient_feature_extractor.layers:
            assert False, "No GNN for this model"
        self.use_recipe_text = config.use_recipe_text
        self.use_text_ingredients = config.use_text_ingredients
        self.use_image_ingredients = config.use_image_ingredients
        if not self.use_recipe_text and not self.use_text_ingredients and not self.use_image_ingredients:
            raise Exception("The model can't work without any features")
        if self.use_text_ingredients or self.use_image_ingredients:
            transformer_input_feature_size = 0
            if self.use_image_ingredients:
                transformer_input_feature_size += config.model.image_feature_size
            if self.use_text_ingredients:
                transformer_input_feature_size += self.embedding_size
            blocks = [
                Transformer(
                    input_size=transformer_input_feature_size,
                    nhead=config.model.ingredient_feature_extractor.transformer.n_heads,
                    num_layers=config.model.ingredient_feature_extractor.transformer.L,
                    dim_feedforward=config.model.ingredient_feature_extractor.H,
                    num_classes=config.model.ingredient_feature_extractor.final_ingredient_feature_size if i == len(config.model.ingredient_feature_extractor.layers) - 1 else None,
                    aggregate=(i == len(config.model.ingredient_feature_extractor.layers) - 1)
                ) for i, m in enumerate(config.model.ingredient_feature_extractor.layers)
            ]
            self.ingredient_feature_module = nn.ModuleList(blocks)
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536,
            'bert-base-uncased': 768,
        }
        if self.use_image_ingredients:
            self.image_feature_extractor = torch.nn.Linear(feature_size[config.image_model], config.model.image_feature_size)

        if self.use_recipe_text:
            self.text_feature_extractor = torch.nn.Linear(feature_size[config.text_model], config.model.text_feature_size)
        classifier_input_size = 0
        if self.use_image_ingredients or self.use_text_ingredients:
            classifier_input_size += config.model.ingredient_feature_extractor.final_ingredient_feature_size
        if self.use_recipe_text:
            classifier_input_size += config.model.text_feature_size
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(classifier_input_size, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, 300),
            torch.nn.ReLU(),
            torch.nn.Linear(300, config.model.final_classes)
        )

    def forward(self, embeddings, mask, image_ingredients, recipe_embeddings):
        if self.use_recipe_text:
            text_features = self.text_feature_extractor(recipe_embeddings)
        if self.use_image_ingredients:
            image_features = self.image_feature_extractor(image_ingredients)
        if self.use_image_ingredients or self.use_text_ingredients:
            if self.use_text_ingredients and self.use_image_ingredients:
                ingredient_features = torch.cat([embeddings, image_features], dim=2)
            elif self.use_text_ingredients:
                ingredient_features = embeddings
            else:
                ingredient_features = image_features
            out = ingredient_features
            for i, m in enumerate(self.layers):
                if m == "T":
                    out = self.ingredient_feature_module[i](out, ~mask)
                else:
                    raise Exception("Invalid module")
            aggregated_ingredient_features = out
            if self.use_recipe_text:
                recipe_features = torch.cat([text_features, aggregated_ingredient_features], dim=1)
            else:
                recipe_features = aggregated_ingredient_features
        else:
            recipe_features = text_features
        final_result = self.classifier(torch.nn.functional.relu(recipe_features))
        return final_result

    def freeze_features(self):
        self.feature_extractor.eval()

    def freeze_function(self):
        self.classifier.eval()

+ 203
- 0
train.py View File

@@ -0,0 +1,203 @@
from datetime import datetime
import os

experiment_name = 'Parham BS Project Region Prediction'
experiment_code = experiment_name.replace(' - ', '.').replace(' ', '_').lower()


import nltk


# nltk.download('wordnet')
# nltk.download('omw-1.4')
# nltk.download('punkt')


import json
import numpy as np

from torch.utils.data import Dataset, DataLoader
from torch.nn import Module
import torch
from tqdm import tqdm
from gensim.models import FastText
from utils.sam import SAM
from utils.bypass_bn import enable_running_stats, disable_running_stats
from einops import reduce
from utils.recipedb_dataset import RecipeDBDataset

import logging
import argparse
import mlflow
import mlflow.pytorch

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

from network import ImageTextTransformer
from utils.io import load_config, save_config

mlflow.set_experiment(experiment_name)

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str)


args = parser.parse_args()
config = load_config(args.config)

epochs = config.optim.epochs
batch_size = config.optim.batch_size
learning_rate = config.optim.max_lr
weight_decay = config.optim.weight_decay
embedding_size = config.data.embedding_size
num_classes = config.model.final_classes
sam_rho = config.optim.sam_rho
num_workers = config.optim.num_workers
data_path = config.data.dataset_path
target = config.data.target
target_dictionary = json.load(open(os.path.join(data_path, f'{target}.json'), 'r'))
if 'entropy' in config.optim:
    entropy_weight = config.optim.entropy
else:
    entropy_weight = 0
config.model.final_classes = len(target_dictionary)
epsilon = 1e-8
print(target)
print(target_dictionary)
output_dir = f'parham-models_image_text_transformer/{target}/{datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}'
if not os.path.isdir(output_dir):
    os.makedirs(output_dir, exist_ok=True)


class EmbedderFasttext():
    def __init__(self, path):
        self.model = FastText.load(path)
        print(f'FastText Embedding Loaded:\n\t Embedding Size = {self.model.wv.vector_size}\n\t Vocabulary Size = {self.model.wv.vectors.shape[0]}')

    def has(self, word):
        if word == "":
            return False
        return True

    def get(self, word):
        words = word.split('_')
        out = np.zeros(self.model.wv.vector_size)
        n = len(words)
        if n == 0:
            raise ValueError('Empty string was given.')
        for item in words:
            out += self.model.wv.get_vector(item) / n
        return list(out)


embedder = EmbedderFasttext(config.data.fasttext_path)

datasets = {
    "train": RecipeDBDataset(os.path.join(data_path, 'train.json'),
                             cousine_dict=target_dictionary,
                             extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'),
                             embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
                             mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model=config.image_model),
    "val": RecipeDBDataset(os.path.join(data_path, "val.json"),
                           cousine_dict=target_dictionary,
                           extract_ingredients=True, extract_recipes=True, extract_cousine=(target != 'category'),
                           embedder=embedder, target=target, occr_path=os.path.join(data_path, "ingredient_counts.json"),
                           mask_path=os.path.join(data_path, "ingredient_counts.json"), include_id=True, image_model=config.image_model)
}
print('Dataset constructed.')
print(len(datasets['train']), len(datasets['val']))
print(f'target: {target}')
print(f'number of classes: {len(target_dictionary)}')

device = config.optim.device

dataloaders = {
    "train": DataLoader(datasets["train"], batch_size=batch_size, collate_fn=datasets['train'].rdb_collate, shuffle=True, num_workers=num_workers),
    "val": DataLoader(datasets["val"], batch_size=batch_size, collate_fn=datasets['val'].rdb_collate, shuffle=False, num_workers=num_workers)
}
loss_fn = torch.nn.CrossEntropyLoss().to(device)
print('Dataloader constructed.')

model = ImageTextTransformer(config)
print(model)
model = model.to(device)
optimizer = SAM(model.parameters(), rho=sam_rho, base_optimizer=torch.optim.Adam, lr=learning_rate / 10, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.OneCycleLR(max_lr=learning_rate, epochs=epochs, steps_per_epoch=len(dataloaders["train"]), optimizer=optimizer.base_optimizer)


def stable_log_sigmoid(x):
    max_value = torch.maximum(x, torch.zeros(*x.shape, dtype=torch.float32, device=x.device))
    return -max_value - torch.log(torch.exp(-max_value) + torch.exp(x - max_value))


def argtopk(tensor, k, dim):
    indices = torch.argsort(tensor, dim=dim, descending=True)
    topk_indices = indices.narrow(dim, 0, k)
    return topk_indices


with mlflow.start_run():
    mlflow.log_params(dict(config))
    result = None
    best_val_acc = 0
    best_val_top3 = 0
    best_val_top5 = 0
    for epoch in range(epochs):
        for mode in ["train", "val"]:
            if mode == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            top_5_corrects = 0
            top_3_corrects = 0
            num_samples = 0
            for data_batch in tqdm(dataloaders[mode]):
                embeddings = data_batch['ingredients'].to(device)
                masks = data_batch['masks'].to(device)
                targets = data_batch['cousines'].to(device) if 'cousines' in data_batch else data_batch['targets'].to(device)
                image_ingredients = data_batch['image_ingredients'].to(device)
                recipe_embeddings = data_batch['recipe_embeddings'].to(device)
                with torch.set_grad_enabled(mode == 'train'):
                    enable_running_stats(model)
                    out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                    entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                    loss = loss_fn(out, targets) + entropy_weight * entropy
                    if mode == 'train':
                        # SAM: second forward/backward pass around the perturbed weights
                        loss.backward()
                        optimizer.first_step(zero_grad=True)
                        disable_running_stats(model)
                        out = model(embeddings, masks, image_ingredients, recipe_embeddings)
                        entropy = -torch.sum(torch.sigmoid(out) * stable_log_sigmoid(out)) / embeddings.shape[0]
                        (loss_fn(out, targets) + entropy_weight * entropy).backward()
                        optimizer.second_step(zero_grad=True)
                        scheduler.step()

                running_loss += loss.item() * embeddings.shape[0]
                running_corrects += (out.argmax(dim=1) == targets).sum().item()
                num_samples += embeddings.shape[0]
                top_5_corrects += (argtopk(out, k=5, dim=1) == targets.unsqueeze(1)).sum().item()
                top_3_corrects += (argtopk(out, k=3, dim=1) == targets.unsqueeze(1)).sum().item()
            print(f"epoch: {epoch}, loss: {running_loss/num_samples}, acc: {running_corrects/num_samples}, top3: {top_3_corrects/num_samples}, top5: {top_5_corrects/num_samples}")
            if mode == "val":
                best_val_acc = running_corrects / num_samples * 100 if running_corrects / num_samples * 100 > best_val_acc else best_val_acc
                best_val_top3 = top_3_corrects / num_samples * 100 if top_3_corrects / num_samples * 100 > best_val_top3 else best_val_top3
                best_val_top5 = top_5_corrects / num_samples * 100 if top_5_corrects / num_samples * 100 > best_val_top5 else best_val_top5
            metrics = {
                '{}_loss'.format(mode): running_loss / num_samples,
                '{}_acc'.format(mode): running_corrects / num_samples * 100,
                '{}_acc3'.format(mode): top_3_corrects / num_samples * 100,
                '{}_acc5'.format(mode): top_5_corrects / num_samples * 100
            }
            if mode == 'val':
                metrics["best_val_acc"] = best_val_acc
                metrics["best_val_acc3"] = best_val_top3
                metrics["best_val_acc5"] = best_val_top5
                result = running_corrects / num_samples * 100
            mlflow.log_metrics(metrics)

    os.makedirs(output_dir, exist_ok=True)
    mlflow.pytorch.log_model(model, 'model')
    config.result = result
    torch.save(model.state_dict(), os.path.join(output_dir, "checkpoint.pth"))
    save_config(config, os.path.join(output_dir, "config.yml"))

+ 0
- 0
utils/__init__.py


+ 17
- 0
utils/bypass_bn.py

@@ -0,0 +1,17 @@
import torch
import torch.nn as nn

def disable_running_stats(model):
    def _disable(module):
        if isinstance(module, nn.BatchNorm2d):
            module.backup_momentum = module.momentum
            module.momentum = 0

    model.apply(_disable)

def enable_running_stats(model):
    def _enable(module):
        if isinstance(module, nn.BatchNorm2d) and hasattr(module, "backup_momentum"):
            module.momentum = module.backup_momentum

    model.apply(_enable)

+ 27
- 0
utils/fasttext_embedding.py

@@ -0,0 +1,27 @@
from gensim.corpora.dictionary import Dictionary
import logging
from pyemd import emd
from nltk.corpus import stopwords
import fasttext
import json
import numpy as np

logger = logging.getLogger(__name__)

class FasttextEmbedding:
    def __init__(self, model_path):
        if model_path.endswith('.bin'):
            self.model = fasttext.load_model(model_path)
            self.full = True
        else:
            self.model = np.load(model_path)
            self.full = False
        self.stopwords = stopwords.words('english')

    def __getitem__(self, idx):
        if self.full:
            return self.model.get_word_vector(idx)
        else:
            if idx not in self.model:
                raise ValueError('Word not available.')
            return self.model[idx]

+ 12
- 0
utils/io.py

@@ -0,0 +1,12 @@
import yaml
from easydict import EasyDict as edict
import json

def load_config(path):
    with open(path, 'r', encoding='utf8') as f:
        return edict(yaml.safe_load(f))

def save_config(config, path):
    x = json.loads(json.dumps(config))
    with open(path, 'w', encoding='utf8') as f:
        yaml.dump(x, f, default_flow_style=False, allow_unicode=True)

+ 185
- 0
utils/recipedb_dataset.py

@@ -0,0 +1,185 @@
from typing import Any
import torch
from torch.utils.data import Dataset
import json
import numpy as np
from torch.nn.utils.rnn import pad_sequence
import warnings
import os

warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
warnings.filterwarnings(action='ignore', category=FutureWarning, module='gensim')

def mask_count(num):
    return num // 5

def generate_ing_dict(path, threshold):
    assert path != None
    with open(path, "r") as json_file:
        full_ing_count_list: dict = json.load(json_file)
    filtered_ing_list = {}
    counter = 0
    for ing, count in full_ing_count_list.items():
        if count > threshold:
            filtered_ing_list[ing] = counter
            counter += 1
    return filtered_ing_list

def get_ingredient_frequencies(occr_path):
    occr = None
    with open(occr_path, "r") as json_file:
        occr = json.load(json_file)
    if '' in occr:
        del occr['']
    return occr

class RecipeDBDataset(Dataset):
    def __init__(self, json_path, cousine_dict=None,
                 extract_ingredients=False, extract_recipes=False, extract_cousine=False,
                 embedder=None, include_id=False, mask_threshold=1000, mask_path=None,
                 occr_path=None, target='country',
                 image_model="resnet18") -> None:
        super(RecipeDBDataset, self).__init__()

        with open(json_path, "r") as json_file:
            data = json.load(json_file)

        if occr_path is not None:
            self.freqs = get_ingredient_frequencies(occr_path)
            self.all_ingredients, self.all_ingredient_probs = zip(*sorted(self.freqs.items()))
            self.all_ingredients = list(self.all_ingredients)
            self.all_ingredient_probs = np.array(self.all_ingredient_probs, dtype=np.float32)
            self.all_ingredient_probs /= np.sum(self.all_ingredient_probs)
        self.ing_dict: dict = generate_ing_dict(mask_path, mask_threshold)
        self.len_mask_ing = len(self.ing_dict)
        self.data = []
        self.embedder = embedder
        self.extract_ingredients = extract_ingredients
        self.extract_recipes = extract_recipes
        self.extract_cousine = extract_cousine
        self.ingredient_set = set()
        self.image_path = "Data/image_dict_ings.json"
        with open(self.image_path, 'r') as jf:
            self.image_ing_dict = json.load(jf)
        self.image_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/image-features-full"
        feature_size = {
            'resnet18': 512,
            'resnet50': 2048,
            'resnet101': 2048,
            'efficientnet_b0': 1280,
            'efficientnet_b3': 1536,
            'efficientnet_t0': 1280
        }
        self.image_model = image_model
        self.image_feature_size = feature_size[self.image_model]
        self.not_found_ings = set()
        self.text_feature_path = "/home/dml/food/CuisineAdaptation/IngredientsEncoding/text-features"
        self.text_feature_model = "bert-base-uncased"
        failed_ing_count = 0
        for recipe in data:
            temp_data = {}
            if extract_ingredients:
                temp_data["ingredients"] = []
                for ing in recipe["ingredients"]:
                    if ing["Ingredient Name"] != "":
                        temp_data["ingredients"].append(ing["Ingredient Name"])
                if len(temp_data["ingredients"]) == 0:
                    failed_ing_count += 1
                    continue
            if extract_cousine:
                temp_data["cousine"] = cousine_dict[recipe[target]]
            if include_id:
                temp_data["id"] = recipe["id"]
            self.data.append(temp_data)
        self.cousine_dict = cousine_dict

        print(f"failed ings count: {failed_ing_count}")

    def __getitem__(self, index: Any):
        d = self.data[index]
        out = {}
        ings = []
        if self.extract_ingredients:
            for ing in d["ingredients"]:
                if self.embedder.has(ing):
                    ings.append(self.embedder.get(ing))
            ings = torch.tensor(ings, dtype=torch.float32)
            image_ingredients = []
            for ing in d["ingredients"]:
                npy_path = ""
                if ing in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing}.npy")
                elif ing.replace(" ", "_") in self.image_ing_dict:
                    npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing.replace(' ', '_')}.npy")
                else:
                    for ing_part in ing.split():
                        if ing_part in self.image_ing_dict:
                            npy_path = os.path.join(self.image_feature_path, self.image_model, f"{ing_part}.npy")
                            break
                    else:
                        self.not_found_ings.add(ing)
                if npy_path == "":
                    image_ingredients.append(np.zeros(self.image_feature_size))
                else:
                    image_ingredients.append(np.load(npy_path))
            image_ingredients = torch.tensor(image_ingredients, dtype=torch.float32)
            out["ingredients"] = ings
            out["image_ingredients"] = image_ingredients

        if self.extract_recipes:
            out["recipe_embedding"] = torch.tensor(np.load(os.path.join(self.text_feature_path, self.text_feature_model, f'{d["id"]}.npy')), dtype=torch.float32)
        if self.extract_cousine:
            out["cousine"] = d["cousine"]
        return out

    def __len__(self):
        return self.data.__len__()

    def rdb_collate(self, batch):
        cousines = []
        ingredients = []
        masks = []
        image_ingredients = []
        recipe_embeddings = []
        for data in batch:
            if "cousine" in data:
                cousines.append(data["cousine"])
            if "recipe_embedding" in data:
                recipe_embeddings.append(data["recipe_embedding"])
            if "ingredients" in data:
                ingredients.append(data["ingredients"])
                masks.append(torch.ones(data["ingredients"].shape[0]))
                image_ingredients.append(data["image_ingredients"])

        outs = {}
        if "ingredients" in data:
            masks = pad_sequence(masks, batch_first=True, padding_value=0).type(torch.bool)
            ingredients = pad_sequence(ingredients, batch_first=True, padding_value=0)
            image_ingredients = pad_sequence(image_ingredients, batch_first=True, padding_value=0)
            outs["masks"] = masks
            outs["ingredients"] = ingredients
            outs["image_ingredients"] = image_ingredients
        if "recipe_embedding" in data:
            outs["recipe_embeddings"] = torch.cat(recipe_embeddings, dim=0)
        if "cousine" in data:
            cousines = torch.LongTensor(cousines)
            outs["cousines"] = cousines
        return outs

def dict_to_device(data: dict, device, return_new_dict=False):
    new_dict = {}
    for k, v in data.items():
        if not return_new_dict:
            data[k] = v.to(device)
        else:
            new_dict[k] = v.to(device)
    return new_dict if return_new_dict else data

+ 62
- 0
utils/sam.py

@@ -0,0 +1,62 @@
import torch


class SAM(torch.optim.Optimizer):
    def __init__(self, params, base_optimizer, rho=0.05, adaptive=False, **kwargs):
        assert rho >= 0.0, f"Invalid rho, should be non-negative: {rho}"

        defaults = dict(rho=rho, adaptive=adaptive, **kwargs)
        super(SAM, self).__init__(params, defaults)

        self.base_optimizer = base_optimizer(self.param_groups, **kwargs)
        self.param_groups = self.base_optimizer.param_groups

    @torch.no_grad()
    def first_step(self, zero_grad=False):
        grad_norm = self._grad_norm()
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)

            for p in group["params"]:
                if p.grad is None: continue
                self.state[p]["old_p"] = p.data.clone()
                e_w = (torch.pow(p, 2) if group["adaptive"] else 1.0) * p.grad * scale.to(p)
                p.add_(e_w)  # climb to the local maximum "w + e(w)"

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def second_step(self, zero_grad=False):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None: continue
                p.data = self.state[p]["old_p"]  # get back to "w" from "w + e(w)"

        self.base_optimizer.step()  # do the actual "sharpness-aware" update

        if zero_grad: self.zero_grad()

    @torch.no_grad()
    def step(self, closure=None):
        assert closure is not None, "Sharpness Aware Minimization requires closure, but it was not provided"
        closure = torch.enable_grad()(closure)  # the closure should do a full forward-backward pass

        self.first_step(zero_grad=True)
        closure()
        self.second_step()

    def _grad_norm(self):
        shared_device = self.param_groups[0]["params"][0].device  # put everything on the same device, in case of model parallelism
        norm = torch.norm(
            torch.stack([
                ((torch.abs(p) if group["adaptive"] else 1.0) * p.grad).norm(p=2).to(shared_device)
                for group in self.param_groups for p in group["params"]
                if p.grad is not None
            ]),
            p=2
        )
        return norm

    def load_state_dict(self, state_dict):
        super().load_state_dict(state_dict)
        self.base_optimizer.param_groups = self.param_groups
