A hybrid deep learning model for fake image detection that combines DINOv2 and CLIP features, with optional JPEG/blur augmentation, FeatUp feature upsampling, and MixStyle for domain generalization.
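To make the feature combination concrete, the sketch below computes the size of the fused feature vector for a given backbone pair. It assumes, without confirmation from this README, that the model concatenates CLIP's projected image embedding with DINOv2's CLS-token embedding. The dimensions are the published ones for these checkpoints; `fused_dim` and the two lookup tables are hypothetical names, not part of `model.py`.

```python
# Published output dimensions of the backbones named in this README.
# Assumption: CLIP contributes its projected image embedding and DINOv2
# its CLS-token embedding; model.py may extract features differently.
CLIP_EMBED_DIM = {"ViT-L-14": 768, "ViT-H-14-quickgelu": 1024}
DINO_EMBED_DIM = {
    "dinov2_vits14": 384,
    "dinov2_vitb14": 768,
    "dinov2_vitl14": 1024,
}


def fused_dim(clip_variant: str, dino_variant: str) -> int:
    """Size of the concatenated [CLIP | DINOv2] feature vector (hypothetical helper)."""
    return CLIP_EMBED_DIM[clip_variant] + DINO_EMBED_DIM[dino_variant]


if __name__ == "__main__":
    # Input width the classifier head would see for one backbone pair.
    print(fused_dim("ViT-L-14", "dinov2_vitb14"))
```

Under this assumption, changing either backbone variant changes the input width of the classifier head, which is one reason a checkpoint would need to be evaluated with the same variants it was trained with.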
├── dataset.py # Custom dataset class with data augmentation
├── model.py # Hybrid model architecture (DINOv2 + CLIP)
├── train_concat.py # Main training and evaluation script
└── README.md
pip install torch torchvision pillow open_clip_torch
python train_concat.py \
--train_fake_dir /path/to/train/fake/images \
--train_real_dir /path/to/train/real/images \
--test_fake_dir /path/to/test/fake/images \
--test_real_dir /path/to/test/real/images \
--save_model_path /path/to/save/models \
--clip_variant ViT-L-14 \
--dino_variant dinov2_vitb14 \
--num_layers 4 \
--batch_size 256 \
--epochs 10 \
--gpu 0
python train_concat.py \
--train_fake_dir /path/to/train/fake/images \
--train_real_dir /path/to/train/real/images \
--test_fake_dir /path/to/test/fake/images \
--test_real_dir /path/to/test/real/images \
--model_path /path/to/saved/model.pth \
--dino_variant dinov2_vitl14 \
--clip_variant ViT-L-14 \
--num_layers 4 \
--gpu 0 \
--eval
Add `--aug_prob` to the training command to apply JPEG/blur with 30% probability during training (`...` stands for the unchanged arguments):
python train_concat.py \
... \
--aug_prob 0.3
Add `--jpeg` and `--blur` to the evaluation command to apply JPEG QF=95 and blur sigma=2 during testing:
python train_concat.py \
... \
--jpeg 95 --blur 2
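The flags used in the commands above, together with the options listed next, could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `train_concat.py`: the flag names come from this README, but every type, default, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; defaults are assumptions."""
    p = argparse.ArgumentParser(description="Hybrid DINOv2 + CLIP fake image detector")
    for name in ("train_fake_dir", "train_real_dir", "test_fake_dir", "test_real_dir"):
        p.add_argument(f"--{name}", type=str)
    p.add_argument("--save_model_path", type=str)  # where trained models are written
    p.add_argument("--model_path", type=str)       # checkpoint to load for --eval
    p.add_argument("--clip_variant", default="ViT-L-14")
    p.add_argument("--dino_variant", default="dinov2_vitb14")
    p.add_argument("--num_layers", type=int, default=4, choices=range(1, 6))
    p.add_argument("--batch_size", type=int, default=256)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--gpu", type=str, default="0")  # string, so "0,1,2" also parses
    p.add_argument("--eval", action="store_true")
    p.add_argument("--aug_prob", type=float, default=0.0)
    p.add_argument("--jpeg", type=int, nargs="*", default=[])    # e.g. 95 75 50
    p.add_argument("--blur", type=float, nargs="*", default=[])  # e.g. 1 2 3
    for flag in ("featup", "mixstyle", "finetune_clip", "finetune_dino"):
        p.add_argument(f"--{flag}", action="store_true")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["--eval", "--jpeg", "95", "75", "--blur", "2"])
    print(args.eval, args.jpeg, args.blur)
```

Declaring `--jpeg` and `--blur` with `nargs="*"` is one way to accept several values per flag, matching the multi-value examples in the option list.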
- `--clip_variant`: CLIP model variant (`ViT-L-14`, `ViT-H-14-quickgelu`)
- `--dino_variant`: DINOv2 model variant (`dinov2_vits14`, `dinov2_vitb14`, `dinov2_vitl14`)
- `--num_layers`: Number of layers in the classifier head (1-5)
- `--aug_prob`: Probability of applying JPEG/blur augmentations during training
- `--jpeg`: JPEG quality factors for evaluation (e.g., `95 75 50`)
- `--blur`: Gaussian blur sigma values for evaluation (e.g., `1 2 3`)
- `--featup`: Use FeatUp feature upsampling
- `--mixstyle`: Apply MixStyle for domain generalization
- `--finetune_clip`: Fine-tune the CLIP model during training
- `--finetune_dino`: Fine-tune the DINOv2 model during training

Organize your dataset as follows:
dataset/
├── train/
│ ├── 0_real/
│ └── 1_fake/
└── test/
├── 0_real/
└── 1_fake/
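A layout like the one above can be turned into labeled samples with a few lines of Python. The sketch below is illustrative only and is not the loader in `dataset.py`: `list_samples`, the extension set, and the label convention (0 for `0_real`, 1 for `1_fake`, inferred from the folder prefixes) are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_samples(split_dir):
    """Return (path, label) pairs for one split: 0 for 0_real, 1 for 1_fake."""
    samples = []
    for folder, label in (("0_real", 0), ("1_fake", 1)):
        # Walk the class folder recursively, in sorted order for reproducibility.
        for path in sorted((Path(split_dir) / folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, label))
    return samples


if __name__ == "__main__":
    for path, label in list_samples("dataset/train")[:5]:
        print(label, path)
```

Since the commands in this README take the real and fake folders as separate arguments, the same folders would be passed directly as `--train_real_dir dataset/train/0_real` and `--train_fake_dir dataset/train/1_fake`.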
- Model checkpoints: the `--save_model_path` directory
- `args.txt`

python train_concat.py \
--train_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/1_fake/ \
--train_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/0_real/ \
--test_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/Chameleon-train-test/test/1_fake/ \
--test_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/Chameleon-train-test/test/0_real/ \
--save_model_path /media/external_16TB_1/amirtaha_amanzadi/dino/ablation/clip-l14_dino-b14 \
--clip_variant ViT-L-14 \
--dino_variant dinov2_vitb14 \
--num_layers 4 \
--batch_size 256 \
--epochs 10 \
--gpu 0
python train_concat.py \
--train_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/1_fake/ \
--train_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/0_real/ \
--test_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/Gen-Img-Cham_test/1_fake/ \
--test_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/Gen-Img-Cham_test/0_real/ \
--model_path /media/external_16TB_1/amirtaha_amanzadi/dino/saved_models/robustness/aug_prob_30/ep20_acc_0.7718_ap_0.7451.pth \
--dino_variant dinov2_vitl14 \
--clip_variant ViT-L-14 \
--num_layers 4 \
--gpu 0 \
--eval
- Use the `--jpeg` and `--blur` arguments during evaluation
- Use `--aug_prob` to enable random augmentations
- `--gpu 0,1,2`
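The probability gate behind an option like `--aug_prob` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the logic in `dataset.py`: `maybe_augment` is a hypothetical helper, the augmentations are passed in as arbitrary callables, and picking exactly one of them per image is an assumption (the actual script may apply JPEG and blur together).

```python
import random


def maybe_augment(image, aug_prob, augmentations, rng=random):
    """With probability aug_prob, apply one randomly chosen augmentation.

    Otherwise the image is returned unchanged. `augmentations` is a list of
    callables (for example a JPEG re-encoder and a Gaussian blur).
    """
    if augmentations and rng.random() < aug_prob:
        return rng.choice(augmentations)(image)
    return image


if __name__ == "__main__":
    # Stand-in "augmentations" that tag a string instead of editing pixels.
    ops = [lambda x: x + "+jpeg", lambda x: x + "+blur"]
    rng = random.Random(0)
    out = [maybe_augment("img", 0.3, ops, rng) for _ in range(10)]
    print(out)
```

With `aug_prob=0.3`, roughly 30% of training images would pass through a degradation while the rest are left untouched.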