FusionDetect: Fake Image Detection with DINOv2 + CLIP

A hybrid deep learning model for fake image detection that combines DINOv2 and CLIP features with optional robustness enhancements.

Features

  • Dual Backbone Architecture: Leverages both DINOv2 and CLIP vision transformers
  • Robustness Enhancements: JPEG compression and Gaussian blur augmentations
  • Flexible Training: Multiple classifier head configurations and fine-tuning options
  • Multi-GPU Support: Parallel training across multiple GPUs
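The dual-backbone idea is to concatenate the DINOv2 and CLIP image embeddings and feed them to a small MLP head. A minimal sketch, with the encoders stubbed out and illustrative dimensions (768-d features from both backbones, hidden width 512 — the real `model.py` loads the actual backbones and may differ):

```python
import torch
import torch.nn as nn

class FusionDetect(nn.Module):
    """Sketch of the hybrid head: concatenate backbone features, classify.

    dino_dim/clip_dim/hidden are illustrative; num_layers mirrors the
    --num_layers argument and counts the final logit layer."""

    def __init__(self, dino_dim=768, clip_dim=768, num_layers=4, hidden=512):
        super().__init__()
        layers, in_dim = [], dino_dim + clip_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))  # single real/fake logit
        self.head = nn.Sequential(*layers)

    def forward(self, dino_feat, clip_feat):
        # Feature-level fusion: simple concatenation along the channel axis
        fused = torch.cat([dino_feat, clip_feat], dim=-1)
        return self.head(fused)

model = FusionDetect()
logit = model(torch.randn(2, 768), torch.randn(2, 768))
print(logit.shape)  # torch.Size([2, 1])
```

In practice both backbones stay frozen unless --finetune_clip / --finetune_dino are passed, so only the head above is trained.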

Project Structure

├── dataset.py          # Custom dataset class with data augmentation
├── mixstyle.py         # MixStyle module for domain generalization
├── model.py            # Hybrid model architecture (DINOv2 + CLIP)
├── train_concat.py     # Main training and evaluation script
├── train_concat_ddp.py # Multi-GPU (DDP) training script
├── tsne.py             # t-SNE feature visualization
├── validate.py         # Standalone validation script
└── README.md

Installation

pip install torch torchvision pillow open_clip_torch

Usage

Training

python train_concat.py \
  --train_fake_dir /path/to/train/fake/images \
  --train_real_dir /path/to/train/real/images \
  --test_fake_dir /path/to/test/fake/images \
  --test_real_dir /path/to/test/real/images \
  --save_model_path /path/to/save/models \
  --clip_variant ViT-L-14 \
  --dino_variant dinov2_vitb14 \
  --num_layers 4 \
  --batch_size 256 \
  --epochs 10 \
  --gpu 0

Evaluation

python train_concat.py \
  --train_fake_dir /path/to/train/fake/images \
  --train_real_dir /path/to/train/real/images \
  --test_fake_dir /path/to/test/fake/images \
  --test_real_dir /path/to/test/real/images \
  --model_path /path/to/saved/model.pth \
  --dino_variant dinov2_vitl14 \
  --clip_variant ViT-L-14 \
  --num_layers 4 \
  --gpu 0 \
  --eval

Robustness Training (with augmentations)

python train_concat.py \
  ... # same as training command
  --aug_prob 0.3  # 30% probability to apply JPEG/blur during training
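The augmentation step is a random JPEG re-compression or Gaussian blur applied to each training image with probability --aug_prob. A minimal sketch with PIL; the quality factors and sigma range here are illustrative, not necessarily those used in `dataset.py`:

```python
import io
import random
from PIL import Image, ImageFilter

def robustness_aug(img: Image.Image, aug_prob: float = 0.3) -> Image.Image:
    """With probability aug_prob, apply either a JPEG round-trip or a
    Gaussian blur (50/50); otherwise return the image unchanged."""
    if random.random() >= aug_prob:
        return img
    if random.random() < 0.5:
        # JPEG compression: encode to an in-memory buffer and decode again
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.choice([95, 75, 50]))
        buf.seek(0)
        return Image.open(buf).convert("RGB")
    # Gaussian blur with a random sigma
    return img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))

img = Image.new("RGB", (224, 224), color=(128, 64, 32))
out = robustness_aug(img, aug_prob=1.0)  # force an augmentation
print(out.size)  # (224, 224)
```

The same two corruptions are what --jpeg and --blur apply deterministically at evaluation time.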

Robustness Evaluation

python train_concat.py \
  ... # same as evaluation command
  --jpeg 95 --blur 2  # Apply JPEG QF=95 and blur sigma=2 during testing

Key Arguments

  • --clip_variant: CLIP model variant (ViT-L-14, ViT-H-14-quickgelu)
  • --dino_variant: DINOv2 model variant (dinov2_vits14, dinov2_vitb14, dinov2_vitl14)
  • --num_layers: Number of layers in classifier head (1-5)
  • --aug_prob: Probability for JPEG/blur augmentations during training
  • --jpeg: JPEG quality factors for evaluation (e.g., 95 75 50)
  • --blur: Gaussian blur sigma values for evaluation (e.g., 1 2 3)
  • --featup: Use FeatUp feature upsampling
  • --mixstyle: Apply MixStyle for domain generalization
  • --finetune_clip: Fine-tune CLIP model during training
  • --finetune_dino: Fine-tune DINOv2 model during training
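For reference, MixStyle (Zhou et al., 2021) regularizes by mixing per-instance feature statistics across a shuffled batch. A minimal sketch of the original formulation for CNN-style (B, C, H, W) feature maps — how `mixstyle.py` adapts it to ViT token features is not shown here:

```python
import torch

def mixstyle(x: torch.Tensor, alpha: float = 0.1, eps: float = 1e-6) -> torch.Tensor:
    """Mix channel-wise mean/std of each instance with those of a randomly
    permuted batch partner; intended for training only. The original
    implementation also detaches the statistics from the graph."""
    B = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)                     # per-instance mean
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()      # per-instance std
    x_norm = (x - mu) / sig
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    perm = torch.randperm(B)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                          # re-style features

feat = torch.randn(4, 8, 14, 14)
print(mixstyle(feat).shape)  # torch.Size([4, 8, 14, 14])
```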

Dataset Structure

Organize your dataset as follows:

dataset/
├── train/
│   ├── 0_real/
│   └── 1_fake/
└── test/
    ├── 0_real/
    └── 1_fake/
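The folder names double as labels (0 = real, 1 = fake). A stdlib-only sketch of how such a split can be scanned into (path, label) pairs — `list_samples` is a hypothetical helper, not the actual API of `dataset.py`:

```python
import tempfile
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def list_samples(split_dir):
    """Collect (path, label) pairs from a split laid out as 0_real/ and 1_fake/."""
    samples = []
    for cls_dir, label in (("0_real", 0), ("1_fake", 1)):
        for p in sorted(Path(split_dir, cls_dir).glob("*")):
            if p.suffix.lower() in IMG_EXTS:
                samples.append((p, label))
    return samples

# Demo on a throwaway directory mimicking the layout above
root = Path(tempfile.mkdtemp())
for d in ("0_real", "1_fake"):
    (root / d).mkdir()
(root / "0_real" / "a.jpg").touch()
(root / "1_fake" / "b.png").touch()
print(list_samples(root))  # two (path, label) pairs, labels 0 and 1
```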

Output

  • Trained models are saved to the directory given by --save_model_path
  • The training arguments are logged to args.txt in that directory
  • Checkpoint filenames encode the epoch number, accuracy, and average precision (e.g., ep20_acc_0.7718_ap_0.7451.pth)
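The naming scheme, reconstructed from the checkpoint path in the evaluation example below (a sketch of the convention, not code copied from the repo):

```python
def checkpoint_name(epoch: int, acc: float, ap: float) -> str:
    """Build a checkpoint filename like ep20_acc_0.7718_ap_0.7451.pth."""
    return f"ep{epoch}_acc_{acc:.4f}_ap_{ap:.4f}.pth"

print(checkpoint_name(20, 0.7718, 0.7451))  # ep20_acc_0.7718_ap_0.7451.pth
```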

Example Commands

Full Training Example

python train_concat.py \
  --train_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/1_fake/ \
  --train_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/0_real/ \
  --test_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/Chameleon-train-test/test/1_fake/ \
  --test_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/Chameleon-train-test/test/0_real/ \
  --save_model_path /media/external_16TB_1/amirtaha_amanzadi/dino/ablation/clip-l14_dino-b14 \
  --clip_variant ViT-L-14 \
  --dino_variant dinov2_vitb14 \
  --num_layers 4 \
  --batch_size 256 \
  --epochs 10 \
  --gpu 0

Full Evaluation Example

python train_concat.py \
  --train_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/1_fake/ \
  --train_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/all_3_cham_sd14/0_real/ \
  --test_fake_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/Gen-Img-Cham_test/1_fake/ \
  --test_real_dir /media/external_16TB_1/amirtaha_amanzadi/datasets/sample_3_datasets/Gen-Img-Cham_test/0_real/ \
  --model_path /media/external_16TB_1/amirtaha_amanzadi/dino/saved_models/robustness/aug_prob_30/ep20_acc_0.7718_ap_0.7451.pth \
  --dino_variant dinov2_vitl14 \
  --clip_variant ViT-L-14 \
  --num_layers 4 \
  --gpu 0 \
  --eval

Notes

  • For robustness testing, use --jpeg and --blur arguments during evaluation
  • For robust training, use --aug_prob to enable random augmentations
  • Multiple GPUs can be specified using --gpu 0,1,2