Video Action Recognition Using Transfer Learning and Attention Mechanisms

This project focuses on video action recognition using deep learning techniques, leveraging transfer learning from language models and attention mechanisms.

Getting Started

1. Dataset Preparation

1.1. Download the Kinetics dataset:

  • Use save_kinetics_dataset.ipynb to download the dataset.
  • Alternatively, you can use download_k400.ipynb.

1.2. Save the dataset:

  • Store the downloaded dataset in your Google Drive for easy access.
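
If you work in Google Colab, the flow inside save_kinetics_dataset.ipynb amounts to the sketch below; the paths are placeholders, so adjust them to your own Drive layout.

```python
# Minimal sketch: mount Google Drive and copy the downloaded dataset into it.
# The actual download logic lives in save_kinetics_dataset.ipynb / download_k400.ipynb;
# both paths below are placeholders.
import shutil
from google.colab import drive

drive.mount('/content/drive')

local_dir = '/content/kinetics400'                 # where the notebook downloaded the clips
drive_dir = '/content/drive/MyDrive/kinetics400'   # destination folder in your Drive

shutil.copytree(local_dir, drive_dir, dirs_exist_ok=True)
```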

2. Label Preprocessing

2.1. Update Kinetics labels:

  • Run preprocess_kinetics_labels.ipynb.
  • This notebook uses GPT-4 to generate a detailed description for each action label.
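
Conceptually, the label-expansion step looks like the sketch below; the client setup, model name, and prompt are illustrative, and the notebook is the source of truth.

```python
# Minimal sketch: expand each Kinetics action label into a richer description with GPT-4.
# Assumes the `openai` package (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def describe_action(label: str) -> str:
    """Ask GPT-4 for a one-sentence visual description of an action label."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; use whichever GPT-4 variant you have access to
        messages=[{
            "role": "user",
            "content": f"Describe the action '{label}' in one visually detailed sentence.",
        }],
    )
    return response.choices[0].message.content.strip()

# e.g. describe_action("playing violin")
```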

3. Model Training

3.1. Post-pretraining of VideoMAE:

  • Execute postpretrain_VideoMAE_to_CLIP_Space.ipynb.
  • This notebook trains a transformer layer to map VideoMAE embeddings to CLIP space.
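
At a high level, the mapping trained in that notebook can be pictured as follows; the layer sizes (768 for VideoMAE-Base, 512 for CLIP ViT-B/32) and the cosine loss are assumptions for illustration.

```python
# Minimal sketch: map VideoMAE token embeddings into CLIP's embedding space.
# Dimensions and loss are assumptions; see postpretrain_VideoMAE_to_CLIP_Space.ipynb.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoMAEToCLIP(nn.Module):
    def __init__(self, videomae_dim=768, clip_dim=512, nhead=8):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=videomae_dim, nhead=nhead, batch_first=True
        )
        self.proj = nn.Linear(videomae_dim, clip_dim)

    def forward(self, videomae_tokens):              # (B, num_tokens, videomae_dim)
        x = self.encoder(videomae_tokens)
        x = x.mean(dim=1)                            # pool tokens into one video embedding
        return F.normalize(self.proj(x), dim=-1)     # unit-norm, like CLIP features

def cosine_loss(video_emb, clip_text_emb):
    # Pull each video embedding toward the CLIP text embedding of its expanded label.
    return 1 - F.cosine_similarity(video_emb, clip_text_emb, dim=-1).mean()
```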

4. Testing

4.1. Prepare the test dataset:

  • Download the UCF101 dataset.
  • Update the UCF101 labels using GPT-4, similar to the Kinetics label preprocessing step.

4.2. Run the test:

  • Use test.ipynb to evaluate the model’s performance.
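
Evaluation amounts to zero-shot classification in CLIP space; a simplified version of the loop in test.ipynb might look like this (all names are illustrative).

```python
# Minimal sketch: zero-shot action recognition by comparing a mapped video embedding
# against unit-norm CLIP text embeddings of the UCF101 label descriptions.
# `mapper`, `video_tokens`, `text_features`, and `class_names` are assumed inputs.
import torch

@torch.no_grad()
def predict_action(mapper, video_tokens, text_features, class_names):
    """video_tokens: (1, num_tokens, 768) VideoMAE features for one clip.
    text_features: (num_classes, 512) CLIP embeddings of the label descriptions."""
    video_emb = mapper(video_tokens)            # (1, 512), unit-norm
    similarity = video_emb @ text_features.T    # (1, num_classes)
    return class_names[similarity.argmax(dim=-1).item()]
```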

Prerequisites

  • Python 3.x
  • Jupyter Notebook
  • PyTorch
  • Transformers library
  • CLIP model
  • VideoMAE model
  • Access to GPT-4 API for label preprocessing
  • Google Drive (for storing datasets)

Usage

  1. Follow the steps in the “Getting Started” section to prepare your data and train the model.
  2. Ensure all datasets are properly saved in your Google Drive.
  3. Run the notebooks in the order specified above.
  4. For testing, make sure you have the UCF101 dataset prepared and labels updated before running test.ipynb.

The model processes multiple frames sampled from each video and produces rich representations in CLIP space.
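
Frame sampling is currently uniform (see uniform-sampler-video-embedder.py); a minimal sketch of the idea, with the function name chosen here for illustration, is shown below. The adaptive frame selection unit listed under Future Work would replace this step.

```python
# Minimal sketch: uniformly sample a fixed number of frame indices from a clip.
import numpy as np

def uniform_frame_indices(num_frames_in_video: int, num_samples: int = 16) -> np.ndarray:
    """Return `num_samples` indices evenly spread across the video."""
    return np.linspace(0, num_frames_in_video - 1, num_samples).round().astype(int)

# e.g. uniform_frame_indices(300, 16) picks 16 evenly spaced frames from a 300-frame clip.
```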

Future Work

  • Implement an adaptive frame selection unit
  • Extend to more diverse datasets
  • Integrate multimodal inputs (e.g., audio)
  • Fine-tune hyperparameters