Video Action Recognition Using Transfer Learning and Attention Mechanisms

This project tackles video action recognition with deep learning, combining transfer learning from pretrained video and vision-language models (VideoMAE and CLIP), GPT-4-generated label descriptions, and attention mechanisms.

Getting Started

1. Dataset Preparation

1.1. Download the Kinetics dataset:

  • Use save_kinetics_dataset.ipynb to download the dataset.
  • Alternatively, you can use download_k400.ipynb.

1.2. Save the dataset:

  • Store the downloaded dataset in your Google Drive for easy access.
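
If you run the download notebooks in Google Colab, copying the result into Drive can look like the sketch below. The local path /content/kinetics400 and the Drive folder name are assumptions for illustration, not paths required by the notebooks.

```python
# Minimal sketch (assumes Google Colab and that a download notebook has
# already written the clips to /content/kinetics400).
from google.colab import drive
import shutil

drive.mount('/content/drive')                      # authorise access to your Drive

src = '/content/kinetics400'                       # assumed local download directory
dst = '/content/drive/MyDrive/kinetics400'         # assumed destination inside Drive

shutil.copytree(src, dst, dirs_exist_ok=True)      # copy the downloaded clips into Drive
```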

2. Label Preprocessing

2.1. Update Kinetics labels:

  • Run preprocess_kinetics_labels.ipynb.
  • This notebook uses GPT-4 to generate a detailed description for each video action label (see the sketch below).
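
As an illustration of the label-expansion idea, the sketch below asks GPT-4 for a one-sentence, visually grounded description of each label. It assumes the openai>=1.0 Python client and an OPENAI_API_KEY environment variable; the exact prompt and client usage in preprocess_kinetics_labels.ipynb may differ.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_action(label: str) -> str:
    """Ask GPT-4 for a short, visually grounded description of an action label."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Describe the human action '{label}' in one sentence, "
                       f"focusing on what is visible in a video of it.",
        }],
    )
    return response.choices[0].message.content.strip()

labels = ["riding a bike", "playing violin"]           # example Kinetics labels
descriptions = {lab: describe_action(lab) for lab in labels}
```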

3. Model Training

3.1. Post-pretraining of VideoMAE:

  • Execute postpretrain_VideoMAE_to_CLIP_Space.ipynb.
  • This notebook trains a transformer layer to map VideoMAE embeddings to CLIP space.
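
The idea of the mapping head can be sketched as follows, assuming VideoMAE token features of size 768 and a CLIP embedding size of 512; the notebook's actual architecture and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoMAEToCLIP(nn.Module):
    def __init__(self, videomae_dim=768, clip_dim=512, n_heads=8):
        super().__init__()
        # One transformer encoder layer attends over the VideoMAE token sequence...
        self.encoder = nn.TransformerEncoderLayer(
            d_model=videomae_dim, nhead=n_heads, batch_first=True)
        # ...and a linear head projects the pooled token into CLIP space.
        self.proj = nn.Linear(videomae_dim, clip_dim)

    def forward(self, videomae_tokens):               # (batch, tokens, videomae_dim)
        x = self.encoder(videomae_tokens)
        x = x.mean(dim=1)                             # mean-pool the token sequence
        return F.normalize(self.proj(x), dim=-1)      # unit-norm, like CLIP embeddings

def alignment_loss(video_emb, clip_text_emb):
    """Pull the projected video embedding toward the CLIP embedding of its label description."""
    return 1.0 - F.cosine_similarity(video_emb, clip_text_emb, dim=-1).mean()
```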

4. Testing

4.1. Prepare the test dataset:

  • Download the UCF101 dataset.
  • Update the UCF101 labels using GPT-4, similar to the Kinetics label preprocessing step.

4.2. Run the test:

  • Use test.ipynb to evaluate the model’s performance.
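
Evaluation amounts to zero-shot classification in CLIP space. The sketch below is illustrative only: it assumes the projection head above, unit-normalised video embeddings, and the Hugging Face checkpoint "openai/clip-vit-base-patch32"; test.ipynb may organise this differently.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def classify(video_emb, label_descriptions):
    """Return the index of the label whose CLIP text embedding is closest to each video."""
    inputs = tokenizer(label_descriptions, padding=True, return_tensors="pt")
    text_emb = F.normalize(clip_model.get_text_features(**inputs), dim=-1)
    sims = video_emb @ text_emb.T                     # cosine similarities (video_emb is unit-norm)
    return sims.argmax(dim=-1)                        # predicted class per video
```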

Prerequisites

  • Python 3.x
  • Jupyter Notebook
  • PyTorch
  • Transformers library
  • CLIP model
  • VideoMAE model
  • Access to the GPT-4 API (for label preprocessing)
  • Google Drive (for storing datasets)

Usage

  1. Follow the steps in the “Getting Started” section to prepare your data and train the model.
  2. Ensure all datasets are properly saved in your Google Drive.
  3. Run the notebooks in the order specified above.
  4. For testing, make sure you have the UCF101 dataset prepared and labels updated before running test.ipynb.

The model samples multiple frames from a video, encodes them with VideoMAE, and projects the result into CLIP space, where the video embedding can be compared directly with CLIP text embeddings of the action descriptions.
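
For reference, uniform frame sampling (in the spirit of uniform-sampler-video-embedder.py) can be sketched as below; the helper name and the 16-frame default are illustrative, not the script's actual interface.

```python
import numpy as np

def sample_frame_indices(num_frames_in_video: int, num_samples: int = 16) -> np.ndarray:
    """Pick `num_samples` frame indices spread evenly across the whole clip."""
    return np.linspace(0, num_frames_in_video - 1, num_samples).round().astype(int)

# e.g. a 300-frame clip yields 16 indices such as [0, 20, 40, ..., 299]; the
# selected frames are stacked and fed to VideoMAE to produce the video embedding.
```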

Future Work

  • Implement an adaptive frame selection unit
  • Extend to more diverse datasets
  • Integrate multimodal inputs (e.g., audio)
  • Fine-tune hyperparameters