This project focuses on video action recognition using deep learning techniques, leveraging transfer learning from language models and attention mechanisms.
1.1. Download the Kinetics dataset: run `download_k400.ipynb` to download the dataset.

1.2. Save the dataset: run `save_kinetics_dataset.ipynb` to store the downloaded clips (a sketch of these two steps follows).
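Conceptually, these two steps amount to fetching Kinetics-400 and materializing it on disk. Below is a minimal sketch using torchvision's built-in `Kinetics` downloader; the repo's notebooks may fetch the data differently, and the root path, clip length, and worker count are placeholders:

```python
# Minimal sketch of downloading and saving Kinetics-400 with torchvision.
# The actual notebooks may use a different source; paths are placeholders.
from torchvision.datasets import Kinetics

train_set = Kinetics(
    root="data/k400",        # placeholder output directory
    frames_per_clip=16,      # clip length in frames (assumed)
    num_classes="400",
    split="train",
    download=True,           # fetches and extracts the official archives (large!)
    num_download_workers=4,
)

video, audio, label = train_set[0]  # video: (T, H, W, C) uint8 tensor
print(video.shape, label)
```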
2.1. Update Kinetics labels: run `preprocess_kinetics_labels.ipynb` (sketched below).
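A likely shape for this step, judging from the notebook names (`preprocess_kinetics_labels.ipynb`, `check_labels_dist_in_clip_space.ipynb`), is embedding each Kinetics class name with CLIP's text encoder so labels live in CLIP space. The prompt template, checkpoint, and output path below are assumptions:

```python
# Hedged sketch: turn Kinetics class names into prompts and embed them
# with CLIP's text encoder. Template and checkpoint are assumptions.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

labels = ["abseiling", "air drumming", "answering questions"]  # sample classes
prompts = [f"a video of a person {name}" for name in labels]

inputs = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    text_emb = model.get_text_features(**inputs)            # (num_classes, 512)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)   # unit-normalize
torch.save({"labels": labels, "embeddings": text_emb}, "kinetics_label_emb.pt")
```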
3.1. Post-pretraining of VideoMAE: run `postpretrain_VideoMAE_to_CLIP_Space.ipynb` to align VideoMAE's video features with the CLIP embedding space (a sketch of one training step follows).
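Under the reading the notebook name suggests, post-pretraining learns a projection from VideoMAE features into CLIP space. A minimal training step with an assumed mean-pooling head, cosine loss, and 512-dimensional CLIP width; the repo's actual objective and hyperparameters may differ:

```python
# One plausible training step: pool VideoMAE patch features, project to
# CLIP's width, and pull toward the CLIP text embedding of the clip's label.
# Loss choice, pooling, and dimensions are assumptions.
import torch
import torch.nn.functional as F
from transformers import VideoMAEModel

encoder = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")
proj = torch.nn.Linear(encoder.config.hidden_size, 512)  # 768 -> CLIP dim
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(proj.parameters()), lr=1e-5
)

def train_step(pixel_values, clip_text_emb):
    """pixel_values: (B, 16, 3, 224, 224); clip_text_emb: (B, 512), unit-norm."""
    feats = encoder(pixel_values=pixel_values).last_hidden_state  # (B, N, 768)
    video_emb = proj(feats.mean(dim=1))                           # mean-pool tokens
    video_emb = F.normalize(video_emb, dim=-1)
    loss = 1 - F.cosine_similarity(video_emb, clip_text_emb).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```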
4.1. Prepare the test dataset.

4.2. Run the test: run `test.ipynb` to evaluate the model's performance (an evaluation sketch follows).
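A minimal sketch of what this evaluation could look like, assuming `test.ipynb` measures top-1 accuracy of nearest-label retrieval in CLIP space (the metric is an assumption, not confirmed by the repo):

```python
# Score each video embedding against the precomputed label embeddings and
# count a hit when the nearest label matches the ground truth.
import torch

@torch.no_grad()
def top1_accuracy(video_embs, labels, text_embs):
    """video_embs: (N, 512) unit-norm; labels: (N,); text_embs: (C, 512) unit-norm."""
    sims = video_embs @ text_embs.T   # cosine similarities, shape (N, C)
    preds = sims.argmax(dim=-1)
    return (preds == labels).float().mean().item()
```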
The model processes multiple frames from a video scene and creates rich representations in the CLIP space.
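The repository also contains `uniform-sampler-video-embedder.py`; in that spirit, a tiny helper for picking evenly spaced frames from a clip before embedding (the function name and interface are illustrative, not taken from the script):

```python
# Pick num_frames evenly spaced frame indices across a clip.
import numpy as np

def uniform_frame_indices(total_frames: int, num_frames: int = 16) -> np.ndarray:
    """Return num_frames indices spread evenly across [0, total_frames)."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)

print(uniform_frame_indices(300, 16))  # indices from frame 0 up to frame 299
```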