This project focuses on video action recognition using deep learning techniques, leveraging transfer learning from language models and attention mechanisms.
1.1. Download the Kinetics dataset:
Use `download_k400.ipynb` to download the dataset.

1.2. Save the dataset:
Use `save_kinetics_dataset.ipynb` to save the downloaded dataset.
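The actual download flow lives in the two notebooks above; as a rough illustration only, the sketch below pulls Kinetics-400 with `torchvision`. The `data/kinetics400` path, the clip length, and the use of `torchvision.datasets.Kinetics` are assumptions for the sketch, not details taken from the notebooks.

```python
# Rough sketch of downloading Kinetics-400 with torchvision; download_k400.ipynb and
# save_kinetics_dataset.ipynb in this repo are the actual reference.
from torchvision.datasets import Kinetics

train_set = Kinetics(
    root="data/kinetics400",   # placeholder: where videos and annotations are stored
    frames_per_clip=16,        # placeholder: clip length used when sampling frames
    num_classes="400",
    split="train",
    download=True,             # triggers the (very large) official download
)

video, audio, label = train_set[0]   # video: (T, H, W, C) uint8 tensor
print(video.shape, label)
```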
2.1. Update Kinetics labels:
Use `preprocess_kinetics_labels.ipynb` to preprocess the Kinetics labels.
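The notebook names (see also `check_labels_dist_in_clip_space.ipynb`) suggest the class labels are embedded into the CLIP text space. A hypothetical sketch of that step is shown below; the checkpoint name, prompt template, and output file are assumptions, not taken from the repo.

```python
# Hypothetical sketch: encode Kinetics class names with CLIP's text encoder,
# in the spirit of preprocess_kinetics_labels.ipynb.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

labels = ["abseiling", "air drumming", "answering questions"]        # example class names
prompts = [f"a video of a person {name}" for name in labels]         # assumed prompt template

inputs = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    text_emb = model.get_text_features(**inputs)                     # (num_labels, 512)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)            # unit-normalize for cosine similarity

torch.save({"labels": labels, "embeddings": text_emb}, "kinetics_label_embeddings.pt")
```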
3.1. Post-pretraining of VideoMAE:
Use `postpretrain_VideoMAE_to_CLIP_Space.ipynb` to post-pretrain VideoMAE so that its video embeddings are aligned with the CLIP space.
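One plausible reading of "post-pretraining to CLIP space" is to pool VideoMAE's patch tokens, project them to CLIP's embedding dimension, and minimize a cosine-distance loss against precomputed CLIP-space targets. The sketch below follows that reading; the projection head, pooling, loss, and checkpoint name are assumptions rather than the notebook's actual recipe.

```python
# Minimal sketch: push VideoMAE's pooled video embedding toward a target CLIP embedding.
# postpretrain_VideoMAE_to_CLIP_Space.ipynb is the actual reference for this step.
import torch
import torch.nn as nn
from transformers import VideoMAEModel

class VideoMAEToCLIP(nn.Module):
    def __init__(self, clip_dim: int = 512):
        super().__init__()
        self.backbone = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")
        self.proj = nn.Linear(self.backbone.config.hidden_size, clip_dim)

    def forward(self, pixel_values):                       # (B, 16, 3, 224, 224)
        tokens = self.backbone(pixel_values).last_hidden_state
        video_emb = self.proj(tokens.mean(dim=1))          # mean-pool patch tokens, then project
        return video_emb / video_emb.norm(dim=-1, keepdim=True)

model = VideoMAEToCLIP()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

pixel_values = torch.randn(2, 16, 3, 224, 224)             # dummy clip batch
clip_targets = torch.randn(2, 512)                         # stand-in for precomputed CLIP-space targets
clip_targets = clip_targets / clip_targets.norm(dim=-1, keepdim=True)

video_emb = model(pixel_values)
loss = 1 - nn.functional.cosine_similarity(video_emb, clip_targets).mean()
loss.backward()
optimizer.step()
```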
4.1. Prepare the test dataset:
Prepare the test dataset as described in `test.ipynb`.

4.2. Run the test:
Run `test.ipynb` to evaluate the model's performance.

The model processes multiple frames from a video scene and creates rich representations in the CLIP space.
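For intuition, evaluation in CLIP space can be framed as nearest-neighbor classification: compare each video embedding against the label embeddings by cosine similarity and pick the best match. The sketch below reuses the hypothetical artifacts from the earlier sketches (the saved label embeddings and a stand-in video embedding) and is not the exact procedure in `test.ipynb`.

```python
# Hypothetical sketch of CLIP-space evaluation as cosine-similarity classification.
import torch

# Load the (hypothetical) precomputed CLIP-space label embeddings from the earlier sketch.
saved = torch.load("kinetics_label_embeddings.pt")
labels, label_emb = saved["labels"], saved["embeddings"]   # label_emb: (num_labels, 512), unit-normalized

# video_emb would come from the post-pretrained VideoMAE model; a random stand-in is used here.
video_emb = torch.randn(4, 512)
video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)

similarity = video_emb @ label_emb.T        # cosine similarity, since both sides are unit-normalized
pred = similarity.argmax(dim=-1)
print([labels[i] for i in pred.tolist()])
```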