# ByteTrack

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bytetrack-multi-object-tracking-by-1/multi-object-tracking-on-mot17)](https://paperswithcode.com/sota/multi-object-tracking-on-mot17?p=bytetrack-multi-object-tracking-by-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bytetrack-multi-object-tracking-by-1/multi-object-tracking-on-mot20-1)](https://paperswithcode.com/sota/multi-object-tracking-on-mot20-1?p=bytetrack-multi-object-tracking-by-1)

#### ByteTrack is a simple, fast and strong multi-object tracker.

<p align="center"><img src="assets/sota.png" width="500"/></p>

> [**ByteTrack: Multi-Object Tracking by Associating Every Detection Box**](https://arxiv.org/abs/2110.06864)
>
> Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang
>
> *[arXiv 2110.06864](https://arxiv.org/abs/2110.06864)*

## Demo Links

| Google Colab demo | Hugging Face demo | Original Paper: ByteTrack |
|:-:|:-:|:-:|
|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bDilg4cmXFa8HCKHbsZ_p16p0vrhLyu0?usp=sharing)|[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/bytetrack)|[arXiv 2110.06864](https://arxiv.org/abs/2110.06864)|

* Integrated into [Hugging Face Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio).

## Abstract

Multi-object tracking (MOT) aims at estimating the bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. Objects with low detection scores, e.g. occluded objects, are simply thrown away, which causes non-negligible missed true objects and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method that associates every detection box instead of only the high-score ones. For the low-score detection boxes, we utilize their similarities with tracklets to recover true objects and filter out background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvements in IDF1, ranging from 1 to 10 points. To advance the state of the art in MOT, we also design a simple and strong tracker, named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the MOT17 test set, running at 30 FPS on a single V100 GPU.

<p align="center"><img src="assets/teasing.png" width="400"/></p>
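
For intuition, the association step can be summarized as two matching rounds over high- and low-score detections. The following is a minimal, self-contained Python sketch of that idea only; it uses greedy IoU matching, made-up function and variable names, and illustrative thresholds, whereas the actual tracker in `yolox/tracker/byte_tracker.py` combines Hungarian assignment with Kalman-filter motion prediction.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_match(track_boxes, det_boxes, iou_thresh):
    """Greedy IoU matching; the real tracker uses Hungarian assignment."""
    matches, used_dets, unmatched_tracks = [], set(), []
    for ti, tbox in enumerate(track_boxes):
        best_j, best_iou = -1, iou_thresh
        for dj, dbox in enumerate(det_boxes):
            if dj in used_dets:
                continue
            overlap = iou(tbox, dbox)
            if overlap > best_iou:
                best_j, best_iou = dj, overlap
        if best_j >= 0:
            used_dets.add(best_j)
            matches.append((ti, best_j))
        else:
            unmatched_tracks.append(ti)
    unmatched_dets = [dj for dj in range(len(det_boxes)) if dj not in used_dets]
    return matches, unmatched_tracks, unmatched_dets

def byte_associate(track_boxes, dets, high_thresh=0.6, low_thresh=0.1):
    """Two-stage BYTE-style association (illustrative sketch).

    track_boxes: (M, 4) predicted boxes of the existing tracklets.
    dets:        (N, 5) detections as (x1, y1, x2, y2, score).
    """
    dets = np.asarray(dets, dtype=float)
    high = dets[dets[:, 4] >= high_thresh, :4]
    low = dets[(dets[:, 4] >= low_thresh) & (dets[:, 4] < high_thresh), :4]

    # 1) First association: every tracklet vs. the high-score detections.
    first_matches, leftover_tracks, new_track_dets = greedy_match(track_boxes, high, 0.2)

    # 2) Second association: leftover tracklets vs. the low-score detections,
    #    which recovers occluded objects instead of throwing them away.
    #    Indices in second_matches refer to positions in leftover_tracks.
    leftover_boxes = [track_boxes[i] for i in leftover_tracks]
    second_matches, lost_tracks, _ = greedy_match(leftover_boxes, low, 0.5)

    # Low-score detections left unmatched are treated as background;
    # unmatched high-score detections (new_track_dets) would start new tracklets.
    return first_matches, second_matches, new_track_dets, lost_tracks
```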

## Tracking performance

### Results on the MOT challenge test sets

| Dataset | MOTA | IDF1 | HOTA | MT | ML | FP | FN | IDs | FPS |
|------------|-------|------|------|-------|-------|------|------|------|------|
|MOT17 | 80.3 | 77.3 | 63.1 | 53.2% | 14.5% | 25491 | 83721 | 2196 | 29.6 |
|MOT20 | 77.8 | 75.2 | 61.3 | 69.2% | 9.5% | 26249 | 87594 | 1223 | 13.7 |

MOTA: multi-object tracking accuracy; IDF1: identity F1 score; HOTA: higher-order tracking accuracy; MT/ML: fraction of mostly tracked / mostly lost targets; FP/FN: false positives / false negatives; IDs: identity switches; FPS: frames per second.

### Visualization results on the MOT challenge test set

<img src="assets/MOT17-01-SDP.gif" width="400"/> <img src="assets/MOT17-07-SDP.gif" width="400"/>
<img src="assets/MOT20-07.gif" width="400"/> <img src="assets/MOT20-08.gif" width="400"/>

## Installation

### 1. Installing on the host machine

Step 1. Install ByteTrack:
```shell
git clone https://github.com/ifzhang/ByteTrack.git
cd ByteTrack
pip3 install -r requirements.txt
python3 setup.py develop
```

Step 2. Install [pycocotools](https://github.com/cocodataset/cocoapi):
```shell
pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
```

Step 3. Install the remaining dependencies:
```shell
pip3 install cython_bbox
```

### 2. Docker build

```shell
docker build -t bytetrack:latest .

# Startup sample
mkdir -p pretrained && \
mkdir -p YOLOX_outputs && \
xhost +local: && \
docker run --gpus all -it --rm \
  -v $PWD/pretrained:/workspace/ByteTrack/pretrained \
  -v $PWD/datasets:/workspace/ByteTrack/datasets \
  -v $PWD/YOLOX_outputs:/workspace/ByteTrack/YOLOX_outputs \
  -v /tmp/.X11-unix/:/tmp/.X11-unix:rw \
  --device /dev/video0:/dev/video0:mwr \
  --net=host \
  -e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
  -e DISPLAY=$DISPLAY \
  --privileged \
  bytetrack:latest
```

## Data preparation

Download [MOT17](https://motchallenge.net/), [MOT20](https://motchallenge.net/), [CrowdHuman](https://www.crowdhuman.org/), [Cityperson](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) and [ETHZ](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), and put them under <ByteTrack_HOME>/datasets in the following structure:
```
datasets
|——————mot
|        └——————train
|        └——————test
└——————crowdhuman
|        └——————Crowdhuman_train
|        └——————Crowdhuman_val
|        └——————annotation_train.odgt
|        └——————annotation_val.odgt
└——————MOT20
|        └——————train
|        └——————test
└——————Cityscapes
|        └——————images
|        └——————labels_with_ids
└——————ETHZ
         └——————eth01
         └——————...
         └——————eth07
```

Then, convert the datasets to COCO format and mix the different training data:
```shell
cd <ByteTrack_HOME>
python3 tools/convert_mot17_to_coco.py
python3 tools/convert_mot20_to_coco.py
python3 tools/convert_crowdhuman_to_coco.py
python3 tools/convert_cityperson_to_coco.py
python3 tools/convert_ethz_to_coco.py
```

Before mixing the different datasets, follow the operations described in [mix_xxx.py](https://github.com/ifzhang/ByteTrack/blob/c116dfc746f9ebe07d419caa8acba9b3acfa79a6/tools/mix_data_ablation.py#L6) to create the data folders and symlinks. Finally, mix the training data:
```shell
cd <ByteTrack_HOME>
python3 tools/mix_data_ablation.py
python3 tools/mix_data_test_mot17.py
python3 tools/mix_data_test_mot20.py
```

## Model zoo

### Ablation model

Trained on CrowdHuman and the MOT17 half train set, evaluated on the MOT17 half val set.

| Model | MOTA | IDF1 | IDs | FPS |
|------------|-------|------|------|------|
|ByteTrack_ablation [[google]](https://drive.google.com/file/d/1iqhM-6V_r1FpOlOzrdP_Ejshgk0DxOob/view?usp=sharing), [[baidu(code:eeo8)]](https://pan.baidu.com/s/1W5eRBnxc4x9V8gm7dgdEYg) | 76.6 | 79.3 | 159 | 29.6 |

### MOT17 test model

Trained on CrowdHuman, MOT17, Cityperson and ETHZ, evaluated on the MOT17 train set.

* **Standard models**

| Model | MOTA | IDF1 | IDs | FPS |
|------------|-------|------|------|------|
|bytetrack_x_mot17 [[google]](https://drive.google.com/file/d/1P4mY0Yyd3PPTybgZkjMYhFri88nTmJX5/view?usp=sharing), [[baidu(code:ic0i)]](https://pan.baidu.com/s/1OJKrcQa_JP9zofC6ZtGBpw) | 90.0 | 83.3 | 422 | 29.6 |
|bytetrack_l_mot17 [[google]](https://drive.google.com/file/d/1XwfUuCBF4IgWBWK2H7oOhQgEj9Mrb3rz/view?usp=sharing), [[baidu(code:1cml)]](https://pan.baidu.com/s/1242adimKM6TYdeLU2qnuRA) | 88.7 | 80.7 | 460 | 43.7 |
|bytetrack_m_mot17 [[google]](https://drive.google.com/file/d/11Zb0NN_Uu7JwUd9e6Nk8o2_EUfxWqsun/view?usp=sharing), [[baidu(code:u3m4)]](https://pan.baidu.com/s/1fKemO1uZfvNSLzJfURO4TQ) | 87.0 | 80.1 | 477 | 54.1 |
|bytetrack_s_mot17 [[google]](https://drive.google.com/file/d/1uSmhXzyV1Zvb4TJJCzpsZOIcw7CCJLxj/view?usp=sharing), [[baidu(code:qflm)]](https://pan.baidu.com/s/1PiP1kQfgxAIrnGUbFP6Wfg) | 79.2 | 74.3 | 533 | 64.5 |

* **Light models**

| Model | MOTA | IDF1 | IDs | Params(M) | FLOPs(G) |
|------------|-------|------|------|------|-------|
|bytetrack_nano_mot17 [[google]](https://drive.google.com/file/d/1AoN2AxzVwOLM0gJ15bcwqZUpFjlDV1dX/view?usp=sharing), [[baidu(code:1ub8)]](https://pan.baidu.com/s/1dMxqBPP7lFNRZ3kFgDmWdw) | 69.0 | 66.3 | 531 | 0.90 | 3.99 |
|bytetrack_tiny_mot17 [[google]](https://drive.google.com/file/d/1LFAl14sql2Q5Y9aNFsX_OqsnIzUD_1ju/view?usp=sharing), [[baidu(code:cr8i)]](https://pan.baidu.com/s/1jgIqisPSDw98HJh8hqhM5w) | 77.1 | 71.5 | 519 | 5.03 | 24.45 |

### MOT20 test model

Trained on CrowdHuman and MOT20, evaluated on the MOT20 train set.

| Model | MOTA | IDF1 | IDs | FPS |
|------------|-------|------|------|------|
|bytetrack_x_mot20 [[google]](https://drive.google.com/file/d/1HX2_JpMOjOIj1Z9rJjoet9XNy_cCAs5U/view?usp=sharing), [[baidu(code:3apd)]](https://pan.baidu.com/s/1bowJJj0bAnbhEQ3_6_Am0A) | 93.4 | 89.3 | 1057 | 17.5 |

## Training

The COCO-pretrained YOLOX models can be downloaded from the YOLOX [model zoo](https://github.com/Megvii-BaseDetection/YOLOX/tree/0.1.0). After downloading the pretrained models, put them under <ByteTrack_HOME>/pretrained.

* **Train ablation model (MOT17 half train and CrowdHuman)**
```shell
cd <ByteTrack_HOME>
python3 tools/train.py -f exps/example/mot/yolox_x_ablation.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

* **Train MOT17 test model (MOT17 train, CrowdHuman, Cityperson and ETHZ)**
```shell
cd <ByteTrack_HOME>
python3 tools/train.py -f exps/example/mot/yolox_x_mix_det.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

* **Train MOT20 test model (MOT20 train, CrowdHuman)**

For MOT20, you need to clip the bounding boxes so that they lie inside the image. Add the clip operation at [lines 134-135 of data_augment.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/data_augment.py#L134), [lines 122-125 of mosaicdetection.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/datasets/mosaicdetection.py#L122), [lines 217-225 of mosaicdetection.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/datasets/mosaicdetection.py#L217), and [lines 115-118 of boxes.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/utils/boxes.py#L115).
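
The clip operation itself is small; below is a minimal sketch of the idea, assuming (x1, y1, x2, y2) boxes in a NumPy array and illustrative variable names (the exact edits belong at the linked lines above):

```python
import numpy as np

def clip_boxes(boxes, img_w, img_h):
    """Clip (x1, y1, x2, y2) boxes so they stay inside an img_w x img_h image."""
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, img_w - 1)  # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, img_h - 1)  # y1, y2
    return boxes
```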

```shell
cd <ByteTrack_HOME>
python3 tools/train.py -f exps/example/mot/yolox_x_mix_mot20_ch.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

* **Train custom dataset**

First, prepare your dataset in COCO format; you can refer to [MOT-to-COCO](https://github.com/ifzhang/ByteTrack/blob/main/tools/convert_mot17_to_coco.py) or [CrowdHuman-to-COCO](https://github.com/ifzhang/ByteTrack/blob/main/tools/convert_crowdhuman_to_coco.py). Then, create an Exp file for your dataset; you can refer to the [CrowdHuman](https://github.com/ifzhang/ByteTrack/blob/main/exps/example/mot/yolox_x_ch.py) training Exp file, and a rough skeleton is sketched after the command below. Don't forget to modify get_data_loader() and get_eval_loader() in your Exp file. Finally, train ByteTrack on your dataset by running:
```shell
cd <ByteTrack_HOME>
python3 tools/train.py -f exps/example/mot/your_exp_file.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```
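
For orientation, here is a rough, hedged skeleton of such an Exp file. The attribute names mirror the YOLOX-style Exp files under exps/example/mot/, but they are assumptions for illustration; check them against the CrowdHuman Exp file linked above.

```python
# Hypothetical Exp skeleton for a custom dataset (illustrative only).
import os
from yolox.exp import Exp as BaseExp

class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
        self.num_classes = 1            # single "person" class for MOT-style data
        self.depth = 1.33               # YOLOX-X depth/width multipliers
        self.width = 1.25
        self.input_size = (800, 1440)   # (height, width) used for training
        self.test_size = (800, 1440)
        self.train_ann = "train.json"   # your COCO-format annotation files
        self.val_ann = "val.json"

    # Override get_data_loader() and get_eval_loader() here so that they build
    # the dataset from your own data directory, following the CrowdHuman Exp.
```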

## Tracking

* **Evaluation on MOT17 half val**

Run ByteTrack:
```shell
cd <ByteTrack_HOME>
python3 tools/track.py -f exps/example/mot/yolox_x_ablation.py -c pretrained/bytetrack_ablation.pth.tar -b 1 -d 1 --fp16 --fuse
```
You can get 76.6 MOTA using our pretrained model.

Run other trackers:
```shell
python3 tools/track_sort.py -f exps/example/mot/yolox_x_ablation.py -c pretrained/bytetrack_ablation.pth.tar -b 1 -d 1 --fp16 --fuse
python3 tools/track_deepsort.py -f exps/example/mot/yolox_x_ablation.py -c pretrained/bytetrack_ablation.pth.tar -b 1 -d 1 --fp16 --fuse
python3 tools/track_motdt.py -f exps/example/mot/yolox_x_ablation.py -c pretrained/bytetrack_ablation.pth.tar -b 1 -d 1 --fp16 --fuse
```

* **Test on MOT17**

Run ByteTrack:
```shell
cd <ByteTrack_HOME>
python3 tools/track.py -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar -b 1 -d 1 --fp16 --fuse
python3 tools/interpolation.py
```
Submit the txt files to the [MOTChallenge](https://motchallenge.net/) website and you can get 79+ MOTA (for 80+ MOTA, you need to carefully tune the test image size and the high-score detection threshold of each sequence).
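
The tools/interpolation.py step above post-processes the result files by filling short gaps in each trajectory with linearly interpolated boxes, which recovers frames where a target was briefly lost. A simplified, hedged sketch of that idea (not the repository script; the gap limit is illustrative):

```python
import numpy as np

def interpolate_track(frames, boxes, max_gap=20):
    """Fill missing frames in one trajectory by linear interpolation.

    frames: sorted 1-D int array of frame ids where the track was observed.
    boxes:  (N, 4) float array of the corresponding (x1, y1, x2, y2) boxes.
    Gaps longer than `max_gap` frames are left untouched.
    """
    out_frames, out_boxes = [frames[0]], [boxes[0]]
    for i in range(1, len(frames)):
        gap = frames[i] - frames[i - 1]
        if 1 < gap <= max_gap:
            for k in range(1, gap):
                alpha = k / gap
                out_frames.append(frames[i - 1] + k)
                out_boxes.append((1 - alpha) * boxes[i - 1] + alpha * boxes[i])
        out_frames.append(frames[i])
        out_boxes.append(boxes[i])
    return np.asarray(out_frames), np.asarray(out_boxes)
```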

* **Test on MOT20**

We use input size 1600 x 896 for MOT20-04 and MOT20-07, and 1920 x 736 for MOT20-06 and MOT20-08. You can edit these in [yolox_x_mix_mot20_ch.py](https://github.com/ifzhang/ByteTrack/blob/main/exps/example/mot/yolox_x_mix_mot20_ch.py).

Run ByteTrack:
```shell
cd <ByteTrack_HOME>
python3 tools/track.py -f exps/example/mot/yolox_x_mix_mot20_ch.py -c pretrained/bytetrack_x_mot20.pth.tar -b 1 -d 1 --fp16 --fuse --match_thresh 0.7 --mot20
python3 tools/interpolation.py
```
Submit the txt files to the [MOTChallenge](https://motchallenge.net/) website and you can get 77+ MOTA (for higher MOTA, you need to carefully tune the test image size and the high-score detection threshold of each sequence).

## Applying BYTE to other trackers

See [tutorials](https://github.com/ifzhang/ByteTrack/tree/main/tutorials).

## Combining BYTE with other detectors

Suppose you already have detection results `dets` in the format (x1, y1, x2, y2, score) from another detector. You can simply pass them to BYTETracker (you first need to adapt the post-processing code in [byte_tracker.py](https://github.com/ifzhang/ByteTrack/blob/main/yolox/tracker/byte_tracker.py) to the format of your detection results):
```python
from yolox.tracker.byte_tracker import BYTETracker

# args carries the tracking hyper-parameters (e.g. the thresholds exposed as
# command-line flags); images, detector, info_imgs and img_size come from your
# own pipeline.
tracker = BYTETracker(args)
for image in images:
    dets = detector(image)
    online_targets = tracker.update(dets, info_imgs, img_size)
```
You can get the tracking results of each frame from `online_targets`. You can refer to [mot_evaluator.py](https://github.com/ifzhang/ByteTrack/blob/main/yolox/evaluators/mot_evaluator.py) to see how the detection results are passed to BYTETracker.
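
Each element of `online_targets` is a track object. Below is a hedged sketch of collecting per-frame results, assuming the `tlwh`, `track_id` and `score` attributes that the demo and evaluator code in this repository read, and reusing `images`, `detector`, `info_imgs` and `img_size` from the snippet above:

```python
min_box_area = 100                      # illustrative threshold for tiny boxes
results = []
for frame_id, image in enumerate(images, 1):
    dets = detector(image)
    online_targets = tracker.update(dets, info_imgs, img_size)
    for t in online_targets:
        x, y, w, h = t.tlwh             # top-left x/y, width, height
        if w * h > min_box_area:
            results.append((frame_id, t.track_id, x, y, w, h, t.score))
```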

## Demo

<img src="assets/palace_demo.gif" width="600"/>

```shell
cd <ByteTrack_HOME>
python3 tools/demo_track.py video -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar --fp16 --fuse --save_result
```

## Deploy

1. [ONNX export and ONNXRuntime](./deploy/ONNXRuntime)
2. [TensorRT in Python](./deploy/TensorRT/python)
3. [TensorRT in C++](./deploy/TensorRT/cpp)
4. [ncnn in C++](./deploy/ncnn/cpp)

## Citation

```
@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}
```

## Acknowledgement

A large part of the code is borrowed from [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX), [FairMOT](https://github.com/ifzhang/FairMOT), [TransTrack](https://github.com/PeizeSun/TransTrack) and [JDE-Cpp](https://github.com/samylee/Towards-Realtime-MOT-Cpp). Many thanks for their wonderful works.