Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1 @@
README_cn.md


@@ -0,0 +1,195 @@
Simplified Chinese | [English](README.md)

# ByteTrack (ByteTrack: Multi-Object Tracking by Associating Every Detection Box)

## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
    - [Pedestrian Tracking](#pedestrian-tracking)
    - [Head Tracking](#head-tracking)
- [Multi-Class Adaptation](#multi-class-adaptation)
- [Quick Start](#quick-start)
- [Citation](#citation)

## Introduction
[ByteTrack](https://arxiv.org/abs/2110.06864) (ByteTrack: Multi-Object Tracking by Associating Every Detection Box) tracks by associating every detection box instead of only the high-score ones. For low-score detection boxes, it uses their similarity to existing tracklets to recover true objects and filter out background detections. Several configurations for commonly used detectors are provided here as references. Differences in training datasets, input scales, numbers of training epochs, NMS threshold settings, and so on all lead to differences in model accuracy and performance; please adapt the configs to your own needs.
## Model Zoo

### Pedestrian Tracking

#### Results of ByteTrack with different detectors on the MOT-17 half Val Set

| Detector Training Set | Detector | Input Size | ReID | Detection mAP(0.5:0.95) | MOTA | IDF1 | FPS | Config |
| :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: |
| MOT-17 half train | YOLOv3 | 608x608 | - | 42.7 | 49.5 | 54.8 | - |[config](./bytetrack_yolov3.yml) |
| MOT-17 half train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.4 | 59.7 | - |[config](./bytetrack_ppyoloe.yml) |
| MOT-17 half train | PP-YOLOE-l | 640x640 |PPLCNet| 52.9 | 51.7 | 58.8 | - |[config](./bytetrack_ppyoloe_pplcnet.yml) |
| **mix_mot_ch** | YOLOX-x | 800x1440| - | 61.9 | 77.3 | 71.6 | - |[config](./bytetrack_yolox.yml) |
| **mix_det** | YOLOX-x | 800x1440| - | 65.4 | 84.5 | 77.4 | - |[config](./bytetrack_yolox.yml) |
**Notes:**
- For detector-related configs and docs, see [detector](detector/).
- Model weight download URLs are given by the ```det_weights``` and ```reid_weights``` entries in the config files; they are downloaded automatically when you run the ```tools/eval_mot.py``` evaluation command. ```reid_weights: None``` means no ReID model is used.
- **ByteTrack does not use ReID weights by default.** To enable them, refer to [bytetrack_ppyoloe_pplcnet.yml](./bytetrack_ppyoloe_pplcnet.yml); to **use your own ReID weights, change its `reid_weights:` entry to your own weight path**, as sketched after this list.
- **MOT17-half train** consists of the images and annotations of the first half of the frames of each of the 7 MOT17 train sequences. For accuracy validation, use the **MOT17-half val** set, built from the second half of the frames of each video. The dataset can be downloaded from [this link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip); extract it into the `dataset/mot/` folder.
- The **mix_mot_ch** dataset combines MOT17 and CrowdHuman; the **mix_det** dataset combines MOT17, CrowdHuman, Cityscapes, and ETHZ. For the expected data format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation), and place the data under `dataset/mot/`. In both cases, accuracy can be validated on the **MOT17-half val** set.
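For reference, a minimal sketch of the block you would edit, condensed from [bytetrack_ppyoloe_pplcnet.yml](./bytetrack_ppyoloe_pplcnet.yml) later in this commit; the local checkpoint path below is a hypothetical placeholder:

```yaml
ByteTrack:
  detector: YOLOv3         # PP-YOLOE flavor of the YOLOv3 architecture class
  reid: PPLCNetEmbedding   # the ReID head; ByteTrack works without it by default
  tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
reid_weights: output/my_pplcnet_reid/model_final.pdparams  # hypothetical local path; set to None to disable ReID
```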
#### Results of YOLOX-x ByteTrack (mix_det) on MOT-16/MOT-17

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pp-yoloe-an-evolved-version-of-yolo/multi-object-tracking-on-mot16)](https://paperswithcode.com/sota/multi-object-tracking-on-mot16?p=pp-yoloe-an-evolved-version-of-yolo)

| Model | Dataset | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :---------: | :-------: | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| ByteTrack-x| MOT-17 Train | 84.4 | 72.8 | 837 | 5653 | 10985 | - |[model](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| **MOT-17 Test** | **78.4** | 69.7 | 4974 | 37551 | 79524 | - |[model](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| MOT-16 Train | 83.5 | 72.7 | 800 | 6973 | 10419 | - |[model](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| **MOT-16 Test** | **77.7** | 70.1 | 1570 | 15695 | 23304 | - |[model](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
**Notes:**
- The **mix_det** dataset combines MOT17, CrowdHuman, Cityscapes, and ETHZ; for the data format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation), and place the data under `dataset/mot/`.
- The MOT-17 Train and MOT-16 Train numbers come from local evaluation on those sets. Because the Train sets are part of the training data, these MOTA values do not reflect the model's detection/tracking ability; they are reported only because MOT-17 and MOT-16 have no validation sets while their Train sets do have ground truth, which makes accuracy checks convenient.
- The MOT-17 Test and MOT-16 Test numbers come from submitting results to the [MOTChallenge](https://motchallenge.net) evaluation server. Since the ground truth of the MOT-17 and MOT-16 Test sets is not public, these MOTA values do reflect the model's detection/tracking ability.
- Training ByteTrack means training the detector alone on MOT data; at inference time the tracker is assembled on top of it to evaluate MOT metrics, and the standalone detection model can also be evaluated on detection metrics (see the condensed excerpt after this list).
- Export and deployment likewise export the detection model alone and assemble the tracker at runtime; see [PP-Tracking](../../../deploy/pptracking/python/README.md).
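Condensed from [bytetrack_ppyoloe.yml](./bytetrack_ppyoloe.yml) later in this commit, this is how an assembled eval/infer config pairs the standalone detector with the BYTE tracker:

```yaml
metric: MOT            # eval/infer mode; set 'COCO' for detector training
architecture: ByteTrack
ByteTrack:
  detector: YOLOv3
  reid: None           # ByteTrack needs no ReID by default
  tracker: JDETracker
JDETracker:
  use_byte: True       # BYTE association over both high- and low-score boxes
  conf_thres: 0.2      # high-score threshold for the first association
  low_conf_thres: 0.1  # floor for the second, low-score association
```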
### Head Tracking

#### Results of YOLOX-x ByteTrack on the HT-21 Val Set

| Model | Input Size | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [model](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](./bytetrack_yolox_ht21.yml) |

#### Results of YOLOX-x ByteTrack on the HT-21 Test Set

| Model | Input Size | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [model](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](./bytetrack_yolox_ht21.yml) |
**Notes:**
- For more head-tracking models, see [headtracking21](../headtracking21).
## Multi-Class Adaptation
For multi-class ByteTrack, see [bytetrack_ppyoloe_ppvehicle9cls.yml](./bytetrack_ppyoloe_ppvehicle9cls.yml), which uses weights trained on the PPVehicle9cls dataset from [PP-Vehicle](../../ppvehicle/) for multi-class vehicle tracking. Since there is no tracking ground truth, evaluation is impossible and only tracking prediction is run. You only need to modify `TestMOTDataset` and make sure its paths exist: its `anno_path` points to a `label_list.txt` that you write yourself, one class name per line. Note that if `anno_path` is wrong or the file cannot be found, the 80 COCO classes are used by default.
To **switch detector weights, change the `det_weights:` entry to your own weight path**, and update the **dataset paths, `label_list.txt`, and number of classes** accordingly, as sketched below.
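A minimal sketch of the pieces to edit, mirroring [bytetrack_ppyoloe_ppvehicle9cls.yml](./bytetrack_ppyoloe_ppvehicle9cls.yml) later in this commit; the `label_list.txt` must exist at the given path, and your own weights would replace the URL:

```yaml
metric: MCMOT    # multi-class; `MOT` for single class
num_classes: 9
TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True
    anno_path: dataset/mot/label_list.txt  # hand-written, one class name per line:
                                           # pedestrian, rider, car, truck, bus, van, motorcycle, bicycle, others
det_weights: https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams  # or your own path
```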
Predict multi-class vehicle tracking:
```bash
# Download the demo video
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4
# Use the PP-YOLOE multi-class vehicle detection model
CUDA_VISIBLE_DEVICES=1 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml --video_file=bdd100k_demo.mp4 --scaled=True --save_videos
```
**Notes:**
- Make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on Linux (Ubuntu) it can be installed with: `apt-get update && apt-get install -y ffmpeg`.
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: set it to False for the JDE YOLOv3 detection model and to True for general detection models.
- `--save_videos` saves the visualized video; visualized images are also saved under `{output_dir}/mot_outputs/`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.
## Quick Start
### 1. Training
Launch training and evaluation with the following commands:
```bash
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
# or
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml --eval --amp
```
**Notes:**
- `--eval` evaluates accuracy during training; `--amp` enables mixed-precision training to avoid overflow. PaddlePaddle 2.2.2 is recommended.
### 2. Evaluation
#### 2.1 Evaluate detection
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
```
**Notes:**
- Detection is evaluated with ```tools/eval.py```; tracking is evaluated with ```tools/eval_mot.py```.
#### 2.2 Evaluate tracking
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolov3.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --scaled=True
```
**Notes:**
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: set it to False for the JDE YOLOv3 detection model and to True for general detection models. The default is False.
- Tracking results are saved under `{output_dir}/mot_results/`, one txt file per video sequence; each line is `frame,id,x1,y1,w,h,score,-1,-1,-1`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.
### 3. Prediction
Use a single GPU to predict a video and save the result as a video:
```bash
# Download the demo video
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
# Use the PP-YOLOE pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
# Or use the YOLOX pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
```
**Notes:**
- Make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first; on Linux (Ubuntu) it can be installed with: `apt-get update && apt-get install -y ffmpeg`.
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: set it to False for the JDE YOLOv3 detection model and to True for general detection models.
- `--save_videos` saves the visualized video; visualized images are also saved under `{output_dir}/mot_outputs/`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.
### 4. Export the inference model
Step 1: export the detection model
```bash
# Export the PP-YOLOE pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
# Or export the YOLOX pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
```
Step 2: export the ReID model (optional; not needed by default)
```bash
# Export the PPLCNet ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```
### 5. Python inference with the exported models
```bash
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
# or
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
```
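The `--tracker_config` file selects and parameterizes the tracker at deployment time. A hypothetical sketch, assuming the deploy-side `tracker_config.yml` mirrors the `JDETracker` keys used by the assembled configs in this commit:

```yaml
# Hypothetical deploy-side tracker config; field names assumed to mirror
# the JDETracker block of the assembled training configs above.
type: JDETracker
JDETracker:
  use_byte: True
  match_thres: 0.9
  conf_thres: 0.6
  low_conf_thres: 0.2
  min_box_area: 100
  vertical_ratio: 1.6  # pedestrian-specific aspect-ratio filter
```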
**Notes:**
- The tracking model predicts on videos; single-image prediction is not supported. By default the visualized tracking video is saved; add `--save_mot_txts` (one txt per video) or `--save_mot_txt_per_img` (one txt per image) to save tracking-result txt files, and `--save_images` to save visualized images.
- Each line of a tracking-result txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
## Citation
```
@article{zhang2021bytetrack,
title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
journal={arXiv preprint arXiv:2110.06864},
year={2021}
}
```


@@ -0,0 +1,34 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    image_dir: images/train
    anno_path: annotations/train.json
    dataset_dir: dataset/mot/HT21
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: images/train
    anno_path: annotations/val_half.json
    dataset_dir: dataset/mot/HT21

TestDataset:
  !ImageFolder
    dataset_dir: dataset/mot/HT21
    anno_path: annotations/val_half.json

# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: HT21/images/test
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video


@@ -0,0 +1,34 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    image_dir: ""
    anno_path: annotations/train.json
    dataset_dir: dataset/mot/mix_det
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: images/train
    anno_path: annotations/val_half.json
    dataset_dir: dataset/mot/MOT17

TestDataset:
  !ImageFolder
    anno_path: annotations/val_half.json
    dataset_dir: dataset/mot/MOT17

# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/half
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video


@@ -0,0 +1,34 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    image_dir: ""
    anno_path: annotations/train.json
    dataset_dir: dataset/mot/mix_mot_ch
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: images/train
    anno_path: annotations/val_half.json
    dataset_dir: dataset/mot/MOT17

TestDataset:
  !ImageFolder
    anno_path: annotations/val_half.json
    dataset_dir: dataset/mot/MOT17

# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/half
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video


@@ -0,0 +1,34 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/train_half.json
    image_dir: images/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json
    image_dir: images/train

TestDataset:
  !ImageFolder
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json

# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/half
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video


@@ -0,0 +1,60 @@
worker_num: 4
eval_height: &eval_height 640
eval_width: &eval_width 640
eval_size: &eval_size [*eval_height, *eval_width]

TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadGT: {}
  batch_size: 8
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 8

TestReader:
  inputs_def:
    image_shape: [3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,66 @@
worker_num: 2

TrainReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Mixup: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  batch_size: 8
  shuffle: true
  drop_last: true
  mixup_epoch: 250
  use_shared_memory: true

EvalReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 8

TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
EvalMOTReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,67 @@
input_height: &input_height 800
input_width: &input_width 1440
input_size: &input_size [*input_height, *input_width]
worker_num: 4

TrainReader:
  sample_transforms:
    - Decode: {}
    - Mosaic:
        prob: 1.0
        input_dim: *input_size
        degrees: [-10, 10]
        scale: [0.1, 2.0]
        shear: [-2, 2]
        translate: [-0.1, 0.1]
        enable_mixup: True
        mixup_prob: 1.0
        mixup_scale: [0.5, 1.5]
    - AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30}
    - PadResize: {target_size: *input_size}
    - RandomFlip: {}
  batch_transforms:
    - Permute: {}
  batch_size: 6
  shuffle: True
  drop_last: True
  collate_batch: False
  mosaic_epoch: 20

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *input_size, keep_ratio: True}
    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
    - Permute: {}
  batch_size: 8

TestReader:
  inputs_def:
    image_shape: [3, 800, 1440]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *input_size, keep_ratio: True}
    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
    - Permute: {}
  batch_size: 1

# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *input_size, keep_ratio: True}
    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 800, 1440]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *input_size, keep_ratio: True}
    - Pad: {size: *input_size, fill_value: [114., 114., 114.]}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,59 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
  'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
  '_base_/mot17.yml',
  '_base_/ppyoloe_mot_reader_640x640.yml'
]
weights: output/bytetrack_ppyoloe/model_final
log_iter: 20
snapshot_epoch: 2

metric: MOT # eval/infer mode; set 'COCO' for training mode
num_classes: 1

architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
ByteTrack:
  detector: YOLOv3 # PPYOLOe version
  reid: None
  tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
reid_weights: None

YOLOv3:
  backbone: CSPResNet
  neck: CustomCSPPAN
  yolo_head: PPYOLOEHead
  post_process: ~

# Tracking requires higher quality boxes, so NMS score_threshold will be higher
PPYOLOEHead:
  fpn_strides: [32, 16, 8]
  grid_cell_scale: 5.0
  grid_cell_offset: 0.5
  static_assigner_epoch: -1 # 100
  use_varifocal_loss: True
  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
  static_assigner:
    name: ATSSAssigner
    topk: 9
  assigner:
    name: TaskAlignedAssigner
    topk: 13
    alpha: 1.0
    beta: 6.0
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.1 # 0.01 in original detector
    nms_threshold: 0.4 # 0.6 in original detector

# BYTETracker
JDETracker:
  use_byte: True        # enable BYTE association over high- and low-score boxes
  match_thres: 0.9      # matching threshold for association
  conf_thres: 0.2       # high-score detection threshold for the first association
  low_conf_thres: 0.1   # lower bound for the second, low-score association
  min_box_area: 100     # filter out tiny boxes
  vertical_ratio: 1.6   # aspect-ratio filter for pedestrians


@@ -0,0 +1,59 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
'_base_/mot17.yml',
'_base_/ppyoloe_mot_reader_640x640.yml'
]
weights: output/bytetrack_ppyoloe_pplcnet/model_final
log_iter: 20
snapshot_epoch: 2
metric: MOT # eval/infer mode
num_classes: 1
architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
ByteTrack:
detector: YOLOv3 # PPYOLOe version
reid: PPLCNetEmbedding # use reid
tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
reid_weights: https://bj.bcebos.com/v1/paddledet/models/mot/deepsort_pplcnet.pdparams
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1 # 100
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.1 # 0.01 in original detector
nms_threshold: 0.4 # 0.6 in original detector
# BYTETracker
JDETracker:
use_byte: True
match_thres: 0.9
conf_thres: 0.2
low_conf_thres: 0.1
min_box_area: 100
vertical_ratio: 1.6 # for pedestrian


@@ -0,0 +1,49 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'bytetrack_ppyoloe.yml',
'_base_/ppyoloe_mot_reader_640x640.yml'
]
weights: output/bytetrack_ppyoloe_ppvehicle9cls/model_final
metric: MCMOT # multi-class, `MOT` for single class
num_classes: 9
# pedestrian(1), rider(2), car(3), truck(4), bus(5), van(6), motorcycle(7), bicycle(8), others(9)
TestMOTDataset:
!MOTImageFolder
dataset_dir: dataset/mot
keep_ori_im: True # set True if save visualization images or video
anno_path: dataset/mot/label_list.txt # absolute path
### write in label_list.txt each line:
# pedestrian
# rider
# car
# truck
# bus
# van
# motorcycle
# bicycle
# others
###
det_weights: https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams
depth_mult: 1.0
width_mult: 1.0
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
PPYOLOEHead:
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.1 # 0.01 in original detector
nms_threshold: 0.4 # 0.6 in original detector
# BYTETracker
JDETracker:
use_byte: True
match_thres: 0.9
conf_thres: 0.2
low_conf_thres: 0.1
min_box_area: 0
vertical_ratio: 0 # only use 1.6 in MOT17 pedestrian


@@ -0,0 +1,50 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'detector/yolov3_darknet53_40e_608x608_mot17half.yml',
'_base_/mot17.yml',
'_base_/yolov3_mot_reader_608x608.yml'
]
weights: output/bytetrack_yolov3/model_final
log_iter: 20
snapshot_epoch: 2
metric: MOT # eval/infer mode
num_classes: 1
architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams
ByteTrack:
detector: YOLOv3 # General YOLOv3 version
reid: None
tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolov3_darknet53_40e_608x608_mot17half.pdparams
reid_weights: None
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.005
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000
# BYTETracker
JDETracker:
use_byte: True
match_thres: 0.9
conf_thres: 0.2
low_conf_thres: 0.1
min_box_area: 100
vertical_ratio: 1.6 # for pedestrian


@@ -0,0 +1,68 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'detector/yolox_x_24e_800x1440_mix_det.yml',
'_base_/mix_det.yml',
'_base_/yolox_mot_reader_800x1440.yml'
]
weights: output/bytetrack_yolox/model_final
log_iter: 20
snapshot_epoch: 2
metric: MOT # eval/infer mode
num_classes: 1
architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
ByteTrack:
detector: YOLOX
reid: None
tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
reid_weights: None
depth_mult: 1.33
width_mult: 1.25
YOLOX:
backbone: CSPDarkNet
neck: YOLOCSPPAN
head: YOLOXHead
input_size: [800, 1440]
size_stride: 32
size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
CSPDarkNet:
arch: "X"
return_idx: [2, 3, 4]
depthwise: False
YOLOCSPPAN:
depthwise: False
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
YOLOXHead:
l1_epoch: 20
depthwise: False
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
assigner:
name: SimOTAAssigner
candidate_topk: 10
use_vfl: False
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.7
# For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%.
# For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
# BYTETracker
JDETracker:
use_byte: True
match_thres: 0.9
conf_thres: 0.6
low_conf_thres: 0.2
min_box_area: 100
vertical_ratio: 1.6 # for pedestrian


@@ -0,0 +1,68 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'detector/yolox_x_24e_800x1440_ht21.yml',
'_base_/ht21.yml',
'_base_/yolox_mot_reader_800x1440.yml'
]
weights: output/bytetrack_yolox_ht21/model_final
log_iter: 20
snapshot_epoch: 2
metric: MOT # eval/infer mode
num_classes: 1
architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
ByteTrack:
detector: YOLOX
reid: None
tracker: JDETracker
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_ht21.pdparams
reid_weights: None
depth_mult: 1.33
width_mult: 1.25
YOLOX:
backbone: CSPDarkNet
neck: YOLOCSPPAN
head: YOLOXHead
input_size: [800, 1440]
size_stride: 32
size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
CSPDarkNet:
arch: "X"
return_idx: [2, 3, 4]
depthwise: False
YOLOCSPPAN:
depthwise: False
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
YOLOXHead:
l1_epoch: 20
depthwise: False
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
assigner:
name: SimOTAAssigner
candidate_topk: 10
use_vfl: False
nms:
name: MultiClassNMS
nms_top_k: 30000
keep_top_k: 1000
score_threshold: 0.01
nms_threshold: 0.7
# For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%.
# For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
# BYTETracker
JDETracker:
use_byte: True
match_thres: 0.9
conf_thres: 0.7
low_conf_thres: 0.1
min_box_area: 0
vertical_ratio: 0 # 1.6 for pedestrian


@@ -0,0 +1 @@
README_cn.md


@@ -0,0 +1,39 @@
简体中文 | [English](README.md)
# ByteTrack的检测器
## 简介
[ByteTrack](https://arxiv.org/abs/2110.06864)(ByteTrack: Multi-Object Tracking by Associating Every Detection Box) 通过关联每个检测框来跟踪而不仅是关联高分的检测框。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异请自行根据需求进行适配。
## 模型库
### 在MOT17-half val数据集上的检测结果
| 骨架网络 | 网络类型 | 输入尺度 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
| DarkNet-53 | YOLOv3 | 608X608 | 40e | ---- | 42.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [配置文件](./yolov3_darknet53_40e_608x608_mot17half.yml) |
| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [配置文件](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
| CSPDarkNet | YOLOX-x(mix_mot_ch) | 800x1440 | 24e | ---- | 61.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_mot_ch.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_mot_ch.yml) |
| CSPDarkNet | YOLOX-x(mix_det) | 800x1440 | 24e | ---- | 65.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_det.yml) |
**Notes:**
- Except for YOLOX, the models above are trained on the **MOT17-half train** set, which can be downloaded from [this link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip).
- **MOT17-half train** consists of the images and annotations of the first half of the frames of each of the 7 MOT17 train sequences. For accuracy validation, use the **MOT17-half val** set, built from the second half of the frames of each video; its annotations can be downloaded from [this link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip) and extracted into the `dataset/mot/MOT17/images/` folder.
- YOLOX-x(mix_mot_ch) is trained on the **mix_mot_ch** dataset, which combines MOT17 and CrowdHuman; YOLOX-x(mix_det) is trained on the **mix_det** dataset, which combines MOT17, CrowdHuman, Cityscapes, and ETHZ. For the data format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation), and place the data under `dataset/mot/`. In both cases, accuracy can be validated on the **MOT17-half val** set.
- For pedestrian tracking, use a pedestrian detector together with a pedestrian ReID model; for vehicle tracking, use a vehicle detector together with a vehicle ReID model.
- When these models are used for ByteTrack tracking, post-processing settings such as the NMS thresholds differ from those of the pure detection task, as shown below.
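Concretely, the assembled tracking configs in this commit override the detector's NMS thresholds like this (values taken from bytetrack_ppyoloe.yml):

```yaml
# Tracking requires higher-quality boxes, so the NMS score_threshold is raised.
PPYOLOEHead:
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.1 # 0.01 in the standalone detector
    nms_threshold: 0.4   # 0.6 in the standalone detector
```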
## Quick Start
Launch training, evaluation, and export with the following commands:
```bash
job_name=ppyoloe_crn_l_36e_640x640_mot17half
config=configs/mot/bytetrack/detector/${job_name}.yml
log_dir=log_dir/${job_name}
# 1. training
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
# 2. evaluation
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=output/${job_name}/model_final.pdparams
# 3. export
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=output/${job_name}/model_final.pdparams
```


@@ -0,0 +1,83 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'../../../ppyoloe/ppyoloe_crn_l_300e_coco.yml',
'../_base_/mot17.yml',
]
weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final
log_iter: 20
snapshot_epoch: 2
# schedule configuration for fine-tuning
epoch: 36
LearningRate:
base_lr: 0.001
schedulers:
- !CosineDecay
max_epochs: 43
- !LinearWarmup
start_factor: 0.001
epochs: 1
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
batch_size: 8
# detector configuration
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
depth_mult: 1.0
width_mult: 1.0
YOLOv3:
backbone: CSPResNet
neck: CustomCSPPAN
yolo_head: PPYOLOEHead
post_process: ~
CSPResNet:
layers: [3, 6, 6, 3]
channels: [64, 128, 256, 512, 1024]
return_idx: [1, 2, 3]
use_large_stem: True
CustomCSPPAN:
out_channels: [768, 384, 192]
stage_num: 1
block_num: 3
act: 'swish'
spp: true
PPYOLOEHead:
fpn_strides: [32, 16, 8]
grid_cell_scale: 5.0
grid_cell_offset: 0.5
static_assigner_epoch: -1 # 100
use_varifocal_loss: True
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
static_assigner:
name: ATSSAssigner
topk: 9
assigner:
name: TaskAlignedAssigner
topk: 13
alpha: 1.0
beta: 6.0
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.6


@@ -0,0 +1,77 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'../../../yolov3/yolov3_darknet53_270e_coco.yml',
'../_base_/mot17.yml',
]
weights: output/yolov3_darknet53_40e_608x608_mot17half/model_final
log_iter: 20
snapshot_epoch: 2
# schedule configuration for fine-tuning
epoch: 40
LearningRate:
base_lr: 0.0001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 32
- 36
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 100
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
TrainReader:
batch_size: 8
mixup_epoch: 35
# detector configuration
architecture: YOLOv3
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams
norm_type: sync_bn
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
# use default config
# YOLOv3FPN:
YOLOv3Head:
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.005
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000


@@ -0,0 +1,80 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'../../../yolox/yolox_x_300e_coco.yml',
'../_base_/ht21.yml',
]
weights: output/yolox_x_24e_800x1440_ht21/model_final
log_iter: 20
snapshot_epoch: 2
# schedule configuration for fine-tuning
epoch: 24
LearningRate:
base_lr: 0.0005 # fintune
schedulers:
- !CosineDecay
max_epochs: 24
min_lr_ratio: 0.05
last_plateau_epochs: 4
- !ExpWarmup
epochs: 1
OptimizerBuilder:
optimizer:
type: Momentum
momentum: 0.9
use_nesterov: True
regularizer:
factor: 0.0005
type: L2
TrainReader:
batch_size: 4
mosaic_epoch: 20
# detector configuration
architecture: YOLOX
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
norm_type: sync_bn
use_ema: True
ema_decay: 0.9999
ema_decay_type: "exponential"
act: silu
find_unused_parameters: True
depth_mult: 1.33
width_mult: 1.25
YOLOX:
backbone: CSPDarkNet
neck: YOLOCSPPAN
head: YOLOXHead
input_size: [800, 1440]
size_stride: 32
size_range: [18, 32] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
CSPDarkNet:
arch: "X"
return_idx: [2, 3, 4]
depthwise: False
YOLOCSPPAN:
depthwise: False
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
YOLOXHead:
l1_epoch: 20
depthwise: False
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
assigner:
name: SimOTAAssigner
candidate_topk: 10
use_vfl: False
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.7
# For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%.
# For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.


@@ -0,0 +1,80 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'../../../yolox/yolox_x_300e_coco.yml',
'../_base_/mix_det.yml',
]
weights: output/yolox_x_24e_800x1440_mix_det/model_final
log_iter: 20
snapshot_epoch: 2
# schedule configuration for fine-tuning
epoch: 24
LearningRate:
base_lr: 0.00075 # fintune
schedulers:
- !CosineDecay
max_epochs: 24
min_lr_ratio: 0.05
last_plateau_epochs: 4
- !ExpWarmup
epochs: 1
OptimizerBuilder:
optimizer:
type: Momentum
momentum: 0.9
use_nesterov: True
regularizer:
factor: 0.0005
type: L2
TrainReader:
batch_size: 6
mosaic_epoch: 20
# detector configuration
architecture: YOLOX
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
norm_type: sync_bn
use_ema: True
ema_decay: 0.9999
ema_decay_type: "exponential"
act: silu
find_unused_parameters: True
depth_mult: 1.33
width_mult: 1.25
YOLOX:
backbone: CSPDarkNet
neck: YOLOCSPPAN
head: YOLOXHead
input_size: [800, 1440]
size_stride: 32
size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
CSPDarkNet:
arch: "X"
return_idx: [2, 3, 4]
depthwise: False
YOLOCSPPAN:
depthwise: False
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
YOLOXHead:
l1_epoch: 20
depthwise: False
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
assigner:
name: SimOTAAssigner
candidate_topk: 10
use_vfl: False
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.7
# For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%.
# For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.


@@ -0,0 +1,80 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
'../../../yolox/yolox_x_300e_coco.yml',
'../_base_/mix_mot_ch.yml',
]
weights: output/yolox_x_24e_800x1440_mix_mot_ch/model_final
log_iter: 20
snapshot_epoch: 2
# schedule configuration for fine-tuning
epoch: 24
LearningRate:
base_lr: 0.00075 # fine-tune
schedulers:
- !CosineDecay
max_epochs: 24
min_lr_ratio: 0.05
last_plateau_epochs: 4
- !ExpWarmup
epochs: 1
OptimizerBuilder:
optimizer:
type: Momentum
momentum: 0.9
use_nesterov: True
regularizer:
factor: 0.0005
type: L2
TrainReader:
batch_size: 6
mosaic_epoch: 20
# detector configuration
architecture: YOLOX
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
norm_type: sync_bn
use_ema: True
ema_decay: 0.9999
ema_decay_type: "exponential"
act: silu
find_unused_parameters: True
depth_mult: 1.33
width_mult: 1.25
YOLOX:
backbone: CSPDarkNet
neck: YOLOCSPPAN
head: YOLOXHead
input_size: [800, 1440]
size_stride: 32
size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
CSPDarkNet:
arch: "X"
return_idx: [2, 3, 4]
depthwise: False
YOLOCSPPAN:
depthwise: False
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
YOLOXHead:
l1_epoch: 20
depthwise: False
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
assigner:
name: SimOTAAssigner
candidate_topk: 10
use_vfl: False
nms:
name: MultiClassNMS
nms_top_k: 1000
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.7
# For speed while keep high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100, the mAP will drop about 0.1%.
# For high speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.