Move paddle_detection
This commit is contained in:
@@ -0,0 +1,39 @@
# Multi-Object Tracking Dataset Download Summary

## Contents
- [Pedestrian Tracking](#pedestrian-tracking)
- [Vehicle Tracking](#vehicle-tracking)
- [Head Tracking](#head-tracking)
- [Multi-Class Tracking](#multi-class-tracking)


## Pedestrian Tracking

| Dataset | Download | Notes |
| :-------------| :-------------| :----: |
| MOT17 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) | - |
| MOT16 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip) | - |
| Caltech | [download](https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip) | - |
| Cityscapes | [download](https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip) | - |
| CUHKSYSU | [download](https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip) | - |
| PRW | [download](https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip) | - |
| ETHZ | [download](https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip) | - |


## Vehicle Tracking

| Dataset | Download | Notes |
| :-------------| :-------------| :----: |
| AICity21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/aic21mtmct_vehicle.zip) | - |


## Head Tracking

| Dataset | Download | Notes |
| :-------------| :-------------| :----: |
| HT21 | [download](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip) | - |


## Multi-Class Tracking

| Dataset | Download | Notes |
| :-------------| :-------------| :----: |
| VisDrone-MOT | [download](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) | - |
301
services/paddle_services/paddle_detection/configs/mot/README.md
Normal file
@@ -0,0 +1,301 @@
Simplified Chinese | [English](README_en.md)

# Multi-Object Tracking (MOT)

## Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Model Zoo and Model Selection](#model-zoo-and-model-selection)
- [MOT Dataset Preparation](#mot-dataset-preparation)
    - [SDE Datasets](#sde-datasets)
    - [JDE Datasets](#jde-datasets)
- [Custom Dataset Preparation](#custom-dataset-preparation)
- [Citations](#citations)


## Introduction

Multi-Object Tracking (MOT) localizes multiple objects of interest in a given video or image sequence, maintains each individual's ID across consecutive frames, and records its trajectory.

The current mainstream approach is tracking-by-detection, in which the algorithm consists of two parts: Detection and Embedding. The Detection part detects the potential targets in each frame of the video. The Embedding part assigns and updates the detected targets to the corresponding existing tracks (i.e. the ReID task), performing long-term temporal association between objects. Depending on how these two parts are implemented, algorithms can be divided into the **SDE** series and the **JDE** series.

- SDE (Separate Detection and Embedding) algorithms completely separate the Detection and Embedding stages; the most representative one is **DeepSORT**. This design lets the system work with any detector without modification, and the two parts can be tuned separately, but the serial pipeline makes it slow and time-consuming. Some algorithms, such as **ByteTrack**, reduce the cost by not using Embedding features to compute appearance similarity, provided the detector is accurate enough.
- JDE (Joint Detection and Embedding) algorithms learn Detection and Embedding simultaneously in a single shared neural network, setting up the loss in a multi-task learning fashion. Representative algorithms are **JDE** and **FairMOT**. This design balances accuracy and speed and can achieve high-accuracy real-time multi-object tracking.

PaddleDetection provides implementations of multiple algorithms from both the SDE and JDE series:
- SDE
    - [ByteTrack](./bytetrack)
    - [OC-SORT](./ocsort)
    - [BoT-SORT](./botsort)
    - [DeepSORT](./deepsort)
    - [CenterTrack](./centertrack)
- JDE
    - [JDE](./jde)
    - [FairMOT](./fairmot)
    - [MCFairMOT](./mcfairmot)

**Note:**
- The original papers of the above algorithms all address single-class multi-object tracking; the PaddleDetection team additionally supports multi-class multi-object tracking for [ByteTrack](./bytetrack) and FairMOT ([MCFairMOT](./mcfairmot));
- [DeepSORT](./deepsort), [JDE](./jde), [OC-SORT](./ocsort), [BoT-SORT](./botsort) and [CenterTrack](./centertrack) only support single-class multi-object tracking;
- [DeepSORT](./deepsort) requires additional ReID weights to run; [ByteTrack](./bytetrack) can run with or without ReID weights and does not use them by default;


### Real-Time Multi-Object Tracking System PP-Tracking

The PaddleDetection team provides the real-time multi-object tracking system [PP-Tracking](../../deploy/pptracking), the industry's first open-source real-time MOT system built on the PaddlePaddle deep learning framework, with three key advantages: rich models, wide applications and efficient deployment.

PP-Tracking supports two modes: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Targeting the difficulties and pain points of real-world applications, it provides various multi-object tracking functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic counting and multi-camera tracking. Deployment supports API calls and a GUI, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.

PP-Tracking uses [FairMOT](./fairmot) for single-camera tracking and [DeepSORT](./deepsort) for multi-camera tracking.

<div width="1000" align="center">
  <img src="../../docs/images/pptracking.png"/>
</div>

<div width="1000" align="center">
  <img src="https://user-images.githubusercontent.com/22989727/205546999-f847183d-73e5-4abe-9896-ce6a245efc79.gif"/>
  <br>
  Video source: VisDrone and BDD100K public datasets
</div>

#### AI Studio Public Project Tutorial
For a tutorial, see [PP-Tracking: hands-on multi-object tracking](https://aistudio.baidu.com/aistudio/projectdetail/3022582).

#### Python Inference and Deployment
See the [PP-Tracking Python deployment doc](../../deploy/pptracking/python/README.md).

#### C++ Inference and Deployment
See the [PP-Tracking C++ deployment doc](../../deploy/pptracking/cpp/README.md).

#### GUI Inference and Deployment
See the [PP-Tracking GUI doc](https://github.com/yangyudong2020/PP-Tracking_GUi).


### Real-Time Pedestrian Analysis Tool PP-Human

The PaddleDetection team provides the real-time pedestrian analysis tool [PP-Human](../../deploy/pipeline), the industry's first open-source industrial-grade real-time pedestrian analysis tool built on the PaddlePaddle deep learning framework, with three key advantages: rich models, wide applications and efficient deployment.

PP-Human supports multiple input types (images, single-camera video and multi-camera video), and its functions cover multi-object tracking, attribute recognition, behavior analysis, and people counting with trajectory recording. It can be widely applied in fields such as smart transportation, smart communities and industrial inspection. It supports server-side deployment and TensorRT acceleration, and reaches real time on a T4 server.

PP-Human uses [ByteTrack](./bytetrack) for tracking.



#### AI Studio Public Project Tutorials

End-to-end hands-on tutorial for real-time pedestrian analysis with PP-Human: [link](https://aistudio.baidu.com/aistudio/projectdetail/3842982).

Tutorial on using PP-Human for refined smart-community management: [link](https://aistudio.baidu.com/aistudio/projectdetail/3679564).



## Installation

Install the MOT-related dependencies in one step:
```
pip install -r requirements.txt
# or manually pip install the MOT-related packages
pip install lap motmetrics sklearn
```
**Note:**
- Inference requires [ffmpeg](https://ffmpeg.org/ffmpeg.html). On Linux (Ubuntu) it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.



## Model Zoo and Model Selection
- Base models
    - [ByteTrack](bytetrack/README_cn.md)
    - [OC-SORT](ocsort/README_cn.md)
    - [BoT-SORT](botsort/README_cn.md)
    - [DeepSORT](deepsort/README_cn.md)
    - [JDE](jde/README_cn.md)
    - [FairMOT](fairmot/README_cn.md)
    - [CenterTrack](centertrack/README_cn.md)
- Featured vertical-domain models
    - [Pedestrian tracking](pedestrian/README_cn.md)
    - [Head tracking](headtracking21/README_cn.md)
    - [Vehicle tracking](vehicle/README_cn.md)
- Multi-class tracking
    - [Multi-class tracking](mcfairmot/README_cn.md)
- Multi-camera tracking
    - [Multi-camera tracking](mtmct/README_cn.md)

### Model Selection Summary

The PaddleDetection team's summary recommendations for model selection are as follows:

| MOT paradigm | Representative algorithms | Pipeline | Dataset requirements | Other characteristics |
| :--------------| :--------------| :------- | :----: | :----: |
| SDE series | DeepSORT, ByteTrack, OC-SORT, BoT-SORT, CenterTrack | Separate: two independent model weights, detection first and then ReID; ReID can also be omitted | Detection and ReID data are relatively independent; without ReID, only a detection dataset is needed | Detection and ReID can be tuned separately; high robustness; commonly used in AI competitions |
| JDE series | FairMOT, JDE | Joint: one model weight performing detection and ReID end to end | Requires both detection and ReID annotations | Detection and ReID are trained jointly; harder to tune; weaker generalization |

**Note:**
- Since data annotation is costly, consider the **dataset requirements** first when choosing a model. If the dataset has only detection-box annotations and no ReID annotations, the JDE series cannot be trained, so the SDE series is recommended;
- When the detector is accurate enough, SDE algorithms can also perform long-term association between objects without ReID weights; see [ByteTrack](bytetrack). A minimal sketch of this detect-then-associate loop is given after these notes;
- Inference time is related to the number of model parameters and FLOPs; in theory, the time cost ranks as `SDE without ReID < JDE < SDE with ReID`;
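
The note on running SDE without ReID can be made concrete with a minimal sketch of the detect-then-associate loop. This is illustrative only: `detector` and `tracker` below are hypothetical stand-ins, not PaddleDetection APIs, and a real setup would plug in one of the detectors and trackers from the model zoo above.

```python
# Minimal SDE-style tracking-by-detection loop (illustrative sketch only).
# "detector" and "tracker" are hypothetical objects, NOT PaddleDetection APIs:
#   detector(frame)      -> list of (box, score) detections for one frame
#   tracker.update(dets) -> list of (track_id, box) associations for that frame
def run_sde_tracking(frames, detector, tracker, score_thresh=0.5):
    results = []  # one entry per frame: (frame_id, [(track_id, box), ...])
    for frame_id, frame in enumerate(frames):
        # 1) Detection: find candidate objects in the current frame.
        detections = [(box, score) for box, score in detector(frame)
                      if score >= score_thresh]
        # 2) Association: assign detections to existing tracks, either with
        #    ReID appearance features or with IoU/motion only (ByteTrack-style).
        tracks = tracker.update(detections)
        results.append((frame_id, tracks))
    return results
```

Because the two stages are decoupled, swapping the detector or the association strategy only changes one of the two components, which is exactly why the SDE series is easy to tune per part.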


## MOT Dataset Preparation

The PaddleDetection team provides download links for many public datasets and curated datasets; see the [dataset download summary](DataDownload.md). Users can download and use them as needed.

Following the model selection summary, MOT datasets fall into two categories: datasets with detection-box annotations only, which only the SDE series can use; and datasets with both detection and ReID annotations, which both the SDE and JDE series can use.

### SDE Datasets
SDE datasets are detection-only annotated datasets. Custom datasets can be prepared by following the [DET data preparation doc](../../docs/tutorials/data/PrepareDetDataSet.md).

Taking the MOT17 dataset as an example, download it and extract it into the `PaddleDetection/dataset/mot` directory:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
```
Then modify the dataset part of the config file as follows:
```
num_classes: 1

TrainDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/train_half.json
    image_dir: images/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json
    image_dir: images/train

TestDataset:
  !ImageFolder
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json
```

The dataset directory is:
```
dataset/mot
  |——————MOT17
    |——————annotations
    |——————images
```
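
As a quick sanity check before training, a small script like the following (an illustrative sketch, not part of PaddleDetection) can verify that the extracted MOT17 data matches the paths referenced in the config above:

```python
# Illustrative sanity check for the COCO-style MOT17 layout used above.
from pathlib import Path

def check_mot17_layout(root="dataset/mot/MOT17"):
    root = Path(root)
    expected = [
        root / "annotations" / "train_half.json",  # TrainDataset anno_path
        root / "annotations" / "val_half.json",    # EvalDataset / TestDataset anno_path
        root / "images" / "train",                 # image_dir for train/eval
    ]
    for path in expected:
        print(f"[{'OK' if path.exists() else 'MISSING'}] {path}")

if __name__ == "__main__":
    check_mot17_layout()
```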

### JDE Datasets
JDE datasets have both detection and ReID annotations. First download `image_lists.zip` with the following command and extract it into the `PaddleDetection/dataset/mot` directory:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
```

Then the public datasets can be downloaded quickly with the following commands; also extract them into the `PaddleDetection/dataset/mot` directory:
```
# MIX data, the same datasets used in the JDE and FairMOT papers
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
```
The dataset directory is:
```
dataset/mot
  |——————image_lists
            |——————caltech.all
            |——————citypersons.train
            |——————cuhksysu.train
            |——————eth.train
            |——————mot16.train
            |——————mot17.train
            |——————prw.train
  |——————Caltech
  |——————Cityscapes
  |——————CUHKSYSU
  |——————ETHZ
  |——————MOT16
  |——————MOT17
  |——————PRW
```

#### JDE Dataset Format
These related datasets all follow the structure below:
```
MOT17
  |——————images
  |        └——————train
  |        └——————test
  └——————labels_with_ids
           └——————train
```
Annotations for all datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes one bounding box in the following format (a small parsing sketch is given after the field descriptions below):
```
[class] [identity] [x_center] [y_center] [width] [height]
```
- `class` is the class id; both single-class and multi-class are supported, counting from `0`; for single class it is always `0`.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates and the width and height. Note that they are normalized by the image width/height, so they are floating-point numbers between 0 and 1.
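
As a concrete illustration of this format, the sketch below (illustrative only, not a PaddleDetection utility) maps an image path to its annotation path and converts one normalized annotation line back to a pixel bounding box:

```python
# Illustrative helpers for the JDE labels_with_ids format described above.
def image_to_label_path(image_path):
    # e.g. MOT17/images/train/.../000001.jpg -> MOT17/labels_with_ids/train/.../000001.txt
    return image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")

def parse_label_line(line, img_w, img_h):
    # line: "[class] [identity] [x_center] [y_center] [width] [height]", normalized to 0~1
    cls, identity, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2  # top-left corner in pixels
    return int(cls), int(identity), (x1, y1, x1 + w, y1 + h)
```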


**Note:**
- The MIX dataset is the dataset used in the original [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) papers, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as a joint dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
- The MIX dataset and its sub-datasets are all single-class pedestrian tracking datasets; they can be regarded as pedestrian detection datasets with additional ID annotations.
- Vertical-domain models for more scenes, such as vehicle, pedestrian and head tracking, also require their datasets to be processed into the same format as the MIX dataset; refer to the [dataset download summary](DataDownload.md), [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md) and the more general [pedestrian tracking](pedestrian/README_cn.md).
- Custom datasets can be prepared by referring to the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).


### Custom Dataset Preparation
For custom dataset preparation, please refer to the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).

## Citations
```
@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649},
  organization={IEEE},
  doi={10.1109/ICIP.2017.8296962}
}

@inproceedings{Wojke2018deep,
  title={Deep Cosine Metric Learning for Person Re-identification},
  author={Wojke, Nicolai and Bewley, Alex},
  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={748--756},
  organization={IEEE},
  doi={10.1109/WACV.2018.00087}
}

@article{wang2019towards,
  title={Towards Real-Time Multi-Object Tracking},
  author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
  journal={arXiv preprint arXiv:1909.12605},
  year={2019}
}

@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}

@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}

@article{cao2022observation,
  title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
  author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
  journal={arXiv preprint arXiv:2203.14360},
  year={2022}
}

@article{aharon2022bot,
  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
  journal={arXiv preprint arXiv:2206.14651},
  year={2022}
}

@article{zhou2020tracking,
  title={Tracking Objects as Points},
  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  journal={ECCV},
  year={2020}
}
```
@@ -0,0 +1,217 @@
|
||||
English | [简体中文](README.md)
|
||||
|
||||
# MOT (Multi-Object Tracking)
|
||||
|
||||
## Table of Contents
|
||||
- [Introduction](#Introduction)
|
||||
- [Installation](#Installation)
|
||||
- [Model Zoo](#Model_Zoo)
|
||||
- [Dataset Preparation](#Dataset_Preparation)
|
||||
- [Citations](#Citations)
|
||||
|
||||
## Introduction
|
||||
The current mainstream approach, tracking-by-detection, is mainly composed of two parts: detection and embedding. Detection detects the potential targets in each frame of the video. Embedding assigns and updates the detected targets to the corresponding tracks (named the ReID task). According to how these two parts are implemented, algorithms can be divided into the **SDE** series and the **JDE** series.

- **SDE** (Separate Detection and Embedding) algorithms completely separate Detection and Embedding. The most representative is the **DeepSORT** algorithm. This design lets the system fit any kind of detector, and each part can be improved separately. However, because the stages run in series, the pipeline is slow, and latency is a great challenge when building a real-time MOT system.
- **JDE** (Joint Detection and Embedding) algorithms learn detection and embedding simultaneously in a shared neural network, and set up the loss function with a multi-task learning approach. The representative algorithms are **JDE** and **FairMOT**. This design can achieve high-precision real-time MOT performance.

PaddleDetection implements three MOT algorithms from these two series: [DeepSORT](https://arxiv.org/abs/1812.00442) from the SDE series, and [JDE](https://arxiv.org/abs/1909.12605) and [FairMOT](https://arxiv.org/abs/2004.01888) from the JDE series.

### PP-Tracking real-time MOT system
In addition, PaddleDetection also provides the [PP-Tracking](../../deploy/pptracking/README.md) real-time multi-object tracking system.
PP-Tracking is the first open-source real-time multi-object tracking system built on the PaddlePaddle deep learning framework. It features rich models, wide applications and efficient deployment.

PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of actual business, PP-Tracking provides various MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports API and GUI visual interfaces, the deployment languages are Python and C++, and the supported deployment platforms include Linux, NVIDIA Jetson, etc.

### AI Studio public project tutorial
PP-Tracking provides an AI Studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
|
||||
|
||||
### Python predict and deployment
|
||||
PP-Tracking supports Python predict and deployment. Please refer to this [doc](../../deploy/pptracking/python/README.md).
|
||||
|
||||
### C++ predict and deployment
|
||||
PP-Tracking supports C++ predict and deployment. Please refer to this [doc](../../deploy/pptracking/cpp/README.md).
|
||||
|
||||
### GUI predict and deployment
|
||||
PP-Tracking supports GUI predict and deployment. Please refer to this [doc](https://github.com/yangyudong2020/PP-Tracking_GUi).
|
||||
|
||||
<div width="1000" align="center">
|
||||
<img src="../../docs/images/pptracking_en.png"/>
|
||||
</div>
|
||||
|
||||
<div width="1000" align="center">
|
||||
<img src="https://user-images.githubusercontent.com/22989727/205546999-f847183d-73e5-4abe-9896-ce6a245efc79.gif"/>
|
||||
<br>
|
||||
Video source: VisDrone and BDD100K public datasets</div>
|
||||
</div>
|
||||
|
||||
|
||||
## Installation
|
||||
Install all the related dependencies for MOT:
|
||||
```
pip install lap motmetrics sklearn
# or
pip install -r requirements.txt
```
|
||||
**Notes:**
|
||||
- Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On the Linux (Ubuntu) platform you can install it directly with the following command: `apt-get update && apt-get install -y ffmpeg`.
|
||||
|
||||
|
||||
## Model Zoo
|
||||
- Base models
|
||||
- [ByteTrack](bytetrack/README.md)
|
||||
- [OC-SORT](ocsort/README.md)
|
||||
- [BoT-SORT](botsort/README.md)
|
||||
- [DeepSORT](deepsort/README.md)
|
||||
- [JDE](jde/README.md)
|
||||
- [FairMOT](fairmot/README.md)
|
||||
- [CenterTrack](centertrack/README.md)
|
||||
- Feature models
|
||||
- [Pedestrian](pedestrian/README.md)
|
||||
- [Head](headtracking21/README.md)
|
||||
- [Vehicle](vehicle/README.md)
|
||||
- Multi-Class Tracking
|
||||
- [MCFairMOT](mcfairmot/README.md)
|
||||
- Multi-Target Multi-Camera Tracking
|
||||
- [MTMCT](mtmct/README.md)
|
||||
|
||||
|
||||
## Dataset Preparation
|
||||
### MOT Dataset
|
||||
PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) and uses the same training data, named 'MIX', as they do, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
|
||||
|
||||
**Notes:**
- Multi-Object Tracking (MOT) datasets are usually used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets; they can be regarded as detection datasets with additional ID ground truth.
- In order to train vertical-domain models for more scenes, more datasets are also processed into the same format as the MIX dataset. The PaddleDetection team also provides vertical-domain datasets and models for [vehicle tracking](vehicle/README.md), [head tracking](headtracking21/README.md) and the more general [pedestrian tracking](pedestrian/README.md). User-defined datasets can also be prepared by referring to the data preparation [doc](../../docs/tutorials/data/PrepareMOTDataSet.md).
- The multi-category MOT model is [MCFairMOT](mcfairmot/README.md), and the multi-category dataset is an integrated version of the VisDrone dataset. Please refer to the doc of [MCFairMOT](mcfairmot/README.md).
- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) multi-camera vehicle tracking dataset. For the dataset and model, refer to the doc of [MTMCT](mtmct/README.md).
|
||||
|
||||
### Dataset Directory
|
||||
First, download the image_lists.zip using the following command, and unzip them into `PaddleDetection/dataset/mot`:
|
||||
```
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
|
||||
```
|
||||
|
||||
Then, download the MIX dataset using the following command, and unzip them into `PaddleDetection/dataset/mot`:
|
||||
```
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
|
||||
```
|
||||
|
||||
The final directory is:
|
||||
```
|
||||
dataset/mot
|
||||
|——————image_lists
|
||||
|——————caltech.10k.val
|
||||
|——————caltech.all
|
||||
|——————caltech.train
|
||||
|——————caltech.val
|
||||
|——————citypersons.train
|
||||
|——————citypersons.val
|
||||
|——————cuhksysu.train
|
||||
|——————cuhksysu.val
|
||||
|——————eth.train
|
||||
|——————mot16.train
|
||||
|——————mot17.train
|
||||
|——————prw.train
|
||||
|——————prw.val
|
||||
|——————Caltech
|
||||
|——————Cityscapes
|
||||
|——————CUHKSYSU
|
||||
|——————ETHZ
|
||||
|——————MOT16
|
||||
|——————MOT17
|
||||
|——————PRW
|
||||
```
|
||||
|
||||
### Data Format
|
||||
These several relevant datasets have the following structure:
|
||||
```
|
||||
MOT17
|
||||
|——————images
|
||||
| └——————train
|
||||
| └——————test
|
||||
└——————labels_with_ids
|
||||
└——————train
|
||||
```
|
||||
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
|
||||
|
||||
In the annotation text, each line describes a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` is the class id; both single class and multi-class are supported, counting from `0`; for single class it is always `0`.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1 (see the sketch below for writing such a line).
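
For illustration, the sketch below (not a PaddleDetection utility) writes one annotation line in this format from an absolute pixel box; the `(x1, y1, x2, y2)` box and image size are assumed example values:

```python
# Illustrative: encode an absolute pixel box into one labels_with_ids line.
def encode_label_line(cls_id, identity, box, img_w, img_h):
    # box is an assumed (x1, y1, x2, y2) pixel box; use identity = -1 if unknown.
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2.0 / img_w   # normalized center x
    yc = (y1 + y2) / 2.0 / img_h   # normalized center y
    w = (x2 - x1) / img_w          # normalized width
    h = (y2 - y1) / img_h          # normalized height
    return f"{cls_id} {identity} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: a single-class (class 0) pedestrian with id 3 in a 1920x1080 frame.
print(encode_label_line(0, 3, (100, 200, 180, 400), 1920, 1080))
```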
|
||||
|
||||
|
||||
## Citations
|
||||
```
|
||||
@inproceedings{Wojke2017simple,
|
||||
title={Simple Online and Realtime Tracking with a Deep Association Metric},
|
||||
author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
|
||||
booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
|
||||
year={2017},
|
||||
pages={3645--3649},
|
||||
organization={IEEE},
|
||||
doi={10.1109/ICIP.2017.8296962}
|
||||
}
|
||||
|
||||
@inproceedings{Wojke2018deep,
|
||||
title={Deep Cosine Metric Learning for Person Re-identification},
|
||||
author={Wojke, Nicolai and Bewley, Alex},
|
||||
booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
|
||||
year={2018},
|
||||
pages={748--756},
|
||||
organization={IEEE},
|
||||
doi={10.1109/WACV.2018.00087}
|
||||
}
|
||||
|
||||
@article{wang2019towards,
|
||||
title={Towards Real-Time Multi-Object Tracking},
|
||||
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
|
||||
journal={arXiv preprint arXiv:1909.12605},
|
||||
year={2019}
|
||||
}
|
||||
|
||||
@article{zhang2020fair,
|
||||
title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
|
||||
author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
|
||||
journal={arXiv preprint arXiv:2004.01888},
|
||||
year={2020}
|
||||
}
|
||||
|
||||
@article{zhang2021bytetrack,
|
||||
title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
|
||||
author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
|
||||
journal={arXiv preprint arXiv:2110.06864},
|
||||
year={2021}
|
||||
}
|
||||
|
||||
@article{cao2022observation,
|
||||
title={Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking},
|
||||
author={Cao, Jinkun and Weng, Xinshuo and Khirodkar, Rawal and Pang, Jiangmiao and Kitani, Kris},
|
||||
journal={arXiv preprint arXiv:2203.14360},
|
||||
year={2022}
|
||||
}
|
||||
|
||||
@article{aharon2022bot,
|
||||
title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
|
||||
author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
|
||||
journal={arXiv preprint arXiv:2206.14651},
|
||||
year={2022}
|
||||
}
|
||||
|
||||
@article{zhou2020tracking,
|
||||
title={Tracking Objects as Points},
|
||||
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
|
||||
journal={ECCV},
|
||||
year={2020}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,89 @@
English | [简体中文](README_cn.md)

# BOT_SORT (BoT-SORT: Robust Associations Multi-Pedestrian Tracking)

## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Quick Start](#quick-start)
- [Citation](#citation)

## Introduction
[BOT_SORT](https://arxiv.org/pdf/2206.14651v2.pdf) (BoT-SORT: Robust Associations Multi-Pedestrian Tracking). Configurations for common detectors are provided here for reference. Since different training datasets, input scales, numbers of training epochs, NMS threshold settings, etc. lead to differences in model accuracy and performance, please adapt them to your own needs.

## Model Zoo

### BOT_SORT results on the MOT-17 half Val Set

| Detection training dataset | Detector | Input size | Detector mAP | MOTA | IDF1 | Config |
| :-------- | :----- | :----: | :------: | :----: |:-----: |:----: |
| MOT-17 half train | PP-YOLOE-l | 640x640 | 52.7 | 55.5 | 64.2 |[config](./botsort_ppyoloe.yml) |


**Notes:**
- The model weight download link is the ```det_weights``` field in the config file; running the evaluation command will download it automatically.
- **MOT17-half train** is a dataset composed of the images and labels from the first half of the frames of each video in the MOT17 train sequences (7 in total). To verify accuracy, the **MOT17-half val** set, composed of the second half of the frames of each video, can be used for evaluation. Download it from this [link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) and extract it into `dataset/mot/`.

- For BOT_SORT, training means training a standalone detector on the MOT dataset; inference assembles the tracker to evaluate MOT metrics, and the standalone detection model can also be evaluated with detection metrics.
- For export and deployment, BOT_SORT exports the detection model separately and then assembles the tracker at runtime; refer to [PP-Tracking](../../../deploy/pptracking/python).
- BOT_SORT is the main tracking scheme used by PP-Human, PP-Vehicle and other pipeline analysis projects. For specific usage, please refer to [Pipeline](../../../deploy/pipeline) and [MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md).


## Quick Start

### 1. Training
Start training and evaluation with the following commands:
```bash
# single GPU
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp

# multiple GPUs
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
```

### 2. Evaluation
#### 2.1 Evaluate detection
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml
```

**Notes:**
- Detection evaluation uses ```tools/eval.py```; MOT evaluation uses ```tools/eval_mot.py```.

#### 2.2 Evaluate tracking
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/botsort/botsort_ppyoloe.yml --scaled=True
```
**Notes:**
- `--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image. Set it to False if the detection model is JDE YOLOv3, and True if a general detection model is used. The default value is False.
- MOT results are saved in `{output_dir}/mot_results/`; each video sequence corresponds to one txt file, and each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`. `{output_dir}` can be set with `--output_dir` (a small parsing sketch follows below).
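
For reference, a small sketch like the following (illustrative only, not part of PaddleDetection) groups one of the saved result files by frame; the file path is an assumed example:

```python
# Illustrative parser for a MOT results txt saved under {output_dir}/mot_results/,
# where each line is "frame,id,x1,y1,w,h,score,-1,-1,-1".
from collections import defaultdict

def load_mot_results(txt_path):
    per_frame = defaultdict(list)  # frame -> [(track_id, (x1, y1, w, h), score), ...]
    with open(txt_path) as f:
        for line in f:
            frame, tid, x1, y1, w, h, score = line.strip().split(",")[:7]
            per_frame[int(float(frame))].append(
                (int(float(tid)), (float(x1), float(y1), float(w), float(h)), float(score)))
    return per_frame

# e.g. results = load_mot_results("output/mot_results/MOT17-02-SDP.txt")  # assumed path
```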

### 3. Export the detection model

```bash
python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --output_dir=output_inference -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
```

### 4. Predict with the exported model

```bash
# download the demo video
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4

CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5
```
**Notes:**
- To use BOT_SORT, you must first set the tracker type in `tracker_config.yml` to `type: BOTSORTTracker`.
- The tracking model predicts on videos; prediction on a single image is not supported. By default, the video with visualized tracking results is saved. You can add `--save_mot_txts` (save one txt per video), `--save_mot_txt_per_img` (save one txt per image), or `--save_images` (save visualization images of the tracking results).
- Each line of the tracking result txt file has the format `frame,id,x1,y1,w,h,score,-1,-1,-1`.


## Citation
```
@article{aharon2022bot,
  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
  journal={arXiv preprint arXiv:2206.14651},
  year={2022}
}
```
@@ -0,0 +1,89 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# BOT_SORT (BoT-SORT: Robust Associations Multi-Pedestrian Tracking)
|
||||
|
||||
## 内容
|
||||
- [简介](#简介)
|
||||
- [模型库](#模型库)
|
||||
- [快速开始](#快速开始)
|
||||
- [引用](#引用)
|
||||
|
||||
## 简介
|
||||
[BOT_SORT](https://arxiv.org/pdf/2206.14651v2.pdf)(BoT-SORT: Robust Associations Multi-Pedestrian Tracking)。此处提供了常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。
|
||||
|
||||
## 模型库
|
||||
|
||||
### BOT_SORT在MOT-17 half Val Set上结果
|
||||
|
||||
| 检测训练数据集 | 检测器 | 输入尺度 | 检测mAP | MOTA | IDF1 | 配置文件 |
|
||||
| :-------- | :----- | :----: | :------: | :----: |:-----: |:----: |
|
||||
| MOT-17 half train | PP-YOLOE-l | 640x640 | 52.7 | 55.5 | 64.2 |[配置文件](./botsort_ppyoloe.yml) |
|
||||
|
||||
|
||||
**注意:**
|
||||
- 模型权重下载链接在配置文件中的```det_weights```,运行验证的命令即可自动下载。
|
||||
- **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。
|
||||
|
||||
- BOT_SORT的训练是单独的检测器训练MOT数据集,推理是组装跟踪器去评估MOT指标,单独的检测模型也可以评估检测指标。
|
||||
- BOT_SORT的导出部署,是单独导出检测模型,再组装跟踪器运行的,参照[PP-Tracking](../../../deploy/pptracking/python)。
|
||||
- BOT_SORT是PP-Human和PP-Vehicle等Pipeline分析项目跟踪方向的主要方案,具体使用参照[Pipeline](../../../deploy/pipeline)和[MOT](../../../deploy/pipeline/docs/tutorials/pphuman_mot.md)。
|
||||
|
||||
|
||||
## 快速开始
|
||||
|
||||
### 1. 训练
|
||||
通过如下命令一键式启动训练和评估
|
||||
```bash
|
||||
#单卡训练
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
|
||||
|
||||
#多卡训练
|
||||
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
|
||||
```
|
||||
|
||||
### 2. 评估
|
||||
#### 2.1 评估检测效果
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。
|
||||
|
||||
#### 2.2 评估跟踪效果
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/botsort/botsort_ppyoloe.yml --scaled=True
|
||||
```
|
||||
**注意:**
|
||||
- `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True, 默认值是False。
|
||||
- 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。
|
||||
|
||||
### 3. 导出预测模型
|
||||
|
||||
```bash
|
||||
python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --output_dir=output_inference -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
```
|
||||
|
||||
### 4. 用导出的模型基于Python去预测
|
||||
|
||||
```bash
|
||||
# 下载demo视频
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
|
||||
|
||||
CUDA_VISIBLE_DEVICES=0 python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --threshold=0.5
|
||||
```
|
||||
**注意:**
|
||||
- 运行前需要手动修改`tracker_config.yml`的跟踪器类型为`type: BOTSORTTracker`。
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
||||
|
||||
## 引用
|
||||
```
|
||||
@article{aharon2022bot,
|
||||
title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
|
||||
author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
|
||||
journal={arXiv preprint arXiv:2206.14651},
|
||||
year={2022}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,75 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
|
||||
'../bytetrack/_base_/mot17.yml',
|
||||
'../bytetrack/_base_/ppyoloe_mot_reader_640x640.yml'
|
||||
]
|
||||
weights: output/botsort_ppyoloe/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode, set 'COCO' can be training mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOv3 # PPYOLOe version
|
||||
reid: None
|
||||
tracker: BOTSORTTracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
reid_weights: None
|
||||
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.1 # 0.01 in original detector
|
||||
nms_threshold: 0.4 # 0.6 in original detector
|
||||
|
||||
|
||||
BOTSORTTracker:
|
||||
track_high_thresh: 0.3
|
||||
track_low_thresh: 0.2
|
||||
new_track_thresh: 0.4
|
||||
match_thresh: 0.7
|
||||
track_buffer: 30
|
||||
min_box_area: 0
|
||||
camera_motion: False
|
||||
cmc_method: 'sparseOptFlow' # only camera_motion is True,
|
||||
# sparseOptFlow | files (Vidstab GMC) | orb | ecc
|
||||
|
||||
|
||||
# MOTDataset for MOT evaluation and inference
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT and ByteTrack
|
||||
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1 @@
README_cn.md
@@ -0,0 +1,195 @@
Simplified Chinese | [English](README.md)

# ByteTrack (ByteTrack: Multi-Object Tracking by Associating Every Detection Box)

## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
    - [Pedestrian Tracking](#pedestrian-tracking)
    - [Head Tracking](#head-tracking)
- [Multi-Class Adaptation](#multi-class-adaptation)
- [Quick Start](#quick-start)
- [Citations](#citations)


## Introduction
[ByteTrack](https://arxiv.org/abs/2110.06864) (ByteTrack: Multi-Object Tracking by Associating Every Detection Box) tracks by associating every detection box instead of only the high-score ones. For low-score detection boxes, it uses their similarity to existing tracklets to recover true objects and filter out background detections. Configurations for several common detectors are provided here for reference. Since different training datasets, input scales, numbers of training epochs, NMS threshold settings, etc. lead to differences in model accuracy and performance, please adapt them to your own needs.


## Model Zoo

### Pedestrian Tracking

#### Results of ByteTrack with different detectors on the MOT-17 half Val Set

| Detection training dataset | Detector | Input size | ReID | Detection mAP(0.5:0.95) | MOTA | IDF1 | FPS | Config |
| :-------- | :----- | :----: | :----:|:------: | :----: |:-----: |:----:|:----: |
| MOT-17 half train | YOLOv3 | 608x608 | - | 42.7 | 49.5 | 54.8 | - |[config](./bytetrack_yolov3.yml) |
| MOT-17 half train | PP-YOLOE-l | 640x640 | - | 52.9 | 50.4 | 59.7 | - |[config](./bytetrack_ppyoloe.yml) |
| MOT-17 half train | PP-YOLOE-l | 640x640 |PPLCNet| 52.9 | 51.7 | 58.8 | - |[config](./bytetrack_ppyoloe_pplcnet.yml) |
| **mix_mot_ch** | YOLOX-x | 800x1440| - | 61.9 | 77.3 | 71.6 | - |[config](./bytetrack_yolox.yml) |
| **mix_det** | YOLOX-x | 800x1440| - | 65.4 | 84.5 | 77.4 | - |[config](./bytetrack_yolox.yml) |

**Notes:**
- For detection-related configs and docs, see [detector](detector/).
- The model weight download links are the ```det_weights``` and ```reid_weights``` fields in the config files; running the ```tools/eval_mot.py``` evaluation command will download them automatically. If ```reid_weights``` is None, no ReID weights are used.
- **ByteTrack does not use ReID weights by default.** If you need ReID weights, refer to [bytetrack_ppyoloe_pplcnet.yml](./bytetrack_ppyoloe_pplcnet.yml); to **switch to your own ReID weights, change its `reid_weights: ` entry to your own weight path**.
- **MOT17-half train** is a dataset composed of the images and labels from the first half of the frames of each video in the MOT17 train sequences (7 in total). To verify accuracy, the **MOT17-half val** set, composed of the second half of the frames of each video, can be used for evaluation. Download it from this [link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) and extract it into `dataset/mot/`.
- The **mix_mot_ch** dataset is a joint dataset composed of MOT17 and CrowdHuman, and the **mix_det** dataset is a joint dataset composed of MOT17, CrowdHuman, Cityscapes and ETHZ. For the format and directory layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation); place them under the `dataset/mot/` directory. To verify accuracy, the **MOT17-half val** set can be used for evaluation in both cases.


#### Results of YOLOX-x ByteTrack (mix_det) on MOT-16/MOT-17

[](https://paperswithcode.com/sota/multi-object-tracking-on-mot16?p=pp-yoloe-an-evolved-version-of-yolo)

| Model | Test set | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :---------: | :-------: | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| ByteTrack-x| MOT-17 Train | 84.4 | 72.8 | 837 | 5653 | 10985 | - |[download](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| **MOT-17 Test** | **78.4** | 69.7 | 4974 | 37551 | 79524 | - |[download](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| MOT-16 Train | 83.5 | 72.7 | 800 | 6973 | 10419 | - |[download](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |
| ByteTrack-x| **MOT-16 Test** | **77.7** | 70.1 | 1570 | 15695 | 23304 | - |[download](https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) | [config](./bytetrack_yolox.yml) |


**Notes:**
- The **mix_det** dataset is a joint dataset composed of MOT17, CrowdHuman, Cityscapes and ETHZ; for its format and layout, see [this link](https://github.com/ifzhang/ByteTrack#data-preparation), and place it under the `dataset/mot/` directory.
- The MOT-17 Train and MOT-16 Train numbers are obtained by evaluating these data locally. Since the Train sets are included in the training data, these MOTA values do not represent the model's detection and tracking ability; they are reported only because MOT-17 and MOT-16 have no validation sets while their Train sets do have ground truth, which makes it convenient to verify accuracy.
- The MOT-17 Test and MOT-16 Test numbers are obtained by submitting results to the official [MOTChallenge](https://motchallenge.net) evaluation server. Since the ground truth of the MOT-17 and MOT-16 Test sets is not public, these MOTA values do represent the model's detection and tracking ability.
- For ByteTrack, training means training a standalone detector on the MOT dataset; inference assembles the tracker to evaluate MOT metrics, and the standalone detection model can also be evaluated with detection metrics.
- For export and deployment, ByteTrack exports the detection model separately and then assembles the tracker at runtime; refer to [PP-Tracking](../../../deploy/pptracking/python/README.md).


### Head Tracking

#### Results of YOLOX-x ByteTrack on the HT-21 Test Set

| Model | Input size | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [download](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](./bytetrack_yolox_ht21.yml) |

#### Results of YOLOX-x ByteTrack on the HT-21 Test Set

| Backbone | Input size | MOTA | IDF1 | IDS | FP | FN | FPS | Download | Config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [download](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [config](./bytetrack_yolox_ht21.yml) |

**Notes:**
- For more head tracking models, see [headtracking21](../headtracking21).


## Multi-Class Adaptation

For multi-class ByteTrack, refer to [bytetrack_ppyoloe_ppvehicle9cls.yml](./bytetrack_ppyoloe_ppvehicle9cls.yml), which uses model weights trained on the PPVehicle9cls dataset from [PP-Vehicle](../../ppvehicle/) for multi-class vehicle tracking. Since there is no tracking ground truth, evaluation is not possible and only tracking prediction is performed. You only need to modify `TestMOTDataset` and make sure the paths exist; its `anno_path` points to a `label_list.txt` that records the concrete classes, which you need to write by hand with one class per line. Note that if `anno_path` is wrong or cannot be found, the 80 COCO classes will be used by default.

To **switch to your own detector weights, change the `det_weights: ` entry to your own weight path**, and update the **dataset paths, `label_list.txt` and the number of classes** accordingly (a small sketch for generating `label_list.txt` is given below).
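
To illustrate the `label_list.txt` format (one class name per line), here is a small sketch that writes such a file; the class names and the output path below are placeholders, not the actual PPVehicle9cls categories:

```python
# Illustrative: write a label_list.txt with one class name per line.
# The names and path below are placeholders; replace them with your dataset's real classes.
def write_label_list(class_names, path="dataset/mot/label_list.txt"):
    with open(path, "w") as f:
        f.write("\n".join(class_names) + "\n")

write_label_list(["car", "truck", "bus"])  # hypothetical 3-class example
```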

Predict multi-class vehicle tracking:
```bash
# download the demo video
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/bdd100k_demo.mp4

# use the PP-YOLOE multi-class vehicle detection model
CUDA_VISIBLE_DEVICES=1 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_ppvehicle9cls.yml --video_file=bdd100k_demo.mp4 --scaled=True --save_videos
```

**Notes:**
- Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.
- `--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image. Set it to False if the detection model is the JDE YOLOv3, and True if a general detection model is used.
- `--save_videos` saves the visualized video; the visualization images are also saved in `{output_dir}/mot_outputs/`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.


## Quick Start

### 1. Training
Start training and evaluation with the following commands:
```bash
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
# or
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml --eval --amp
```

**Notes:**
- `--eval` evaluates accuracy during training; `--amp` enables mixed-precision training to avoid overflow. PaddlePaddle 2.2.2 is recommended.

### 2. Evaluation
#### 2.1 Evaluate detection
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
```

**Notes:**
- Detection evaluation uses ```tools/eval.py```; tracking evaluation uses ```tools/eval_mot.py```.

#### 2.2 Evaluate tracking
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolov3.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --scaled=True
```
**Notes:**
- `--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image. Set it to False if the detection model is JDE YOLOv3, and True if a general detection model is used. The default value is False.
- Tracking results are saved in `{output_dir}/mot_results/`; each video sequence corresponds to one txt file, and each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.

### 3. Prediction

Use a single GPU to predict a video and save the result as a video with the following commands:

```bash
# download the demo video
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4

# use the PP-YOLOE pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
# or use the YOLOX pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_yolox.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
```

**Notes:**
- Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.
- `--scaled` indicates whether the coordinates output by the model have already been scaled back to the original image. Set it to False if the detection model is the JDE YOLOv3, and True if a general detection model is used.
- `--save_videos` saves the visualized video; the visualization images are also saved in `{output_dir}/mot_outputs/`. `{output_dir}` can be set with `--output_dir` and defaults to `output`.


### 4. Export the inference models

Step 1: export the detection model
```bash
# export the PP-YOLOE pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
# or export the YOLOX pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
```

Step 2: export the ReID model (optional, not needed by default)
```bash
# export the PPLCNet ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```

### 5. Predict with the exported models using Python

```bash
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
# or
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/yolox_x_24e_800x1440_mix_det/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts
```

**Notes:**
- The tracking model predicts on videos; prediction on a single image is not supported. By default, the video with visualized tracking results is saved. You can add `--save_mot_txts` (save one txt per video), `--save_mot_txt_per_img` (save one txt per image), or `--save_images` (save visualization images of the tracking results).
- Each line of the tracking result txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1` (a small summarization sketch is given below).
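
For illustration, the following sketch (not part of PaddleDetection) summarizes one saved result file by counting frames and distinct track IDs, a simple building block for statistics such as object counting; the example path is assumed:

```python
# Illustrative summary of one tracking result txt ("frame,id,x1,y1,w,h,score,-1,-1,-1" per line).
def summarize_sequence(txt_path):
    frames, track_ids = set(), set()
    with open(txt_path) as f:
        for line in f:
            fields = line.strip().split(",")
            frames.add(int(float(fields[0])))
            track_ids.add(int(float(fields[1])))
    print(f"{txt_path}: {len(frames)} frames, {len(track_ids)} distinct track ids")

# e.g. summarize_sequence("output/mot_results/mot17_demo.txt")  # assumed example path
```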


## Citations
```
@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}
```
@@ -0,0 +1,34 @@
|
||||
metric: COCO
|
||||
num_classes: 1
|
||||
|
||||
# Detection Dataset for training
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
image_dir: images/train
|
||||
anno_path: annotations/train.json
|
||||
dataset_dir: dataset/mot/HT21
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
image_dir: images/train
|
||||
anno_path: annotations/val_half.json
|
||||
dataset_dir: dataset/mot/HT21
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
dataset_dir: dataset/mot/HT21
|
||||
anno_path: annotations/val_half.json
|
||||
|
||||
|
||||
# MOTDataset for MOT evaluation and inference
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: HT21/images/test
|
||||
keep_ori_im: True # set as True in DeepSORT and ByteTrack
|
||||
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,34 @@
|
||||
metric: COCO
|
||||
num_classes: 1
|
||||
|
||||
# Detection Dataset for training
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
image_dir: ""
|
||||
anno_path: annotations/train.json
|
||||
dataset_dir: dataset/mot/mix_det
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
image_dir: images/train
|
||||
anno_path: annotations/val_half.json
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
anno_path: annotations/val_half.json
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
|
||||
|
||||
# MOTDataset for MOT evaluation and inference
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT and ByteTrack
|
||||
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,34 @@
|
||||
metric: COCO
|
||||
num_classes: 1
|
||||
|
||||
# Detection Dataset for training
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
image_dir: ""
|
||||
anno_path: annotations/train.json
|
||||
dataset_dir: dataset/mot/mix_mot_ch
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
image_dir: images/train
|
||||
anno_path: annotations/val_half.json
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
anno_path: annotations/val_half.json
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
|
||||
|
||||
# MOTDataset for MOT evaluation and inference
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT and ByteTrack
|
||||
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,34 @@
|
||||
metric: COCO
|
||||
num_classes: 1
|
||||
|
||||
# Detection Dataset for training
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/train_half.json
|
||||
image_dir: images/train
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
image_dir: images/train
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
|
||||
|
||||
# MOTDataset for MOT evaluation and inference
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT and ByteTrack
|
||||
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,60 @@
|
||||
worker_num: 4
|
||||
eval_height: &eval_height 640
|
||||
eval_width: &eval_width 640
|
||||
eval_size: &eval_size [*eval_height, *eval_width]
|
||||
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RandomDistort: {}
|
||||
- RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
|
||||
- RandomCrop: {}
|
||||
- RandomFlip: {}
|
||||
batch_transforms:
|
||||
- BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
- PadGT: {}
|
||||
batch_size: 8
|
||||
shuffle: true
|
||||
drop_last: true
|
||||
use_shared_memory: true
|
||||
collate_batch: true
|
||||
|
||||
EvalReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 8
|
||||
|
||||
TestReader:
|
||||
inputs_def:
|
||||
image_shape: [3, *eval_height, *eval_width]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, *eval_height, *eval_width]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,66 @@
|
||||
worker_num: 2
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
num_max_boxes: 50
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Mixup: {alpha: 1.5, beta: 1.5}
|
||||
- RandomDistort: {}
|
||||
- RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
|
||||
- RandomCrop: {}
|
||||
- RandomFlip: {}
|
||||
batch_transforms:
|
||||
- BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
|
||||
- NormalizeBox: {}
|
||||
- PadBox: {num_max_boxes: 50}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
- Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
|
||||
batch_size: 8
|
||||
shuffle: true
|
||||
drop_last: true
|
||||
mixup_epoch: 250
|
||||
use_shared_memory: true
|
||||
|
||||
EvalReader:
|
||||
inputs_def:
|
||||
num_max_boxes: 50
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 8
|
||||
|
||||
TestReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 608]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
|
||||
EvalMOTReader:
|
||||
inputs_def:
|
||||
num_max_boxes: 50
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 608]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,67 @@
|
||||
|
||||
input_height: &input_height 800
|
||||
input_width: &input_width 1440
|
||||
input_size: &input_size [*input_height, *input_width]
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Mosaic:
|
||||
prob: 1.0
|
||||
input_dim: *input_size
|
||||
degrees: [-10, 10]
|
||||
scale: [0.1, 2.0]
|
||||
shear: [-2, 2]
|
||||
translate: [-0.1, 0.1]
|
||||
enable_mixup: True
|
||||
mixup_prob: 1.0
|
||||
mixup_scale: [0.5, 1.5]
|
||||
- AugmentHSV: {is_bgr: False, hgain: 5, sgain: 30, vgain: 30}
|
||||
- PadResize: {target_size: *input_size}
|
||||
- RandomFlip: {}
|
||||
batch_transforms:
|
||||
- Permute: {}
|
||||
batch_size: 6
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
collate_batch: False
|
||||
mosaic_epoch: 20
|
||||
|
||||
EvalReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *input_size, keep_ratio: True}
|
||||
- Pad: {size: *input_size, fill_value: [114., 114., 114.]}
|
||||
- Permute: {}
|
||||
batch_size: 8
|
||||
|
||||
TestReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 800, 1440]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *input_size, keep_ratio: True}
|
||||
- Pad: {size: *input_size, fill_value: [114., 114., 114.]}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *input_size, keep_ratio: True}
|
||||
- Pad: {size: *input_size, fill_value: [114., 114., 114.]}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 800, 1440]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: *input_size, keep_ratio: True}
|
||||
- Pad: {size: *input_size, fill_value: [114., 114., 114.]}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,59 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/ppyoloe_mot_reader_640x640.yml'
|
||||
]
|
||||
weights: output/bytetrack_ppyoloe/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode, set to 'COCO' for training mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOv3 # PPYOLOe version
|
||||
reid: None
|
||||
tracker: JDETracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
reid_weights: None
|
||||
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.1 # 0.01 in original detector
|
||||
nms_threshold: 0.4 # 0.6 in original detector
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.2
|
||||
low_conf_thres: 0.1
|
||||
min_box_area: 100
|
||||
vertical_ratio: 1.6 # for pedestrian
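
ByteTrack's `use_byte: True` association first matches existing tracks against high-score detections and then retries unmatched tracks against low-score ones. Below is a minimal sketch of how `conf_thres` and `low_conf_thres` split the detections for those two rounds; it is an illustration only, not the PaddleDetection tracker code:

```python
import numpy as np

def byte_split(boxes, scores, conf_thres=0.2, low_conf_thres=0.1):
    """Split detections into the two candidate sets used by BYTE association.

    boxes: (N, 4) array, scores: (N,) array; the default thresholds mirror the
    JDETracker config above. Tracks left unmatched after the high-score round
    are compared (by IoU) against the low-score set, which is what lets
    ByteTrack keep occluded targets alive.
    """
    high = scores >= conf_thres                 # first-round candidates
    low = (scores >= low_conf_thres) & ~high    # second-round candidates
    return boxes[high], scores[high], boxes[low], scores[low]
```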
|
||||
@@ -0,0 +1,59 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/ppyoloe_mot_reader_640x640.yml'
|
||||
]
|
||||
weights: output/bytetrack_ppyoloe_pplcnet/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOv3 # PPYOLOe version
|
||||
reid: PPLCNetEmbedding # use reid
|
||||
tracker: JDETracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
reid_weights: https://bj.bcebos.com/v1/paddledet/models/mot/deepsort_pplcnet.pdparams
|
||||
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.1 # 0.01 in original detector
|
||||
nms_threshold: 0.4 # 0.6 in original detector
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.2
|
||||
low_conf_thres: 0.1
|
||||
min_box_area: 100
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,49 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'bytetrack_ppyoloe.yml',
|
||||
'_base_/ppyoloe_mot_reader_640x640.yml'
|
||||
]
|
||||
weights: output/bytetrack_ppyoloe_ppvehicle9cls/model_final
|
||||
|
||||
metric: MCMOT # multi-class, `MOT` for single class
|
||||
num_classes: 9
|
||||
# pedestrian(1), rider(2), car(3), truck(4), bus(5), van(6), motorcycle(7), bicycle(8), others(9)
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
anno_path: dataset/mot/label_list.txt # absolute path
|
||||
|
||||
### write in label_list.txt each line:
|
||||
# pedestrian
|
||||
# rider
|
||||
# car
|
||||
# truck
|
||||
# bus
|
||||
# van
|
||||
# motorcycle
|
||||
# bicycle
|
||||
# others
|
||||
###
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot_ppyoloe_l_36e_ppvehicle9cls.pdparams
|
||||
depth_mult: 1.0
|
||||
width_mult: 1.0
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.1 # 0.01 in original detector
|
||||
nms_threshold: 0.4 # 0.6 in original detector
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.2
|
||||
low_conf_thres: 0.1
|
||||
min_box_area: 0
|
||||
vertical_ratio: 0 # only use 1.6 in MOT17 pedestrian
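
For reference, `min_box_area` and `vertical_ratio` act as a post-filter on tracked boxes; setting both to 0 here disables the filter, while the MOT17 pedestrian configs use 100 and 1.6. A rough, assumption-level sketch of that filtering logic (not the exact tracker code):

```python
def keep_tracked_box(w, h, min_box_area=0, vertical_ratio=0):
    """Return True if a tracked box passes the area / aspect-ratio post-filter."""
    if min_box_area > 0 and w * h <= min_box_area:
        return False  # drop tiny boxes that are unreliable to track
    if vertical_ratio > 0 and w / max(h, 1e-6) > vertical_ratio:
        return False  # drop boxes too wide to be an upright pedestrian
    return True
```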
|
||||
@@ -0,0 +1,50 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'detector/yolov3_darknet53_40e_608x608_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/yolov3_mot_reader_608x608.yml'
|
||||
]
|
||||
weights: output/bytetrack_yolov3/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOv3 # General YOLOv3 version
|
||||
reid: None
|
||||
tracker: JDETracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolov3_darknet53_40e_608x608_mot17half.pdparams
|
||||
reid_weights: None
|
||||
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.005
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.45
|
||||
nms_top_k: 1000
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.2
|
||||
low_conf_thres: 0.1
|
||||
min_box_area: 100
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,68 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'detector/yolox_x_24e_800x1440_mix_det.yml',
|
||||
'_base_/mix_det.yml',
|
||||
'_base_/yolox_mot_reader_800x1440.yml'
|
||||
]
|
||||
weights: output/bytetrack_yolox/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOX
|
||||
reid: None
|
||||
tracker: JDETracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams
|
||||
reid_weights: None
|
||||
|
||||
depth_mult: 1.33
|
||||
width_mult: 1.25
|
||||
|
||||
YOLOX:
|
||||
backbone: CSPDarkNet
|
||||
neck: YOLOCSPPAN
|
||||
head: YOLOXHead
|
||||
input_size: [800, 1440]
|
||||
size_stride: 32
|
||||
size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
|
||||
|
||||
CSPDarkNet:
|
||||
arch: "X"
|
||||
return_idx: [2, 3, 4]
|
||||
depthwise: False
|
||||
|
||||
YOLOCSPPAN:
|
||||
depthwise: False
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
YOLOXHead:
|
||||
l1_epoch: 20
|
||||
depthwise: False
|
||||
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
|
||||
assigner:
|
||||
name: SimOTAAssigner
|
||||
candidate_topk: 10
|
||||
use_vfl: False
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.7
|
||||
# For speed while keeping high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
|
||||
# For a high-speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
|
||||
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.6
|
||||
low_conf_thres: 0.2
|
||||
min_box_area: 100
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,68 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'detector/yolox_x_24e_800x1440_ht21.yml',
|
||||
'_base_/ht21.yml',
|
||||
'_base_/yolox_mot_reader_800x1440.yml'
|
||||
]
|
||||
weights: output/bytetrack_yolox_ht21/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
metric: MOT # eval/infer mode
|
||||
num_classes: 1
|
||||
|
||||
architecture: ByteTrack
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
|
||||
ByteTrack:
|
||||
detector: YOLOX
|
||||
reid: None
|
||||
tracker: JDETracker
|
||||
det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_ht21.pdparams
|
||||
reid_weights: None
|
||||
|
||||
depth_mult: 1.33
|
||||
width_mult: 1.25
|
||||
|
||||
YOLOX:
|
||||
backbone: CSPDarkNet
|
||||
neck: YOLOCSPPAN
|
||||
head: YOLOXHead
|
||||
input_size: [800, 1440]
|
||||
size_stride: 32
|
||||
size_range: [18, 22] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
|
||||
|
||||
CSPDarkNet:
|
||||
arch: "X"
|
||||
return_idx: [2, 3, 4]
|
||||
depthwise: False
|
||||
|
||||
YOLOCSPPAN:
|
||||
depthwise: False
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
YOLOXHead:
|
||||
l1_epoch: 20
|
||||
depthwise: False
|
||||
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
|
||||
assigner:
|
||||
name: SimOTAAssigner
|
||||
candidate_topk: 10
|
||||
use_vfl: False
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 30000
|
||||
keep_top_k: 1000
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.7
|
||||
# For speed while keeping high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
|
||||
# For a high-speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
|
||||
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.9
|
||||
conf_thres: 0.7
|
||||
low_conf_thres: 0.1
|
||||
min_box_area: 0
|
||||
vertical_ratio: 0 # 1.6 for pedestrian
|
||||
@@ -0,0 +1 @@
|
||||
README_cn.md
|
||||
@@ -0,0 +1,39 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# ByteTrack的检测器
|
||||
|
||||
## 简介
|
||||
[ByteTrack](https://arxiv.org/abs/2110.06864)(ByteTrack: Multi-Object Tracking by Associating Every Detection Box) 通过关联每个检测框来跟踪,而不仅是关联高分的检测框。此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。
|
||||
|
||||
## 模型库
|
||||
|
||||
### 在MOT17-half val数据集上的检测结果
|
||||
| 骨架网络 | 网络类型 | 输入尺度 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
|
||||
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
|
||||
| DarkNet-53 | YOLOv3 | 608x608 | 40e | ---- | 42.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [配置文件](./yolov3_darknet53_40e_608x608_mot17half.yml) |
|
||||
| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [配置文件](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
|
||||
| CSPDarkNet | YOLOX-x(mix_mot_ch) | 800x1440 | 24e | ---- | 61.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_mot_ch.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_mot_ch.yml) |
|
||||
| CSPDarkNet | YOLOX-x(mix_det) | 800x1440 | 24e | ---- | 65.4 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) | [配置文件](./yolox_x_24e_800x1440_mix_det.yml) |
|
||||
|
||||
**注意:**
|
||||
- 以上模型除YOLOX外采用**MOT17-half train**数据集训练,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载。
|
||||
- **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip)下载,并解压放在`dataset/mot/MOT17/images/`文件夹下。
|
||||
- YOLOX-x(mix_mot_ch)采用**mix_mot_ch**数据集,是MOT17、CrowdHuman组成的联合数据集;YOLOX-x(mix_det)采用**mix_det**数据集,是MOT17、CrowdHuman、Cityscapes、ETHZ组成的联合数据集,数据集整理的格式和目录可以参考[此链接](https://github.com/ifzhang/ByteTrack#data-preparation),最终放置于`dataset/mot/`目录下。为了验证精度可以都用**MOT17-half val**数据集去评估。
|
||||
- 行人跟踪请使用行人检测器结合行人ReID模型。车辆跟踪请使用车辆检测器结合车辆ReID模型。
|
||||
- 用于ByteTrack跟踪时,这些模型的NMS阈值等后处理设置会与纯检测任务的设置不同。
|
||||
|
||||
|
||||
## 快速开始
|
||||
|
||||
通过如下命令一键式启动训练、评估和导出
|
||||
```bash
|
||||
job_name=ppyoloe_crn_l_36e_640x640_mot17half
|
||||
config=configs/mot/bytetrack/detector/${job_name}.yml
|
||||
log_dir=log_dir/${job_name}
|
||||
# 1. training
|
||||
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
|
||||
# 2. evaluation
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=output/${job_name}/model_final.pdparams
|
||||
# 3. export
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=output/${job_name}/model_final.pdparams
|
||||
```
|
||||
@@ -0,0 +1,83 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../../../ppyoloe/ppyoloe_crn_l_300e_coco.yml',
|
||||
'../_base_/mot17.yml',
|
||||
]
|
||||
weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 36
|
||||
LearningRate:
|
||||
base_lr: 0.001
|
||||
schedulers:
|
||||
- !CosineDecay
|
||||
max_epochs: 43
|
||||
- !LinearWarmup
|
||||
start_factor: 0.001
|
||||
epochs: 1
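
Read together, the schedule above ramps linearly from `base_lr * start_factor` to `base_lr` over the first epoch and then follows a cosine curve stretched over `max_epochs: 43`, so training (36 epochs) stops before the cosine reaches zero. A hedged sketch using the standard warmup + cosine formulas (not the exact PaddleDetection scheduler):

```python
import math

def approx_lr(epoch, base_lr=0.001, start_factor=0.001, warmup_epochs=1, max_epochs=43):
    """Approximate learning rate at a (possibly fractional) epoch for this config."""
    if epoch < warmup_epochs:  # linear warmup
        frac = epoch / warmup_epochs
        return base_lr * (start_factor + (1.0 - start_factor) * frac)
    # cosine decay towards 0 over max_epochs
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / max_epochs))
```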
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
momentum: 0.9
|
||||
type: Momentum
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
|
||||
TrainReader:
|
||||
batch_size: 8
|
||||
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
|
||||
depth_mult: 1.0
|
||||
width_mult: 1.0
|
||||
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
CSPResNet:
|
||||
layers: [3, 6, 6, 3]
|
||||
channels: [64, 128, 256, 512, 1024]
|
||||
return_idx: [1, 2, 3]
|
||||
use_large_stem: True
|
||||
|
||||
CustomCSPPAN:
|
||||
out_channels: [768, 384, 192]
|
||||
stage_num: 1
|
||||
block_num: 3
|
||||
act: 'swish'
|
||||
spp: true
|
||||
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.6
|
||||
@@ -0,0 +1,77 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../../../yolov3/yolov3_darknet53_270e_coco.yml',
|
||||
'../_base_/mot17.yml',
|
||||
]
|
||||
weights: output/yolov3_darknet53_40e_608x608_mot17half/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 40
|
||||
LearningRate:
|
||||
base_lr: 0.0001
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones:
|
||||
- 32
|
||||
- 36
|
||||
- !LinearWarmup
|
||||
start_factor: 0.3333333333333333
|
||||
steps: 100
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
momentum: 0.9
|
||||
type: Momentum
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
TrainReader:
|
||||
batch_size: 8
|
||||
mixup_epoch: 35
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams
|
||||
norm_type: sync_bn
|
||||
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
DarkNet:
|
||||
depth: 53
|
||||
return_idx: [2, 3, 4]
|
||||
|
||||
# use default config
|
||||
# YOLOv3FPN:
|
||||
|
||||
YOLOv3Head:
|
||||
anchors: [[10, 13], [16, 30], [33, 23],
|
||||
[30, 61], [62, 45], [59, 119],
|
||||
[116, 90], [156, 198], [373, 326]]
|
||||
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
|
||||
loss: YOLOv3Loss
|
||||
|
||||
YOLOv3Loss:
|
||||
ignore_thresh: 0.7
|
||||
downsample: [32, 16, 8]
|
||||
label_smooth: false
|
||||
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.005
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.45
|
||||
nms_top_k: 1000
|
||||
@@ -0,0 +1,80 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../../../yolox/yolox_x_300e_coco.yml',
|
||||
'../_base_/ht21.yml',
|
||||
]
|
||||
weights: output/yolox_x_24e_800x1440_ht21/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 24
|
||||
LearningRate:
|
||||
base_lr: 0.0005 # finetune
|
||||
schedulers:
|
||||
- !CosineDecay
|
||||
max_epochs: 24
|
||||
min_lr_ratio: 0.05
|
||||
last_plateau_epochs: 4
|
||||
- !ExpWarmup
|
||||
epochs: 1
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Momentum
|
||||
momentum: 0.9
|
||||
use_nesterov: True
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
|
||||
TrainReader:
|
||||
batch_size: 4
|
||||
mosaic_epoch: 20
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOX
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
|
||||
norm_type: sync_bn
|
||||
use_ema: True
|
||||
ema_decay: 0.9999
|
||||
ema_decay_type: "exponential"
|
||||
act: silu
|
||||
find_unused_parameters: True
|
||||
depth_mult: 1.33
|
||||
width_mult: 1.25
|
||||
|
||||
YOLOX:
|
||||
backbone: CSPDarkNet
|
||||
neck: YOLOCSPPAN
|
||||
head: YOLOXHead
|
||||
input_size: [800, 1440]
|
||||
size_stride: 32
|
||||
size_range: [18, 32] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
|
||||
|
||||
CSPDarkNet:
|
||||
arch: "X"
|
||||
return_idx: [2, 3, 4]
|
||||
depthwise: False
|
||||
|
||||
YOLOCSPPAN:
|
||||
depthwise: False
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
YOLOXHead:
|
||||
l1_epoch: 20
|
||||
depthwise: False
|
||||
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
|
||||
assigner:
|
||||
name: SimOTAAssigner
|
||||
candidate_topk: 10
|
||||
use_vfl: False
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.7
|
||||
# For speed while keeping high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
|
||||
# For a high-speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
|
||||
@@ -0,0 +1,80 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../../../yolox/yolox_x_300e_coco.yml',
|
||||
'../_base_/mix_det.yml',
|
||||
]
|
||||
weights: output/yolox_x_24e_800x1440_mix_det/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 24
|
||||
LearningRate:
|
||||
base_lr: 0.00075 # finetune
|
||||
schedulers:
|
||||
- !CosineDecay
|
||||
max_epochs: 24
|
||||
min_lr_ratio: 0.05
|
||||
last_plateau_epochs: 4
|
||||
- !ExpWarmup
|
||||
epochs: 1
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Momentum
|
||||
momentum: 0.9
|
||||
use_nesterov: True
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
|
||||
TrainReader:
|
||||
batch_size: 6
|
||||
mosaic_epoch: 20
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOX
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
|
||||
norm_type: sync_bn
|
||||
use_ema: True
|
||||
ema_decay: 0.9999
|
||||
ema_decay_type: "exponential"
|
||||
act: silu
|
||||
find_unused_parameters: True
|
||||
depth_mult: 1.33
|
||||
width_mult: 1.25
|
||||
|
||||
YOLOX:
|
||||
backbone: CSPDarkNet
|
||||
neck: YOLOCSPPAN
|
||||
head: YOLOXHead
|
||||
input_size: [800, 1440]
|
||||
size_stride: 32
|
||||
size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
|
||||
|
||||
CSPDarkNet:
|
||||
arch: "X"
|
||||
return_idx: [2, 3, 4]
|
||||
depthwise: False
|
||||
|
||||
YOLOCSPPAN:
|
||||
depthwise: False
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
YOLOXHead:
|
||||
l1_epoch: 20
|
||||
depthwise: False
|
||||
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
|
||||
assigner:
|
||||
name: SimOTAAssigner
|
||||
candidate_topk: 10
|
||||
use_vfl: False
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.7
|
||||
# For speed while keeping high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
|
||||
# For a high-speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
|
||||
@@ -0,0 +1,80 @@
|
||||
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
|
||||
_BASE_: [
|
||||
'../../../yolox/yolox_x_300e_coco.yml',
|
||||
'../_base_/mix_mot_ch.yml',
|
||||
]
|
||||
weights: output/yolox_x_24e_800x1440_mix_mot_ch/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 24
|
||||
LearningRate:
|
||||
base_lr: 0.00075 # fine-tune
|
||||
schedulers:
|
||||
- !CosineDecay
|
||||
max_epochs: 24
|
||||
min_lr_ratio: 0.05
|
||||
last_plateau_epochs: 4
|
||||
- !ExpWarmup
|
||||
epochs: 1
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Momentum
|
||||
momentum: 0.9
|
||||
use_nesterov: True
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
|
||||
TrainReader:
|
||||
batch_size: 6
|
||||
mosaic_epoch: 20
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOX
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolox_x_300e_coco.pdparams
|
||||
norm_type: sync_bn
|
||||
use_ema: True
|
||||
ema_decay: 0.9999
|
||||
ema_decay_type: "exponential"
|
||||
act: silu
|
||||
find_unused_parameters: True
|
||||
depth_mult: 1.33
|
||||
width_mult: 1.25
|
||||
|
||||
YOLOX:
|
||||
backbone: CSPDarkNet
|
||||
neck: YOLOCSPPAN
|
||||
head: YOLOXHead
|
||||
input_size: [800, 1440]
|
||||
size_stride: 32
|
||||
size_range: [18, 30] # multi-scale range [576*1024 ~ 800*1440], w/h ratio=1.8
|
||||
|
||||
CSPDarkNet:
|
||||
arch: "X"
|
||||
return_idx: [2, 3, 4]
|
||||
depthwise: False
|
||||
|
||||
YOLOCSPPAN:
|
||||
depthwise: False
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
YOLOXHead:
|
||||
l1_epoch: 20
|
||||
depthwise: False
|
||||
loss_weight: {cls: 1.0, obj: 1.0, iou: 5.0, l1: 1.0}
|
||||
assigner:
|
||||
name: SimOTAAssigner
|
||||
candidate_topk: 10
|
||||
use_vfl: False
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.7
|
||||
# For speed while keeping high mAP, you can modify 'nms_top_k' to 1000 and 'keep_top_k' to 100; the mAP will drop by about 0.1%.
|
||||
# For a high-speed demo, you can modify 'score_threshold' to 0.25 and 'nms_threshold' to 0.45, but the mAP will drop a lot.
|
||||
@@ -0,0 +1 @@
|
||||
README_cn.md
|
||||
@@ -0,0 +1,156 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# CenterTrack (Tracking Objects as Points)
|
||||
|
||||
## 内容
|
||||
- [模型库](#模型库)
|
||||
- [快速开始](#快速开始)
|
||||
- [引用](#引用)
|
||||
|
||||
## 模型库
|
||||
|
||||
### MOT17
|
||||
|
||||
| 训练数据集 | 输入尺度 | 总batch_size | val MOTA | test MOTA | FPS | 配置文件 | 下载链接|
|
||||
| :---------------: | :-------: | :------------: | :----------------: | :---------: | :-------: | :----: | :-----: |
|
||||
| MOT17-half train | 544x960 | 32 | 69.2(MOT17-half) | - | - |[config](./centertrack_dla34_70e_mot17half.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17half.pdparams) |
|
||||
| MOT17 train | 544x960 | 32 | 87.9(MOT17-train) | 70.5(MOT17-test) | - |[config](./centertrack_dla34_70e_mot17.yml) | [download](https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17.pdparams) |
|
||||
| MOT17 train(paper) | 544x960| 32 | - | 67.8(MOT17-test) | - | - | - |
|
||||
|
||||
|
||||
**注意:**
|
||||
- CenterTrack默认使用2 GPUs总batch_size为32进行训练,如改变GPU数或单卡batch_size,最好保持总batch_size为32去训练。
|
||||
- **val MOTA**可能会有1.0 MOTA左右的波动,最好使用2 GPUs和总batch_size为32的默认配置去训练。
|
||||
- **MOT17-half train**是MOT17的train序列(共7个)每个视频的**前一半帧**的图片和标注用作训练集,而用每个视频的后一半帧组成的**MOT17-half val**作为验证集去评估得到**val MOTA**,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载,并解压放在`dataset/mot/`文件夹下。
|
||||
- **MOT17 train**是MOT17的train序列(共7个)每个视频的所有帧的图片和标注用作训练集,由于MOT17数据集有限也使用**MOT17 train**数据集去评估得到**val MOTA**,而**test MOTA**为交到[MOT Challenge官网](https://motchallenge.net)评测的结果。
|
||||
|
||||
|
||||
## 快速开始
|
||||
|
||||
### 1.训练
|
||||
通过如下命令一键式启动训练和评估
|
||||
```bash
|
||||
# 单卡训练(不推荐)
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --amp
|
||||
# 多卡训练
|
||||
python -m paddle.distributed.launch --log_dir=centertrack_dla34_70e_mot17half/ --gpus 0,1 tools/train.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --amp
|
||||
```
|
||||
**注意:**
|
||||
- `--eval`暂不支持边训练边验证跟踪的MOTA精度;如果需要开启`--eval`边训练边验证检测mAP,需要先**注释掉配置文件中的`mot_metric: True`和`metric: MOT`**;
|
||||
- `--amp`表示混合精度训练避免显存溢出;
|
||||
- CenterTrack默认使用2 GPUs总batch_size为32进行训练,如改变GPU数或单卡batch_size,最好保持总batch_size仍然为32;
|
||||
|
||||
|
||||
### 2.评估
|
||||
|
||||
#### 2.1 评估检测效果
|
||||
|
||||
注意首先需要**注释掉配置文件中的`mot_metric: True`和`metric: MOT`**:
|
||||
```yaml
|
||||
### for detection eval.py/infer.py
|
||||
mot_metric: False
|
||||
metric: COCO
|
||||
|
||||
### for MOT eval_mot.py/infer_mot.py
|
||||
#mot_metric: True # 默认是不注释的,评估跟踪需要为 True,会覆盖之前的 mot_metric: False
|
||||
#metric: MOT # 默认是不注释的,评估跟踪需要使用 MOT,会覆盖之前的 metric: COCO
|
||||
```
|
||||
|
||||
然后执行以下语句:
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。
|
||||
|
||||
#### 2.2 评估跟踪效果
|
||||
|
||||
注意首先确保设置了**配置文件中的`mot_metric: True`和`metric: MOT`**;
|
||||
|
||||
然后执行以下语句:
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
|
||||
```
|
||||
**注意:**
|
||||
- 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。
|
||||
- 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。
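
作为补充,下面给出一个读取上述MOT结果txt并按跟踪ID聚合的最小Python示例(字段顺序按本说明假定,仅作格式示意,并非PaddleDetection内部实现):

```python
from collections import defaultdict

def load_mot_results(txt_path):
    """按跟踪ID聚合 frame,id,x1,y1,w,h,score,-1,-1,-1 格式的跟踪结果。"""
    tracks = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            vals = line.strip().split(',')
            frame, tid = int(float(vals[0])), int(float(vals[1]))
            x1, y1, w, h, score = map(float, vals[2:7])
            tracks[tid].append((frame, x1, y1, w, h, score))
    return tracks
```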
|
||||
|
||||
|
||||
### 3.预测
|
||||
|
||||
#### 3.1 预测检测效果
|
||||
注意首先需要**注释掉配置文件中的`mot_metric: True`和`metric: MOT`**:
|
||||
```yaml
|
||||
### for detection eval.py/infer.py
|
||||
mot_metric: False
|
||||
metric: COCO
|
||||
|
||||
### for MOT eval_mot.py/infer_mot.py
|
||||
#mot_metric: True # 默认是不注释的,评估跟踪需要为 True,会覆盖之前的 mot_metric: False
|
||||
#metric: MOT # 默认是不注释的,评估跟踪需要使用 MOT,会覆盖之前的 metric: COCO
|
||||
```
|
||||
|
||||
然后执行以下语句:
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 预测检测使用的是```tools/infer.py```, 预测跟踪使用的是```tools/infer_mot.py```。
|
||||
|
||||
|
||||
#### 3.2 预测跟踪效果
|
||||
|
||||
注意首先确保设置了**配置文件中的`mot_metric: True`和`metric: MOT`**;
|
||||
|
||||
然后执行以下语句:
|
||||
```bash
|
||||
# 下载demo视频
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
|
||||
# 预测视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --video_file=mot17_demo.mp4 --draw_threshold=0.5 --save_videos -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
|
||||
#或预测图片文件夹
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml --image_dir=mot17_demo/ --draw_threshold=0.5 --save_videos -o weights=output/centertrack_dla34_70e_mot17half/model_final.pdparams
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
|
||||
- `--save_videos`表示保存可视化视频,同时会保存可视化的图片在`{output_dir}/mot_outputs/`中,`{output_dir}`可通过`--output_dir`设置,默认文件夹名为`output`。
|
||||
|
||||
|
||||
### 4. 导出预测模型
|
||||
|
||||
注意首先确保设置了**配置文件中的`mot_metric: True`和`metric: MOT`**;
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/centertrack/centertrack_dla34_70e_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/centertrack_dla34_70e_mot17half.pdparams
|
||||
```
|
||||
|
||||
### 5. 用导出的模型基于Python去预测
|
||||
|
||||
注意首先应在`deploy/python/tracker_config.yml`中设置`type: CenterTracker`。
|
||||
|
||||
```bash
|
||||
# 预测某个视频
|
||||
# wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
|
||||
python deploy/python/mot_centertrack_infer.py --model_dir=output_inference/centertrack_dla34_70e_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_images=True --save_mot_txts
|
||||
# 预测图片文件夹
|
||||
python deploy/python/mot_centertrack_infer.py --model_dir=output_inference/centertrack_dla34_70e_mot17half/ --tracker_config=deploy/python/tracker_config.yml --image_dir=mot17_demo/ --device=GPU --save_images=True --save_mot_txts
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`(对每个视频保存一个txt)或`--save_mot_txt_per_img`(对每张图片保存一个txt)表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
||||
|
||||
## 引用
|
||||
```
|
||||
@article{zhou2020tracking,
|
||||
title={Tracking Objects as Points},
|
||||
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
|
||||
journal={ECCV},
|
||||
year={2020}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,57 @@
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams
|
||||
architecture: CenterTrack
|
||||
for_mot: True
|
||||
mot_metric: True
|
||||
|
||||
### model
|
||||
CenterTrack:
|
||||
detector: CenterNet
|
||||
plugin_head: CenterTrackHead
|
||||
tracker: CenterTracker
|
||||
|
||||
|
||||
### CenterTrack.detector
|
||||
CenterNet:
|
||||
backbone: DLA
|
||||
neck: CenterNetDLAFPN
|
||||
head: CenterNetHead
|
||||
post_process: CenterNetPostProcess
|
||||
for_mot: True # Note
|
||||
|
||||
DLA:
|
||||
depth: 34
|
||||
pre_img: True # Note
|
||||
pre_hm: True # Note
|
||||
|
||||
CenterNetDLAFPN:
|
||||
down_ratio: 4
|
||||
last_level: 5
|
||||
out_channel: 0
|
||||
dcn_v2: True
|
||||
|
||||
CenterNetHead:
|
||||
head_planes: 256
|
||||
prior_bias: -4.6 # Note
|
||||
regress_ltrb: False
|
||||
size_loss: 'L1'
|
||||
loss_weight: {'heatmap': 1.0, 'size': 0.1, 'offset': 1.0}
|
||||
|
||||
CenterNetPostProcess:
|
||||
max_per_img: 100 # top-K
|
||||
regress_ltrb: False
|
||||
|
||||
|
||||
### CenterTrack.plugin_head
|
||||
CenterTrackHead:
|
||||
head_planes: 256
|
||||
task: tracking
|
||||
loss_weight: {'tracking': 1.0, 'ltrb_amodal': 0.1}
|
||||
add_ltrb_amodal: True
|
||||
|
||||
|
||||
### CenterTrack.tracker
|
||||
CenterTracker:
|
||||
min_box_area: -1
|
||||
vertical_ratio: -1
|
||||
track_thresh: 0.4
|
||||
pre_thresh: 0.5
|
||||
@@ -0,0 +1,75 @@
|
||||
input_h: &input_h 544
|
||||
input_w: &input_w 960
|
||||
input_size: &input_size [*input_h, *input_w]
|
||||
pre_img_epoch: &pre_img_epoch 70 # Add previous image as input
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- FlipWarpAffine:
|
||||
keep_res: False
|
||||
input_h: *input_h
|
||||
input_w: *input_w
|
||||
not_rand_crop: False
|
||||
flip: 0.5
|
||||
is_scale: True
|
||||
use_random: True
|
||||
add_pre_img: True
|
||||
- CenterRandColor: {saturation: 0.4, contrast: 0.4, brightness: 0.4}
|
||||
- Lighting: {alphastd: 0.1, eigval: [0.2141788, 0.01817699, 0.00341571], eigvec: [[-0.58752847, -0.69563484, 0.41340352], [-0.5832747, 0.00994535, -0.81221408], [-0.56089297, 0.71832671, 0.41158938]]}
|
||||
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: False}
|
||||
- Permute: {}
|
||||
- Gt2CenterTrackTarget:
|
||||
down_ratio: 4
|
||||
max_objs: 256
|
||||
hm_disturb: 0.05
|
||||
lost_disturb: 0.4
|
||||
fp_disturb: 0.1
|
||||
pre_hm: True
|
||||
add_tracking: True
|
||||
add_ltrb_amodal: True
|
||||
batch_size: 16 # total 32 for 2 GPUs
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
collate_batch: True
|
||||
use_shared_memory: True
|
||||
pre_img_epoch: *pre_img_epoch
|
||||
|
||||
|
||||
EvalReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- WarpAffine: {keep_res: True, input_h: *input_h, input_w: *input_w}
|
||||
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- WarpAffine: {keep_res: True, input_h: *input_h, input_w: *input_w}
|
||||
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
fuse_normalize: True
|
||||
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- WarpAffine: {keep_res: False, input_h: *input_h, input_w: *input_w}
|
||||
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- WarpAffine: {keep_res: False, input_h: *input_h, input_w: *input_w}
|
||||
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
fuse_normalize: True
|
||||
@@ -0,0 +1,14 @@
|
||||
epoch: 70
|
||||
|
||||
LearningRate:
|
||||
base_lr: 0.000125
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones: [60]
|
||||
use_warmup: False
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Adam
|
||||
regularizer: NULL
|
||||
@@ -0,0 +1,66 @@
|
||||
_BASE_: [
|
||||
'_base_/optimizer_70e.yml',
|
||||
'_base_/centertrack_dla34.yml',
|
||||
'_base_/centertrack_reader.yml',
|
||||
'../../runtime.yml',
|
||||
]
|
||||
log_iter: 20
|
||||
snapshot_epoch: 5
|
||||
weights: output/centertrack_dla34_70e_mot17/model_final
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams
|
||||
|
||||
|
||||
### for Detection eval.py/infer.py
|
||||
# mot_metric: False
|
||||
# metric: COCO
|
||||
|
||||
### for MOT eval_mot.py/infer_mot.py
|
||||
mot_metric: True
|
||||
metric: MOT
|
||||
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
batch_size: 16 # total 32 for 2 GPUs
|
||||
|
||||
EvalReader:
|
||||
batch_size: 1
|
||||
|
||||
EvalMOTReader:
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# COCO style dataset for training
|
||||
num_classes: 1
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/train.json
|
||||
image_dir: images/train
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_track_id']
|
||||
# 'gt_track_id' is added here; the box annotations in the json file must contain 'gt_track_id'
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
image_dir: images/train
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
|
||||
# for MOT evaluation
|
||||
# If you want to change the MOT evaluation dataset, please modify 'data_root'
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
data_root: images/train # set 'images/test' for MOTChallenge test
|
||||
keep_ori_im: True # set True if save visualization images or video, or used in SDE MOT
|
||||
|
||||
# for MOT video inference
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,66 @@
|
||||
_BASE_: [
|
||||
'_base_/optimizer_70e.yml',
|
||||
'_base_/centertrack_dla34.yml',
|
||||
'_base_/centertrack_reader.yml',
|
||||
'../../runtime.yml',
|
||||
]
|
||||
log_iter: 20
|
||||
snapshot_epoch: 5
|
||||
weights: output/centertrack_dla34_70e_mot17half/model_final
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/crowdhuman_centertrack.pdparams
|
||||
|
||||
|
||||
### for Detection eval.py/infer.py
|
||||
# mot_metric: False
|
||||
# metric: COCO
|
||||
|
||||
### for MOT eval_mot.py/infer_mot.py
|
||||
mot_metric: True
|
||||
metric: MOT
|
||||
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
batch_size: 16 # total 32 for 2 GPUs
|
||||
|
||||
EvalReader:
|
||||
batch_size: 1
|
||||
|
||||
EvalMOTReader:
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# COCO style dataset for training
|
||||
num_classes: 1
|
||||
TrainDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/train_half.json
|
||||
image_dir: images/train
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_track_id']
|
||||
# 'gt_track_id' is added here; the box annotations in the json file must contain 'gt_track_id'
|
||||
|
||||
EvalDataset:
|
||||
!COCODataSet
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
image_dir: images/train
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
anno_path: annotations/val_half.json
|
||||
|
||||
# for MOT evaluation
|
||||
# If you want to change the MOT evaluation dataset, please modify 'data_root'
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
data_root: images/half
|
||||
keep_ori_im: True # set True if save visualization images or video, or used in SDE MOT
|
||||
|
||||
# for MOT video inference
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot/MOT17
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1 @@
|
||||
README_cn.md
|
||||
@@ -0,0 +1,232 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# DeepSORT (Deep Cosine Metric Learning for Person Re-identification)
|
||||
|
||||
## 内容
|
||||
- [简介](#简介)
|
||||
- [模型库](#模型库)
|
||||
- [快速开始](#快速开始)
|
||||
- [适配其他检测器](#适配其他检测器)
|
||||
- [引用](#引用)
|
||||
|
||||
## 简介
|
||||
[DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 扩展了原有的[SORT](https://arxiv.org/abs/1703.07402)(Simple Online and Realtime Tracking)算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息,将检出的目标分配和更新到已有的对应轨迹上即进行一个ReID重识别任务。DeepSORT所需的检测框可以由任意一个检测器来生成,然后读入保存的检测结果和视频图片即可进行跟踪预测。ReID模型此处选择[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供的`PCB+Pyramid ResNet101`和`PPLCNet`模型。
|
||||
|
||||
## 模型库
|
||||
|
||||
### DeepSORT在MOT-16 Training Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测结果或模型 | ReID模型 |配置文件 |
|
||||
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----:| :-----: | :-----: |
|
||||
| ResNet-101 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) |
|
||||
| ResNet-101 | 1088x608 | 68.3 | 56.5 | 1722 | 17337 | 15890 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./deepsort_jde_yolov3_pcb_pyramid.yml) |
|
||||
| PPLCNet | 1088x608 | 72.2 | 59.5 | 1087 | 8034 | 21481 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) |
|
||||
| PPLCNet | 1088x608 | 68.1 | 53.6 | 1979 | 17446 | 15766 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./deepsort_jde_yolov3_pplcnet.yml) |
|
||||
|
||||
### DeepSORT在MOT-16 Test Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 检测结果或模型 | ReID模型 |配置文件 |
|
||||
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: | :-----: | :-----: |:-----: |
|
||||
| ResNet-101 | 1088x608 | 64.1 | 53.0 | 1024 | 12457 | 51919 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) | [ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./reid/deepsort_pcb_pyramid_r101.yml) |
|
||||
| ResNet-101 | 1088x608 | 61.2 | 48.5 | 1799 | 25796 | 43232 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams)|[配置文件](./deepsort_jde_yolov3_pcb_pyramid.yml) |
|
||||
| PPLCNet | 1088x608 | 64.0 | 51.3 | 1208 | 12697 | 51784 | - | [检测结果](https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./reid/deepsort_pplcnet.yml) |
|
||||
| PPLCNet | 1088x608 | 61.1 | 48.8 | 2010 | 25401 | 43432 | - | [检测模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams) |[ReID模型](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams)|[配置文件](./deepsort_jde_yolov3_pplcnet.yml) |
|
||||
|
||||
|
||||
### DeepSORT在MOT-17 half Val Set上结果
|
||||
|
||||
| 检测训练数据集 | 检测器 | ReID | 检测mAP | MOTA | IDF1 | FPS | 配置文件 |
|
||||
| :-------- | :----- | :----: |:------: | :----: |:-----: |:----:|:----: |
|
||||
| MIX | JDE YOLOv3 | PCB Pyramid | - | 66.9 | 62.7 | - |[配置文件](./deepsort_jde_yolov3_pcb_pyramid.yml) |
|
||||
| MIX | JDE YOLOv3 | PPLCNet | - | 66.3 | 62.1 | - |[配置文件](./deepsort_jde_yolov3_pplcnet.yml) |
|
||||
| MOT-17 half train | YOLOv3 | PPLCNet | 42.7 | 50.2 | 52.4 | - |[配置文件](./deepsort_yolov3_pplcnet.yml) |
|
||||
| MOT-17 half train | PPYOLOv2 | PPLCNet | 46.8 | 51.8 | 55.8 | - |[配置文件](./deepsort_ppyolov2_pplcnet.yml) |
|
||||
| MOT-17 half train | PPYOLOe | PPLCNet | 52.7 | 56.7 | 60.5 | - |[配置文件](./deepsort_ppyoloe_pplcnet.yml) |
|
||||
| MOT-17 half train | PPYOLOe | ResNet-50 | 52.7 | 56.7 | 64.6 | - |[配置文件](./deepsort_ppyoloe_resnet.yml) |
|
||||
|
||||
**注意:**
|
||||
模型权重下载链接在配置文件中的```det_weights```和```reid_weights```,运行验证的命令即可自动下载。
|
||||
DeepSORT是分离检测器和ReID模型的,其中检测器单独训练MOT数据集,而组装成DeepSORT后只用于评估,现在支持两种评估的方式。
|
||||
- **方式1**:加载检测结果文件和ReID模型,在使用DeepSORT模型评估之前,应该首先通过一个检测模型得到检测结果,然后像这样准备好结果文件:
|
||||
```
|
||||
det_results_dir
|
||||
|——————MOT16-02.txt
|
||||
|——————MOT16-04.txt
|
||||
|——————MOT16-05.txt
|
||||
|——————MOT16-09.txt
|
||||
|——————MOT16-10.txt
|
||||
|——————MOT16-11.txt
|
||||
|——————MOT16-13.txt
|
||||
```
|
||||
对于MOT16数据集,可以下载PaddleDetection提供的一个经过匹配之后的检测框结果det_results_dir.zip并解压:
|
||||
```bash
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip
|
||||
```
|
||||
如果使用更强的检测模型,可以取得更好的结果。其中每个txt是每个视频中所有图片的检测结果,每行都描述一个边界框,格式如下:
|
||||
```
|
||||
[frame_id],[x0],[y0],[w],[h],[score],[class_id]
|
||||
```
|
||||
- `frame_id`是图片帧的序号
|
||||
- `x0,y0`是目标框的左上角x和y坐标
|
||||
- `w,h`是目标框的像素宽高
|
||||
- `score`是目标框的得分
|
||||
- `class_id`是目标框的类别,如果只有1类则是`0`
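
下面给出一个按上述格式解析检测结果txt、取出指定帧检测框的最小Python示例(字段含义按上文假定,仅作格式示意,并非PaddleDetection内部实现):

```python
import numpy as np

def load_det_results(txt_path, frame_id):
    """读取 frame_id,x0,y0,w,h,score,class_id 格式的检测结果,
    返回指定帧的 (N, 6) 数组:x0, y0, w, h, score, class_id。"""
    rows = []
    with open(txt_path) as f:
        for line in f:
            vals = [float(v) for v in line.strip().split(',')]
            if int(vals[0]) == frame_id:
                rows.append(vals[1:7])
    return np.asarray(rows, dtype=np.float32)
```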
|
||||
|
||||
- **方式2**:同时加载检测模型和ReID模型,此处选用JDE版本的YOLOv3,具体配置见`configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml`。加载其他通用检测模型可参照`configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml`进行修改。
|
||||
|
||||
## 快速开始
|
||||
|
||||
### 1. 评估
|
||||
|
||||
#### 1.1 评估检测效果
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 评估检测使用的是```tools/eval.py```, 评估跟踪使用的是```tools/eval_mot.py```。
|
||||
|
||||
#### 1.2 评估跟踪效果
|
||||
**方式1**:加载检测结果文件和ReID模型,得到跟踪结果
|
||||
```bash
|
||||
# 下载PaddleDetection提供的MOT16数据集检测结果文件并解压,如需自己使用其他检测器生成请参照这个文件里的格式
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/det_results_dir.zip
|
||||
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml --det_results_dir det_results_dir
|
||||
# 或者
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml --det_results_dir det_results_dir
|
||||
```
|
||||
|
||||
**方式2**:加载行人检测模型和ReID模型,得到跟踪结果
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml
|
||||
# 或者
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pplcnet.yml
|
||||
# 或者
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyolov2_pplcnet.yml --scaled=True
|
||||
# 或者
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_ppyoloe_resnet.yml --scaled=True
|
||||
```
|
||||
**注意:**
|
||||
- JDE YOLOv3行人检测模型是和JDE和FairMOT使用同样的MOT数据集训练的,因此MOTA较高。而其他通用检测模型如PPYOLOv2只使用了MOT17 half数据集训练。
|
||||
- JDE YOLOv3模型与通用检测模型如YOLOv3和PPYOLOv2最大的区别是使用了JDEBBoxPostProcess后处理,结果输出坐标没有缩放回原图,而通用检测模型输出坐标是缩放回原图的。
|
||||
- `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE YOLOv3则为False,如果使用通用检测模型则为True,默认值是False,坐标映射回原图的过程可参考本节末尾的示意代码。
|
||||
- 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。
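
关于`--scaled`的含义,下面给出一个把letterbox输入尺寸下的坐标映射回原图的示意函数(仅为说明`--scaled=False`时还需要做的坐标变换,并非PaddleDetection的实现):

```python
def scale_coords_to_ori(boxes, input_hw, ori_hw):
    """将 letterbox 输入(如 608x1088)下的 xyxy 框映射回原图坐标。

    boxes 为 (N, 4) 的 numpy 数组;input_hw / ori_hw 分别是网络输入和原图的 (h, w)。
    """
    ih, iw = input_hw
    oh, ow = ori_hw
    gain = min(ih / oh, iw / ow)        # letterbox 缩放比例
    pad_w = (iw - ow * gain) / 2.0      # 左右填充
    pad_h = (ih - oh * gain) / 2.0      # 上下填充
    out = boxes.copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - pad_w) / gain
    out[:, [1, 3]] = (out[:, [1, 3]] - pad_h) / gain
    return out
```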
|
||||
|
||||
### 2. 预测
|
||||
|
||||
使用单个GPU通过如下命令预测一个视频,并保存为视频
|
||||
|
||||
```bash
|
||||
# 下载demo视频
|
||||
wget https://bj.bcebos.com/v1/paddledet/data/mot/demo/mot17_demo.mp4
|
||||
|
||||
# 加载JDE YOLOv3行人检测模型和PCB Pyramid ReID模型,并保存为视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_jde_yolov3_pcb_pyramid.yml --video_file=mot17_demo.mp4 --save_videos
|
||||
|
||||
# 或者加载PPYOLOE行人检测模型和PPLCNet ReID模型,并保存为视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_ppyoloe_pplcnet.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
|
||||
- `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。
|
||||
|
||||
|
||||
### 3. 导出预测模型
|
||||
|
||||
Step 1:导出检测模型
|
||||
```bash
|
||||
# 导出JDE YOLOv3行人检测模型
|
||||
CUDA_VISIBLE_DEVICES=0 python3.7 tools/export_model.py -c configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
|
||||
|
||||
# 或导出PPYOLOE行人检测模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
```
|
||||
|
||||
Step 2:导出ReID模型
|
||||
```bash
|
||||
# 导出PCB Pyramid ReID模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
|
||||
|
||||
# 或者导出PPLCNet ReID模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
|
||||
|
||||
# 或者导出ResNet ReID模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_resnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams
|
||||
```
|
||||
|
||||
### 4. 用导出的模型基于Python去预测
|
||||
|
||||
```bash
|
||||
# 用导出的PPYOLOE行人检测模型和PPLCNet ReID模型
|
||||
python3.7 deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --reid_model_dir=output_inference/deepsort_pplcnet/ --tracker_config=deploy/pptracking/python/tracker_config.yml --video_file=mot17_demo.mp4 --device=GPU --save_mot_txts --threshold=0.5
|
||||
```
|
||||
**注意:**
|
||||
- 运行前需要先改动`deploy/pptracking/python/tracker_config.yml`里的tracker为`DeepSORTTracker`。
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示对每个视频保存一个txt,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
||||
|
||||
## 适配其他检测器
|
||||
|
||||
### 1、配置文件目录说明
|
||||
- `detector/xxx.yml`是纯粹的检测模型配置文件,如`detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml`,支持检测的所有流程(train/eval/infer/export/deploy)。DeepSORT跟踪的eval/infer与这个纯检测的yml文件无关,但是export的时候需要这个纯检测的yml单独导出检测模型,DeepSORT跟踪导出模型是分开detector和reid分别导出的,用户可自行定义和组装detector+reid成为一个完整的DeepSORT跟踪系统。
|
||||
- `detector/`下的检测器配置文件中,用户需要将自己的数据集转为COCO格式。由于ID的真实标签不需要参与进去,用户可以在此自行配置任何检测模型,只需保证输出结果包含结果框的种类、坐标和分数即可。
|
||||
- `reid/deepsort_yyy.yml`文件夹里的是ReID模型和tracker的配置文件,如`reid/deepsort_pplcnet.yml`,此处ReID模型是由[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供的`deepsort_pcb_pyramid_r101.yml`和`deepsort_pplcnet.yml`,是在Market1501(751类人)行人ReID数据集上训练得到的,训练细节待PaddleClas公布。
|
||||
- `deepsort_xxx_yyy.yml`是一个完整的DeepSORT跟踪的配置,如`deepsort_ppyolov2_pplcnet.yml`,其中检测部分`xxx`是`detector/`里的,reid和tracker部分`yyy`是`reid/`里的。
|
||||
- DeepSORT跟踪的eval/infer有两种方式,方式1是只使用`reid/deepsort_yyy.yml`加载检测结果文件和`yyy`ReID模型,方式2是使用`deepsort_xxx_yyy.yml`加载`xxx`检测模型和`yyy`ReID模型,但是DeepSORT跟踪的deploy必须使用`deepsort_xxx_yyy.yml`。
|
||||
- 检测器的eval/infer/deploy只使用到`detector/xxx.yml`,ReID一般不单独使用,如需单独使用必须提前加载检测结果文件然后只使用`reid/deepsort_yyy.yml`。
|
||||
|
||||
|
||||
### 2、适配的具体步骤
|
||||
1.先将数据集制作成COCO格式按通用检测模型配置来训练,参照`detector/`文件夹里的模型配置文件,制作生成`detector/xxx.yml`, 已经支持有Faster R-CNN、YOLOv3、PPYOLOv2、JDE YOLOv3和PicoDet等模型。
|
||||
|
||||
2.制作`deepsort_xxx_yyy.yml`, 其中`DeepSORT.detector`的配置就是`detector/xxx.yml`里的, `EvalMOTDataset`和`det_weights`可以自行设置。`yyy`是`reid/deepsort_yyy.yml`如`reid/deepsort_pplcnet.yml`。
|
||||
|
||||
### 3、使用的具体步骤
|
||||
#### 1.加载检测模型和ReID模型去评估:
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --scaled=True
|
||||
```
|
||||
#### 2.加载检测模型和ReID模型去推理:
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/deepsort/deepsort_xxx_yyy.yml --video_file=mot17_demo.mp4 --scaled=True --save_videos
|
||||
```
|
||||
#### 3.导出检测模型和ReID模型:
|
||||
```bash
|
||||
# 导出检测模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/detector/xxx.yml
|
||||
# 导出ReID模型
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_yyy.yml
|
||||
```
|
||||
#### 4.使用导出的检测模型和ReID模型去部署:
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/xxx/ --reid_model_dir=output_inference/deepsort_yyy/ --video_file=mot17_demo.mp4 --device=GPU --scaled=True --save_mot_txts
|
||||
```
|
||||
**注意:**
|
||||
- `--scaled`表示在模型输出结果的坐标是否已经是缩放回原图的,如果使用的检测模型是JDE的YOLOv3则为False,如果使用通用检测模型则为True。
|
||||
|
||||
|
||||
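For intuition on what `--scaled=False` implies, the sketch below shows how boxes predicted on a letterboxed input (as produced by the JDE version of YOLOv3) can be mapped back to original-image coordinates. It is an illustrative approximation of the rescaling the deploy code performs internally; the function name and padding convention are assumptions.

```python
import numpy as np

def unletterbox_boxes(boxes_xyxy, input_hw, origin_hw):
    """Map boxes from letterboxed input coordinates back to the original image.

    boxes_xyxy: (N, 4) array in the letterboxed input space.
    input_hw:   (h, w) of the network input, e.g. (608, 1088).
    origin_hw:  (h, w) of the original image.
    """
    ih, iw = input_hw
    oh, ow = origin_hw
    scale = min(ih / oh, iw / ow)          # resize ratio used by letterbox preprocessing
    pad_w = (iw - ow * scale) / 2.0        # symmetric horizontal padding
    pad_h = (ih - oh * scale) / 2.0        # symmetric vertical padding
    boxes = np.asarray(boxes_xyxy, dtype=np.float32).copy()
    boxes[:, 0::2] = (boxes[:, 0::2] - pad_w) / scale
    boxes[:, 1::2] = (boxes[:, 1::2] - pad_h) / scale
    boxes[:, 0::2] = boxes[:, 0::2].clip(0, ow)
    boxes[:, 1::2] = boxes[:, 1::2].clip(0, oh)
    return boxes

# e.g. unletterbox_boxes(pred_boxes, input_hw=(608, 1088), origin_hw=(1080, 1920))
```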
## Citations
```
@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649},
  organization={IEEE},
  doi={10.1109/ICIP.2017.8296962}
}

@inproceedings{Wojke2018deep,
  title={Deep Cosine Metric Learning for Person Re-identification},
  author={Wojke, Nicolai and Bewley, Alex},
  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={748--756},
  organization={IEEE},
  doi={10.1109/WACV.2018.00087}
}
```

@@ -0,0 +1,22 @@
# DeepSORT does not need to be trained on the MOT dataset; this config is only used for evaluation.
# The MOT dataset is only used to train the detector (e.g. YOLOv3) with bboxes,
# and the gt IDs do not need to be trained.

EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1


TestMOTReader:
  inputs_def:
    image_shape: [3, 608, 1088]
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1
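As a rough illustration of what this reader pipeline does per frame (Decode → LetterBoxResize to [608, 1088] → NormalizeImage with mean 0 / std 1 and is_scale True → Permute to CHW), here is a hedged NumPy/OpenCV sketch; the exact padding value and rounding details of PaddleDetection's transforms may differ.

```python
import cv2
import numpy as np

def preprocess_frame(img_bgr, target_hw=(608, 1088), pad_value=127.5):
    """Approximate EvalMOTReader preprocessing for one decoded BGR frame."""
    th, tw = target_hw
    oh, ow = img_bgr.shape[:2]
    scale = min(th / oh, tw / ow)
    nh, nw = int(round(oh * scale)), int(round(ow * scale))
    resized = cv2.resize(img_bgr, (nw, nh))
    canvas = np.full((th, tw, 3), pad_value, dtype=np.float32)
    top, left = (th - nh) // 2, (tw - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized    # letterbox padding
    canvas /= 255.0                                   # NormalizeImage: mean 0, std 1, is_scale=True
    chw = canvas.transpose(2, 0, 1)                   # Permute: HWC -> CHW
    return chw[np.newaxis]                            # add batch dim (batch_size: 1)
```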
@@ -0,0 +1,34 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/train_half.json
    image_dir: images/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json
    image_dir: images/train

TestDataset:
  !ImageFolder
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json


# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/half
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video
@@ -0,0 +1,71 @@
|
||||
_BASE_: [
|
||||
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT16/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # JDE version YOLOv3
|
||||
reid: PCBPyramid
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_pcb_pyramid_r101.yml'
|
||||
PCBPyramid:
|
||||
model_name: "ResNet101"
|
||||
num_conv_out_channels: 128
|
||||
num_classes: 751
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: JDE version YOLOv3
|
||||
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
|
||||
# The most obvious difference from general YOLOv3 is the JDEBBoxPostProcess and the bboxes coordinates output are not scaled to the original image.
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: JDEBBoxPostProcess
|
||||
|
||||
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
|
||||
JDEBBoxPostProcess:
|
||||
decode:
|
||||
name: JDEBox
|
||||
conf_thresh: 0.3
|
||||
downsample_ratio: 32
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 500
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.5
|
||||
nms_top_k: 2000
|
||||
normalized: true
|
||||
return_idx: false
|
||||
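The DeepSORTTracker settings in the config above (`metric_type: cosine`, `matching_threshold: 0.2`, `budget: 100`) drive an appearance-based association step. The sketch below illustrates that step in isolation: cosine distance between L2-normalized ReID embeddings, gated by the matching threshold. It is a simplified stand-in for the full tracker (no Kalman gating or matching cascade), and all names are illustrative.

```python
import numpy as np

def cosine_cost_matrix(track_feats, det_feats):
    """Cosine distance between per-track feature galleries and detection embeddings.

    track_feats: list of (M_i, D) arrays, at most `budget` features kept per track.
    det_feats:   (N, D) array of ReID embeddings of the current detections.
    """
    det = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = np.zeros((len(track_feats), det.shape[0]), dtype=np.float32)
    for i, gallery in enumerate(track_feats):
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        # distance to the closest feature kept in the track's budget
        cost[i] = (1.0 - g @ det.T).min(axis=0)
    return cost

def gate_by_threshold(cost, matching_threshold=0.2):
    """Mark pairs whose appearance distance exceeds the threshold as infeasible."""
    gated = cost.copy()
    gated[gated > matching_threshold] = 1e5
    return gated
```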
@@ -0,0 +1,70 @@
|
||||
_BASE_: [
|
||||
'detector/jde_yolov3_darknet53_30e_1088x608_mix.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT16/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/jde_yolov3_darknet53_30e_1088x608_mix.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # JDE version YOLOv3
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: JDE version YOLOv3
|
||||
# see 'configs/mot/deepsort/detector/jde_yolov3_darknet53_30e_1088x608_mix.yml'
|
||||
# The most obvious difference from general YOLOv3 is the JDEBBoxPostProcess and the bboxes coordinates output are not scaled to the original image.
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: JDEBBoxPostProcess
|
||||
|
||||
# Tracking requires higher quality boxes, so decode.conf_thresh will be higher
|
||||
JDEBBoxPostProcess:
|
||||
decode:
|
||||
name: JDEBox
|
||||
conf_thresh: 0.3
|
||||
downsample_ratio: 32
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 500
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.5
|
||||
nms_top_k: 2000
|
||||
normalized: true
|
||||
return_idx: false
|
||||
@@ -0,0 +1,109 @@
|
||||
_BASE_: [
|
||||
'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
|
||||
|
||||
# reader
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 640, 640]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # PPYOLOe version
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: PPYOLOe version
|
||||
# see 'configs/mot/deepsort/detector/ppyoloe_crn_l_300e_640x640_mot17half.yml'
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
CSPResNet:
|
||||
layers: [3, 6, 6, 3]
|
||||
channels: [64, 128, 256, 512, 1024]
|
||||
return_idx: [1, 2, 3]
|
||||
use_large_stem: True
|
||||
|
||||
CustomCSPPAN:
|
||||
out_channels: [768, 384, 192]
|
||||
stage_num: 1
|
||||
block_num: 3
|
||||
act: 'swish'
|
||||
spp: true
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.4 # 0.01 in original detector
|
||||
nms_threshold: 0.6
|
||||
@@ -0,0 +1,108 @@
|
||||
_BASE_: [
|
||||
'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams
|
||||
|
||||
# reader
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 640, 640]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # PPYOLOe version
|
||||
reid: ResNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_resnet.yml'
|
||||
ResNetEmbedding:
|
||||
model_name: "ResNet50"
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: PPYOLOe version
|
||||
# see 'configs/mot/deepsort/detector/ppyoloe_crn_l_300e_640x640_mot17half.yml'
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
CSPResNet:
|
||||
layers: [3, 6, 6, 3]
|
||||
channels: [64, 128, 256, 512, 1024]
|
||||
return_idx: [1, 2, 3]
|
||||
use_large_stem: True
|
||||
|
||||
CustomCSPPAN:
|
||||
out_channels: [768, 384, 192]
|
||||
stage_num: 1
|
||||
block_num: 3
|
||||
act: 'swish'
|
||||
spp: true
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.4 # 0.01 in original detector
|
||||
nms_threshold: 0.6
|
||||
@@ -0,0 +1,98 @@
|
||||
_BASE_: [
|
||||
'detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
|
||||
|
||||
# reader
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 640, 640]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # PPYOLOv2 version
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: PPYOLOv2 version
|
||||
# see 'configs/mot/deepsort/detector/ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml'
|
||||
YOLOv3:
|
||||
backbone: ResNet
|
||||
neck: PPYOLOPAN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
ResNet:
|
||||
depth: 50
|
||||
variant: d
|
||||
return_idx: [1, 2, 3]
|
||||
dcn_v2_stages: [3]
|
||||
freeze_at: -1
|
||||
freeze_norm: false
|
||||
norm_decay: 0.
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.25 # 0.01 in original detector
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
scale_x_y: 1.05
|
||||
nms:
|
||||
name: MatrixNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.4 # 0.01 in original detector
|
||||
post_threshold: 0.4 # 0.01 in original detector
|
||||
nms_top_k: -1
|
||||
background_label: -1
|
||||
@@ -0,0 +1,87 @@
|
||||
_BASE_: [
|
||||
'detector/yolov3_darknet53_40e_608x608_mot17half.yml',
|
||||
'_base_/mot17.yml',
|
||||
'_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
metric: MOT
|
||||
num_classes: 1
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
|
||||
|
||||
# reader
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 608]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
|
||||
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
# DeepSORT configuration
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: YOLOv3 # General YOLOv3 version
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
|
||||
# reid and tracker configuration
|
||||
# see 'configs/mot/deepsort/reid/deepsort_pplcnet.yml'
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
|
||||
|
||||
# detector configuration: General YOLOv3 version
|
||||
# see 'configs/mot/deepsort/detector/yolov3_darknet53_40e_608x608_mot17half.yml'
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
# Tracking requires higher quality boxes, so NMS score_threshold will be higher
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.005
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.3 # 0.01 in original detector
|
||||
nms_threshold: 0.45
|
||||
nms_top_k: 1000
|
||||
@@ -0,0 +1,34 @@
|
||||
English | [简体中文](README_cn.md)
|
||||
|
||||
# Detector for DeepSORT
|
||||
|
||||
## Introduction
|
||||
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common detectors are provided here as a reference. Note that different training datasets, backbones, input sizes, training epochs and NMS thresholds will lead to differences in model accuracy and performance; please adapt them to your needs.
|
||||
|
||||
## Model Zoo
|
||||
### Results on MOT17-half dataset
|
||||
| Backbone | Model | input size | lr schedule | FPS | Box AP | download | config |
|
||||
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :----------: | :-----: |
|
||||
| DarkNet-53 | YOLOv3 | 608x608 | 40e | ---- | 42.7 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [config](./yolov3_darknet53_40e_608x608_mot17half.yml) |
|
||||
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [config](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
|
||||
| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [config](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
|
||||
|
||||
**Notes:**
|
||||
- The above models are trained with the **MOT17-half train** set, which can be downloaded from this [link](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip).
|
||||
- The **MOT17-half train** set is composed of the images and labels of the first half of the frames of each video in the MOT17 train set (7 sequences in total); the **MOT17-half val** set, used for evaluation, is composed of the second half of the frames of each video (a splitting sketch is given after these notes). The half annotations can be downloaded from this [link](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip); download and unzip them into the `dataset/mot/MOT17/images/` folder.
|
||||
- YOLOv3 is trained on the same pedestrian dataset as `configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml`, which has not been released yet.
|
||||
- For pedestrian tracking, please use pedestrian detector combined with pedestrian ReID model. For vehicle tracking, please use vehicle detector combined with vehicle ReID model.
|
||||
- High quality detected boxes are required for DeepSORT tracking, so the post-processing settings such as NMS threshold of these models are different from those in pure detection tasks.
|
||||
|
||||
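To make the half split concrete, here is a hypothetical sketch of how one sequence's frames could be divided into the first-half (train) and second-half (val) parts described above; the directory layout and naming are assumptions, and the official half annotations are already provided at the link above.

```python
import os

def split_sequence_half(seq_img_dir):
    """Split a sequence's frames into first-half (train) and second-half (val) lists."""
    frames = sorted(f for f in os.listdir(seq_img_dir) if f.endswith('.jpg'))
    mid = len(frames) // 2
    return frames[:mid], frames[mid:]

# e.g. train_frames, val_frames = split_sequence_half('dataset/mot/MOT17/images/train/MOT17-02-SDP/img1')
```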
## Quick Start
|
||||
|
||||
Start the training and evaluation with the following command
|
||||
```bash
|
||||
job_name=ppyoloe_crn_l_36e_640x640_mot17half
|
||||
config=configs/mot/deepsort/detector/${job_name}.yml
|
||||
log_dir=log_dir/${job_name}
|
||||
# 1. training
|
||||
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp --fleet
|
||||
# 2. evaluation
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/${job_name}.pdparams
|
||||
```
|
||||
@@ -0,0 +1,36 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# DeepSORT的检测器
|
||||
|
||||
## 简介
|
||||
[DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 由检测器和ReID模型串联组合而成,此处提供了几个常用检测器的配置作为参考。由于训练数据集、输入尺度、训练epoch数、NMS阈值设置等的不同均会导致模型精度和性能的差异,请自行根据需求进行适配。
|
||||
|
||||
## 模型库
|
||||
|
||||
### 在MOT17-half val数据集上的检测结果
|
||||
| 骨架网络 | 网络类型 | 输入尺度 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
|
||||
| :-------------- | :------------- | :--------: | :---------: | :-----------: | :-----: | :------: | :-----: |
|
||||
| DarkNet-53 | YOLOv3 | 608X608 | 40e | ---- | 42.7 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolov3_darknet53_40e_608x608_mot17half.pdparams) | [配置文件](./yolov3_darknet53_40e_608x608_mot17half.yml) |
|
||||
| ResNet50-vd | PPYOLOv2 | 640x640 | 365e | ---- | 46.8 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyolov2_r50vd_dcn_365e_640x640_mot17half.pdparams) | [配置文件](./ppyolov2_r50vd_dcn_365e_640x640_mot17half.yml) |
|
||||
| CSPResNet | PPYOLOe | 640x640 | 36e | ---- | 52.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/ppyoloe_crn_l_36e_640x640_mot17half.pdparams) | [配置文件](./ppyoloe_crn_l_36e_640x640_mot17half.yml) |
|
||||
|
||||
**注意:**
|
||||
- 以上模型均可采用**MOT17-half train**数据集训练,数据集可以从[此链接](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip)下载。
|
||||
- **MOT17-half train**是MOT17的train序列(共7个)每个视频的前一半帧的图片和标注组成的数据集,而为了验证精度可以都用**MOT17-half val**数据集去评估,它是每个视频的后一半帧组成的,数据集可以从[此链接](https://paddledet.bj.bcebos.com/data/mot/mot17half/annotations.zip)下载,并解压放在`dataset/mot/MOT17/images/`文件夹下。
|
||||
- YOLOv3和`configs/pphuman/pedestrian_yolov3/pedestrian_yolov3_darknet.yml`是相同的pedestrian数据集训练的,此数据集暂未开放。
|
||||
- 行人跟踪请使用行人检测器结合行人ReID模型。车辆跟踪请使用车辆检测器结合车辆ReID模型。
|
||||
- 用于DeepSORT跟踪时需要高质量的检出框,因此这些模型的NMS阈值等后处理设置会与纯检测任务的设置不同。
|
||||
|
||||
|
||||
## 快速开始
|
||||
|
||||
通过如下命令一键式启动训练和评估
|
||||
```bash
|
||||
job_name=ppyoloe_crn_l_36e_640x640_mot17half
|
||||
config=configs/mot/deepsort/detector/${job_name}.yml
|
||||
log_dir=log_dir/${job_name}
|
||||
# 1. training
|
||||
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
|
||||
# 2. evaluation
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=https://paddledet.bj.bcebos.com/models/mot/${job_name}.pdparams
|
||||
```
|
||||
@@ -0,0 +1,83 @@
|
||||
_BASE_: [
|
||||
'../../../datasets/mot.yml',
|
||||
'../../../runtime.yml',
|
||||
'../../jde/_base_/optimizer_30e.yml',
|
||||
'../../jde/_base_/jde_reader_1088x608.yml',
|
||||
]
|
||||
weights: output/jde_yolov3_darknet53_30e_1088x608_mix/model_final
|
||||
|
||||
metric: MOTDet
|
||||
num_classes: 1
|
||||
EvalReader:
|
||||
inputs_def:
|
||||
num_max_boxes: 50
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
TestReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
EvalDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.half']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
TestDataset:
|
||||
!ImageFolder
|
||||
anno_path: None
|
||||
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
|
||||
|
||||
# JDE version for MOT dataset
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: JDEBBoxPostProcess
|
||||
|
||||
DarkNet:
|
||||
depth: 53
|
||||
return_idx: [2, 3, 4]
|
||||
freeze_norm: True
|
||||
|
||||
YOLOv3FPN:
|
||||
freeze_norm: True
|
||||
|
||||
YOLOv3Head:
|
||||
anchors: [[128,384], [180,540], [256,640], [512,640],
|
||||
[32,96], [45,135], [64,192], [90,271],
|
||||
[8,24], [11,34], [16,48], [23,68]]
|
||||
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
|
||||
loss: JDEDetectionLoss
|
||||
|
||||
JDEDetectionLoss:
|
||||
for_mot: False
|
||||
|
||||
JDEBBoxPostProcess:
|
||||
decode:
|
||||
name: JDEBox
|
||||
conf_thresh: 0.3
|
||||
downsample_ratio: 32
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 500
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.5
|
||||
nms_top_k: 2000
|
||||
normalized: true
|
||||
return_idx: false
|
||||
@@ -0,0 +1,82 @@
|
||||
_BASE_: [
|
||||
'../../../ppyoloe/ppyoloe_crn_l_300e_coco.yml',
|
||||
'../_base_/mot17.yml',
|
||||
]
|
||||
weights: output/ppyoloe_crn_l_36e_640x640_mot17half/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 36
|
||||
LearningRate:
|
||||
base_lr: 0.001
|
||||
schedulers:
|
||||
- !CosineDecay
|
||||
max_epochs: 43
|
||||
- !LinearWarmup
|
||||
start_factor: 0.001
|
||||
epochs: 1
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
momentum: 0.9
|
||||
type: Momentum
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
|
||||
TrainReader:
|
||||
batch_size: 8
|
||||
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
|
||||
depth_mult: 1.0
|
||||
width_mult: 1.0
|
||||
|
||||
YOLOv3:
|
||||
backbone: CSPResNet
|
||||
neck: CustomCSPPAN
|
||||
yolo_head: PPYOLOEHead
|
||||
post_process: ~
|
||||
|
||||
CSPResNet:
|
||||
layers: [3, 6, 6, 3]
|
||||
channels: [64, 128, 256, 512, 1024]
|
||||
return_idx: [1, 2, 3]
|
||||
use_large_stem: True
|
||||
|
||||
CustomCSPPAN:
|
||||
out_channels: [768, 384, 192]
|
||||
stage_num: 1
|
||||
block_num: 3
|
||||
act: 'swish'
|
||||
spp: true
|
||||
|
||||
PPYOLOEHead:
|
||||
fpn_strides: [32, 16, 8]
|
||||
grid_cell_scale: 5.0
|
||||
grid_cell_offset: 0.5
|
||||
static_assigner_epoch: -1 # 100
|
||||
use_varifocal_loss: True
|
||||
loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
|
||||
static_assigner:
|
||||
name: ATSSAssigner
|
||||
topk: 9
|
||||
assigner:
|
||||
name: TaskAlignedAssigner
|
||||
topk: 13
|
||||
alpha: 1.0
|
||||
beta: 6.0
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
nms_top_k: 1000
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.6
|
||||
@@ -0,0 +1,75 @@
|
||||
_BASE_: [
|
||||
'../../../ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml',
|
||||
'../_base_/mot17.yml',
|
||||
]
|
||||
weights: output/ppyolov2_r50vd_dcn_365e_640x640_mot17half/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
YOLOv3:
|
||||
backbone: ResNet
|
||||
neck: PPYOLOPAN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
ResNet:
|
||||
depth: 50
|
||||
variant: d
|
||||
return_idx: [1, 2, 3]
|
||||
dcn_v2_stages: [3]
|
||||
freeze_at: -1
|
||||
freeze_norm: false
|
||||
norm_decay: 0.
|
||||
|
||||
PPYOLOPAN:
|
||||
drop_block: true
|
||||
block_size: 3
|
||||
keep_prob: 0.9
|
||||
spp: true
|
||||
|
||||
YOLOv3Head:
|
||||
anchors: [[10, 13], [16, 30], [33, 23],
|
||||
[30, 61], [62, 45], [59, 119],
|
||||
[116, 90], [156, 198], [373, 326]]
|
||||
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
|
||||
loss: YOLOv3Loss
|
||||
iou_aware: true
|
||||
iou_aware_factor: 0.5
|
||||
|
||||
YOLOv3Loss:
|
||||
ignore_thresh: 0.7
|
||||
downsample: [32, 16, 8]
|
||||
label_smooth: false
|
||||
scale_x_y: 1.05
|
||||
iou_loss: IouLoss
|
||||
iou_aware_loss: IouAwareLoss
|
||||
|
||||
IouLoss:
|
||||
loss_weight: 2.5
|
||||
loss_square: true
|
||||
|
||||
IouAwareLoss:
|
||||
loss_weight: 1.0
|
||||
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.01
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
scale_x_y: 1.05
|
||||
nms:
|
||||
name: MatrixNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
post_threshold: 0.01
|
||||
nms_top_k: -1
|
||||
background_label: -1
|
||||
@@ -0,0 +1,76 @@
|
||||
_BASE_: [
|
||||
'../../../yolov3/yolov3_darknet53_270e_coco.yml',
|
||||
'../_base_/mot17.yml',
|
||||
]
|
||||
weights: output/yolov3_darknet53_40e_608x608_mot17half/model_final
|
||||
log_iter: 20
|
||||
snapshot_epoch: 2
|
||||
|
||||
# schedule configuration for fine-tuning
|
||||
epoch: 40
|
||||
LearningRate:
|
||||
base_lr: 0.0001
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones:
|
||||
- 32
|
||||
- 36
|
||||
- !LinearWarmup
|
||||
start_factor: 0.3333333333333333
|
||||
steps: 100
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
momentum: 0.9
|
||||
type: Momentum
|
||||
regularizer:
|
||||
factor: 0.0005
|
||||
type: L2
|
||||
|
||||
TrainReader:
|
||||
batch_size: 8
|
||||
mixup_epoch: 35
|
||||
|
||||
# detector configuration
|
||||
architecture: YOLOv3
|
||||
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/yolov3_darknet53_270e_coco.pdparams
|
||||
norm_type: sync_bn
|
||||
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: BBoxPostProcess
|
||||
|
||||
DarkNet:
|
||||
depth: 53
|
||||
return_idx: [2, 3, 4]
|
||||
|
||||
# use default config
|
||||
# YOLOv3FPN:
|
||||
|
||||
YOLOv3Head:
|
||||
anchors: [[10, 13], [16, 30], [33, 23],
|
||||
[30, 61], [62, 45], [59, 119],
|
||||
[116, 90], [156, 198], [373, 326]]
|
||||
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
|
||||
loss: YOLOv3Loss
|
||||
|
||||
YOLOv3Loss:
|
||||
ignore_thresh: 0.7
|
||||
downsample: [32, 16, 8]
|
||||
label_smooth: false
|
||||
|
||||
BBoxPostProcess:
|
||||
decode:
|
||||
name: YOLOBox
|
||||
conf_thresh: 0.005
|
||||
downsample_ratio: 32
|
||||
clip_bbox: true
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 100
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.45
|
||||
nms_top_k: 1000
|
||||
@@ -0,0 +1,26 @@
|
||||
English | [简体中文](README_cn.md)
|
||||
|
||||
# ReID of DeepSORT
|
||||
|
||||
## Introduction
|
||||
[DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) is composed of a detector and a ReID model in series. The configs of several common ReID models are provided here as a reference for DeepSORT.
|
||||
|
||||
## Model Zoo
|
||||
|
||||
### Results on Market1501 pedestrian ReID dataset
|
||||
|
||||
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
|
||||
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
|
||||
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [config](./deepsort_pcb_pyramid_r101.yml) |
|
||||
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [config](./deepsort_pplcnet.yml) |
|
||||
|
||||
### Results on VERI-Wild vehicle ReID dataset
|
||||
|
||||
| Backbone | Model | Params | FPS | mAP | Top1 | Top5 | download | config |
|
||||
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
|
||||
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [download](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [config](./deepsort_pplcnet_vehicle.yml) |
|
||||
|
||||
**Notes:**
|
||||
- ReID models are provided by [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), the specific training process and code will be published by PaddleClas.
|
||||
- For pedestrian tracking, please use the **Market1501** pedestrian ReID model in combination with a pedestrian detector.
|
||||
- For vehicle tracking, please use the **VERI-Wild** vehicle ReID model in combination with a vehicle detector.
|
||||
@@ -0,0 +1,26 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# DeepSORT的ReID模型
|
||||
|
||||
## 简介
|
||||
[DeepSORT](https://arxiv.org/abs/1812.00442)(Deep Cosine Metric Learning SORT) 由检测器和ReID模型串联组合而成,此处提供了几个常用ReID模型的配置作为DeepSORT使用的参考。
|
||||
|
||||
## 模型库
|
||||
|
||||
### 在Market1501行人重识别数据集上的结果
|
||||
|
||||
| 骨架网络 | 网络类型 | Params | FPS | mAP | Top1 | Top5 | 下载链接 | 配置文件 |
|
||||
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
|
||||
| ResNet-101 | PCB Pyramid Embedding | 289M | --- | 86.31 | 94.95 | 98.28 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams) | [配置文件](./deepsort_pcb_pyramid_r101.yml) |
|
||||
| PPLCNet-2.5x | PPLCNet Embedding | 36M | --- | 71.59 | 87.38 | 95.49 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams) | [配置文件](./deepsort_pplcnet.yml) |
|
||||
|
||||
### 在VERI-Wild车辆重识别数据集上的结果
|
||||
|
||||
| 骨架网络 | 网络类型 | Params | FPS | mAP | Top1 | Top5 | 下载链接 | 配置文件 |
|
||||
| :-------------: | :-----------------: | :-------: | :------: | :-------: | :-------: | :-------: | :-------: | :-------: |
|
||||
| PPLCNet-2.5x | PPLCNet Embedding | 93M | --- | 82.44 | 93.54 | 98.53 | [下载链接](https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet_vehicle.pdparams) | [配置文件](./deepsort_pplcnet_vehicle.yml) |
|
||||
|
||||
**注意:**
|
||||
- ReID模型由[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)提供,具体训练流程和代码待PaddleClas公布.
|
||||
- 行人跟踪请用**Market1501**行人重识别数据集训练的ReID模型结合行人检测器去使用。
|
||||
- 车辆跟踪请用**VERI-Wild**车辆重识别数据集训练的ReID模型结合车辆检测器去使用。
|
||||
@@ -0,0 +1,45 @@
|
||||
# This config represents a ReID only configuration of DeepSORT, it has two uses.
|
||||
# One is used for loading the detection results and ReID model to get tracking results;
|
||||
# Another is used for exporting the ReID model to deploy infer.
|
||||
|
||||
_BASE_: [
|
||||
'../../../datasets/mot.yml',
|
||||
'../../../runtime.yml',
|
||||
'../_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT16/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: None
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pcb_pyramid_r101.pdparams
|
||||
|
||||
|
||||
# A ReID only configuration of DeepSORT, detector should be None.
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: None
|
||||
reid: PCBPyramid
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
PCBPyramid:
|
||||
model_name: "ResNet101"
|
||||
num_conv_out_channels: 128
|
||||
num_classes: 751 # default 751 classes in Market-1501 dataset.
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0 # 0 means no need to filter out too small boxes
|
||||
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
@@ -0,0 +1,44 @@
|
||||
# This config represents a ReID only configuration of DeepSORT, it has two uses.
|
||||
# One is used for loading the detection results and ReID model to get tracking results;
|
||||
# Another is used for exporting the ReID model to deploy infer.
|
||||
|
||||
_BASE_: [
|
||||
'../../../datasets/mot.yml',
|
||||
'../../../runtime.yml',
|
||||
'../_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT16/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: None
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet.pdparams
|
||||
|
||||
|
||||
# A ReID only configuration of DeepSORT, detector should be None.
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: None
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0 # filter out too small boxes
|
||||
vertical_ratio: -1 # filter out bboxes, usually set 1.6 for pedestrian
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
@@ -0,0 +1,44 @@
|
||||
# This config represents a ReID only configuration of DeepSORT, it has two uses.
|
||||
# One is used for loading the detection results and ReID model to get tracking results;
|
||||
# Another is used for exporting the ReID model to deploy infer.
|
||||
|
||||
_BASE_: [
|
||||
'../../../datasets/mot.yml',
|
||||
'../../../runtime.yml',
|
||||
'../_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: kitti_vehicle/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: None
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort_pplcnet_vehicle.pdparams
|
||||
|
||||
|
||||
# A ReID only configuration of DeepSORT, detector should be None.
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: None
|
||||
reid: PPLCNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
PPLCNetEmbedding:
|
||||
input_ch: 1280
|
||||
output_ch: 512
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0 # 0 means no need to filter out too small boxes
|
||||
vertical_ratio: -1 # -1 means no need to filter out bboxes, usually set 1.6 for pedestrian
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
@@ -0,0 +1,43 @@
|
||||
# This config represents a ReID only configuration of DeepSORT, it has two uses.
|
||||
# One is used for loading the detection results and ReID model to get tracking results;
|
||||
# Another is used for exporting the ReID model to deploy infer.
|
||||
|
||||
_BASE_: [
|
||||
'../../../datasets/mot.yml',
|
||||
'../../../runtime.yml',
|
||||
'../_base_/deepsort_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT16/images/train
|
||||
keep_ori_im: True # set as True in DeepSORT
|
||||
|
||||
det_weights: None
|
||||
reid_weights: https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_resnet.pdparams
|
||||
|
||||
|
||||
# A ReID only configuration of DeepSORT, detector should be None.
|
||||
architecture: DeepSORT
|
||||
pretrain_weights: None
|
||||
|
||||
DeepSORT:
|
||||
detector: None
|
||||
reid: ResNetEmbedding
|
||||
tracker: DeepSORTTracker
|
||||
|
||||
ResNetEmbedding:
|
||||
model_name: "ResNet50"
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0 # filter out too small boxes
|
||||
vertical_ratio: -1 # filter out bboxes, usually set 1.6 for pedestrian
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
motion: KalmanFilter
|
||||
@@ -0,0 +1,208 @@
|
||||
English | [简体中文](README_cn.md)
|
||||
|
||||
# FairMOT (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
|
||||
|
||||
## Table of Contents
|
||||
- [Introduction](#Introduction)
|
||||
- [Model Zoo](#Model_Zoo)
|
||||
- [Getting Started](#Getting_Started)
|
||||
- [Citations](#Citations)
|
||||
|
||||
## Introduction
|
||||
|
||||
[FairMOT](https://arxiv.org/abs/2004.01888) is based on the anchor-free detector CenterNet, which overcomes the anchor and feature misalignment problem of anchor-based detection frameworks. The fusion of deep and shallow features lets the detection and ReID tasks each obtain the features they need, and low-dimensional ReID features are used. FairMOT is a simple baseline composed of two homogeneous branches that predict pixel-level object scores and ReID features; it achieves fairness between the two tasks and reaches a high level of real-time MOT accuracy.
|
||||
|
||||
### PP-Tracking real-time MOT system
|
||||
In addition, PaddleDetection also provides [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
|
||||
PP-Tracking is the first open source real-time Multi-Object Tracking system, and it is based on PaddlePaddle deep learning framework. It has rich models, wide application and high efficiency deployment.
|
||||
|
||||
PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aiming at the difficulties and pain points of real business scenarios, PP-Tracking provides MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment is supported through an API and a GUI, the deployment languages are Python and C++, and the supported deployment platforms include Linux and NVIDIA Jetson.
|
||||
|
||||
### AI studio public project tutorial
|
||||
PP-tracking provides an AI studio public project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
|
||||
|
||||
|
||||
## Model Zoo
|
||||
|
||||
### FairMOT Results on MOT-16 Training Set
|
||||
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
|
||||
| DLA-34 | 1088x608 | 83.2 | 83.1 | 499 | 3861 | 14223 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 | 864x480 | 80.8 | 81.1 | 561 | 3643 | 16967 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot_dla34_30e_864x480.yml) |
|
||||
| DLA-34 | 576x320 | 74.0 | 76.1 | 640 | 4989 | 23034 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot_dla34_30e_576x320.yml) |
|
||||
|
||||
|
||||
### FairMOT Results on MOT-16 Test Set
|
||||
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
|
||||
| DLA-34 | 1088x608 | 75.0 | 74.7 | 919 | 7934 | 36747 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 | 864x480 | 73.0 | 72.6 | 977 | 7578 | 40601 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [config](./fairmot_dla34_30e_864x480.yml) |
|
||||
| DLA-34 | 576x320 | 69.9 | 70.2 | 1044 | 8869 | 44898 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [config](./fairmot_dla34_30e_576x320.yml) |
|
||||
|
||||
**Notes:**
|
||||
- FairMOT DLA-34 was trained on 2 GPUs with a mini-batch size of 6 per GPU for 30 epochs.
|
||||
|
||||
|
||||
### FairMOT enhance model
|
||||
### Results on MOT-16 Test Set
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
|
||||
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
|
||||
|
||||
### Results on MOT-17 Test Set
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
|
||||
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
|
||||
|
||||
**Notes:**
|
||||
- FairMOT enhance was trained on 8 GPUs, with the CrowdHuman dataset added to the training set.
- FairMOT enhance DLA-34 uses a batch size of 16 per GPU and is trained for 60 epochs.
- FairMOT enhance HarDNet-85 uses a batch size of 10 per GPU and is trained for 30 epochs.
|
||||
|
||||
### FairMOT light model
|
||||
### Results on MOT-16 Test Set
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| HRNetV2-W18 | 1088x608 | 71.7 | 66.6 | 1340 | 8642 | 41592 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
|
||||
|
||||
### Results on MOT-17 Test Set
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| HRNetV2-W18 | 1088x608 | 70.7 | 65.7 | 4281 | 22485 | 138468 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
|
||||
| HRNetV2-W18 | 864x480 | 70.3 | 65.8 | 4056 | 18927 | 144486 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml) |
|
||||
| HRNetV2-W18 | 576x320 | 65.3 | 64.8 | 4137 | 28860 | 163017 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) |
|
||||
|
||||
**Notes:**
|
||||
- FairMOT HRNetV2-W18 was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs. Only the ImageNet pre-trained model is used, the optimizer is Momentum, and the CrowdHuman dataset is added to the training set.
|
||||
|
||||
### FairMOT + BYTETracker
|
||||
|
||||
### Results on MOT-17 Half Set
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [config](./fairmot_dla34_30e_1088x608_bytetracker.yml) |
|
||||
|
||||
**Notes:**
|
||||
- FairMOT here is for an ablation study: the training data is the five MIX datasets (Caltech, CUHKSYSU, PRW, Cityscapes, ETHZ) plus the first half of MOT17 train, the pre-trained weights are the CenterNet COCO model, and evaluation is on the second half of MOT17 train.
|
||||
- To adapt BYTETracker to other FairMOT models in PaddleDetection, modify the tracker in the config like this (a confidence-split sketch follows the snippet below):
|
||||
```
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.8
|
||||
conf_thres: 0.4
|
||||
low_conf_thres: 0.2
|
||||
```
|
||||
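As background for the `use_byte` / `conf_thres` / `low_conf_thres` settings above, the sketch below shows the confidence split at the heart of the BYTE association: high-score detections are matched first, and low-score ones are only used in a second pass against tracks that remain unmatched. This is a schematic outline, not PaddleDetection's tracker code.

```python
import numpy as np

def split_detections_by_score(dets, conf_thres=0.4, low_conf_thres=0.2):
    """dets: (N, 5) array of [x1, y1, x2, y2, score]."""
    scores = dets[:, 4]
    high = dets[scores >= conf_thres]
    low = dets[(scores >= low_conf_thres) & (scores < conf_thres)]
    return high, low

# First associate `high` with existing tracks (appearance + motion),
# then associate `low` (IoU only) with the tracks left unmatched.
```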
|
||||
### Fairmot transfer learning model
|
||||
|
||||
### Results on GMOT-40 airplane subset
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 96.6 | 94.7 | 19 | 300 | 466 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_airplane.pdparams) | [config](./fairmot_dla34_30e_1088x608_airplane.yml) |
|
||||
|
||||
**Note:**
|
||||
- The dataset of this model is a subset of the airplane category extracted from the GMOT-40 dataset. The download link provided by the PaddleDetection team is ```wget https://bj.bcebos.com/v1/paddledet/data/mot/airplane.zip```; unzip and store it in ```dataset/mot```, then copy ```airplane.train``` to ```dataset/mot/image_lists```.
|
||||
- The FairMOT model here uses the trained pedestrian FairMOT model as pre-training weights. The training set is the complete airplane subset, 4 video sequences in total, and it is also used for evaluation.
|
||||
- When tracking other classes of objects, you should modify `min_box_area` and `vertical_ratio` of the tracker in the corresponding config file, like this (a filtering sketch follows the snippet):
|
||||
```
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 0 # 200 for pedestrian
|
||||
vertical_ratio: 0 # 1.6 for pedestrian
|
||||
```
|
||||
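To illustrate what the two tracker fields above control, here is a hedged sketch of the kind of box filtering they imply. Whether `vertical_ratio` is applied as a width/height upper bound exactly as written here is an assumption, so treat the sketch as illustrative only.

```python
def keep_box(w, h, min_box_area=200, vertical_ratio=1.6):
    """Return True if a tracked box passes the area / aspect-ratio filters.

    min_box_area <= 0 and vertical_ratio <= 0 disable the respective filter.
    """
    if min_box_area > 0 and w * h < min_box_area:
        return False                      # drop boxes that are too small
    if vertical_ratio > 0 and w / max(h, 1e-6) > vertical_ratio:
        return False                      # drop boxes that are too wide to be a pedestrian
    return True

# For the airplane subset both filters are disabled: min_box_area=0, vertical_ratio=0.
```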
|
||||
|
||||
## Getting Started
|
||||
|
||||
### 1. Training
|
||||
|
||||
Train FairMOT on 2 GPUs with the following command:
|
||||
|
||||
```bash
|
||||
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
|
||||
```
|
||||
|
||||
|
||||
### 2. Evaluation
|
||||
|
||||
Evaluate the tracking performance of FairMOT on the val dataset on a single GPU with the following commands:
|
||||
|
||||
```bash
|
||||
# use weights released in PaddleDetection model zoo
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
|
||||
|
||||
# use saved checkpoint in training
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
|
||||
```
|
||||
**Notes:**
|
||||
- The default evaluation dataset is MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mot.yml`:
|
||||
```
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/train
|
||||
keep_ori_im: False # set True if save visualization images or video
|
||||
```
|
||||
- Tracking results will be saved in `{output_dir}/mot_results/`, and every sequence has one txt file, each line of the txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`, and you can set `{output_dir}` by `--output_dir`.
|
||||
|
||||
### 3. Inference
|
||||
|
||||
Run inference on a video on a single GPU with the following command:
|
||||
|
||||
```bash
|
||||
# inference on video and save a video
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
|
||||
```
|
||||
**Notes:**
|
||||
- Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first, on Linux(Ubuntu) platform you can directly install it by the following command:`apt-get update && apt-get install -y ffmpeg`.
|
||||
|
||||
|
||||
### 4. Export model
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
|
||||
```
|
||||
|
||||
### 5. Using exported model for python inference
|
||||
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
|
||||
```
|
||||
**Notes:**
|
||||
- The tracking model is used to predict the video, and does not support the prediction of a single image. The visualization video of the tracking results is saved by default. You can add `--save_mot_txts` to save the txt result file, or `--save_images` to save the visualization images.
|
||||
- Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
|
||||
|
||||
|
||||
### 6. Using the exported MOT and keypoint models for joint Python inference
|
||||
|
||||
```bash
|
||||
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
|
||||
```
|
||||
**Notes:**
|
||||
- Keypoint model export tutorial: `configs/keypoint/README.md`.
|
||||
|
||||
|
||||
## Citations
|
||||
```
|
||||
@article{zhang2020fair,
|
||||
title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
|
||||
author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
|
||||
journal={arXiv preprint arXiv:2004.01888},
|
||||
year={2020}
|
||||
}
|
||||
@article{shao2018crowdhuman,
|
||||
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
|
||||
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
|
||||
journal={arXiv preprint arXiv:1805.00123},
|
||||
year={2018}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,202 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# FairMOT (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking)
|
||||
|
||||
## 内容
|
||||
- [简介](#简介)
|
||||
- [模型库](#模型库)
|
||||
- [快速开始](#快速开始)
|
||||
- [引用](#引用)
|
||||
|
||||
## 简介
|
||||
|
||||
[FairMOT](https://arxiv.org/abs/2004.01888)以Anchor Free的CenterNet检测器为基础,克服了Anchor-Based的检测框架中anchor和特征不对齐问题,深浅层特征融合使得检测和ReID任务各自获得所需要的特征,并且使用低维度ReID特征,提出了一种由两个同质分支组成的简单baseline来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了更高水平的实时多目标跟踪精度。
|
||||
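FairMOT's data association relies on these low-dimensional ReID embeddings together with motion cues, and the configs in this directory set `metric_type: cosine` and `ch_emb: 128`. As a rough, hedged illustration of what cosine-metric association means (the array shapes follow the config, but this is not the PaddleDetection implementation), consider:

```python
# Illustrative sketch of embedding-based association: cosine-distance cost matrix
# followed by bipartite matching. Not the JDETracker source; shapes and thresholds
# are examples only. Requires numpy and scipy.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, max_cosine_dist=0.4):
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                      # cosine distance between all pairs
    rows, cols = linear_sum_assignment(cost)  # minimum-cost matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cosine_dist]

tracks = np.random.rand(5, 128)  # 128-dim ReID features, matching ch_emb: 128
dets = np.random.rand(7, 128)
print(associate(tracks, dets))
```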
|
||||
### PP-Tracking 实时多目标跟踪系统
|
||||
此外,PaddleDetection还提供了[PP-Tracking](../../../deploy/pptracking/README.md)实时多目标跟踪系统。PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
|
||||
PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
|
||||
|
||||
### AI Studio公开项目案例
|
||||
PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
|
||||
|
||||
## 模型库
|
||||
|
||||
### FairMOT在MOT-16 Training Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
|
||||
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
|
||||
| DLA-34 | 1088x608 | 83.2 | 83.1 | 499 | 3861 | 14223 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 | 864x480 | 80.8 | 81.1 | 561 | 3643 | 16967 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [配置文件](./fairmot_dla34_30e_864x480.yml) |
|
||||
| DLA-34 | 576x320 | 74.0 | 76.1 | 640 | 4989 | 23034 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [配置文件](./fairmot_dla34_30e_576x320.yml) |
|
||||
|
||||
### FairMOT在MOT-16 Test Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
|
||||
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
|
||||
| DLA-34 | 1088x608 | 75.0 | 74.7 | 919 | 7934 | 36747 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 | 864x480 | 73.0 | 72.6 | 977 | 7578 | 40601 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_864x480.pdparams) | [配置文件](./fairmot_dla34_30e_864x480.yml) |
|
||||
| DLA-34 | 576x320 | 69.9 | 70.2 | 1044 | 8869 | 44898 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_576x320.pdparams) | [配置文件](./fairmot_dla34_30e_576x320.yml) |
|
||||
|
||||
**注意:**
|
||||
- FairMOT DLA-34均使用2个GPU进行训练,每个GPU上batch size为6,训练30个epoch。
|
||||
|
||||
|
||||
### FairMOT enhance模型
|
||||
### 在MOT-16 Test Set上结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [配置文件](./fairmot_enhance_dla34_60e_1088x608.yml) |
|
||||
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [配置文件](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
|
||||
|
||||
### 在MOT-17 Test Set上结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [配置文件](./fairmot_enhance_dla34_60e_1088x608.yml) |
|
||||
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [配置文件](./fairmot_enhance_hardnet85_30e_1088x608.yml) |
|
||||
|
||||
**注意:**
|
||||
- FairMOT enhance模型均使用8个GPU进行训练,训练集中加入了crowdhuman数据集一起参与训练。
|
||||
- FairMOT enhance DLA-34 每个GPU上batch size为16,训练60个epoch。
|
||||
- FairMOT enhance HarDNet-85 每个GPU上batch size为10,训练30个epoch。
|
||||
|
||||
### FairMOT轻量级模型
|
||||
### 在MOT-16 Test Set上结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| HRNetV2-W18 | 1088x608 | 71.7 | 66.6 | 1340 | 8642 | 41592 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
|
||||
|
||||
### 在MOT-17 Test Set上结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| HRNetV2-W18 | 1088x608 | 70.7 | 65.7 | 4281 | 22485 | 138468 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |
|
||||
| HRNetV2-W18 | 864x480 | 70.3 | 65.8 | 4056 | 18927 | 144486 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml) |
|
||||
| HRNetV2-W18 | 576x320 | 65.3 | 64.8 | 4137 | 28860 | 163017 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.pdparams) | [配置文件](./fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml) |
|
||||
|
||||
**注意:**
|
||||
- FairMOT HRNetV2-W18均使用8个GPU进行训练,每个GPU上batch size为4,训练30个epoch,使用的ImageNet预训练,优化器策略采用的是Momentum,并且训练集中加入了crowdhuman数据集一起参与训练。
|
||||
|
||||
### FairMOT + BYTETracker
|
||||
|
||||
### 在MOT-17 Half上结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 69.1 | 72.8 | 299 | 1957 | 14412 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608.yml) |
|
||||
| DLA-34 + BYTETracker| 1088x608 | 70.3 | 73.2 | 234 | 2176 | 13598 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_bytetracker.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_bytetracker.yml) |
|
||||
|
||||
|
||||
**注意:**
|
||||
- FairMOT模型此处是ablation study的配置,使用的训练集是原先MIX的5个数据集(Caltech,CUHKSYSU,PRW,Cityscapes,ETHZ)加上MOT17 Train的前一半,使用的预训练权重是CenterNet的COCO预训练权重,验证是在MOT17 Train的后一半上进行的。
|
||||
- BYTETracker应用到PaddleDetection的其他FairMOT模型,只需要更改对应的config文件里的tracker部分为如下所示:
|
||||
```
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.8
|
||||
conf_thres: 0.4
|
||||
low_conf_thres: 0.2
|
||||
```
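The `use_byte: True` switch follows the BYTE association idea: detections are split by score into a high-confidence set (at or above `conf_thres`) and a low-confidence set (between `low_conf_thres` and `conf_thres`), and the low-confidence boxes are only used in a second matching round for tracks that the first round left unmatched. The sketch below is a schematic of that two-stage split with a naive greedy IoU matcher standing in for the real Kalman/embedding matching; it is not the JDETracker code.

```python
# Schematic of the two-stage BYTE association enabled by `use_byte: True`.
# Thresholds mirror the config above; greedy IoU matching is only a stand-in.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, dets, iou_thresh=0.3):
    matches, used = [], set()
    for t in tracks:
        best, best_iou = None, iou_thresh
        for di, d in enumerate(dets):
            score = iou(t['box'], d['box'])
            if di not in used and score > best_iou:
                best, best_iou = di, score
        if best is None:
            continue
        matches.append((t, dets[best]))
        used.add(best)
    unmatched = [t for t in tracks if all(m[0] is not t for m in matches)]
    return matches, unmatched

def byte_associate(tracks, dets, conf_thres=0.4, low_conf_thres=0.2):
    high = [d for d in dets if d['score'] >= conf_thres]
    low = [d for d in dets if low_conf_thres <= d['score'] < conf_thres]
    matches, leftover = greedy_match(tracks, high)       # round 1: confident boxes
    matches_low, leftover = greedy_match(leftover, low)  # round 2: recover occluded
    return matches + matches_low, leftover

tracks = [{'box': (0, 0, 50, 100)}]
dets = [{'box': (2, 2, 52, 98), 'score': 0.9}]
print(byte_associate(tracks, dets))
```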
|
||||
|
||||
### FairMOT迁移学习模型
|
||||
|
||||
### 在GMOT-40的airplane子集上的结果
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
|
||||
| DLA-34 | 1088x608 | 96.6 | 94.7 | 19 | 300 | 466 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_airplane.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_airplane.yml) |
|
||||
|
||||
**注意:**
|
||||
- 此模型数据集是GMOT-40的airplane类别抽离出来的子集,PaddleDetection团队整理后的下载链接为: ```wget https://bj.bcebos.com/v1/paddledet/data/mot/airplane.zip```,下载解压存放于 ```dataset/mot```目录下,并将其中的```airplane.train```复制存放于```dataset/mot/image_lists```。
|
||||
- FairMOT模型此处训练是采用行人FairMOT训好的模型作为预训练权重,使用的训练集是airplane全集共4个视频序列,验证也是在全集上测的。
|
||||
- 应用到其他物体的跟踪,需要更改对应的config文件里的tracker部分的```min_box_area```和```vertical_ratio```,如下所示:
|
||||
```
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 0 # 200 for pedestrian
|
||||
vertical_ratio: 0 # 1.6 for pedestrian
|
||||
```
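The two tracker fields changed here act as post-filters on tracked boxes: with the pedestrian defaults, boxes smaller than `min_box_area` or with an implausibly wide aspect ratio relative to `vertical_ratio` are dropped, while setting both to 0, as above, keeps every box so that non-pedestrian targets such as airplanes survive. A short illustrative sketch of that rule (check the JDETracker source for the exact comparisons):

```python
# Illustration of the min_box_area / vertical_ratio post-filter.
# Setting both to 0 (as in this config) disables filtering; the pedestrian
# defaults (200 and 1.6) drop tiny boxes and very wide boxes. This mirrors the
# idea only -- the exact rules live in the JDETracker implementation.
def keep_box(w, h, min_box_area=0, vertical_ratio=0):
    if min_box_area > 0 and w * h <= min_box_area:
        return False                      # too small to be a reliable target
    if vertical_ratio > 0 and w / max(h, 1e-6) > vertical_ratio:
        return False                      # too wide to be a standing pedestrian
    return True

print(keep_box(12, 10, min_box_area=200, vertical_ratio=1.6))  # False: area 120 < 200
print(keep_box(12, 10))                                        # True: filters disabled
```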
|
||||
|
||||
## 快速开始
|
||||
|
||||
### 1. 训练
|
||||
|
||||
使用2个GPU通过如下命令一键式启动训练
|
||||
|
||||
```bash
|
||||
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
|
||||
```
|
||||
|
||||
### 2. 评估
|
||||
|
||||
使用单张GPU通过如下命令一键式启动评估
|
||||
|
||||
```bash
|
||||
# 使用PaddleDetection发布的权重
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
|
||||
|
||||
# 使用训练保存的checkpoint
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams
|
||||
```
|
||||
**注意:**
|
||||
- 默认评估的是MOT-16 Train Set数据集, 如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`:
|
||||
```
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/train
|
||||
keep_ori_im: False # set True if save visualization images or video
|
||||
```
|
||||
- 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。
|
||||
|
||||
### 3. 预测
|
||||
|
||||
使用单个GPU通过如下命令预测一个视频,并保存为视频
|
||||
|
||||
```bash
|
||||
# 预测一个视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
|
||||
|
||||
### 4. 导出预测模型
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
|
||||
```
|
||||
|
||||
### 5. 用导出的模型基于Python去预测
|
||||
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
|
||||
```
|
||||
**注意:**
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
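If you want to render the results yourself instead of relying on `--save_images`, the saved txt can be drawn back onto the source video with OpenCV. This is an optional helper sketch with placeholder paths, assuming `opencv-python` is installed; it is not part of the deployment scripts.

```python
# Optional sketch: overlay saved tracking results on the source video with OpenCV.
# Paths are placeholders; requires `pip install opencv-python`; assumes 25 fps.
from collections import defaultdict
import cv2

results = defaultdict(list)
with open('output/mot_results/your_video.txt') as f:   # produced by --save_mot_txts
    for line in f:
        frame, tid, x, y, w, h = line.split(',')[:6]
        results[int(float(frame))].append((int(float(tid)), float(x), float(y), float(w), float(h)))

cap = cv2.VideoCapture('your_video.mp4')
writer, idx = None, 1
while True:
    ok, img = cap.read()
    if not ok:
        break
    if writer is None:
        writer = cv2.VideoWriter('your_video_vis.mp4', cv2.VideoWriter_fourcc(*'mp4v'),
                                 25, (img.shape[1], img.shape[0]))
    for tid, x, y, w, h in results.get(idx, []):
        cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        cv2.putText(img, str(tid), (int(x), int(y) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    writer.write(img)
    idx += 1
cap.release()
if writer is not None:
    writer.release()
```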
||||
### 6. 用导出的跟踪和关键点模型Python联合预测
|
||||
|
||||
```bash
|
||||
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
|
||||
```
|
||||
**注意:**
|
||||
- 关键点模型导出教程请参考`configs/keypoint/README.md`。
|
||||
|
||||
|
||||
## 引用
|
||||
```
|
||||
@article{zhang2020fair,
|
||||
title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
|
||||
author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
|
||||
journal={arXiv preprint arXiv:2004.01888},
|
||||
year={2020}
|
||||
}
|
||||
@article{shao2018crowdhuman,
|
||||
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
|
||||
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
|
||||
journal={arXiv preprint arXiv:1805.00123},
|
||||
year={2018}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,46 @@
|
||||
architecture: FairMOT
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams
|
||||
for_mot: True
|
||||
|
||||
FairMOT:
|
||||
detector: CenterNet
|
||||
reid: FairMOTEmbeddingHead
|
||||
loss: FairMOTLoss
|
||||
tracker: JDETracker
|
||||
|
||||
CenterNet:
|
||||
backbone: DLA
|
||||
neck: CenterNetDLAFPN
|
||||
head: CenterNetHead
|
||||
post_process: CenterNetPostProcess
|
||||
|
||||
CenterNetDLAFPN:
|
||||
down_ratio: 4
|
||||
last_level: 5
|
||||
out_channel: 0
|
||||
dcn_v2: True
|
||||
with_sge: False
|
||||
|
||||
CenterNetHead:
|
||||
head_planes: 256
|
||||
prior_bias: -2.19
|
||||
regress_ltrb: True
|
||||
size_loss: 'L1'
|
||||
loss_weight: {'heatmap': 1.0, 'size': 0.1, 'offset': 1.0, 'iou': 0.0}
|
||||
add_iou: False
|
||||
|
||||
FairMOTEmbeddingHead:
|
||||
ch_head: 256
|
||||
ch_emb: 128
|
||||
|
||||
CenterNetPostProcess:
|
||||
max_per_img: 500
|
||||
down_ratio: 4
|
||||
regress_ltrb: True
|
||||
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 200
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,43 @@
|
||||
architecture: FairMOT
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/centernet_hardnet85_coco.pdparams
|
||||
for_mot: True
|
||||
|
||||
FairMOT:
|
||||
detector: CenterNet
|
||||
reid: FairMOTEmbeddingHead
|
||||
loss: FairMOTLoss
|
||||
tracker: JDETracker
|
||||
|
||||
CenterNet:
|
||||
backbone: HarDNet
|
||||
neck: CenterNetHarDNetFPN
|
||||
head: CenterNetHead
|
||||
post_process: CenterNetPostProcess
|
||||
|
||||
HarDNet:
|
||||
depth_wise: False
|
||||
return_idx: [1,3,8,13]
|
||||
arch: 85
|
||||
|
||||
CenterNetHarDNetFPN:
|
||||
num_layers: 85
|
||||
down_ratio: 4
|
||||
last_level: 4
|
||||
out_channel: 0
|
||||
|
||||
CenterNetHead:
|
||||
head_planes: 128
|
||||
|
||||
FairMOTEmbeddingHead:
|
||||
ch_head: 512
|
||||
|
||||
CenterNetPostProcess:
|
||||
max_per_img: 500
|
||||
regress_ltrb: True
|
||||
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 200
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,38 @@
|
||||
architecture: FairMOT
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
|
||||
for_mot: True
|
||||
|
||||
FairMOT:
|
||||
detector: CenterNet
|
||||
reid: FairMOTEmbeddingHead
|
||||
loss: FairMOTLoss
|
||||
tracker: JDETracker
|
||||
|
||||
CenterNet:
|
||||
backbone: HRNet
|
||||
head: CenterNetHead
|
||||
post_process: CenterNetPostProcess
|
||||
neck: CenterNetDLAFPN
|
||||
|
||||
HRNet:
|
||||
width: 18
|
||||
freeze_at: 0
|
||||
return_idx: [0, 1, 2, 3]
|
||||
upsample: False
|
||||
|
||||
CenterNetDLAFPN:
|
||||
down_ratio: 4
|
||||
last_level: 3
|
||||
out_channel: 0
|
||||
first_level: 0
|
||||
dcn_v2: False
|
||||
|
||||
CenterNetPostProcess:
|
||||
max_per_img: 500
|
||||
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 200
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,41 @@
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 6
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,41 @@
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 320, 576]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 6
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 320, 576]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,41 @@
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 480, 864]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 6
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 480, 864]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,14 @@
|
||||
epoch: 30
|
||||
|
||||
LearningRate:
|
||||
base_lr: 0.0001
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones: [20,]
|
||||
use_warmup: False
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Adam
|
||||
regularizer: NULL
|
||||
@@ -0,0 +1,19 @@
|
||||
epoch: 30
|
||||
|
||||
LearningRate:
|
||||
base_lr: 0.01
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones: [15, 22]
|
||||
use_warmup: True
|
||||
- !ExpWarmup
|
||||
steps: 1000
|
||||
power: 4
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Momentum
|
||||
regularizer:
|
||||
factor: 0.0001
|
||||
type: L2
|
||||
@@ -0,0 +1,9 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_dla34.yml',
|
||||
'_base_/fairmot_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
weights: output/fairmot_dla34_30e_1088x608/model_final
|
||||
@@ -0,0 +1,33 @@
|
||||
_BASE_: [
|
||||
'fairmot_dla34_30e_1088x608.yml',
|
||||
]
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
|
||||
weights: output/fairmot_dla34_30e_1088x608_airplane/model_final
|
||||
|
||||
JDETracker:
|
||||
conf_thres: 0.4
|
||||
tracked_thresh: 0.4
|
||||
metric_type: cosine
|
||||
min_box_area: 0
|
||||
vertical_ratio: 0
|
||||
|
||||
# for MOT training
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['airplane.train']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
# for MOT evaluation
|
||||
# If you want to change the MOT evaluation dataset, please modify 'data_root'
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: airplane/images/train
|
||||
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
|
||||
|
||||
# for MOT video inference
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,31 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_dla34.yml',
|
||||
'_base_/fairmot_reader_1088x608.yml',
|
||||
]
|
||||
weights: output/fairmot_dla34_30e_1088x608_bytetracker/model_final
|
||||
|
||||
# for ablation study, MIX + MOT17-half
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.half', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
# for MOT evaluation
|
||||
# If you want to change the MOT evaluation dataset, please modify 'data_root'
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/half
|
||||
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
|
||||
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
match_thres: 0.8
|
||||
conf_thres: 0.4
|
||||
low_conf_thres: 0.2
|
||||
min_box_area: 200
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,9 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_dla34.yml',
|
||||
'_base_/fairmot_reader_576x320.yml',
|
||||
]
|
||||
|
||||
weights: output/fairmot_dla34_30e_576x320/model_final
|
||||
@@ -0,0 +1,9 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_dla34.yml',
|
||||
'_base_/fairmot_reader_864x480.yml',
|
||||
]
|
||||
|
||||
weights: output/fairmot_dla34_30e_864x480/model_final
|
||||
@@ -0,0 +1,56 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_dla34.yml',
|
||||
'_base_/fairmot_reader_1088x608.yml',
|
||||
]
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
# add crowdhuman
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 16
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
epoch: 60
|
||||
LearningRate:
|
||||
base_lr: 0.0005
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones: [40,]
|
||||
use_warmup: False
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Adam
|
||||
regularizer: NULL
|
||||
|
||||
weights: output/fairmot_enhance_dla34_60e_1088x608/model_final
|
||||
@@ -0,0 +1,56 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e.yml',
|
||||
'_base_/fairmot_hardnet85.yml',
|
||||
'_base_/fairmot_reader_1088x608.yml',
|
||||
]
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
# add crowdhuman
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 10
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
epoch: 30
|
||||
LearningRate:
|
||||
base_lr: 0.0001
|
||||
schedulers:
|
||||
- !PiecewiseDecay
|
||||
gamma: 0.1
|
||||
milestones: [20,]
|
||||
use_warmup: False
|
||||
|
||||
OptimizerBuilder:
|
||||
optimizer:
|
||||
type: Adam
|
||||
regularizer: NULL
|
||||
|
||||
weights: output/fairmot_enhance_hardnet85_30e_1088x608/model_final
|
||||
@@ -0,0 +1,43 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e_momentum.yml',
|
||||
'_base_/fairmot_hrnetv2_w18_dlafpn.yml',
|
||||
'_base_/fairmot_reader_1088x608.yml',
|
||||
]
|
||||
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
# add crowdhuman
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 4
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_1088x608/model_final
|
||||
@@ -0,0 +1,43 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e_momentum.yml',
|
||||
'_base_/fairmot_hrnetv2_w18_dlafpn.yml',
|
||||
'_base_/fairmot_reader_576x320.yml',
|
||||
]
|
||||
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
# add crowdhuman
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 320, 576]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 4
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_576x320/model_final
|
||||
@@ -0,0 +1,43 @@
|
||||
_BASE_: [
|
||||
'../../datasets/mot.yml',
|
||||
'../../runtime.yml',
|
||||
'_base_/optimizer_30e_momentum.yml',
|
||||
'_base_/fairmot_hrnetv2_w18_dlafpn.yml',
|
||||
'_base_/fairmot_reader_864x480.yml',
|
||||
]
|
||||
|
||||
norm_type: sync_bn
|
||||
use_ema: true
|
||||
ema_decay: 0.9998
|
||||
|
||||
# add crowdhuman
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['mot17.train', 'caltech.all', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train', 'crowdhuman.train', 'crowdhuman.val']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
worker_num: 4
|
||||
TrainReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 480, 864]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- MOTRandomAffine: {reject_outside: False}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2FairMOTTarget: {}
|
||||
batch_size: 4
|
||||
shuffle: True
|
||||
drop_last: True
|
||||
use_shared_memory: True
|
||||
|
||||
weights: output/fairmot_hrnetv2_w18_dlafpn_30e_864x480/model_final
|
||||
@@ -0,0 +1 @@
|
||||
README_cn.md
|
||||
@@ -0,0 +1,95 @@
|
||||
[English](README.md) | 简体中文
|
||||
# 特色垂类跟踪模型
|
||||
|
||||
## 人头跟踪(Head Tracking)
|
||||
|
||||
现有行人跟踪器对高人群密度场景表现不佳,人头跟踪更适用于密集场景的跟踪。
|
||||
[HT-21](https://motchallenge.net/data/Head_Tracking_21)是一个高人群密度拥挤场景的人头跟踪数据集,场景包括不同的光线和环境条件下的拥挤的室内和室外场景,所有序列的帧速率都是25fps。
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/22989727/205540742-820984c2-8920-467a-bdde-faea421018c5.gif" width='800'/>
|
||||
</div>
|
||||
|
||||
## 模型库
|
||||
### FairMOT 和 ByteTrack 在 HT-21 Training Set上的结果
|
||||
| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :---: | :----: | :---: | :------: | :----: |:----: |
|
||||
| FairMOT DLA-34 | 1088x608 | 64.7 | 69.0 | 8533 | 148817 | 234970 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
|
||||
| ByteTrack-x | 1440x800 | 64.1 | 63.4 | 4191 | 185162 | 210240 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) |
|
||||
|
||||
### FairMOT 和 ByteTrack 在 HT-21 Test Set上的结果
|
||||
| 模型 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: |:-------: | :----: | :----: |
|
||||
| FairMOT DLA-34 | 1088x608 | 60.8 | 62.8 | 12781 | 118109 | 198896 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams) | [配置文件](./fairmot_dla34_30e_1088x608_headtracking21.yml) |
|
||||
| ByteTrack-x | 1440x800 | 72.6 | 61.8 | 5163 | 71235 | 154139 | - | [下载链接](https://paddledet.bj.bcebos.com/models/mot/bytetrack_yolox_ht21.pdparams) | [配置文件](../bytetrack/bytetrack_yolox_ht21.yml) |
|
||||
|
||||
**注意:**
|
||||
- FairMOT DLA-34使用2个GPU进行训练,每个GPU上batch size为6,训练30个epoch。
|
||||
- ByteTrack使用YOLOX-x做检测器,使用8个GPU进行训练,每个GPU上batch size为8,训练30个epoch,具体细节参照[bytetrack](../bytetrack/)。
|
||||
- 此处提供PaddleDetection团队整理后的[下载链接](https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip),下载后需解压放到`dataset/mot/`目录下,HT-21 Test集的结果需要交到[官网](https://motchallenge.net)评测。
|
||||
|
||||
|
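For convenience, the dataset preparation described in the last note can also be scripted. Below is a small sketch using only the Python standard library; the URL is the one given above and the target directory follows the note (`dataset/mot/`). Large downloads may be better handled with `wget`, so treat this purely as an illustration.

```python
# Sketch: fetch and unpack the repackaged HT-21 dataset into dataset/mot/.
# Standard library only; the URL comes from the note above.
import os
import urllib.request
import zipfile

url = 'https://bj.bcebos.com/v1/paddledet/data/mot/HT21.zip'
target_dir = 'dataset/mot'
zip_path = os.path.join(target_dir, 'HT21.zip')

os.makedirs(target_dir, exist_ok=True)
if not os.path.exists(zip_path):
    urllib.request.urlretrieve(url, zip_path)   # download the archive
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target_dir)                   # unpack under dataset/mot/
print('HT-21 extracted under', target_dir)
```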
||||
## 快速开始
|
||||
|
||||
### 1. 训练
|
||||
使用2个GPU通过如下命令一键式启动训练
|
||||
```bash
|
||||
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608_headtracking21/ --gpus 0,1 tools/train.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml
|
||||
```
|
||||
|
||||
### 2. 评估
|
||||
使用单张GPU通过如下命令一键式启动评估
|
||||
```bash
|
||||
# 使用PaddleDetection发布的权重
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams
|
||||
|
||||
# 使用训练保存的checkpoint
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=output/fairmot_dla34_30e_1088x608_headtracking21/model_final.pdparams
|
||||
```
|
||||
|
||||
### 3. 预测
|
||||
使用单个GPU通过如下命令预测一个视频,并保存为视频
|
||||
```bash
|
||||
# 预测一个视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams --video_file={your video name}.mp4 --save_videos
|
||||
```
|
||||
**注意:**
|
||||
- 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
|
||||
|
||||
### 4. 导出预测模型
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/headtracking21/fairmot_dla34_30e_1088x608_headtracking21.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608_headtracking21.pdparams
|
||||
```
|
||||
|
||||
### 5. 用导出的模型基于Python去预测
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608_headtracking21 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
|
||||
```
|
||||
**注意:**
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
||||
## 引用
|
||||
```
|
||||
@article{zhang2020fair,
|
||||
title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
|
||||
author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
|
||||
journal={arXiv preprint arXiv:2004.01888},
|
||||
year={2020}
|
||||
}
|
||||
|
||||
@InProceedings{Sundararaman_2021_CVPR,
|
||||
author = {Sundararaman, Ramana and De Almeida Braga, Cedric and Marchand, Eric and Pettre, Julien},
|
||||
title = {Tracking Pedestrian Heads in Dense Crowd},
|
||||
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
|
||||
month = {June},
|
||||
year = {2021},
|
||||
pages = {3865-3875}
|
||||
}
|
||||
|
||||
@article{zhang2021bytetrack,
|
||||
title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
|
||||
author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
|
||||
journal={arXiv preprint arXiv:2110.06864},
|
||||
year={2021}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,26 @@
|
||||
_BASE_: [
|
||||
'../fairmot/fairmot_dla34_30e_1088x608.yml'
|
||||
]
|
||||
|
||||
weights: output/fairmot_dla34_30e_1088x608_headtracking21/model_final
|
||||
|
||||
# for MOT training
|
||||
TrainDataset:
|
||||
!MOTDataSet
|
||||
dataset_dir: dataset/mot
|
||||
image_lists: ['ht21.train']
|
||||
data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
|
||||
|
||||
# for MOT evaluation
|
||||
# If you want to change the MOT evaluation dataset, please modify 'data_root'
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: HT21/images/test
|
||||
keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
|
||||
|
||||
# for MOT video inference
|
||||
TestMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
keep_ori_im: True # set True if save visualization images or video
|
||||
@@ -0,0 +1,119 @@
|
||||
English | [简体中文](README_cn.md)
|
||||
|
||||
# JDE (Towards Real-Time Multi-Object Tracking)
|
||||
|
||||
## Table of Contents
|
||||
- [Introduction](#Introduction)
|
||||
- [Model Zoo](#Model_Zoo)
|
||||
- [Getting Started](#Getting_Started)
|
||||
- [Citations](#Citations)
|
||||
|
||||
## Introduction
|
||||
|
||||
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, and outputs the detection results together with the corresponding embeddings. The original JDE paper builds on the anchor-based detector YOLOv3 and adds a ReID branch to learn the embeddings; training is formulated as a multi-task learning problem that balances accuracy and speed.
|
||||
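The multi-task formulation balances the detection loss and the embedding (ReID) loss automatically rather than with fixed weights; the JDE paper does this with learnable task-uncertainty terms. The snippet below is a schematic of that weighting in plain Python, shown only to make the idea concrete; it is not the loss code used in this repository.

```python
# Schematic of uncertainty-weighted multi-task loss balancing (detection + ReID).
# s_det / s_emb are learnable log-variance parameters: a larger value
# down-weights its task, while the "+ s" terms stop them from growing unbounded.
import math

def jde_total_loss(det_loss, emb_loss, s_det, s_emb):
    return 0.5 * (math.exp(-s_det) * det_loss + s_det
                  + math.exp(-s_emb) * emb_loss + s_emb)

print(jde_total_loss(det_loss=2.3, emb_loss=5.1, s_det=0.0, s_emb=1.0))
```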
|
||||
### PP-Tracking real-time MOT system
|
||||
In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework, offering a rich model zoo, broad application coverage and efficient deployment.
|
||||
|
||||
PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Aimed at the difficulties and pain points of real business scenarios, it provides MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment supports both API calls and a GUI, in Python and C++, on platforms including Linux and NVIDIA Jetson.
|
||||
|
||||
### AI Studio public project tutorial
PP-Tracking provides a public AI Studio project tutorial. Please refer to [this tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/22989727/205540305-457d48bf-e9ec-4f28-896c-64c870126e05.gif" width=500 />
|
||||
</div>
|
||||
|
||||
## Model Zoo
|
||||
|
||||
### JDE Results on MOT-16 Training Set
|
||||
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
|
||||
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
|
||||
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
|
||||
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
|
||||
|
||||
### JDE Results on MOT-16 Test Set
|
||||
|
||||
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
|
||||
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
|
||||
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
|
||||
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
|
||||
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
|
||||
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
|
||||
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
|
||||
|
||||
**Notes:**
|
||||
- JDE was trained for 30 epochs on 8 GPUs with a mini-batch size of 4 per GPU.
|
||||
|
||||
## Getting Started
|
||||
|
||||
### 1. Training
|
||||
|
||||
Train JDE on 8 GPUs with the following command:
|
||||
|
||||
```bash
|
||||
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
|
||||
```
|
||||
|
||||
### 2. Evaluation
|
||||
|
||||
Evaluate the tracking performance of JDE on the validation dataset on a single GPU with the following commands:
|
||||
|
||||
```bash
|
||||
# use weights released in PaddleDetection model zoo
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
|
||||
|
||||
# use saved checkpoint in training
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
|
||||
```
|
||||
**Notes:**
|
||||
- The default evaluation dataset is MOT-16 Train Set. If you want to change the evaluation dataset, please refer to the following code and modify `configs/datasets/mot.yml`:
|
||||
```
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/train
|
||||
keep_ori_im: False # set True if save visualization images or video
|
||||
```
|
||||
- Tracking results will be saved in `{output_dir}/mot_results/`, and every sequence has one txt file, each line of the txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`, and you can set `{output_dir}` by `--output_dir`.
|
||||
|
||||
### 3. Inference
|
||||
|
||||
Run inference on a video on a single GPU with the following command:
|
||||
|
||||
```bash
|
||||
# inference on video and save a video
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
|
||||
```
|
||||
**Notes:**
|
||||
- Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) you can install it directly with: `apt-get update && apt-get install -y ffmpeg`.
|
||||
|
||||
|
||||
### 4. Export model
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
|
||||
```
|
||||
|
||||
### 5. Using exported model for python inference
|
||||
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
|
||||
```
|
||||
**Notes:**
|
||||
- The tracking model runs on videos and does not support single-image prediction. The visualized tracking video is saved by default; add `--save_mot_txts` to save the txt result files, or `--save_images` to save the visualization images.
|
||||
- Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
|
||||
|
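Building on the result format noted above, a quick way to sanity-check a run is to count frames, boxes and distinct track IDs in each saved file. A minimal sketch (the glob pattern is a placeholder for your own `--output_dir`):

```python
# Quick sanity check over saved result files: frames, distinct tracks, total boxes.
import glob

for txt in sorted(glob.glob('output/mot_results/*.txt')):
    frames, ids, boxes = set(), set(), 0
    with open(txt) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue
            frames.add(int(float(fields[0])))
            ids.add(int(float(fields[1])))
            boxes += 1
    print(f'{txt}: {len(frames)} frames, {len(ids)} tracks, {boxes} boxes')
```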
||||
|
||||
## Citations
|
||||
```
|
||||
@article{wang2019towards,
|
||||
title={Towards Real-Time Multi-Object Tracking},
|
||||
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
|
||||
journal={arXiv preprint arXiv:1909.12605},
|
||||
year={2019}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,118 @@
|
||||
简体中文 | [English](README.md)
|
||||
|
||||
# JDE (Towards Real-Time Multi-Object Tracking)
|
||||
|
||||
## 内容
|
||||
- [简介](#简介)
|
||||
- [模型库](#模型库)
|
||||
- [快速开始](#快速开始)
|
||||
- [引用](#引用)
|
||||
|
||||
## 简介
|
||||
|
||||
[JDE](https://arxiv.org/abs/1909.12605)(Joint Detection and Embedding)是在一个单一的共享神经网络中同时学习目标检测任务和embedding任务,并同时输出检测结果和对应的外观embedding匹配的算法。JDE原论文是基于Anchor Base的YOLOv3检测器新增加一个ReID分支学习embedding,训练过程被构建为一个多任务联合学习问题,兼顾精度和速度。
|
||||
|
||||
### PP-Tracking 实时多目标跟踪系统
|
||||
此外,PaddleDetection还提供了[PP-Tracking](../../../deploy/pptracking/README.md)实时多目标跟踪系统。PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统,具有模型丰富、应用广泛和部署高效三大优势。
|
||||
PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式,针对实际业务的难点和痛点,提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用,部署方式支持API调用和GUI可视化界面,部署语言支持Python和C++,部署平台环境支持Linux、NVIDIA Jetson等。
|
||||
|
||||
### AI Studio公开项目案例
|
||||
PP-Tracking 提供了AI Studio公开项目案例,教程请参考[PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582)。
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/22989727/205540305-457d48bf-e9ec-4f28-896c-64c870126e05.gif" width=500 />
|
||||
</div>
|
||||
|
||||
## 模型库
|
||||
|
||||
### JDE在MOT-16 Training Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
|
||||
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
|
||||
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
|
||||
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
|
||||
|
||||
|
||||
### JDE在MOT-16 Test Set上结果
|
||||
|
||||
| 骨干网络 | 输入尺寸 | MOTA | IDF1 | IDS | FP | FN | FPS | 下载链接 | 配置文件 |
|
||||
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
|
||||
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
|
||||
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
|
||||
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
|
||||
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
|
||||
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[下载链接](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
|
||||
|
||||
**注意:**
|
||||
- JDE使用8个GPU进行训练,每个GPU上batch size为4,训练了30个epoch。
|
||||
|
||||
## 快速开始
|
||||
|
||||
### 1. 训练
|
||||
|
||||
使用8个GPU通过如下命令一键式启动训练
|
||||
|
||||
```bash
|
||||
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
|
||||
```
|
||||
|
||||
### 2. 评估
|
||||
|
||||
使用单张GPU通过如下命令一键式启动评估
|
||||
|
||||
```bash
|
||||
# 使用PaddleDetection发布的权重
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
|
||||
|
||||
# 使用训练保存的checkpoint
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
|
||||
```
|
||||
**注意:**
|
||||
- 默认评估的是MOT-16 Train Set数据集, 如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`:
|
||||
```
|
||||
EvalMOTDataset:
|
||||
!MOTImageFolder
|
||||
dataset_dir: dataset/mot
|
||||
data_root: MOT17/images/train
|
||||
keep_ori_im: False # set True if save visualization images or video
|
||||
```
|
||||
- 跟踪结果会存于`{output_dir}/mot_results/`中,里面每个视频序列对应一个txt,每个txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`, 此外`{output_dir}`可通过`--output_dir`设置。
|
||||
|
||||
### 3. 预测
|
||||
|
||||
使用单个GPU通过如下命令预测一个视频,并保存为视频
|
||||
|
||||
```bash
|
||||
# 预测一个视频
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
|
||||
```
|
||||
|
||||
**注意:**
|
||||
- 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。
|
||||
|
||||
### 4. 导出预测模型
|
||||
|
||||
```bash
|
||||
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
|
||||
```
|
||||
|
||||
### 5. 用导出的模型基于Python去预测
|
||||
|
||||
```bash
|
||||
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
|
||||
```
|
||||
**注意:**
|
||||
- 跟踪模型是对视频进行预测,不支持单张图的预测,默认保存跟踪结果可视化后的视频,可添加`--save_mot_txts`表示保存跟踪结果的txt文件,或`--save_images`表示保存跟踪结果可视化图片。
|
||||
- 跟踪结果txt文件每行信息是`frame,id,x1,y1,w,h,score,-1,-1,-1`。
|
||||
|
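When many videos need to be processed with the exported model, the command above can be wrapped in a small driver script. A minimal sketch (the video folder and model directory are placeholders; the flags are exactly the ones documented above):

```python
# Sketch: run the exported-model inference script over a folder of videos.
# Flags mirror the command above; paths are placeholders.
import glob
import subprocess

model_dir = 'output_inference/jde_darknet53_30e_1088x608'
for video in sorted(glob.glob('my_videos/*.mp4')):
    subprocess.run([
        'python', 'deploy/pptracking/python/mot_jde_infer.py',
        '--model_dir', model_dir,
        '--video_file', video,
        '--device', 'GPU',
        '--save_mot_txts',
    ], check=True)
```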
||||
|
||||
## 引用
|
||||
```
|
||||
@article{wang2019towards,
|
||||
title={Towards Real-Time Multi-Object Tracking},
|
||||
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
|
||||
journal={arXiv preprint arXiv:1909.12605},
|
||||
year={2019}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,56 @@
|
||||
architecture: JDE
|
||||
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
|
||||
find_unused_parameters: True
|
||||
|
||||
JDE:
|
||||
detector: YOLOv3
|
||||
reid: JDEEmbeddingHead
|
||||
tracker: JDETracker
|
||||
|
||||
YOLOv3:
|
||||
backbone: DarkNet
|
||||
neck: YOLOv3FPN
|
||||
yolo_head: YOLOv3Head
|
||||
post_process: JDEBBoxPostProcess
|
||||
for_mot: True
|
||||
|
||||
DarkNet:
|
||||
depth: 53
|
||||
return_idx: [2, 3, 4]
|
||||
freeze_norm: True
|
||||
|
||||
YOLOv3FPN:
|
||||
freeze_norm: True
|
||||
|
||||
YOLOv3Head:
|
||||
anchors: [[128,384], [180,540], [256,640], [512,640],
|
||||
[32,96], [45,135], [64,192], [90,271],
|
||||
[8,24], [11,34], [16,48], [23,68]]
|
||||
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
|
||||
loss: JDEDetectionLoss
|
||||
|
||||
JDEBBoxPostProcess:
|
||||
decode:
|
||||
name: JDEBox
|
||||
conf_thresh: 0.3
|
||||
downsample_ratio: 32
|
||||
nms:
|
||||
name: MultiClassNMS
|
||||
keep_top_k: 500
|
||||
score_threshold: 0.01
|
||||
nms_threshold: 0.5
|
||||
nms_top_k: 2000
|
||||
normalized: true
|
||||
|
||||
JDEEmbeddingHead:
|
||||
anchor_levels: 3
|
||||
anchor_scales: 4
|
||||
embedding_dim: 512
|
||||
emb_loss: JDEEmbeddingLoss
|
||||
jde_loss: JDELoss
|
||||
|
||||
JDETracker:
|
||||
det_thresh: 0.3
|
||||
track_buffer: 30
|
||||
min_box_area: 200
|
||||
vertical_ratio: 1.6 # for pedestrian
|
||||
@@ -0,0 +1,48 @@
|
||||
worker_num: 8
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- MOTRandomAffine: {}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2JDETargetThres:
|
||||
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
|
||||
anchors: [[[128,384], [180,540], [256,640], [512,640]],
|
||||
[[32,96], [45,135], [64,192], [90,271]],
|
||||
[[8,24], [11,34], [16,48], [23,68]]]
|
||||
downsample_ratios: [32, 16, 8]
|
||||
ide_thresh: 0.5
|
||||
fg_thresh: 0.5
|
||||
bg_thresh: 0.4
|
||||
batch_size: 4
|
||||
shuffle: true
|
||||
drop_last: true
|
||||
use_shared_memory: true
|
||||
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 608, 1088]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [608, 1088]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,48 @@
|
||||
worker_num: 2
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- MOTRandomAffine: {}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2JDETargetThres:
|
||||
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
|
||||
anchors: [[[85,255], [120,360], [170,420], [340,420]],
|
||||
[[21,64], [30,90], [43,128], [60,180]],
|
||||
[[6,16], [8,23], [11,32], [16,45]]]
|
||||
downsample_ratios: [32, 16, 8]
|
||||
ide_thresh: 0.5
|
||||
fg_thresh: 0.5
|
||||
bg_thresh: 0.4
|
||||
batch_size: 4
|
||||
shuffle: true
|
||||
drop_last: true
|
||||
use_shared_memory: true
|
||||
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 320, 576]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [320, 576]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,48 @@
|
||||
worker_num: 2
|
||||
TrainReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- RGBReverse: {}
|
||||
- AugmentHSV: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- MOTRandomAffine: {}
|
||||
- RandomFlip: {}
|
||||
- BboxXYXY2XYWH: {}
|
||||
- NormalizeBox: {}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- RGBReverse: {}
|
||||
- Permute: {}
|
||||
batch_transforms:
|
||||
- Gt2JDETargetThres:
|
||||
anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
|
||||
anchors: [[[102,305], [143, 429], [203,508], [407,508]],
|
||||
[[25,76], [36,107], [51,152], [71,215]],
|
||||
[[6,19], [9,27], [13,38], [18,54]]]
|
||||
downsample_ratios: [32, 16, 8]
|
||||
ide_thresh: 0.5
|
||||
fg_thresh: 0.5
|
||||
bg_thresh: 0.4
|
||||
batch_size: 4
|
||||
shuffle: true
|
||||
drop_last: true
|
||||
use_shared_memory: true
|
||||
|
||||
|
||||
EvalMOTReader:
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
|
||||
|
||||
TestMOTReader:
|
||||
inputs_def:
|
||||
image_shape: [3, 480, 864]
|
||||
sample_transforms:
|
||||
- Decode: {}
|
||||
- LetterBoxResize: {target_size: [480, 864]}
|
||||
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
|
||||
- Permute: {}
|
||||
batch_size: 1
|
||||
@@ -0,0 +1,20 @@
epoch: 30

LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [15, 22]
      use_warmup: True
    - !ExpWarmup
      steps: 1000
      power: 4

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
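Read together, the schedulers above mean: ramp the learning rate up over the first 1000 iterations, then multiply it by 0.1 at epochs 15 and 22. A small sketch of that reading follows; it assumes `ExpWarmup` scales `base_lr` by `(step / steps) ** power`, which is an assumption about the warmup curve rather than ppdet's exact implementation.

```python
def learning_rate(step, epoch, base_lr=0.01, warmup_steps=1000, power=4,
                  milestones=(15, 22), gamma=0.1):
    """Approximate LR under the 30-epoch schedule above (illustrative only)."""
    if step < warmup_steps:
        # ExpWarmup: ramp from ~0 to base_lr over warmup_steps iterations.
        return base_lr * (step / warmup_steps) ** power
    # PiecewiseDecay: multiply by gamma once for every milestone epoch already passed.
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# learning_rate(step=500, epoch=0)   -> 0.000625  (still warming up)
# learning_rate(step=5000, epoch=16) -> 0.001     (after the first decay at epoch 15)
```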
@@ -0,0 +1,20 @@
epoch: 60

LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [30, 44]
      use_warmup: True
    - !ExpWarmup
      steps: 1000
      power: 4

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_1088x608.yml',
]
weights: output/jde_darknet53_30e_1088x608/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[128,384], [180,540], [256,640], [512,640],
            [32,96], [45,135], [64,192], [90,271],
            [8,24], [11,34], [16,48], [23,68]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true
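The `anchors` / `anchor_masks` pairing in this config is purely index-based: each mask picks four anchors for one prediction level, ordered to line up with the `downsample_ratios: [32, 16, 8]` used in the reader's `Gt2JDETargetThres`, so the largest anchors sit on the most downsampled level. A tiny sketch of that mapping, using the values from this config:

```python
anchors = [[128, 384], [180, 540], [256, 640], [512, 640],
           [32, 96],   [45, 135],  [64, 192],  [90, 271],
           [8, 24],    [11, 34],   [16, 48],   [23, 68]]
anchor_masks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
downsample_ratios = [32, 16, 8]

for mask, stride in zip(anchor_masks, downsample_ratios):
    level_anchors = [anchors[i] for i in mask]
    # stride 32 gets the largest anchors, stride 8 the smallest
    print(f"stride {stride:2d} -> anchors {level_anchors}")
```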
@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_576x320.yml',
]
weights: output/jde_darknet53_30e_576x320/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[85,255], [120,360], [170,420], [340,420],
            [21,64], [30,90], [43,128], [60,180],
            [6,16], [8,23], [11,32], [16,45]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true
@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_864x480.yml',
]
weights: output/jde_darknet53_30e_864x480/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[102,305], [143,429], [203,508], [407,508],
            [25,76], [36,107], [51,152], [71,215],
            [6,19], [9,27], [13,38], [18,54]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true
@@ -0,0 +1,140 @@
English | [简体中文](README_cn.md)

# MCFairMOT (Multi-class FairMOT)

## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)

## Introduction

MCFairMOT is the multi-class extended version of [FairMOT](https://arxiv.org/abs/2004.01888).

### PP-Tracking real-time MOT system
In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework, featuring rich models, broad applicability and efficient deployment.

PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Targeting the difficulties and pain points of real-world applications, PP-Tracking provides MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment is supported through API calls and a GUI, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.

### AI Studio public project tutorial
PP-Tracking provides a public AI Studio project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).

## Model Zoo
### MCFairMOT Results on VisDrone2019 Val Set
| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |

**Notes:**
- MOTA is the average MOTA over the 10 categories of the VisDrone2019 MOT dataset, and its value also equals the average MOTA over all evaluated video sequences. The dataset download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) is provided here.
- MCFairMOT models are trained on 4 GPUs for 30 epochs. The per-GPU batch size is 6 for MCFairMOT DLA-34 and 8 for MCFairMOT HRNetV2-W18.

### MCFairMOT Results on VisDrone Vehicle Val Set
| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 37.7 | 56.8 | 199 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
| HRNetV2-W18 | 1088x608 | 35.6 | 56.3 | 190 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml) |

**Notes:**
- MOTA is the average MOTA over the 4 categories of the VisDrone Vehicle dataset, which is extracted from the VisDrone2019 MOT dataset; the download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot_vehicle.zip) is provided here.
- The tracker used in these MCFairMOT models is ByteTracker.

### MCFairMOT off-line quantization results on VisDrone Vehicle val-set
| Model | Compression Strategy | Prediction Delay (T4) | Prediction Delay (V100) | Model Configuration File | Compression Algorithm Configuration File |
| :--------------| :------- | :------: | :----: | :----: | :----: |
| DLA-34 | baseline | 41.3 | 21.9 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - |
| DLA-34 | off-line quantization | 37.8 | 21.2 |[Configuration File](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[Configuration File](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/slim/post_quant/mcfairmot_ptq.yml)|


## Getting Started

### 1. Training
Train MCFairMOT on 4 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./mcfairmot_dla34_30e_1088x608_visdrone/ --gpus 0,1,2,3 tools/train.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml
```

### 2. Evaluation
Evaluate the tracking performance of MCFairMOT on the val dataset with a single GPU using the following commands:
```bash
# use weights released in the PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams

# use a checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=output/mcfairmot_dla34_30e_1088x608_visdrone/model_final.pdparams
```
**Notes:**
- The default evaluation dataset is the VisDrone2019 MOT val-set. To evaluate on a different dataset, modify `configs/datasets/mcmot.yml` following the example below:
```
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: your_dataset/images/val
    keep_ori_im: False # set True if save visualization images or video
```
- Tracking results are saved in `{output_dir}/mot_results/`, with one txt file per sequence. Each line of a txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`, and `{output_dir}` can be set with `--output_dir`.
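Because every per-sequence file follows the `frame,id,x1,y1,w,h,score,cls_id,-1,-1` layout described above, the results can be loaded with a few lines of Python. A minimal sketch follows; the file name in the usage comment is only a placeholder for one of the txt files under `{output_dir}/mot_results/`.

```python
import csv
from collections import defaultdict

def load_mot_results(txt_path):
    """Group MCFairMOT results by frame: frame -> list of (track_id, cls_id, box, score)."""
    per_frame = defaultdict(list)
    with open(txt_path) as f:
        for row in csv.reader(f):
            frame, tid = int(row[0]), int(row[1])
            x1, y1, w, h, score = map(float, row[2:7])
            cls_id = int(float(row[7]))
            per_frame[frame].append((tid, cls_id, (x1, y1, w, h), score))
    return per_frame

# results = load_mot_results("output/mot_results/sequence_name.txt")  # placeholder file name
```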

### 3. Inference
Run inference on a video with a single GPU using the following command:
```bash
# inference on a video and save a result video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
- Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On the Linux (Ubuntu) platform it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.


### 4. Export model
```bash
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams
```

### 5. Using the exported model for Python inference
```bash
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/mcfairmot_dla34_30e_1088x608_visdrone --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
- The tracking model predicts on videos and does not support single-image prediction. A visualization video of the tracking results is saved by default; add `--save_mot_txts` to save txt result files, or `--save_images` to save visualization images.
- Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`.

### 6. Off-line quantization

The off-line quantization model is calibrated on the VisDrone Vehicle val-set; run it as follows:
```bash
CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml --slim_config=configs/slim/post_quant/mcfairmot_ptq.yml
```
**Notes:**
- Off-line quantization uses the VisDrone Vehicle val-set and the 4-class vehicle tracking model by default.

## Citations
```
@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}

@ARTICLE{9573394,
  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Detection and Tracking Meet Drones Challenge},
  year={2021},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TPAMI.2021.3119563}
}

@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}
```
@@ -0,0 +1,137 @@
简体中文 | [English](README.md)

# MCFairMOT (Multi-class FairMOT)

## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)

## Introduction

MCFairMOT is the multi-class extended version of [FairMOT](https://arxiv.org/abs/2004.01888).

### PP-Tracking real-time MOT system
In addition, PaddleDetection also provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system. PP-Tracking is the industry's first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework, featuring rich models, broad applicability and efficient deployment.
PP-Tracking supports single-camera tracking (MOT) and multi-camera tracking (MTMCT). Targeting the difficulties and pain points of real-world applications, it provides MOT functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics and multi-camera tracking. Deployment is supported through API calls and a GUI, the deployment languages are Python and C++, and the supported platforms include Linux and NVIDIA Jetson.

### AI Studio public project tutorial
PP-Tracking provides a public AI Studio project tutorial; please refer to the [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).

## Model Zoo

### MCFairMOT results on the VisDrone2019 MOT val-set
| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 24.3 | 41.6 | 2314 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 1088x608 | 20.4 | 39.9 | 2603 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone.yml) |
| HRNetV2-W18 | 864x480 | 18.2 | 38.7 | 2416 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone.yml) |
| HRNetV2-W18 | 576x320 | 12.0 | 33.8 | 2178 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone.yml) |

**Notes:**
- MOTA is the average MOTA over the 10 categories of the VisDrone2019 MOT dataset, which also equals the average MOTA over all evaluated video sequences. The dataset download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot.zip) is provided here.
- All MCFairMOT models are trained on 4 GPUs for 30 epochs. The per-GPU batch size is 6 for the DLA-34 backbone and 8 for the HRNetV2-W18 backbone.

### MCFairMOT results on the VisDrone Vehicle val-set
| backbone | input shape | MOTA | IDF1 | IDS | FPS | download | config |
| :--------------| :------- | :----: | :----: | :---: | :------: | :----: |:----: |
| DLA-34 | 1088x608 | 37.7 | 56.8 | 199 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml) |
| HRNetV2-W18 | 1088x608 | 35.6 | 56.3 | 190 | - |[model](https://paddledet.bj.bcebos.com/models/mot/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.pdparams) | [config](./mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker.yml) |

**Notes:**
- MOTA is the average MOTA over the 4 vehicle categories of the VisDrone Vehicle dataset, which consists of the 4 vehicle classes extracted from the VisDrone dataset. The dataset download [link](https://bj.bcebos.com/v1/paddledet/data/mot/visdrone_mcmot_vehicle.zip) is provided here.
- The tracker used by these MCFairMOT models is ByteTracker.

### MCFairMOT off-line quantization results on the VisDrone Vehicle val-set
| backbone | compression strategy | prediction delay (T4) | prediction delay (V100) | config | compression config |
| :--------------| :------- | :------: | :----: | :----: | :----: |
| DLA-34 | baseline | 41.3 | 21.9 |[config](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)| - |
| DLA-34 | off-line quantization | 37.8 | 21.2 |[config](./mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml)|[config](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/slim/post_quant/mcfairmot_ptq.yml)|

## Getting Started

### 1. Training
Start training on 4 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./mcfairmot_dla34_30e_1088x608_visdrone/ --gpus 0,1,2,3 tools/train.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml
```

### 2. Evaluation
Start evaluation on a single GPU with the following commands:
```bash
# use weights released in the PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams

# use a checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=output/mcfairmot_dla34_30e_1088x608_visdrone/model_final.pdparams
```
**Notes:**
- The default evaluation dataset is the VisDrone2019 MOT val-set. To evaluate on a different dataset, modify `configs/datasets/mcmot.yml` following the example below:
```
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: your_dataset/images/val
    keep_ori_im: False # set True if save visualization images or video
```
- Multi-class tracking results are saved in `{output_dir}/mot_results/`, with one txt file per video sequence. Each line of a txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`, and `{output_dir}` can be set with `--output_dir`.

### 3. Inference
Run inference on a video with a single GPU and save the result as a video:
```bash
# inference on a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
- Please make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On the Linux (Ubuntu) platform it can be installed directly with: `apt-get update && apt-get install -y ffmpeg`.

### 4. Export the inference model
```bash
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/mcfairmot_dla34_30e_1088x608_visdrone.pdparams
```

### 5. Using the exported model for Python inference
```bash
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/mcfairmot_dla34_30e_1088x608_visdrone --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
- The tracking model predicts on videos and does not support single-image prediction. A visualization video of the tracking results is saved by default; add `--save_mot_txts` to save txt result files, or `--save_images` to save visualization images.
- Each line of the multi-class tracking results txt file is `frame,id,x1,y1,w,h,score,cls_id,-1,-1`.

### 6. Off-line quantization

The off-line quantization model is calibrated on the VisDrone Vehicle val-set; run it as follows:
```bash
CUDA_VISIBLE_DEVICES=0 python3.7 tools/post_quant.py -c configs/mot/mcfairmot/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker.yml --slim_config=configs/slim/post_quant/mcfairmot_ptq.yml
```
**Notes:**
- Off-line quantization uses the VisDrone Vehicle val-set and the 4-class vehicle tracking model by default.

## Citations
```
@article{zhang2020fair,
  title={FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}

@ARTICLE{9573394,
  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Detection and Tracking Meet Drones Challenge},
  year={2021},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TPAMI.2021.3119563}
}

@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}
```
@@ -0,0 +1,42 @@
_BASE_: [
  '../fairmot/fairmot_dla34_30e_1088x608.yml',
  '../../datasets/mcmot.yml'
]

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  conf_thres: 0.4
  tracked_thresh: 0.4
  metric_type: cosine

weights: output/mcfairmot_dla34_30e_1088x608_visdrone/model_final

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL
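With `metric_type: cosine`, the multi-class JDETracker compares ReID embeddings of detections and existing tracks with a cosine distance when building its association cost. Below is a minimal sketch of such a cost matrix; it illustrates the metric only and is not the JDETracker code.

```python
import numpy as np

def cosine_cost(track_embs, det_embs):
    """Cost matrix between track and detection embeddings; lower means more similar."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    return 1.0 - t @ d.T   # shape: (num_tracks, num_detections)
```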
@@ -0,0 +1,68 @@
_BASE_: [
  '../fairmot/fairmot_dla34_30e_1088x608.yml',
  '../../datasets/mcmot.yml'
]
metric: MCMOT
num_classes: 4

# for MCMOT training
TrainDataset:
  !MCMOTDataSet
    dataset_dir: dataset/mot
    image_lists: ['visdrone_mcmot_vehicle.train']
    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
    label_list: label_list.txt

EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: visdrone_mcmot_vehicle/images/val
    keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
    anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt

# for MCMOT video inference
TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video
    anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt


pretrain_weights: https://paddledet.bj.bcebos.com/models/centernet_dla34_140e_coco.pdparams

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  use_byte: True
  match_thres: 0.8
  conf_thres: 0.4
  low_conf_thres: 0.2

weights: output/mcfairmot_dla34_30e_1088x608_visdrone_vehicle_bytetracker/model_final

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL
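The `use_byte: True` block above enables a ByteTrack-style two-stage association: detections are split at `conf_thres` and `low_conf_thres`, high-score boxes are matched to existing tracks first, and the leftover tracks get a second pass against the low-score boxes. The sketch below only illustrates that flow; it replaces the real assignment and embedding logic with a greedy IoU matcher, and the IoU cut-off it uses is an assumed illustrative value rather than the meaning of `match_thres` in the actual tracker.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between two arrays of [x1, y1, x2, y2] boxes."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    lt = np.maximum(a[..., :2], b[..., :2])
    rb = np.minimum(a[..., 2:], b[..., 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / np.clip(area_a + area_b - inter, 1e-9, None)

def greedy_match(track_boxes, det_boxes, iou_thresh):
    """Greedy one-to-one matching by descending IoU (stand-in for the real assignment step)."""
    matches, used_t, used_d = [], set(), set()
    if len(track_boxes) and len(det_boxes):
        ious = iou_matrix(track_boxes, det_boxes)
        for t, d in sorted(np.ndindex(ious.shape), key=lambda td: -ious[td]):
            if ious[t, d] < iou_thresh:
                break
            if t not in used_t and d not in used_d:
                matches.append((t, d))
                used_t.add(t)
                used_d.add(d)
    unmatched = [t for t in range(len(track_boxes)) if t not in used_t]
    return matches, unmatched

def byte_associate(track_boxes, det_boxes, det_scores,
                   conf_thres=0.4, low_conf_thres=0.2, iou_thresh=0.5):
    """Two-stage BYTE-style association; iou_thresh is an assumed illustrative cut-off."""
    det_boxes = np.asarray(det_boxes, dtype=float)
    det_scores = np.asarray(det_scores, dtype=float)
    high = det_scores >= conf_thres                 # confident detections
    low = (det_scores >= low_conf_thres) & ~high    # low-score detections kept by BYTE
    # Stage 1: associate existing tracks with high-score detections.
    matches_high, remaining = greedy_match(track_boxes, det_boxes[high], iou_thresh)
    # Stage 2: unmatched tracks get a second chance against low-score detections.
    matches_low, lost_sub = greedy_match([track_boxes[t] for t in remaining],
                                         det_boxes[low], iou_thresh)
    lost = [remaining[i] for i in lost_sub]         # map back to original track indices
    return matches_high, matches_low, lost
```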
@@ -0,0 +1,47 @@
_BASE_: [
  '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml',
  '../../datasets/mcmot.yml'
]

architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  conf_thres: 0.4
  tracked_thresh: 0.4
  metric_type: cosine

weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone/model_final

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL

TrainReader:
  batch_size: 8
@@ -0,0 +1,78 @@
_BASE_: [
  '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml',
  '../../datasets/mcmot.yml'
]
metric: MCMOT
num_classes: 4

# for MCMOT training
TrainDataset:
  !MCMOTDataSet
    dataset_dir: dataset/mot
    image_lists: ['visdrone_mcmot_vehicle.train']
    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
    label_list: label_list.txt

EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: visdrone_mcmot_vehicle/images/val
    keep_ori_im: False # set True if save visualization images or video, or used in DeepSORT
    anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt

# for MCMOT video inference
TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video
    anno_path: dataset/mot/visdrone_mcmot_vehicle/label_list.txt


architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  use_byte: True
  match_thres: 0.8
  conf_thres: 0.4
  low_conf_thres: 0.2

weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_1088x608_visdrone_vehicle_bytetracker/model_final

epoch: 30
LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [15, 22]
      use_warmup: True
    - !ExpWarmup
      steps: 1000
      power: 4

OptimizerBuilder:
  optimizer:
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  batch_size: 8
@@ -0,0 +1,64 @@
_BASE_: [
  '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml',
  '../../datasets/mcmot.yml'
]

metric: MCMOT
num_classes: 11
weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_bdd100k_mcmot/model_final

# for MCMOT training
TrainDataset:
  !MCMOTDataSet
    dataset_dir: dataset/mot
    image_lists: ['bdd100k_mcmot.train']
    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
    label_list: label_list.txt

EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: bdd100k_mcmot/images/val
    keep_ori_im: False

# model config
architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  conf_thres: 0.4
  tracked_thresh: 0.4
  metric_type: cosine

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL

TrainReader:
  batch_size: 8
@@ -0,0 +1,47 @@
_BASE_: [
  '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.yml',
  '../../datasets/mcmot.yml'
]

architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  conf_thres: 0.4
  tracked_thresh: 0.4
  metric_type: cosine

weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_576x320_visdrone/model_final

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL

TrainReader:
  batch_size: 8
@@ -0,0 +1,47 @@
_BASE_: [
  '../fairmot/fairmot_hrnetv2_w18_dlafpn_30e_864x480.yml',
  '../../datasets/mcmot.yml'
]

architecture: FairMOT
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/HRNet_W18_C_pretrained.pdparams
for_mot: True

FairMOT:
  detector: CenterNet
  reid: FairMOTEmbeddingHead
  loss: FairMOTLoss
  tracker: JDETracker # multi-class tracker

CenterNetHead:
  regress_ltrb: False

CenterNetPostProcess:
  regress_ltrb: False
  max_per_img: 200

JDETracker:
  min_box_area: 0
  vertical_ratio: 0 # no need to filter bboxes according to w/h
  conf_thres: 0.4
  tracked_thresh: 0.4
  metric_type: cosine

weights: output/mcfairmot_hrnetv2_w18_dlafpn_30e_864x480_visdrone/model_final

epoch: 30
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [10, 20]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL

TrainReader:
  batch_size: 8