Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1,119 @@
English | [简体中文](README_cn.md)
# JDE (Towards Real-Time Multi-Object Tracking)
## Table of Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Getting Started](#getting-started)
- [Citations](#citations)
## Introduction
- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) learns the object detection task and the appearance embedding task simultaneously in a shared neural network, and outputs the detection results together with their corresponding embeddings. The original JDE paper builds on the anchor-based YOLOv3 detector, adding a new ReID branch to learn embeddings. Training is formulated as a multi-task learning problem that balances accuracy and speed; a sketch of the loss weighting is given below.
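The paper balances the detection and embedding loss terms automatically with learnable task-uncertainty parameters (following Kendall et al.'s loss-weighting scheme). The snippet below is a minimal, framework-agnostic sketch of that idea, not PaddleDetection's actual `JDELoss`; the function name and the example loss values are assumptions for illustration.
```python
# Minimal sketch of JDE-style automatic loss balancing with learnable
# task-uncertainty parameters s_i (Kendall et al.); illustration only,
# not PaddleDetection's actual JDELoss implementation.
import numpy as np

def balanced_loss(task_losses, log_vars):
    """Combine per-task losses L_i as sum_i 0.5 * (exp(-s_i) * L_i + s_i)."""
    return sum(0.5 * (np.exp(-s) * loss + s)
               for loss, s in zip(task_losses, log_vars))

# Hypothetical detection and embedding loss values; during training the
# s_i are parameters optimized jointly with the network weights.
det_loss, emb_loss = 1.8, 4.2
print(balanced_loss([det_loss, emb_loss], [0.0, 0.0]))  # unit weighting at init
```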
### PP-Tracking real-time MOT system
In addition, PaddleDetection provides the [PP-Tracking](../../../deploy/pptracking/README.md) real-time multi-object tracking system.
PP-Tracking is the first open-source real-time multi-object tracking system based on the PaddlePaddle deep learning framework. It offers rich models, broad applicability, and efficient deployment.
PP-Tracking supports two paradigms: single-camera tracking (MOT) and multi-camera tracking (MTMCT). Targeting the difficulties and pain points of real-world business scenarios, PP-Tracking provides functions and applications such as pedestrian tracking, vehicle tracking, multi-class tracking, small-object tracking, traffic statistics, and multi-camera tracking. Deployment supports both API calls and a GUI, with Python and C++ as deployment languages, on platforms including Linux and NVIDIA Jetson.
### AI Studio public project tutorial
PP-Tracking provides a public AI Studio project tutorial. Please refer to this [tutorial](https://aistudio.baidu.com/aistudio/projectdetail/3022582).
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/205540305-457d48bf-e9ec-4f28-896c-64c870126e05.gif" width=500 />
</div>
## Model Zoo
### JDE Results on MOT-16 Training Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 72.0 | 66.9 | 1397 | 7274 | 22209 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 69.1 | 64.7 | 1539 | 7544 | 25046 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.7 | 64.4 | 1310 | 6782 | 31964 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
### JDE Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53(paper) | 1088x608 | 64.4 | 55.8 | 1544 | - | - | - | - | - |
| DarkNet53 | 1088x608 | 64.6 | 58.5 | 1864 | 10550 | 52088 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53(paper) | 864x480 | 62.1 | 56.9 | 1608 | - | - | - | - | - |
| DarkNet53 | 864x480 | 63.2 | 57.7 | 1966 | 10070 | 55081 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 59.1 | 56.4 | 1911 | 10923 | 61789 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |
**Notes:**
- JDE was trained for 30 epochs on 8 GPUs with a mini-batch size of 4 per GPU.
## Getting Started
### 1. Training
Train JDE on 8 GPUs with the following command:
```bash
python -m paddle.distributed.launch --log_dir=./jde_darknet53_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml
```
### 2. Evaluation
Evaluate the tracking performance of JDE on the validation dataset on a single GPU with the following commands:
```bash
# use weights released in PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
# use saved checkpoint in training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=output/jde_darknet53_30e_1088x608/model_final.pdparams
```
**Notes:**
- The default evaluation dataset is the MOT-16 Train Set. To change the evaluation dataset, modify `configs/datasets/mot.yml` following the example below:
```yaml
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/train
    keep_ori_im: False # set to True when saving visualization images or videos
```
- Tracking results are saved in `{output_dir}/mot_results/`, with one txt file per sequence. Each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`, and you can set `{output_dir}` with `--output_dir`; see the parsing sketch after these notes.
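Given the per-line format above, a results file can be consumed with a few lines of Python. This is a minimal sketch under that format assumption; the output path and sequence name are hypothetical.
```python
# Minimal sketch: parse a JDE tracking-results txt file, where each line is
# frame,id,x1,y1,w,h,score,-1,-1,-1
from collections import defaultdict

def load_mot_results(path):
    """Return {frame: [(track_id, x1, y1, w, h, score), ...]}."""
    frames = defaultdict(list)
    with open(path) as f:
        for line in f:
            vals = line.strip().split(',')
            frame, tid = int(vals[0]), int(vals[1])
            x1, y1, w, h, score = map(float, vals[2:7])
            frames[frame].append((tid, x1, y1, w, h, score))
    return frames

# Hypothetical output path; sequence names depend on the evaluated dataset.
results = load_mot_results('output/mot_results/MOT16-02.txt')
unique_ids = {tid for dets in results.values() for tid, *_ in dets}
print(f'{len(results)} frames, {len(unique_ids)} unique track ids')
```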
### 3. Inference
Run inference on a video with a single GPU using the following command:
```bash
# run inference on a video and save the visualized result as a video
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams --video_file={your video name}.mp4 --save_videos
```
**Notes:**
- Please make sure that [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu), you can install it directly with `apt-get update && apt-get install -y ffmpeg`.
### 4. Export model
```bash
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/jde/jde_darknet53_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams
```
### 5. Using the exported model for Python inference
```bash
python deploy/pptracking/python/mot_jde_infer.py --model_dir=output_inference/jde_darknet53_30e_1088x608 --video_file={your video name}.mp4 --device=GPU --save_mot_txts
```
**Notes:**
- The tracking model predicts on videos and does not support single-image prediction. A visualization video of the tracking results is saved by default. Add `--save_mot_txts` to save txt result files, or `--save_images` to save the visualization images.
- Each line of the tracking results txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
## Citations
```
@article{wang2019towards,
title={Towards Real-Time Multi-Object Tracking},
author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
journal={arXiv preprint arXiv:1909.12605},
year={2019}
}
```


@@ -0,0 +1,56 @@
architecture: JDE
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
find_unused_parameters: True

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

DarkNet:
  depth: 53
  return_idx: [2, 3, 4]
  freeze_norm: True

YOLOv3FPN:
  freeze_norm: True

YOLOv3Head:
  anchors: [[128,384], [180,540], [256,640], [512,640],
            [32,96], [45,135], [64,192], [90,271],
            [8,24], [11,34], [16,48], [23,68]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.3
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.5
    nms_top_k: 2000
    normalized: true

JDEEmbeddingHead:
  anchor_levels: 3
  anchor_scales: 4
  embedding_dim: 512
  emb_loss: JDEEmbeddingLoss
  jde_loss: JDELoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  vertical_ratio: 1.6 # for pedestrian


@@ -0,0 +1,48 @@
worker_num: 8

TrainReader:
  sample_transforms:
    - Decode: {}
    - RGBReverse: {}
    - AugmentHSV: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - MOTRandomAffine: {}
    - RandomFlip: {}
    - BboxXYXY2XYWH: {}
    - NormalizeBox: {}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - RGBReverse: {}
    - Permute: {}
  batch_transforms:
    - Gt2JDETargetThres:
        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
        anchors: [[[128,384], [180,540], [256,640], [512,640]],
                  [[32,96], [45,135], [64,192], [90,271]],
                  [[8,24], [11,34], [16,48], [23,68]]]
        downsample_ratios: [32, 16, 8]
        ide_thresh: 0.5
        fg_thresh: 0.5
        bg_thresh: 0.4
  batch_size: 4
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 608, 1088]
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,48 @@
worker_num: 2

TrainReader:
  sample_transforms:
    - Decode: {}
    - RGBReverse: {}
    - AugmentHSV: {}
    - LetterBoxResize: {target_size: [320, 576]}
    - MOTRandomAffine: {}
    - RandomFlip: {}
    - BboxXYXY2XYWH: {}
    - NormalizeBox: {}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - RGBReverse: {}
    - Permute: {}
  batch_transforms:
    - Gt2JDETargetThres:
        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
        anchors: [[[85,255], [120,360], [170,420], [340,420]],
                  [[21,64], [30,90], [43,128], [60,180]],
                  [[6,16], [8,23], [11,32], [16,45]]]
        downsample_ratios: [32, 16, 8]
        ide_thresh: 0.5
        fg_thresh: 0.5
        bg_thresh: 0.4
  batch_size: 4
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [320, 576]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 320, 576]
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [320, 576]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,48 @@
worker_num: 2

TrainReader:
  sample_transforms:
    - Decode: {}
    - RGBReverse: {}
    - AugmentHSV: {}
    - LetterBoxResize: {target_size: [480, 864]}
    - MOTRandomAffine: {}
    - RandomFlip: {}
    - BboxXYXY2XYWH: {}
    - NormalizeBox: {}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - RGBReverse: {}
    - Permute: {}
  batch_transforms:
    - Gt2JDETargetThres:
        anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
        anchors: [[[102,305], [143,429], [203,508], [407,508]],
                  [[25,76], [36,107], [51,152], [71,215]],
                  [[6,19], [9,27], [13,38], [18,54]]]
        downsample_ratios: [32, 16, 8]
        ide_thresh: 0.5
        fg_thresh: 0.5
        bg_thresh: 0.4
  batch_size: 4
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [480, 864]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 480, 864]
  sample_transforms:
    - Decode: {}
    - LetterBoxResize: {target_size: [480, 864]}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
    - Permute: {}
  batch_size: 1


@@ -0,0 +1,20 @@
epoch: 30

LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [15, 22]
    use_warmup: True
  - !ExpWarmup
    steps: 1000
    power: 4

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2


@@ -0,0 +1,20 @@
epoch: 60

LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [30, 44]
    use_warmup: True
  - !ExpWarmup
    steps: 1000
    power: 4

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2


@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_1088x608.yml',
]
weights: output/jde_darknet53_30e_1088x608/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[128,384], [180,540], [256,640], [512,640],
            [32,96], [45,135], [64,192], [90,271],
            [8,24], [11,34], [16,48], [23,68]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true


@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_576x320.yml',
]
weights: output/jde_darknet53_30e_576x320/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[85,255], [120,360], [170,420], [340,420],
            [21,64], [30,90], [43,128], [60,180],
            [6,16], [8,23], [11,32], [16,45]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true


@@ -0,0 +1,47 @@
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/jde_darknet53.yml',
  '_base_/jde_reader_864x480.yml',
]
weights: output/jde_darknet53_30e_864x480/model_final

JDE:
  detector: YOLOv3
  reid: JDEEmbeddingHead
  tracker: JDETracker

YOLOv3:
  backbone: DarkNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: JDEBBoxPostProcess
  for_mot: True

YOLOv3Head:
  anchors: [[102,305], [143,429], [203,508], [407,508],
            [25,76], [36,107], [51,152], [71,215],
            [6,19], [9,27], [13,38], [18,54]]
  anchor_masks: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
  loss: JDEDetectionLoss

JDETracker:
  det_thresh: 0.3
  track_buffer: 30
  min_box_area: 200
  motion: KalmanFilter

JDEBBoxPostProcess:
  decode:
    name: JDEBox
    conf_thresh: 0.5
    downsample_ratio: 32
  nms:
    name: MultiClassNMS
    keep_top_k: 500
    score_threshold: 0.01
    nms_threshold: 0.4
    nms_top_k: 2000
    normalized: true
  return_index: true