Replace document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions

View File

@@ -0,0 +1,186 @@
# Auto Compression
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
- [3.1 Prepare the Environment](#31-prepare-the-environment)
- [3.2 Prepare the Dataset](#32-prepare-the-dataset)
- [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
- [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
- [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Inference Deployment](#4-inference-deployment)
## 1. Introduction
This example applies automatic compression (ACT) to PaddleDetection inference (deployment) models. The compression strategy used is quantization-aware training combined with distillation.
## 2. Benchmark
### PP-YOLOE+
| Model | Base mAP | PTQ mAP | ACT Quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quant Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PP-YOLOE+_s | 43.7 | - | 42.9 | - | - | - | [config](./configs/ppyoloe_plus_s_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_s_qat_dis.tar) |
| PP-YOLOE+_m | 49.8 | - | 49.3 | - | - | - | [config](./configs/ppyoloe_plus_m_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_m_qat_dis.tar) |
| PP-YOLOE+_l | 52.9 | - | 52.6 | - | - | - | [config](./configs/ppyoloe_plus_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_l_qat_dis.tar) |
| PP-YOLOE+_x | 54.7 | - | 54.4 | - | - | - | [config](./configs/ppyoloe_plus_x_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddledet/deploy/Inference/ppyoloe_plus_x_qat_dis.tar) |
- mAP is evaluated on the COCO val2017 dataset at IoU=0.5:0.95.
### YOLOv8
| Model | Base mAP | PTQ mAP | ACT Quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quant Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| YOLOv8-s | 44.9 | 43.9 | 44.3 | 9.27ms | 4.65ms | **3.78ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/yolov8_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms_quant.tar) |
**Note:**
- The YOLOv8 model in the table includes NMS and can be deployed in TRT directly; to align with the standard test protocol, evaluate the model exported without NMS.
- mAP is evaluated on the COCO val2017 dataset at IoU=0.5:0.95.
- Latency in the table is measured on a Tesla T4 GPU with TensorRT enabled and batch_size=1.
### PP-YOLOE
| Model | Base mAP | PTQ mAP | ACT Quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quant Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PP-YOLOE-l | 50.9 | - | 50.6 | 11.2ms | 7.7ms | **6.7ms** | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression/configs/ppyoloe_l_qat_dis.yaml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco_quant.tar) |
| PP-YOLOE-SOD | 38.5 | - | 37.6 | - | - | - | [config](./configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_qat.yml) | [Quant Model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_sod_visdrone.tar) |
- PP-YOLOE-l mAP is evaluated on the COCO val2017 dataset at IoU=0.5:0.95.
- PP-YOLOE-l latency is measured on a Tesla V100 GPU with TensorRT enabled, batch_size=1, and NMS included; the test script is the [benchmark demo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy/python).
- PP-YOLOE-SOD mAP is evaluated at IoU=0.5:0.95 on the sliced, COCO-format VisDrone-DET [dataset](https://bj.bcebos.com/v1/paddledet/data/smalldet/visdrone_sliced.zip); the model is defined in [ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml](../../configs/smalldet/ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml).
### PP-PicoDet
| Model | Strategy | mAP | FP32 | FP16 | INT8 | Config | Model |
| :-------- |:-------- |:--------: | :----------------: | :----------------: | :---------------: | :----------------------: | :---------------------: |
| PicoDet-S-NPU | Baseline | 30.1 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet/picodet_s_416_coco_npu.yml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_416_coco_npu.tar) |
| PicoDet-S-NPU | QAT | 29.7 | - | - | - | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/full_quantization/detection/configs/picodet_s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/picodet_s_npu_quant.tar) |
- mAP is evaluated on the COCO val2017 dataset at IoU=0.5:0.95.
### RT-DETR
| Model | Base mAP | ACT Quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quant Model |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50 | 53.1 | 53.0 | 32.05ms | 9.12ms | **6.96ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101 | 54.3 | 54.1 | 54.13ms | 12.68ms | **9.20ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0 | 52.9 | 26.16ms | 8.54ms | **6.65ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8 | 54.6 | 49.22ms | 12.50ms | **9.24ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |
- Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.
| Model | Base mAP | ACT Quant mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quant Model |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50 | 53.1 | 53.0 | 9.64ms | 5.00ms | **3.99ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101 | 54.3 | 54.1 | 14.93ms | 7.15ms | **5.12ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0 | 52.9 | 8.17ms | 4.77ms | **4.00ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8 | 54.6 | 12.81ms | 6.97ms | **5.32ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |
- Test environment for the table above: A10, TensorRT 8.6.0, CUDA 11.6, batch_size=1.
- mAP is evaluated on the COCO val2017 dataset at IoU=0.5:0.95.
## 3. Auto Compression Workflow
#### 3.1 Prepare the Environment
- PaddlePaddle >= 2.4 (install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim >= 2.4.1
- PaddleDet >= 2.5
- opencv-python
Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
Install paddleslim:
```shell
pip install paddleslim
```
Install paddledet:
```shell
pip install paddledet
```
**Note:** Auto compression of YOLOv8 models requires the latest [develop Paddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) and [develop PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim#%E5%AE%89%E8%A3%85) builds.
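As a quick sanity check of the environment, the following minimal sketch prints the installed versions (assuming the pip package names used above):
```python
# Verify the toolchain versions match the requirements listed above.
from importlib.metadata import version

import paddle

print(paddle.__version__)      # expect >= 2.4
print(version("paddleslim"))   # expect >= 2.4.1
print(version("paddledet"))    # expect >= 2.5
paddle.utils.run_check()       # checks that PaddlePaddle itself runs
```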
#### 3.2 Prepare the Dataset
This example runs auto compression on COCO data by default. For custom COCO-style data or other formats, follow the [data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/docs/tutorials/data/PrepareDataSet.md).
For datasets in a non-COCO format, modify the Dataset fields in the reader configs under [configs](./configs).
Taking PP-YOLOE as an example: once your dataset is ready, just point the `dataset_dir` field under `EvalDataset` in [./configs/yolo_reader.yml](./configs/yolo_reader.yml) at your dataset path. A quick way to verify the config is sketched below.
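Before launching compression, it can help to confirm that the reader config resolves to your data. A minimal sketch, assuming the default paths above:
```python
from ppdet.core.workspace import load_config

# Load only the reader config, the same way eval.py and run.py do.
reader_cfg = load_config('./configs/yolo_reader.yml')
dataset = reader_cfg['EvalDataset']
print(dataset.dataset_dir)  # the dataset root taken from the yaml
print(dataset.get_anno())   # full path to the annotation json
```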
#### 3.3 Prepare the Inference Model
An inference model consists of two files: `model.pdmodel`, the model graph, and `model.pdiparams`, the weights.
Export the inference model as described in the [PaddleDetection docs](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#8-%E6%A8%A1%E5%9E%8B%E5%AF%BC%E5%87%BA); the PP-YOLOE export example below shows the concrete steps, and a quick load check is sketched at the end of this section.
- Clone the repository:
```
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- Export the inference model:
The exported PP-YOLOE-l model includes NMS. For a quick start, you can download the pre-exported [PP-YOLOE-l model](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar) instead:
```shell
python tools/export_model.py \
-c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \
-o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \
trt=True
```
The YOLOv8-s model includes NMS; for details see the [YOLOv8 model docs](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8), then run:
```shell
python tools/export_model.py \
-c configs/yolov8/yolov8_s_500e_coco.yml \
-o weights=https://paddledet.bj.bcebos.com/models/yolov8_s_500e_coco.pdparams \
trt=True
```
For a quick start, you can download the pre-exported [YOLOv8-s model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov8_s_500e_coco_trt_nms.tar) instead.
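To confirm an export succeeded, the model can be loaded back the same way eval.py does. A minimal sketch, assuming the PP-YOLOE-l directory above:
```python
import paddle

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
# Any directory containing model.pdmodel / model.pdiparams works here.
program, feed_names, fetch_targets = paddle.static.load_inference_model(
    './ppyoloe_crn_l_300e_coco', exe,
    model_filename='model.pdmodel',
    params_filename='model.pdiparams')
print(feed_names)  # e.g. ['image', 'scale_factor'] for a model exported with NMS
```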
#### 3.4 Run Auto Compression and Export the Model
The quantization-distillation auto compression example is launched through the run.py script, which calls the `paddleslim.auto_compression.AutoCompression` API. After filling in the model path, distillation, quantization, and training sections of the config file, run one of the commands below to quantize and distill the model; a minimal sketch of the underlying API call follows the commands.
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
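Under the hood, run.py builds a feed-dict dataloader from the reader config and hands it to the ACT API. The following is a minimal single-process sketch of that flow; the hard-coded `image` feed name and the disabled eval callback are simplifications (run.py derives the feed names with get_feed_vars and wires in an mAP evaluation function):
```python
import paddle
from ppdet.core.workspace import load_config, create
from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config

paddle.enable_static()
all_config = load_slim_config('./configs/ppyoloe_l_qat_dis.yaml')
global_config = all_config['Global']

# Build the training dataloader from the reader config referenced in the yaml.
reader_cfg = load_config(global_config['reader_config'])
loader = create('EvalReader')(reader_cfg['TrainDataset'],
                              reader_cfg['worker_num'],
                              return_list=True)

def train_dataloader():
    # ACT expects dicts keyed by the model's feed names; 'image' is assumed here.
    for data in loader:
        yield {'image': data['image']}

ac = AutoCompression(
    model_dir=global_config['model_dir'],
    model_filename=global_config['model_filename'],
    params_filename=global_config['params_filename'],
    save_dir='./output/',
    config=all_config,
    train_dataloader=train_dataloader,
    eval_callback=None)  # run.py passes an mAP evaluation function here
ac.compress()
```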
#### 3.5 Evaluate Model Accuracy
Use the eval.py script to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
```
To measure mAP with Paddle Inference using TRT INT8:
```
export CUDA_VISIBLE_DEVICES=0
python paddle_inference_eval.py --model_path ./output/ --reader_config configs/ppyoloe_reader.yml --precision int8 --use_trt=True
```
**Note:**
- The path of the model under test can be changed via the `model_dir` field in the config file.
- `--precision` defaults to `paddle`. To use TRT, set `--use_trt=True`; `--precision` can then be `fp32`, `fp16`, or `int8`. The TensorRT setup this triggers is sketched below.
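For reference, this is roughly how paddle_inference_eval.py configures the predictor when `--use_trt=True --precision int8` is set; a trimmed sketch of its load_predictor(), with the model path as an assumption:
```python
from paddle.inference import Config, create_predictor

config = Config('./output/model.pdmodel', './output/model.pdiparams')
config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id 0
config.switch_ir_optim(True)   # graph optimization and op fusion
config.enable_tensorrt_engine(
    workspace_size=1 << 25,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=Config.Precision.Int8,
    use_static=True,
    use_calib_mode=False)
config.enable_memory_optim()
predictor = create_predictor(config)
```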
## 4. Inference Deployment
- See the [PaddleDetection deployment guide](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy); on GPU, deploy the quantized model with TensorRT enabled in trt_int8 mode.

View File

@@ -0,0 +1,32 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 6
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 8
shuffle: false

View File

@@ -0,0 +1,34 @@
Global:
reader_config: ./configs/picodet_reader.yml
include_nms: True
Evaluation: True
model_dir: ./picodet_s_416_coco_npu/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: l2
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
weight_bits: 8
activation_bits: 8
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00001
T_max: 8000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,34 @@
Global:
reader_config: configs/ppyoloe_crn_l_80e_sliced_visdrone_640_025_reader.yml
input_list: ['image', 'scale_factor']
arch: YOLO
include_nms: True
Evaluation: True
model_dir: ../../output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: True
use_pact: False
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 500
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,25 @@
metric: COCO
num_classes: 10
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train_images_640_025
anno_path: train_640_025.json
dataset_dir: dataset/visdrone_sliced
EvalDataset:
!COCODataSet
image_dir: val_images_640_025
anno_path: val_640_025.json
dataset_dir: dataset/visdrone_sliced
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
#- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 16

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_reader.yml
include_nms: True
Evaluation: True
model_dir: ./ppyoloe_crn_l_300e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ../../output_inference/ppyoloe_plus_crn_t_auxhead_300e_coco/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: True
use_pact: False
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ./ppyoloe_plus_crn_l_80e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ./ppyoloe_plus_crn_m_80e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,26 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 4

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ./ppyoloe_plus_crn_s_80e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,33 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ../../output_inference/ppyoloe_plus_sod_crn_l_80e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: True
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 1
eval_iter: 1
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/ppyoloe_plus_reader.yml
include_nms: True
Evaluation: True
model_dir: ./ppyoloe_plus_crn_x_80e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,26 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
batch_size: 4

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/rtdetr_reader.yml
include_nms: True
Evaluation: True
model_dir: ./rtdetr_hgnetv2_l_6x_coco/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
- matmul_v2
TrainConfig:
train_iter: 200
eval_iter: 50
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 10000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/rtdetr_reader.yml
include_nms: True
Evaluation: True
model_dir: ./rtdetr_hgnetv2_x_6x_coco/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
- matmul_v2
TrainConfig:
train_iter: 500
eval_iter: 100
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 10000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/rtdetr_reader.yml
include_nms: True
Evaluation: True
model_dir: ./rtdetr_r101vd_6x_coco/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
- matmul_v2
TrainConfig:
train_iter: 200
eval_iter: 50
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 10000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/rtdetr_reader.yml
include_nms: True
Evaluation: True
model_dir: ./rtdetr_r50vd_6x_coco/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
- matmul_v2
TrainConfig:
train_iter: 500
eval_iter: 100
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 10000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,38 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
TestDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 1
shuffle: false
drop_last: false

View File

@@ -0,0 +1,26 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
TestReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True, interp: 1}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- Permute: {}
batch_size: 1

View File

@@ -0,0 +1,29 @@
Global:
reader_config: configs/yolov5_reader.yml
include_nms: True
Evaluation: True
model_dir: ./yolov5_s_300e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 3000
eval_iter: 1000
learning_rate: 0.00001
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05
target_metric: 0.365

View File

@@ -0,0 +1,30 @@
Global:
reader_config: configs/yolov5_reader.yml
include_nms: True
Evaluation: True
model_dir: ./yolov6mt_s_400e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 8000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 0.00004

View File

@@ -0,0 +1,30 @@
Global:
reader_config: configs/yolov5_reader.yml
include_nms: True
Evaluation: True
model_dir: ./yolov7_l_300e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 8000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 0.00004

View File

@@ -0,0 +1,27 @@
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True, interp: 1}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
- Permute: {}
batch_size: 4

View File

@@ -0,0 +1,32 @@
Global:
reader_config: configs/yolov8_reader.yml
include_nms: True
Evaluation: True
model_dir: ./yolov8_s_500e_coco_trt_nms/
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
QuantAware:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 8000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 10000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05

View File

@@ -0,0 +1,163 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from post_process import PPYOLOEPostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
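    # Wrap a ppdet DataLoader so it yields {feed_name: ndarray} dicts;
    # input_list is either a list of feed names or a {data_key: feed_name} map.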
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval():
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"].rstrip('/'),
exe,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"])
print('Loaded model from: {}'.format(global_config["model_dir"]))
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in global_config['input_list']:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(val_program,
feed=data_input,
fetch_list=fetch_targets,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'PPYOLOE':
postprocess = PPYOLOEPostProcess(
score_threshold=0.01, nms_threshold=0.6)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
metric.reset()
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, "Key 'Global' not found in config file."
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
dataset = reader_cfg['EvalDataset']
global val_loader
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
reader_cfg['worker_num'],
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
anno_file = dataset.get_anno()
metric = KeyPointTopDownCOCOEval(anno_file,
len(dataset), 17, 'output_eval')
else:
        raise ValueError("metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.")
global_config['metric'] = metric
eval()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()

View File

@@ -0,0 +1,446 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import time
import sys
import cv2
import numpy as np
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from ppdet.core.workspace import load_config, create
from ppdet.metrics import COCOMetric
from post_process import PPYOLOEPostProcess
def argsparser():
"""
argsparser func
"""
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_path", type=str, help="inference model filepath")
parser.add_argument(
"--image_file",
type=str,
default=None,
help="image path, if set image_file, it will not eval coco.")
parser.add_argument(
"--reader_config",
type=str,
default=None,
help="path of datset and reader config.")
parser.add_argument(
"--benchmark",
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
"--use_trt",
type=bool,
default=False,
help="Whether use TensorRT or not.")
parser.add_argument(
"--precision",
type=str,
default="paddle",
help="mode of running(fp32/fp16/int8)")
parser.add_argument(
"--device",
type=str,
default="GPU",
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU",
)
parser.add_argument(
"--use_dynamic_shape",
type=bool,
default=True,
help="Whether use dynamic shape or not.")
parser.add_argument(
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--cpu_threads", type=int, default=10, help="Num of cpu threads.")
parser.add_argument("--img_shape", type=int, default=640, help="input_size")
parser.add_argument(
'--include_nms',
type=bool,
default=True,
help="Whether include nms or not.")
return parser
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
"""
image_preprocess func
"""
img = cv2.imread(img_path)
im_scale_y, im_scale_x = generate_scale(img, target_shape, keep_ratio=False)
img = cv2.resize(
img, (target_shape[0], target_shape[0]),
interpolation=cv2.INTER_LANCZOS4)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
img -= img_mean
img /= img_std
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor.astype(np.float32)
def get_color_map_list(num_classes):
"""
get_color_map_list func
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= ((lab >> 0) & 1) << (7 - j)
color_map[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j)
color_map[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j)
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
"""
draw_box func
"""
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ": " + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ":" + str(round(conf, 3)),
(xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8,
(0, 255, 0),
thickness=2, )
return srcimg
def load_predictor(
model_dir,
precision="fp32",
use_trt=False,
use_mkldnn=False,
batch_size=1,
device="CPU",
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
cpu_threads=1, ):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
        precision (str): mode of running (fp32/fp16/int8)
use_trt (bool): whether use TensorRT or not.
use_mkldnn (bool): whether use MKLDNN or not in CPU.
device (str): Choose the device you want to run, it can be: CPU/GPU, default is CPU
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
rerun_flag = False
if device != "GPU" and use_trt:
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
format(precision, device))
config = Config(
os.path.join(model_dir, "model.pdmodel"),
os.path.join(model_dir, "model.pdiparams"))
if device == "GPU":
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
config.switch_ir_optim()
if use_mkldnn:
config.enable_mkldnn()
if precision == "int8":
config.enable_mkldnn_int8(
{"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
precision_map = {
"int8": Config.Precision.Int8,
"fp32": Config.Precision.Float32,
"fp16": Config.Precision.Half,
}
if precision in precision_map.keys() and use_trt:
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[precision],
use_static=True,
use_calib_mode=False, )
if use_dynamic_shape:
dynamic_shape_file = os.path.join(FLAGS.model_path,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print("Start collect dynamic shape...")
rerun_flag = True
# enable shared memory
config.enable_memory_optim()
predictor = create_predictor(config)
return predictor, rerun_flag
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5):
"""
predict image main func
"""
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
inputs["image"] = img
if FLAGS.include_nms:
inputs['scale_factor'] = scale_factor
input_names = predictor.get_input_names()
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes, np_boxes_num = None, None
cpu_mems, gpu_mems = 0, 0
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
if FLAGS.include_nms:
boxes_num = predictor.get_output_handle(output_names[1])
np_boxes_num = boxes_num.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
if not FLAGS.include_nms:
postprocess = PPYOLOEPostProcess(score_threshold=0.3, nms_threshold=0.6)
res = postprocess(np_boxes, scale_factor)
else:
res = {'bbox': np_boxes, 'bbox_num': np_boxes_num}
res_img = draw_box(
image_file, res["bbox"], CLASS_LABEL, threshold=threshold)
cv2.imwrite("result.jpg", res_img)
def eval(predictor, val_loader, metric, rerun_flag=False):
"""
eval main func
"""
cpu_mems, gpu_mems = 0, 0
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
sample_nums = len(val_loader)
input_names = predictor.get_input_names()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
if FLAGS.include_nms:
boxes_num = predictor.get_output_handle(output_names[1])
for batch_id, data in enumerate(val_loader):
data_all = {k: np.array(v) for k, v in data.items()}
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(data_all[input_names[i]])
start_time = time.time()
predictor.run()
np_boxes = boxes_tensor.copy_to_cpu()
if FLAGS.include_nms:
np_boxes_num = boxes_num.copy_to_cpu()
if rerun_flag:
return
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
if not FLAGS.include_nms:
postprocess = PPYOLOEPostProcess(
score_threshold=0.3, nms_threshold=0.6)
res = postprocess(np_boxes, data_all['scale_factor'])
else:
res = {'bbox': np_boxes, 'bbox_num': np_boxes_num}
metric.update(data_all, res)
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
time_avg = predict_time / sample_nums
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
print("[Benchmark] COCO mAP: {}".format(map_res["bbox"][0]))
sys.stdout.flush()
def main():
"""
main func
"""
predictor, rerun_flag = load_predictor(
FLAGS.model_path,
device=FLAGS.device,
use_trt=FLAGS.use_trt,
use_mkldnn=FLAGS.use_mkldnn,
precision=FLAGS.precision,
use_dynamic_shape=FLAGS.use_dynamic_shape,
cpu_threads=FLAGS.cpu_threads)
if FLAGS.image_file:
warmup, repeats = 1, 1
if FLAGS.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
FLAGS.image_file,
image_shape=[FLAGS.img_shape, FLAGS.img_shape],
warmup=warmup,
repeats=repeats)
else:
reader_cfg = load_config(FLAGS.reader_config)
dataset = reader_cfg["EvalDataset"]
global val_loader
val_loader = create("EvalReader")(reader_cfg["EvalDataset"],
reader_cfg["worker_num"],
return_list=True)
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType="bbox")
eval(predictor, val_loader, metric, rerun_flag=rerun_flag)
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__":
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
    # The DataLoader needs to run on the CPU
paddle.set_device("cpu")
main()

View File

@@ -0,0 +1,157 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
"""
Args:
box_scores (N, 5): boxes in corner-form and probabilities.
iou_threshold: intersection over union threshold.
top_k: keep top_k results. If k <= 0, keep all the results.
candidate_size: only consider the candidates with the highest scores.
Returns:
picked: a list of indexes of the kept boxes
"""
scores = box_scores[:, -1]
boxes = box_scores[:, :-1]
picked = []
indexes = np.argsort(scores)
indexes = indexes[-candidate_size:]
while len(indexes) > 0:
current = indexes[-1]
picked.append(current)
if 0 < top_k == len(picked) or len(indexes) == 1:
break
current_box = boxes[current, :]
indexes = indexes[:-1]
rest_boxes = boxes[indexes, :]
iou = iou_of(
rest_boxes,
np.expand_dims(
current_box, axis=0), )
indexes = indexes[iou <= iou_threshold]
return box_scores[picked, :]
def iou_of(boxes0, boxes1, eps=1e-5):
"""Return intersection-over-union (Jaccard index) of boxes.
Args:
boxes0 (N, 4): ground truth boxes.
boxes1 (N or 1, 4): predicted boxes.
eps: a small number to avoid 0 as denominator.
Returns:
iou (N): IoU values.
"""
overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2])
overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
overlap_area = area_of(overlap_left_top, overlap_right_bottom)
area0 = area_of(boxes0[..., :2], boxes0[..., 2:])
area1 = area_of(boxes1[..., :2], boxes1[..., 2:])
return overlap_area / (area0 + area1 - overlap_area + eps)
def area_of(left_top, right_bottom):
"""Compute the areas of rectangles given two corners.
Args:
left_top (N, 2): left top corner.
right_bottom (N, 2): right bottom corner.
Returns:
area (N): return the area.
"""
hw = np.clip(right_bottom - left_top, 0.0, None)
return hw[..., 0] * hw[..., 1]
class PPYOLOEPostProcess(object):
"""
Args:
input_shape (int): network input image size
scale_factor (float): scale factor of ori image
"""
def __init__(self,
score_threshold=0.4,
nms_threshold=0.5,
nms_top_k=10000,
keep_top_k=300):
self.score_threshold = score_threshold
self.nms_threshold = nms_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
def _non_max_suppression(self, prediction, scale_factor):
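        # Per image: keep boxes above score_threshold per class, run hard NMS
        # per class, then cap the output at keep_top_k boxes ranked by score.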
batch_size = prediction.shape[0]
out_boxes_list = []
box_num_list = []
for batch_id in range(batch_size):
bboxes, confidences = prediction[batch_id][..., :4], prediction[
batch_id][..., 4:]
# nms
picked_box_probs = []
picked_labels = []
for class_index in range(0, confidences.shape[1]):
probs = confidences[:, class_index]
mask = probs > self.score_threshold
probs = probs[mask]
if probs.shape[0] == 0:
continue
subset_boxes = bboxes[mask, :]
box_probs = np.concatenate(
[subset_boxes, probs.reshape(-1, 1)], axis=1)
box_probs = hard_nms(
box_probs,
iou_threshold=self.nms_threshold,
top_k=self.nms_top_k)
picked_box_probs.append(box_probs)
picked_labels.extend([class_index] * box_probs.shape[0])
            if len(picked_box_probs) == 0:
                # Keep the (class, score, x1, y1, x2, y2) layout for empty
                # images and record zero boxes so bbox_num stays aligned.
                out_boxes_list.append(np.empty((0, 6)))
                box_num_list.append(0)
else:
picked_box_probs = np.concatenate(picked_box_probs)
# resize output boxes
picked_box_probs[:, 0] /= scale_factor[batch_id][1]
picked_box_probs[:, 2] /= scale_factor[batch_id][1]
picked_box_probs[:, 1] /= scale_factor[batch_id][0]
picked_box_probs[:, 3] /= scale_factor[batch_id][0]
                # columns: class, score, box (x1, y1, x2, y2)
out_box = np.concatenate(
[
np.expand_dims(
np.array(picked_labels), axis=-1), np.expand_dims(
picked_box_probs[:, 4], axis=-1),
picked_box_probs[:, :4]
],
axis=1)
if out_box.shape[0] > self.keep_top_k:
out_box = out_box[out_box[:, 1].argsort()[::-1]
[:self.keep_top_k]]
out_boxes_list.append(out_box)
box_num_list.append(out_box.shape[0])
out_boxes_list = np.concatenate(out_boxes_list, axis=0)
box_num_list = np.array(box_num_list)
return out_boxes_list, box_num_list
def __call__(self, outs, scale_factor):
out_boxes_list, box_num_list = self._non_max_suppression(outs,
scale_factor)
return {'bbox': out_boxes_list, 'bbox_num': box_num_list}

View File

@@ -0,0 +1,191 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import PPYOLOEPostProcess
from paddleslim.common.dataloader import get_feed_vars
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
if 'include_nms' in global_config and not global_config['include_nms']:
if 'arch' in global_config and global_config['arch'] == 'PPYOLOE':
postprocess = PPYOLOEPostProcess(
score_threshold=0.01, nms_threshold=0.6)
else:
assert "Not support arch={} now.".format(global_config['arch'])
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
map_key = 'keypoint' if 'arch' in global_config and global_config[
'arch'] == 'keypoint' else 'bbox'
return map_res[map_key][0]
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, "Key 'Global' not found in config file."
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
if global_config.get('input_list') is None:
global_config['input_list'] = get_feed_vars(
global_config['model_dir'], global_config['model_filename'],
global_config['params_filename'])
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
anno_file = dataset.get_anno()
metric = KeyPointTopDownCOCOEval(anno_file,
len(dataset), 17, 'output_eval')
else:
            raise ValueError("metric currently only supports COCO, VOC and KeyPointTopDownCOCOEval.")
global_config['metric'] = metric
else:
eval_func = None
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()