
liuyebo 1514e09c40 Replace document detection model

2024-08-27 14:42:45 +08:00

13 KiB


Auto Compression

1. Introduction
2. Benchmark
3. Start Auto Compression
4. Inference Deployment

1. Introduction

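This example applies auto compression to Inference models exported from PaddleDetection. The compression strategy is quantization with distillation: quantization-aware training guided by a distillation loss.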
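To illustrate what "quantization" means at the number level, here is a minimal sketch of symmetric per-tensor abs-max INT8 quantization. It shows the quantize-then-dequantize ("fake quantization") round trip that quantization-aware training simulates during training. It is not PaddleSlim's quantizer: the actual quantizers and their settings come from the config file, and the distillation part, where the full-precision model acts as teacher, is not shown here. The function names are hypothetical.

```python
# Illustrative sketch only: symmetric per-tensor abs-max INT8 quantization.
# Not PaddleSlim's implementation; it shows the quantize -> dequantize
# ("fake quantization") round trip that quantization-aware training simulates.

def absmax_scale(values, num_bits=8):
    """Scale that maps the largest magnitude onto the top of the signed int range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clip to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to floats."""
    return [q * scale for q in q_values]


def fake_quant(values, num_bits=8):
    """Quantize then dequantize: float outputs that carry the INT8 rounding error."""
    scale = absmax_scale(values, num_bits)
    return dequantize(quantize(values, scale, num_bits), scale)


if __name__ == "__main__":
    weights = [0.8, -1.27, 0.003, 0.5]
    scale = absmax_scale(weights)
    print(quantize(weights, scale))  # [80, -127, 0, 50]
    print(fake_quant(weights))
```

During training, the loss is computed on the fake-quantized values. The model therefore learns weights that survive the rounding error, which is why the ACT accuracy in the tables below stays close to the baseline.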

2. Benchmark

PP-YOLOE+

Model	Base mAP	PTQ mAP	ACT quant mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
PP-YOLOE+_s	43.7	-	42.9	-	-	-	config	Quant Model
PP-YOLOE+_m	49.8	-	49.3	-	-	-	config	Quant Model
PP-YOLOE+_l	52.9	-	52.6	-	-	-	config	Quant Model
PP-YOLOE+_x	54.7	-	54.4	-	-	-	config	Quant Model

All mAP values are evaluated on COCO val2017 with IoU=0.5:0.95.
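"IoU=0.5:0.95" means AP is computed at ten IoU thresholds (0.50, 0.55, ..., 0.95) and then averaged. The sketch below shows only that threshold averaging. The full COCO evaluation, as in pycocotools, also averages over categories and 101 recall points. The per-threshold AP values you would feed in are placeholders, not benchmark results, and the helper names are hypothetical.

```python
# Sketch of the IoU-threshold averaging behind COCO "mAP@[0.5:0.95]".

def coco_iou_thresholds():
    """0.50, 0.55, ..., 0.95, built from integers to avoid floating-point drift."""
    return [t / 100 for t in range(50, 100, 5)]


def mean_ap_over_thresholds(ap_per_threshold):
    """Average AP values given in the same order as coco_iou_thresholds()."""
    thresholds = coco_iou_thresholds()
    if len(ap_per_threshold) != len(thresholds):
        raise ValueError(f"expected {len(thresholds)} AP values, got {len(ap_per_threshold)}")
    return sum(ap_per_threshold) / len(ap_per_threshold)


def thresholds_passed(iou):
    """Thresholds at which a detection with this IoU counts as a true positive."""
    return [t for t in coco_iou_thresholds() if iou >= t]
```

For example, a detection with IoU 0.72 against its ground truth counts as a match at only five of the ten thresholds (0.50 to 0.70). Tight localization therefore matters much more for this metric than for AP@0.5.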

YOLOv8

Model	Base mAP	PTQ mAP	ACT quant mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
YOLOv8-s	44.9	43.9	44.3	9.27ms	4.65ms	3.78ms	config	Model

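Notes:

The YOLOv8 models in the table all include NMS and can be deployed directly with TensorRT. To align with the standard test protocol, test the models without NMS.
All mAP values are evaluated on COCO val2017 with IoU=0.5:0.95.
Latency in the table was measured on a Tesla T4 GPU with TensorRT enabled and batch_size=1.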
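"With NMS" means the exported model ends with a non-maximum suppression step that removes overlapping duplicate boxes. The simplified single-class greedy NMS below illustrates the idea. The exported PaddleDetection models typically use a multi-class NMS operator whose thresholds come from the model config, so treat this only as a conceptual sketch. The function names are hypothetical.

```python
# Simplified single-class greedy NMS, for illustration only.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Because this step runs inside the model, the "with NMS" latency numbers include post-processing time. This is one reason they differ from benchmarks that time the network alone.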

PP-YOLOE

Model	Base mAP	PTQ mAP	ACT quant mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
PP-YOLOE-l	50.9	-	50.6	11.2ms	7.7ms	6.7ms	config	Quant Model
PP-YOLOE-SOD	38.5	-	37.6	-	-	-	config	Quant Model


PP-YOLOE-l mAP is evaluated on COCO val2017 with IoU=0.5:0.95.
PP-YOLOE-l was tested on a Tesla V100 GPU with TensorRT enabled and batch_size=1, NMS included; the test script is the benchmark demo.
PP-YOLOE-SOD metrics are evaluated on a COCO-format dataset built by slicing VisDrone-DET images, with IoU=0.5:0.95. Config file: ppyoloe_crn_l_80e_sliced_visdrone_640_025.yml

PP-PicoDet

Model	Strategy	mAP	FP32	FP16	INT8	Config	Model
PicoDet-S-NPU	Baseline	30.1	-	-	-	config	Model
PicoDet-S-NPU	Quantization-aware training	29.7	-	-	-	config	Model

All mAP values are evaluated on COCO val2017 with IoU=0.5:0.95.

RT-DETR

Model	Base mAP	ACT quant mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
RT-DETR-R50	53.1	53.0	32.05ms	9.12ms	6.96ms	config	Model
RT-DETR-R101	54.3	54.1	54.13ms	12.68ms	9.20ms	config	Model
RT-DETR-HGNetv2-L	53.0	52.9	26.16ms	8.54ms	6.65ms	config	Model
RT-DETR-HGNetv2-X	54.8	54.6	49.22ms	12.50ms	9.24ms	config	Model

Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.

Model	Base mAP	ACT quant mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
RT-DETR-R50	53.1	53.0	9.64ms	5.00ms	3.99ms	config	Model
RT-DETR-R101	54.3	54.1	14.93ms	7.15ms	5.12ms	config	Model
RT-DETR-HGNetv2-L	53.0	52.9	8.17ms	4.77ms	4.00ms	config	Model
RT-DETR-HGNetv2-X	54.8	54.6	12.81ms	6.97ms	5.32ms	config	Model

Test environment for the table above: A10, TensorRT 8.6.0, CUDA 11.6, batch_size=1.
All mAP values are evaluated on COCO val2017 with IoU=0.5:0.95.
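The tables give raw latencies. The INT8 speedup and the accuracy cost are easier to compare as ratios and differences. This small script copies the RT-DETR Tesla T4 figures from the first table verbatim and derives nothing beyond those two quantities. The dictionary layout and function name are hypothetical.

```python
# Derived figures from the RT-DETR Tesla T4 table (batch_size=1).
# Values are copied from the table; only ratios and differences are computed.

T4_RESULTS = {
    # model: (base_mAP, act_quant_mAP, trt_fp32_ms, trt_fp16_ms, trt_int8_ms)
    "RT-DETR-R50": (53.1, 53.0, 32.05, 9.12, 6.96),
    "RT-DETR-R101": (54.3, 54.1, 54.13, 12.68, 9.20),
    "RT-DETR-HGNetv2-L": (53.0, 52.9, 26.16, 8.54, 6.65),
    "RT-DETR-HGNetv2-X": (54.8, 54.6, 49.22, 12.50, 9.24),
}


def summarize(results):
    """mAP drop from quantization and INT8 speedup relative to FP32/FP16."""
    rows = {}
    for name, (base, quant, fp32, fp16, int8) in results.items():
        rows[name] = {
            "map_drop": round(base - quant, 1),
            "int8_vs_fp32": round(fp32 / int8, 2),
            "int8_vs_fp16": round(fp16 / int8, 2),
        }
    return rows


for name, row in summarize(T4_RESULTS).items():
    print(f"{name}: mAP -{row['map_drop']}, "
          f"INT8 x{row['int8_vs_fp32']} vs FP32, x{row['int8_vs_fp16']} vs FP16")
```

On T4, for example, RT-DETR-R50 loses 0.1 mAP while INT8 runs about 4.6x faster than FP32. Feeding in the A10 table instead shows the smaller gains reported there.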

3. Auto Compression Workflow

3.1 Prepare the Environment

PaddlePaddle >= 2.4 (available from the PaddlePaddle website)
PaddleSlim >= 2.4.1
PaddleDet >= 2.5
opencv-python

Install paddlepaddle:

# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu

Install paddleslim:

pip install paddleslim

Install paddledet:

pip install paddledet

Note: auto compression of YOLOv8 models requires the latest develop versions of Paddle and PaddleSlim.
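To confirm that the installed packages meet the minimum versions listed above, a check like the following can be run. It is a minimal sketch using only the standard library. The distribution names are assumptions: the GPU build installs as paddlepaddle-gpu, so whichever build is absent reports None. PaddlePaddle develop builds may report a placeholder version such as 0.0.0, so a failed check on a develop build (as needed for YOLOv8) does not necessarily mean it is too old.

```python
# Minimal sketch: compare installed package versions with the minimums listed above.
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUMS = {
    "paddlepaddle": "2.4",      # CPU build
    "paddlepaddle-gpu": "2.4",  # GPU build (assumed distribution name)
    "paddleslim": "2.4.1",
    "paddledet": "2.5",
}


def parse_version(text):
    """Leading numeric components only, e.g. '2.5.0rc1' -> (2, 5, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so (2, 4) compares equal to (2, 4, 0)
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check(minimums=MINIMUMS):
    """Map each distribution to True/False, or None if it is not installed."""
    report = {}
    for dist, required in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            report[dist] = None
            continue
        report[dist] = meets_minimum(installed, required)
    return report


if __name__ == "__main__":
    print(check())
```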

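3.2 Prepare the Dataset

By default, this example runs the auto compression experiments on COCO data. For custom COCO-format data or data in other formats, prepare it according to the data preparation documentation.

If your dataset is not in COCO format, modify the Dataset fields in the reader config file under configs.

Taking PP-YOLOE as an example, once the dataset is ready, simply set the dataset_dir field of EvalDataset in ./configs/yolo_reader.yml to your dataset path.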
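Before pointing dataset_dir at a custom dataset, it can help to sanity-check the COCO-format annotation file. The following is a hypothetical helper, not part of PaddleSlim or PaddleDetection. It checks only the top-level keys (images, annotations, categories) and that each box annotation references a known image and category and uses a four-number [x, y, width, height] bbox.

```python
# Hypothetical helper: quick sanity check of a COCO-format detection annotation file.
import json

REQUIRED_TOP_LEVEL = ("images", "annotations", "categories")
REQUIRED_ANNOTATION = ("image_id", "category_id", "bbox")


def check_coco(data):
    """Return a list of problems in a COCO-style dict (empty list means none found)."""
    problems = [f"missing top-level key '{k}'" for k in REQUIRED_TOP_LEVEL if k not in data]
    if problems:
        return problems
    image_ids = {img.get("id") for img in data["images"]}
    category_ids = {cat.get("id") for cat in data["categories"]}
    for i, ann in enumerate(data["annotations"]):
        missing = [k for k in REQUIRED_ANNOTATION if k not in ann]
        if missing:
            problems.append(f"annotation {i} missing {missing}")
            continue
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {i} refers to unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {i} refers to unknown category_id {ann['category_id']}")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {i} bbox should be [x, y, width, height]")
    return problems


def check_coco_file(path):
    """Load an annotation JSON file and check it."""
    with open(path, encoding="utf-8") as f:
        return check_coco(json.load(f))
```

For COCO val2017 itself the annotation file is annotations/instances_val2017.json. For a custom dataset, pass whichever file the reader config's annotation path points to.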

3.3 Prepare the Inference Model

An inference model consists of two files, model.pdmodel and model.pdiparams: the .pdmodel file is the model file and the .pdiparams file holds the weights.
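A quick way to confirm that an exported model directory contains both files before starting compression is shown below. It is a minimal sketch with a hypothetical function name. The default file names follow the format described above and can be overridden if your export used different names.

```python
# Minimal sketch: verify that an inference model directory holds both required files.
from pathlib import Path


def find_inference_model(model_dir, model_filename="model.pdmodel",
                         params_filename="model.pdiparams"):
    """Return (model_path, params_path) if both exist, else raise FileNotFoundError."""
    model_dir = Path(model_dir)
    model_path = model_dir / model_filename
    params_path = model_dir / params_filename
    missing = [str(p) for p in (model_path, params_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing inference model file(s): {', '.join(missing)}")
    return model_path, params_path
```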

Export the Inference model following the PaddleDetection documentation. The following is an export example for PP-YOLOE:

Download the code

git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection

Export the inference model

PP-YOLOE-l model, with NMS (for a quick start, you can directly download the exported PP-YOLOE-l model):

python tools/export_model.py \
        -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml \
        -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \
        trt=True

YOLOv8-s model, with NMS. See the YOLOv8 model documentation for details, then run:

python tools/export_model.py \
        -c configs/yolov8/yolov8_s_500e_coco.yml \
        -o weights=https://paddledet.bj.bcebos.com/models/yolov8_s_500e_coco.pdparams \
        trt=True

For a quick start, you can directly download the exported YOLOv8-s model.

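3.4 Run Auto Compression and Export the Model

The distillation-quantization auto compression example is launched with the run.py script, which uses the paddleslim.auto_compression.AutoCompression interface to compress the model. Set the model path, distillation, quantization, and training parameters in the config file; once configured, the model can be quantized and distilled. The commands are: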

Single-GPU training:

export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'

Multi-GPU training:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
          --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
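For orientation, here is a rough sketch of the kind of call run.py makes. It is not the script's actual code. It assumes the parsed config has a Global section with model_dir, model_filename and params_filename (only model_dir is named in this document), and that the training data loader is built elsewhere, for example from the reader config. The AutoCompression keyword names follow our reading of the PaddleSlim 2.4 API and should be checked against your installed version. The helper names are hypothetical. The paddleslim import is deferred so that the config-handling part runs without PaddleSlim installed.

```python
# Sketch only: the general shape of an AutoCompression call, not run.py itself.
# Assumed config layout: a "Global" section with model_dir / model_filename / params_filename.

def build_ac_kwargs(config, save_dir="./output/"):
    """Collect AutoCompression constructor arguments from a parsed config dict."""
    global_cfg = config["Global"]
    return {
        "model_dir": global_cfg["model_dir"],
        "model_filename": global_cfg.get("model_filename", "model.pdmodel"),
        "params_filename": global_cfg.get("params_filename", "model.pdiparams"),
        "save_dir": save_dir,
        "config": config,  # the whole config, including distillation/quantization sections
    }


def run_compression(config, train_loader, save_dir="./output/"):
    # Deferred import: PaddleSlim >= 2.4.1 is only needed when compression actually runs.
    from paddleslim.auto_compression import AutoCompression

    ac = AutoCompression(train_dataloader=train_loader, **build_ac_kwargs(config, save_dir))
    ac.compress()  # runs the strategies configured in the config (here: distillation + quantization)
```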

3.5 Evaluate Model Accuracy

Use the eval.py script to get the model's mAP:

export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml

Use Paddle Inference with TensorRT INT8 to get the model's mAP:

export CUDA_VISIBLE_DEVICES=0
python paddle_inference_eval.py --model_path ./output/ --reader_config configs/ppyoloe_reader.yml --precision int8 --use_trt=True

Notes:

The path of the model to test can be changed in the model_dir field of the config file.
--precision defaults to paddle. To use TensorRT, set --use_trt=True; --precision can then be set to fp32, fp16, or int8.

4. Inference Deployment

Refer to the PaddleDetection deployment tutorial. On GPU, deploy the quantized model with TensorRT enabled in trt_int8 mode.
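The following minimal sketch shows how TensorRT INT8 mode can be enabled through the paddle.inference Python API. The GPU memory pool size, workspace size and min_subgraph_size are illustrative values, not taken from PaddleDetection's deployment code, and the helper names are hypothetical. use_calib_mode is set to False because a model quantized by ACT already stores its quantization scales, so TensorRT does not need its own calibration pass. The Paddle import is deferred so the precision mapping can be checked without Paddle installed.

```python
# Sketch: create a Paddle Inference predictor that runs the quantized model with TensorRT.
# Numeric settings below are illustrative assumptions.

PRECISION_NAMES = {"fp32": "Float32", "fp16": "Half", "int8": "Int8"}


def precision_attr(precision):
    """Map a --precision style string to a paddle.inference.PrecisionType member name."""
    try:
        return PRECISION_NAMES[precision]
    except KeyError:
        raise ValueError(f"precision must be one of {sorted(PRECISION_NAMES)}") from None


def create_trt_predictor(model_file, params_file, precision="int8", gpu_id=0):
    # Deferred import: requires PaddlePaddle with GPU and TensorRT support at call time.
    from paddle.inference import Config, PrecisionType, create_predictor

    config = Config(model_file, params_file)
    config.enable_use_gpu(1000, gpu_id)  # initial GPU memory pool (MB), device id
    config.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=getattr(PrecisionType, precision_attr(precision)),
        use_static=False,
        use_calib_mode=False,  # scales come from the quantized model, no TRT calibration
    )
    return create_predictor(config)
```

For example, create_trt_predictor("./output/model.pdmodel", "./output/model.pdiparams") would target the files saved by the compression step, assuming the default file names.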
