Model Compression
PaddleDetection provides a complete tutorial and benchmarks for model compression based on PaddleSlim. The methods covered below are pruning, quantization, distillation, and a combined distillation and pruning strategy.
We recommend combining pruning with distillation training, or pruning with quantization, to compress detection models. The following uses YOLOv3 as an example for pruning, distillation, and quantization experiments.
Experimental Environment
- Python 3.7+
- PaddlePaddle >= 2.1.0
- PaddleSlim >= 2.1.0
- CUDA 10.1+
- cuDNN >= 7.6.5
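The version requirements above can be checked programmatically before training. The following is a minimal sketch, not part of PaddleDetection: the helper names are hypothetical, the distribution names `paddlepaddle` and `paddleslim` are assumptions (GPU builds are typically published under a different distribution name such as `paddlepaddle-gpu`), and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.1.0' or '2.1.0rc1' into a tuple of leading integers, e.g. (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the list above; the package names are assumptions.
REQUIREMENTS = {"paddlepaddle": "2.1.0", "paddleslim": "2.1.0"}


def check_environment(requirements=REQUIREMENTS):
    """Return a status string per package: 'ok', 'not installed', or the mismatch."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[package] = "ok"
        else:
            report[package] = f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Development builds may report a placeholder version, so a failed check here is a prompt to look closer rather than a definitive verdict.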
Version Dependencies between PaddleDetection, PaddlePaddle, and PaddleSlim
| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
|---|---|---|---|
| release/2.1 | >= 2.1.0 | 2.1 | Exporting quantized models relies on the latest Paddle develop branch, available as the PaddlePaddle daily build |
| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |
Install PaddleSlim
- Method 1: Install it directly:
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
- Method 2: Compile and install:
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
Quick Start
Train
python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
- -c: Specifies the model configuration file.
- --slim_config: Specifies the compression strategy configuration file.
- To use distillation, refer to the Distillation Doc for the specific distillation method and more distilled detection models.
Evaluation
python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
- -c: Specifies the model configuration file.
- --slim_config: Specifies the compression strategy configuration file.
- -o weights: Specifies the path of the model trained with the compression algorithm.
Test
python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
    -o weights=output/{SLIM_CONFIG}/model_final \
    --infer_img={IMAGE_PATH}
- -c: Specifies the model configuration file.
- --slim_config: Specifies the compression strategy configuration file.
- -o weights: Specifies the path of the model trained with the compression algorithm.
- --infer_img: Specifies the path of the test image.
Full Chain Deployment
Export the model from dynamic graph to static graph
python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
- -c: Specifies the model configuration file.
- --slim_config: Specifies the compression strategy configuration file.
- -o weights: Specifies the path of the model trained with the compression algorithm.
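The train, evaluation, test, and export commands above share one flag pattern. The sketch below assembles them from a pair of config paths; it is an illustration, not a PaddleDetection utility. The function names and the example paths `configs/model.yml` and `configs/slim/my_slim.yml` are hypothetical, and it assumes that `{SLIM_CONFIG}` in the weights path stands for the slim config file name without its `.yml` extension, as the placeholders suggest.

```python
from pathlib import Path

# Tool scripts named in the commands above.
TOOLS = ("train", "eval", "infer", "export_model")


def slim_weights_path(slim_config, output_root="output"):
    """Build output/{SLIM_CONFIG}/model_final from the slim config file name."""
    return f"{output_root}/{Path(slim_config).stem}/model_final"


def build_command(tool, model_config, slim_config, infer_img=None):
    """Return the argument list for one of the tools/*.py commands."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    cmd = ["python", f"tools/{tool}.py",
           "-c", model_config,
           "--slim_config", slim_config]
    if tool != "train":
        # Every step after training loads the compressed model's weights.
        cmd += ["-o", f"weights={slim_weights_path(slim_config)}"]
    if tool == "infer":
        if infer_img is None:
            raise ValueError("infer needs an image path")
        cmd.append(f"--infer_img={infer_img}")
    return cmd


if __name__ == "__main__":
    # Hypothetical config paths, for illustration only.
    for tool in TOOLS:
        args = build_command(tool, "configs/model.yml",
                             "configs/slim/my_slim.yml",
                             infer_img="demo.jpg")
        print(" ".join(args))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.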
Prediction and deployment
- Paddle-Inference Prediction:
- Server deployment: Use PaddleServing.
- Mobile deployment: Use Paddle-Lite to deploy on mobile devices.
Benchmark
Pruning
Pascal VOC Benchmark
| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Pruning-l1_norm (sensitivity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 78.4(+3.3) | link | configuration file | slim configuration file |
COCO Benchmark
| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
|---|---|---|---|---|---|---|---|---|---|
| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Pruning-FPGM | -37% | 12.6 | 608 | - | 22.3 | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | link | configuration file | - |
| YOLOv3-DarkNet53 | Pruning-FPGM | -24% | - | 608 | - | 37.6 | link | configuration file | slim configuration file |
| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | link | configuration file | - |
| PP-YOLO_R50vd | Pruning-FPGM | -35% | - | 608 | - | 42.1 | link | configuration file | slim configuration file |
Description:
- Currently, all models except RCNN series models are supported.
- SD855 latency is measured with Paddle Lite deployment on the ARMv8 architecture using 4 threads.
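The l1_norm strategy listed in the pruning tables ranks convolution filters by the L1 norm of their weights and removes the weakest ones. The following is a minimal pure-Python sketch of that ranking step only. It is not PaddleSlim's implementation: the function names and the toy weights are hypothetical, and a real pruner also has to shrink the input channels of downstream layers and choose a ratio per layer (for example through sensitivity analysis).

```python
def l1_norm(weights):
    """Sum of absolute values over an arbitrarily nested list of numbers."""
    if isinstance(weights, (int, float)):
        return abs(weights)
    return sum(l1_norm(w) for w in weights)


def select_filters_to_keep(filters, pruned_ratio):
    """Return the sorted indices of the filters that survive pruning.

    The `pruned_ratio` fraction of filters with the smallest L1 norm is
    dropped; the count is rounded down.
    """
    num_pruned = int(len(filters) * pruned_ratio)
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(ranked[num_pruned:])


if __name__ == "__main__":
    # Four toy 2x2 filters (hypothetical values) with L1 norms 0.4, 4.0, 0.05, 1.2.
    filters = [
        [[0.1, -0.1], [0.0, 0.2]],
        [[1.0, -2.0], [0.5, 0.5]],
        [[0.0, 0.0], [0.05, 0.0]],
        [[-0.3, 0.3], [0.3, -0.3]],
    ]
    # Pruning half of the filters keeps the two with the largest norms.
    print(select_filters_to_keep(filters, pruned_ratio=0.5))  # [1, 3]
```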
Quantization
COCO Benchmark
| Model | Compression Strategy | Input Size | Model Size (MB) | Prediction Latency (V100) | Prediction Latency (SD855) | Box AP | Download | Inference Model Download | Model Configuration File | Compression Algorithm Configuration File |
|---|---|---|---|---|---|---|---|---|---|---|
| PP-YOLOE-l | baseline | 640 | - | 11.2ms(trt_fp32) / 7.7ms(trt_fp16) | -- | 50.9 | link | - | Configuration File | - |
| PP-YOLOE-l | Standard online quantization | 640 | - | 6.7ms(trt_int8) | -- | 48.8 | link | - | Configuration File | Configuration File |
| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | link | link | Configuration File | - |
| PP-YOLOv2_R50vd | PACT online quantization | 640 | -- | 17.3ms | -- | 48.1 | link | link | Configuration File | Configuration File |
| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | link | link | Configuration File | - |
| PP-YOLO_R50vd | PACT online quantization | 608 | 67.3 | 13.8ms | -- | 44.3 | link | link | Configuration File | Configuration File |
| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | link | link | Configuration File | - |
| PP-YOLO-MobileNetV3_large | Standard online quantization | 320 | 5.6 | -- | 25.1ms | 24.3 | link | link | Configuration File | Configuration File |
| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | link | link | Configuration File | - |
| YOLOv3-MobileNetV1 | Standard online quantization | 608 | 25.4 | 6.6ms | 248ms | 30.5 | link | link | Configuration File | slim Configuration File |
| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | link | link | Configuration File | - |
| YOLOv3-MobileNetV3 | PACT online quantization | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | link | link | Configuration File | slim Configuration File |
| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | link | link | Configuration File | - |
| YOLOv3-DarkNet53 | Standard online quantization | 608 | 78.8 | 12.4ms | -- | 38.8 | link | link | Configuration File | slim Configuration File |
| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | link | link | Configuration File | - |
| SSD-MobileNet_v1 | Standard online quantization | 300 | 7.1 | -- | 21.5ms | 72.9 | link | link | Configuration File | slim Configuration File |
| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | link | link | Configuration File | - |
| Mask-ResNet50-FPN | Standard online quantization | (800, 1333) | -- | -- | -- | 39.7(+0.5)/35.9(+0.3) | link | link | Configuration File | slim Configuration File |
Description:
- V100 latency is measured with TensorRT FP32 for non-quantized models and with TensorRT INT8 for quantized models; both measurements include NMS time.
- SD855 latency is measured with Paddle Lite deployment on the ARMv8 architecture using 4 threads.
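To make the INT8 results above more concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization: floats are mapped to integers in [-127, 127] with a single scale and mapped back for comparison. This illustrates the general idea only. It is not the PACT or online quantization procedure that PaddleSlim applies during training, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one abs-max scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    print(q)  # [50, -127, 0, 100]
    # Each value is reproduced to within half a quantization step.
    for original, approx in zip(weights, restored):
        assert abs(original - approx) <= scale / 2 + 1e-12
```

Small values such as 0.003 collapse to zero here, which is the kind of rounding error that quantization-aware training is meant to let the model adapt to.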
Distillation
COCO Benchmark
| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File |
|---|---|---|---|---|---|---|
| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | link | Configuration File | - |
| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0(+1.6) | link | Configuration File | slim Configuration File |
- For the specific distillation method and more distilled detection models, please refer to distill.
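As background for the distillation result above, the sketch below shows a generic soft-label distillation loss: the KL divergence between temperature-softened teacher and student class distributions. This is the classic classification-style formulation, named here plainly as a stand-in; it is not the detection-specific loss described in the distill document, and the function names and logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # hypothetical logits
    # A student that matches the teacher incurs (near) zero loss.
    print(distillation_loss(teacher, teacher))
    # A student that disagrees is penalized with a positive loss.
    print(distillation_loss([0.5, 1.0, 4.0], teacher))
```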
Distillation and Pruning Combined Strategy
COCO Benchmark
| Model | Compression Strategy | Input Size | GFLOPs | Model Size (MB) | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | link | Configuration File | - |
| YOLOv3-MobileNetV1 | Distillation + Pruning | 608 | 7.54(-69.4%) | 30.9(-67.2%) | 166.1ms | 28.4(-1.0) | link | Configuration File | slim Configuration File |
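The percentages in parentheses are relative changes against the baseline row, and the Box AP delta is an absolute difference. A small sketch that reproduces the figures in the table above (the helper name is hypothetical; the numbers are copied from the two rows):

```python
def relative_change(baseline, compressed):
    """Percentage change of the compressed value relative to the baseline."""
    return (compressed - baseline) / baseline * 100.0


if __name__ == "__main__":
    # (baseline, compressed) pairs for YOLOv3-MobileNetV1 from the table above.
    gflops = (24.65, 7.54)
    size_mb = (94.2, 30.9)
    box_ap = (29.4, 28.4)

    print(f"GFLOPs: {relative_change(*gflops):.1f}%")      # -69.4%
    print(f"Model size: {relative_change(*size_mb):.1f}%")  # -67.2%
    print(f"Box AP: {box_ap[1] - box_ap[0]:+.1f}")          # -1.0
```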