Move paddle_detection

2024-09-24 17:02:56 +08:00
parent 90a6d5ec75
commit 3438cf6e0e
2025 changed files with 11 additions and 11 deletions


@@ -0,0 +1,68 @@
# 1. TTFNet
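## Introduction
TTFNet is a training-time-friendly network for real-time object detection. It addresses the slow convergence of CenterNet by proposing a new way of generating training samples with a Gaussian kernel, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight structure also makes it easy to extend to other tasks.
**Characteristics:**
- Simple structure: only two heads are needed to detect object position and size, and time-consuming post-processing is removed.
- Short training time: with a DarkNet53 backbone, about 2 hours of training on 8 V100 GPUs is enough to reach good results.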
## Model Zoo
| Backbone | Network type | Images per GPU | Learning rate schedule | Inference speed (fps) | Box AP | Download | Configuration file |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [Download link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
# 2. PAFNet
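## Introduction
PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the SOTA level among anchor-free detectors, and a lightweight mobile model, PAFNet-Lite, is also provided.
The PAFNet series improves on TTFNet in the following aspects: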
- [CutMix](https://arxiv.org/abs/1905.04899)
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs with batch_size=18 per GPU
- Synchronized Batch Normalization
- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- Better pretrained model
## Model Zoo
| Backbone | Network type | Images per GPU | Learning rate schedule | Inference speed (fps) | Box AP | Download | Configuration file |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [Download link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |
### PAFNet-Lite
| Backbone | Network type | Images per GPU | Learning rate schedule | Box AP | Kirin 990 latency (ms) | Model size (MB) | Download | Configuration file |
| :-------------- | :------------- | :-----: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [Download link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration file](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |
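**Note:** Due to the overall upgrade of the dynamic graph framework, the PAFNet weights published by PaddleDetection need to be evaluated with the `--bias` flag, for example: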
```bash
# Use the weights published by PaddleDetection
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
```
## Citations
```
@article{liu2019training,
title = {Training-Time-Friendly Network for Real-Time Object Detection},
author = {Zili Liu and Tu Zheng and Guodong Xu and Zheng Yang and Haifeng Liu and Deng Cai},
journal = {arXiv preprint arXiv:1909.00700},
year = {2019}
}
```


@@ -0,0 +1,69 @@
# 1. TTFNet
## Introduction
TTFNet is a training-time-friendly network for real-time object detection. It addresses the slow convergence of CenterNet by proposing a new method that generates training samples with a Gaussian kernel, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight structure also makes it easy to extend to other tasks.
**Characteristics:**
- Simple structure: only two heads are needed to detect object position and size, and time-consuming post-processing is removed.
- Short training time: with a DarkNet53 backbone, about 2 hours of training on 8 V100 GPUs is enough to reach good results.
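
The Gaussian-kernel sample generation described in the introduction can be sketched roughly as follows. This is a minimal pure-Python illustration, not PaddleDetection's `Gt2TTFTarget` implementation: the function name `gaussian_heatmap`, the `alpha=0.54` shrink factor, and the `sigma = alpha * size / 6` rule are assumptions recalled from the TTFNet paper, while `down_ratio=4` mirrors the reader configs in this commit.

```python
import math


def gaussian_heatmap(height, width, box, down_ratio=4, alpha=0.54):
    """Build a height x width heatmap (list of lists) for one ground-truth box.

    box is (x1, y1, x2, y2) in input-image pixels. alpha and the /6 rule
    are assumed values, not taken from this repository.
    """
    # Map the box onto the down-sampled feature map.
    x1, y1, x2, y2 = (v / down_ratio for v in box)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Elliptical kernel: the spread follows the box width and height.
    sigma_x = max(alpha * (x2 - x1) / 6.0, 1e-6)
    sigma_y = max(alpha * (y2 - y1) / 6.0, 1e-6)
    ix, iy = int(cx), int(cy)
    heatmap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx2 = (x - ix) ** 2 / (2.0 * sigma_x ** 2)
            dy2 = (y - iy) ** 2 / (2.0 * sigma_y ** 2)
            heatmap[y][x] = math.exp(-dx2 - dy2)
    return heatmap


# A 64x32-pixel box on a 128x128 input gives a 32x32 target map.
hm = gaussian_heatmap(32, 32, (32, 32, 96, 64))
```

Pixels near the box center get values close to 1 and serve as weighted training samples, so a single box contributes many samples rather than only its center point.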
## Model Zoo
| Backbone | Network type | Number of images per GPU | Learning rate schedule | Inference speed (fps) | Box AP | Download | Configuration File |
| :-------- | :----------- | :----------------------: | :--------------------: | :-----------------: | :----: | :------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: |
| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
# 2. PAFNet
## Introduction
PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the SOTA level among anchor-free detectors, and a lightweight mobile model, PAFNet-Lite, is also provided.
The PAFNet series improves on TTFNet in the following aspects:
- [CutMix](https://arxiv.org/abs/1905.04899)
- Better backbone network: ResNet50vd-DCN
- Larger training batch size: 8 GPUs with batch_size=18 per GPU
- Synchronized Batch Normalization
- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- Better pretrained model
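
To make the Exponential Moving Average item concrete, here is a minimal sketch of a weight EMA update. The `decay=0.9998` default mirrors `ema_decay` in the PAFNet base config added in this commit; the function name `ema_update` and the plain-dict weights are illustrative assumptions, and PaddleDetection's own EMA may add details such as bias correction.

```python
def ema_update(shadow, weights, decay=0.9998):
    """Return new shadow weights: decay * shadow + (1 - decay) * weights."""
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in weights.items()
    }


# The shadow copy drifts slowly toward the current training weights.
shadow = {"w": 1.0}
for _ in range(3):
    shadow = ema_update(shadow, {"w": 0.0}, decay=0.9)
```

With a decay this close to 1, the averaged weights change slowly, which is why the shadow copy, rather than the raw weights, is typically the one evaluated.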
## Model Zoo
| Backbone | Network type | Number of images per GPU | Learning rate schedule | Inference speed (fps) | Box AP | Download | Configuration File |
| :--------- | :------- | :----------------------: | :--------------------: | :-----------------: | :----: | :---------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: |
| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |
### PAFNet-Lite
| Backbone | Network type | Number of images per GPU | Learning rate schedule | Box AP | Kirin 990 latency (ms) | Model size (MB) | Download | Configuration File |
| :---------- | :---------- | :----------------------: | :--------------------: | :----: | :-------------------: | :---------: | :---------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: |
| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |
**Attention:** Due to the overall upgrade of the dynamic graph framework, the PAFNet weights published by PaddleDetection need to be evaluated with the `--bias` flag, for example:
```bash
# Use the weights published by PaddleDetection
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
```
## Citations
```
@article{liu2019training,
title = {Training-Time-Friendly Network for Real-Time Object Detection},
author = {Zili Liu and Tu Zheng and Guodong Xu and Zheng Yang and Haifeng Liu and Deng Cai},
journal = {arXiv preprint arXiv:1909.00700},
year = {2019}
}
```


@@ -0,0 +1,19 @@
epoch: 120

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [80, 110]
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2
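
As a rough illustration of how the schedule above behaves, the sketch below combines a linear warmup with a piecewise decay. It assumes that warmup is counted in iterations, that milestones are counted in epochs, and that each milestone multiplies the rate by `gamma`; the function name `learning_rate` and the `steps_per_epoch` argument are hypothetical, and the trainer's exact handling may differ.

```python
def learning_rate(step, steps_per_epoch, base_lr=0.015, milestones=(80, 110),
                  gamma=0.1, warmup_steps=500, start_factor=0.2):
    """Learning rate at a given iteration under warmup plus step decay."""
    if step < warmup_steps:
        # Ramp linearly from start_factor * base_lr up to base_lr.
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1.0 - alpha) + alpha)
    # After warmup, count how many epoch milestones have been passed.
    epoch = step // steps_per_epoch
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Sample the schedule at a few points, assuming 1000 iterations per epoch.
samples = [learning_rate(s, 1000) for s in (0, 500, 80_000, 110_000)]
```

Under these assumptions the rate starts at one fifth of `base_lr`, reaches `base_lr` after 500 iterations, and drops by a factor of ten at epochs 80 and 110.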


@@ -0,0 +1,19 @@
epoch: 12

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2


@@ -0,0 +1,20 @@
epoch: 240

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [160, 220]
  - !LinearWarmup
    start_factor: 0.2
    steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2


@@ -0,0 +1,40 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

TTFNet:
  backbone: ResNet
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

ResNet:
  depth: 50
  variant: d
  return_idx: [0, 1, 2, 3]
  freeze_at: -1
  norm_decay: 0.
  dcn_v2_stages: [1, 2, 3]

TTFFPN:
  planes: [256, 128, 64]
  shortcut_num: [3, 2, 1]

TTFHead:
  dcn_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4


@@ -0,0 +1,44 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams
norm_type: sync_bn

TTFNet:
  backbone: MobileNetV3
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

MobileNetV3:
  scale: 1.0
  model_name: large
  feature_maps: [5, 8, 14, 17]
  with_extra_blocks: true
  lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
  conv_decay: 0.00001
  norm_decay: 0.0
  extra_block_filters: []

TTFFPN:
  planes: [96, 48, 24]
  shortcut_num: [2, 2, 1]
  lite_neck: true
  fusion_method: concat

TTFHead:
  hm_head_planes: 48
  wh_head_planes: 24
  lite_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4


@@ -0,0 +1,37 @@
worker_num: 2
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: False, random_channel: True}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
    - RandomFlip: {}
    - GridMask: {upper_iter: 300000}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
    - Permute: {}
    - Gt2TTFTarget: {down_ratio: 4}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 12
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false


@@ -0,0 +1,36 @@
worker_num: 2
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: false, random_channel: true}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
    - RandomFlip: {prob: 0.5}
  batch_transforms:
    - BatchRandomResize: {target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672], keep_ratio: false}
    - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
    - Permute: {}
    - Gt2TTFTarget: {down_ratio: 4}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 18
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false


@@ -0,0 +1,35 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams

TTFNet:
  backbone: DarkNet
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

DarkNet:
  depth: 53
  freeze_at: 0
  return_idx: [1, 2, 3, 4]
  norm_type: bn
  norm_decay: 0.0004

TTFFPN:
  planes: [256, 128, 64]
  shortcut_num: [3, 2, 1]

TTFHead:
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4


@@ -0,0 +1,33 @@
worker_num: 2
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomFlip: {prob: 0.5}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
    - Permute: {}
  batch_transforms:
    - Gt2TTFTarget: {down_ratio: 4}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 12
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false


@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_10x.yml',
  '_base_/pafnet.yml',
  '_base_/pafnet_reader.yml',
]
weights: output/pafnet_10x_coco/model_final


@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_20x.yml',
  '_base_/pafnet_lite.yml',
  '_base_/pafnet_lite_reader.yml',
]
weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final


@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_1x.yml',
  '_base_/ttfnet_darknet53.yml',
  '_base_/ttfnet_reader.yml',
]
weights: output/ttfnet_darknet53_1x_coco/model_final