Move paddle_detection
@@ -0,0 +1,68 @@
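# 1. TTFNet

## Introduction

TTFNet is a training-time-friendly network for real-time object detection. It addresses CenterNet's slow convergence by proposing a new way to generate training samples with Gaussian kernels, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight structure also makes it easy to extend to other tasks.

**Features:**

- Simple structure: only two heads are needed to predict object location and size, and time-consuming post-processing is removed.
- Short training time: with a DarkNet53 backbone, about 2 hours of training on 8 V100 GPUs is enough to reach good accuracy.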
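
To make the Gaussian-kernel idea concrete, here is a minimal pure-Python sketch of how one ground-truth box could be rendered as a heatmap target. It is not PaddleDetection's `Gt2TTFTarget` implementation: the function name `gaussian_heatmap`, the `alpha` value, and the `sigma = alpha * size / 6` rule are assumptions chosen for illustration.

```python
import math


def gaussian_heatmap(feat_h, feat_w, box, alpha=0.54):
    """Render one box as a 2D Gaussian on a feat_h x feat_w heatmap.

    box is (x1, y1, x2, y2) in feature-map coordinates. alpha scales the
    spread relative to the box size and is an illustrative assumption.
    """
    x1, y1, x2, y2 = box
    # Integer centre of the box on the feature map.
    cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
    # Wider boxes get a wider Gaussian along that axis.
    sigma_x = max(alpha * (x2 - x1) / 6.0, 1e-6)
    sigma_y = max(alpha * (y2 - y1) / 6.0, 1e-6)
    heat = []
    for y in range(feat_h):
        row = []
        for x in range(feat_w):
            exponent = ((x - cx) ** 2) / (2 * sigma_x ** 2) \
                + ((y - cy) ** 2) / (2 * sigma_y ** 2)
            row.append(math.exp(-exponent))
        heat.append(row)
    return heat


# One 8x6 box on a 16x16 feature map; the peak sits at the box centre.
heat = gaussian_heatmap(16, 16, (4, 4, 12, 10))
```

Every location near the centre receives a nonzero target rather than a single positive pixel, which is the intuition behind denser supervision and faster convergence.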

## Model Zoo

| Backbone | Network type | Images per GPU | LR schedule | Inference speed (FPS) | Box AP | Download | Config |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [download](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |

# 2. PAFNet

## Introduction

PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the state of the art among anchor-free detectors, and a lightweight mobile variant, PAFNet-Lite, is also provided.

The PAFNet series improves on TTFNet in the following ways:

- [CutMix](https://arxiv.org/abs/1905.04899)
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs, batch_size=18 per GPU
- Synchronized Batch Normalization
- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- Better pretrained model
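
As an illustration of the Exponential Moving Average item, the sketch below keeps a smoothed "shadow" copy of the weights. The default decay of 0.9998 matches `ema_decay` in the PAFNet config of this commit; the function name `ema_update` and the plain-dict weight representation are assumptions, and PaddleDetection's own EMA may add details (such as bias correction) that are not modelled here.

```python
def ema_update(shadow, weights, decay=0.9998):
    """Return the new shadow weights after one training step.

    shadow and weights map parameter names to floats. Each shadow value
    moves a fraction (1 - decay) of the way toward the current weight.
    """
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in weights.items()
    }


# Toy example with an exaggerated decay so the effect is visible.
shadow = {"w": 1.0}
shadow = ema_update(shadow, {"w": 0.0}, decay=0.5)
```

With a decay close to 1 the shadow weights change slowly, so evaluation uses an average over many recent steps instead of the last, noisier one.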

## Model Zoo

| Backbone | Network type | Images per GPU | LR schedule | Inference speed (FPS) | Box AP | Download | Config |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [download](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |

### PAFNet-Lite

| Backbone | Network type | Images per GPU | LR schedule | Box AP | Kirin 990 latency (ms) | Model size (M) | Download | Config |
| :-------------- | :------------- | :-----: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [download](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |

**Note:** Because of an overall upgrade of the dynamic graph framework, the PAFNet weights released by PaddleDetection must be evaluated with the `--bias` flag, for example:

```bash
# Evaluate with the weights released by PaddleDetection
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
```

## Citations
```
@article{liu2019training,
  title   = {Training-Time-Friendly Network for Real-Time Object Detection},
  author  = {Zili Liu and Tu Zheng and Guodong Xu and Zheng Yang and Haifeng Liu and Deng Cai},
  journal = {arXiv preprint arXiv:1909.00700},
  year    = {2019}
}
```
@@ -0,0 +1,69 @@
# 1. TTFNet

## Introduction

TTFNet is a training-time-friendly network for real-time object detection. It addresses CenterNet's slow convergence by proposing a new way to generate training samples with Gaussian kernels, which effectively removes the ambiguity in the anchor-free head. Its simple, lightweight structure also makes it easy to extend to other tasks.

**Characteristics:**

- Simple structure: only two heads are needed to predict object location and size, and time-consuming post-processing is removed.
- Short training time: with a DarkNet53 backbone, about 2 hours of training on 8 V100 GPUs is enough to reach good accuracy.
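
To show what "no time-consuming post-processing" can look like in practice, the following sketch picks detections directly from a heatmap by keeping local maxima in a 3x3 window, instead of running box-level NMS. This is a hedged illustration of the general CenterNet-style approach, not PaddleDetection's `TTFBox` code; `heatmap_peaks` is a hypothetical name, and the defaults mirror `score_thresh` and `max_per_img` from the configs in this commit.

```python
def heatmap_peaks(heat, score_thresh=0.01, max_per_img=100):
    """Return up to max_per_img (score, x, y) peaks, best first.

    A cell is a peak if its score passes the threshold and no cell in
    its 3x3 neighbourhood scores higher.
    """
    height, width = len(heat), len(heat[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = heat[y][x]
            if score < score_thresh:
                continue
            neighbours = [
                heat[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            if score >= max(neighbours):
                peaks.append((score, x, y))
    peaks.sort(reverse=True)
    return peaks[:max_per_img]


# Toy 4x4 heatmap: the 0.5 response next to the 0.9 peak is suppressed.
heat = [[0.0] * 4 for _ in range(4)]
heat[1][1] = 0.9
heat[1][2] = 0.5
heat[3][3] = 0.7
peaks = heatmap_peaks(heat)
```

Each surviving peak would then be combined with the size head's prediction at the same location to form a box.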

## Model Zoo

| Backbone | Network type | Images per GPU | LR schedule | Inference speed (FPS) | Box AP | Download | Configuration File |
| :-------- | :----------- | :----------------------: | :--------------------: | :-----------------: | :----: | :------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: |
| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [link](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |

# 2. PAFNet

## Introduction

PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. Its accuracy reaches the state of the art among anchor-free detectors, and a lightweight mobile variant, PAFNet-Lite, is also provided.

The PAFNet series improves on TTFNet in the following ways:

- [CutMix](https://arxiv.org/abs/1905.04899)
- Better backbone network: ResNet50vd-DCN
- Larger training batch size: 8 GPUs, batch size of 18 per GPU
- Synchronized Batch Normalization
- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- Better pretrained model
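
For readers unfamiliar with CutMix, the sketch below samples the rectangular patch that would be pasted from one image into another, following the sampling rule of the CutMix paper. It is an assumption-laden illustration: `cutmix_box` is a hypothetical helper, the mixing ratio `lam` is passed in rather than drawn from a Beta distribution, and the way PaddleDetection merges ground-truth boxes for detection is not covered.

```python
import math
import random


def cutmix_box(width, height, lam, rng=random):
    """Sample a patch whose area is roughly (1 - lam) of the image.

    Returns ((x1, y1, x2, y2), lam_adj), where lam_adj is the share of
    the original image that remains after clipping to the borders.
    """
    cut_w = int(width * math.sqrt(1.0 - lam))
    cut_h = int(height * math.sqrt(1.0 - lam))
    # The patch centre is uniform over the image.
    cx = rng.randint(0, width - 1)
    cy = rng.randint(0, height - 1)
    x1 = max(cx - cut_w // 2, 0)
    y1 = max(cy - cut_h // 2, 0)
    x2 = min(cx + cut_w // 2, width)
    y2 = min(cy + cut_h // 2, height)
    lam_adj = 1.0 - (x2 - x1) * (y2 - y1) / float(width * height)
    return (x1, y1, x2, y2), lam_adj


# With lam = 0.75 the patch covers at most a quarter of a 512x512 image.
box, lam_adj = cutmix_box(512, 512, 0.75, random.Random(0))
```

Because the patch is clipped at the image border, the adjusted ratio `lam_adj` can be larger than the requested `lam`.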

## Model library

| Backbone | Network type | Images per GPU | LR schedule | Inference speed (FPS) | Box AP | Download | Configuration File |
| :--------- | :------- | :----------------------: | :--------------------: | :-----------------: | :----: | :---------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: |
| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [link](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |

### PAFNet-Lite

| Backbone | Network type | Images per GPU | LR schedule | Box AP | Kirin 990 latency (ms) | Model size (M) | Download | Configuration File |
| :---------- | :---------- | :----------------------: | :--------------------: | :----: | :-------------------: | :---------: | :---------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------: |
| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [link](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [Configuration File](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |

**Attention:** Because of an overall upgrade of the dynamic graph framework, the PAFNet weights released by PaddleDetection must be evaluated with the `--bias` flag, for example:

```bash
# Evaluate with the weights released by PaddleDetection
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
```

## Citations
```
@article{liu2019training,
  title   = {Training-Time-Friendly Network for Real-Time Object Detection},
  author  = {Zili Liu and Tu Zheng and Guodong Xu and Zheng Yang and Haifeng Liu and Deng Cai},
  journal = {arXiv preprint arXiv:1909.00700},
  year    = {2019}
}
```
@@ -0,0 +1,19 @@
epoch: 120

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [80, 110]
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2
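
Review note: the schedule above combines a linear warmup with step decay. The sketch below computes the learning rate it would produce, assuming the warmup factor rises linearly from `start_factor` to 1 over `steps` iterations and each milestone (counted in epochs) multiplies the rate by `gamma`. The function name and the `steps_per_epoch` parameter are hypothetical, and PaddleDetection's scheduler classes may differ in detail for other `gamma` values.

```python
def learning_rate(step, steps_per_epoch, base_lr=0.015, gamma=0.1,
                  milestones=(80, 110), start_factor=0.2, warmup_steps=500):
    """Learning rate at a global training step (0-based)."""
    epoch = step // steps_per_epoch
    lr = base_lr
    # Step decay: multiply by gamma for every milestone already reached.
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    # Linear warmup: scale from start_factor up to 1.0.
    if step < warmup_steps:
        alpha = step / float(warmup_steps)
        lr *= start_factor * (1.0 - alpha) + alpha
    return lr


# Assuming 1000 iterations per epoch (illustrative only).
lr_start = learning_rate(0, 1000)
lr_after_warmup = learning_rate(500, 1000)
lr_epoch_80 = learning_rate(80 * 1000, 1000)
lr_epoch_110 = learning_rate(110 * 1000, 1000)
```

Under these assumptions the rate starts at 20% of `base_lr`, reaches `base_lr` after 500 iterations, and drops by a factor of 10 at epochs 80 and 110.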
@@ -0,0 +1,19 @@
epoch: 12

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2
@@ -0,0 +1,20 @@
epoch: 240

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [160, 220]
  - !LinearWarmup
    start_factor: 0.2
    steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2
@@ -0,0 +1,40 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

TTFNet:
  backbone: ResNet
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

ResNet:
  depth: 50
  variant: d
  return_idx: [0, 1, 2, 3]
  freeze_at: -1
  norm_decay: 0.
  dcn_v2_stages: [1, 2, 3]

TTFFPN:
  planes: [256, 128, 64]
  shortcut_num: [3, 2, 1]

TTFHead:
  dcn_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4
@@ -0,0 +1,44 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams
norm_type: sync_bn

TTFNet:
  backbone: MobileNetV3
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

MobileNetV3:
  scale: 1.0
  model_name: large
  feature_maps: [5, 8, 14, 17]
  with_extra_blocks: true
  lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
  conv_decay: 0.00001
  norm_decay: 0.0
  extra_block_filters: []

TTFFPN:
  planes: [96, 48, 24]
  shortcut_num: [2, 2, 1]
  lite_neck: true
  fusion_method: concat

TTFHead:
  hm_head_planes: 48
  wh_head_planes: 24
  lite_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4
@@ -0,0 +1,37 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: False, random_channel: True}
  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
  - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
  - RandomFlip: {}
  - GridMask: {upper_iter: 300000}
  batch_transforms:
  - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_interp: True, keep_ratio: False}
  - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
  - Permute: {}
  - Gt2TTFTarget: {down_ratio: 4}
  - PadBatch: {pad_to_stride: 32}
  batch_size: 12
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false
@@ -0,0 +1,36 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomDistort: {brightness: [-32., 32., 0.5], random_apply: false, random_channel: true}
  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
  - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
  - RandomFlip: {prob: 0.5}
  batch_transforms:
  - BatchRandomResize: {target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672], keep_ratio: false}
  - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
  - Permute: {}
  - Gt2TTFTarget: {down_ratio: 4}
  - PadBatch: {pad_to_stride: 32}
  batch_size: 18
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false
@@ -0,0 +1,35 @@
architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams

TTFNet:
  backbone: DarkNet
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

DarkNet:
  depth: 53
  freeze_at: 0
  return_idx: [1, 2, 3, 4]
  norm_type: bn
  norm_decay: 0.0004

TTFFPN:
  planes: [256, 128, 64]
  shortcut_num: [3, 2, 1]

TTFHead:
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4
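
Review note: the size head above is trained with `GIoULoss`. As a reminder of what that quantity measures, here is a minimal sketch of Generalized IoU for two axis-aligned boxes; the loss for a pair is then `1 - giou`, scaled by `loss_weight`. The function name `giou` is illustrative and this is not PaddleDetection's batched implementation; boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes, in [-1, 1]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (enclose - union) / enclose


same = giou((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = giou((0, 0, 1, 1), (2, 0, 3, 1))
```

Unlike plain IoU, the value keeps decreasing as disjoint boxes move apart, so non-overlapping predictions still receive a useful gradient.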
@@ -0,0 +1,33 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomFlip: {prob: 0.5}
  - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
  - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
  - Permute: {}
  batch_transforms:
  - Gt2TTFTarget: {down_ratio: 4}
  - PadBatch: {pad_to_stride: 32}
  batch_size: 12
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false

TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
  - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
  - Permute: {}
  batch_size: 1
  drop_last: false
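
Review note: two pieces of arithmetic in the reader above are easy to check by hand. The sketch below normalizes a pixel with the configured `mean` and `std` (with `is_scale: false`, values are assumed to stay in the 0 to 255 range rather than being divided by 255 first) and rounds a side length up to a multiple of the `pad_to_stride` value. Both helper names are hypothetical and the code works on plain Python numbers, not on image arrays.

```python
MEAN = (123.675, 116.28, 103.53)
STD = (58.395, 57.12, 57.375)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Per-channel (value - mean) / std on a 0-255 RGB triple."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


def pad_to_stride(size, stride=32):
    """Smallest multiple of stride that is >= size."""
    return -(-size // stride) * stride


centered = normalize_pixel(MEAN)
padded = pad_to_stride(500)
```

With the fixed 512x512 resize used here, padding to a stride of 32 leaves the image size unchanged.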
@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_10x.yml',
  '_base_/pafnet.yml',
  '_base_/pafnet_reader.yml',
]
weights: output/pafnet_10x_coco/model_final
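
Review note: the `_BASE_` list composes this config from several files. The sketch below shows one plausible way such layering works: a recursive dictionary merge in which later values override earlier ones. It is an assumption about the general pattern, not a copy of PaddleDetection's loader, and `merge_config` is a hypothetical name; file loading is omitted and the dictionaries are toy stand-ins.

```python
def merge_config(base, override):
    """Return base updated with override, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale.
            merged[key] = value
    return merged


base = {"epoch": 120, "TTFHead": {"dcn_head": True, "wh_loss": {"loss_weight": 5.0}}}
override = {"TTFHead": {"wh_loss": {"loss_weight": 1.0}}, "weights": "output/model_final"}
merged = merge_config(base, override)
```

Under this scheme the top-level file only needs to state what differs from its bases, such as the `weights` path.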
@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_20x.yml',
  '_base_/pafnet_lite.yml',
  '_base_/pafnet_lite_reader.yml',
]
weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final
@@ -0,0 +1,8 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_1x.yml',
  '_base_/ttfnet_darknet53.yml',
  '_base_/ttfnet_reader.yml',
]
weights: output/ttfnet_darknet53_1x_coco/model_final