Move paddle_detection
@@ -0,0 +1,229 @@
# DETRs Beat YOLOs on Real-time Object Detection

## Latest Updates

- Released code and pretrained models for RT-DETR-R50 and RT-DETR-R101
- Released code and pretrained models for RT-DETR-L and RT-DETR-X
- Released the RT-DETR-R50-m model (an example of model scaling)
- Released the RT-DETR-R34 model
- Released the RT-DETR-R18 model
- Released the RT-DETR-Swin and RT-DETR-FocalNet models
- Released RT-DETR models pretrained on Objects365

## Introduction

<!-- We propose a **R**eal-**T**ime **DE**tection **TR**ansformer (RT-DETR), the first real-time end-to-end object detector to the best of our knowledge. Specifically, we design an efficient hybrid encoder to efficiently process multi-scale features by decoupling the intra-scale interaction and cross-scale fusion, and propose IoU-aware query selection to improve the initialization of object queries. In addition, our proposed detector supports flexible adjustment of the inference speed by using different decoder layers without the need for retraining, which facilitates the practical application of real-time object detectors. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS. -->

RT-DETR is, to our knowledge, the first real-time end-to-end object detector. Specifically, we design an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction from cross-scale fusion, and we propose IoU-aware query selection to improve the initialization of the decoder queries. In addition, RT-DETR supports flexible adjustment of the inference speed by using different decoder layers, without retraining, which makes real-time object detectors easier to apply in practice. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, and RT-DETR-X achieves 54.8% AP at 74 FPS; both outperform all YOLO detectors of the same scale in speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS and RT-DETR-R101 achieves 54.3% AP at 74 FPS, surpassing in accuracy all DETR detectors that use the same backbone.

For more details, see our [paper](https://arxiv.org/abs/2304.08069).

<div align="center">
<img src="https://github.com/PaddlePaddle/PaddleDetection/assets/17582080/196b0a10-d2e8-401c-9132-54b9126e0a33" width=500 />
</div>

## Base Models

| Model | Epoch | Backbone | Input shape | $AP^{val}$ | $AP^{val}_{50}$ | Params(M) | FLOPs(G) | T4 TensorRT FP16(FPS) | Pretrained Model | config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| RT-DETR-R18 | 6x | ResNet-18 | 640 | 46.5 | 63.8 | 20 | 60 | 217 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r18vd_dec3_6x_coco.pdparams) | [config](./rtdetr_r18vd_6x_coco.yml) |
| RT-DETR-R34 | 6x | ResNet-34 | 640 | 48.9 | 66.8 | 31 | 92 | 161 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r34vd_dec4_6x_coco.pdparams) | [config](./rtdetr_r34vd_6x_coco.yml) |
| RT-DETR-R50-m | 6x | ResNet-50 | 640 | 51.3 | 69.6 | 36 | 100 | 145 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_m_6x_coco.pdparams) | [config](./rtdetr_r50vd_m_6x_coco.yml) |
| RT-DETR-R50 | 6x | ResNet-50 | 640 | 53.1 | 71.3 | 42 | 136 | 108 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams) | [config](./rtdetr_r50vd_6x_coco.yml) |
| RT-DETR-R101 | 6x | ResNet-101 | 640 | 54.3 | 72.7 | 76 | 259 | 74 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r101vd_6x_coco.pdparams) | [config](./rtdetr_r101vd_6x_coco.yml) |
| RT-DETR-L | 6x | HGNetv2 | 640 | 53.0 | 71.6 | 32 | 110 | 114 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_l_6x_coco.pdparams) | [config](./rtdetr_hgnetv2_l_6x_coco.yml) |
| RT-DETR-X | 6x | HGNetv2 | 640 | 54.8 | 73.1 | 67 | 234 | 74 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_x_6x_coco.pdparams) | [config](./rtdetr_hgnetv2_x_6x_coco.yml) |

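The FPS column of the base model table can also be read as per-image latency. The sketch below is an illustration, not part of PaddleDetection: `fps_to_latency_ms` is a hypothetical helper, and it assumes images are processed one at a time, so that latency is simply 1000 / FPS.

```python
# FPS values copied from the "T4 TensorRT FP16(FPS)" column of the base model table.
T4_FP16_FPS = {
    "RT-DETR-R18": 217,
    "RT-DETR-R34": 161,
    "RT-DETR-R50-m": 145,
    "RT-DETR-R50": 108,
    "RT-DETR-R101": 74,
    "RT-DETR-L": 114,
    "RT-DETR-X": 74,
}


def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to latency (milliseconds per frame).

    Assumption: one image per inference call, so latency = 1000 / FPS.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


latency_ms = {name: round(fps_to_latency_ms(fps), 2) for name, fps in T4_FP16_FPS.items()}

for name, ms in latency_ms.items():
    print(f"{name}: {ms} ms per image")
```

Thinking in milliseconds makes it easier to compare these numbers with a latency budget, for example the roughly 33 ms available per frame in a 30 FPS video pipeline.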
## High-Accuracy Models

| Model | Epoch | Backbone | Input shape | $AP^{val}$ | $AP^{val}_{50}$ | Pretrained Model | config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| RT-DETR-Swin | 3x | Swin_L_384 | 640 | 56.2 | 73.5 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_swin_L_384_3x_coco.pdparams) | [config](./rtdetr_swin_L_384_3x_coco.yml) |
| RT-DETR-FocalNet | 3x | FocalNet_L_384 | 640 | 56.9 | 74.3 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_focalnet_L_384_3x_coco.pdparams) | [config](./rtdetr_focalnet_L_384_3x_coco.yml) |

## Objects365 Pretrained Models

| Model | Epoch | Dataset | Input shape | $AP^{val}$ | $AP^{val}_{50}$ | T4 TensorRT FP16(FPS) | Weight | Logs |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| RT-DETR-R18 | 1x | Objects365 | 640 | 22.9 | 31.2 | - | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r18vd_1x_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |
| RT-DETR-R18 | 5x | COCO + Objects365 | 640 | 49.2 | 66.6 | 217 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r18vd_5x_coco_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |
| RT-DETR-R50 | 1x | Objects365 | 640 | 35.1 | 46.2 | - | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_1x_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |
| RT-DETR-R50 | 2x | COCO + Objects365 | 640 | 55.3 | 73.4 | 108 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_2x_coco_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |
| RT-DETR-R101 | 1x | Objects365 | 640 | 36.8 | 48.3 | - | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r101vd_1x_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |
| RT-DETR-R101 | 2x | COCO + Objects365 | 640 | 56.2 | 74.5 | 74 | [download](https://bj.bcebos.com/v1/paddledet/models/rtdetr_r101vd_2x_coco_objects365.pdparams) | [log](https://github.com/lyuwenyu/RT-DETR/issues/8) |

**Notes:**
- `COCO + Objects365` denotes results obtained by fine-tuning on COCO, starting from weights pretrained on Objects365.

**Training notes:**
- All RT-DETR base models are trained on 4 GPUs.
- RT-DETR is trained on COCO train2017 and evaluated on val2017.
- The high-accuracy models RT-DETR-Swin and RT-DETR-FocalNet are trained on 8 GPUs and have high GPU memory requirements.

## Quick Start

<details open>
<summary>Dependencies:</summary>

- PaddlePaddle >= 2.4.1

</details>

<details>
<summary>Installation</summary>

- [Installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL.md)

</details>

<details>
<summary>Training & Evaluation</summary>

- Training on a single GPU:

```shell
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --eval
```

- Training on multiple GPUs:

```shell
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --fleet --eval
```

- Evaluation:

```shell
python tools/eval.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams
```

- Inference:

```shell
python tools/infer.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams \
              --infer_img=./demo/000000570688.jpg
```

For details, see the [Getting Started guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED.md).

</details>

## Deployment

<details open>
<summary>1. Export the model</summary>

```shell
cd PaddleDetection
python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True \
              --output_dir=output_inference
```

</details>

<details>
<summary>2. Convert the model to ONNX</summary>

- Install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) and ONNX:

```shell
pip install onnx==1.13.0
pip install paddle2onnx==1.0.5
```

- Convert the model:

```shell
paddle2onnx --model_dir=./output_inference/rtdetr_r50vd_6x_coco/ \
            --model_filename model.pdmodel \
            --params_filename model.pdiparams \
            --opset_version 16 \
            --save_file rtdetr_r50vd_6x_coco.onnx
```

</details>

<details>
<summary>3. Convert to TensorRT (optional)</summary>

- Make sure the TensorRT version is >= 8.5.1.
- For TensorRT inference, refer to the relevant code in [RT-DETR](https://github.com/lyuwenyu/RT-DETR) or to other online resources.

```shell
trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx \
        --workspace=4096 \
        --shapes=image:1x3x640x640 \
        --saveEngine=rtdetr_r50vd_6x_coco.trt \
        --avgRuns=100 \
        --fp16
```

</details>

## Quantization and Compression

For detailed steps, see [RT-DETR automatic quantization and compression](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/deploy/auto_compression#rt-detr).

| Model | Base mAP | ACT-quantized mAP | TRT-FP32 | TRT-FP16 | TRT-INT8 | Config | Quantized model |
| :---------------- | :------- | :--------: | :------: | :------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| RT-DETR-R50 | 53.1 | 53.0 | 32.05ms | 9.12ms | **6.96ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r50vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r50vd_6x_coco_quant.tar) |
| RT-DETR-R101 | 54.3 | 54.1 | 54.13ms | 12.68ms | **9.20ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_r101vd_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_r101vd_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-L | 53.0 | 52.9 | 26.16ms | 8.54ms | **6.65ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_l_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_l_6x_coco_quant.tar) |
| RT-DETR-HGNetv2-X | 54.8 | 54.6 | 49.22ms | 12.50ms | **9.24ms** | [config](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/detection/configs/rtdetr_hgnetv2_x_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/rtdetr_hgnetv2_x_6x_coco_quant.tar) |

- Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.
- You can also refer directly to the [PaddleSlim automatic compression examples](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/detection).

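To put the quantization table in perspective, the following sketch computes the mAP drop and the INT8 speedup for each model. It only re-derives ratios from the numbers listed in the table; the `summarize` helper and the dictionary layout are illustrative assumptions, not part of PaddleSlim.

```python
# Values copied from the quantization table (Tesla T4, batch_size=1, latency in ms).
RESULTS = {
    "RT-DETR-R50": {"base_map": 53.1, "quant_map": 53.0, "fp32_ms": 32.05, "fp16_ms": 9.12, "int8_ms": 6.96},
    "RT-DETR-R101": {"base_map": 54.3, "quant_map": 54.1, "fp32_ms": 54.13, "fp16_ms": 12.68, "int8_ms": 9.20},
    "RT-DETR-HGNetv2-L": {"base_map": 53.0, "quant_map": 52.9, "fp32_ms": 26.16, "fp16_ms": 8.54, "int8_ms": 6.65},
    "RT-DETR-HGNetv2-X": {"base_map": 54.8, "quant_map": 54.6, "fp32_ms": 49.22, "fp16_ms": 12.50, "int8_ms": 9.24},
}


def summarize(entry):
    """Return (mAP drop, INT8 speedup over FP32, INT8 speedup over FP16)."""
    map_drop = round(entry["base_map"] - entry["quant_map"], 1)
    speedup_vs_fp32 = round(entry["fp32_ms"] / entry["int8_ms"], 2)
    speedup_vs_fp16 = round(entry["fp16_ms"] / entry["int8_ms"], 2)
    return map_drop, speedup_vs_fp32, speedup_vs_fp16


summary = {name: summarize(entry) for name, entry in RESULTS.items()}

for name, (drop, vs_fp32, vs_fp16) in summary.items():
    print(f"{name}: mAP drop {drop}, INT8 is {vs_fp32}x faster than FP32 and {vs_fp16}x faster than FP16")
```

On these numbers, INT8 quantization costs at most 0.2 mAP, and most of the latency gain over FP32 is already obtained by switching to FP16.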
## Others

<details>
<summary>1. Counting parameters and FLOPs</summary>

The following snippet reports the model's parameter count and FLOPs:

```python
import paddle
from ppdet.core.workspace import create, load_config

# Build the model described by the config file.
cfg_path = './configs/rtdetr/rtdetr_r50vd_6x_coco.yml'
cfg = load_config(cfg_path)
model = create(cfg.architecture)

# Dummy inputs: a random 640x640 image plus the shape and scale metadata.
blob = {
    'image': paddle.randn([1, 3, 640, 640]),
    'im_shape': paddle.to_tensor([[640], [640]]),
    'scale_factor': paddle.to_tensor([[1.], [1.]])
}
paddle.flops(model, None, blob, custom_ops=None, print_detail=False)
```

</details>

<details open>
<summary>2. End-to-end speed benchmark for YOLOs</summary>

- See the benchmark section of [RT-DETR](https://github.com/lyuwenyu/RT-DETR) or other online resources.

</details>

## Citing RT-DETR

If you use RT-DETR in your research, please cite our paper:

```
@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
@@ -0,0 +1,19 @@
epoch: 72

LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 1.0
    milestones: [100]
    use_warmup: true
  - !LinearWarmup
    start_factor: 0.001
    steps: 2000

OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001
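The schedule in this optimizer config can be sketched in a few lines. This is a simplified reading, not PaddleDetection's implementation: it assumes the warmup multiplier is interpolated linearly from `start_factor` to 1 over `steps` iterations, and that `PiecewiseDecay` multiplies the rate by `gamma` at each milestone epoch. With `gamma: 1.0` and a milestone at epoch 100, beyond the 72 training epochs, the rate stays constant after warmup. The function name `learning_rate` is hypothetical.

```python
# Values copied from the optimizer config above.
BASE_LR = 1e-4
WARMUP_STEPS = 2000
START_FACTOR = 0.001
GAMMA = 1.0
MILESTONES = [100]  # in epochs; never reached when training for 72 epochs


def learning_rate(step: int, epoch: int) -> float:
    """Approximate learning rate at a global iteration `step` within `epoch`."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup: the multiplier goes from START_FACTOR to 1.
        alpha = step / WARMUP_STEPS
        return BASE_LR * (START_FACTOR * (1 - alpha) + alpha)
    # Assumed piecewise decay: multiply by GAMMA once per milestone passed.
    passed = sum(1 for milestone in MILESTONES if epoch >= milestone)
    return BASE_LR * GAMMA ** passed


for step, epoch in [(0, 0), (1000, 0), (2000, 0), (500000, 71)]:
    print(f"step={step:>6} epoch={epoch:>2} lr={learning_rate(step, epoch):.3e}")
```

In other words, the only time-varying part of this schedule is the 2000-step warmup; the later configs that set `gamma: 0.1` with `milestones: [36]` and `epoch: 36` likewise never reach their milestone during training.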
@@ -0,0 +1,71 @@
architecture: DETR
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
norm_type: sync_bn
use_ema: True
ema_decay: 0.9999
ema_decay_type: "exponential"
ema_filter_no_grad: True
hidden_dim: 256
use_focal_loss: True
eval_size: [640, 640]


DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess

ResNet:
  # index 0 stands for res2
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.1, 0.1, 0.1, 0.1]
  num_stages: 4
  freeze_stem_only: True

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0


RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 1024
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False

DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}

DETRPostProcess:
  num_top_queries: 300
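The `lr_mult_list` entries in the backbone sections scale the learning rate of the pretrained backbone relative to the rest of the model. The sketch below assumes each entry multiplies `base_lr` for one backbone stage (or stem); that interpretation, and the helper name `backbone_learning_rates`, are assumptions for illustration. The multipliers are copied from the configs in this commit.

```python
BASE_LR = 1e-4  # base_lr from the optimizer config

# lr_mult_list values copied from the backbone sections of the configs in this commit.
LR_MULT = {
    "ResNet-50": [0.1, 0.1, 0.1, 0.1],
    "ResNet-101": [0.01, 0.01, 0.01, 0.01],
    "PPHGNetV2-L": [0.0, 0.05, 0.05, 0.05, 0.05],
    "PPHGNetV2-X": [0.0, 0.01, 0.01, 0.01, 0.01],
}


def backbone_learning_rates(base_lr, multipliers):
    """Return the effective learning rate for each backbone entry."""
    return [base_lr * m for m in multipliers]


for name, multipliers in LR_MULT.items():
    rates = backbone_learning_rates(BASE_LR, multipliers)
    print(name, [f"{rate:.1e}" for rate in rates])
```

Under this reading, the larger backbones are fine-tuned more conservatively: ResNet-101 and PPHGNetV2-X use a multiplier of 0.01, and a multiplier of 0 leaves the corresponding part effectively frozen.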
@@ -0,0 +1,43 @@
worker_num: 4
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {prob: 0.8}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {prob: 0.8}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  batch_size: 4
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: true


EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 4
  shuffle: false
  drop_last: false


TestReader:
  inputs_def:
    image_shape: [3, 640, 640]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false
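The `target_size` list of `BatchRandomResize` lists 640 three times, which biases multi-scale training toward the evaluation resolution. The sketch below makes that explicit under the assumption that one entry is drawn uniformly at random from the list for each batch; `size_probabilities` is a hypothetical helper, not a PaddleDetection function.

```python
from collections import Counter
from fractions import Fraction

# target_size list copied from the TrainReader config above.
TARGET_SIZES = [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]


def size_probabilities(sizes):
    """Probability of each distinct size when one list entry is drawn uniformly."""
    counts = Counter(sizes)
    total = len(sizes)
    return {size: Fraction(count, total) for size, count in sorted(counts.items())}


probs = size_probabilities(TARGET_SIZES)

for size, prob in probs.items():
    print(f"{size}x{size}: {prob} ({float(prob):.1%})")

# Every size is a multiple of 32, the largest entry in feat_strides.
print(all(size % 32 == 0 for size in TARGET_SIZES))
```

Under that assumption, 640 is sampled with probability 3/13 and every other size with probability 1/13.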
@@ -0,0 +1,87 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_focalnet_L_384_3x_coco/model_final
find_unused_parameters: True
log_iter: 100
snapshot_epoch: 2

pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/focalnet_large_fl4_pretrained_on_o365.pdparams
DETR:
  backbone: FocalNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess

FocalNet:
  arch: 'focalnet_L_384_22k_fl4'
  out_indices: [1, 2, 3]

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 6 #
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 2048
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0


RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 2048 #
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False
  query_pos_head_inv_sig: True #

DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}

DETRPostProcess:
  num_top_queries: 300


epoch: 36
LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [36]
    use_warmup: false

OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001
    param_groups:
      - params: ['absolute_pos_embed', 'relative_position_bias_table', 'norm']
        weight_decay: 0.0
@@ -0,0 +1,24 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_hgnetv2_l_6x_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/PPHGNetV2_L_ssld_pretrained.pdparams
find_unused_parameters: True
log_iter: 200


DETR:
  backbone: PPHGNetV2

PPHGNetV2:
  arch: 'L'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True
  lr_mult_list: [0., 0.05, 0.05, 0.05, 0.05]
@@ -0,0 +1,40 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_hgnetv2_x_6x_coco/model_final
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/PPHGNetV2_X_ssld_pretrained.pdparams
find_unused_parameters: True
log_iter: 200



DETR:
  backbone: PPHGNetV2


PPHGNetV2:
  arch: 'X'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True
  lr_mult_list: [0., 0.01, 0.01, 0.01, 0.01]


HybridEncoder:
  hidden_dim: 384
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 384
    nhead: 8
    dim_feedforward: 2048
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
@@ -0,0 +1,37 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_r101vd_6x_coco/model_final
find_unused_parameters: True
log_iter: 200

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_ssld_pretrained.pdparams

ResNet:
  # index 0 stands for res2
  depth: 101
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.01, 0.01, 0.01, 0.01]
  num_stages: 4
  freeze_stem_only: True

HybridEncoder:
  hidden_dim: 384
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 384
    nhead: 8
    dim_feedforward: 2048
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
@@ -0,0 +1,38 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_r18_6x_coco/model_final
find_unused_parameters: True
log_iter: 200

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet18_vd_pretrained.pdparams
ResNet:
  depth: 18
  variant: d
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: false
  norm_decay: 0.

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 0.5
  depth_mult: 1.0

RTDETRTransformer:
  eval_idx: -1
  num_decoder_layers: 3
@@ -0,0 +1,38 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_r34vd_6x_coco/model_final
find_unused_parameters: True
log_iter: 200

pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/ResNet34_vd_pretrained.pdparams
ResNet:
  depth: 34
  variant: d
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: false
  norm_decay: 0.

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 0.5
  depth_mult: 1.0

RTDETRTransformer:
  eval_idx: -1
  num_decoder_layers: 4
@@ -0,0 +1,11 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_r50vd_6x_coco/model_final
find_unused_parameters: True
log_iter: 200
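Each model config lists shared files under `_BASE_` and then overrides selected keys. The sketch below illustrates one plausible override rule, a recursive dictionary merge in which values from the child config win. It is an assumption for illustration only: PaddleDetection's real loader may treat nested sections differently, and the configs in this commit restate every `encoder_layer` key rather than relying on a deep merge. The function name `merge_dicts` is hypothetical; the sample values mirror the base config and the overrides used by the ResNet-101 config.

```python
import copy


def merge_dicts(base, override):
    """Return a new dict in which `override` is merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key.
            merged[key] = merge_dicts(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever the base had.
            merged[key] = copy.deepcopy(value)
    return merged


# A small excerpt of the shared base configuration.
base = {
    "epoch": 72,
    "HybridEncoder": {
        "hidden_dim": 256,
        "encoder_layer": {"d_model": 256, "nhead": 8, "dim_feedforward": 1024},
    },
    "RTDETRTransformer": {"num_decoder_layers": 6},
}

# Overrides in the style of the ResNet-101 config.
child = {
    "HybridEncoder": {
        "hidden_dim": 384,
        "encoder_layer": {"d_model": 384, "dim_feedforward": 2048},
    },
}

merged = merge_dicts(base, child)
print(merged["HybridEncoder"])
print(merged["RTDETRTransformer"])
```

With this rule, a config that contains only `_BASE_`, `weights`, `find_unused_parameters`, and `log_iter` inherits the complete model, reader, and optimizer definitions unchanged.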
@@ -0,0 +1,28 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_r50vd_m_6x_coco/model_final
find_unused_parameters: True
log_iter: 200

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 0.5
  depth_mult: 1.0

RTDETRTransformer:
  eval_idx: 2 # use the 3rd decoder layer for evaluation
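The `eval_idx` setting selects which decoder layer's output is used at evaluation time. The sketch below assumes it follows Python-style zero-based indexing with negative values counting from the end, which is consistent with the config comment (index 2 means the third layer) but is still an assumption about `RTDETRTransformer`. The helper name `resolve_eval_layer` is hypothetical.

```python
def resolve_eval_layer(num_decoder_layers: int, eval_idx: int) -> int:
    """Return the 1-based decoder layer whose output is used for evaluation."""
    if not -num_decoder_layers <= eval_idx < num_decoder_layers:
        raise ValueError("eval_idx is out of range for this decoder")
    # The modulo maps negative indices (e.g. -1) onto the matching layer.
    return eval_idx % num_decoder_layers + 1


# (num_decoder_layers, eval_idx) pairs taken from the configs in this commit.
SETTINGS = {
    "RT-DETR-R18": (3, -1),
    "RT-DETR-R34": (4, -1),
    "RT-DETR-R50-m": (6, 2),
}

for name, (layers, idx) in SETTINGS.items():
    print(f"{name}: evaluates with decoder layer {resolve_eval_layer(layers, idx)} of {layers}")
```

Under this reading, RT-DETR-R50-m trains all six decoder layers but evaluates with the third, which is the mechanism behind adjusting inference speed without retraining.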
@@ -0,0 +1,89 @@
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_6x.yml',
  '_base_/rtdetr_r50vd.yml',
  '_base_/rtdetr_reader.yml',
]

weights: output/rtdetr_swin_L_384_3x_coco/model_final
find_unused_parameters: True
log_iter: 100
snapshot_epoch: 2

pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/dino_swin_large_384_4scale_3x_coco.pdparams
DETR:
  backbone: SwinTransformer
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess


SwinTransformer:
  arch: 'swin_L_384' # ['swin_T_224', 'swin_S_224', 'swin_B_224', 'swin_L_224', 'swin_B_384', 'swin_L_384']
  ape: false
  drop_path_rate: 0.2
  patch_norm: true
  out_indices: [1, 2, 3]

HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 6 #
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 2048 #
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0

RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 2048 #
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False

DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}

DETRPostProcess:
  num_top_queries: 300


epoch: 36
LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [36]
    use_warmup: false

OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001
    param_groups:
      - params: ['absolute_pos_embed', 'relative_position_bias_table', 'norm']
        weight_decay: 0.0