Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1,109 @@
Simplified Chinese | [English](README_en.md)
# RTDETR-SSOD (a semi-supervised object detection method built on RT-DETR)
# Notes on reproducing the reported metrics
The reported metrics are obtained by first training a fully supervised model on the labeled data until it saturates, and then loading that model to initialize semi-supervised training.
- For example: train a fully supervised model on 5% of the COCO data with baseline/rtdetr_r50vd_6x_coco_sup005.yml to obtain rtdetr_r50vd_6x_coco_sup005.pdparams, then set the following in rt_detr_ssod005_coco_no_warmup.yml (see the command sketch after this list):
  - pretrain_student_weights: rtdetr_r50vd_6x_coco_sup005.pdparams
  - pretrain_teacher_weights: rtdetr_r50vd_6x_coco_sup005.pdparams
- 1. The weights trained on 5% and 10% labeled COCO data and on the VOC dataset (VOC2007 trainval) are provided; see semi_det/baseline/README.md.
- 2. rt_detr_ssod_voc_no_warmup.yml, rt_detr_ssod005_coco_no_warmup.yml, and rt_detr_ssod010_coco_no_warmup.yml start semi-supervised training directly from the trained fully supervised weights (recommended).
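The two-stage workflow can be run roughly as follows. This is a minimal sketch, assuming the supervised baseline config lives at `configs/semi_det/baseline/rtdetr_r50vd_6x_coco_sup005.yml`, the semi-supervised config at `configs/semi_det/rtdetr_ssod/rt_detr_ssod005_coco_no_warmup.yml`, and 8 GPUs; adjust paths and GPU count to your setup.
```bash
# Step 1 (sketch): train the fully supervised baseline on 5% labeled COCO
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/semi_det/baseline/rtdetr_r50vd_6x_coco_sup005.yml --eval

# Step 2 (sketch): start semi-supervised training from the saturated supervised weights.
# pretrain_student_weights / pretrain_teacher_weights in the ssod config should point to the
# *.pdparams produced in step 1 (e.g. output/rtdetr_r50vd_6x_coco_sup005/model_final.pdparams).
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/semi_det/rtdetr_ssod/rt_detr_ssod005_coco_no_warmup.yml --eval
```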
## RTDETR-SSOD Model Zoo
| Model | Supervision Ratio | Sup Baseline | Sup Epochs (Iters) | Sup mAP<sup>val<br>0.5:0.95 | Semi mAP<sup>val<br>0.5:0.95 | Semi Epochs (Iters) | Download | Config |
| :------------: | :---------: | :---------------------: | :---------------------: |:---------------------------: |:----------------------------: | :------------------: |:--------: |:----------: |
| RTDETR-SSOD | 5% | [sup_config](../baseline/rtdetr_r50vd_6x_coco_sup005.yml) | - | 39.0 | **42.3** | - | [download](https://bj.bcebos.com/v1/paddledet/rt_detr_ssod005_coco_no_warmup.pdparams) | [config](./rt_detr_ssod005_coco_no_warmup.yml) |
| RTDETR-SSOD | 10% | [sup_config](../baseline/rtdetr_r50vd_6x_coco_sup010.yml) | -| 42.3 | **44.8** | - | [download](https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/rt_detr_ssod010_coco/rt_detr_ssod010_coco_no_warmup.pdparams) | [config](./rt_detr_ssod010_coco_with_warmup.yml) |
| RTDETR-SSOD(VOC)| VOC | [sup_config](../baseline/rtdetr_r50vd_6x_coco_voc2007.yml) | - | 62.7 | **65.8(LSJ)** | - | [download](https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/rt_detr_ssod_voc/rt_detr_ssod_voc_no_warmup.pdparams) | [config](./rt_detr_ssod_voc_with_warmup.yml) |
**Notes:**
- The models above are trained on 8 GPUs by default, with a total labeled batch size of 16, a total unlabeled batch size of 16, and an initial learning rate of 0.0002 (as set in the configs below). If you change the total batch size, scale the learning rate linearly (see the override sketch after the snippet below).
- **Supervision Ratio** is the percentage of labeled COCO images relative to the full COCO train2017 set; the unlabeled COCO data is usually drawn at the same ratio, but its images do not overlap with the labeled images.
- `Semi Epochs (Iters)` is the number of epochs (iterations) of **semi-supervised training**. For a **custom dataset**, convert iterations to epochs yourself; it is best to keep the total number of iterations close to the COCO setting.
- `Sup mAP` is the accuracy of the model trained with **labeled data only**; refer to the **base detector config files** and [baseline](../baseline).
- `Semi mAP` is the accuracy of the **semi-supervised** model; both the download and config links point to the **semi-supervised model**.
- `LSJ` stands for **large-scale jittering**, i.e. multi-scale training over a wider scale range, which can further improve accuracy at the cost of slower training.
- For an explanation of the semi-supervised detection configuration, see the [documentation](../README.md/#半监督检测配置).
- The original `Dense Teacher` paper uses `R50-va-caffe` pretraining, while PaddleDetection uses `R50-vb` pretraining by default. Using an `R50-vd` model pretrained with [SSLD](../../../docs/feature_models/SSLD_PRETRAINED_MODEL.md) can further improve detection accuracy noticeably; the backbone section of the config must be changed accordingly:
```yaml
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
ResNet:
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  num_stages: 4
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]
```
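If you change the total batch size, the learning rate can be scaled linearly without editing the YAML by using the `-o` override of `tools/train.py`. A minimal sketch, assuming you have already reduced `sup_batch_size`/`unsup_batch_size` in the config; the value shown is illustrative (half of the configs' default `base_lr: 0.0002`):
```bash
# sketch: after halving the per-GPU batch sizes in the yml, halve base_lr as well
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml --eval \
    -o LearningRate.base_lr=0.0001
```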
## Usage
Only training has to use the semi-supervised detection config; evaluation, inference, and deployment can also be run with the base detector config.
### Training
```bash
# single-GPU training (not recommended; scale the learning rate linearly for the smaller total batch size)
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml --eval
# multi-GPU training
python -m paddle.distributed.launch --log_dir=denseteacher_fcos_semi010/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml --eval
```
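Two common variations of the training command, shown as a hedged sketch. It assumes the standard `-r/--resume` and `--amp` flags of PaddleDetection's `tools/train.py`; the checkpoint path is illustrative.
```bash
# sketch: resume an interrupted semi-supervised run from a saved checkpoint (path illustrative)
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml --eval \
    -r output/rt_detr_ssod010_coco_no_warmup/10000

# sketch: mixed-precision training to reduce GPU memory usage
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml --eval --amp
```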
### Evaluation
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml -o weights=output/rt_detr_ssod/model_final/model_final.pdparams
```
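As noted in the usage section, evaluation does not have to go through the semi-supervised config: the saved weights can also be evaluated with the base RT-DETR detector config. A sketch, assuming the base config path `configs/rtdetr/rtdetr_r50vd_6x_coco.yml` and the weights path from the command above:
```bash
# sketch: evaluate the semi-supervised weights with the plain RT-DETR config
CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
    -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
    -o weights=output/rt_detr_ssod/model_final/model_final.pdparams
```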
### Inference
```bash
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml -o weights=output/rt_detr_ssod/model_final/model_final.pdparams --infer_img=demo/000000014439.jpg
```
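To run inference on a whole folder instead of a single image, `tools/infer.py` also accepts `--infer_dir` (and `--output_dir` for the visualization output); a minimal sketch with illustrative paths:
```bash
# sketch: batch inference over a directory of images
CUDA_VISIBLE_DEVICES=0 python tools/infer.py \
    -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml \
    -o weights=output/rt_detr_ssod/model_final/model_final.pdparams \
    --infer_dir=demo/ --output_dir=infer_output/
```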
### Deployment
Deployment can use either the semi-supervised detection config or the base detector config.
```bash
# export the model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/semi_det/rtdetr_ssod/rt_detr_ssod010_coco_no_warmup.yml -o weights=https://paddledet.bj.bcebos.com/models/rt_detr_ssod010_coco_no_warmup.pdparams
# run inference with the exported model
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/rt_detr_ssod010_coco_no_warmup --image_file=demo/000000014439_640x640.jpg --device=GPU
# benchmark deployment speed
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/rt_detr_ssod010_coco_no_warmup --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
# export to ONNX
paddle2onnx --model_dir output_inference/rt_detr_ssod010_coco_no_warmup/ --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file rt_detr_ssod010_coco_no_warmup.onnx
```
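Since deployment can also use the base detector config, the same weights can be exported with the plain RT-DETR config instead of the semi-supervised one. A sketch, assuming `configs/rtdetr/rtdetr_r50vd_6x_coco.yml` as the base config and reusing the weights URL above:
```bash
# sketch: export with the base RT-DETR config instead of the semi-supervised config
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py \
    -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/rt_detr_ssod010_coco_no_warmup.pdparams
```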
# RTDETR-SSOD Downstream Tasks
We validated the strong generalization ability of RTDETR-SSOD: detection accuracy improves consistently on downstream tasks in low-light, industrial, traffic, and other scenarios.
The VOC dataset ([voc](https://github.com/thsant/wgisd)) is a widely used computer vision dataset for object detection, image segmentation, and scene understanding, covering 20 object classes. After conversion to COCO format it contains 5,011 labeled training images, 11,540 unlabeled training images, and 2,510 test images over 20 classes.
The low-light dataset is [ExDark](https://github.com/cs-chan/Exclusively-Dark-Image-Dataset/tree/master/Dataset), captured specifically for object detection in low-light environments, covering 10 lighting conditions from extremely low light to twilight. After conversion to COCO format it contains 5,891 training images and 1,472 test images over 12 classes.
The industrial dataset is [PKU-Market-PCB](https://robotics.pkusz.edu.cn/resources/dataset/), which targets defect detection on printed circuit boards (PCB) and provides 6 common PCB defect types.
The retail dataset [SKU110k](https://github.com/eg4000/SKU110K_CVPR19) is a dense object detection dataset of supermarket scenes with 11,762 images and over 1.7 million instances, split into 8,233 training images, 588 validation images, and 2,941 test images.
The autonomous driving dataset is [sslad](https://soda-2d.github.io/index.html).
The traffic dataset is [visdrone](http://aiskyeye.com/home/).
## Downstream Dataset Results
| Dataset  | Domain             | Labeled:Unlabeled Split | Labeled Images | Supervised mAP | Semi-supervised mAP |
|----------|--------------------|-------------------------|----------------|----------------|---------------------|
| voc      | general            | voc07, 121:2            | 5000           | 63.1           | 65.8 (+2.7)         |
| visdrone | drone traffic      | 1:9                     | 647            | 19.4           | 20.6 (+1.2)         |
| pcb      | industrial defects | 1:9                     | 55             | 22.9           | 26.8 (+3.9)         |
| sku110k  | retail products    | 1:9                     | 821            | 38.9           | 52.4 (+13.5)        |
| sslad    | autonomous driving | 1:32                    | 4967           | 42.1           | 43.3 (+1.2)         |
| exdark   | low light          | 1:9                     | 589            | 39.6           | 44.1 (+4.5)         |


@@ -0,0 +1,212 @@
_BASE_: [
  '../../runtime.yml',
  '../../rtdetr/_base_/rtdetr_r50vd.yml',
  '../../rtdetr/_base_/rtdetr_reader.yml',
]
eval_interval: 4000
save_interval: 4000
weights: output/rt_detr_ssod/model_final
find_unused_parameters: True
save_dir: output
log_iter: 1
ssod_method: Semi_RTDETR
### global config
use_simple_ema: True
ema_decay: 0.9996
use_gpu: true
### reader config
worker_num: 4
SemiTrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {prob: 0.8}
    - RandomExpand: {fill_value: [0., 0., 0.]}
    - RandomCrop: {prob: 0.8}
    - RandomFlip: {}
  weak_aug:
    - RandomFlip: {prob: 0.0}
  strong_aug:
    - StrongAugImage: {transforms: [
        RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1},
        RandomErasingCrop: {},
        RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]},
        RandomGrayscale: {prob: 0.2},
      ]}
  sup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  unsup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  sup_batch_size: 2
  unsup_batch_size: 2
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: false
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 2
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false
pretrain_student_weights: https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/baseline/rtdetr_r50vd_6x_coco_sup005.pdparams
pretrain_teacher_weights: https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/baseline/rtdetr_r50vd_6x_coco_sup005.pdparams
hidden_dim: 256
use_focal_loss: True
eval_size: [640, 640]
architecture: DETR
DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess
  post_process_semi: DETRBBoxSemiPostProcess
ResNet:
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.1, 0.1, 0.1, 0.1]
  num_stages: 4
  freeze_stem_only: True
HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 1024
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False
DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}
DETRPostProcess:
  num_top_queries: 300
SSOD: DETR_SSOD
DETR_SSOD:
  teacher: DETR
  student: DETR
  train_cfg:
    sup_weight: 1.0
    unsup_weight: 1.0
    ema_start_iters: -1
    pseudo_label_initial_score_thr: 0.7
    min_pseduo_box_size: 0
    concat_sup_data: True
  test_cfg:
    inference_on: teacher
metric: COCO
num_classes: 80
# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
TrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@5.json
    dataset_dir: dataset/coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
UnsupTrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@5-unlabeled.json
    dataset_dir: dataset/coco
    data_fields: ['image']
    supervised: False
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    allow_empty: true
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
epoch: 400 #epoch: 60
LearningRate:
  base_lr: 0.0002
  schedulers:
    - !PiecewiseDecay
      gamma: 1.0
      milestones: [400]
      use_warmup: false
    - !LinearWarmup
      start_factor: 0.001
      steps: 2000
OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001


@@ -0,0 +1,215 @@
_BASE_: [
  '../../runtime.yml',
  '../../rtdetr/_base_/rtdetr_r50vd.yml',
  '../../rtdetr/_base_/rtdetr_reader.yml',
]
#for debug
eval_interval: 4000
save_interval: 4000
weights: output/rt_detr_ssod/model_final
find_unused_parameters: True
save_dir: output
log_iter: 50
ssod_method: Semi_RTDETR
### global config
use_simple_ema: True
ema_decay: 0.9996
use_gpu: true
### reader config
worker_num: 4
SemiTrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {prob: 0.8}
    - RandomExpand: {fill_value: [0., 0., 0.]}
    - RandomCrop: {prob: 0.8}
    - RandomFlip: {}
  weak_aug:
    - RandomFlip: {prob: 0.0}
  strong_aug:
    - StrongAugImage: {transforms: [
        RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1},
        RandomErasingCrop: {},
        RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]},
        RandomGrayscale: {prob: 0.2},
      ]}
  sup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  unsup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  sup_batch_size: 2
  unsup_batch_size: 2
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: false
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 2
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false
pretrain_student_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
pretrain_teacher_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
hidden_dim: 256
use_focal_loss: True
eval_size: [640, 640]
architecture: DETR
DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess
  post_process_semi: DETRBBoxSemiPostProcess
ResNet:
  # index 0 stands for res2
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.1, 0.1, 0.1, 0.1]
  num_stages: 4
  freeze_stem_only: True
HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 1024
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False
DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}
    use_uni_match: True
DETRPostProcess:
  num_top_queries: 300
SSOD: DETR_SSOD
DETR_SSOD:
  teacher: DETR
  student: DETR
  train_cfg:
    sup_weight: 1.0
    unsup_weight: 1.0
    ema_start_iters: 10000
    pseudo_label_initial_score_thr: 0.7
    min_pseduo_box_size: 0
    concat_sup_data: True
  test_cfg:
    inference_on: teacher
metric: COCO
num_classes: 80
# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
TrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@5.json
    dataset_dir: dataset/coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
UnsupTrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@5-unlabeled.json
    dataset_dir: dataset/coco
    data_fields: ['image']
    supervised: False
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    allow_empty: true
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
epoch: 500 #epoch: 60
LearningRate:
  base_lr: 0.0002
  schedulers:
    - !PiecewiseDecay
      gamma: 1.0
      milestones: [500]
      use_warmup: false
    - !LinearWarmup
      start_factor: 0.001
      steps: 2000
OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001


@@ -0,0 +1,212 @@
_BASE_: [
  '../../runtime.yml',
  '../../rtdetr/_base_/rtdetr_r50vd.yml',
  '../../rtdetr/_base_/rtdetr_reader.yml',
]
eval_interval: 4000
save_interval: 4000
weights: output/rt_detr_ssod/model_final
find_unused_parameters: True
save_dir: output
log_iter: 1
ssod_method: Semi_RTDETR
### global config
use_simple_ema: True
ema_decay: 0.9996
use_gpu: true
### reader config
worker_num: 4
SemiTrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {prob: 0.8}
    - RandomExpand: {fill_value: [0., 0., 0.]}
    - RandomCrop: {prob: 0.8}
    - RandomFlip: {}
  weak_aug:
    - RandomFlip: {prob: 0.0}
  strong_aug:
    - StrongAugImage: {transforms: [
        RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1},
        RandomErasingCrop: {},
        RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]},
        RandomGrayscale: {prob: 0.2},
      ]}
  sup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  unsup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  sup_batch_size: 2
  unsup_batch_size: 2
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: false
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 2
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false
pretrain_student_weights: https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/baseline/rtdetr_r50vd_6x_coco_sup010.pdparams
pretrain_teacher_weights: https://bj.bcebos.com/v1/paddledet/data/semidet/rtdetr_ssod/baseline/rtdetr_r50vd_6x_coco_sup010.pdparams
hidden_dim: 256
use_focal_loss: True
eval_size: [640, 640]
architecture: DETR
DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess
  post_process_semi: DETRBBoxSemiPostProcess
ResNet:
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.1, 0.1, 0.1, 0.1]
  num_stages: 4
  freeze_stem_only: True
HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 1024
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False
DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}
DETRPostProcess:
  num_top_queries: 300
SSOD: DETR_SSOD
DETR_SSOD:
  teacher: DETR
  student: DETR
  train_cfg:
    sup_weight: 1.0
    unsup_weight: 1.0
    ema_start_iters: -1
    pseudo_label_initial_score_thr: 0.7
    min_pseduo_box_size: 0
    concat_sup_data: True
  test_cfg:
    inference_on: teacher
metric: COCO
num_classes: 80
# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
TrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@10.json
    dataset_dir: dataset/coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
UnsupTrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@10-unlabeled.json
    dataset_dir: dataset/coco
    data_fields: ['image']
    supervised: False
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    allow_empty: true
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
epoch: 400 #epoch: 60
LearningRate:
  base_lr: 0.0002
  schedulers:
    - !PiecewiseDecay
      gamma: 1.0
      milestones: [400]
      use_warmup: false
    - !LinearWarmup
      start_factor: 0.001
      steps: 2000
OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001


@@ -0,0 +1,215 @@
_BASE_: [
  '../../runtime.yml',
  '../../rtdetr/_base_/rtdetr_r50vd.yml',
  '../../rtdetr/_base_/rtdetr_reader.yml',
]
#for debug
eval_interval: 4000
save_interval: 4000
weights: output/rt_detr_ssod/model_final
find_unused_parameters: True
save_dir: output
log_iter: 50
ssod_method: Semi_RTDETR
### global config
use_simple_ema: True
ema_decay: 0.9996
use_gpu: true
### reader config
worker_num: 4
SemiTrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {prob: 0.8}
    - RandomExpand: {fill_value: [0., 0., 0.]}
    - RandomCrop: {prob: 0.8}
    - RandomFlip: {}
  weak_aug:
    - RandomFlip: {prob: 0.0}
  strong_aug:
    - StrongAugImage: {transforms: [
        RandomColorJitter: {prob: 0.8, brightness: 0.4, contrast: 0.4, saturation: 0.4, hue: 0.1},
        RandomErasingCrop: {},
        RandomGaussianBlur: {prob: 0.5, sigma: [0.1, 2.0]},
        RandomGrayscale: {prob: 0.2},
      ]}
  sup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  unsup_batch_transforms:
    - BatchRandomResizeForSSOD: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - NormalizeBox: {}
    - BboxXYXY2XYWH: {}
    - Permute: {}
  sup_batch_size: 2
  unsup_batch_size: 2
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: false
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 2
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1
  shuffle: false
  drop_last: false
pretrain_student_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
pretrain_teacher_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
hidden_dim: 256
use_focal_loss: True
eval_size: [640, 640]
architecture: DETR
DETR:
  backbone: ResNet
  neck: HybridEncoder
  transformer: RTDETRTransformer
  detr_head: DINOHead
  post_process: DETRPostProcess
  post_process_semi: DETRBBoxSemiPostProcess
ResNet:
  # index 0 stands for res2
  depth: 50
  variant: d
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  lr_mult_list: [0.1, 0.1, 0.1, 0.1]
  num_stages: 4
  freeze_stem_only: True
HybridEncoder:
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  encoder_layer:
    name: TransformerLayer
    d_model: 256
    nhead: 8
    dim_feedforward: 1024
    dropout: 0.
    activation: 'gelu'
  expansion: 1.0
RTDETRTransformer:
  num_queries: 300
  position_embed_type: sine
  feat_strides: [8, 16, 32]
  num_levels: 3
  nhead: 8
  num_decoder_layers: 6
  dim_feedforward: 1024
  dropout: 0.0
  activation: relu
  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0
  learnt_init_query: False
DINOHead:
  loss:
    name: DINOLoss
    loss_coeff: {class: 1, bbox: 5, giou: 2}
    aux_loss: True
    use_vfl: True
    matcher:
      name: HungarianMatcher
      matcher_coeff: {class: 2, bbox: 5, giou: 2}
    use_uni_match: True
DETRPostProcess:
  num_top_queries: 300
SSOD: DETR_SSOD
DETR_SSOD:
  teacher: DETR
  student: DETR
  train_cfg:
    sup_weight: 1.0
    unsup_weight: 1.0
    ema_start_iters: 10000
    pseudo_label_initial_score_thr: 0.7
    min_pseduo_box_size: 0
    concat_sup_data: True
  test_cfg:
    inference_on: teacher
metric: COCO
num_classes: 80
# partial labeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
TrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@10.json
    dataset_dir: dataset/coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
# partial unlabeled COCO, use `SemiCOCODataSet` rather than `COCODataSet`
UnsupTrainDataset:
  !SemiCOCODataSet
    image_dir: train2017
    anno_path: semi_annotations/instances_train2017.1@10-unlabeled.json
    dataset_dir: dataset/coco
    data_fields: ['image']
    supervised: False
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    allow_empty: true
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt)
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
epoch: 500 #epoch: 60
LearningRate:
  base_lr: 0.0002
  schedulers:
    - !PiecewiseDecay
      gamma: 1.0
      milestones: [500]
      use_warmup: false
    - !LinearWarmup
      start_factor: 0.001
      steps: 2000
OptimizerBuilder:
  clip_grad_by_norm: 0.1
  regularizer: false
  optimizer:
    type: AdamW
    weight_decay: 0.0001