Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions
--- a/paddle_detection/configs/rotate/fcosr/README.md
+++ b/paddle_detection/configs/rotate/fcosr/README.md
@@ -0,0 +1,91 @@
+简体中文 | [English](README_en.md)
+
+# FCOSR
+
+## Content
+- [Introduction](#introduction)
+- [Model Zoo](#model-zoo)
+- [Getting Started](#getting-started)
+- [Deployment](#deployment)
+- [Citations](#citations)
+
+## Introduction
+
+[FCOSR](https://arxiv.org/abs/2111.10780) is a single-stage, anchor-free rotated bounding box detection algorithm based on [FCOS](https://arxiv.org/abs/1904.01355). FCOSR focuses on the label assignment strategy for rotated boxes and proposes ellipse center sampling and fuzzy sample label assignment. For the loss, FCOSR uses [ProbIoU](https://arxiv.org/abs/2106.06072) to avoid the boundary discontinuity problem.
+
+## Model Zoo
+
+| Model | Backbone | mAP | LR Schedule | Angle Representation | Data Augmentation | Number of GPUs | Images per GPU | Download | Config |
+|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
+| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
+
+**Notes:**
+
+- If the **number of GPUs** or the **batch size** changes, adjust the learning rate according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+- Models in the model zoo use single-scale training and single-scale testing by default. `MS` in the Data Augmentation column means that multi-scale training and multi-scale testing are used. `RR` in the Data Augmentation column means that RandomRotate data augmentation is used during training.
+
+## Getting Started
+
+Refer to [Data Preparation](../README.md#数据准备) to prepare the data.
+
+### Training
+
+Single-GPU training
+``` bash
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+Multi-GPU training
+``` bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+### Inference
+
+Run the following command to run inference on a single image. By default, the results are saved in the `output` folder.
+``` bash
+python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
+```
+
+### Evaluation on the DOTA Dataset
+
+As described in [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on the DOTA dataset requires a zip file containing all detection results. The detections for each category are stored in one txt file, and each line of the txt file has the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. Submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) for evaluation. You can run the following command to obtain predictions for the test dataset:
+``` bash
+python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True
+```
+Convert the predictions into the format required by the official evaluation site:
+``` bash
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10
+
+zip -r submit.zip submit
+```
+
+## Deployment
+
+For the deployment tutorial, refer to [Deployment](../../../deploy/README.md).
+
+## Citations
+
+```
+@article{li2021fcosr,
+  title={Fcosr: A simple anchor-free rotated detector for aerial object detection},
+  author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen},
+  journal={arXiv preprint arXiv:2111.10780},
+  year={2021}
+}
+
+@inproceedings{tian2019fcos,
+  title={Fcos: Fully convolutional one-stage object detection},
+  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
+  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+  pages={9627--9636},
+  year={2019}
+}
+
+@article{llerena2021gaussian,
+  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
+  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
+  journal={arXiv preprint arXiv:2106.06072},
+  year={2021}
+}
+```
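The learning-rate scaling rule stated in the README notes above can be sketched in a few lines of Python. This is only an illustration: the function name `scale_learning_rate` is hypothetical and not part of PaddleDetection, and the defaults are the values that appear elsewhere in this commit (`base_lr: 0.01`, 4 images per GPU, 4 GPUs).

```python
def scale_learning_rate(lr_default, batch_size_default, gpus_default,
                        batch_size_new, gpus_new):
    """Scale the learning rate linearly with the total batch size.

    Hypothetical helper that implements the README formula:
    lr_new = lr_default * (bs_new * gpus_new) / (bs_default * gpus_default)
    """
    total_default = batch_size_default * gpus_default
    total_new = batch_size_new * gpus_new
    return lr_default * total_new / total_default


# Defaults taken from this commit: base_lr 0.01, 4 images per GPU, 4 GPUs.
DEFAULT_LR = 0.01
DEFAULT_BATCH_SIZE = 4
DEFAULT_GPUS = 4

# Example: train on a single GPU while keeping 4 images per GPU.
single_gpu_lr = scale_learning_rate(
    DEFAULT_LR, DEFAULT_BATCH_SIZE, DEFAULT_GPUS, batch_size_new=4, gpus_new=1)
print(single_gpu_lr)
```

The resulting value would replace `base_lr` in `_base_/optimizer_3x.yml` when training with the changed GPU count or batch size.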
--- a/paddle_detection/configs/rotate/fcosr/README_en.md
+++ b/paddle_detection/configs/rotate/fcosr/README_en.md
@@ -0,0 +1,92 @@
+English | [简体中文](README.md)
+
+# FCOSR
+
+## Content
+- [Introduction](#introduction)
+- [Model Zoo](#model-zoo)
+- [Getting Started](#getting-started)
+- [Deployment](#deployment)
+- [Citations](#citations)
+
+## Introduction
+
+[FCOSR](https://arxiv.org/abs/2111.10780) is a one-stage, anchor-free rotated object detector based on [FCOS](https://arxiv.org/abs/1904.01355). FCOSR focuses on the label assignment strategy for oriented bounding boxes and proposes an ellipse center sampling method and a fuzzy sample assignment strategy. For the loss, FCOSR uses [ProbIoU](https://arxiv.org/abs/2106.06072) to avoid the boundary discontinuity problem.
+
+## Model Zoo
+
+| Model | Backbone | mAP | LR Schedule | Angle | Aug | GPU Number | Images/GPU | Download | Config |
+|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
+| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
+
+**Notes:**
+
+- If the **GPU number** or the **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
+- Models in the model zoo are trained and tested at a single scale by default. `MS` in the `Aug` column means that multi-scale training and multi-scale testing are used. `RR` in the `Aug` column means that RandomRotate data augmentation is used for training.
+
+## Getting Started
+
+Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare data.
+
+### Training
+
+Single-GPU training
+``` bash
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+Multi-GPU training
+``` bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+```
+
+### Inference
+
+Run the following command to run inference on a single image. By default, the results are saved in the `output` directory.
+
+``` bash
+python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
+```
+
+### Evaluation on DOTA Dataset
+Referring to [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which has the following format:
+`image_id score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html). You can run the following command to get the inference results on the test dataset:
+``` bash
+python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True
+```
+Convert the prediction results into the format required by the official evaluation site:
+``` bash
+python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10
+
+zip -r submit.zip submit
+```
+
+## Deployment
+
+For deployment, please refer to the [Deployment](../../../deploy/README_en.md) tutorial.
+
+## Citations
+
+```
+@article{li2021fcosr,
+  title={Fcosr: A simple anchor-free rotated detector for aerial object detection},
+  author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen},
+  journal={arXiv preprint arXiv:2111.10780},
+  year={2021}
+}
+
+@inproceedings{tian2019fcos,
+  title={Fcos: Fully convolutional one-stage object detection},
+  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
+  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+  pages={9627--9636},
+  year={2019}
+}
+
+@article{llerena2021gaussian,
+  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
+  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
+  journal={arXiv preprint arXiv:2106.06072},
+  year={2021}
+}
+```
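The per-line submission format described in the evaluation section above (`image_id score x1 y1 x2 y2 x3 y3 x4 y4`) can be illustrated with a small Python sketch. It is not the code in `generate_result.py`: the helper names and the number of decimal places are assumptions chosen for this example.

```python
def format_dota_line(image_id, score, polygon):
    """Format one detection as `image_id score x1 y1 x2 y2 x3 y3 x4 y4`.

    `polygon` holds the four corner points flattened to 8 numbers.
    The decimal precision used here is an assumption, not a DOTA requirement.
    """
    if len(polygon) != 8:
        raise ValueError("polygon must contain 8 numbers (4 corner points)")
    coords = " ".join(f"{value:.2f}" for value in polygon)
    return f"{image_id} {score:.4f} {coords}"


def parse_dota_line(line):
    """Parse one result line back into (image_id, score, polygon)."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError("expected 10 whitespace-separated fields")
    image_id = parts[0]
    score = float(parts[1])
    polygon = [float(value) for value in parts[2:]]
    return image_id, score, polygon


# Hypothetical detection: an axis-aligned box written as a 4-point polygon.
line = format_dota_line("P0861", 0.9, [10, 20, 110, 20, 110, 60, 10, 60])
print(line)
print(parse_dota_line(line))
```

One txt file per category would contain one such line for every detection of that category.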
--- a/paddle_detection/configs/rotate/fcosr/_base_/fcosr_reader.yml
+++ b/paddle_detection/configs/rotate/fcosr/_base_/fcosr_reader.yml
@@ -0,0 +1,46 @@
+worker_num: 4
+image_height: &image_height 1024
+image_width: &image_width 1024
+image_size: &image_size [*image_height, *image_width]
+
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - Poly2Array: {}
+    - RandomRFlip: {}
+    - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]}
+    - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5}
+    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
+    - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'}
+  batch_transforms:
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - PadRGT: {}
+    - PadBatch: {pad_to_stride: 32}
+  batch_size: 4
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+  collate_batch: true
+
+EvalReader:
+  sample_transforms:
+    - Decode: {}
+    - Poly2Array: {}
+    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - PadBatch: {pad_to_stride: 32}
+  batch_size: 2
+  collate_batch: false
+
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - Resize: {target_size: *image_size, keep_ratio: True, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_transforms:
+    - PadBatch: {pad_to_stride: 32}
+  batch_size: 2
--- a/paddle_detection/configs/rotate/fcosr/_base_/fcosr_x50.yml
+++ b/paddle_detection/configs/rotate/fcosr/_base_/fcosr_x50.yml
@@ -0,0 +1,44 @@
+architecture: YOLOv3
+snapshot_epoch: 1
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt50_32x4d_pretrained.pdparams
+
+YOLOv3:
+  backbone: ResNet
+  neck: FPN
+  yolo_head: FCOSRHead
+  post_process: ~
+
+ResNet:
+  depth: 50
+  groups: 32
+  base_width: 4
+  variant: b
+  norm_type: bn
+  freeze_at: 0
+  return_idx: [1,2,3]
+  num_stages: 4
+
+FPN:
+  out_channel: 256
+  extra_stage: 2
+  has_extra_convs: true
+  use_c5: false
+  relu_before_extra_convs: true
+
+FCOSRHead:
+  feat_channels: 256
+  fpn_strides: [8, 16, 32, 64, 128]
+  stacked_convs: 4
+  loss_weight: {class: 1.0, probiou: 1.0}
+  assigner:
+    name: FCOSRAssigner
+    factor: 12
+    threshold: 0.23
+    boundary: [[-1, 64], [64, 128], [128, 256], [256, 512], [512, 100000000.0]]
+  nms:
+    name: MultiClassNMS
+    nms_top_k: 2000
+    keep_top_k: -1
+    score_threshold: 0.1
+    nms_threshold: 0.1
+    normalized: False
--- a/paddle_detection/configs/rotate/fcosr/_base_/optimizer_3x.yml
+++ b/paddle_detection/configs/rotate/fcosr/_base_/optimizer_3x.yml
@@ -0,0 +1,20 @@
+epoch: 36
+
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+    - !PiecewiseDecay
+      gamma: 0.1
+      milestones: [24, 33]
+    - !LinearWarmup
+      start_factor: 0.3333333
+      steps: 500
+
+OptimizerBuilder:
+  clip_grad_by_norm: 35.
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0001
+    type: L2
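The schedule configured in `optimizer_3x.yml` above combines a linear warmup over the first 500 steps with a step decay at epochs 24 and 33. The sketch below shows roughly how the learning rate evolves under those settings. It assumes that warmup interpolates linearly from `start_factor * base_lr` to `base_lr` and that decay is applied at epoch boundaries; it is not PaddleDetection's implementation, and `steps_per_epoch` is a hypothetical value that depends on the dataset size and batch size.

```python
# Values copied from optimizer_3x.yml in this commit.
BASE_LR = 0.01
GAMMA = 0.1
MILESTONES = [24, 33]          # epochs at which the rate is multiplied by GAMMA
WARMUP_START_FACTOR = 0.3333333
WARMUP_STEPS = 500


def learning_rate(step, steps_per_epoch):
    """Approximate learning rate at a 0-based global training step."""
    if step < WARMUP_STEPS:
        # Assumed linear interpolation from start_factor * base_lr to base_lr.
        alpha = step / WARMUP_STEPS
        factor = WARMUP_START_FACTOR * (1 - alpha) + alpha
        return BASE_LR * factor
    epoch = step // steps_per_epoch
    # Count how many milestones have already been reached.
    num_decays = sum(1 for milestone in MILESTONES if epoch >= milestone)
    return BASE_LR * GAMMA ** num_decays


# steps_per_epoch = 1000 is a made-up value for illustration only.
for step in (0, 250, 500, 24000, 33000):
    print(step, learning_rate(step, steps_per_epoch=1000))
```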
--- a/paddle_detection/configs/rotate/fcosr/fcosr_x50_3x_dota.yml
+++ b/paddle_detection/configs/rotate/fcosr/fcosr_x50_3x_dota.yml
@@ -0,0 +1,9 @@
+_BASE_: [
+  '../../datasets/dota.yml',
+  '../../runtime.yml',
+  '_base_/optimizer_3x.yml',
+  '_base_/fcosr_reader.yml',
+  '_base_/fcosr_x50.yml'
+]
+
+weights: output/fcosr_x50_3x_dota/model_final