更换文档检测模型

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions
--- a/paddle_detection/configs/sniper/README.md
+++ b/paddle_detection/configs/sniper/README.md
@@ -0,0 +1,67 @@
+English | [简体中文](README_cn.md)
+
+# SNIPER: Efficient Multi-Scale Training
+
+## Model Zoo
+
+| Sniper   | GPU number    | images/GPU |    Model  |    Dataset     | Schedulers | Box AP |          Download                  | Config |
+| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| w/o   |    4    |    1    | ResNet-r50-FPN      | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset)  |   1x    |  23.3  | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams ) | [config](./faster_rcnn_r50_fpn_1x_visdrone.yml) |
+| w/ |    4    |    1    | ResNet-r50-FPN      | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset)   |   1x    |  29.7  | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [config](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) |
+
+### Note
+- Here, we use VisDrone dataset, and to detect 9 objects including `person, bicycles, car, van, truck, tricycle, awning-tricycle, bus, motor`.
+- Do not support deploy by now because sniper dataset crop behavior.
+
+## Getting Start
+### 1. Training
+a. optional: Run `tools/sniper_params_stats.py` to get image_target_sizes\valid_box_ratio_ranges\chip_target_size\chip_target_stride，and modify this params in configs/datasets/sniper_coco_detection.yml
+```bash
+python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json
+```
+b. optional: train detector to get negative proposals.
+```bash
+python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 &
+```
+c. train models
+```bash
+python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 &
+```
+
+### 2. Evaluation
+Evaluating SNIPER on custom dataset in single GPU with following commands:
+```bash
+# use saved checkpoint in training
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
+```
+
+### 3. Inference
+Inference images in single GPU with following commands, use `--infer_img` to inference a single image and `--infer_dir` to inference all images in the directory.
+
+```bash
+# inference single image
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png
+
+# inference all images in the directory
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo
+```
+
+## Citations
+```
+@misc{1805.09300,
+Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis},
+Title = {SNIPER: Efficient Multi-Scale Training},
+Year = {2018},
+Eprint = {arXiv:1805.09300},
+}
+
+@ARTICLE{9573394,
+  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
+  year={2021},
+  volume={},
+  number={},
+  pages={1-1},
+  doi={10.1109/TPAMI.2021.3119563}}
+```
--- a/paddle_detection/configs/sniper/README_cn.md
+++ b/paddle_detection/configs/sniper/README_cn.md
@@ -0,0 +1,68 @@
+简体中文 | [English](README.md)
+
+# SNIPER: Efficient Multi-Scale Training
+
+## 模型库
+| 有无sniper   | GPU个数    | 每张GPU图片个数 | 骨架网络             | 数据集     | 学习率策略 | Box AP |                           模型下载                          | 配置文件 |
+| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| w/o sniper   |    4    |    1    | ResNet-r50-FPN      | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset)  |   1x    |  23.3  | [下载链接](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams ) | [配置文件](./faster_rcnn_r50_fpn_1x_visdrone.yml) |
+| w sniper |    4    |    1    | ResNet-r50-FPN      | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset)    |   1x    |  29.7  | [下载链接](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [配置文件](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) |
+
+
+### 注意
+- 我们使用的是`VisDrone`数据集, 并且检查其中的9类，包括 `person, bicycles, car, van, truck, tricyle, awning-tricyle, bus, motor`.
+- 暂时不支持和导出预测部署（deploy).
+
+
+## 使用说明
+### 1. 训练
+a. 可选：统计数据集信息，获得数据缩放尺度、有效框范围、chip尺度和步长等参数，修改configs/datasets/sniper_coco_detection.yml中对应参数
+```bash
+python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json
+```
+b. 可选：训练检测器，生成负样本
+```bash
+python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 &
+```
+c. 训练模型
+```bash
+python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 &
+```
+
+### 2. 评估
+使用单GPU通过如下命令一键式评估模型在COCO val2017数据集效果
+```bash
+# 使用训练保存的checkpoint
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
+```
+
+### 3. 推理
+使用单GPU通过如下命令一键式推理图像，通过`--infer_img`指定图像路径，或通过`--infer_dir`指定目录并推理目录下所有图像
+
+```bash
+# 推理单张图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png
+
+# 推理目录下所有图像
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo
+```
+
+## Citations
+```
+@misc{1805.09300,
+Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis},
+Title = {SNIPER: Efficient Multi-Scale Training},
+Year = {2018},
+Eprint = {arXiv:1805.09300},
+}
+
+@ARTICLE{9573394,
+  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={Detection and Tracking Meet Drones Challenge},
+  year={2021},
+  volume={},
+  number={},
+  pages={1-1},
+  doi={10.1109/TPAMI.2021.3119563}}
+```
--- a/paddle_detection/configs/sniper/_base_/faster_fpn_reader.yml
+++ b/paddle_detection/configs/sniper/_base_/faster_fpn_reader.yml
@@ -0,0 +1,40 @@
+worker_num: 2
+TrainReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
+  - RandomFlip: {prob: 0.5}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: true
+  drop_last: true
+  collate_batch: false
+
+
+EvalReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
+
+
+TestReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: 32}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
--- a/paddle_detection/configs/sniper/_base_/faster_reader.yml
+++ b/paddle_detection/configs/sniper/_base_/faster_reader.yml
@@ -0,0 +1,41 @@
+worker_num: 2
+TrainReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - RandomResize: {target_size: [[800, 1333]], interp: 2, keep_ratio: True}
+  - RandomFlip: {prob: 0.5}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: -1}
+  batch_size: 1
+  shuffle: true
+  drop_last: true
+  collate_batch: false
+  use_shared_memory: true
+
+
+EvalReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: -1}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
+
+
+TestReader:
+  sample_transforms:
+  - SniperDecodeCrop: {}
+  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+  - Permute: {}
+  batch_transforms:
+  - PadBatch: {pad_to_stride: -1}
+  batch_size: 1
+  shuffle: false
+  drop_last: false
--- a/paddle_detection/configs/sniper/_base_/ppyolo_reader.yml
+++ b/paddle_detection/configs/sniper/_base_/ppyolo_reader.yml
@@ -0,0 +1,40 @@
+worker_num: 2
+TrainReader:
+  inputs_def:
+    num_max_boxes: 50
+  sample_transforms:
+    - SniperDecodeCrop: {}
+    - RandomDistort: {}
+    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+    - RandomCrop: {}
+    - RandomFlip: {}
+  batch_transforms:
+    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+    - NormalizeBox: {}
+    - PadBox: {num_max_boxes: 50}
+    - BboxXYXY2XYWH: {}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+  batch_size: 8
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+
+EvalReader:
+  sample_transforms:
+    - SniperDecodeCrop: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 8
+
+TestReader:
+  inputs_def:
+    image_shape: [3, 608, 608]
+  sample_transforms:
+    - SniperDecodeCrop: {}
+    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+    - Permute: {}
+  batch_size: 1
--- a/paddle_detection/configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml
+++ b/paddle_detection/configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml
@@ -0,0 +1,9 @@
+_BASE_: [
+  '../datasets/sniper_visdrone_detection.yml',
+  '../runtime.yml',
+  '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml',
+  '../faster_rcnn/_base_/optimizer_1x.yml',
+  '_base_/faster_fpn_reader.yml',
+]
+weights: output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
+find_unused_parameters: true
--- a/paddle_detection/configs/sniper/faster_rcnn_r50_fpn_1x_visdrone.yml
+++ b/paddle_detection/configs/sniper/faster_rcnn_r50_fpn_1x_visdrone.yml
@@ -0,0 +1,29 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '../faster_rcnn/_base_/optimizer_1x.yml',
+  '../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml',
+  '../faster_rcnn/_base_/faster_fpn_reader.yml',
+]
+weights: output/faster_rcnn_r50_fpn_1x_visdrone/model_final
+
+
+metric: COCO
+num_classes: 9
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train
+    anno_path: annotations/train.json
+    dataset_dir: dataset/VisDrone2019_coco
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val
+    anno_path: annotations/val.json
+    dataset_dir: dataset/VisDrone2019_coco
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/val.json
--- a/paddle_detection/configs/sniper/ppyolo_r50vd_dcn_1x_sniper_visdrone.yml
+++ b/paddle_detection/configs/sniper/ppyolo_r50vd_dcn_1x_sniper_visdrone.yml
@@ -0,0 +1,33 @@
+_BASE_: [
+  '../datasets/sniper_visdrone_detection.yml',
+  '../runtime.yml',
+  '../ppyolo/_base_/ppyolo_r50vd_dcn.yml',
+  '../ppyolo/_base_/optimizer_1x.yml',
+  './_base_/ppyolo_reader.yml',
+]
+
+snapshot_epoch: 8
+use_ema: true
+weights: output/ppyolo_r50vd_dcn_1x_sniper_visdrone/model_final
+
+
+
+LearningRate:
+  base_lr: 0.005
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.
+    milestones:
+    - 153
+    - 173
+  - !LinearWarmup
+    start_factor: 0.1
+    steps: 4000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
--- a/paddle_detection/configs/sniper/ppyolo_r50vd_dcn_1x_visdrone.yml
+++ b/paddle_detection/configs/sniper/ppyolo_r50vd_dcn_1x_visdrone.yml
@@ -0,0 +1,54 @@
+_BASE_: [
+  '../datasets/coco_detection.yml',
+  '../runtime.yml',
+  '../ppyolo/_base_/ppyolo_r50vd_dcn.yml',
+  '../ppyolo/_base_/optimizer_1x.yml',
+  '../ppyolo/_base_/ppyolo_reader.yml',
+]
+
+snapshot_epoch: 8
+use_ema: true
+weights: output/ppyolo_r50vd_dcn_1x_visdrone/model_final
+
+epoch: 192
+
+LearningRate:
+  base_lr: 0.005
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones:
+    - 153
+    - 173
+  - !LinearWarmup
+    start_factor: 0.
+    steps: 4000
+
+OptimizerBuilder:
+  optimizer:
+    momentum: 0.9
+    type: Momentum
+  regularizer:
+    factor: 0.0005
+    type: L2
+
+
+metric: COCO
+num_classes: 9
+
+TrainDataset:
+  !COCODataSet
+    image_dir: train
+    anno_path: annotations/train.json
+    dataset_dir: dataset/VisDrone2019_coco
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: val
+    anno_path: annotations/val.json
+    dataset_dir: dataset/VisDrone2019_coco
+
+TestDataset:
+  !ImageFolder
+    anno_path: annotations/val.json