Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions

@@ -0,0 +1,67 @@
English | [简体中文](README_cn.md)
# SNIPER: Efficient Multi-Scale Training
## Model Zoo
| Sniper | GPU number | images/GPU | Model | Dataset | Schedulers | Box AP | Download | Config |
| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| w/o | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 23.3 | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams ) | [config](./faster_rcnn_r50_fpn_1x_visdrone.yml) |
| w/ | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 29.7 | [Download Link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [config](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) |
### Note
- We use the VisDrone dataset and detect 9 object classes: `person, bicycles, car, van, truck, tricycle, awning-tricycle, bus, motor`.
- Deployment is not supported yet because of the cropping behavior of the SNIPER dataset.
## Getting Started
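
The cropping mentioned in the note means the detector works on sub-windows (chips) rather than on whole images. As an illustration only, and assuming that detections come back in chip coordinates, the sketch below maps such boxes back to the full image. The helper `chip_boxes_to_image` and its `scale` argument are hypothetical names for this example, not part of PaddleDetection.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def chip_boxes_to_image(boxes: List[Box], chip: Box, scale: float = 1.0) -> List[Box]:
    """Map boxes predicted on a (possibly resized) chip back to image coordinates.

    `chip` is the crop window in the original image; `scale` is the resize
    factor that was applied to the chip before detection (assumed uniform).
    """
    cx1, cy1, _, _ = chip
    mapped = []
    for x1, y1, x2, y2 in boxes:
        # Undo the resize first, then add the chip's top-left offset.
        mapped.append((x1 / scale + cx1, y1 / scale + cy1,
                       x2 / scale + cx1, y2 / scale + cy1))
    return mapped
```

A real pipeline would also need to merge overlapping detections from neighbouring chips, which this sketch leaves out.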
### 1. Training
a. Optional: run `tools/sniper_params_stats.py` to compute `image_target_sizes`, `valid_box_ratio_ranges`, `chip_target_size`, and `chip_target_stride`, then update these parameters in `configs/datasets/sniper_coco_detection.yml`.
```bash
python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json
```
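
The chip size and chip stride obtained in this step control how each image is cut into chips. As a rough illustration (a minimal sketch, not PaddleDetection's implementation; the function name `generate_chips` and the 512/256 values in the usage note are assumptions chosen for the example), a chip size and stride can be turned into crop windows like this:

```python
from typing import List, Tuple


def generate_chips(img_w: int, img_h: int, chip_size: int,
                   stride: int) -> List[Tuple[int, int, int, int]]:
    """Tile an image into square chips (x1, y1, x2, y2), clipped to the image."""

    def starts(length: int) -> List[int]:
        # An image side no longer than the chip needs a single chip at 0.
        if length <= chip_size:
            return [0]
        # Slide by `stride`, then add a final chip flush with the border.
        out = list(range(0, length - chip_size, stride))
        out.append(length - chip_size)
        return out

    return [(x, y, min(x + chip_size, img_w), min(y + chip_size, img_h))
            for y in starts(img_h) for x in starts(img_w)]
```

For instance, `generate_chips(1000, 600, 512, 256)` yields six windows, the first at `(0, 0, 512, 512)` and the last at `(488, 88, 1000, 600)`. A smaller stride gives more overlap between chips at the cost of more crops per image.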
b. Optional: train a detector to generate negative proposals.
```bash
python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 &
```
c. Train the model.
```bash
python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 &
```
### 2. Evaluation
Evaluate SNIPER on a custom dataset on a single GPU with the following command:
```bash
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
```
### 3. Inference
Run inference on a single GPU with the following commands. Use `--infer_img` to run inference on a single image, or `--infer_dir` to run inference on all images in a directory.
```bash
# run inference on a single image
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png
# run inference on all images in the directory
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo
```
## Citations
```
@misc{1805.09300,
Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis},
Title = {SNIPER: Efficient Multi-Scale Training},
Year = {2018},
Eprint = {arXiv:1805.09300},
}
@ARTICLE{9573394,
author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Detection and Tracking Meet Drones Challenge},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3119563}}
```

@@ -0,0 +1,68 @@
Simplified Chinese | [English](README.md)
# SNIPER: Efficient Multi-Scale Training
## Model Zoo
| With/without SNIPER | Number of GPUs | Images per GPU | Backbone | Dataset | LR schedule | Box AP | Model download | Config |
| :---------------- | :-------------------: | :------------------: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
| w/o sniper | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 23.3 | [Download link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_visdrone.pdparams ) | [config](./faster_rcnn_r50_fpn_1x_visdrone.yml) |
| w/ sniper | 4 | 1 | ResNet-r50-FPN | [VisDrone](https://github.com/VisDrone/VisDrone-Dataset) | 1x | 29.7 | [Download link](https://bj.bcebos.com/v1/paddledet/models/faster_rcnn_r50_fpn_1x_sniper_visdrone.pdparams) | [config](./faster_rcnn_r50_fpn_1x_sniper_visdrone.yml) |
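### Note
- We use the `VisDrone` dataset and detect 9 of its classes: `person, bicycles, car, van, truck, tricycle, awning-tricycle, bus, motor`.
- Model export and inference deployment (deploy) are not supported yet.
## Usage
### 1. Training
a. Optional: compute dataset statistics to obtain the image scaling sizes, valid box ranges, chip size, and chip stride, then update the corresponding parameters in configs/datasets/sniper_coco_detection.yml.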
```bash
python tools/sniper_params_stats.py FasterRCNN annotations/instances_train2017.json
```
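
The valid box range from this step decides which ground-truth boxes count at a given image scale. The sketch below shows one way such a filter could work. It assumes the range is expressed as a ratio of box side length to chip size and that `-1` means "no bound on this side"; both conventions, and the helper name `filter_valid_boxes`, are assumptions made for illustration and may differ from what PaddleDetection actually does.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_valid_boxes(boxes: List[Box], scale: float, chip_size: int,
                       ratio_range: Tuple[float, float]) -> List[Box]:
    """Keep boxes whose scaled side length, relative to the chip, is in range."""
    lo, hi = ratio_range
    kept = []
    for x1, y1, x2, y2 in boxes:
        # Geometric-mean side length of the box after resizing the image.
        side = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0)) * scale
        ratio = side / chip_size
        # A negative bound is treated as "unbounded" (assumed convention).
        if (lo < 0 or ratio >= lo) and (hi < 0 or ratio <= hi):
            kept.append((x1, y1, x2, y2))
    return kept
```

With a chip size of 512, a range of `(-1, 0.1)` keeps a 20x20 box and drops a 400x400 one, while `(0.08, -1)` does the opposite.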
b. Optional: train a detector to generate negative samples.
```bash
python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --save_proposals --proposals_path=./proposals.json &>sniper.log 2>&1 &
```
c. Train the model.
```bash
python -m paddle.distributed.launch --log_dir=./sniper/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml --eval &>sniper.log 2>&1 &
```
### 2. Evaluation
Evaluate the model on the COCO val2017 dataset on a single GPU with the following command:
```bash
# use the checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
```
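### 3. Inference
Run inference on a single GPU with the following commands. Use `--infer_img` to specify an image path, or `--infer_dir` to specify a directory and run inference on every image in it.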
```bash
# run inference on a single image
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_img=demo/P0861__1.0__1154___824.png
# run inference on all images in the directory
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/sniper/faster_rcnn_r50_fpn_1x_sniper_visdrone.yml -o weights=output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final --infer_dir=demo
```
## Citations
```
@misc{1805.09300,
Author = {Bharat Singh and Mahyar Najibi and Larry S. Davis},
Title = {SNIPER: Efficient Multi-Scale Training},
Year = {2018},
Eprint = {arXiv:1805.09300},
}
@ARTICLE{9573394,
author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Detection and Tracking Meet Drones Challenge},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3119563}}
```

@@ -0,0 +1,40 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: true
  drop_last: true
  collate_batch: false
EvalReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: false
  drop_last: false

@@ -0,0 +1,41 @@
worker_num: 2
TrainReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - RandomResize: {target_size: [[800, 1333]], interp: 2, keep_ratio: True}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: -1}
  batch_size: 1
  shuffle: true
  drop_last: true
  collate_batch: false
  use_shared_memory: true
EvalReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: -1}
  batch_size: 1
  shuffle: false
  drop_last: false
TestReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: -1}
  batch_size: 1
  shuffle: false
  drop_last: false

@@ -0,0 +1,40 @@
worker_num: 2
TrainReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
  - SniperDecodeCrop: {}
  - RandomDistort: {}
  - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
  - RandomCrop: {}
  - RandomFlip: {}
  batch_transforms:
  - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
  - NormalizeBox: {}
  - PadBox: {num_max_boxes: 50}
  - BboxXYXY2XYWH: {}
  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
  - Permute: {}
  - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  batch_size: 8
  shuffle: true
  drop_last: true
  use_shared_memory: true
EvalReader:
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
  - Permute: {}
  batch_size: 8
TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
  - SniperDecodeCrop: {}
  - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
  - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
  - Permute: {}
  batch_size: 1

@@ -0,0 +1,9 @@
_BASE_: [
'../datasets/sniper_visdrone_detection.yml',
'../runtime.yml',
'../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml',
'../faster_rcnn/_base_/optimizer_1x.yml',
'_base_/faster_fpn_reader.yml',
]
weights: output/faster_rcnn_r50_fpn_1x_sniper_visdrone/model_final
find_unused_parameters: true

@@ -0,0 +1,29 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'../faster_rcnn/_base_/optimizer_1x.yml',
'../faster_rcnn/_base_/faster_rcnn_r50_fpn.yml',
'../faster_rcnn/_base_/faster_fpn_reader.yml',
]
weights: output/faster_rcnn_r50_fpn_1x_visdrone/model_final
metric: COCO
num_classes: 9
TrainDataset:
  !COCODataSet
    image_dir: train
    anno_path: annotations/train.json
    dataset_dir: dataset/VisDrone2019_coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    image_dir: val
    anno_path: annotations/val.json
    dataset_dir: dataset/VisDrone2019_coco
TestDataset:
  !ImageFolder
    anno_path: annotations/val.json

@@ -0,0 +1,33 @@
_BASE_: [
'../datasets/sniper_visdrone_detection.yml',
'../runtime.yml',
'../ppyolo/_base_/ppyolo_r50vd_dcn.yml',
'../ppyolo/_base_/optimizer_1x.yml',
'./_base_/ppyolo_reader.yml',
]
snapshot_epoch: 8
use_ema: true
weights: output/ppyolo_r50vd_dcn_1x_sniper_visdrone/model_final
LearningRate:
  base_lr: 0.005
  schedulers:
  - !PiecewiseDecay
    gamma: 0.
    milestones:
    - 153
    - 173
  - !LinearWarmup
    start_factor: 0.1
    steps: 4000
OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

@@ -0,0 +1,54 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'../ppyolo/_base_/ppyolo_r50vd_dcn.yml',
'../ppyolo/_base_/optimizer_1x.yml',
'../ppyolo/_base_/ppyolo_reader.yml',
]
snapshot_epoch: 8
use_ema: true
weights: output/ppyolo_r50vd_dcn_1x_visdrone/model_final
epoch: 192
LearningRate:
  base_lr: 0.005
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 153
    - 173
  - !LinearWarmup
    start_factor: 0.
    steps: 4000
OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2
metric: COCO
num_classes: 9
TrainDataset:
  !COCODataSet
    image_dir: train
    anno_path: annotations/train.json
    dataset_dir: dataset/VisDrone2019_coco
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    image_dir: val
    anno_path: annotations/val.json
    dataset_dir: dataset/VisDrone2019_coco
TestDataset:
  !ImageFolder
    anno_path: annotations/val.json