Move paddle_detection
@@ -0,0 +1,132 @@
Simplified Chinese | [English](README_en.md)

# Rotated Object Detection

## Table of Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Data Preparation](#Data-Preparation)
- [Installation](#Installation)

## Introduction

Rotated bounding boxes are used to detect rectangular boxes that carry angle information, i.e., boxes whose width and height are no longer parallel to the image coordinate axes. Compared with horizontal boxes, a rotated box generally encloses less background, so rotated object detection is commonly used in scenarios such as remote sensing.

## Model Zoo

| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
| [FCOSR](./fcosr/README.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 78.28 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |

**Notes:**

- If the **GPU number** or **batch size** is changed, the learning rate should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**, as illustrated by the sketch after this list.
- Models in the model zoo are trained and tested at a single scale by default. If `MS` appears in the Aug column, multi-scale training and multi-scale testing are used; if `RR` appears, RandomRotate data augmentation is used during training.
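As a concrete illustration of this linear scaling rule, here is a minimal Python sketch; the numbers in the example are hypothetical and not taken from any particular config in this repo:

``` python
def scale_lr(lr_default, batch_size_default, gpus_default, batch_size_new, gpus_new):
    """Scale the learning rate linearly with the global batch size."""
    return lr_default * (batch_size_new * gpus_new) / (batch_size_default * gpus_default)

# Example: a default schedule tuned for 4 GPUs x 2 images/GPU at lr = 0.008.
# Moving to 8 GPUs x 2 images/GPU doubles the global batch size, so lr doubles.
print(scale_lr(0.008, batch_size_default=2, gpus_default=4,
               batch_size_new=2, gpus_new=8))  # -> 0.016
```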
## Data Preparation

### DOTA Dataset Preparation

The DOTA dataset is a large-scale remote sensing image dataset containing both rotated-box and horizontal-box annotations. It can be downloaded from the [official DOTA website](https://captain-whu.github.io/DOTA/). After decompression, the dataset directory is structured as follows:

```
${DOTA_ROOT}
├── test
│   └── images
├── train
│   ├── images
│   └── labelTxt
└── val
    ├── images
    └── labelTxt
```

For labeled data, each image corresponds to a txt file with the same name, in which each line describes one rotated bounding box in the following format:

```
x1 y1 x2 y2 x3 y3 x4 y4 class_name difficult
```
#### Single-Scale Slicing

The images in the DOTA dataset have high resolution, so they are usually sliced offline before training and testing. To slice the images at a single scale, run:

``` bash
# slice labeled data
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
    --output_dir ${OUTPUT_DIR}/trainval1024/ \
    --coco_json_file DOTA_trainval1024.json \
    --subsize 1024 \
    --gap 200 \
    --rates 1.0

# slice unlabeled data by setting --image_only
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/test/ \
    --output_dir ${OUTPUT_DIR}/test1024/ \
    --coco_json_file DOTA_test1024.json \
    --subsize 1024 \
    --gap 200 \
    --rates 1.0 \
    --image_only
```

#### Multi-Scale Slicing

To slice the images at multiple scales, run:

``` bash
# slice labeled data
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
    --output_dir ${OUTPUT_DIR}/trainval/ \
    --coco_json_file DOTA_trainval1024.json \
    --subsize 1024 \
    --gap 500 \
    --rates 0.5 1.0 1.5

# slice unlabeled data by setting --image_only
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/test/ \
    --output_dir ${OUTPUT_DIR}/test1024/ \
    --coco_json_file DOTA_test1024.json \
    --subsize 1024 \
    --gap 500 \
    --rates 0.5 1.0 1.5 \
    --image_only
```

### Custom Dataset

Rotated object detection uses the standard COCO data format; you can convert your dataset to COCO format to train models. The annotations in standard COCO format contain the following information:

``` python
'annotations': [
    {
        'id': 2083, 'category_id': 9, 'image_id': 9008,
        'bbox': [x, y, w, h],  # horizontal bounding box
        'segmentation': [[x1, y1, x2, y2, x3, y3, x4, y4]],  # rotated bounding box
        ...
    }
    ...
]
```

**Note that `bbox` is a horizontal bounding box and `segmentation` holds the four corner points of the rotated bounding box (in either clockwise or counterclockwise order). `bbox` may be omitted when training a rotated detector; it is generally recommended to generate it from the rotated-box annotation in `segmentation`.** In PaddleDetection 2.4 and earlier versions, `bbox` was the rotated box [x, y, w, h, angle] and `segmentation` was omitted. **This format is no longer supported; please download the latest dataset or convert your data to the standard COCO format.**
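As a minimal sketch, the horizontal `bbox` can be generated from the rotated-box `segmentation` like this (an illustrative helper, not a PaddleDetection API):

``` python
def poly_to_hbb(segmentation):
    """Compute the axis-aligned COCO box [x, y, w, h] enclosing a 4-point polygon."""
    xs = segmentation[0][0::2]  # x1, x2, x3, x4
    ys = segmentation[0][1::2]  # y1, y2, y3, y4
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

# Example: a square rotated by 45 degrees around (10, 10)
print(poly_to_hbb([[10, 5, 15, 10, 10, 15, 5, 10]]))  # -> [5, 5, 10, 10]
```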
## Installation

Rotated object detection models depend on external operators for training and evaluation. On Linux, compile and install them with:

```
cd ppdet/ext_op
python setup.py install
```

On Windows, install them with the following steps (VS2017 is used as the example here):

(1) Install Visual Studio (Visual Studio 2015 Update 3 or later is required);

(2) Open Start --> Visual Studio 2017 --> x64 Native Tools Command Prompt for VS 2017;

(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;

(4) Enter the `PaddleDetection/ppdet/ext_op` directory and install with `python setup.py install`.

After installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
@@ -0,0 +1,129 @@
English | [简体中文](README.md)

# Rotated Object Detection

## Table of Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Data Preparation](#Data-Preparation)
- [Installation](#Installation)

## Introduction

Rotated object detection is used to detect rectangular bounding boxes with angle information, that is, boxes whose sides are no longer parallel to the image coordinate axes. Oriented bounding boxes generally contain less background information than horizontal bounding boxes, and rotated object detection is often used in remote sensing scenarios.

## Model Zoo

| Model | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| [S2ANet](./s2anet/README_en.md) | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |
| [FCOSR](./fcosr/README_en.md) | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 73.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| [PP-YOLOE-R-s](./ppyoloe_r/README_en.md) | 79.42 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 77.64 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| [PP-YOLOE-R-m](./ppyoloe_r/README_en.md) | 79.71 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 78.14 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| [PP-YOLOE-R-l](./ppyoloe_r/README_en.md) | 80.02 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 78.28 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| [PP-YOLOE-R-x](./ppyoloe_r/README_en.md) | 80.73 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |

**Notes:**

- If the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested with single scale by default. If `MS` is indicated in the data augmentation column, multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, RandomRotate data augmentation is used for training.

## Data Preparation

### DOTA Dataset Preparation

The DOTA dataset is a large-scale remote sensing image dataset containing annotations of oriented and horizontal bounding boxes. The dataset can be downloaded from the [official website of the DOTA dataset](https://captain-whu.github.io/DOTA/). Once decompressed, its directory structure is as follows:

```
${DOTA_ROOT}
├── test
│   └── images
├── train
│   ├── images
│   └── labelTxt
└── val
    ├── images
    └── labelTxt
```

For labeled data, each image corresponds to a txt file with the same name, and each line in the txt file represents one rotated bounding box in the following format (see the parsing sketch below):

```
x1 y1 x2 y2 x3 y3 x4 y4 class_name difficult
```
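A minimal sketch of reading such a label file is shown next; `parse_dota_label` is a hypothetical helper for illustration (the repo's own slicing tools handle this internally):

``` python
def parse_dota_label(txt_path):
    """Parse one DOTA labelTxt file into a list of annotation dicts."""
    annotations = []
    with open(txt_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 10:  # skip metadata lines such as 'imagesource:...' or 'gsd:...'
                continue
            *coords, class_name, difficult = parts
            annotations.append({
                'poly': [float(v) for v in coords],   # x1 y1 x2 y2 x3 y3 x4 y4
                'class_name': class_name,
                'difficult': int(difficult),
            })
    return annotations
```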
#### Slicing Data with a Single Scale

The image resolution of the DOTA dataset is relatively high, so the images are usually sliced offline before training and testing. To slice the images with a single scale, run the commands below:

``` bash
# slice labeled data
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
    --output_dir ${OUTPUT_DIR}/trainval1024/ \
    --coco_json_file DOTA_trainval1024.json \
    --subsize 1024 \
    --gap 200 \
    --rates 1.0

# slice unlabeled data by setting --image_only
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/test/ \
    --output_dir ${OUTPUT_DIR}/test1024/ \
    --coco_json_file DOTA_test1024.json \
    --subsize 1024 \
    --gap 200 \
    --rates 1.0 \
    --image_only
```

#### Slicing Data with Multiple Scales

To slice the images with multiple scales, run the commands below:

``` bash
# slice labeled data
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/train/ ${DOTA_ROOT}/val/ \
    --output_dir ${OUTPUT_DIR}/trainval/ \
    --coco_json_file DOTA_trainval1024.json \
    --subsize 1024 \
    --gap 500 \
    --rates 0.5 1.0 1.5

# slice unlabeled data by setting --image_only
python configs/rotate/tools/prepare_data.py \
    --input_dirs ${DOTA_ROOT}/test/ \
    --output_dir ${OUTPUT_DIR}/test1024/ \
    --coco_json_file DOTA_test1024.json \
    --subsize 1024 \
    --gap 500 \
    --rates 0.5 1.0 1.5 \
    --image_only
```

### Custom Dataset

Rotated object detection uses the standard COCO data format, and you can convert your dataset to COCO format to train the model. The annotations in standard COCO format contain the following information:

``` python
'annotations': [
    {
        'id': 2083, 'category_id': 9, 'image_id': 9008,
        'bbox': [x, y, w, h],  # horizontal bounding box
        'segmentation': [[x1, y1, x2, y2, x3, y3, x4, y4]],  # rotated bounding box
        ...
    }
    ...
]
```

**Note that `bbox` is the horizontal bounding box and `segmentation` holds the four corner points of the rotated bounding box (in either clockwise or counterclockwise order). `bbox` can be omitted when training a rotated object detector, and it is recommended to generate it from the rotated-box annotation in `segmentation`.** In PaddleDetection 2.4 and earlier versions, `bbox` represented the rotated bounding box [x, y, w, h, angle] and `segmentation` was omitted. **This format is no longer supported after PaddleDetection 2.5; please download the latest dataset or convert your data to the standard COCO format.**

## Installation

Rotated object detection models depend on external operators for training, evaluation, etc. On Linux, you can compile and install them with:

```
cd ppdet/ext_op
python setup.py install
```

On Windows, perform the following steps to install them:

(1) Install Visual Studio (Visual Studio 2015 Update 3 or later is required);

(2) Go to Start --> Visual Studio 2017 --> x64 Native Tools Command Prompt for VS 2017;

(3) Set the environment variable: `set DISTUTILS_USE_SDK=1`;

(4) Enter the `ppdet/ext_op` directory and run `python setup.py install`.

After installation, you can run the unit tests under `ppdet/ext_op/unittest` to verify that the external operators are installed correctly.
@@ -0,0 +1,91 @@
Simplified Chinese | [English](README_en.md)

# FCOSR

## Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Citations](#Citations)

## Introduction

[FCOSR](https://arxiv.org/abs/2111.10780) is a one-stage, anchor-free rotated object detector based on [FCOS](https://arxiv.org/abs/1904.01355). FCOSR focuses on the label assignment strategy for rotated boxes, proposing an ellipse center sampling method and a fuzzy sample assignment strategy. For the loss, FCOSR adopts [ProbIoU](https://arxiv.org/abs/2106.06072) to avoid the boundary discontinuity problem.

## Model Zoo

| Model | Backbone | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |

**Notes:**

- If the **GPU number** or **batch size** is changed, the learning rate should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested at a single scale by default. If `MS` appears in the data augmentation column, multi-scale training and multi-scale testing are used; if `RR` appears, RandomRotate data augmentation is used during training.

## Getting Started

Refer to [Data Preparation](../README.md#Data-Preparation) to prepare the data.

### Training

Single-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
```

Multi-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
```

### Inference

Run the following command to run inference on a single image; the results are saved in the `output` directory by default:
``` bash
python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```

### Evaluation on the DOTA Dataset

Following the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on DOTA requires a zip file that contains the detection results for all test images. The results for each category are stored in one txt file, in which each line takes the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4` (see the sketch after the commands for the expected file layout). Submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the predictions on the test set:
``` bash
python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True
```
Then process the predictions into the format required by the evaluation server:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```
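For reference, each per-class txt file under `submit/` contains one detection per line in the `image_name score x1 y1 x2 y2 x3 y3 x4 y4` format. The following minimal sketch, using a hypothetical in-memory list of detections, writes files with that layout (the `Task1_<class>.txt` naming follows the DOTA Task1 submission convention):

``` python
import os
from collections import defaultdict

# Hypothetical detections: (image_name, class_name, score, polygon coords)
detections = [
    ('P0006', 'plane', 0.97, [188.0, 206.0, 268.0, 206.0, 268.0, 288.0, 188.0, 288.0]),
]

by_class = defaultdict(list)
for image_name, class_name, score, poly in detections:
    by_class[class_name].append((image_name, score, poly))

os.makedirs('submit', exist_ok=True)
for class_name, rows in by_class.items():
    with open(os.path.join('submit', f'Task1_{class_name}.txt'), 'w') as f:
        for image_name, score, poly in rows:
            coords = ' '.join(f'{v:.1f}' for v in poly)
            f.write(f'{image_name} {score:.3f} {coords}\n')
```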
## Deployment

For deployment, please refer to the [deployment tutorial](../../../deploy/README.md).

## Citations

```
@article{li2021fcosr,
  title={Fcosr: A simple anchor-free rotated detector for aerial object detection},
  author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen},
  journal={arXiv preprint arXiv:2111.10780},
  year={2021}
}

@inproceedings{tian2019fcos,
  title={Fcos: Fully convolutional one-stage object detection},
  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={9627--9636},
  year={2019}
}

@article{llerena2021gaussian,
  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
  journal={arXiv preprint arXiv:2106.06072},
  year={2021}
}
```
@@ -0,0 +1,92 @@
English | [简体中文](README.md)

# FCOSR

## Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Citations](#Citations)

## Introduction

[FCOSR](https://arxiv.org/abs/2111.10780) is a one-stage anchor-free model based on [FCOS](https://arxiv.org/abs/1904.01355). FCOSR focuses on the label assignment strategy for oriented bounding boxes and proposes an ellipse center sampling method and a fuzzy sample assignment strategy. In terms of loss, FCOSR uses [ProbIoU](https://arxiv.org/abs/2106.06072) to avoid the boundary discontinuity problem.

## Model Zoo

| Model | Backbone | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| FCOSR-M | ResNeXt-50 | 76.62 | 3x | oc | RR | 4 | 4 | [model](https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr/fcosr_x50_3x_dota.yml) |

**Notes:**

- If the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested with single scale by default. If `MS` is indicated in the data augmentation column, multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, RandomRotate data augmentation is used for training.

## Getting Started

Refer to [Data Preparation](../README_en.md#Data-Preparation) to prepare the data.

### Training

Single GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
```

Multi-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml
```

### Inference

Run the following command to run inference on a single image; the result is saved in the `output` directory by default:

``` bash
python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```

### Evaluation on DOTA Dataset

Referring to the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which is in the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the inference results on the test dataset:
``` bash
python tools/infer.py -c configs/rotate/fcosr/fcosr_x50_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/fcosr_x50_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_fcosr --visualize=False --save_results=True
```
Process the prediction results into the format required by the official evaluation server:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_fcosr/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```

## Deployment

For deployment, please refer to the [deployment tutorial](../../../deploy/README_en.md).

## Citations

```
@article{li2021fcosr,
  title={Fcosr: A simple anchor-free rotated detector for aerial object detection},
  author={Li, Zhonghua and Hou, Biao and Wu, Zitong and Jiao, Licheng and Ren, Bo and Yang, Chen},
  journal={arXiv preprint arXiv:2111.10780},
  year={2021}
}

@inproceedings{tian2019fcos,
  title={Fcos: Fully convolutional one-stage object detection},
  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={9627--9636},
  year={2019}
}

@article{llerena2021gaussian,
  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
  journal={arXiv preprint arXiv:2106.06072},
  year={2021}
}
```
@@ -0,0 +1,46 @@
worker_num: 4
image_height: &image_height 1024
image_width: &image_width 1024
image_size: &image_size [*image_height, *image_width]

TrainReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RandomRFlip: {}
    - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]}
    - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'}
  batch_transforms:
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadRGT: {}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 4
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  collate_batch: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
@@ -0,0 +1,44 @@
architecture: YOLOv3
snapshot_epoch: 1
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNeXt50_32x4d_pretrained.pdparams

YOLOv3:
  backbone: ResNet
  neck: FPN
  yolo_head: FCOSRHead
  post_process: ~

ResNet:
  depth: 50
  groups: 32
  base_width: 4
  variant: b
  norm_type: bn
  freeze_at: 0
  return_idx: [1, 2, 3]
  num_stages: 4

FPN:
  out_channel: 256
  extra_stage: 2
  has_extra_convs: true
  use_c5: false
  relu_before_extra_convs: true

FCOSRHead:
  feat_channels: 256
  fpn_strides: [8, 16, 32, 64, 128]
  stacked_convs: 4
  loss_weight: {class: 1.0, probiou: 1.0}
  assigner:
    name: FCOSRAssigner
    factor: 12
    threshold: 0.23
    boundary: [[-1, 64], [64, 128], [128, 256], [256, 512], [512, 100000000.0]]
  nms:
    name: MultiClassNMS
    nms_top_k: 2000
    keep_top_k: -1
    score_threshold: 0.1
    nms_threshold: 0.1
    normalized: False
@@ -0,0 +1,20 @@
epoch: 36

LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [24, 33]
    - !LinearWarmup
      start_factor: 0.3333333
      steps: 500

OptimizerBuilder:
  clip_grad_by_norm: 35.
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
@@ -0,0 +1,9 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/fcosr_reader.yml',
  '_base_/fcosr_x50.yml'
]

weights: output/fcosr_x50_3x_dota/model_final
@@ -0,0 +1,178 @@
Simplified Chinese | [English](README_en.md)

# PP-YOLOE-R

## Table of Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Appendix](#Appendix)
- [Citations](#Citations)

## Introduction

PP-YOLOE-R is an efficient one-stage, anchor-free rotated object detector. Based on PP-YOLOE, PP-YOLOE-R introduces a bag of useful tricks that improve detection accuracy at the cost of only marginal extra parameters and computation. On the DOTA 1.0 dataset, PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP respectively with single-scale training and testing, surpassing almost all other rotated object detectors. With multi-scale training and testing, the accuracy of PP-YOLOE-R-l and PP-YOLOE-R-x further improves to 80.02 and 80.73 mAP; in this setting, PP-YOLOE-R-x surpasses all anchor-free methods and is nearly on par with state-of-the-art anchor-based two-stage models. Moreover, PP-YOLOE-R-s and PP-YOLOE-R-m reach 79.42 and 79.71 mAP with multi-scale training and testing, which is an excellent result considering the parameter counts and FLOPs of these two models. While maintaining high accuracy, PP-YOLOE-R avoids special operators such as Deformable Convolution and Rotated RoI Align, so it can be easily deployed on a wide variety of hardware. At an input resolution of 1024x1024, PP-YOLOE-R-s/m/l/x reach 69.8/55.1/48.3/37.1 FPS on an RTX 2080 Ti and 114.5/86.8/69.7/50.7 FPS on a Tesla V100 with TensorRT and FP16 precision. For more details, please refer to our [**technical report**](https://arxiv.org/abs/2211.02386).

<div align="center">
  <img src="../../../docs/images/ppyoloe_r_map_fps.png" width=500 />
</div>

Compared with PP-YOLOE, PP-YOLOE-R makes the following changes:
- Rotated Task Alignment Learning
- Decoupled Angle Prediction Head
- Angle Prediction with DFL
- Learnable Gating Unit for RepVGG
- [ProbIoU Loss](https://arxiv.org/abs/2106.06072)

## Model Zoo

| Model | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:--------:|:----:|:--------------------:|:------------------------:|:----------:|:---------:|:--------:|:----------:|:-------:|:------:|:-----------:|:--------:|:------:|
| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 23.96 | 127.00 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 23.96 | 127.00 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 100.27 | 529.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 100.27 | 529.82 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |

**Notes:**

- If the **GPU number** or **batch size** is changed, the learning rate should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested at a single scale by default. If `MS` appears in the Aug column, multi-scale training and multi-scale testing are used; if `RR` appears, RandomRotate data augmentation is used during training.
- CRN denotes the CSPRepResNet proposed in PP-YOLOE.
- The parameters and FLOPs of PP-YOLOE-R are computed after re-parameterization, with an input image resolution of 1024x1024.
- Speed is measured with TensorRT 8.2.3 by averaging over 2000 images from the DOTA test set. Refer to [Speed Testing](#Speed-Testing) to reproduce the results.

## Getting Started

Refer to [Data Preparation](../README.md#Data-Preparation) to prepare the data.

### Training

Single-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```

Multi-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```

### Inference

Run the following command to run inference on a single image; the results are saved in the `output` directory by default:
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```

### Evaluation on the DOTA Dataset

Following the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on DOTA requires a zip file that contains the detection results for all test images. The results for each category are stored in one txt file, in which each line takes the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. Submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the predictions on the test set:
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True
```
Then process the predictions into the format required by the evaluation server:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```

### Speed Testing

Speed can be measured in Paddle mode or Paddle-TRT mode. When testing with Paddle-TRT, make sure that **the TensorRT version is later than 8.2 and the PaddlePaddle version is the develop version**. To benchmark with Paddle-TRT, run:

``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True

# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16
```
To benchmark with Paddle only, run:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams

# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode paddle
```

## Deployment

To deploy **with Paddle**, run:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams

# run inference on a single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu
```

To deploy **with Paddle-TRT**, run:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True

# run inference on a single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu
```

**Notes:**
- When using Paddle-TRT, make sure that **the PaddlePaddle version is the develop version and the TensorRT version is later than 8.2**.

To deploy **with ONNX Runtime**, run:
``` bash
# export the inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams export_onnx=True

# install paddle2onnx
pip install paddle2onnx

# convert to an ONNX model
paddle2onnx --model_dir output_inference/ppyoloe_r_crn_l_3x_dota --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_r_crn_l_3x_dota.onnx

# run inference on a single image
python configs/rotate/tools/onnx_infer.py --infer_cfg output_inference/ppyoloe_r_crn_l_3x_dota/infer_cfg.yml --onnx_file ppyoloe_r_crn_l_3x_dota.onnx --image_file demo/P0072__1.0__0___0.png
```

## Appendix

Ablation study of PP-YOLOE-R:

| Model | mAP | Params (M) | FLOPs (G) |
| :-: | :-: | :------: | :------: |
| Baseline | 75.61 | 50.65 | 269.09 |
| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 |
| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 |
| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 |
| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 |

## Citations

```
@article{wang2022pp,
  title={PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector},
  author={Wang, Xinxin and Wang, Guanzhong and Dang, Qingqing and Liu, Yi and Hu, Xiaoguang and Yu, Dianhai},
  journal={arXiv preprint arXiv:2211.02386},
  year={2022}
}

@article{xu2022pp,
  title={PP-YOLOE: An evolved version of YOLO},
  author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others},
  journal={arXiv preprint arXiv:2203.16250},
  year={2022}
}

@article{llerena2021gaussian,
  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
  journal={arXiv preprint arXiv:2106.06072},
  year={2021}
}
```
@@ -0,0 +1,180 @@
English | [简体中文](README.md)

# PP-YOLOE-R

## Contents

- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Appendix](#Appendix)
- [Citations](#Citations)

## Introduction

PP-YOLOE-R is an efficient anchor-free rotated object detector. Based on PP-YOLOE, PP-YOLOE-R introduces a bag of useful tricks to improve detection precision at the expense of marginal parameters and computation. PP-YOLOE-R-l and PP-YOLOE-R-x achieve 78.14 and 78.28 mAP respectively on the DOTA 1.0 dataset with single-scale training and testing, which outperforms almost all other rotated object detectors. With multi-scale training and testing, the detection precision of PP-YOLOE-R-l and PP-YOLOE-R-x is further improved to 80.02 and 80.73 mAP. In this case, PP-YOLOE-R-x surpasses all anchor-free methods and demonstrates performance competitive with state-of-the-art anchor-based two-stage models. Moreover, PP-YOLOE-R-s and PP-YOLOE-R-m achieve 79.42 and 79.71 mAP with multi-scale training and testing, which is an excellent result considering the parameters and FLOPs of these two models. While maintaining high precision, PP-YOLOE-R avoids special operators, such as Deformable Convolution or Rotated RoI Align, so that it can be easily deployed on various hardware. At an input resolution of 1024x1024, PP-YOLOE-R-s/m/l/x can reach 69.8/55.1/48.3/37.1 FPS on an RTX 2080 Ti and 114.5/86.8/69.7/50.7 FPS on a Tesla V100 GPU with TensorRT and FP16 precision. For more details, please refer to our [**technical report**](https://arxiv.org/abs/2211.02386).

<div align="center">
  <img src="../../../docs/images/ppyoloe_r_map_fps.png" width=500 />
</div>

Compared with PP-YOLOE, PP-YOLOE-R has made the following changes:
- Rotated Task Alignment Learning
- Decoupled Angle Prediction Head
- Angle Prediction with DFL
- Learnable Gating Unit for RepVGG
- [ProbIoU Loss](https://arxiv.org/abs/2106.06072)

## Model Zoo

| Model | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:-----:|:--------:|:----:|:-------------------:|:--------------------------:|:-----------:|:---------:|:--------:|:-----:|:---:|:----------:|:----------:|:--------:|:------:|
| PP-YOLOE-R-s | CRN-s | 73.82 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml) |
| PP-YOLOE-R-s | CRN-s | 79.42 | 114.5 | 69.8 | 8.09 | 43.46 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_s_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml) |
| PP-YOLOE-R-m | CRN-m | 77.64 | 86.8 | 55.1 | 23.96 | 127.00 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota.yml) |
| PP-YOLOE-R-m | CRN-m | 79.71 | 86.8 | 55.1 | 23.96 | 127.00 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_m_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_m_3x_dota_ms.yml) |
| PP-YOLOE-R-l | CRN-l | 78.14 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml) |
| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
| PP-YOLOE-R-x | CRN-x | 78.28 | 50.7 | 37.1 | 100.27 | 529.82 | 3x | oc | RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota.yml) |
| PP-YOLOE-R-x | CRN-x | 80.73 | 50.7 | 37.1 | 100.27 | 529.82 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_x_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_x_3x_dota_ms.yml) |

**Notes:**

- If the **GPU number** or **mini-batch size** is changed, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested with single scale by default. If `MS` is indicated in the data augmentation column, multi-scale training and multi-scale testing are used. If `RR` is indicated in the data augmentation column, RandomRotate data augmentation is used for training.
- CRN denotes the CSPRepResNet proposed in PP-YOLOE.
- The parameters and FLOPs of PP-YOLOE-R are calculated after re-parameterization, with an input image resolution of 1024x1024.
- Speed is calculated and averaged by testing 2000 images of the DOTA test dataset with TensorRT 8.2.3. Refer to [Speed Testing](#Speed-Testing) to reproduce the results.

## Getting Started

Refer to [Data Preparation](../README_en.md#Data-Preparation) to prepare the data.

### Training

Single GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```

Multi-GPU training:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml
```

### Inference

Run the following command to run inference on a single image; the result is saved in the `output` directory by default:

``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_img=demo/P0861__1.0__1154___824.png --draw_threshold=0.5
```

### Evaluation on DOTA Dataset

Referring to the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results of each category are stored in a txt file, each line of which is in the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can run the following command to get the inference results on the test dataset:
``` bash
python tools/infer.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output_ppyoloe_r --visualize=False --save_results=True
```
Process the prediction results into the format required by the official evaluation server:
``` bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output_ppyoloe_r/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```

### Speed Testing

You can use Paddle mode or Paddle-TRT mode for speed testing. When using Paddle-TRT for speed testing, make sure that **the TensorRT version is later than 8.2 and the PaddlePaddle version is the develop version**. To test speed with Paddle-TRT, run the following commands:

``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True

# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode trt_fp16
```
To test speed with Paddle only, run the following commands:
``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams

# speed testing
CUDA_VISIBLE_DEVICES=0 python configs/rotate/tools/inference_benchmark.py --model_dir output_inference/ppyoloe_r_crn_l_3x_dota/ --image_dir /path/to/dota/test/dir --run_mode paddle
```

## Deployment

To deploy **with Paddle**, run the following commands:

``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams

# inference on a single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=paddle --device=gpu
```

To deploy **with Paddle-TRT**, run the following commands:

``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams trt=True

# inference on a single image
python deploy/python/infer.py --image_file demo/P0072__1.0__0___0.png --model_dir=output_inference/ppyoloe_r_crn_l_3x_dota --run_mode=trt_fp16 --device=gpu
```
**Notes:**
- When using Paddle-TRT, make sure that **the TensorRT version is later than 8.2 and the PaddlePaddle version is the develop version**.

To deploy **with ONNX Runtime**, run the following commands:

``` bash
# export inference model
python tools/export_model.py -c configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota.pdparams export_onnx=True

# install paddle2onnx
pip install paddle2onnx

# convert to an onnx model
paddle2onnx --model_dir output_inference/ppyoloe_r_crn_l_3x_dota --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_r_crn_l_3x_dota.onnx

# inference on a single image
python configs/rotate/tools/onnx_infer.py --infer_cfg output_inference/ppyoloe_r_crn_l_3x_dota/infer_cfg.yml --onnx_file ppyoloe_r_crn_l_3x_dota.onnx --image_file demo/P0072__1.0__0___0.png
```

## Appendix

Ablation experiments of PP-YOLOE-R:

| Model | mAP | Params (M) | FLOPs (G) |
| :-: | :-: | :------: | :------: |
| Baseline | 75.61 | 50.65 | 269.09 |
| +Rotated Task Alignment Learning | 77.24 | 50.65 | 269.09 |
| +Decoupled Angle Prediction Head | 77.78 | 52.20 | 272.72 |
| +Angle Prediction with DFL | 78.01 | 53.29 | 281.65 |
| +Learnable Gating Unit for RepVGG | 78.14 | 53.29 | 281.65 |

## Citations

```
@article{wang2022pp,
  title={PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector},
  author={Wang, Xinxin and Wang, Guanzhong and Dang, Qingqing and Liu, Yi and Hu, Xiaoguang and Yu, Dianhai},
  journal={arXiv preprint arXiv:2211.02386},
  year={2022}
}

@article{xu2022pp,
  title={PP-YOLOE: An evolved version of YOLO},
  author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others},
  journal={arXiv preprint arXiv:2203.16250},
  year={2022}
}

@article{llerena2021gaussian,
  title={Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection},
  author={Llerena, Jeffri M and Zeni, Luis Felipe and Kristen, Lucas N and Jung, Claudio},
  journal={arXiv preprint arXiv:2106.06072},
  year={2021}
}
```
@@ -0,0 +1,19 @@
epoch: 36

LearningRate:
  base_lr: 0.008
  schedulers:
    - !CosineDecay
      max_epochs: 44
    - !LinearWarmup
      start_factor: 0.
      steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35.
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2
@@ -0,0 +1,49 @@
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

YOLOv3:
  backbone: CSPResNet
  neck: CustomCSPPAN
  yolo_head: PPYOLOERHead
  post_process: ~

CSPResNet:
  layers: [3, 6, 6, 3]
  channels: [64, 128, 256, 512, 1024]
  return_idx: [1, 2, 3]
  use_large_stem: True
  use_alpha: True

CustomCSPPAN:
  out_channels: [768, 384, 192]
  stage_num: 1
  block_num: 3
  act: 'swish'
  spp: true
  use_alpha: True

PPYOLOERHead:
  fpn_strides: [32, 16, 8]
  grid_cell_offset: 0.5
  use_varifocal_loss: true
  static_assigner_epoch: -1
  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.05}
  static_assigner:
    name: FCOSRAssigner
    factor: 12
    threshold: 0.23
    boundary: [[512, 10000], [256, 512], [-1, 256]]
  assigner:
    name: RotatedTaskAlignedAssigner
    topk: 13
    alpha: 1.0
    beta: 6.0
  nms:
    name: MultiClassNMS
    nms_top_k: 2000
    keep_top_k: -1
    score_threshold: 0.1
    nms_threshold: 0.1
    normalized: False
@@ -0,0 +1,46 @@
worker_num: 4
image_height: &image_height 1024
image_width: &image_width 1024
image_size: &image_size [*image_height, *image_width]

TrainReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RandomRFlip: {}
    - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]}
    - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'}
  batch_transforms:
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadRGT: {}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  collate_batch: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_l_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams
depth_mult: 1.0
width_mult: 1.0
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota_ms.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_l_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_l_pretrained.pdparams
depth_mult: 1.0
width_mult: 1.0
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_m_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams
depth_mult: 0.67
width_mult: 0.75
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota_ms.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_m_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_m_pretrained.pdparams
depth_mult: 0.67
width_mult: 0.75
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_s_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota_ms.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_s_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_x_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams
depth_mult: 1.33
width_mult: 1.25
@@ -0,0 +1,15 @@
_BASE_: [
  '../../datasets/dota_ms.yml',
  '../../runtime.yml',
  '_base_/optimizer_3x.yml',
  '_base_/ppyoloe_r_reader.yml',
  '_base_/ppyoloe_r_crn.yml'
]

log_iter: 50
snapshot_epoch: 1
weights: output/ppyoloe_r_crn_x_3x_dota/model_final

pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_x_pretrained.pdparams
depth_mult: 1.33
width_mult: 1.25
@@ -0,0 +1,104 @@
简体中文 | [English](README_en.md)

# S2ANet

## Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Citations](#Citations)

## Introduction

[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated boxes.

## Model Zoo

| Model | Conv Type | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| S2ANet | Conv | 71.45 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |

**Notes:**

- If the **number of GPUs** or the **batch size** changes, the learning rate should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**; a worked example follows this list.
- Models in the model zoo are trained and tested with a single scale by default. `MS` in the data augmentation column means multi-scale training and multi-scale testing are used; `RR` means RandomRotate data augmentation is used for training.
- `multiclass_nms` is used here, which differs slightly from the NMS used by the original authors.
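
For example, with the default setting in the table above (4 GPUs, 2 images per GPU) and the 2x schedule's base learning rate of 0.01, training on a single GPU with 2 images per GPU gives lr<sub>new</sub> = 0.01 * (2 * 1) / (2 * 4) = 0.0025.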

## Getting Started

Refer to [Data Preparation](../README.md#数据准备) to prepare the data.

### 1. Training

Single-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```

Multi-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```

You can add `--eval` to evaluate during training.

### 2. Evaluation
```bash
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams

# evaluate with the provided pretrained model
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```

### 3. Prediction
Executing the following command will save the image prediction results to the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided pretrained model:
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```

### 4. DOTA Dataset Evaluation
Executing the following command will save the prediction results for each image to a txt file with the same name under the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
```
Referring to the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), evaluating on the DOTA dataset requires a zip file that contains all detection results. The results for each category are stored in one txt file, each line of which has the format `image_name score x1 y1 x2 y2 x3 y3 x4 y4`. Submit the generated zip file to Task 1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can generate the submission file with the following commands
```bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```

## Deployment

The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not need to depend on a rotated box IoU operator.

For the deployment tutorial, please refer to [Deployment](../../../deploy/README.md)


## Citations
```
@article{han2021align,
  author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Align Deep Features for Oriented Object Detection},
  year={2021},
  pages={1-11},
  doi={10.1109/TGRS.2021.3062048}}

@inproceedings{xia2018dota,
  title={DOTA: A large-scale dataset for object detection in aerial images},
  author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3974--3983},
  year={2018}
}
```
@@ -0,0 +1,102 @@
English | [简体中文](README.md)

# S2ANet

## Content
- [Introduction](#Introduction)
- [Model Zoo](#Model-Zoo)
- [Getting Started](#Getting-Started)
- [Deployment](#Deployment)
- [Citations](#Citations)

## Introduction

[S2ANet](https://arxiv.org/pdf/2008.09397.pdf) is a model for detecting rotated objects.

## Model Zoo
| Model | Conv Type | mAP | Lr Scheduler | Angle | Aug | GPU Number | images/GPU | download | config |
|:---:|:------:|:----:|:---------:|:-----:|:--------:|:-----:|:------------:|:-------:|:------:|
| S2ANet | Conv | 71.45 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_conv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_conv_2x_dota.yml) |
| S2ANet | AlignConv | 73.84 | 2x | le135 | - | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml) |

**Notes:**
- If the **GPU number** or the **mini-batch size** changes, the **learning rate** should be adjusted according to the formula **lr<sub>new</sub> = lr<sub>default</sub> * (batch_size<sub>new</sub> * GPU_number<sub>new</sub>) / (batch_size<sub>default</sub> * GPU_number<sub>default</sub>)**.
- Models in the model zoo are trained and tested with a single scale by default. If `MS` is indicated in the data augmentation column, multi-scale training and multi-scale testing are used. If `RR` is indicated, RandomRotate data augmentation is used for training.
- `multiclass_nms` is used here, which is slightly different from the NMS used by the original authors.

## Getting Started

Refer to [Data-Preparation](../README_en.md#Data-Preparation) to prepare the data.

### 1. Train

Single-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```

Multi-GPU training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rotate/s2anet/s2anet_1x_spine.yml
```

You can add `--eval` to evaluate during training.

### 2. Evaluation
```bash
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams

# evaluate with the provided pretrained model
python tools/eval.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams
```

### 3. Prediction
Executing the following command will save the image prediction results to the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=output/s2anet_1x_spine/model_final.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```
Predict with the provided pretrained model:
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_1x_spine.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_1x_spine.pdparams --infer_img=demo/39006.jpg --draw_threshold=0.3
```

### 4. DOTA Dataset Evaluation
Executing the following command will save the prediction results for each image to a txt file with the same name under the `output` folder.
```bash
python tools/infer.py -c configs/rotate/s2anet/s2anet_alignconv_2x_dota.yml -o weights=https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams --infer_dir=/path/to/test/images --output_dir=output --visualize=False --save_results=True
```
Referring to the [DOTA Task](https://captain-whu.github.io/DOTA/tasks.html), you need to submit a zip file containing the results for all test images for evaluation. The detection results for each category are stored in one txt file, each line of which has the format `image_id score x1 y1 x2 y2 x3 y3 x4 y4`. To evaluate, submit the generated zip file to Task 1 of the [DOTA Evaluation](https://captain-whu.github.io/DOTA/evaluation.html) server. You can generate the submission file with the following commands
```bash
python configs/rotate/tools/generate_result.py --pred_txt_dir=output/ --output_dir=submit/ --data_type=dota10

zip -r submit.zip submit
```
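
Each line of the per-category files under `submit/` then follows the format above; an illustrative (made-up) example line in `plane.txt` would be `P0706 0.95 552.0 667.0 601.0 667.0 601.0 703.0 552.0 703.0`.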

## Deployment

The `multiclass_nms` operator in Paddle accepts quadrilateral inputs, so deployment does not need to depend on a rotated box IoU operator.

For the deployment tutorial, please refer to [Deployment](../../../deploy/README_en.md)


## Citations
```
@article{han2021align,
  author={J. {Han} and J. {Ding} and J. {Li} and G. -S. {Xia}},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Align Deep Features for Oriented Object Detection},
  year={2021},
  pages={1-11},
  doi={10.1109/TGRS.2021.3062048}}

@inproceedings{xia2018dota,
  title={DOTA: A large-scale dataset for object detection in aerial images},
  author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3974--3983},
  year={2018}
}
```
@@ -0,0 +1,52 @@
architecture: S2ANet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
weights: output/s2anet_r50_fpn_1x_dota/model_final.pdparams


# Model Architecture
S2ANet:
  backbone: ResNet
  neck: FPN
  head: S2ANetHead

ResNet:
  depth: 50
  variant: d
  norm_type: bn
  return_idx: [1,2,3]
  num_stages: 4

FPN:
  in_channels: [256, 512, 1024]
  out_channel: 256
  spatial_scales: [0.25, 0.125, 0.0625]
  has_extra_convs: True
  extra_stage: 2
  relu_before_extra_convs: False
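# note: has_extra_convs with extra_stage: 2 appends two extra levels to the three
# in_channels inputs, giving five feature levels to match the five anchor_strides below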

S2ANetHead:
  anchor_strides: [8, 16, 32, 64, 128]
  anchor_scales: [4]
  anchor_ratios: [1.0]
  anchor_assign: RBoxAssigner
  stacked_convs: 2
  feat_in: 256
  feat_out: 256
  align_conv_type: 'AlignConv' # AlignConv or Conv
  align_conv_size: 3
  use_sigmoid_cls: True
  reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.1]
  cls_loss_weight: [1.1, 1.05]
  nms_pre: 2000
  nms:
    name: MultiClassNMS
    keep_top_k: -1
    score_threshold: 0.05
    nms_threshold: 0.1
    normalized: False

RBoxAssigner:
  pos_iou_thr: 0.5
  neg_iou_thr: 0.4
  min_iou_thr: 0.0
  ignore_iof_thr: -2
@@ -0,0 +1,20 @@
epoch: 12

LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [7, 10]
    - !LinearWarmup
      start_factor: 0.3333333333333333
      steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
  clip_grad_by_norm: 35
@@ -0,0 +1,20 @@
epoch: 24

LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [14, 20]
    - !LinearWarmup
      start_factor: 0.3333333333333333
      steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2
  clip_grad_by_norm: 35
@@ -0,0 +1,44 @@
worker_num: 4
TrainReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RandomRFlip: {}
    - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2}
    - Poly2RBox: {rbox_type: 'le135'}
  batch_transforms:
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadRGT: {}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  shuffle: true
  drop_last: true


EvalReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RResize: {target_size: [1024, 1024], keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  shuffle: false
  drop_last: false
  collate_batch: false


TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 2, target_size: [1024, 1024], keep_ratio: True}
    - NormalizeImage: {is_scale: true, mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 1
  shuffle: false
  drop_last: false
@@ -0,0 +1,25 @@
_BASE_: [
  '../../datasets/spine_coco.yml',
  '../../runtime.yml',
  '_base_/s2anet_optimizer_1x.yml',
  '_base_/s2anet.yml',
  '_base_/s2anet_reader.yml',
]

weights: output/s2anet_1x_spine/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/s2anet_alignconv_2x_dota.pdparams

# for 4 cards
LearningRate:
  base_lr: 0.01
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [7, 10]
    - !LinearWarmup
      start_factor: 0.3333333333333333
      epochs: 5

S2ANetHead:
  reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.05]
  cls_loss_weight: [1.05, 1.0]
@@ -0,0 +1,10 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/s2anet_optimizer_2x.yml',
  '_base_/s2anet.yml',
  '_base_/s2anet_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams

weights: output/s2anet_alignconv_2x_dota/model_final
@@ -0,0 +1,19 @@
_BASE_: [
  '../../datasets/dota.yml',
  '../../runtime.yml',
  '_base_/s2anet_optimizer_2x.yml',
  '_base_/s2anet.yml',
  '_base_/s2anet_reader.yml',
]
weights: output/s2anet_conv_1x_dota/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams

ResNet:
  depth: 50
  variant: b
  norm_type: bn
  return_idx: [1,2,3]
  num_stages: 4

S2ANetHead:
  align_conv_type: 'Conv'
@@ -0,0 +1,163 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import json
import cv2
from tqdm import tqdm
from multiprocessing import Pool


def load_dota_info(image_dir, anno_dir, file_name, ext=None):
    base_name, extension = os.path.splitext(file_name)
    if ext and (extension != ext and extension not in ext):
        return None
    info = {'image_file': os.path.join(image_dir, file_name), 'annotation': []}
    anno_file = os.path.join(anno_dir, base_name + '.txt')
    if not os.path.exists(anno_file):
        return info
    with open(anno_file, 'r') as f:
        for line in f:
            items = line.strip().split()
            if (len(items) < 9):
                continue

            anno = {
                'poly': list(map(float, items[:8])),
                'name': items[8],
                'difficult': '0' if len(items) == 9 else items[9],
            }
            info['annotation'].append(anno)

    return info
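
# Note: each DOTA labelTxt line is "x1 y1 x2 y2 x3 y3 x4 y4 category [difficult]",
# e.g. (illustrative values) "100.0 100.0 200.0 100.0 200.0 150.0 100.0 150.0 plane 0",
# which is exactly what the parser above expects.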


def load_dota_infos(root_dir, num_process=8, ext=None):
    image_dir = os.path.join(root_dir, 'images')
    anno_dir = os.path.join(root_dir, 'labelTxt')
    data_infos = []
    if num_process > 1:
        pool = Pool(num_process)
        results = []
        for file_name in os.listdir(image_dir):
            results.append(
                pool.apply_async(load_dota_info, (image_dir, anno_dir,
                                                  file_name, ext)))

        pool.close()
        pool.join()

        for result in results:
            info = result.get()
            if info:
                data_infos.append(info)

    else:
        for file_name in os.listdir(image_dir):
            info = load_dota_info(image_dir, anno_dir, file_name, ext)
            if info:
                data_infos.append(info)

    return data_infos


def process_single_sample(info, image_id, class_names):
    image_file = info['image_file']
    single_image = dict()
    single_image['file_name'] = os.path.split(image_file)[-1]
    single_image['id'] = image_id
    image = cv2.imread(image_file)
    height, width, _ = image.shape
    single_image['width'] = width
    single_image['height'] = height

    # process annotation field
    single_objs = []
    objects = info['annotation']
    for obj in objects:
        poly, name, difficult = obj['poly'], obj['name'], obj['difficult']
        if difficult == '2':
            continue

        single_obj = dict()
        single_obj['category_id'] = class_names.index(name) + 1
        single_obj['segmentation'] = [poly]
        single_obj['iscrowd'] = 0
        xmin, ymin, xmax, ymax = min(poly[0::2]), min(poly[1::2]), max(poly[
            0::2]), max(poly[1::2])
        width, height = xmax - xmin, ymax - ymin
        single_obj['bbox'] = [xmin, ymin, width, height]
        single_obj['area'] = height * width
        single_obj['image_id'] = image_id
        single_objs.append(single_obj)

    return (single_image, single_objs)


def data_to_coco(infos, output_path, class_names, num_process):
    data_dict = dict()
    data_dict['categories'] = []

    for i, name in enumerate(class_names):
        data_dict['categories'].append({
            'id': i + 1,
            'name': name,
            'supercategory': name
        })

    pbar = tqdm(total=len(infos), desc='data to coco')
    images, annotations = [], []
    if num_process > 1:
        pool = Pool(num_process)
        results = []
        for i, info in enumerate(infos):
            image_id = i + 1
            results.append(
                pool.apply_async(
                    process_single_sample, (info, image_id, class_names),
                    callback=lambda x: pbar.update()))

        pool.close()
        pool.join()

        for result in results:
            single_image, single_anno = result.get()
            images.append(single_image)
            annotations += single_anno

    else:
        for i, info in enumerate(infos):
            image_id = i + 1
            single_image, single_anno = process_single_sample(info, image_id,
                                                              class_names)
            images.append(single_image)
            annotations += single_anno
            pbar.update()

    pbar.close()

    for i, anno in enumerate(annotations):
        anno['id'] = i + 1

    data_dict['images'] = images
    data_dict['annotations'] = annotations

    with open(output_path, 'w') as f:
        json.dump(data_dict, f)
@@ -0,0 +1,266 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import re
import glob

import numpy as np
from multiprocessing import Pool
from functools import partial
from shapely.geometry import Polygon
import argparse

wordname_15 = [
    'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
    'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
    'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
    'harbor', 'swimming-pool', 'helicopter'
]

wordname_16 = wordname_15 + ['container-crane']

wordname_18 = wordname_16 + ['airport', 'helipad']

DATA_CLASSES = {
    'dota10': wordname_15,
    'dota15': wordname_16,
    'dota20': wordname_18
}


def rbox_iou(g, p):
    """
    IoU of two rotated boxes, each given as 8 polygon coordinates
    (x1 y1 x2 y2 x3 y3 x4 y4), optionally followed by a score.
    """
    g = np.array(g)
    p = np.array(p)
    g = Polygon(g[:8].reshape((4, 2)))
    p = Polygon(p[:8].reshape((4, 2)))
    g = g.buffer(0)
    p = p.buffer(0)
    if not g.is_valid or not p.is_valid:
        return 0
    inter = Polygon(g).intersection(Polygon(p)).area
    union = g.area + p.area - inter
    if union == 0:
        return 0
    else:
        return inter / union
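
# Example (illustrative values): two unit squares offset by 0.5 along x
# overlap in a 0.5 x 1.0 region, so rbox_iou returns 0.5 / 1.5 = 1/3:
#   a = [0, 0, 1, 0, 1, 1, 0, 1]
#   b = [0.5, 0, 1.5, 0, 1.5, 1, 0.5, 1]
#   rbox_iou(a, b)  # -> 0.333...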


def py_cpu_nms_poly_fast(dets, thresh):
    """
    Fast rotated NMS: filter candidates first with the IoU of their
    horizontal bounding boxes, then compute the exact polygon IoU.

    Args:
        dets: pred results, shape [N, 9] (8 polygon coordinates + score)
        thresh: nms threshold

    Returns: index of keep
    """
    obbs = dets[:, 0:-1]
    x1 = np.min(obbs[:, 0::2], axis=1)
    y1 = np.min(obbs[:, 1::2], axis=1)
    x2 = np.max(obbs[:, 0::2], axis=1)
    y2 = np.max(obbs[:, 1::2], axis=1)
    scores = dets[:, 8]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)

    polys = []
    for i in range(len(dets)):
        tm_polygon = [
            dets[i][0], dets[i][1], dets[i][2], dets[i][3], dets[i][4],
            dets[i][5], dets[i][6], dets[i][7]
        ]
        polys.append(tm_polygon)
    polys = np.array(polys)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        hbb_inter = w * h
        hbb_ovr = hbb_inter / (areas[i] + areas[order[1:]] - hbb_inter)
        # only candidates whose horizontal boxes overlap need the exact
        # (and more expensive) polygon IoU
        h_inds = np.where(hbb_ovr > 0)[0]
        tmp_order = order[h_inds + 1]
        for j in range(tmp_order.size):
            iou = rbox_iou(polys[i], polys[tmp_order[j]])
            hbb_ovr[h_inds[j]] = iou

        inds = np.where(hbb_ovr <= thresh)[0]
        order = order[inds + 1]
    return keep


def poly2origpoly(poly, x, y, rate):
    """
    Map polygon coordinates from a (possibly rescaled) image patch back to
    the original image: undo the patch offset (x, y) and the scale rate.
    """
    origpoly = []
    for i in range(int(len(poly) / 2)):
        tmp_x = float(poly[i * 2] + x) / float(rate)
        tmp_y = float(poly[i * 2 + 1] + y) / float(rate)
        origpoly.append(tmp_x)
        origpoly.append(tmp_y)
    return origpoly
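
# Example (illustrative values): a point at (10, 20) in a patch cut at
# offset (512, 512) from an image upscaled by rate 2.0 maps back to
# ((10 + 512) / 2.0, (20 + 512) / 2.0) = (261.0, 266.0).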


def nmsbynamedict(nameboxdict, nms, thresh):
    """
    Run NMS separately for each original image name.

    Args:
        nameboxdict: dict mapping image name to its detections
        nms: nms function
        thresh: nms threshold

    Returns: nms result as dict
    """
    nameboxnmsdict = {x: [] for x in nameboxdict}
    for imgname in nameboxdict:
        keep = nms(np.array(nameboxdict[imgname]), thresh)
        outdets = []
        for index in keep:
            outdets.append(nameboxdict[imgname][index])
        nameboxnmsdict[imgname] = outdets
    return nameboxnmsdict


def merge_single(output_dir, nms, nms_thresh, pred_class_lst):
    """
    Merge the patch-level detections of one category back to full images
    and write the per-category result file.

    Args:
        output_dir: output dir
        nms: nms function
        nms_thresh: nms threshold
        pred_class_lst: tuple of (class_name, list of pred bbox lines)
    """
    class_name, pred_bbox_list = pred_class_lst
    nameboxdict = {}
    for line in pred_bbox_list:
        splitline = line.split(' ')
        subname = splitline[0]
        splitname = subname.split('__')
        oriname = splitname[0]
        # patch offset in the original image, e.g. '__512___1024' -> x=512, y=1024
        pattern1 = re.compile(r'__\d+___\d+')
        x_y = re.findall(pattern1, subname)
        x_y_2 = re.findall(r'\d+', x_y[0])
        x, y = int(x_y_2[0]), int(x_y_2[1])

        # patch scale rate, e.g. '__1.0__512___'
        pattern2 = re.compile(r'__([\d+\.]+)__\d+___')

        rate = re.findall(pattern2, subname)[0]

        confidence = splitline[1]
        poly = list(map(float, splitline[2:]))
        origpoly = poly2origpoly(poly, x, y, rate)
        det = origpoly
        det.append(confidence)
        det = list(map(float, det))
        if (oriname not in nameboxdict):
            nameboxdict[oriname] = []
        nameboxdict[oriname].append(det)
    nameboxnmsdict = nmsbynamedict(nameboxdict, nms, nms_thresh)

    # write result
    dstname = os.path.join(output_dir, class_name + '.txt')
    with open(dstname, 'w') as f_out:
        for imgname in nameboxnmsdict:
            for det in nameboxnmsdict[imgname]:
                confidence = det[-1]
                bbox = det[0:-1]
                outline = imgname + ' ' + str(confidence) + ' ' + ' '.join(
                    map(str, bbox))
                f_out.write(outline + '\n')


def generate_result(pred_txt_dir,
                    output_dir='output',
                    class_names=wordname_15,
                    nms_thresh=0.1):
    """
    pred_txt_dir: dir of pred txt
    output_dir: dir of output
    class_names: class names of data
    nms_thresh: nms threshold while merging results
    """
    pred_txt_list = glob.glob("{}/*.txt".format(pred_txt_dir))

    # step1: summary pred bbox
    pred_classes = {}
    for class_name in class_names:
        pred_classes[class_name] = []

    for current_txt in pred_txt_list:
        img_id = os.path.split(current_txt)[1]
        img_id = img_id.split('.txt')[0]
        with open(current_txt) as f:
            res = f.readlines()
        for item in res:
            item = item.split(' ')
            pred_class = item[0]
            item[0] = img_id
            pred_bbox = ' '.join(item)
            pred_classes[pred_class].append(pred_bbox)

    pred_classes_lst = []
    for class_name in pred_classes.keys():
        print('class_name: {}, count: {}'.format(
            class_name, len(pred_classes[class_name])))
        pred_classes_lst.append((class_name, pred_classes[class_name]))

    # step2: merge
    pool = Pool(len(class_names))
    nms = py_cpu_nms_poly_fast
    mergesingle_fn = partial(merge_single, output_dir, nms, nms_thresh)
    pool.map(mergesingle_fn, pred_classes_lst)


def parse_args():
    parser = argparse.ArgumentParser(description='generate test results')
    parser.add_argument('--pred_txt_dir', type=str, help='path of pred txt dir')
    parser.add_argument(
        '--output_dir', type=str, default='output', help='path of output dir')
    parser.add_argument(
        '--data_type', type=str, default='dota10', help='data type')
    parser.add_argument(
        '--nms_thresh',
        type=float,
        default=0.1,
        help='nms threshold while merging results')

    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()

    output_dir = args.output_dir
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    class_names = DATA_CLASSES[args.data_type]

    generate_result(args.pred_txt_dir, output_dir, class_names,
                    args.nms_thresh)
    print('done!')
@@ -0,0 +1,378 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import six
import glob
import time
import yaml
import argparse
import cv2
import numpy as np

import paddle
import paddle.version as paddle_version
from paddle.inference import Config, create_predictor, PrecisionType, get_trt_runtime_version

TUNED_TRT_DYNAMIC_MODELS = {'DETR'}


def check_version(version='2.2'):
    err = "PaddlePaddle version {} or higher is required, " \
          "or a suitable develop version is satisfied as well. \n" \
          "Please make sure the version is good with your code.".format(version)

    version_installed = [
        paddle_version.major, paddle_version.minor, paddle_version.patch,
        paddle_version.rc
    ]

    if version_installed == ['0', '0', '0', '0']:
        return

    if version == 'develop':
        raise Exception("PaddlePaddle develop version is required!")

    version_split = version.split('.')

    length = min(len(version_installed), len(version_split))
    for i in six.moves.range(length):
        if version_installed[i] > version_split[i]:
            return
        if version_installed[i] < version_split[i]:
            raise Exception(err)


def check_trt_version(version='8.2'):
    err = "TensorRT version {} or higher is required. " \
          "Please make sure the version is good with your code.".format(version)
    version_split = list(map(int, version.split('.')))
    version_installed = get_trt_runtime_version()
    length = min(len(version_installed), len(version_split))
    for i in six.moves.range(length):
        if version_installed[i] > version_split[i]:
            return
        if version_installed[i] < version_split[i]:
            raise Exception(err)


# preprocess ops
def decode_image(im_file, im_info):
    if isinstance(im_file, str):
        with open(im_file, 'rb') as f:
            im_read = f.read()
        data = np.frombuffer(im_read, dtype='uint8')
        im = cv2.imdecode(data, 1)  # decoded as BGR, but RGB is needed
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    else:
        im = im_file
    im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
    im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
    return im, im_info


class Resize(object):
    def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
        if isinstance(target_size, int):
            target_size = [target_size, target_size]
        self.target_size = target_size
        self.keep_ratio = keep_ratio
        self.interp = interp

    def __call__(self, im, im_info):
        assert len(self.target_size) == 2
        assert self.target_size[0] > 0 and self.target_size[1] > 0
        im_scale_y, im_scale_x = self.generate_scale(im)
        im = cv2.resize(
            im,
            None,
            None,
            fx=im_scale_x,
            fy=im_scale_y,
            interpolation=self.interp)
        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
        im_info['scale_factor'] = np.array(
            [im_scale_y, im_scale_x]).astype('float32')
        return im, im_info

    def generate_scale(self, im):
        origin_shape = im.shape[:2]
        if self.keep_ratio:
            im_size_min = np.min(origin_shape)
            im_size_max = np.max(origin_shape)
            target_size_min = np.min(self.target_size)
            target_size_max = np.max(self.target_size)
            im_scale = float(target_size_min) / float(im_size_min)
            if np.round(im_scale * im_size_max) > target_size_max:
                im_scale = float(target_size_max) / float(im_size_max)
            im_scale_x = im_scale
            im_scale_y = im_scale
        else:
            resize_h, resize_w = self.target_size
            im_scale_y = resize_h / float(origin_shape[0])
            im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x


class Permute(object):
    def __init__(self, ):
        super(Permute, self).__init__()

    def __call__(self, im, im_info):
        # HWC -> CHW
        im = im.transpose((2, 0, 1))
        return im, im_info


class NormalizeImage(object):
    def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
        self.mean = mean
        self.std = std
        self.is_scale = is_scale
        self.norm_type = norm_type

    def __call__(self, im, im_info):
        im = im.astype(np.float32, copy=False)
        if self.is_scale:
            scale = 1.0 / 255.0
            im *= scale

        if self.norm_type == 'mean_std':
            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
            std = np.array(self.std)[np.newaxis, np.newaxis, :]
            im -= mean
            im /= std
        return im, im_info


class PadStride(object):
    def __init__(self, stride=0):
        self.coarsest_stride = stride

    def __call__(self, im, im_info):
        coarsest_stride = self.coarsest_stride
        if coarsest_stride <= 0:
            return im, im_info
        im_c, im_h, im_w = im.shape
        # pad height/width up to the nearest multiple of the stride
        pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
        pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
        padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
        padding_im[:, :im_h, :im_w] = im
        return padding_im, im_info


def preprocess(im, preprocess_ops):
    # process image by preprocess_ops
    im_info = {
        'scale_factor': np.array(
            [1., 1.], dtype=np.float32),
        'im_shape': None,
    }
    im, im_info = decode_image(im, im_info)
    for operator in preprocess_ops:
        im, im_info = operator(im, im_info)
    return im, im_info
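
# A typical call (illustrative; the real op list is built from the exported
# infer_cfg.yml by create_preprocess_ops below):
#   ops = [Resize([1024, 1024]),
#          NormalizeImage([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
#          Permute(), PadStride(32)]
#   im, im_info = preprocess('demo.jpg', ops)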


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--model_dir', type=str, help='directory of inference model')
    parser.add_argument(
        '--run_mode', type=str, default='paddle', help='running mode')
    parser.add_argument('--batch_size', type=int, default=1, help='batch size')
    parser.add_argument(
        '--image_dir',
        type=str,
        default='/paddle/data/DOTA_1024_ss/test1024/images',
        help='directory of test images')
    parser.add_argument(
        '--warmup_iter', type=int, default=5, help='num of warmup iters')
    parser.add_argument(
        '--total_iter', type=int, default=2000, help='num of total iters')
    parser.add_argument(
        '--log_iter', type=int, default=50, help='num of log interval')
    parser.add_argument(
        '--tuned_trt_shape_file',
        type=str,
        default='shape_range_info.pbtxt',
        help='dynamic shape range info')
    args = parser.parse_args()
    return args


def init_predictor(FLAGS):
    model_dir, run_mode, batch_size = FLAGS.model_dir, FLAGS.run_mode, FLAGS.batch_size
    yaml_file = os.path.join(model_dir, 'infer_cfg.yml')
    with open(yaml_file) as f:
        yml_conf = yaml.safe_load(f)

    config = Config(
        os.path.join(model_dir, 'model.pdmodel'),
        os.path.join(model_dir, 'model.pdiparams'))

    # initial GPU memory(M), device ID
    config.enable_use_gpu(200, 0)
    # optimize graph and fuse op
    config.switch_ir_optim(True)

    precision_map = {
        'trt_int8': Config.Precision.Int8,
        'trt_fp32': Config.Precision.Float32,
        'trt_fp16': Config.Precision.Half
    }

    arch = yml_conf['arch']
    tuned_trt_shape_file = os.path.join(model_dir, FLAGS.tuned_trt_shape_file)

    if run_mode in precision_map.keys():
        if arch in TUNED_TRT_DYNAMIC_MODELS and not os.path.exists(
                tuned_trt_shape_file):
            print(
                'dynamic shape range info will be saved in {}. After that, rerun the code'.
                format(tuned_trt_shape_file))
            config.collect_shape_range_info(tuned_trt_shape_file)
        config.enable_tensorrt_engine(
            workspace_size=(1 << 25) * batch_size,
            max_batch_size=batch_size,
            min_subgraph_size=yml_conf['min_subgraph_size'],
            precision_mode=precision_map[run_mode],
            use_static=True,
            use_calib_mode=False)

        if yml_conf['use_dynamic_shape']:
            if arch in TUNED_TRT_DYNAMIC_MODELS and os.path.exists(
                    tuned_trt_shape_file):
                config.enable_tuned_tensorrt_dynamic_shape(tuned_trt_shape_file,
                                                           True)
            else:
                min_input_shape = {
                    'image': [batch_size, 3, 640, 640],
                    'scale_factor': [batch_size, 2]
                }
                max_input_shape = {
                    'image': [batch_size, 3, 1280, 1280],
                    'scale_factor': [batch_size, 2]
                }
                opt_input_shape = {
                    'image': [batch_size, 3, 1024, 1024],
                    'scale_factor': [batch_size, 2]
                }
                config.set_trt_dynamic_shape_info(
                    min_input_shape, max_input_shape, opt_input_shape)

    # disable print log when predict
    config.disable_glog_info()
    # enable shared memory
    config.enable_memory_optim()
    # disable feed, fetch OP, needed by zero_copy_run
    config.switch_use_feed_fetch_ops(False)
    predictor = create_predictor(config)
    return predictor, yml_conf


def create_preprocess_ops(yml_conf):
    preprocess_ops = []
    for op_info in yml_conf['Preprocess']:
        new_op_info = op_info.copy()
        op_type = new_op_info.pop('type')
        preprocess_ops.append(eval(op_type)(**new_op_info))
    return preprocess_ops


def get_test_images(image_dir):
    images = set()
    infer_dir = os.path.abspath(image_dir)
    exts = ['jpg', 'jpeg', 'png', 'bmp']
    exts += [ext.upper() for ext in exts]
    for ext in exts:
        images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
    images = list(images)
    return images


def create_inputs(image_files, preprocess_ops):
    inputs = dict()
    im_list, im_info_list = [], []
    for im_path in image_files:
        im, im_info = preprocess(im_path, preprocess_ops)
        im_list.append(im)
        im_info_list.append(im_info)

    inputs['im_shape'] = np.stack(
        [e['im_shape'] for e in im_info_list], axis=0).astype('float32')
    inputs['scale_factor'] = np.stack(
        [e['scale_factor'] for e in im_info_list], axis=0).astype('float32')
    inputs['image'] = np.stack(im_list, axis=0).astype('float32')
    return inputs


def measure_speed(FLAGS):
    predictor, yml_conf = init_predictor(FLAGS)
    input_names = predictor.get_input_names()
    preprocess_ops = create_preprocess_ops(yml_conf)

    image_files = get_test_images(FLAGS.image_dir)

    batch_size = FLAGS.batch_size
    warmup_iter, log_iter, total_iter = FLAGS.warmup_iter, FLAGS.log_iter, FLAGS.total_iter

    total_time = 0
    fps = 0
    for i in range(0, total_iter, batch_size):
        # make data ready
        inputs = create_inputs(image_files[i:i + batch_size], preprocess_ops)
        for name in input_names:
            input_tensor = predictor.get_input_handle(name)
            input_tensor.copy_from_cpu(inputs[name])

        paddle.device.cuda.synchronize()
        # start running
        start_time = time.perf_counter()
        predictor.run()
        paddle.device.cuda.synchronize()

        if i >= warmup_iter:
            total_time += time.perf_counter() - start_time
            if (i + 1) % log_iter == 0:
                fps = (i + 1 - warmup_iter) / total_time
                print(
                    f'Done image [{i + 1:<3}/ {total_iter}], '
                    f'fps: {fps:.1f} img / s, '
                    f'times per image: {1000 / fps:.1f} ms / img',
                    flush=True)

        if (i + 1) == total_iter:
            fps = (i + 1 - warmup_iter) / total_time
            print(
                f'Overall fps: {fps:.1f} img / s, '
                f'times per image: {1000 / fps:.1f} ms / img',
                flush=True)
            break


if __name__ == '__main__':
    FLAGS = parse_args()
    if 'trt' in FLAGS.run_mode:
        check_version('develop')
        check_trt_version('8.2')
    else:
        check_version('2.4')
    measure_speed(FLAGS)
@@ -0,0 +1,302 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import six
import glob
import copy
import yaml
import argparse
import cv2
import numpy as np
from shapely.geometry import Polygon
from onnxruntime import InferenceSession


# preprocess ops
def decode_image(img_path):
    with open(img_path, 'rb') as f:
        im_read = f.read()
    data = np.frombuffer(im_read, dtype='uint8')
    im = cv2.imdecode(data, 1)  # decoded as BGR, but RGB is needed
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    img_info = {
        "im_shape": np.array(
            im.shape[:2], dtype=np.float32),
        "scale_factor": np.array(
            [1., 1.], dtype=np.float32)
    }
    return im, img_info


class Resize(object):
    def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
        if isinstance(target_size, int):
            target_size = [target_size, target_size]
        self.target_size = target_size
        self.keep_ratio = keep_ratio
        self.interp = interp

    def __call__(self, im, im_info):
        assert len(self.target_size) == 2
        assert self.target_size[0] > 0 and self.target_size[1] > 0
        im_scale_y, im_scale_x = self.generate_scale(im)
        im = cv2.resize(
            im,
            None,
            None,
            fx=im_scale_x,
            fy=im_scale_y,
            interpolation=self.interp)
        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
        im_info['scale_factor'] = np.array(
            [im_scale_y, im_scale_x]).astype('float32')
        return im, im_info

    def generate_scale(self, im):
        origin_shape = im.shape[:2]
        if self.keep_ratio:
            im_size_min = np.min(origin_shape)
            im_size_max = np.max(origin_shape)
            target_size_min = np.min(self.target_size)
            target_size_max = np.max(self.target_size)
            im_scale = float(target_size_min) / float(im_size_min)
            if np.round(im_scale * im_size_max) > target_size_max:
                im_scale = float(target_size_max) / float(im_size_max)
            im_scale_x = im_scale
            im_scale_y = im_scale
        else:
            resize_h, resize_w = self.target_size
            im_scale_y = resize_h / float(origin_shape[0])
            im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x


class Permute(object):
    def __init__(self, ):
        super(Permute, self).__init__()

    def __call__(self, im, im_info):
        # HWC -> CHW
        im = im.transpose((2, 0, 1))
        return im, im_info


class NormalizeImage(object):
    def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
        self.mean = mean
        self.std = std
        self.is_scale = is_scale
        self.norm_type = norm_type

    def __call__(self, im, im_info):
        im = im.astype(np.float32, copy=False)
        if self.is_scale:
            scale = 1.0 / 255.0
            im *= scale

        if self.norm_type == 'mean_std':
            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
            std = np.array(self.std)[np.newaxis, np.newaxis, :]
            im -= mean
            im /= std
        return im, im_info


class PadStride(object):
    def __init__(self, stride=0):
        self.coarsest_stride = stride

    def __call__(self, im, im_info):
        coarsest_stride = self.coarsest_stride
        if coarsest_stride <= 0:
            return im, im_info
        im_c, im_h, im_w = im.shape
        # pad height/width up to the nearest multiple of the stride
        pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
        pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
        padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
        padding_im[:, :im_h, :im_w] = im
        return padding_im, im_info


class Compose:
    def __init__(self, transforms):
        self.transforms = []
        for op_info in transforms:
            new_op_info = op_info.copy()
            op_type = new_op_info.pop('type')
            self.transforms.append(eval(op_type)(**new_op_info))

    def __call__(self, img_path):
        img, im_info = decode_image(img_path)
        for t in self.transforms:
            img, im_info = t(img, im_info)
        inputs = copy.deepcopy(im_info)
        inputs['image'] = img
        return inputs
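
# Example (illustrative image path; this mirrors what predict_image does below):
#   transforms = Compose(infer_config['Preprocess'])
#   inputs = transforms('demo/P0072.jpg')  # dict with 'image', 'im_shape', 'scale_factor'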


# postprocess
def rbox_iou(g, p):
    # IoU of two rotated boxes, each given as 8 polygon coordinates
    g = np.array(g)
    p = np.array(p)
    g = Polygon(g[:8].reshape((4, 2)))
    p = Polygon(p[:8].reshape((4, 2)))
    g = g.buffer(0)
    p = p.buffer(0)
    if not g.is_valid or not p.is_valid:
        return 0
    inter = Polygon(g).intersection(Polygon(p)).area
    union = g.area + p.area - inter
    if union == 0:
        return 0
    else:
        return inter / union


def multiclass_nms_rotated(pred_bboxes,
                           pred_scores,
                           iou_threshold=0.1,
                           score_threshold=0.1):
    """
    Args:
        pred_bboxes (numpy.ndarray): [B, N, 8]
        pred_scores (numpy.ndarray): [B, C, N]

    Return:
        bboxes (numpy.ndarray): [N, 10]
        bbox_num (numpy.ndarray): [B]
    """
    bbox_num = []
    bboxes = []
    for bbox_per_img, score_per_img in zip(pred_bboxes, pred_scores):
        num_per_img = 0
        for cls_id, score_per_cls in enumerate(score_per_img):
            keep_mask = score_per_cls > score_threshold
            bbox = bbox_per_img[keep_mask]
            score = score_per_cls[keep_mask]

            idx = score.argsort()[::-1]
            bbox = bbox[idx]
            score = score[idx]
            keep_idx = []
            for i, b in enumerate(bbox):
                suppressed = False
                for gi in keep_idx:
                    g = bbox[gi]
                    if rbox_iou(b, g) > iou_threshold:
                        suppressed = True
                        break

                if suppressed:
                    continue

                keep_idx.append(i)

            keep_box = bbox[keep_idx]
            keep_score = score[keep_idx]
            keep_cls_ids = np.ones(len(keep_idx)) * cls_id
            bboxes.append(
                np.concatenate(
                    [keep_cls_ids[:, None], keep_score[:, None], keep_box],
                    axis=-1))
            num_per_img += len(keep_idx)

        bbox_num.append(num_per_img)

    return np.concatenate(bboxes, axis=0), np.array(bbox_num)
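
# Example (illustrative): for one image (B=1), each kept detection row is
# [class_id, score, x1, y1, x2, y2, x3, y3, x4, y4], per the docstring above:
#   boxes, counts = multiclass_nms_rotated(pred_bboxes, pred_scores)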


def get_test_images(infer_dir, infer_img):
    """
    Get image path list in TEST mode
    """
    assert infer_img is not None or infer_dir is not None, \
        "--image_file or --image_dir should be set"
    assert infer_img is None or os.path.isfile(infer_img), \
        "{} is not a file".format(infer_img)
    assert infer_dir is None or os.path.isdir(infer_dir), \
        "{} is not a directory".format(infer_dir)

    # infer_img has a higher priority
    if infer_img and os.path.isfile(infer_img):
        return [infer_img]

    images = set()
    infer_dir = os.path.abspath(infer_dir)
    assert os.path.isdir(infer_dir), \
        "infer_dir {} is not a directory".format(infer_dir)
    exts = ['jpg', 'jpeg', 'png', 'bmp']
    exts += [ext.upper() for ext in exts]
    for ext in exts:
        images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
    images = list(images)

    assert len(images) > 0, "no image found in {}".format(infer_dir)
    print("Found {} inference images in total.".format(len(images)))

    return images


def predict_image(infer_config, predictor, img_list):
    # load preprocess transforms
    transforms = Compose(infer_config['Preprocess'])
    # predict image
    for img_path in img_list:
        inputs = transforms(img_path)
        inputs_name = [var.name for var in predictor.get_inputs()]
        inputs = {k: inputs[k][None, ] for k in inputs_name}

        outputs = predictor.run(output_names=None, input_feed=inputs)

        bboxes, bbox_num = multiclass_nms_rotated(
            np.array(outputs[0]), np.array(outputs[1]))
        print("ONNXRuntime predict: ")
        for bbox in bboxes:
            if bbox[0] > -1 and bbox[1] > infer_config['draw_threshold']:
                print(f"{int(bbox[0])} {bbox[1]} "
                      f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]} "
                      f"{bbox[6]} {bbox[7]} {bbox[8]} {bbox[9]}")


def parse_args():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--infer_cfg", type=str, help="infer_cfg.yml")
    parser.add_argument(
        '--onnx_file',
        type=str,
        default="model.onnx",
        help="onnx model file path")
    parser.add_argument("--image_dir", type=str)
    parser.add_argument("--image_file", type=str)
    return parser.parse_args()


if __name__ == '__main__':
    FLAGS = parse_args()
    # load image list
    img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
    # load predictor
    predictor = InferenceSession(FLAGS.onnx_file)
    # load infer config
    with open(FLAGS.infer_cfg) as f:
        infer_config = yaml.safe_load(f)

    predict_image(infer_config, predictor, img_list)
@@ -0,0 +1,128 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import argparse
from convert import load_dota_infos, data_to_coco
from slicebase import SliceBase

wordname_15 = [
    'plane', 'baseball-diamond', 'bridge', 'ground-track-field',
    'small-vehicle', 'large-vehicle', 'ship', 'tennis-court',
    'basketball-court', 'storage-tank', 'soccer-ball-field', 'roundabout',
    'harbor', 'swimming-pool', 'helicopter'
]

wordname_16 = wordname_15 + ['container-crane']

wordname_18 = wordname_16 + ['airport', 'helipad']

DATA_CLASSES = {
    'dota10': wordname_15,
    'dota15': wordname_16,
    'dota20': wordname_18
}


def parse_args():
    parser = argparse.ArgumentParser('prepare data for training')

    parser.add_argument(
        '--input_dirs',
        nargs='+',
        type=str,
        default=None,
        help='input dirs which contain image and labelTxt dirs')

    parser.add_argument(
        '--output_dir',
        type=str,
        default=None,
        help='output dir which will contain image and labelTxt dirs and a coco style json file'
    )

    parser.add_argument(
        '--coco_json_file',
        type=str,
        default='',
        help='coco json annotation file')

    parser.add_argument('--subsize', type=int, default=1024, help='patch size')

    parser.add_argument('--gap', type=int, default=200, help='step size')

    parser.add_argument(
        '--data_type', type=str, default='dota10', help='data type')

    parser.add_argument(
        '--rates',
        nargs='+',
        type=float,
        default=[1.],
        help='scales for multi-slice training')

    parser.add_argument(
        '--nproc', type=int, default=8, help='the processor number')

    parser.add_argument(
        '--iof_thr',
        type=float,
        default=0.5,
        help='the minimal iof between an object and a window')

    parser.add_argument(
        '--image_only',
        action='store_true',
        default=False,
        help='only process images')

    args = parser.parse_args()
    return args


def load_dataset(input_dir, nproc, data_type):
    if 'dota' in data_type.lower():
        infos = load_dota_infos(input_dir, nproc)
    else:
        raise ValueError('only the DOTA dataset is supported now')

    return infos


def main():
    args = parse_args()
    infos = []
    for input_dir in args.input_dirs:
        infos += load_dataset(input_dir, args.nproc, args.data_type)

    slicer = SliceBase(
        args.gap,
        args.subsize,
        args.iof_thr,
        num_process=args.nproc,
        image_only=args.image_only)
    slicer.slice_data(infos, args.rates, args.output_dir)
    if args.coco_json_file:
        infos = load_dota_infos(args.output_dir, args.nproc)
        coco_json_file = os.path.join(args.output_dir, args.coco_json_file)
        class_names = DATA_CLASSES[args.data_type]
        data_to_coco(infos, coco_json_file, class_names, args.nproc)


if __name__ == '__main__':
    main()
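
# A sketch of a typical single-scale run (the script name and paths are
# assumptions; each input dir is expected to hold DOTA-style images and
# labelTxt subdirectories):
#
#   python prepare_data.py --input_dirs dota/train/ dota/val/ \
#       --output_dir dota_sliced/trainval1024/ \
#       --coco_json_file DOTA_trainval1024.json \
#       --subsize 1024 --gap 200 --rates 1.0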
@@ -0,0 +1,267 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Reference: https://github.com/CAPTAIN-WHU/DOTA_devkit

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import math
import copy
from numbers import Number
from multiprocessing import Pool

import cv2
import numpy as np
from tqdm import tqdm
import shapely.geometry as shgeo


def choose_best_pointorder_fit_another(poly1, poly2):
    """
    Choose the cyclic order of poly1's four points that best matches poly2,
    i.e. the rotation with the smallest sum of squared point distances.
    """
    x1, y1, x2, y2, x3, y3, x4, y4 = poly1
    combinate = [
        np.array([x1, y1, x2, y2, x3, y3, x4, y4]),
        np.array([x2, y2, x3, y3, x4, y4, x1, y1]),
        np.array([x3, y3, x4, y4, x1, y1, x2, y2]),
        np.array([x4, y4, x1, y1, x2, y2, x3, y3])
    ]
    dst_coordinate = np.array(poly2)
    distances = np.array(
        [np.sum((coord - dst_coordinate)**2) for coord in combinate])
    # renamed from `sorted` to avoid shadowing the built-in
    order = distances.argsort()
    return combinate[order[0]]


def cal_line_length(point1, point2):
    return math.sqrt(
        math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2))


class SliceBase(object):
    def __init__(self,
                 gap=512,
                 subsize=1024,
                 thresh=0.7,
                 choosebestpoint=True,
                 ext='.png',
                 padding=True,
                 num_process=8,
                 image_only=False):
        self.gap = gap
        self.subsize = subsize
        self.slide = subsize - gap
        self.thresh = thresh
        self.choosebestpoint = choosebestpoint
        self.ext = ext
        self.padding = padding
        self.num_process = num_process
        self.image_only = image_only
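
    # With the defaults used by the prepare script (subsize=1024, gap=200),
    # windows slide by subsize - gap = 824 pixels, so window origins are
    # 0, 824, 1648, ... and neighbouring crops overlap by `gap` pixels;
    # the last window in each direction is shifted back so it stays inside
    # the image.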
    def get_windows(self, height, width):
        windows = []
        left, up = 0, 0
        while (left < width):
            if (left + self.subsize >= width):
                left = max(width - self.subsize, 0)
            up = 0
            while (up < height):
                if (up + self.subsize >= height):
                    up = max(height - self.subsize, 0)
                right = min(left + self.subsize, width - 1)
                down = min(up + self.subsize, height - 1)
                windows.append((left, up, right, down))
                if (up + self.subsize >= height):
                    break
                else:
                    up = up + self.slide
            if (left + self.subsize >= width):
                break
            else:
                left = left + self.slide

        return windows

    def slice_image_single(self, image, windows, output_dir, output_name):
        image_dir = os.path.join(output_dir, 'images')
        for (left, up, right, down) in windows:
            image_name = output_name + str(left) + '___' + str(up) + self.ext
            subimg = copy.deepcopy(
                image[up:up + self.subsize, left:left + self.subsize])
            h, w, c = subimg.shape
            if (self.padding):
                # pad border crops to a full subsize x subsize patch
                outimg = np.zeros(
                    (self.subsize, self.subsize, 3), dtype=subimg.dtype)
                outimg[0:h, 0:w, :] = subimg
                cv2.imwrite(os.path.join(image_dir, image_name), outimg)
            else:
                cv2.imwrite(os.path.join(image_dir, image_name), subimg)

    def iof(self, poly1, poly2):
        # intersection over the foreground (poly1) area, not over the union
        inter_poly = poly1.intersection(poly2)
        inter_area = inter_poly.area
        poly1_area = poly1.area
        half_iou = inter_area / poly1_area
        return inter_poly, half_iou

    def translate(self, poly, left, up):
        n = len(poly)
        out_poly = np.zeros(n)
        for i in range(n // 2):
            out_poly[i * 2] = int(poly[i * 2] - left)
            out_poly[i * 2 + 1] = int(poly[i * 2 + 1] - up)
        return out_poly

    def get_poly4_from_poly5(self, poly):
        # Reduce a 5-point polygon (possible after clipping a box against a
        # window) to 4 points by replacing the two endpoints of its shortest
        # edge with their midpoint.
        distances = [
            cal_line_length((poly[i * 2], poly[i * 2 + 1]),
                            (poly[(i + 1) * 2], poly[(i + 1) * 2 + 1]))
            for i in range(int(len(poly) / 2 - 1))
        ]
        distances.append(
            cal_line_length((poly[0], poly[1]), (poly[8], poly[9])))
        pos = np.array(distances).argsort()[0]
        count = 0
        out_poly = []
        while count < 5:
            if (count == pos):
                out_poly.append(
                    (poly[count * 2] + poly[(count * 2 + 2) % 10]) / 2)
                out_poly.append(
                    (poly[(count * 2 + 1) % 10] + poly[(count * 2 + 3) % 10]) /
                    2)
                count = count + 1
            elif (count == (pos + 1) % 5):
                count = count + 1
                continue
            else:
                out_poly.append(poly[count * 2])
                out_poly.append(poly[count * 2 + 1])
                count = count + 1
        return out_poly

    def slice_anno_single(self, annos, windows, output_dir, output_name):
        anno_dir = os.path.join(output_dir, 'labelTxt')
        for (left, up, right, down) in windows:
            image_poly = shgeo.Polygon(
                [(left, up), (right, up), (right, down), (left, down)])
            anno_file = output_name + str(left) + '___' + str(up) + '.txt'
            with open(os.path.join(anno_dir, anno_file), 'w') as f:
                for anno in annos:
                    gt_poly = shgeo.Polygon(
                        [(anno['poly'][0], anno['poly'][1]),
                         (anno['poly'][2], anno['poly'][3]),
                         (anno['poly'][4], anno['poly'][5]),
                         (anno['poly'][6], anno['poly'][7])])
                    if gt_poly.area <= 0:
                        continue
                    inter_poly, iof = self.iof(gt_poly, image_poly)
                    if iof == 1:
                        # object fully inside the window
                        final_poly = self.translate(anno['poly'], left, up)
                    elif iof > 0:
                        # object partially inside: clip it to the window
                        inter_poly = shgeo.polygon.orient(inter_poly, sign=1)
                        out_poly = list(inter_poly.exterior.coords)[0:-1]
                        if len(out_poly) < 4 or len(out_poly) > 5:
                            continue

                        final_poly = []
                        for p in out_poly:
                            final_poly.append(p[0])
                            final_poly.append(p[1])

                        if len(out_poly) == 5:
                            final_poly = self.get_poly4_from_poly5(final_poly)

                        if self.choosebestpoint:
                            final_poly = choose_best_pointorder_fit_another(
                                final_poly, anno['poly'])

                        final_poly = self.translate(final_poly, left, up)
                        final_poly = np.clip(final_poly, 1, self.subsize)
                    else:
                        continue
                    outline = ' '.join(list(map(str, final_poly)))
                    if iof >= self.thresh:
                        outline = outline + ' ' + anno['name'] + ' ' + str(
                            anno['difficult'])
                    else:
                        # heavily truncated objects are marked difficult (2)
                        outline = outline + ' ' + anno['name'] + ' ' + '2'

                    f.write(outline + '\n')

    def slice_data_single(self, info, rate, output_dir):
        file_name = info['image_file']
        base_name = os.path.splitext(os.path.split(file_name)[-1])[0]
        base_name = base_name + '__' + str(rate) + '__'
        img = cv2.imread(file_name)
        # cv2.imread returns None (not an empty array) on failure
        if img is None:
            return

        if (rate != 1):
            resize_img = cv2.resize(
                img, None, fx=rate, fy=rate, interpolation=cv2.INTER_CUBIC)
        else:
            resize_img = img

        height, width, _ = resize_img.shape
        windows = self.get_windows(height, width)
        self.slice_image_single(resize_img, windows, output_dir, base_name)
        if not self.image_only:
            annos = info['annotation']
            for anno in annos:
                anno['poly'] = list(map(lambda x: rate * x, anno['poly']))
            self.slice_anno_single(annos, windows, output_dir, base_name)

    def check_or_mkdirs(self, path):
        if not os.path.exists(path):
            os.makedirs(path, exist_ok=True)

    def slice_data(self, infos, rates, output_dir):
        """
        Args:
            infos (list[dict]): data_infos
            rates (float, list): scale rates
            output_dir (str): output directory
        """
        if isinstance(rates, Number):
            rates = [rates, ]

        self.check_or_mkdirs(output_dir)
        self.check_or_mkdirs(os.path.join(output_dir, 'images'))
        if not self.image_only:
            self.check_or_mkdirs(os.path.join(output_dir, 'labelTxt'))

        pbar = tqdm(total=len(rates) * len(infos), desc='slicing data')

        if self.num_process <= 1:
            for rate in rates:
                for info in infos:
                    self.slice_data_single(info, rate, output_dir)
                    pbar.update()
        else:
            pool = Pool(self.num_process)
            for rate in rates:
                for info in infos:
                    pool.apply_async(
                        self.slice_data_single, (info, rate, output_dir),
                        callback=lambda x: pbar.update())

            pool.close()
            pool.join()

        pbar.close()
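

# A minimal usage sketch, assuming `load_dota_infos` from convert.py and a
# DOTA-style directory with images/ and labelTxt/ under `dota/train/`:
#
#   from convert import load_dota_infos
#
#   infos = load_dota_infos('dota/train/', 8)
#   slicer = SliceBase(gap=200, subsize=1024, thresh=0.5, num_process=8)
#   slicer.slice_data(infos, rates=[1.0], output_dir='dota_sliced/')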