Replace the document detection model

This commit is contained in:
2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1,279 @@
简体中文 | [English](DetAnnoTools_en.md)
# 目标检测标注工具
## 目录
[LabelMe](#LabelMe)
* [使用说明](#使用说明)
* [安装](#LabelMe安装)
* [图片标注过程](#LabelMe图片标注过程)
* [标注格式](#LabelMe标注格式)
* [导出数据格式](#LabelMe导出数据格式)
* [格式转化总结](#格式转化总结)
* [标注文件(json)-->VOC](#标注文件(json)-->VOC数据集)
* [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集)
[LabelImg](#LabelImg)
* [使用说明](#使用说明)
* [LabelImg安装](#LabelImg安装)
* [安装注意事项](#安装注意事项)
* [图片标注过程](#LabelImg图片标注过程)
* [标注格式](#LabelImg标注格式)
* [导出数据格式](#LabelImg导出数据格式)
* [格式转换注意事项](#格式转换注意事项)
## [LabelMe](https://github.com/wkentaro/labelme)
### 使用说明
#### LabelMe安装
具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
推荐使用Anaconda的安装方式
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### LabelMe图片标注过程
启动labelme后,选择图片文件或者图片所在文件夹。
左侧编辑栏选择`create polygons`,绘制标注区域(如下图所示,右击图像区域可以选择不同的标注形状)。绘制好区域后按下回车,弹出新的框,填入标注区域对应的标签(如:people)。
左侧菜单栏点击保存,生成`json`形式的**标注文件**
![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g)
### LabelMe标注格式
#### LabelMe导出数据格式
```
#生成标注文件
png/jpeg/jpg-->labelme标注-->json
```
#### 格式转化总结
```
#标注文件转化为VOC数据集格式
json-->labelme2voc.py-->VOC数据集
#标注文件转化为COCO数据集格式
json-->labelme2coco.py-->COCO数据集
```
#### 标注文件(json)-->VOC数据集
使用[官方给出的labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py)这份脚本
下载该脚本,在命令行中使用
```
python labelme2voc.py data_annotated(标注文件所在文件夹) data_dataset_voc(输出文件夹) --labels labels.txt
```
运行后,在指定的输出文件夹中会生成如下的目录:
```
# It generates:
# - data_dataset_voc/JPEGImages
# - data_dataset_voc/Annotations
# - data_dataset_voc/AnnotationsVisualization
```
#### 标注文件(json)-->COCO数据集
使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
用户数据集转成COCO数据后目录结构如下注意数据集中路径名、文件名尽量不要使用中文避免中文编码问题导致出错
```
dataset/xxx/
├── annotations
│ ├── train.json # coco数据的标注文件
│ ├── valid.json # coco数据的标注文件
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
## [LabelImg](https://github.com/tzutalin/labelImg)
### 使用说明
#### LabelImg安装
安装操作请参考[LabelImg官方教程](https://github.com/tzutalin/labelImg)
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
<details>
<summary><b>macOS</b></summary>
```
brew install qt # Install qt-5.x.x by Homebrew
brew install libxml2
# or using pip
pip3 install pyqt5 lxml # Install qt and lxml by pip
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
推荐使用Anaconda的安装方式
首先下载并进入 [labelImg](https://github.com/tzutalin/labelImg#labelimg) 的目录
```
conda install pyqt=5
conda install -c anaconda lxml
pyrcc5 -o libs/resources.py resources.qrc
python labelImg.py
python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
#### 安装注意事项
以Anaconda安装方式为例,比LabelMe的配置要麻烦一些。
启动方式是通过python运行脚本`python labelImg.py <图片路径>`
#### LabelImg图片标注过程
启动labelImg后,选择图片文件或者图片所在文件夹。
左侧编辑栏选择`创建区块`,绘制标注区域,在弹出的新框中选择对应的标签。
左侧菜单栏点击保存,可以选择VOC/YOLO/CreateML三种类型的标注文件。
![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif)
### LabelImg标注格式
#### LabelImg导出数据格式
```
#生成标注文件
png/jpeg/jpg-->labelImg标注-->xml/txt/json
```
#### 格式转换注意事项
**PaddleDetection支持VOC或COCO格式的数据**经LabelImg标注导出后的标注文件需要修改为**VOC或COCO格式**,调整说明可以参考[准备训练数据](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)


@@ -0,0 +1,271 @@
[简体中文](DetAnnoTools.md) | English
# Object Detection Annotation Tools
## Contents
[LabelMe](#LabelMe)
* [Instruction](#Instruction-of-LabelMe)
* [Installation](#Installation)
* [Annotation of Images](#Annotation-of-images-in-LabelMe)
* [Annotation Format](#Annotation-Format-of-LabelMe)
* [Export Format](#Export-Format-of-LabelMe)
* [Summary of Format Conversion](#Summary-of-Format-Conversion)
* [Annotation file(json)—>VOC Dataset](#annotation-filejsonvoc-dataset)
* [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset)
[LabelImg](#LabelImg)
* [Instruction](#Instruction-of-LabelImg)
* [Installation](#Installation-of-LabelImg)
* [Installation Notes](#Installation-Notes)
* [Annotation of images](#Annotation-of-images-in-LabelImg)
* [Annotation Format](#Annotation-Format-of-LabelImg)
* [Export Format](#Export-Format-of-LabelImg)
* [Notes of Format Conversion](#Notes-of-Format-Conversion)
## [LabelMe](https://github.com/wkentaro/labelme)
### Instruction of LabelMe
#### Installation
Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details.
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
We recommend installing with Anaconda.
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### Annotation of Images in LabelMe
After starting labelme, select an image file or a folder of images.
Select `create polygons` in the left-hand toolbar and draw an annotation area as shown in the following GIF (you can right-click on the image to choose a different annotation shape). When finished, press the Enter/Return key and fill in the corresponding label in the popup box, such as "people".
Click the save button in the left-hand menu bar, and an annotation file in JSON format will be generated.
![](https://media3.giphy.com/media/XdnHZgge5eynRK3ATK/giphy.gif?cid=790b7611192e4c0ec2b5e6990b6b0f65623154ffda66b122&rid=giphy.gif&ct=g)
### Annotation Format of LabelMe
#### Export Format of LabelMe
```
#generate an annotation file
png/jpeg/jpg-->labelme-->json
```
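The exported JSON file can be inspected directly with the standard library. Below is a minimal sketch, assuming a typical LabelMe export (the file name `example.json` is a placeholder, and the field names should be checked against your own annotation files):
```python
import json

# Load one LabelMe annotation file (placeholder file name)
with open('example.json', 'r') as f:
    anno = json.load(f)

# A typical LabelMe export stores the image info and a list of labeled shapes
print(anno['imagePath'], anno['imageWidth'], anno['imageHeight'])
for shape in anno['shapes']:
    # Each shape carries a label, a shape type and a list of [x, y] points
    print(shape['label'], shape['shape_type'], shape['points'])
```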
#### Summary of Format Conversion
```
#convert annotation file to VOC dataset format
json-->labelme2voc.py-->VOC dataset
#convert annotation file to COCO dataset format
json-->labelme2coco.py-->COCO dataset
```
#### Annotation file(json)—>VOC Dataset
Download the official [labelme2voc.py](https://github.com/wkentaro/labelme/blob/main/examples/bbox_detection/labelme2voc.py) script and run it from the command line:
```
python labelme2voc.py data_annotated(annotation folder) data_dataset_voc(output folder) --labels labels.txt
```
Then it will generate the following contents:
```
# It generates:
# - data_dataset_voc/JPEGImages
# - data_dataset_voc/Annotations
# - data_dataset_voc/AnnotationsVisualization
```
#### Annotation file(json)—>COCO Dataset
Convert the data annotated by LabelMe to COCO dataset by the script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) provided by PaddleDetection.
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
After the user dataset is converted to COCO data, the directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
## [LabelImg](https://github.com/tzutalin/labelImg)
### Instruction of LabelImg
#### Installation of LabelImg
Please refer to [The github of LabelImg](https://github.com/tzutalin/labelImg) for installation details.
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install pyqt5-dev-tools
sudo pip3 install -r requirements/requirements-linux-python3.txt
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
<details>
<summary><b>macOS</b></summary>
```
brew install qt # Install qt-5.x.x by Homebrew
brew install libxml2
# or using pip
pip3 install pyqt5 lxml # Install qt and lxml by pip
make qt5py3
python3 labelImg.py
python3 labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
</details>
We recommend installing with Anaconda.
First download and enter the folder of [labelImg](https://github.com/tzutalin/labelImg#labelimg).
```
conda install pyqt=5
conda install -c anaconda lxml
pyrcc5 -o libs/resources.py resources.qrc
python labelImg.py
python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]
```
#### Installation Notes
Taking the Anaconda installation as an example, the setup is a bit more involved than LabelMe. Launch LabelImg with the Python script: `python labelImg.py <IMAGE_PATH>`
#### Annotation of images in LabelImg
After the startup of LabelImg, select an image or a folder with images.
Select `Create RectBox` in the left-hand toolbar and draw an annotation area as shown in the following GIF. When finished, select the corresponding label in the popup box. The annotation file can then be saved in one of three formats: VOC, YOLO or CreateML.
![](https://user-images.githubusercontent.com/34162360/177526022-fd9c63d8-e476-4b63-ae02-76d032bb7656.gif)
### Annotation Format of LabelImg
#### Export Format of LabelImg
```
#generate annotation files
png/jpeg/jpg-->labelImg-->xml/txt/json
```
#### Notes of Format Conversion
**PaddleDetection supports data in VOC or COCO format.** Annotation files exported from LabelImg need to be converted to **VOC or COCO format**; for the conversion instructions you can refer to [PrepareDataSet](./PrepareDataSet.md#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE).
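As a minimal example of such a conversion, the sketch below recovers absolute `[x1, y1, x2, y2]` boxes from a YOLO-format export, where each line stores a normalized `class x_center y_center width height` (the file name and image size here are placeholders):
```python
def yolo_line_to_xyxy(line, img_w, img_h):
    """Convert one YOLO-format line to a class id and an absolute [x1, y1, x2, y2] box."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc) * img_w, float(yc) * img_h, float(w) * img_w, float(h) * img_h
    return int(cls), [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]

# Placeholder usage: a 640x480 image annotated by LabelImg in YOLO format
with open('xxx1.txt') as f:
    for line in f:
        print(yolo_line_to_xyxy(line, 640, 480))
```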


@@ -0,0 +1,165 @@
简体中文 | [English](KeyPointAnnoTools_en.md)
# 关键点检测标注工具
## 目录
[LabelMe](#LabelMe)
- [使用说明](#使用说明)
- [安装](#安装)
- [关键点数据说明](#关键点数据说明)
- [图片标注过程](#图片标注过程)
- [标注格式](#标注格式)
- [导出数据格式](#导出数据格式)
- [格式转化总结](#格式转化总结)
- [标注文件(json)-->COCO](#标注文件(json)-->COCO数据集)
## [LabelMe](https://github.com/wkentaro/labelme)
### 使用说明
#### 安装
具体安装操作请参考[LabelMe官方教程](https://github.com/wkentaro/labelme)中的Installation
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
推荐使用Anaconda的安装方式
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### 关键点数据说明
以COCO数据集为例共需采集17个关键点
```
keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
```
#### 图片标注过程
启动labelme后,选择图片文件或者图片所在文件夹。
左侧编辑栏选择`create polygons`,右击图像区域选择标注形状,绘制好关键点后按下回车,弹出新的框,填入标注关键点对应的标签。
左侧菜单栏点击保存,生成`json`形式的**标注文件**
![操作说明](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif)
### 标注格式
#### 导出数据格式
```
#生成标注文件
png/jpeg/jpg-->labelme标注-->json
```
#### 格式转化总结
```
#标注文件转化为COCO数据集格式
json-->labelme2coco.py-->COCO数据集
```
#### 标注文件(json)-->COCO数据集
使用[PaddleDetection提供的x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py) 将labelme标注的数据转换为COCO数据集形式
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
用户数据集转成COCO数据后目录结构如下注意数据集中路径名、文件名尽量不要使用中文避免中文编码问题导致出错
```
dataset/xxx/
├── annotations
│ ├── train.json # coco数据的标注文件
│ ├── valid.json # coco数据的标注文件
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```


@@ -0,0 +1,165 @@
[简体中文](KeyPointAnnoTools.md) | English
# Key Points Detection Annotation Tool
## Contents
[LabelMe](#LabelMe)
- [Instruction](#Instruction)
- [Installation](#Installation)
- [Notes of Key Points Data](#Notes-of-Key-Points-Data)
- [Annotation of LabelMe](#Annotation-of-LabelMe)
- [Annotation Format](#Annotation-Format)
- [Data Export Format](#Data-Export-Format)
- [Summary of Format Conversion](#Summary-of-Format-Conversion)
- [Annotation file(json)—>COCO Dataset](#annotation-filejsoncoco-dataset)
## [LabelMe](https://github.com/wkentaro/labelme)
### Instruction
#### Installation
Please refer to [The github of LabelMe](https://github.com/wkentaro/labelme) for installation details.
<details>
<summary><b> Ubuntu</b></summary>
```
sudo apt-get install labelme
# or
sudo pip3 install labelme
# or install standalone executable from:
# https://github.com/wkentaro/labelme/releases
```
</details>
<details>
<summary><b> macOS</b></summary>
```
brew install pyqt # maybe pyqt5
pip install labelme
# or
brew install wkentaro/labelme/labelme # command line interface
# brew install --cask wkentaro/labelme/labelme # app
# or install standalone executable/app from:
# https://github.com/wkentaro/labelme/releases
```
</details>
We recommend installing with Anaconda.
```
conda create --name=labelme python=3
conda activate labelme
pip install pyqt5
pip install labelme
```
#### Notes of Key Points Data
Taking the COCO dataset as an example, 17 keypoints need to be annotated.
```
keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
```
#### Annotation of LabelMe
After starting labelme, select an image file or a folder of images.
Select `create polygons` in the left-hand toolbar and draw an annotation area as shown in the following GIF (you can right-click on the image to choose a different annotation shape). When finished, press the Enter/Return key and fill in the corresponding label in the popup box.
Click the save button in the left-hand menu bar, and an annotation file in JSON format will be generated.
![操作说明](https://user-images.githubusercontent.com/34162360/178250648-29ee781a-676b-419c-83b1-de1e4e490526.gif)
### Annotation Format
#### Data Export Format
```
#generate an annotation file
png/jpeg/jpg-->labelme-->json
```
#### Summary of Format Conversion
```
#convert annotation file to COCO dataset format
json-->labelme2coco.py-->COCO dataset
```
#### Annotation file(json)—>COCO Dataset
Convert the data annotated by LabelMe to COCO dataset by this script [x2coco.py](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/tools/x2coco.py).
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
After the user dataset is converted to COCO data, the directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```


@@ -0,0 +1,75 @@
# 多目标跟踪标注工具
## 目录
* [前期准备](#前期准备)
* [SDE数据集](#SDE数据集)
* [LabelMe](#LabelMe)
* [LabelImg](#LabelImg)
* [JDE数据集](#JDE数据集)
* [DarkLabel](#DarkLabel)
* [标注格式](#标注格式)
### 前期准备
请先查看[多目标跟踪数据集准备](PrepareMOTDataSet.md)确定MOT模型选型和MOT数据集的类型。
通常综合数据标注成本和模型精度速度平衡考虑更推荐使用SDE系列数据集和SDE系列模型的ByteTrack或OC-SORT。SDE系列数据集的标注工具与目标检测任务是一致的。
### SDE数据集
SDE数据集是纯检测标注的数据集用户自定义数据集可以参照[DET数据准备文档](./PrepareDetDataSet.md)准备。
#### LabelMe
LabelMe的使用可以参考[DetAnnoTools](DetAnnoTools.md)
#### LabelImg
LabelImg的使用可以参考[DetAnnoTools](DetAnnoTools.md)
### JDE数据集
JDE数据集是同时有检测和ReID标注的数据集标注成本比SDE数据集更高。
#### [DarkLabel](https://github.com/darkpgmr/DarkLabel)
#### 使用说明
##### 安装
从官方给出的下载[链接](https://github.com/darkpgmr/DarkLabel/releases)中下载想要的版本,Windows环境下解压后即可直接使用。
**视频/图片标注过程**
1. 启动应用程序后,能看到左侧的工具栏
2. 选择视频/图像文件后,按需选择标注形式:
* Box仅绘制标注框
* Box+Label绘制标注框&标签
* Box+Label+AutoID绘制标注框&标签&ID号
* Popup LabelSelect可以自行定义标签
3. 在视频帧/图像上进行拖动鼠标,进行标注框的绘制
4. 绘制完成后,在上数第六行里选择保存标注文件的形式,默认.txt
![1](https://user-images.githubusercontent.com/34162360/179673519-511b4167-97ed-4228-8869-db9c69a68b6b.mov)
##### 注意事项
1. 如果标注的是视频文件,需要在工具栏上数第五行的下拉框里选择`[fn,cname,id,x1,y1,w,h]` DarkLabel2.4版本)
2. 鼠标移动到标注框所在区域,右键可以删除标注框
3. 按下shift可以选中标注框进行框的移动和对某条边的编辑
4. 按住enter回车可以自动跟踪标注目标
5. 自动跟踪标注目标过程中可以暂停松开enter按需修改标注框
##### 其他使用参考视频
* [DarkLabel (Video/Image Annotation Tool) - Ver.2.0](https://www.youtube.com/watch?v=lok30aIZgUw)
* [DarkLabel (Image/Video Annotation Tool)](https://www.youtube.com/watch?v=vbydG78Al8s&t=11s)
#### 标注格式
标注文件需要转化为MOT JDE数据集格式,包含`images`和`labels_with_ids`两个文件夹,具体参照[用户自定义数据集准备](PrepareMOTDataSet.md#用户自定义数据集准备)。
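下面是一个将DarkLabel导出的`[fn,cname,id,x1,y1,w,h]`格式txt转换为labels_with_ids格式标注的最简示意(其中图像尺寸、类别编号与文件名均为假设值,labels_with_ids每行字段的具体定义请以上述文档为准):
```python
# 假设DarkLabel导出的txt每行为: 帧号,类别名,跟踪ID,x1,y1,w,h;图像尺寸假设为1920x1080
img_w, img_h = 1920, 1080
with open('darklabel_output.txt') as f:
    for line in f:
        fn, cname, tid, x1, y1, w, h = line.strip().split(',')
        x1, y1, w, h = map(float, (x1, y1, w, h))
        cx, cy = x1 + w / 2, y1 + h / 2
        # labels_with_ids每行: 类别id 跟踪id 中心x/图宽 中心y/图高 宽/图宽 高/图高(均为归一化值)
        print(fn, f"0 {tid} {cx / img_w:.6f} {cy / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}")
```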


@@ -0,0 +1,497 @@
# 目标检测数据准备
## 目录
- [目标检测数据说明](#目标检测数据说明)
- [准备训练数据](#准备训练数据)
- [VOC数据](#VOC数据)
- [VOC数据集下载](#VOC数据集下载)
- [VOC数据标注文件介绍](#VOC数据标注文件介绍)
- [COCO数据](#COCO数据)
- [COCO数据集下载](#COCO数据下载)
- [COCO数据标注文件介绍](#COCO数据标注文件介绍)
- [用户数据准备](#用户数据准备)
- [用户数据转成VOC数据](#用户数据转成VOC数据)
- [用户数据转成COCO数据](#用户数据转成COCO数据)
- [用户数据自定义reader](#用户数据自定义reader)
- [用户数据使用示例](#用户数据使用示例)
- [数据格式转换](#数据格式转换)
- [自定义数据训练](#自定义数据训练)
- [(可选)生成Anchor](#(可选)生成Anchor)
### 目标检测数据说明
目标检测的数据比分类复杂,一张图像中,需要标记出各个目标区域的位置和类别。
一般的目标区域位置用一个矩形框来表示一般用以下3种方式表达
| 表达方式 | 说明 |
| :----------------: | :--------------------------------: |
| x1,y1,x2,y2 | (x1,y1)为左上角坐标,(x2,y2)为右下角坐标 |
| x1,y1,w,h | (x1,y1)为左上角坐标w为目标区域宽度h为目标区域高度 |
| xc,yc,w,h | (xc,yc)为目标区域中心坐标w为目标区域宽度h为目标区域高度 |
常见的目标检测数据集如Pascal VOC采用`[x1,y1,x2,y2]`表示物体的bounding box,[COCO](https://cocodataset.org/#format-data)采用`[x1,y1,w,h]`表示物体的bounding box。
### 准备训练数据
PaddleDetection默认支持[COCO](http://cocodataset.org)和[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 和[WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) 数据源。
同时还支持自定义数据源,包括:
(1) 自定义数据转换成VOC数据
(2) 自定义数据转换成COCO数据
(3) 自定义新的数据源增加自定义的reader。
首先进入到`PaddleDetection`根目录下
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC数据
VOC数据是[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 比赛使用的数据。Pascal VOC比赛不仅包含图像分类任务,还包含图像目标检测、图像分割等任务,其标注文件中包含多个任务的标注内容。
VOC数据集指的是Pascal VOC比赛使用的数据。用户自定义的VOC数据xml文件中的非必须字段请根据实际情况选择是否标注或是否使用默认值。
##### VOC数据集下载
- 通过代码自动化下载VOC数据集数据集较大下载需要较长时间
```
# 执行代码自动化下载VOC数据集
python dataset/voc/download_voc.py
```
代码执行完成后VOC数据集文件组织结构为
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
各个文件说明
```
# label_list.txt 是类别名称列表,文件名必须是 label_list.txt。若使用VOC数据集config文件中use_default_label为true时不需要这个文件
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt 是训练数据集文件列表
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt 是测试数据集文件列表
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt voc 类别名称列表
>>cat label_list.txt
aeroplane
bicycle
...
```
- 已下载VOC数据集
按照如上数据文件组织结构组织文件即可。
##### VOC数据标注文件介绍
VOC数据是每个图像文件对应一个同名的xml文件xml文件中标记物体框的坐标和类别等信息。例如图像`2007_002055.jpg`
![](../images/2007_002055.jpg)
图片对应的xml文件内包含对应图片的基本信息比如文件名、来源、图像尺寸以及图像中包含的物体区域信息和类别信息等。
xml文件中包含以下字段
- filename表示图像名称。
- size表示图像尺寸。包括图像宽度、图像高度、图像深度。
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object字段表示每个物体。包括:
| 标签 | 说明 |
| :--------: | :-----------: |
| name | 物体类别名称 |
| pose | 关于目标物体姿态描述(非必须字段) |
| truncated | 如果物体的遮挡超过15-20%并且位于边界框之外,请标记为`truncated`(非必须字段) |
| difficult | 难以识别的物体标记为`difficult`(非必须字段) |
| bndbox子标签 | (xmin,ymin) 左上角坐标,(xmax,ymax) 右下角坐标, |
#### COCO数据
COCO数据是[COCO](http://cocodataset.org) 比赛使用的数据。同样的,COCO比赛也包含多个比赛任务,其标注文件中包含多个任务的标注内容。
COCO数据集指的是COCO比赛使用的数据。用户自定义的COCO数据json文件中的一些字段请根据实际情况选择是否标注或是否使用默认值。
##### COCO数据下载
- 通过代码自动化下载COCO数据集数据集较大下载需要较长时间
```
# 执行代码自动化下载COCO数据集
python dataset/coco/download_coco.py
```
代码执行完成后COCO数据集文件组织结构为
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- 已下载COCO数据集
按照如上数据文件组织结构组织文件即可。
##### COCO数据标注介绍
COCO数据标注是将所有训练图像的标注都存放到一个json文件中。数据以字典嵌套的形式存放。
json文件中包含以下key
- info表示标注文件info。
- licenses表示标注文件licenses。
- images表示标注文件中图像信息列表每个元素是一张图像的信息。如下为其中一张图像的信息
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations表示标注文件中目标物体的标注信息列表每个元素是一个目标物体的标注信息。如下为其中一个目标物体的标注信息
```
{
'segmentation': # 物体的分割标注
'area': 2765.1486500000005, # 物体的区域面积
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
'category_id': 58, # category_id
'id': 156 # annotation id
}
```
```
# 查看COCO标注文件
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# 查看类别信息
print('\n物体类别:', coco_anno['categories'])
# 查看一共多少张图
print('\n图像数量', len(coco_anno['images']))
# 查看一共多少个目标物体
print('\n标注物体数量', len(coco_anno['annotations']))
# 查看一条目标物体标注信息
print('\n查看一条目标物体标注信息', coco_anno['annotations'][0])
```
#### 用户数据准备
对于用户数据有3种处理方法
(1) 将用户数据转成VOC数据(根据需要仅包含物体检测所必须的标签即可)
(2) 将用户数据转成COCO数据(根据需要仅包含物体检测所必须的标签即可)
(3) 自定义一个用户数据的reader(较复杂数据需要自定义reader)
##### 用户数据转成VOC数据
用户数据集转成VOC数据后目录结构如下注意数据集中路径名、文件名尽量不要使用中文避免中文编码问题导致出错
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (必须提供且文件名称必须是label_list.txt )
├── train.txt (训练数据集文件列表, ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (测试数据集文件列表)
```
各个文件说明
```
# label_list.txt 是类别名称列表,该文件名必须是 label_list.txt
>>cat label_list.txt
classname1
classname2
...
# train.txt 是训练数据文件列表
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt 是验证数据文件列表
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
##### 用户数据转成COCO数据
在`./tools/`中提供了`x2coco.py`,用于将VOC数据集、labelme标注的数据集或cityscape数据集转换为COCO数据,例如:
1labelme数据转换为COCO数据
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
2voc数据转换为COCO数据
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
用户数据集转成COCO数据后目录结构如下注意数据集中路径名、文件名尽量不要使用中文避免中文编码问题导致出错
```
dataset/xxx/
├── annotations
│ ├── train.json # coco数据的标注文件
│ ├── valid.json # coco数据的标注文件
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### 用户数据自定义reader
如果数据集有新的数据需要添加进PaddleDetection中您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#2.3自定义数据集)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)。
#### 用户数据使用示例
以[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据为例,说明如何准备自定义数据。
Kaggle上的 [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) 比赛数据包含877张图像数据类别4类crosswalkspeedlimitstoptrafficlight。
可从Kaggle上下载也可以从[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) 下载。
路标数据集示例图:
![](../images/road554.png)
```
# 下载解压数据
>>cd ${ppdet_root}/dataset
# 下载kaggle数据集并解压当前文件组织结构如下
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
#### 数据格式转换
将数据划分为训练集和测试集
```
# 生成 label_list.txt 文件
>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# 生成 train.txt、valid.txt和test.txt列表文件
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# 训练集、验证集、测试集比例分别约80%、10%、10%。
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# 删除不用文件
>>rm -rf all_image_list.txt all_list.txt
最终数据集文件组织结构为:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt 是类别名称列表,文件名必须是 label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt 是训练数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt 是验证数据集文件列表,每一行是一张图像路径和对应标注文件路径,以空格分开。注意这里的路径是数据集文件夹内的相对路径。
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
也可以下载准备好的数据[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar) ,解压到`dataset/roadsign_voc/`文件夹下即可。
准备好数据后,一般的我们要对数据有所了解,比如图像量,图像尺寸,每一类目标区域个数,目标区域大小等。如有必要,还要对数据进行清洗。
roadsign数据集统计:
| 数据 | 图片数量 |
| :--------: | :-----------: |
| train | 701 |
| valid | 176 |
**说明:**
1用户数据建议在训练前仔细检查数据避免因数据标注格式错误或图像数据不完整造成训练过程中的crash
2如果图像尺寸太大的话在不限制读入数据尺寸情况下占用内存较多会造成内存/显存溢出请合理设置batch_size可从小到大尝试
#### 自定义数据训练
数据准备完成后需要修改PaddleDetection中关于Dataset的配置文件在`configs/datasets`文件夹下。比如roadsign数据集的配置文件如下
```
metric: VOC # 目前支持COCO, VOC, WiderFace等评估标准
num_classes: 4 # 数据集的类别数不包含背景类roadsign数据集为4类其他数据需要修改为自己的数据类别
TrainDataset:
!VOCDataSet
dataset_dir: dataset/roadsign_voc # 数据集所在路径,相对于PaddleDetection路径
anno_path: train.txt # 训练集的标注文件,相对于dataset_dir的路径
label_list: label_list.txt # 标签文件,相对于dataset_dir的路径
data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult'] # 控制dataset输出的sample所包含的字段注意此为训练集Reader独有的且必须配置的字段
EvalDataset:
!VOCDataSet
dataset_dir: dataset/roadsign_voc # 数据集所在路径相对于PaddleDetection路径
anno_path: valid.txt # 验证集的标注文件相对于dataset_dir的路径
label_list: label_list.txt # 标签文件相对于dataset_dir的路径
data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
TestDataset:
!ImageFolder
anno_path: label_list.txt # 标注文件所在路径仅用于读取数据集的类别信息支持json和txt格式
dataset_dir: dataset/roadsign_voc # 数据集所在路径,若添加了此行,则`anno_path`路径为相对于`dataset_dir`路径若此行不设置或去掉此行则为相对于PaddleDetection路径
```
然后在对应模型配置文件中将自定义数据文件路径替换为新路径,以`configs/yolov3/yolov3_mobilenet_v1_roadsign.yml`为例
```
_BASE_: [
'../datasets/roadsign_voc.yml', # 指定为自定义数据集配置路径
'../runtime.yml',
'_base_/optimizer_40e.yml',
'_base_/yolov3_mobilenet_v1.yml',
'_base_/yolov3_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams
weights: output/yolov3_mobilenet_v1_roadsign/model_final
YOLOv3Loss:
ignore_thresh: 0.7
label_smooth: true
```
在PaddleDetection的yml配置文件中使用`!`直接序列化模块实例(可以是函数,实例等)上述的配置文件均使用Dataset进行了序列化。
配置修改完成后,即可以启动训练评估,命令如下
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
```
更详细的命令参考[30分钟快速上手PaddleDetection](../GETTING_STARTED_cn.md)
**注意:**
请运行前自行仔细检查数据集的配置路径在训练或验证时如果TrainDataset和EvalDataset的路径配置有误会提示自动下载数据集。若使用自定义数据集在推理时如果TestDataset路径配置有误会提示使用默认COCO数据集的类别信息。
### (可选)生成Anchor
在yolo系列模型中大多数情况下使用默认的anchor设置即可, 你也可以运行`tools/anchor_cluster.py`来得到适用于你的数据集Anchor使用方法如下
``` bash
python tools/anchor_cluster.py -c configs/ppyolo/ppyolo.yml -n 9 -s 608 -m v2 -i 1000
```
目前`tools/anchor_cluster.py`支持的主要参数配置如下表所示:
| 参数 | 用途 | 默认值 | 备注 |
|:------:|:------:|:------:|:------:|
| -c/--config | 模型的配置文件 | 无默认值 | 必须指定 |
| -n/--n | 聚类的簇数 | 9 | Anchor的数目 |
| -s/--size | 图片的输入尺寸 | None | 若指定,则使用指定的尺寸,如果不指定, 则尝试从配置文件中读取图片尺寸 |
| -m/--method | 使用的Anchor聚类方法 | v2 | 目前只支持yolov2的聚类算法 |
| -i/--iters | kmeans聚类算法的迭代次数 | 1000 | kmeans算法收敛或者达到迭代次数后终止 |


@@ -0,0 +1,450 @@
# How to Prepare Training Data
## Directory
- [How to Prepare Training Data](#how-to-prepare-training-data)
- [Directory](#directory)
- [Description of Object Detection Data](#description-of-object-detection-data)
- [Prepare Training Data](#prepare-training-data)
- [VOC Data](#voc-data)
- [VOC Dataset Download](#voc-dataset-download)
- [Introduction to VOC Data Annotation File](#introduction-to-voc-data-annotation-file)
- [COCO Data](#coco-data)
- [COCO Data Download](#coco-data-download)
- [Description of COCO Data Annotation](#description-of-coco-data-annotation)
- [User Data](#user-data)
- [Convert User Data to VOC Data](#convert-user-data-to-voc-data)
- [Convert User Data to COCO Data](#convert-user-data-to-coco-data)
- [Reader of User Define Data](#reader-of-user-define-data)
- [Example of User Data Conversion](#example-of-user-data-conversion)
### Description of Object Detection Data
The data of object detection is more complex than classification. In an image, it is necessary to mark the position and category of each object.
The general object position is represented by a rectangular box, which is generally expressed in the following three ways
| Expression | Explanation |
| :---------: | :----------------------------------------------------------------------------: |
| x1,y1,x2,y2 | (x1,y1) is the top left coordinate, (x2,y2) is the bottom right coordinate |
| x1,y1,w,h | (x1,y1) is the top left coordinate, w is the width of the object, h is the height of the object |
| xc,yc,w,h | (xc,yc) is the center of the object, w is the width of the object, h is the height of the object |
Common object detection datasets such as Pascal VOC adopt `[x1,y1,x2,y2]` to express the bounding box of an object, while [COCO](https://cocodataset.org/#format-data) uses `[x1,y1,w,h]`.
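The sketch below converts between these three representations; it is plain arithmetic that follows directly from the definitions in the table above:
```python
def xyxy_to_xywh(box):
    # [x1, y1, x2, y2] -> [x1, y1, w, h]
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def xywh_to_cxcywh(box):
    # [x1, y1, w, h] -> [xc, yc, w, h]
    x1, y1, w, h = box
    return [x1 + w / 2, y1 + h / 2, w, h]

print(xyxy_to_xywh([10, 20, 110, 220]))    # [10, 20, 100, 200]
print(xywh_to_cxcywh([10, 20, 100, 200]))  # [60.0, 120.0, 100, 200]
```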
### Prepare Training Data
PaddleDetection supports [COCO](http://cocodataset.org), [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [WIDER-FACE](http://shuoyang1213.me/WIDERFACE/) datasets by default.
It also supports custom data sources including:
(1) Convert custom data to VOC format;
(2) Convert custom data to COCO format;
(3) Customize a new data source, and add a custom reader.
First, enter the `PaddleDetection` root directory:
```
cd PaddleDetection/
ppdet_root=$(pwd)
```
#### VOC Data
VOC data is used in the [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) competition. The Pascal VOC competition contains not only an image classification task, but also object detection, image segmentation and other tasks, so its annotation file contains the ground truth of multiple tasks.
The VOC dataset refers to the data used in the Pascal VOC competition. When customizing VOC data, for the non-mandatory fields in the XML files, decide whether to label them or use default values according to the actual situation.
##### VOC Dataset Download
- Download VOC datasets through code automation. The datasets are large and take a long time to download
```
# Execute code to automatically download VOC dataset
python dataset/voc/download_voc.py
```
After code execution, the VOC dataset file organization structure is
```
>>cd dataset/voc/
>>tree
├── create_list.py
├── download_voc.py
├── generic_det_label_list.txt
├── generic_det_label_list_zh.txt
├── label_list.txt
├── VOCdevkit/VOC2007
│ ├── annotations
│ ├── 001789.xml
│ | ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ | ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ | ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ | ...
│ ├── ImageSets
│ | ...
| ...
```
Description of each document
```
# label_list.txt is list of classes namefilename must be label_list.txt. If using VOC dataset, when `use_default_label=true` in config file, this file is not required.
>>cat label_list.txt
aeroplane
bicycle
...
# trainval.txt is file list of trainset
>>cat trainval.txt
VOCdevkit/VOC2007/JPEGImages/007276.jpg VOCdevkit/VOC2007/Annotations/007276.xml
VOCdevkit/VOC2012/JPEGImages/2011_002612.jpg VOCdevkit/VOC2012/Annotations/2011_002612.xml
...
# test.txt is file list of testset
>>cat test.txt
VOCdevkit/VOC2007/JPEGImages/000001.jpg VOCdevkit/VOC2007/Annotations/000001.xml
...
# label_list.txt voc list of classes name
>>cat label_list.txt
aeroplane
bicycle
...
```
- If the VOC dataset has been downloaded
You can organize files according to the above data file organization structure.
##### Introduction to VOC Data Annotation File
In the VOC dataset, each image file corresponds to an XML file with the same name, which records the coordinates and categories of the labeled objects. Take the image `2007_002055.jpg` as an example:
![](../images/2007_002055.jpg)
The XML file corresponding to the image contains the basic information of the corresponding image, such as file name, source, image size, object area information and category information contained in the image.
The XML file contains the following fields
- filename, indicating the image name.
- size, indicating the image size, including: image width, image height and image depth
```
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
```
- object field, describing each object, including:
| Label | Explanation |
| :--------------: | :------------------------------------------------------------------------------------------------------------------------: |
| name | name of the object class |
| pose | pose description of the target object (non-required field) |
| truncated | if the occlusion of the object exceeds 15-20% and it lies outside the bounding box, mark it as `truncated` (non-required field) |
| difficult | objects that are difficult to recognize are marked as `difficult` (non-required field) |
| bndbox sub-label | (xmin,ymin) top left coordinate, (xmax,ymax) bottom right coordinate |
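A minimal sketch of reading these fields with the standard library is shown below (the file name is a placeholder for any annotation XML in the dataset):
```python
import xml.etree.ElementTree as ET

root = ET.parse('2007_002055.xml').getroot()  # placeholder annotation file
size = root.find('size')
print('image size:', size.find('width').text, size.find('height').text)
for obj in root.findall('object'):
    name = obj.find('name').text
    box = obj.find('bndbox')
    print(name,
          box.find('xmin').text, box.find('ymin').text,
          box.find('xmax').text, box.find('ymax').text)
```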
#### COCO Data
COCO data is used in the [COCO](http://cocodataset.org) competition. Likewise, the COCO competition contains multiple tasks, and its annotation file contains the annotation contents of multiple tasks.
The COCO dataset refers to the data used in the COCO competition. When customizing COCO data, decide whether to label some fields in the JSON file or use default values according to the actual situation.
##### COCO Data Download
- The coco dataset is downloaded automatically through the code. The dataset is large and takes a long time to download
```
# automatically download coco datasets by executing code
python dataset/coco/download_coco.py
```
after code execution, the organization structure of coco dataset file is
```
>>cd dataset/coco/
>>tree
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ | ...
| ...
```
- If the coco dataset has been downloaded
The files can be organized according to the above data file organization structure.
##### Description of COCO Data Annotation
Coco data annotation is to store the annotations of all training images in a JSON file. Data is stored in the form of nested dictionaries.
The JSON file contains the following keys:
- info, indicating the annotation file info.
- licenses, indicating the annotation file licenses.
- images, indicating the list of image information in the annotation file, and each element is the information of an image. The following is the information of one of the images:
```
{
'license': 3, # license
'file_name': '000000391895.jpg', # file_name
# coco_url
'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg',
'height': 360, # image height
'width': 640, # image width
'date_captured': '2013-11-14 11:18:45', # date_captured
# flickr_url
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895 # image id
}
```
- annotations: indicating the annotation information list of the target object in the annotation file. Each element is the annotation information of a target object. The following is the annotation information of one of the target objects:
```
{
'segmentation': # object segmentation annotation
'area': 2765.1486500000005, # object area
'iscrowd': 0, # iscrowd
'image_id': 558840, # image id
'bbox': [199.84, 200.46, 77.71, 70.88], # bbox [x1,y1,w,h]
'category_id': 58, # category_id
'id': 156 # annotation id
}
```
```
# Viewing coco annotation files
import json
coco_anno = json.load(open('./annotations/instances_train2017.json'))
# coco_anno.keys
print('\nkeys:', coco_anno.keys())
# Viewing categories information
print('\ncategories:', coco_anno['categories'])
# Viewing the number of images
print('\nthe number of images', len(coco_anno['images']))
# Viewing the number of objects
print('\nthe number of annotation', len(coco_anno['annotations']))
# View object annotation information
print('\nobject annotation information: ', coco_anno['annotations'][0])
```
COCO data is prepared as follows.
The initial file organization under `dataset/coco/` is:
```
>>cd dataset/coco/
>>tree
├── download_coco.py
```
#### User Data
There are three processing methods for user data:
(1) Convert user data into VOC data (only include labels necessary for object detection as required)
(2) Convert user data into coco data (only include labels necessary for object detection as required)
(3) Customize a reader for user data (for complex data, you need to customize the reader)
##### Convert User Data to VOC Data
After the user dataset is converted to VOC data, the directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── xxx1.xml
│ ├── xxx2.xml
│ ├── xxx3.xml
│ | ...
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
├── label_list.txt (Must be provided and the file name must be label_list.txt )
├── train.txt (list of trainset ./images/xxx1.jpg ./annotations/xxx1.xml)
└── valid.txt (list of valid file)
```
Description of each document
```
# label_list.txt is a list of category names. The file name must be this
>>cat label_list.txt
classname1
classname2
...
# train.txt is list of trainset
>>cat train.txt
./images/xxx1.jpg ./annotations/xxx1.xml
./images/xxx2.jpg ./annotations/xxx2.xml
...
# valid.txt is list of validset
>>cat valid.txt
./images/xxx3.jpg ./annotations/xxx3.xml
...
```
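A minimal sketch of generating such list files from the `images/` and `annotations/` folders is shown below (a simple 90/10 random split; folder and file names follow the structure above):
```python
import os
import random

pairs = []
for name in sorted(os.listdir('images')):
    stem = os.path.splitext(name)[0]
    # Keep only images that have a matching VOC XML annotation
    if os.path.exists(os.path.join('annotations', stem + '.xml')):
        pairs.append(f'./images/{name} ./annotations/{stem}.xml\n')

random.shuffle(pairs)
split = int(len(pairs) * 0.9)
with open('train.txt', 'w') as f:
    f.writelines(pairs[:split])
with open('valid.txt', 'w') as f:
    f.writelines(pairs[split:])
```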
##### Convert User Data to COCO Data
`x2coco.py` is provided in `./tools/` to convert VOC dataset, labelme labeled dataset or cityscape dataset into coco data, for example:
1Conversion of labelme data to coco data:
```bash
python tools/x2coco.py \
--dataset_type labelme \
--json_input_dir ./labelme_annos/ \
--image_input_dir ./labelme_imgs/ \
--output_dir ./cocome/ \
--train_proportion 0.8 \
--val_proportion 0.2 \
--test_proportion 0.0
```
2Convert VOC data to coco data:
```bash
python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir path/to/VOCdevkit/VOC2007/Annotations/ \
--voc_anno_list path/to/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt \
--voc_label_list dataset/voc/label_list.txt \
--voc_out_name voc_train.json
```
After the user dataset is converted to COCO data, the directory structure is as follows (avoid Chinese characters in path and file names to prevent encoding errors):
```
dataset/xxx/
├── annotations
│ ├── train.json # Annotation file of coco data
│ ├── valid.json # Annotation file of coco data
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
```
##### Reader of User Define Data
If a new data source needs to be added to PaddleDetection, you can refer to the [add new data source](../advanced_tutorials/READER.md#2.3_Customizing_Dataset) section of the data processing documentation and develop the corresponding code to support it. For a detailed walkthrough of the data processing code, see the [data processing document](../advanced_tutorials/READER.md).
The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
```
metric: COCO # Currently supports COCO, VOC, OID, Wider Face and other evaluation standards
num_classes: 80 # num_classes: The number of classes in the dataset, excluding background classes
TrainDataset:
!COCODataSet
image_dir: train2017 # The path where the training set image resides relative to the dataset_dir
anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir
dataset_dir: dataset/coco #The path where the dataset is located relative to the PaddleDetection path
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset, note data_fields are unique to the trainreader and must be configured
EvalDataset:
!COCODataSet
image_dir: val2017 # The path where the images of the validation set reside relative to the dataset_dir
anno_path: annotations/instances_val2017.json # The path to the annotation file of the validation set relative to the dataset_dir
dataset_dir: dataset/coco # The path where the dataset is located relative to the PaddleDetection path
TestDataset:
!ImageFolder
anno_path: dataset/coco/annotations/instances_val2017.json # The path of the annotation file, it is only used to read the category information of the dataset. JSON and TXT formats are supported
dataset_dir: dataset/coco # The path of the dataset; note that if this row is added, `anno_path` is interpreted as `dataset_dir/anno_path`; if it is not set or removed, `anno_path` is taken relative to the PaddleDetection path
```
In the YML configuration files of PaddleDetection, `!` is used to directly serialize module instances (functions, instances, etc.). The configuration files above all serialize Dataset instances.
**Note:**
Please carefully check the configuration path of the dataset before running. During training or verification, if the path of TrainDataset or EvalDataset is wrong, it will download the dataset automatically. When using a user-defined dataset, if the TestDataset path is incorrectly configured during inference, the category of the default COCO dataset will be used.
#### Example of User Data Conversion
Take the [Kaggle Dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data as an example to illustrate how to prepare custom data. The Kaggle [road-sign-detection](https://www.kaggle.com/andrewmvd/road-sign-detection) competition data contains 877 images and four categories: crosswalk, speedlimit, stop, trafficlight. It can be downloaded from Kaggle or from this [link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
Example diagram of road sign dataset:
![](../images/road554.png)
```
# Downloading and unzipping the data
>>cd ${ppdet_root}/dataset
# Download and unzip the kaggle dataset. The current file organization is as follows
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
```
The data is divided into training set and test set
```
# Generating label_list.txt
>>echo -e "speedlimit\ncrosswalk\ntrafficlight\nstop" > label_list.txt
# Generating train.txt, valid.txt and test.txt
>>ls images/*.png | shuf > all_image_list.txt
>>awk -F"/" '{print $2}' all_image_list.txt | awk -F".png" '{print $1}' | awk -F"\t" '{print "images/"$1".png annotations/"$1".xml"}' > all_list.txt
# The proportion of training set, verification set and test set is about 80%, 10% and 10% respectively.
>>head -n 88 all_list.txt > test.txt
>>head -n 176 all_list.txt | tail -n 88 > valid.txt
>>tail -n 701 all_list.txt > train.txt
# Deleting unused files
>>rm -rf all_image_list.txt all_list.txt
The organization structure of the final dataset file is:
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ ├── road10.xml
│ | ...
├── images
│ ├── road0.jpg
│ ├── road1.jpg
│ ├── road2.jpg
│ | ...
├── label_list.txt
├── test.txt
├── train.txt
└── valid.txt
# label_list.txt is the list of class names; the file name must be label_list.txt
>>cat label_list.txt
crosswalk
speedlimit
stop
trafficlight
# train.txt is the list of training dataset files, and each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat train.txt
./images/road839.png ./annotations/road839.xml
./images/road363.png ./annotations/road363.xml
...
# valid.txt is the list of validation dataset files. Each line is an image path and the corresponding annotation file path, separated by spaces. Note that the path here is a relative path within the dataset folder.
>>cat valid.txt
./images/road218.png ./annotations/road218.xml
./images/road681.png ./annotations/road681.xml
```
You can also download [the prepared data](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar), unzip to `dataset/roadsign_voc/`
After preparing the data, we should generally understand the data, such as image quantity, image size, number of target areas of each type, target area size, etc. If necessary, clean the data.
Roadsign dataset statistics:
| data | number of images |
| :---: | :--------------: |
| train | 701 |
| valid | 176 |
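A minimal sketch for collecting such statistics from the VOC-style annotations above (run inside the dataset folder; class names and counts will of course depend on your own data):
```python
import glob
import xml.etree.ElementTree as ET
from collections import Counter

counts = Counter()
xml_files = glob.glob('annotations/*.xml')
for path in xml_files:
    # Count labeled objects per class across all annotation files
    for obj in ET.parse(path).getroot().findall('object'):
        counts[obj.find('name').text] += 1

print('images:', len(glob.glob('images/*.png')))
print('annotation files:', len(xml_files))
print('objects per class:', dict(counts))
```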
**Explanation:**
(1) For user data, it is recommended to carefully check the data before training to avoid crash during training due to wrong data annotation format or incomplete image data
(2) If the image size is too large and the read data size is not limited, it will occupy a lot of memory and may cause memory / video memory overflow. Please set `batch_size` reasonably; you can try from small to large.


@@ -0,0 +1,176 @@
简体中文 | [English](PrepareKeypointDataSet_en.md)
# 关键点数据准备
## 目录
- [COCO数据集](#COCO数据集)
- [MPII数据集](#MPII数据集)
- [用户数据准备](#用户数据准备)
- [数据格式转换](#数据格式转换)
- [自定义数据训练](#自定义数据训练)
## COCO数据集
### COCO数据集的准备
我们提供了一键脚本来自动完成COCO2017数据集的下载及准备工作请参考[COCO数据集下载](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet.md#COCO%E6%95%B0%E6%8D%AE)。
### COCO数据集KeyPoint说明
在COCO中关键点序号与部位的对应关系为
```
COCO keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
```
与Detection任务不同KeyPoint任务的标注文件为`person_keypoints_train2017.json``person_keypoints_val2017.json`两个json文件。json文件中包含的`info``licenses``images`字段的含义与Detection相同`annotations``categories`则是不同的。
`categories`字段中,除了给出类别,还给出了关键点的名称和互相之间的连接性。
`annotations`字段中标注了每一个实例的ID与所在图像同时还有分割信息和关键点信息。其中与关键点信息较为相关的有
- `keypoints``[x1,y1,v1 ...]`,是一个长度为17*3=51的List,每组表示了一个关键点的坐标与可见性,`v=0, x=0, y=0`表示该点不可见且未标注,`v=1`表示该点有标注但不可见,`v=2`表示该点有标注且可见。
- `bbox`: `[x1,y1,w,h]`表示该实例的检测框位置。
- `num_keypoints`: 表示该实例标注关键点的数目。
## MPII数据集
### MPII数据集的准备
请先通过[MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/#download)下载MPII数据集的图像与对应标注文件并存放到`dataset/mpii`路径下。标注文件可以采用[mpii_annotations](https://download.openmmlab.com/mmpose/datasets/mpii_annotations.tar),已对应转换为json格式完成后的目录结构为
```
mpii
|── annotations
| |── mpii_gt_val.mat
| |── mpii_test.json
| |── mpii_train.json
| |── mpii_trainval.json
| `── mpii_val.json
`── images
|── 000001163.jpg
|── 000003072.jpg
```
### MPII数据集的说明
在MPII中关键点序号与部位的对应关系为
```
MPII keypoint indexes:
0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist',
```
下面以一个解析后的标注信息为例,说明标注的内容,其中每条标注信息标注了一个人物实例:
```
{
'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'gt_joints': [
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[1232.0, 288.0],
[1236.1271, 311.7755],
[1181.8729, -0.77553],
[692.0, 464.0],
[902.0, 417.0],
[1059.0, 247.0],
[1405.0, 329.0],
[1498.0, 613.0],
[1303.0, 562.0]
],
'image': '077096718.jpg',
'scale': 9.516749,
'center': [1257.0, 297.0]
}
```
- `joints_vis`分别表示16个关键点是否标注若为0则对应序号的坐标也为`[-1.0, -1.0]`
- `joints`分别表示16个关键点的坐标。
- `image`:表示对应的图片文件。
- `center`:表示人物的大致坐标,用于定位人物在图像中的位置。
- `scale`表示人物的比例对应200px。
## 用户数据准备
### 数据格式转换
这里我们以`AIChallenger`数据集为例展示如何将其他数据集对齐到COCO格式并加入关键点模型训练中。
`AI challenger`的标注格式如下:
```
AI Challenger Description:
0: 'Right Shoulder',
1: 'Right Elbow',
2: 'Right Wrist',
3: 'Left Shoulder',
4: 'Left Elbow',
5: 'Left Wrist',
6: 'Right Hip',
7: 'Right Knee',
8: 'Right Ankle',
9: 'Left Hip',
10: 'Left Knee',
11: 'Left Ankle',
12: 'Head top',
13: 'Neck'
```
1. 将`AI Challenger`点位序号调整至与`COCO`数据集一致(如`Right Shoulder`的序号由`0`调整到`13`);
2. 统一是否标注/可见的标志位信息,如`AI Challenger``标注且可见`需要由`1`调整到`2`
3. 在该过程中,舍弃该数据集特有的点位(如`Neck`);该数据集中没有的COCO点位(如`left_eye`等),对应设置为`v=0, x=0, y=0`,表示该点位未标注。
4. 为了避免不同数据集ID重复的问题需要重新排列图像的`image_id``annotation id`
5. 整理图像路径`file_name`,使其能够被正确访问到。
我们提供了整合`COCO`训练集和`AI Challenger`数据集的[标注文件](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json),供您参考调整后的效果。
### 自定义数据训练
以[tinypose_256x192](../../../configs/keypoint/tiny_pose/README.md)为例来说明对于自定义数据如何修改:
#### 1、配置文件[tinypose_256x192.yml](../../../configs/keypoint/tiny_pose/tinypose_256x192.yml)
基本的修改内容及其含义如下:
```
num_joints: &num_joints 17 #自定义数据的关键点数量
train_height: &train_height 256 #训练图片尺寸-高度h
train_width: &train_width 192 #训练图片尺寸-宽度w
hmsize: &hmsize [48, 64] #对应训练尺寸的输出尺寸,这里是输入[w,h]的1/4
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #关键点定义中左右对称的关键点用于flip增强。若没有对称结构在 TrainReader 的 RandomFlipHalfBodyTransform 一栏中 flip_pairs 后面加一行 "flip: False"(注意缩紧对齐)
num_joints_half_body: 8 #半身关键点数量,用于半身增强
prob_half_body: 0.3 #半身增强实现概率若不需要则修改为0
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #上半身对应关键点id用于半身增强中获取上半身对应的关键点。
```
上述是自定义数据时所需要的修改部分,完整的配置及含义说明可参考文件:[关键点配置文件说明](../KeyPointConfigGuide_cn.md)。
#### 2、其他代码修改影响测试、可视化
- keypoint_utils.py中的sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0,表示每个关键点的确定范围方差,根据实际关键点可信区域设置:区域精确的一般取0.25-0.5(例如眼睛),区域范围大的一般取0.5-1.0(例如肩膀),若不确定建议取0.75。
- visualizer.py中的draw_pose函数中的EDGES表示可视化时关键点之间的连接线关系。
- pycocotools工具中的sigmas同第一个keypoint_utils.py中的设置。用于coco指标评估时计算。
#### 3、数据准备注意
- 训练数据请按coco数据格式处理。需要包括关键点[Nx3]、检测框[N]标注。
- 请注意area>0,area=0时数据在训练时会被过滤掉。此外,由于COCO的评估机制,area较小的数据在评估时也会被过滤掉,我们建议在自定义数据时取`area = bbox_w * bbox_h`。


@@ -0,0 +1,142 @@
[简体中文](PrepareKeypointDataSet.md) | English
# How to prepare dataset?
## Table of Contents
- [COCO](#COCO)
- [MPII](#MPII)
- [Training for other dataset](#Training_for_other_dataset)
## COCO
### Preparation for the COCO dataset
We provide a one-click script to automatically complete the download and preparation of the COCO2017 dataset. Please refer to [COCO Download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDetDataSet_en.md#COCO%E6%95%B0%E6%8D%AE).
### Description of the COCO dataset (Keypoint):
In COCO, the indexes and corresponding keypoint name are:
```
COCO keypoint indexes:
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
```
Being different from the detection task, the annotation files for the keypoint task are `person_keypoints_train2017.json` and `person_keypoints_val2017.json`. In these two JSON files, the terms `info`, `licenses` and `images` are the same as in the detection task, while `annotations` and `categories` are different.
In `categories`, in addition to the category, there are also the names of the keypoints and the connectivity among them.
In `annotations`, the ID and image of each instance are annotated, as well as segmentation information and keypoint information. Among them, terms related to the keypoints are:
- `keypoints`: `[x1,y1,v1 ...]`, which is a `List` with length 17*3=51. Each combination represents the coordinates and visibility of one keypoint. `v=0, x=0, y=0` indicates this keypoint is not visible and unlabeled. `v=1` indicates this keypoint is labeled but not visible. `v=2` indicates this keypoint is labeled and visible.
- `bbox`: `[x1,y1,w,h]`, the bounding box of this instance.
- `num_keypoints`: the number of labeled keypoints of this instance.
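A minimal sketch of decoding the `keypoints` field described above into per-joint coordinates and visibility flags (the values below are made up for illustration):
```python
# A made-up keypoints list: 17 triplets of (x, y, v), 17 * 3 = 51 values in total
keypoints = [0, 0, 0] * 4 + [320, 240, 2] * 13
assert len(keypoints) == 51

states = {0: 'not labeled', 1: 'labeled, not visible', 2: 'labeled, visible'}
for idx in range(17):
    x, y, v = keypoints[3 * idx: 3 * idx + 3]
    print(f'keypoint {idx}: ({x}, {y}) -> {states[v]}')
```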
## MPII
### Preparation for the MPII dataset
Please download MPII dataset images and corresponding annotation files from [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/#download), and save them to `dataset/mpii`. You can use [mpii_annotations](https://download.openmmlab.com/mmpose/datasets/mpii_annotations.tar), which are already converted to `.json`. The directory structure will be shown as:
```
mpii
|── annotations
| |── mpii_gt_val.mat
| |── mpii_test.json
| |── mpii_train.json
| |── mpii_trainval.json
| `── mpii_val.json
`── images
|── 000001163.jpg
|── 000003072.jpg
```
### Description for MPII dataset
In MPII, the indexes and corresponding keypoint name are:
```
MPII keypoint indexes:
0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist',
```
The following example takes a parsed annotation information to illustrate the content of the annotation, each annotation information represents a person instance:
```
{
'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'gt_joints': [
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[-1.0, -1.0],
[1232.0, 288.0],
[1236.1271, 311.7755],
[1181.8729, -0.77553],
[692.0, 464.0],
[902.0, 417.0],
[1059.0, 247.0],
[1405.0, 329.0],
[1498.0, 613.0],
[1303.0, 562.0]
],
'image': '077096718.jpg',
'scale': 9.516749,
'center': [1257.0, 297.0]
}
```
- `joints_vis`: indicates whether the 16 keypoints are labeled respectively, if it is 0, the corresponding coordinate will be `[-1.0, -1.0]`.
- `joints`: the coordinates of 16 keypoints.
- `image`: image file which this instance belongs to.
- `center`: the coordinate of person instance center, which is used to locate instance in the image.
- `scale`: scale of the instance, corresponding to 200px.
## Training for other dataset
Here, we take `AI Challenger` dataset as example, to show how to align other datasets to `COCO` and add them into training of keypoint models.
In `AI Challenger`, the indexes and corresponding keypoint name are:
```
AI Challenger Description:
0: 'Right Shoulder',
1: 'Right Elbow',
2: 'Right Wrist',
3: 'Left Shoulder',
4: 'Left Elbow',
5: 'Left Wrist',
6: 'Right Hip',
7: 'Right Knee',
8: 'Right Ankle',
9: 'Left Hip',
10: 'Left Knee',
11: 'Left Ankle',
12: 'Head top',
13: 'Neck'
```
1. Align the `AI Challenger` keypoint indexes to be consistent with `COCO`. For example, the index of `Right Shoulder` should be adjusted from `0` to `13`.
2. Unify the flags indicating whether a keypoint is labeled and visible. For example, `labeled and visible` in `AI Challenger` needs to be adjusted from `1` to `2`.
3. In this preprocessing, keypoints unique to this dataset (like `Neck`) are discarded. For keypoints that exist in `COCO` but not in this dataset (like `left_eye`), we set `v=0, x=0, y=0` to indicate they are not labeled.
4. To avoid ID duplication across different datasets, the `image_id` and annotation `id` need to be re-assigned.
5. Rewrite the image path `file_name` so that images can be accessed correctly.
We also provide an [annotation file](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) combining `COCO` trainset and `AI Challenger` trainset.
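The sketch below illustrates steps 1-3 for a single instance. The index mapping assumes the standard 17-keypoint `COCO` order and the usual `AI Challenger` visibility convention (1 = labeled and visible, 2 = labeled but invisible, 3 = unlabeled); verify both against the keypoint definitions listed in this document before relying on it:
```python
# Illustrative mapping from AI Challenger keypoint indexes to the standard COCO order.
# Head top (12) and Neck (13) are simply dropped, matching step 3 above.
AIC_TO_COCO = {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9,        # shoulders, elbows, wrists
               6: 12, 7: 14, 8: 16, 9: 11, 10: 13, 11: 15}  # hips, knees, ankles
AIC_VIS_TO_COCO = {1: 2, 2: 1, 3: 0}                         # assumed AIC visibility flags

def aic_keypoints_to_coco(aic_kpts):
    """aic_kpts: flat list of 14*3 numbers -> flat list of 17*3 numbers in COCO format."""
    coco_kpts = [0.0] * (17 * 3)              # missing keypoints stay v=0, x=0, y=0
    for aic_idx, coco_idx in AIC_TO_COCO.items():
        x, y, v = aic_kpts[aic_idx * 3: aic_idx * 3 + 3]
        coco_kpts[coco_idx * 3: coco_idx * 3 + 3] = [x, y, AIC_VIS_TO_COCO.get(int(v), 0)]
    return coco_kpts
```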
@@ -0,0 +1,302 @@
Simplified Chinese | [English](PrepareMOTDataSet_en.md)
# Multi-Object Tracking Dataset Preparation
## Contents
- [Introduction and Model Selection](#Introduction_and_Model_Selection)
- [MOT Dataset Preparation](#MOT_Dataset_Preparation)
    - [SDE Dataset](#SDE_Dataset)
    - [JDE Dataset](#JDE_Dataset)
- [Custom Dataset Preparation](#Custom_Dataset_Preparation)
    - [SDE Dataset](#SDE_Dataset)
    - [JDE Dataset](#JDE_Dataset)
- [Citations](#Citations)
## Introduction and Model Selection
PaddleDetection provides implementations of multiple algorithms in two series, SDE and JDE:
- SDE(Separate Detection and Embedding)
- [ByteTrack](../../../configs/mot/bytetrack)
- [DeepSORT](../../../configs/mot/deepsort)
- JDE(Joint Detection and Embedding)
- [JDE](../../../configs/mot/jde)
- [FairMOT](../../../configs/mot/fairmot)
- [MCFairMOT](../../../configs/mot/mcfairmot)
**Notes:**
- The original papers of the above algorithms all address single-class multi-object tracking; the PaddleDetection team additionally supports multi-class multi-object tracking with [ByteTrack](./bytetrack) and FairMOT ([MCFairMOT](./mcfairmot));
- [DeepSORT](../../../configs/mot/deepsort) and [JDE](../../../configs/mot/jde) only support single-class multi-object tracking;
- [DeepSORT](../../../configs/mot/deepsort) requires an additional ReID weight to run, while [ByteTrack](../../../configs/mot/bytetrack) can run with or without a ReID weight (without by default).
Regarding model selection, the PaddleDetection team's recommendations are summarized as follows:
| MOT approach | Representative algorithms | Pipeline | Dataset requirements | Other characteristics |
| :--------------| :--------------| :------- | :----: | :----: |
| SDE series | DeepSORT, ByteTrack | Separate: two independent model weights, detection first and ReID afterwards (ReID optional) | Detection and ReID data are relatively independent; without ReID, only a detection dataset is needed | Detection and ReID can be tuned separately; more robust; commonly used in AI competitions |
| JDE series | FairMOT | Joint: a single model weight, end-to-end detection and ReID at the same time | Both detection and ReID annotations are required | Detection and ReID are trained jointly; harder to tune; weaker generalization |
**Notes:**
- Since data annotation is costly, it is recommended to consider the **dataset requirements** first when choosing a model. If the dataset only has detection-box annotations and no ReID annotations, JDE-series algorithms cannot be trained, and the SDE series is recommended instead;
- When the detector is accurate enough, SDE-series algorithms can also associate objects over long time spans without a ReID weight; see [ByteTrack](bytetrack);
- Inference time is related to the number of parameters and the amount of computation of the model weights; in theory the latency ordering is `不使用ReID的SDE系列 < JDE系列 < 使用ReID的SDE系列` (SDE series without ReID < JDE series < SDE series with ReID).
## MOT Dataset Preparation
The PaddleDetection team provides download links for many public and preprocessed datasets; see the [dataset download summary](../../../configs/mot/DataDownload.md). Users can download and use them directly.
Following the model selection summary above, MOT datasets fall into two categories: datasets with detection-box annotations only, which only the SDE series can use, and datasets with both detection and ReID annotations, which both the SDE and JDE series can use.
### SDE Dataset
An SDE dataset contains detection annotations only. Custom datasets can be prepared by following the [detection data preparation document](./PrepareDetDataSet.md).
Taking the MOT17 dataset as an example, download it, unzip it and place it under the `PaddleDetection/dataset/mot` directory:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
```
Then modify the dataset section of the config file as follows:
```
num_classes: 1
TrainDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17
anno_path: annotations/train_half.json
image_dir: images/train
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
dataset_dir: dataset/mot/MOT17
anno_path: annotations/val_half.json
image_dir: images/train
TestDataset:
!ImageFolder
dataset_dir: dataset/mot/MOT17
anno_path: annotations/val_half.json
```
The dataset directory is:
```
dataset/mot
|——————MOT17
|——————annotations
|——————images
```
### JDE Dataset
A JDE dataset contains both detection and ReID annotations. First, download `image_lists.zip` with the following command and unzip it under the `PaddleDetection/dataset/mot` directory:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
```
Then, the public datasets can be quickly downloaded with the following commands; also unzip them under the `PaddleDetection/dataset/mot` directory:
```
# MIX data: the same datasets used in the JDE and FairMOT papers
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
```
The dataset directory is:
```
dataset/mot
|——————image_lists
|——————caltech.all
|——————citypersons.train
|——————cuhksysu.train
|——————eth.train
|——————mot16.train
|——————mot17.train
|——————prw.train
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT16
|——————MOT17
|——————PRW
```
#### JDE Dataset Format
These related datasets all follow the structure below:
```
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes a bounding box in the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
- `class` is the class id, supporting both single class and multi-class, starting from `0`; for a single class it is always `0`;
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation;
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the image width/height, so they are floating point numbers between 0 and 1.
**Notes:**
- The MIX dataset is the dataset used in the original [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) papers, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as a joint training dataset and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
- The MIX dataset and its sub-datasets are all single-class pedestrian tracking datasets; they can be viewed as pedestrian detection datasets with additional ID annotations.
- Vertical-domain datasets for more scenarios, such as vehicle, pedestrian and head tracking, also need to be processed into the same format as the MIX dataset; see the [dataset download summary](DataDownload.md), [vehicle tracking](vehicle/README_cn.md), [head tracking](headtracking21/README_cn.md) and the more general [pedestrian tracking](pedestrian/README_cn.md).
- Custom datasets can be prepared by following the [MOT dataset preparation tutorial](../../docs/tutorials/PrepareMOTDataSet_cn.md).
## Custom Dataset Preparation
### SDE Dataset
If you choose an SDE-series solution, you only need a custom dataset with detection annotations, which can be prepared by following the [detection data preparation document](./PrepareDetDataSet.md).
### JDE Dataset
If you choose a JDE-series solution, the dataset must contain both detection and ReID annotations and follow the format of the MOT-17 dataset.
In order to standardize training and evaluation, custom data needs to be converted into the same directory layout and format as the MOT-17 dataset:
```
custom_data
|——————images
| └——————test
| └——————train
| └——————seq1
| | └——————gt
| | | └——————gt.txt
| | └——————img1
| | | └——————000001.jpg
| | | |——————000002.jpg
| | | └—————— ...
| | └——————seqinfo.ini
| └——————seq2
| └——————...
└——————labels_with_ids
└——————train
└——————seq1
| └——————000001.txt
| |——————000002.txt
| └—————— ...
└——————seq2
└—————— ...
```
##### The images folder
- `gt.txt` is the original annotation file of all images in the current video; the annotations actually used for training are in the `labels_with_ids` folder (the format of `gt.txt` is described below).
- `img1` contains the images extracted from the video at a certain frame rate.
- `seqinfo.ini` is a video information description file, which requires information in the following format:
```
[Sequence]
name=MOT17-02
imDir=img1
frameRate=30
seqLength=600
imWidth=1920
imHeight=1080
imExt=.jpg
```
Each line in `gt.txt` describes a bounding box, with the following format:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio]
```
**Notes:**
- `frame_id` is the frame index of the current image.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances in **this video or image sequence**), or `-1` if this box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the top boundary of the target box.
- `width, height` are the actual pixel width and height.
- `score` is a flag indicating whether the target is taken into account (a value of 0 means the target is ignored in evaluation, while a value of 1 marks it as an active instance), `1` by default.
- `label` is the class label of the target, `1` by default since only single-class tracking is currently supported (the MOT-16 dataset contains other class labels, but they are all treated as the ignore class).
- `vis_ratio` is the visibility ratio of the target after being contained or occluded by other targets, a floating point number from 0 to 1, `1` by default.
##### The labels_with_ids folder
Annotations of all datasets are provided in a unified format. Every image in each dataset has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`. In the annotation text, each line describes a bounding box in the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` is the class id, supporting both single class and multi-class, starting from `0`; for a single class it is always `0`.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of distinct object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height; note that they are normalized by the image width/height, so they are floating point numbers between 0 and 1.
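To see what the conversion to `labels_with_ids` amounts to, the sketch below turns a single `gt.txt` line into a normalized label line for a single-class dataset. It is a simplified illustration (identities are not re-numbered across sequences) and not a replacement for the official script mentioned below:
```python
def gt_line_to_label_line(gt_line, img_w, img_h):
    """Convert '[frame],[id],[bb_left],[bb_top],[w],[h],[score],[label],[vis]'
    into '[class] [identity] [x_center] [y_center] [width] [height]' (normalized).
    Single-class illustration; a full conversion also re-numbers identities globally."""
    _, identity, bb_left, bb_top, w, h = map(float, gt_line.split(",")[:6])
    x_center = (bb_left + w / 2) / img_w
    y_center = (bb_top + h / 2) / img_h
    return "0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        int(identity), x_center, y_center, w / img_w, h / img_h)

# Example gt.txt line and a 1920x1080 frame:
print(gt_line_to_label_line("1,1,912.0,484.0,97.0,109.0,1,1,0.8", 1920, 1080))
```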
The corresponding `labels_with_ids` files can be generated with the following script:
```
cd dataset/mot
python gen_labels_MOT.py
```
### Citations
Caltech:
```
@inproceedings{ dollarCVPR09peds,
author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
title = "Pedestrian Detection: A Benchmark",
booktitle = "CVPR",
month = "June",
year = "2009",
city = "Miami",
}
```
Citypersons:
```
@INPROCEEDINGS{Shanshan2017CVPR,
Author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
Title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
Booktitle = {CVPR},
Year = {2017}
}
@INPROCEEDINGS{Cordts2016Cityscapes,
title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016}
}
```
CUHK-SYSU:
```
@inproceedings{xiaoli2017joint,
title={Joint Detection and Identification Feature Learning for Person Search},
author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
booktitle={CVPR},
year={2017}
}
```
PRW:
```
@inproceedings{zheng2017person,
title={Person re-identification in the wild},
author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={1367--1376},
year={2017}
}
```
ETHZ:
```
@InProceedings{eth_biwi_00534,
author = {A. Ess and B. Leibe and K. Schindler and and L. van Gool},
title = {A Mobile Vision System for Robust Multi-Person Tracking},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
year = {2008},
month = {June},
publisher = {IEEE Press},
keywords = {}
}
```
MOT-16&17:
```
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
```
@@ -0,0 +1,229 @@
English | [简体中文](PrepareMOTDataSet.md)
# Contents
## Multi-Object Tracking Dataset Preparation
- [MOT Dataset](#MOT_Dataset)
- [Dataset Directory](#Dataset_Directory)
- [Data Format](#Data_Format)
- [Custom Dataset Preparation](#Custom_Dataset_Preparation)
- [Citations](#Citations)
### MOT Dataset
PaddleDetection implements [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT) and uses the same training data as they do, named 'MIX', including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. The first six are used as the mixed dataset for training, and MOT16 is used as the evaluation dataset. If you want to use these datasets, please **follow their licenses**.
**Notes:**
- Multi-Object Tracking (MOT) datasets are usually used for single-category tracking. DeepSORT, JDE and FairMOT are single-category MOT models. The 'MIX' dataset and its sub-datasets are also single-category pedestrian tracking datasets; they can be viewed as detection datasets with additional ID ground truth.
- In order to train models for more scenarios, more datasets are also processed into the same format as the MIX dataset. The PaddleDetection team also provides datasets and models for [vehicle tracking](../../configs/mot/vehicle/readme.md), [head tracking](../../configs/mot/headtracking21/readme.md) and the more general [pedestrian tracking](../../configs/mot/pedestrian/readme.md). User-defined datasets can also be prepared by referring to this data preparation doc.
- The multi-category MOT model is [MCFairMOT](../../configs/mot/mcfairmot/readme_cn.md), and the multi-category dataset is an integrated version of the VisDrone dataset. Please refer to the [MCFairMOT](../../configs/mot/mcfairmot/README.md) doc.
- The Multi-Target Multi-Camera Tracking (MTMCT) model uses the [AIC21 MTMCT](https://www.aicitychallenge.org) (CityFlow) multi-camera vehicle tracking dataset. For the dataset and model, please refer to the [MTMCT](../../configs/mot/mtmct/README.md) doc.
### Dataset Directory
First, download the image_lists.zip using the following command, and unzip them into `PaddleDetection/dataset/mot`:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/image_lists.zip
```
Then, download the MIX dataset using the following command, and unzip them into `PaddleDetection/dataset/mot`:
```
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Caltech.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/CUHKSYSU.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/PRW.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/Cityscapes.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/ETHZ.zip
wget https://bj.bcebos.com/v1/paddledet/data/mot/MOT16.zip
```
The final directory is:
```
dataset/mot
|——————image_lists
|——————caltech.10k.val
|——————caltech.all
|——————caltech.train
|——————caltech.val
|——————citypersons.train
|——————citypersons.val
|——————cuhksysu.train
|——————cuhksysu.val
|——————eth.train
|——————mot16.train
|——————mot17.train
|——————prw.train
|——————prw.val
|——————Caltech
|——————Cityscapes
|——————CUHKSYSU
|——————ETHZ
|——————MOT16
|——————MOT17
|——————PRW
```
### Data Format
These several relevant datasets have the following structure:
```
MOT17
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
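A small sketch of this path convention (the example path is illustrative):
```python
from pathlib import Path

def image_to_label_path(image_path: str) -> str:
    """Map an image path to its annotation path by replacing 'images' with
    'labels_with_ids' and '.jpg' with '.txt', as described above."""
    return image_path.replace("images", "labels_with_ids").replace(".jpg", ".txt")

img = "MOT17/images/train/MOT17-02-SDP/img1/000001.jpg"
print(image_to_label_path(img))
# -> MOT17/labels_with_ids/train/MOT17-02-SDP/img1/000001.txt
print(Path(image_to_label_path(img)).suffix)  # '.txt'
```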
In the annotation text, each line describes a bounding box in the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` is the class id, supporting both single class and multi-class, starting from `0`; for a single class it is always `0`.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height, note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
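For inspection or visualization, the normalized values can be mapped back to pixel coordinates; the sketch below is one minimal way to do it (the file path and image size are whatever your data provides):
```python
def read_labels_with_ids(txt_path, img_w, img_h):
    """Parse a labels_with_ids annotation file and return boxes in pixel coordinates.

    Each returned entry is (class_id, identity, x1, y1, x2, y2); values in the file
    are normalized center/size, so they are scaled back by the image width/height.
    """
    boxes = []
    with open(txt_path) as f:
        for line in f:
            cls, identity, xc, yc, w, h = map(float, line.split())
            x1 = (xc - w / 2) * img_w
            y1 = (yc - h / 2) * img_h
            x2 = (xc + w / 2) * img_w
            y2 = (yc + h / 2) * img_h
            boxes.append((int(cls), int(identity), x1, y1, x2, y2))
    return boxes
```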
### Custom Dataset Preparation
In order to standardize training and evaluation, custom data needs to be converted into the same directory layout and format as the MOT-16 dataset:
```
custom_data
|——————images
| └——————test
| └——————train
| └——————seq1
| | └——————gt
| | | └——————gt.txt
| | └——————img1
| | | └——————000001.jpg
| | | |——————000002.jpg
| | | └—————— ...
| | └——————seqinfo.ini
| └——————seq2
| └——————...
└——————labels_with_ids
└——————train
└——————seq1
| └——————000001.txt
| |——————000002.txt
| └—————— ...
└——————seq2
└—————— ...
```
#### images
- `gt.txt` is the original annotation file of all images extracted from the video.
- `img1` is the folder of images extracted from the video by a certain frame rate.
- `seqinfo.ini` is a video information description file, and the following format is required:
```
[Sequence]
name=MOT16-02
imDir=img1
frameRate=30
seqLength=600
imWidth=1920
imHeight=1080
imExt=.jpg
```
Each line in `gt.txt` describes a bounding box, with the format as follows:
```
[frame_id],[identity],[bb_left],[bb_top],[width],[height],[score],[label],[vis_ratio]
```
**Notes:**
- `frame_id` is the current frame id.
- `identity` is an integer from `1` to `num_identities`(`num_identities` is the total number of instances of objects in **this video or image sequence**), or `-1` if this box has no identity annotation.
- `bb_left` is the x coordinate of the left boundary of the target box.
- `bb_top` is the y coordinate of the top boundary of the target box.
- `width, height` are the pixel width and height.
- `score` acts as a flag whether the entry is to be considered. A value of 0 means that this particular instance is ignored in the evaluation, while a value of 1 is used to mark it as active. `1` by default.
- `label` is the type of object annotated, use `1` as default because only single-class multi-object tracking is supported now. There are other classes of object in MOT-16, but they are treated as ignore.
- `vis_ratio` is the visibility ratio of each bounding box, e.g. due to occlusion by another static or moving object, or due to image border cropping. `1` by default.
#### labels_with_ids
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
In the annotation text, each line is describing a bounding box and has the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` is the class id, supporting both single class and multi-class, starting from `0`; for a single class it is always `0`.
- `identity` is an integer from `1` to `num_identities` (`num_identities` is the total number of object instances across all videos or image sequences in the dataset), or `-1` if this box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are the center coordinates, width and height, note that they are normalized by the width/height of the image, so they are floating point numbers ranging from 0 to 1.
Generate the corresponding `labels_with_ids` files with the following command:
```
cd dataset/mot
python gen_labels_MOT.py
```
### Citations
Caltech:
```
@inproceedings{ dollarCVPR09peds,
author = "P. Doll\'ar and C. Wojek and B. Schiele and P. Perona",
title = "Pedestrian Detection: A Benchmark",
booktitle = "CVPR",
month = "June",
year = "2009",
city = "Miami",
}
```
Citypersons:
```
@INPROCEEDINGS{Shanshan2017CVPR,
Author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
Title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
Booktitle = {CVPR},
Year = {2017}
}
@INPROCEEDINGS{Cordts2016Cityscapes,
title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2016}
}
```
CUHK-SYSU:
```
@inproceedings{xiaoli2017joint,
title={Joint Detection and Identification Feature Learning for Person Search},
author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
booktitle={CVPR},
year={2017}
}
```
PRW:
```
@inproceedings{zheng2017person,
title={Person re-identification in the wild},
author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={1367--1376},
year={2017}
}
```
ETHZ:
```
@InProceedings{eth_biwi_00534,
author = {A. Ess and B. Leibe and K. Schindler and and L. van Gool},
title = {A Mobile Vision System for Robust Multi-Person Tracking},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
year = {2008},
month = {June},
publisher = {IEEE Press},
keywords = {}
}
```
MOT-16&17:
```
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
```
@@ -0,0 +1,27 @@
# Data Preparation
Data plays a crucial role in deep learning development, and the quality of data collection and annotation is a key factor in improving the performance of a business model. This document mainly introduces how to prepare data for PaddleDetection, including how to collect high-quality data covering multiple scene types to improve model generalization, as well as the annotation tools and methods for each type of task and how to use the results in PaddleDetection.
## Data Collection
In real-world deep learning applications, data collection often determines the performance of the final model. Some suggestions on data collection are as follows:
### Determine the Direction
The task type, the data categories and the target scenarios determine what data to collect, so the overall direction of data collection should first be determined based on these factors.
### Open-Source Datasets
In practice, data collection is very expensive; collecting everything yourself costs a great deal of time and money. Open-source datasets are an important way to increase the amount of training data, so open-source data from similar tasks is often added. When using them, please comply with the usage conditions specified by each open-source dataset's license.
### Add Scenario Data
Open-source data generally does not cover the actual target scenarios. Users need to assess the gap between the scenarios already covered by open-source datasets and the target scenarios, supplement target-scenario data accordingly, and keep the scenarios of training and deployment data as consistent as possible.
### Class Balance
During collection, classes should also be kept as balanced as possible to help the model correctly learn the target features.
## Data Annotation and Format Description
| Task Type | Data Annotation | Data Format Description |
|:--------:| :--------:|:--------:|
| Object Detection | [doc link](DetAnnoTools.md) | [doc link](PrepareDetDataSet.md) |
| Keypoint Detection | [doc link](KeyPointAnnoTools.md) | [doc link](PrepareKeypointDataSet.md) |
| Multi-Object Tracking | [doc link](MOTAnnoTools.md) | [doc link](PrepareMOTDataSet.md) |