Replace the detection model in the documentation

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1,54 @@
Simplified Chinese | [English](./README_en.md)
# Secondary Development for Action Recognition Task
In industrial deployment, applying action recognition algorithms inevitably raises the need to recognize custom action types, or to optimize an existing action recognition model for better performance in a specific scenario. Given the diversity of behaviors, PP-Human supports recognizing five abnormal behaviors: smoking, making phone calls, falling, fighting, and intrusion. Depending on the behavior, it integrates five technical solutions: action recognition based on video classification, on detection, on image classification, on tracking, and on skeleton keypoints, which together cover the recognition of 90%+ action types and meet a wide range of development needs. In this document, we use cases to introduce how to choose an action recognition solution for the behavior you expect to recognize, and how to carry out secondary development of action recognition algorithms with PaddleDetection, including: solution selection, data preparation, model optimization, and the development workflow for adding new actions.
## Solution Selection
In PP-Human of PaddleDetection, we provide multiple action recognition solutions: based on video classification, on image classification, on detection, on tracking, and on skeleton keypoints, to meet the needs of different scenes and different target behaviors. For secondary development, the first step is to decide which solution to adopt; the key is to analyze the scene and the specific behavior, weigh factors such as the cost of data collection, and choose a suitable recognition solution. Here we briefly list the strengths, weaknesses, and applicable scenarios of the solutions currently supported in PaddleDetection for reference.
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
The following takes several actions already supported by PaddleDetection as examples to explain the rationale behind each solution choice:
### Smoking
Solution: action recognition based on detection with human ID.
Reason: The smoking action has an obvious characteristic object, the cigarette, so we can assume that when a cigarette is detected in the image of a person, that person is smoking. Compared with video-based or skeleton-based recognition solutions, training a detection model requires image-level rather than video-level data, which significantly reduces the difficulty of data collection and annotation. In addition, object detection has abundant pretrained model resources, so the overall model performance is better guaranteed.
### Making Phone Calls
Solution: action recognition based on classification with human ID.
Reason: Although the phone-call action involves a characteristic object, the mobile phone, we need to distinguish it from actions such as looking at the phone; moreover, in surveillance scenes the phone is often heavily occluded during a call (by the hand, by the head, etc.), which makes it hard for a detection model to detect the object correctly. Meanwhile, a phone call usually lasts a long time and the person's posture changes little, so a frame-level image classification strategy can be adopted.
In addition, the phone-call action can mostly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training.
### Falling
Solution: action recognition based on human skeleton keypoints.
Reason: Falling is a distinctly temporal action that can be distinguished from a single person and is scene-independent. Since PP-Human targets security surveillance scenes, where background changes are complex and real-time performance must be considered for deployment, the skeleton-based action recognition solution is adopted for better generalization and running speed.
### Intrusion
Solution: action recognition based on tracking with human ID.
Reason: Intrusion recognition only needs to judge whether a pedestrian's path or position falls within a given area; it is unrelated to the person's own motion. Therefore, it is only necessary to track the person and analyze the tracking results to determine whether an intrusion occurs, as sketched below.
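As a minimal illustration (not PP-Human's actual implementation), intrusion can be flagged by testing whether the bottom-center point of each tracked bounding box lies inside a user-defined polygon; the region, the box format, and the helper names below are assumptions for the sketch:
```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # the edge crosses the horizontal ray
            if x < (x1 - x0) * (y - y0) / (y1 - y0) + x0:
                inside = not inside
    return inside

def check_intrusion(tracks, region):
    """tracks: list of (track_id, x1, y1, x2, y2) boxes. Returns intruding ids."""
    intruders = []
    for track_id, x1, y1, x2, y2 in tracks:
        foot_x, foot_y = (x1 + x2) / 2, y2  # bottom-center of the box
        if point_in_polygon(foot_x, foot_y, region):
            intruders.append(track_id)
    return intruders

region = [(100, 200), (500, 200), (500, 600), (100, 600)]  # example restricted area
print(check_intrusion([(7, 250, 300, 320, 550)], region))  # -> [7]
```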
### Fighting
Solution: action recognition based on video classification.
Reason: Unlike the actions above, fighting is a typical multi-person action, so instead of extracting individual pedestrians and their IDs with detection and tracking models, the whole video clip is processed. In addition, targets occlude each other severely in fighting scenes, so keypoint detection is unreliable and a skeleton-based solution can hardly guarantee accuracy.
The five categories of solutions are described in detail below, covering data preparation, model optimization, and how to add new actions:
1. [Action recognition based on detection with human ID](./idbased_det.md)
2. [Action recognition based on classification with human ID](./idbased_clas.md)
3. [Action recognition based on human skeleton keypoints](./skeletonbased_rec.md)
4. [Action recognition based on tracking with human ID](../pphuman_mot.md)
5. [Action recognition based on video classification](./videobased_rec.md)


@@ -0,0 +1,55 @@
[简体中文](./README.md) | English
# Secondary Development for Action Recognition Task
In industrial deployment, applying action recognition algorithms inevitably raises the need to recognize customized action types, or to optimize an existing action recognition model for better performance in a specific scenario. Given the diversity of behaviors, PP-Human supports recognizing five abnormal behaviors: smoking, making phone calls, falling, fighting, and people intrusion. Depending on the behavior, PP-Human integrates five action recognition solutions: based on video classification, on detection, on image classification, on tracking, and on skeleton keypoints, which together cover 90%+ of action types and meet various development needs. In this document, we use cases to introduce how to select an action recognition solution for the expected behavior, and how to carry out secondary development of the action recognition algorithm with PaddleDetection, including: solution selection, data preparation, model optimization, and the development process for adding new actions.
## Solution Selection
In PaddleDetection's PP-Human, we provide a variety of action recognition solutions: based on video classification, on image classification, on detection, on tracking, and on skeleton keypoints, to meet the needs of different scenes and different target behaviors.
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
The following takes several specific actions currently supported by PaddleDetection as examples to explain the rationale behind each solution choice:
### Smoking
Solution selection: action recognition based on detection with human ID.
Reason: The smoking action has an obvious characteristic object, the cigarette, so we can assume that when a cigarette is detected in the image of a person, that person is smoking. Compared with video-based or skeleton-based recognition solutions, training a detection model requires data at the image level rather than the video level, which significantly reduces the difficulty of data collection and labeling. In addition, the detection task has abundant pretrained model resources, so the performance of the model is better guaranteed.
### Making Phone Calls
Solution selection: action recognition based on classification with human ID.
Reason: Although the phone-call action involves a characteristic object, the mobile phone, we need to distinguish it from actions such as looking at the phone; moreover, in security scenes the phone is often heavily occluded during a call (by the hand, by the head, etc.), which makes it hard for the detection model to detect the target correctly. Meanwhile, a call usually lasts a long time and the person's posture changes little, so a frame-level image classification strategy can be employed. In addition, the phone-call action can mainly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training.
### Falling
Solution selection: action recognition based on skeleton keypoints.
Reason: Falling is a distinctly temporal action that can be distinguished from a single person and is scene-independent. Since PP-Human targets security surveillance scenes, where background changes are complicated and real-time inference must be considered for deployment, skeleton-based action recognition is adopted for better generalization and running speed.
### People Intrusion
Solution selection: action recognition based on tracking with human ID.
Reason: Intrusion recognition can be judged by whether the pedestrian's path or position falls within a selected area, and it is unrelated to the pedestrian's own body motion. Therefore, it is only necessary to track the person and analyze the coordinate results to determine whether an intrusion occurs, as sketched below.
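As a minimal illustration (not PP-Human's actual implementation), intrusion can be flagged by testing whether the bottom-center point of each tracked bounding box lies inside a user-defined polygon; the region, the box format, and the helper names below are assumptions for the sketch:
```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # the edge crosses the horizontal ray
            if x < (x1 - x0) * (y - y0) / (y1 - y0) + x0:
                inside = not inside
    return inside

def check_intrusion(tracks, region):
    """tracks: list of (track_id, x1, y1, x2, y2) boxes. Returns intruding ids."""
    intruders = []
    for track_id, x1, y1, x2, y2 in tracks:
        foot_x, foot_y = (x1 + x2) / 2, y2  # bottom-center of the box
        if point_in_polygon(foot_x, foot_y, region):
            intruders.append(track_id)
    return intruders

region = [(100, 200), (500, 200), (500, 600), (100, 600)]  # example restricted area
print(check_intrusion([(7, 250, 300, 320, 550)], region))  # -> [7]
```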
### Fighting
Solution selection: action recognition based on video classification.
Reason: Unlike the actions above, fighting is a typical multi-person action. Therefore, the detection and tracking models are no longer used to extract individual pedestrians and their IDs; instead, the entire video clip is processed. In addition, the mutual occlusion between targets in fighting scenes is severe, so keypoint detection is unreliable and a skeleton-based solution can hardly guarantee accuracy.
The five categories of solutions are described in detail below, covering data preparation, model optimization, and adding new actions:
1. [Action recognition based on detection with human ID](./idbased_det_en.md)
2. [Action recognition based on classification with human ID](./idbased_clas_en.md)
3. [Action recognition based on skeleton keypoints](./skeletonbased_rec_en.md)
4. [Action recognition based on tracking with human ID](../pphuman_mot_en.md)
5. [Action recognition based on video classification](./videobased_rec_en.md)


@@ -0,0 +1,223 @@
Simplified Chinese | [English](./idbased_clas_en.md)
# Development for Action Recognition Based on Classification with Human ID
## Environment Preparation
The classification-based action recognition solution with human ID is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md) to set up the environment for the subsequent model training and usage workflow.
## Data Preparation
The image-classification-based action recognition solution directly classifies individual frames of the video, so the model training process is the same as for a usual image classification model.
### Dataset Download
The phone-call action recognition model is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the dataset application materials through this link to obtain the download link.
The RGB videos of this dataset are under the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation.
### Image Processing for Training and Testing
In the video file name, the field starting with `A` (i.e. action) is the one related to action recognition, from which we can find the data of the action type we want to recognize.
- Positive sample videos: taking phone calls as an example, we only need to find the files containing `A024`.
- Negative sample videos: all videos other than the target action.
Since converting videos to images produces a lot of redundancy, for positive sample videos we sample one frame every 8 frames and use a pedestrian detection model to crop it to a half-body image (take the upper half of the detection box, i.e. `img = img[:H//2, :, :]`). Images sampled from positive sample videos are treated as positive samples, and images sampled from negative sample videos as negative samples.
**Note**: the positive sample videos do not consist entirely of the phone-call action; there are some redundant actions at the beginning and end of each video, which need to be removed. A minimal sketch of this preprocessing is shown below.
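The following sketch illustrates the preprocessing, assuming OpenCV is available; `detect_person` is a caller-supplied function (standing in for any pedestrian detector) that returns one `(x1, y1, x2, y2)` box per frame:
```python
import glob
import cv2  # pip install opencv-python

def extract_halfbody_samples(video_dir, detect_person, interval=8):
    """Sample every `interval`-th frame and keep the upper half of the person box."""
    samples = []
    for path in glob.glob(f"{video_dir}/*A024*"):  # positive videos contain `A024`
        cap = cv2.VideoCapture(path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % interval == 0:
                x1, y1, x2, y2 = detect_person(frame)  # caller-supplied detector
                person = frame[y1:y2, x1:x2]
                h = person.shape[0]
                samples.append(person[:h // 2, :, :])  # upper half only
            idx += 1
        cap.release()
    return samples
```
Running the same loop over the remaining videos (without the `A024` filter) yields the negative samples.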
### Preparing the Annotation File
The image-classification-based action recognition solution is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). To train a model with this solution, prepare the images to be recognized and the corresponding annotation files according to the [PaddleClas dataset format description](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E). An example annotation file is shown below, where `0` and `1` are the classes the images belong to:
```
# Each line separates the image path and the label with a space
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
...
```
In addition, the label file `phone_label_list.txt` maps class indices to class names:
```
0 make_a_phone_call # class 0
1 normal # class 1
```
After finishing the above, place the files under the `dataset` directory with the following structure:
```
data/
├── images # all images
├── phone_label_list.txt # label file
├── phone_train_list.txt # training list, with images and their classes
└── phone_val_list.txt # validation list, with images and their classes
```
## Model Optimization
### Detection-Tracking Model Optimization
The performance of classification-based action recognition depends on the upstream detection and tracking models. If pedestrians cannot be accurately located in the actual scene, or person IDs cannot be correctly assigned across frames, the action recognition part will be limited. If you run into these problems in practice, please refer to [Secondary Development of Detection Task](../detection.md) and [Secondary Development of Multi-Object Tracking Task](../pphuman_mot.md) to optimize the detection/tracking models.
### Half-Body Prediction
The phone-call action can in fact be distinguished from the upper body alone, so during training and prediction the input image is changed from the pedestrian's full body to the half body.
## Adding New Actions
### Data Preparation
Following the instructions above, finish the data preparation and place the data under `{root of PaddleClas}/dataset`:
```
data/
├── images # all images
├── label_list.txt # label file
├── train_list.txt # training list, with images and their classes
└── val_list.txt # validation list, with images and their classes
```
The training and validation lists look like:
```
# Each line separates the image path and the label with a space
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
train/000004.jpg 2 # for a newly added category, simply use its class number
...
```
`label_list.txt` also needs the names of the added classes:
```
0 make_a_phone_call # class 0
1 Your New Action # class 1
...
n normal # class n
```
### Configuration File Settings
A [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) is already integrated in PaddleClas; the key settings are as follows:
```yaml
# model architecture
Arch:
  name: PPHGNet_tiny
  class_num: 2 # update to the number of classes after adding new actions
...
# Set image_root and cls_label_path correctly, so that image_root + the image
# paths in cls_label_path resolve to the images
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/
      cls_label_path: ./dataset/phone_train_list_halfbody.txt
...
Infer:
  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
  batch_size: 1
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 2 # number of top-k results to display; must not exceed the number of classes
    class_id_map_file: dataset/phone_label_list.txt # path to the modified label_list.txt
```
### Model Training and Evaluation
#### Model Training
Start training with the following command:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Arch.pretrained=True
```
Here `Arch.pretrained=True` means pretrained weights are used to help training.
#### Model Evaluation
After training, evaluate the model metrics with the following command:
```bash
python3 tools/eval.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=output/PPHGNet_tiny/best_model
```
Here `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path of the current best weights; to use other weights, just replace the path.
### Model Export
For a detailed introduction to model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model).
You can follow these steps:
```bash
python tools/export_model.py \
    -c ./PPHGNet_tiny_calling_halfbody.yaml \
    -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
```
Then rename the exported model files and add the inference configuration file to adapt them to PP-Human:
```bash
cd ./output_inference/PPHGNet_tiny_calling_halfbody
mv inference.pdiparams model.pdiparams
mv inference.pdiparams.info model.pdiparams.info
mv inference.pdmodel model.pdmodel
# Download the inference configuration file
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
```
Now the model is ready for inference with PP-Human.
### Custom Action Output
In the classification-based action recognition solution with human ID, the task is converted into image-level classification of the corresponding person's image, and the predicted class is regarded as the action of the current period. Therefore, after training and deploying a custom model, you also need to convert the classification model output into the final action recognition result, and modify the visualized display accordingly.
#### Converting to Action Recognition Results
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509) accordingly.
The core code is:
```python
# Find the class with the highest score in the classification output
cls_id_res = 1
cls_score_res = -1.0
for cls_id in range(len(cls_result[idx])):
    score = cls_result[idx][cls_id]
    if score > cls_score_res:
        cls_id_res = cls_id
        cls_score_res = score

# Current now, class 0 is positive, class 1 is negative.
if cls_id_res == 1 or (cls_id_res == 0 and
                       cls_score_res < self.threshold):
    # If the predicted class is not the target action, or its confidence is
    # below the threshold, decide the current frame's action from the history
    history_cls, life_remain, history_score = self.result_history.get(
        tracker_id, [1, self.frame_life, -1.0])
    cls_id_res = history_cls
    cls_score_res = 1 - cls_score_res
    life_remain -= 1
    if life_remain <= 0 and tracker_id in self.result_history:
        del (self.result_history[tracker_id])
    elif tracker_id in self.result_history:
        self.result_history[tracker_id][1] = life_remain
    else:
        self.result_history[
            tracker_id] = [cls_id_res, life_remain, cls_score_res]
else:
    # The predicted class is the target action: use it and record it in the history
    self.result_history[
        tracker_id] = [cls_id_res, self.frame_life, cls_score_res]
...
```
#### Modifying the Visual Output
Currently, ID-based action recognition results are displayed using the recognition result and predefined class names. For the detailed logic, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If a custom action needs a different display name, modify it there so that the correct result is shown.


@@ -0,0 +1,224 @@
[简体中文](./idbased_clas.md) | English
# Development for Action Recognition Based on Classification with Human ID
## Environmental Preparation
The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please refer to [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
The action recognition model based on classification with human ID directly classifies the image frames of the video, so the model training process is the same as for a usual image classification model.
### Dataset Download
The action recognition of making phone calls is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the relevant application materials through this link to obtain the download link.
The RGB video in this dataset is included in the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation information.
### Image Processing for Training and Validation
In the video file name, the `A` field (i.e. action) is the one related to action recognition; from it we can find the data of the action type we expect to recognize.
- Positive sample video: Taking phone calls as an example, we just need to find the file containing `A024`.
- Negative sample video: All videos except the target action.
Since converting video data into images produces much redundancy, for positive sample videos we sample one frame every 8 frames, and use a pedestrian detection model to crop it into a half-body image (take the upper half of the detection box, that is, `img = img[:H//2, :, :]`). Images sampled from positive sample videos are regarded as positive samples, and images sampled from negative sample videos as negative samples.
**Note**: The positive sample videos do not consist entirely of the phone-call action; there are some redundant actions at the beginning and end of each video, which need to be removed. A minimal sketch of this preprocessing is shown below.
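The following sketch illustrates the preprocessing, assuming OpenCV is available; `detect_person` is a caller-supplied function (standing in for any pedestrian detector) that returns one `(x1, y1, x2, y2)` box per frame:
```python
import glob
import cv2  # pip install opencv-python

def extract_halfbody_samples(video_dir, detect_person, interval=8):
    """Sample every `interval`-th frame and keep the upper half of the person box."""
    samples = []
    for path in glob.glob(f"{video_dir}/*A024*"):  # positive videos contain `A024`
        cap = cv2.VideoCapture(path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % interval == 0:
                x1, y1, x2, y2 = detect_person(frame)  # caller-supplied detector
                person = frame[y1:y2, x1:x2]
                h = person.shape[0]
                samples.append(person[:h // 2, :, :])  # upper half only
            idx += 1
        cap.release()
    return samples
```
Running the same loop over the remaining videos (without the `A024` filter) yields the negative samples.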
### Preparation for Annotation File
The action recognition model based on classification with human ID is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), so you need to prepare the desired image data and the corresponding annotation files. Please refer to [Image Classification Datasets](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/data_preparation/classification_dataset_en.md) to prepare the data. An example annotation file is as follows, where `0` and `1` are the categories of the corresponding images:
```
# Each line uses "space" to separate the image path and label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
...
```
Additionally, the label file `phone_label_list.txt` helps map category numbers to specific type names:
```
0 make_a_phone_call # type 0
1 normal # type 1
```
After the above is finished, place the files under the `dataset` directory; the file structure is as follows:
```
data/
├── images # All images
├── phone_label_list.txt # Label file
├── phone_train_list.txt # Training list, including pictures and their corresponding types
└── phone_val_list.txt # Validation list, including pictures and their corresponding types
```
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on classification with human ID depends on the upstream detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign person IDs across frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/tracking model optimization.
### Half-Body Prediction
In the action of making a phone call, the action classification can be achieved through the upper body image. Therefore, during the training and prediction process, the image is changed from the pedestrian full-body to half-body.
## Add New Action
### Data Preparation
Referring to the previous introduction, complete the data preparation part and place it under `{root of PaddleClas}/dataset`:
```
data/
├── images # All images
├── label_list.txt # Label file
├── train_list.txt # Training list, including pictures and their corresponding types
└── val_list.txt # Validation list, including pictures and their corresponding types
```
The training and validation lists look like:
```
# Each line uses a space to separate the image path and the label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
train/000004.jpg 2 # For a newly added category, simply fill in its class number.
...
```
`label_list.txt` should also contain the names of the added classes:
```
0 make_a_phone_call # class 0
1 Your New Action # class 1
...
n normal # class n
```
### Configuration File Settings
The [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) has been integrated in PaddleClas. The key settings are as follows:
```yaml
# model architecture
Arch:
  name: PPHGNet_tiny
  class_num: 2 # corresponding to the number of action categories after adding new ones
...
# Please correctly set image_root and cls_label_path to ensure that
# image_root + the image paths in cls_label_path can access the images correctly
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/
      cls_label_path: ./dataset/phone_train_list_halfbody.txt
...
Infer:
  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
  batch_size: 1
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 2 # number of top-k results to display; do not exceed the total number of categories
    class_id_map_file: dataset/phone_label_list.txt # path of label_list.txt
```
### Model Training And Evaluation
#### Model Training
Start training with the following command:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Arch.pretrained=True
```
where `Arch.pretrained=True` is to use pretrained weights to help with training.
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=output/PPHGNet_tiny/best_model
```
Where `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path where the current best weight is located. If other weights are needed, just replace the corresponding path.
#### Model Export
For a detailed introduction to model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model).
You can refer to the following steps:
```bash
python tools/export_model.py \
    -c ./PPHGNet_tiny_calling_halfbody.yaml \
    -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
```
Then rename the exported model and add the configuration file to suit the usage of PP-Human.
```bash
cd ./output_inference/PPHGNet_tiny_calling_halfbody
mv inference.pdiparams model.pdiparams
mv inference.pdiparams.info model.pdiparams.info
mv inference.pdmodel model.pdmodel
# Download configuration file for inference
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
```
At this point, this model can be used in PP-Human.
### Custom Action Output
In the action recognition solution based on classification with human ID, the task is converted into an image-level classification task for the corresponding person, and the predicted class is regarded as the action type of the current period. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the classification model results into the final action recognition results as output, and to modify the visualized display results.
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509).
The core code is:
```python
# Get the highest score output of the classification model
cls_id_res = 1
cls_score_res = -1.0
for cls_id in range(len(cls_result[idx])):
    score = cls_result[idx][cls_id]
    if score > cls_score_res:
        cls_id_res = cls_id
        cls_score_res = score

# Current now, class 0 is positive, class 1 is negative.
if cls_id_res == 1 or (cls_id_res == 0 and
                       cls_score_res < self.threshold):
    # If the classification result is not the target action or its confidence
    # does not reach the threshold, determine the action type of the current
    # frame according to the historical results
    history_cls, life_remain, history_score = self.result_history.get(
        tracker_id, [1, self.frame_life, -1.0])
    cls_id_res = history_cls
    cls_score_res = 1 - cls_score_res
    life_remain -= 1
    if life_remain <= 0 and tracker_id in self.result_history:
        del (self.result_history[tracker_id])
    elif tracker_id in self.result_history:
        self.result_history[tracker_id][1] = life_remain
    else:
        self.result_history[
            tracker_id] = [cls_id_res, life_remain, cls_score_res]
else:
    # If the classification result belongs to the target action, use the result
    # and record it in the historical results
    self.result_history[
        tracker_id] = [cls_id_res, self.frame_life, cls_score_res]
...
```
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result.


@@ -0,0 +1,202 @@
Simplified Chinese | [English](./idbased_det_en.md)
# Development for Action Recognition Based on Detection with Human ID
## Environment Preparation
The detection-based action recognition solution with human ID is trained directly with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md) to set up the environment for the subsequent model training and usage workflow.
## Data Preparation
In the detection-based action recognition solution, data preparation is the same as for a general detection model; for details see [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet.md). Organize the images and annotations into one of the formats supported by PaddleDetection.
**Note**: in actual prediction, single-person images are used. It is therefore recommended to crop the training images into single-person images before annotating the cigarette bounding boxes, to improve accuracy.
## Model Optimization
### Detection-Tracking Model Optimization
The performance of detection-based action recognition depends on the upstream detection and tracking models. If pedestrians cannot be accurately located in the actual scene, or person IDs cannot be correctly assigned across frames, the action recognition part will be limited. If you run into these problems in practice, please refer to [Secondary Development of Detection Task](../detection.md) and [Secondary Development of Multi-Object Tracking Task](../pphuman_mot.md) to optimize the detection/tracking models.
### Larger Resolution
From a surveillance viewpoint, cigarette detection is a typical small-object detection problem; using a larger input resolution helps improve the model's overall recognition rate.
### Pretrained Model
Training from the pretrained model on the small-object dataset VisDrone raises the model mAP from 38.1 to 39.7.
## Adding New Actions
### Data Preparation
Refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet.md) to finish preparing the training data.
After preparation, the data paths are:
```
dataset/smoking
├── smoking # all images
│   ├── 1.jpg
│   ├── 2.jpg
├── smoking_test_cocoformat.json # test annotation file
├── smoking_train_cocoformat.json # training annotation file
```
`COCO`格式为例完成后的json标注文件内容如下
```json
# The "images" field contains each image's path, id, and width/height.
"images": [
    {
        "file_name": "smoking/1.jpg",
        "id": 0,  # image id; must not be duplicated
        "height": 437,
        "width": 212
    },
    {
        "file_name": "smoking/2.jpg",
        "id": 1,
        "height": 655,
        "width": 365
    },
    ...
# The "categories" field contains all category information; to add more detection categories, add them here, for example:
"categories": [
    {
        "supercategory": "cigarette",
        "id": 1,
        "name": "cigarette"
    },
    {
        "supercategory": "Class_Defined_by_Yourself",
        "id": 2,
        "name": "Class_Defined_by_Yourself"
    },
    ...
# The "annotations" field contains all object instances: category, bounding box coordinates, id, the id of the image it belongs to, etc.
"annotations": [
    {
        "category_id": 1,  # the defined category; here 1 is cigarette
        "bbox": [
            97.0181345931,
            332.7033243081,
            7.5943999555,
            16.4545332369
        ],
        "id": 0,  # instance id; must not be duplicated
        "image_id": 0,  # id of the image the instance belongs to; may repeat when one image contains multiple instances
        "iscrowd": 0,
        "area": 124.96230648208665
    },
    {
        "category_id": 2,  # the defined category; here 2 is Class_Defined_by_Yourself
        "bbox": [
            114.3895698372,
            221.9131122343,
            25.9530363697,
            50.5401234568
        ],
        "id": 1,
        "image_id": 1,
        "iscrowd": 0,
        "area": 1311.6696622034585
    },
    ...
```
### Configuration File Settings
Refer to the [configuration file](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml); the key points are as follows:
```yaml
metric: COCO
num_classes: 1 # if more categories are added, modify here accordingly

# Set image_dir, anno_path and dataset_dir correctly:
# dataset_dir + anno_path must resolve to the annotation file, and
# dataset_dir + image_dir + the image paths in the annotation file must resolve to the images
TrainDataset:
  !COCODataSet
    image_dir: ""
    anno_path: smoking_train_cocoformat.json
    dataset_dir: dataset/smoking
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: ""
    anno_path: smoking_test_cocoformat.json
    dataset_dir: dataset/smoking

TestDataset:
  !ImageFolder
    anno_path: smoking_test_cocoformat.json
    dataset_dir: dataset/smoking
```
### Model Training and Evaluation
#### Model Training
Following [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md), run the following to train:
```bash
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
```
#### Model Evaluation
After training, evaluate the model metrics with the following command:
```bash
# At Root of PaddleDetection
python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
```
### Model Export
Note: if inference will run under TensorRT, set `-o trt=True` for better performance:
```bash
# At Root of PaddleDetection
python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
```
After export, you get:
```
ppyoloe_crn_s_80e_smoking_visdrone/
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
└── model.pdmodel
```
Now the model is ready for prediction with PP-Human.
### Custom Action Output
In the detection-based action recognition solution with human ID, the task is converted into detecting a characteristic target object in the image of the corresponding person; when the target object is detected, the action is considered to be occurring. Therefore, after training and deploying a custom model, you also need to convert the detection model results into the final action recognition result as output, and modify the visualized display accordingly.
#### Converting to Action Recognition Results
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338) accordingly.
The core code is:
```python
# Parse the detection output and keep the valid boxes whose confidence exceeds the threshold.
# Current now, class 0 is positive, class 1 is negative.
action_ret = {'class': 1.0, 'score': -1.0}
box_num = np_boxes_num[idx]
boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num]
cur_box_idx += box_num
isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0)
valid_boxes = boxes[isvalid, :]
if valid_boxes.shape[0] >= 1:
    # When a valid box exists, set the class and score of the action
    # recognition result accordingly
    action_ret['class'] = valid_boxes[0, 0]
    action_ret['score'] = valid_boxes[0, 1]
    # Since actions persist in time, a valid detection result can be
    # reused for a number of frames
    self.result_history[
        tracker_id] = [0, self.frame_life, valid_boxes[0, 1]]
else:
    # When no valid box exists, decide the current frame's result from
    # the detection history
    ...
```
#### Modifying the Visual Output
Currently, ID-based action recognition results are displayed using the recognition result and predefined class names. For the detailed logic, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If a custom action needs a different display name, modify it there so that the correct result is shown.


@@ -0,0 +1,199 @@
[简体中文](./idbased_det.md) | English
# Development for Action Recognition Based on Detection with Human ID
## Environmental Preparation
The model of action recognition based on detection with human id is trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please refer to [Installation](../../../tutorials/INSTALL.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
The data preparation for action recognition based on detection with human ID is the same as for a general detection model. For details, please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md), and organize the images and annotations into one of the formats PaddleDetection supports.
**Note**: In the actual prediction process, a single person image is used for prediction. So it is recommended to crop the image into a single person image during the training process, and label the cigarette detection bounding box to improve the accuracy.
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on detection with human ID depends on the upstream detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign person IDs across frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/tracking model optimization.
### Larger resolution
From the surveillance viewpoint, cigarette detection is a typical small-object detection problem. Using a larger input resolution helps improve the overall performance of the model.
### Pretrained model
Training from the pretrained model on the small-object dataset VisDrone increases the model mAP from 38.1 to 39.7.
## Add New Action
### Data Preparation
Please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md) to complete the data preparation.
When this step is finished, the paths will look like:
```
dataset/smoking
├── smoking # all images
│   ├── 1.jpg
│   ├── 2.jpg
├── smoking_test_cocoformat.json # Validation file
├── smoking_train_cocoformat.json # Training file
```
Taking the `COCO` format as an example, the content of the completed json annotation file is as follows:
```json
# The "images" field contains the path, id and corresponding width and height information of the images.
"images": [
{
"file_name": "smoking/1.jpg",
"id": 0, # Here id is the picture id serial number, do not duplicate
"height": 437,
"width": 212
},
{
"file_name": "smoking/2.jpg",
"id": 1,
"height": 655,
"width": 365
},
...
# The "categories" field contains all category information. If you want to add more detection categories, please add them here. The example is as follows.
"categories": [
{
"supercategory": "cigarette",
"id": 1,
"name": "cigarette"
},
{
"supercategory": "Class_Defined_by_Yourself",
"id": 2,
"name": "Class_Defined_by_Yourself"
},
...
# The "annotations" field contains information about all instances, including category, bounding box coordinates, id, image id and other information
"annotations": [
{
"category_id": 1, # Corresponding to the defined category, where 1 represents cigarette
"bbox": [
97.0181345931,
332.7033243081,
7.5943999555,
16.4545332369
],
"id": 0, # Here id is the id serial number of the instance, do not duplicate
"image_id": 0, # Here is the id serial number of the image where the instance is located, which may be duplicated. In this case, there are multiple instance objects on one image.
"iscrowd": 0,
"area": 124.96230648208665
},
{
"category_id": 2, # Corresponding to the defined category, where 2 represents Class_Defined_by_Yourself
"bbox": [
114.3895698372,
221.9131122343,
25.9530363697,
50.5401234568
],
"id": 1,
"image_id": 1,
"iscrowd": 0,
"area": 1311.6696622034585
```
### Configuration File Settings
Refer to the [configuration file](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml); the key points are as follows:
```yaml
metric: COCO
num_classes: 1 # If more categories are added, please modify here accordingly

# Set image_dir, anno_path and dataset_dir correctly:
# dataset_dir + anno_path must resolve to the annotation file, and
# dataset_dir + image_dir + the image paths in the annotation file must resolve to the images
TrainDataset:
  !COCODataSet
    image_dir: ""
    anno_path: smoking_train_cocoformat.json
    dataset_dir: dataset/smoking
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: ""
    anno_path: smoking_test_cocoformat.json
    dataset_dir: dataset/smoking

TestDataset:
  !ImageFolder
    anno_path: smoking_test_cocoformat.json
    dataset_dir: dataset/smoking
```
### Model Training And Evaluation
#### Model Training
Following [PP-YOLOE](../../../../configs/ppyoloe/README.md), start training with the following command:
```bash
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
```
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
```bash
# At Root of PaddleDetection
python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
```
#### Model Export
Note: If inference will run in a TensorRT environment, please enable `-o trt=True` for better performance.
```bash
# At Root of PaddleDetection
python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
```
After exporting the model, you can get:
```
ppyoloe_crn_s_80e_smoking_visdrone/
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
└── model.pdmodel
```
At this point, this model can be used in PP-Human.
### Custom Action Output
In the action recognition solution based on detection with human ID, the task is defined as detecting a characteristic target object in the images of the corresponding person. When the target object is detected, the action is considered to be occurring during that period. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the detection model results into the final action recognition results as output, and to modify the visualized display results.
#### Convert to Action Recognition Result
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338).
The core code is:
```python
# Parse the detection model output and filter out valid detection boxes with confidence higher than the threshold.
# Current now, class 0 is positive, class 1 is negative.
action_ret = {'class': 1.0, 'score': -1.0}
box_num = np_boxes_num[idx]
boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num]
cur_box_idx += box_num
isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0)
valid_boxes = boxes[isvalid, :]
if valid_boxes.shape[0] >= 1:
    # When there is a valid detection box, the category and score of the
    # action recognition result are modified accordingly.
    action_ret['class'] = valid_boxes[0, 0]
    action_ret['score'] = valid_boxes[0, 1]
    # Due to the continuity of the action, valid detection results can be
    # reused for a certain number of frames.
    self.result_history[
        tracker_id] = [0, self.frame_life, valid_boxes[0, 1]]
else:
    # If there is no valid detection box, the result of the current frame is
    # determined according to the historical detection results.
    ...
```
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result.


@@ -0,0 +1,205 @@
Simplified Chinese | [English](./skeletonbased_rec_en.md)
# Action Recognition Based on Human Skeleton Keypoints
## Environment Preparation
The skeleton-based action recognition solution is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md) to set up PaddleVideo for the subsequent model training and usage workflow.
## Data Preparation
To train a model with this solution, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare training data adapted to PaddleVideo. The main process includes the following steps:
### Data Format Description
STGCN is a model that makes predictions from sequences of skeleton point coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and the labels can be files stored in `.npy` or `.pkl` format. The required dimension of the sequence data is `(N,C,T,V,M)`. The current solution only supports actions performed by a single person (there can be multiple people in a video, but each person's action is recognized independently), i.e. `M=1`.
| Dim | Size | Description |
| ---- | ---- | ---------- |
| N | variable | number of sequences in the dataset |
| C | 2 | keypoint coordinate dimension, i.e. (x, y) |
| T | 50 | temporal dimension of an action sequence (number of frames) |
| V | 17 | number of keypoints per person; here we use the `COCO` definition, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_cn.md#COCO%E6%95%B0%E6%8D%AE%E9%9B%86) |
| M | 1 | number of persons; each action sequence is predicted for a single person |
### Getting the Skeleton Point Coordinates of a Sequence
For a sequence to be annotated (a sequence here refers to an action segment, which can be a video or an ordered collection of images), the skeleton point (also called keypoint) coordinates can be obtained through model prediction or manual annotation.
- Model prediction: select a model from the [PaddleDetection KeyPoint model zoo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) and follow the steps in `3. Training and Testing - Deployment - Joint deployment of detection and keypoint top-down models` to obtain the 17 keypoint coordinates of the target sequence.
- Manual annotation: if you have other requirements on the number or definition of keypoints, you can annotate each keypoint's coordinates manually. Note that occluded or hard-to-annotate points still need a rough coordinate; otherwise the subsequent network training will be affected.
When obtaining the coordinates with model prediction, you can follow the steps below; note that these are run in PaddleDetection.
```bash
# current path is under root of PaddleDetection
# Step 1: download pretrained inference models.
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
unzip -d output_inference/ dark_hrnet_w32_256x192.zip
# Step 2: Get the keypoint coordinates
# if your data is image sequence
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
# if your data is video
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
```
This produces a detection result file named `det_keypoint_unite_image_results.json`. For the specific meaning of its contents, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
### Unifying the Temporal Length of Sequences
Since the length of each action in real data varies, first predetermine a temporal length according to your data and scene (in PP-Human we use 50 frames per action sequence), and process the data as follows:
- Data longer than the predetermined length: randomly crop a 50-frame segment
- Data shorter than the predetermined length: pad with 0 until 50 frames are reached
- Data exactly equal to the predetermined length: no processing needed
Note: after this step, strictly confirm that the processed data still contains a complete action with no ambiguity for prediction; visualizing the data is the recommended way to check.
### Saving in a Format Usable by PaddleVideo
After the previous two steps, we have the annotation of each person's action segments; at this point we have a list `all_kpts` containing multiple keypoint sequence segments, each of shape (T, V, C) ((50, 17, 2) in our example). Next, convert them into a format usable by PaddleVideo.
- Adjust the dimension order: use `np.transpose` and `np.expand_dims` to convert each segment into the (C, T, V, M) format.
- Combine all segments and save them as one file, as sketched below.
Note: `class_id` is an `int`, as in other classification tasks, e.g. `0: falling, 1: other`.
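A minimal sketch of the length unification and dimension adjustment described above, assuming each raw segment is a float array of shape (T_i, 17, 2); the exact label file layout expected by PaddleVideo may differ, so treat the provided prepare_dataset.py script as the authoritative reference:
```python
import pickle
import numpy as np

def to_fixed_length(kpts, target_len=50):
    """Randomly crop or zero-pad a (T, V, C) segment to exactly target_len frames."""
    t = kpts.shape[0]
    if t > target_len:
        start = np.random.randint(0, t - target_len + 1)
        return kpts[start:start + target_len]
    if t < target_len:
        pad = np.zeros((target_len - t,) + kpts.shape[1:], dtype=kpts.dtype)
        return np.concatenate([kpts, pad], axis=0)
    return kpts

def save_dataset(all_kpts, all_labels, data_path, label_path):
    """all_kpts: list of (T_i, 17, 2) segments; all_labels: list of int class ids."""
    clips = []
    for kpts in all_kpts:
        kpts = to_fixed_length(kpts)                 # (50, 17, 2) = (T, V, C)
        kpts = np.transpose(kpts, (2, 0, 1))         # -> (C, T, V) = (2, 50, 17)
        clips.append(np.expand_dims(kpts, axis=-1))  # -> (C, T, V, M) with M=1
    np.save(data_path, np.stack(clips))              # (N, C, T, V, M)
    with open(label_path, "wb") as f:
        pickle.dump(all_labels, f)
```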
We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) for this step, which directly processes the generated `det_keypoint_unite_image_results.json` file; it parses the json content, organizes the training data as described above, and saves the data files.
```bash
mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
cd {root of PaddleVideo}/applications/PPHuman/datasets/
python prepare_dataset.py
```
Now we have the training data (`.npy`) and the corresponding annotation file (`.pkl`).
## Model Optimization
### Detection-Tracking Model Optimization
The performance of skeleton-based action recognition depends on the upstream detection and tracking models. If pedestrians cannot be accurately located in the actual scene, or person IDs cannot be correctly assigned across frames, the action recognition part will be limited. If you run into these problems in practice, please refer to [Secondary Development of Detection Task](../detection.md) and [Secondary Development of Multi-Object Tracking Task](../pphuman_mot.md) to optimize the detection/tracking models.
### Keypoint Model Optimization
Skeleton points are the core feature of this solution, so the keypoint localization quality also determines the overall action recognition quality. If the keypoint coordinates are clearly wrong in the actual scene, so that the skeleton drawn from them no longer allows the action to be identified, please refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection.md) to optimize the keypoint model.
### Coordinate Normalization
After obtaining the skeleton point coordinates, it is recommended to normalize them with respect to each person's detection box, to remove the convergence difficulty caused by differences in person position and scale, as sketched below.
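A minimal sketch of one common normalization, assuming keypoints of shape (T, 17, 2) in image pixels and one (x1, y1, x2, y2) detection box per frame; mapping coordinates into [0, 1] relative to the box is one reasonable choice, not necessarily the exact transform used by PP-Human:
```python
import numpy as np

def normalize_by_bbox(kpts, boxes):
    """kpts: (T, V, 2) pixel coordinates; boxes: (T, 4) as (x1, y1, x2, y2) per frame."""
    out = np.zeros_like(kpts, dtype=np.float32)
    for t, (x1, y1, x2, y2) in enumerate(boxes):
        w = max(x2 - x1, 1e-6)  # guard against degenerate boxes
        h = max(y2 - y1, 1e-6)
        out[t, :, 0] = (kpts[t, :, 0] - x1) / w  # x relative to box width
        out[t, :, 1] = (kpts[t, :, 1] - y1) / h  # y relative to box height
    return out
```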
## Adding New Actions
In the keypoint-based action recognition solution, the recognition model is [ST-GCN](https://arxiv.org/abs/1801.07455), adapted from the [PaddleVideo training process](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) to complete model training and export.
### Data Preparation and Configuration File Changes
- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) as described in `Data Preparation`, and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
- Refer to the [configuration file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml); the key settings are as follows:
```yaml
MODEL: #MODEL field
  framework:
  backbone:
    name: "STGCN"
    in_channels: 2 # corresponds to the C dimension in the data format description, i.e. 2-D coordinates
    dropout: 0.5
    layout: 'coco_keypoint'
    data_bn: True
  head:
    name: "STGCNHead"
    num_classes: 2 # if the data contains more action types, modify this to match the number of types
    if_top5: False # set to False when there are fewer than 5 action types, otherwise an error will be raised
...
# Set the data and label paths of the train/valid/test sections correctly according to your data paths
DATASET: #DATASET field
  batch_size: 64
  num_workers: 4
  test_batch_size: 1
  test_num_workers: 0
  train:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/train_data.npy" #Mandatory, train data file path
    label_path: "./applications/PPHuman/datasets/train_label.pkl"
  valid:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data file path
    label_path: "./applications/PPHuman/datasets/val_label.pkl"
    test_mode: True
  test:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data file path
    label_path: "./applications/PPHuman/datasets/val_label.pkl"
    test_mode: True
```
### Model Training and Testing
- In PaddleVideo, start training with the following commands:
```bash
# current path is under root of PaddleVideo
python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
# Since the task may overfit, it is recommended to enable validation during training to save the best model
python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
```
- After training, run prediction with:
```bash
python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
```
### Model Export
- In PaddleVideo, export the model with the following commands, obtaining the model structure file `STGCN.pdmodel` and the weight file `STGCN.pdiparams`, and add the inference configuration file:
```bash
# current path is under root of PaddleVideo
python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
-p output/STGCN/STGCN_best.pdparams \
-o output_inference/STGCN
cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
# Rename the model files to adapt them to PP-Human
cd output_inference/STGCN
mv STGCN.pdiparams model.pdiparams
mv STGCN.pdiparams.info model.pdiparams.info
mv STGCN.pdmodel model.pdmodel
```
The exported model directory is structured as follows:
```
STGCN
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
├── model.pdmodel
```
Now PP-Human can run skeleton-based action recognition inference.
**Note**: if you changed the length of the video sequence or the number of keypoints during training, modify the `INFERENCE` section of the configuration file accordingly for correct prediction.
```yaml
# The dimension of the sequence data is (N,C,T,V,M)
INFERENCE:
  name: 'STGCN_Inference_helper'
  num_channels: 2 # corresponds to the C dimension
  window_size: 50 # corresponds to the T dimension; adjust to your sequence length
  vertex_nums: 17 # corresponds to the V dimension; adjust to your number of keypoints
  person_nums: 1 # corresponds to the M dimension
```
### Custom Action Output
In the skeleton-based action recognition solution, the model's classification output represents the action type of the person within a time window, and the predicted class is regarded as the action of the current period. Therefore, after training and deploying a custom model, simply use the model output as the final result and modify the visualized display accordingly.
#### Modifying the Visual Output
Currently, ID-based action recognition results are displayed using the recognition result and predefined class names. For the detailed logic, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If a custom action needs a different display name, modify it there so that the correct result is shown.


@@ -0,0 +1,200 @@
[简体中文](./skeletonbased_rec.md) | English
# Skeleton-based action recognition
## Environmental Preparation
The skeleton-based action recognition is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please refer to [Installation](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/install.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
For the skeleton-based model, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare training data adapted to PaddleVideo. The main process includes the following steps:
### Data Format Description
STGCN is a model that makes predictions from sequences of skeleton point coordinates. In PaddleVideo, training data is `Numpy` data stored in `.npy` format, and labels can be files stored in `.npy` or `.pkl` format. The dimension requirement for sequence data is `(N,C,T,V,M)`. The current solution only supports actions performed by a single person (there can be multiple people in the video, and each person performs action recognition separately), that is, `M=1`.
| Dim | Size | Description |
| ---- | ---- | ---------- |
| N | Not Fixed | The number of sequences in the dataset |
| C | 2 | Keypoint coordinate, i.e. (x, y) |
| T | 50 | The temporal dimension of the action sequence (i.e. the number of continuous frames)|
| V | 17 | The number of keypoints of each person, here we use the definition of the `COCO` dataset, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_en.md#description-for-coco-datasetkeypoint) |
| M | 1 | The number of persons, here we only predict a single person for each action sequence |
### Get The Skeleton Point Coordinates of The Sequence
For a sequence to be labeled (a sequence here refers to an action segment, which can be a video or an ordered collection of images), the coordinates of skeleton points (also known as keypoints) can be obtained through model prediction or manual annotation.
- Model prediction: you can directly select a model from the [PaddleDetection KeyPoint Models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/README_en.md) and follow the steps in `3. Training and Testing - Deployment Prediction - Joint deployment of detection + keypoint top-down models` to get the 17 keypoint coordinates of the target sequence.
When using model prediction to obtain the coordinates, you can refer to the following steps; note that they are run in PaddleDetection.
```bash
# current path is under root of PaddleDetection
# Step 1: download pretrained inference models.
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
unzip -d output_inference/ dark_hrnet_w32_256x192.zip
# Step 2: Get the keypoint coordinates
# if your data is image sequence
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
# if your data is video
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
```
We can get a detection result file named `det_keypoint_unite_image_results.json`. The detail of content can be seen at [Here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
### Uniform Sequence Length
Since the length of each action in the actual data varies, the first step is to predetermine the temporal length according to your data and the actual scene (in PP-Human, we use 50 frames as an action sequence), and process the data as follows:
- Data whose actual length exceeds the predetermined length: randomly crop a 50-frame segment
- Data whose actual length is less than the predetermined length: pad with 0 until 50 frames are reached
- Data exactly equal to the predetermined length: no processing required
Note: After this step is completed, please strictly confirm that the processed data contains a complete action, and there will be no ambiguity in prediction. It is recommended to confirm by visualizing the data.
### Save to PaddleVideo usable formats
After the first two steps of processing, we have the annotation of each person's action segments. At this point we have a list `all_kpts`, which contains multiple keypoint sequence segments, each of shape (T, V, C) ((50, 17, 2) in our case); these are further converted into a format usable by PaddleVideo.
- Adjust the dimension order: `np.transpose` and `np.expand_dims` can be used to convert the dimensions of each segment into the (C, T, V, M) format.
- Combine all clips and save them as one file, as sketched below.
Note: `class_id` is an `int` type variable, as in other classification tasks. For example, `0: falling, 1: other`.
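A minimal sketch of the length unification and dimension adjustment described above, assuming each raw segment is a float array of shape (T_i, 17, 2); the exact label file layout expected by PaddleVideo may differ, so treat the provided prepare_dataset.py script as the authoritative reference:
```python
import pickle
import numpy as np

def to_fixed_length(kpts, target_len=50):
    """Randomly crop or zero-pad a (T, V, C) segment to exactly target_len frames."""
    t = kpts.shape[0]
    if t > target_len:
        start = np.random.randint(0, t - target_len + 1)
        return kpts[start:start + target_len]
    if t < target_len:
        pad = np.zeros((target_len - t,) + kpts.shape[1:], dtype=kpts.dtype)
        return np.concatenate([kpts, pad], axis=0)
    return kpts

def save_dataset(all_kpts, all_labels, data_path, label_path):
    """all_kpts: list of (T_i, 17, 2) segments; all_labels: list of int class ids."""
    clips = []
    for kpts in all_kpts:
        kpts = to_fixed_length(kpts)                 # (50, 17, 2) = (T, V, C)
        kpts = np.transpose(kpts, (2, 0, 1))         # -> (C, T, V) = (2, 50, 17)
        clips.append(np.expand_dims(kpts, axis=-1))  # -> (C, T, V, M) with M=1
    np.save(data_path, np.stack(clips))              # (N, C, T, V, M)
    with open(label_path, "wb") as f:
        pickle.dump(all_labels, f)
```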
We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) to do this step, which can directly process the generated `det_keypoint_unite_image_results.json` file. The script parses the json content, organizes the training data as described in the preceding steps, and saves the data files.
```bash
mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
cd {root of PaddleVideo}/applications/PPHuman/datasets/
python prepare_dataset.py
```
Now, we have available training data (`.npy`) and corresponding annotation files (`.pkl`).
## Model Optimization
### Detection-Tracking Model Optimization
The performance of skeleton-based action recognition depends on the upstream detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign person IDs across frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/tracking model optimization.
### Keypoint Model Optimization
As the core feature of the solution, the skeleton point localization quality also determines the overall action recognition quality. If there are obvious errors in the keypoint coordinates in the actual scene, so that the skeleton image composed of the keypoints no longer allows the specific action to be distinguished, you can refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection_en.md) to optimize the keypoint model.
### Coordinate Normalization
After getting the coordinates of the skeleton points, it is recommended to normalize them with respect to the detection bounding box of each person, to reduce the convergence difficulty brought by differences in person position and scale, as sketched below.
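A minimal sketch of one common normalization, assuming keypoints of shape (T, 17, 2) in image pixels and one (x1, y1, x2, y2) detection box per frame; mapping coordinates into [0, 1] relative to the box is one reasonable choice, not necessarily the exact transform used by PP-Human:
```python
import numpy as np

def normalize_by_bbox(kpts, boxes):
    """kpts: (T, V, 2) pixel coordinates; boxes: (T, 4) as (x1, y1, x2, y2) per frame."""
    out = np.zeros_like(kpts, dtype=np.float32)
    for t, (x1, y1, x2, y2) in enumerate(boxes):
        w = max(x2 - x1, 1e-6)  # guard against degenerate boxes
        h = max(y2 - y1, 1e-6)
        out[t, :, 0] = (kpts[t, :, 0] - x1) / w  # x relative to box width
        out[t, :, 1] = (kpts[t, :, 1] - y1) / h  # y relative to box height
    return out
```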
## Add New Action
In skeleton-based action recognition, the model used is [ST-GCN](https://arxiv.org/abs/1801.07455), adapted from the [PaddleVideo training process](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/model_zoo/recognition/stgcn.md) to complete model training and export.
### Data Preparation And Configuration File Settings
- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) according to `Data Preparation`, and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
- Refer to the [configuration file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml); the key settings are as follows:
```yaml
MODEL: #MODEL field
  framework:
  backbone:
    name: "STGCN"
    in_channels: 2 # This corresponds to the C dimension in the data format description, representing two-dimensional coordinates.
    dropout: 0.5
    layout: 'coco_keypoint'
    data_bn: True
  head:
    name: "STGCNHead"
    num_classes: 2 # If there are multiple action types in the data, modify this to match the number of types.
    if_top5: False # When the number of action types is less than 5, set it to False, otherwise an error will be raised.
...
# Please set the data and label paths of the train/valid/test sections correctly according to the data path
DATASET: #DATASET field
  batch_size: 64
  num_workers: 4
  test_batch_size: 1
  test_num_workers: 0
  train:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/train_data.npy" #Mandatory, train data index file path
    label_path: "./applications/PPHuman/datasets/train_label.pkl"
  valid:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
    label_path: "./applications/PPHuman/datasets/val_label.pkl"
    test_mode: True
  test:
    format: "SkeletonDataset" #Mandatory, indicates the dataset type, associated with 'paddlevideo/loader/dataset'
    file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
    label_path: "./applications/PPHuman/datasets/val_label.pkl"
    test_mode: True
```
### Model Training And Evaluation
- In PaddleVideo, start training with the following command:
```bash
# current path is under root of PaddleVideo
python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
# Since the task may overfit, it is recommended to evaluate model during training to save the best model.
python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
```
- After training the model, use the following command to do inference.
```bash
python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
```
### Model Export
In PaddleVideo, use the following commands to export the model, obtaining the model structure file `STGCN.pdmodel` and the weight file `STGCN.pdiparams`, and add the inference configuration file:
```bash
# current path is under root of PaddleVideo
python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
-p output/STGCN/STGCN_best.pdparams \
-o output_inference/STGCN
cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
# Rename model files to adapt PP-Human
cd output_inference/STGCN
mv STGCN.pdiparams model.pdiparams
mv STGCN.pdiparams.info model.pdiparams.info
mv STGCN.pdmodel model.pdmodel
```
The directory structure will look like:
```
STGCN
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
├── model.pdmodel
```
At this point, this model can be used in PP-Human.
**Note**: If the length of the video sequence or the number of keypoints was changed during training, the content of the `INFERENCE` field in the configuration file needs to be modified accordingly for correct prediction.
```yaml
# The dimension of the sequence data is (N,C,T,V,M)
INFERENCE:
  name: 'STGCN_Inference_helper'
  num_channels: 2 # Corresponding to the C dimension
  window_size: 50 # Corresponding to the T dimension; set it according to the sequence length
  vertex_nums: 17 # Corresponding to the V dimension; set it according to the number of keypoints
  person_nums: 1 # Corresponding to the M dimension
```
### Custom Action Output
In the skeleton-based action recognition, the classification result of the model represents the behavior type of the character in a certain period of time. The type of the corresponding classification is regarded as the action of the current period. Therefore, on the basis of completing the training and deployment of the custom model, the model output is directly used as the final result, and the displayed result of the visualization should be modified.
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the recognition results and predefined category names. For details, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If a custom action needs to be displayed under a different name, please modify this logic accordingly so that it outputs the corresponding result.
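As a minimal illustration of the kind of change involved, the sketch below maps a predicted action class id to a custom display string; the dictionary, ids and labels are all hypothetical, and the actual variables live in the linked pipeline.py code:
```python
# Hypothetical sketch only: the real display logic lives in deploy/pipeline/pipeline.py.
# Map recognized action class ids to the strings drawn next to each tracked person.
ACTION_DISPLAY_NAMES = {0: "Smoking", 1: "Making Phone Call"}  # illustrative ids/labels

def action_display_text(class_id, default="Unknown"):
    # Look up the display name for a recognized action class.
    return ACTION_DISPLAY_NAMES.get(class_id, default)
```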

View File

@@ -0,0 +1,159 @@
# 基于视频分类的行为识别
## 数据准备
视频分类任务输入的视频格式一般为`.mp4`、`.avi`等格式的视频,或者是抽帧后的视频帧序列,标签则可以是`.txt`格式存储的文件。
对于打架识别任务,具体数据准备流程如下:
### 数据集下载
打架识别基于6个公开的打架、暴力行为相关数据集合并后的数据进行模型训练。公开数据集具体信息如下
| 数据集 | 下载链接 | 简介 | 标注 | 数量 | 时长 |
| ---- | ---- | ---------- | ---- | ---- | ---------- |
| Surveillance Camera Fight Dataset| https://github.com/sayibet/fight-detection-surv-dataset | 裁剪视频,监控视角 | 视频级别 | 打架150非打架150 | 2s |
| A Dataset for Automatic Violence Detection in Videos | https://github.com/airtlab/A-Dataset-for-Automatic-Violence-Detection-in-Videos | 裁剪视频,室内自行录制 | 视频级别 | 暴力行为115个场景2个机位共230 非暴力行为60个场景2个机位共120 | 几秒钟 |
| Hockey Fight Detection Dataset | https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes?resource=download | 裁剪视频,非真实场景 | 视频级别 | 打架500非打架500 | 2s |
| Video Fight Detection Dataset | https://www.kaggle.com/datasets/naveenk903/movies-fight-detection-dataset | 裁剪视频,非真实场景 | 视频级别 | 打架100非打架101 | 2s |
| Real Life Violence Situations Dataset | https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset | 裁剪视频,非真实场景 | 视频级别 | 暴力行为1000非暴力行为1000 | 几秒钟 |
| UBI Abnormal Event Detection Dataset| http://socia-lab.di.ubi.pt/EventDetection/ | 未裁剪视频,监控视角 | 帧级别 | 打架216非打架784裁剪后二次标注打架1976非打架1630 | 原视频几秒到几分钟不等裁剪后2s |
打架(暴力)行为视频3956个,非打架(非暴力)行为视频3501个,共7457个视频,每个视频几秒钟。
本项目为大家整理了前5个数据集下载链接[https://aistudio.baidu.com/aistudio/datasetdetail/149085](https://aistudio.baidu.com/aistudio/datasetdetail/149085)。
### 视频抽帧
首先下载PaddleVideo代码
```bash
git clone https://github.com/PaddlePaddle/PaddleVideo.git
```
假设PaddleVideo源码路径为PaddleVideo_root。
为了加快训练速度,将视频进行抽帧。下面命令会根据视频的帧率FPS进行抽帧,如FPS=30,则每秒视频会抽取30帧图像。
```bash
cd ${PaddleVideo_root}
python data/ucf101/extract_rawframes.py dataset/ rawframes/ --level 2 --ext mp4
```
其中,假设视频已经存放在了`dataset`目录下,如果是其他路径请对应修改。打架(暴力)视频存放在`dataset/fight`中;非打架(非暴力)视频存放在`dataset/nofight`中。`rawframes`目录存放抽取的视频帧。
### 训练集和验证集划分
打架识别验证集1500条,来自Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、UBI Abnormal Event Detection Dataset三个数据集。
也可根据下面的命令将数据按照8:2的比例划分成训练集和测试集
```bash
python split_fight_train_test_dataset.py "rawframes" 2 0.8
```
参数说明:“rawframes”为视频帧存放的文件夹;2表示目录结构为两级,第二级表示每个行为对应的子文件夹;0.8表示训练集比例。
其中`split_fight_train_test_dataset.py`文件在PaddleDetection中的`deploy/pipeline/tools`路径下。
执行完命令后,会最终生成fight_train_list.txt和fight_val_list.txt两个文件。打架的标签为1,非打架的标签为0。
### 视频裁剪
对于未裁剪的视频,如UBI Abnormal Event Detection Dataset数据集,需要先进行裁剪才能用于模型训练。`deploy/pipeline/tools/clip_video.py`中给出了视频裁剪的函数`cut_video`,输入为视频路径、裁剪的起始帧和结束帧,以及裁剪后的视频保存路径。
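一个可能的调用示意如下(路径与帧号均为示例,函数签名以`clip_video.py`实际代码为准):
```python
# 示意:调用 cut_video 裁剪视频(假设已将 deploy/pipeline/tools 加入 PYTHONPATH,
# 参数顺序以 clip_video.py 实际代码为准)
from clip_video import cut_video

# 依次传入:原始视频路径、裁剪起始帧、裁剪结束帧、裁剪后视频保存路径
cut_video("dataset/raw/video_001.mp4", 120, 180, "dataset/fight/video_001_clip.mp4")
```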
## 模型优化
### VideoMix
[VideoMix](https://arxiv.org/abs/2012.03457)是视频数据增强的方法之一是对图像数据增强CutMix的扩展可以缓解模型的过拟合问题。
与Mixup将两个视频片段的每个像素点按照一定比例融合不同,VideoMix中每个像素点要么属于片段A、要么属于片段B。输出结果是两个片段原始标签的加权和,权重是两个片段各自的像素占比。
在baseline的基础上加入VideoMix数据增强后,精度由87.53%提升至88.01%。
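其像素级替换与标签加权的逻辑可参考如下numpy示意(简化实现,仅用于说明原理,非PaddleVideo实际代码):
```python
import numpy as np

def videomix(clip_a, clip_b, label_a, label_b, alpha=1.0):
    """clip_*: (T, H, W, C)的视频片段;label_*: one-hot标签。"""
    lam = np.random.beta(alpha, alpha)            # 采样初始融合比例
    _, h, w, _ = clip_a.shape
    cut_rat = np.sqrt(1.0 - lam)                  # 按面积比例确定替换区域大小
    cut_h, cut_w = int(h * cut_rat), int(w * cut_rat)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = clip_a.copy()
    # 区域内像素取自片段B,区域外取自片段A:每个像素只属于其中一个片段
    mixed[:, y1:y2, x1:x2, :] = clip_b[:, y1:y2, x1:x2, :]
    # 标签为两个片段标签的加权和,权重为各自像素实际占比
    lam = 1.0 - (y2 - y1) * (x2 - x1) / float(h * w)
    return mixed, lam * label_a + (1.0 - lam) * label_b
```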
### 更大的分辨率
由于监控摄像头角度、距离等问题,存在监控画面下人比较小的情况,小目标行为的识别较困难。尝试增大输入图像的分辨率后,模型精度由88.01%提升至89.06%。
## 新增行为
目前打架识别模型使用的是[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)套件中[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md)并在PP-TSM视频分类模型训练流程的基础上修改适配完成模型训练。
请先参考[使用说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/usage.md)了解PaddleVideo模型库的使用。
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
| ---- | ---- | ---------- | ---- | ---- | ---------- |
| 打架识别 | PP-TSM | 准确率89.06% | T4, 2s视频128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
#### 模型训练
下载预训练模型:
```bash
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
```
执行训练:
```bash
# 单卡训练
cd ${PaddleVideo_root}
python main.py --validate -c pptsm_fight_frames_dense.yaml
```
本方案针对的是视频的二分类问题,如果不是二分类,需要修改配置文件中`MODEL-->head-->num_classes`为具体的类别数目。
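字段层级示意如下(具体以`pptsm_fight_frames_dense.yaml`实际内容为准):
```yaml
MODEL:
  head:
    num_classes: 5  # 示例:假设自定义任务共有5类行为;二分类任务则为2
```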
```bash
cd ${PaddleVideo_root}
# 多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -B -m paddle.distributed.launch --gpus="0,1,2,3" \
--log_dir=log_pptsm_dense main.py --validate \
-c pptsm_fight_frames_dense.yaml
```
#### 模型评估
训练好的模型下载:[https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams)
模型评估:
```bash
cd ${PaddleVideo_root}
python main.py --test -c pptsm_fight_frames_dense.yaml \
-w ppTSM_fight_best.pdparams
```
其中`ppTSM_fight_best.pdparams`为训练好的模型。
#### 模型导出
导出inference模型
```bash
cd ${PaddleVideo_root}
python tools/export_model.py -c pptsm_fight_frames_dense.yaml \
-p ppTSM_fight_best.pdparams \
-o inference/ppTSM
```
#### 推理可视化
利用上步骤导出的模型基于PaddleDetection中推理pipeline可完成自定义行为识别及可视化。
新增行为后,需要对现有的可视化代码进行修改,目前代码支持打架二分类可视化,新增类别后需要根据识别结果自适应可视化推理结果。
具体修改PaddleDetection中`deploy/pipeline/pipeline.py`路径下`PipePredictor`类中的`visualize_video`成员函数。当结果中存在'video_action'数据时,会对行为进行可视化。目前的逻辑是:如果推理的类别为1(打架),则进行可视化,否则不进行显示,即"video_action_score"为None。用户新增行为后,可根据类别index和对应的行为设置"video_action_text"字段,目前index=1对应"Fight"。相关代码块如下:
```
video_action_res = result.get('video_action')
if video_action_res is not None:
    video_action_score = None
    if video_action_res and video_action_res["class"] == 1:
        video_action_score = video_action_res["score"]
    mot_boxes = None
    if mot_res:
        mot_boxes = mot_res['boxes']
    image = visualize_action(
        image,
        mot_boxes,
        action_visual_collector=None,
        action_text="SkeletonAction",
        video_action_score=video_action_score,
        video_action_text="Fight")
```

View File

@@ -0,0 +1,84 @@
简体中文 | [English](./detection_en.md)
# 目标检测任务二次开发
在目标检测算法产业落地过程中,常常会出现需要额外训练以满足实际使用要求的情况,项目迭代过程中也会出现需要修改类别的情况。本文档详细介绍如何使用PaddleDetection进行目标检测算法二次开发,流程包括:数据准备、模型优化思路和修改类别开发流程。
## 数据准备
二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据,从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
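转换后的COCO格式标注可参考如下最简结构示意(字段为COCO标准字段,数值均为示例):
```python
# 示意:手工构造一个最简的COCO检测标注文件(实际可由标注工具的转换脚本生成)
import json

coco = {
    "images": [
        {"id": 0, "file_name": "train/0001.jpg", "height": 1080, "width": 1920}
    ],
    "annotations": [
        {
            "id": 0, "image_id": 0, "category_id": 1,
            "bbox": [100, 200, 50, 120],  # 格式为[x, y, w, h]
            "area": 50 * 120, "iscrowd": 0
        }
    ],
    "categories": [{"id": 1, "name": "person"}],
}
with open("instances_train.json", "w") as f:
    json.dump(coco, f)
```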
## 模型优化
### 1. 使用自定义数据集训练
基于准备的数据,在数据配置文件中修改对应路径,例如`configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 80
TrainDataset:
  !COCODataSet
    image_dir: train2017 # 训练集的图片所在文件夹,相对于dataset_dir的路径
    anno_path: annotations/instances_train2017.json # 训练集的标注文件,相对于dataset_dir的路径
    dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    image_dir: val2017 # 验证集的图片所在文件夹,相对于dataset_dir的路径
    anno_path: annotations/instances_val2017.json # 验证集的标注文件,相对于dataset_dir的路径
    dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # 标注文件路径,相对于dataset_dir的路径
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # 数据集所在路径,相对于PaddleDetection路径
```
配置修改完成后,即可以启动训练评估,命令如下
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
### 2. 加载COCO模型作为预训练
目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重加载到检测算法的骨干网络中实际使用时建议加载COCO数据集训练好的权重通常能够对模型精度有较大提升使用方法如下
#### 1) 设置预训练权重路径
COCO数据集训练好的模型权重均在各算法配置文件夹下例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) 修改超参数
加载COCO预训练权重后需要修改学习率超参数例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
```
epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数
LearningRate:
  base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率
  schedulers:
    - !CosineDecay
      max_epochs: 144 # 依据epoch数进行修改,一般为epoch数的1.2倍
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## 修改类别
当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
```
metric: COCO
num_classes: 10 # 原始类别80
```
配置修改完成后同样可以加载COCO预训练权重PaddleDetection支持自动加载shape匹配的权重对于shape不匹配的权重会自动忽略因此无需其他修改。

View File

@@ -0,0 +1,89 @@
[简体中文](./detection.md) | English
# Customize Object Detection task
In the practical application of object detection algorithms in a specific industry, additional training is often required for practical use. The project iteration will also need to modify categories. This document details how to use PaddleDetection for a customized object detection algorithm. The process includes data preparation, model optimization roadmap, and modifying the category development process.
## Data Preparation
Customization starts with the preparation of the dataset. We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then Labelme, LabelImg and other labeling tools will be used to label the object detection bounding boxes, and the labeling results will be converted into COCO or VOC data format. Details please refer to [Data Preparation](../../tutorials/data/PrepareDetDataSet_en.md)
## Model Optimization
### 1. Use customized dataset for training
Modify the corresponding path in the data configuration file based on the prepared data, for example `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 80
TrainDataset:
  !COCODataSet
    image_dir: train2017 # Path to the images of the training set relative to dataset_dir
    anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to dataset_dir
    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    image_dir: val2017 # Path to the images of the evaluation set relative to dataset_dir
    anno_path: annotations/instances_val2017.json # Path to the annotation file of the evaluation set relative to dataset_dir
    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # Path to the annotation file relative to dataset_dir
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # Path to the dataset relative to the PaddleDetection path
```
Once the configuration changes are completed, the training evaluation can be started with the following command
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
### 2. Load the COCO model as pre-training
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are provided in the configuration folder of each algorithm; for example, the PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). The configuration file sets `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example
In `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epochs; after loading COCO weights the number of iterations can be reduced appropriately
LearningRate:
  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights the learning rate should be reduced
  schedulers:
    - !CosineDecay
      max_epochs: 144 # Modify based on the number of epochs
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # the original number of classes is 80
```
After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed.

View File

@@ -0,0 +1,261 @@
简体中文 | [English](./keypoint_detection_en.md)
# 关键点检测任务二次开发
在实际场景中应用关键点检测算法不可避免地会出现需要二次开发的需求。包括对目前的预训练模型效果不满意希望优化模型效果或是目前的关键点点位定义不能满足实际场景需求希望新增或是替换关键点点位的定义训练新的关键点模型。本文档将介绍如何在PaddleDetection中对关键点检测算法进行二次开发。
## 数据准备
### 基本流程说明
在PaddleDetection中,目前支持的标注数据格式为`COCO`和`MPII`。这两个数据格式的详细说明,可以参考文档[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)。在这一步中,通过使用Labelme等标注工具,依照特征点序号标注对应坐标,并转化成对应可训练的标注格式。建议使用`COCO`格式进行。
### 合并数据集
为了扩展使用的训练数据,合并多个不同的数据集一起训练是一个很直观的解决手段,但不同的数据集往往对关键点的定义并不一致。合并数据集的第一步是需要统一不同数据集的点位定义,确定标杆点位,即最终模型学习的特征点类型,然后根据各个数据集的点位定义与标杆点位定义之间的关系进行调整。
- 在标杆点位中的点:调整点位序号,使其与标杆点位一致
- 未在标杆点位中的点:舍去
- 数据集缺少标杆点位中的点:对应将标注的标志位记为“未标注”
在[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)中,提供了如何合并`COCO`数据集和`AI Challenger`数据集,并统一为以`COCO`为标杆点位定义的案例说明,供参考。
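上述三条调整规则可用如下示意代码表达(映射表与点数均为假设):
```python
import numpy as np

# 假设的映射表:SRC_TO_BENCHMARK[i]为源数据集第i个点在标杆点位中的序号,None表示舍去
SRC_TO_BENCHMARK = [0, None, 2, 1, 3]
NUM_BENCHMARK_JOINTS = 5

def remap_keypoints(src_kpts):
    """src_kpts: (N, 3)数组,每行为(x, y, v)。"""
    # 标杆点位中缺失的点默认为(0, 0, 0),即标志位记为“未标注”
    dst = np.zeros((NUM_BENCHMARK_JOINTS, 3), dtype=np.float32)
    for i, j in enumerate(SRC_TO_BENCHMARK):
        if j is not None:  # 未在标杆点位中的点直接舍去
            dst[j] = src_kpts[i]
    return dst
```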
## 模型优化
### 检测-跟踪模型优化
在PaddleDetection中,关键点检测能力支持Top-Down、Bottom-Up两套方案。Top-Down先检测主体,再检测局部关键点,优点是精度较高,缺点是速度会随着检测对象个数的增加而变慢;Bottom-Up先检测关键点,再组合到对应的部位上,优点是速度快,与检测对象个数无关,缺点是精度较低。关于两种方案的详情及对应模型,可参考[关键点检测系列模型](../../../configs/keypoint/README.md)
当使用Top-Down方案时模型效果依赖于前序的检测和跟踪效果如果实际场景中不能准确检测到行人位置会使关键点检测部分表现受限。如果在实际使用中遇到了上述问题请参考[目标检测任务二次开发](./detection.md)以及[多目标跟踪任务二次开发](./pphuman_mot.md)对检测/跟踪模型进行优化。
### 使用符合场景的数据迭代
目前发布的关键点检测算法模型主要在`COCO`/ `AI Challenger`等开源数据集上迭代,这部分数据集中可能缺少与实际任务较为相似的监控场景(视角、光照等因素)、体育场景(存在较多非常规的姿态)。使用更符合实际任务场景的数据进行训练,有助于提升模型效果。
### 使用预训练模型迭代
关键点模型的数据的标注复杂度较大,直接使用模型从零开始在业务数据集上训练,效果往往难以满足需求。在实际工程中使用时,建议加载已经训练好的权重,通常能够对模型精度有较大提升,以`HRNet`为例,使用方法如下:
```bash
python tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
```
在加载预训练模型后,可以适当减小初始学习率和最终迭代轮数, 建议初始学习率取默认配置值的1/2至1/5并可开启`--eval`观察迭代过程中AP值的变化。
### 遮挡数据增强
关键点任务中有较多遮挡问题,包括自身遮挡与不同目标之间的遮挡。
1. 检测模型优化仅针对Top-Down方案
参考[目标检测任务二次开发](./detection.md),提升检测模型在复杂场景下的效果。
2. 关键点数据增强
在关键点模型训练中增加遮挡的数据增强,参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/tinypose_256x192.yml#L100)。有助于模型提升这类场景下的表现。
### 对视频预测进行平滑处理
关键点模型是在图片级别的基础上进行训练和预测的,对于视频类型的输入也是将视频拆分为帧进行预测。帧与帧之间虽然内容大多相似,但微小的差异仍然可能导致模型的输出发生较大的变化,表现为虽然预测的坐标大体正确,但视觉效果上有较大的抖动问题。通过添加滤波平滑处理,将每一帧预测的结果与历史结果综合考虑,得到最终的输出结果,可以有效提升视频上的表现。该部分内容可参考[滤波平滑处理](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206)。
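平滑的基本思路可用如下指数滑动平均示意(实际部署代码采用上方链接中的滤波实现,此处仅说明原理):
```python
import numpy as np

class KeypointEMASmoother:
    """对逐帧关键点坐标做指数滑动平均的最简示意。"""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # alpha越小越平滑,但滞后越明显
        self.prev = None

    def smooth(self, kpts):
        # kpts: (K, 2)当前帧关键点坐标
        kpts = np.asarray(kpts, dtype=np.float32)
        if self.prev is None:
            self.prev = kpts
        else:
            # 当前帧结果与历史结果加权,抑制帧间抖动
            self.prev = self.alpha * kpts + (1 - self.alpha) * self.prev
        return self.prev
```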
## 新增或修改关键点点位定义
### 数据准备
根据前述说明,完成数据的准备,放置于`{root of PaddleDetection}/dataset`下。
<details>
<summary><b> 标注文件示例</b></summary>
数据集的组织及标注文件示例如下:
```
self_dataset/
├── train_coco_joint.json # 训练集标注文件
├── val_coco_joint.json # 验证集标注文件
├── images/ # 存放图片文件
   ├── 0.jpg
   ├── 1.jpg
   ├── 2.jpg
```
其中标注文件中需要注意的改动如下:
```json
{
"images": [
{
"file_name": "images/0.jpg",
"id": 0, # 图片id注意不可重复
"height": 1080,
"width": 1920
},
{
"file_name": "images/1.jpg",
"id": 1,
"height": 1080,
"width": 1920
},
{
"file_name": "images/2.jpg",
"id": 2,
"height": 1080,
"width": 1920
},
...
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [ # 点位序号的名称
"point1",
"point2",
"point3",
"point4",
"point5",
],
"skeleton": [ # 点位构成的骨骼, 训练中非必要
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
]
]
...
"annotations": [
{
{
"category_id": 1, # 实例所属类别
"num_keypoints": 3, # 该实例已标注点数量
"bbox": [ # 检测框位置,格式为x, y, w, h
799,
575,
55,
185
],
# N*3 的列表内容为x, y, v。
"keypoints": [
807.5899658203125,
597.5455322265625,
2,
0,
0,
0, # 未标注的点记为000
805.8563232421875,
592.3446655273438,
2,
816.258056640625,
594.0783081054688,
2,
0,
0,
0
]
"id": 1, # 实例id不可重复
"image_id": 8, # 实例所在图像的id可重复。此时代表一张图像上存在多个目标
"iscrowd": 0, # 是否遮挡为0时参与训练
"area": 10175 # 实例所占面积可简单取为w * h。注意为0时会跳过过小时在eval时会被忽略
...
```
</details>
### 配置文件设置
在配置文件中,完整的含义参考[config yaml配置项说明](../../tutorials/KeyPointConfigGuide_cn.md)。以[HRNet模型配置](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml)为例,重点需要关注的内容如下:
<details>
<summary><b> 配置文件示例</b></summary>
一个配置文件的示例如下
```yaml
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 5 # 预测的点数与定义点数量一致
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4]] # 注意只有含义上镜像对称的点才写到这里
...
# 保证dataset_dir + anno_path 能正确定位到标注文件位置
# 保证dataset_dir + image_dir + 标注文件中的图片路径能正确定位到图片
TrainDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: train_coco_joint.json
    dataset_dir: dataset/self_dataset
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True
EvalDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: val_coco_joint.json
    dataset_dir: dataset/self_dataset
    bbox_file: bbox.json
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True
    image_thre: 0.0
```
</details>
### 模型训练及评估
#### 模型训练
通过如下命令启动训练:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
#### 模型评估
训练好模型之后,可以通过以下命令实现对模型指标的评估:
```bash
python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
注意:由于测试依赖pycocotools工具,其默认为`COCO`数据集的17点,如果修改后的模型并非预测17点,直接使用评估命令会报错。
需要修改以下内容以获得正确的评估结果:
- [sigma列表](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/keypoint_utils.py#L219):表示每个关键点的范围方差,越大则容忍度越高,其长度与预测点数一致。根据实际关键点可信区域设置:区域精确的一般为0.25-0.5(例如眼睛),区域范围大的一般为0.5-1.0(例如肩膀),若不确定建议取0.75。设置示意见下方代码。
- [pycocotools sigma列表](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523):含义及内容同上,取值与sigma列表保持一致。
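以上文5个自定义点位为例,一个可能的sigma设置示意如下(数值均为假设,需结合各点实际可信区域调整):
```python
import numpy as np

# 假设前两个点区域精确(如眼睛),后三个点区域较大(如肩膀)
# 参考pycocotools的用法,列表取值在参与OKS计算前除以10
sigmas = np.array([0.25, 0.25, 0.75, 0.75, 0.75]) / 10.0
```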
### 模型导出及预测
#### Top-Down模型联合部署
```shell
#导出关键点模型
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
#detector 检测 + keypoint top-down模型联合部署联合推理只支持top-down方式
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
```
- 注意目前PP-Human中使用的为该方案
#### Bottom-Up模型独立部署
```shell
#导出模型
python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
#部署推理
python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
```

View File

@@ -0,0 +1,258 @@
[简体中文](./keypoint_detection.md) | English
# Customized Keypoint Detection
When applying keypoint detection algorithms in real practice, inevitably, we may need customization as we may dissatisfy with the current pre-trained model results, or the current keypoint detection cannot meet the actual demand, or we may want to add or replace the definition of keypoints and train a new keypoint detection model. This document will introduce how to customize the keypoint detection algorithm in PaddleDetection.
## Data Preparation
### Basic Process Description
PaddleDetection currently supports `COCO` and `MPII` annotation data formats. For detailed descriptions of these two data formats, please refer to the document [Keypoint Data Preparation](../../tutorials/data/PrepareKeypointDataSet.md). In this step, annotation tools such as Labelme are used to annotate the coordinates of the feature points according to their serial numbers, and the results are then converted into the corresponding trainable annotation format. We recommend the `COCO` format.
### Merging datasets
To extend the training data, we can merge several different datasets together. But different datasets often have different definitions of key points. Therefore, the first step in merging datasets is to unify the point definitions of different datasets, and determine the benchmark points, i.e., the types of feature points finally learned by the model, and then adjust them according to the relationship between the point definitions of each dataset and the benchmark point definitions.
- Points in the benchmark point location: adjust the point number to make it consistent with the benchmark point location
- Points that are not in the benchmark points: discard
- Points in the dataset that are missing from the benchmark: annotate the marked points as "unannotated".
In [Keypoint data preparation](../../tutorials/data/PrepareKeypointDataSet.md), we provide a case illustrating how to merge the `COCO` dataset and the `AI Challenger` dataset and unify them under the `COCO` benchmark point definition, for your reference.
## Model Optimization
### Detection and tracking model optimization
In PaddleDetection, keypoint detection supports Top-Down and Bottom-Up solutions. Top-Down first detects the main body and then detects the local keypoints; it has higher accuracy, but takes longer as the number of detected objects increases. The Bottom-Up solution first detects the keypoints and then combines them into the corresponding bodies; it is fast and its speed is independent of the number of detected objects, but its accuracy is relatively low. For details of the two solutions and the corresponding models, please refer to [Keypoint Detection Series Models](../../../configs/keypoint/README.md)
When using the Top-Down solution, the model's effects depend on the previous detection or tracking effect. If the pedestrian position cannot be accurately detected in the actual practice, the performance of the keypoint detection will be limited. If you encounter the above problem in actual application, please refer to [Customized Object Detection](./detection_en.md) and [Customized Multi-target tracking](./pphuman_mot_en.md) for optimization of the detection and tracking model.
### Iterate with scenario-compatible data
The currently released keypoint detection algorithm models are mainly iterated on open source datasets such as `COCO`/ `AI Challenger`, which may lack surveillance scenarios (angles, lighting and other factors), sports scenarios (more unconventional poses) that are more similar to the actual task. Training with data that more closely matches the actual task scenario can help improve the model's results.
### Iteration via pre-trained models
The data annotation of the keypoint model is complex, and using the model directly to train on the business dataset from scratch is often difficult to meet the demand. When used in practical projects, it is recommended to load the pre-trained weights, which usually improve the model accuracy significantly. Let's take `HRNet` as an example with the following method:
```
python tools/train.py \
-c configs/keypoint/hrnet/hrnet_w32_256x192.yml \
-o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
```
After loading the pre-trained model, the initial learning rate and the rounds of iterations can be reduced appropriately. It is recommended that the initial learning rate be 1/2 to 1/5 of the default configuration, and you can enable`--eval` to observe the change of AP values during the iterations.
### Data augmentation with occlusion
Occlusion is common in keypoint tasks, including self-occlusion and occlusion between different objects.
1. Detection model optimization (only for Top-Down solutions)
Refer to [Customized Object Detection](./detection_en.md) to improve the detection model's performance in complex scenarios.
2. Keypoint data augmentation
Add occlusion-related data augmentation in keypoint model training, which helps the model improve in such scenarios; please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/)
### Smooth video prediction
The keypoint model is trained and predicted on the basis of image, and video input is also predicted by splitting the video into frames. Although the content is mostly similar between frames, small differences may still lead to large changes in the output of the model. As a result of that, although the predicted coordinates are roughly correct, there may be jitters in the visual effect.
By adding a smoothing filter process, the performance of the video output can be effectively improved by combining the predicted results of each frame and the historical results. For this part, please see [Filter Smoothing](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206).
## Add or modify keypoint definition
### Data Preparation
Complete the data preparation according to the previous instructions and place it under `{root of PaddleDetection}/dataset`.
<details>
<summary><b> Examples of annotation file</b></summary>
```
self_dataset/
├── train_coco_joint.json # training set annotation file
├── val_coco_joint.json # Validation set annotation file
├── images/ # Store the image files
   ├── 0.jpg
   ├── 1.jpg
   ├── 2.jpg
```
Notable changes as follows:
```
{
"images": [
{
"file_name": "images/0.jpg",
"id": 0, # image id, id cannotdo not repeat
"height": 1080,
"width": 1920
},
{
"file_name": "images/1.jpg",
"id": 1,
"height": 1080,
"width": 1920
},
{
"file_name": "images/2.jpg",
"id": 2,
"height": 1080,
"width": 1920
},
...
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [ # the name of the point serial number
"point1",
"point2",
"point3",
"point4",
"point5",
],
"skeleton": [ # Skeleton composed of points, not necessary for training
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
]
]
...
"annotations": [
{
{
"category_id": 1, # The category to which the instance belongs
"num_keypoints": 3, # the number of marked points of the instance
"bbox": [ # location of detection box,format is x, y, w, h
799,
575,
55,
185
],
# N*3 list of x, y, v.
"keypoints": [
807.5899658203125,
597.5455322265625,
2,
0,
0,
0, # unlabeled points noted as 0, 0, 0
805.8563232421875,
592.3446655273438,
2,
816.258056640625,
594.0783081054688,
2,
0,
0,
0
]
"id": 1, # the id of the instance, id cannot repeat
"image_id": 8, # The id of the image where the instance is located, repeatable. This represents the presence of multiple objects on a single image
"iscrowd": 0, # covered or not, when the value is 0, it will participate in training
"area": 10175 # the area occupied by the instance, can be simply taken as w * h. Note that when the value is 0, it will be skipped, and if it is too small, it will be ignored in eval
...
```
### Settings of configuration file
In the configuration file, refer to [config yaml configuration](../../tutorials/KeyPointConfigGuide_cn.md) for the full description of each field. Take the [HRNet model configuration](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml) as an example; the contents we need to focus on are as follows:
<details>
<summary><b> Example of configuration</b></summary>
```
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 5 # The number of predicted points matches the number of defined points
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4]] # Note that only points that are mirror-symmetric in meaning are recorded here
...
# Ensure that dataset_dir + anno_path can correctly locate the annotation file
# Ensure that dataset_dir + image_dir + image path in the annotation file can correctly locate the image
TrainDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: train_coco_joint.json
    dataset_dir: dataset/self_dataset
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True
EvalDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: val_coco_joint.json
    dataset_dir: dataset/self_dataset
    bbox_file: bbox.json
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True
    image_thre: 0.0
```
### Model Training and Evaluation
#### Model Training
Run the following command to start training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
#### Model Evaluation
After training the model, you can evaluate the model metrics by running the following commands:
```
python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
### Model Export and Inference
#### Top-Down model deployment
```
#Export keypoint model
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
#detector detection + keypoint top-down model co-deployment (for top-down solutions only)
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
```

View File

@@ -0,0 +1,295 @@
简体中文 | [English](./pphuman_attribute_en.md)
# 行人属性识别任务二次开发
## 数据准备
### 数据格式
格式采用PA100K的属性标注格式共有26位属性。
这26位属性的名称、位置、种类数量见下表。
| Attribute | index | length |
|:----------|:----------|:----------|
| 'Hat','Glasses' | [0, 1] | 2 |
| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
| 'boots' | [14, ] | 1 |
| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
| 'Female' | [22, ] | 1 |
| 'Front','Side','Back' | [23, 24, 25] | 3 |
举例:
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
第一组,位置[0, 1]数值分别是[0, 1],表示'no hat'、'has glasses'。
第二组,位置[22, ]数值分别是[0, ], 表示gender属性是'male', 否则是'female'。
第三组,位置[23, 24, 25]数值分别是[0, 1, 0], 表示方向属性是侧面'side'。
其他组依次类推
### 数据标注
理解了上面`属性标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一组长度为26的标注项,分别与26个位置的属性值对应。
举例:
对于一张原始图片,
1) 使用检测框,标注图片中每一个人的位置。
2) 每一个检测框(对应每一个人)包含一组26位的属性值数组,数组的每一位以0或1表示,对应上述26个属性。例如,如果图片是'Female',则数组第22位为1;如果满足'Age18-60',则位置[19, 20, 21]对应的数值是[0, 1, 0];或者满足'AgeOver60',则相应数值为[1, 0, 0]。
标注完成后,利用检测框将每一个人截取成单人图,其图片与26位属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。
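构建一张单人图的26位标注可参考如下示意(属性取值均为假设):
```python
# 示意:为一张单人图构建26位属性标注(取值均为示例)
label = [0] * 26
label[1] = 1              # 第1位:'Glasses',该人戴眼镜
label[19:22] = [0, 1, 0]  # 第[19, 20, 21]位:年龄属于'Age18-60'
label[22] = 1             # 第22位:性别为'Female'
label[23:26] = [0, 1, 0]  # 第[23, 24, 25]位:方向为'Side'
```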
## 模型训练
数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。
其主要有两步工作需要完成1将数据与标注数据整理成训练格式。2修改配置文件开始训练。
### 训练数据格式
训练数据包括训练使用的图片和一个训练列表train.txt,其具体位置在训练配置中指定,其放置方式示例如下:
```
Attribute/
|-- data 训练图片文件夹
| |-- 00001.jpg
| |-- 00002.jpg
| `-- 0000x.jpg
`-- train.txt 训练数据列表
```
train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 26个标注值。
其每一行表示一个人的图片和标注结果。其格式为:
```
00001.jpg 0,0,1,0,....
```
注意1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。
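生成train.txt的一个简单示意如下(文件名与标注值均为示例,注意两种分隔符):
```python
# 示意:将图片名与26位标注写入train.txt
samples = [
    ("00001.jpg", [0, 0, 1, 0] + [0] * 22),  # 示例数据
    ("00002.jpg", [1, 0, 0, 1] + [0] * 22),
]
with open("train.txt", "w") as f:
    for name, label in samples:
        # 图片名与标注值之间用Tab分隔,标注值之间用逗号分隔
        f.write("{}\t{}\n".format(name, ",".join(str(v) for v in label)))
```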
### 修改配置开始训练
首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
需要在配置文件`PaddleClas/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`中,修改的配置项如下:
```
DataLoader:
  Train:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" #指定训练图片所在根路径
      cls_label_path: "dataset/pa100k/train_list.txt" #指定训练列表文件位置
      label_ratio: True
      transform_ops:
  Eval:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" #指定评估图片所在根路径
      cls_label_path: "dataset/pa100k/val_list.txt" #指定评估列表文件位置
      label_ratio: True
      transform_ops:
```
注意:
1. 这里image_root路径+train.txt中图片相对路径对应图片的完整路径位置。
2. 如果有修改属性数量,则还需修改内容配置项中属性种类数量:
```
# model architecture
Arch:
  name: "PPLCNet_x1_0"
  pretrained: True
  use_ssld: True
  class_num: 26 #属性种类数量
```
然后运行以下命令开始训练。
```
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
#单卡训练
python3 tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
训练完成后可以执行以下命令进行性能评估:
```
#多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#单卡评估
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
导出模型后,需要下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml)文件,并放置到导出的模型文件夹`PPLCNet_x1_0_person_attribute_infer`中。
使用时在PP-Human中的配置文件`./deploy/pipeline/config/infer_cfg_pphuman.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`
```
ATTR:
  model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #新导出的模型路径位置
  enable: True #开启功能
```
然后即可使用。至此,即完成新增属性类别识别任务。
## 属性增减
上述是以26个属性为例的标注、训练过程。
如果需要增加、减少属性数量,则需要:
1)标注时需增加新属性类别信息或删减属性类别信息;
2)对应修改训练中train.txt所使用的属性数量和名称
3)修改训练配置,例如`PaddleClas/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`文件中的属性数量,详细见上述`修改配置开始训练`部分。
增加属性示例:
1. 在标注数据时在26位后继续增加新的属性标注数值
2. 在train.txt文件的标注数值中也增加新的属性数值。
3. 注意属性类型在train.txt中属性数值列表中的位置的对应关系需要是固定的,例如第[19, 20, 21]位表示年龄,则所有图片都要使用第[19, 20, 21]位表示年龄,不再赘述。
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
删减属性同理。
例如,如果不需要年龄属性,则位置[19, 20, 21]的数值可以去掉。只需在train.txt中标注的26个数字中全部删除第19-21位数值即可同时标注数据时也不再需要标注这3位属性值。
## 修改后处理代码
修改了属性定义后pipeline后处理部分也需要做相应修改主要影响结果可视化时的显示结果。
相应代码在路径`deploy/pipeline/pphuman/attr_infer.py`文件中`postprocess`函数。
其函数实现说明如下:
```
# 函数入口
def postprocess(self, inputs, result):
    # postprocess output of predictor
    im_results = result['output']
    # 1) 定义各组属性实际意义,其数量及位置与输出结果中占用位数一一对应。
    labels = self.pred_config.labels
    age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
    direct_list = ['Front', 'Side', 'Back']
    bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
    upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
    lower_list = [
        'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
        'Skirt&Dress'
    ]
    # 2) 部分属性所用阈值与通用值有明显区别,单独设置
    glasses_threshold = 0.3
    hold_threshold = 0.6
    batch_res = []
    for res in im_results:
        res = res.tolist()
        label_res = []
        # gender
        # 3) 单个位置属性类别,判断该位置是否大于阈值,来分配二分类结果
        gender = 'Female' if res[22] > self.threshold else 'Male'
        label_res.append(gender)
        # age
        # 4) 多个位置属性类别,N选一形式,选择得分最高的属性
        age = age_list[np.argmax(res[19:22])]
        label_res.append(age)
        # direction
        direction = direct_list[np.argmax(res[23:])]
        label_res.append(direction)
        # glasses
        glasses = 'Glasses: '
        if res[1] > glasses_threshold:
            glasses += 'True'
        else:
            glasses += 'False'
        label_res.append(glasses)
        # hat
        hat = 'Hat: '
        if res[0] > self.threshold:
            hat += 'True'
        else:
            hat += 'False'
        label_res.append(hat)
        # hold obj
        hold_obj = 'HoldObjectsInFront: '
        if res[18] > hold_threshold:
            hold_obj += 'True'
        else:
            hold_obj += 'False'
        label_res.append(hold_obj)
        # bag
        bag = bag_list[np.argmax(res[15:18])]
        bag_score = res[15 + np.argmax(res[15:18])]
        bag_label = bag if bag_score > self.threshold else 'No bag'
        label_res.append(bag_label)
        # upper
        # 5) 同一类属性分为两组(这里是款式和花色),每小组内单独选择,相当于两组不同属性。
        upper_label = 'Upper:'
        sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve'
        upper_label += ' {}'.format(sleeve)
        upper_res = res[4:8]
        if np.max(upper_res) > self.threshold:
            upper_label += ' {}'.format(upper_list[np.argmax(upper_res)])
        label_res.append(upper_label)
        # lower
        lower_res = res[8:14]
        lower_label = 'Lower: '
        has_lower = False
        for i, l in enumerate(lower_res):
            if l > self.threshold:
                lower_label += ' {}'.format(lower_list[i])
                has_lower = True
        if not has_lower:
            lower_label += ' {}'.format(lower_list[np.argmax(lower_res)])
        label_res.append(lower_label)
        # shoe
        shoe = 'Boots' if res[14] > self.threshold else 'No boots'
        label_res.append(shoe)
        batch_res.append(label_res)
    result = {'output': batch_res}
    return result
```

View File

@@ -0,0 +1,223 @@
[简体中文](pphuman_attribute.md) | English
# Customized Pedestrian Attribute Recognition
## Data Preparation
### Data format
We use the PA100K attribute annotation format, with a total of 26 attributes.
The names, locations, and the number of these 26 attributes are shown in the table below.
| Attribute | index | length |
|:------------------------------------------------------------------------------- |:---------------------- |:------ |
| 'Hat','Glasses' | [0, 1] | 2 |
| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
| 'boots' | [14, ] | 1 |
| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
| 'Female' | [22, ] | 1 |
| 'Front','Side','Back' | [23, 24, 25] | 3 |
Examples:
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
The first group: position [0, 1] values are [0, 1], which means 'no hat', 'has glasses'.
The second group: position [22, ] value is [0, ], indicating that the gender attribute is 'male'; otherwise it is 'female'.
The third group: position [23, 24, 25] values are [0, 1, 0], indicating that the direction attribute is 'side'.
Other groups follow in this order
### Data Annotation
After knowing the purpose of the above `attribute annotation` format, we can start to annotate data. The essence is that each single-person image creates a set of 26 annotation items, corresponding to the attribute values at 26 positions.
Examples:
For an original image:
1) Using bounding boxes to annotate the position of each person in the picture.
2) Each detection box (corresponding to each person) contains 26 attribute values which are represented by 0 or 1, corresponding to the above 26 attributes. For example, if the person is 'Female', then the value at position [22] of the array is 1; if the person is between 'Age18-60', then the values at positions [19, 20, 21] are [0, 1, 0]; if the person matches 'AgeOver60', then the corresponding values are [1, 0, 0].
After the annotation is completed, use the detection boxes to crop each person into a single-person image; each image then corresponds to its 26-bit attribute annotation. It is also possible to crop the single-person images first and then annotate them; the result is the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps: 1) Organize the data and annotated data into the training format. 2) Modify the configuration file to start training.
### Training data format
The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example:
```
Attribute/
|-- data           Training images folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt      List of training data
```
train.txt file contains the names of all training images (file path relative to the root path) + 26 annotation values
Each line of it represents a person's image and annotation result. The format is as follows:
```
00001.jpg 0,0,1,0,....
```
Note: 1) the image and its annotated values are separated by a Tab [\t]; 2) the annotated values are separated by commas [,]. If the format is wrong, the parsing will fail.
### Modify the configuration to start training
First run the following command to download the training code (for more environmental issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to modify the following configuration in the configuration file `PaddleClas/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`:
```
DataLoader:
  Train:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" #Specify the root path of the training images
      cls_label_path: "dataset/pa100k/train_list.txt" #Specify the location of the training list file
      label_ratio: True
      transform_ops:
  Eval:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" #Specify the root path of the evaluation images
      cls_label_path: "dataset/pa100k/val_list.txt" #Specify the location of the evaluation list file
      label_ratio: True
      transform_ops:
```
Note:
1. The image_root path combined with the relative image path in train.txt should form the full path of the image.
2. If you modify the number of attributes, the number of attribute types in the content configuration item should also be modified accordingly.
```
# model architecture
Arch:
  name: "PPLCNet_x1_0"
  pretrained: True
  use_ssld: True
  class_num: 26 #Attribute classes and numbers
```
Then run the following command to start training:
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
You can run the following commands for performance evaluation after the training is completed:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
After exporting the model, you need to download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml) file and put it into the exported model folder `PPLCNet_x1_0_person_attribute_infer`.
When using the model, modify the new model path in the `model_dir` entry and set `enable: True` in the PP-Human configuration file `./deploy/pipeline/config/infer_cfg_pphuman.yml`:
```
ATTR:
  model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #The exported model location
  enable: True #Whether to enable the function
```
Now the model is ready to use. At this point, a new attribute category recognition task is completed.
## Adding or deleting attributes
The above is the annotation and training process with 26 attributes.
If the attributes need to be added or deleted, you need to
1) New attribute category information needs to be added or deleted when annotating the data.
2) Modify the number and name of attributes used in train.txt corresponding to the training.
3) Modify the training configuration, for example, the number of attributes in the `PaddleClas/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml` file; for details, please see the `Modify the configuration to start training` section above.
Example of adding attributes.
1. Continue to add new attribute annotation values after 26 values when annotating the data.
2. Add new attribute values to the annotated values in the train.txt file as well.
3. Note that the correlation of attribute types and values in train.txt needs to be fixed, for example, the [19, 20, 21] position indicates age, and all images should use the [19, 20, 21] position to indicate age.
The same applies to the deletion of attributes.
For example, if the age attribute is not needed, the values in positions [19, 20, 21] can be removed. You can simply remove all the values in positions 19-21 from the 26 numbers marked in train.txt, and you no longer need to annotate these 3 attribute values.

View File

@@ -0,0 +1,63 @@
简体中文 | [English](./pphuman_mot_en.md)
# 多目标跟踪任务二次开发
在产业落地过程中应用多目标跟踪算法不可避免地会出现希望自定义类型的多目标跟踪的需求或是对已有多目标跟踪模型的优化以提升在特定场景下模型的效果。我们在本文档通过案例来介绍如何根据期望识别的行为来进行多目标跟踪方案的选择以及使用PaddleDetection进行多目标跟踪算法二次开发工作包括数据准备、模型优化思路和跟踪类别修改的开发流程。
## 数据准备
多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf),其中使用PP-YOLOE替换原文的YOLOX作为检测器,使用BYTETracker作为跟踪器,详细文档参考[ByteTrack](../../../configs/mot/bytetrack)。原文的ByteTrack只支持行人单类别,PaddleDetection中也支持多类别同时进行跟踪。训练ByteTrack也就是训练检测器的过程,只需要准备好检测标注即可,不需要ReID标注信息,即当成纯检测来做即可。数据集最好是从连续视频中抽取出来的,而不是无关联的图片集合。
二次开发首先需要进行数据集的准备,针对场景特点采集合适的数据,从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框,并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
## 模型优化
### 1. 使用自定义数据集训练
ByteTrack跟踪方案采用的数据集只需要有检测标注即可。参照[MOT数据集准备](../../../configs/mot)和[MOT数据集教程](docs/tutorials/data/PrepareMOTDataSet.md)。
```
# 单卡训练
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
# 多卡训练
python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
```
更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)和[ByteTrack](../../../configs/mot/bytetrack/detector)
### 2. 加载COCO模型作为预训练
目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重加载到检测算法的骨干网络中实际使用时建议加载COCO数据集训练好的权重通常能够对模型精度有较大提升使用方法如下
#### 1) 设置预训练权重路径
COCO数据集训练好的模型权重均在各算法配置文件夹下例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) 修改超参数
加载COCO预训练权重后需要修改学习率超参数例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
```
epoch: 120 # 原始配置为300epoch,加载COCO权重后可以适当减少迭代轮数
LearningRate:
  base_lr: 0.005 # 原始配置为0.025,加载COCO权重后需要降低学习率
  schedulers:
    - !CosineDecay
      max_epochs: 144 # 依据epoch数进行修改,一般为epoch数的1.2倍
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## 跟踪类别修改
当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
```
metric: COCO
num_classes: 10 # 原始类别1
```
配置修改完成后同样可以加载COCO预训练权重PaddleDetection支持自动加载shape匹配的权重对于shape不匹配的权重会自动忽略因此无需其他修改。

View File

@@ -0,0 +1,65 @@
[简体中文](./pphuman_mot.md) | English
# Customized multi-object tracking task
When applying multi-object tracking algorithms in industrial applications, there will be inevitable demands for customized types of multi-object tracking or optimization of existing multi-object tracking models to improve the effectiveness of the models in specific scenarios. In this document, we present examples of how to choose a multi-object tracking solution based on the expected identified behavior, and how to use PaddleDetection for further development of multi-object tracking algorithms, including data preparation, model optimization ideas, and the development process of tracking category modification.
## Data Preparation
The multi-object tracking solution uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), which adopts PP-YOLOE to replace the original YOLOX as the detector and BYTETracker as the tracker; for details, please refer to [ByteTrack](../../../configs/mot/bytetrack). The original ByteTrack only supports the single pedestrian category, while PaddleDetection also supports tracking multiple categories at the same time. Training ByteTrack is the process of training the detector: only detection annotations need to be prepared, and no ReID annotation information is required, i.e., it can be treated as a pure detection task. The dataset should preferably be extracted from continuous videos rather than a collection of unrelated images.
Customization starts with the preparation of the dataset. We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then Labelme, LabelImg and other labeling tools will be used to label the object detection bounding boxes and convert the labeling results into COCO or VOC data format. Details please refer to [Data Preparation](../../tutorials/data/README.md)
## Model Optimization
### 1. Use customized data set for training
The dataset used by the ByteTrack tracking solution only needs detection annotations. Refer to [MOT dataset preparation](../../../configs/mot) and the [MOT dataset tutorial](docs/tutorials/data/PrepareMOTDataSet.md).
```
# Single card training
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
# Multi-card training
python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
```
More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) and [ByteTrack](../../../configs/mot/bytetrack/detector)
### 2. Load the COCO model as the pre-trained model
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm, for example, PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) The configuration file sets`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example
In `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epochs; after loading COCO weights the number of iterations can be reduced appropriately
LearningRate:
  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights the learning rate should be reduced
  schedulers:
    - !CosineDecay
      max_epochs: 144 # Modified according to the number of epochs, usually 1.2 times the number of epochs
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # the original number of classes is 1
```
After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed.

View File

@@ -0,0 +1,159 @@
简体中文 | [English](./pphuman_mtmct_en.md)
# 跨镜跟踪任务二次开发
## 数据准备
### 数据格式
跨镜跟踪使用行人REID技术实现,其训练方式采用多分类模型训练,使用时取分类softmax头部前的特征作为检索特征向量。
因此其格式与多分类任务相同。每一个行人分配一个专属id,不同行人id不同,同一行人在不同图片中的id相同。
例如,图片0001.jpg、0003.jpg是同一个人,0002.jpg、0004.jpg是不同的其他行人,则标注id为:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
...
```
依次类推。
### 数据标注
理解了上面`标注`格式的含义后,就可以进行数据标注的工作。其本质是:每张单人图建立一个标注项,对应该行人分配的id。
举例:
对于一张原始图片,
1) 使用检测框,标注图片中每一个人的位置。
2) 每一个检测框(对应每一个人)包含一个int类型的id属性。例如,上述举例中0001.jpg中的人对应id: 1。
标注完成后,利用检测框将每一个人截取成单人图,其图片与id属性标注建立对应关系。也可先截成单人图再进行标注,效果相同。
## 模型训练
数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。
其主要有两步工作需要完成1将数据与标注数据整理成训练格式。2修改配置文件开始训练。
### 训练数据格式
训练数据包括训练使用的图片和一个训练列表bounding_box_train.txt,其具体位置在训练配置中指定,其放置方式示例如下:
```
REID/
|-- data 训练图片文件夹
| |-- 00001.jpg
| |-- 00002.jpg
| `-- 0000x.jpg
`-- bounding_box_train.txt 训练数据列表
```
bounding_box_train.txt文件内为所有训练图片名称(相对于根路径的文件路径)+ 1个id标注值。
其每一行表示一个人的图片和id标注结果。其格式为
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
```
注意图片与标注值之间是以Tab[\t]符号隔开。该格式不能错,否则解析失败。
### 修改配置开始训练
首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
需要在配置文件[softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml)中,修改的配置项如下:
```
Head:
  name: "FC"
  embedding_size: *feat_dim
  class_num: &class_num 751 #行人id总数量
DataLoader:
  Train:
    dataset:
      name: "Market1501"
      image_root: "./dataset/" #训练图片根路径
      cls_label_path: "bounding_box_train" #训练文件列表
  Eval:
    Query:
      dataset:
        name: "Market1501"
        image_root: "./dataset/" #评估图片根路径
        cls_label_path: "query" #评估文件列表
```
注意:
1. 这里image_root路径+bounding_box_train.txt中图片相对路径对应图片存放的完整路径。
然后运行以下命令开始训练。
```
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
#单卡训练
python3 tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
```
训练完成后可以执行以下命令进行性能评估:
```
#多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
#单卡评估
python3 tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python3 tools/export_model.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model \
-o Global.save_inference_dir=deploy/models/strong_baseline_inference
```
导出模型后,下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml)文件到新导出的模型文件夹'strong_baseline_inference'中。
使用时在PP-Human中的配置文件infer_cfg_pphuman.yml中修改模型路径`model_dir`并开启功能`enable`
```
REID:
  model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
  enable: True
```
然后可以使用。至此完成模型开发。

View File

@@ -0,0 +1,165 @@
[简体中文](./pphuman_mtmct.md) | English
# Customized Multi-Target Multi-Camera Tracking Module of PP-Human
## Data Preparation
### Data Format
Multi-target multi-camera tracking (MTMCT) is achieved with the pedestrian ReID technique. It is trained as a multi-class classification model, and at inference time the feature before the softmax classification head is used as the retrieval feature vector.
Therefore its format is the same as the multi-classification task. Each pedestrian is assigned an exclusive id, which is different for different pedestrians while the same pedestrian has the same id in different images.
For example, images 0001.jpg and 0003.jpg are the same person, while 0002.jpg and 0004.jpg are different pedestrians. Then the labeled ids are:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
...
```
### Data Annotation
After understanding the meaning of the `annotation` format above, we can work on the data annotation. The essence is that each single-person image gets one annotation item, corresponding to the id assigned to that pedestrian.
For example:
For an original picture
1) Use bounding boxes to annotate the position of each person in the picture.
2) Each bounding box (corresponding to each person) contains an int id attribute. For example, the person in 0001.jpg in the above example corresponds to id: 1.
After the annotation is completed, use the detection boxes to crop each person into a single-person image; each image then corresponds to its id annotation. You can also crop the single-person images first and then annotate them; the result is the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps to implement: 1) organize the data and annotated data into a training format. 2) modify the configuration file to start training.
### Training data format
The training data consists of the images used for training and a training list bounding_box_train.txt, the location of which is specified in the training configuration, with the following example placement.
```
REID/
|-- data                       # training image folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- bounding_box_train.txt     # list of training data
```
The bounding_box_train.txt file contains one line per training image: the image's file path relative to the root path, plus its id annotation value.
Each line represents one person's image and its id annotation result. The format is as follows:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
```
Note: the image path and the annotated id are separated by a Tab (`\t`). The format must be exact; otherwise, parsing will fail.
### Modify the configuration to start training
First, execute the following command to download the training code (for more environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to change the following configuration items in the configuration file [softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml):
```
Head:
name: "FC"
embedding_size: *feat_dim
class_num: &class_num 751 #Total number of pedestrian ids
DataLoader:
Train:
dataset:
name: "Market1501"
image_root: "./dataset/" # training image root path
cls_label_path: "bounding_box_train" # training file list
Eval:
Query:
dataset:
name: "Market1501"
image_root: "./dataset/" # evaluation image root path
cls_label_path: "query" # list of evaluation files
```
Note:
1. Here the image_root path combined with each image's relative path in bounding_box_train.txt corresponds to the full path where the image is stored.
Then run the following command to start the training.
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
```
After the training is completed, you may run the following commands for performance evaluation:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model \
-o Global.save_inference_dir=deploy/models/strong_baseline_inference
```
After exporting the model, download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml) file into the newly exported model folder `strong_baseline_inference`.
Change the model path `model_dir` in the configuration file `infer_cfg_pphuman.yml` in PP-Human and set `enable: True`:
```
REID:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
enable: True
```
Now, the model is ready.


@@ -0,0 +1,257 @@
简体中文 | [English](./ppvehicle_attribute_en.md)
# Secondary Development for Vehicle Attribute Recognition
## Data Preparation
### Data Format
The vehicle attribute model uses the attribute annotations of the VeRi dataset: 10 vehicle colors and 9 vehicle types in total, as follows:
```
# vehicle colors
- "yellow"
- "orange"
- "green"
- "gray"
- "red"
- "blue"
- "white"
- "golden"
- "brown"
- "black"
# vehicle types
- "sedan"
- "suv"
- "van"
- "hatchback"
- "mpv"
- "pickup"
- "bus"
- "truck"
- "estate"
```
A sequence of length 19 is used in the annotation file to represent the above attributes.
Example:
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Within the first 10 positions, index 0 has the value 1, indicating that the vehicle color is `"yellow"`;
within the last 9 positions, overall index 11 has the value 1, indicating that the vehicle type is `"suv"`.
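As a minimal sketch of how such a label vector can be produced — the helper below simply mirrors the ordering of the color/type lists above — one might write:

```python
# Minimal sketch: build the 19-dim annotation vector from a (color, type) pair.
COLORS = ["yellow", "orange", "green", "gray", "red", "blue", "white",
          "golden", "brown", "black"]           # first 10 positions
TYPES = ["sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
         "truck", "estate"]                     # last 9 positions

def make_label(color: str, vtype: str) -> list:
    vec = [0] * 19
    vec[COLORS.index(color)] = 1                # color one-hot in positions 0-9
    vec[10 + TYPES.index(vtype)] = 1            # type one-hot in positions 10-18
    return vec

print(make_label("yellow", "suv"))
# -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```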
### Data Annotation
Once the meaning of the data format above is understood, the data annotation work can begin. In essence, each vehicle image is given a group of 19 annotation values, one per attribute.
Example:
For an original image,
1) Use detection boxes to annotate the position of every vehicle in the image.
2) Each detection box (corresponding to each vehicle) carries an array of 19 attribute values, each 0 or 1, corresponding to the 19 attributes above. For example, if the color is 'orange', the value at array index 1 is 1; if the vehicle type is 'sedan', the value at array index 10 is 1.
After annotation, use the detection boxes to crop each vehicle into a single-vehicle image; the image is then paired with its 19 annotation values. Cropping first and annotating afterwards gives the same result.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
Two main steps are required: 1) organize the images and annotations into the training format; 2) modify the configuration file and start training.
### Training Data Format
The training data consists of the training images and a training list train.txt, whose location is specified in the training configuration. An example layout is as follows:
```
Attribute/
|-- data                 # training image folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt            # list of training data
```
The train.txt file contains one line per training image: the image's file path relative to the root path, plus its 19 annotation values.
Each line represents one vehicle image and its annotation result. The format is as follows:
```
00001.jpg 0,0,1,0,....
```
注意1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。
### Modify the Configuration and Start Training
First, execute the following command to download the training code (for more environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
The following configuration items need to be modified in the [configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml):
```yaml
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of training images
cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file
label_ratio: True
transform_ops:
...
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of evaluation images
cls_label_path: "dataset/VeRi/val_list.txt" # the location of the evaluation list file
label_ratio: True
transform_ops:
...
```
Note:
1. The image_root path combined with each image's relative path in train.txt corresponds to the full path of the image.
2. If you change the number of attributes, you also need to modify the number of attribute classes in the model configuration:
```yaml
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 19 # number of attribute classes
```
Then run the following command to start the training.
```
# multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
# single-GPU training
python3 tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
```
After the training is completed, you can run the following commands for performance evaluation:
```
# multi-GPU evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
# single-GPU evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model
```
After exporting the model, to use it in PP-Vehicle you need to download the [deployment model](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip), unzip it, and place its configuration file `infer_cfg.yml` into the exported model folder `PPLCNet_x1_0_vehicle_attribute_model`.
To use it, modify the model path `model_dir` and set `enable: True` in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
VEHICLE_ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_infer/ # path of the newly exported model
enable: True # enable the function
```
The model can then be used. This completes the task of adding a new attribute category for recognition.
## Adding or Removing Attributes
This process is similar to adding or removing pedestrian attributes. If the number of attributes needs to change, you need to:
1) add or remove the attribute category information during annotation;
2) modify the number and names of the attributes used in train.txt for training accordingly;
3) modify the training configuration, e.g. the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`` file; see the `Modify the Configuration and Start Training` section above for details.
Example of adding an attribute:
1. When annotating, append the new attribute annotation values after the 19 existing values.
2. Also append the new attribute values to the annotation values in the train.txt file.
3. Note that the mapping between each attribute type and its position in the train.txt value list must stay fixed.
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
Removing attributes works in the same way.
## Modifying the Post-Processing Code
After the attribute definition is modified, the post-processing part of the pipeline also needs corresponding changes; this mainly affects how the results are displayed during visualization.
The corresponding code is the `postprocess` function in [this file](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/ppvehicle/vehicle_attr.py#L108).
Its implementation is described below:
```python
# The color/type names are defined in the class's initialization function
self.color_list = [
    "yellow", "orange", "green", "gray", "red", "blue", "white",
    "golden", "brown", "black"
]
self.type_list = [
    "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck",
    "estate"
]
...

def postprocess(self, inputs, result):
    # postprocess output of predictor
    im_results = result['output']
    batch_res = []
    for res in im_results:
        res = res.tolist()
        attr_res = []
        color_res_str = "Color: "
        type_res_str = "Type: "
        color_idx = np.argmax(res[:10])  # the first 10 values are the color scores; the argmax is the color result
        type_idx = np.argmax(res[10:])   # the last 9 values are the type scores; the argmax is the type result
        # the color and type scores must exceed the corresponding thresholds, otherwise the result is 'Unknown'
        if res[color_idx] >= self.color_threshold:
            color_res_str += self.color_list[color_idx]
        else:
            color_res_str += "Unknown"
        attr_res.append(color_res_str)
        if res[type_idx + 10] >= self.type_threshold:
            type_res_str += self.type_list[type_idx]
        else:
            type_res_str += "Unknown"
        attr_res.append(type_res_str)
        batch_res.append(attr_res)
    result = {'output': batch_res}
    return result
```


@@ -0,0 +1,271 @@
[简体中文](ppvehicle_attribute.md) | English
# Customized Vehicle Attribute Recognition
## Data Preparation
### Data Format
We use the VeRi attribute annotation format, with a total of 10 color attributes and 9 model attributes, as follows.
```
# colors
- "yellow"
- "orange"
- "green"
- "gray"
- "red"
- "blue"
- "white"
- "golden"
- "brown"
- "black"
# models
- "sedan"
- "suv"
- "van"
- "hatchback"
- "mpv"
- "pickup"
- "bus"
- "truck"
- "estate"
```
A sequence of length 19 is used in the annotation file to represent the above attributes.
Examples:
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Within the first 10 bits, the bit at index 0 has the value 1, indicating that the vehicle color is `"yellow"`.
Within the last 9 bits, the bit at overall index 11 has the value 1, indicating that the model is `"suv"`.
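Conversely, a minimal sketch for decoding such a vector back into names — the list ordering mirrors the attribute lists above, and the selection logic mirrors the `postprocess` function shown later — could be:

```python
# Minimal sketch: decode a 19-dim annotation vector back to (color, model).
COLORS = ["yellow", "orange", "green", "gray", "red", "blue", "white",
          "golden", "brown", "black"]
MODELS = ["sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
          "truck", "estate"]

def decode_label(vec):
    assert len(vec) == 19
    color = COLORS[vec[:10].index(1)]     # one-hot within the first 10 bits
    model = MODELS[vec[10:].index(1)]     # one-hot within the last 9 bits
    return color, model

print(decode_label([1,0,0,0,0,0,0,0,0,0, 0,1,0,0,0,0,0,0,0]))  # ('yellow', 'suv')
```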
### Data Annotation
After understanding the data format above, we can start to annotate data. In essence, each single-vehicle image gets a set of 19 annotation values, one for each attribute position.
Examples:
For an original image:
1) Using bounding boxes to annotate the position of each vehicle in the picture.
2) Each detection box (corresponding to each vehicle) contains 19 attribute values which are represented by 0 or 1. It corresponds to the above 19 attributes. For example, if the color is 'orange', then the index 1 bit of the array is 1. If the model is 'sedan', then the index 10 bit of the array is 1.
After the annotation is completed, use the detection box to crop each vehicle into a single-vehicle picture; the picture is then paired with its 19 attribute annotations. It is also possible to cut into single-vehicle images first and then annotate them. The results are the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps: 1) Organize the data and annotated data into the training format. 2) Modify the configuration file to start training.
### Training Data Format
The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example:
```
Attribute/
|-- data                 # training images folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt            # list of training data
```
The train.txt file contains one line per training image: the image's path relative to the root path, plus its 19 annotation values.
Each line represents one vehicle's image and its annotation result. The format is as follows:
```
00001.jpg 0,0,1,0,....
```
Note: 1) the image path and the annotated values are separated by a Tab (`\t`); 2) the annotated values are separated by commas (`,`). If the format is wrong, the parsing will fail.
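As a small sanity check — a sketch only, not part of the official tooling — one can parse such a line back into a path and a 19-value vector:

```python
# Minimal sketch: parse one train.txt line of the form "<path>\t<v1,...,v19>".
def parse_line(line: str):
    rel_path, values = line.rstrip("\n").split("\t")   # Tab-separated
    vec = [int(v) for v in values.split(",")]          # comma-separated values
    assert len(vec) == 19, f"expected 19 values, got {len(vec)}"
    return rel_path, vec

path, vec = parse_line("00001.jpg\t1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0")
print(path, sum(vec))  # 00001.jpg 2
```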
### Modify The Configuration To Start Training
First run the following command to download the training code (for more environmental issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to modify the following configuration in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`
```yaml
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of training images
cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file
label_ratio: True
transform_ops:
...
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of evaluation images
cls_label_path: "dataset/VeRi/val_list.txt"  # the location of the evaluation list file
label_ratio: True
transform_ops:
...
```
Note:
1. Here the image_root path combined with each image's relative path in train.txt corresponds to the full path of the image.
2. If you change the number of attributes, the number of attribute classes in the model configuration must also be modified accordingly:
```yaml
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 19 # Number of attribute classes
```
Then run the following command to start training:
```bash
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
```
You can run the following commands for performance evaluation after the training is completed:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model
```
After exporting the model, if you want to use it in PP-Vehicle, you need to download the [deploy infer model](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip), unzip it, and copy `infer_cfg.yml` into the exported model folder `PPLCNet_x1_0_vehicle_attribute_model`.
To use the model, modify the model path `model_dir` entry and set `enable: True` in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
VEHICLE_ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_infer/ #The exported model location
enable: True #Whether to enable the function
```
To this point, a new attribute category recognition task is completed.
## Adding or deleting attributes
This is similar to the process of adding or removing pedestrian attributes.
If attributes need to be added or deleted, you need to:
1) add or remove the attribute category information when annotating the data;
2) modify the number and names of the attributes used in train.txt accordingly;
3) modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`` file; for details, please see the ``Modify The Configuration To Start Training`` section above.
Example of adding attributes:
1. Continue adding the new attribute annotation values after the 19 existing values when annotating the data.
2. Add the new attribute values to the annotated values in the train.txt file as well.
3. Note that the mapping between each attribute type and its position in the train.txt value list must stay fixed.
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
The same applies to the deletion of attributes.
## Modifications to post-processing code
After modifying the attribute definition, the post-processing part of the pipeline also needs to be modified accordingly, which mainly affects the display results when the results are visualized.
The code is at [file](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/ppvehicle/vehicle_attr.py#L108), that is, the `postprocess` function.
The function implementation is described as follows:
```python
# The names of the colors/models are defined in the initialization function of the class
self.color_list = [
    "yellow", "orange", "green", "gray", "red", "blue", "white",
    "golden", "brown", "black"
]
self.type_list = [
    "sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck",
    "estate"
]
...

def postprocess(self, inputs, result):
    # postprocess output of predictor
    im_results = result['output']
    batch_res = []
    for res in im_results:
        res = res.tolist()
        attr_res = []
        color_res_str = "Color: "
        type_res_str = "Type: "
        color_idx = np.argmax(res[:10])  # the first 10 items are the color scores; the item with the largest score is the color result
        type_idx = np.argmax(res[10:])   # the last 9 items are the model scores; the item with the largest score is the model result
        # the color and model scores must be larger than the corresponding thresholds, otherwise the result is 'Unknown'
        if res[color_idx] >= self.color_threshold:
            color_res_str += self.color_list[color_idx]
        else:
            color_res_str += "Unknown"
        attr_res.append(color_res_str)
        if res[type_idx + 10] >= self.type_threshold:
            type_res_str += self.type_list[type_idx]
        else:
            type_res_str += "Unknown"
        attr_res.append(type_res_str)
        batch_res.append(attr_res)
    result = {'output': batch_res}
    return result
```


@@ -0,0 +1,215 @@
简体中文 | [English](./ppvehicle_plate_en.md)
# Secondary Development for License Plate Recognition
The license plate recognition task uses a PP-OCRv3 model fine-tuned on license plate data; the process follows the [PaddleOCR license plate application tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/applications/%E8%BD%BB%E9%87%8F%E7%BA%A7%E8%BD%A6%E7%89%8C%E8%AF%86%E5%88%AB.md) and extends it on the CCPD2019 dataset.
## Data Preparation
1. For the CCPD2019 and CCPD2020 datasets, we provide the processing script [ccpd2ocr_all.py](../../../deploy/pipeline/tools/ccpd2ocr_all.py). Place it in the same directory as the CCPD2019 and CCPD2020 dataset folders and run it to obtain the training annotation files for the detection and recognition models under the CCPD2019/PPOCR and CCPD2020/PPOCR directories. They can be combined for training.
2. For data from other sources or self-annotated data, organize the training list files in the following formats (a short sketch of emitting both kinds of label lines follows):
- **Plate detection annotation**
The annotation file format is as follows, with the fields separated by '\t':
```
" 图像文件路径 标注框标注信息"
CCPD2020/xxx.jpg [{"transcription": "京AD88888", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
The box annotation information is a list containing multiple dictionaries, one per annotated box. In each dictionary, `points` holds the (x, y) coordinates of the four corners of the plate box, arranged clockwise starting from the top-left point, and `transcription` holds the text of the current text box. ***When its content is "###", the text box is invalid and will be skipped during training.***
- **Plate character recognition annotation**
The annotation file format is as follows. In the txt file, the image path and the image label must be separated by '\t' by default; using any other separator will cause training errors. Here each image is a crop of the plate characters.
```
" 图像文件名 字符标注信息 "
CCPD2020/crop_imgs/xxx.jpg 京AD88888
```
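As referenced above, a minimal sketch of emitting one detection label line and one recognition label line in these formats (the paths and plate text reuse the example values) might be:

```python
# Minimal sketch: write one PaddleOCR-style detection label line
# ("<image path>\t<json list of boxes>") and one recognition label line
# ("<crop path>\t<plate text>").
import json

det_boxes = [{"transcription": "京AD88888",
              "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]

with open("det.txt", "w", encoding="utf-8") as f:
    f.write("CCPD2020/xxx.jpg\t" + json.dumps(det_boxes, ensure_ascii=False) + "\n")

with open("rec.txt", "w", encoding="utf-8") as f:
    f.write("CCPD2020/crop_imgs/xxx.jpg\t京AD88888\n")
```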
## Model Training
First, execute the following command to clone the PaddleOCR repository to the training machine:
```
git clone git@github.com:PaddlePaddle/PaddleOCR.git
```
Download the pretrained models:
```
# detection pretrained model:
mkdir models
cd models
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar -xf ch_PP-OCRv3_det_distill_train.tar
# recognition pretrained model:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
tar -xf ch_PP-OCRv3_rec_train.tar
cd ..
```
Install the dependencies:
```
cd PaddleOCR
pip install -r requirements.txt
```
Then modify the training-related configuration.
### Modify the Configuration
**Detection model configuration**
The items to modify fall into the following three parts; they can be overridden on the command line during training or changed directly in the configuration file `configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml`:
1. Model storage and training:
- Global.pretrained_model: points to the PP-OCRv3 text detection pretrained model downloaded above
- Global.eval_batch_step: how many steps between model evaluations; usually set to the number of steps in one epoch, which can be read from the log at the start of training. Here [0, 772] is used as an example; the first number means counting starts from step 0 (a short worked example follows this list).
2. Optimizer:
- Optimizer.lr.name: set the learning rate scheduler to constant (Const)
- Optimizer.lr.learning_rate: for fine-tuning, the learning rate needs to be fairly small; here it is set to 0.05x the value in the configuration file
- Optimizer.lr.warmup_epoch: set warmup_epoch to 0
3. Dataset:
- Train.dataset.data_dir: root directory of the training set images
- Train.dataset.label_file_list: training set annotation file
- Eval.dataset.data_dir: root directory of the test set images
- Eval.dataset.label_file_list: test set annotation file
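For instance, a rough way to estimate the steps-per-epoch value used in Global.eval_batch_step (the dataset size and batch size below are hypothetical example values):

```python
# Minimal sketch: estimate steps per epoch for Global.eval_batch_step.
import math

num_train = 98800        # number of training samples (hypothetical value)
batch_size = 128         # total batch size across GPUs (hypothetical value)
steps_per_epoch = math.ceil(num_train / batch_size)
print(steps_per_epoch)   # 772, matching the example "[0, 772]" above
```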
**Recognition model configuration**
The items to modify fall into the following three parts; they can be overridden on the command line during training or changed directly in the configuration file `configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml`:
1. Model storage and training:
- Global.pretrained_model: points to the PP-OCRv3 text recognition pretrained model
- Global.eval_batch_step: how many steps between model evaluations; usually set to the number of steps in one epoch, which can be read from the log at the start of training. Here [0, 90] is used as an example; the first number means counting starts from step 0.
2. Optimizer:
- Optimizer.lr.name: set the learning rate scheduler to constant (Const)
- Optimizer.lr.learning_rate: for fine-tuning, the learning rate needs to be fairly small; here it is set to 0.05x the value in the configuration file
- Optimizer.lr.warmup_epoch: set warmup_epoch to 0
3. Dataset:
- Train.dataset.data_dir: root directory of the training set images
- Train.dataset.label_file_list: training set annotation file
- Eval.dataset.data_dir: root directory of the test set images
- Eval.dataset.label_file_list: test set annotation file
### Run Training
Then run the following commands to start training. If the changes have already been made in the configuration file, `-o` and everything after it can be omitted.
**Detection model training commands**
```
# single-GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_model_dir=output/CCPD/det \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data/ \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt]
# multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_model_dir=output/CCPD/det \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data/ \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt]
```
After the training is completed, you can run the following command for performance evaluation:
```
# single-GPU evaluation
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/det.txt]
```
**Recognition model training commands**
```
# single-GPU training
python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_model_dir=output/CCPD/rec/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/ccpd_data \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
# multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_model_dir=output/CCPD/rec/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/ccpd_data \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
```
After the training is completed, you can run the following command for performance evaluation:
```
# single-GPU evaluation
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
```
### Model Export
Use the following commands to export the trained models as inference deployment models.
**Detection model export**
```
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Global.save_inference_dir=output/det/infer
```
**Recognition model export**
```
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec/infer
```
To use them, modify the `det_model_dir` and `rec_model_dir` entries of the `VEHICLE_PLATE` module in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`, and set `enable: True`:
```
VEHICLE_PLATE:
det_model_dir: [YOUR_DET_INFERENCE_MODEL_PATH] # detection model path
det_limit_side_len: 736
det_limit_type: "max"
rec_model_dir: [YOUR_REC_INFERENCE_MODEL_PATH] # recognition model path
rec_image_shape: [3, 48, 320]
rec_batch_num: 6
word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt
enable: True # enable the function
```
The models can then be used. This completes the task of updating the license plate recognition model.


@@ -0,0 +1,235 @@
简体中文 | [English](./ppvehicle_violation_en.md)
# Secondary Development for Vehicle Violation Detection
Secondary development of the vehicle violation task mainly focuses on the lane line segmentation model. A PP-LiteSeg model is fine-tuned on the bdd100k lane line dataset; the process follows [PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md).
## Data Preparation
PP-Vehicle violation analysis divides lane lines into 4 classes (a small mask-checking sketch follows the list):
```
0  background
1  double yellow line
2  solid line
3  dashed line
```
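As a quick sanity check — a sketch only; the mask file name is hypothetical — the per-class pixel distribution of a converted label mask can be inspected to verify that it only contains these four classes:

```python
# Minimal sketch: verify a converted label mask only contains the four
# lane classes (0-3) and print per-class pixel counts.
import numpy as np
from PIL import Image

CLASS_NAMES = {0: "background", 1: "double yellow line",
               2: "solid line", 3: "dashed line"}

mask = np.array(Image.open("output_path/label1.png"))   # hypothetical mask file
values, counts = np.unique(mask, return_counts=True)
assert set(values.tolist()) <= set(CLASS_NAMES), f"unexpected labels: {values}"
for v, c in zip(values, counts):
    print(f"{v} ({CLASS_NAMES[v]}): {c} pixels")
```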
1. For the bdd100k dataset, our processing script [lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py) can be combined with the official bdd100k [repo](https://github.com/bdd100k/bdd100k) to convert the data into the format required for segmentation.
```
# first, execute the following command to clone the bdd100k repository
git clone https://github.com/bdd100k/bdd100k.git
# copy lane_to_mask.py into the bdd100k directory
cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/
# set up the bdd100k environment
cd bdd100k && pip install -r requirements.txt
# convert the data
python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path
# -i: path to the bdd100k label json
# -o: output path for the generated mask images
```
2. Organize the data and store it in the following layout:
```
dataset_root
|
|--images
| |--train
| |--image1.jpg
| |--image2.jpg
| |--...
| |--val
| |--image3.jpg
| |--image4.jpg
| |--...
| |--test
| |--image5.jpg
| |--image6.jpg
| |--...
|
|--labels
| |--train
| |--label1.jpg
| |--label2.jpg
| |--...
| |--val
| |--label3.jpg
| |--label4.jpg
| |--...
| |--test
| |--label5.jpg
| |--label6.jpg
| |--...
|
```
Run [create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py) to generate the txt list files:
```
python create_dataset_list.py <dataset_root> # dataset root directory
--type custom # dataset type; cityscapes and custom are supported
```
For other data and data annotation, please refer to PaddleSeg's [Prepare Custom Datasets](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md).
## Model Training
First, execute the following command to clone the PaddleSeg repository to the training machine:
```
git clone https://github.com/PaddlePaddle/PaddleSeg.git
```
Install the dependencies:
```
cd PaddleSeg
pip install -r requirements.txt
```
### Prepare the Configuration File
For details, refer to PaddleSeg's [prepare configuration file](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md).
This example uses pp_liteseg_stdc2_bdd100k_1024x512.yml:
```
batch_size: 16
iters: 50000
train_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
train_path: data/bdd100k/train.txt #training list txt file
num_classes: 4 #PP-Vehicle divides lane lines into 4 classes
mode: train
transforms:
- type: ResizeStepScaling
min_scale_factor: 0.5
max_scale_factor: 2.0
scale_step_size: 0.25
- type: RandomPaddingCrop
crop_size: [512, 1024]
- type: RandomHorizontalFlip
- type: RandomAffine
- type: RandomDistort
brightness_range: 0.5
contrast_range: 0.5
saturation_range: 0.5
- type: Normalize
val_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
val_path: data/bdd100k/val.txt #validation list txt file
num_classes: 4
mode: val
transforms:
- type: Normalize
optimizer:
type: sgd
momentum: 0.9
weight_decay: 4.0e-5
lr_scheduler:
type: PolynomialDecay
learning_rate: 0.01 #0.01
end_lr: 0
power: 0.9
loss:
types:
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
coef: [1, 1,1]
model:
type: PPLiteSeg
backbone:
type: STDC2
pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #pretrained model
```
### Run Training
```
# single-GPU training
export CUDA_VISIBLE_DEVICES=0 # set 1 visible GPU on Linux
# set CUDA_VISIBLE_DEVICES=0 # set 1 visible GPU on Windows
python train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
### Training Parameter Explanation
```
--do_eval              whether to run evaluation when saving the model; if enabled, the best model (by mIoU) is saved to best_model
--use_vdl              whether to enable VisualDL to record training data
--save_interval 500    number of steps between model saves
--save_dir output      model output path
```
### Multi-GPU Training
To train with multiple GPUs, set the environment variable CUDA_VISIBLE_DEVICES to multiple GPUs (if it is not set, all GPUs are used by default) and launch the training script with paddle.distributed.launch (multi-GPU training is not available on Windows, since nccl is not supported):
```
export CUDA_VISIBLE_DEVICES=0,1,2,3 # set 4 visible GPUs
python -m paddle.distributed.launch train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
After the training is completed, you can run the following command for performance evaluation:
```
# single-GPU evaluation
python val.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python export.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams \
--save_dir output/inference_model
```
To use it, modify the `model_dir` entry of the `LANE_SEG` module in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
LANE_SEG:
lane_seg_config: deploy/pipeline/config/lane_seg_config.yml
model_dir: output/inference_model
```
The model can then be used. This completes the task of updating the lane line segmentation model.


@@ -0,0 +1,240 @@
English | [简体中文](./ppvehicle_violation.md)
# Customized Vehicle Violation
Secondary development of the vehicle violation task mainly focuses on the lane line segmentation model. A PP-LiteSeg model is fine-tuned on the bdd100k lane line dataset; the process is described in [PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md).
## Data Preparation
PP-Vehicle violation analysis divides lane lines into 4 categories (an illustrative colorizing sketch follows the list):
```
0  background
1  double yellow line
2  solid line
3  dashed line
```
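As an illustrative sketch — the color choices and the mask file name are hypothetical — a mask with these four class ids can be rendered as a color image for quick visual inspection:

```python
# Minimal sketch: colorize a 4-class lane mask for visual inspection.
import numpy as np
from PIL import Image

PALETTE = np.array([[0, 0, 0],        # 0: background
                    [255, 255, 0],    # 1: double yellow line
                    [255, 0, 0],      # 2: solid line
                    [0, 255, 0]],     # 3: dashed line
                   dtype=np.uint8)

mask = np.array(Image.open("output_path/label1.png"))    # hypothetical mask file
Image.fromarray(PALETTE[mask]).save("label1_color.png")  # look up a color per pixel
```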
1. For the bdd100k dataset, the processing script we provide, [lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py), can be combined with the official bdd100k [repo](https://github.com/bdd100k/bdd100k) to convert the data into the format required for segmentation.
```
# clone bdd100k
git clone https://github.com/bdd100k/bdd100k.git
# copy lane_to_mask.py to bdd100k/
cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/
# set up the bdd100k environment
cd bdd100k && pip install -r requirements.txt
# convert bdd100k labels to masks
python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path
# -i: input path to the bdd100k label json
# -o: output path for the generated masks
```
2. Organize data and store data in the following format:
```
dataset_root
|
|--images
| |--train
| |--image1.jpg
| |--image2.jpg
| |--...
| |--val
| |--image3.jpg
| |--image4.jpg
| |--...
| |--test
| |--image5.jpg
| |--image6.jpg
| |--...
|
|--labels
| |--train
| |--label1.jpg
| |--label2.jpg
| |--...
| |--val
| |--label3.jpg
| |--label4.jpg
| |--...
| |--test
| |--label5.jpg
| |--label6.jpg
| |--...
|
```
Run [create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py) to generate the txt list files:
```
python create_dataset_list.py <dataset_root> # dataset root directory
--type custom # dataset type; cityscapes and custom are supported
```
For other data and data annotation, please refer to PaddleSeg [Prepare Custom Datasets](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md)
## Model Training
First, clone the PaddleSeg repository:
```
git clone https://github.com/PaddlePaddle/PaddleSeg.git
```
Install the dependencies:
```
cd PaddleSeg
pip install -r requirements.txt
```
### Prepare the Configuration File
For details, please refer to PaddleSeg's [prepare configuration file](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md).
Example: pp_liteseg_stdc2_bdd100k_1024x512.yml
```
batch_size: 16
iters: 50000
train_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
train_path: data/bdd100k/train.txt #dataset train txt
num_classes: 4 #lane classes
mode: train
transforms:
- type: ResizeStepScaling
min_scale_factor: 0.5
max_scale_factor: 2.0
scale_step_size: 0.25
- type: RandomPaddingCrop
crop_size: [512, 1024]
- type: RandomHorizontalFlip
- type: RandomAffine
- type: RandomDistort
brightness_range: 0.5
contrast_range: 0.5
saturation_range: 0.5
- type: Normalize
val_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
val_path: data/bdd100k/val.txt #dataset val txt
num_classes: 4
mode: val
transforms:
- type: Normalize
optimizer:
type: sgd
momentum: 0.9
weight_decay: 4.0e-5
lr_scheduler:
type: PolynomialDecay
learning_rate: 0.01 #0.01
end_lr: 0
power: 0.9
loss:
types:
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
coef: [1, 1,1]
model:
type: PPLiteSeg
backbone:
type: STDC2
pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #pretrained model
```
### Run Training
```
#Single GPU training
export CUDA_VISIBLE_DEVICES=0 # Linux
# set CUDA_VISIBLE_DEVICES=0 # Windows
python train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
### Explanation of training parameters
```
--do_eval              whether to run evaluation when saving the model; if enabled, the best model (by mIoU) is saved to best_model
--use_vdl Whether to enable visualdl to record training data
--save_interval 500 Number of steps between model saving
--save_dir output Model output path
```
### Multi-GPU Training
If you want to use multi-GPU training, set the environment variable CUDA_VISIBLE_DEVICES to multiple GPUs (if it is not set, all GPUs are used by default) and launch the training script with paddle.distributed.launch (multi-GPU training cannot be used on Windows, since nccl is not supported):
```
export CUDA_VISIBLE_DEVICES=0,1,2,3 # 4 gpus
python -m paddle.distributed.launch train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
After training, you can execute the following commands for performance evaluation:
```
python val.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python export.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams \
--save_dir output/inference_model
```
To use it, modify the `model_dir` entry of the `LANE_SEG` module in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
LANE_SEG:
lane_seg_config: deploy/pipeline/config/lane_seg_config.yml
model_dir: output/inference_model
```
The model can then be used. This completes the task of updating the lane line segmentation model.