更换文档检测模型 (Replace the document detection model)

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions


@@ -0,0 +1,405 @@
简体中文 | [English](./CHANGELOG_en.md)
# 版本更新信息
## 最新版本信息
### 2.6(02.15/2023)
- 特色模型
- 发布旋转框检测模型PP-YOLOE-RAnchor-free旋转框检测SOTA模型精度速度双高、云边一体s/m/l/x四个模型适配不用算力硬件、部署友好避免使用特殊算子能够轻松使用TensorRT加速
- 发布小目标检测模型PP-YOLOE-SOD:基于切图的端到端检测方案、基于原图的检测模型,精度达VisDrone开源最优
- 发布密集检测模型:基于PP-YOLOE+的密集检测算法,SKU数据集检测精度60.3,达到开源最优
- 前沿算法
- YOLO家族新增前沿算法YOLOv8,更新YOLOv6-v3.0
- 新增目标检测算法DINO、YOLOF
- 新增ViTDet系列检测模型PP-YOLOE+ViT_base, Mask RCNN + ViT_base, Mask RCNN + ViT_large
- 新增多目标跟踪算法CenterTrack
- 新增旋转框检测算法FCOSR
- 新增实例分割算法QueryInst
- 新增3D关键点检测算法Metro3d
- 新增模型蒸馏算法FGD、LD、CWD,新增PP-YOLOE+模型蒸馏,精度提升1.1 mAP
- 新增半监督检测算法DenseTeacher,并适配PP-YOLOE+
- 新增少样本迁移学习方案,包含Co-tuning、Contrastive learning两类算法
- 场景能力
- PP-Human v2开源边缘端实时检测模型,精度45.7,Jetson AGX速度80FPS
- PP-Vehicle开源边缘端实时检测模型,精度53.5,Jetson AGX速度80FPS
- PP-Human v2、PP-Vehicle支持多路视频流部署能力,实现Jetson AGX 4路视频流端到端20FPS实时部署
- PP-Vehicle新增车辆压线检测和车辆逆行检测能力
- 框架能力
- 功能新增
- 新增检测热力图可视化能力适配FasterRCNN/MaskRCNN系列, PP-YOLOE系列, BlazeFace, SSD, RetinaNet
- 功能完善/Bug修复
- 支持python3.10版本
- EMA支持过滤不更新参数
- 简化PP-YOLOE architecture组网代码
- AdamW适配paddle2.4.1版本
### 2.5(08.26/2022)
- 特色模型
- PP-YOLOE+
- 发布PP-YOLOE+模型:COCO test2017数据集精度提升0.7%-2.4% mAP,模型训练收敛速度提升3.75倍,端到端预测速度提升1.73-2.3倍
- 发布智慧农业、夜间安防检测、工业质检场景预训练模型,精度提升1.3%-8.1% mAP
- 支持分布式训练、在线量化、serving部署等10大高性能训练部署能力,新增C++/Python Serving、TRT原生推理、ONNX Runtime等5+部署demo教程
- PP-PicoDet
- 发布PicoDet-NPU模型支持模型全量化部署
- 新增PicoDet版面分析模型基于FGD蒸馏算法精度提升0.5% mAP
- PP-TinyPose
- 发布PP-TinyPose增强版,在健身、舞蹈等场景的业务数据集上端到端AP提升9.1%
- 覆盖侧身、卧躺、跳跃、高抬腿等非常规动作
- 新增滤波稳定模块,关键点稳定性显著增强
- 场景能力
- PP-Human v2
- 发布PP-Human v2,支持四大产业特色功能:多方案行为识别案例库、人体属性识别、人流检测与轨迹留存以及高精度跨镜跟踪
- 底层算法能力升级:行人检测精度提升1.5% mAP;行人跟踪精度提升10.2% MOTA,轻量级模型速度提升34%;属性识别精度提升0.6% mA,轻量级模型速度提升62.5%
- 提供全流程教程,覆盖数据采集标注、模型训练优化和预测部署,以及pipeline中后处理代码修改
- 新增在线视频流输入支持
- 易用性提升,一行代码执行功能,执行流程判断、模型下载背后自动完成。
- PP-Vehicle
- 全新发布PP-Vehicle,支持四大交通场景核心功能:车牌识别、属性识别、车流量统计、违章检测
- 车牌识别支持基于PP-OCR v3的轻量级车牌识别模型
- 车辆属性识别支持基于PP-LCNet多标签分类模型
- 兼容图片、视频、在线视频流等各类数据输入格式
- 易用性提升,一行代码执行功能,执行流程判断、模型下载背后自动完成。
- 前沿算法
- YOLO家族全系列模型
- 发布YOLO家族全系列模型,覆盖前沿检测算法YOLOv5、YOLOv6及YOLOv7
- 基于ConvNeXt骨干网络,YOLO各算法训练周期缩短5-8倍,精度普遍提升1%-5% mAP;使用模型压缩策略实现精度无损的同时,速度提升30%以上
- 新增基于ViT骨干网络的高精度检测模型,COCO数据集精度达到55.7% mAP
- 新增OC-SORT多目标跟踪模型
- 新增ConvNeXt骨干网络
- 产业实践范例教程
- 基于PP-TinyPose增强版的智能健身动作识别
- 基于PP-Human的打架识别
- 基于PP-Human的营业厅来客分析
- 基于PP-Vehicle的车辆结构化分析
- 基于PP-YOLOE+的PCB电路板缺陷检测
- 框架能力
- 功能新增
- 新增自动压缩工具支持并提供demo,PP-YOLOE l版本精度损失0.3% mAP,V100速度提升13%
- 新增PaddleServing python/C++和ONNXRuntime部署demo
- 新增PP-YOLOE 端到端TensorRT部署demo
- 新增FGC蒸馏算法,RetinaNet精度提升3.3%
- 新增分布式训练文档
- 功能完善/Bug修复
- 修复Windows c++部署编译问题
- 修复VOC格式数据预测时保存结果问题
- 修复FairMOT c++部署检测框输出
- 旋转框检测模型S2ANet支持batch size>1部署
### 2.4(03.24/2022)
- PP-YOLOE
- 发布PP-YOLOE特色模型:l版本COCO test2017数据集精度51.6%,V100预测速度78.1 FPS,精度速度达到服务器端SOTA
- 发布s/m/l/x系列模型,打通TensorRT、ONNX部署能力
- 支持混合精度训练,训练较PP-YOLOv2加速33%
- PP-PicoDet:
- 发布PP-PicoDet优化模型精度提升2%左右CPU预测速度提升63%。
- 新增参数量0.7M的PicoDet-XS模型
- 后处理集成到网络中,优化端到端部署成本
- 行人分析Pipeline
- 发布PP-Human行人分析Pipeline,覆盖行人检测、属性识别、行人跟踪、跨镜跟踪、人流量统计、动作识别多种功能,打通TensorRT部署
- 属性识别支持StrongBaseline模型
- ReID支持Centroid模型
- 动作识别支持ST-GCN摔倒检测
- 模型丰富度:
- 发布YOLOX支持nano/tiny/s/m/l/x版本x版本COCO val2017数据集精度51.8%
- 框架功能优化:
- EMA训练速度优化20%,优化EMA训练模型保存方式
- 支持infer预测结果保存为COCO格式
- 部署优化:
- RCNN全系列模型支持Paddle2ONNX导出ONNX模型
- SSD模型支持导出时融合解码OP优化边缘端部署速度
- 支持NMS导出TensorRT,TensorRT部署端到端速度提升
### 2.3(11.03/2021)
- 特色模型:
- 检测: 轻量级移动端检测模型PP-PicoDet精度速度达到移动端SOTA
- 关键点: 轻量级移动端关键点模型PP-TinyPose
- 模型丰富度:
- 检测:
- 新增Swin-Transformer目标检测模型
- 新增TOOD(Task-aligned One-stage Object Detection)模型
- 新增GFL(Generalized Focal Loss)目标检测模型
- 发布Sniper小目标检测优化方法支持Faster RCNN及PP-YOLO系列模型
- 发布针对EdgeBoard优化的PP-YOLO-EB模型
- 跟踪
- 发布实时跟踪系统PP-Tracking
- 发布FairMot高精度模型、小尺度模型和轻量级模型
- 发布行人、人头和车辆实时跟踪垂类模型库,覆盖航拍监控、自动驾驶、密集人群、极小目标等场景
- DeepSORT模型适配PP-YOLO, PP-PicoDet等更多检测器
- 关键点
- 新增Lite HRNet模型
- 预测部署:
- YOLOv3系列模型支持NPU预测部署
- FairMot模型C++预测部署打通
- 关键点系列模型C++预测部署打通, Paddle Lite预测部署打通
- 文档:
- 新增各系列模型英文文档
### 2.2(08.10/2021)
- 模型丰富度:
- 发布Transformer检测模型DETR、Deformable DETR、Sparse RCNN
- 关键点检测新增Dark模型发布Dark HRNet模型
- 发布MPII数据集HRNet关键点检测模型
- 发布人头、车辆跟踪垂类模型
- 模型优化:
- 旋转框检测模型S2ANet发布Align Conv优化模型DOTA数据集mAP优化至74.0
- 预测部署
- 主流模型支持batch size>1预测部署,包含YOLOv3、PP-YOLO、Faster RCNN、SSD、TTFNet、FCOS
- 新增多目标跟踪模型(JDE, FairMot, DeepSort)Python端预测部署支持,并支持TensorRT预测
- 新增多目标跟踪模型FairMot联合关键点检测模型部署,支持Python端预测部署
- 新增关键点检测模型联合PP-YOLO预测部署支持
- 文档:
- Windows预测部署文档新增TensorRT版本说明
- FAQ文档更新发布
- 问题修复:
- 修复PP-YOLO系列模型训练收敛性问题
- 修复batch size>1时无标签数据训练问题
### 2.1(05.20/2021)
- 模型丰富度提升:
- 发布关键点模型HRNet、HigherHRNet
- 发布多目标跟踪模型DeepSort, FairMot, JDE
- 框架基础能力:
- 支持无标注框训练
- 预测部署:
- Paddle Inference YOLOv3系列模型支持batch size>1预测
- 旋转框检测S2ANet模型预测部署打通
- 增加量化模型Benchmark
- 增加动态图模型与静态图模型Paddle-Lite demo
- 检测模型压缩:
- 发布PPYOLO系列模型压缩模型
- 文档:
- 更新快速开始,预测部署等教程文档
- 新增ONNX模型导出教程
- 新增移动端部署文档
### 2.0(04.15/2021)
**说明:** 自2.0版本开始动态图作为PaddleDetection默认版本`dygraph`目录切换为根目录,原静态图实现移动到`static`目录下。
- 动态图模型丰富度提升:
- 发布PP-YOLOv2及PP-YOLO tiny模型:PP-YOLOv2 COCO test数据集精度达到49.5%,V100预测速度达到68.9 FPS
- 发布旋转框检测模型S2ANet
- 发布两阶段实用模型PSS-Det
- 发布人脸检测模型Blazeface
- 新增基础模块:
- 新增SENet、GhostNet、Res2Net骨干网络
- 新增VisualDL训练可视化支持
- 新增单类别精度计算及PR曲线绘制功能
- YOLO系列模型支持NHWC数据格式
- 预测部署:
- 发布主要模型的预测benchmark数据
- 适配TensorRT6支持TensorRT动态尺寸输入支持TensorRT int8量化预测
- PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, Faster RCNN等7类模型在Linux、Windows、NV Jetson平台下python/cpp/TRT预测部署打通:
- 检测模型压缩:
- 蒸馏新增动态图蒸馏支持并发布YOLOv3-MobileNetV1蒸馏模型
- 联合策略:新增动态图剪裁+蒸馏联合策略压缩方案并发布YOLOv3-MobileNetV1的剪裁+蒸馏压缩模型
- 问题修复:修复动态图量化模型导出问题
- 文档:
- 新增动态图英文文档:包含首页文档,入门使用,快速开始,模型算法、新增数据集等
- 新增动态图中英文安装文档
- 新增动态图RCNN系列和YOLO系列配置文件模板及配置项说明文档
## 历史版本信息
### 2.0-rc(02.23/2021)
- 动态图模型丰富度提升:
- 优化RCNN模型组网及训练方式,RCNN系列模型精度提升(依赖Paddle develop或2.0.1版本)
- 新增支持SSDLite、FCOS、TTFNet、SOLOv2系列模型
- 新增行人和车辆垂类目标检测模型
- 新增动态图基础模块:
- 新增MobileNetV3、HRNet骨干网络
- 优化RoIAlign计算逻辑,RCNN系列模型精度提升(依赖Paddle develop或2.0.1版本)
- 新增支持Synchronized Batch Norm
- 新增支持Modulated Deformable Convolution
- 预测部署:
- 发布动态图python、C++、Serving部署解决方案及文档,支持Faster RCNN、Mask RCNN、YOLOv3、PP-YOLO、SSD、TTFNet、FCOS、SOLOv2等系列模型预测部署
- 动态图预测部署支持TensorRT模式FP32、FP16推理加速
- 检测模型压缩:
- 裁剪:新增动态图裁剪支持,并发布YOLOv3-MobileNetV1裁剪模型
- 量化:新增动态图量化支持,并发布YOLOv3-MobileNetV1和YOLOv3-MobileNetV3量化模型
- 文档:
- 新增动态图入门教程文档:包含安装说明,快速开始,准备数据,训练/评估/预测流程文档
- 新增动态图进阶教程文档:包含模型压缩、推理部署文档
- 新增动态图模型库文档
### v2.0-beta(12.20/2020)
- 动态图支持:
- 支持Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3和SSD模型试用版本。
- 模型提升:
- 更新PP-YOLO MobileNetv3 large和small模型精度提升并新增裁剪和蒸馏后的模型。
- 新功能:
- 支持VisualDL可视化数据预处理图片。
- Bug修复:
- 修复BlazeFace人脸关键点预测bug。
### v0.5.0(11/2020)
- 模型丰富度提升:
- 发布SOLOv2系列模型其中SOLOv2-Light-R50-VD-DCN-FPN 模型在单卡V100上达到 38.6 FPS加速24% COCO验证集精度达到38.8%, 提升2.4绝对百分点。
- 新增Android移动端检测demo包括SSD、YOLO系列模型可直接扫码安装体验。
- 移动端模型优化:
- 新增PACT量化策略,YOLOv3-MobileNetV3在COCO数据集上与普通量化相比提升0.7%。
- 易用性提升及功能组件:
- 增强generate_proposal_labels算子功能规避模型出nan风险。
- 修复deploy下python与C++预测若干问题。
- 统一COCO与VOC数据集下评估流程支持输出单类AP和P-R曲线。
- PP-YOLO支持矩形输入图像。
- 文档:
- 新增目标检测全流程教程新增Jetson平台部署教程。
### v0.4.0(07/2020)
- 模型丰富度提升:
- 发布PPYOLO模型COCO数据集精度达到45.2%单卡V100预测速度达到72.9 FPS精度和预测速度优于YOLOv4模型。
- 新增TTFNet模型base版本对齐竞品COCO数据集精度达到32.9%。
- 新增HTC模型base版本对齐竞品COCO数据集精度达到42.2%。
- 新增BlazeFace人脸关键点检测模型在Wider-Face数据集的Easy-Set精度达到85.2%。
- 新增ACFPN模型 COCO数据集精度达到39.6%。
- 发布服务器端通用目标检测模型包含676类相同策略在COCO数据集上V100为19.5FPS时COCO mAP可以达到49.4%。
- 移动端模型优化:
- 新增SSDLite系列优化模型包括新增GhostNet的Backbone新增FPN组件等精度提升0.5%-1.5%。
- 易用性提升及功能组件:
- 新增GridMask, RandomErasing数据增强方法。
- 新增Matrix NMS支持。
- 新增EMA(Exponential Moving Average)训练支持。
- 新增多机训练方法两机相对于单机平均加速比80%,多机训练支持待进一步验证。
### v0.3.0(05/2020)
- 模型丰富度提升:
- 添加Efficientdet-D0模型速度与精度优于竞品。
- 新增YOLOv4预测模型精度对齐竞品新增YOLOv4在Pascal VOC数据集上微调训练精度达到85.5%。
- YOLOv3新增MobileNetV3骨干网络COCO数据集精度达到31.6%。
- 添加Anchor-free模型FCOS精度优于竞品。
- 添加Anchor-free模型CornerNet-Squeeze,精度优于竞品,优化模型的COCO数据集精度38.2%(+3.7%),速度较YOLOv3-DarkNet53快5%。
- 添加服务器端实用目标检测模型CascadeRCNN-ResNet50vd模型速度与精度优于竞品EfficientDet。
- 移动端推出3种模型
- SSDLite系列模型SSDLite-Mobilenetv3 small/large模型精度优于竞品。
- YOLOv3移动端方案: YOLOv3-MobileNetv3模型压缩后加速3.5倍速度和精度均领先于竞品的SSDLite模型。
- RCNN移动端方案CascadeRCNN-MobileNetv3经过系列优化, 推出输入图像分别为320x320和640x640的模型速度与精度具有较高性价比。
- 预测部署重构:
- 新增Python预测部署流程,支持RCNN、YOLO、SSD、RetinaNet、人脸系列模型,支持视频预测。
- 重构C++预测部署,提高易用性。
- 易用性提升及功能组件:
- 增加AutoAugment数据增强。
- 升级检测库文档结构。
- 支持迁移学习自动进行shape匹配。
- 优化mask分支评估阶段内存占用。
### v0.2.0(02/2020)
- 新增模型:
- 新增基于CBResNet模型。
- 新增LibraRCNN模型。
- 进一步提升YOLOv3模型精度基于COCO数据精度达到43.2%相比上个版本提升1.4%。
- 新增基础模块:
- 主干网络: 新增CBResNet。
- loss模块: YOLOv3的loss支持细粒度op组合。
- 正则模块: 新增DropBlock模块。
- 功能优化和改进:
- 加速YOLOv3数据预处理整体训练提速40%。
- 优化数据预处理逻辑,提升易用性。
- 增加人脸检测预测benchmark数据。
- 增加C++预测引擎Python API预测示例。
- 检测模型压缩 :
- 裁剪: 发布MobileNet-YOLOv3裁剪方案和模型基于VOC数据FLOPs - 69.6%, mAP + 1.4%基于COCO数据FLOPS-28.8%, mAP + 0.9%; 发布ResNet50vd-dcn-YOLOv3裁剪方案和模型基于COCO数据集FLOPS - 18.4%, mAP + 0.8%。
- 蒸馏: 发布MobileNet-YOLOv3蒸馏方案和模型基于VOC数据mAP + 2.8%基于COCO数据mAP + 2.1%。
- 量化: 发布YOLOv3-MobileNet和BlazeFace的量化模型。
- 裁剪+蒸馏: 发布MobileNet-YOLOv3裁剪+蒸馏方案和模型基于COCO数据FLOPS - 69.6%基于TensorRT预测加速64.5%mAP - 0.3 %; 发布ResNet50vd-dcn-YOLOv3裁剪+蒸馏方案和模型基于COCO数据FLOPS - 43.7%基于TensorRT预测加速24.0%mAP + 0.6 %。
- 搜索: 开源BlazeFace-NAS的完整搜索方案。
- 预测部署:
- 集成 TensorRT支持FP16、FP32、INT8量化推理加速。
- 文档:
- 增加详细的数据预处理模块介绍文档以及实现自定义数据Reader文档。
- 增加如何新增算法模型的文档。
- 文档部署到网站: https://paddledetection.readthedocs.io
### 12/2019
- 增加Res2Net模型。
- 增加HRNet模型。
- 增加GIOU loss和DIOU loss。
### 21/11/2019
- 增加CascadeClsAware RCNN模型。
- 增加CBNetResNet200和Non-local模型。
- 增加SoftNMS。
- 增加Open Image V5数据集和Objects365数据集模型。
### 10/2019
- 增加增强版YOLOv3模型精度高达41.4%。
- 增加人脸检测模型BlazeFace、Faceboxes。
- 丰富基于COCO的模型精度高达51.9%。
- 增加Objects365 2019 Challenge上夺冠的最佳单模型之一CACascade-RCNN。
- 增加行人检测和车辆检测预训练模型。
- 支持FP16训练。
- 增加跨平台的C++推理部署方案。
- 增加模型压缩示例。
### 2/9/2019
- 增加GroupNorm模型。
- 增加CascadeRCNN+Mask模型。
### 5/8/2019
- 增加Modulated Deformable Convolution系列模型。
### 29/7/2019
- 增加检测库中文文档
- 修复R-CNN系列模型训练同时进行评估的问题
- 新增ResNext101-vd + Mask R-CNN + FPN模型
- 新增基于VOC数据集的YOLOv3模型
### 3/7/2019
- 首次发布PaddleDetection检测库和检测模型库
- 模型包括Faster R-CNN、Mask R-CNN、Faster R-CNN+FPN、Mask R-CNN+FPN、Cascade-Faster-RCNN+FPN、RetinaNet、YOLOv3和SSD。


@@ -0,0 +1,415 @@
English | [简体中文](./CHANGELOG.md)
# Version Update Information
## Latest Version Information
### 2.6(02.15/2023)
- Featured model
- Release rotated object detector PP-YOLOE-R: SOTA anchor-free rotated object detection model with both high accuracy and high speed. It comes in s/m/l/x variants for cloud and edge devices and avoids special operators, so it is deployment-friendly and can easily be accelerated with TensorRT.
- Release small object detector PP-YOLOE-SOD: an end-to-end detection pipeline based on sliced images, plus a SOTA model on VisDrone based on original images.
- Release crowded object detector: a crowded-scene detection model based on PP-YOLOE+, reaching top open-source accuracy (60.3) on the SKU dataset.
- Functions in different scenarios
- Release real-time object detection model on edge device in PP-Human v2. The model reaches 45.7mAP and 80FPS on Jetson AGX
- Release real-time object detection model on edge device in PP-Vehicle. The model reaches 53.5mAP and 80FPS on Jetson AGX
- Support multi-stream deployment in PP-Human v2 and PP-Vehicle. Achieved 20FPS in 4-stream deployment on Jetson AGX
- Support retrograde and press line detection in PP-Vehicle
- Cutting-edge algorithms
- Add YOLOv8 and upgrade YOLOv6 to v3.0 in the YOLO family
- Release object detection algorithms DINO and YOLOF
- Rich ViTDet series including PP-YOLOE+ViT_base, Mask RCNN + ViT_base, Mask RCNN + ViT_large
- Release MOT algorithm CenterTrack
- Release oriented object detection algorithm FCOSR
- Release instance segmentation algorithm QueryInst
- Release 3D keypoint detection algorithm Metro3d
- Release distillation algorithms FGD, LD and CWD, and add PP-YOLOE+ distillation with a 1.1 mAP improvement
- Release SSOD algorithm DenseTeacher and adapt for PP-YOLOE+
- Release few shot finetuning algorithm, including Co-tuning and Contrastive learning
- Framework capabilities
- New functions
- Release Grad-CAM for heatmap visualization. Support Faster RCNN, Mask RCNN, PP-YOLOE, BlazeFace, SSD, RetinaNet.
- Improvement and fixes
- Support python 3.10
- Fix EMA for no-grad parameters
- Simplify PP-YOLOE architecture
- Support AdamW for Paddle 2.4.1
### 2.5(08.26/2022)
- Featured model
- PP-YOLOE+
- Release the PP-YOLOE+ model, with a 0.7%-2.4% mAP improvement on COCO test2017, a 3.75x faster training convergence rate and 1.73-2.3x faster end-to-end inference speed
- Release pre-trained models for smart agriculture, night security detection, and industrial quality inspection, with 1.3%-8.1% mAP accuracy improvement
- Support 10 high-performance training and deployment capabilities, including distributed training, online quantization, and serving deployment; provide more than five new deployment demos, such as C++/Python Serving, TRT native inference, and ONNX Runtime
- PP-PicoDet
- Release the PicoDet-NPU model to support full quantization of model deployment
- Add PicoDet layout analysis model with 0.5% mAP accuracy improvement due to FGD distillation algorithm
- PP-TinyPose
- Release PP-TinyPose Plus with a 9.1% end-to-end AP improvement on business datasets in scenarios such as physical exercise and dance
- Cover unconventional movements such as turning sideways, lying down, jumping, and high leg lifts
- Add a filter-based stabilization module that significantly improves keypoint stability
- Functions in different scenarios
- PP-Human v2
- Release PP-Human v2, which supports four industrial features: behavioral recognition case zoo for multiple solutions, human attribute recognition, human traffic detection and trajectory retention, as well as high precision multi-camera tracking
- Upgrade underlying algorithm capabilities: 1.5% mAP improvement in pedestrian detection accuracy; 10.2% MOTA improvement in pedestrian tracking accuracy and 34% speed improvement in the lightweight model; 0.6% mA improvement in attribute recognition accuracy and 62.5% speed improvement in the lightweight model
- Provides comprehensive tutorials covering data collection and annotation, model training optimization and prediction deployment, and post-processing code modification in the pipeline
- Supports online video streaming input
- Become more user-friendly with a one-line code execution function that automates the process determination and model download
- PP-Vehicle
- Launch PP-Vehicle, which supports four core functions for traffic application: license plate recognition, attribute recognition, traffic flow statistics, and violation detection
- License plate recognition supports a lightweight model based on PP-OCR v3
- Vehicle attribute recognition supports a multi-label classification model based on PP-LCNet
- Compatible with various data input formats such as pictures, videos and online video streaming
- Become more user-friendly with a one-line code execution function that automates the process determination and model download
- Cutting-edge algorithms
- YOLO Family
- Release the full range of YOLO family models, covering the cutting-edge detection algorithms YOLOv5, YOLOv6 and YOLOv7
- With the ConvNeXt backbone network, training schedules of the YOLO algorithms are shortened by 5-8x while accuracy generally improves by 1%-5% mAP; with the model compression strategy, speed increases by over 30% with no loss of accuracy
- Newly add high precision detection model based on [ViT](configs/vitdet) backbone network, with a 55.7% mAP accuracy on the COCO dataset
- Newly add multi-object tracking model [OC-SORT](configs/mot/ocsort)
- Newly add [ConvNeXt](configs/convnext) backbone network.
- Industrial application
- Intelligent physical exercise recognition based on PP-TinyPose Plus
- Fighting recognition based on PP-Human
- Business hall visitor analysis based on PP-Human
- Vehicle structuring analysis based on PP-Vehicle
- PCB board defect detection based on PP-YOLOE+
- Framework capabilities
- New functions
- Release auto-compression tools and demos, 0.3% mAP accuracy loss for PP-YOLOE l version, while 13% speed increase for V100
- Release PaddleServing python/C++ and ONNXRuntime deployment demos
- Release PP-YOLOE end-to-end TensorRT deployment demo
- Release FGC distillation algorithm with RetinaNet accuracy improved by 3.3%
- Release distributed training documentation
- Improvement and fixes
- Fix compilation problem with Windows c++ deployment
- Fix problems when saving results of inference data in VOC format
- Fix the detection box output of FairMOT c++ deployment
- The rotated box detection model S2ANet supports deployment with batch size > 1
### 2.4(03.24/2022)
- PP-YOLOE
- Release PP-YOLOE object detection models: PP-YOLOE-l achieves 51.6% mAP on the COCO test dataset and 78.1 FPS on an Nvidia V100, reaching SOTA performance for object detection on GPU
- Release series models s/m/l/x, and support deployment based on TensorRT & ONNX
- Support AMP training, with training 33% faster than PP-YOLOv2
- PP-PicoDet:
- Release enhanced models of PP-PicoDet, with mAP improved by ~2% on COCO and CPU inference speed accelerated by 63%
- Release PP-PicoDet-XS model with 0.7M parameters
- Post-processing integrated into the network to optimize deployment pipeline
- PP-Human
- Release the PP-Human analysis pipeline, including pedestrian detection, attribute recognition, human tracking, multi-camera tracking, human statistics and action recognition, with TensorRT deployment support
- Release StrongBaseline model for attribute recognition
- Release Centroid model for ReID
- Release ST-GCN model for falldown action recognition
- Model richness:
- Publish YOLOX object detection model, release series models: nano/tiny/s/m/l/x, and YOLOX-x achieves mAP as 51.8% on COCO val2017 dataset
- Function Optimize
- Speed up EMA training by 20% and improve the saving of EMA weights
- Support saving inference results in COCO format
- Deployment Optimize
- Support exporting ONNX models via Paddle2ONNX for all RCNN models
- Support exporting SSD models with a fused decode OP to enhance inference speed on edge devices
- Support exporting NMS to TensorRT and optimize end-to-end inference speed on TensorRT
### 2.3(11.03/2021)
- Feature models:
- Object detection: the lightweight object detection model PP-PicoDet, whose accuracy and inference speed reach SOTA on mobile devices
- Keypoint detection: the lightweight keypoint detection model PP-TinyPose for mobile devices
- Model richness:
- Object detection:
- Publish Swin-Transformer object detection model
- Publish TOOD(Task-aligned One-stage Object Detection) model
- Publish GFL(Generalized Focal Loss) object detection model
- Publish Sniper optimization method for tiny object detection, supporting Faster RCNN and PP-YOLO series models
- Publish PP-YOLO optimized model PP-YOLO-EB for EdgeBoard
- Multi-object tracking:
- Publish Real-time tracking system PP-Tracking
- Publish high-precision, small-scale and lightweight models based on FairMOT
- Publish a real-time tracking model zoo for pedestrian, head and vehicle tracking, covering scenarios such as aerial surveillance, autonomous driving, dense crowds, and tiny objects
- DeepSORT supports PP-YOLO and PP-PicoDet as object detectors
- Keypoint detection:
- Publish Lite HRNet model
- Inference deployment:
- Support NPU deployment for YOLOv3 series
- Support C++ deployment for FairMot
- Support C++ and PaddleLite deployment for keypoint detection series model
- Documents:
- Add series English documents
### 2.2(08.10/2021)
- Model richness:
- Publish Transformer detection models: DETR, Deformable DETR, Sparse RCNN
- Add the DARK method for keypoint detection and release the DARK HRNet model
- Publish the HRNet keypoint detection model trained on the MPII dataset
- Release head and vehicle tracking vertical models
- Model optimization:
- Release the AlignConv-optimized model for S2ANet, improving mAP on the DOTA dataset to 74.0
- Inference deployment
- Mainstream models support inference deployment with batch size > 1, including YOLOv3, PP-YOLO, Faster RCNN, SSD, TTFNet, FCOS
- Add Python inference deployment for multi-object tracking models (JDE, FairMOT, DeepSORT), with TensorRT support
- Add Python inference deployment for FairMOT combined with the keypoint detection model
- Add support for deploying the keypoint detection model combined with PP-YOLO
- Documents:
- Add TensorRT version notes to the Windows inference deployment documentation
- Update and release the FAQ documents
- Bug fixes:
- Fix the training convergence problem of PP-YOLO series models
- Fix training with unlabeled data when batch size > 1
### 2.1(05.20/2021)
- Model richness enhancement:
- Publish keypoint detection models: HRNet, HigherHRNet
- Publish multi-object tracking models: DeepSORT, FairMOT, JDE
- Basic framework capabilities:
- Support training with images that contain no annotated boxes
- Inference deployment:
- Paddle Inference supports batch size > 1 for YOLOv3 series models
- Support inference deployment of the rotated box detection model S2ANet
- Add quantized model benchmarks
- Add Paddle-Lite demos for dynamic graph and static graph models
- Detection model compression:
- Release compressed models for the PP-YOLO series
- Documents:
- Update the quick start, inference deployment and other tutorial documentation
- Added ONNX model export tutorial
- Added the mobile deployment document
### 2.0(04.15/2021)
**Description:** Since version 2.0, dynamic graphs are used as the default version of Paddle Detection, the original `dygraph` directory is switched to the root directory, and the original static graph implementation is moved to the `static` directory.
- Enhancement of dynamic graph model richness:
- Publish PP-YOLOv2 and PP-YOLO tiny models; PP-YOLOv2 reaches 49.5% accuracy on the COCO test dataset and 68.9 FPS on V100
- Release the rotated box detection model S2ANet
- Release the two-stage practical model PSS-Det
- Publish the face detection model Blazeface
- New basic module:
- Added SENet, GhostNet, and Res2Net backbone networks
- Added VisualDL training visualization support
- Add per-category AP calculation and PR curve plotting
- YOLO series models support the NHWC data format
- Inference deployment:
- Publish inference benchmark data for major models
- Adapt to TensorRT 6; support TensorRT dynamic-shape input and TensorRT INT8 quantized inference
- 7 types of models including PP-YOLO, YOLOv3, SSD, TTFNet, FCOS and Faster RCNN support Python/C++/TRT inference deployment on Linux, Windows and NV Jetson platforms
- Detection model compression:
- Distillation: add dynamic graph distillation support and release the YOLOv3-MobileNetV1 distillation model
- Joint strategy: add a dynamic graph pruning + distillation joint compression scheme and release the YOLOv3-MobileNetV1 pruning + distillation model
- Problem fix: Fixed dynamic graph quantization model export problem
- Documents:
- Add English documentation for dynamic graph models: homepage, getting started, quick start, model algorithms, adding new datasets, etc.
- Add English and Chinese installation documents for dynamic graph models
- Add configuration file templates and configuration option documents for the dynamic graph RCNN and YOLO series
## Historical Version Information
### 2.0-rc(02.23/2021)
- Enhancement of dynamic graph model richness:
- Optimize networking and training mode of RCNN models, and improve accuracy of RCNN series models (depending on Paddle Develop or version 2.0.1)
- Added support for SSDLite, FCOS, TTFNet, SOLOv2 series models
- Added pedestrian and vehicle vertical object detection models
- New dynamic graph basic module:
- Added MobileNetV3 and HRNet backbone networks
- Optimize RoIAlign computation logic, improving accuracy of RCNN series models (depending on Paddle develop or version 2.0.1)
- Added support for Synchronized Batch Norm
- Added support for Modulated Deformable Convolution
- Inference deployment:
- Publish Python, C++ and Serving deployment solutions and documentation for dynamic graph models; support inference deployment for Faster RCNN, Mask RCNN, YOLOv3, PP-YOLO, SSD, TTFNet, FCOS, SOLOv2 and other models
- Dynamic graph inference deployment supports TensorRT FP32 and FP16 acceleration
- Detection model compression:
- Pruning: add dynamic graph pruning support and release the YOLOv3-MobileNetV1 pruned model
- Quantization: add dynamic graph quantization support and release quantized models of YOLOv3-MobileNetV1 and YOLOv3-MobileNetV3
- Documents:
- Add dynamic graph beginner tutorials: installation instructions, quick start, data preparation, and training/evaluation/inference workflow documentation
- Add dynamic graph advanced tutorials: model compression and inference deployment documentation
- Add dynamic graph model zoo documentation
### v2.0-beta(12.20/2020)
- Dynamic graph support:
- Support for Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3 and SSD models, trial version.
- Model upgrade:
- Update PP-YOLO MobileNetV3 large and small models with improved accuracy, and add pruned and distilled models.
- New features:
- Support visualizing preprocessed images with VisualDL.
- Bug fix:
- Fix the BlazeFace keypoint prediction bug.
### v0.5.0(11/2020)
- Model richness enhancement:
- Release SOLOv2 series models; the SOLOv2-Light-R50-VD-DCN-FPN model achieves 38.6 FPS on a single V100 GPU (a 24% speedup) and 38.8% accuracy on the COCO validation set, an improvement of 2.4 absolute percentage points.
- Add an Android mobile detection demo covering SSD and YOLO series models, which can be installed directly by scanning a QR code.
- Mobile model optimization:
- Add the new PACT quantization strategy; YOLOv3-MobileNetV3 improves by 0.7% over ordinary quantization on the COCO dataset.
- Ease of use and functional components:
- Enhance the generate_proposal_labels operator to avoid the risk of the model producing NaN.
- Fix several problems in the deploy Python and C++ inference code.
- Unify the evaluation process for COCO and VOC datasets; support outputting per-class AP and P-R curves.
- PP-YOLO supports rectangular input images.
- Documents:
- Add a full-workflow object detection tutorial and a Jetson platform deployment tutorial.
### v0.4.0(07/2020)
- Model richness enhancement:
- Release the PP-YOLO model: accuracy on the COCO dataset reaches 45.2% and inference speed on a single V100 GPU reaches 72.9 FPS, better than the YOLOv4 model in both accuracy and speed.
- Add the TTFNet model; the base version is aligned with competing implementations, with COCO accuracy of 32.9%.
- Add the HTC model; the base version is aligned with competing implementations, with COCO accuracy of 42.2%.
- Add the BlazeFace keypoint detection model, with an accuracy of 85.2% on the Wider-Face Easy set.
- Add the ACFPN model, with COCO accuracy of 39.6%.
- Release a server-side general object detection model covering 676 categories; with the same strategy on the COCO dataset, it reaches 49.4% COCO mAP at 19.5 FPS on V100.
- Mobile model optimization:
- Add optimized SSDLite series models, including a GhostNet backbone and FPN components, with accuracy improved by 0.5%-1.5%.
- Ease of use and functional components:
- Add the GridMask and RandomErasing data augmentation methods.
- Add Matrix NMS support.
- Add EMA (Exponential Moving Average) training support.
- Add multi-machine training; two machines achieve an average speedup of 80% over a single machine. Multi-machine training support needs further verification.
### v0.3.0(05/2020)
- Model richness enhancement:
- Add the EfficientDet-D0 model, with speed and accuracy better than competing implementations.
- Add the YOLOv4 inference model with accuracy aligned with competing implementations; add YOLOv4 fine-tuning on the Pascal VOC dataset, with accuracy of 85.5%.
- Add the MobileNetV3 backbone to YOLOv3, with COCO accuracy of 31.6%.
- Add the anchor-free model FCOS, with accuracy better than competing implementations.
- Add the anchor-free model CornerNet-Squeeze, with accuracy better than competing implementations; the optimized model reaches 38.2% COCO accuracy (+3.7%) and is 5% faster than YOLOv3-DarkNet53.
- Add the server-side practical object detection model CascadeRCNN-ResNet50vd, whose speed and accuracy are better than the competing EfficientDet.
- Launch three mobile models:
- SSDLite series: SSDLite-MobileNetV3 small/large models, with better accuracy than competing models.
- YOLOv3 mobile solution: the YOLOv3-MobileNetV3 model is 3.5x faster after compression, leading competing SSDLite models in both speed and accuracy.
- RCNN mobile solution: CascadeRCNN-MobileNetV3, after a series of optimizations, is released with 320x320 and 640x640 input resolutions, offering a good speed/accuracy trade-off.
- Inference deployment refactoring:
- Add a Python inference deployment workflow supporting RCNN, YOLO, SSD, RetinaNet and face models, with video inference support.
- Refactor C++ inference deployment to improve ease of use.
- Ease of use and functional components:
- Add AutoAugment data augmentation.
- Upgrade the detection library documentation structure.
- Support automatic shape matching in transfer learning.
- Optimize memory footprint during mask branch evaluation.
### v0.2.0(02/2020)
- New models:
- Add the CBResNet model.
- Add the LibraRCNN model.
- Further improve YOLOv3 accuracy: 43.2% on COCO data, 1.4% higher than the previous version.
- New basic modules:
- Backbone: add CBResNet.
- Loss module: the YOLOv3 loss supports fine-grained OP combinations.
- Regularization module: add the DropBlock module.
- Function optimization and improvement:
- Accelerate YOLOv3 data preprocessing and increase the overall training speed by 40%.
- Optimize data preprocessing logic to improve ease of use.
- Add face detection inference benchmark data.
- Add a Python API inference example for the C++ inference engine.
- Detection model compression:
- Pruning: release the MobileNet-YOLOv3 pruning scheme and models (VOC data: FLOPs -69.6%, mAP +1.4%; COCO data: FLOPs -28.8%, mAP +0.9%); release the ResNet50vd-dcn-YOLOv3 pruning scheme and model (COCO data: FLOPs -18.4%, mAP +0.8%).
- Distillation: release the MobileNet-YOLOv3 distillation scheme and models (VOC data: mAP +2.8%; COCO data: mAP +2.1%).
- Quantization: release quantized models of YOLOv3-MobileNet and BlazeFace.
- Pruning + distillation: release the MobileNet-YOLOv3 pruning + distillation scheme and model (COCO data: FLOPs -69.6%, TensorRT inference 64.5% faster, mAP -0.3%); release the ResNet50vd-dcn-YOLOv3 pruning + distillation scheme and model (COCO data: FLOPs -43.7%, TensorRT inference 24.0% faster, mAP +0.6%).
- NAS: open-source the complete BlazeFace-NAS search solution.
- Inference deployment:
- Integrate TensorRT and support FP16, FP32 and INT8 quantized inference acceleration.
- Documents:
- Add detailed documentation of the data preprocessing module and of implementing a custom data Reader.
- Add documentation on how to add new algorithm models.
- Documentation is deployed to the website: https://paddledetection.readthedocs.io
### 12/2019
- Add Res2Net model.
- Add HRNet model.
- Add GIoU loss and DIoU loss.
### 21/11/2019
- Add CascadeClsAware RCNN model.
- Add CBNet, ResNet200 and Non-local model.
- Add SoftNMS.
- Add Open Images V5 dataset and Objects365 dataset models.
### 10/2019
- Added enhanced YOLOv3 model with accuracy up to 41.4%.
- Added Face detection models BlazeFace and Faceboxes.
- Enrich COCO-based models, with accuracy up to 51.9%.
- Add CA-Cascade-RCNN, one of the best single models used to win the Objects365 2019 Challenge.
- Add pedestrian detection and vehicle detection pre-training models.
- Support FP16 training.
- Added cross-platform C++ inference deployment scheme.
- Add model compression examples.
### 2/9/2019
- Add GroupNorm model.
- Add CascadeRCNN+Mask model.
### 5/8/2019
- Add Modulated Deformable Convolution series models.
### 29/7/2019
- Add detection library Chinese document
- Fixed an issue where R-CNN series model training was evaluated simultaneously
- Add ResNext101-vd + Mask R-CNN + FPN models
- Added YOLOv3 model based on VOC dataset
### 3/7/2019
- First release of PaddleDetection Detection library and Detection model library
- Models include Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.


@@ -0,0 +1,276 @@
# 模型库和基线
# 内容
- [基础设置](#基础设置)
- [测试环境](#测试环境)
- [通用设置](#通用设置)
- [训练策略](#训练策略)
- [ImageNet预训练模型](#ImageNet预训练模型)
- [基线](#基线)
- [目标检测](#目标检测)
- [实例分割](#实例分割)
- [PaddleYOLO](#PaddleYOLO)
- [人脸检测](#人脸检测)
- [旋转框检测](#旋转框检测)
- [关键点检测](#关键点检测)
- [多目标跟踪](#多目标跟踪)
# 基础设置
## 测试环境
- Python 3.7
- PaddlePaddle 每日版本
- CUDA 10.1
- cuDNN 7.5
- NCCL 2.4.8
## 通用设置
- 所有模型均在COCO17数据集中训练和测试。
- [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5)、[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6)、[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7)和[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8)这几类模型的代码在[PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)中,**PaddleYOLO库开源协议为GPL 3.0**。
- 除非特殊说明所有ResNet骨干网络采用[ResNet-B](https://arxiv.org/pdf/1812.01187)结构。
- **推理时间(fps)**: 推理时间是在一张Tesla V100 GPU上通过`tools/eval.py`测试所有验证集得到,单位是fps(图片数/秒),cuDNN版本是7.5,耗时包括数据加载、网络前向执行和后处理,batch size是1。
## 训练策略
- 我们采用和[Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules)相同的训练策略。
- 1x 策略表示:在总batch size为8时,初始学习率为0.01,在第8和第11个epoch后学习率分别下降10倍,最终训练12个epoch。
- 2x 策略为1x策略的两倍,学习率调整的epoch节点也为1x的两倍。两种策略的对应关系可参见下面的简单示意。
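下面用一小段Python按上述定义推算1x/2x策略的总epoch数与学习率衰减节点。仅为示意,实际数值请以各模型的优化器配置文件为准:

```python
# 按上文定义推算 1x / 2x 策略的总 epoch 数与学习率衰减节点(仅为示意)
def lr_schedule(multiplier=1, base_epochs=12, milestones=(8, 11), base_lr=0.01):
    """multiplier=1 对应 1x 策略,multiplier=2 对应 2x 策略,gamma 固定为 0.1。"""
    return {
        "total_epochs": base_epochs * multiplier,
        "decay_epochs": [m * multiplier for m in milestones],
        "base_lr": base_lr,
    }

print(lr_schedule(1))  # {'total_epochs': 12, 'decay_epochs': [8, 11], 'base_lr': 0.01}
print(lr_schedule(2))  # {'total_epochs': 24, 'decay_epochs': [16, 22], 'base_lr': 0.01}
```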
## ImageNet预训练模型
Paddle提供基于ImageNet的骨干网络预训练模型。所有预训练模型均通过标准的ImageNet-1k数据集训练得到。ResNet和MobileNet等是采用余弦学习率调整策略或SSLD知识蒸馏训练得到的高精度预训练模型,可在[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)查看模型细节。
# 基线
## 目标检测
### Faster R-CNN
请参考[Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/)
### YOLOv3
请参考[YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/)
### PP-YOLOE/PP-YOLOE+
请参考[PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/)
### PP-YOLO/PP-YOLOv2
请参考[PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/)
### PicoDet
请参考[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet)
### RetinaNet
请参考[RetinaNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/retinanet/)
### Cascade R-CNN
请参考[Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
### SSD/SSDLite
请参考[SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/)
### FCOS
请参考[FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/fcos/)
### CenterNet
请参考[CenterNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/centernet/)
### TTFNet/PAFNet
请参考[TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/)
### Group Normalization
请参考[Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gn/)
### Deformable ConvNets v2
请参考[Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dcn/)
### HRNets
请参考[HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/hrnet/)
### Res2Net
请参考[Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/res2net/)
### ConvNeXt
请参考[ConvNeXt](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/convnext/)
### GFL
请参考[GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl)
### TOOD
请参考[TOOD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/tood)
### PSS-DET(RCNN-Enhance)
请参考[PSS-DET](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rcnn_enhance)
### DETR
请参考[DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/detr)
### Deformable DETR
请参考[Deformable DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/deformable_detr)
### Sparse R-CNN
请参考[Sparse R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/sparse_rcnn)
### Vision Transformer
请参考[Vision Transformer](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/vitdet)
### DINO
请参考[DINO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dino)
### YOLOX
请参考[YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolox)
### YOLOF
请参考[YOLOF](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolof)
## 实例分割
### Mask R-CNN
请参考[Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/)
### Cascade R-CNN
请参考[Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
### SOLOv2
请参考[SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/solov2/)
### QueryInst
请参考[QueryInst](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/queryinst)
## [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)
请参考[PaddleYOLO模型库](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_cn.md)
### YOLOv5
请参考[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5)
### YOLOv6(v3.0)
请参考[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6)
### YOLOv7
请参考[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7)
### YOLOv8
请参考[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8)
### RTMDet
请参考[RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/rtmdet)
## 人脸检测
请参考[人脸检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/face_detection)
### BlazeFace
请参考[BlazeFace](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/face_detection/)
## 旋转框检测
请参考[旋转框检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate)
### PP-YOLOE-R
请参考[PP-YOLOE-R](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r)
### FCOSR
请参考[FCOSR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr)
### S2ANet
请参考[S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet)
## 关键点检测
请参考[关键点检测模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint)
### PP-TinyPose
请参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose)
### HRNet
请参考[HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/hrnet)
### Lite-HRNet
请参考[Lite-HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/lite_hrnet)
### HigherHRNet
请参考[HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/higherhrnet)
## 多目标跟踪
请参考[多目标跟踪模型库](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot)
### DeepSORT
请参考[DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort)
### ByteTrack
请参考[ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/bytetrack)
### OC-SORT
请参考[OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/ocsort)
### BoT-SORT
请参考[BoT-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/botsort)
### CenterTrack
请参考[CenterTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/centertrack)
### FairMOT/MC-FairMOT
请参考[FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot)
### JDE
请参考[JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde)


@@ -0,0 +1,275 @@
# Model Zoos and Baselines
# Content
- [Basic Settings](#Basic-Settings)
- [Test Environment](#Test-Environment)
- [General Settings](#General-Settings)
- [Training strategy](#Training-strategy)
- [ImageNet pretraining model](#ImageNet-pretraining-model)
- [Baseline](#Baseline)
- [Object Detection](#Object-Detection)
- [Instance Segmentation](#Instance-Segmentation)
- [PaddleYOLO](#PaddleYOLO)
- [Face Detection](#Face-Detection)
- [Rotated Object detection](#Rotated-Object-detection)
- [KeyPoint Detection](#KeyPoint-Detection)
- [Multi Object Tracking](#Multi-Object-Tracking)
# Basic Settings
## Test Environment
- Python 3.7
- PaddlePaddle Daily version
- CUDA 10.1
- cuDNN 7.5
- NCCL 2.4.8
## General Settings
- All models were trained and tested in the COCO17 dataset.
- The codes of [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5),[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6),[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7) and [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8) can be found in [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO). Note that **the LICENSE of PaddleYOLO is GPL 3.0**.
- Unless otherwise specified, all ResNet backbones use the [ResNet-B](https://arxiv.org/pdf/1812.01187) structure.
- **Inference time (FPS)**: Inference time is measured on a Tesla V100 GPU by running `tools/eval.py` over the whole validation set, reported in FPS (images/second). The cuDNN version is 7.5; the time includes data loading, network forward execution and post-processing, with batch size 1.
## Training strategy
- We adopt the same training schedules as [Detectron](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#training-schedules).
- The 1x schedule means that, with a total batch size of 8, the initial learning rate is 0.01 and is decayed by a factor of 10 after epoch 8 and epoch 11, for 12 epochs in total.
- The 2x schedule trains twice as long as 1x, with the learning-rate decay epochs also scaled by 2; see the simple sketch below.
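As a small illustrative sketch (not part of the official tooling), the relationship between the 1x and 2x schedules described above can be computed as follows:

```python
# Illustrative only: total epochs and LR decay milestones for the 1x / 2x schedules
def lr_schedule(multiplier=1, base_epochs=12, milestones=(8, 11), base_lr=0.01):
    """multiplier=1 -> 1x schedule, multiplier=2 -> 2x schedule; gamma is fixed at 0.1."""
    return {
        "total_epochs": base_epochs * multiplier,
        "decay_epochs": [m * multiplier for m in milestones],
        "base_lr": base_lr,
    }

print(lr_schedule(1))  # {'total_epochs': 12, 'decay_epochs': [8, 11], 'base_lr': 0.01}
print(lr_schedule(2))  # {'total_epochs': 24, 'decay_epochs': [16, 22], 'base_lr': 0.01}
```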
## ImageNet pretraining model
Paddle provides backbone pretraining models based on ImageNet. All pretrained models were trained on the standard ImageNet-1k dataset. ResNet, MobileNet and others are high-accuracy pretrained models obtained with a cosine learning-rate schedule or SSLD knowledge distillation. Model details are available at [PaddleClas](https://github.com/PaddlePaddle/PaddleClas).
# Baseline
## Object Detection
### Faster R-CNN
Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/)
### YOLOv3
Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/)
### PP-YOLOE/PP-YOLOE+
Please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyoloe/)
### PP-YOLO/PP-YOLOv2
Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ppyolo/)
### PicoDet
Please refer to [PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/picodet)
### RetinaNet
Please refer to [RetinaNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/retinanet/)
### Cascade R-CNN
Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
### SSD/SSDLite
Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ssd/)
### FCOS
Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/fcos/)
### CenterNet
Please refer to [CenterNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/centernet/)
### TTFNet/PAFNet
Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/)
### Group Normalization
Please refer to [Group Normalization](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gn/)
### Deformable ConvNets v2
Please refer to [Deformable ConvNets v2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dcn/)
### HRNets
Please refer to [HRNets](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/hrnet/)
### Res2Net
Please refer to [Res2Net](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/res2net/)
### ConvNeXt
Please refer to [ConvNeXt](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/convnext/)
### GFL
Please refer to [GFL](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/gfl)
### TOOD
Please refer to [TOOD](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/tood)
### PSS-DET(RCNN-Enhance)
Please refer to [PSS-DET](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rcnn_enhance)
### DETR
Please refer to [DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/detr)
### Deformable DETR
Please refer to [Deformable DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/deformable_detr)
### Sparse R-CNN
Please refer to [Sparse R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/sparse_rcnn)
### Vision Transformer
Please refer to [Vision Transformer](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/vitdet)
### DINO
Please refer to [DINO](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/dino)
### YOLOX
Please refer to [YOLOX](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolox)
### YOLOF
Please refer to [YOLOF](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolof)
## Instance Segmentation
### Mask R-CNN
Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/)
### Cascade R-CNN
Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn)
### SOLOv2
Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/solov2/)
### QueryInst
Please refer to [QueryInst](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/queryinst)
## [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)
Please refer to [Model Zoo for PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_en.md)
### YOLOv5
Please refer to [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5)
### YOLOv6(v3.0)
Please refer to [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6)
### YOLOv7
Please refer to [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7)
### YOLOv8
Please refer to [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8)
### RTMDet
Please refer to [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/rtmdet)
## Face Detection
Please refer to [Model Zoo for Face Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/face_detection)
### BlazeFace
Please refer to [BlazeFace](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/face_detection/)
## Rotated Object detection
Please refer to [Model Zoo for Rotated Object Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate)
### PP-YOLOE-R
Please refer to [PP-YOLOE-R](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r)
### FCOSR
Please refer to [FCOSR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/fcosr)
### S2ANet
Please refer to [S2ANet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/s2anet)
## KeyPoint Detection
Please refer to [Model Zoo for KeyPoint Detection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint)
### PP-TinyPose
Please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/tiny_pose)
### HRNet
Please refer to [HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/hrnet)
### Lite-HRNet
Please refer to [Lite-HRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/lite_hrnet)
### HigherHRNet
Please refer to [HigherHRNet](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/keypoint/higherhrnet)
## Multi-Object Tracking
Please refer to [Model Zoo for Multi-Object Tracking](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot)
### DeepSORT
Please refer to [DeepSORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort)
### ByteTrack
Please refer to [ByteTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/bytetrack)
### OC-SORT
Please refer to [OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/ocsort)
### BoT-SORT
Please refer to [BoT-SORT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/botsort)
### CenterTrack
Please refer to [CenterTrack](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/centertrack)
### FairMOT/MC-FairMOT
Please refer to [FairMOT](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot)
### JDE
Please refer to [JDE](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde)


@@ -0,0 +1,407 @@
# 新增模型算法
为了让用户更好地使用PaddleDetection,本文档将介绍PaddleDetection的主要模型技术细节及应用。
## 目录
- [1.简介](#1.简介)
- [2.新增模型](#2.新增模型)
- [2.1新增网络结构](#2.1新增网络结构)
- [2.1.1新增Backbone](#2.1.1新增Backbone)
- [2.1.2新增Neck](#2.1.2新增Neck)
- [2.1.3新增Head](#2.1.3新增Head)
- [2.1.4新增Loss](#2.1.4新增Loss)
- [2.1.5新增后处理模块](#2.1.5新增后处理模块)
- [2.1.6新增Architecture](#2.1.6新增Architecture)
- [2.2新增配置文件](#2.2新增配置文件)
- [2.2.1网络结构配置文件](#2.2.1网络结构配置文件)
- [2.2.2优化器配置文件](#2.2.2优化器配置文件)
- [2.2.3Reader配置文件](#2.2.3Reader配置文件)
### 1.简介
PaddleDetection中的每一种模型对应一个文件夹。以YOLOv3为例,YOLOv3系列的模型对应于`configs/yolov3`文件夹,其中yolov3_darknet的总配置文件`configs/yolov3/yolov3_darknet53_270e_coco.yml`的内容如下:
```
_BASE_: [
'../datasets/coco_detection.yml', # 数据集配置文件,所有模型共用
'../runtime.yml', # 运行时相关配置
'_base_/optimizer_270e.yml', # 优化器相关配置
'_base_/yolov3_darknet53.yml', # yolov3网络结构配置文件
'_base_/yolov3_reader.yml', # yolov3 Reader模块配置
]
# 定义在此处的相关配置可以覆盖上述文件中的同名配置
snapshot_epoch: 5
weights: output/yolov3_darknet53_270e_coco/model_final
```
可以看到,配置文件中的模块进行了清晰的划分:除了公共的数据集配置以及运行时配置,其他配置被划分为优化器、网络结构以及Reader模块。PaddleDetection中支持丰富的优化器、学习率调整策略、预处理算子等,因此大多数情况下不需要编写优化器以及Reader相关的代码,而只需要在配置文件中配置即可。因此,新增一个模型的主要工作在于搭建网络结构。
PaddleDetection网络结构的代码在`ppdet/modeling/`中,所有网络结构以组件的形式进行定义与组合,网络结构的主要构成如下所示:
```
ppdet/modeling/
├── architectures
│ ├── faster_rcnn.py # Faster Rcnn模型
│ ├── ssd.py # SSD模型
│ ├── yolo.py # YOLOv3模型
│ │ ...
├── heads # 检测头模块
│ ├── xxx_head.py # 定义各类检测头
│ ├── roi_extractor.py #检测感兴趣区域提取
├── backbones # 基干网络模块
│ ├── resnet.py # ResNet网络
│ ├── mobilenet.py # MobileNet网络
│ │ ...
├── losses # 损失函数模块
│ ├── xxx_loss.py # 定义注册各类loss函数
├── necks # 特征融合模块
│ ├── xxx_fpn.py # 定义各种FPN模块
├── proposal_generator # anchor & proposal生成与匹配模块
│ ├── anchor_generator.py # anchor生成模块
│ ├── proposal_generator.py # proposal生成模块
│ ├── target.py # anchor & proposal的匹配函数
│ ├── target_layer.py # anchor & proposal的匹配模块
├── tests # 单元测试模块
│ ├── test_xxx.py # 对网络中的算子以及模块结构进行单元测试
├── ops.py # 封装各类PaddlePaddle物体检测相关公共检测组件/算子
├── layers.py # 封装及注册各类PaddlePaddle物体检测相关公共检测组件/算子
├── bbox_utils.py # 封装检测框相关的函数
├── post_process.py # 封装及注册后处理相关模块
├── shape_spec.py # 定义模块输出shape的类
```
![](../images/model_figure.png)
### 2.新增模型
接下来以单阶段检测器YOLOv3为例,对建立模型的过程进行详细描述。按照此思路,您可以快速搭建新的模型。
#### 2.1新增网络结构
##### 2.1.1新增Backbone
PaddleDetection中现有所有Backbone网络代码都放置在`ppdet/modeling/backbones`目录下,所以我们在其中新建`darknet.py`如下:
```python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable
@register
@serializable
class DarkNet(nn.Layer):
__shared__ = ['norm_type']
def __init__(self,
depth=53,
return_idx=[2, 3, 4],
norm_type='bn',
norm_decay=0.):
super(DarkNet, self).__init__()
# 省略内容
def forward(self, inputs):
# 省略处理逻辑
pass
@property
def out_shape(self):
# 省略内容
pass
```
然后在`backbones/__init__.py`中加入引用:
```python
from . import darknet
from .darknet import *
```
**几点说明:**
- 为了在yaml配置文件中灵活配置网络,所有Backbone需要利用`ppdet.core.workspace`里的`register`进行注册,形式请参考如上示例。此外,可以使用`serializable`以使backbone支持序列化;
- 所有的Backbone需继承`paddle.nn.Layer`类,并实现forward函数。此外,还需实现out_shape属性,定义输出的feature map的channel信息,具体可参见源码,下文也给出一个简化示意;
- `__shared__`为了实现一些参数的配置全局共享,这些参数可以被backbone、neck、head、loss等所有注册模块共享。
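下面给出`out_shape`属性的一个简化示意。其中类名、通道数与步长均为假设的示例值,`ShapeSpec`的实际用法以`ppdet/modeling/shape_spec.py`中的定义为准:

```python
# out_shape 简化示意:返回每个输出特征图的通道数与下采样步长,
# 供下游 Neck 等模块通过 from_config 自动推断输入通道(数值仅为示例)
from ppdet.modeling.shape_spec import ShapeSpec

class DarkNetOutShapeDemo(object):
    _out_channels = [256, 512, 1024]   # 假设的各层级输出通道数
    _out_strides = [8, 16, 32]         # 假设的各层级下采样步长

    @property
    def out_shape(self):
        return [
            ShapeSpec(channels=c, stride=s)
            for c, s in zip(self._out_channels, self._out_strides)
        ]

print([s.channels for s in DarkNetOutShapeDemo().out_shape])  # [256, 512, 1024]
```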
##### 2.1.2新增Neck
特征融合模块放置在`ppdet/modeling/necks`目录下,我们在其中新建`yolo_fpn.py`如下:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable
@register
@serializable
class YOLOv3FPN(nn.Layer):
__shared__ = ['norm_type']
def __init__(self,
in_channels=[256, 512, 1024],
norm_type='bn'):
super(YOLOv3FPN, self).__init__()
# 省略内容
def forward(self, blocks):
# 省略内容
pass
@classmethod
def from_config(cls, cfg, input_shape):
# 省略内容
pass
@property
def out_shape(self):
# 省略内容
pass
```
然后在`necks/__init__.py`中加入引用:
```python
from . import yolo_fpn
from .yolo_fpn import *
```
**几点说明:**
- neck模块需要使用`register`进行注册,可以使用`serializable`进行序列化;
- neck模块需要继承`paddle.nn.Layer`类,并实现forward函数。除此之外,还需要实现`out_shape`属性,用于定义输出的feature map的channel信息;还需要实现类函数`from_config`,用于从配置中推理出输入channel,并用于`YOLOv3FPN`的初始化,下文给出一个简化示意;
- neck模块可以使用`__shared__`实现一些参数的配置全局共享。
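下面给出`from_config`类函数的一个简化示意。类名仅为演示,假设`input_shape`为上游backbone的`out_shape`返回的`ShapeSpec`列表,实际实现请以`ppdet/modeling/necks/yolo_fpn.py`源码为准:

```python
# from_config 简化示意:根据上游模块的 out_shape 推断本模块输入通道,
# 返回的字典将作为构造参数用于模块初始化(类名仅为演示)
import paddle.nn as nn
from ppdet.core.workspace import register, serializable

@register
@serializable
class ToyFPN(nn.Layer):
    def __init__(self, in_channels=[256, 512, 1024]):
        super(ToyFPN, self).__init__()
        self.in_channels = in_channels

    @classmethod
    def from_config(cls, cfg, input_shape):
        # input_shape 中的每个元素都携带 channels 信息
        return {'in_channels': [s.channels for s in input_shape]}
```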
##### 2.1.3新增Head
Head模块全部存放在`ppdet/modeling/heads`目录下,我们在其中新建`yolo_head.py`如下
``` python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class YOLOv3Head(nn.Layer):
__shared__ = ['num_classes']
__inject__ = ['loss']
def __init__(self,
anchors=[[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45],[59, 119],
[116, 90], [156, 198], [373, 326]],
anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
num_classes=80,
loss='YOLOv3Loss',
iou_aware=False,
iou_aware_factor=0.4):
super(YOLOv3Head, self).__init__()
# 省略内容
def forward(self, feats, targets=None):
# 省略内容
pass
```
然后在`heads/__init__.py`中加入引用:
```python
from . import yolo_head
from .yolo_head import *
```
**几点说明:**
- Head模块需要使用`register`进行注册;
- Head模块需要继承`paddle.nn.Layer`类并实现forward函数。
- `__inject__`表示引入全局字典中已经封装好的模块,如loss等,下文给出一个简化示意。
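`__inject__`的效果可以用下面的简化示意理解:配置解析完成后,构造函数中的字符串`'YOLOv3Loss'`会被替换为全局字典中注册的同名模块实例。类名与`get_loss`的参数仅为演示,实际接口以`ppdet/modeling/heads/yolo_head.py`为准:

```python
# __inject__ 简化示意:注入的 loss 在配置解析后已是模块实例,
# Head 内部可以像普通成员一样直接调用(仅为演示)
import paddle.nn as nn
from ppdet.core.workspace import register

@register
class ToyHead(nn.Layer):
    __shared__ = ['num_classes']
    __inject__ = ['loss']

    def __init__(self, num_classes=80, loss='YOLOv3Loss'):
        super(ToyHead, self).__init__()
        self.num_classes = num_classes
        self.loss = loss  # 解析后为 YOLOv3Loss 实例,而非字符串

    def get_loss(self, head_outs, targets, anchors):
        # 训练阶段把损失计算直接转发给注入的 loss 模块
        return self.loss(head_outs, targets, anchors)
```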
##### 2.1.4新增Loss
Loss模块全部存放在`ppdet/modeling/losses`目录下,我们在其中新建`yolo_loss.py`如下:
```python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class YOLOv3Loss(nn.Layer):
__inject__ = ['iou_loss', 'iou_aware_loss']
__shared__ = ['num_classes']
def __init__(self,
num_classes=80,
ignore_thresh=0.7,
label_smooth=False,
downsample=[32, 16, 8],
scale_x_y=1.,
iou_loss=None,
iou_aware_loss=None):
super(YOLOv3Loss, self).__init__()
# 省略内容
def forward(self, inputs, targets, anchors):
# 省略内容
pass
```
然后在`losses/__init__.py`中加入引用:
```python
from . import yolo_loss
from .yolo_loss import *
```
**几点说明:**
- loss模块需要使用`register`进行注册;
- loss模块需要继承`paddle.nn.Layer`类并实现forward函数。
- 可以使用`__inject__`表示引入全局字典中已经封装好的模块,使用`__shared__`可以实现一些参数的配置全局共享。
##### 2.1.5新增后处理模块
后处理模块定义在`ppdet/modeling/post_process.py`中,其中定义了`BBoxPostProcess`类来进行后处理操作,如下所示:
``` python
from ppdet.core.workspace import register
@register
class BBoxPostProcess(object):
__shared__ = ['num_classes']
__inject__ = ['decode', 'nms']
def __init__(self, num_classes=80, decode=None, nms=None):
# 省略内容
pass
def __call__(self, head_out, rois, im_shape, scale_factor):
# 省略内容
pass
```
**几点说明:**
- 后处理模块需要使用`register`进行注册
- `__inject__`注入了全局字典中封装好的模块,如decode和nms等。decode和nms定义在`ppdet/modeling/layers.py`中,下文给出一个简化示意。
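下面给出`decode`与`nms`协同工作方式的简化示意。类名、调用参数与返回值均为示意性的假设写法,实际以`ppdet/modeling/post_process.py`与`ppdet/modeling/layers.py`中的实现为准:

```python
# 后处理简化示意:decode(如 YOLOBox)先把网络输出解码为检测框与得分,
# nms(如 MultiClassNMS)再做去重;二者均由 __inject__ 注入(仅为示意)
from ppdet.core.workspace import register

@register
class ToyBBoxPostProcess(object):
    __shared__ = ['num_classes']
    __inject__ = ['decode', 'nms']

    def __init__(self, num_classes=80, decode=None, nms=None):
        self.num_classes = num_classes
        self.decode = decode
        self.nms = nms

    def __call__(self, head_out, rois, im_shape, scale_factor):
        bboxes, scores = self.decode(head_out, rois, im_shape, scale_factor)
        return self.nms(bboxes, scores, self.num_classes)
```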
##### 2.1.6新增Architecture
所有architecture网络代码都放置在`ppdet/modeling/architectures`目录下,`meta_arch.py`中定义了`BaseArch`类,代码如下:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class BaseArch(nn.Layer):
def __init__(self):
super(BaseArch, self).__init__()
def forward(self, inputs):
self.inputs = inputs
self.model_arch()
if self.training:
out = self.get_loss()
else:
out = self.get_pred()
return out
def model_arch(self, ):
pass
def get_loss(self, ):
raise NotImplementedError("Should implement get_loss method!")
def get_pred(self, ):
raise NotImplementedError("Should implement get_pred method!")
```
所有的architecture需要继承`BaseArch`类,如`yolo.py`中的`YOLOv3`定义如下:
``` python
@register
class YOLOv3(BaseArch):
__category__ = 'architecture'
__inject__ = ['post_process']
def __init__(self,
backbone='DarkNet',
neck='YOLOv3FPN',
yolo_head='YOLOv3Head',
post_process='BBoxPostProcess'):
super(YOLOv3, self).__init__()
self.backbone = backbone
self.neck = neck
self.yolo_head = yolo_head
self.post_process = post_process
@classmethod
def from_config(cls, cfg, *args, **kwargs):
# 省略内容
pass
def get_loss(self):
# 省略内容
pass
def get_pred(self):
# 省略内容
pass
```
**几点说明:**
- 所有的architecture需要使用`register`进行注册
- 在组建一个完整的网络时必须要设定`__category__ = 'architecture'`来表示一个完整的物体检测模型;
- backbone、neck、yolo_head以及post_process等检测组件传入到architecture中组成最终的网络。像这样将检测模块化,提升了检测模型的复用性,可以通过组合不同的检测组件得到多个模型;
- from_config类函数实现了模块间组合时channel的自动配置,下文给出一个简化示意。
#### 2.2新增配置文件
##### 2.2.1网络结构配置文件
上面详细地介绍了如何新增一个architecture接下来演示如何配置一个模型yolov3关于网络结构的配置在`configs/yolov3/_base_/`文件夹中定义,如`yolov3_darknet53.yml`定义了yolov3_darknet的网络结构其定义如下
```
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
norm_type: sync_bn
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
# use default config
# YOLOv3FPN:
YOLOv3Head:
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.005
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000
```
可以看到在配置文件中首先需要指定网络的architecturepretrain_weights指定预训练权重的url或者路径norm_type等可以作为全局参数共享。模型的各个组件自上而下依次在文件中定义与上节中的模型组件一一对应。对于一些模型组件如果采用默认的参数可以不用配置如上文中的`YOLOv3FPN`。通过改变相关配置,我们可以轻易地组合出另一个模型,比如`configs/yolov3/_base_/yolov3_mobilenet_v1.yml`将backbone从Darknet切换成MobileNet。
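配置文件与网络组件的对应关系可以用下面几行示意代码验证基于`ppdet.core.workspace`提供的接口需在PaddleDetection根目录下执行仅作演示
```python
from ppdet.core.workspace import load_config, create

# 加载完整配置_BASE_中引用的网络结构、优化器、Reader等配置会被一并合并
cfg = load_config('configs/yolov3/yolov3_darknet53_270e_coco.yml')

# 按配置中architecture字段递归创建YOLOv3网络
# backbone/neck/yolo_head/post_process等组件会依据各自的配置自动实例化
model = create(cfg.architecture)
print(type(model))  # 预期为YOLOv3类
```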
##### 2.2.2优化器配置文件
优化器配置文件定义模型使用的优化器以及学习率的调度策略目前PaddleDetection中已经集成了多种多样的优化器和学习率策略具体可参见代码`ppdet/optimizer.py`。比如yolov3的优化器配置文件定义在`configs/yolov3/_base_/optimizer_270e.yml`,其定义如下:
```
epoch: 270
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
# epoch数目
- 216
- 243
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
```
**几点说明:**
- 可以通过OptimizerBuilder.optimizer指定优化器的类型及参数目前支持的优化器可以参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)
- 可以设置LearningRate.schedulers设置不同学习率调整策略的组合PaddlePaddle目前支持多种学习率调整策略具体也可参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)。需要注意的是你需要对于PaddlePaddle中的学习率调整策略进行简单的封装具体可参考源码`ppdet/optimizer.py`。
##### 2.2.3Reader配置文件
关于Reader的配置可以参考[Reader配置文档](./READER.md#5.配置及运行)。
> 看过此文档您应该对PaddleDetection中模型搭建与配置有了一定经验结合源码会理解的更加透彻。关于模型技术如您有其他问题或建议请给我们提issue我们非常欢迎您的反馈。

View File

@@ -0,0 +1,409 @@
# How to Create Model Algorithm
In order to make better use of PaddleDetection, this document introduces the main technical details and applications of the models in PaddleDetection.
## Directory
- [How to Create Model Algorithm](#how-to-create-model-algorithm)
- [Directory](#directory)
- [1. Introduction](#1-introduction)
- [2. Create Model](#2-create-model)
- [2.1 Create Model Structure](#21-create-model-structure)
- [2.1.1 Create Backbone](#211-create-backbone)
- [2.1.2 Create Neck](#212-create-neck)
- [2.1.3 Create Head](#213-create-head)
- [2.1.4 Create Loss](#214-create-loss)
- [2.1.5 Create Post-processing Module](#215-create-post-processing-module)
- [2.1.6 Create Architecture](#216-create-architecture)
- [2.2 Create Configuration File](#22-create-configuration-file)
- [2.2.1 Network Structure Configuration File](#221-network-structure-configuration-file)
- [2.2.2 Optimizer configuration file](#222-optimizer-configuration-file)
- [2.2.3 Reader Configuration File](#223-reader-configuration-file)
### 1. Introduction
Each model in PaddleDetection corresponds to a folder. In the case of YOLOv3, models in the YOLOv3 family correspond to the `configs/yolov3` folder, and the overall configuration file of YOLOv3-DarkNet is `configs/yolov3/yolov3_darknet53_270e_coco.yml`:
```
_BASE_: [
'../datasets/coco_detection.yml', # Dataset configuration file shared by all models
'../runtime.yml', # Runtime configuration
'_base_/optimizer_270e.yml', # Optimizer related configuration
'_base_/yolov3_darknet53.yml', # yolov3 Network structure configuration file
'_base_/yolov3_reader.yml', # yolov3 Reader module configuration
]
# The relevant configuration defined here can override the configuration of the same name in the above file
snapshot_epoch: 5
weights: output/yolov3_darknet53_270e_coco/model_final
```
As you can see, apart from the common dataset configuration and runtime configuration, the modules in the configuration file are clearly divided into the optimizer, network structure and reader modules. PaddleDetection supports a rich set of optimizers, learning rate schedules, preprocessing operators and so on, so most of the time you do not need to write optimizer- or reader-related code but only configure them in the configuration file. Therefore, the main work in adding a new model is building the network structure.
In `ppdet/modeling/`, all of the PaddleDetection network structures are defined and combined in the form of components. The main components of the network structure are as follows:
```
ppdet/modeling/
├── architectures
│ ├── faster_rcnn.py # Faster Rcnn model
│ ├── ssd.py # SSD model
│ ├── yolo.py # YOLOv3 model
│ │ ...
├── heads # detection head module
│ ├── xxx_head.py # define various detection heads
│ ├── roi_extractor.py # detection of region of interest extraction
├── backbones # backbone network module
│ ├── resnet.py # ResNet network
│ ├── mobilenet.py # MobileNet network
│ │ ...
├── losses # loss function module
│ ├── xxx_loss.py # define and register various loss functions
├── necks # feature fusion module
│ ├── xxx_fpn.py # define various FPN modules
├── proposal_generator # anchor & proposal generate and match modules
│ ├── anchor_generator.py # anchor generate modules
│ ├── proposal_generator.py # proposal generate modules
│ ├── target.py # anchor & proposal Matching function
│ ├── target_layer.py # anchor & proposal Matching function
├── tests # unit test module
│ ├── test_xxx.py # the operator and module structure in the network are unit tested
├── ops.py # encapsulates all kinds of common detection components/operators related to the detection of PaddlePaddle objects
├── layers.py # encapsulates and register all kinds of PaddlePaddle object detection related public detection components/operators
├── bbox_utils.py # encapsulates the box-related functions
├── post_process.py # encapsulate and process related modules after registration
├── shape_spec.py # defines a class for the module to output shape
```
![](../images/model_figure.png)
### 2. Create Model
Next, the modeling process is described in detail by taking the single-stage detector YOLOv3 as an example, so that you can quickly build a new model according to this idea.
#### 2.1 Create Model Structure
##### 2.1.1 Create Backbone
All existing Backbone network code in PaddleDetection is placed under `ppdet/modeling/backbones` directory, so we created `darknet.py` as follows:
```python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable
@register
@serializable
class DarkNet(nn.Layer):
__shared__ = ['norm_type']
def __init__(self,
depth=53,
return_idx=[2, 3, 4],
norm_type='bn',
norm_decay=0.):
super(DarkNet, self).__init__()
# Omit the content
def forward(self, inputs):
        # Omit the processing logic
pass
@property
def out_shape(self):
# Omit the content
pass
```
Then add a reference to `backbones/__init__.py`:
```python
from . import darknet
from .darknet import *
```
**A few notes:**
- To flexibly configure networks in the YAML configuration file, all backbone modules need to be registered in `ppdet.core.workspace` as shown in the preceding example. In addition, `serializable` can be used to enable the backbone to support serialization;
- All backbones need to inherit the `paddle.nn.Layer` class and implement the forward function. In addition, the `out_shape` property needs to be implemented to define the channel information of the output feature maps; for details, please refer to the source code.
- `__shared__` realizes global sharing of configuration parameters: these parameters can be shared by all registered modules, such as backbone, neck, head, and loss.
##### 2.1.2 Create Neck
The feature fusion module is placed under the `ppdet/modeling/necks` directory and we create the following `yolo_fpn.py`:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register, serializable
@register
@serializable
class YOLOv3FPN(nn.Layer):
__shared__ = ['norm_type']
def __init__(self,
in_channels=[256, 512, 1024],
norm_type='bn'):
super(YOLOv3FPN, self).__init__()
# Omit the content
def forward(self, blocks):
# Omit the content
pass
@classmethod
def from_config(cls, cfg, input_shape):
# Omit the content
pass
@property
def out_shape(self):
# Omit the content
pass
```
Then add a reference to `necks/__init__.py`:
```python
from . import yolo_fpn
from .yolo_fpn import *
```
**A few notes:**
- The neck module needs to be registered with `register` and can be serialized with `serializable`.
- The neck module needs to inherit the `paddle.nn.Layer` class and implement the forward function. In addition, the `out_shape` attribute needs to be implemented to define the channel information of the output feature map, and the class function `from_config` needs to be implemented to deduce the input channel in the configuration file and initialize `YOLOv3FPN`.
- The neck module can use `__shared__` to implement global sharing of configuration parameters.
##### 2.1.3 Create Head
The head module is all stored in the `ppdet/modeling/heads` directory, where we create `yolo_head.py` as follows
``` python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class YOLOv3Head(nn.Layer):
__shared__ = ['num_classes']
__inject__ = ['loss']
def __init__(self,
anchors=[[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45],[59, 119],
[116, 90], [156, 198], [373, 326]],
anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
num_classes=80,
loss='YOLOv3Loss',
iou_aware=False,
iou_aware_factor=0.4):
super(YOLOv3Head, self).__init__()
# Omit the content
def forward(self, feats, targets=None):
# Omit the content
pass
```
Then add a reference to `heads/__init__.py`:
```python
from . import yolo_head
from .yolo_head import *
```
**A few notes:**
- The head module needs to be registered with `register`.
- The head module needs to inherit the `paddle.nn.Layer` class and implement the forward function.
- `__inject__` injects modules that have already been encapsulated in the global dictionary, such as the loss module (see the sketch below).
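The snippet below is a minimal sketch (not the full implementation) of how the injected `loss` module is usually consumed inside the head's forward; `self.yolo_outputs` stands for the per-level prediction convolutions, whose construction is omitted here:
```python
def forward(self, feats, targets=None):
    # run the per-level prediction convs (construction omitted) on the FPN features
    yolo_outputs = [conv(feat) for conv, feat in zip(self.yolo_outputs, feats)]
    if self.training:
        # the injected loss module ('YOLOv3Loss' by default) is called directly
        return self.loss(yolo_outputs, targets, self.anchors)
    return yolo_outputs
```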
##### 2.1.4 Create Loss
The loss modules are all stored under the `ppdet/modeling/losses` directory, where we create `yolo_loss.py` as follows:
```python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class YOLOv3Loss(nn.Layer):
__inject__ = ['iou_loss', 'iou_aware_loss']
__shared__ = ['num_classes']
def __init__(self,
num_classes=80,
ignore_thresh=0.7,
label_smooth=False,
downsample=[32, 16, 8],
scale_x_y=1.,
iou_loss=None,
iou_aware_loss=None):
super(YOLOv3Loss, self).__init__()
# Omit the content
def forward(self, inputs, targets, anchors):
# Omit the content
pass
```
Then add a reference to `losses/__init__.py`:
```python
from . import yolo_loss
from .yolo_loss import *
```
**A few notes:**
- The loss module needs to be registered with `register`.
- The loss module needs to inherit the `paddle.nn.Layer` class and implement the forward function.
- `__inject__` can be used to inject modules that have already been encapsulated in the global dictionary, and `__shared__` can be used to share some configuration parameters globally.
##### 2.1.5 Create Post-processing Module
The post-processing module is defined in `ppdet/modeling/post_process.py`, where the `BBoxPostProcess` class is defined for post-processing operations, as follows:
``` python
from ppdet.core.workspace import register
@register
class BBoxPostProcess(object):
__shared__ = ['num_classes']
__inject__ = ['decode', 'nms']
def __init__(self, num_classes=80, decode=None, nms=None):
# Omit the content
pass
def __call__(self, head_out, rois, im_shape, scale_factor):
# Omit the content
pass
```
**A few notes:**
- Post-processing modules need to be registered with `register`
- `__inject__` injects modules that have been encapsulated in the global dictionary, such as `decode` and `nms`, which are defined in `ppdet/modeling/layers.py`.
##### 2.1.6 Create Architecture
All architecture network code is placed in `ppdet/modeling/architectures` directory, `meta_arch.py` defines the `BaseArch` class, the code is as follows:
``` python
import paddle.nn as nn
from ppdet.core.workspace import register
@register
class BaseArch(nn.Layer):
def __init__(self):
super(BaseArch, self).__init__()
def forward(self, inputs):
self.inputs = inputs
self.model_arch()
if self.training:
out = self.get_loss()
else:
out = self.get_pred()
return out
def model_arch(self, ):
pass
def get_loss(self, ):
raise NotImplementedError("Should implement get_loss method!")
def get_pred(self, ):
raise NotImplementedError("Should implement get_pred method!")
```
All architectures need to inherit from the `BaseArch` class; for example, `YOLOv3` in `yolo.py` is defined as follows:
``` python
@register
class YOLOv3(BaseArch):
__category__ = 'architecture'
__inject__ = ['post_process']
def __init__(self,
backbone='DarkNet',
neck='YOLOv3FPN',
yolo_head='YOLOv3Head',
post_process='BBoxPostProcess'):
super(YOLOv3, self).__init__()
self.backbone = backbone
self.neck = neck
self.yolo_head = yolo_head
self.post_process = post_process
@classmethod
def from_config(cls, cfg, *args, **kwargs):
# Omit the content
pass
def get_loss(self):
# Omit the content
pass
def get_pred(self):
# Omit the content
pass
```
**A few notes:**
- All architectures need to be registered with `register`
- When constructing a complete network, `__category__ = 'architecture'` must be set to represent a complete object detection model;
- Backbone, neck, YOLO head, post-process and other detection components are passed into the architecture to form the final network. Modularizing detection like this improves the reusability of detection models, and multiple models can be obtained by combining different detection components.
- The `from_config` class function implements the automatic configuration of channels when modules are combined, as sketched below.
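A sketch of what `from_config` typically looks like for `YOLOv3` is shown below: each component is created with `create`, and the `out_shape` of the upstream module is passed as `input_shape` to the downstream module so that channels are configured automatically (simplified; refer to `ppdet/modeling/architectures/yolo.py` for the actual code):
```python
@classmethod
def from_config(cls, cfg, *args, **kwargs):
    # `create` comes from ppdet.core.workspace
    backbone = create(cfg['backbone'])
    # the neck infers its input channels from the backbone's out_shape
    neck = create(cfg['neck'], input_shape=backbone.out_shape)
    # the head infers its input channels from the neck's out_shape
    yolo_head = create(cfg['yolo_head'], input_shape=neck.out_shape)
    # the returned dict is used as the keyword arguments of YOLOv3.__init__
    return {'backbone': backbone, 'neck': neck, 'yolo_head': yolo_head}
```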
#### 2.2 Create Configuration File
##### 2.2.1 Network Structure Configuration File
The configuration of the yolov3 network structure is defined in the `configs/yolov3/_base_/` folder. For example, `yolov3_darknet53.yml` defines the network structure of Yolov3 Darknet as follows:
```
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
norm_type: sync_bn
YOLOv3:
backbone: DarkNet
neck: YOLOv3FPN
yolo_head: YOLOv3Head
post_process: BBoxPostProcess
DarkNet:
depth: 53
return_idx: [2, 3, 4]
# use default config
# YOLOv3FPN:
YOLOv3Head:
anchors: [[10, 13], [16, 30], [33, 23],
[30, 61], [62, 45], [59, 119],
[116, 90], [156, 198], [373, 326]]
anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
loss: YOLOv3Loss
YOLOv3Loss:
ignore_thresh: 0.7
downsample: [32, 16, 8]
label_smooth: false
BBoxPostProcess:
decode:
name: YOLOBox
conf_thresh: 0.005
downsample_ratio: 32
clip_bbox: true
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.01
nms_threshold: 0.45
nms_top_k: 1000
```
In the configuration file, `architecture` specifies the network architecture, `pretrain_weights` specifies the URL or path of the pretrained weights, and parameters such as `norm_type` can be shared globally. The model components are defined in the file from top to bottom, corresponding one-to-one to the components in the previous section. For components that only use default parameters, such as `YOLOv3FPN` above, no configuration is needed. By changing the related configuration, we can easily compose another model; for example, `configs/yolov3/_base_/yolov3_mobilenet_v1.yml` switches the backbone from DarkNet to MobileNet.
##### 2.2.2 Optimizer configuration file
The optimizer profile defines the optimizer used by the model and the learning rate scheduling strategy. Currently, a variety of optimizers and learning rate strategies have been integrated in PaddleDetection, as described in the code `ppdet/optimizer.py`. For example, the optimizer configuration file for yolov3 is defined in `configs/yolov3/_base_/optimizer_270e.yml` as follows:
```
epoch: 270
LearningRate:
base_lr: 0.001
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
# epoch number
- 216
- 243
- !LinearWarmup
start_factor: 0.
steps: 4000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0005
type: L2
```
**A few notes:**
- `OptimizerBuilder.optimizer` specifies the type and parameters of the optimizer. For the currently supported optimizers, refer to the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)
- `LearningRate.schedulers` sets the combination of different learning rate adjustment strategies. PaddlePaddle supports a variety of learning rate schedulers; see the [PaddlePaddle official documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html). Note that PaddleDetection wraps PaddlePaddle's learning rate schedulers with a thin adapter layer; see the source code in `ppdet/optimizer.py`. A rough equivalent in the raw PaddlePaddle API is sketched below.
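For reference, the configuration above corresponds roughly to the following raw PaddlePaddle calls (a sketch only; `steps_per_epoch` and the placeholder model are assumptions, and PaddleDetection builds all of this for you from the YAML):
```python
import paddle

steps_per_epoch = 500  # assumed; in practice it depends on dataset size and batch size
base_lr = 0.001

# PiecewiseDecay: multiply the LR by gamma=0.1 at epoch 216 and again at epoch 243
boundaries = [216 * steps_per_epoch, 243 * steps_per_epoch]
values = [base_lr, base_lr * 0.1, base_lr * 0.01]
lr = paddle.optimizer.lr.PiecewiseDecay(boundaries=boundaries, values=values)

# LinearWarmup: ramp the LR from 0 to base_lr over the first 4000 steps
lr = paddle.optimizer.lr.LinearWarmup(
    learning_rate=lr, warmup_steps=4000, start_lr=0.0, end_lr=base_lr)

# Momentum optimizer with L2 regularization, matching OptimizerBuilder above
model = paddle.nn.Linear(10, 10)  # placeholder model
opt = paddle.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    weight_decay=paddle.regularizer.L2Decay(0.0005),
    parameters=model.parameters())
```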
##### 2.2.3 Reader Configuration File
For Reader configuration, see [Reader configuration documentation](./READER_en.md#5.Configuration-and-Operation).
> After reading this document, you should have some experience with model construction and configuration in PaddleDetection, and you will understand it more thoroughly together with the source code. If you have other questions or suggestions about the model technology, please send us an issue. We welcome your feedback.

View File

@@ -0,0 +1,336 @@
# 数据处理模块
## 目录
- [1.简介](#1.简介)
- [2.数据集](#2.数据集)
- [2.1COCO数据集](#2.1COCO数据集)
- [2.2Pascal VOC数据集](#2.2Pascal-VOC数据集)
- [2.3自定义数据集](#2.3自定义数据集)
- [3.数据预处理](#3.数据预处理)
- [3.1数据增强算子](#3.1数据增强算子)
- [3.2自定义数据增强算子](#3.2自定义数据增强算子)
- [4.Reader](#4.Reader)
- [5.配置及运行](#5.配置及运行)
- [5.1配置](#5.1配置)
- [5.2运行](#5.2运行)
### 1.简介
PaddleDetection的数据处理模块的所有代码逻辑在`ppdet/data/`中,数据处理模块用于加载数据并将其转换成适用于物体检测模型的训练、评估、推理所需要的格式。
数据处理模块的主要构成如下架构所示:
```bash
ppdet/data/
├── reader.py # 基于Dataloader封装的Reader模块
├── source # 数据源管理模块
│ ├── dataset.py # 定义数据源基类,各类数据集继承于此
│ ├── coco.py # COCO数据集解析与格式化数据
│ ├── voc.py # Pascal VOC数据集解析与格式化数据
│ ├── widerface.py # WIDER-FACE数据集解析与格式化数据
│ ├── category.py # 相关数据集的类别信息
├── transform # 数据预处理模块
│ ├── batch_operators.py # 定义各类基于批量数据的预处理算子
│ ├── op_helper.py # 预处理算子的辅助函数
│ ├── operators.py # 定义各类基于单张图片的预处理算子
│ ├── gridmask_utils.py # GridMask数据增强函数
│ ├── autoaugment_utils.py # AutoAugment辅助函数
├── shm_utils.py # 用于使用共享内存的辅助函数
```
### 2.数据集
数据集定义在`source`目录下,其中`dataset.py`中定义了数据集的基类`DetDataSet`,所有的数据集均继承于该基类,`DetDataSet`基类里定义了如下方法:
| 方法 | 输入 | 输出 | 备注 |
| :------------------------: | :----: | :------------: | :--------------: |
| \_\_len\_\_ || int, 数据集中样本的数量 | 过滤掉了无标注的样本 |
| \_\_getitem\_\_ | int, 样本的索引idx | dict, 索引idx对应的样本roidb | 得到transform之后的样本roidb |
| check_or_download_dataset ||| 检查数据集是否存在如果不存在则下载目前支持COCO, VOCwiderface等数据集 |
| set_kwargs | 可选参数,以键值对的形式给出 || 目前用于支持接收mixup, cutmix等参数的设置 |
| set_transform | 一系列的transform函数 || 设置数据集的transform函数 |
| set_epoch | int, 当前的epoch || 用于dataset与训练过程的交互 |
| parse_dataset ||| 用于从数据中读取所有的样本 |
| get_anno ||| 用于获取标注文件的路径 |
当一个数据集类继承自`DetDataSet`那么它只需要实现parse_dataset函数即可。parse_dataset根据数据集设置的数据集根路径dataset_dir图片文件夹image_dir 标注文件路径anno_path取出所有的样本并将其保存在一个列表roidbs中每一个列表中的元素为一个样本xxx_rec(比如coco_rec或者voc_rec)用dict表示dict中包含样本的image, gt_bbox, gt_class等字段。COCO和Pascal-VOC数据集中的xxx_rec的数据结构定义如下
```python
xxx_rec = {
'im_file': im_fname, # 一张图像的完整路径
'im_id': np.array([img_id]), # 一张图像的ID序号
'h': im_h, # 图像高度
'w': im_w, # 图像宽度
'is_crowd': is_crowd, # 是否是群落对象, 默认为0 (VOC中无此字段)
'gt_class': gt_class, # 标注框标签名称的ID序号
'gt_bbox': gt_bbox, # 标注框坐标(xmin, ymin, xmax, ymax)
'gt_poly': gt_poly, # 分割掩码此字段只在coco_rec中出现默认为None
'difficult': difficult # 是否是困难样本此字段只在voc_rec中出现默认为0
}
```
xxx_rec中的内容也可以通过`DetDataSet`的data_fields参数来控制即可以过滤掉一些不需要的字段但大多数情况下不需要修改按照`configs/datasets`中的默认配置即可。
此外在parse_dataset函数中保存了类别名到id的映射的一个字典`cname2cid`。在coco数据集中会利用[COCO API](https://github.com/cocodataset/cocoapi)从标注文件中加载数据集的类别名并设置此字典。在voc数据集中如果设置`use_default_label=False`,将从`label_list.txt`中读取类别列表反之将使用voc默认的类别列表。
#### 2.1COCO数据集
COCO数据集目前分为COCO2014和COCO2017主要由json文件和image文件组成其组织结构如下所示
```
dataset/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
│ │ ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ │ ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ │ ...
```
`source/coco.py`中定义并注册了`COCODataSet`数据集类,其继承自`DetDataSet`并实现了parse_dataset方法调用[COCO API](https://github.com/cocodataset/cocoapi)加载并解析COCO格式数据源`roidbs``cname2cid`,具体可参见`source/coco.py`源码。将其他数据集转换成COCO格式可以参考[用户数据转成COCO数据](../tutorials/data/PrepareDetDataSet.md#用户数据转成COCO数据)
#### 2.2Pascal VOC数据集
该数据集目前分为VOC2007和VOC2012主要由xml文件和image文件组成其组织结构如下所示
```
dataset/voc/
├── trainval.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│ ├── Annotations
│ ├── 001789.xml
│ │ ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ │ ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ │ ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ │ ...
│ ├── ImageSets
│ │ ...
```
在`source/voc.py`中定义并注册了`VOCDataSet`数据集,它继承自`DetDataSet`基类,并重写了`parse_dataset`方法解析VOC数据集中xml格式标注文件更新`roidbs`和`cname2cid`。将其他数据集转换成VOC格式可以参考[用户数据转成VOC数据](../tutorials/data/PrepareDetDataSet.md#用户数据转成VOC数据)
#### 2.3自定义数据集
如果COCODataSet和VOCDataSet不能满足你的需求可以通过自定义数据集的方式来加载你的数据集。只需要以下两步即可实现自定义数据集
1. 新建`source/xxx.py`,定义类`XXXDataSet`继承自`DetDataSet`基类,完成注册与序列化,并重写`parse_dataset`方法对`roidbs`与`cname2cid`更新:
```python
from ppdet.core.workspace import register, serializable
#注册并序列化
@register
@serializable
class XXXDataSet(DetDataSet):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
...
):
self.roidbs = None
self.cname2cid = None
...
def parse_dataset(self):
...
省略具体解析数据逻辑
...
self.roidbs, self.cname2cid = records, cname2cid
```
2. 在`source/__init__.py`中添加引用:
```python
from . import xxx
from .xxx import *
```
完成以上两步就将新的数据源`XXXDataSet`添加好了,你可以参考[配置及运行](#5.配置及运行)实现自定义数据集的使用。
### 3.数据预处理
#### 3.1数据增强算子
PaddleDetection中支持了种类丰富的数据增强算子有单图像数据增强算子与批数据增强算子两种方式您可选取合适的算子组合使用。单图像数据增强算子定义在`transform/operators.py`中,已支持的单图像数据增强算子详见下表:
| 名称 | 作用 |
| :---------------------: | :--------------: |
| Decode | 从图像文件或内存buffer中加载图像格式为RGB格式 |
| Permute | 假如输入是HWC顺序变成CHW |
| RandomErasingImage | 对图像进行随机擦除 |
| NormalizeImage | 对图像像素值进行归一化如果设置is_scale=True则先将像素值除以255.0, 再进行归一化。 |
| GridMask | GridMask数据增广 |
| RandomDistort | 随机扰动图片亮度、对比度、饱和度和色相 |
| AutoAugment | AutoAugment数据增广包含一系列数据增强方法 |
| RandomFlip | 随机水平翻转图像 |
| Resize | 对于图像进行resize并对标注进行相应的变换 |
| MultiscaleTestResize | 将图像重新缩放为多尺度list的每个尺寸 |
| RandomResize | 对于图像进行随机Resize可以Resize到不同的尺寸以及使用不同的插值策略 |
| RandomExpand | 将原始图片放入用像素均值填充的扩张图中,对此图进行裁剪、缩放和翻转 |
| CropWithSampling | 根据缩放比例、长宽比例生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果 |
| CropImageWithDataAchorSampling | 基于CropImage在人脸检测中随机将图片尺度变换到一定范围的尺度大大增强人脸的尺度变化 |
| RandomCrop | 原理同CropImage以随机比例与IoU阈值进行处理 |
| RandomScaledCrop | 根据长边对图像进行随机裁剪,并对标注做相应的变换 |
| Cutmix | Cutmix数据增强对两张图片做拼接 |
| Mixup | Mixup数据增强按比例叠加两张图像 |
| NormalizeBox | 对bounding box进行归一化 |
| PadBox | 如果bounding box的数量少于num_max_boxes则将零填充到bbox |
| BboxXYXY2XYWH | 将bounding box从(xmin,ymin,xmax,ymax)形式转换为(xmin,ymin,width,height)格式 |
| Pad | 将图片Pad某一个数的整数倍或者指定的size并支持指定Pad的方式 |
| Poly2Mask | Poly2Mask数据增强 |
批数据增强算子定义在`transform/batch_operators.py`中, 目前支持的算子列表如下:
| 名称 | 作用 |
| :---------------------: | :--------------: |
| PadBatch | 随机对每个batch的数据图片进行Pad操作使得batch中的图片具有相同的shape |
| BatchRandomResize | 对一个batch的图片进行resize使得batch中的图片随机缩放到相同的尺寸 |
| Gt2YoloTarget | 通过gt数据生成YOLO系列模型的目标 |
| Gt2FCOSTarget | 通过gt数据生成FCOS模型的目标 |
| Gt2TTFTarget | 通过gt数据生成TTFNet模型的目标 |
| Gt2Solov2Target | 通过gt数据生成SOLOv2模型的目标 |
**几点说明:**
- 数据增强算子的输入为sample或者samples每一个sample对应上文所说的`DetDataSet`输出的roidbs中的一个样本如coco_rec或者voc_rec
- 单图像数据增强算子(Mixup, Cutmix等除外)也可用于批数据处理中。但是单图像处理算子和批图像处理算子仍有一些差异以RandomResize和BatchRandomResize为例RandomResize会将一个Batch中的每张图片进行随机缩放但是每一张图像Resize之后的形状不尽相同BatchRandomResize则会将一个Batch中的所有图片随机缩放到相同的形状。
- 除BatchRandomResize外定义在`transform/batch_operators.py`的批数据增强算子接收的输入图像均为CHW形式所以使用这些批数据增强算子前请先使用Permute进行处理。如果用到Gt2xxxTarget算子需要将其放置在靠后的位置。NormalizeBox算子建议放置在Gt2xxxTarget之前。将这些限制条件总结下来推荐的预处理算子的顺序为
```
- XXX: {}
- ...
- BatchRandomResize: {...} # 如果不需要可以移除如果需要放置在Permute之前
- Permute: {} # 必须项
- NormalizeBox: {} # 如果需要建议放在Gt2XXXTarget之前
- PadBatch: {...} # 如果不需要可移除如果需要建议放置在Permute之后
- Gt2XXXTarget: {...} # 建议与PadBatch放置在最后的位置
```
#### 3.2自定义数据增强算子
如果需要自定义数据增强算子,那么您需要了解下数据增强算子的相关逻辑。数据增强算子基类为定义在`transform/operators.py`中的`BaseOperator`类,单图像数据增强算子与批数据增强算子均继承自这个基类。完整定义参考源码,以下代码显示了`BaseOperator`类的关键函数: apply和__call__方法
``` python
class BaseOperator(object):
...
def apply(self, sample, context=None):
return sample
def __call__(self, sample, context=None):
if isinstance(sample, Sequence):
for i in range(len(sample)):
sample[i] = self.apply(sample[i], context)
else:
sample = self.apply(sample, context)
return sample
```
__call__方法为`BaseOperator`的调用入口接收一个sample(单图像)或者多个sample(多图像)作为输入并调用apply函数对一个或者多个sample进行处理。大多数情况下你只需要继承`BaseOperator`重写apply方法或者重写__call__方法即可如下所示定义了一个XXXOp继承自BaseOperator并注册
```python
@register_op
class XXXOp(BaseOperator):
def __init__(self,...):
        super(XXXOp, self).__init__()
...
# 大多数情况下只需要重写apply方法
def apply(self, sample, context=None):
...
省略对输入的sample具体操作
...
return sample
# 如果有需要可以重写__call__方法如Mixup, Gt2XXXTarget等
# def __call__(self, sample, context=None):
# ...
# 省略对输入的sample具体操作
# ...
# return sample
```
大多数情况下只需要重写apply方法即可如`transform/operators.py`中除Mixup和Cutmix外的预处理算子。对于批处理的情况一般需要重写__call__方法如`transform/batch_operators.py`的预处理算子。
### 4.Reader
Reader相关的类定义在`reader.py`, 其中定义了`BaseDataLoader`类。`BaseDataLoader`在`paddle.io.DataLoader`的基础上封装了一层,其具备`paddle.io.DataLoader`的所有功能,并能够实现不同模型对于`DetDataset`的不同需求如可以通过对Reader进行设置以控制`DetDataset`支持Mixup, Cutmix等操作。除此之外数据预处理算子通过`Compose`类和`BatchCompose`类组合起来分别传入`DetDataset`和`paddle.io.DataLoader`中。
所有的Reader类都继承自`BaseDataLoader`类,具体可参见源码。
### 5.配置及运行
#### 5.1 配置
与数据预处理相关的模块的配置文件包含所有模型公用的Dataset的配置文件以及不同模型专用的Reader的配置文件。
##### 5.1.1 Dataset配置
关于Dataset的配置文件存在于`configs/datasets`文件夹。比如COCO数据集的配置文件如下
```
metric: COCO # 目前支持COCO, VOC, OID, WiderFace等评估标准
num_classes: 80 # num_classes数据集的类别数不包含背景类
TrainDataset:
!COCODataSet
image_dir: train2017 # 训练集的图片所在文件相对于dataset_dir的路径
anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径
dataset_dir: dataset/coco #数据集所在路径相对于PaddleDetection路径
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # 控制dataset输出的sample所包含的字段注意此为TrainDataset独有的且必须配置的字段
EvalDataset:
!COCODataSet
image_dir: val2017 # 验证集的图片所在文件夹相对于dataset_dir的路径
anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径
dataset_dir: dataset/coco # 数据集所在路径相对于PaddleDetection路径
TestDataset:
!ImageFolder
anno_path: annotations/instances_val2017.json # 标注文件所在路径仅用于读取数据集的类别信息支持json和txt格式
dataset_dir: dataset/coco # 数据集所在路径,若添加了此行,则`anno_path`路径为`dataset_dir/anno_path`,若此行不设置或去掉此行,则`anno_path`路径即为`anno_path`
```
在PaddleDetection的yml配置文件中使用`!`直接序列化模块实例(可以是函数,实例等)上述的配置文件均使用Dataset进行了序列化。
**注意:**
请运行前自行仔细检查数据集的配置路径在训练或验证时如果TrainDataset和EvalDataset的路径配置有误会提示自动下载数据集。若使用自定义数据集在推理时如果TestDataset路径配置有误会提示使用默认COCO数据集的类别信息。
##### 5.1.2 Reader配置
不同模型专用的Reader定义在每一个模型的文件夹下如yolov3的Reader配置文件定义在`configs/yolov3/_base_/yolov3_reader.yml`。一个Reader的示例配置如下
```
worker_num: 2
TrainReader:
sample_transforms:
- Decode: {}
...
batch_transforms:
...
batch_size: 8
shuffle: true
drop_last: true
use_shared_memory: true
EvalReader:
sample_transforms:
- Decode: {}
...
batch_size: 1
TestReader:
inputs_def:
image_shape: [3, 608, 608]
sample_transforms:
- Decode: {}
...
batch_size: 1
```
你可以在Reader中定义不同的预处理算子每张卡的batch_size以及DataLoader的worker_num等。
#### 5.2运行
在PaddleDetection的训练、评估和测试运行程序中都通过创建Reader迭代器。Reader在`ppdet/engine/trainer.py`中创建。下面的代码展示了如何创建训练时的Reader
``` python
from ppdet.core.workspace import create
# build data loader
self.dataset = cfg['TrainDataset']
self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
```
相应的预测以及评估时的Reader与之类似具体可参考`ppdet/engine/trainer.py`源码。
> 关于数据处理模块如您有其他问题或建议请给我们提issue我们非常欢迎您的反馈。

View File

@@ -0,0 +1,336 @@
# Data Processing Module
## Directory
- [Data Processing Module](#data-processing-module)
- [Directory](#directory)
- [1.Introduction](#1introduction)
- [2.Dataset](#2dataset)
- [2.1COCO Dataset](#21coco-dataset)
- [2.2Pascal VOC dataset](#22pascal-voc-dataset)
- [2.3Customize Dataset](#23customize-dataset)
- [3.Data preprocessing](#3data-preprocessing)
- [3.1Data Enhancement Operator](#31data-enhancement-operator)
- [3.2Custom data enhancement operator](#32custom-data-enhancement-operator)
- [4.Reader](#4reader)
- [5.Configuration and Operation](#5configuration-and-operation)
- [5.1Configuration](#51configuration)
- [5.2run](#52run)
### 1.Introduction
All the code logic of PaddleDetection's data processing module lives in `ppdet/data/`. The data processing module is used to load data and convert it into the format required for training, evaluation and inference of object detection models.
The main components of the data processing module are as follows:
```bash
ppdet/data/
├── reader.py # Reader module based on Dataloader encapsulation
├── source # Data source management module
│ ├── dataset.py # Defines the data source base class from which various datasets are inherited
│ ├── coco.py # The COCO dataset parses and formats the data
│ ├── voc.py # Pascal VOC datasets parse and format data
│ ├── widerface.py # The WIDER-FACE dataset parses and formats data
│ ├── category.py # Category information for the relevant dataset
├── transform # Data preprocessing module
│ ├── batch_operators.py # Define all kinds of preprocessing operators based on batch data
│ ├── op_helper.py # The auxiliary function of the preprocessing operator
│ ├── operators.py # Define all kinds of preprocessing operators based on single image
│ ├── gridmask_utils.py # GridMask data enhancement function
│ ├── autoaugment_utils.py # AutoAugment auxiliary function
├── shm_utils.py # Auxiliary functions for using shared memory
```
### 2.Dataset
The dataset is defined in the `source` directory, where `dataset.py` defines the base class `DetDataSet` of the dataset. All datasets inherit from this base class, and the `DetDataSet` base class defines the following methods:
| Method | Input | Output | Note |
| :-----------------------: | :------------------------------------------: | :---------------------------------------: | :-------------------------------------------------------------------------------------------------------------: |
| \_\_len\_\_ | no | int, the number of samples in the dataset | Filter out the unlabeled samples |
| \_\_getitem\_\_ | int, the index idx of the sample | dict, the sample roidb corresponding to index idx | Get the sample roidb after transform |
| check_or_download_dataset | no | no | Check whether the dataset exists, if not, download, currently support COCO, VOC, Widerface and other datasets |
| set_kwargs | Optional arguments, given as key-value pairs | no | Currently used to support receiving mixup, cutMix and other parameters |
| set_transform | A series of transform functions | no | Set the transform function of the dataset |
| set_epoch | int, current epoch | no | Interaction between dataset and training process |
| parse_dataset | no | no | Used to read all samples from the data |
| get_anno | no | no | Used to get the path to the annotation file |
When a dataset class inherits from `DetDataSet`, it only needs to implement the `parse_dataset` function. Based on the dataset root path `dataset_dir`, the image folder `image_dir` and the annotation file path `anno_path` configured for the dataset, `parse_dataset` retrieves all samples and saves them in a list `roidbs`. Each element in the list is a sample xxx_rec (such as coco_rec or voc_rec) represented by a dict, which contains fields such as image, gt_bbox and gt_class. The data structure of xxx_rec in the COCO and Pascal-VOC datasets is defined as follows:
```python
xxx_rec = {
'im_file': im_fname, # The full path to an image
'im_id': np.array([img_id]), # The ID number of an image
'h': im_h, # Height of the image
'w': im_w, # The width of the image
'is_crowd': is_crowd, # Community object, default is 0 (VOC does not have this field)
'gt_class': gt_class, # ID number of an enclosure label name
'gt_bbox': gt_bbox, # label box coordinates(xmin, ymin, xmax, ymax)
'gt_poly': gt_poly, # Segmentation mask. This field only appears in coco_rec and defaults to None
'difficult': difficult # Is it a difficult sample? This field only appears in voc_rec and defaults to 0
}
```
The contents of xxx_rec can also be controlled by the `data_fields` parameter of `DetDataSet`, that is, some unneeded fields can be filtered out; in most cases you do not need to change it, and the default configuration in `configs/datasets` will do.
In addition, `parse_dataset` also saves a dictionary `cname2cid` that maps category names to IDs. For the COCO dataset, the [COCO API](https://github.com/cocodataset/cocoapi) is used to load the category names from the annotation file and build this dictionary. For the VOC dataset, if `use_default_label=False` is set, the category list will be read from `label_list.txt`; otherwise the default VOC category list will be used.
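As an illustration only, a hypothetical `parse_dataset` of a `DetDataSet` subclass for a toy dataset could look like the sketch below; the annotation format, file names and box values are made up, but the produced `roidbs`/`cname2cid` follow the xxx_rec structure described above:
```python
import os
import numpy as np

def parse_dataset(self):
    records, cname2cid = [], {}
    # assumed label file format: one "<class_name> <class_id>" pair per line
    with open(self.get_anno()) as f:
        for line in f:
            name, cid = line.split()
            cname2cid[name] = int(cid)

    # a single hard-coded sample; a real implementation would loop over all images
    records.append({
        'im_file': os.path.join(self.dataset_dir, self.image_dir, '000001.jpg'),
        'im_id': np.array([0]),
        'h': 480,
        'w': 640,
        'is_crowd': np.array([[0]], dtype=np.int32),
        'gt_class': np.array([[cname2cid['person']]], dtype=np.int32),
        'gt_bbox': np.array([[48., 240., 195., 371.]], dtype=np.float32),
    })
    self.roidbs, self.cname2cid = records, cname2cid
```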
#### 2.1COCO Dataset
COCO datasets are currently divided into COCO2014 and COCO2017, which are mainly composed of JSON files and image files, and their organizational structure is shown as follows:
```
dataset/coco/
├── annotations
│ ├── instances_train2014.json
│ ├── instances_train2017.json
│ ├── instances_val2014.json
│ ├── instances_val2017.json
│ │ ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ │ ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ │ ...
```
The `COCODataSet` class is defined and registered in `source/coco.py`. It inherits from `DetDataSet` and implements the `parse_dataset` method, which calls the [COCO API](https://github.com/cocodataset/cocoapi) to load and parse COCO-format data sources into `roidbs` and `cname2cid`; see the `source/coco.py` source code for details. To convert other datasets to COCO format, refer to [Convert User Data to COCO Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-coco-data)
#### 2.2Pascal VOC dataset
The dataset is currently divided into VOC2007 and VOC2012, mainly composed of XML files and image files, and its organizational structure is shown as follows:
```
dataset/voc/
├── trainval.txt
├── test.txt
├── label_list.txt (optional)
├── VOCdevkit/VOC2007
│ ├── Annotations
│ ├── 001789.xml
│ │ ...
│ ├── JPEGImages
│ ├── 001789.jpg
│ │ ...
│ ├── ImageSets
│ | ...
├── VOCdevkit/VOC2012
│ ├── Annotations
│ ├── 2011_003876.xml
│ │ ...
│ ├── JPEGImages
│ ├── 2011_003876.jpg
│ │ ...
│ ├── ImageSets
│ │ ...
```
The `VOCDataSet` dataset is defined and registered in `source/voc.py` . It inherits the `DetDataSet` base class and rewrites the `parse_dataset` method to parse XML annotations in the VOC dataset. Update `roidbs` and `cname2cid`. To convert other datasets to VOC format, refer to [User Data to VOC Data](../tutorials/data/PrepareDetDataSet_en.md#convert-user-data-to-voc-data)
#### 2.3Customize Dataset
If the COCO dataset and VOC dataset do not meet your requirements, you can load your dataset by customizing it. There are only two steps to implement a custom dataset
1. Create `source/xxx.py`, define a class `XXXDataSet` that extends the `DetDataSet` base class, complete registration and serialization, and override the `parse_dataset` method to update `roidbs` and `cname2cid`:
```python
from ppdet.core.workspace import register, serializable
#Register and serialize
@register
@serializable
class XXXDataSet(DetDataSet):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
...
):
self.roidbs = None
self.cname2cid = None
...
def parse_dataset(self):
...
Omit concrete parse data logic
...
self.roidbs, self.cname2cid = records, cname2cid
```
2. Add a reference to `source/__init__.py`:
```python
from . import xxx
from .xxx import *
```
After completing the above two steps, the new data source `XXXDataSet` is added; you can refer to [Configuration and Operation](#5configuration-and-operation) to use your custom dataset.
### 3.Data preprocessing
#### 3.1Data Enhancement Operator
A variety of data enhancement operators are supported in PaddleDetection, including single image data enhancement operator and batch data enhancement operator. You can choose suitable operators to use in combination. Single image data enhancement operators are defined in `transform/operators.py`. The supported single image data enhancement operators are shown in the following table:
| Name | Function |
| :----------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Decode | Loads an image from an image file or memory buffer in RGB format |
| Permute | If the input is HWC, the sequence changes to CHW |
| RandomErasingImage | Random erasure of the image |
| NormalizeImage | The pixel values of the image are normalized. If `is_scale=True` is set, the pixel values are divided by 255.0 before normalization. |
| GridMask | GridMask data is augmented |
| RandomDistort | Random disturbance of image brightness, contrast, saturation and hue |
| AutoAugment | Auto Augment data, which contains a series of data augmentation methods |
| RandomFlip | Randomly flip the image horizontally |
| Resize | Resize the image and transform the annotation accordingly |
| MultiscaleTestResize | Rescale the image to each size of the multi-scale list |
| RandomResize | Random Resize of images can be resized to different sizes and different interpolation strategies can be used |
| RandomExpand | Place the original image into an expanded image filled with pixel mean, crop, scale, and flip the image |
| CropWithSampling | Several candidate boxes are generated according to the scaling ratio and aspect ratio, and then the crops that meet the requirements are selected according to the intersection-over-union (IoU) between these candidate boxes and the annotation boxes |
| CropImageWithDataAchorSampling | Based on CropImage, in face detection the image scale is randomly transformed within a certain range, which greatly enhances the scale variation of faces |
| RandomCrop | The principle is the same as CropImage, which is processed with random proportion and IoU threshold |
| RandomScaledCrop | According to the long edge, the image is randomly clipped and the corresponding transformation is made to the annotations |
| Cutmix | Cutmix data enhancement, which splices two images together |
| Mixup | Mixup data enhancement, which overlays two images with a weighted ratio |
| NormalizeBox | Bounding box is normalized |
| PadBox | If the number of bounding boxes is less than `num_max_boxes`, the bounding boxes are padded with zeros |
| BboxXYXY2XYWH | Bounding box is converted from (xmin, ymin, xmax, ymax) form to (xmin, ymin, width, height) form |
| Pad | Pad the image to an integer multiple of a certain number or to a specified size, with support for specifying the Pad mode |
| Poly2Mask | Poly2Mask data enhancement |
Batch data enhancement operators are defined in `transform/batch_operators.py`. The list of operators currently supported is as follows:
| Name | Function |
| :---------------: | :------------------------------------------------------------------------------------------------------------------: |
| PadBatch | Pad operation is performed on each batch of data images randomly to make the images in the batch have the same shape |
| BatchRandomResize | Resize a batch of images so that the images in the batch are randomly scaled to the same size |
| Gt2YoloTarget | Generate the objectives of YOLO series models from GT data |
| Gt2FCOSTarget | Generate the target of the FCOS model from GT data |
| Gt2TTFTarget | Generate TTFNet model targets from GT data |
| Gt2Solov2Target | Generate targets for SOLOv2 models from GT data |
**A few notes:**
- The input of a data enhancement operator is a sample or samples, and each sample corresponds to a sample in the roidbs output by the `DetDataSet` mentioned above, such as coco_rec or voc_rec
- Single-image data enhancement operators (except Mixup, Cutmix, etc.) can also be used in batch data processing. However, there are still some differences between single-image and batch processing operators. Taking RandomResize and BatchRandomResize as an example, RandomResize randomly scales each picture in a batch, but the shapes of the images after resizing may differ, while BatchRandomResize randomly scales all images in a batch to the same shape.
- Except for BatchRandomResize, the batch data enhancement operators defined in `transform/batch_operators.py` receive input images in CHW form, so please apply Permute before using these batch data enhancement operators. If a Gt2xxxTarget operator is used, it needs to be placed near the end. The NormalizeBox operator is recommended to be placed before Gt2xxxTarget. Summarizing these constraints, the recommended order of the preprocessing operators is:
```
- XXX: {}
- ...
- BatchRandomResize: {...} # Remove it if not needed, and place it in front of Permute if necessary
  - Permute: {} # required
- NormalizeBox: {} # If necessary, it is recommended to precede Gt2XXXTarget
- PadBatch: {...} # If not, you can remove it. If necessary, it is recommended to place it behind Permute
  - Gt2XXXTarget: {...} # It is recommended to place it together with PadBatch at the end
```
#### 3.2Custom data enhancement operator
If you need to customize a data enhancement operator, you need to understand the logic of the data enhancement operators. The base class of the data enhancement operators is the `BaseOperator` class defined in `transform/operators.py`, from which both the single-image and the batch data enhancement operators inherit. Refer to the source code for the complete definition. The following code shows the key functions of the `BaseOperator` class: the `apply` and `__call__` methods.
``` python
class BaseOperator(object):
...
def apply(self, sample, context=None):
return sample
def __call__(self, sample, context=None):
if isinstance(sample, Sequence):
for i in range(len(sample)):
sample[i] = self.apply(sample[i], context)
else:
sample = self.apply(sample, context)
return sample
```
The `__call__` method is the call entry of `BaseOperator`. It receives one sample (a single image) or multiple samples (multiple images) as input and calls the `apply` function to process them. In most cases, you simply inherit from `BaseOperator` and override the `apply` method, or override the `__call__` method when necessary. As shown below, an `XXXOp` that inherits from `BaseOperator` is defined and registered:
```python
@register_op
class XXXOp(BaseOperator):
def __init__(self,...):
        super(XXXOp, self).__init__()
...
# In most cases, you just need to override the Apply method
def apply(self, sample, context=None):
...
        The specific operations on the input sample are omitted
...
return sample
# If necessary, override call methods such as Mixup, Gt2XXXTarget, etc
# def __call__(self, sample, context=None):
# ...
# The specific operation on the input sample is omitted
# ...
# return sample
```
In most cases, you only need to override the `apply` method, as the preprocessing operators in `transform/operators.py` do, except for Mixup and Cutmix. For batch processing, it is generally necessary to override the `__call__` method, as the preprocessing operators in `transform/batch_operators.py` do.
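As a concrete, hypothetical example, the operator below converts the image to grayscale with a given probability by overriding only `apply`; the operator name `RandomGray` and its default probability are assumptions for illustration, and the image is assumed to be an HWC RGB array as produced by `Decode`:
```python
import numpy as np
from ppdet.data.transform.operators import register_op, BaseOperator

@register_op
class RandomGray(BaseOperator):
    """Convert the image to grayscale with probability `prob` (illustrative operator)."""

    def __init__(self, prob=0.5):
        super(RandomGray, self).__init__()
        self.prob = prob

    def apply(self, sample, context=None):
        if np.random.uniform(0., 1.) < self.prob:
            img = sample['image'].astype(np.float32)
            # luminance-weighted average, broadcast back to 3 channels
            gray = img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114
            sample['image'] = np.stack([gray, gray, gray], axis=-1)
        return sample
```
Once registered (for example by adding it to `transform/operators.py`, or importing it before the configuration is parsed), such an operator can be listed in a Reader's `sample_transforms` like any built-in operator.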
### 4.Reader
The Reader class is defined in `reader.py`, where the `BaseDataLoader` class is defined. `BaseDataLoader` adds a layer of encapsulation on top of `paddle.io.DataLoader`: it has all the functions of `paddle.io.DataLoader` and can meet the different needs of different models for `DetDataset`. For example, you can configure the Reader to control whether `DetDataset` supports Mixup, Cutmix and other operations. In addition, the data preprocessing operators are composed by the `Compose` and `BatchCompose` classes and passed into `DetDataset` and `paddle.io.DataLoader`, respectively. All Reader classes inherit from the `BaseDataLoader` class; see the source code for details.
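Conceptually, `Compose` simply chains the configured sample transforms; a simplified stand-in (not the actual PaddleDetection class) looks like this:
```python
class SimpleCompose(object):
    """Simplified stand-in for Compose: apply each transform to the sample in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for t in self.transforms:
            sample = t(sample)
        return sample
```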
### 5.Configuration and Operation
#### 5.1 Configuration
The configuration files for modules related to data preprocessing contain the configuration files for Datasets common to all models and the configuration files for readers specific to different models.
##### 5.1.1 Dataset Configuration
The configuration file for the Dataset exists in the `configs/datasets` folder. For example, the COCO dataset configuration file is as follows:
```
metric: COCO # Currently supports COCO, VOC, OID, Wider Face and other evaluation standards
num_classes: 80 # num_classes: The number of classes in the dataset, excluding background classes
TrainDataset:
!COCODataSet
image_dir: train2017 # The path where the training set image resides relative to the dataset_dir
anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir
dataset_dir: dataset/coco #The path where the dataset is located relative to the PaddleDetection path
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # Controls the fields contained in the sample output of the dataset, note data_fields are unique to the TrainDataset and must be configured
EvalDataset:
!COCODataSet
image_dir: val2017 # The path where the images of the validation set reside relative to the dataset_dir
anno_path: annotations/instances_val2017.json # The path to the annotation file of the validation set relative to the dataset_dir
dataset_dir: dataset/coco # The path where the dataset is located relative to the PaddleDetection path
TestDataset:
!ImageFolder
  anno_path: annotations/instances_val2017.json # The path of the annotation file; it is only used to read the category information of the dataset. JSON and TXT formats are supported
  dataset_dir: dataset/coco # The path of the dataset. Note that if this row is added, the annotation path will be `dataset_dir/anno_path`; if it is not set or removed, the annotation path is just `anno_path`
```
In PaddleDetection's YAML configuration files, `!` is used to directly serialize module instances (which can be functions, instances, etc.); the configuration above serializes the Dataset classes in this way.
**Note:**
Please carefully check the configuration path of the dataset before running. During training or verification, if the path of TrainDataset or EvalDataset is wrong, it will download the dataset automatically. When using a user-defined dataset, if the TestDataset path is incorrectly configured during inference, the category of the default COCO dataset will be used.
##### 5.1.2 Reader configuration
The Reader configuration specific to each model is defined under that model's folder; for example, the Reader configuration file of yolov3 is defined in `configs/yolov3/_base_/yolov3_reader.yml`. An example Reader configuration is as follows:
```
worker_num: 2
TrainReader:
sample_transforms:
- Decode: {}
...
batch_transforms:
...
batch_size: 8
shuffle: true
drop_last: true
use_shared_memory: true
EvalReader:
sample_transforms:
- Decode: {}
...
batch_size: 1
TestReader:
inputs_def:
image_shape: [3, 608, 608]
sample_transforms:
- Decode: {}
...
batch_size: 1
```
In the Reader you can define different preprocessing operators, the batch_size per GPU, the worker_num of the DataLoader, and so on.
#### 5.2run
In PaddleDetection's training, evaluation and test runs, Reader iterators are created in `ppdet/engine/trainer.py`. The following code shows how to create the Reader used for training:
``` python
from ppdet.core.workspace import create
# build data loader
self.dataset = cfg['TrainDataset']
self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
```
The Readers for prediction and evaluation are created in a similar way; see the `ppdet/engine/trainer.py` source code for details.
> About the data processing module, if you have other questions or suggestions, please send us an issue, we welcome your feedback.

View File

@@ -0,0 +1,54 @@
简体中文 | [English](./README_en.md)
# 行为识别任务二次开发
在产业落地过程中应用行为识别算法不可避免地会出现希望自定义类型的行为识别的需求或是对已有行为识别模型的优化以提升在特定场景下模型的效果。鉴于行为的多样性PP-Human支持抽烟、打电话、摔倒、打架、人员闯入五种异常行为识别并根据行为的不同集成了基于视频分类、基于检测、基于图像分类、基于跟踪以及基于骨骼点的五种行为识别技术方案可覆盖90%+动作类型的识别满足各类开发需求。我们在本文档通过案例来介绍如何根据期望识别的行为来进行行为识别方案的选择以及使用PaddleDetection进行行为识别算法二次开发工作包括方案选择、数据准备、模型优化思路和新增行为的开发流程。
## 方案选择
在PaddleDetection的PP-Human中我们为行为识别提供了多种方案基于视频分类、基于图像分类、基于检测、基于跟踪以及基于骨骼点的行为识别方案以期望满足不同场景、不同目标行为的需求。对于二次开发首先我们需要确定要采用何种方案来实现行为识别的需求其核心是要通过对场景和具体行为的分析、并考虑数据采集成本等因素综合选择一个合适的识别方案。我们在这里简要列举了当前PaddleDetection中所支持的方案的优劣势和适用场景供大家参考。
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
下面以PaddleDetection目前已经支持的几个具体动作为例介绍每个动作方案的选型依据
### 吸烟
方案选择基于人体id检测的行为识别
原因:吸烟动作中具有香烟这个明显特征目标,因此我们可以认为当在某个人物的对应图像中检测到香烟时,该人物即在吸烟动作中。相比于基于视频或基于骨骼点的识别方案,训练检测模型需要采集的是图片级别而非视频级别的数据,可以明显减轻数据收集与标注的难度。此外,目标检测任务具有丰富的预训练模型资源,整体模型的效果会更有保障。
### 打电话
方案选择基于人体id分类的行为识别
原因:打电话动作中虽然有手机这个特征目标,但为了区分看手机等动作,以及考虑到在安防场景下打电话动作中会出现较多对手机的遮挡(如手对手机的遮挡、人头对手机的遮挡等等),不利于检测模型正确检测到目标。同时打电话通常持续的时间较长,且人物本身的动作不会发生太大变化,因此可以采用帧级别图像分类的策略。
此外,打电话这个动作主要可以通过上半身判别,可以采用半身图片,去除冗余信息以降低模型训练的难度。
### 摔倒
方案选择:基于人体骨骼点的行为识别
原因摔倒是一个明显的时序行为的动作可由一个人物本身进行区分具有场景无关的特性。由于PP-Human的场景定位偏向安防监控场景背景变化较为复杂且部署上需要考虑到实时性因此采用了基于骨骼点的行为识别方案以获得更好的泛化性及运行速度。
### 闯入
方案选择基于人体id跟踪的行为识别
原因:闯入识别判断行人的路径或所在位置是否在某区域内即可,与人体自身动作无关,因此只需要跟踪人体,并根据跟踪结果分析是否存在闯入行为。
### 打架
方案选择:基于视频分类的行为识别
原因与上面的动作不同打架是一个典型的多人组成的行为。因此不再通过检测与跟踪模型来提取行人及其ID而对整体视频片段进行处理。此外打架场景下各个目标间的互相遮挡极为严重关键点识别的准确性不高采用基于骨骼点的方案难以保证精度。
下面详细展开五大类方案的数据准备、模型优化和新增行为识别方法:
1. [基于人体id检测的行为识别](./idbased_det.md)
2. [基于人体id分类的行为识别](./idbased_clas.md)
3. [基于人体骨骼点的行为识别](./skeletonbased_rec.md)
4. [基于人体id跟踪的行为识别](../pphuman_mot.md)
5. [基于视频分类的行为识别](./videobased_rec.md)

View File

@@ -0,0 +1,55 @@
[简体中文](./README.md) | English
# Secondary Development for Action Recognition Task
In the process of industrial implementation, the application of action recognition algorithms will inevitably lead to the need for customized action types, or the optimization of existing action recognition models to improve their performance in specific scenarios. In view of the diversity of behaviors, PP-Human supports the identification of five abnormal behaviors: smoking, making phone calls, falling, fighting, and people intrusion. At the same time, according to the different behaviors, PP-Human integrates five action recognition technology solutions based on video classification, detection, image classification, tracking and skeleton points, which can cover 90%+ of action type recognition and meet various development needs. In this document, we use cases to introduce how to select an action recognition solution according to the expected behavior, and how to use PaddleDetection to carry out the secondary development of action recognition algorithms, including: solution selection, data preparation, model optimization and the development process for adding new actions.
## Solution Selection
In PaddleDetection's PP-Human, we provide a variety of solutions for behavior recognition: video classification, image classification, detection, tracking-based, and skeleton point-based behavior recognition solutions, in order to meet the needs of different scenes and different target behaviors. For secondary development, the first step is to determine which solution to adopt; the key is to analyze the scene and the specific behavior, and to take factors such as the cost of data collection into account, so as to choose a suitable recognition solution. The advantages, disadvantages and applicable scenarios of the solutions currently supported in PaddleDetection are briefly listed below for reference.
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
The following takes several specific actions that PaddleDetection currently supports as an example to introduce the selection basis of each action:
### Smoking
Solution selection: action recognition based on detection with human id.
Reason: The smoking action has an obvious characteristic target, the cigarette, so we can consider that when a cigarette is detected in the image corresponding to a person, that person is performing the smoking action. Compared with video-based or skeleton-based recognition schemes, training a detection model requires collecting data at the image level rather than the video level, which can significantly reduce the difficulty of data collection and labeling. In addition, the detection task has abundant pre-trained model resources, so the performance of the overall model is better guaranteed.
### Making Phone Calls
Solution selection: action recognition based on classification with human id.
Reason: Although there is a characteristic target, the mobile phone, in the phone-call action, in order to distinguish it from actions such as looking at the mobile phone, and considering that in security scenes the phone is often heavily occluded during a call (for example by the hand or the head), it is not conducive to the detection model correctly detecting the target. Meanwhile, a phone call usually lasts a long time and the person's posture does not change much, so a frame-level image classification strategy can be employed. In addition, the phone-call action can mainly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training.
### Falling
Solution selection: action recognition based on skeleton.
Reason: Falling is an obvious temporal action which can be distinguished from a single person and is scene-independent. Since PP-Human is oriented towards security monitoring scenes, where background changes are complicated and real-time inference needs to be considered in deployment, skeleton-based action recognition is adopted to obtain better generalization and running speed.
### People Intrusion
Solution selection: action recognition based on tracking with human id.
Reason: The intrusion recognition can be judged by whether the pedestrian's path or location is in a selected area, and it is unrelated to pedestrian's body action. Therefore, it is only necessary to track the human and use coordinate results to analyze whether there is intrusion behavior.
### Fighting
Solution selection: action recognition based on video classification.
Reason: Unlike the actions above, fighting is a typical multi-person action. Therefore, the detection and tracking models are no longer used to extract pedestrians and their IDs; instead, the entire video clip is processed. In addition, the mutual occlusion between targets in fighting scenes is extremely serious, so the accuracy of keypoint recognition is not high and a skeleton-based solution can hardly guarantee precision.
The following are detailed descriptions of the five major categories of solutions, covering data preparation, model optimization and how to add new actions:
1. [action recognition based on detection with human id.](./idbased_det_en.md)
2. [action recognition based on classification with human id.](./idbased_clas_en.md)
3. [action recognition based on skeleton.](./skeletonbased_rec_en.md)
4. [action recognition based on tracking with human id](../pphuman_mot_en.md)
5. [action recognition based on video classification](./videobased_rec_en.md)

View File

@@ -0,0 +1,223 @@
简体中文 | [English](./idbased_clas_en.md)
# 基于人体id的分类模型开发
## 环境准备
基于人体id的分类方案是使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md)完成环境安装,以进行后续的模型训练及使用流程。
## 数据准备
基于图像分类的行为识别方案直接对视频中的图像帧结果进行识别,因此模型训练流程与通常的图像分类模型一致。
### 数据集下载
打电话的行为识别是基于公开数据集[UAV-Human](https://github.com/SUTDCV/UAV-Human)进行训练的。请通过该链接填写相关数据集申请材料后获取下载链接。
`UAVHuman/ActionRecognition/RGBVideos`路径下包含了该数据集中RGB视频数据集每个视频的文件名即为其标注信息。
### 训练及测试图像处理
视频文件名中与行为识别相关的是`A`开头的字段即action根据它我们可以找到期望识别的动作类型数据。
- 正样本视频:以打电话为例,我们只需找到包含`A024`的文件。
- 负样本视频:除目标动作以外所有的视频。
鉴于视频数据转化为图像会有较多冗余对于正样本视频我们间隔8帧进行采样并使用行人检测模型处理为半身图像取检测框的上半部分`img = img[:H/2, :, :]`)。正样本视频中的采样得到的图像即视为正样本,负样本视频中采样得到的图像即为负样本。
**注意**: 正样本视频中并不完全符合打电话这一动作,在视频开头结尾部分会出现部分冗余动作,需要移除。
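上述间隔采样与半身裁剪的处理过程,可参考如下示意代码(其中`detector`为假设的行人检测接口,返回每个行人的`[xmin, ymin, xmax, ymax]`列表并非PaddleDetection/PaddleClas的实际API
```python
import cv2

def sample_half_body_crops(video_path, detector, interval=8):
    """每隔interval帧取一帧用行人检测框裁剪出上半身图像并返回列表。"""
    crops = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            for xmin, ymin, xmax, ymax in detector(frame):
                person = frame[int(ymin):int(ymax), int(xmin):int(xmax)]
                half = person[:person.shape[0] // 2, :, :]  # 只保留检测框的上半部分,即半身图
                crops.append(half)
        idx += 1
    cap.release()
    return crops
```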
### 标注文件准备
基于图像分类的行为识别方案是借助[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)进行模型训练的。使用该方案训练的模型,需要准备期望识别的图像数据及对应标注文件。根据[PaddleClas数据集格式说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E)准备对应的数据即可。标注文件样例如下,其中`0`,`1`分别是图片对应所属的类别:
```
# 每一行采用"空格"分隔图像路径与标注
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
...
```
此外,标签文件`phone_label_list.txt`,帮助将分类序号映射到具体的类型名称:
```
0 make_a_phone_call # 类型0
1 normal # 类型1
```
完成上述内容后,放置于`dataset`目录下,文件结构如下:
```
data/
├── images # 放置所有图片
├── phone_label_list.txt # 标签文件
├── phone_train_list.txt # 训练列表,包含图片及其对应类型
└── phone_val_list.txt # 测试列表,包含图片及其对应类型
```
## 模型优化
### 检测-跟踪模型优化
基于分类的行为识别模型效果依赖于前序的检测和跟踪效果如果实际场景中不能准确检测到行人位置或是难以正确在不同帧之间正确分配人物ID都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。
### 半身图预测
在打电话这一动作中,实际是通过上半身就能实现动作的区分的,因此在训练和预测过程中,将图像由行人全身图换为半身图。
## 新增行为
### 数据准备
参考前述介绍的内容,完成数据准备的部分,放置于`{root of PaddleClas}/dataset`下:
```
data/
├── images # 放置所有图片
├── label_list.txt # 标签文件
├── train_list.txt # 训练列表,包含图片及其对应类型
└── val_list.txt # 测试列表,包含图片及其对应类型
```
其中,训练及测试列表如下:
```
# 每一行采用"空格"分隔图像路径与标注
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
train/000004.jpg 2 # 新增的类别直接填写对应类别号即可
...
```
`label_list.txt`中需要同样对应扩展类型的名称:
```
0 make_a_phone_call # 类型0
1 Your New Action # 类型1
...
n normal # 类型n
```
### 配置文件设置
在PaddleClas中已经集成了[训练配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml),需要重点关注的设置项如下:
```yaml
# model architecture
Arch:
name: PPHGNet_tiny
class_num: 2 # 对应新增后的数量
...
# 正确设置image_root与cls_label_path保证image_root + cls_label_path中的图片路径能够正确访问图片路径
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/phone_train_list_halfbody.txt
...
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 1
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 2 # 显示topk的数量不要超过类别总数
class_id_map_file: dataset/phone_label_list.txt # 修改后的label_list.txt路径
```
### 模型训练及评估
#### 模型训练
通过如下命令启动训练:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Arch.pretrained=True
```
其中 `Arch.pretrained``True`表示使用预训练权重帮助训练。
#### 模型评估
训练好模型之后,可以通过以下命令实现对模型指标的评估。
```bash
python3 tools/eval.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=output/PPHGNet_tiny/best_model
```
其中 `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。
### 模型导出
模型导出的详细介绍请参考[这里](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model)
可以参考以下步骤实现:
```bash
python tools/export_model.py \
-c ./PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
-o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
```
然后将导出的模型重命名并加入配置文件以适配PP-Human的使用。
```bash
cd ./output_inference/PPHGNet_tiny_calling_halfbody
mv inference.pdiparams model.pdiparams
mv inference.pdiparams.info model.pdiparams.info
mv inference.pdmodel model.pdmodel
# 下载预测配置文件
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
```
至此即可使用PP-Human进行实际预测了。
### 自定义行为输出
基于人体id的分类的行为识别方案中将任务转化为对应人物的图像进行图片级别的分类。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上还需要将分类模型结果转化为最终的行为识别结果作为输出并修改可视化的显示结果。
#### 转换为行为识别结果
请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509)。
核心代码为:
```python
# 确定分类模型的最高分数输出结果
cls_id_res = 1
cls_score_res = -1.0
for cls_id in range(len(cls_result[idx])):
score = cls_result[idx][cls_id]
if score > cls_score_res:
cls_id_res = cls_id
cls_score_res = score
# Current now, class 0 is positive, class 1 is negative.
if cls_id_res == 1 or (cls_id_res == 0 and
cls_score_res < self.threshold):
# 如果分类结果不是目标行为或是置信度未达到阈值,则根据历史结果确定当前帧的行为
history_cls, life_remain, history_score = self.result_history.get(
tracker_id, [1, self.frame_life, -1.0])
cls_id_res = history_cls
cls_score_res = 1 - cls_score_res
life_remain -= 1
if life_remain <= 0 and tracker_id in self.result_history:
del (self.result_history[tracker_id])
elif tracker_id in self.result_history:
self.result_history[tracker_id][1] = life_remain
else:
self.result_history[
tracker_id] = [cls_id_res, life_remain, cls_score_res]
else:
# 分类结果属于目标行为,则使用将该结果,并记录到历史结果中
self.result_history[
tracker_id] = [cls_id_res, self.frame_life, cls_score_res]
...
```
#### 修改可视化输出
目前基于ID的行为识别是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。

View File

@@ -0,0 +1,224 @@
[简体中文](./idbased_clas.md) | English
# Development for Action Recognition Based on Classification with Human ID
## Environmental Preparation
The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please refer to [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
The model of action recognition based on classification with human id directly recognizes the image frames of the video, so the model training process is the same as that of a usual image classification model.
### Dataset Download
The action recognition of making phone calls is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the relevant application materials through this link to obtain the download link.
The RGB videos of this dataset are stored under the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation information.
### Image Processing for Training and Validation
The `A` field in the video file name (i.e. action) indicates the action type, so we can locate the video data of the action we expect to recognize from the file names.
- Positive sample video: Taking phone calls as an example, we just need to find the file containing `A024`.
- Negative sample video: All videos except the target action.
Since converting video data into images produces much redundancy, for positive sample videos we sample one frame every 8 frames, and use the pedestrian detection model to crop each sampled frame into a half-body image (take the upper half of the detection box, that is, `img = img[:H/2, :, :]`). Images sampled from positive sample videos are regarded as positive samples, and images sampled from negative sample videos are regarded as negative samples.
**Note**: Not every frame of a positive sample video contains the phone-call action; there are some redundant actions at the beginning and end of the video, which need to be removed.
### Preparation for Annotation File
The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Thus the model trained with this scheme needs to prepare the desired image data and corresponding annotation files. Please refer to [Image Classification Datasets](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/data_preparation/classification_dataset_en.md) to prepare the data. An example of an annotation file is as follows, where `0` and `1` are the corresponding categories of the image:
```
# Each line uses "space" to separate the image path and label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
...
```
Additionally, the label file `phone_label_list.txt` helps map category numbers to specific type names:
```
0 make_a_phone_call # type 0
1 normal # type 1
```
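If the sampled images are organized into one folder per class, a small script like the following can generate the list files in the format above; the folder layout and 9:1 split below are assumptions for illustration, not a fixed convention of PaddleClas.
```python
import os
import random

# Assumed layout: data/images/make_a_phone_call/*.jpg and data/images/normal/*.jpg
# Class ids follow phone_label_list.txt: 0 = make_a_phone_call, 1 = normal.
classes = {"make_a_phone_call": 0, "normal": 1}
samples = []
for name, cls_id in classes.items():
    folder = os.path.join("data/images", name)
    for fname in os.listdir(folder):
        if fname.lower().endswith((".jpg", ".png")):
            samples.append(f"images/{name}/{fname} {cls_id}")

random.seed(0)
random.shuffle(samples)
split = int(len(samples) * 0.9)  # 90% train / 10% val
with open("data/phone_train_list.txt", "w") as f:
    f.write("\n".join(samples[:split]) + "\n")
with open("data/phone_val_list.txt", "w") as f:
    f.write("\n".join(samples[split:]) + "\n")
```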
After the above steps are finished, place the data in the `dataset` directory; the file structure is as follows:
```
data/
├── images # All images
├── phone_label_list.txt # Label file
├── phone_train_list.txt # Training list, including pictures and their corresponding types
└── phone_val_list.txt # Validation list, including pictures and their corresponding types
```
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on classification with human id depends on the pre-order detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/track model optimization.
### Half-Body Prediction
In the action of making a phone call, the action classification can be achieved through the upper body image. Therefore, during the training and prediction process, the image is changed from the pedestrian full-body to half-body.
## Add New Action
### Data Preparation
Referring to the previous introduction, complete the data preparation part and place it under `{root of PaddleClas}/dataset`:
```
data/
├── images # All images
├── label_list.txt # Label file
├── train_list.txt # Training list, including pictures and their corresponding types
└── val_list.txt # Validation list, including pictures and their corresponding types
```
The training and validation list files are as follows:
```
# Each line uses "space" to separate the image path and label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
train/000004.jpg 2 # For the newly added categories, simply fill in the corresponding category number.
...
```
`label_list.txt` should also contain the names of the extended types:
```
0 make_a_phone_call # class 0
1 Your New Action # class 1
...
n normal # class n
```
### Configuration File Settings
The [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) has been integrated in PaddleClas. The settings that need attention are as follows:
```yaml
# model architecture
Arch:
name: PPHGNet_tiny
class_num: 2 # Corresponding to the number of action categories
...
# Please correctly set image_root and cls_label_path to ensure that the image_root + image path in cls_label_path can access the image correctly
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/phone_train_list_halfbody.txt
...
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 1
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 2 # Display the number of topks, do not exceed the total number of categories
class_id_map_file: dataset/phone_label_list.txt # path of label_list.txt
```
### Model Training And Evaluation
#### Model Training
Start training with the following command:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Arch.pretrained=True
```
where `Arch.pretrained=True` is to use pretrained weights to help with training.
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=output/PPHGNet_tiny/best_model
```
Where `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path where the current best weight is located. If other weights are needed, just replace the corresponding path.
#### Model Export
For the detailed introduction of model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model)
You can refer to the following steps:
```bash
python tools/export_model.py \
-c ./PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
-o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
```
Then rename the exported model and add the configuration file to suit the usage of PP-Human.
```bash
cd ./output_inference/PPHGNet_tiny_calling_halfbody
mv inference.pdiparams model.pdiparams
mv inference.pdiparams.info model.pdiparams.info
mv inference.pdmodel model.pdmodel
# Download configuration file for inference
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
```
At this point, this model can be used in PP-Human.
### Custom Action Output
In the model of action recognition based on classification with human id, the task is defined as a picture-level classification task of corresponding person. The type of the corresponding classification is finally regarded as the action type of the current stage. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the classification model results to the final action recognition results as output, and the displayed result of the visualization should be modified.
#### Convert to Action Recognition Result
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L509).
The core code is:
```python
# Get the highest score output of the classification model
cls_id_res = 1
cls_score_res = -1.0
for cls_id in range(len(cls_result[idx])):
score = cls_result[idx][cls_id]
if score > cls_score_res:
cls_id_res = cls_id
cls_score_res = score
# Current now, class 0 is positive, class 1 is negative.
if cls_id_res == 1 or (cls_id_res == 0 and
cls_score_res < self.threshold):
# If the classification result is not the target action or its confidence does not reach the threshold,
# determine the action type of the current frame according to the historical results
history_cls, life_remain, history_score = self.result_history.get(
tracker_id, [1, self.frame_life, -1.0])
cls_id_res = history_cls
cls_score_res = 1 - cls_score_res
life_remain -= 1
if life_remain <= 0 and tracker_id in self.result_history:
del (self.result_history[tracker_id])
elif tracker_id in self.result_history:
self.result_history[tracker_id][1] = life_remain
else:
self.result_history[
tracker_id] = [cls_id_res, life_remain, cls_score_res]
else:
# If the classification result belongs to the target action, use the result and record it in the historical result
self.result_history[
tracker_id] = [cls_id_res, self.frame_life, cls_score_res]
...
```
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result.

View File

@@ -0,0 +1,202 @@
简体中文 | [English](./idbased_det_en.md)
# 基于人体id的检测模型开发
## 环境准备
基于人体id的检测方案是直接使用[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md)完成环境安装,以进行后续的模型训练及使用流程。
## 数据准备
基于检测的行为识别方案中,数据准备的流程与一般的检测模型一致,详情可参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)。将图像和标注数据组织成PaddleDetection中支持的格式之一即可。
**注意** 在实际使用的预测过程中,使用的是单人图像进行预测,因此在训练过程中建议将图像裁剪为单人图像,再进行烟头检测框的标注,以提升准确率。
## 模型优化
### 检测-跟踪模型优化
基于检测的行为识别模型效果依赖于前序的检测和跟踪效果如果实际场景中不能准确检测到行人位置或是难以正确在不同帧之间正确分配人物ID都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。
### 更大的分辨率
烟头的检测在监控视角下是一个典型的小目标检测问题,使用更大的分辨率有助于提升模型整体的识别率
### 预训练模型
加入小目标场景数据集VisDrone下的预训练模型进行训练模型mAP由38.1提升到39.7。
## 新增行为
### 数据准备
参考[目标检测数据准备](../../../tutorials/data/PrepareDetDataSet.md)完成训练数据准备。
准备完成后,数据路径为
```
dataset/smoking
├── smoking # 存放所有的图片
│   ├── 1.jpg
│   ├── 2.jpg
├── smoking_test_cocoformat.json # 测试标注文件
├── smoking_train_cocoformat.json # 训练标注文件
```
以`COCO`格式为例完成后的json标注文件内容如下
```json
# images 字段包含了图片的路径、id及对应的宽高信息
"images": [
{
"file_name": "smoking/1.jpg",
"id": 0, # idid
"height": 437,
"width": 212
},
{
"file_name": "smoking/2.jpg",
"id": 1,
"height": 655,
"width": 365
},
...
# categories 字段包含了所有类别信息,如果要新增更多的检测类别,请在此处添加,示例如下
"categories": [
{
"supercategory": "cigarette",
"id": 1,
"name": "cigarette"
},
{
"supercategory": "Class_Defined_by_Yourself",
"id": 2,
"name": "Class_Defined_by_Yourself"
},
...
# annotations 字段包含了所有实例的信息包括类别、检测框坐标、id、所属图像id等信息
"annotations": [
{
"category_id": 1, # 1cigarette
"bbox": [
97.0181345931,
332.7033243081,
7.5943999555,
16.4545332369
],
"id": 0, # idid
"image_id": 0, # id
"iscrowd": 0,
"area": 124.96230648208665
},
{
"category_id": 2, # 2Class_Defined_by_Yourself
"bbox": [
114.3895698372,
221.9131122343,
25.9530363697,
50.5401234568
],
"id": 1,
"image_id": 1,
"iscrowd": 0,
"area": 1311.6696622034585
```
### 配置文件设置
参考[配置文件](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml), 其中需要关注重点如下:
```yaml
metric: COCO
num_classes: 1 # 如果新增了更多的类别,请对应修改此处
# 正确设置image_dir、anno_path、dataset_dir
# 保证dataset_dir + anno_path 能正确对应标注文件的路径
# 保证dataset_dir + image_dir + 标注文件中的图片路径可以正确对应到图片路径
TrainDataset:
!COCODataSet
image_dir: ""
anno_path: smoking_train_cocoformat.json
dataset_dir: dataset/smoking
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
image_dir: ""
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
TestDataset:
!ImageFolder
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
```
### 模型训练及评估
#### 模型训练
参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md),执行下列步骤实现
```bash
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
```
#### 模型评估
训练好模型之后,可以通过以下命令实现对模型指标的评估
```bash
# At Root of PaddleDetection
python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
```
### 模型导出
注意如果在Tensor-RT环境下预测, 请开启`-o trt=True`以获得更好的性能
```bash
# At Root of PaddleDetection
python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
```
导出模型后,可以得到:
```
ppyoloe_crn_s_80e_smoking_visdrone/
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
└── model.pdmodel
```
至此即可使用PP-Human进行实际预测了。
### 自定义行为输出
基于人体id的检测的行为识别方案中将任务转化为在对应人物的图像中检测目标特征对象。当目标特征对象被检测到时则视为行为正在发生。因此在完成自定义模型的训练及部署的基础上还需要将检测模型结果转化为最终的行为识别结果作为输出并修改可视化的显示结果。
#### 转换为行为识别结果
请对应修改[后处理函数](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338)。
核心代码为:
```python
# 解析检测模型输出,并筛选出置信度高于阈值的有效检测框。
# Current now, class 0 is positive, class 1 is negative.
action_ret = {'class': 1.0, 'score': -1.0}
box_num = np_boxes_num[idx]
boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num]
cur_box_idx += box_num
isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0)
valid_boxes = boxes[isvalid, :]
if valid_boxes.shape[0] >= 1:
# 存在有效检测框时,行为识别结果的类别和分数对应修改
action_ret['class'] = valid_boxes[0, 0]
action_ret['score'] = valid_boxes[0, 1]
# 由于动作的持续性,有效检测结果可复用一定帧数
self.result_history[
tracker_id] = [0, self.frame_life, valid_boxes[0, 1]]
else:
# 不存在有效检测框,则根据历史检测数据确定当前帧的结果
...
```
#### 修改可视化输出
目前基于ID的行为识别是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。

View File

@@ -0,0 +1,199 @@
[简体中文](./idbased_det.md) | English
# Development for Action Recognition Based on Detection with Human ID
## Environmental Preparation
The model of action recognition based on detection with human id is trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please refer to [Installation](../../../tutorials/INSTALL.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
In the scheme of action recognition based on detection with human id, the data preparation process is the same as that of a general detection model. For details, please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md). Please organize the images and annotations into one of the formats PaddleDetection supports.
**Note**: In the actual prediction process, a single person image is used for prediction. So it is recommended to crop the image into a single person image during the training process, and label the cigarette detection bounding box to improve the accuracy.
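The following sketch illustrates one way to produce such single-person crops from full frames before labeling, assuming pedestrian boxes are already available from a detector; the box format and margin are assumptions for illustration:
```python
import os
import cv2

def crop_persons(image_path, person_boxes, out_dir, margin=0.1):
    """Crop each pedestrian box (x1, y1, x2, y2) out of the frame with a small
    margin, producing single-person images to be labeled with cigarette boxes."""
    os.makedirs(out_dir, exist_ok=True)
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    stem = os.path.splitext(os.path.basename(image_path))[0]
    for i, (x1, y1, x2, y2) in enumerate(person_boxes):
        dw, dh = (x2 - x1) * margin, (y2 - y1) * margin
        x1, y1 = max(int(x1 - dw), 0), max(int(y1 - dh), 0)
        x2, y2 = min(int(x2 + dw), w), min(int(y2 + dh), h)
        cv2.imwrite(os.path.join(out_dir, f"{stem}_p{i}.jpg"), img[y1:y2, x1:x2])
```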
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on detection with human id depends on the pre-order detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/track model optimization.
### Larger resolution
From the surveillance viewpoint, cigarette detection is a typical small-object detection problem. Using a larger input resolution helps improve the overall performance of the model.
### Pretrained model
The pretrained model under the small target scene dataset VisDrone is used for training, and the mAP of the model is increased from 38.1 to 39.7.
## Add New Action
### Data Preparation
Please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md) to complete the data preparation.
When finish this step, the path will look like:
```
dataset/smoking
├── smoking # all images
│   ├── 1.jpg
│   ├── 2.jpg
├── smoking_test_cocoformat.json # Validation file
├── smoking_train_cocoformat.json # Training file
```
Taking the `COCO` format as an example, the content of the completed json annotation file is as follows:
```json
# The "images" field contains the path, id and corresponding width and height information of the images.
"images": [
{
"file_name": "smoking/1.jpg",
"id": 0, # Here id is the picture id serial number, do not duplicate
"height": 437,
"width": 212
},
{
"file_name": "smoking/2.jpg",
"id": 1,
"height": 655,
"width": 365
},
...
# The "categories" field contains all category information. If you want to add more detection categories, please add them here. The example is as follows.
"categories": [
{
"supercategory": "cigarette",
"id": 1,
"name": "cigarette"
},
{
"supercategory": "Class_Defined_by_Yourself",
"id": 2,
"name": "Class_Defined_by_Yourself"
},
...
# The "annotations" field contains information about all instances, including category, bounding box coordinates, id, image id and other information
"annotations": [
{
"category_id": 1, # Corresponding to the defined category, where 1 represents cigarette
"bbox": [
97.0181345931,
332.7033243081,
7.5943999555,
16.4545332369
],
"id": 0, # Here id is the id serial number of the instance, do not duplicate
"image_id": 0, # Here is the id serial number of the image where the instance is located, which may be duplicated. In this case, there are multiple instance objects on one image.
"iscrowd": 0,
"area": 124.96230648208665
},
{
"category_id": 2, # Corresponding to the defined category, where 2 represents Class_Defined_by_Yourself
"bbox": [
114.3895698372,
221.9131122343,
25.9530363697,
50.5401234568
],
"id": 1,
"image_id": 1,
"iscrowd": 0,
"area": 1311.6696622034585
```
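As a sketch of how such an annotation file can be assembled programmatically (the helper functions and values below are illustrative placeholders; only the structure mirrors the example above):
```python
import json

# Minimal COCO-style skeleton matching the structure shown above.
coco = {"images": [], "annotations": [], "categories": [
    {"supercategory": "cigarette", "id": 1, "name": "cigarette"},
]}

def add_image(image_id, file_name, width, height):
    coco["images"].append({"file_name": file_name, "id": image_id,
                           "height": height, "width": width})

def add_box(ann_id, image_id, category_id, x, y, w, h):
    coco["annotations"].append({"category_id": category_id,
                                "bbox": [x, y, w, h], "id": ann_id,
                                "image_id": image_id, "iscrowd": 0,
                                "area": w * h})

add_image(0, "smoking/1.jpg", 212, 437)
add_box(0, 0, 1, 97.0, 332.7, 7.6, 16.5)
with open("smoking_train_cocoformat.json", "w") as f:
    json.dump(coco, f, indent=2)
```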
### Configuration File Settings
Refer to the [Configuration File](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml); the key settings to pay attention to are as follows:
```yaml
metric: COCO
num_classes: 1 # If more categories are added, please modify here accordingly
# Set image_dir, anno_path and dataset_dir correctly
# Ensure that dataset_dir + anno_path can correctly access to the path of the annotation file
# Ensure that dataset_dir + image_dir + the image path in the annotation file can correctly access to the image path
TrainDataset:
!COCODataSet
image_dir: ""
anno_path: smoking_train_cocoformat.json
dataset_dir: dataset/smoking
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
image_dir: ""
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
TestDataset:
!ImageFolder
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
```
### Model Training And Evaluation
#### Model Training
Following [PP-YOLOE](../../../../configs/ppyoloe/README.md), start training with the following command:
```bash
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
```
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
```bash
# At Root of PaddleDetection
python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
```
#### Model Export
Note: If predicting in Tensor-RT environment, please enable `-o trt=True` for better performance.
```bash
# At Root of PaddleDetection
python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
```
After exporting the model, you can get:
```
ppyoloe_crn_s_80e_smoking_visdrone/
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
└── model.pdmodel
```
At this point, this model can be used in PP-Human.
### Custom Action Output
In the model of action recognition based on detection with human id, the task is defined as detecting a characteristic target object in the image of the corresponding person. When the target object is detected, the action is considered to be occurring. Therefore, on the basis of completing the training and deployment of the custom model, it is also necessary to convert the detection model results to the final action recognition results as output, and the displayed result of the visualization should be modified.
#### Convert to Action Recognition Result
Please modify the [postprocessing function](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pphuman/action_infer.py#L338).
The core code is:
```python
# Parse the detection model output and filter out valid detection boxes with confidence higher than a threshold.
# Current now, class 0 is positive, class 1 is negative.
action_ret = {'class': 1.0, 'score': -1.0}
box_num = np_boxes_num[idx]
boxes = det_result['boxes'][cur_box_idx:cur_box_idx + box_num]
cur_box_idx += box_num
isvalid = (boxes[:, 1] > self.threshold) & (boxes[:, 0] == 0)
valid_boxes = boxes[isvalid, :]
if valid_boxes.shape[0] >= 1:
# When there is a valid detection frame, the category and score of the behavior recognition result are modified accordingly.
action_ret['class'] = valid_boxes[0, 0]
action_ret['score'] = valid_boxes[0, 1]
# Due to the continuity of the action, valid detection results can be reused for a certain number of frames.
self.result_history[
tracker_id] = [0, self.frame_life, valid_boxes[0, 1]]
else:
# If there is no valid detection frame, the result of the current frame is determined according to the historical detection result.
...
```
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result.

View File

@@ -0,0 +1,205 @@
简体中文 | [English](./skeletonbased_rec_en.md)
# 基于人体骨骼点的行为识别
## 环境准备
基于骨骼点的行为识别方案是借助[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md)完成PaddleVideo的环境安装以进行后续的模型训练及使用流程。
## 数据准备
使用该方案训练的模型,可以参考[此文档](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)准备训练数据以适配PaddleVideo进行训练其主要流程包含以下步骤
### 数据格式说明
STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo中训练数据为采用`.npy`格式存储的`Numpy`数据,标签则可以是`.npy``.pkl`格式存储的文件。对于序列数据的维度要求为`(N,C,T,V,M)`,当前方案仅支持单人构成的行为(但视频中可以存在多人,每个人独自进行行为识别判断),即`M=1`
| 维度 | 大小 | 说明 |
| ---- | ---- | ---------- |
| N | 不定 | 数据集序列个数 |
| C | 2 | 关键点坐标维度,即(x, y) |
| T | 50 | 动作序列的时序维度(即持续帧数)|
| V | 17 | 每个人物关键点的个数,这里我们使用了`COCO`数据集的定义,具体可见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_cn.md#COCO%E6%95%B0%E6%8D%AE%E9%9B%86) |
| M | 1 | 人物个数,这里我们每个动作序列只针对单人预测 |
### 获取序列的骨骼点坐标
对于一个待标注的序列(这里序列指一个动作片段,可以是视频或有顺序的图片集合)。可以通过模型预测或人工标注的方式获取骨骼点(也称为关键点)坐标。
- 模型预测:可以直接选用[PaddleDetection KeyPoint模型系列](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) 模型库中的模型,并根据`3、训练与测试 - 部署预测 - 检测+keypoint top-down模型联合部署`中的步骤获取目标序列的17个关键点坐标。
- 人工标注:若对关键点的数量或是定义有其他需求,也可以直接人工标注各个关键点的坐标位置,注意对于被遮挡或较难标注的点,仍需要标注一个大致坐标,否则后续网络学习过程会受到影响。
当使用模型预测获取时可以参考如下步骤进行请注意此时在PaddleDetection中进行操作。
```bash
# current path is under root of PaddleDetection
# Step 1: download pretrained inference models.
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
unzip -d output_inference/ dark_hrnet_w32_256x192.zip
# Step 2: Get the keypoint coordinarys
# if your data is image sequence
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
# if your data is video
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
```
这样我们会得到一个`det_keypoint_unite_image_results.json`的检测结果文件。内容的具体含义请见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108)。
### 统一序列的时序长度
由于实际数据中每个动作的长度不一首先需要根据您的数据和实际场景预定时序长度在PP-Human中我们采用50帧为一个动作序列并对数据做以下处理
- 实际长度超过预定长度的数据随机截取一个50帧的片段
- 实际长度不足预定长度的数据补0直到满足50帧
- 恰好等于预定长度的数据: 无需处理
注意:在这一步完成后,请严格确认处理后的数据仍然包含了一个完整的行为动作,不会产生预测上的歧义,建议通过可视化数据的方式进行确认。
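该步骤的一个简化示例如下(假设 `kpts` 为形状 `(T, V, C)` 的关键点序列,仅供参考):
```python
import numpy as np

def unify_length(kpts, target_len=50):
    """kpts: (T, V, C) 的关键点序列,返回长度统一为 target_len 的序列。"""
    t = kpts.shape[0]
    if t > target_len:                       # 超长:随机截取一个 50 帧片段
        start = np.random.randint(0, t - target_len + 1)
        return kpts[start:start + target_len]
    if t < target_len:                       # 不足:末尾补 0 直到 50 帧
        pad = np.zeros((target_len - t,) + kpts.shape[1:], dtype=kpts.dtype)
        return np.concatenate([kpts, pad], axis=0)
    return kpts                              # 恰好等于预定长度:无需处理
```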
### 保存为PaddleVideo可用的文件格式
在经过前两步处理后,我们得到了每个人物动作片段的标注,此时我们已有一个列表`all_kpts`,这个列表中包含多个关键点序列片段,其中每一个片段形状为(T, V, C) (在我们的例子中即(50, 17, 2)), 下面进一步将其转化为PaddleVideo可用的格式。
- 调整维度顺序: 可通过`np.transpose``np.expand_dims`将每一个片段的维度转化为(C, T, V, M)的格式。
- 将所有片段组合并保存为一个文件
注意:这里的`class_id``int`类型,与其他分类任务类似。例如`0摔倒 1其他`
我们提供了执行该步骤的[脚本文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py),可以直接处理生成的`det_keypoint_unite_image_results.json`文件该脚本执行的内容包括解析json文件内容、前述步骤中介绍的整理训练数据及保存数据文件。
```bash
mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
cd {root of PaddleVideo}/applications/PPHuman/datasets/
python prepare_dataset.py
```
至此,我们得到了可用的训练数据(`.npy`)和对应的标注文件(`.pkl`)。
## 模型优化
### 检测-跟踪模型优化
基于骨骼点的行为识别模型效果依赖于前序的检测和跟踪效果如果实际场景中不能准确检测到行人位置或是难以正确在不同帧之间正确分配人物ID都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../pphuman_mot.md)对检测/跟踪模型进行优化。
### 关键点模型优化
骨骼点作为该方案的核心特征,对行人的骨骼点定位效果也决定了行为识别的整体效果。若发现在实际场景中对关键点坐标的识别结果有明显错误,从关键点组成的骨架图像看,已经难以辨别具体动作,可以参考[关键点检测任务二次开发](../keypoint_detection.md)对关键点模型进行优化。
### 坐标归一化处理
在完成骨骼点坐标的获取后,建议根据各人物的检测框进行归一化处理,以消除人物位置、尺度的差异给网络带来的收敛难度。
## 新增行为
基于关键点的行为识别方案中,行为识别模型使用的是[ST-GCN](https://arxiv.org/abs/1801.07455),并在[PaddleVideo训练流程](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)的基础上修改适配,完成模型训练及导出使用流程。
### 数据准备与配置文件修改
- 按照`数据准备`, 准备训练数据(`.npy`)和对应的标注文件(`.pkl`)。对应放置在`{root of PaddleVideo}/applications/PPHuman/datasets/`下。
- 参考[配置文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml), 需要重点关注的内容如下:
```yaml
MODEL: #MODEL field
framework:
backbone:
name: "STGCN"
in_channels: 2 # 此处对应数据说明中的C维表示二维坐标。
dropout: 0.5
layout: 'coco_keypoint'
data_bn: True
head:
name: "STGCNHead"
num_classes: 2 # 如果数据中有多种行为类型,需要修改此处使其与预测类型数目一致。
if_top5: False # 行为类型数量不足5时请设置为False否则会报错
...
# 请根据数据路径正确设置train/valid/test部分的数据及label路径
DATASET: #DATASET field
batch_size: 64
num_workers: 4
test_batch_size: 1
test_num_workers: 0
train:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle
file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path
label_path: "./applications/PPHuman/datasets/train_label.pkl"
valid:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
test:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
```
### 模型训练与测试
- 在PaddleVideo中使用以下命令即可开始训练
```bash
# current path is under root of PaddleVideo
python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
# 由于整个任务可能过拟合,建议同时开启验证以保存最佳模型
python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
```
- 在训练完成后,采用以下命令进行预测:
```bash
python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
```
### 模型导出
- 在PaddleVideo中通过以下命令实现模型的导出得到模型结构文件`STGCN.pdmodel`和模型权重文件`STGCN.pdiparams`,并增加配置文件:
```bash
# current path is under root of PaddleVideo
python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
-p output/STGCN/STGCN_best.pdparams \
-o output_inference/STGCN
cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
# 重命名模型文件适配PP-Human的调用
cd output_inference/STGCN
mv STGCN.pdiparams model.pdiparams
mv STGCN.pdiparams.info model.pdiparams.info
mv STGCN.pdmodel model.pdmodel
```
完成后的导出模型目录结构如下:
```
STGCN
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
├── model.pdmodel
```
至此就可以使用PP-Human进行行为识别的推理了。
**注意**:如果在训练时调整了视频序列的长度或关键点的数量,在此处需要对应修改配置文件中`INFERENCE`字段内容,以实现正确预测。
```yaml
# 序列数据的维度为(N,C,T,V,M)
INFERENCE:
name: 'STGCN_Inference_helper'
num_channels: 2 # 对应C维
window_size: 50 # 对应T维请对应调整为数据长度
vertex_nums: 17 # 对应V维请对应调整为关键点数目
person_nums: 1 # 对应M维
```
### 自定义行为输出
基于人体骨骼点的行为识别方案中,模型输出的分类结果即代表了该人物在一定时间段内行为类型。对应分类的类型最终即视为当前阶段的行为。因此在完成自定义模型的训练及部署的基础上,使用模型输出作为最终结果,修改可视化的显示结果即可。
#### 修改可视化输出
目前基于ID的行为识别是根据行为识别的结果及预定义的类别名称进行展示的。详细逻辑请见[此处](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043)。如果自定义的行为需要修改为其他的展示名称,请对应修改此处,以正确输出对应结果。

View File

@@ -0,0 +1,200 @@
[简体中文](./skeletonbased_rec.md) | English
# Skeleton-based action recognition
## Environmental Preparation
The skeleton-based action recognition is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please refer to [Installation](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/install.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
For the skeleton-based model, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare the training data adapted to PaddleVideo. The main process includes the following steps:
### Data Format Description
STGCN is a model that predicts from sequences of skeleton point coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and labels can be files stored in `.npy` or `.pkl` format. The dimension requirement for sequence data is `(N,C,T,V,M)`. The current solution only supports actions performed by a single person (there can be multiple people in the video, and each person performs action recognition separately), that is, `M=1`.
| Dim | Size | Description |
| ---- | ---- | ---------- |
| N | Not Fixed | The number of sequences in the dataset |
| C | 2 | Keypoint coordinate, i.e. (x, y) |
| T | 50 | The temporal dimension of the action sequence (i.e. the number of continuous frames)|
| V | 17 | The number of keypoints of each person, here we use the definition of the `COCO` dataset, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_en.md#description-for-coco-datasetkeypoint) |
| M | 1 | The number of persons, here we only predict a single person for each action sequence |
### Get The Skeleton Point Coordinates of The Sequence
For a sequence to be labeled (a sequence refers to an action segment, which can be a video or an ordered collection of images), the coordinates of skeleton points (also known as keypoints) can be obtained through model prediction or manual annotation.
- Model prediction: You can directly select the model in the [PaddleDetection KeyPoint Models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/README_en.md) and according to `3, training and testing - Deployment Prediction - Detect + keypoint top-down model joint deployment` to get the 17 keypoint coordinates of the target sequence.
When using the model to predict and obtain the coordinates, you can refer to the following steps, please note that the operation in PaddleDetection at this time.
```bash
# current path is under root of PaddleDetection
# Step 1: download pretrained inference models.
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
unzip -d output_inference/ dark_hrnet_w32_256x192.zip
# Step 2: Get the keypoint coordinarys
# if your data is image sequence
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
# if your data is video
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
```
We can get a detection result file named `det_keypoint_unite_image_results.json`. The detail of content can be seen at [Here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
### Uniform Sequence Length
Since the length of each action in the actual data is different, the first step is to pre-determine the time sequence length according to your data and the actual scene (in PP-Human, we use 50 frames as an action sequence), and do the following processing to the data:
- If the actual length exceeds the predetermined length, a 50-frame segment will be randomly intercepted
- Data whose actual length is less than the predetermined length: fill with 0 until 50 frames are met
- Data exactly equal to the predetermined length: no processing required
Note: After this step is completed, please strictly confirm that the processed data contains a complete action, and there will be no ambiguity in prediction. It is recommended to confirm by visualizing the data.
### Save to PaddleVideo usable formats
After the first two steps of processing, we get the annotation of each character action fragment. At this time, we have a list `all_kpts`, which contains multiple keypoint sequence fragments, each one has a shape of (T, V, C) (in our case (50, 17, 2)), which is further converted into a format usable by PaddleVideo.
- Adjust dimension order: `np.transpose` and `np.expand_dims` can be used to convert the dimension of each fragment into (C, T, V, M) format.
- Combine and save all clips as one file
Note: `class_id` is a `int` type variable, similar to other classification tasks. For example `0: falling, 1: other`.
We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) for this step, which can directly process the generated `det_keypoint_unite_image_results.json` file. The script parses the json file, organizes the training data as described in the preceding steps, and saves the data files.
```bash
mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
cd {root of PaddleVideo}/applications/PPHuman/datasets/
python prepare_dataset.py
```
Now, we have available training data (`.npy`) and corresponding annotation files (`.pkl`).
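The dimension handling in this step can be sketched as follows; it assumes `all_kpts` and a matching `labels` list are already collected, and only illustrates the transpose/stack logic. The provided `prepare_dataset.py` script produces the exact file layout PaddleVideo expects.
```python
import pickle
import numpy as np

def save_dataset(all_kpts, labels, data_path="train_data.npy",
                 label_path="train_label.pkl"):
    """all_kpts: list of (T, V, C) arrays, e.g. (50, 17, 2); labels: list of int class ids."""
    clips = []
    for kpts in all_kpts:
        clip = np.transpose(kpts, (2, 0, 1))   # (T, V, C) -> (C, T, V)
        clip = np.expand_dims(clip, axis=-1)   # (C, T, V) -> (C, T, V, M=1)
        clips.append(clip)
    np.save(data_path, np.stack(clips))        # (N, C, T, V, M)
    with open(label_path, "wb") as f:          # placeholder label dump; use
        pickle.dump(labels, f)                 # prepare_dataset.py for the real format
```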
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on skeleton depends on the pre-order detection and tracking models. If the pedestrian location cannot be accurately detected in the actual scene, or it is difficult to correctly assign the person ID between different frames, the performance of the action recognition part will be limited. If you encounter the above problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../pphuman_mot_en.md) for detection/track model optimization.
### Keypoint Model Optimization
As the core feature of this scheme, skeleton point localization also determines the overall effect of action recognition. If there are obvious errors in the keypoint coordinates recognized in the actual scene, and it is already difficult to distinguish the specific action from the skeleton image composed of the keypoints,
you can refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection_en.md) to optimize the keypoint model.
### Coordinate Normalization
After getting coordinates of the skeleton points, it is recommended to perform normalization processing according to the detection bounding box of each person to reduce the convergence difficulty brought by the difference in the position and scale of the person.
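A minimal sketch of such normalization, assuming keypoints of shape `(T, V, 2)` in absolute image coordinates and per-frame detection boxes of shape `(T, 4)` in `(x1, y1, x2, y2)` format:
```python
import numpy as np

def normalize_by_box(kpts, boxes, eps=1e-6):
    """Map absolute keypoint coordinates into each person's box coordinate
    system, so position and scale differences are removed.
    kpts: (T, V, 2) absolute (x, y); boxes: (T, 4) as (x1, y1, x2, y2)."""
    x1y1 = boxes[:, None, 0:2]                      # (T, 1, 2)
    wh = boxes[:, None, 2:4] - x1y1                 # (T, 1, 2)
    return (kpts - x1y1) / np.maximum(wh, eps)      # values roughly in [0, 1]
```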
## Add New Action
In skeleton-based action recognition, the model used is [ST-GCN](https://arxiv.org/abs/1801.07455). It is adapted from the [PaddleVideo training pipeline](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/model_zoo/recognition/stgcn.md) to complete the model training and export process.
### Data Preparation And Configuration File Settings
- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) according to `Data preparation`. Correspondingly placed under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
- Refer [Configuration File](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml), the things to focus on are as follows:
```yaml
MODEL: #MODEL field
framework:
backbone:
name: "STGCN"
in_channels: 2 # This corresponds to the C dimension in the data format description, representing two-dimensional coordinates.
dropout: 0.5
layout: 'coco_keypoint'
data_bn: True
head:
name: "STGCNHead"
num_classes: 2 # If there are multiple action types in the data, this needs to be modified to match the number of types.
if_top5: False # When the number of action types is less than 5, please set it to False, otherwise an error will be raised.
...
# Please set the data and label path of the train/valid/test part correctly according to the data path
DATASET: #DATASET field
batch_size: 64
num_workers: 4
test_batch_size: 1
test_num_workers: 0
train:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle
file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path
label_path: "./applications/PPHuman/datasets/train_label.pkl"
valid:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
test:
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
```
### Model Training And Evaluation
- In PaddleVideo, start training with the following command:
```bash
# current path is under root of PaddleVideo
python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
# Since the task may overfit, it is recommended to evaluate model during training to save the best model.
python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
```
- After training the model, use the following command to do inference.
```bash
python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
```
### Model Export
In PaddleVideo, use the following command to export model and get structure file `STGCN.pdmodel` and weight file `STGCN.pdiparams`. And add the configuration file here.
```bash
# current path is under root of PaddleVideo
python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
-p output/STGCN/STGCN_best.pdparams \
-o output_inference/STGCN
cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
# Rename model files to adapt PP-Human
cd output_inference/STGCN
mv STGCN.pdiparams model.pdiparams
mv STGCN.pdiparams.info model.pdiparams.info
mv STGCN.pdmodel model.pdmodel
```
The directory structure will look like:
```
STGCN
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
├── model.pdmodel
```
At this point, this model can be used in PP-Human.
**Note**: If the length of the video sequence or the number of keypoints is changed during training, the content of the `INFERENCE` field in the configuration file needs to be modified accordingly to correct prediction.
```yaml
# The dimension of the sequence data is (N,C,T,V,M)
INFERENCE:
name: 'STGCN_Inference_helper'
num_channels: 2 # Corresponding to C dimension
window_size: 50 # Corresponding to T dimension, please set it accordingly to the sequence length.
vertex_nums: 17 # Corresponding to V dimension, please set it accordingly to the number of keypoints
person_nums: 1 # Corresponding to M dimension
```
### Custom Action Output
In the skeleton-based action recognition, the classification result of the model represents the behavior type of the character in a certain period of time. The type of the corresponding classification is regarded as the action of the current period. Therefore, on the basis of completing the training and deployment of the custom model, the model output is directly used as the final result, and the displayed result of the visualization should be modified.
#### Modify Visual Output
At present, ID-based action recognition is displayed based on the results of action recognition and predefined category names. For the detail, please refer to [here](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/pipeline.py#L1024-L1043). If the custom action needs to be modified to another display name, please modify it accordingly to output the corresponding result.

View File

@@ -0,0 +1,159 @@
# 基于视频分类的行为识别
## 数据准备
视频分类任务输入的视频格式一般为`.mp4``.avi`等格式视频或者是抽帧后的视频帧序列,标签则可以是`.txt`格式存储的文件。
对于打架识别任务,具体数据准备流程如下:
### 数据集下载
打架识别基于6个公开的打架、暴力行为相关数据集合并后的数据进行模型训练。公开数据集具体信息如下
| 数据集 | 下载连接 | 简介 | 标注 | 数量 | 时长 |
| ---- | ---- | ---------- | ---- | ---- | ---------- |
| Surveillance Camera Fight Dataset| https://github.com/sayibet/fight-detection-surv-dataset | 裁剪视频,监控视角 | 视频级别 | 打架150非打架150 | 2s |
| A Dataset for Automatic Violence Detection in Videos | https://github.com/airtlab/A-Dataset-for-Automatic-Violence-Detection-in-Videos | 裁剪视频,室内自行录制 | 视频级别 | 暴力行为115个场景2个机位共230 非暴力行为60个场景2个机位共120 | 几秒钟 |
| Hockey Fight Detection Dataset | https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes?resource=download | 裁剪视频,非真实场景 | 视频级别 | 打架500非打架500 | 2s |
| Video Fight Detection Dataset | https://www.kaggle.com/datasets/naveenk903/movies-fight-detection-dataset | 裁剪视频,非真实场景 | 视频级别 | 打架100非打架101 | 2s |
| Real Life Violence Situations Dataset | https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset | 裁剪视频,非真实场景 | 视频级别 | 暴力行为1000非暴力行为1000 | 几秒钟 |
| UBI Abnormal Event Detection Dataset| http://socia-lab.di.ubi.pt/EventDetection/ | 未裁剪视频,监控视角 | 帧级别 | 打架216非打架784裁剪后二次标注打架1976非打架1630 | 原视频几秒到几分钟不等裁剪后2s |
打架暴力行为视频3956个非打架非暴力行为视频3501个共7457个视频每个视频几秒钟。
本项目为大家整理了前5个数据集下载链接[https://aistudio.baidu.com/aistudio/datasetdetail/149085](https://aistudio.baidu.com/aistudio/datasetdetail/149085)。
### 视频抽帧
首先下载PaddleVideo代码
```bash
git clone https://github.com/PaddlePaddle/PaddleVideo.git
```
假设PaddleVideo源码路径为PaddleVideo_root。
为了加快训练速度将视频进行抽帧。下面命令会根据视频的帧率FPS进行抽帧如FPS=30则每秒视频会抽取30帧图像。
```bash
cd ${PaddleVideo_root}
python data/ucf101/extract_rawframes.py dataset/ rawframes/ --level 2 --ext mp4
```
其中,假设视频已经存放在了`dataset`目录下,如果是其他路径请对应修改。打架(暴力)视频存放在`dataset/fight`中;非打架(非暴力)视频存放在`dataset/nofight`中。`rawframes`目录存放抽取的视频帧。
### 训练集和验证集划分
打架识别验证集1500条来自Surveillance Camera Fight Dataset、A Dataset for Automatic Violence Detection in Videos、UBI Abnormal Event Detection Dataset三个数据集。
也可根据下面的命令将数据按照8:2的比例划分成训练集和测试集
```bash
python split_fight_train_test_dataset.py "rawframes" 2 0.8
```
参数说明“rawframes”为视频帧存放的文件夹2表示目录结构为两级第二级表示每个行为对应的子文件夹0.8表示训练集比例。
其中`split_fight_train_test_dataset.py`文件在PaddleDetection中的`deploy/pipeline/tools`路径下。
执行完命令后会最终生成fight_train_list.txt和fight_val_list.txt两个文件。打架的标签为1非打架的标签为0。
### 视频裁剪
对于未裁剪的视频如UBI Abnormal Event Detection Dataset数据集需要先进行裁剪才能用于模型训练`deploy/pipeline/tools/clip_video.py`中给出了视频裁剪的函数`cut_video`,输入为视频路径,裁剪的起始帧和结束帧以及裁剪后的视频保存路径。
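该函数的一个简化实现示意如下(仅说明其输入输出含义,实际请以 `deploy/pipeline/tools/clip_video.py` 中的实现为准):
```python
import cv2

def cut_video(video_path, start_frame, end_frame, save_path):
    """将 [start_frame, end_frame) 区间内的帧写入新视频,用于裁剪未剪辑的长视频。"""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok or idx >= end_frame:
            break
        if idx >= start_frame:
            writer.write(frame)
        idx += 1
    cap.release()
    writer.release()
```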
## 模型优化
### VideoMix
[VideoMix](https://arxiv.org/abs/2012.03457)是视频数据增强的方法之一是对图像数据增强CutMix的扩展可以缓解模型的过拟合问题。
与Mixup将两个视频片段的每个像素点按照一定比例融合不同的是VideoMix是每个像素点要么属于片段A要么属于片段B。输出结果是两个片段原始标签的加权和权重是两个片段各自的比例。
在baseline的基础上加入VideoMix数据增强后精度由87.53%提升至88.01%。
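下面给出VideoMix思想的一个简化示意空间CutMix式混合仅演示“像素归属其一、标签按占比加权”的核心逻辑并非PaddleVideo中的实际实现
```python
import numpy as np

def videomix(clip_a, label_a, clip_b, label_b, num_classes, beta=1.0):
    """clip_*: (T, H, W, C) 的视频片段。返回混合后的片段与软标签。
    区域内像素来自片段B区域外来自片段A标签按各自像素占比加权。"""
    t, h, w, c = clip_a.shape
    lam = np.random.beta(beta, beta)                 # 片段A所占比例
    cut_w, cut_h = int(w * np.sqrt(1 - lam)), int(h * np.sqrt(1 - lam))
    cx, cy = np.random.randint(w), np.random.randint(h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    mixed = clip_a.copy()
    mixed[:, y1:y2, x1:x2, :] = clip_b[:, y1:y2, x1:x2, :]
    lam = 1 - (x2 - x1) * (y2 - y1) / float(w * h)   # 按实际混合区域面积修正权重
    onehot_a = np.eye(num_classes)[label_a]
    onehot_b = np.eye(num_classes)[label_b]
    return mixed, lam * onehot_a + (1 - lam) * onehot_b
```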
### 更大的分辨率
由于监控摄像头角度、距离等问题存在监控画面下人比较小的情况小目标行为的识别较困难尝试增大输入图像的分辨率模型精度由88.01%提升至89.06%。
## 新增行为
目前打架识别模型使用的是[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)套件中[PP-TSM](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/pp-tsm.md)并在PP-TSM视频分类模型训练流程的基础上修改适配完成模型训练。
请先参考[使用说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/usage.md)了解PaddleVideo模型库的使用。
| 任务 | 算法 | 精度 | 预测速度(ms) | 模型权重 | 预测部署模型 |
| ---- | ---- | ---------- | ---- | ---- | ---------- |
| 打架识别 | PP-TSM | 准确率89.06% | T4, 2s视频128ms | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams) | [下载链接](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.zip) |
#### 模型训练
下载预训练模型:
```bash
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
```
执行训练:
```bash
# 单卡训练
cd ${PaddleVideo_root}
python main.py --validate -c pptsm_fight_frames_dense.yaml
```
本方案针对的是视频的二分类问题,如果不是二分类,需要修改配置文件中`MODEL-->head-->num_classes`为具体的类别数目。
```bash
cd ${PaddleVideo_root}
# 多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -B -m paddle.distributed.launch --gpus="0,1,2,3" \
--log_dir=log_pptsm_dense main.py --validate \
-c pptsm_fight_frames_dense.yaml
```
#### 模型评估
训练好的模型下载:[https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams](https://videotag.bj.bcebos.com/PaddleVideo-release2.3/ppTSM_fight.pdparams)
模型评估:
```bash
cd ${PaddleVideo_root}
python main.py --test -c pptsm_fight_frames_dense.yaml \
-w ppTSM_fight_best.pdparams
```
其中`ppTSM_fight_best.pdparams`为训练好的模型。
#### 模型导出
导出inference模型
```bash
cd ${PaddleVideo_root}
python tools/export_model.py -c pptsm_fight_frames_dense.yaml \
-p ppTSM_fight_best.pdparams \
-o inference/ppTSM
```
#### 推理可视化
利用上步骤导出的模型基于PaddleDetection中推理pipeline可完成自定义行为识别及可视化。
新增行为后,需要对现有的可视化代码进行修改,目前代码支持打架二分类可视化,新增类别后需要根据识别结果自适应可视化推理结果。
具体修改PaddleDetection中develop/deploy/pipeline/pipeline.py路径下PipePredictor类中visualize_video成员函数。当结果中存在'video_action'数据时会对行为进行可视化。目前的逻辑是如果推理的类别为1则为打架行为进行可视化否则不进行显示即"video_action_score"为None。用户新增行为后可根据类别index和对应的行为设置"video_action_text"字段目前index=1对应"Fight"。相关代码块如下:
```
video_action_res = result.get('video_action')
if video_action_res is not None:
video_action_score = None
if video_action_res and video_action_res["class"] == 1:
video_action_score = video_action_res["score"]
mot_boxes = None
if mot_res:
mot_boxes = mot_res['boxes']
image = visualize_action(
image,
mot_boxes,
action_visual_collector=None,
action_text="SkeletonAction",
video_action_score=video_action_score,
video_action_text="Fight")
```

View File

@@ -0,0 +1,84 @@
简体中文 | [English](./detection_en.md)
# 目标检测任务二次开发
在目标检测算法产业落地过程中常常会出现需要额外训练以满足实际使用要求的情况项目迭代过程中也会出现需要修改类别的情况。本文档详细介绍如何使用PaddleDetection进行目标检测算法二次开发流程包括数据准备、模型优化思路和修改类别开发流程。
## 数据准备
二次开发首先需要进行数据集的准备针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
## 模型优化
### 1. 使用自定义数据集训练
基于准备的数据在数据配置文件中修改对应路径,例如`configs/dataset/coco_detection.yml`:
```
metric: COCO
num_classes: 80
TrainDataset:
!COCODataSet
image_dir: train2017 # 训练集的图片所在文件相对于dataset_dir的路径
anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径
dataset_dir: dataset/coco # 数据集所在路径相对于PaddleDetection路径
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
image_dir: val2017 # 验证集的图片所在文件相对于dataset_dir的路径
anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径
dataset_dir: dataset/coco # 数据集所在路径相对于PaddleDetection路径
TestDataset:
!ImageFolder
anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # 标注文件所在文件 相对于dataset_dir的路径
dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # 数据集所在路径相对于PaddleDetection路径
```
配置修改完成后,即可以启动训练评估,命令如下
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
### 2. 加载COCO模型作为预训练
目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重加载到检测算法的骨干网络中实际使用时建议加载COCO数据集训练好的权重通常能够对模型精度有较大提升使用方法如下
#### 1) 设置预训练权重路径
COCO数据集训练好的模型权重均在各算法配置文件夹下例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) 修改超参数
加载COCO预训练权重后需要修改学习率超参数例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
```
epoch: 120 # 原始配置为300epoch加载COCO权重后可以适当减少迭代轮数
LearningRate:
base_lr: 0.005 # 原始配置为0.025加载COCO权重后需要降低学习率
schedulers:
- !CosineDecay
max_epochs: 144 # 依据epoch数进行修改
- !LinearWarmup
start_factor: 0.
epochs: 5
```
## 修改类别
当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
```
metric: COCO
num_classes: 10 # 原始类别80
```
配置修改完成后同样可以加载COCO预训练权重PaddleDetection支持自动加载shape匹配的权重对于shape不匹配的权重会自动忽略因此无需其他修改。

View File

@@ -0,0 +1,89 @@
[简体中文](./detection.md) | English
# Customize Object Detection task
In the practical application of object detection algorithms in a specific industry, additional training is often required for practical use. The project iteration will also need to modify categories. This document details how to use PaddleDetection for a customized object detection algorithm. The process includes data preparation, model optimization roadmap, and modifying the category development process.
## Data Preparation
Customization starts with the preparation of the dataset. We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then Labelme, LabelImg and other labeling tools are used to label the object detection bounding boxes, and the labeling results are converted into COCO or VOC data format. For details, please refer to [Data Preparation](../../tutorials/data/PrepareDetDataSet_en.md)
## Model Optimization
### 1. Use customized dataset for training
Modify the corresponding paths in the data configuration file based on the prepared data, for example `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 80
TrainDataset:
!COCODataSet
image_dir: train2017 # Path to the images of the training set relative to the dataset_dir
anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to the dataset_dir
dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
image_dir: val2017 # Path to the images of the validation set relative to the dataset_dir
anno_path: annotations/instances_val2017.json # Path to the annotation file of the validation set relative to the dataset_dir
dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
TestDataset:
!ImageFolder
anno_path: annotations/instances_val2017.json # also support txt (like VOC's label_list.txt) # Path to the annotation file relative to dataset_dir
dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path' # Path to the dataset relative to the PaddleDetection path
```
Once the configuration changes are completed, training and evaluation can be started with the following command:
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
### 2. Load the COCO model as pre-training
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm; for example, the PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). In the configuration file, set `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`.
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example in `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epoch, after loading COCO weights, the iteration number can be reduced appropriately
LearningRate:
base_lr: 0.005 # The original configuration is 0.025, after loading COCO weights, the learning rate should be reduced.
schedulers:
- !CosineDecay
max_epochs: 144 # Modify based on the number of epochs
- !LinearWarmup
start_factor: 0.
epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # original class 80
```
After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed.

View File

@@ -0,0 +1,261 @@
简体中文 | [English](./keypoint_detection_en.md)
# 关键点检测任务二次开发
在实际场景中应用关键点检测算法不可避免地会出现需要二次开发的需求。包括对目前的预训练模型效果不满意希望优化模型效果或是目前的关键点点位定义不能满足实际场景需求希望新增或是替换关键点点位的定义训练新的关键点模型。本文档将介绍如何在PaddleDetection中对关键点检测算法进行二次开发。
## 数据准备
### 基本流程说明
在PaddleDetection中目前支持的标注数据格式为`COCO`和`MPII`。这两个数据格式的详细说明,可以参考文档[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)。在这一步中使用Labelme等标注工具依照特征点序号标注对应坐标并转化成可训练的标注格式建议使用`COCO`格式。
### 合并数据集
为了扩展使用的训练数据,合并多个不同的数据集一起训练是一个很直观的解决手段,但不同的数据集往往对关键点的定义并不一致。合并数据集的第一步是需要统一不同数据集的点位定义,确定标杆点位,即最终模型学习的特征点类型,然后根据各个数据集的点位定义与标杆点位定义之间的关系进行调整。
- 在标杆点位中的点:调整点位序号,使其与标杆点位一致
- 未在标杆点位中的点:舍去
- 数据集缺少标杆点位中的点:对应将标注的标志位记为“未标注”
在[关键点数据准备](../../tutorials/data/PrepareKeypointDataSet.md)中,提供了如何合并`COCO`数据集和`AI Challenger`数据集,并统一为以`COCO`为标杆点位定义的案例说明,供参考。
## 模型优化
### 检测-跟踪模型优化
在PaddleDetection中关键点检测能力支持Top-Down、Bottom-Up两套方案Top-Down先检测主体再检测局部关键点优点是精度较高缺点是耗时会随着检测对象个数的增加而增加Bottom-Up先检测关键点再组合到对应的部位上优点是速度快与检测对象个数无关缺点是精度较低。关于两种方案的详情及对应模型可参考[关键点检测系列模型](../../../configs/keypoint/README.md)
当使用Top-Down方案时模型效果依赖于前序的检测和跟踪效果如果实际场景中不能准确检测到行人位置会使关键点检测部分表现受限。如果在实际使用中遇到了上述问题请参考[目标检测任务二次开发](./detection.md)以及[多目标跟踪任务二次开发](./pphuman_mot.md)对检测/跟踪模型进行优化。
### 使用符合场景的数据迭代
目前发布的关键点检测算法模型主要在`COCO`/ `AI Challenger`等开源数据集上迭代,这部分数据集中可能缺少与实际任务较为相似的监控场景(视角、光照等因素)、体育场景(存在较多非常规的姿态)。使用更符合实际任务场景的数据进行训练,有助于提升模型效果。
### 使用预训练模型迭代
关键点模型的数据的标注复杂度较大,直接使用模型从零开始在业务数据集上训练,效果往往难以满足需求。在实际工程中使用时,建议加载已经训练好的权重,通常能够对模型精度有较大提升,以`HRNet`为例,使用方法如下:
```bash
python tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
```
在加载预训练模型后,可以适当减小初始学习率和最终迭代轮数, 建议初始学习率取默认配置值的1/2至1/5并可开启`--eval`观察迭代过程中AP值的变化。
### 遮挡数据增强
关键点任务中有较多遮挡问题,包括自身遮挡与不同目标之间的遮挡。
1. 检测模型优化仅针对Top-Down方案
参考[目标检测任务二次开发](./detection.md),提升检测模型在复杂场景下的效果。
2. 关键点数据增强
在关键点模型训练中增加遮挡数据增强(可参考[PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/tinypose_256x192.yml#L100)的配置),有助于提升模型在这类场景下的表现。
### 对视频预测进行平滑处理
关键点模型是在图片级别的基础上进行训练和预测的,对于视频类型的输入也是将视频拆分为帧进行预测。帧与帧之间虽然内容大多相似,但微小的差异仍然可能导致模型的输出发生较大的变化,表现为虽然预测的坐标大体正确,但视觉效果上有较大的抖动问题。通过添加滤波平滑处理,将每一帧预测的结果与历史结果综合考虑,得到最终的输出结果,可以有效提升视频上的表现。该部分内容可参考[滤波平滑处理](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206)。
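下面给出一个极简的平滑思路示意实际部署代码中的滤波实现以上文链接中的源码为准此处仅用指数滑动平均EMA演示"将当前帧预测与历史结果综合"的做法,接口与参数均为示例:
```python
import numpy as np

class KeypointSmoother(object):
    """极简示意:对相邻帧的关键点坐标做指数滑动平均,缓解帧间抖动。"""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # 越小越平滑,但对快速运动的响应越慢
        self.prev = None

    def smooth(self, keypoints):
        # keypoints: 形状为 (num_joints, 2) 的当前帧坐标
        keypoints = np.asarray(keypoints, dtype=np.float32)
        if self.prev is None:
            self.prev = keypoints
        else:
            self.prev = self.alpha * keypoints + (1.0 - self.alpha) * self.prev
        return self.prev
```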
## 新增或修改关键点点位定义
### 数据准备
根据前述说明,完成数据的准备,放置于`{root of PaddleDetection}/dataset`下。
<details>
<summary><b> 标注文件示例</b></summary>
一个标注文件示例如下:
```
self_dataset/
├── train_coco_joint.json # 训练集标注文件
├── val_coco_joint.json # 验证集标注文件
├── images/ # 存放图片文件
   ├── 0.jpg
   ├── 1.jpg
   ├── 2.jpg
```
其中标注文件中需要注意的改动如下:
```json
{
"images": [
{
"file_name": "images/0.jpg",
"id": 0, # id
"height": 1080,
"width": 1920
},
{
"file_name": "images/1.jpg",
"id": 1,
"height": 1080,
"width": 1920
},
{
"file_name": "images/2.jpg",
"id": 2,
"height": 1080,
"width": 1920
},
...
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [ #
"point1",
"point2",
"point3",
"point4",
"point5",
],
"skeleton": [ # ,
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
]
]
...
"annotations": [
{
{
"category_id": 1, #
"num_keypoints": 3, #
"bbox": [ # ,x, y, w, h
799,
575,
55,
185
],
# 长度为N*3的列表依次为各关键点的x, y, v标志位
"keypoints": [
807.5899658203125,
597.5455322265625,
2,
0,
0,
0, # 未标注的点记为0, 0, 0
805.8563232421875,
592.3446655273438,
2,
816.258056640625,
594.0783081054688,
2,
0,
0,
0
]
"id": 1, # id
"image_id": 8, # id
"iscrowd": 0, # 0
"area": 10175 # w * h0eval
...
```
</details>
### 配置文件设置
在配置文件中,完整的含义参考[config yaml配置项说明](../../tutorials/KeyPointConfigGuide_cn.md)。以[HRNet模型配置](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml)为例,重点需要关注的内容如下:
<details>
<summary><b> 配置文件示例</b></summary>
一个配置文件的示例如下
```yaml
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 5 # 预测的点数与定义点数量一致
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4]] # 注意只有含义上镜像对称的点才写到这里
...
# 保证dataset_dir + anno_path 能正确定位到标注文件位置
# 保证dataset_dir + image_dir + 标注文件中的图片路径能正确定位到图片
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: images
anno_path: train_coco_joint.json
dataset_dir: dataset/self_dataset
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: images
anno_path: val_coco_joint.json
dataset_dir: dataset/self_dataset
bbox_file: bbox.json
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
```
</details>
### 模型训练及评估
#### 模型训练
通过如下命令启动训练:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
#### 模型评估
训练好模型之后,可以通过以下命令实现对模型指标的评估:
```bash
python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
注意由于测试依赖pycocotools工具其默认为`COCO`数据集的17点如果修改后的模型并非预测17点直接使用评估命令会报错。
需要修改以下内容以获得正确的评估结果:
- [sigma列表](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/keypoint_utils.py#L219)以方差形式表示每个关键点的可信范围方差越大则容忍度越高其长度需与预测点数一致。根据实际关键点的可信区域设置区域精确的一般取0.25-0.5例如眼睛区域范围大的一般取0.5-1.0例如肩膀若不确定建议取0.75修改示意参见列表后的代码片段。
- [pycocotools sigma列表](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523)含义及内容同上取值与sigma列表一致。
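以预测5个关键点的模型为例两处sigma列表的修改示意如下数值仅为按上述建议给出的示例除以10的写法参考COCO 17点的默认实现具体以链接处源码为准
```python
import numpy as np

# 假设自定义5个关键点眼睛等精确点取0.25肩部等大范围点取0.79不确定的点取0.75
sigmas = np.array([0.25, 0.25, 0.79, 0.79, 0.75]) / 10.0  # 长度必须与num_joints一致
```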
### 模型导出及预测
#### Top-Down模型联合部署
```shell
#导出关键点模型
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
#detector 检测 + keypoint top-down模型联合部署联合推理只支持top-down方式
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
```
- 注意目前PP-Human中使用的为该方案
#### Bottom-Up模型独立部署
```shell
#导出模型
python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
#部署推理
python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
```

View File

@@ -0,0 +1,258 @@
[简体中文](./keypoint_detection.md) | English
# Customized Keypoint Detection
When applying keypoint detection algorithms in real practice, customization is often inevitable: we may be dissatisfied with the current pre-trained model results and want to optimize them, or the current keypoint definition may not meet the actual demand and we may want to add or replace keypoint definitions and train a new keypoint detection model. This document will introduce how to customize the keypoint detection algorithm in PaddleDetection.
## Data Preparation
### Basic Process Description
PaddleDetection currently supports the `COCO` and `MPII` annotation data formats. For detailed descriptions of these two data formats, please refer to the document [Keypoint Data Preparation](../../tutorials/data/PrepareKeypointDataSet.md). In this step, annotation tools such as Labelme are used to annotate the corresponding coordinates according to the keypoint serial numbers, and the results are then converted into the corresponding trainable annotation format. We recommend the `COCO` format.
### Merging datasets
To extend the training data, we can merge several different datasets together. But different datasets often have different definitions of key points. Therefore, the first step in merging datasets is to unify the point definitions of different datasets, and determine the benchmark points, i.e., the types of feature points finally learned by the model, and then adjust them according to the relationship between the point definitions of each dataset and the benchmark point definitions.
- Points in the benchmark point location: adjust the point number to make it consistent with the benchmark point location
- Points that are not in the benchmark points: discard
- Points in the dataset that are missing from the benchmark: annotate the marked points as "unannotated".
In [Keypoint Data Preparation](../../tutorials/data/PrepareKeypointDataSet.md), we provide a case illustration of how to merge the `COCO` dataset and the `AI Challenger` dataset and unify them under the `COCO` benchmark point definition, for your reference.
## Model Optimization
### Detection and tracking model optimization
In PaddleDetection, keypoint detection supports Top-Down and Bottom-Up solutions. Top-Down first detects the main body and then detects the local keypoints. It has higher accuracy, but its runtime grows as the number of detected objects increases. The Bottom-Up solution first detects the keypoints and then groups them into the corresponding instances. It is fast and its speed is independent of the number of detected objects, but its accuracy is relatively low. For details of the two solutions and the corresponding models, please refer to [Keypoint Detection Series Models](../../../configs/keypoint/README.md)
When using the Top-Down solution, the model's effects depend on the previous detection or tracking effect. If the pedestrian position cannot be accurately detected in the actual practice, the performance of the keypoint detection will be limited. If you encounter the above problem in actual application, please refer to [Customized Object Detection](./detection_en.md) and [Customized Multi-target tracking](./pphuman_mot_en.md) for optimization of the detection and tracking model.
### Iterate with scenario-compatible data
The currently released keypoint detection algorithm models are mainly iterated on open source datasets such as `COCO`/ `AI Challenger`, which may lack surveillance scenarios (angles, lighting and other factors), sports scenarios (more unconventional poses) that are more similar to the actual task. Training with data that more closely matches the actual task scenario can help improve the model's results.
### Iteration via pre-trained models
The data annotation of keypoint models is complex, and training from scratch on a business dataset often fails to meet the demand. In practical projects, it is recommended to load pre-trained weights, which usually improves the model accuracy significantly. Take `HRNet` as an example, with the following method:
```
python tools/train.py \
-c configs/keypoint/hrnet/hrnet_w32_256x192.yml \
-o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
```
After loading the pre-trained model, the initial learning rate and the number of training epochs can be reduced appropriately. It is recommended that the initial learning rate be 1/2 to 1/5 of the default configuration, and you can enable `--eval` to observe the change of AP values during the iterations.
### Data augmentation with occlusion
There is a lot of occlusion in keypoint tasks, including self-occlusion and occlusion between different objects.
1. Detection model optimization (only for Top-Down solutions)
Refer to [Customized Object Detection](./detection_en.md) to improve the performance of the detection model in complex scenarios.
2. Keypoint data augmentation
Add occlusion data augmentation during keypoint model training, referring to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/), to help the model improve in such scenarios.
### Smooth video prediction
The keypoint model is trained and predicted on the basis of image, and video input is also predicted by splitting the video into frames. Although the content is mostly similar between frames, small differences may still lead to large changes in the output of the model. As a result of that, although the predicted coordinates are roughly correct, there may be jitters in the visual effect.
By adding a smoothing filter process, the performance of the video output can be effectively improved by combining the predicted results of each frame and the historical results. For this part, please see [Filter Smoothing](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206).
## Add or modify keypoint definition
### Data Preparation
Complete the data preparation according to the previous instructions and place it under `{root of PaddleDetection}/dataset`.
<details>
<summary><b> Examples of annotation file</b></summary>
```
self_dataset/
├── train_coco_joint.json # training set annotation file
├── val_coco_joint.json # Validation set annotation file
├── images/ # Store the image files
   ├── 0.jpg
   ├── 1.jpg
   ├── 2.jpg
```
Notable changes as follows:
```
{
"images": [
{
"file_name": "images/0.jpg",
"id": 0, # image id, id cannotdo not repeat
"height": 1080,
"width": 1920
},
{
"file_name": "images/1.jpg",
"id": 1,
"height": 1080,
"width": 1920
},
{
"file_name": "images/2.jpg",
"id": 2,
"height": 1080,
"width": 1920
},
...
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [ # the name of the point serial number
"point1",
"point2",
"point3",
"point4",
"point5",
],
"skeleton": [ # Skeleton composed of points, not necessary for training
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
]
]
...
"annotations": [
{
{
"category_id": 1, # The category to which the instance belongs
"num_keypoints": 3, # the number of marked points of the instance
"bbox": [ # location of detection box,format is x, y, w, h
799,
575,
55,
185
],
# N*3 list of x, y, v.
"keypoints": [
807.5899658203125,
597.5455322265625,
2,
0,
0,
0, # unlabeled points noted as 0, 0, 0
805.8563232421875,
592.3446655273438,
2,
816.258056640625,
594.0783081054688,
2,
0,
0,
0
]
"id": 1, # the id of the instance, id cannot repeat
"image_id": 8, # The id of the image where the instance is located, repeatable. This represents the presence of multiple objects on a single image
"iscrowd": 0, # covered or not, when the value is 0, it will participate in training
"area": 10175 # the area occupied by the instance, can be simply taken as w * h. Note that when the value is 0, it will be skipped, and if it is too small, it will be ignored in eval
...
```
</details>
### Settings of configuration file
In the configuration file, refer to [config yaml configuration](../../tutorials/KeyPointConfigGuide_cn.md) for more details. Take the [HRNet model configuration](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml) as an example; the contents to focus on are as follows:
<details>
<summary><b> Example of configuration</b></summary>
```
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 5 # The number of predicted points matches the number of defined points
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4]] # Note that only mirror-symmetric points are recorded here
...
# Ensure that dataset_dir + anno_path can correctly locate the annotation file
# Ensure that dataset_dir + image_dir + image path in annotation file can correctly locate the image.
TrainDataset:
!KeypointTopDownCocoDataset
image_dir: images
anno_path: train_coco_joint.json
dataset_dir: dataset/self_dataset
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
EvalDataset:
!KeypointTopDownCocoDataset
image_dir: images
anno_path: val_coco_joint.json
dataset_dir: dataset/self_dataset
bbox_file: bbox.json
num_joints: *num_joints
trainsize: *trainsize
pixel_std: *pixel_std
use_gt_bbox: True
image_thre: 0.0
```
</details>
### Model Training and Evaluation
#### Model Training
Run the following command to start training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
#### Model Evaluation
After training the model, you can evaluate the model metrics by running the following commands:
```
python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
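Note: evaluation relies on pycocotools, which by default assumes the 17 `COCO` keypoints; if your model predicts a different number of points, the [sigma list](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/keypoint_utils.py#L219) and the [pycocotools sigma list](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523) need to be adjusted to the same length as your keypoints. A minimal sketch for an assumed 5-point model is shown below (values are illustrative: precise regions such as eyes usually take 0.25-0.5, larger regions such as shoulders 0.5-1.0):
```python
import numpy as np

# Illustrative sigmas for an assumed 5-keypoint model; the division by 10 follows
# the COCO 17-point convention -- check the linked source for the exact usage.
sigmas = np.array([0.25, 0.25, 0.79, 0.79, 0.75]) / 10.0  # length must equal num_joints
```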
### Model Export and Inference
#### Top-Down model deployment
```
#Export keypoint model
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
#detector detection + keypoint top-down model co-deployment (joint inference only supports the top-down solution)
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
```

View File

@@ -0,0 +1,295 @@
简体中文 | [English](./pphuman_attribute_en.md)
# 行人属性识别任务二次开发
## 数据准备
### 数据格式
格式采用PA100K的属性标注格式共有26位属性。
这26位属性的名称、位置、种类数量见下表。
| Attribute | index | length |
|:----------|:----------|:----------|
| 'Hat','Glasses' | [0, 1] | 2 |
| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
| 'boots' | [14, ] | 1 |
| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
| 'Female' | [22, ] | 1 |
| 'Front','Side','Back' | [23, 24, 25] | 3 |
举例:
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
第一组,位置[0, 1]数值分别是[0, 1],表示'no hat'、'has glasses'。
第二组,位置[22]数值是0表示gender属性是'male'若为1则是'female'。
第三组,位置[23, 24, 25]数值分别是[0, 1, 0], 表示方向属性是侧面'side'。
其他组依次类推
### 数据标注
理解了上面`属性标注`格式的含义后就可以进行数据标注的工作。其本质是每张单人图建立一组26个长度的标注项分别与26个位置的属性值对应。
举例:
对于一张原始图片,
1 使用检测框,标注图片中每一个人的位置。
2 每一个检测框对应每一个人包含一组26位的属性值数组数组的每一位以0或1表示对应上述26个属性。例如如果行人是'Female'则数组第22位为1否则为0如果满足'Age18-60',则位置[19, 20, 21]对应的数值是[0, 1, 0]若满足'AgeOver60',则相应数值为[1, 0, 0]。
标注完成后利用检测框将每一个人截取成单人图其图片与26位属性标注建立对应关系。也可先截成单人图再进行标注效果相同。
## 模型训练
数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。
其主要有两步工作需要完成1将数据与标注数据整理成训练格式。2修改配置文件开始训练。
### 训练数据格式
训练数据包括训练使用的图片和一个训练列表train.txt其具体位置在训练配置中指定其放置方式示例如下
```
Attribute/
|-- data 训练图片文件夹
| |-- 00001.jpg
| |-- 00002.jpg
| `-- 0000x.jpg
`-- train.txt 训练数据列表
```
train.txt文件内为所有训练图片名称相对于根路径的文件路径+ 26个标注值
其每一行表示一个人的图片和标注结果。其格式为:
```
00001.jpg 0,0,1,0,....
```
注意1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。
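一个生成train.txt的极简示意如下其中图片名与属性取值均为假设示例仅演示Tab与逗号的分隔格式
```python
# 假设 annotations 为 [(单人图相对路径, 长度为26的0/1属性列表), ...]
annotations = [
    ("00001.jpg", [0, 1, 1] + [0] * 23),
    ("00002.jpg", [1, 0, 0] + [0] * 23),
]

with open("train.txt", "w") as f:
    for img_path, attrs in annotations:
        # 图片与标注值之间用Tab分隔标注值之间用逗号分隔
        f.write(img_path + "\t" + ",".join(str(v) for v in attrs) + "\n")
```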
### 修改配置开始训练
首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
需要在配置文件`PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`中,修改的配置项如下:
```
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/pa100k/" #指定训练图片所在根路径
cls_label_path: "dataset/pa100k/train_list.txt" #指定训练列表文件位置
label_ratio: True
transform_ops:
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/pa100k/" #指定评估图片所在根路径
cls_label_path: "dataset/pa100k/val_list.txt" #指定评估列表文件位置
label_ratio: True
transform_ops:
```
注意:
1. 这里image_root路径+train.txt中图片相对路径对应图片的完整路径位置。
2. 如果有修改属性数量,则还需修改内容配置项中属性种类数量:
```
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 26 #属性种类数量
```
然后运行以下命令开始训练。
```
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
#单卡训练
python3 tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
训练完成后可以执行以下命令进行性能评估:
```
#多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#单卡评估
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
导出模型后,需要下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml)文件,并放置到导出的模型文件夹`PPLCNet_x1_0_person_attribute_infer`中。
使用时在PP-Human中的配置文件`./deploy/pipeline/config/infer_cfg_pphuman.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`
```
ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #新导出的模型路径位置
enable: True #开启功能
```
然后即可使用。至此,即完成新增属性类别识别任务。
## 属性增减
上述是以26个属性为例的标注、训练过程。
如果需要增加、减少属性数量,则需要:
1)标注时需增加新属性类别信息或删减属性类别信息;
2)对应修改训练中train.txt所使用的属性数量和名称
3)修改训练配置,例如``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml``文件中的属性数量,详细见上述`修改配置开始训练`部分。
增加属性示例:
1. 在标注数据时在26位后继续增加新的属性标注数值
2. 在train.txt文件的标注数值中也增加新的属性数值。
3. 注意属性类型在train.txt属性数值列表中的位置对应关系需要是固定的例如第[19, 20, 21]位表示年龄,则所有图片都要使用[19, 20, 21]位置表示年龄,不再赘述。
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
删减属性同理。
例如,如果不需要年龄属性,则位置[19, 20, 21]的数值可以去掉。只需在train.txt中标注的26个数字中全部删除第19-21位数值即可同时标注数据时也不再需要标注这3位属性值。
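对应的处理示意如下假设已有长度为26的标注列表删除年龄所在的[19, 20, 21]三个位置:
```python
def drop_age(attrs):
    # attrs: 长度为26的属性标注列表返回删除第19-21位索引从0开始后的23位列表
    return attrs[:19] + attrs[22:]

assert len(drop_age([0] * 26)) == 23
```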
## 修改后处理代码
修改了属性定义后pipeline后处理部分也需要做相应修改主要影响结果可视化时的显示结果。
相应代码在路径`deploy/pipeline/pphuman/attr_infer.py`文件中`postprocess`函数。
其函数实现说明如下:
```python
# 函数入口
def postprocess(self, inputs, result):
# postprocess output of predictor
im_results = result['output']
# 1) 定义各组属性实际意义,其数量及位置与输出结果中占用位数一一对应。
labels = self.pred_config.labels
age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
direct_list = ['Front', 'Side', 'Back']
bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
lower_list = [
'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
'Skirt&Dress'
]
# 2) 部分属性所用阈值与通用值有明显区别,单独设置
glasses_threshold = 0.3
hold_threshold = 0.6
batch_res = []
for res in im_results:
res = res.tolist()
label_res = []
# gender
# 3) 单个位置属性类别,判断该位置是否大于阈值,来分配二分类结果
gender = 'Female' if res[22] > self.threshold else 'Male'
label_res.append(gender)
# age
# 4) 多个位置属性类别以N选一的形式选择得分最高的属性
age = age_list[np.argmax(res[19:22])]
label_res.append(age)
# direction
direction = direct_list[np.argmax(res[23:])]
label_res.append(direction)
# glasses
glasses = 'Glasses: '
if res[1] > glasses_threshold:
glasses += 'True'
else:
glasses += 'False'
label_res.append(glasses)
# hat
hat = 'Hat: '
if res[0] > self.threshold:
hat += 'True'
else:
hat += 'False'
label_res.append(hat)
# hold obj
hold_obj = 'HoldObjectsInFront: '
if res[18] > hold_threshold:
hold_obj += 'True'
else:
hold_obj += 'False'
label_res.append(hold_obj)
# bag
bag = bag_list[np.argmax(res[15:18])]
bag_score = res[15 + np.argmax(res[15:18])]
bag_label = bag if bag_score > self.threshold else 'No bag'
label_res.append(bag_label)
# upper
# 5) 同一类属性分为两组(这里是款式和花色),每小组内单独选择,相当于两组不同属性。
upper_label = 'Upper:'
sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve'
upper_label += ' {}'.format(sleeve)
upper_res = res[4:8]
if np.max(upper_res) > self.threshold:
upper_label += ' {}'.format(upper_list[np.argmax(upper_res)])
label_res.append(upper_label)
# lower
lower_res = res[8:14]
lower_label = 'Lower: '
has_lower = False
for i, l in enumerate(lower_res):
if l > self.threshold:
lower_label += ' {}'.format(lower_list[i])
has_lower = True
if not has_lower:
lower_label += ' {}'.format(lower_list[np.argmax(lower_res)])
label_res.append(lower_label)
# shoe
shoe = 'Boots' if res[14] > self.threshold else 'No boots'
label_res.append(shoe)
batch_res.append(label_res)
result = {'output': batch_res}
return result
```
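例如假设在第26位索引从0起算新增了"是否吸烟"属性,可在`postprocess`中参照已有写法追加解析逻辑(属性名、索引与阈值均为假设示例):
```python
def parse_smoking(res, threshold=0.5):
    # res: 模型输出的属性得分列表threshold沿用通用阈值此处为假设值
    return 'Smoking: True' if res[26] > threshold else 'Smoking: False'

# 在postprocess的循环中可追加label_res.append(parse_smoking(res, self.threshold))
```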

View File

@@ -0,0 +1,223 @@
[简体中文](pphuman_attribute.md) | English
# Customized Pedestrian Attribute Recognition
## Data Preparation
### Data format
We use the PA100K attribute annotation format, with a total of 26 attributes.
The names, locations, and the number of these 26 attributes are shown in the table below.
| Attribute | index | length |
|:------------------------------------------------------------------------------- |:---------------------- |:------ |
| 'Hat','Glasses' | [0, 1] | 2 |
| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
| 'boots' | [14, ] | 1 |
| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
| 'Female' | [22, ] | 1 |
| 'Front','Side','Back' | [23, 24, 25] | 3 |
Examples:
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
The first group: position [0, 1] values are [0, 1], which means 'no hat', 'has glasses'.
The second group: the value at position [22] is 0, indicating that the gender attribute is 'male'; a value of 1 would mean 'female'.
The third group: position [23, 24, 25] values are [0, 1, 0], indicating that the direction attribute is 'side'.
Other groups follow in this order
### Data Annotation
After knowing the purpose of the above `attribute annotation` format, we can start to annotate data. The essence is that each single-person image creates a set of 26 annotation items, corresponding to the attribute values at 26 positions.
Examples:
For an original image:
1) Using bounding boxes to annotate the position of each person in the picture.
2) Each detection box (corresponding to each person) contains 26 attribute values which are represented by 0 or 1, corresponding to the above 26 attributes. For example, if the person is 'Female', the value at index 22 of the array is 1 (otherwise 0). If the person matches 'Age18-60', the corresponding values at positions [19, 20, 21] are [0, 1, 0]; if the person matches 'AgeOver60', the corresponding values are [1, 0, 0].
After the annotation is completed, the model will use the detection box to intercept each person into a single-person picture, and its picture establishes a corresponding relationship with the 26 attribute annotation. It is also possible to cut into a single-person image first and then annotate it. The results are the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps: 1) Organize the data and annotated data into the training format. 2) Modify the configuration file to start training.
### Training data format
The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example:
```
Attribute/
|-- data        Training images folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt   List of training data
```
train.txt file contains the names of all training images (file path relative to the root path) + 26 annotation values
Each line of it represents a person's image and annotation result. The format is as follows:
```
00001.jpg 0,0,1,0,....
```
Note: 1) the image path and the annotation values are separated by a Tab [\t]; 2) the annotation values are separated by commas [,]. If the format is wrong, parsing will fail.
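A minimal sketch (not part of the official tooling) for checking that each line of train.txt follows this format; the file name and attribute count are assumptions for illustration:
```python
def check_line(line, num_attrs=26):
    # Each line: "<image path>\t<num_attrs comma-separated 0/1 values>"
    img, labels = line.rstrip("\n").split("\t")
    values = [int(v) for v in labels.split(",")]
    assert len(values) == num_attrs, "%s: expected %d values, got %d" % (img, num_attrs, len(values))
    return img, values

with open("train.txt") as f:
    for line in f:
        check_line(line)
```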
### Modify the configuration to start training
First run the following command to download the training code (for more environmental issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to modify the following configuration in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`
```
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/pa100k/" #Specify the root path of training image
cls_label_path: "dataset/pa100k/train_list.txt" #Specify the location of the training list file
label_ratio: True
transform_ops:
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/pa100k/" #Specify the root path of evaluated image
cls_label_path: "dataset/pa100k/val_list.txt" #Specify the location of the evaluation list file
label_ratio: True
transform_ops:
```
Note:
1. Here, the image_root path combined with the relative image path in train.txt forms the full path of the image.
2. If you modify the number of attributes, the number of attribute types in the content configuration item should also be modified accordingly.
```
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 26 #Attribute classes and numbers
```
Then run the following command to start training:
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
You can run the following commands for performance evaluation after the training is completed:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
After exporting the model, you need to download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml) file and put it into the exported model folder `PPLCNet_x1_0_person_attribute_infer`.
When you use the model, you need to modify the new model path `model_dir` entry and set `enable: True` in the PP-Human configuration file `./deploy/pipeline/config/infer_cfg_pphuman.yml`:
```
ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #The exported model location
enable: True #Whether to enable the function
```
Now, the model is ready for you.
To this point, a new attribute category recognition task is completed.
## Adding or deleting attributes
The above is the annotation and training process with 26 attributes.
If the attributes need to be added or deleted, you need to
1) New attribute category information needs to be added or deleted when annotating the data.
2) Modify the number and name of attributes used in train.txt corresponding to the training.
3) Modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`` file, for details, please see the ``Modify configuration to start training`` section above.
Example of adding attributes.
1. Continue to add new attribute annotation values after 26 values when annotating the data.
2. Add new attribute values to the annotated values in the train.txt file as well.
3. Note that the correlation of attribute types and values in train.txt needs to be fixed, for example, the [19, 20, 21] position indicates age, and all images should use the [19, 20, 21] position to indicate age.
The same applies to the deletion of attributes.
For example, if the age attribute is not needed, the values in positions [19, 20, 21] can be removed. You can simply remove all the values in positions 19-21 from the 26 numbers marked in train.txt, and you no longer need to annotate these 3 attribute values.

View File

@@ -0,0 +1,63 @@
简体中文 | [English](./pphuman_mot_en.md)
# 多目标跟踪任务二次开发
在产业落地过程中应用多目标跟踪算法不可避免地会出现希望自定义类型的多目标跟踪的需求或是对已有多目标跟踪模型的优化以提升在特定场景下模型的效果。我们在本文档通过案例来介绍如何根据期望识别的行为来进行多目标跟踪方案的选择以及使用PaddleDetection进行多目标跟踪算法二次开发工作包括数据准备、模型优化思路和跟踪类别修改的开发流程。
## 数据准备
多目标跟踪模型方案采用[ByteTrack](https://arxiv.org/pdf/2110.06864.pdf)其中使用PP-YOLOE替换原文的YOLOX作为检测器使用BYTETracker作为跟踪器详细文档参考[ByteTrack](../../../configs/mot/bytetrack)。原文的ByteTrack只支持行人单类别PaddleDetection中也支持多类别同时进行跟踪。训练ByteTrack也就是训练检测器的过程只需要准备好检测标注即可不需要ReID标注信息即当成纯检测来做即可。数据集最好是连续视频中抽取出来的而不是无关联的图片集合。
二次开发首先需要进行数据集的准备针对场景特点采集合适的数据从而提升模型效果和泛化性能。然后使用Labelme、LabelImg等标注工具标注目标检测框并将标注结果转化为COCO或VOC数据格式。详细文档可以参考[数据准备文档](../../tutorials/data/README.md)
## 模型优化
### 1. 使用自定义数据集训练
ByteTrack跟踪方案采用的数据集只需要有检测标注即可。参照[MOT数据集准备](../../../configs/mot)和[MOT数据集教程](docs/tutorials/data/PrepareMOTDataSet.md)。
```
# 单卡训练
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
# 多卡训练
python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
```
更详细的命令参考[30分钟快速上手PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)和[ByteTrack](../../../configs/mot/bytetrack/detector)
### 2. 加载COCO模型作为预训练
目前PaddleDetection提供的配置文件加载的预训练模型均为ImageNet数据集的权重加载到检测算法的骨干网络中实际使用时建议加载COCO数据集训练好的权重通常能够对模型精度有较大提升使用方法如下
#### 1) 设置预训练权重路径
COCO数据集训练好的模型权重均在各算法配置文件夹下例如`configs/ppyoloe`下提供了PP-YOLOE-l COCO数据集权重[链接](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) 。配置文件中设置`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) 修改超参数
加载COCO预训练权重后需要修改学习率超参数例如`configs/ppyoloe/_base_/optimizer_300e.yml`中:
```
epoch: 120 # 原始配置为300epoch加载COCO权重后可以适当减少迭代轮数
LearningRate:
base_lr: 0.005 # 原始配置为0.025加载COCO权重后需要降低学习率
schedulers:
- !CosineDecay
max_epochs: 144 # 依据epoch数进行修改一般为epoch数的1.2倍
- !LinearWarmup
start_factor: 0.
epochs: 5
```
## 跟踪类别修改
当实际使用场景类别发生变化时,需要修改数据配置文件,例如`configs/datasets/coco_detection.yml`中:
```
metric: COCO
num_classes: 10 # 原始类别1
```
配置修改完成后同样可以加载COCO预训练权重PaddleDetection支持自动加载shape匹配的权重对于shape不匹配的权重会自动忽略因此无需其他修改。

View File

@@ -0,0 +1,65 @@
[简体中文](./pphuman_mot.md) | English
# Customized multi-object tracking task
When applying multi-object tracking algorithms in industrial applications, there will be inevitable demands for customized types of multi-object tracking or optimization of existing multi-object tracking models to improve the effectiveness of the models in specific scenarios. In this document, we present examples of how to choose a multi-object tracking solution based on the expected identified behavior, and how to use PaddleDetection for further development of multi-object tracking algorithms, including data preparation, model optimization ideas, and the development process of tracking category modification.
## Data Preparation
The multi-object tracking model scheme uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), which adopts PP-YOLOE to replace the original YOLOX as the detector and BYTETracker as the tracker; for details, please refer to [ByteTrack](../../../configs/mot/bytetrack). The original ByteTrack only supports the single pedestrian category, while PaddleDetection also supports tracking multiple categories simultaneously. Training ByteTrack is the process of training the detector: only detection annotations need to be prepared and no ReID annotation information is required, i.e., it can be treated as pure detection. The dataset should preferably be extracted from continuous video rather than a collection of unrelated images.
Customization starts with the preparation of the dataset. We need to collect suitable data for the scenario features, so as to improve the model effect and generalization performance. Then Labelme, LabelImg and other labeling tools are used to label the object detection boxes, and the labeling results are converted into COCO or VOC data format. For details, please refer to [Data Preparation](../../tutorials/data/README.md)
## Model Optimization
### 1. Use customized data set for training
The dataset used by the ByteTrack tracking solution only needs detection annotations. Refer to [MOT dataset preparation](../../../configs/mot) and the [MOT dataset tutorial](docs/tutorials/data/PrepareMOTDataSet.md).
```
# Single card training
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
# Multi-card training
python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
```
More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) and [ByteTrack](../../../configs/mot/bytetrack/detector)
### 2. Load the COCO model as the pre-trained model
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm; for example, the PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams). In the configuration file, set `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`.
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example in `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epoch, after loading COCO weights, the iteration number can be reduced appropriately
LearningRate:
base_lr: 0.005 # The original configuration is 0.025, after loading COCO weights, the learning rate should be reduced.
schedulers:
- !CosineDecay
max_epochs: 144 # Modified according to the number of epochs, usually 1.2 times the number of epochs
- !LinearWarmup
start_factor: 0.
epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # original class 80
```
After the configuration changes are completed, the COCO pre-training weights can also be loaded. PaddleDetection supports automatic loading of shape-matching weights, and weights that do not match the shape are automatically ignored, so no other modifications are needed.

View File

@@ -0,0 +1,159 @@
简体中文 | [English](./pphuman_mtmct_en.md)
# 跨镜跟踪任务二次开发
## 数据准备
### 数据格式
跨镜跟踪使用行人REID技术实现其训练方式采用多分类模型训练使用时取分类softmax头部前的特征作为检索特征向量。
因此其格式与多分类任务相同。每一个行人分配一个专属id不同行人id不同同一行人在不同图片中的id相同。
例如图片0001.jpg、0003.jpg是同一个人0002.jpg、0004.jpg是不同的其他行人。则标注id为
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
...
```
依次类推。
### 数据标注
理解了上面`标注`格式的含义后就可以进行数据标注的工作。其本质是每张单人图建立一个标注项对应该行人分配的id。
举例:
对于一张原始图片,
1 使用检测框,标注图片中每一个人的位置。
2 每一个检测框对应每一个人包含一个int类型的id属性。例如上述举例中0001.jpg中的人对应id为1。
标注完成后利用检测框将每一个人截取成单人图其图片与id属性标注建立对应关系。也可先截成单人图再进行标注效果相同。
## 模型训练
数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。
其主要有两步工作需要完成1将数据与标注数据整理成训练格式。2修改配置文件开始训练。
### 训练数据格式
训练数据包括训练使用的图片和一个训练列表bounding_box_train.txt其具体位置在训练配置中指定其放置方式示例如下
```
REID/
|-- data 训练图片文件夹
| |-- 00001.jpg
| |-- 00002.jpg
| `-- 0000x.jpg
`-- bounding_box_train.txt 训练数据列表
```
bounding_box_train.txt文件内为所有训练图片名称相对于根路径的文件路径+ 1个id标注值
其每一行表示一个人的图片和id标注结果。其格式为
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
```
注意图片与标注值之间是以Tab[\t]符号隔开。该格式不能错,否则解析失败。
### 修改配置开始训练
首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
需要在配置文件[softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml)中,修改的配置项如下:
```
Head:
name: "FC"
embedding_size: *feat_dim
class_num: &class_num 751 #行人id总数量
DataLoader:
Train:
dataset:
name: "Market1501"
image_root: "./dataset/" #训练图片根路径
cls_label_path: "bounding_box_train" #训练文件列表
Eval:
Query:
dataset:
name: "Market1501"
image_root: "./dataset/" #评估图片根路径
cls_label_path: "query" #评估文件列表
```
注意:
1. 这里image_root路径+bounding_box_train.txt中图片相对路径对应图片存放的完整路径。
然后运行以下命令开始训练。
```
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
#单卡训练
python3 tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
```
训练完成后可以执行以下命令进行性能评估:
```
#多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
#单卡评估
python3 tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python3 tools/export_model.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model \
-o Global.save_inference_dir=deploy/models/strong_baseline_inference
```
导出模型后,下载[infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml)文件到新导出的模型文件夹'strong_baseline_inference'中。
使用时在PP-Human中的配置文件infer_cfg_pphuman.yml中修改模型路径`model_dir`并开启功能`enable`
```
REID:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
enable: True
```
然后可以使用。至此完成模型开发。
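补充说明部署后的跨镜跟踪以特征检索方式匹配行人即对分类softmax头之前的特征向量计算相似度。下面是一个余弦相似度匹配的极简示意特征维度与数值均为假设示例仅演示检索思路
```python
import numpy as np

def cosine_sim(feat_a, feat_b):
    # 对特征做L2归一化后计算余弦相似度
    feat_a = feat_a / np.linalg.norm(feat_a)
    feat_b = feat_b / np.linalg.norm(feat_b)
    return float(np.dot(feat_a, feat_b))

# 假设gallery中保存了各行人id对应的检索特征此处用随机向量代替真实特征
gallery = {"00001": np.random.rand(256), "00002": np.random.rand(256)}
query = np.random.rand(256)
best_id = max(gallery, key=lambda pid: cosine_sim(query, gallery[pid]))
print(best_id)
```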

View File

@@ -0,0 +1,165 @@
[简体中文](./pphuman_mtmct.md) | English
# Customized Multi-Target Multi-Camera Tracking Module of PP-Human
## Data Preparation
### Data Format
Multi-target multi-camera tracking, or MTMCT, is achieved by the pedestrian REID technique. It is trained as a multi-class classification model, and at inference time the features before the classification softmax head are used as the retrieval feature vectors.
Therefore its format is the same as the multi-classification task. Each pedestrian is assigned an exclusive id, which is different for different pedestrians while the same pedestrian has the same id in different images.
For example, images 0001.jpg and 0003.jpg are the same person, while 0002.jpg and 0004.jpg are different pedestrians. Then the labeled ids are:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
...
```
### Data Annotation
After understanding the meaning of the `annotation` format above, we can work on the data annotation. The essence of data annotation is that each single-person image creates an annotation item corresponding to the id assigned to that pedestrian.
For example:
For an original picture
1) Use bounding boxes to annotate the position of each person in the picture.
2) Each bounding box (corresponding to each person) contains an int id attribute. For example, the person in 0001.jpg in the above example corresponds to id: 1.
After the annotation is completed, use the detection box to crop each person into a single-person image, and associate that image with its id annotation. You can also crop first and then annotate; the result is the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps to implement: 1) organize the data and annotated data into a training format. 2) modify the configuration file to start training.
### Training data format
The training data consists of the images used for training and a training list bounding_box_train.txt, the location of which is specified in the training configuration, with the following example placement.
```
REID/
|-- data        Training image folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- bounding_box_train.txt   List of training data
```
bounding_box_train.txt file contains the names of all training images (file path relative to the root path) + 1 id annotation value
Each line represents a person's image and id annotation result. The format is as follows:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
```
Note: The images are separated from the annotated values by a Tab[\t] symbol. This format must be correct, otherwise, the parsing will fail.
### Modify the configuration to start training
First, execute the following command to download the training code (for more environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to change the following configuration items in the configuration file [softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml):
```
Head:
name: "FC"
embedding_size: *feat_dim
class_num: &class_num 751 #Total number of pedestrian ids
DataLoader:
Train:
dataset:
name: "Market1501"
image_root: ". /dataset/" #training image root path
cls_label_path: "bounding_box_train" #training_file_list
Eval:
Query:
dataset:
name: "Market1501"
image_root: ". /dataset/" #Evaluated image root path
cls_label_path: "query" #List of evaluation files
```
Note:
1. Here the image_root path + the relative path of the image in the bounding_box_train.txt corresponds to the full path where the image is stored.
Then run the following command to start the training.
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
```
After the training is completed, you may run the following commands for performance evaluation:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model \
-o Global.save_inference_dir=deploy/models/strong_baseline_inference
```
After exporting the model, download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml) file to the newly exported model folder 'strong_baseline_inference'.
Change the model path `model_dir` in the configuration file `infer_cfg_pphuman.yml` of PP-Human and enable the function by setting `enable: True`:
```
REID:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
enable: True
```
Now, the model is ready.

View File

@@ -0,0 +1,257 @@
简体中文 | [English](./ppvehicle_attribute_en.md)
# 车辆属性识别任务二次开发
## 数据准备
### 数据格式
车辆属性模型采用VeRi数据集的属性共计10种车辆颜色及9种车型, 具体如下:
```
# 车辆颜色
- "yellow"
- "orange"
- "green"
- "gray"
- "red"
- "blue"
- "white"
- "golden"
- "brown"
- "black"
# 车型
- "sedan"
- "suv"
- "van"
- "hatchback"
- "mpv"
- "pickup"
- "bus"
- "truck"
- "estate"
```
在标注文件中使用长度为19的序列来表示上述属性。
举例:
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
前10位中位序号0的值为1表示车辆颜色为`"yellow"`
后9位中位序号11的值为1表示车型为`"suv"`
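按上述规则将颜色与车型组合成19位标注序列的示意如下颜色与车型名称来自上文列表
```python
COLORS = ["yellow", "orange", "green", "gray", "red", "blue", "white",
          "golden", "brown", "black"]
TYPES = ["sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
         "truck", "estate"]

def encode_attr(color, vtype):
    # 前10位为颜色的one-hot后9位为车型的one-hot
    vec = [0] * 19
    vec[COLORS.index(color)] = 1
    vec[10 + TYPES.index(vtype)] = 1
    return vec

print(encode_attr("yellow", "suv"))
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```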
### 数据标注
理解了上面`数据格式`的含义后就可以进行数据标注的工作。其本质是每张车辆的图片建立一组长度为19的标注项分别对应各项属性值。
举例:
对于一张原始图片,
1 使用检测框,标注图片中每台车辆的位置。
2 每一个检测框对应每辆车包含一组19位的属性值数组数组的每一位以0或1表示。对应上述19个属性分类。例如如果颜色是'orange'则数组索引为1的位置值为1如果车型是'sedan'则数组索引为10的位置值为1。
标注完成后利用检测框将每辆车截取成只包含单辆车的图片则图片与19位属性标注建立了对应关系。也可先截取再进行标注效果相同。
## 模型训练
数据标注完成后,就可以拿来做模型的训练,完成自定义模型的优化工作。
其主要有两步工作需要完成1将数据与标注数据整理成训练格式。2修改配置文件开始训练。
### 训练数据格式
训练数据包括训练使用的图片和一个训练列表train.txt其具体位置在训练配置中指定其放置方式示例如下
```
Attribute/
|-- data 训练图片文件夹
| |-- 00001.jpg
| |-- 00002.jpg
| `-- 0000x.jpg
`-- train.txt 训练数据列表
```
train.txt文件内为所有训练图片名称相对于根路径的文件路径+ 19个标注值
其每一行表示一辆车的图片和标注结果。其格式为:
```
00001.jpg 0,0,1,0,....
```
注意1)图片与标注值之间是以Tab[\t]符号隔开, 2)标注值之间是以逗号[,]隔开。该格式不能错,否则解析失败。
### 修改配置开始训练
首先执行以下命令下载训练代码(更多环境问题请参考[Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```shell
git clone https://github.com/PaddlePaddle/PaddleClas
```
需要在[配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml)中,修改的配置项如下:
```yaml
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of training images
cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file
label_ratio: True
transform_ops:
...
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of evaluation images
cls_label_path: "dataset/VeRi/val_list.txt" # the location of the evaluation list file
label_ratio: True
transform_ops:
...
```
注意:
1. 这里image_root路径+train.txt中图片相对路径对应图片的完整路径位置。
2. 如果有修改属性数量,则还需修改内容配置项中属性种类数量:
```yaml
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 19 #属性种类数量
```
然后运行以下命令开始训练。
```
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
#单卡训练
python3 tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
```
训练完成后可以执行以下命令进行性能评估:
```
#多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#单卡评估
python3 tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model
```
导出模型后如果希望在PP-Vehicle中使用则需要下载[预测部署模型](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip),解压并将其中的配置文件`infer_cfg.yml`文件,放置到导出的模型文件夹`PPLCNet_x1_0_vehicle_attribute_model`中。
使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改新的模型路径`model_dir`项,并开启功能`enable: True`
```
VEHICLE_ATTR:
  model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_model/ #新导出的模型路径位置
enable: True #开启功能
```
然后即可使用。至此,即完成新增属性类别识别任务。
## 属性增减
该过程与行人属性的增减过程相似,如果需要增加、减少属性数量,则需要:
1)标注时需增加新属性类别信息或删减属性类别信息;
2)对应修改训练中train.txt所使用的属性数量和名称
3)修改训练配置,例如``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml``文件中的属性数量,详细见上述`修改配置开始训练`部分。
增加属性示例:
1. 在标注数据时在19位后继续增加新的属性标注数值
2. 在train.txt文件的标注数值中也增加新的属性数值。
3. 注意属性类型在train.txt中属性数值列表中的位置的对应关系需要固定。
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
删减属性同理。
## 修改后处理代码
修改了属性定义后pipeline后处理部分也需要做相应修改主要影响结果可视化时的显示结果。
相应代码在[文件](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/ppvehicle/vehicle_attr.py#L108)中`postprocess`函数。
其函数实现说明如下:
```python
# 在类的初始化函数中,定义了颜色/车型的名称
self.color_list = [
"yellow", "orange", "green", "gray", "red", "blue", "white",
"golden", "brown", "black"
]
self.type_list = [
"sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck",
"estate"
]
...
def postprocess(self, inputs, result):
# postprocess output of predictor
im_results = result['output']
batch_res = []
for res in im_results:
res = res.tolist()
attr_res = []
color_res_str = "Color: "
type_res_str = "Type: "
color_idx = np.argmax(res[:10]) # 前10项表示各项颜色得分取得分最大项作为颜色结果
type_idx = np.argmax(res[10:]) # 后9项表示各项车型得分取得分最大项作为车型结果
# 颜色和车型的得分都需要超过对应阈值,否则视为'UnKnown'
if res[color_idx] >= self.color_threshold:
color_res_str += self.color_list[color_idx]
else:
color_res_str += "Unknown"
attr_res.append(color_res_str)
if res[type_idx + 10] >= self.type_threshold:
type_res_str += self.type_list[type_idx]
else:
type_res_str += "Unknown"
attr_res.append(type_res_str)
batch_res.append(attr_res)
result = {'output': batch_res}
return result
```

View File

@@ -0,0 +1,271 @@
[简体中文](ppvehicle_attribute.md) | English
# Customized Vehicle Attribute Recognition
## Data Preparation
### Data Format
We use the VeRi attribute annotation format, with a total of 10 color attributes and 9 model attributes, as shown below.
```
# colors
- "yellow"
- "orange"
- "green"
- "gray"
- "red"
- "blue"
- "white"
- "golden"
- "brown"
- "black"
# models
- "sedan"
- "suv"
- "van"
- "hatchback"
- "mpv"
- "pickup"
- "bus"
- "truck"
- "estate"
```
A sequence of length 19 is used in the annotation file to represent the above attributes.
Examples:
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
In the first 10 bits, the value at index 0 is 1, indicating that the vehicle color is `"yellow"`.
In the last 9 bits, the value at index 11 of the full sequence (the second model bit) is 1, indicating that the vehicle model is `"suv"`.
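As a minimal sketch of this encoding (the two name lists below simply restate the order given in the `Data Format` section), the following Python snippet builds such a 19-value sequence from a color name and a model name:
```python
# Build a 19-value annotation sequence from a color and a vehicle model (illustrative sketch)
COLORS = ["yellow", "orange", "green", "gray", "red", "blue", "white",
          "golden", "brown", "black"]           # first 10 positions
MODELS = ["sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
          "truck", "estate"]                    # last 9 positions

def make_annotation(color, model):
    seq = [0] * (len(COLORS) + len(MODELS))     # 19 values in total
    seq[COLORS.index(color)] = 1                # color bit in the first 10 positions
    seq[len(COLORS) + MODELS.index(model)] = 1  # model bit in the last 9 positions
    return seq

print(make_annotation("yellow", "suv"))
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```
The printed sequence matches the example above.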
### Data Annotation
Once the purpose of the above `Data format` is clear, we can start annotating data. In essence, each single-vehicle image is given a set of 19 annotation values, one for each of the 19 attribute positions.
Examples:
For an original image:
1) Use bounding boxes to annotate the position of each vehicle in the image.
2) Each detection box (corresponding to one vehicle) is given 19 attribute values, each 0 or 1, matching the 19 attributes above. For example, if the color is 'orange', the bit at index 1 of the array is set to 1; if the model is 'sedan', the bit at index 10 is set to 1.
After the annotation is completed, each detection box is used to crop the vehicle into a single-vehicle picture, and that picture is paired with its 19 attribute annotation values. It is also possible to crop the single-vehicle images first and then annotate them; the result is the same.
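As a small illustration of the cropping step (the file names and box coordinates below are made up), a detection box given as `[x1, y1, x2, y2]` can be turned into a single-vehicle image like this:
```python
import cv2

# Crop one vehicle out of an annotated image (illustrative sketch; paths and box are made up)
img = cv2.imread("street_scene.jpg")
x1, y1, x2, y2 = 120, 80, 480, 360       # detection box of one vehicle
vehicle_crop = img[y1:y2, x1:x2]         # single-vehicle picture
cv2.imwrite("vehicle_00001.jpg", vehicle_crop)
# vehicle_00001.jpg is then paired with its 19 annotation values in train.txt
```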
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps: 1) Organize the images and annotations into the training format. 2) Modify the configuration file to start training.
### Training Data Format
The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example:
```
Attribute/
|-- data                       Training images folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt                  List of training data
```
The train.txt file contains the names of all training images (paths relative to the root path) followed by their 19 annotation values.
Each line of it represents a vehicle's image and annotation result. The format is as follows:
```
00001.jpg 0,0,1,0,....
```
Note: 1) the image path and the annotation values are separated by a Tab [\t]; 2) the annotation values are separated by commas [,]. If the format is wrong, parsing will fail.
### Modify The Configuration To Start Training
First run the following command to download the training code (for more environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to modify the following configuration in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`
```yaml
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of training images
cls_label_path: "dataset/VeRi/train_list.txt" # the location of the training list file
label_ratio: True
transform_ops:
...
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/VeRi/" # the root path of evaluation images
      cls_label_path: "dataset/VeRi/val_list.txt" # the location of the evaluation list file
label_ratio: True
transform_ops:
...
```
Note:
1. Here, the image_root path joined with the image's relative path in train.txt gives the full path of the image.
2. If you change the number of attributes, the number of attribute classes in the configuration also needs to be modified accordingly:
```yaml
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 19 # Number of attribute classes
```
Then run the following command to start training:
```bash
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml
```
You can run the following commands for performance evaluation after the training is completed:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_vehicle_attribute_model
```
After exporting the model, if you want to use it in PP-Vehicle, you need to download the [deploy infer model](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip), unzip it, and copy its `infer_cfg.yml` into the exported model folder `PPLCNet_x1_0_vehicle_attribute_model`.
To use the model, modify the model path `model_dir` entry and set `enable: True` in the PP-Vehicle configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
VEHICLE_ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_vehicle_attribute_infer/ #The exported model location
enable: True #Whether to enable the function
```
At this point, the customized attribute recognition task is complete.
## Adding or deleting attributes
This is similar to the process of adding or removing pedestrian attributes.
If attributes need to be added or deleted, you need to:
1) New attribute category information needs to be added or deleted when annotating the data.
2) Modify the number and name of attributes used in train.txt corresponding to the training.
3) Modify the training configuration, for example, the number of attributes in the ``PaddleClas/blob/develop/ppcls/configs/PULC/vehicle_attribute/PPLCNet_x1_0.yaml`` file, for details, please see the ``Modify configuration to start training`` section above.
Example of adding attributes:
1. When annotating the data, continue to add the new attribute annotation values after the existing 19 values.
2. Also add the new attribute values to the annotation values in the train.txt file.
3. Note that the position of each attribute in the annotation value list in train.txt must keep a fixed correspondence.
<div width="500" align="center">
<img src="../../images/add_attribute.png"/>
</div>
The same applies to the deletion of attributes.
## Modifications to post-processing code
After modifying the attribute definition, the post-processing part of the pipeline also needs to be modified accordingly, which mainly affects the display results when the results are visualized.
The code is in the `postprocess` function of [this file](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/pipeline/ppvehicle/vehicle_attr.py#L108).
The function implementation is described as follows:
```python
# The name of the color/model is defined in the initialization function of the class
self.color_list = [
"yellow", "orange", "green", "gray", "red", "blue", "white",
"golden", "brown", "black"
]
self.type_list = [
"sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus", "truck",
"estate"
]
...
def postprocess(self, inputs, result):
# postprocess output of predictor
im_results = result['output']
batch_res = []
for res in im_results:
res = res.tolist()
attr_res = []
color_res_str = "Color: "
type_res_str = "Type: "
color_idx = np.argmax(res[:10]) # The first 10 items represent the color scores, and the item with the largest score is used as the color result
type_idx = np.argmax(res[10:]) # The last 9 items represent the model scores, and the item with the largest score is used as the model result.
# The score of color and model need to be larger than the corresponding threshold, otherwise it will be regarded as 'UnKnown'
if res[color_idx] >= self.color_threshold:
color_res_str += self.color_list[color_idx]
else:
color_res_str += "Unknown"
attr_res.append(color_res_str)
if res[type_idx + 10] >= self.type_threshold:
type_res_str += self.type_list[type_idx]
else:
type_res_str += "Unknown"
attr_res.append(type_res_str)
batch_res.append(attr_res)
result = {'output': batch_res}
return result
```
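If the attribute definition changes (for example, a new color class is added), the hard-coded split `res[:10]` / `res[10:]` above must change as well. Below is a minimal, hypothetical sketch in which the split is derived from the length of the name lists, so only the lists themselves need to be edited:
```python
import numpy as np

# Hypothetical generalization: derive the score split from the name lists instead of hard-coding 10/9
color_list = ["yellow", "orange", "green", "gray", "red", "blue", "white",
              "golden", "brown", "black"]          # extend here when adding colors
type_list = ["sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
             "truck", "estate"]                    # extend here when adding types
color_threshold, type_threshold = 0.5, 0.5

def parse_attr(res):
    n_color = len(color_list)
    color_scores, type_scores = res[:n_color], res[n_color:]
    color_idx = int(np.argmax(color_scores))
    type_idx = int(np.argmax(type_scores))
    color = color_list[color_idx] if color_scores[color_idx] >= color_threshold else "Unknown"
    vtype = type_list[type_idx] if type_scores[type_idx] >= type_threshold else "Unknown"
    return ["Color: " + color, "Type: " + vtype]

print(parse_attr(np.array([0.1] * 10 + [0.9] + [0.1] * 8)))  # -> ['Color: Unknown', 'Type: sedan']
```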

View File

@@ -0,0 +1,215 @@
简体中文 | [English](./ppvehicle_plate_en.md)
# 车牌识别任务二次开发
车牌识别任务采用PP-OCRv3模型在车牌数据集上进行fine-tune得到,过程参考[PaddleOCR车牌应用介绍](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/applications/%E8%BD%BB%E9%87%8F%E7%BA%A7%E8%BD%A6%E7%89%8C%E8%AF%86%E5%88%AB.md),并在CCPD2019数据集上进行了拓展。
## 数据准备
1. 对于CCPD2019、CCPD2020数据集我们提供了处理脚本[ccpd2ocr_all.py](../../../deploy/pipeline/tools/ccpd2ocr_all.py), 使用时跟CCPD2019、CCPD2020数据集文件夹放在同一目录下然后执行脚本即可在CCPD2019/PPOCR、CCPD2020/PPOCR目录下得到检测、识别模型的训练标注文件。训练时可以整合到一起使用。
2. 对于其他来源数据或者自标注数据,可以按如下格式整理训练列表文件:
- **车牌检测标注**
标注文件格式如下,中间用'\t'分隔:
```
" 图像文件路径 标注框标注信息"
CCPD2020/xxx.jpg [{"transcription": "京AD88888", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
```
标注框标注信息是包含多个字典的list有多少个标注框就有多少个字典对应字典中的 `points` 表示车牌框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 `transcription` 表示当前文本框的文字,***当其内容为“###”时,表示该文本框无效,在训练时会跳过。***
- **车牌字符识别标注**
标注文件的格式如下,txt文件中默认将图片路径和图片标签用'\t'分割,如用其他方式分割将造成训练报错。其中图片是对车牌字符区域的截图(两种标注行的生成示例见下方代码片段)。
```
" 图像文件名 字符标注信息 "
CCPD2020/crop_imgs/xxx.jpg 京AD88888
```
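下面给出一个示意性的 Python 片段(图片路径、车牌号与坐标均为假设数据),分别按上述两种格式生成检测与识别的标注行:
```python
import json

# 车牌检测标注:图像路径 + '\t' + 标注框信息(JSON list),坐标与车牌号均为假设数据
det_label = [{
    "transcription": "京AD88888",
    "points": [[310, 104], [416, 141], [418, 216], [312, 179]],
}]
with open("det.txt", "w", encoding="utf-8") as f:
    f.write("CCPD2020/xxx.jpg" + "\t" + json.dumps(det_label, ensure_ascii=False) + "\n")

# 车牌字符识别标注:截图路径 + '\t' + 车牌字符
with open("rec.txt", "w", encoding="utf-8") as f:
    f.write("CCPD2020/crop_imgs/xxx.jpg" + "\t" + "京AD88888" + "\n")
```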
## 模型训练
首先执行以下命令clone PaddleOCR库代码到训练机器
```
git clone git@github.com:PaddlePaddle/PaddleOCR.git
```
下载预训练模型:
```
#检测预训练模型:
mkdir models
cd models
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar -xf ch_PP-OCRv3_det_distill_train.tar
#识别预训练模型:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
tar -xf ch_PP-OCRv3_rec_train.tar
cd ..
```
安装相关依赖环境:
```
cd PaddleOCR
pip install -r requirements.txt
```
然后进行训练相关配置修改。
### 修改配置
**检测模型配置项**
修改配置项包括以下3部分内容可以在训练时以命令行修改或者直接在配置文件`configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml`中修改:
1. 模型存储和训练相关:
- Global.pretrained_model: 指向前面下载的PP-OCRv3文本检测预训练模型地址
  - Global.eval_batch_step: 模型多少step评估一次,一般设置为一个epoch对应的step数,可以从训练开始的log中读取(估算方式可参考本节末尾的示意代码)。此处以[0, 772]为例,第一个数字表示从第0个step开始算起。
2. 优化器相关:
- Optimizer.lr.name: 学习率衰减器设为常量 Const
- Optimizer.lr.learning_rate: 做 fine-tune 实验学习率需要设置的比较小此处学习率设为配置文件中的0.05倍
- Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关:
- Train.dataset.data_dir指向训练集图片存放根目录
- Train.dataset.label_file_list指向训练集标注文件
- Eval.dataset.data_dir指向测试集图片存放根目录
- Eval.dataset.label_file_list指向测试集标注文件
**识别模型配置项**
修改配置项包括以下3部分内容可以在训练时以命令行修改或者直接在配置文件`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml`中修改:
1. 模型存储和训练相关:
- Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址
  - Global.eval_batch_step: 模型多少step评估一次,一般设置为一个epoch对应的step数,可以从训练开始的log中读取。此处以[0, 90]为例,第一个数字表示从第0个step开始算起。
2. 优化器相关
- Optimizer.lr.name: 学习率衰减器设为常量 Const
- Optimizer.lr.learning_rate: 做 fine-tune 实验学习率需要设置的比较小此处学习率设为配置文件中的0.05倍
- Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关
- Train.dataset.data_dir指向训练集图片存放根目录
- Train.dataset.label_file_list指向训练集标注文件
- Eval.dataset.data_dir指向测试集图片存放根目录
- Eval.dataset.label_file_list指向测试集标注文件
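上文提到 Global.eval_batch_step 一般设置为一个 epoch 对应的 step 数,下面给出一个示意性的估算片段(训练样本数、卡数和单卡 batch size 均为假设值,请按实际数据集与配置替换):
```python
import math

# 估算一个 epoch 对应的 step 数(数值均为假设)
num_train_samples = 98765      # 训练集样本数(假设值)
batch_size_per_card = 16       # 配置文件中的单卡 batch size(假设值)
num_cards = 8                  # 训练使用的 GPU 数(假设值)

steps_per_epoch = math.ceil(num_train_samples / (batch_size_per_card * num_cards))
print(steps_per_epoch)         # 可将 Global.eval_batch_step 设为 [0, steps_per_epoch]
```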
### 执行训练
然后运行以下命令开始训练。如果在配置文件中已经做了修改,可以省略`-o`及其后面的内容。
**检测模型训练命令**
```
#单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_model_dir=output/CCPD/det \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data/ \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt]
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
Global.save_model_dir=output/CCPD/det \
Global.eval_batch_step="[0, 772]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data/ \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/det.txt]
```
训练完成后可以执行以下命令进行性能评估:
```
#单卡评估
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/det.txt]
```
**识别模型训练命令**
```
#单卡训练
python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_model_dir=output/CCPD/rec/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/ccpd_data \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
#多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
Global.save_model_dir=output/CCPD/rec/ \
Global.eval_batch_step="[0, 90]" \
Optimizer.lr.name=Const \
Optimizer.lr.learning_rate=0.0005 \
Optimizer.lr.warmup_epoch=0 \
Train.dataset.data_dir=/home/aistudio/ccpd_data \
Train.dataset.label_file_list=[/home/aistudio/ccpd_data/train/rec.txt] \
Eval.dataset.data_dir=/home/aistudio/ccpd_data \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
```
训练完成后可以执行以下命令进行性能评估:
```
#单卡评估
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Eval.dataset.data_dir=/home/aistudio/ccpd_data/ \
Eval.dataset.label_file_list=[/home/aistudio/ccpd_data/test/rec.txt]
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
**检测模型导出**
```
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
Global.save_inference_dir=output/det/infer
```
**识别模型导出**
```
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
Global.save_inference_dir=output/CCPD/rec/infer
```
使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改`VEHICLE_PLATE`模块中的`det_model_dir`、`rec_model_dir`项,并开启功能`enable: True`:
```
VEHICLE_PLATE:
det_model_dir: [YOUR_DET_INFERENCE_MODEL_PATH] #设置检测模型路径
det_limit_side_len: 736
det_limit_type: "max"
rec_model_dir: [YOUR_REC_INFERENCE_MODEL_PATH] #设置识别模型路径
rec_image_shape: [3, 48, 320]
rec_batch_num: 6
word_dict_path: deploy/pipeline/ppvehicle/rec_word_dict.txt
enable: True #开启功能
```
然后即可使用。至此即完成车牌识别模型的更新任务。

View File

@@ -0,0 +1,235 @@
简体中文 | [English](./ppvehicle_violation_en.md)
# 车辆违章任务二次开发
车辆违章任务的二次开发主要集中于车道线分割模型。采用PP-LiteSeg模型在车道线数据集bdd100k上进行fine-tune得到,过程参考[PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md)。
## 数据准备
ppvehicle违法分析将车道线类别分为4类
```
0 背景
1 双黄线
2 实线
3 虚线
```
1. 对于bdd100k数据集,可以结合我们提供的处理脚本[lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py)和bdd100k官方[repo](https://github.com/bdd100k/bdd100k),将数据处理成分割所需的数据格式。
```
#首先执行以下命令clone bdd100k库
git clone https://github.com/bdd100k/bdd100k.git
#拷贝lane_to_mask.py到bdd100k目录
cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/
#准备bdd100k环境
cd bdd100k && pip install -r requirements.txt
#数据转换
python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path
# -i bdd100k数据集label的json路径
# -o 生成的mask图像路径
```
2. 整理数据,按如下格式存放数据
```
dataset_root
|
|--images
| |--train
| |--image1.jpg
| |--image2.jpg
| |--...
| |--val
| |--image3.jpg
| |--image4.jpg
| |--...
| |--test
| |--image5.jpg
| |--image6.jpg
| |--...
|
|--labels
| |--train
| |--label1.jpg
| |--label2.jpg
| |--...
| |--val
| |--label3.jpg
| |--label4.jpg
| |--...
| |--test
| |--label5.jpg
| |--label6.jpg
| |--...
|
```
运行[create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py)生成txt文件
```
python create_dataset_list.py <dataset_root> #数据根目录
--type custom #数据类型支持cityscapes、custom
```
其他数据以及数据标注可参考PaddleSeg[准备自定义数据集](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md)
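在开始训练之前,可以用下面的示意性 Python 片段(mask 路径与调色板颜色均为假设)将转换得到的 4 类车道线 mask 渲染成彩色图,便于人工抽查标注质量:
```python
import cv2
import numpy as np

# 将 4 类车道线 mask(像素值 0~3)渲染为彩色图,便于人工检查(示意代码,路径与颜色均为假设值)
palette = np.array([
    [0, 0, 0],        # 0 背景
    [0, 255, 255],    # 1 双黄线
    [255, 255, 255],  # 2 实线
    [128, 128, 128],  # 3 虚线
], dtype=np.uint8)

mask = cv2.imread("label1.png", cv2.IMREAD_GRAYSCALE)  # 假设的 mask 路径
color_mask = palette[mask]                             # (H, W) -> (H, W, 3)
cv2.imwrite("label1_vis.png", color_mask)
```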
## 模型训练
首先执行以下命令clone PaddleSeg库代码到训练机器
```
git clone https://github.com/PaddlePaddle/PaddleSeg.git
```
安装相关依赖环境:
```
cd PaddleSeg
pip install -r requirements.txt
```
### 准备配置文件
详细可参考PaddleSeg[准备配置文件](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md).
本例以pp_liteseg_stdc2_bdd100k_1024x512.yml为例:
```
batch_size: 16
iters: 50000
train_dataset:
type: Dataset
dataset_root: data/bdd100k #数据集路径
train_path: data/bdd100k/train.txt #数据集训练txt文件
num_classes: 4 #ppvehicle将道路分为4类
mode: train
transforms:
- type: ResizeStepScaling
min_scale_factor: 0.5
max_scale_factor: 2.0
scale_step_size: 0.25
- type: RandomPaddingCrop
crop_size: [512, 1024]
- type: RandomHorizontalFlip
- type: RandomAffine
- type: RandomDistort
brightness_range: 0.5
contrast_range: 0.5
saturation_range: 0.5
- type: Normalize
val_dataset:
type: Dataset
dataset_root: data/bdd100k #数据集路径
val_path: data/bdd100k/val.txt #数据集验证集txt文件
num_classes: 4
mode: val
transforms:
- type: Normalize
optimizer:
type: sgd
momentum: 0.9
weight_decay: 4.0e-5
lr_scheduler:
type: PolynomialDecay
learning_rate: 0.01 #0.01
end_lr: 0
power: 0.9
loss:
types:
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
coef: [1, 1,1]
model:
type: PPLiteSeg
backbone:
type: STDC2
pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #预训练模型
```
### 执行训练
```
#单卡训练
export CUDA_VISIBLE_DEVICES=0 # Linux上设置1张可用的卡
# set CUDA_VISIBLE_DEVICES=0 # Windows上设置1张可用的卡
python train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
### 训练参数解释
```
--do_eval 是否在保存模型时启动评估, 启动时将会根据mIoU保存最佳模型至best_model
--use_vdl 是否开启visualdl记录训练数据
--save_interval 500 模型保存的间隔步数
--save_dir output 模型输出路径
```
### 多卡训练
如果想使用多卡训练,需要将环境变量CUDA_VISIBLE_DEVICES指定为多卡(不指定时默认使用所有的gpu),并使用paddle.distributed.launch启动训练脚本(Windows下由于不支持nccl,无法使用多卡训练):
```
export CUDA_VISIBLE_DEVICES=0,1,2,3 # 设置4张可用的卡
python -m paddle.distributed.launch train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
训练完成后可以执行以下命令进行性能评估:
```
#单卡评估
python val.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams
```
### 模型导出
使用下述命令将训练好的模型导出为预测部署模型。
```
python export.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams \
--save_dir output/inference_model
```
使用时在PP-Vehicle中的配置文件`./deploy/pipeline/config/infer_cfg_ppvehicle.yml`中修改`LANE_SEG`模块中的`model_dir`项.
```
LANE_SEG:
lane_seg_config: deploy/pipeline/config/lane_seg_config.yml
model_dir: output/inference_model
```
然后即可使用。至此即完成车道线分割模型的更新任务。

View File

@@ -0,0 +1,240 @@
English | [简体中文](./ppvehicle_violation.md)
# Customized Vehicle Violation
The customization of the vehicle violation task mainly focuses on the lane line segmentation model. The PP-LiteSeg model is fine-tuned on the lane line dataset bdd100k; the process is described in [PP-LiteSeg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/configs/pp_liteseg/README.md).
## Data preparation
ppvehicle violation analysis divides lane lines into 4 categories:
```
0 background
1 double yellow line
2 solid line
3 dashed line
```
1. For the bdd100k dataset, you can use the processing script we provide, [lane_to_mask.py](../../../deploy/pipeline/tools/lane_to_mask.py), together with the official bdd100k [repo](https://github.com/bdd100k/bdd100k), to convert the data into the format required for segmentation.
```
# clone bdd100k
git clone https://github.com/bdd100k/bdd100k.git
# copy lane_to_mask.py to bdd100k/
cp PaddleDetection/deploy/pipeline/tools/lane_to_mask.py bdd100k/
# preparation bdd100k env
cd bdd100k && pip install -r requirements.txt
#bdd100k to mask
python lane_to_mask.py -i dataset/labels/lane/polygons/lane_train.json -o /output_path
# -i is the path of the bdd100k lane label json
# -o is the output path for the generated mask images
```
2. Organize data and store data in the following format:
```
dataset_root
|
|--images
| |--train
| |--image1.jpg
| |--image2.jpg
| |--...
| |--val
| |--image3.jpg
| |--image4.jpg
| |--...
| |--test
| |--image5.jpg
| |--image6.jpg
| |--...
|
|--labels
| |--train
| |--label1.jpg
| |--label2.jpg
| |--...
| |--val
| |--label3.jpg
| |--label4.jpg
| |--...
| |--test
| |--label5.jpg
| |--label6.jpg
| |--...
|
```
Run [create_dataset_list.py](../../../deploy/pipeline/tools/create_dataset_list.py) to generate the txt list files:
```
python create_dataset_list.py <dataset_root> # dataset root path
    --type custom # dataset type, supports cityscapes and custom
```
For other data and data annotation, please refer to PaddleSeg [Prepare Custom Datasets](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/data/marker/marker_cn.md)
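If you prefer to build the list files yourself instead of using create_dataset_list.py, the sketch below shows the idea (it assumes the standard PaddleSeg list format of `image_path label_path` per line, that image and label files share the same base name, and that labels use the .png extension; all of these are assumptions to adapt to your data):
```python
import os

# Minimal sketch: build train.txt in the "image_path label_path" format used by PaddleSeg's Dataset type
dataset_root = "dataset_root"
image_dir = os.path.join(dataset_root, "images/train")
label_dir = os.path.join(dataset_root, "labels/train")

with open(os.path.join(dataset_root, "train.txt"), "w") as f:
    for name in sorted(os.listdir(image_dir)):
        stem = os.path.splitext(name)[0]
        image_path = os.path.join("images/train", name)
        label_path = os.path.join("labels/train", stem + ".png")  # label extension is an assumption
        f.write(f"{image_path} {label_path}\n")
```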
## Model Training
First, clone the PaddleSeg repository:
```
git clone https://github.com/PaddlePaddle/PaddleSeg.git
```
Install the dependencies:
```
cd PaddleSeg
pip install -r requirements.txt
```
### Prepare configuration file
For details, please refer to PaddleSeg [prepare configuration file](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.7/docs/config/pre_config_cn.md).
Example: pp_liteseg_stdc2_bdd100k_1024x512.yml
```
batch_size: 16
iters: 50000
train_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
train_path: data/bdd100k/train.txt #dataset train txt
num_classes: 4 #lane classes
mode: train
transforms:
- type: ResizeStepScaling
min_scale_factor: 0.5
max_scale_factor: 2.0
scale_step_size: 0.25
- type: RandomPaddingCrop
crop_size: [512, 1024]
- type: RandomHorizontalFlip
- type: RandomAffine
- type: RandomDistort
brightness_range: 0.5
contrast_range: 0.5
saturation_range: 0.5
- type: Normalize
val_dataset:
type: Dataset
dataset_root: data/bdd100k #dataset path
val_path: data/bdd100k/val.txt #dataset val txt
num_classes: 4
mode: val
transforms:
- type: Normalize
optimizer:
type: sgd
momentum: 0.9
weight_decay: 4.0e-5
lr_scheduler:
type: PolynomialDecay
learning_rate: 0.01 #0.01
end_lr: 0
power: 0.9
loss:
types:
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
- type: MixedLoss
losses:
- type: CrossEntropyLoss
- type: LovaszSoftmaxLoss
coef: [0.6, 0.4]
coef: [1, 1,1]
model:
type: PPLiteSeg
backbone:
type: STDC2
pretrained: https://bj.bcebos.com/paddleseg/dygraph/PP_STDCNet2.tar.gz #Pre-training model
```
### Run Training
```
#Single GPU training
export CUDA_VISIBLE_DEVICES=0 # Linux
# set CUDA_VISIBLE_DEVICES=0 # Windows
python train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
### Explanation of training parameters
```
--do_eval Whether to run evaluation while saving the model. If enabled, the best model (selected by mIoU) is saved to best_model
--use_vdl Whether to enable visualdl to record training data
--save_interval 500 Number of steps between model saving
--save_dir output Model output path
```
### Multi-GPU Training
If you want to use multi-GPU training, set the environment variable CUDA_VISIBLE_DEVICES to multiple GPUs (if not specified, all GPUs are used by default) and start the training script with paddle.distributed.launch (since nccl is not supported on Windows, multi-GPU training is not available there):
```
export CUDA_VISIBLE_DEVICES=0,1,2,3 # 4 gpus
python -m paddle.distributed.launch train.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--do_eval \
--use_vdl \
--save_interval 500 \
--save_dir output
```
After training, you can execute the following commands for performance evaluation:
```
python val.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams
```
### Model export
Use the following command to export the trained model as a prediction deployment model.
```
python export.py \
--config configs/pp_liteseg/pp_liteseg_stdc2_bdd100k_1024x512.yml \
--model_path output/iter_1000/model.pdparams \
--save_dir output/inference_model
```
When using the model in PP-Vehicle, set the `model_dir` entry of the `LANE_SEG` module in the configuration file `./deploy/pipeline/config/infer_cfg_ppvehicle.yml`:
```
LANE_SEG:
lane_seg_config: deploy/pipeline/config/lane_seg_config.yml
model_dir: output/inference_model
```
Then it can be used. At this point, the task of updating the lane line segmentation model is complete.

View File

@@ -0,0 +1,159 @@
# Using OpenVINO for Inference
## Introduction
PaddleDetection has been a vibrant open-source project and has a large number of contributors and maintainers around it. It is an AI framework which enables developers to quickly integrate AI capabilities into their own projects and applications.
Intel OpenVINO is a widely used free toolkit. It facilitates the optimization of a deep learning model from a framework and deployment using an inference engine onto Intel hardware.
Clearly, the upstream (Paddle) and the downstream (Intel OpenVINO) can work together to streamline and simplify the process of developing an AI model and deploying it onto hardware, which, in turn, makes our lives easier.
This article will show you how to use a PaddleDetection model [FairMOT](../../../configs/mot/fairmot/README.md) from the Model Zoo in PaddleDetection and use it with OpenVINO to do the inference.
------------
## Prerequisites
This article is not an entry-level introduction that helps you set up everything; in order to focus on its main purpose, the environment setup instructions are kept to a minimum, and links to the respective official websites are provided instead.
Before we can do anything, please make sure you have PaddlePaddle environment set up.
```
conda install paddlepaddle==2.2.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
Please also download the converted [ONNX format of FairMOT](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_576_320_v3.onnx)
## Export the PaddleDetection Model to ONNX format
1. Download the [FairMOT Inference Model](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar)
2. Using Paddle2ONNX to convert the model
Make sure you have the [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) installed
```
paddle2onnx --model_dir . --model_filename model.pdmodel \
--params_filename model.pdiparams \
--input_shape_dict "{'image': [1, 3, 320, 576], 'scale_factor': [1, 2], 'im_shape': [1, 2]}" \
--save_file fairmot_576_320_v2.onnx \
--opset_version 12 \
--enable_onnx_checker True
```
For more details about how to convert Paddle models to ONNX, please see [Export ONNX Model](../../../deploy/EXPORT_ONNX_MODEL_en.md).
## Use the ONNX model for inference
Once the Paddle model has been converted to ONNX format, we can then use it with OpenVINO inference engine to do the prediction.
*<sub>Please make sure you have the OpenVINO installed, here is the [instruction for installation](https://docs.openvino.ai/cn/latest/openvino_docs_install_guides_installing_openvino_linux.html).<sub>*
1. ### Get the execution network
The first thing to do here is to get an execution network which can be used later to do the inference.
Here is the code.
```
def get_net():
ie = IECore()
model_path = root_path / "PaddleDetection/FairMot/fairmot_576_320_v3.onnx"
net = ie.read_network(model= str(model_path))
exec_net = ie.load_network(network=net, device_name="CPU")
return net, exec_net
```
2. ### Preprocessing
Every AI model has its own preprocessing steps; let's have a look at how to do it for the FairMOT model:
```
def prepare_input():
transforms = [
T.Resize(target_size=(target_width, target_height)),
T.Normalize(mean=(0,0,0), std=(1,1,1))
]
img_file = root_path / "images/street.jpeg"
img = cv2.imread(str(img_file))
normalized_img, _ = T.Compose(transforms)(img)
# add an new axis in front
img_input = normalized_img[np.newaxis, :]
# scale_factor is calculated as: im_shape / original_im_shape
h_scale = target_height / img.shape[0]
w_scale = target_width / img.shape[1]
input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]}
return input, img
```
3. ### Prediction
After loading the network and preprocessing the input, we finally come to the prediction stage.
```
def predict(exec_net, input):
result = exec_net.infer(input)
return result
```
You might be surprised that this exciting stage is so small. Hang on, the next stage is a big one again.
4. ### Post-processing
MOT (Multi-Object Tracking) is special: unlike other AI models, which only need a few post-processing steps, FairMOT requires a special object called a tracker to handle the prediction results, which consist of predicted detections and predicted embeddings.
Luckily, PaddleDetection has made this procedure easy for us: it exposes the JDETracker from `ppdet`, so we do not need to write much code to handle it.
```
def postprocess(pred_dets, pred_embs, threshold = 0.5):
tracker = JDETracker()
online_targets_dict = tracker.update(pred_dets, pred_embs)
online_tlwhs = defaultdict(list)
online_scores = defaultdict(list)
online_ids = defaultdict(list)
for cls_id in range(1):
online_targets = online_targets_dict[cls_id]
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
tscore = t.score
# make sure the tscore is no less then the threshold.
if tscore < threshold: continue
# make sure the target area is not less than the min_box_area.
if tlwh[2] * tlwh[3] <= tracker.min_box_area:
continue
# make sure the vertical ratio of a found target is within the range (1.6 as default ratio).
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio:
continue
online_tlwhs[cls_id].append(tlwh)
online_ids[cls_id].append(tid)
online_scores[cls_id].append(tscore)
online_im = plot_tracking_dict(
img,
1,
online_tlwhs,
online_ids,
online_scores,
frame_id=0)
return online_im
```
5. ### Plot the detections (Optional)
This step is optional. For demo purposes, I just use the `plot_tracking_dict()` method to draw all bounding boxes on the image. You do not need to do this if you don't have the same requirement.
```
online_im = plot_tracking_dict(
img,
1,
online_tlwhs,
online_ids,
online_scores,
frame_id=0)
```
So these are all the steps you need to follow in order to run FairMOT on your machine.
A companion article explaining this procedure in detail will be released soon, and a link to it will be added here.
To see the full code, please take a look at [Paddle OpenVINO Prediction](./fairmot_onnx_openvino.py).

View File

@@ -0,0 +1,157 @@
# 将FairMOT模型导出为ONNX格式,并用OpenVINO做推理
## 简介
PaddleDetection是一个充满活力的开源项目,拥有大量的贡献者和维护者。它是PaddlePaddle下的一个物体检测工具集,能够帮助开发人员快速地将AI能力集成到自己的项目和应用程序中。
Intel OpenVINO 是一个广泛使用的免费工具包。 它能帮助优化深度学习模型,并使用推理引擎将其部署到英特尔硬件上。
很显然,当上游(PaddlePaddle)和下游(Intel OpenVINO)协同工作时,可以极大地简化AI模型从开发到部署的工作流程,这也让我们的生活更轻松。
本文将向您展示如何在 PaddleDetection 中使用 Model Zoo 中的FairMOT模型 [FairMOT](../../../configs/mot/fairmot/README.md) 并用OpenVINO来实现推理过程。
------------
## 前提要求
为了专注于介绍如何在OpenVINO中使用飞桨模型这一主题,本文不是一篇入门级文章,不会帮助您搭建完整的开发环境,只会提供最核心的组件安装,并为每个需要用到的组件提供相应的官方链接。
在开始之前 请确保您已经安装了 PaddlePaddle.
```
conda install paddlepaddle==2.2.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
```
为了运行演示程序, 您还需要下载已经转换好了的[ONNX格式的FairMOT模型](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_576_320_v3.onnx).
## 将FairMOT模型导出为ONNX格式
1. 下载[FairMOT推理模型](https://bj.bcebos.com/v1/paddledet/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_576x320.tar).
2. 使用Paddle2ONNX来转换FairMOT模型.
请确保您已经安装了[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
```
paddle2onnx --model_dir . --model_filename model.pdmodel \
--params_filename model.pdiparams \
--input_shape_dict "{'image': [1, 3, 320, 576], 'scale_factor': [1, 2], 'im_shape': [1, 2]}" \
--save_file fairmot_576_320_v2.onnx \
--opset_version 12 \
--enable_onnx_checker True
```
更多关于如何使用Paddle2ONNX的详细信息, 请参考: [ONNX模型导出](../../../deploy/EXPORT_ONNX_MODEL_en.md).
## 使用ONNX模型以及OpenVINO进行推理
当我们把Paddle模型转换成ONNX格式之后,就可以直接使用OpenVINO读取该模型并进行推理。
*<sub>请确保您已经安装了OpenVINO, 这里是[OpenVINO的安装指南](https://docs.openvino.ai/cn/latest/openvino_docs_install_guides_installing_openvino_linux.html).<sub>*
1. ### 创建一个execution network
所以这里要做的第一件事是获得一个执行网络,以后可以使用它来进行推理。
代码如下:
```
def get_net():
ie = IECore()
model_path = root_path / "PaddleDetection/FairMot/fairmot_576_320_v3.onnx"
net = ie.read_network(model= str(model_path))
exec_net = ie.load_network(network=net, device_name="CPU")
return net, exec_net
```
2. ### 预处理
每个 AI 模型都有自己不同的预处理步骤,让我们看看 FairMOT 模型是如何做的:
```
def prepare_input():
transforms = [
T.Resize(target_size=(target_width, target_height)),
T.Normalize(mean=(0,0,0), std=(1,1,1))
]
img_file = root_path / "images/street.jpeg"
img = cv2.imread(str(img_file))
normalized_img, _ = T.Compose(transforms)(img)
# add an new axis in front
img_input = normalized_img[np.newaxis, :]
# scale_factor is calculated as: im_shape / original_im_shape
h_scale = target_height / img.shape[0]
w_scale = target_width / img.shape[1]
input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]}
return input, img
```
3. ### 预测
在我们完成了所有的负载网络和预处理之后,终于开始了预测阶段。
```
def predict(exec_net, input):
result = exec_net.infer(input)
return result
```
您可能会惊讶地看到, 最激动人心的步骤居然如此简单。 不过下一个阶段会更加复杂。
4. ### 后处理
相较于大多数其他类型的AI推理,MOT(Multi-Object Tracking)显然是特殊的。FairMOT 需要一个称为跟踪器(tracker)的特殊对象来处理预测结果,这个预测结果包括预测检测框和预测的行人特征向量。
幸运的是PaddleDetection 为我们简化了这个过程,我们可以从`ppdet`导出JDETracker然后用这个tracker挑选出来符合条件的检测框,而且我们不需要编写太多代码来处理它。
```
def postprocess(pred_dets, pred_embs, threshold = 0.5):
tracker = JDETracker()
online_targets_dict = tracker.update(pred_dets, pred_embs)
online_tlwhs = defaultdict(list)
online_scores = defaultdict(list)
online_ids = defaultdict(list)
for cls_id in range(1):
online_targets = online_targets_dict[cls_id]
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
tscore = t.score
# make sure the tscore is no less then the threshold.
if tscore < threshold: continue
# make sure the target area is not less than the min_box_area.
if tlwh[2] * tlwh[3] <= tracker.min_box_area:
continue
# make sure the vertical ratio of a found target is within the range (1.6 as default ratio).
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio:
continue
online_tlwhs[cls_id].append(tlwh)
online_ids[cls_id].append(tid)
online_scores[cls_id].append(tscore)
online_im = plot_tracking_dict(
img,
1,
online_tlwhs,
online_ids,
online_scores,
frame_id=0)
return online_im
```
5. ### 画出检测框(可选)
这一步是可选的。出于演示目的,我只使用 `plot_tracking_dict()` 方法在图像上绘制所有边界框。 但是,如果您没有相同的要求,则不需要这样做。
```
online_im = plot_tracking_dict(
img,
1,
online_tlwhs,
online_ids,
online_scores,
frame_id=0)
```
这些就是在您的硬件上运行 FairMOT 所需要遵循的所有步骤。
之后会有一篇详细解释此过程的配套文章将会发布,并且该文章的链接将很快在此处更新。
完整代码请查看 [Paddle OpenVINO 预测](./fairmot_onnx_openvino.py).

View File

@@ -0,0 +1,104 @@
from collections import defaultdict
from pathlib import Path
import cv2
import numpy as np
import paddle.vision.transforms as T
from openvino.inference_engine import IECore
from ppdet.modeling.mot.tracker import JDETracker
from ppdet.modeling.mot.visualization import plot_tracking_dict
root_path = Path(__file__).parent
target_height = 320
target_width = 576
# -------------------------------
def get_net():
ie = IECore()
model_path = root_path / "fairmot_576_320_v3.onnx"
net = ie.read_network(model= str(model_path))
exec_net = ie.load_network(network=net, device_name="CPU")
return net, exec_net
def get_output_names(net):
output_names = [key for key in net.outputs]
return output_names
def prepare_input():
transforms = [
T.Resize(size=(target_height, target_width)),
T.Normalize(mean=(0,0,0), std=(1,1,1), data_format='HWC', to_rgb= True),
T.Transpose()
]
img_file = root_path / "street.jpeg"
img = cv2.imread(str(img_file))
normalized_img = T.Compose(transforms)(img)
normalized_img = normalized_img.astype(np.float32, copy=False) / 255.0
# add an new axis in front
img_input = normalized_img[np.newaxis, :]
# scale_factor is calculated as: im_shape / original_im_shape
h_scale = target_height / img.shape[0]
w_scale = target_width / img.shape[1]
input = {"image": img_input, "im_shape": [target_height, target_width], "scale_factor": [h_scale, w_scale]}
return input, img
def predict(exec_net, input):
result = exec_net.infer(input)
return result
def postprocess(pred_dets, pred_embs, threshold = 0.5):
tracker = JDETracker()
online_targets_dict = tracker.update(pred_dets, pred_embs)
online_tlwhs = defaultdict(list)
online_scores = defaultdict(list)
online_ids = defaultdict(list)
for cls_id in range(1):
online_targets = online_targets_dict[cls_id]
for t in online_targets:
tlwh = t.tlwh
tid = t.track_id
tscore = t.score
# make sure the tscore is no less then the threshold.
if tscore < threshold: continue
# make sure the target area is not less than the min_box_area.
if tlwh[2] * tlwh[3] <= tracker.min_box_area:
continue
# make sure the vertical ratio of a found target is within the range (1.6 as default ratio).
if tracker.vertical_ratio > 0 and tlwh[2] / tlwh[3] > tracker.vertical_ratio:
continue
online_tlwhs[cls_id].append(tlwh)
online_ids[cls_id].append(tid)
online_scores[cls_id].append(tscore)
online_im = plot_tracking_dict(
img,
1,
online_tlwhs,
online_ids,
online_scores,
frame_id=0)
return online_im
# -------------------------------
net, exec_net = get_net()
output_names = get_output_names(net)
del net
input, img = prepare_input()
result = predict(exec_net, input)
pred_dets = result[output_names[0]]
pred_embs = result[output_names[1]]
processed_img = postprocess(pred_dets, pred_embs)
tracked_img_file_path = root_path / "tracked.jpg"
cv2.imwrite(str(tracked_img_file_path), processed_img)

View File

@@ -0,0 +1,4 @@
numpy
opencv-python
openvino == 2021.4.0
paddledet==2.3.0


View File

@@ -0,0 +1,33 @@
# Contributing to PaddleDetection
PaddleDetection非常欢迎你加入到飞桨社区的开源建设中,你可以通过以下方式参与贡献:
- 新建一个 ISSUE 来反馈 bug
- 新建一个 ISSUE 来提出新功能需求、建议、疑问
- 提 PR 来修复一个 bug
- 提 PR 来实现一个新功能
同时我们也会组织专项活动,引导大家参与到PaddleDetection的开发中:
- [Yes, PP-YOLOE! 基于PP-YOLOE的算法开发](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)
## 贡献指南
提ISSUE、PR的步骤请参考[飞桨官网-贡献指南-代码贡献流程](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/code_contributing_path_cn.html)
## 开发者
我们非常欢迎你为PaddleDetection贡献代码,也十分感谢你的反馈。
- 感谢[Mandroide](https://github.com/Mandroide)清理代码并且统一部分函数接口。
- 感谢[FL77N](https://github.com/FL77N/)贡献`Sparse-RCNN`模型。
- 感谢[Chen-Song](https://github.com/Chen-Song)贡献`Swin Faster-RCNN`模型。
- 感谢[yangyudong](https://github.com/yangyudong2020), [hchhtc123](https://github.com/hchhtc123) 开发PP-Tracking GUI界面
- 感谢Shigure19 开发PP-TinyPose健身APP
- 感谢[manangoel99](https://github.com/manangoel99)贡献Wandb可视化方式
- 感谢百度ACG政务产品部统管通办研发组视觉研发团队贡献PP-YOLOE蒸馏方案
非常感谢大家为飞桨贡献!共建飞桨繁荣社区!

View File

@@ -0,0 +1,80 @@
# [Contribute to PaddleDetection] Yes, PP-YOLOE! 基于PP-YOLOE的算法开发
本期活动联系人:[thinkthinking](https://github.com/thinkthinking)
## 建设目标
[PP-YOLOE+](../../configs/ppyoloe)是百度飞桨团队开源的最新SOTA通用检测模型,COCO数据集精度达54.7 mAP,其L版本相比YOLOv7精度提升1.9%,V100端到端(包含前后处理)推理速度达42.2FPS。
我们鼓励大家基于PP-YOLOE去做新的算法开发比如
- 改造PP-YOLOE适用于旋转框、小目标、关键点检测、实例分割等场景
- 精调PP-YOLOE用于工业质检、火灾检测、垃圾检测等垂类场景
- 将PP-YOLOE用于PP-Human、PP-Vehicle等Pipeline中提升pipeline的检测效果。
相信通过这些活动,大家可以对PP-YOLOE的细节有更深刻的理解,对业务场景的应用也可以做更细致的适配。
## 参与方式
- **方式一****列表选题**,见招募列表(提供了选题方向、题目、优秀的对标项目、文章和代码,以供学习)。
- **方式二****自选题目**,对于非参考列表内的题目,可自主命题,需要与负责人 [thinkthinking](https://github.com/thinkthinking)讨论后决定题目。
## 题目认领
为避免重复选题、知晓任务状态、方便统计管理,请根据如下操作认领您的题目。
在本issue提交题目:[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)
* 方式一(列表选题):在“招募列表”中选择题目,并在[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)中,回复下列信息:
```
【列表选题】
编号XX
题目XXXX
认领人XX
```
* 方式二(自选题目):自主命题,直接在 [issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345) 中,回复下列信息:
```
【自选题目】
题目XXXX
认领人XX
```
## 招募列表
| 序号 | 类型 | 题目 | 难度 | 参考 | 认领人 |
| :--- | :------- | :-------------------------- | :--- | :-------------------------------------------------------------------------------- | :----- |
| 01 | 模型改造 | PP-YOLOE用于旋转框检测 | 高 | https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/rotate | ---- |
| 02 | 模型改造 | PP-YOLOE用于小目标检测 | 高 | https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/smalldet | ---- |
| 03 | 模型改造 | PP-YOLOE用于关键点检测 | 高 | https://github.com/WongKinYiu/yolov7/tree/pose | ---- |
| 04 | 模型改造 | PP-YOLOE用于实例分割 | 高 | https://github.com/WongKinYiu/yolov7/tree/mask | ---- |
| 05 | 垂类应用 | 基于PP-YOLOE的缺陷检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2367089 | ---- |
| 06 | 垂类应用 | 基于PP-YOLOE的行为检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2500639 | ---- |
| 07 | 垂类应用 | 基于PP-YOLOE的异物检测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0 | ---- |
| 08 | 垂类应用 | 基于PP-YOLOE的安全监测 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/2503301?channelType=0&channel=0 | ---- |
| 09 | Pipeline | PP-YOLOE-->PP-Human大升级 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/4606001 | ---- |
| 10 | Pipeline | PP-YOLOE-->PP-Vehicle大升级 | 中 | https://aistudio.baidu.com/aistudio/projectdetail/4512254 | ---- |
<mark>【注意】招募列表外的,欢迎开发者联系活动负责人[thinkthinking](https://github.com/thinkthinking)提交贡献👏 <mark>
## 贡献指南
1. 提ISSUE、PR的步骤请参考[飞桨官网-贡献指南-代码贡献流程](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/code_contributing_path_cn.html)
2. AI-Studio使用指南请参考[AI-Studio新手指南](https://ai.baidu.com/ai-doc/AISTUDIO/Tk39ty6ho)
## 原则及注意事项
1. <u>需</u>使用PaddlePaddle框架, 建议复用PaddleDetection代码。
2. <u>建议使用</u>[Paddle框架最新版本](https://www.paddlepaddle.org.cn/).
3. <u>PR</u>需提到[PaddleDetection-develop](https://github.com/PaddlePaddle/PaddleDetection/tree/develop)分支。
4. 模型改造类的任务建议以<u>PR形式</u>提交
5. 垂类应用以及Pipeline类的任务建议以<u>AI-Studio项目形式</u>提交,项目会同步到[产业范例页面](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/industrial_tutorial/README.md)
## 还有不清楚的问题
欢迎大家随时在本[issue](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)下提问,飞桨会有专门的管理员进行疑问解答。
有任何问题,请联系本期活动联系人 [thinkthinking](https://github.com/thinkthinking)
非常感谢大家为飞桨贡献!共建飞桨繁荣社区!

View File

@@ -0,0 +1,408 @@
简体中文 | [English](PaddleYOLO_MODEL_en.md)
# [**PaddleYOLO**](https://github.com/PaddlePaddle/PaddleYOLO)
## 内容
- [**PaddleYOLO**](#paddleyolo)
- [内容](#内容)
- [简介](#简介)
- [更新日志](#更新日志)
- [模型库](#模型库)
- [PP-YOLOE](#pp-yoloe)
- [YOLOX](#yolox)
- [YOLOv5](#yolov5)
- [YOLOv6](#yolov6)
- [YOLOv7](#yolov7)
- [YOLOv8](#yolov8)
- [RTMDet](#rtmdet)
- [**注意:**](#注意)
- [VOC](#voc)
- [使用指南](#使用指南)
- [**一键运行全流程**](#一键运行全流程)
- [自定义数据集](#自定义数据集)
- [数据集准备:](#数据集准备)
- [fintune训练](#fintune训练)
- [预测和导出:](#预测和导出)
## 简介
**PaddleYOLO**是基于[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的YOLO系列模型库**只包含YOLO系列模型的相关代码**,支持`YOLOv3`,`PP-YOLO`,`PP-YOLOv2`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`,`YOLOv8`,`RTMDet`等模型,欢迎一起使用和建设!
## 更新日志
* 【2023/01/10】支持[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)预测和部署;
* 【2022/09/29】支持[RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet)预测和部署;
* 【2022/09/26】发布[`PaddleYOLO`](https://github.com/PaddlePaddle/PaddleYOLO)模型套件;
* 【2022/09/19】支持[`YOLOv6`](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6)新版包括n/t/s/m/l模型
* 【2022/08/23】发布`YOLOSeries`代码库: 支持`YOLOv3`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`等YOLO模型支持`ConvNeXt`骨干网络高精度版`PP-YOLOE`,`YOLOX``YOLOv5`等模型支持PaddleSlim无损加速量化训练`PP-YOLOE`,`YOLOv5`,`YOLOv6``YOLOv7`等模型,详情可阅读[此文章](https://mp.weixin.qq.com/s/Hki01Zs2lQgvLSLWS0btrA)
**注意:**
- **PaddleYOLO**代码库协议为**GPL 3.0**[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5),[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6),[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7)和[YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)这几类模型代码不合入[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)其余YOLO模型推荐在[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)中使用,**会最先发布PP-YOLO系列特色检测模型的最新进展**
- **PaddleYOLO**代码库**推荐使用paddlepaddle-2.3.2以上的版本**,请参考[官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)下载对应适合版本,**Windows平台请安装paddle develop版本**
- PaddleYOLO 的[Roadmap](https://github.com/PaddlePaddle/PaddleYOLO/issues/44) issue用于收集用户的需求欢迎提出您的建议和需求。
- 训练**自定义数据集**请参照[文档](#自定义数据集)和[issue](https://github.com/PaddlePaddle/PaddleYOLO/issues/43)。请首先**确保加载了COCO权重作为预训练**,YOLO检测模型建议**总`batch_size`至少大于`64`**去训练,如果资源不够请**换小模型**或**减小模型的输入尺度**,为了保障较高检测精度,**尽量不要尝试单卡训练或总`batch_size`小于`32`的训练**。
## 模型库
### [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| PP-YOLOE-s | 640 | 32 | 400e | 2.9 | 43.4 | 60.0 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml) |
| PP-YOLOE-s | 640 | 32 | 300e | 2.9 | 43.0 | 59.6 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) |
| PP-YOLOE-m | 640 | 28 | 300e | 6.0 | 49.0 | 65.9 | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) |
| PP-YOLOE-l | 640 | 20 | 300e | 8.7 | 51.4 | 68.6 | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) |
| PP-YOLOE-x | 640 | 16 | 300e | 14.9 | 52.3 | 69.5 | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) |
| PP-YOLOE-tiny ConvNeXt| 640 | 16 | 36e | - | 44.6 | 63.3 | 33.04 | 13.87 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml) |
| **PP-YOLOE+_s** | 640 | 8 | 80e | 2.9 | **43.7** | **60.6** | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) |
| **PP-YOLOE+_m** | 640 | 8 | 80e | 6.0 | **49.8** | **67.1** | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) |
| **PP-YOLOE+_l** | 640 | 8 | 80e | 8.7 | **52.9** | **70.1** | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) |
| **PP-YOLOE+_x** | 640 | 8 | 80e | 14.9 | **54.7** | **72.0** | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| PP-YOLOE-s(400epoch) | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.onnx) |
| PP-YOLOE-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.onnx) |
| PP-YOLOE-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.onnx) |
| PP-YOLOE-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.onnx) |
| PP-YOLOE-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.onnx) |
| **PP-YOLOE+_s** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_m** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_l** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_x** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.onnx) |
</details>
### [YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [model](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_nano_300e_coco.yml) |
| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [model](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_tiny_300e_coco.yml) |
| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_s_300e_coco.yml) |
| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_m_300e_coco.yml) |
| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_l_300e_coco.yml) |
| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [model](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_x_300e_coco.yml) |
| YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [model](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [config](../../configs/yolox/yolox_cdn_tiny_300e_coco.yml) |
| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [model](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_crn_s_300e_coco.yml) |
| YOLOX-s ConvNeXt| 640 | 8 | 36e | - | 44.6 | 65.3 | 36.2 | 27.52 | [model](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/yolox_convnext_s_36e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOx-nano | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.onnx) |
| YOLOx-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.onnx) |
| YOLOx-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.onnx) |
| YOLOx-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.onnx) |
| YOLOx-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.onnx) |
| YOLOx-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOv5-n | 640 | 16 | 300e | 2.6 | 28.0 | 45.7 | 1.87 | 4.52 | [model](https://paddledet.bj.bcebos.com/models/yolov5_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_n_300e_coco.yml) |
| YOLOv5-s | 640 | 16 | 300e | 3.2 | 37.6 | 56.7 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_s_300e_coco.yml) |
| YOLOv5-m | 640 | 16 | 300e | 5.2 | 45.4 | 64.1 | 21.19 | 49.08 | [model](https://paddledet.bj.bcebos.com/models/yolov5_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_m_300e_coco.yml) |
| YOLOv5-l | 640 | 16 | 300e | 7.9 | 48.9 | 67.1 | 46.56 | 109.32 | [model](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_l_300e_coco.yml) |
| YOLOv5-x | 640 | 16 | 300e | 13.7 | 50.6 | 68.7 | 86.75 | 205.92 | [model](https://paddledet.bj.bcebos.com/models/yolov5_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_x_300e_coco.yml) |
| YOLOv5-s ConvNeXt| 640 | 8 | 36e | - | 42.4 | 65.3 | 34.54 | 17.96 | [model](https://paddledet.bj.bcebos.com/models/yolov5_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_convnext_s_36e_coco.yml) |
| *YOLOv5p6-n | 1280 | 16 | 300e | - | 35.9 | 54.2 | 3.25 | 9.23 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_n_300e_coco.yml) |
| *YOLOv5p6-s | 1280 | 16 | 300e | - | 44.5 | 63.3 | 12.63 | 33.81 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_s_300e_coco.yml) |
| *YOLOv5p6-m | 1280 | 16 | 300e | - | 51.1 | 69.0 | 35.73 | 100.21 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_m_300e_coco.yml) |
| *YOLOv5p6-l | 1280 | 8 | 300e | - | 53.4 | 71.0 | 76.77 | 223.09 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_l_300e_coco.yml) |
| *YOLOv5p6-x | 1280 | 8 | 300e | - | 54.7 | 72.4 | 140.80 | 420.03 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_x_300e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv5-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.onnx) |
| YOLOv5-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.onnx) |
| YOLOv5-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.onnx) |
| YOLOv5-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.onnx) |
| YOLOv5-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: |
| *YOLOv6-n | 640 | 16 | 300e(+300e) | 2.0 | 37.5 | 53.1 | 5.07 | 12.49 |[model](https://paddledet.bj.bcebos.com/models/yolov6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_n_300e_coco.yml) |
| *YOLOv6-s | 640 | 32 | 300e(+300e) | 2.7 | 44.8 | 61.7 | 20.18 | 49.36 |[model](https://paddledet.bj.bcebos.com/models/yolov6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_s_300e_coco.yml) |
| *YOLOv6-m | 640 | 32 | 300e(+300e) | - | 49.5 | 66.9 | 37.74 | 92.47 |[model](https://paddledet.bj.bcebos.com/models/yolov6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_m_300e_coco.yml) |
| *YOLOv6-l(silu) | 640 | 32 | 300e(+300e) | - | 52.2 | 70.2 | 59.66 | 149.4 |[model](https://paddledet.bj.bcebos.com/models/yolov6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_l_300e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| yolov6-n | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.onnx) |
| yolov6-s | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.onnx) |
| yolov6-m | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.onnx) |
| yolov6-l(silu) | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOv7-L | 640 | 32 | 300e | 7.4 | 51.0 | 70.2 | 37.62 | 106.08 |[model](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_l_300e_coco.yml) |
| *YOLOv7-X | 640 | 32 | 300e | 12.2 | 53.0 | 70.8 | 71.34 | 190.08 | [model](https://paddledet.bj.bcebos.com/models/yolov7_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_x_300e_coco.yml) |
| *YOLOv7P6-W6 | 1280 | 16 | 300e | 25.5 | 54.4 | 71.8 | 70.43 | 360.26 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_w6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_w6_300e_coco.yml) |
| *YOLOv7P6-E6 | 1280 | 10 | 300e | 31.1 | 55.7 | 73.0 | 97.25 | 515.4 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6_300e_coco.yml) |
| *YOLOv7P6-D6 | 1280 | 8 | 300e | 37.4 | 56.1 | 73.3 | 133.81 | 702.92 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_d6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_d6_300e_coco.yml) |
| *YOLOv7P6-E6E | 1280 | 6 | 300e | 48.7 | 56.5 | 73.7 | 151.76 | 843.52 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6e_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6e_300e_coco.yml) |
| YOLOv7-tiny | 640 | 32 | 300e | - | 37.3 | 54.5 | 6.23 | 6.90 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_300e_coco.yml) |
| YOLOv7-tiny | 416 | 32 | 300e | - | 33.3 | 49.5 | 6.23 | 2.91 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_416_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_416_300e_coco.yml) |
| YOLOv7-tiny | 320 | 32 | 300e | - | 29.1 | 43.8 | 6.23 | 1.73 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_320_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_320_300e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv7-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.onnx) |
| YOLOv7-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.onnx) |
| YOLOv7P6-W6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-E6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-D6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-E6E | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 320 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| *YOLOv8-n | 640 | 16 | 500e | 2.4 | 37.3 | 53.0 | 3.16 | 8.7 | [model](https://paddledet.bj.bcebos.com/models/yolov8_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_n_300e_coco.yml) |
| *YOLOv8-s | 640 | 16 | 500e | 3.4 | 44.9 | 61.8 | 11.17 | 28.6 | [model](https://paddledet.bj.bcebos.com/models/yolov8_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_s_300e_coco.yml) |
| *YOLOv8-m | 640 | 16 | 500e | 6.5 | 50.2 | 67.3 | 25.90 | 78.9 | [model](https://paddledet.bj.bcebos.com/models/yolov8_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_m_300e_coco.yml) |
| *YOLOv8-l | 640 | 16 | 500e | 10.0 | 52.8 | 69.6 | 43.69 | 165.2 | [model](https://paddledet.bj.bcebos.com/models/yolov8_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_l_300e_coco.yml) |
| *YOLOv8-x | 640 | 16 | 500e | 15.1 | 53.8 | 70.6 | 68.23 | 257.8 | [model](https://paddledet.bj.bcebos.com/models/yolov8_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_x_300e_coco.yml) |
| *YOLOv8-P6-x | 1280 | 16 | 500e | 55.0 | - | - | 97.42 | 522.93 | [model](https://paddledet.bj.bcebos.com/models/yolov8p6_x_500e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8p6_x_500e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv8-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.onnx) |
| YOLOv8-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.onnx) |
| YOLOv8-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.onnx) |
| YOLOv8-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.onnx) |
| YOLOv8-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.onnx) |
</details>
### [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: |
| *RTMDet-t | 640 | 32 | 300e | 2.8 | 40.9 | 57.9 | 4.90 | 16.21 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_t_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_t_300e_coco.yml) |
| *RTMDet-s | 640 | 32 | 300e | 3.3 | 44.5 | 62.0 | 8.89 | 29.71 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_s_300e_coco.yml) |
| *RTMDet-m | 640 | 32 | 300e | 6.4 | 49.1 | 66.8 | 24.71 | 78.47 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_m_300e_coco.yml) |
| *RTMDet-l | 640 | 32 | 300e | 10.2 | 51.2 | 68.8 | 52.31 | 160.32 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_l_300e_coco.yml) |
| *RTMDet-x | 640 | 32 | 300e | 18.0 | 52.6 | 70.4 | 94.86 | 283.12 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_x_300e_coco.yml) |
</details>
<details>
<summary> 部署模型 </summary>
| 网络模型 | 输入尺寸 | 导出后的权重(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| RTMDet-t | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.onnx) |
| RTMDet-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.onnx) |
| RTMDet-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.onnx) |
| RTMDet-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.onnx) |
| RTMDet-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.onnx) |
</details>
### **注意:**
- 所有模型均使用COCO train2017作为训练集在COCO val2017上验证精度模型前带*表示训练更新中。
- 具体精度和速度细节请查看[PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe),[YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox),[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5),[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6),[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7)**其中YOLOv5,YOLOv6,YOLOv7评估并未采用`multi_label`形式**。
- 模型推理耗时(ms)为TensorRT-FP16下测试的耗时**不包含数据预处理和模型输出后处理(NMS)的耗时**。测试采用**单卡Tesla T4 GPUbatch size=1**,测试环境为**paddlepaddle-2.3.2**, **CUDA 11.2**, **CUDNN 8.2**, **GCC-8.2**, **TensorRT 8.0.3.4**,具体请参考各自模型主页。
- **统计FLOPs(G)和Params(M)**,首先安装[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), `pip install paddleslim`,然后设置[runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml)里`print_flops: True`和`print_params: True`,并且注意确保是**单尺度**下如640x640**打印的是MACsFLOPs=2*MACs**。
- 各模型导出后的权重以及ONNX分为**带(w)**和**不带(wo)**后处理NMS两种均提供了下载链接请到各自模型主页下载。`w_nms`表示**带NMS后处理**,可以直接使用并预测出最终检测框结果,如```python deploy/python/infer.py --model_dir=ppyoloe_crn_l_300e_coco_w_nms/ --image_file=demo/000000014439.jpg --device=GPU````wo_nms`表示**不带NMS后处理**,用于**测速**,如需预测出检测框结果,需要找到**对应head中的后处理相关代码**并修改为如下:
```
if self.exclude_nms:
# `exclude_nms=True` just use in benchmark for speed test
# return pred_bboxes.sum(), pred_scores.sum() # 原先是这行,现在注释
return pred_bboxes, pred_scores # 新加这行表示保留进NMS前的原始结果
else:
bbox_pred, bbox_num, _ = self.nms(pred_bboxes, pred_scores)
return bbox_pred, bbox_num
```
并重新导出使用时再**另接自己写的NMS后处理**下文给出一个简化的NMS后处理示意
- 基于[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)对YOLO系列模型进行量化训练可以实现精度基本无损速度普遍提升30%以上,具体请参照[模型自动化压缩工具ACT](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression)。
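下面是一个**仅作示意**的简化NMS后处理草图基于numpy非官方实现其中张量形状、分数阈值、IoU阈值均为假设值仅演示单类别的hard NMS多类别时可按类别逐个调用实际的张量布局请以对应head导出的输出为准
```python
# 简化版hard NMS后处理示意仅依赖numpy阈值与张量布局均为假设请按实际模型输出调整
import numpy as np

def hard_nms(boxes, scores, score_thresh=0.25, iou_thresh=0.45, top_k=100):
    """boxes: [N, 4],格式为(x1, y1, x2, y2)scores: [N],单一类别的得分。"""
    mask = scores > score_thresh
    boxes, scores = boxes[mask], scores[mask]
    order = scores.argsort()[::-1]  # 按得分从高到低排序
    keep = []
    while order.size > 0 and len(keep) < top_k:
        i = order[0]
        keep.append(i)
        # 计算当前框与其余候选框的IoUIoU超过阈值的候选框被抑制
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return boxes[keep], scores[keep]

# 用法示意:这里用随机数代替某一类别的模型输出
pred_bboxes = np.random.rand(1000, 4) * 640
pred_bboxes[:, 2:] += pred_bboxes[:, :2]  # 保证x2>x1、y2>y1
cls_scores = np.random.rand(1000)         # 单一类别的得分向量
kept_boxes, kept_scores = hard_nms(pred_bboxes, cls_scores)
print(kept_boxes.shape, kept_scores.shape)
```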
### [VOC](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc)
<details>
<summary> 基础模型 </summary>
| 网络模型 | 输入尺寸 | 图片数/GPU | 学习率策略 | TRT-FP16-Latency(ms) | mAP(0.50,11point) | Params(M) | FLOPs(G) | 下载链接 | 配置文件 |
| :-----------: | :-------: | :-------: | :------: | :------------: | :---------------: | :------------------: |:-----------------: | :------: | :------: |
| YOLOv5-s | 640 | 16 | 60e | 3.2 | 80.3 | 7.24 | 16.54 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov5_s_60e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov5_s_60e_voc.yml) |
| YOLOv7-tiny | 640 | 32 | 60e | 2.6 | 80.2 | 6.23 | 6.90 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov7_tiny_60e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov7_tiny_60e_voc.yml) |
| YOLOX-s | 640 | 8 | 40e | 3.0 | 82.9 | 9.0 | 26.8 | [下载链接](https://paddledet.bj.bcebos.com/models/yolox_s_40e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolox_s_40e_voc.yml) |
| PP-YOLOE+_s | 640 | 8 | 30e | 2.9 | 86.7 | 7.93 | 17.36 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/ppyoloe_plus_crn_s_30e_voc.yml) |
</details>
**注意:**
- VOC数据集训练的mAP为`mAP(IoU=0.5)`的结果,且评估未使用`multi_label`等trick
- 所有YOLO VOC模型均加载各自模型的COCO权重作为预训练各配置文件均默认使用8卡GPU训练可作为自定义数据集设置的参考具体精度会因数据集而异
- YOLO检测模型建议使用**总`batch_size`至少大于`64`**去训练,如果资源不够,请**换小模型**或**减小模型的输入尺度**,为了保障较高检测精度,**尽量不要使用单卡或总`batch_size`小于`64`进行训练**
- Params(M)和FLOPs(G)均为训练时所测YOLOv7没有s模型故选用tiny模型
- TRT-FP16-Latency(ms)测速相关的说明请查看各YOLO模型config目录的主页
## 使用指南
下载MS-COCO数据集[官网](https://cocodataset.org)下载地址为: [annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip), [train2017](http://images.cocodataset.org/zips/train2017.zip), [val2017](http://images.cocodataset.org/zips/val2017.zip), [test2017](http://images.cocodataset.org/zips/test2017.zip)。
PaddleDetection团队提供的下载链接为[coco](https://bj.bcebos.com/v1/paddledet/data/coco.tar)(共约22G)和[test2017](https://bj.bcebos.com/v1/paddledet/data/cocotest2017.zip)注意test2017可不下载评估使用的是val2017。
### **一键运行全流程**
将以下命令写在一个脚本文件(如```run.sh```)里,一键运行命令为:```sh run.sh```,也可以在命令行中逐条运行。
```bash
model_name=ppyoloe # 可修改,如 yolov7
job_name=ppyoloe_plus_crn_l_300e_coco # 可修改,如 yolov7_tiny_300e_coco
config=configs/${model_name}/${job_name}.yml
log_dir=log_dir/${job_name}
# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
weights=output/${job_name}/model_final.pdparams
# 1.训练(单卡/多卡)
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
# 2.评估
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise
# 3.直接预测
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
# 4.导出模型
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True
# 5.部署预测
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU
# 6.部署测速,加 “--run_mode=trt_fp16” 表示在TensorRT FP16模式下测速
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
# 7.onnx导出
paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx
# 8.onnx测速
/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
```
- 如果想切换模型,只要修改开头两行即可,如:
```
model_name=yolov7
job_name=yolov7_l_300e_coco
```
- 导出**onnx**,首先安装[Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX)`pip install paddle2onnx`导出完成后可用下文的ONNX Runtime示意脚本快速验证模型能否正常前向
- **统计FLOPs(G)和Params(M)**,首先安装[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)`pip install paddleslim`,然后设置[runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml)里`print_flops: True`和`print_params: True`,并且注意确保是**单尺度**下如640x640**打印的是MACsFLOPs=2*MACs**。
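上面第7步导出**onnx**后,可以用下面这个**示意性**的ONNX Runtime脚本快速验证模型能否正常前向脚本中的模型文件名为假设值输入一律用随机数占位实际推理时需按各模型的要求做图片预处理并以`sess.get_inputs()`打印出的输入名称与形状为准:
```python
# 用ONNX Runtime加载导出的ONNX模型并跑一次前向的示意脚本
# 依赖pip install onnxruntime numpy
import numpy as np
import onnxruntime as ort

onnx_path = "ppyoloe_plus_crn_l_80e_coco.onnx"  # 假设的文件名,请替换为实际导出的模型
sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# 打印模型的输入名称与形状不同模型的输入个数、名称可能不同如image、scale_factor等
for inp in sess.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)

# 构造随机占位输入动态维度如batch用1代替真实使用时应替换为预处理后的图片数据
feed = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dtype = np.float32 if "float" in inp.type else np.int32
    feed[inp.name] = np.random.rand(*shape).astype(dtype)

outputs = sess.run(None, feed)
for out_info, out in zip(sess.get_outputs(), outputs):
    print("output:", out_info.name, getattr(out, "shape", type(out)))
```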
### 自定义数据集
#### 数据集准备:
1.自定义数据集的标注制作,请参考[DetAnnoTools](../tutorials/data/DetAnnoTools.md);
2.自定义数据集的训练准备,请参考[PrepareDataSet](../tutorials/PrepareDataSet.md)。
#### fintune训练
除了更改数据集的路径外,训练时一般推荐加载**对应模型的COCO预训练权重**去fintune这样会更快收敛并达到更高精度
```bash
# 单卡fintune训练
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
# 多卡fintune训练
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
**注意:**
- fintune训练时一般会提示head分类分支最后一层卷积的通道数没有对应上这属于正常情况原因是自定义数据集的类别数一般和COCO数据集不一致
- fintune训练时epoch数一般可以设置得更少lr也可以设置得更小如原来的1/10最高精度可能出现在中间某个epoch
#### 预测和导出:
使用自定义数据集预测和导出模型时如果TestDataset数据集路径设置不正确会默认使用COCO的80类。
除了确保TestDataset数据集路径设置正确外也可以自行修改或添加对应的label_list.txt文件每行记录一个类别TestDataset中的anno_path也可设置为绝对路径
```
TestDataset:
!ImageFolder
anno_path: label_list.txt # 如不使用dataset_dir则anno_path即为相对于PaddleDetection主目录的相对路径
# dataset_dir: dataset/my_coco # 如使用dataset_dir则dataset_dir/anno_path作为新的anno_path
```
label_list.txt里每行记录一个类别如下所示
```
person
vehicle
```

View File

@@ -0,0 +1,399 @@
[简体中文](PaddleYOLO_MODEL.md) | English
# [**PaddleYOLO**](https://github.com/PaddlePaddle/PaddleYOLO)
## Introduction
- [**PaddleYOLO**](#paddleyolo)
- [Introduction](#introduction)
- [Introduction](#introduction-1)
- [Updates](#updates)
- [ModelZoo](#modelzoo)
- [PP-YOLOE](#pp-yoloe)
- [YOLOX](#yolox)
- [YOLOv5](#yolov5)
- [YOLOv6](#yolov6)
- [YOLOv7](#yolov7)
- [YOLOv8](#yolov8)
- [RTMDet](#rtmdet)
- [**Notes**](#notes)
- [VOC](#voc)
- [UserGuide](#userguide)
- [**Pipeline**](#pipeline)
- [CustomDataset](#customdataset)
- [preparation](#preparation)
- [fintune](#fintune)
- [Predict and export:](#predict-and-export)
## Introduction
**PaddleYOLO** is a YOLO Series toolbox based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), **only relevant codes of YOLO series models are included**. It supports `YOLOv3`,`PP-YOLO`,`PP-YOLOv2`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6`,`YOLOv7`,`YOLOv8`,`RTMDet` and so on. Welcome to use and build it together!
## Updates
* 【2023/01/10】Support [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) inference and deploy;
* 【2022/09/29】Support [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet) inference and deploy;
* 【2022/09/26】Release [`PaddleYOLO`](https://github.com/PaddlePaddle/PaddleYOLO);
* 【2022/09/19】Support the new version of [`YOLOv6`](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6), including n/t/s/m/l model;
* 【2022/08/23】Release `YOLOSeries` codebase: support `YOLOv3`,`PP-YOLOE`,`PP-YOLOE+`,`YOLOX`,`YOLOv5`,`YOLOv6` and `YOLOv7`; support using `ConvNeXt` backbone to get high-precision version of `PP-YOLOE`,`YOLOX` and `YOLOv5`; support PaddleSlim accelerated quantitative training `PP-YOLOE`,`YOLOv5`,`YOLOv6` and `YOLOv7`. For details, please read this [article](https://mp.weixin.qq.com/s/Hki01Zs2lQgvLSLWS0btrA)
**Notes**
- The license of **PaddleYOLO** is **GPL 3.0**, and the code of [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5), [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6), [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7) and [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) will not be merged into [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Except for these YOLO models, other YOLO models are recommended to be used in [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), **which will be the first to release the latest progress of the PP-YOLO series detection models**;
- To use **PaddleYOLO**, **PaddlePaddle-2.3.2 or above is recommended**, please refer to the [official website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html) to download the appropriate version. **For Windows platforms, please install the paddle develop version**;
- For training on a **custom dataset**, please refer to the [doc](#CustomDataset) and this [issue](https://github.com/PaddlePaddle/PaddleYOLO/issues/43). Please **make sure the corresponding COCO pre-trained weights are loaded** first. We recommend training YOLO detection models **with a total `batch_size` of at least `64`**. If resources are insufficient, please **use a smaller model** or **reduce the input size of the model**. To ensure high detection accuracy, **avoid training with a single GPU or a total `batch_size` smaller than `32`**;
## ModelZoo
### [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| PP-YOLOE-s | 640 | 32 | 400e | 2.9 | 43.4 | 60.0 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_400e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_400e_coco.yml) |
| PP-YOLOE-s | 640 | 32 | 300e | 2.9 | 43.0 | 59.6 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) |
| PP-YOLOE-m | 640 | 28 | 300e | 6.0 | 49.0 | 65.9 | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) |
| PP-YOLOE-l | 640 | 20 | 300e | 8.7 | 51.4 | 68.6 | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) |
| PP-YOLOE-x | 640 | 16 | 300e | 14.9 | 52.3 | 69.5 | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) |
| PP-YOLOE-tiny ConvNeXt| 640 | 16 | 36e | - | 44.6 | 63.3 | 33.04 | 13.87 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_convnext_tiny_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/ppyoloe_convnext_tiny_36e_coco.yml) |
| **PP-YOLOE+_s** | 640 | 8 | 80e | 2.9 | **43.7** | **60.6** | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) |
| **PP-YOLOE+_m** | 640 | 8 | 80e | 6.0 | **49.8** | **67.1** | 23.43 | 49.91 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) |
| **PP-YOLOE+_l** | 640 | 8 | 80e | 8.7 | **52.9** | **70.1** | 52.20 | 110.07 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) |
| **PP-YOLOE+_x** | 640 | 8 | 80e | 14.9 | **54.7** | **72.0** | 98.42 | 206.59 |[model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| PP-YOLOE-s(400epoch) | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_400e_coco_wo_nms.onnx) |
| PP-YOLOE-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_s_300e_coco_wo_nms.onnx) |
| PP-YOLOE-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_m_300e_coco_wo_nms.onnx) |
| PP-YOLOE-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_l_300e_coco_wo_nms.onnx) |
| PP-YOLOE-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_crn_x_300e_coco_wo_nms.onnx) |
| **PP-YOLOE+_s** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_s_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_m** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_m_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_l** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_l_80e_coco_wo_nms.onnx) |
| **PP-YOLOE+_x** | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/ppyoloe/ppyoloe_plus_crn_x_80e_coco_wo_nms.onnx) |
</details>
### [YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOX-nano | 416 | 8 | 300e | 2.3 | 26.1 | 42.0 | 0.91 | 1.08 | [model](https://paddledet.bj.bcebos.com/models/yolox_nano_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_nano_300e_coco.yml) |
| YOLOX-tiny | 416 | 8 | 300e | 2.8 | 32.9 | 50.4 | 5.06 | 6.45 | [model](https://paddledet.bj.bcebos.com/models/yolox_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_tiny_300e_coco.yml) |
| YOLOX-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_s_300e_coco.yml) |
| YOLOX-m | 640 | 8 | 300e | 5.8 | 46.9 | 65.7 | 25.3 | 73.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_m_300e_coco.yml) |
| YOLOX-l | 640 | 8 | 300e | 9.3 | 50.1 | 68.8 | 54.2 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_l_300e_coco.yml) |
| YOLOX-x | 640 | 8 | 300e | 16.6 | **51.8** | **70.6** | 99.1 | 281.9 | [model](https://paddledet.bj.bcebos.com/models/yolox_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_x_300e_coco.yml) |
| YOLOX-cdn-tiny | 416 | 8 | 300e | 1.9 | 32.4 | 50.2 | 5.03 | 6.33 | [model](https://paddledet.bj.bcebos.com/models/yolox_cdn_tiny_300e_coco.pdparams) | [config](../../configs/yolox/yolox_cdn_tiny_300e_coco.yml) |
| YOLOX-crn-s | 640 | 8 | 300e | 3.0 | 40.4 | 59.6 | 7.7 | 24.69 | [model](https://paddledet.bj.bcebos.com/models/yolox_crn_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox/yolox_crn_s_300e_coco.yml) |
| YOLOX-s ConvNeXt| 640 | 8 | 36e | - | 44.6 | 65.3 | 36.2 | 27.52 | [model](https://paddledet.bj.bcebos.com/models/yolox_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/convnext/yolox_convnext_s_36e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOx-nano | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_nano_300e_coco_wo_nms.onnx) |
| YOLOx-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_tiny_300e_coco_wo_nms.onnx) |
| YOLOx-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_s_300e_coco_wo_nms.onnx) |
| YOLOx-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_m_300e_coco_wo_nms.onnx) |
| YOLOx-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_l_300e_coco_wo_nms.onnx) |
| YOLOx-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolox/yolox_x_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOv5-n | 640 | 16 | 300e | 2.6 | 28.0 | 45.7 | 1.87 | 4.52 | [model](https://paddledet.bj.bcebos.com/models/yolov5_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_n_300e_coco.yml) |
| YOLOv5-s | 640 | 16 | 300e | 3.2 | 37.6 | 56.7 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_s_300e_coco.yml) |
| YOLOv5-m | 640 | 16 | 300e | 5.2 | 45.4 | 64.1 | 21.19 | 49.08 | [model](https://paddledet.bj.bcebos.com/models/yolov5_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_m_300e_coco.yml) |
| YOLOv5-l | 640 | 16 | 300e | 7.9 | 48.9 | 67.1 | 46.56 | 109.32 | [model](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_l_300e_coco.yml) |
| YOLOv5-x | 640 | 16 | 300e | 13.7 | 50.6 | 68.7 | 86.75 | 205.92 | [model](https://paddledet.bj.bcebos.com/models/yolov5_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_x_300e_coco.yml) |
| YOLOv5-s ConvNeXt| 640 | 8 | 36e | - | 42.4 | 65.3 | 34.54 | 17.96 | [model](https://paddledet.bj.bcebos.com/models/yolov5_convnext_s_36e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5_convnext_s_36e_coco.yml) |
| *YOLOv5p6-n | 1280 | 16 | 300e | - | 35.9 | 54.2 | 3.25 | 9.23 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_n_300e_coco.yml) |
| *YOLOv5p6-s | 1280 | 16 | 300e | - | 44.5 | 63.3 | 12.63 | 33.81 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_s_300e_coco.yml) |
| *YOLOv5p6-m | 1280 | 16 | 300e | - | 51.1 | 69.0 | 35.73 | 100.21 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_m_300e_coco.yml) |
| *YOLOv5p6-l | 1280 | 8 | 300e | - | 53.4 | 71.0 | 76.77 | 223.09 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_l_300e_coco.yml) |
| *YOLOv5p6-x | 1280 | 8 | 300e | - | 54.7 | 72.4 | 140.80 | 420.03 | [model](https://paddledet.bj.bcebos.com/models/yolov5p6_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5/yolov5p6_x_300e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv5-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_n_300e_coco_wo_nms.onnx) |
| YOLOv5-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_s_300e_coco_wo_nms.onnx) |
| YOLOv5-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_m_300e_coco_wo_nms.onnx) |
| YOLOv5-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_l_300e_coco_wo_nms.onnx) |
| YOLOv5-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov5/yolov5_x_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :---------: | :-----: |:-----: | :-----: |:-----: | :-------------: | :-----: |
| *YOLOv6-n | 640 | 16 | 300e(+300e) | 2.0 | 37.5 | 53.1 | 5.07 | 12.49 |[model](https://paddledet.bj.bcebos.com/models/yolov6_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_n_300e_coco.yml) |
| *YOLOv6-s | 640 | 32 | 300e(+300e) | 2.7 | 44.8 | 61.7 | 20.18 | 49.36 |[model](https://paddledet.bj.bcebos.com/models/yolov6_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_s_300e_coco.yml) |
| *YOLOv6-m | 640 | 32 | 300e(+300e) | - | 49.5 | 66.9 | 37.74 | 92.47 |[model](https://paddledet.bj.bcebos.com/models/yolov6_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_m_300e_coco.yml) |
| *YOLOv6-l(silu) | 640 | 32 | 300e(+300e) | - | 52.2 | 70.2 | 59.66 | 149.4 |[model](https://paddledet.bj.bcebos.com/models/yolov6_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6/yolov6_l_300e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| yolov6-n | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_n_300e_coco_wo_nms.onnx) |
| yolov6-s | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_s_300e_coco_wo_nms.onnx) |
| yolov6-m | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_m_300e_coco_wo_nms.onnx) |
| yolov6-l(silu) | 640 | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.zip) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.zip) | [(w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_w_nms.onnx) &#124; [(w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov6/yolov6_l_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| YOLOv7-L | 640 | 32 | 300e | 7.4 | 51.0 | 70.2 | 37.62 | 106.08 |[model](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_l_300e_coco.yml) |
| *YOLOv7-X | 640 | 32 | 300e | 12.2 | 53.0 | 70.8 | 71.34 | 190.08 | [model](https://paddledet.bj.bcebos.com/models/yolov7_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_x_300e_coco.yml) |
| *YOLOv7P6-W6 | 1280 | 16 | 300e | 25.5 | 54.4 | 71.8 | 70.43 | 360.26 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_w6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_w6_300e_coco.yml) |
| *YOLOv7P6-E6 | 1280 | 10 | 300e | 31.1 | 55.7 | 73.0 | 97.25 | 515.4 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6_300e_coco.yml) |
| *YOLOv7P6-D6 | 1280 | 8 | 300e | 37.4 | 56.1 | 73.3 | 133.81 | 702.92 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_d6_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_d6_300e_coco.yml) |
| *YOLOv7P6-E6E | 1280 | 6 | 300e | 48.7 | 56.5 | 73.7 | 151.76 | 843.52 | [model](https://paddledet.bj.bcebos.com/models/yolov7p6_e6e_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7p6_e6e_300e_coco.yml) |
| YOLOv7-tiny | 640 | 32 | 300e | - | 37.3 | 54.5 | 6.23 | 6.90 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_300e_coco.yml) |
| YOLOv7-tiny | 416 | 32 | 300e | - | 33.3 | 49.5 | 6.23 | 2.91 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_416_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_416_300e_coco.yml) |
| YOLOv7-tiny | 320 | 32 | 300e | - | 29.1 | 43.8 | 6.23 | 1.73 |[model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_320_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7/yolov7_tiny_320_300e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv7-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_l_300e_coco_wo_nms.onnx) |
| YOLOv7-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_x_300e_coco_wo_nms.onnx) |
| YOLOv7P6-W6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_w6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-E6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-D6 | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_d6_300e_coco_wo_nms.onnx) |
| YOLOv7P6-E6E | 1280 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7p6_e6e_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 416 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_416_300e_coco_wo_nms.onnx) |
| YOLOv7-tiny | 320 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov7/yolov7_tiny_320_300e_coco_wo_nms.onnx) |
</details>
### [YOLOv8](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| *YOLOv8-n | 640 | 16 | 500e | 2.4 | 37.3 | 53.0 | 3.16 | 8.7 | [model](https://paddledet.bj.bcebos.com/models/yolov8_n_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_n_300e_coco.yml) |
| *YOLOv8-s | 640 | 16 | 500e | 3.4 | 44.9 | 61.8 | 11.17 | 28.6 | [model](https://paddledet.bj.bcebos.com/models/yolov8_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_s_300e_coco.yml) |
| *YOLOv8-m | 640 | 16 | 500e | 6.5 | 50.2 | 67.3 | 25.90 | 78.9 | [model](https://paddledet.bj.bcebos.com/models/yolov8_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_m_300e_coco.yml) |
| *YOLOv8-l | 640 | 16 | 500e | 10.0 | 52.8 | 69.6 | 43.69 | 165.2 | [model](https://paddledet.bj.bcebos.com/models/yolov8_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_l_300e_coco.yml) |
| *YOLOv8-x | 640 | 16 | 500e | 15.1 | 53.8 | 70.6 | 68.23 | 257.8 | [model](https://paddledet.bj.bcebos.com/models/yolov8_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8_x_300e_coco.yml) |
| *YOLOv8-P6-x | 1280 | 16 | 500e | 55.0 | - | - | 97.42 | 522.93 | [model](https://paddledet.bj.bcebos.com/models/yolov8p6_x_500e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8/yolov8p6_x_500e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| YOLOv8-n | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_n_500e_coco_wo_nms.onnx) |
| YOLOv8-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_s_500e_coco_wo_nms.onnx) |
| YOLOv8-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_m_500e_coco_wo_nms.onnx) |
| YOLOv8-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_l_500e_coco_wo_nms.onnx) |
| YOLOv8-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/yolov8/yolov8_x_500e_coco_wo_nms.onnx) |
</details>
### [RTMDet](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Params(M) | FLOPs(G) | download | config |
| :------------- | :------- | :-------: | :------: | :------------: | :---------------------: | :----------------: |:---------: | :------: |:---------------: |:-----: |
| *RTMDet-t | 640 | 32 | 300e | 2.8 | 40.9 | 57.9 | 4.90 | 16.21 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_t_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_t_300e_coco.yml) |
| *RTMDet-s | 640 | 32 | 300e | 3.3 | 44.5 | 62.0 | 8.89 | 29.71 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_s_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_s_300e_coco.yml) |
| *RTMDet-m | 640 | 32 | 300e | 6.4 | 49.1 | 66.8 | 24.71 | 78.47 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_m_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_m_300e_coco.yml) |
| *RTMDet-l | 640 | 32 | 300e | 10.2 | 51.2 | 68.8 | 52.31 | 160.32 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_l_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_l_300e_coco.yml) |
| *RTMDet-x | 640 | 32 | 300e | 18.0 | 52.6 | 70.4 | 94.86 | 283.12 |[model](https://paddledet.bj.bcebos.com/models/rtmdet_x_300e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/rtmdet/rtmdet_x_300e_coco.yml) |
</details>
<details>
<summary> Deploy Models </summary>
| Model | Input Size | Exported weights(w/o NMS) | ONNX(w/o NMS) |
| :-------- | :--------: | :---------------------: | :----------------: |
| RTMDet-t | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_t_300e_coco_wo_nms.onnx) |
| RTMDet-s | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_s_300e_coco_wo_nms.onnx) |
| RTMDet-m | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_m_300e_coco_wo_nms.onnx) |
| RTMDet-l | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_l_300e_coco_wo_nms.onnx) |
| RTMDet-x | 640 | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.zip) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.zip) | [( w/ nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_w_nms.onnx) &#124; [( w/o nms)](https://paddledet.bj.bcebos.com/deploy/yoloseries/rtmdet/rtmdet_x_300e_coco_wo_nms.onnx) |
</details>
### **Notes**
- All models are trained on the COCO train2017 dataset and evaluated on val2017. A `*` in front of a model name means its training is still being updated.
- Please check the detailed accuracy and speed numbers in [PP-YOLOE](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/ppyoloe),[YOLOX](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolox),[YOLOv5](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov5),[YOLOv6](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov6),[YOLOv7](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov7). **Note that YOLOv5, YOLOv6 and YOLOv7 are evaluated without `multi_label`**.
- TRT-FP16-Latency(ms) is the inference time measured under TensorRT-FP16, **excluding data preprocessing and model output post-processing (NMS)**. The test uses a single **Tesla T4 GPU with batch size=1**, and the test environment is **paddlepaddle-2.3.2**, **CUDA 11.2**, **CUDNN 8.2**, **GCC-8.2**, **TensorRT 8.0.3.4**. Please refer to the respective model homepage for details.
- For **FLOPs(G) and Params(M)**, first install [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) with `pip install paddleslim`, then set `print_flops: True` and `print_params: True` in [runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml). Make sure to use a **single scale** such as 640x640; note that **MACs are printed (FLOPs = 2 * MACs)**. A short sketch of this workflow follows these notes.
- Based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim), quantization-aware training of the YOLO series models achieves essentially lossless accuracy while generally improving speed by more than 30%. For details, please refer to [auto_compression](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression).
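Below is a minimal sketch of the FLOPs/Params workflow described in the note above. The job name, the weights URL pattern and the `-o print_flops=True print_params=True` overrides are assumptions for illustration (the two flags can equally be set directly in `runtime.yml`); adjust them to your own model.
```bash
# Install PaddleSlim, which provides the FLOPs/Params counters
pip install paddleslim

# Placeholder model; any config from the tables above can be used
model_name=ppyoloe
job_name=ppyoloe_plus_crn_l_80e_coco
config=configs/${model_name}/${job_name}.yml

# Run a single-scale evaluation (e.g. 640x640) so the printed MACs refer to one scale.
# Remember that FLOPs = 2 * MACs.
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} \
    -o weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams \
       print_flops=True print_params=True
```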
### [VOC](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc)
<details>
<summary> Baseline </summary>
| Model | Input Size | images/GPU | Epoch | TRT-FP16-Latency(ms) | mAP(0.50,11point) | Params(M) | FLOPs(G) | download | config |
| :-----------: | :-------: | :-------: | :------: | :------------: | :---------------: | :------------------: |:-----------------: | :------: | :------: |
| YOLOv5-s | 640 | 16 | 60e | 3.2 | 80.3 | 7.24 | 16.54 | [model](https://paddledet.bj.bcebos.com/models/yolov5_s_60e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov5_s_60e_voc.yml) |
| YOLOv7-tiny | 640 | 32 | 60e | 2.6 | 80.2 | 6.23 | 6.90 | [model](https://paddledet.bj.bcebos.com/models/yolov7_tiny_60e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolov7_tiny_60e_voc.yml) |
| YOLOX-s | 640 | 8 | 40e | 3.0 | 82.9 | 9.0 | 26.8 | [model](https://paddledet.bj.bcebos.com/models/yolox_s_40e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/yolox_s_40e_voc.yml) |
| PP-YOLOE+_s | 640 | 8 | 30e | 2.9 | 86.7 | 7.93 | 17.36 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_30e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc/ppyoloe_plus_crn_s_30e_voc.yml) |
</details>
**Note:**
- The VOC mAP is `mAP(IoU=0.5)`, and all models are evaluated **without `multi_label`**.
- All YOLO VOC models load the COCO weights of the corresponding model as pre-trained weights. Each config file uses 8 GPUs by default and can serve as a reference when setting up custom datasets; the exact mAP will vary with the dataset (a training sketch follows these notes);
- We recommend training YOLO detection models **with a total `batch_size` of at least `64`**. If resources are insufficient, please **use a smaller model** or **reduce the input size**. To ensure high detection accuracy, **avoid training with a single GPU or a total `batch_size` below `64`**;
- Params (M) and FLOPs (G) are measured during training. Since YOLOv7 has no s model, its tiny model is used instead;
- For TRT-FP16 Latency (ms) measurements, please refer to the config homepage of each YOLO model;
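As a minimal sketch of how one of the VOC configs above can be launched (config and output paths are taken from the table; adjust the GPU list and per-GPU batch size so that the total batch size stays at or above 64):
```bash
config=configs/voc/yolov5_s_60e_voc.yml
log_dir=log_dir/yolov5_s_60e_voc

# Multi-GPU training; the VOC configs load the COCO-pretrained weights of the same model by default
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 \
    tools/train.py -c ${config} --eval --amp

# Evaluation (VOC mAP at IoU=0.5, without multi_label)
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} \
    -o weights=output/yolov5_s_60e_voc/model_final.pdparams
```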
## UserGuide
Download the MS-COCO dataset from the [official website](https://cocodataset.org). The download links are: [annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip), [train2017](http://images.cocodataset.org/zips/train2017.zip), [val2017](http://images.cocodataset.org/zips/val2017.zip), [test2017](http://images.cocodataset.org/zips/test2017.zip).
The download links provided by the PaddleDetection team are [coco](https://bj.bcebos.com/v1/paddledet/data/coco.tar) (about 22 GB) and [test2017](https://bj.bcebos.com/v1/paddledet/data/cocotest2017.zip). Note that test2017 is optional; evaluation is performed on val2017.
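A minimal download sketch, assuming the default `dataset/coco` layout expected by the configs in this repo:
```bash
# Download and unpack the COCO package provided by the PaddleDetection team (about 22 GB)
mkdir -p dataset && cd dataset
wget https://bj.bcebos.com/v1/paddledet/data/coco.tar
tar -xf coco.tar          # extracts into dataset/coco

# Optional: test2017 (not needed for evaluation, which uses val2017)
# wget https://bj.bcebos.com/v1/paddledet/data/cocotest2017.zip && unzip cocotest2017.zip
cd ..
```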
### **Pipeline**
Write the following commands into a script file such as `run.sh` and run it with `sh run.sh`, or run the commands one by one on the command line.
```bash
model_name=ppyoloe # yolov7
job_name=ppyoloe_plus_crn_l_80e_coco # yolov7_tiny_300e_coco
config=configs/${model_name}/${job_name}.yml
log_dir=log_dir/${job_name}
# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
weights=output/${job_name}/model_final.pdparams
# 1.training (single GPU / multi GPU)
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp
# 2.eval
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise
# 3.infer
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
# 4.export
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True
# 5.deploy infer
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU
# 6.deploy speed, add '--run_mode=trt_fp16' to test in TensorRT FP16 mode
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16
# 7.export onnx
paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx
# 8.onnx speed
/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
```
**Note**
- If you want to switch models, just modify the first two lines, such as:
```
model_name=yolov7
job_name=yolov7_tiny_300e_coco
```
- For **exporting ONNX**, first install [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX) with `pip install paddle2onnx` (a small sanity-check sketch follows these notes).
- For **FLOPs(G) and Params(M)**, first install [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) with `pip install paddleslim`, then set `print_flops: True` and `print_params: True` in [runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/runtime.yml). Make sure to use a **single scale** such as 640x640; note that **MACs are printed (FLOPs = 2 * MACs)**.
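A minimal sketch of exporting and checking an ONNX model. The export command mirrors step 7 of the pipeline above; the `onnx` package and the checker call are assumptions added here for illustration.
```bash
pip install paddle2onnx onnx

job_name=ppyoloe_plus_crn_l_80e_coco   # placeholder, reuse the job_name from the pipeline above
paddle2onnx --model_dir output_inference/${job_name} \
    --model_filename model.pdmodel --params_filename model.pdiparams \
    --opset_version 12 --save_file ${job_name}.onnx

# Optional sanity check that the exported graph is structurally valid
python -c "import onnx; onnx.checker.check_model('${job_name}.onnx')"
```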
### CustomDataset
#### preparation
1. For annotating a custom dataset, please refer to [DetAnnoTools](../tutorials/data/DetAnnoTools.md);
2. For preparing a custom dataset for training, please refer to [PrepareDataSet](../tutorials/PrepareDataSet.md). An illustrative directory layout is sketched below.
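The sketch below shows a COCO-format layout that the dataset configs typically expect; every directory and file name here is a placeholder and the actual paths must match the dataset config you train with (e.g. under `configs/datasets/`).
```bash
# Illustrative COCO-format layout for a custom dataset (all names are placeholders)
mkdir -p dataset/my_coco/annotations dataset/my_coco/train dataset/my_coco/val
# dataset/my_coco/annotations/instances_train.json  -> COCO-format training annotations
# dataset/my_coco/annotations/instances_val.json    -> COCO-format validation annotations
# dataset/my_coco/train/ and dataset/my_coco/val/   -> the corresponding image files
```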
#### finetune
In addition to changing the dataset path, it is generally recommended to load **the COCO pre-trained weights of the corresponding model** for finetuning, which converges faster and reaches higher accuracy, for example:
```bash
# finetune with single GPU
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
# finetune with multi GPU
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
**Note:**
- During finetuning, a message that the channels of the last layer of the head's classification branch do not match the checkpoint is normal, because the number of classes in a custom dataset generally differs from COCO;
- In general, finetuning needs fewer epochs and a smaller learning rate (for example 1/10 of the original), and the best accuracy may appear at an intermediate epoch (a sketch of such overrides follows these notes);
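A minimal finetuning sketch under those guidelines. The `epoch` and `LearningRate.base_lr` overrides passed through `-o` are assumptions for illustration; the exact keys and values depend on the config being finetuned.
```bash
config=configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml   # placeholder config

# Load COCO-pretrained weights, shorten the schedule and lower the base LR (values are illustrative)
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 \
    tools/train.py -c ${config} --eval --amp \
    -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams \
       epoch=40 LearningRate.base_lr=0.000125
```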
#### Predict and export
When predicting and exporting models on a custom dataset, if the `TestDataset` path is set incorrectly, the 80 COCO categories will be used by default.
Besides setting the `TestDataset` path correctly, you can also add a corresponding `label_list.txt` file (one category per line); `anno_path` in `TestDataset` can also be set to an absolute path, for example:
```
TestDataset:
!ImageFolder
    anno_path: label_list.txt # if dataset_dir is not set, anno_path is relative to the PaddleDetection root directory
    # dataset_dir: dataset/my_coco # if dataset_dir is set, the effective annotation path is dataset_dir/anno_path
```
Each line of `label_list.txt` records one category, for example (a usage sketch follows this example):
```
person
vehicle
```
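A minimal sketch tying the pieces together; the config, weights and image paths are placeholders reused from the pipeline above, and `anno_path` in `TestDataset` is assumed to point at the generated `label_list.txt`.
```bash
# Write one category per line (the order must match the label ids used during training)
cat > label_list.txt << 'EOF'
person
vehicle
EOF

config=configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml
weights=output/ppyoloe_plus_crn_l_80e_coco/model_final.pdparams

# Predict and export with the custom categories picked up from TestDataset
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} \
    --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights}
```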

View File

@@ -0,0 +1,54 @@
简体中文 | [English](SSLD_PRETRAINED_MODEL_en.md)
### Simple semi-supervised label knowledge distillation solution (SSLD)
### R-CNN on COCO
| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 |
| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
### YOLOv3 on COCO
| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
### YOLOv3 on Pascal VOC
| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
**注意事项:**
- [SSLD](https://arxiv.org/abs/2103.05959)是一种知识蒸馏方法我们使用蒸馏后性能更强的backbone预训练模型进一步提升检测精度详细方案请参考[知识蒸馏教程](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md)
![demo image](../images/ssld_model.png)
## Citations
```
@misc{cui2021selfsupervision,
title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones},
author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma},
year={2021},
eprint={2103.05959},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

View File

@@ -0,0 +1,53 @@
English | [简体中文](SSLD_PRETRAINED_MODEL.md)
### Simple semi-supervised label knowledge distillation solution (SSLD)
### R-CNN on COCO
| Backbone | Model | Images/GPU | Lr schd | FPS | Box AP | Mask AP | Download | Config |
| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
### YOLOv3 on COCO
| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
### YOLOv3 on Pascal VOC
| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
**Notes:**
- [SSLD](https://arxiv.org/abs/2103.05959) is a knowledge distillation method. We use the stronger backbone pretrained model after distillation to further improve the detection accuracy. Please refer to the [knowledge distillation tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md).
![demo image](../images/ssld_model.png)
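A minimal sketch for reproducing one of the SSLD entries above with the standard PaddleDetection tools; the 8-GPU launch matches the 1-image-per-GPU setting in the table, and the weights URL is the one listed there.
```bash
config=configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml

# 8-GPU training; the config uses the SSLD-distilled ResNet50-vd backbone as pretrained weights
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval

# Evaluate the released checkpoint
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} \
    -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams
```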
## Citations
```
@misc{cui2021selfsupervision,
title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones},
author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma},
year={2021},
eprint={2103.05959},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

View File

@@ -0,0 +1,66 @@
[English](DistributedTraining_en.md) | 简体中文
# 分布式训练
## 1. 简介
* 分布式训练指的是将训练任务按照一定方法拆分到多个计算节点进行计算再按照一定的方法对拆分后计算得到的梯度等信息进行聚合与更新。飞桨分布式训练技术源自百度的业务实践在自然语言处理、计算机视觉、搜索和推荐等领域经过超大规模业务检验。分布式训练的高性能是飞桨的核心优势技术之一PaddleDetection同时支持单机训练与多机训练。更多关于分布式训练的方法与文档可以参考[分布式训练快速开始教程](https://fleet-x.readthedocs.io/en/latest/paddle_fleet_rst/parameter_server/ps_quick_start.html)。
## 2. 使用方法
### 2.1 单机训练
* 以PP-YOLOE-s为例本地准备好数据之后使用`paddle.distributed.launch`或者`fleetrun`的接口启动训练任务即可。下面为运行脚本示例。
```bash
fleetrun \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
--eval &>logs.txt 2>&1 &
```
### 2.2 多机训练
* 相比单机训练,多机训练时,只需要添加`--ips`的参数该参数表示需要参与分布式训练的机器的ip列表不同机器的ip用逗号隔开。下面为运行代码示例。
```shell
ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"
fleetrun \
--ips=${ip_list} \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
--eval &>logs.txt 2>&1 &
```
**注:**
* 不同机器的ip信息需要用逗号隔开可以通过`ifconfig`或者`ipconfig`查看。
* 不同机器之间需要做免密设置且可以直接ping通否则无法完成通信。
* 不同机器之间的代码、数据与运行命令或脚本需要保持一致,且所有的机器上都需要运行设置好的训练命令或者脚本。最终`ip_list`中的第一台机器的第一块设备是trainer0以此类推。
* 不同机器的起始端口可能不同,建议在启动多机任务前,在不同的机器中设置相同的多机运行起始端口,命令为`export FLAGS_START_PORT=17000`,端口值建议在`10000~20000`之间。
## 3. 性能效果测试
* 在3机8卡V100的机器上进行模型训练不同模型的精度、训练耗时、多机加速比情况如下所示。
| 模型 | 数据集 | 配置 | 单机8卡耗时/精度 | 3机8卡耗时/精度 | 加速比 |
|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
| PP-YOLOE-s | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 301h/- | 162h/17.7% | **1.85** |
| PP-YOLOE-l | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | 401h/- | 178h/30.3% | **2.25** |
* 在4机8卡V100的机器上进行模型训练不同模型的精度、训练耗时、多机加速比情况如下所示。
| 模型 | 数据集 | 配置 | 单机8卡耗时/精度 | 4机8卡耗时/精度 | 加速比 |
|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
| PP-YOLOE-s | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 39h/42.7% | 13h/42.1% | **3.0** |
| PP-YOLOE-m | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | 337h/- | 112h/24.6% | **3.0** |
| PP-YOLOE-x | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | 464h/- | 125h/32.1% | **3.4** |
* **注意**
* 在训练的GPU卡数过多时精度会稍微有所损失1%左右此时可以尝试通过添加warmup或者适当增加迭代轮数来弥补精度损失。
* 这里的配置文件均提供的是COCO数据集的配置文件如果需要训练其他的数据集需要修改数据集路径。
* 上面的`PP-YOLOE`系列模型在多机训练过程中均设置单卡batch size为8同时学习率相比于单机8卡保持不变。

View File

@@ -0,0 +1,60 @@
English | [简体中文](DistributedTraining_cn.md)
## 1. Usage
### 1.1 Single-machine
* Take PP-YOLOE-s as an example, after preparing the data locally, use the interface of `paddle.distributed.launch` or `fleetrun` to start the training task. Below is an example of running the script.
```bash
fleetrun \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
--eval &>logs.txt 2>&1 &
```
### 1.2 Multi-machine
* Compared with single-machine training, when training on multiple machines, you only need to add the `--ips` parameter, which indicates the ip list of machines that need to participate in distributed training. The ips of different machines are separated by commas. Below is an example of running code.
```shell
ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"
fleetrun \
--ips=${ip_list} \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
--eval &>logs.txt 2>&1 &
```
**Note:**
* The ip information of different machines needs to be separated by commas, which can be viewed through `ifconfig` or `ipconfig`.
* Passwordless SSH must be configured between the machines, and they must be able to ping each other directly; otherwise communication cannot be established.
* The code, data, and run commands or scripts must be identical across machines, and the training command or script must be started on every machine. The first device of the first machine in `ip_list` becomes trainer0, and so on.
* The default starting ports may differ across machines. Before launching a multi-machine job, set the same starting port on every machine with `export FLAGS_START_PORT=17000`; a value between `10000` and `20000` is recommended (see the sketch below).
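A minimal sketch of what is run on each machine, combining the port setting with the multi-machine launch command from section 1.2:
```bash
# Run on every participating machine, with the same starting port everywhere (10000-20000 recommended)
export FLAGS_START_PORT=17000

ip_list="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151"   # example IPs from this guide
fleetrun \
    --ips=${ip_list} \
    --selected_gpu 0,1,2,3,4,5,6,7 \
    tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \
    --eval &>logs.txt 2>&1 &
```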
## 2. Performance
* We conducted model training on 3x8 V100 GPUs. Accuracy, training time, and multi machine acceleration ratio of different models are shown below.
| Model | Dataset | Configuration | 8 GPU training time / Accuracy | 3x8 GPU training time / Accuracy | Acceleration ratio |
|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
| PP-YOLOE-s | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 301h/- | 162h/17.7% | **1.85** |
| PP-YOLOE-l | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml) | 401h/- | 178h/30.3% | **2.25** |
* We conducted model training on 4x8 V100 GPUs. Accuracy, training time, and multi machine acceleration ratio of different models are shown below.
| Model | Dataset | Configuration | 8 GPU training time / Accuracy | 4x8 GPU training time / Accuracy | Acceleration ratio |
|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
| PP-YOLOE-s | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) | 39h/42.7% | 13h/42.1% | **3.0** |
| PP-YOLOE-m | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml) | 337h/- | 112h/24.6% | **3.0** |
| PP-YOLOE-x | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml) | 464h/- | 125h/32.1% | **3.4** |
* **Note**
* When the number of GPUs used for training is very large, accuracy drops slightly (about 1%). In that case, try adding warmup or training for a few more epochs to compensate for the loss.
* The configuration files here are based on the COCO dataset. To train on other datasets, modify the dataset path accordingly.
* For the multi-machine training of the `PP-YOLOE` series, the per-card batch size is set to 8 and the learning rate is kept the same as in single-machine training.

View File

@@ -0,0 +1,57 @@
# FAQ第一期
**Q**SOLOv2训练mAP值宽幅震荡无上升趋势检测效果不好检测置信度超过了1的原因是
**A** SOLOv2训练不收敛的话先更新PaddleDetection到release/2.2或者develop分支尝试。
**Q** Optimizer中优化器支持哪几种
**A** Paddle中支持的优化器[Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html )在PaddleDetection中均支持需要手动修改下配置文件即可。
**Q** 在tools/infer.py加入如下函数得到FLOPs值为-1,请问原因?
**A** 更新PaddleDetection到release/2.2或者develop分支`print_flops`设为True即可打印FLOPs。
**Q** 使用官方的ReID模块时遇到了模块未注册的问题
**A** 请尝试`pip uninstall paddledet`并重新安装,或者`python setup.py install`
**Q** 大规模实用目标检测模型有动态图版本吗,或者可以转换为动态图版本吗?
**A** 大规模实用模型的动态图版本正在整理我们正在开发更大规模的通用预训练模型预计在2.3版本中发布。
**Q** Develop分支下FairMot预测视频问题预测视频时不会完全运行完毕。比如用一个300frame的视频代码会保存预测结果的每一帧图片但只保存到299张就没了并且也没有预测好的视频文件生成该如何解决
**A** 已经支持自己设置帧率infer视频请使用develop分支或release/2.2分支,命令如下:
```
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --frame_rate=20 --save_videos
```
**Q** 使用YOLOv3模型如何通过yml文件修改输入图片尺寸
**A** 模型预测部署需要用到指定的尺寸时,首先在训练前需要修改`configs/_base_/yolov3_reader.yml`中的`TrainReader``BatchRandomResize``target_size`包含指定的尺寸,训练完成后,在评估或者预测时,需要将`EvalReader``TestReader`中的`Resize``target_size`修改成对应的尺寸,如果是需要模型导出(export_model),则需要将`TestReader`中的`image_shape`修改为对应的图片输入尺寸 。
**Q** 以前的模型都是用静态图训练的,现在想用动态图训练,但想加载原来静态图的模型作为预训练模型,可以直接用加载静态图保存的模型断点吗?如不行,有其它方法吗?
**A** 静态图和动态图模型的权重的key做下映射一一对应转过去是可以的可以参考[这个代码](https://github.com/nemonameless/weights_st2dy )。但是不保证所有静态图的权重的key映射都能对应上静态图是把背景也训练了动态图去背景类训的而且现有动态图模型训出来的一般都比以前静态图更高资源时间够的情况下建议还是直接训动态图版本。
**Q** TTFNet训练过程中hm_loss异常
**A** 如果是单卡的话学习率需要对应降低8倍。另外ttfnet模型因为自身设置的学习率比较大可能会出现其他数据集训练出现不稳定的情况。建议pretrain_weights加载官方release出的coco数据集上训练好的模型然后将学习率再调低一些。

View File

@@ -0,0 +1,104 @@
# FAQ第零期
**Q:** 为什么我使用单GPU训练loss会出`NaN`? </br>
**A:** 配置文件中原始学习率是适配多GPU训练(8x GPU)若使用单GPU训练须对应调整学习率例如除以8
以[faster_rcnn_r50](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/faster_rcnn/faster_rcnn_r50_1x_coco.yml) 为例,在静态图下计算规则表如下所示,它们是等价的,表中变化节点即为`piecewise decay`里的`boundaries`: </br>
| GPU数 |batch size/卡| 学习率 | 最大轮数 | 变化节点 |
| :---------: | :------------:|:------------: | :-------: | :--------------: |
| 2 | 1 | 0.0025 | 720000 | [480000, 640000] |
| 4 | 1 | 0.005 | 360000 | [240000, 320000] |
| 8 | 1| 0.01 | 180000 | [120000, 160000] |
* 上述方式适用于静态图下。在动态图中由于训练以epoch方式计数因此调整GPU卡数后只需要修改学习率即可修改方式和静态图相同.
**Q:** 自定义数据集时,配置文件里的`num_classes`应该如何设置? </br>
**A:** 动态图中,自定义数据集时将`num_classes`统一设置为自定义数据集的类别数即可,静态图中(static目录下)YOLO系列模型和anchor free系列模型将`num_classes`设置为自定义数据集类别即可其他模型如RCNN系列SSDRetinaNetSOLOv2等模型由于检测原理上分类中需要区分背景框和前景框设置的`num_classes`须为自定义数据集类别数+1即增加一类背景类。
**Q:** PP-YOLOv2模型训练使用`—eval`做训练中验证在第一次做eval的时候hang住,该如何处理?</br>
**A:** PP-YOLO系列模型如果只加载backbone的预训练权重从头开始训练的话收敛会比较慢当模型还没有较好收敛的时候做预测时由于输出的预测框比较混乱在NMS时做排序和滤除会非常耗时就好像eval时hang住了一样这种情况一般发生在使用自定义数据集并且自定义数据集样本数较少导致训练到第一次做eval的时候训练轮数较少模型还没有较好收敛的情况下可以通过如下三个方面排查解决。
* PaddleDetection中提供的默认配置一般是采用8卡训练的配置配置文件中的`batch_size`数为每卡的batch size若训练的时候不是使用8卡或者对`batch_size`有修改,需要等比例的调小初始`learning_rate`来获得较好的收敛效果
* 如果使用自定义数据集并且样本数比较少,建议增大`snapshot_epoch`数来增加第一次进行eval的时候的训练轮数来保证模型已经较好收敛
* 若使用自定义数据集训练可以加载我们发布的COCO或VOC数据集上训练好的权重进行finetune训练来加快收敛速度可以使用`-o pretrain_weights=xxx`的方式指定预训练权重xxx可以是Model Zoo里发布的模型权重链接
**Q:** 如何更好的理解reader和自定义修改reader文件
```
# 每张GPU reader进程个数
worker_num: 2
# 训练数据
TrainReader:
inputs_def:
num_max_boxes: 50
# 训练数据transforms
sample_transforms:
- Decode: {} # 图片解码将图片数据从numpy格式转为rgb格式是必须存在的一个OP
- Mixup: {alpha: 1.5, beta: 1.5} # Mixup数据增强对两个样本的gt_bbbox/gt_score操作构建虚拟的训练样本可选的OP
- RandomDistort: {} # 随机颜色失真可选的OP
- RandomExpand: {fill_value: [123.675, 116.28, 103.53]} # 随机Canvas填充可选的OP
- RandomCrop: {} # 随机裁剪可选的OP
- RandomFlip: {} # 随机左右翻转默认概率0.5可选的OP
# batch_transforms
batch_transforms:
- BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
- NormalizeBox: {}
- PadBox: {num_max_boxes: 50}
- BboxXYXY2XYWH: {}
- NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- Permute: {}
- Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
# 训练时batch_size
batch_size: 24
# 读取数据是否乱序
shuffle: true
# 是否丢弃最后不能完整组成batch的数据
drop_last: true
# mixup_epoch大于最大epoch表示训练过程一直使用mixup数据增广。默认值为-1表示不使用Mixup。如果删去- Mixup: {alpha: 1.5, beta: 1.5}这行代码则必须也将mixup_epoch设置为-1或者删除
mixup_epoch: 25000
# 是否通过共享内存进行数据读取加速,需要保证共享内存大小(如/dev/shm)满足大于1G
use_shared_memory: true
如果需要单尺度训练则去掉batch_transforms里的BatchRandomResize这一行在sample_transforms最后一行添加- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
Decode是必须保留的如果想要去除数据增强则可以注释或删除Mixup RandomDistort RandomExpand RandomCrop RandomFlip注意如果注释或删除Mixup则必须也将mixup_epoch这一行注释或删除或者设置为-1表示不使用Mixup
sample_transforms:
- Decode: {}
- Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
```
**Q:** 用户如何控制类别类别输出?即图中有多类目标只输出其中的某几类
**A:** 用户可自行在代码中进行修改,增加条件设置。
```
# filter by class_id
keep_class_id = [1, 2]
bbox_res = [e for e in bbox_res if int(e[0]) in keep_class_id]
```
https://github.com/PaddlePaddle/PaddleDetection/blob/b87a1ea86fa18ce69e44a17ad1b49c1326f19ff9/ppdet/engine/trainer.py#L438
**Q:** 用户自定义数据集训练,预测结果标签错误
**A:** 此类情况往往是用户在设置数据集路径时候并没有关注TestDataset中anno_path的路径问题。需要用户将anno_path设置成自己的路径。
```
TestDataset:
!ImageFolder
anno_path: annotations/instances_val2017.json
```
**Q:** 如何打印网络FLOPs
**A:**`configs/runtime.yml`中设置`print_flops: true`同时需要安装PaddleSlim(比如pip install paddleslim)即可打印模型的FLOPs。
**Q:** 如何使用无标注框进行训练?
**A:**`configs/dataset/coco.py` 或者`configs/dataset/voc.py`中的TrainDataset下设置`allow_empty: true`, 此时允许数据集加载无标注框进行训练。该功能支持cocovoc数据格式RCNN系列和YOLO系列模型验证能够正常训练。另外如果无标注框数据过多会影响模型收敛在TrainDataset下可以设置`empty_ratio: 0.1`对无标注框数据进行随机采样控制无标注框的数据量占总数据量的比例默认值为1.,即使用全部无标注框

View File

@@ -0,0 +1,6 @@
# FAQ/常见问题
**PaddleDetection**非常感谢各位开发者提出的使用问题和需求。我们根据大家的提问总结了**FAQ/常见问题**合集,并在**每周一**进行更新。以下是往期的FAQ欢迎大家查阅。
- [FAQ第零期](./FAQ第零期.md)
- [FAQ第一期](./FAQ第一期.md)

View File

@@ -0,0 +1,146 @@
English | [简体中文](GETTING_STARTED_cn.md)
# Getting Started
## Installation
For setting up the running environment, please refer to the [installation instructions](INSTALL.md).
## Data preparation
- Please refer to [PrepareDetDataSet](./data/PrepareDetDataSet_en.md) for data preparation
- Please set the data path for data configuration file in ```configs/datasets```
## Training & Evaluation & Inference
PaddleDetection provides scripts for training, evaluation and inference with various features according to different configurations. For more details on distributed training, see [DistributedTraining](./DistributedTraining_en.md).
```bash
# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# training on multi-machines and multi-GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
$fleetrun --ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" --selected_gpu 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
# GPU evaluation
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
# Inference
python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
```
### Other argument list
The list below can be viewed by `--help`
| FLAG | script supported | description | default | remark |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | Select config file | None | **required**, such as `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
| -o | ALL | Set parameters in configure file | None | `-o` has higher priority to file configured by `-c`. Such as `-o use_gpu=False` |
| --eval | train | Whether to perform evaluation in training | False | set `--eval` if needed |
| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | such as `-r output/faster_rcnn_r50_1x_coco/10000` |
| --slim_config | ALL | Configure file of slim method | None | such as `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
| --use_vdl | train/infer | Whether to record the data with [VisualDL](https://github.com/paddlepaddle/visualdl), so as to display in VisualDL | False | VisualDL requires Python>=3.5 |
| --vdl\_log_dir | train/infer | VisualDL logging directory for image | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL requires Python>=3.5 |
| --output_eval | eval | Directory for storing the evaluation output | None | such as `--output_eval=eval_output`, default is current directory |
| --json_eval | eval | Whether to evaluate with already existed bbox.json or mask.json | False | set `--json_eval` if needed and json path is set in `--output_eval` |
| --classwise | eval | Whether to eval AP for each class and draw PR curve | False | set `--classwise` if needed |
| --output_dir | infer | Directory for storing the output visualization files | `./output` | such as `--output_dir output` |
| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | such as `--draw_threshold 0.7` |
| --infer_dir | infer | Directory for images to perform inference on | None | One of `infer_dir` and `infer_img` is required |
| --infer_img | infer | Image path | None | One of `infer_dir` and `infer_img` is required, `infer_img` has higher priority over `infer_dir` |
| --save_results | infer | Whether to save detection results to file | False | Optional |
## Examples
### Training
- Perform evaluation in training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
```
Training and evaluation are performed alternately, with evaluation at the end of each epoch. Meanwhile, the best model with the highest mAP is saved at each evaluation, under the same path as `model_final`.
If the evaluation dataset is large, we suggest modifying `snapshot_epoch` in `configs/runtime.yml` to decrease the number of evaluations, or evaluating after training.
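A minimal sketch (assuming `snapshot_epoch` can be overridden on the command line like other configuration keys) that enlarges the evaluation interval without editing the yaml:
```bash
# Illustrative only: evaluate every 5 epochs during training
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \
    -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval -o snapshot_epoch=5
```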
- Fine-tune other task
When using pre-trained model to fine-tune other task, pretrain\_weights can be used directly. The parameters with different shape will be ignored automatically. For example:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# If the shape of parameters in program is different from pretrain_weights,
# then PaddleDetection will not use such parameters.
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
-o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final \
```
##### NOTES
- `CUDA_VISIBLE_DEVICES` can specify different gpu numbers. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
- Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not be found locally.
- Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`.
- Checkpoints are saved in `output` by default, and can be revised from `save_dir` in `configs/runtime.yml`.
### Evaluation
- Evaluate by specified weights path and dataset path
```bash
export CUDA_VISIBLE_DEVICES=0
python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
-o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
```
The model to be evaluated can be specified either by a local path or by a link from the [MODEL_ZOO](../MODEL_ZOO_cn.md).
- Evaluate with json
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
--json_eval \
--output_eval evaluation/
```
The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
### Inference
- Output specified directory && Set up threshold
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
--use_vdl=True
```
`--draw_threshold` is an optional argument. Default is 0.5.
Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
## Deployment
Please refer to [deployment](../../deploy/README_en.md)
## Model Compression
Please refer to [slim](../../configs/slim/README_en.md)

View File

@@ -0,0 +1,266 @@
[English](GETTING_STARTED.md) | 简体中文
# 30分钟快速上手PaddleDetection
PaddleDetection作为成熟的目标检测开发套件提供了从数据准备、模型训练、模型评估、模型导出到模型部署的全流程。在这个章节里面我们以路标检测数据集为例提供快速上手PaddleDetection的流程。
## 1 安装
关于安装配置运行环境,请参考[安装指南](INSTALL_cn.md)
在本演示案例中假定用户将PaddleDetection的代码克隆并放置在`/home/paddle`目录中。用户执行的命令操作均在`/home/paddle/PaddleDetection`目录下完成
## 2 准备数据
目前PaddleDetection支持COCO VOC WiderFace, MOT四种数据格式。
- 首先按照[准备数据文档](./data/PrepareDetDataSet.md) 准备数据。
- 然后设置`configs/datasets`中相应的coco或voc等数据配置文件中的数据路径。
- 在本项目中,我们使用路标识别数据集
```bash
python dataset/roadsign_voc/download_roadsign_voc.py
```
- 下载后的数据格式为
```
├── download_roadsign_voc.py
├── annotations
│ ├── road0.xml
│ ├── road1.xml
│ | ...
├── images
│ ├── road0.png
│ ├── road1.png
│ | ...
├── label_list.txt
├── train.txt
├── valid.txt
```
## 3 配置文件改动和说明
我们使用`configs/yolov3/yolov3_mobilenet_v1_roadsign.yml`配置进行训练。
在静态图版本下,一个模型往往可以通过两个配置文件(一个主配置文件、一个reader读取配置实现在PaddleDetection 2.0及后续版本中,采用了模块解耦设计,用户可以组合配置模块实现检测器,并可自由修改覆盖各模块配置,如下图所示:
<center>
<img src="../images/roadsign_yml.png" width="500" >
</center>
<br><center>配置文件摘要</center></br>
从上图看到`yolov3_mobilenet_v1_roadsign.yml`配置需要依赖其他的配置文件。在该例子中需要依赖:
```bash
roadsign_voc.yml
runtime.yml
optimizer_40e.yml
yolov3_mobilenet_v1.yml
yolov3_reader.yml
--------------------------------------
yolov3_mobilenet_v1_roadsign 文件入口
roadsign_voc 主要说明了训练数据和验证数据的路径
runtime.yml 主要说明了公共的运行参数比如说是否使用GPU、每多少个epoch存储checkpoint等
optimizer_40e.yml 主要说明了学习率和优化器的配置。
yolov3_mobilenet_v1.yml 主要说明模型和主干网络的情况。
yolov3_reader.yml 主要说明数据读取器配置如batch size、并发加载子进程数等同时包含读取后预处理操作如resize、数据增强等等
```
<center><img src="../images/yaml_show.png" width="1000" ></center>
<br><center>配置文件结构说明</center></br>
### 修改配置文件说明
* 关于数据的路径修改说明
在修改配置文件中,用户如何实现自定义数据集是非常关键的一步,如何定义数据集请参考[如何自定义数据集](https://aistudio.baidu.com/aistudio/projectdetail/1917140)
* 默认学习率是适配多GPU训练(8x GPU)的若使用单GPU训练须对应调整学习率例如除以8示例命令见本节末尾
* 更多使用问题,请参考[FAQ](FAQ)
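针对上面第二点给出一条示意命令此处假设8卡默认学习率为0.0001请以配置文件中的实际数值为准单卡训练时将其除以8
```bash
# 示意单GPU训练时等比例调小学习率数值仅为示例
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
    -o LearningRate.base_lr=0.0000125
```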
## 4 训练
PaddleDetection提供了单卡/多卡训练模式,满足用户多种训练需求
* GPU单卡训练
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
```
* GPU多卡训练
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 #windows和Mac下不需要执行该命令
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
```
* [GPU多机多卡训练](./DistributedTraining_cn.md)
```bash
$fleetrun \
--ips="10.127.6.17,10.127.5.142,10.127.45.13,10.127.44.151" \
--selected_gpu 0,1,2,3,4,5,6,7 \
tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
```
* Fine-tune其他任务
使用预训练模型fine-tune其他任务时可以直接加载预训练模型形状不匹配的参数将自动忽略例如
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# 如果模型中参数形状与加载权重形状不同,将不会加载这类参数
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o pretrain_weights=output/model_final
```
* 模型恢复训练
在日常训练过程中,有的用户由于一些原因导致训练中断,可以使用`-r`参数恢复训练:
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -r output/faster_rcnn_r50_1x_coco/10000
```
## 5 评估
* 默认将训练生成的模型保存在当前`output`文件夹下
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
```
* 边训练,边评估
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 #windows和Mac下不需要执行该命令
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
```
在训练中交替执行评估, 评估在每个epoch训练结束后开始。每次评估后还会评出最佳mAP模型保存到`best_model`文件夹下。
如果验证集很大,测试将会比较耗时,建议调整`configs/runtime.yml` 文件中的 `snapshot_epoch`配置以减少评估次数,或训练完成后再进行评估。
- 通过json文件评估
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--json_eval \
--output_eval evaluation/
```
* 上述命令中没有加载模型的选项则使用配置文件中weights的默认配置`weights`表示训练过程中保存的最后一轮模型文件
* json文件必须命名为bbox.json或者mask.json放在`evaluation`目录下。
## 6 预测
```bash
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --infer_img=demo/road554.png -o weights=https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_roadsign.pdparams
```
* 设置参数预测
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--infer_img=demo/road554.png \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/yolov3_mobilenet_v1_roadsign/model_final \
--use_vdl=True
```
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果
`keep_top_k`表示设置输出目标的最大数量默认值为100用户可以根据自己的实际情况进行设定。
结果如下图:
![road554 image](../images/road554.png)
## 7 训练可视化
为了方便用户实时查看训练过程中的状态PaddleDetection集成了VisualDL可视化工具。当打开`use_vdl`开关后,记录的数据包括:
1. loss变化趋势
2. mAP变化趋势
```bash
export CUDA_VISIBLE_DEVICES=0 #windows和Mac下不需要执行该命令
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar
```
使用如下命令启动VisualDL查看日志
```shell
# 下述命令会在127.0.0.1上启动一个服务支持通过前端web页面查看可以通过--host这个参数指定实际ip地址
visualdl --logdir vdl_dir/scalar/
```
在浏览器输入提示的网址,效果如下:
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/ab767a202f084d1589f7d34702a75a7ef5d0f0a7e8c445bd80d54775b5761a8d" width="900" ></center>
<br><center>图VDL效果演示</center></br>
**参数列表**
以下列表可以通过`--help`查看
| FLAG | 支持脚本 | 用途 | 默认值 | 备注 |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
| -c | ALL | 指定配置文件 | None | **必选**,例如-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml |
| -o | ALL | 设置或更改配置文件里的参数内容 | None | 相较于`-c`设置的配置文件有更高优先级,例如:`-o use_gpu=False` |
| --eval | train | 是否边训练边测试 | False | 如需指定,直接`--eval`即可 |
| -r/--resume_checkpoint | train | 恢复训练加载的权重路径 | None | 例如:`-r output/faster_rcnn_r50_1x_coco/10000` |
| --slim_config | ALL | 模型压缩策略配置文件 | None | 例如`--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
| --use_vdl | train/infer | 是否使用[VisualDL](https://github.com/paddlepaddle/visualdl)记录数据进而在VisualDL面板中显示 | False | VisualDL需Python>=3.5 |
| --vdl\_log_dir | train/infer | 指定 VisualDL 记录数据的存储路径 | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL需Python>=3.5 |
| --output_eval | eval | 评估阶段保存json路径 | None | 例如 `--output_eval=eval_output`, 默认为当前路径 |
| --json_eval | eval | 是否通过已存在的bbox.json或者mask.json进行评估 | False | 如需指定,直接`--json_eval`即可, json文件路径在`--output_eval`中设置 |
| --classwise | eval | 是否评估单类AP和绘制单类PR曲线 | False | 如需指定,直接`--classwise`即可 |
| --output_dir | infer/export_model | 预测后结果或导出模型保存路径 | `./output` | 例如`--output_dir=output` |
| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | 例如`--draw_threshold=0.7` |
| --infer_dir | infer | 用于预测的图片文件夹路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个 |
| --infer_img | infer | 用于预测的图片路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个,`infer_img`具有更高优先级 |
| --save_results | infer | 是否在文件夹下将图片的预测结果保存到文件中 | False | 可选 |
## 8 模型导出
训练过程中保存的模型文件包含前向预测和反向传播的过程,而实际的工业部署不需要反向传播,因此需要将模型导出成部署所需的模型格式。
在PaddleDetection中提供了 `tools/export_model.py`脚本来导出模型
```bash
python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --output_dir=./inference_model \
-o weights=output/yolov3_mobilenet_v1_roadsign/best_model
```
预测模型会导出到`inference_model/yolov3_mobilenet_v1_roadsign`目录下,分别为`infer_cfg.yml`、`model.pdiparams`、`model.pdiparams.info`、`model.pdmodel`。如果不指定文件夹,模型则会导出在`output_inference`目录下。
* 更多关于模型导出的文档,请参考[模型导出文档](../../deploy/EXPORT_MODEL.md)
## 9 模型压缩
为了进一步对模型进行优化PaddleDetection提供了基于PaddleSlim进行模型压缩的完整教程和benchmark。目前支持的方案
* 裁剪
* 量化
* 蒸馏
* 联合策略
* 更多关于模型压缩的文档,请参考[模型压缩文档](../../configs/slim/README.md)。
## 10 预测部署
PaddleDetection提供了PaddleInference、PaddleServing、PaddleLite多种部署形式支持服务端、移动端、嵌入式等多种平台提供了完善的Python和C++部署方案。
* 在这里我们以Python为例说明如何使用PaddleInference进行模型部署
```bash
python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign --image_file=demo/road554.png --device=GPU
```
* 同时`infer.py`提供了丰富的接口,用户可以接入视频文件、摄像头进行预测(示例见下方命令),更多内容请参考[Python端预测部署](../../deploy/python)
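例如,接入本地视频文件进行预测的示意命令(视频文件名仅为示例,相关参数请以`deploy/python/infer.py`的实际参数说明为准):
```bash
# 示意:对本地视频文件进行预测
python deploy/python/infer.py --model_dir=./output_inference/yolov3_mobilenet_v1_roadsign \
    --video_file=test.mp4 --device=GPU
```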
### PaddleDetection支持的部署形式说明
|形式|语言|教程|设备/平台|
|-|-|-|-|
|PaddleInference|Python|已完善|Linux(arm X86)、Windows|
|PaddleInference|C++|已完善|Linux(arm X86)、Windows|
|PaddleServing|Python|已完善|Linux(arm X86)、Windows|
|PaddleLite|C++|已完善|Android、iOS、FPGA、RK...|
* 更多关于预测部署的文档,请参考[预测部署文档](../../deploy/README.md)。

View File

@@ -0,0 +1,69 @@
# 目标检测热力图
## 1.简介
基于backbone/roi特征图计算物体预测框的cam(类激活图), 目前支持基于FasterRCNN/MaskRCNN系列, PPYOLOE系列, 以及BlazeFace, SSD, Retinanet网络。
## 2.使用方法
* 以PP-YOLOE为例准备好数据之后指定网络配置文件、模型权重地址和图片路径以及输出文件夹路径使用脚本调用tools/cam_ppdet.py计算图片中物体预测框的grad_cam热力图。下面为运行脚本示例。
```shell
python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
* **参数**
| FLAG | 用途 |
|:--------------------------:|:--------------------------------------------------------------------------------------------------------------------------:|
| -c | 指定配置文件 |
| --infer_img | 用于预测的图片路径 |
| --cam_out | 指定输出路径 |
| --target_feature_layer_name | 计算cam的特征图位置, 如model.backbone、 model.bbox_head.roi_extractor |
| -o | 设置或更改配置文件里的参数内容, 如 -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams |
* 运行效果
<center>
<img src="../images/grad_cam_ppyoloe_demo.jpg" width="500" >
</center>
<br><center>cam_ppyoloe/225.jpg</center></br>
## 3. 目前支持基于FasterRCNN/MaskRCNN系列, PPYOLOE系列以及BlazeFace, SSD, Retinanet网络。
* PPYOLOE网络热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
* MaskRCNN网络roi特征热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams
```
* MaskRCNN网络backbone特征的热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams
```
* FasterRCNN网络基于roi特征的热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams
```
* FasterRCNN网络基于backbone特征的热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams
```
* BlazeFace网络backbone特征热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/face_detection/blazeface_1000e.yml --infer_img demo/hrnet_demo.jpg --cam_out cam_blazeface --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams
```
* SSD网络backbone特征热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml --infer_img demo/000000014439.jpg --cam_out cam_ssd --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams
```
* Retinanet网络backbone特征热图可视化脚本
```bash
python tools/cam_ppdet.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_retinanet --target_feature_layer_name model.backbone -o weights=https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_2x_coco.pdparams
```

View File

@@ -0,0 +1,71 @@
# Object detection grad_cam heatmap
## 1.Introduction
Calculate the cam (class activation map) of the object predict bbox based on the backbone/roi feature map, currently supports networks based on FasterRCNN/MaskRCNN series, PPYOLOE series and BlazeFace, SSD, Retinanet.
## 2.Usage
* Taking PP-YOLOE as an example, after preparing the data, specify the network configuration file, model weight address, image path and output folder path, and then use the script to call tools/cam_ppdet.py to calculate the grad_cam heat map of the prediction box. Below is an example run script.
```shell
python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
* **Arguments**
| FLAG | description |
| :----------------------: |:---------------------------------------------------------------------------------------------------------------------------------:|
| -c | Select config file |
| --infer_img | Image path |
| --cam_out | Directory for output |
| --target_feature_layer_name | The position of featuremap to do gradcam, for example:model.backbone, model.bbox_head.roi_extractor |
| -o | Set parameters in configure file, for example: -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams |
* result
<center>
<img src="../images/grad_cam_ppyoloe_demo.jpg" width="500" >
</center>
<br><center>cam_ppyoloe/225.jpg</center></br>
## 3.Currently supports networks based on FasterRCNN/MaskRCNN series, PPYOLOE series and BlazeFace, SSD, Retinanet.
* PPYOLOE bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_ppyoloe --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams
```
* MaskRCNN bbox heat map visualization script (with roi featuremap)
```bash
python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams
```
* MaskRCNN bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_mask_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams
```
* FasterRCNN bbox heat map visualization script (with roi featuremap)
```bash
python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_roi --target_feature_layer_name model.bbox_head.roi_extractor -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams
```
* FasterRCNN bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/faster_rcnn/faster_rcnn_r50_vd_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_faster_rcnn_backbone --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams
```
* BlazeFace bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/face_detection/blazeface_1000e.yml --infer_img demo/hrnet_demo.jpg --cam_out cam_blazeface --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/blazeface_1000e.pdparams
```
* SSD bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml --infer_img demo/000000014439.jpg --cam_out cam_ssd --target_feature_layer_name model.backbone -o weights=https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams
```
* Retinanet bbox heat map visualization script (with backbone featuremap)
```bash
python tools/cam_ppdet.py -c configs/retinanet/retinanet_r50_fpn_2x_coco.yml --infer_img demo/000000014439.jpg --cam_out cam_retinanet --target_feature_layer_name model.backbone -o weights=https://bj.bcebos.com/v1/paddledet/models/retinanet_r50_fpn_2x_coco.pdparams
```

View File

@@ -0,0 +1,138 @@
English | [简体中文](INSTALL_cn.md)
# Installation
This document covers how to install PaddleDetection and its dependencies
(including PaddlePaddle), together with COCO and Pascal VOC dataset.
For general information about PaddleDetection, please see [README.md](https://github.com/PaddlePaddle/PaddleDetection/tree/develop).
## Requirements:
- PaddlePaddle 2.3.2
- OS 64 bit
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9/3.10), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.2
- cuDNN >= 7.6
Dependency of PaddleDetection and PaddlePaddle:
| PaddleDetection version | PaddlePaddle version | tips |
| :----------------: | :---------------: | :-------: |
| develop | >= 2.3.2 | Dygraph mode is set as default |
| release/2.6 | >= 2.3.2 | Dygraph mode is set as default |
| release/2.5 | >= 2.2.2 | Dygraph mode is set as default |
| release/2.4 | >= 2.2.2 | Dygraph mode is set as default |
| release/2.3 | >= 2.2.0rc | Dygraph mode is set as default |
| release/2.2 | >= 2.1.2 | Dygraph mode is set as default |
| release/2.1 | >= 2.1.0 | Dygraph mode is set as default |
| release/2.0 | >= 2.0.1 | Dygraph mode is set as default |
| release/2.0-rc | >= 2.0.1 | -- |
| release/0.5 | >= 1.8.4 | Cascade R-CNN and SOLOv2 depends on 2.0.0.rc |
| release/0.4 | >= 1.8.4 | PP-YOLO depends on 1.8.4 |
| release/0.3 | >=1.7 | -- |
## Instruction
### 1. Install PaddlePaddle
```
# CUDA10.2
python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
# CPU
python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
```
- For more CUDA version or environment to quick install, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick)
- For more installation methods such as conda or compile with source code, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify.
```
# check
>>> import paddle
>>> paddle.utils.run_check()
# confirm the paddle's version
python -c "import paddle; print(paddle.__version__)"
```
**Note**
1. If you want to use PaddleDetection on multi-GPU, please install NCCL at first.
### 2. Install PaddleDetection
**Note:** Installing via pip only supports Python3
```
# Clone PaddleDetection repository
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# Install other dependencies
cd PaddleDetection
pip install -r requirements.txt
# Compile and install paddledet
python setup.py install
```
**Note**
1. If you are working on Windows, installing `pycocotools` may fail because the original version of cocoapi does not support Windows; a third-party version that only supports Python3 can be used instead:
```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
2. If you are using Python <= 3.6, installing `pycocotools` may fail with an error like `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`; please install `cython` first, for example `pip install cython`
After installation, make sure the tests pass:
```shell
python ppdet/modeling/tests/test_architectures.py
```
If the tests are passed, the following information will be prompted:
```
.......
----------------------------------------------------------------------
Ran 7 tests in 12.816s
OK
```
## Use built Docker images
> If you do not have a Docker environment, please refer to [Docker](https://www.docker.com/).
We provide docker images containing the latest PaddleDetection code, and all environment and package dependencies are pre-installed. All you have to do is to **pull and run the docker image**. Then you can enjoy PaddleDetection without any extra steps.
Get these images and guidance in [docker hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection), including CPU, GPU, ROCm environment versions.
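A minimal sketch of the pull-and-run workflow (the tag is a placeholder; pick an actual tag from the Docker Hub page above, and add GPU runtime options if needed):
```bash
# Illustrative only: pull a pre-built image and start an interactive container
docker pull paddlecloud/paddledetection:<tag>
docker run --name ppdet -it paddlecloud/paddledetection:<tag> /bin/bash
```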
If you have some customized requirements about automatic building docker images, you can get it in github repo [PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton).
## Inference demo
**Congratulations!** Now that you have installed PaddleDetection successfully, try our inference demo:
```
# Predict an image by GPU
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
An image of the same name with the predicted result will be generated under the `output` folder.
The result is as shown below
![](../images/000000014439.jpg)

View File

@@ -0,0 +1,130 @@
[English](INSTALL.md) | 简体中文
# 安装文档
## 环境要求
- PaddlePaddle 2.3.2
- OS 64位操作系统
- Python 3(3.5.1+/3.6/3.7/3.8/3.9/3.10)64位版本
- pip/pip3(9.0.1+)64位版本
- CUDA >= 10.2
- cuDNN >= 7.6
PaddleDetection 依赖 PaddlePaddle 版本关系:
| PaddleDetection版本 | PaddlePaddle版本 | 备注 |
| :------------------: | :---------------: | :-------: |
| develop | >=2.3.2 | 默认使用动态图模式 |
| release/2.6 | >=2.3.2 | 默认使用动态图模式 |
| release/2.5 | >= 2.2.2 | 默认使用动态图模式 |
| release/2.4 | >= 2.2.2 | 默认使用动态图模式 |
| release/2.3 | >= 2.2.0rc | 默认使用动态图模式 |
| release/2.2 | >= 2.1.2 | 默认使用动态图模式 |
| release/2.1 | >= 2.1.0 | 默认使用动态图模式 |
| release/2.0 | >= 2.0.1 | 默认使用动态图模式 |
| release/2.0-rc | >= 2.0.1 | -- |
| release/0.5 | >= 1.8.4 | 大部分模型>=1.8.4即可运行Cascade R-CNN系列模型与SOLOv2依赖2.0.0.rc版本 |
| release/0.4 | >= 1.8.4 | PP-YOLO依赖1.8.4 |
| release/0.3 | >=1.7 | -- |
## 安装说明
### 1. 安装PaddlePaddle
```
# CUDA10.2
python -m pip install paddlepaddle-gpu==2.3.2 -i https://mirror.baidu.com/pypi/simple
# CPU
python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple
```
- 更多CUDA版本或环境快速安装请参考[PaddlePaddle快速安装文档](https://www.paddlepaddle.org.cn/install/quick)
- 更多安装方式例如conda或源码编译安装方法请参考[PaddlePaddle安装文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html)
请确保您的PaddlePaddle安装成功并且版本不低于需求版本。使用以下命令进行验证。
```
# 在您的Python解释器中确认PaddlePaddle安装成功
>>> import paddle
>>> paddle.utils.run_check()
# 确认PaddlePaddle版本
python -c "import paddle; print(paddle.__version__)"
```
**注意**
1. 如果您希望在多卡环境下使用PaddleDetection请首先安装NCCL
### 2. 安装PaddleDetection
**注意:** pip安装方式只支持Python3
```
# 克隆PaddleDetection仓库
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# 安装其他依赖
cd PaddleDetection
pip install -r requirements.txt
# 编译安装paddledet
python setup.py install
```
**注意**
1. 如果github下载代码较慢可尝试使用[gitee](https://gitee.com/PaddlePaddle/PaddleDetection.git)或者[代理加速](https://doc.fastgit.org/zh-cn/guide.html)。
2. 若您使用的是Windows系统由于原版cocoapi不支持Windows`pycocotools`依赖可能安装失败可采用第三方实现版本该版本仅支持Python3
```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
3. 若您使用的是Python <= 3.6的版本,安装`pycocotools`可能会报错`distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('cython>=0.27.3')`, 您可通过先安装`cython`如`pip install cython`解决该问题
安装后确认测试通过:
```
python ppdet/modeling/tests/test_architectures.py
```
测试通过后会提示如下信息:
```
.......
----------------------------------------------------------------------
Ran 7 tests in 12.816s
OK
```
## 使用Docker镜像
> 如果您没有Docker运行环境请参考[Docker官网](https://www.docker.com/)进行安装。
我们提供了包含最新 PaddleDetection 代码的docker镜像并预先安装好了所有的环境和库依赖您只需要**拉取docker镜像**,然后**运行docker镜像**无需其他任何额外操作即可开始使用PaddleDetection的所有功能。
在[Docker Hub](https://hub.docker.com/repository/docker/paddlecloud/paddledetection)中获取这些镜像及相应的使用指南包括CPU、GPU、ROCm版本。
如果您对自动化制作docker镜像感兴趣或有自定义需求请访问[PaddlePaddle/PaddleCloud](https://github.com/PaddlePaddle/PaddleCloud/tree/main/tekton)做进一步了解。
## 快速体验
**恭喜!** 您已经成功安装了PaddleDetection接下来快速体验目标检测效果
```
# 在GPU上预测一张图片
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
会在`output`文件夹下生成一个画有预测结果的同名图像。
结果如下图:
![](../images/000000014439.jpg)

View File

@@ -0,0 +1,299 @@
# config yaml配置项说明
KeyPoint 使用时config文件配置项说明以[tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml)为例
```yaml
use_gpu: true #是否使用gpu训练
log_iter: 5 #打印log的iter间隔
save_dir: output #模型保存目录
snapshot_epoch: 10 #保存模型epoch间隔
weights: output/tinypose_256x192/model_final #测试加载模型路径(不含后缀“.pdparams”
epoch: 420 #总训练epoch数量
num_joints: &num_joints 17 #关键点数量
pixel_std: &pixel_std 200 #变换时相对比率像素(无需关注,不动就行)
metric: KeyPointTopDownCOCOEval #metric评估函数
num_classes: 1 #种类数(检测模型用,不需关注)
train_height: &train_height 256 #模型输入尺度高度变量设置
train_width: &train_width 192 #模型输入尺度宽度变量设置
trainsize: &trainsize [*train_width, *train_height] #模型输入尺寸,使用已定义变量
hmsize: &hmsize [48, 64] #输出热力图尺寸(宽,高)
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #左右关键点经图像翻转时对应关系,例如:图像翻转后,左手腕变成了右手腕,右手腕变成了左手腕
#####model
architecture: TopDownHRNet #模型框架结构类选择
TopDownHRNet: #TopDownHRNet相关配置
backbone: LiteHRNet #模型主干网络
post_process: HRNetPostProcess #模型后处理类
flip_perm: *flip_perm #同上flip_perm
num_joints: *num_joints #关键点数量(输出通道数量)
width: &width 40 #backbone输出通道数
loss: KeyPointMSELoss #loss函数选择
use_dark: true #是否使用DarkPose后处理
LiteHRNet: #LiteHRNet相关配置
network_type: wider_naive #网络结构类型选择
freeze_at: -1 #梯度截断branch id截断则该branch梯度不会反传
freeze_norm: false #是否固定normalize层参数
return_idx: [0] #返回feature的branch id
KeyPointMSELoss: #Loss相关配置
use_target_weight: true #是否使用关键点权重
loss_scale: 1.0 #loss比率调整1.0表示不变
#####optimizer
LearningRate: #学习率相关配置
base_lr: 0.002 #初始基础学习率
schedulers:
\- !PiecewiseDecay #衰减策略
milestones: [380, 410] #衰减时间对应epoch次数
gamma: 0.1 #衰减率
\- !LinearWarmup #Warmup策略
start_factor: 0.001 #warmup初始学习率比率
steps: 500 #warmup所用iter次数
OptimizerBuilder: #学习策略设置
optimizer:
type: Adam #学习策略Adam
regularizer:
factor: 0.0 #正则项权重
type: L2 #正则类型L2/L1
#####data
TrainDataset: #训练数据集设置
!KeypointTopDownCocoDataset #数据加载类
image_dir: "" #图片文件夹,对应dataset_dir/image_dir
anno_path: aic_coco_train_cocoformat.json #训练数据Json文件coco格式
dataset_dir: dataset #训练数据集所在路径image_dir、anno_path路径基于此目录
num_joints: *num_joints #关键点数量,使用已定义变量
trainsize: *trainsize #训练使用尺寸,使用已定义变量
pixel_std: *pixel_std #同上pixel_std
use_gt_bbox: True #是否使用gt框
EvalDataset: #评估数据集设置
!KeypointTopDownCocoDataset #数据加载类
image_dir: val2017 #图片文件夹
anno_path: annotations/person_keypoints_val2017.json #评估数据Json文件coco格式
dataset_dir: dataset/coco #数据集路径image_dir、anno_path路径基于此目录
num_joints: *num_joints #关键点数量,使用已定义变量
trainsize: *trainsize #训练使用尺寸,使用已定义变量
pixel_std: *pixel_std #同上pixel_std
use_gt_bbox: True #是否使用gt框一般测试时用
image_thre: 0.5 #检测框阈值设置测试时使用非gt_bbox时用
TestDataset: #纯测试数据集设置无label
!ImageFolder #数据加载类,图片文件夹类型
anno_path: dataset/coco/keypoint_imagelist.txt #测试图片列表文件
worker_num: 2 #数据加载worker数量一般2-4太多可能堵塞
global_mean: &global_mean [0.485, 0.456, 0.406] #全局均值变量设置
global_std: &global_std [0.229, 0.224, 0.225] #全局方差变量设置
TrainReader: #训练数据加载类设置
sample_transforms: #数据预处理变换设置
\- RandomFlipHalfBodyTransform: #随机翻转&随机半身变换类
scale: 0.25 #最大缩放尺度比例
rot: 30 #最大旋转角度
num_joints_half_body: 8 #关键点小于此数不做半身变换
prob_half_body: 0.3 #半身变换执行概率(满足关键点数量前提下)
pixel_std: *pixel_std #同上pixel_std
trainsize: *trainsize #训练尺度同上trainsize
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #上半身关键点id
flip_pairs: *flip_perm #左右关键点对应关系同上flip_perm
\- AugmentationbyInformantionDropping:
prob_cutout: 0.5 #随机擦除变换概率
offset_factor: 0.05 #擦除位置中心点随机波动范围相对图片宽度比例
num_patch: 1 #擦除位置数量
trainsize: *trainsize #同上trainsize
\- TopDownAffine:
trainsize: *trainsize #同上trainsize
use_udp: true #是否使用udp_unbiasflip测试使用
\- ToHeatmapsTopDown_DARK: #生成热力图gt类
hmsize: *hmsize #热力图尺寸
sigma: 2 #生成高斯核sigma值设置
batch_transforms:
\- NormalizeImage: #图像归一化类
mean: *global_mean #均值设置,使用已有变量
std: *global_std #方差设置,使用已有变量
is_scale: true #图像元素是否除255.,即[0,255]到[0,1]
\- Permute: {} #通道变换HWC->CHW,一般都需要
batch_size: 128 #训练时batchsize
shuffle: true #数据集是否shuffle
drop_last: false #数据集对batchsize取余数量是否丢弃
EvalReader:
sample_transforms: #数据预处理变换设置意义同TrainReader
\- TopDownAffine: #Affine变换设置
trainsize: *trainsize #训练尺寸同上trainsize使用已有变量
use_udp: true #是否使用udp_unbias与训练需对应
batch_transforms:
\- NormalizeImage: #图片归一化,与训练需对应
mean: *global_mean
std: *global_std
is_scale: true
\- Permute: {} #通道变换HWC->CHW
batch_size: 16 #测试时batchsize
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width] #输入数据维度设置CHW
sample_transforms:
\- Decode: {} #图片加载
\- TopDownEvalAffine: #Affine类Eval时用
trainsize: *trainsize #输入图片尺度
\- NormalizeImage: #输入图像归一化
mean: *global_mean #均值
std: *global_std #方差
is_scale: true #图像元素是否除255.,即[0,255]到[0,1]
\- Permute: {} #通道变换HWC->CHW
batch_size: 1 #Test batchsize
fuse_normalize: false #导出模型时是否内融合归一化操作若是预处理中可省略normalize可以加快pipeline速度
```
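补充一个示意用法:使用上述配置启动训练与评估的命令和其他模型一致(评估权重路径即上面`weights`字段所指路径):
```bash
# 示意:按该配置进行训练(边训练边评估)与评估
python tools/train.py -c configs/keypoint/tiny_pose/tinypose_256x192.yml --eval
python tools/eval.py -c configs/keypoint/tiny_pose/tinypose_256x192.yml \
    -o weights=output/tinypose_256x192/model_final
```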

View File

@@ -0,0 +1,299 @@
# config yaml guide
KeyPoint config guideTake an example of [tinypose_256x192.yml](../../configs/keypoint/tiny_pose/tinypose_256x192.yml)
```yaml
use_gpu: true #train with gpu or not
log_iter: 5 #print log every 5 iter
save_dir: output #the directory to save model
snapshot_epoch: 10 #save model every 10 epochs
weights: output/tinypose_256x192/model_final #the weight to load(without postfix “.pdparams”
epoch: 420 #the total epoch number to train
num_joints: &num_joints 17 #number of joints
pixel_std: &pixel_std 200 #the standard pixel lengthdon't care
metric: KeyPointTopDownCOCOEval #metric function
num_classes: 1 #number of classesjust for object detection, don't care
train_height: &train_height 256 #the height of model input
train_width: &train_width 192 #the width of model input
trainsize: &trainsize [*train_width, *train_height] #the shape of model input
hmsize: &hmsize [48, 64] #the shape of model output
flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] #the correspondence between left and right keypoint id, for example: left wrist become right wrist after image flip, and also the right wrist becomes left wrist
#####model
architecture: TopDownHRNet #the model architecture
TopDownHRNet: #TopDownHRNet configs
backbone: LiteHRNet #which backbone to use
post_process: HRNetPostProcess #the post_process to use
flip_perm: *flip_perm #same to the upper "flip_perm"
num_joints: *num_joints #the joint numberthe number of output channels
width: &width 40 #backbone output channels
loss: KeyPointMSELoss #loss function
use_dark: true #whether to use DarkPose in postprocess
LiteHRNet: #LiteHRNet configs
network_type: wider_naive #the network type of backbone
freeze_at: -1 #the branch matching this id does not backward (-1 means all branches backward)
freeze_norm: false #whether to freeze normalize weights
return_idx: [0] #the branch id to fetch features
KeyPointMSELoss: #Loss configs
use_target_weight: true #whether to use target weights
loss_scale: 1.0 #loss weight (final loss = loss * loss_scale)
#####optimizer
LearningRate: #LearningRate configs
base_lr: 0.002 #the original base learning rate
schedulers:
\- !PiecewiseDecay #the scheduler to adjust learning rate
milestones: [380, 410] #the milestones(epochs) to adjust learning rate
gamma: 0.1 #the ratio to adjust learning rate, new_lr = lr*gamma
\- !LinearWarmup #Warmup configs
start_factor: 0.001 #the original ratio with respect to base_lr
steps: 500 #iters used to warmup
OptimizerBuilder: #Optimizer type configs
optimizer:
type: Adam #optimizer type: Adam
regularizer:
factor: 0.0 #the regularizer weight
type: L2 #regularizer type: L2/L1
#####data
TrainDataset: #Train Dataset configs
!KeypointTopDownCocoDataset #the dataset class to load data
image_dir: "" #the image directory, relative to dataset_dir
anno_path: aic_coco_train_cocoformat.json #the train datalistcoco format, relative to dataset_dir
dataset_dir: dataset #the dataset directory, the image_dir and anno_path based on this directory
num_joints: *num_joints #joint numbers
trainsize: *trainsize #the input size of model
pixel_std: *pixel_std #same to the upper "pixel_std"
use_gt_bbox: True #whether to use gt bbox, commonly used in eval
EvalDataset: #Eval Dataset configs
!KeypointTopDownCocoDataset #the dataset class to load data
image_dir: val2017 #the image directory, relative to dataset_dir
anno_path: annotations/person_keypoints_val2017.json #the eval datalistcoco format, relative to dataset_dir
dataset_dir: dataset/coco #the dataset directory, the image_dir and anno_path based on this directory
num_joints: *num_joints #joint numbers
trainsize: *trainsize #the input size of model
pixel_std: *pixel_std #same to the upper "pixel_std"
use_gt_bbox: True #whether to use gt bbox, commonly used in eval
image_thre: 0.5 #the threshold of detected rect, used while use_gt_bbox is False
TestDataset: #the test dataset without label
!ImageFolder #the class to load data, find images by folder
anno_path: dataset/coco/keypoint_imagelist.txt #the image list file
worker_num: 2 #the workers to load Dataset
global_mean: &global_mean [0.485, 0.456, 0.406] #means used to normalize image
global_std: &global_std [0.229, 0.224, 0.225] #stds used to normalize image
TrainReader: #TrainReader configs
sample_transforms: #transform configs
\- RandomFlipHalfBodyTransform: #random flip & random HalfBodyTransform
scale: 0.25 #the maximum scale for size transform
rot: 30 #the maximum rotation to transform
num_joints_half_body: 8 #the HalfBodyTransform is skipped when fewer joints than this number are found
prob_half_body: 0.3 #the ratio of halfbody transform
pixel_std: *pixel_std #same to upper "pixel_std"
trainsize: *trainsize #the input size of model
upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] #the joint id which is belong to upper body
flip_pairs: *flip_perm #same to the upper "flip_perm"
\- AugmentationbyInformantionDropping:
prob_cutout: 0.5 #the probability to cutout keypoint
offset_factor: 0.05 #the jitter offset of cutout position, expressed as a percentage of trainwidth
num_patch: 1 #the numbers of area to cutout
trainsize: *trainsize #same to upper "trainsize"
\- TopDownAffine:
trainsize: *trainsize #same to upper "trainsize"
use_udp: true #whether to use udp_unbiasjust for flip eval
\- ToHeatmapsTopDown_DARK: #generate gt heatmaps
hmsize: *hmsize #the size of output heatmaps
sigma: 2 #the sigma of the gaussian kernel used to generate gt heatmaps
batch_transforms:
\- NormalizeImage: #image normalize class
mean: *global_mean #mean of normalize
std: *global_std #std of normalize
is_scale: true #whether scale by 1/255 to every image pixelstransform pixel from [0,255] to [0,1]
\- Permute: {} #channel transform from HWC to CHW
batch_size: 128 #batchsize used for train
shuffle: true #whether to shuffle the images before train
drop_last: false #whether to drop the last images that cannot form a full batch
EvalReader:
sample_transforms: #transform configs
\- TopDownAffine: #Affine configs
trainsize: *trainsize #same to upper "trainsize"
use_udp: true #whether to use udp_unbiasjust for flip eval
batch_transforms:
\- NormalizeImage: #image normalize, the values should be same to values in TrainReader
mean: *global_mean
std: *global_std
is_scale: true
\- Permute: {} #channel transform from HWC to CHW
batch_size: 16 #batchsize used for test
TestReader:
inputs_def:
image_shape: [3, *train_height, *train_width] #the input dimensions used in modelCHW
sample_transforms:
\- Decode: {} #load image
\- TopDownEvalAffine: #Affine class used in Eval
trainsize: *trainsize #the input size of model
\- NormalizeImage: #image normalize, the values should be same to values in TrainReader
mean: *global_mean #mean of normalize
std: *global_std #std of normalize
is_scale: true #whether scale by 1/255 to every image pixelstransform pixel from [0,255] to [0,1]
\- Permute: {} #channel transform from HWC to CHW
batch_size: 1 #Test batchsize
fuse_normalize: false #whether fuse the normalize into model while export model, this speedup the model infer
```

View File

@@ -0,0 +1,91 @@
English | [简体中文](QUICK_STARTED_cn.md)
# Quick Start
In order to enable users to experience PaddleDetection and produce models in a short time, this tutorial introduces the pipeline to get a decent object detection model by finetuning on a small dataset in 10 minutes only. In practical applications, it is recommended that users select a suitable model configuration file for their specific demand.
- **Set GPU**
```bash
export CUDA_VISIBLE_DEVICES=0
```
## Inference Demo with Pre-trained Models
```
# predict an image using PP-YOLO
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
the result
![](../images/000000014439.jpg)
## Data preparation
The dataset is a [Kaggle dataset](https://www.kaggle.com/andrewmvd/road-sign-detection) including 877 images and 4 categories: crosswalk, speedlimit, stop, trafficlight. The dataset is divided into a training set (701 images) and a test set (176 images), [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
```
# Note: this step can be skipped, and
# the dataset will be downloaded automatically at the training stage.
python dataset/roadsign_voc/download_roadsign_voc.py
```
## Training & Evaluation & Inference
### 1、Training
```
# It takes about 10 minutes on a 1080Ti GPU and about 1 hour on CPU
# -c set configuration file
# -o overwrite the settings in the configuration file
# --eval: evaluate while training; the model with the best evaluation results will be automatically saved as best_model.pdparams
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
```
If you want to observe the loss change curve in real time through VisualDL, add --use_vdl=true to the training command, and set the log save path through --vdl_log_dir.
**Note: VisualDL need Python>=3.5**
Please install [VisualDL](https://github.com/PaddlePaddle/VisualDL) first
```
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```
```
python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval
```
View the change curve in real time through the visualdl command:
```
visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```
### 2、Evaluation
```
# Evaluate best_model by default
# -c set config file
# -o overwrite the settings in the configuration file
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
```
The final mAP should be around 0.85. The dataset is small so the precision may vary a little after each training.
### 3、Inference
```
# -c set config file
# -o overwrite the settings in the configuration file
# --infer_img image path
# After the prediction is over, an image of the same name with the prediction result will be generated in the output folder
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
```
The result is as shown below
![](../images/road554.png)

View File

@@ -0,0 +1,88 @@
[English](QUICK_STARTED.md) | 简体中文
# 快速开始
为了使得用户能够在很短时间内快速产出模型掌握PaddleDetection的使用方式这篇教程通过一个预训练检测模型对小数据集进行finetune。在较短时间内即可产出一个效果不错的模型。实际业务中建议用户根据需要选择合适模型配置文件进行适配。
- **设置显卡**
```bash
export CUDA_VISIBLE_DEVICES=0
```
## 一、快速体验
```
# 用PP-YOLO算法在COCO数据集上预训练模型预测一张图片
python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
```
结果如下图:
![demo image](../images/000000014439.jpg)
## 二、准备数据
数据集参考[Kaggle数据集](https://www.kaggle.com/andrewmvd/road-sign-detection) 包含877张图像数据类别4类crosswalkspeedlimitstoptrafficlight。
将数据划分为训练集701张图和测试集176张图[下载链接](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
```
# 注意:可跳过这步下载,后面训练会自动下载
python dataset/roadsign_voc/download_roadsign_voc.py
```
## 三、训练、评估、预测
### 1、训练
```
# 边训练边测试 CPU需要约1小时(use_gpu=false)1080Ti GPU需要约10分钟
# -c 参数表示指定使用哪个配置文件
# -o 参数表示指定配置文件中的全局变量覆盖配置文件中的设置这里设置使用gpu
# --eval 参数表示边训练边评估评估中精度最佳的模型会自动保存为best_model.pdparams训练结束还会保存model_final.pdparams
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
```
如果想通过VisualDL实时观察loss变化曲线在训练命令中添加--use_vdl=true以及通过--vdl_log_dir设置日志保存路径。
**但注意VisualDL需Python>=3.5**
首先安装[VisualDL](https://github.com/PaddlePaddle/VisualDL)
```
python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
```
```
python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval
```
通过visualdl命令实时查看变化曲线
```
visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```
### 2、评估
```
# 评估 默认使用训练过程中保存的model_final.pdparams
# -c 参数表示指定使用哪个配置文件
# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置)
# 目前只支持单卡评估
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
```
最终模型精度在mAP=0.85左右,由于数据集较小因此每次训练结束后精度会有一定波动
### 3、预测
```
# -c 参数表示指定使用哪个配置文件
# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置)
# --infer_img 参数指定预测图像路径
# 预测结束后会在output文件夹中生成一张画有预测结果的同名图像
python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
```
结果如下图:
![road554 image](../images/road554.png)

View File

@@ -0,0 +1,261 @@
# RCNN系列模型参数配置教程
标签: 模型参数配置
以`faster_rcnn_r50_fpn_1x_coco.yml`为例,这个模型由五个子配置文件组成:
- 数据配置文件 `coco_detection.yml`
```yaml
# 数据评估类型
metric: COCO
# 数据集的类别数
num_classes: 80
# TrainDataset
TrainDataset:
!COCODataSet
# 图像数据路径,相对 dataset_dir 路径os.path.join(dataset_dir, image_dir)
image_dir: train2017
# 标注文件路径,相对 dataset_dir 路径os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_train2017.json
# 数据文件夹
dataset_dir: dataset/coco
# data_fields
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
# 图像数据路径,相对 dataset_dir 路径os.path.join(dataset_dir, image_dir)
image_dir: val2017
# 标注文件路径,相对 dataset_dir 路径os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
# 数据文件夹
dataset_dir: dataset/coco
TestDataset:
!ImageFolder
# 标注文件路径,相对 dataset_dir 路径os.path.join(dataset_dir, anno_path)
anno_path: annotations/instances_val2017.json
```
- 优化器配置文件 `optimizer_1x.yml`
```yaml
# 总训练轮数
epoch: 12
# 学习率设置
LearningRate:
# 默认为8卡训练的学习率
base_lr: 0.01
# 学习率调整策略
schedulers:
- !PiecewiseDecay
gamma: 0.1
# 学习率变化位置(轮数)
milestones: [8, 11]
- !LinearWarmup
start_factor: 0.1
steps: 1000
# 优化器
OptimizerBuilder:
# 优化器
optimizer:
momentum: 0.9
type: Momentum
# 正则化
regularizer:
factor: 0.0001
type: L2
```
- 数据读取配置文件 `faster_fpn_reader.yml`
```yaml
# 每张GPU reader进程个数
worker_num: 2
# 训练数据
TrainReader:
# 训练数据transforms
sample_transforms:
- Decode: {}
- RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
- RandomFlip: {prob: 0.5}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# 由于模型存在FPN结构输入图片需要padding为32的倍数
- PadBatch: {pad_to_stride: 32}
# 训练时batch_size
batch_size: 1
# 读取数据是否乱序
shuffle: true
# 是否丢弃最后不能完整组成batch的数据
drop_last: true
# 表示reader是否对gt进行组batch的操作在rcnn系列算法中设置为false得到的gt格式为list[Tensor]
collate_batch: false
# 评估数据
EvalReader:
# 评估数据transforms
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# 由于模型存在FPN结构输入图片需要padding为32的倍数
- PadBatch: {pad_to_stride: 32}
# 评估时batch_size
batch_size: 1
# 读取数据是否乱序
shuffle: false
# 是否丢弃最后不能完整组成batch的数据
drop_last: false
# 测试数据
TestReader:
# 测试数据transforms
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
# 由于模型存在FPN结构输入图片需要padding为32的倍数
- PadBatch: {pad_to_stride: 32}
# 测试时batch_size
batch_size: 1
# 读取数据是否乱序
shuffle: false
# 是否丢弃最后不能完整组成batch的数据
drop_last: false
```
- 模型配置文件 `faster_rcnn_r50_fpn.yml`
```yaml
# 模型结构类型
architecture: FasterRCNN
# 预训练模型地址
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
# FasterRCNN
FasterRCNN:
# backbone
backbone: ResNet
# neck
neck: FPN
# rpn_head
rpn_head: RPNHead
# bbox_head
bbox_head: BBoxHead
# post process
bbox_post_process: BBoxPostProcess
# backbone
ResNet:
# index 0 stands for res2
depth: 50
# norm_type可设置参数bn 或 sync_bn
norm_type: bn
# freeze_at index, 0 represent res2
freeze_at: 0
# return_idx
return_idx: [0,1,2,3]
# num_stages
num_stages: 4
# FPN
FPN:
# channel of FPN
out_channel: 256
# RPNHead
RPNHead:
# anchor generator
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
# rpn_target_assign
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
# 训练时生成proposal的参数
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 1000
topk_after_collect: True
# 评估时生成proposal的参数
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
# BBoxHead
BBoxHead:
# TwoFCHead as BBoxHead
head: TwoFCHead
# roi align
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
# bbox_assigner
bbox_assigner: BBoxAssigner
# BBoxAssigner
BBoxAssigner:
# batch_size_per_im
batch_size_per_im: 512
# 背景阈值
bg_thresh: 0.5
# 前景阈值
fg_thresh: 0.5
# 前景比例
fg_fraction: 0.25
# 是否随机采样
use_random: True
# TwoFCHead
TwoFCHead:
# TwoFCHead特征维度
out_channel: 1024
# BBoxPostProcess
BBoxPostProcess:
# 解码
decode: RCNNBox
# nms
nms:
# 使用MultiClassNMS
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
```
- 运行时配置文件 `runtime.yml`
```yaml
# 是否使用gpu
use_gpu: true
# 日志打印间隔
log_iter: 20
# save_dir
save_dir: output
# 模型保存间隔时间
snapshot_epoch: 1
```
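上述各配置文件中的参数也可以在不改动yml的情况下通过`-o`在命令行覆盖,下面给出一条示意命令(键名以上述配置文件为准,取值仅为示例):
```bash
# 示意:命令行覆盖学习率、训练批大小与模型保存间隔
python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
    -o LearningRate.base_lr=0.0025 TrainReader.batch_size=2 snapshot_epoch=2
```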

Some files were not shown because too many files have changed in this diff.