Replace the document detection model

2024-08-27 14:42:45 +08:00
parent aea6f19951
commit 1514e09c40
2072 changed files with 254336 additions and 4967 deletions

View File

@@ -1,18 +0,0 @@
loss/
data/
cache/
tf_cache/
debug/
results/
misc/outputs
evaluation/evaluate_object
evaluation/analyze_object
nnet/__pycache__/
*.swp
*.pyc
*.o*

View File

@@ -1,29 +0,0 @@
BSD 3-Clause License
Copyright (c) 2019, Princeton University
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -1,152 +0,0 @@
# CornerNet-Lite: Training, Evaluation and Testing Code
Code for reproducing results in the following paper:
[**CornerNet-Lite: Efficient Keypoint Based Object Detection**](https://arxiv.org/abs/1904.08900)
Hei Law, Yun Teng, Olga Russakovsky, Jia Deng
*arXiv:1904.08900*
## Getting Started
### Software Requirement
- Python 3.7
- PyTorch 1.0.0
- CUDA 10
- GCC 4.9.2 or above
### Installing Dependencies
Please first install [Anaconda](https://anaconda.org) and create an Anaconda environment using the provided package list `conda_packagelist.txt`.
```
conda create --name CornerNet_Lite --file conda_packagelist.txt --channel pytorch
```
After you create the environment, please activate it.
```
source activate CornerNet_Lite
```
### Compiling Corner Pooling Layers
Compile the C++ implementation of the corner pooling layers. (GCC 4.9.2 or above is required.)
```
cd <CornerNet-Lite dir>/core/models/py_utils/_cpools/
python setup.py install --user
```
### Compiling NMS
Compile the NMS code, which is originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/cpu_nms.pyx) and [Soft-NMS](https://github.com/bharatsingh430/soft-nms/blob/master/lib/nms/cpu_nms.pyx).
```
cd <CornerNet-Lite dir>/core/external
make
```
### Downloading Models
In this repo, we provide models for the following detectors:
- [CornerNet-Saccade](https://drive.google.com/file/d/1MQDyPRI0HgDHxHToudHqQ-2m8TVBciaa/view?usp=sharing)
- [CornerNet-Squeeze](https://drive.google.com/file/d/1qM8BBYCLUBcZx_UmLT0qMXNTh-Yshp4X/view?usp=sharing)
- [CornerNet](https://drive.google.com/file/d/1e8At_iZWyXQgLlMwHkB83kN-AN85Uff1/view?usp=sharing)
Put the CornerNet-Saccade model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Saccade/`, the CornerNet-Squeeze model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Squeeze/`, and the CornerNet model under `<CornerNet-Lite dir>/cache/nnet/CornerNet/`. (Note that we use underscores instead of dashes in the directory names for CornerNet-Saccade and CornerNet-Squeeze.)
Note: The CornerNet model is the same as the one in the original [CornerNet repo](https://github.com/princeton-vl/CornerNet). We just ported it to this new repo.
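As a convenience, here is a small, illustrative Python sketch (not part of the repo) that creates the expected cache layout and reports whether the downloaded snapshots are in place; the `<model>_500000.pkl` file names follow the paths used in `core/detectors.py`.
```python
# Illustrative only: prepare the cache layout expected by core/detectors.py
# and check that the downloaded snapshot files are where the detectors look for them.
import os

for name in ["CornerNet_Saccade", "CornerNet_Squeeze", "CornerNet"]:
    snapshot_dir = os.path.join("cache", "nnet", name)
    os.makedirs(snapshot_dir, exist_ok=True)
    snapshot = os.path.join(snapshot_dir, "{}_500000.pkl".format(name))
    print(snapshot, "found" if os.path.exists(snapshot) else "missing")
```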
### Running the Demo Script
After downloading the models, you should be able to use the detectors on your own images. We provide a demo script `demo.py` to test if the repo is installed correctly.
```
python demo.py
```
This script applies CornerNet-Saccade to `demo.jpg` and writes the results to `demo_out.jpg`.
In the demo script, the default detector is CornerNet-Saccade. You can modify the demo script to test different detectors. For example, if you want to test CornerNet-Squeeze:
```python
#!/usr/bin/env python
import cv2
from core.detectors import CornerNet_Squeeze
from core.vis_utils import draw_bboxes
detector = CornerNet_Squeeze()
image = cv2.imread("demo.jpg")
bboxes = detector(image)
image = draw_bboxes(image, bboxes)
cv2.imwrite("demo_out.jpg", image)
```
### Using CornerNet-Lite in Your Project
It is also easy to use CornerNet-Lite in your own project. You will need to rename the directory from `CornerNet-Lite` to `CornerNet_Lite`; otherwise, you won't be able to import it, because Python package names cannot contain dashes.
```
Your project
│ README.md
│ ...
│ foo.py
└───CornerNet_Lite
└───directory1
└───...
```
In `foo.py`, you can easily import CornerNet-Saccade by adding:
```python
import cv2

from CornerNet_Lite import CornerNet_Saccade

def foo():
    cornernet = CornerNet_Saccade()
    # CornerNet_Saccade is ready to use
    image  = cv2.imread('/path/to/your/image')
    bboxes = cornernet(image)
```
If you want to train or evaluate the detectors on COCO, please move on to the following steps.
## Training and Evaluation
### Installing MS COCO APIs
```
mkdir -p <CornerNet-Lite dir>/data
cd <CornerNet-Lite dir>/data
git clone git@github.com:cocodataset/cocoapi.git coco
cd <CornerNet-Lite dir>/data/coco/PythonAPI
make install
```
### Downloading MS COCO Data
- Download the training/validation split we use in our paper from [here](https://drive.google.com/file/d/1dop4188xo5lXDkGtOZUzy2SHOD_COXz4/view?usp=sharing) (originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/tree/master/data))
- Unzip the file and place `annotations` under `<CornerNet-Lite dir>/data/coco`
- Download the images (2014 Train, 2014 Val, 2017 Test) from [here](http://cocodataset.org/#download)
- Create 3 directories, `trainval2014`, `minival2014` and `testdev2017`, under `<CornerNet-Lite dir>/data/coco/images/`
- Copy the training/validation/testing images to the corresponding directories according to the annotation files
To train and evaluate a network, you will need to create a configuration file, which defines the hyperparameters, and a model file, which defines the network architecture. The configuration file should be in JSON format and placed in `<CornerNet-Lite dir>/configs/`. Each configuration file should have a corresponding model file in `<CornerNet-Lite dir>/core/models/`; i.e., if there is a `<model>.json` in `<CornerNet-Lite dir>/configs/`, there should be a `<model>.py` in `<CornerNet-Lite dir>/core/models/`. There is only one exception, which we will mention later.
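As a rough illustration (this is not the repo's `train.py`), the naming convention means that both files can be resolved from the model name alone:
```python
# Hedged sketch of resolving the <model>.json / <model>.py pairing from a model name.
# Assumes the JSON layout shown in configs/ ({"system": {...}, "db": {...}}) and that
# each core/models/<model>.py defines a class named `model`.
import importlib
import json
import os

def load_model_and_config(model_name):
    cfg_path = os.path.join("configs", model_name + ".json")
    with open(cfg_path) as f:
        cfg = json.load(f)
    module = importlib.import_module("core.models." + model_name)
    return module.model(), cfg["system"], cfg["db"]
```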
### Training and Evaluating a Model
To train a model:
```
python train.py <model>
```
We provide the configuration files and the model files for CornerNet-Saccade, CornerNet-Squeeze and CornerNet in this repo. Please check the configuration files in `<CornerNet-Lite dir>/configs/`.
To train CornerNet-Saccade:
```
python train.py CornerNet_Saccade
```
Please adjust `batch_size` and `chunk_sizes` in `CornerNet_Saccade.json` to accommodate the number of GPUs that are available to you.
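In the provided configuration files, `batch_size` equals the sum of `chunk_sizes` (one chunk per GPU), e.g. 48 = 12 + 12 + 12 + 12 for CornerNet-Saccade, so keep the two consistent when you edit them. A quick, illustrative sanity check:
```python
# Illustrative check (not part of the repo): batch_size should match sum(chunk_sizes)
# after you edit CornerNet_Saccade.json for your number of GPUs.
import json

with open("configs/CornerNet_Saccade.json") as f:
    system_cfg = json.load(f)["system"]
assert system_cfg["batch_size"] == sum(system_cfg["chunk_sizes"]), \
    "batch_size must equal the sum of chunk_sizes"
```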
To evaluate the trained model:
```
python evaluate.py CornerNet_Saccade --testiter 500000 --split <split>
```
If you want to test different hyperparameters during evaluation and do not want to overwrite the original configuration file, you can do so by creating a configuration file with a suffix (`<model>-<suffix>.json`). There is no need to create `<model>-<suffix>.py` in `<CornerNet-Lite dir>/core/models/`.
To use the new configuration file:
```
python evaluate.py <model> --testiter <iter> --split <split> --suffix <suffix>
```
We also include a configuration file for CornerNet under the multi-scale setting, `CornerNet-multi_scale.json`, in this repo.
To use the multi-scale configuration file:
```
python evaluate.py CornerNet --testiter <iter> --split <split> --suffix multi_scale
```

View File

@@ -1,2 +0,0 @@
from .core.detectors import CornerNet, CornerNet_Squeeze, CornerNet_Saccade
from .core.vis_utils import draw_bboxes

View File

@@ -1,81 +0,0 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
blas=1.0=mkl
bzip2=1.0.6=h14c3975_5
ca-certificates=2018.12.5=0
cairo=1.14.12=h8948797_3
certifi=2018.11.29=py37_0
cffi=1.11.5=py37he75722e_1
cuda100=1.0=0
cycler=0.10.0=py37_0
cython=0.28.5=py37hf484d3e_0
dbus=1.13.2=h714fa37_1
expat=2.2.6=he6710b0_0
ffmpeg=4.0=hcdf2ecd_0
fontconfig=2.13.0=h9420a91_0
freeglut=3.0.0=hf484d3e_5
freetype=2.9.1=h8a8886c_1
glib=2.56.2=hd408876_0
graphite2=1.3.12=h23475e2_2
gst-plugins-base=1.14.0=hbbd80ab_1
gstreamer=1.14.0=hb453b48_1
harfbuzz=1.8.8=hffaf4a1_0
hdf5=1.10.2=hba1933b_1
icu=58.2=h9c2bf20_1
intel-openmp=2019.0=118
jasper=2.0.14=h07fcdf6_1
jpeg=9b=h024ee3a_2
kiwisolver=1.0.1=py37hf484d3e_0
libedit=3.1.20170329=h6b74fdf_2
libffi=3.2.1=hd88cf55_4
libgcc-ng=8.2.0=hdf63c60_1
libgfortran-ng=7.3.0=hdf63c60_0
libglu=9.0.0=hf484d3e_1
libopencv=3.4.2=hb342d67_1
libopus=1.2.1=hb9ed12e_0
libpng=1.6.35=hbc83047_0
libstdcxx-ng=8.2.0=hdf63c60_1
libtiff=4.0.9=he85c1e1_2
libuuid=1.0.3=h1bed415_2
libvpx=1.7.0=h439df22_0
libxcb=1.13=h1bed415_1
libxml2=2.9.8=h26e45fe_1
matplotlib=3.0.2=py37h5429711_0
mkl=2018.0.3=1
mkl_fft=1.0.6=py37h7dd41cf_0
mkl_random=1.0.1=py37h4414c95_1
ncurses=6.1=hf484d3e_0
ninja=1.8.2=py37h6bb024c_1
numpy=1.15.4=py37h1d66e8a_0
numpy-base=1.15.4=py37h81de0dd_0
olefile=0.46=py37_0
opencv=3.4.2=py37h6fd60c2_1
openssl=1.1.1a=h7b6447c_0
pcre=8.42=h439df22_0
pillow=5.2.0=py37heded4f4_0
pip=10.0.1=py37_0
pixman=0.34.0=hceecf20_3
py-opencv=3.4.2=py37hb342d67_1
pycparser=2.18=py37_1
pyparsing=2.2.0=py37_1
pyqt=5.9.2=py37h05f1152_2
python=3.7.1=h0371630_3
python-dateutil=2.7.3=py37_0
pytorch=1.0.0=py3.7_cuda10.0.130_cudnn7.4.1_1
pytz=2018.5=py37_0
qt=5.9.7=h5867ecd_1
readline=7.0=h7b6447c_5
scikit-learn=0.19.1=py37hedc7406_0
scipy=1.1.0=py37hfa4b5c9_1
setuptools=40.2.0=py37_0
sip=4.19.8=py37hf484d3e_0
six=1.11.0=py37_1
sqlite=3.25.3=h7b6447c_0
tk=8.6.8=hbc83047_0
torchvision=0.2.1=py37_1
tornado=5.1=py37h14c3975_0
tqdm=4.25.0=py37h28b3542_0
wheel=0.31.1=py37_0
xz=5.2.4=h14c3975_4
zlib=1.2.11=ha838bed_2

View File

@@ -1,54 +0,0 @@
{
"system": {
"dataset": "COCO",
"batch_size": 49,
"sampling_function": "cornernet",
"train_split": "trainval",
"val_split": "minival",
"learning_rate": 0.00025,
"decay_rate": 10,
"val_iter": 100,
"opt_algo": "adam",
"prefetch_size": 5,
"max_iter": 500000,
"stepsize": 450000,
"snapshot": 5000,
"chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5],
"data_dir": "./data"
},
"db": {
"rand_scale_min": 0.6,
"rand_scale_max": 1.4,
"rand_scale_step": 0.1,
"rand_scales": null,
"rand_crop": true,
"rand_color": true,
"border": 128,
"gaussian_bump": true,
"input_size": [511, 511],
"output_sizes": [[128, 128]],
"test_scales": [0.5, 0.75, 1, 1.25, 1.5],
"top_k": 100,
"categories": 80,
"ae_threshold": 0.5,
"nms_threshold": 0.5,
"merge_bbox": true,
"weight_exp": 10,
"max_per_image": 100
}
}

View File

@@ -1,52 +0,0 @@
{
"system": {
"dataset": "COCO",
"batch_size": 49,
"sampling_function": "cornernet",
"train_split": "trainval",
"val_split": "minival",
"learning_rate": 0.00025,
"decay_rate": 10,
"val_iter": 100,
"opt_algo": "adam",
"prefetch_size": 5,
"max_iter": 500000,
"stepsize": 450000,
"snapshot": 5000,
"chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5],
"data_dir": "./data"
},
"db": {
"rand_scale_min": 0.6,
"rand_scale_max": 1.4,
"rand_scale_step": 0.1,
"rand_scales": null,
"rand_crop": true,
"rand_color": true,
"border": 128,
"gaussian_bump": true,
"gaussian_iou": 0.3,
"input_size": [511, 511],
"output_sizes": [[128, 128]],
"test_scales": [1],
"top_k": 100,
"categories": 80,
"ae_threshold": 0.5,
"nms_threshold": 0.5,
"max_per_image": 100
}
}

View File

@@ -1,56 +0,0 @@
{
"system": {
"dataset": "COCO",
"batch_size": 48,
"sampling_function": "cornernet_saccade",
"train_split": "trainval",
"val_split": "minival",
"learning_rate": 0.00025,
"decay_rate": 10,
"val_iter": 100,
"opt_algo": "adam",
"prefetch_size": 5,
"max_iter": 500000,
"stepsize": 450000,
"snapshot": 5000,
"chunk_sizes": [12, 12, 12, 12]
},
"db": {
"rand_scale_min": 0.5,
"rand_scale_max": 1.1,
"rand_scale_step": 0.1,
"rand_scales": null,
"rand_full_crop": true,
"gaussian_bump": true,
"gaussian_iou": 0.5,
"min_scale": 16,
"view_sizes": [],
"height_mult": 31,
"width_mult": 31,
"input_size": [255, 255],
"output_sizes": [[64, 64]],
"att_max_crops": 30,
"att_scales": [[1, 2, 4]],
"att_thresholds": [0.3],
"top_k": 12,
"num_dets": 12,
"categories": 80,
"ae_threshold": 0.3,
"nms_threshold": 0.5,
"max_per_image": 100
}
}

View File

@@ -1,54 +0,0 @@
{
"system": {
"dataset": "COCO",
"batch_size": 55,
"sampling_function": "cornernet",
"train_split": "trainval",
"val_split": "minival",
"learning_rate": 0.00025,
"decay_rate": 10,
"val_iter": 100,
"opt_algo": "adam",
"prefetch_size": 5,
"max_iter": 500000,
"stepsize": 450000,
"snapshot": 5000,
"chunk_sizes": [13, 14, 14, 14],
"data_dir": "./data"
},
"db": {
"rand_scale_min": 0.6,
"rand_scale_max": 1.4,
"rand_scale_step": 0.1,
"rand_scales": null,
"rand_crop": true,
"rand_color": true,
"border": 128,
"gaussian_bump": true,
"gaussian_iou": 0.3,
"input_size": [511, 511],
"output_sizes": [[64, 64]],
"test_scales": [1],
"test_flipped": false,
"top_k": 20,
"num_dets": 100,
"categories": 80,
"ae_threshold": 0.5,
"nms_threshold": 0.5,
"max_per_image": 100
}
}

View File

@@ -1,39 +0,0 @@
import json
from .nnet.py_factory import NetworkFactory
class Base(object):
def __init__(self, db, nnet, func, model=None):
super(Base, self).__init__()
self._db = db
self._nnet = nnet
self._func = func
if model is not None:
self._nnet.load_pretrained_params(model)
self._nnet.cuda()
self._nnet.eval_mode()
def _inference(self, image, *args, **kwargs):
return self._func(self._db, self._nnet, image.copy(), *args, **kwargs)
def __call__(self, image, *args, **kwargs):
categories = self._db.configs["categories"]
bboxes = self._inference(image, *args, **kwargs)
return {self._db.cls2name(j): bboxes[j] for j in range(1, categories + 1)}
def load_cfg(cfg_file):
with open(cfg_file, "r") as f:
cfg = json.load(f)
cfg_sys = cfg["system"]
cfg_db = cfg["db"]
return cfg_sys, cfg_db
def load_nnet(cfg_sys, model):
return NetworkFactory(cfg_sys, model)

View File

@@ -1,164 +0,0 @@
import os
import numpy as np
class SystemConfig(object):
def __init__(self):
self._configs = {}
self._configs["dataset"] = None
self._configs["sampling_function"] = "coco_detection"
# Training Config
self._configs["display"] = 5
self._configs["snapshot"] = 400
self._configs["stepsize"] = 5000
self._configs["learning_rate"] = 0.001
self._configs["decay_rate"] = 10
self._configs["max_iter"] = 100000
self._configs["val_iter"] = 20
self._configs["batch_size"] = 1
self._configs["snapshot_name"] = None
self._configs["prefetch_size"] = 100
self._configs["pretrain"] = None
self._configs["opt_algo"] = "adam"
self._configs["chunk_sizes"] = None
# Directories
self._configs["data_dir"] = "./data"
self._configs["cache_dir"] = "./cache"
self._configs["config_dir"] = "./config"
self._configs["result_dir"] = "./results"
# Split
self._configs["train_split"] = "training"
self._configs["val_split"] = "validation"
self._configs["test_split"] = "testdev"
# Rng
self._configs["data_rng"] = np.random.RandomState(123)
self._configs["nnet_rng"] = np.random.RandomState(317)
@property
def chunk_sizes(self):
return self._configs["chunk_sizes"]
@property
def train_split(self):
return self._configs["train_split"]
@property
def val_split(self):
return self._configs["val_split"]
@property
def test_split(self):
return self._configs["test_split"]
@property
def full(self):
return self._configs
@property
def sampling_function(self):
return self._configs["sampling_function"]
@property
def data_rng(self):
return self._configs["data_rng"]
@property
def nnet_rng(self):
return self._configs["nnet_rng"]
@property
def opt_algo(self):
return self._configs["opt_algo"]
@property
def prefetch_size(self):
return self._configs["prefetch_size"]
@property
def pretrain(self):
return self._configs["pretrain"]
@property
def result_dir(self):
result_dir = os.path.join(self._configs["result_dir"], self.snapshot_name)
if not os.path.exists(result_dir):
os.makedirs(result_dir)
return result_dir
@property
def dataset(self):
return self._configs["dataset"]
@property
def snapshot_name(self):
return self._configs["snapshot_name"]
@property
def snapshot_dir(self):
snapshot_dir = os.path.join(self.cache_dir, "nnet", self.snapshot_name)
if not os.path.exists(snapshot_dir):
os.makedirs(snapshot_dir)
return snapshot_dir
@property
def snapshot_file(self):
snapshot_file = os.path.join(self.snapshot_dir, self.snapshot_name + "_{}.pkl")
return snapshot_file
@property
def config_dir(self):
return self._configs["config_dir"]
@property
def batch_size(self):
return self._configs["batch_size"]
@property
def max_iter(self):
return self._configs["max_iter"]
@property
def learning_rate(self):
return self._configs["learning_rate"]
@property
def decay_rate(self):
return self._configs["decay_rate"]
@property
def stepsize(self):
return self._configs["stepsize"]
@property
def snapshot(self):
return self._configs["snapshot"]
@property
def display(self):
return self._configs["display"]
@property
def val_iter(self):
return self._configs["val_iter"]
@property
def data_dir(self):
return self._configs["data_dir"]
@property
def cache_dir(self):
if not os.path.exists(self._configs["cache_dir"]):
os.makedirs(self._configs["cache_dir"])
return self._configs["cache_dir"]
def update_config(self, new):
for key in new:
if key in self._configs:
self._configs[key] = new[key]
return self

View File

@@ -1,5 +0,0 @@
from .coco import COCO
datasets = {
"COCO": COCO
}

View File

@@ -1,74 +0,0 @@
import os
import numpy as np
class BASE(object):
def __init__(self):
self._split = None
self._db_inds = []
self._image_ids = []
self._mean = np.zeros((3,), dtype=np.float32)
self._std = np.ones((3,), dtype=np.float32)
self._eig_val = np.ones((3,), dtype=np.float32)
self._eig_vec = np.zeros((3, 3), dtype=np.float32)
self._configs = {}
self._configs["data_aug"] = True
self._data_rng = None
@property
def configs(self):
return self._configs
@property
def mean(self):
return self._mean
@property
def std(self):
return self._std
@property
def eig_val(self):
return self._eig_val
@property
def eig_vec(self):
return self._eig_vec
@property
def db_inds(self):
return self._db_inds
@property
def split(self):
return self._split
def update_config(self, new):
for key in new:
if key in self._configs:
self._configs[key] = new[key]
def image_ids(self, ind):
return self._image_ids[ind]
def image_path(self, ind):
pass
def write_result(self, ind, all_bboxes, all_scores):
pass
def evaluate(self, name):
pass
def shuffle_inds(self, quiet=False):
if self._data_rng is None:
self._data_rng = np.random.RandomState(os.getpid())
if not quiet:
print("shuffling indices...")
rand_perm = self._data_rng.permutation(len(self._db_inds))
self._db_inds = self._db_inds[rand_perm]

View File

@@ -1,169 +0,0 @@
import os
import numpy as np
from .detection import DETECTION
# COCO bounding boxes are 0-indexed
class COCO(DETECTION):
def __init__(self, db_config, split=None, sys_config=None):
assert split is None or sys_config is not None
super(COCO, self).__init__(db_config)
self._mean = np.array([0.40789654, 0.44719302, 0.47026115], dtype=np.float32)
self._std = np.array([0.28863828, 0.27408164, 0.27809835], dtype=np.float32)
self._eig_val = np.array([0.2141788, 0.01817699, 0.00341571], dtype=np.float32)
self._eig_vec = np.array([
[-0.58752847, -0.69563484, 0.41340352],
[-0.5832747, 0.00994535, -0.81221408],
[-0.56089297, 0.71832671, 0.41158938]
], dtype=np.float32)
self._coco_cls_ids = [
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 27, 28, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 67, 70,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 84, 85, 86, 87, 88, 89, 90
]
self._coco_cls_names = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
'snowboard', 'sports ball', 'kite', 'baseball bat',
'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book',
'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
'toothbrush'
]
self._cls2coco = {ind + 1: coco_id for ind, coco_id in enumerate(self._coco_cls_ids)}
self._coco2cls = {coco_id: cls_id for cls_id, coco_id in self._cls2coco.items()}
self._coco2name = {cls_id: cls_name for cls_id, cls_name in zip(self._coco_cls_ids, self._coco_cls_names)}
self._name2coco = {cls_name: cls_id for cls_name, cls_id in self._coco2name.items()}
if split is not None:
coco_dir = os.path.join(sys_config.data_dir, "coco")
self._split = {
"trainval": "trainval2014",
"minival": "minival2014",
"testdev": "testdev2017"
}[split]
self._data_dir = os.path.join(coco_dir, "images", self._split)
self._anno_file = os.path.join(coco_dir, "annotations", "instances_{}.json".format(self._split))
self._detections, self._eval_ids = self._load_coco_annos()
self._image_ids = list(self._detections.keys())
self._db_inds = np.arange(len(self._image_ids))
def _load_coco_annos(self):
from pycocotools.coco import COCO
coco = COCO(self._anno_file)
self._coco = coco
class_ids = coco.getCatIds()
image_ids = coco.getImgIds()
eval_ids = {}
detections = {}
for image_id in image_ids:
image = coco.loadImgs(image_id)[0]
dets = []
eval_ids[image["file_name"]] = image_id
for class_id in class_ids:
annotation_ids = coco.getAnnIds(imgIds=image["id"], catIds=class_id)
annotations = coco.loadAnns(annotation_ids)
category = self._coco2cls[class_id]
for annotation in annotations:
det = annotation["bbox"] + [category]
det[2] += det[0]
det[3] += det[1]
dets.append(det)
file_name = image["file_name"]
if len(dets) == 0:
detections[file_name] = np.zeros((0, 5), dtype=np.float32)
else:
detections[file_name] = np.array(dets, dtype=np.float32)
return detections, eval_ids
def image_path(self, ind):
if self._data_dir is None:
raise ValueError("Data directory is not set")
db_ind = self._db_inds[ind]
file_name = self._image_ids[db_ind]
return os.path.join(self._data_dir, file_name)
def detections(self, ind):
db_ind = self._db_inds[ind]
file_name = self._image_ids[db_ind]
return self._detections[file_name].copy()
def cls2name(self, cls):
coco = self._cls2coco[cls]
return self._coco2name[coco]
def _to_float(self, x):
return float("{:.2f}".format(x))
def convert_to_coco(self, all_bboxes):
detections = []
for image_id in all_bboxes:
coco_id = self._eval_ids[image_id]
for cls_ind in all_bboxes[image_id]:
category_id = self._cls2coco[cls_ind]
for bbox in all_bboxes[image_id][cls_ind]:
bbox[2] -= bbox[0]
bbox[3] -= bbox[1]
score = bbox[4]
bbox = list(map(self._to_float, bbox[0:4]))
detection = {
"image_id": coco_id,
"category_id": category_id,
"bbox": bbox,
"score": float("{:.2f}".format(score))
}
detections.append(detection)
return detections
def evaluate(self, result_json, cls_ids, image_ids):
from pycocotools.cocoeval import COCOeval
if self._split == "testdev":
return None
coco = self._coco
eval_ids = [self._eval_ids[image_id] for image_id in image_ids]
cat_ids = [self._cls2coco[cls_id] for cls_id in cls_ids]
coco_dets = coco.loadRes(result_json)
coco_eval = COCOeval(coco, coco_dets, "bbox")
coco_eval.params.imgIds = eval_ids
coco_eval.params.catIds = cat_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval.stats[0], coco_eval.stats[12:]

View File

@@ -1,71 +0,0 @@
import numpy as np
from .base import BASE
class DETECTION(BASE):
def __init__(self, db_config):
super(DETECTION, self).__init__()
# Configs for training
self._configs["categories"] = 80
self._configs["rand_scales"] = [1]
self._configs["rand_scale_min"] = 0.8
self._configs["rand_scale_max"] = 1.4
self._configs["rand_scale_step"] = 0.2
# Configs for both training and testing
self._configs["input_size"] = [383, 383]
self._configs["output_sizes"] = [[96, 96], [48, 48], [24, 24], [12, 12]]
self._configs["score_threshold"] = 0.05
self._configs["nms_threshold"] = 0.7
self._configs["max_per_set"] = 40
self._configs["max_per_image"] = 100
self._configs["top_k"] = 20
self._configs["ae_threshold"] = 1
self._configs["nms_kernel"] = 3
self._configs["num_dets"] = 1000
self._configs["nms_algorithm"] = "exp_soft_nms"
self._configs["weight_exp"] = 8
self._configs["merge_bbox"] = False
self._configs["data_aug"] = True
self._configs["lighting"] = True
self._configs["border"] = 64
self._configs["gaussian_bump"] = False
self._configs["gaussian_iou"] = 0.7
self._configs["gaussian_radius"] = -1
self._configs["rand_crop"] = False
self._configs["rand_color"] = False
self._configs["rand_center"] = True
self._configs["init_sizes"] = [192, 255]
self._configs["view_sizes"] = []
self._configs["min_scale"] = 16
self._configs["max_scale"] = 32
self._configs["att_sizes"] = [[16, 16], [32, 32], [64, 64]]
self._configs["att_ranges"] = [[96, 256], [32, 96], [0, 32]]
self._configs["att_ratios"] = [16, 8, 4]
self._configs["att_scales"] = [1, 1.5, 2]
self._configs["att_thresholds"] = [0.3, 0.3, 0.3, 0.3]
self._configs["att_nms_ks"] = [3, 3, 3]
self._configs["att_max_crops"] = 8
self._configs["ref_dets"] = True
# Configs for testing
self._configs["test_scales"] = [1]
self._configs["test_flipped"] = True
self.update_config(db_config)
if self._configs["rand_scales"] is None:
self._configs["rand_scales"] = np.arange(
self._configs["rand_scale_min"],
self._configs["rand_scale_max"],
self._configs["rand_scale_step"]
)

View File

@@ -1,52 +0,0 @@
from .base import Base, load_cfg, load_nnet
from .config import SystemConfig
from .dbs.coco import COCO
from .paths import get_file_path
class CornerNet(Base):
def __init__(self):
from .test.cornernet import cornernet_inference
from .models.CornerNet import model
cfg_path = get_file_path("..", "configs", "CornerNet.json")
model_path = get_file_path("..", "cache", "nnet", "CornerNet", "CornerNet_500000.pkl")
cfg_sys, cfg_db = load_cfg(cfg_path)
sys_cfg = SystemConfig().update_config(cfg_sys)
coco = COCO(cfg_db)
cornernet = load_nnet(sys_cfg, model())
super(CornerNet, self).__init__(coco, cornernet, cornernet_inference, model=model_path)
class CornerNet_Squeeze(Base):
def __init__(self):
from .test.cornernet import cornernet_inference
from .models.CornerNet_Squeeze import model
cfg_path = get_file_path("..", "configs", "CornerNet_Squeeze.json")
model_path = get_file_path("..", "cache", "nnet", "CornerNet_Squeeze", "CornerNet_Squeeze_500000.pkl")
cfg_sys, cfg_db = load_cfg(cfg_path)
sys_cfg = SystemConfig().update_config(cfg_sys)
coco = COCO(cfg_db)
cornernet = load_nnet(sys_cfg, model())
super(CornerNet_Squeeze, self).__init__(coco, cornernet, cornernet_inference, model=model_path)
class CornerNet_Saccade(Base):
def __init__(self):
from .test.cornernet_saccade import cornernet_saccade_inference
from .models.CornerNet_Saccade import model
cfg_path = get_file_path("..", "configs", "CornerNet_Saccade.json")
model_path = get_file_path("..", "cache", "nnet", "CornerNet_Saccade", "CornerNet_Saccade_500000.pkl")
cfg_sys, cfg_db = load_cfg(cfg_path)
sys_cfg = SystemConfig().update_config(cfg_sys)
coco = COCO(cfg_db)
cornernet = load_nnet(sys_cfg, model())
super(CornerNet_Saccade, self).__init__(coco, cornernet, cornernet_saccade_inference, model=model_path)

View File

@@ -1,7 +0,0 @@
bbox.c
bbox.cpython-35m-x86_64-linux-gnu.so
bbox.cpython-36m-x86_64-linux-gnu.so
nms.c
nms.cpython-35m-x86_64-linux-gnu.so
nms.cpython-36m-x86_64-linux-gnu.so

View File

@@ -1,3 +0,0 @@
all:
python setup.py build_ext --inplace
rm -rf build

View File

@@ -1,55 +0,0 @@
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float
ctypedef np.float_t DTYPE_t
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0] + 1) *
(query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0] + 1) *
(boxes[n, 3] - boxes[n, 1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
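For readers who prefer a vectorized reference, the following NumPy sketch (not part of this file) computes the same (N, K) overlap matrix as `bbox_overlaps` above, using the same `+ 1` box-size convention:
```python
# NumPy reference sketch of the pairwise overlap (IoU) computed by bbox_overlaps.
# Boxes are [x1, y1, x2, y2]; the "+ 1" mirrors the convention in the Cython code.
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    boxes = np.asarray(boxes, dtype=np.float64)              # (N, 4)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)  # (K, 4)
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1).clip(min=0)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1).clip(min=0)
    inter = iw * ih
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    query_areas = (query_boxes[:, 2] - query_boxes[:, 0] + 1) * \
                  (query_boxes[:, 3] - query_boxes[:, 1] + 1)
    union = areas[:, None] + query_areas[None, :] - inter
    return inter / union                                     # (N, K)
```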

View File

@@ -1,281 +0,0 @@
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
return a if a <= b else b
def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
cdef int ndets = dets.shape[0]
cdef np.ndarray[np.int_t, ndim=1] suppressed = \
np.zeros((ndets), dtype=np.int)
# nominal indices
cdef int _i, _j
# sorted indices
cdef int i, j
# temp variables for box i's (the box currently under consideration)
cdef np.float32_t ix1, iy1, ix2, iy2, iarea
# variables for computing overlap with box j (lower scoring box)
cdef np.float32_t xx1, yy1, xx2, yy2
cdef np.float32_t w, h
cdef np.float32_t inter, ovr
keep = []
for _i in range(ndets):
i = order[_i]
if suppressed[i] == 1:
continue
keep.append(i)
ix1 = x1[i]
iy1 = y1[i]
ix2 = x2[i]
iy2 = y2[i]
iarea = areas[i]
for _j in range(_i + 1, ndets):
j = order[_j]
if suppressed[j] == 1:
continue
xx1 = max(ix1, x1[j])
yy1 = max(iy1, y1[j])
xx2 = min(ix2, x2[j])
yy2 = min(iy2, y2[j])
w = max(0.0, xx2 - xx1 + 1)
h = max(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
return keep
def soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001,
unsigned int method=0):
cdef unsigned int N = boxes.shape[0]
cdef float iw, ih, box_area
cdef float ua
cdef int pos = 0
cdef float maxscore = 0
cdef int maxpos = 0
cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
for i in range(N):
maxscore = boxes[i, 4]
maxpos = i
tx1 = boxes[i, 0]
ty1 = boxes[i, 1]
tx2 = boxes[i, 2]
ty2 = boxes[i, 3]
ts = boxes[i, 4]
pos = i + 1
# get max box
while pos < N:
if maxscore < boxes[pos, 4]:
maxscore = boxes[pos, 4]
maxpos = pos
pos = pos + 1
# add max box as a detection
boxes[i, 0] = boxes[maxpos, 0]
boxes[i, 1] = boxes[maxpos, 1]
boxes[i, 2] = boxes[maxpos, 2]
boxes[i, 3] = boxes[maxpos, 3]
boxes[i, 4] = boxes[maxpos, 4]
# swap ith box with position of max box
boxes[maxpos, 0] = tx1
boxes[maxpos, 1] = ty1
boxes[maxpos, 2] = tx2
boxes[maxpos, 3] = ty2
boxes[maxpos, 4] = ts
tx1 = boxes[i, 0]
ty1 = boxes[i, 1]
tx2 = boxes[i, 2]
ty2 = boxes[i, 3]
ts = boxes[i, 4]
pos = i + 1
# NMS iterations, note that N changes if detection boxes fall below threshold
while pos < N:
x1 = boxes[pos, 0]
y1 = boxes[pos, 1]
x2 = boxes[pos, 2]
y2 = boxes[pos, 3]
s = boxes[pos, 4]
area = (x2 - x1 + 1) * (y2 - y1 + 1)
iw = (min(tx2, x2) - max(tx1, x1) + 1)
if iw > 0:
ih = (min(ty2, y2) - max(ty1, y1) + 1)
if ih > 0:
ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
ov = iw * ih / ua #iou between max box and detection box
if method == 1: # linear
if ov > Nt:
weight = 1 - ov
else:
weight = 1
elif method == 2: # gaussian
weight = np.exp(-(ov * ov) / sigma)
else: # original NMS
if ov > Nt:
weight = 0
else:
weight = 1
boxes[pos, 4] = weight * boxes[pos, 4]
# if box score falls below threshold, discard the box by swapping with last box
# update N
if boxes[pos, 4] < threshold:
boxes[pos, 0] = boxes[N - 1, 0]
boxes[pos, 1] = boxes[N - 1, 1]
boxes[pos, 2] = boxes[N - 1, 2]
boxes[pos, 3] = boxes[N - 1, 3]
boxes[pos, 4] = boxes[N - 1, 4]
N = N - 1
pos = pos - 1
pos = pos + 1
keep = [i for i in range(N)]
return keep
def soft_nms_merge(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001,
unsigned int method=0, float weight_exp=6):
cdef unsigned int N = boxes.shape[0]
cdef float iw, ih, box_area
cdef float ua
cdef int pos = 0
cdef float maxscore = 0
cdef int maxpos = 0
cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
cdef float mx1, mx2, my1, my2, mts, mbs, mw
for i in range(N):
maxscore = boxes[i, 4]
maxpos = i
tx1 = boxes[i, 0]
ty1 = boxes[i, 1]
tx2 = boxes[i, 2]
ty2 = boxes[i, 3]
ts = boxes[i, 4]
pos = i + 1
# get max box
while pos < N:
if maxscore < boxes[pos, 4]:
maxscore = boxes[pos, 4]
maxpos = pos
pos = pos + 1
# add max box as a detection
boxes[i, 0] = boxes[maxpos, 0]
boxes[i, 1] = boxes[maxpos, 1]
boxes[i, 2] = boxes[maxpos, 2]
boxes[i, 3] = boxes[maxpos, 3]
boxes[i, 4] = boxes[maxpos, 4]
mx1 = boxes[i, 0] * boxes[i, 5]
my1 = boxes[i, 1] * boxes[i, 5]
mx2 = boxes[i, 2] * boxes[i, 6]
my2 = boxes[i, 3] * boxes[i, 6]
mts = boxes[i, 5]
mbs = boxes[i, 6]
# swap ith box with position of max box
boxes[maxpos, 0] = tx1
boxes[maxpos, 1] = ty1
boxes[maxpos, 2] = tx2
boxes[maxpos, 3] = ty2
boxes[maxpos, 4] = ts
tx1 = boxes[i, 0]
ty1 = boxes[i, 1]
tx2 = boxes[i, 2]
ty2 = boxes[i, 3]
ts = boxes[i, 4]
pos = i + 1
# NMS iterations, note that N changes if detection boxes fall below threshold
while pos < N:
x1 = boxes[pos, 0]
y1 = boxes[pos, 1]
x2 = boxes[pos, 2]
y2 = boxes[pos, 3]
s = boxes[pos, 4]
area = (x2 - x1 + 1) * (y2 - y1 + 1)
iw = (min(tx2, x2) - max(tx1, x1) + 1)
if iw > 0:
ih = (min(ty2, y2) - max(ty1, y1) + 1)
if ih > 0:
ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
ov = iw * ih / ua #iou between max box and detection box
if method == 1: # linear
if ov > Nt:
weight = 1 - ov
else:
weight = 1
elif method == 2: # gaussian
weight = np.exp(-(ov * ov) / sigma)
else: # original NMS
if ov > Nt:
weight = 0
else:
weight = 1
mw = (1 - weight) ** weight_exp
mx1 = mx1 + boxes[pos, 0] * boxes[pos, 5] * mw
my1 = my1 + boxes[pos, 1] * boxes[pos, 5] * mw
mx2 = mx2 + boxes[pos, 2] * boxes[pos, 6] * mw
my2 = my2 + boxes[pos, 3] * boxes[pos, 6] * mw
mts = mts + boxes[pos, 5] * mw
mbs = mbs + boxes[pos, 6] * mw
boxes[pos, 4] = weight * boxes[pos, 4]
# if box score falls below threshold, discard the box by swapping with last box
# update N
if boxes[pos, 4] < threshold:
boxes[pos, 0] = boxes[N - 1, 0]
boxes[pos, 1] = boxes[N - 1, 1]
boxes[pos, 2] = boxes[N - 1, 2]
boxes[pos, 3] = boxes[N - 1, 3]
boxes[pos, 4] = boxes[N - 1, 4]
N = N - 1
pos = pos - 1
pos = pos + 1
boxes[i, 0] = mx1 / mts
boxes[i, 1] = my1 / mts
boxes[i, 2] = mx2 / mbs
boxes[i, 3] = my2 / mbs
keep = [i for i in range(N)]
return keep

View File

@@ -1,24 +0,0 @@
from distutils.core import setup
from distutils.extension import Extension
import numpy
from Cython.Build import cythonize
extensions = [
Extension(
"bbox",
["bbox.pyx"],
extra_compile_args=[]
),
Extension(
"nms",
["nms.pyx"],
extra_compile_args=[]
)
]
setup(
name="coco",
ext_modules=cythonize(extensions),
include_dirs=[numpy.get_include()]
)

View File

@@ -1,73 +0,0 @@
import torch
import torch.nn as nn
from .py_utils import TopPool, BottomPool, LeftPool, RightPool
from .py_utils.losses import CornerNet_Loss
from .py_utils.modules import hg_module, hg, hg_net
from .py_utils.utils import convolution, residual, corner_pool
def make_pool_layer(dim):
return nn.Sequential()
def make_hg_layer(inp_dim, out_dim, modules):
layers = [residual(inp_dim, out_dim, stride=2)]
layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
return nn.Sequential(*layers)
class model(hg_net):
def _pred_mod(self, dim):
return nn.Sequential(
convolution(3, 256, 256, with_bn=False),
nn.Conv2d(256, dim, (1, 1))
)
def _merge_mod(self):
return nn.Sequential(
nn.Conv2d(256, 256, (1, 1), bias=False),
nn.BatchNorm2d(256)
)
def __init__(self):
stacks = 2
pre = nn.Sequential(
convolution(7, 3, 128, stride=2),
residual(128, 256, stride=2)
)
hg_mods = nn.ModuleList([
hg_module(
5, [256, 256, 384, 384, 384, 512], [2, 2, 2, 2, 2, 4],
make_pool_layer=make_pool_layer,
make_hg_layer=make_hg_layer
) for _ in range(stacks)
])
cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_)
tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])
tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
for tl_heat, br_heat in zip(tl_heats, br_heats):
torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
torch.nn.init.constant_(br_heat[-1].bias, -2.19)
tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
super(model, self).__init__(
hgs, tl_modules, br_modules, tl_heats, br_heats,
tl_tags, br_tags, tl_offs, br_offs
)
self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1)

View File

@@ -1,93 +0,0 @@
import torch
import torch.nn as nn
from .py_utils import TopPool, BottomPool, LeftPool, RightPool
from .py_utils.losses import CornerNet_Saccade_Loss
from .py_utils.modules import saccade_net, saccade_module, saccade
from .py_utils.utils import convolution, residual, corner_pool
def make_pool_layer(dim):
return nn.Sequential()
def make_hg_layer(inp_dim, out_dim, modules):
layers = [residual(inp_dim, out_dim, stride=2)]
layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
return nn.Sequential(*layers)
class model(saccade_net):
def _pred_mod(self, dim):
return nn.Sequential(
convolution(3, 256, 256, with_bn=False),
nn.Conv2d(256, dim, (1, 1))
)
def _merge_mod(self):
return nn.Sequential(
nn.Conv2d(256, 256, (1, 1), bias=False),
nn.BatchNorm2d(256)
)
def __init__(self):
stacks = 3
pre = nn.Sequential(
convolution(7, 3, 128, stride=2),
residual(128, 256, stride=2)
)
hg_mods = nn.ModuleList([
saccade_module(
3, [256, 384, 384, 512], [1, 1, 1, 1],
make_pool_layer=make_pool_layer,
make_hg_layer=make_hg_layer
) for _ in range(stacks)
])
cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
att_mods = nn.ModuleList([
nn.ModuleList([
nn.Sequential(
convolution(3, 384, 256, with_bn=False),
nn.Conv2d(256, 1, (1, 1))
),
nn.Sequential(
convolution(3, 384, 256, with_bn=False),
nn.Conv2d(256, 1, (1, 1))
),
nn.Sequential(
convolution(3, 256, 256, with_bn=False),
nn.Conv2d(256, 1, (1, 1))
)
]) for _ in range(stacks)
])
for att_mod in att_mods:
for att in att_mod:
torch.nn.init.constant_(att[-1].bias, -2.19)
hgs = saccade(pre, hg_mods, cnvs, inters, cnvs_, inters_)
tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])
tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
for tl_heat, br_heat in zip(tl_heats, br_heats):
torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
torch.nn.init.constant_(br_heat[-1].bias, -2.19)
tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
super(model, self).__init__(
hgs, tl_modules, br_modules, tl_heats, br_heats,
tl_tags, br_tags, tl_offs, br_offs, att_mods
)
self.loss = CornerNet_Saccade_Loss(pull_weight=1e-1, push_weight=1e-1)

View File

@@ -1,117 +0,0 @@
import torch
import torch.nn as nn
from .py_utils import TopPool, BottomPool, LeftPool, RightPool
from .py_utils.losses import CornerNet_Loss
from .py_utils.modules import hg_module, hg, hg_net
from .py_utils.utils import convolution, corner_pool, residual
class fire_module(nn.Module):
def __init__(self, inp_dim, out_dim, sr=2, stride=1):
super(fire_module, self).__init__()
self.conv1 = nn.Conv2d(inp_dim, out_dim // sr, kernel_size=1, stride=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_dim // sr)
self.conv_1x1 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=1, stride=stride, bias=False)
self.conv_3x3 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=3, padding=1,
stride=stride, groups=out_dim // sr, bias=False)
self.bn2 = nn.BatchNorm2d(out_dim)
self.skip = (stride == 1 and inp_dim == out_dim)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
conv1 = self.conv1(x)
bn1 = self.bn1(conv1)
conv2 = torch.cat((self.conv_1x1(bn1), self.conv_3x3(bn1)), 1)
bn2 = self.bn2(conv2)
if self.skip:
return self.relu(bn2 + x)
else:
return self.relu(bn2)
def make_pool_layer(dim):
return nn.Sequential()
def make_unpool_layer(dim):
return nn.ConvTranspose2d(dim, dim, kernel_size=4, stride=2, padding=1)
def make_layer(inp_dim, out_dim, modules):
layers = [fire_module(inp_dim, out_dim)]
layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)]
return nn.Sequential(*layers)
def make_layer_revr(inp_dim, out_dim, modules):
layers = [fire_module(inp_dim, inp_dim) for _ in range(modules - 1)]
layers += [fire_module(inp_dim, out_dim)]
return nn.Sequential(*layers)
def make_hg_layer(inp_dim, out_dim, modules):
layers = [fire_module(inp_dim, out_dim, stride=2)]
layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)]
return nn.Sequential(*layers)
class model(hg_net):
def _pred_mod(self, dim):
return nn.Sequential(
convolution(1, 256, 256, with_bn=False),
nn.Conv2d(256, dim, (1, 1))
)
def _merge_mod(self):
return nn.Sequential(
nn.Conv2d(256, 256, (1, 1), bias=False),
nn.BatchNorm2d(256)
)
def __init__(self):
stacks = 2
pre = nn.Sequential(
convolution(7, 3, 128, stride=2),
residual(128, 256, stride=2),
residual(256, 256, stride=2)
)
hg_mods = nn.ModuleList([
hg_module(
4, [256, 256, 384, 384, 512], [2, 2, 2, 2, 4],
make_pool_layer=make_pool_layer,
make_unpool_layer=make_unpool_layer,
make_up_layer=make_layer,
make_low_layer=make_layer,
make_hg_layer_revr=make_layer_revr,
make_hg_layer=make_hg_layer
) for _ in range(stacks)
])
cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_)
tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])
tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
for tl_heat, br_heat in zip(tl_heats, br_heats):
torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
torch.nn.init.constant_(br_heat[-1].bias, -2.19)
tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
super(model, self).__init__(
hgs, tl_modules, br_modules, tl_heats, br_heats,
tl_tags, br_tags, tl_offs, br_offs
)
self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1)

View File

@@ -1 +0,0 @@
from ._cpools import TopPool, BottomPool, LeftPool, RightPool

View File

@@ -1,3 +0,0 @@
build/
cpools.egg-info/
dist/

View File

@@ -1,82 +0,0 @@
import bottom_pool
import left_pool
import right_pool
import top_pool
from torch import nn
from torch.autograd import Function
class TopPoolFunction(Function):
@staticmethod
def forward(ctx, input):
output = top_pool.forward(input)[0]
ctx.save_for_backward(input)
return output
@staticmethod
def backward(ctx, grad_output):
input = ctx.saved_variables[0]
output = top_pool.backward(input, grad_output)[0]
return output
class BottomPoolFunction(Function):
@staticmethod
def forward(ctx, input):
output = bottom_pool.forward(input)[0]
ctx.save_for_backward(input)
return output
@staticmethod
def backward(ctx, grad_output):
input = ctx.saved_variables[0]
output = bottom_pool.backward(input, grad_output)[0]
return output
class LeftPoolFunction(Function):
@staticmethod
def forward(ctx, input):
output = left_pool.forward(input)[0]
ctx.save_for_backward(input)
return output
@staticmethod
def backward(ctx, grad_output):
input = ctx.saved_variables[0]
output = left_pool.backward(input, grad_output)[0]
return output
class RightPoolFunction(Function):
@staticmethod
def forward(ctx, input):
output = right_pool.forward(input)[0]
ctx.save_for_backward(input)
return output
@staticmethod
def backward(ctx, grad_output):
input = ctx.saved_variables[0]
output = right_pool.backward(input, grad_output)[0]
return output
class TopPool(nn.Module):
def forward(self, x):
return TopPoolFunction.apply(x)
class BottomPool(nn.Module):
def forward(self, x):
return BottomPoolFunction.apply(x)
class LeftPool(nn.Module):
def forward(self, x):
return LeftPoolFunction.apply(x)
class RightPool(nn.Module):
def forward(self, x):
return RightPoolFunction.apply(x)

View File

@@ -1,15 +0,0 @@
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension
setup(
name="cpools",
ext_modules=[
CppExtension("top_pool", ["src/top_pool.cpp"]),
CppExtension("bottom_pool", ["src/bottom_pool.cpp"]),
CppExtension("left_pool", ["src/left_pool.cpp"]),
CppExtension("right_pool", ["src/right_pool.cpp"])
],
cmdclass={
"build_ext": BuildExtension
}
)

View File

@@ -1,80 +0,0 @@
#include <torch/torch.h>
#include <vector>
std::vector<at::Tensor> pool_forward(
at::Tensor input
) {
// Initialize output
at::Tensor output = at::zeros_like(input);
// Get height
int64_t height = input.size(2);
output.copy_(input);
for (int64_t ind = 1; ind < height; ind <<= 1) {
at::Tensor max_temp = at::slice(output, 2, ind, height);
at::Tensor cur_temp = at::slice(output, 2, ind, height);
at::Tensor next_temp = at::slice(output, 2, 0, height-ind);
at::max_out(max_temp, cur_temp, next_temp);
}
return {
output
};
}
std::vector<at::Tensor> pool_backward(
at::Tensor input,
at::Tensor grad_output
) {
auto output = at::zeros_like(input);
int32_t batch = input.size(0);
int32_t channel = input.size(1);
int32_t height = input.size(2);
int32_t width = input.size(3);
auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong));
auto input_temp = input.select(2, 0);
max_val.copy_(input_temp);
max_ind.fill_(0);
auto output_temp = output.select(2, 0);
auto grad_output_temp = grad_output.select(2, 0);
output_temp.copy_(grad_output_temp);
auto un_max_ind = max_ind.unsqueeze(2);
auto gt_mask = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte));
auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
for (int32_t ind = 0; ind < height - 1; ++ind) {
input_temp = input.select(2, ind + 1);
at::gt_out(gt_mask, input_temp, max_val);
at::masked_select_out(max_temp, input_temp, gt_mask);
max_val.masked_scatter_(gt_mask, max_temp);
max_ind.masked_fill_(gt_mask, ind + 1);
grad_output_temp = grad_output.select(2, ind + 1).unsqueeze(2);
output.scatter_add_(2, un_max_ind, grad_output_temp);
}
return {
output
};
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def(
"forward", &pool_forward, "Bottom Pool Forward",
py::call_guard<py::gil_scoped_release>()
);
m.def(
"backward", &pool_backward, "Bottom Pool Backward",
py::call_guard<py::gil_scoped_release>()
);
}
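The doubling loop in `pool_forward` above computes, for each row, the running maximum over all rows at or above it (a prefix maximum along the height dimension); `pool_backward` routes each incoming gradient to the row that supplied that maximum. As a hedged reference (not part of this commit, and it assumes a newer PyTorch than the 1.0.0 listed above, since `torch.cummax` was added later), the forward passes of the bottom and top pools can be written as:
```python
# Reference sketch only: the corner pooling forward passes as cumulative maxima.
# bottom pool = running max from the top row downwards (prefix max along height);
# top pool    = running max from the bottom row upwards (suffix max along height).
# The left/right pools do the same along the width dimension (dim=3).
import torch

def bottom_pool_reference(x):
    # x: (batch, channel, height, width)
    return torch.cummax(x, dim=2).values

def top_pool_reference(x):
    return torch.flip(torch.cummax(torch.flip(x, dims=[2]), dim=2).values, dims=[2])
```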

View File

@@ -1,80 +0,0 @@
#include <torch/torch.h>
#include <vector>
std::vector<at::Tensor> pool_forward(
at::Tensor input
) {
// Initialize output
at::Tensor output = at::zeros_like(input);
// Get width
int64_t width = input.size(3);
output.copy_(input);
for (int64_t ind = 1; ind < width; ind <<= 1) {
at::Tensor max_temp = at::slice(output, 3, 0, width-ind);
at::Tensor cur_temp = at::slice(output, 3, 0, width-ind);
at::Tensor next_temp = at::slice(output, 3, ind, width);
at::max_out(max_temp, cur_temp, next_temp);
}
return {
output
};
}
std::vector<at::Tensor> pool_backward(
at::Tensor input,
at::Tensor grad_output
) {
auto output = at::zeros_like(input);
int32_t batch = input.size(0);
int32_t channel = input.size(1);
int32_t height = input.size(2);
int32_t width = input.size(3);
auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong));
auto input_temp = input.select(3, width - 1);
max_val.copy_(input_temp);
max_ind.fill_(width - 1);
auto output_temp = output.select(3, width - 1);
auto grad_output_temp = grad_output.select(3, width - 1);
output_temp.copy_(grad_output_temp);
auto un_max_ind = max_ind.unsqueeze(3);
auto gt_mask = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte));
auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
for (int32_t ind = 1; ind < width; ++ind) {
input_temp = input.select(3, width - ind - 1);
at::gt_out(gt_mask, input_temp, max_val);
at::masked_select_out(max_temp, input_temp, gt_mask);
max_val.masked_scatter_(gt_mask, max_temp);
max_ind.masked_fill_(gt_mask, width - ind - 1);
grad_output_temp = grad_output.select(3, width - ind - 1).unsqueeze(3);
output.scatter_add_(3, un_max_ind, grad_output_temp);
}
return {
output
};
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def(
"forward", &pool_forward, "Left Pool Forward",
py::call_guard<py::gil_scoped_release>()
);
m.def(
"backward", &pool_backward, "Left Pool Backward",
py::call_guard<py::gil_scoped_release>()
);
}

View File

@@ -1,80 +0,0 @@
#include <torch/torch.h>
#include <vector>
std::vector<at::Tensor> pool_forward(
at::Tensor input
) {
// Initialize output
at::Tensor output = at::zeros_like(input);
// Get width
int64_t width = input.size(3);
output.copy_(input);
for (int64_t ind = 1; ind < width; ind <<= 1) {
at::Tensor max_temp = at::slice(output, 3, ind, width);
at::Tensor cur_temp = at::slice(output, 3, ind, width);
at::Tensor next_temp = at::slice(output, 3, 0, width-ind);
at::max_out(max_temp, cur_temp, next_temp);
}
return {
output
};
}
std::vector<at::Tensor> pool_backward(
at::Tensor input,
at::Tensor grad_output
) {
at::Tensor output = at::zeros_like(input);
int32_t batch = input.size(0);
int32_t channel = input.size(1);
int32_t height = input.size(2);
int32_t width = input.size(3);
auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong));
auto input_temp = input.select(3, 0);
max_val.copy_(input_temp);
max_ind.fill_(0);
auto output_temp = output.select(3, 0);
auto grad_output_temp = grad_output.select(3, 0);
output_temp.copy_(grad_output_temp);
auto un_max_ind = max_ind.unsqueeze(3);
auto gt_mask = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte));
auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
for (int32_t ind = 0; ind < width - 1; ++ind) {
input_temp = input.select(3, ind + 1);
at::gt_out(gt_mask, input_temp, max_val);
at::masked_select_out(max_temp, input_temp, gt_mask);
max_val.masked_scatter_(gt_mask, max_temp);
max_ind.masked_fill_(gt_mask, ind + 1);
grad_output_temp = grad_output.select(3, ind + 1).unsqueeze(3);
output.scatter_add_(3, un_max_ind, grad_output_temp);
}
return {
output
};
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def(
"forward", &pool_forward, "Right Pool Forward",
py::call_guard<py::gil_scoped_release>()
);
m.def(
"backward", &pool_backward, "Right Pool Backward",
py::call_guard<py::gil_scoped_release>()
);
}

View File

@@ -1,80 +0,0 @@
#include <torch/torch.h>
#include <vector>
std::vector<at::Tensor> top_pool_forward(
at::Tensor input
) {
// Initialize output
at::Tensor output = at::zeros_like(input);
// Get height
int64_t height = input.size(2);
output.copy_(input);
for (int64_t ind = 1; ind < height; ind <<= 1) {
at::Tensor max_temp = at::slice(output, 2, 0, height-ind);
at::Tensor cur_temp = at::slice(output, 2, 0, height-ind);
at::Tensor next_temp = at::slice(output, 2, ind, height);
at::max_out(max_temp, cur_temp, next_temp);
}
return {
output
};
}
std::vector<at::Tensor> top_pool_backward(
at::Tensor input,
at::Tensor grad_output
) {
auto output = at::zeros_like(input);
int32_t batch = input.size(0);
int32_t channel = input.size(1);
int32_t height = input.size(2);
int32_t width = input.size(3);
auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong));
auto input_temp = input.select(2, height - 1);
max_val.copy_(input_temp);
max_ind.fill_(height - 1);
auto output_temp = output.select(2, height - 1);
auto grad_output_temp = grad_output.select(2, height - 1);
output_temp.copy_(grad_output_temp);
auto un_max_ind = max_ind.unsqueeze(2);
auto gt_mask = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte));
auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
for (int32_t ind = 1; ind < height; ++ind) {
input_temp = input.select(2, height - ind - 1);
at::gt_out(gt_mask, input_temp, max_val);
at::masked_select_out(max_temp, input_temp, gt_mask);
max_val.masked_scatter_(gt_mask, max_temp);
max_ind.masked_fill_(gt_mask, height - ind - 1);
grad_output_temp = grad_output.select(2, height - ind - 1).unsqueeze(2);
output.scatter_add_(2, un_max_ind, grad_output_temp);
}
return {
output
};
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def(
"forward", &top_pool_forward, "Top Pool Forward",
py::call_guard<py::gil_scoped_release>()
);
m.def(
"backward", &top_pool_backward, "Top Pool Backward",
py::call_guard<py::gil_scoped_release>()
);
}

View File

@@ -1,117 +0,0 @@
import torch
from torch.nn.modules import Module
from torch.nn.parallel.parallel_apply import parallel_apply
from torch.nn.parallel.replicate import replicate
from torch.nn.parallel.scatter_gather import gather
from .scatter_gather import scatter_kwargs
class DataParallel(Module):
r"""Implements data parallelism at the module level.
This container parallelizes the application of the given module by
splitting the input across the specified devices by chunking in the batch
dimension. In the forward pass, the module is replicated on each device,
and each replica handles a portion of the input. During the backwards
pass, gradients from each replica are summed into the original module.
The batch size should be larger than the number of GPUs used, and should be an
integer multiple of the number of GPUs so that each chunk is the same size and
each GPU processes the same number of samples.
See also: :ref:`cuda-nn-dataparallel-instead`
Arbitrary positional and keyword inputs are allowed to be passed into
DataParallel EXCEPT Tensors. All variables will be scattered on the specified
dim (default 0). Primitive types will be broadcast, but all other types will be
shallow copies and can be corrupted if written to in the model's forward pass.
Args:
module: module to be parallelized
device_ids: CUDA devices (default: all devices)
output_device: device location of output (default: device_ids[0])
Example::
>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)
"""
# TODO: update notes/cuda.rst when this class handles 8+ GPUs well
def __init__(self, module, device_ids=None, output_device=None, dim=0, chunk_sizes=None):
super(DataParallel, self).__init__()
if not torch.cuda.is_available():
self.module = module
self.device_ids = []
return
if device_ids is None:
device_ids = list(range(torch.cuda.device_count()))
if output_device is None:
output_device = device_ids[0]
self.dim = dim
self.module = module
self.device_ids = device_ids
self.chunk_sizes = chunk_sizes
self.output_device = output_device
if len(self.device_ids) == 1:
self.module.cuda(device_ids[0])
def forward(self, *inputs, **kwargs):
if not self.device_ids:
return self.module(*inputs, **kwargs)
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
if len(self.device_ids) == 1:
return self.module(*inputs[0], **kwargs[0])
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
outputs = self.parallel_apply(replicas, inputs, kwargs)
return self.gather(outputs, self.output_device)
def replicate(self, module, device_ids):
return replicate(module, device_ids)
def scatter(self, inputs, kwargs, device_ids, chunk_sizes):
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
def parallel_apply(self, replicas, inputs, kwargs):
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
def gather(self, outputs, output_device):
return gather(outputs, output_device, dim=self.dim)
def data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None):
r"""Evaluates module(input) in parallel across the GPUs given in device_ids.
This is the functional version of the DataParallel module.
Args:
module: the module to evaluate in parallel
inputs: inputs to the module
device_ids: GPU ids on which to replicate module
output_device: GPU location of the output. Use -1 to indicate the CPU.
(default: device_ids[0])
Returns:
a Variable containing the result of module(input) located on
output_device
"""
if not isinstance(inputs, tuple):
inputs = (inputs,)
if device_ids is None:
device_ids = list(range(torch.cuda.device_count()))
if output_device is None:
output_device = device_ids[0]
inputs, module_kwargs = scatter_kwargs(inputs, module_kwargs, device_ids, dim)
if len(device_ids) == 1:
return module(*inputs[0], **module_kwargs[0])
used_device_ids = device_ids[:len(inputs)]
replicas = replicate(module, used_device_ids)
outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids)
return gather(outputs, output_device, dim)
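# A hypothetical usage sketch of the chunk_sizes extension (the names below are
# illustrative, not taken from the training scripts):
#   net = DataParallel(model, device_ids=[0, 1], chunk_sizes=[6, 2])
#   out = net(images)   # a batch of 8 is split into chunks of 6 and 2 along dim 0
# Plain torch.nn.DataParallel always splits the batch evenly; chunk_sizes lets one
# GPU take a larger share, which can be used to balance memory across GPUs.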

View File

@@ -1,231 +0,0 @@
import torch
import torch.nn as nn
from .utils import _tranpose_and_gather_feat
def _sigmoid(x):
return torch.clamp(x.sigmoid_(), min=1e-4, max=1 - 1e-4)
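# Associative embedding loss: the "pull" term draws each object's top-left and
# bottom-right tags toward their mean, and the "push" term penalizes pairs of
# different objects whose tag means are less than 1 apart; `mask` selects the
# valid object slots in each batch entry.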
def _ae_loss(tag0, tag1, mask):
num = mask.sum(dim=1, keepdim=True).float()
tag0 = tag0.squeeze()
tag1 = tag1.squeeze()
tag_mean = (tag0 + tag1) / 2
tag0 = torch.pow(tag0 - tag_mean, 2) / (num + 1e-4)
tag0 = tag0[mask].sum()
tag1 = torch.pow(tag1 - tag_mean, 2) / (num + 1e-4)
tag1 = tag1[mask].sum()
pull = tag0 + tag1
mask = mask.unsqueeze(1) + mask.unsqueeze(2)
mask = mask.eq(2)
num = num.unsqueeze(2)
num2 = (num - 1) * num
dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)
dist = 1 - torch.abs(dist)
dist = nn.functional.relu(dist, inplace=True)
dist = dist - 1 / (num + 1e-4)
dist = dist / (num2 + 1e-4)
dist = dist[mask]
push = dist.sum()
return pull, push
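# Smooth L1 loss on the sub-pixel corner offsets, summed over the valid corners
# selected by `mask` and normalized by their count.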
def _off_loss(off, gt_off, mask):
num = mask.float().sum()
mask = mask.unsqueeze(2).expand_as(gt_off)
off = off[mask]
gt_off = gt_off[mask]
off_loss = nn.functional.smooth_l1_loss(off, gt_off, reduction="sum")
off_loss = off_loss / (num + 1e-4)
return off_loss
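# Penalty-reduced focal loss from CornerNet with an extra validity mask:
# negatives are down-weighted by (1 - gt)^4 around each ground-truth corner, and
# `mask` zeroes out locations belonging to objects that are too small or only
# partially inside the crop. `_focal_loss` below is the unmasked variant.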
def _focal_loss_mask(preds, gt, mask):
pos_inds = gt.eq(1)
neg_inds = gt.lt(1)
neg_weights = torch.pow(1 - gt[neg_inds], 4)
pos_mask = mask[pos_inds]
neg_mask = mask[neg_inds]
loss = 0
for pred in preds:
pos_pred = pred[pos_inds]
neg_pred = pred[neg_inds]
pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2) * pos_mask
neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights * neg_mask
num_pos = pos_inds.float().sum()
pos_loss = pos_loss.sum()
neg_loss = neg_loss.sum()
if pos_pred.nelement() == 0:
loss = loss - neg_loss
else:
loss = loss - (pos_loss + neg_loss) / num_pos
return loss
def _focal_loss(preds, gt):
pos_inds = gt.eq(1)
neg_inds = gt.lt(1)
neg_weights = torch.pow(1 - gt[neg_inds], 4)
loss = 0
for pred in preds:
pos_pred = pred[pos_inds]
neg_pred = pred[neg_inds]
pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2)
neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights
num_pos = pos_inds.float().sum()
pos_loss = pos_loss.sum()
neg_loss = neg_loss.sum()
if pos_pred.nelement() == 0:
loss = loss - neg_loss
else:
loss = loss - (pos_loss + neg_loss) / num_pos
return loss
class CornerNet_Saccade_Loss(nn.Module):
def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss_mask):
super(CornerNet_Saccade_Loss, self).__init__()
self.pull_weight = pull_weight
self.push_weight = push_weight
self.off_weight = off_weight
self.focal_loss = focal_loss
self.ae_loss = _ae_loss
self.off_loss = _off_loss
def forward(self, outs, targets):
tl_heats = outs[0]
br_heats = outs[1]
tl_tags = outs[2]
br_tags = outs[3]
tl_offs = outs[4]
br_offs = outs[5]
atts = outs[6]
gt_tl_heat = targets[0]
gt_br_heat = targets[1]
gt_mask = targets[2]
gt_tl_off = targets[3]
gt_br_off = targets[4]
gt_tl_ind = targets[5]
gt_br_ind = targets[6]
gt_tl_valid = targets[7]
gt_br_valid = targets[8]
gt_atts = targets[9]
# focal loss
focal_loss = 0
tl_heats = [_sigmoid(t) for t in tl_heats]
br_heats = [_sigmoid(b) for b in br_heats]
focal_loss += self.focal_loss(tl_heats, gt_tl_heat, gt_tl_valid)
focal_loss += self.focal_loss(br_heats, gt_br_heat, gt_br_valid)
atts = [[_sigmoid(a) for a in att] for att in atts]
atts = [[att[ind] for att in atts] for ind in range(len(gt_atts))]
att_loss = 0
for att, gt_att in zip(atts, gt_atts):
att_loss += _focal_loss(att, gt_att) / max(len(att), 1)
# tag loss
pull_loss = 0
push_loss = 0
tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags]
br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags]
for tl_tag, br_tag in zip(tl_tags, br_tags):
pull, push = self.ae_loss(tl_tag, br_tag, gt_mask)
pull_loss += pull
push_loss += push
pull_loss = self.pull_weight * pull_loss
push_loss = self.push_weight * push_loss
off_loss = 0
tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs]
br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs]
for tl_off, br_off in zip(tl_offs, br_offs):
off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask)
off_loss += self.off_loss(br_off, gt_br_off, gt_mask)
off_loss = self.off_weight * off_loss
loss = (focal_loss + att_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1)
return loss.unsqueeze(0)
class CornerNet_Loss(nn.Module):
def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss):
super(CornerNet_Loss, self).__init__()
self.pull_weight = pull_weight
self.push_weight = push_weight
self.off_weight = off_weight
self.focal_loss = focal_loss
self.ae_loss = _ae_loss
self.off_loss = _off_loss
def forward(self, outs, targets):
tl_heats = outs[0]
br_heats = outs[1]
tl_tags = outs[2]
br_tags = outs[3]
tl_offs = outs[4]
br_offs = outs[5]
gt_tl_heat = targets[0]
gt_br_heat = targets[1]
gt_mask = targets[2]
gt_tl_off = targets[3]
gt_br_off = targets[4]
gt_tl_ind = targets[5]
gt_br_ind = targets[6]
# focal loss
focal_loss = 0
tl_heats = [_sigmoid(t) for t in tl_heats]
br_heats = [_sigmoid(b) for b in br_heats]
focal_loss += self.focal_loss(tl_heats, gt_tl_heat)
focal_loss += self.focal_loss(br_heats, gt_br_heat)
# tag loss
pull_loss = 0
push_loss = 0
tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags]
br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags]
for tl_tag, br_tag in zip(tl_tags, br_tags):
pull, push = self.ae_loss(tl_tag, br_tag, gt_mask)
pull_loss += pull
push_loss += push
pull_loss = self.pull_weight * pull_loss
push_loss = self.push_weight * push_loss
off_loss = 0
tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs]
br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs]
for tl_off, br_off in zip(tl_offs, br_offs):
off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask)
off_loss += self.off_loss(br_off, gt_br_off, gt_mask)
off_loss = self.off_weight * off_loss
loss = (focal_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1)
return loss.unsqueeze(0)

View File

@@ -1,303 +0,0 @@
import torch
import torch.nn as nn
from .utils import residual, upsample, merge, _decode
def _make_layer(inp_dim, out_dim, modules):
layers = [residual(inp_dim, out_dim)]
layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
return nn.Sequential(*layers)
def _make_layer_revr(inp_dim, out_dim, modules):
layers = [residual(inp_dim, inp_dim) for _ in range(modules - 1)]
layers += [residual(inp_dim, out_dim)]
return nn.Sequential(*layers)
def _make_pool_layer(dim):
return nn.MaxPool2d(kernel_size=2, stride=2)
def _make_unpool_layer(dim):
return upsample(scale_factor=2)
def _make_merge_layer(dim):
return merge()
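# One recursive hourglass stage: `up1` is the skip branch kept at the current
# resolution, `max1`/`low1` downsample, `low2` recurses to the next scale (or is a
# plain residual stack at the innermost level), `low3`/`up2` bring the features
# back up, and `merg` adds the two branches.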
class hg_module(nn.Module):
def __init__(
self, n, dims, modules, make_up_layer=_make_layer,
make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer,
make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr,
make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer
):
super(hg_module, self).__init__()
curr_mod = modules[0]
next_mod = modules[1]
curr_dim = dims[0]
next_dim = dims[1]
self.n = n
self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod)
self.max1 = make_pool_layer(curr_dim)
self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod)
self.low2 = hg_module(
n - 1, dims[1:], modules[1:],
make_up_layer=make_up_layer,
make_pool_layer=make_pool_layer,
make_hg_layer=make_hg_layer,
make_low_layer=make_low_layer,
make_hg_layer_revr=make_hg_layer_revr,
make_unpool_layer=make_unpool_layer,
make_merge_layer=make_merge_layer
) if n > 1 else make_low_layer(next_dim, next_dim, next_mod)
self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod)
self.up2 = make_unpool_layer(curr_dim)
self.merg = make_merge_layer(curr_dim)
def forward(self, x):
up1 = self.up1(x)
max1 = self.max1(x)
low1 = self.low1(max1)
low2 = self.low2(low1)
low3 = self.low3(low2)
up2 = self.up2(low3)
merg = self.merg(up1, up2)
return merg
class hg(nn.Module):
def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_):
super(hg, self).__init__()
self.pre = pre
self.hgs = hg_modules
self.cnvs = cnvs
self.inters = inters
self.inters_ = inters_
self.cnvs_ = cnvs_
def forward(self, x):
inter = self.pre(x)
cnvs = []
for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)):
hg = hg_(inter)
cnv = cnv_(hg)
cnvs.append(cnv)
if ind < len(self.hgs) - 1:
inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv)
inter = nn.functional.relu_(inter)
inter = self.inters[ind](inter)
return cnvs
class hg_net(nn.Module):
def __init__(
self, hg, tl_modules, br_modules, tl_heats, br_heats,
tl_tags, br_tags, tl_offs, br_offs
):
super(hg_net, self).__init__()
self._decode = _decode
self.hg = hg
self.tl_modules = tl_modules
self.br_modules = br_modules
self.tl_heats = tl_heats
self.br_heats = br_heats
self.tl_tags = tl_tags
self.br_tags = br_tags
self.tl_offs = tl_offs
self.br_offs = br_offs
def _train(self, *xs):
image = xs[0]
cnvs = self.hg(image)
tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)]
br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)]
tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)]
br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)]
tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)]
br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)]
tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)]
br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)]
return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs]
def _test(self, *xs, **kwargs):
image = xs[0]
cnvs = self.hg(image)
tl_mod = self.tl_modules[-1](cnvs[-1])
br_mod = self.br_modules[-1](cnvs[-1])
tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod)
tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod)
tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod)
outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off]
return self._decode(*outs, **kwargs), tl_heat, br_heat, tl_tag, br_tag
def forward(self, *xs, test=False, **kwargs):
if not test:
return self._train(*xs, **kwargs)
return self._test(*xs, **kwargs)
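# Same structure as hg_module, except that forward() also collects the merged
# feature map produced at every scale (`mergs`); saccade_net later feeds these to
# its attention modules.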
class saccade_module(nn.Module):
def __init__(
self, n, dims, modules, make_up_layer=_make_layer,
make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer,
make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr,
make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer
):
super(saccade_module, self).__init__()
curr_mod = modules[0]
next_mod = modules[1]
curr_dim = dims[0]
next_dim = dims[1]
self.n = n
self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod)
self.max1 = make_pool_layer(curr_dim)
self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod)
self.low2 = saccade_module(
n - 1, dims[1:], modules[1:],
make_up_layer=make_up_layer,
make_pool_layer=make_pool_layer,
make_hg_layer=make_hg_layer,
make_low_layer=make_low_layer,
make_hg_layer_revr=make_hg_layer_revr,
make_unpool_layer=make_unpool_layer,
make_merge_layer=make_merge_layer
) if n > 1 else make_low_layer(next_dim, next_dim, next_mod)
self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod)
self.up2 = make_unpool_layer(curr_dim)
self.merg = make_merge_layer(curr_dim)
def forward(self, x):
up1 = self.up1(x)
max1 = self.max1(x)
low1 = self.low1(max1)
if self.n > 1:
low2, mergs = self.low2(low1)
else:
low2, mergs = self.low2(low1), []
low3 = self.low3(low2)
up2 = self.up2(low3)
merg = self.merg(up1, up2)
mergs.append(merg)
return merg, mergs
class saccade(nn.Module):
def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_):
super(saccade, self).__init__()
self.pre = pre
self.hgs = hg_modules
self.cnvs = cnvs
self.inters = inters
self.inters_ = inters_
self.cnvs_ = cnvs_
def forward(self, x):
inter = self.pre(x)
cnvs = []
atts = []
for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)):
hg, ups = hg_(inter)
cnv = cnv_(hg)
cnvs.append(cnv)
atts.append(ups)
if ind < len(self.hgs) - 1:
inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv)
inter = nn.functional.relu_(inter)
inter = self.inters[ind](inter)
return cnvs, atts
class saccade_net(nn.Module):
def __init__(
self, hg, tl_modules, br_modules, tl_heats, br_heats,
tl_tags, br_tags, tl_offs, br_offs, att_modules, up_start=0
):
super(saccade_net, self).__init__()
self._decode = _decode
self.hg = hg
self.tl_modules = tl_modules
self.br_modules = br_modules
self.tl_heats = tl_heats
self.br_heats = br_heats
self.tl_tags = tl_tags
self.br_tags = br_tags
self.tl_offs = tl_offs
self.br_offs = br_offs
self.att_modules = att_modules
self.up_start = up_start
def _train(self, *xs):
image = xs[0]
cnvs, ups = self.hg(image)
ups = [up[self.up_start:] for up in ups]
tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)]
br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)]
tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)]
br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)]
tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)]
br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)]
tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)]
br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)]
atts = [[att_mod_(u) for att_mod_, u in zip(att_mods, up)] for att_mods, up in zip(self.att_modules, ups)]
return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs, atts]
def _test(self, *xs, no_att=False, **kwargs):
image = xs[0]
cnvs, ups = self.hg(image)
ups = [up[self.up_start:] for up in ups]
if not no_att:
atts = [att_mod_(up) for att_mod_, up in zip(self.att_modules[-1], ups[-1])]
atts = [torch.sigmoid(att) for att in atts]
tl_mod = self.tl_modules[-1](cnvs[-1])
br_mod = self.br_modules[-1](cnvs[-1])
tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod)
tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod)
tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod)
outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off]
if not no_att:
return self._decode(*outs, **kwargs), atts
else:
return self._decode(*outs, **kwargs)
def forward(self, *xs, test=False, **kwargs):
if not test:
return self._train(*xs, **kwargs)
return self._test(*xs, **kwargs)

View File

@@ -1,39 +0,0 @@
import torch
from torch.autograd import Variable
from torch.nn.parallel._functions import Scatter
def scatter(inputs, target_gpus, dim=0, chunk_sizes=None):
r"""
Slices variables into approximately equal chunks (or into the sizes given by
``chunk_sizes``) and distributes them across the given GPUs. Duplicates
references to objects that are not variables. Does not support Tensors.
"""
def scatter_map(obj):
if isinstance(obj, Variable):
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
assert not torch.is_tensor(obj), "Tensors not supported in scatter."
if isinstance(obj, tuple):
return list(zip(*map(scatter_map, obj)))
if isinstance(obj, list):
return list(map(list, zip(*map(scatter_map, obj))))
if isinstance(obj, dict):
return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
return [obj for targets in target_gpus]
return scatter_map(inputs)
def scatter_kwargs(inputs, kwargs, target_gpus, dim=0, chunk_sizes=None):
r"""Scatter with support for kwargs dictionary"""
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
kwargs = scatter(kwargs, target_gpus, dim, chunk_sizes) if kwargs else []
if len(inputs) < len(kwargs):
inputs.extend([() for _ in range(len(kwargs) - len(inputs))])
elif len(kwargs) < len(inputs):
kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))])
inputs = tuple(inputs)
kwargs = tuple(kwargs)
return inputs, kwargs

View File

@@ -1,236 +0,0 @@
import torch
import torch.nn as nn
def _gather_feat(feat, ind, mask=None):
dim = feat.size(2)
ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
feat = feat.gather(1, ind)
if mask is not None:
mask = mask.unsqueeze(2).expand_as(feat)
feat = feat[mask]
feat = feat.view(-1, dim)
return feat
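# Max-pool based NMS on a heatmap: keep a score only where it is the local
# maximum inside the kernel window.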
def _nms(heat, kernel=1):
pad = (kernel - 1) // 2
hmax = nn.functional.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad)
keep = (hmax == heat).float()
return heat * keep
def _tranpose_and_gather_feat(feat, ind):
feat = feat.permute(0, 2, 3, 1).contiguous()
feat = feat.view(feat.size(0), -1, feat.size(3))
feat = _gather_feat(feat, ind)
return feat
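# Pick the top-K scoring locations over the flattened (class, y, x) heatmap and
# decode the flat indices back into class ids and coordinates.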
def _topk(scores, K=20):
batch, cat, height, width = scores.size()
topk_scores, topk_inds = torch.topk(scores.view(batch, -1), K)
topk_clses = (topk_inds / (height * width)).int()
topk_inds = topk_inds % (height * width)
topk_ys = (topk_inds / width).int().float()
topk_xs = (topk_inds % width).int().float()
return topk_scores, topk_inds, topk_clses, topk_ys, topk_xs
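# Form all K x K top-left / bottom-right corner pairs, reject pairs with
# mismatched classes, embedding distance above ae_threshold, or inverted
# geometry, and keep the num_dets highest-scoring ones. Each detection is laid
# out as [x1, y1, x2, y2, score, tl_score, br_score, class].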
def _decode(
tl_heat, br_heat, tl_tag, br_tag, tl_regr, br_regr,
K=100, kernel=1, ae_threshold=1, num_dets=1000, no_border=False
):
batch, cat, height, width = tl_heat.size()
tl_heat = torch.sigmoid(tl_heat)
br_heat = torch.sigmoid(br_heat)
# perform nms on heatmaps
tl_heat = _nms(tl_heat, kernel=kernel)
br_heat = _nms(br_heat, kernel=kernel)
tl_scores, tl_inds, tl_clses, tl_ys, tl_xs = _topk(tl_heat, K=K)
br_scores, br_inds, br_clses, br_ys, br_xs = _topk(br_heat, K=K)
tl_ys = tl_ys.view(batch, K, 1).expand(batch, K, K)
tl_xs = tl_xs.view(batch, K, 1).expand(batch, K, K)
br_ys = br_ys.view(batch, 1, K).expand(batch, K, K)
br_xs = br_xs.view(batch, 1, K).expand(batch, K, K)
if no_border:
tl_ys_binds = (tl_ys == 0)
tl_xs_binds = (tl_xs == 0)
br_ys_binds = (br_ys == height - 1)
br_xs_binds = (br_xs == width - 1)
if tl_regr is not None and br_regr is not None:
tl_regr = _tranpose_and_gather_feat(tl_regr, tl_inds)
tl_regr = tl_regr.view(batch, K, 1, 2)
br_regr = _tranpose_and_gather_feat(br_regr, br_inds)
br_regr = br_regr.view(batch, 1, K, 2)
tl_xs = tl_xs + tl_regr[..., 0]
tl_ys = tl_ys + tl_regr[..., 1]
br_xs = br_xs + br_regr[..., 0]
br_ys = br_ys + br_regr[..., 1]
# all possible boxes based on top k corners (ignoring class)
bboxes = torch.stack((tl_xs, tl_ys, br_xs, br_ys), dim=3)
tl_tag = _tranpose_and_gather_feat(tl_tag, tl_inds)
tl_tag = tl_tag.view(batch, K, 1)
br_tag = _tranpose_and_gather_feat(br_tag, br_inds)
br_tag = br_tag.view(batch, 1, K)
dists = torch.abs(tl_tag - br_tag)
tl_scores = tl_scores.view(batch, K, 1).expand(batch, K, K)
br_scores = br_scores.view(batch, 1, K).expand(batch, K, K)
scores = (tl_scores + br_scores) / 2
# reject boxes based on classes
tl_clses = tl_clses.view(batch, K, 1).expand(batch, K, K)
br_clses = br_clses.view(batch, 1, K).expand(batch, K, K)
cls_inds = (tl_clses != br_clses)
# reject boxes based on distances
dist_inds = (dists > ae_threshold)
# reject boxes based on widths and heights
width_inds = (br_xs < tl_xs)
height_inds = (br_ys < tl_ys)
if no_border:
scores[tl_ys_binds] = -1
scores[tl_xs_binds] = -1
scores[br_ys_binds] = -1
scores[br_xs_binds] = -1
scores[cls_inds] = -1
scores[dist_inds] = -1
scores[width_inds] = -1
scores[height_inds] = -1
scores = scores.view(batch, -1)
scores, inds = torch.topk(scores, num_dets)
scores = scores.unsqueeze(2)
bboxes = bboxes.view(batch, -1, 4)
bboxes = _gather_feat(bboxes, inds)
clses = tl_clses.contiguous().view(batch, -1, 1)
clses = _gather_feat(clses, inds).float()
tl_scores = tl_scores.contiguous().view(batch, -1, 1)
tl_scores = _gather_feat(tl_scores, inds).float()
br_scores = br_scores.contiguous().view(batch, -1, 1)
br_scores = _gather_feat(br_scores, inds).float()
detections = torch.cat([bboxes, scores, tl_scores, br_scores, clses], dim=2)
return detections
class upsample(nn.Module):
def __init__(self, scale_factor):
super(upsample, self).__init__()
self.scale_factor = scale_factor
def forward(self, x):
return nn.functional.interpolate(x, scale_factor=self.scale_factor)
class merge(nn.Module):
def forward(self, x, y):
return x + y
class convolution(nn.Module):
def __init__(self, k, inp_dim, out_dim, stride=1, with_bn=True):
super(convolution, self).__init__()
pad = (k - 1) // 2
self.conv = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(pad, pad), stride=(stride, stride), bias=not with_bn)
self.bn = nn.BatchNorm2d(out_dim) if with_bn else nn.Sequential()
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
conv = self.conv(x)
bn = self.bn(conv)
relu = self.relu(bn)
return relu
class residual(nn.Module):
def __init__(self, inp_dim, out_dim, k=3, stride=1):
super(residual, self).__init__()
p = (k - 1) // 2
self.conv1 = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(p, p), stride=(stride, stride), bias=False)
self.bn1 = nn.BatchNorm2d(out_dim)
self.relu1 = nn.ReLU(inplace=True)
self.conv2 = nn.Conv2d(out_dim, out_dim, (k, k), padding=(p, p), bias=False)
self.bn2 = nn.BatchNorm2d(out_dim)
self.skip = nn.Sequential(
nn.Conv2d(inp_dim, out_dim, (1, 1), stride=(stride, stride), bias=False),
nn.BatchNorm2d(out_dim)
) if stride != 1 or inp_dim != out_dim else nn.Sequential()
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
conv1 = self.conv1(x)
bn1 = self.bn1(conv1)
relu1 = self.relu1(bn1)
conv2 = self.conv2(relu1)
bn2 = self.bn2(conv2)
skip = self.skip(x)
return self.relu(bn2 + skip)
class corner_pool(nn.Module):
def __init__(self, dim, pool1, pool2):
super(corner_pool, self).__init__()
self._init_layers(dim, pool1, pool2)
def _init_layers(self, dim, pool1, pool2):
self.p1_conv1 = convolution(3, dim, 128)
self.p2_conv1 = convolution(3, dim, 128)
self.p_conv1 = nn.Conv2d(128, dim, (3, 3), padding=(1, 1), bias=False)
self.p_bn1 = nn.BatchNorm2d(dim)
self.conv1 = nn.Conv2d(dim, dim, (1, 1), bias=False)
self.bn1 = nn.BatchNorm2d(dim)
self.relu1 = nn.ReLU(inplace=True)
self.conv2 = convolution(3, dim, dim)
self.pool1 = pool1()
self.pool2 = pool2()
def forward(self, x):
# pool 1
p1_conv1 = self.p1_conv1(x)
pool1 = self.pool1(p1_conv1)
# pool 2
p2_conv1 = self.p2_conv1(x)
pool2 = self.pool2(p2_conv1)
# pool 1 + pool 2
p_conv1 = self.p_conv1(pool1 + pool2)
p_bn1 = self.p_bn1(p_conv1)
conv1 = self.conv1(x)
bn1 = self.bn1(conv1)
relu1 = self.relu1(p_bn1 + bn1)
conv2 = self.conv2(relu1)
return conv2

View File

@@ -1,137 +0,0 @@
import torch
import torch.nn as nn
from ..models.py_utils.data_parallel import DataParallel
torch.manual_seed(317)
class Network(nn.Module):
def __init__(self, model, loss):
super(Network, self).__init__()
self.model = model
self.loss = loss
def forward(self, xs, ys, **kwargs):
preds = self.model(*xs, **kwargs)
loss = self.loss(preds, ys, **kwargs)
return loss
# For backward compatibility: the model was previously wrapped in a DataParallel
# module, so saved checkpoints carry a `module.` prefix that DummyModule reproduces.
class DummyModule(nn.Module):
def __init__(self, model):
super(DummyModule, self).__init__()
self.module = model
def forward(self, *xs, **kwargs):
return self.module(*xs, **kwargs)
class NetworkFactory(object):
def __init__(self, system_config, model, distributed=False, gpu=None):
super(NetworkFactory, self).__init__()
self.system_config = system_config
self.gpu = gpu
self.model = DummyModule(model)
self.loss = model.loss
self.network = Network(self.model, self.loss)
if distributed:
from apex.parallel import DistributedDataParallel, convert_syncbn_model
torch.cuda.set_device(gpu)
self.network = self.network.cuda(gpu)
self.network = convert_syncbn_model(self.network)
self.network = DistributedDataParallel(self.network)
else:
self.network = DataParallel(self.network, chunk_sizes=system_config.chunk_sizes)
total_params = 0
for params in self.model.parameters():
num_params = 1
for x in params.size():
num_params *= x
total_params += num_params
print("total parameters: {}".format(total_params))
if system_config.opt_algo == "adam":
self.optimizer = torch.optim.Adam(
filter(lambda p: p.requires_grad, self.model.parameters())
)
elif system_config.opt_algo == "sgd":
self.optimizer = torch.optim.SGD(
filter(lambda p: p.requires_grad, self.model.parameters()),
lr=system_config.learning_rate,
momentum=0.9, weight_decay=0.0001
)
else:
raise ValueError("unknown optimizer")
def cuda(self):
self.model.cuda()
def train_mode(self):
self.network.train()
def eval_mode(self):
self.network.eval()
def _t_cuda(self, xs):
if type(xs) is list:
return [x.cuda(self.gpu, non_blocking=True) for x in xs]
return xs.cuda(self.gpu, non_blocking=True)
def train(self, xs, ys, **kwargs):
xs = [self._t_cuda(x) for x in xs]
ys = [self._t_cuda(y) for y in ys]
self.optimizer.zero_grad()
loss = self.network(xs, ys)
loss = loss.mean()
loss.backward()
self.optimizer.step()
return loss
def validate(self, xs, ys, **kwargs):
with torch.no_grad():
xs = [self._t_cuda(x) for x in xs]
ys = [self._t_cuda(y) for y in ys]
loss = self.network(xs, ys)
loss = loss.mean()
return loss
def test(self, xs, **kwargs):
with torch.no_grad():
xs = [self._t_cuda(x) for x in xs]
return self.model(*xs, **kwargs)
def set_lr(self, lr):
print("setting learning rate to: {}".format(lr))
for param_group in self.optimizer.param_groups:
param_group["lr"] = lr
def load_pretrained_params(self, pretrained_model):
print("loading from {}".format(pretrained_model))
with open(pretrained_model, "rb") as f:
params = torch.load(f, weights_only=False)
self.model.load_state_dict(params)
def load_params(self, iteration):
cache_file = self.system_config.snapshot_file.format(iteration)
print("loading model from {}".format(cache_file))
with open(cache_file, "rb") as f:
params = torch.load(f)
self.model.load_state_dict(params)
def save_params(self, iteration):
cache_file = self.system_config.snapshot_file.format(iteration)
print("saving model to {}".format(cache_file))
with open(cache_file, "wb") as f:
params = self.model.state_dict()
torch.save(params, f)

View File

@@ -1,8 +0,0 @@
import pkg_resources
_package_name = __name__
def get_file_path(*paths):
path = "/".join(paths)
return pkg_resources.resource_filename(_package_name, path)

View File

@@ -1,5 +0,0 @@
from .cornernet import cornernet
from .cornernet_saccade import cornernet_saccade
def data_sampling_func(sys_configs, db, k_ind, data_aug=True, debug=False):
return globals()[sys_configs.sampling_function](sys_configs, db, k_ind, data_aug, debug)

View File

@@ -1,164 +0,0 @@
import math
import cv2
import numpy as np
import torch
from .utils import random_crop, draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_
def _resize_image(image, detections, size):
detections = detections.copy()
height, width = image.shape[0:2]
new_height, new_width = size
image = cv2.resize(image, (new_width, new_height))
height_ratio = new_height / height
width_ratio = new_width / width
detections[:, 0:4:2] *= width_ratio
detections[:, 1:4:2] *= height_ratio
return image, detections
def _clip_detections(image, detections):
detections = detections.copy()
height, width = image.shape[0:2]
detections[:, 0:4:2] = np.clip(detections[:, 0:4:2], 0, width - 1)
detections[:, 1:4:2] = np.clip(detections[:, 1:4:2], 0, height - 1)
keep_inds = ((detections[:, 2] - detections[:, 0]) > 0) & \
((detections[:, 3] - detections[:, 1]) > 0)
detections = detections[keep_inds]
return detections
def cornernet(system_configs, db, k_ind, data_aug, debug):
data_rng = system_configs.data_rng
batch_size = system_configs.batch_size
categories = db.configs["categories"]
input_size = db.configs["input_size"]
output_size = db.configs["output_sizes"][0]
border = db.configs["border"]
lighting = db.configs["lighting"]
rand_crop = db.configs["rand_crop"]
rand_color = db.configs["rand_color"]
rand_scales = db.configs["rand_scales"]
gaussian_bump = db.configs["gaussian_bump"]
gaussian_iou = db.configs["gaussian_iou"]
gaussian_rad = db.configs["gaussian_radius"]
max_tag_len = 128
# allocating memory
images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32)
tl_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
br_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
tl_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
br_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
tl_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64)
br_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64)
tag_masks = np.zeros((batch_size, max_tag_len), dtype=np.uint8)
tag_lens = np.zeros((batch_size,), dtype=np.int32)
db_size = db.db_inds.size
for b_ind in range(batch_size):
if not debug and k_ind == 0:
db.shuffle_inds()
db_ind = db.db_inds[k_ind]
k_ind = (k_ind + 1) % db_size
# reading image
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
# reading detections
detections = db.detections(db_ind)
# cropping an image randomly
if not debug and rand_crop:
image, detections = random_crop(image, detections, rand_scales, input_size, border=border)
image, detections = _resize_image(image, detections, input_size)
detections = _clip_detections(image, detections)
width_ratio = output_size[1] / input_size[1]
height_ratio = output_size[0] / input_size[0]
# flipping an image randomly
if not debug and np.random.uniform() > 0.5:
image[:] = image[:, ::-1, :]
width = image.shape[1]
detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1
if not debug:
image = image.astype(np.float32) / 255.
if rand_color:
color_jittering_(data_rng, image)
if lighting:
lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec)
normalize_(image, db.mean, db.std)
images[b_ind] = image.transpose((2, 0, 1))
for ind, detection in enumerate(detections):
category = int(detection[-1]) - 1
xtl, ytl = detection[0], detection[1]
xbr, ybr = detection[2], detection[3]
fxtl = (xtl * width_ratio)
fytl = (ytl * height_ratio)
fxbr = (xbr * width_ratio)
fybr = (ybr * height_ratio)
xtl = int(fxtl)
ytl = int(fytl)
xbr = int(fxbr)
ybr = int(fybr)
if gaussian_bump:
width = detection[2] - detection[0]
height = detection[3] - detection[1]
width = math.ceil(width * width_ratio)
height = math.ceil(height * height_ratio)
if gaussian_rad == -1:
radius = gaussian_radius((height, width), gaussian_iou)
radius = max(0, int(radius))
else:
radius = gaussian_rad
draw_gaussian(tl_heatmaps[b_ind, category], [xtl, ytl], radius)
draw_gaussian(br_heatmaps[b_ind, category], [xbr, ybr], radius)
else:
tl_heatmaps[b_ind, category, ytl, xtl] = 1
br_heatmaps[b_ind, category, ybr, xbr] = 1
tag_ind = tag_lens[b_ind]
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr]
tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl
br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr
tag_lens[b_ind] += 1
for b_ind in range(batch_size):
tag_len = tag_lens[b_ind]
tag_masks[b_ind, :tag_len] = 1
images = torch.from_numpy(images)
tl_heatmaps = torch.from_numpy(tl_heatmaps)
br_heatmaps = torch.from_numpy(br_heatmaps)
tl_regrs = torch.from_numpy(tl_regrs)
br_regrs = torch.from_numpy(br_regrs)
tl_tags = torch.from_numpy(tl_tags)
br_tags = torch.from_numpy(br_tags)
tag_masks = torch.from_numpy(tag_masks)
return {
"xs": [images],
"ys": [tl_heatmaps, br_heatmaps, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags]
}, k_ind

View File

@@ -1,293 +0,0 @@
import math
import cv2
import numpy as np
import torch
from .utils import draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_, crop_image
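# Despite the name, this returns the area ratio between the (cropped and clipped)
# boxes in `a_dets` and the corresponding original boxes in `b_dets`; it is used
# to check how much of each object remains visible after cropping.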
def bbox_overlaps(a_dets, b_dets):
a_widths = a_dets[:, 2] - a_dets[:, 0]
a_heights = a_dets[:, 3] - a_dets[:, 1]
a_areas = a_widths * a_heights
b_widths = b_dets[:, 2] - b_dets[:, 0]
b_heights = b_dets[:, 3] - b_dets[:, 1]
b_areas = b_widths * b_heights
return a_areas / b_areas
def clip_detections(border, detections):
detections = detections.copy()
y0, y1, x0, x1 = border
det_xs = detections[:, 0:4:2]
det_ys = detections[:, 1:4:2]
np.clip(det_xs, x0, x1 - 1, out=det_xs)
np.clip(det_ys, y0, y1 - 1, out=det_ys)
keep_inds = ((det_xs[:, 1] - det_xs[:, 0]) > 0) & \
((det_ys[:, 1] - det_ys[:, 0]) > 0)
keep_inds = np.where(keep_inds)[0]
return detections[keep_inds], keep_inds
def crop_image_dets(image, dets, ind, input_size, output_size=None, random_crop=True, rand_center=True):
if ind is not None:
det_x0, det_y0, det_x1, det_y1 = dets[ind, 0:4]
else:
det_x0, det_y0, det_x1, det_y1 = None, None, None, None
input_height, input_width = input_size
image_height, image_width = image.shape[0:2]
centered = rand_center and np.random.uniform() > 0.5
if not random_crop or image_width <= input_width:
xc = image_width // 2
elif ind is None or not centered:
xmin = max(det_x1 - input_width, 0) if ind is not None else 0
xmax = min(image_width - input_width, det_x0) if ind is not None else image_width - input_width
xrand = np.random.randint(int(xmin), int(xmax) + 1)
xc = xrand + input_width // 2
else:
xmin = max((det_x0 + det_x1) // 2 - np.random.randint(0, 15), 0)
xmax = min((det_x0 + det_x1) // 2 + np.random.randint(0, 15), image_width - 1)
xc = np.random.randint(int(xmin), int(xmax) + 1)
if not random_crop or image_height <= input_height:
yc = image_height // 2
elif ind is None or not centered:
ymin = max(det_y1 - input_height, 0) if ind is not None else 0
ymax = min(image_height - input_height, det_y0) if ind is not None else image_height - input_height
yrand = np.random.randint(int(ymin), int(ymax) + 1)
yc = yrand + input_height // 2
else:
ymin = max((det_y0 + det_y1) // 2 - np.random.randint(0, 15), 0)
ymax = min((det_y0 + det_y1) // 2 + np.random.randint(0, 15), image_height - 1)
yc = np.random.randint(int(ymin), int(ymax) + 1)
image, border, offset = crop_image(image, [yc, xc], input_size, output_size=output_size)
dets[:, 0:4:2] -= offset[1]
dets[:, 1:4:2] -= offset[0]
return image, dets, border
def scale_image_detections(image, dets, scale):
height, width = image.shape[0:2]
new_height = int(height * scale)
new_width = int(width * scale)
image = cv2.resize(image, (new_width, new_height))
dets = dets.copy()
dets[:, 0:4] *= scale
return image, dets
def ref_scale(detections, random_crop=False):
if detections.shape[0] == 0:
return None, None
if random_crop and np.random.uniform() > 0.7:
return None, None
ref_ind = np.random.randint(detections.shape[0])
ref_det = detections[ref_ind].copy()
ref_h = ref_det[3] - ref_det[1]
ref_w = ref_det[2] - ref_det[0]
ref_hw = max(ref_h, ref_w)
if ref_hw > 96:
return np.random.randint(low=96, high=255) / ref_hw, ref_ind
elif ref_hw > 32:
return np.random.randint(low=32, high=97) / ref_hw, ref_ind
return np.random.randint(low=16, high=33) / ref_hw, ref_ind
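# Mark each object's centre on the attention map whose size range covers the
# object's longer side; `ratio` maps image coordinates down to that attention
# map's resolution.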
def create_attention_mask(atts, ratios, sizes, detections):
for det in detections:
width = det[2] - det[0]
height = det[3] - det[1]
max_hw = max(width, height)
for att, ratio, size in zip(atts, ratios, sizes):
if max_hw >= size[0] and max_hw <= size[1]:
x = (det[0] + det[2]) / 2
y = (det[1] + det[3]) / 2
x = (x / ratio).astype(np.int32)
y = (y / ratio).astype(np.int32)
att[y, x] = 1
def cornernet_saccade(system_configs, db, k_ind, data_aug, debug):
data_rng = system_configs.data_rng
batch_size = system_configs.batch_size
categories = db.configs["categories"]
input_size = db.configs["input_size"]
output_size = db.configs["output_sizes"][0]
rand_scales = db.configs["rand_scales"]
rand_crop = db.configs["rand_crop"]
rand_center = db.configs["rand_center"]
view_sizes = db.configs["view_sizes"]
gaussian_iou = db.configs["gaussian_iou"]
gaussian_rad = db.configs["gaussian_radius"]
att_ratios = db.configs["att_ratios"]
att_ranges = db.configs["att_ranges"]
att_sizes = db.configs["att_sizes"]
min_scale = db.configs["min_scale"]
max_scale = db.configs["max_scale"]
max_objects = 128
images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32)
tl_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
br_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
tl_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
br_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
tl_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32)
br_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32)
tl_tags = np.zeros((batch_size, max_objects), dtype=np.int64)
br_tags = np.zeros((batch_size, max_objects), dtype=np.int64)
tag_masks = np.zeros((batch_size, max_objects), dtype=np.uint8)
tag_lens = np.zeros((batch_size,), dtype=np.int32)
attentions = [np.zeros((batch_size, 1, att_size[0], att_size[1]), dtype=np.float32) for att_size in att_sizes]
db_size = db.db_inds.size
for b_ind in range(batch_size):
if not debug and k_ind == 0:
# if k_ind == 0:
db.shuffle_inds()
db_ind = db.db_inds[k_ind]
k_ind = (k_ind + 1) % db_size
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
orig_detections = db.detections(db_ind)
keep_inds = np.arange(orig_detections.shape[0])
# clip the detections
detections = orig_detections.copy()
border = [0, image.shape[0], 0, image.shape[1]]
detections, clip_inds = clip_detections(border, detections)
keep_inds = keep_inds[clip_inds]
scale, ref_ind = ref_scale(detections, random_crop=rand_crop)
scale = np.random.choice(rand_scales) if scale is None else scale
orig_detections[:, 0:4:2] *= scale
orig_detections[:, 1:4:2] *= scale
image, detections = scale_image_detections(image, detections, scale)
ref_detection = detections[ref_ind].copy()
image, detections, border = crop_image_dets(image, detections, ref_ind, input_size, rand_center=rand_center)
detections, clip_inds = clip_detections(border, detections)
keep_inds = keep_inds[clip_inds]
width_ratio = output_size[1] / input_size[1]
height_ratio = output_size[0] / input_size[0]
# flipping an image randomly
if not debug and np.random.uniform() > 0.5:
image[:] = image[:, ::-1, :]
width = image.shape[1]
detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1
create_attention_mask([att[b_ind, 0] for att in attentions], att_ratios, att_ranges, detections)
if debug:
dimage = image.copy()
for det in detections.astype(np.int32):
cv2.rectangle(dimage,
(det[0], det[1]),
(det[2], det[3]),
(0, 255, 0), 2
)
cv2.imwrite('debug/{:03d}.jpg'.format(b_ind), dimage)
overlaps = bbox_overlaps(detections, orig_detections[keep_inds]) > 0.5
if not debug:
image = image.astype(np.float32) / 255.
color_jittering_(data_rng, image)
lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec)
normalize_(image, db.mean, db.std)
images[b_ind] = image.transpose((2, 0, 1))
for ind, (detection, overlap) in enumerate(zip(detections, overlaps)):
category = int(detection[-1]) - 1
xtl, ytl = detection[0], detection[1]
xbr, ybr = detection[2], detection[3]
det_height = int(ybr) - int(ytl)
det_width = int(xbr) - int(xtl)
det_max = max(det_height, det_width)
valid = det_max >= min_scale
fxtl = (xtl * width_ratio)
fytl = (ytl * height_ratio)
fxbr = (xbr * width_ratio)
fybr = (ybr * height_ratio)
xtl = int(fxtl)
ytl = int(fytl)
xbr = int(fxbr)
ybr = int(fybr)
width = detection[2] - detection[0]
height = detection[3] - detection[1]
width = math.ceil(width * width_ratio)
height = math.ceil(height * height_ratio)
if gaussian_rad == -1:
radius = gaussian_radius((height, width), gaussian_iou)
radius = max(0, int(radius))
else:
radius = gaussian_rad
if overlap and valid:
draw_gaussian(tl_heats[b_ind, category], [xtl, ytl], radius)
draw_gaussian(br_heats[b_ind, category], [xbr, ybr], radius)
tag_ind = tag_lens[b_ind]
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr]
tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl
br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr
tag_lens[b_ind] += 1
else:
draw_gaussian(tl_valids[b_ind, category], [xtl, ytl], radius)
draw_gaussian(br_valids[b_ind, category], [xbr, ybr], radius)
tl_valids = (tl_valids == 0).astype(np.float32)
br_valids = (br_valids == 0).astype(np.float32)
for b_ind in range(batch_size):
tag_len = tag_lens[b_ind]
tag_masks[b_ind, :tag_len] = 1
images = torch.from_numpy(images)
tl_heats = torch.from_numpy(tl_heats)
br_heats = torch.from_numpy(br_heats)
tl_regrs = torch.from_numpy(tl_regrs)
br_regrs = torch.from_numpy(br_regrs)
tl_tags = torch.from_numpy(tl_tags)
br_tags = torch.from_numpy(br_tags)
tag_masks = torch.from_numpy(tag_masks)
tl_valids = torch.from_numpy(tl_valids)
br_valids = torch.from_numpy(br_valids)
attentions = [torch.from_numpy(att) for att in attentions]
return {
"xs": [images],
"ys": [tl_heats, br_heats, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags, tl_valids, br_valids, attentions]
}, k_ind

View File

@@ -1,178 +0,0 @@
import random
import cv2
import numpy as np
def grayscale(image):
return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
def normalize_(image, mean, std):
image -= mean
image /= std
def lighting_(data_rng, image, alphastd, eigval, eigvec):
alpha = data_rng.normal(scale=alphastd, size=(3,))
image += np.dot(eigvec, eigval * alpha)
def blend_(alpha, image1, image2):
image1 *= alpha
image2 *= (1 - alpha)
image1 += image2
def saturation_(data_rng, image, gs, gs_mean, var):
alpha = 1. + data_rng.uniform(low=-var, high=var)
blend_(alpha, image, gs[:, :, None])
def brightness_(data_rng, image, gs, gs_mean, var):
alpha = 1. + data_rng.uniform(low=-var, high=var)
image *= alpha
def contrast_(data_rng, image, gs, gs_mean, var):
alpha = 1. + data_rng.uniform(low=-var, high=var)
blend_(alpha, image, gs_mean)
def color_jittering_(data_rng, image):
functions = [brightness_, contrast_, saturation_]
random.shuffle(functions)
gs = grayscale(image)
gs_mean = gs.mean()
for f in functions:
f(data_rng, image, gs, gs_mean, 0.4)
def gaussian2D(shape, sigma=1):
m, n = [(ss - 1.) / 2. for ss in shape]
y, x = np.ogrid[-m:m + 1, -n:n + 1]
h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
h[h < np.finfo(h.dtype).eps * h.max()] = 0
return h
def draw_gaussian(heatmap, center, radius, k=1):
diameter = 2 * radius + 1
gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6)
x, y = center
height, width = heatmap.shape[0:2]
left, right = min(x, radius), min(width - x, radius + 1)
top, bottom = min(y, radius), min(height - y, radius + 1)
masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right]
np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
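# Largest Gaussian radius such that a corner placed anywhere within it still
# produces a box with IoU >= min_overlap against the ground-truth box; the three
# quadratic cases cover both corners shrinking the box, both expanding it, and
# one of each.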
def gaussian_radius(det_size, min_overlap):
height, width = det_size
a1 = 1
b1 = (height + width)
c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1)
r1 = (b1 - sq1) / (2 * a1)
a2 = 4
b2 = 2 * (height + width)
c2 = (1 - min_overlap) * width * height
sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2)
r2 = (b2 - sq2) / (2 * a2)
a3 = 4 * min_overlap
b3 = -2 * min_overlap * (height + width)
c3 = (min_overlap - 1) * width * height
sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3)
r3 = (b3 + sq3) / (2 * a3)
return min(r1, r2, r3)
def _get_border(border, size):
i = 1
while size - border // i <= border // i:
i *= 2
return border // i
def random_crop(image, detections, random_scales, view_size, border=64):
view_height, view_width = view_size
image_height, image_width = image.shape[0:2]
scale = np.random.choice(random_scales)
height = int(view_height * scale)
width = int(view_width * scale)
cropped_image = np.zeros((height, width, 3), dtype=image.dtype)
w_border = _get_border(border, image_width)
h_border = _get_border(border, image_height)
ctx = np.random.randint(low=w_border, high=image_width - w_border)
cty = np.random.randint(low=h_border, high=image_height - h_border)
x0, x1 = max(ctx - width // 2, 0), min(ctx + width // 2, image_width)
y0, y1 = max(cty - height // 2, 0), min(cty + height // 2, image_height)
left_w, right_w = ctx - x0, x1 - ctx
top_h, bottom_h = cty - y0, y1 - cty
# crop image
cropped_ctx, cropped_cty = width // 2, height // 2
x_slice = slice(cropped_ctx - left_w, cropped_ctx + right_w)
y_slice = slice(cropped_cty - top_h, cropped_cty + bottom_h)
cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, :]
# crop detections
cropped_detections = detections.copy()
cropped_detections[:, 0:4:2] -= x0
cropped_detections[:, 1:4:2] -= y0
cropped_detections[:, 0:4:2] += cropped_ctx - left_w
cropped_detections[:, 1:4:2] += cropped_cty - top_h
return cropped_image, cropped_detections
def crop_image(image, center, size, output_size=None):
if output_size is None:
output_size = size
cty, ctx = center
height, width = size
o_height, o_width = output_size
im_height, im_width = image.shape[0:2]
cropped_image = np.zeros((o_height, o_width, 3), dtype=image.dtype)
x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width)
y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height)
left, right = ctx - x0, x1 - ctx
top, bottom = cty - y0, y1 - cty
cropped_cty, cropped_ctx = o_height // 2, o_width // 2
y_slice = slice(cropped_cty - top, cropped_cty + bottom)
x_slice = slice(cropped_ctx - left, cropped_ctx + right)
cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, :]
border = np.array([
cropped_cty - top,
cropped_cty + bottom,
cropped_ctx - left,
cropped_ctx + right
], dtype=np.float32)
offset = np.array([
cty - o_height // 2,
ctx - o_width // 2
])
return cropped_image, border, offset

View File

@@ -1,5 +0,0 @@
from .cornernet import cornernet
from .cornernet_saccade import cornernet_saccade
def test_func(sys_config, db, nnet, result_dir, debug=False):
return globals()[sys_config.sampling_function](db, nnet, result_dir, debug=debug)

View File

@@ -1,180 +0,0 @@
import json
import os
import cv2
import numpy as np
import torch
from tqdm import tqdm
from ..external.nms import soft_nms, soft_nms_merge
from ..sample.utils import crop_image
from ..utils import Timer
from ..vis_utils import draw_bboxes
def rescale_dets_(detections, ratios, borders, sizes):
xs, ys = detections[..., 0:4:2], detections[..., 1:4:2]
xs /= ratios[:, 1][:, None, None]
ys /= ratios[:, 0][:, None, None]
xs -= borders[:, 2][:, None, None]
ys -= borders[:, 0][:, None, None]
np.clip(xs, 0, sizes[:, 1][:, None, None], out=xs)
np.clip(ys, 0, sizes[:, 0][:, None, None], out=ys)
def decode(nnet, images, K, ae_threshold=0.5, kernel=3, num_dets=1000):
detections = nnet.test([images], ae_threshold=ae_threshold, test=True, K=K, kernel=kernel, num_dets=num_dets)[0]
return detections.data.cpu().numpy()
def cornernet(db, nnet, result_dir, debug=False, decode_func=decode):
debug_dir = os.path.join(result_dir, "debug")
if not os.path.exists(debug_dir):
os.makedirs(debug_dir)
if db.split != "trainval2014":
db_inds = db.db_inds[:100] if debug else db.db_inds
else:
db_inds = db.db_inds[:100] if debug else db.db_inds[:5000]
num_images = db_inds.size
categories = db.configs["categories"]
timer = Timer()
top_bboxes = {}
for ind in tqdm(range(0, num_images), ncols=80, desc="locating kps"):
db_ind = db_inds[ind]
image_id = db.image_ids(db_ind)
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
timer.tic()
top_bboxes[image_id] = cornernet_inference(db, nnet, image)
timer.toc()
if debug:
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
bboxes = {
db.cls2name(j): top_bboxes[image_id][j]
for j in range(1, categories + 1)
}
image = draw_bboxes(image, bboxes)
debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind))
cv2.imwrite(debug_file, image)
print('average time: {}'.format(timer.average_time))
result_json = os.path.join(result_dir, "results.json")
detections = db.convert_to_coco(top_bboxes)
with open(result_json, "w") as f:
json.dump(detections, f)
cls_ids = list(range(1, categories + 1))
image_ids = [db.image_ids(ind) for ind in db_inds]
db.evaluate(result_json, cls_ids, image_ids)
return 0
def cornernet_inference(db, nnet, image, decode_func=decode):
K = db.configs["top_k"]
ae_threshold = db.configs["ae_threshold"]
nms_kernel = db.configs["nms_kernel"]
num_dets = db.configs["num_dets"]
test_flipped = db.configs["test_flipped"]
input_size = db.configs["input_size"]
output_size = db.configs["output_sizes"][0]
scales = db.configs["test_scales"]
weight_exp = db.configs["weight_exp"]
merge_bbox = db.configs["merge_bbox"]
categories = db.configs["categories"]
nms_threshold = db.configs["nms_threshold"]
max_per_image = db.configs["max_per_image"]
nms_algorithm = {
"nms": 0,
"linear_soft_nms": 1,
"exp_soft_nms": 2
}[db.configs["nms_algorithm"]]
height, width = image.shape[0:2]
height_scale = (input_size[0] + 1) // output_size[0]
width_scale = (input_size[1] + 1) // output_size[1]
im_mean = torch.cuda.FloatTensor(db.mean).reshape(1, 3, 1, 1)
im_std = torch.cuda.FloatTensor(db.std).reshape(1, 3, 1, 1)
detections = []
for scale in scales:
new_height = int(height * scale)
new_width = int(width * scale)
new_center = np.array([new_height // 2, new_width // 2])
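# `x | 127` rounds x up so that x + 1 is a multiple of 128 (it sets the seven
# lowest bits), keeping the padded input a size the hourglass backbone can
# downsample cleanly.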
inp_height = new_height | 127
inp_width = new_width | 127
images = np.zeros((1, 3, inp_height, inp_width), dtype=np.float32)
ratios = np.zeros((1, 2), dtype=np.float32)
borders = np.zeros((1, 4), dtype=np.float32)
sizes = np.zeros((1, 2), dtype=np.float32)
out_height, out_width = (inp_height + 1) // height_scale, (inp_width + 1) // width_scale
height_ratio = out_height / inp_height
width_ratio = out_width / inp_width
resized_image = cv2.resize(image, (new_width, new_height))
resized_image, border, offset = crop_image(resized_image, new_center, [inp_height, inp_width])
resized_image = resized_image / 255.
images[0] = resized_image.transpose((2, 0, 1))
borders[0] = border
sizes[0] = [int(height * scale), int(width * scale)]
ratios[0] = [height_ratio, width_ratio]
if test_flipped:
images = np.concatenate((images, images[:, :, :, ::-1]), axis=0)
images = torch.from_numpy(images).cuda()
images -= im_mean
images /= im_std
dets = decode_func(nnet, images, K, ae_threshold=ae_threshold, kernel=nms_kernel, num_dets=num_dets)
if test_flipped:
dets[1, :, [0, 2]] = out_width - dets[1, :, [2, 0]]
dets = dets.reshape(1, -1, 8)
rescale_dets_(dets, ratios, borders, sizes)
dets[:, :, 0:4] /= scale
detections.append(dets)
detections = np.concatenate(detections, axis=1)
classes = detections[..., -1]
classes = classes[0]
detections = detections[0]
# reject detections with negative scores
keep_inds = (detections[:, 4] > -1)
detections = detections[keep_inds]
classes = classes[keep_inds]
top_bboxes = {}
for j in range(categories):
keep_inds = (classes == j)
top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32)
if merge_bbox:
soft_nms_merge(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, weight_exp=weight_exp)
else:
soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm)
top_bboxes[j + 1] = top_bboxes[j + 1][:, 0:5]
scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)])
if len(scores) > max_per_image:
kth = len(scores) - max_per_image
thresh = np.partition(scores, kth)[kth]
for j in range(1, categories + 1):
keep_inds = (top_bboxes[j][:, -1] >= thresh)
top_bboxes[j] = top_bboxes[j][keep_inds]
return top_bboxes

View File

@@ -1,405 +0,0 @@
import json
import math
import os
import cv2
import numpy as np
import torch
import torch.nn as nn
from tqdm import tqdm
from ..external.nms import soft_nms
from ..utils import Timer
from ..vis_utils import draw_bboxes
def crop_image_gpu(image, center, size, out_image):
cty, ctx = center
height, width = size
o_height, o_width = out_image.shape[1:3]
im_height, im_width = image.shape[1:3]
scale = o_height / max(height, width)
x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width)
y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height)
left, right = ctx - x0, x1 - ctx
top, bottom = cty - y0, y1 - cty
cropped_cty, cropped_ctx = o_height // 2, o_width // 2
out_y0, out_y1 = cropped_cty - int(top * scale), cropped_cty + int(bottom * scale)
out_x0, out_x1 = cropped_ctx - int(left * scale), cropped_ctx + int(right * scale)
new_height = out_y1 - out_y0
new_width = out_x1 - out_x0
image = image[:, y0:y1, x0:x1].unsqueeze(0)
out_image[:, out_y0:out_y1, out_x0:out_x1] = nn.functional.interpolate(
image, size=[new_height, new_width], mode='bilinear'
)[0]
return np.array([cty - height // 2, ctx - width // 2])
def remap_dets_(detections, scales, offsets):
xs, ys = detections[..., 0:4:2], detections[..., 1:4:2]
xs /= scales.reshape(-1, 1, 1)
ys /= scales.reshape(-1, 1, 1)
xs += offsets[:, 1][:, None, None]
ys += offsets[:, 0][:, None, None]
def att_nms(atts, ks):
pads = [(k - 1) // 2 for k in ks]
pools = [nn.functional.max_pool2d(att, (k, k), stride=1, padding=pad) for k, att, pad in zip(ks, atts, pads)]
keeps = [(att == pool).float() for att, pool in zip(atts, pools)]
atts = [att * keep for att, keep in zip(atts, keeps)]
return atts
def batch_decode(db, nnet, images, no_att=False):
K = db.configs["top_k"]
ae_threshold = db.configs["ae_threshold"]
kernel = db.configs["nms_kernel"]
num_dets = db.configs["num_dets"]
att_nms_ks = db.configs["att_nms_ks"]
att_ranges = db.configs["att_ranges"]
num_images = images.shape[0]
detections = []
attentions = [[] for _ in range(len(att_ranges))]
batch_size = 32
for b_ind in range(math.ceil(num_images / batch_size)):
b_start = b_ind * batch_size
b_end = min(num_images, (b_ind + 1) * batch_size)
b_images = images[b_start:b_end]
b_outputs = nnet.test(
[b_images], ae_threshold=ae_threshold, K=K, kernel=kernel,
test=True, num_dets=num_dets, no_border=True, no_att=no_att
)
if no_att:
b_detections = b_outputs
else:
b_detections = b_outputs[0]
b_attentions = b_outputs[1]
b_attentions = att_nms(b_attentions, att_nms_ks)
b_attentions = [b_attention.data.cpu().numpy() for b_attention in b_attentions]
b_detections = b_detections.data.cpu().numpy()
detections.append(b_detections)
if not no_att:
for attention, b_attention in zip(attentions, b_attentions):
attention.append(b_attention)
if not no_att:
attentions = [np.concatenate(atts, axis=0) for atts in attentions] if detections else None
detections = np.concatenate(detections, axis=0) if detections else np.zeros((0, num_dets, 8))
return detections, attentions
def decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh, ignore_same=False):
att_ranges = db.configs["att_ranges"]
att_ratios = db.configs["att_ratios"]
input_size = db.configs["input_size"]
next_ys, next_xs, next_scales, next_scores = [], [], [], []
num_atts = atts[0].shape[0]
for aind in range(num_atts):
for att, att_range, att_ratio, att_scale in zip(atts, att_ranges, att_ratios, att_scales):
ys, xs = np.where(att[aind, 0] > thresh)
scores = att[aind, 0, ys, xs]
ys = ys * att_ratio / scales[aind] + offsets[aind, 0]
xs = xs * att_ratio / scales[aind] + offsets[aind, 1]
keep = (ys >= 0) & (ys < height) & (xs >= 0) & (xs < width)
ys, xs, scores = ys[keep], xs[keep], scores[keep]
next_scale = att_scale * scales[aind]
if (ignore_same and att_scale <= 1) or scales[aind] > 2 or next_scale > 4:
continue
next_scales += [next_scale] * len(xs)
next_scores += scores.tolist()
next_ys += ys.tolist()
next_xs += xs.tolist()
next_ys = np.array(next_ys)
next_xs = np.array(next_xs)
next_scales = np.array(next_scales)
next_scores = np.array(next_scores)
return np.stack((next_ys, next_xs, next_scales, next_scores), axis=1)
def get_ref_locs(dets):
keep = dets[:, 4] > 0.5
dets = dets[keep]
ref_xs = (dets[:, 0] + dets[:, 2]) / 2
ref_ys = (dets[:, 1] + dets[:, 3]) / 2
ref_maxhws = np.maximum(dets[:, 2] - dets[:, 0], dets[:, 3] - dets[:, 1])
ref_scales = np.zeros_like(ref_maxhws)
ref_scores = dets[:, 4]
large_inds = ref_maxhws > 96
medium_inds = (ref_maxhws > 32) & (ref_maxhws <= 96)
small_inds = ref_maxhws <= 32
ref_scales[large_inds] = 192 / ref_maxhws[large_inds]
ref_scales[medium_inds] = 64 / ref_maxhws[medium_inds]
ref_scales[small_inds] = 24 / ref_maxhws[small_inds]
new_locations = np.stack((ref_ys, ref_xs, ref_scales, ref_scores), axis=1)
new_locations[:, 3] = 1
return new_locations
def get_locs(db, nnet, image, im_mean, im_std, att_scales, thresh, sizes, ref_dets=True):
att_ranges = db.configs["att_ranges"]
att_ratios = db.configs["att_ratios"]
input_size = db.configs["input_size"]
height, width = image.shape[1:3]
locations = []
for size in sizes:
scale = size / max(height, width)
location = [height // 2, width // 2, scale]
locations.append(location)
locations = np.array(locations, dtype=np.float32)
images, offsets = prepare_images(db, image, locations, flipped=False)
images -= im_mean
images /= im_std
dets, atts = batch_decode(db, nnet, images)
scales = locations[:, 2]
next_locations = decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh)
rescale_dets_(db, dets)
remap_dets_(dets, scales, offsets)
dets = dets.reshape(-1, 8)
keep = dets[:, 4] > 0.3
dets = dets[keep]
if ref_dets:
ref_locations = get_ref_locs(dets)
next_locations = np.concatenate((next_locations, ref_locations), axis=0)
next_locations = location_nms(next_locations, thresh=16)
return dets, next_locations, atts
def location_nms(locations, thresh=15):
next_locations = []
sorted_inds = np.argsort(locations[:, -1])[::-1]
locations = locations[sorted_inds]
ys = locations[:, 0]
xs = locations[:, 1]
scales = locations[:, 2]
dist_ys = np.absolute(ys.reshape(-1, 1) - ys.reshape(1, -1))
dist_xs = np.absolute(xs.reshape(-1, 1) - xs.reshape(1, -1))
dists = np.minimum(dist_ys, dist_xs)
ratios = scales.reshape(-1, 1) / scales.reshape(1, -1)
while dists.shape[0] > 0:
next_locations.append(locations[0])
scale = scales[0]
dist = dists[0]
ratio = ratios[0]
keep = (dist > (thresh / scale)) | (ratio > 1.2) | (ratio < 0.8)
locations = locations[keep]
scales = scales[keep]
dists = dists[keep, :]
dists = dists[:, keep]
ratios = ratios[keep, :]
ratios = ratios[:, keep]
return np.stack(next_locations) if next_locations else np.zeros((0, 4))
def prepare_images(db, image, locs, flipped=True):
input_size = db.configs["input_size"]
num_patches = locs.shape[0]
images = torch.zeros((num_patches, 3, input_size[0], input_size[1]), dtype=torch.float32, device='cuda')
offsets = np.zeros((num_patches, 2), dtype=np.float32)
for ind, (y, x, scale) in enumerate(locs[:, :3]):
crop_height = int(input_size[0] / scale)
crop_width = int(input_size[1] / scale)
offsets[ind] = crop_image_gpu(image, [int(y), int(x)], [crop_height, crop_width], images[ind])
return images, offsets
def rescale_dets_(db, dets):
input_size = db.configs["input_size"]
output_size = db.configs["output_sizes"][0]
ratios = [o / i for o, i in zip(output_size, input_size)]
dets[..., 0:4:2] /= ratios[1]
dets[..., 1:4:2] /= ratios[0]
def cornernet_saccade(db, nnet, result_dir, debug=False, decode_func=batch_decode):
debug_dir = os.path.join(result_dir, "debug")
if not os.path.exists(debug_dir):
os.makedirs(debug_dir)
if db.split != "trainval2014":
db_inds = db.db_inds[:500] if debug else db.db_inds
else:
db_inds = db.db_inds[:100] if debug else db.db_inds[:5000]
num_images = db_inds.size
categories = db.configs["categories"]
timer = Timer()
top_bboxes = {}
for k_ind in tqdm(range(0, num_images), ncols=80, desc="locating kps"):
db_ind = db_inds[k_ind]
image_id = db.image_ids(db_ind)
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
timer.tic()
top_bboxes[image_id] = cornernet_saccade_inference(db, nnet, image)
timer.toc()
if debug:
image_path = db.image_path(db_ind)
image = cv2.imread(image_path)
bboxes = {
db.cls2name(j): top_bboxes[image_id][j]
for j in range(1, categories + 1)
}
image = draw_bboxes(image, bboxes)
debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind))
cv2.imwrite(debug_file, image)
print('average time: {}'.format(timer.average_time))
result_json = os.path.join(result_dir, "results.json")
detections = db.convert_to_coco(top_bboxes)
with open(result_json, "w") as f:
json.dump(detections, f)
cls_ids = list(range(1, categories + 1))
image_ids = [db.image_ids(ind) for ind in db_inds]
db.evaluate(result_json, cls_ids, image_ids)
return 0
def cornernet_saccade_inference(db, nnet, image, decode_func=batch_decode):
init_sizes = db.configs["init_sizes"]
ref_dets = db.configs["ref_dets"]
att_thresholds = db.configs["att_thresholds"]
att_scales = db.configs["att_scales"]
att_max_crops = db.configs["att_max_crops"]
categories = db.configs["categories"]
nms_threshold = db.configs["nms_threshold"]
max_per_image = db.configs["max_per_image"]
nms_algorithm = {
"nms": 0,
"linear_soft_nms": 1,
"exp_soft_nms": 2
}[db.configs["nms_algorithm"]]
num_iterations = len(att_thresholds)
im_mean = torch.tensor(db.mean, dtype=torch.float32, device='cuda').reshape(1, 3, 1, 1)
im_std = torch.tensor(db.std, dtype=torch.float32, device='cuda').reshape(1, 3, 1, 1)
height, width = image.shape[0:2]
image = image / 255.
image = image.transpose((2, 0, 1)).copy()
image = torch.from_numpy(image).cuda(non_blocking=True)
dets, locations, atts = get_locs(
db, nnet, image, im_mean, im_std,
att_scales[0], att_thresholds[0],
init_sizes, ref_dets=ref_dets
)
detections = [dets]
num_patches = locations.shape[0]
num_crops = 0
for ind in range(1, num_iterations + 1):
if num_patches == 0:
break
if num_crops + num_patches > att_max_crops:
max_crops = min(att_max_crops - num_crops, num_patches)
locations = locations[:max_crops]
num_patches = locations.shape[0]
num_crops += locations.shape[0]
no_att = (ind == num_iterations)
images, offsets = prepare_images(db, image, locations, flipped=False)
images -= im_mean
images /= im_std
dets, atts = decode_func(db, nnet, images, no_att=no_att)
dets = dets.reshape(num_patches, -1, 8)
rescale_dets_(db, dets)
remap_dets_(dets, locations[:, 2], offsets)
dets = dets.reshape(-1, 8)
keeps = (dets[:, 4] > -1)
dets = dets[keeps]
detections.append(dets)
if num_crops == att_max_crops:
break
if ind < num_iterations:
att_threshold = att_thresholds[ind]
att_scale = att_scales[ind]
next_locations = decode_atts(
db, atts, att_scale, locations[:, 2], offsets, height, width, att_threshold, ignore_same=True
)
if ref_dets:
ref_locations = get_ref_locs(dets)
next_locations = np.concatenate((next_locations, ref_locations), axis=0)
next_locations = location_nms(next_locations, thresh=16)
locations = next_locations
num_patches = locations.shape[0]
detections = np.concatenate(detections, axis=0)
classes = detections[..., -1]
top_bboxes = {}
for j in range(categories):
keep_inds = (classes == j)
top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32)
keep_inds = soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, sigma=0.7)
top_bboxes[j + 1] = top_bboxes[j + 1][keep_inds, 0:5]
scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)])
if len(scores) > max_per_image:
kth = len(scores) - max_per_image
thresh = np.partition(scores, kth)[kth]
for j in range(1, categories + 1):
keep_inds = (top_bboxes[j][:, -1] >= thresh)
top_bboxes[j] = top_bboxes[j][keep_inds]
return top_bboxes
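`att_nms` above keeps only the local maxima of each attention map by comparing it against a max-pooled copy, the same peak-NMS idea used on corner heatmaps. A standalone sketch of that operation (the 3x3 window and random input are illustrative):

```python
import torch
import torch.nn as nn

def peak_nms(heatmap, kernel=3):
    """Zero out every value that is not the maximum of its kernel x kernel
    neighbourhood; heatmap has shape (N, C, H, W)."""
    pad = (kernel - 1) // 2
    pooled = nn.functional.max_pool2d(heatmap, (kernel, kernel), stride=1, padding=pad)
    keep = (pooled == heatmap).float()
    return heatmap * keep

atts = torch.rand(1, 1, 8, 8)      # stand-in attention map
peaks = peak_nms(atts, kernel=3)   # only local maxima remain non-zero
```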

View File

@@ -1,2 +0,0 @@
from .tqdm import stdout_to_tqdm
from .timer import Timer

View File

@@ -1,27 +0,0 @@
import time
class Timer(object):
"""A simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
def tic(self):
# using time.time instead of time.clock because time.clock
# does not normalize for multithreading
self.start_time = time.time()
def toc(self, average=True):
self.diff = time.time() - self.start_time
self.total_time += self.diff
self.calls += 1
self.average_time = self.total_time / self.calls
if average:
return self.average_time
else:
return self.diff
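A short usage sketch of the `Timer` class above (the `time.sleep` call stands in for model inference):

```python
import time

timer = Timer()  # assumes the Timer class defined above is in scope
for _ in range(5):
    timer.tic()
    time.sleep(0.01)
    timer.toc()
print("average time: {:.4f}s over {} calls".format(timer.average_time, timer.calls))
```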

View File

@@ -1,27 +0,0 @@
import contextlib
import sys
from tqdm import tqdm
class TqdmFile(object):
dummy_file = None
def __init__(self, dummy_file):
self.dummy_file = dummy_file
def write(self, x):
if len(x.rstrip()) > 0:
tqdm.write(x, file=self.dummy_file)
@contextlib.contextmanager
def stdout_to_tqdm():
save_stdout = sys.stdout
try:
sys.stdout = TqdmFile(sys.stdout)
yield save_stdout
except Exception as exc:
raise exc
finally:
sys.stdout = save_stdout
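A minimal usage sketch of `stdout_to_tqdm`: inside the context, `print` is routed through `tqdm.write`, so log lines no longer break the progress bar (the loop body is illustrative):

```python
from tqdm import tqdm

# assumes stdout_to_tqdm defined above is in scope
with stdout_to_tqdm() as save_stdout:
    for i in tqdm(range(100), file=save_stdout, ncols=80):
        if i % 25 == 0:
            print("checkpoint at iteration {}".format(i))
```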

View File

@@ -1,63 +0,0 @@
import cv2
import numpy as np
def draw_bboxes(image, bboxes, font_size=0.5, thresh=0.5, colors=None):
"""Draws bounding boxes on an image.
Args:
image: An image in OpenCV format
bboxes: A dictionary representing bounding boxes of different object
categories, where the keys are the names of the categories and the
values are the bounding boxes. The bounding boxes of each category should
be stored in a 2D NumPy array, where each row is a bounding box (x1, y1,
x2, y2, score).
font_size: (Optional) Font size of the category names.
thresh: (Optional) Only bounding boxes with scores above the threshold
will be drawn.
colors: (Optional) Colors of the bounding boxes for each category. If not
provided, this function uses a random color for each category.
Returns:
An image with bounding boxes.
"""
image = image.copy()
for cat_name in bboxes:
keep_inds = bboxes[cat_name][:, -1] > thresh
cat_size = cv2.getTextSize(cat_name, cv2.FONT_HERSHEY_SIMPLEX, font_size, 2)[0]
if colors is None:
color = np.random.random((3,)) * 0.6 + 0.4
color = (color * 255).astype(np.int32).tolist()
else:
color = colors[cat_name]
for bbox in bboxes[cat_name][keep_inds]:
bbox = bbox[0:4].astype(np.int32)
if bbox[1] - cat_size[1] - 2 < 0:
cv2.rectangle(image,
(bbox[0], bbox[1] + 2),
(bbox[0] + cat_size[0], bbox[1] + cat_size[1] + 2),
color, -1
)
cv2.putText(image, cat_name,
(bbox[0], bbox[1] + cat_size[1] + 2),
cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1
)
else:
cv2.rectangle(image,
(bbox[0], bbox[1] - cat_size[1] - 2),
(bbox[0] + cat_size[0], bbox[1] - 2),
color, -1
)
cv2.putText(image, cat_name,
(bbox[0], bbox[1] - 2),
cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1
)
cv2.rectangle(image,
(bbox[0], bbox[1]),
(bbox[2], bbox[3]),
color, 2
)
return image
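A usage sketch of `draw_bboxes` matching the dictionary format described in its docstring (the file names and the single box are illustrative):

```python
import cv2
import numpy as np

image = cv2.imread("demo.jpg")
bboxes = {
    # category name -> (N, 5) array of [x1, y1, x2, y2, score]
    "person": np.array([[120.0, 80.0, 260.0, 400.0, 0.92]], dtype=np.float32),
}
vis = draw_bboxes(image, bboxes, thresh=0.5)  # assumes draw_bboxes above
cv2.imwrite("demo_out.jpg", vis)
```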

Binary file not shown.

(image removed; previous size: 316 KiB)

View File

@@ -1,13 +0,0 @@
#!/usr/bin/env python
import cv2
from core.detectors import CornerNet_Saccade
from core.vis_utils import draw_bboxes
detector = CornerNet_Saccade()
image = cv2.imread("demo.jpg")
bboxes = detector(image)
image = draw_bboxes(image, bboxes)
cv2.imwrite("demo_out.jpg", image)

View File

@@ -1,16 +0,0 @@
import numpy as np
from object_detection import CornerNet_Saccade
from util import image_util
def capture_target_area(image, target="book"):
detector = CornerNet_Saccade()
bboxes = detector(image)
target_images = []
keep_inds = bboxes[target][:, -1] > 0.5
for bbox in bboxes[target][keep_inds]:
bbox = bbox[0:4].astype(np.int32)
bbox = np.clip(bbox, 0, None)
target_images.append(image_util.capture(image, bbox))
return target_images
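This snippet depends on a project-specific helper, `util.image_util.capture`, that is not shown in the diff. A plausible minimal stand-in would simply crop the box region from the image, roughly as sketched below; this is an assumption, not the project's actual implementation:

```python
import numpy as np

def capture(image, bbox):
    """Hypothetical stand-in for util.image_util.capture: crop the
    [x1, y1, x2, y2] region from an OpenCV image of shape (H, W, C)."""
    x1, y1, x2, y2 = bbox.astype(np.int32)
    h, w = image.shape[:2]
    x1, x2 = np.clip([x1, x2], 0, w)
    y1, y2 = np.clip([y1, y2], 0, h)
    return image[y1:y2, x1:x2].copy()
```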

View File

@@ -1,110 +0,0 @@
#!/usr/bin/env python
import argparse
import importlib
import json
import os
import pprint
import torch
from core.config import SystemConfig
from core.dbs import datasets
from core.nnet.py_factory import NetworkFactory
from core.test import test_func
torch.backends.cudnn.benchmark = False
def parse_args():
parser = argparse.ArgumentParser(description="Evaluation Script")
parser.add_argument("cfg_file", help="config file", type=str)
parser.add_argument("--testiter", dest="testiter",
help="test at iteration i",
default=None, type=int)
parser.add_argument("--split", dest="split",
help="which split to use",
default="validation", type=str)
parser.add_argument("--suffix", dest="suffix", default=None, type=str)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
return args
def make_dirs(directories):
for directory in directories:
if not os.path.exists(directory):
os.makedirs(directory)
def test(db, system_config, model, args):
split = args.split
testiter = args.testiter
debug = args.debug
suffix = args.suffix
result_dir = system_config.result_dir
result_dir = os.path.join(result_dir, str(testiter), split)
if suffix is not None:
result_dir = os.path.join(result_dir, suffix)
make_dirs([result_dir])
test_iter = system_config.max_iter if testiter is None else testiter
print("loading parameters at iteration: {}".format(test_iter))
print("building neural network...")
nnet = NetworkFactory(system_config, model)
print("loading parameters...")
nnet.load_params(test_iter)
nnet.cuda()
nnet.eval_mode()
test_func(system_config, db, nnet, result_dir, debug=debug)
def main(args):
if args.suffix is None:
cfg_file = os.path.join("./configs", args.cfg_file + ".json")
else:
cfg_file = os.path.join("./configs", args.cfg_file + "-{}.json".format(args.suffix))
print("cfg_file: {}".format(cfg_file))
with open(cfg_file, "r") as f:
config = json.load(f)
config["system"]["snapshot_name"] = args.cfg_file
system_config = SystemConfig().update_config(config["system"])
model_file = "core.models.{}".format(args.cfg_file)
model_file = importlib.import_module(model_file)
model = model_file.model()
train_split = system_config.train_split
val_split = system_config.val_split
test_split = system_config.test_split
split = {
"training": train_split,
"validation": val_split,
"testing": test_split
}[args.split]
print("loading all datasets...")
dataset = system_config.dataset
print("split: {}".format(split))
testing_db = datasets[dataset](config["db"], split=split, sys_config=system_config)
print("system config...")
pprint.pprint(system_config.full)
print("db config...")
pprint.pprint(testing_db.configs)
test(testing_db, system_config, model, args)
if __name__ == "__main__":
args = parse_args()
main(args)

View File

@@ -1,260 +0,0 @@
#!/usr/bin/env python
import argparse
import importlib
import json
import os
import pprint
import queue
import threading
import traceback
import numpy as np
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.multiprocessing import Process, Queue
from tqdm import tqdm
from core.config import SystemConfig
from core.dbs import datasets
from core.nnet.py_factory import NetworkFactory
from core.sample import data_sampling_func
from core.utils import stdout_to_tqdm
torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True
def parse_args():
parser = argparse.ArgumentParser(description="Training Script")
parser.add_argument("cfg_file", help="config file", type=str)
parser.add_argument("--iter", dest="start_iter",
help="train at iteration i",
default=0, type=int)
parser.add_argument("--workers", default=4, type=int)
parser.add_argument("--initialize", action="store_true")
parser.add_argument("--distributed", action="store_true")
parser.add_argument("--world-size", default=-1, type=int,
help="number of nodes of distributed training")
parser.add_argument("--rank", default=0, type=int,
help="node rank for distributed training")
parser.add_argument("--dist-url", default=None, type=str,
help="url used to set up distributed training")
parser.add_argument("--dist-backend", default="nccl", type=str)
args = parser.parse_args()
return args
def prefetch_data(system_config, db, queue, sample_data, data_aug):
ind = 0
print("start prefetching data...")
np.random.seed(os.getpid())
while True:
try:
data, ind = sample_data(system_config, db, ind, data_aug=data_aug)
queue.put(data)
except Exception as e:
traceback.print_exc()
raise e
def _pin_memory(ts):
if type(ts) is list:
return [t.pin_memory() for t in ts]
return ts.pin_memory()
def pin_memory(data_queue, pinned_data_queue, sema):
while True:
data = data_queue.get()
data["xs"] = [_pin_memory(x) for x in data["xs"]]
data["ys"] = [_pin_memory(y) for y in data["ys"]]
pinned_data_queue.put(data)
if sema.acquire(blocking=False):
return
def init_parallel_jobs(system_config, dbs, queue, fn, data_aug):
tasks = [Process(target=prefetch_data, args=(system_config, db, queue, fn, data_aug)) for db in dbs]
for task in tasks:
task.daemon = True
task.start()
return tasks
def terminate_tasks(tasks):
for task in tasks:
task.terminate()
def train(training_dbs, validation_db, system_config, model, args):
# reading arguments from command
start_iter = args.start_iter
distributed = args.distributed
world_size = args.world_size
initialize = args.initialize
gpu = args.gpu
rank = args.rank
# reading arguments from json file
batch_size = system_config.batch_size
learning_rate = system_config.learning_rate
max_iteration = system_config.max_iter
pretrained_model = system_config.pretrain
stepsize = system_config.stepsize
snapshot = system_config.snapshot
val_iter = system_config.val_iter
display = system_config.display
decay_rate = system_config.decay_rate
stepsize = system_config.stepsize
print("Process {}: building model...".format(rank))
nnet = NetworkFactory(system_config, model, distributed=distributed, gpu=gpu)
if initialize:
nnet.save_params(0)
exit(0)
# queues storing data for training
training_queue = Queue(system_config.prefetch_size)
validation_queue = Queue(5)
# queues storing pinned data for training
pinned_training_queue = queue.Queue(system_config.prefetch_size)
pinned_validation_queue = queue.Queue(5)
# allocating resources for parallel reading
training_tasks = init_parallel_jobs(system_config, training_dbs, training_queue, data_sampling_func, True)
if val_iter:
validation_tasks = init_parallel_jobs(system_config, [validation_db], validation_queue, data_sampling_func,
False)
training_pin_semaphore = threading.Semaphore()
validation_pin_semaphore = threading.Semaphore()
training_pin_semaphore.acquire()
validation_pin_semaphore.acquire()
training_pin_args = (training_queue, pinned_training_queue, training_pin_semaphore)
training_pin_thread = threading.Thread(target=pin_memory, args=training_pin_args)
training_pin_thread.daemon = True
training_pin_thread.start()
validation_pin_args = (validation_queue, pinned_validation_queue, validation_pin_semaphore)
validation_pin_thread = threading.Thread(target=pin_memory, args=validation_pin_args)
validation_pin_thread.daemon = True
validation_pin_thread.start()
if pretrained_model is not None:
if not os.path.exists(pretrained_model):
raise ValueError("pretrained model does not exist")
print("Process {}: loading from pretrained model".format(rank))
nnet.load_pretrained_params(pretrained_model)
if start_iter:
nnet.load_params(start_iter)
learning_rate /= (decay_rate ** (start_iter // stepsize))
nnet.set_lr(learning_rate)
print("Process {}: training starts from iteration {} with learning_rate {}".format(rank, start_iter + 1,
learning_rate))
else:
nnet.set_lr(learning_rate)
if rank == 0:
print("training start...")
nnet.cuda()
nnet.train_mode()
with stdout_to_tqdm() as save_stdout:
for iteration in tqdm(range(start_iter + 1, max_iteration + 1), file=save_stdout, ncols=80):
training = pinned_training_queue.get(block=True)
training_loss = nnet.train(**training)
if display and iteration % display == 0:
print("Process {}: training loss at iteration {}: {}".format(rank, iteration, training_loss.item()))
del training_loss
if val_iter and validation_db.db_inds.size and iteration % val_iter == 0:
nnet.eval_mode()
validation = pinned_validation_queue.get(block=True)
validation_loss = nnet.validate(**validation)
print("Process {}: validation loss at iteration {}: {}".format(rank, iteration, validation_loss.item()))
nnet.train_mode()
if iteration % snapshot == 0 and rank == 0:
nnet.save_params(iteration)
if iteration % stepsize == 0:
learning_rate /= decay_rate
nnet.set_lr(learning_rate)
# sending signal to kill the thread
training_pin_semaphore.release()
validation_pin_semaphore.release()
# terminating data fetching processes
terminate_tasks(training_tasks)
if val_iter:
    terminate_tasks(validation_tasks)  # only created when val_iter is set
def main(gpu, ngpus_per_node, args):
args.gpu = gpu
if args.distributed:
args.rank = args.rank * ngpus_per_node + gpu
dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
rank = args.rank
cfg_file = os.path.join("./configs", args.cfg_file + ".json")
with open(cfg_file, "r") as f:
config = json.load(f)
config["system"]["snapshot_name"] = args.cfg_file
system_config = SystemConfig().update_config(config["system"])
model_file = "core.models.{}".format(args.cfg_file)
model_file = importlib.import_module(model_file)
model = model_file.model()
train_split = system_config.train_split
val_split = system_config.val_split
print("Process {}: loading all datasets...".format(rank))
dataset = system_config.dataset
workers = args.workers
print("Process {}: using {} workers".format(rank, workers))
training_dbs = [datasets[dataset](config["db"], split=train_split, sys_config=system_config) for _ in
range(workers)]
validation_db = datasets[dataset](config["db"], split=val_split, sys_config=system_config)
if rank == 0:
print("system config...")
pprint.pprint(system_config.full)
print("db config...")
pprint.pprint(training_dbs[0].configs)
print("len of db: {}".format(len(training_dbs[0].db_inds)))
print("distributed: {}".format(args.distributed))
train(training_dbs, validation_db, system_config, model, args)
if __name__ == "__main__":
args = parse_args()
distributed = args.distributed
world_size = args.world_size
if distributed and world_size <= 0:
raise ValueError("world size must be greater than 0 in distributed training")
ngpus_per_node = torch.cuda.device_count()
if distributed:
args.world_size = ngpus_per_node * args.world_size
mp.spawn(main, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
main(None, ngpus_per_node, args)
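When training resumes from `start_iter`, the script rebuilds the step learning-rate schedule by dividing the base rate by `decay_rate` once per completed `stepsize` interval. A tiny sketch of that computation (the numbers are illustrative, not values from a shipped config):

```python
def resumed_lr(base_lr, start_iter, stepsize, decay_rate):
    """Learning rate after resuming a step-decay schedule at start_iter."""
    return base_lr / (decay_rate ** (start_iter // stepsize))

# e.g. base LR 2.5e-4, decayed by 10 every 450k iterations, resuming at 500k
print(resumed_lr(2.5e-4, 500000, 450000, 10))  # -> 2.5e-05
```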

View File

@@ -0,0 +1,106 @@
name: 🐛 报BUG Bug Report
description: 报告一个可复现的Bug以帮助我们修复PaddleDetection。 Report a bug to help us reproduce and fix it.
labels: [type/bug-report, status/new-issue]
body:
- type: markdown
attributes:
value: |
Thank you for submitting a PaddleDetection Bug Report!
- type: checkboxes
attributes:
label: 问题确认 Search before asking
description: >
(必选项) 在向PaddleDetection报bug之前请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的bug。
(Required) Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues).
options:
- label: >
我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)没有发现相似的bug。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar bug report.
required: true
- type: dropdown
attributes:
label: Bug组件 Bug Component
description: |
(可选项) 请选择在哪部分代码发现这个bug。(Optional) Please select the part of PaddleDetection where you found the bug.
multiple: true
options:
- "Training"
- "Validation"
- "Inference"
- "Export"
- "Deploy"
- "Installation"
- "DataProcess"
- "Other"
validations:
required: false
- type: textarea
id: code
attributes:
label: Bug描述 Describe the Bug
description: |
请清晰而简洁地描述这个bug并附上bug复现步骤、报错信息或截图、代码改动说明或最小可复现代码。如果代码太长请将可执行代码放到[AIStudio](https://aistudio.baidu.com/aistudio/index)中并将项目设置为公开或者放到github gist上并在项目中描述清楚bug复现步骤在issue中描述期望结果与实际结果。
如果你报告的是一个报错信息,请将完整回溯的报错贴在这里,并使用 ` ```三引号块``` `展示错误信息。
placeholder: |
请清晰简洁的描述这个bug。 A clear and concise description of what the bug is.
```python
代码改动说明,或最小可复现代码。 Code change description, or sample code to reproduce the problem.
```
```shell
带有完整回溯信息的报错日志或截图。 The error log or screenshot you got, with the full traceback.
```
validations:
required: true
- type: textarea
attributes:
label: 复现环境 Environment
description: 请具体说明复现bug的环境信息。Please specify the environment information for reproducing the bug.
placeholder: |
- OS: Linux/Windows
- PaddlePaddle: 2.2.2
- PaddleDetection: release/2.4
- Python: 3.8.0
- CUDA: 10.2
- CUDNN: 7.6
- GCC: 8.2.0
validations:
required: true
- type: checkboxes
attributes:
label: Bug描述确认 Bug description confirmation
description: >
(必选项) 请确认是否提供了详细的Bug描述和环境信息确认问题是否可以复现。
(Required) Please confirm whether the bug description and environment information are provided, and whether the problem can be reproduced.
options:
- label: >
我确认已经提供了Bug复现步骤、代码改动说明、以及环境信息确认问题是可以复现的。I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.
required: true
- type: checkboxes
attributes:
label: 是否愿意提交PR Are you willing to submit a PR?
description: >
(可选项) 如果你对修复bug有自己的想法十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls)共同提升PaddleDetection。
(Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature.
options:
- label: 我愿意提交PRI'd like to help by submitting a PR!
- type: markdown
attributes:
value: >
感谢你的贡献 🎉! Thanks for your contribution 🎉!

View File

@@ -0,0 +1,50 @@
name: 🚀 新需求 Feature Request
description: 提交一个你对PaddleDetection的新需求。 Submit a request for a new Paddle feature.
labels: [type/feature-request, status/new-issue]
body:
- type: markdown
attributes:
value: >
#### 你可以在这里提出你对PaddleDetection的新需求包括但不限于功能或模型缺失、功能不全或无法使用、精度/性能不符合预期等。
#### You could submit a request for a new feature here, including but not limited to: new features or models, incomplete or unusable features, accuracy/performance not as expected, etc.
- type: checkboxes
attributes:
label: 问题确认 Search before asking
description: >
在向PaddleDetection提新需求之前请先查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)是否报过同样的需求。
Before submitting a feature request, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues).
options:
- label: >
我已经查询[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)没有类似需求。I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and found no similar feature requests.
required: true
- type: textarea
id: description
attributes:
label: 需求描述 Feature Description
description: |
请尽可能包含任务目标、需求场景、功能描述等信息,全面的信息有利于我们准确评估你的需求。
Please include as much information as possible, such as mission objectives, requirement scenarios, functional descriptions, etc. Comprehensive information will help us accurately assess your feature request.
value: "1. 任务目标(请描述你正在做的项目是什么,如模型、论文、项目是什么?); 2. 需求场景(请描述你的项目中为什么需要用此功能); 3. 功能描述(请简单描述或设计这个功能)"
validations:
required: true
- type: checkboxes
attributes:
label: 是否愿意提交PR Are you willing to submit a PR?
description: >
(可选)如果你对新feature有自己的想法十分鼓励提交[Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls)共同提升PaddleDetection
(Optional) We encourage you to submit a [Pull Request](https://github.com/PaddlePaddle/PaddleDetection/pulls) (PR) to help improve PaddleDetection for everyone, especially if you have a good understanding of how to implement a fix or feature.
options:
- label: Yes I'd like to help by submitting a PR!
- type: markdown
attributes:
value: >
感谢你的贡献 🎉! Thanks for your contribution 🎉!

View File

@@ -0,0 +1,38 @@
name: 📚 文档 Documentation Issue
description: 反馈一个官网文档错误。 Report an issue related to https://github.com/PaddlePaddle/PaddleDetection.
labels: [type/docs, status/new-issue]
body:
- type: markdown
attributes:
value: >
#### 请确认反馈的问题来自PaddlePaddle官网文档https://github.com/PaddlePaddle/PaddleDetection 。
#### Before submitting a Documentation Issue, Please make sure that issue is related to https://github.com/PaddlePaddle/PaddleDetection.
- type: textarea
id: link
attributes:
label: 文档链接&描述 Document Links & Description
description: |
请说明有问题的文档链接以及该文档存在的问题。
Please fill in the link to the document and describe the question.
validations:
required: true
- type: textarea
id: error
attributes:
label: 请提出你的建议 Please give your suggestion
description: |
请告诉我们你希望如何改进这个文档。或者你可以提个PR修复这个问题。
Please tell us how you would like to improve this document. Or you can submit a PR to fix this problem.
validations:
required: false
- type: markdown
attributes:
value: >
感谢你的贡献 🎉! Thanks for your contribution 🎉!

View File

@@ -0,0 +1,37 @@
name: 🙋🏼‍♀️🙋🏻‍♂️提问 Ask a Question
description: 提出一个使用/咨询问题。 Ask a usage or consultation question.
labels: [type/question, status/new-issue]
body:
- type: checkboxes
attributes:
label: 问题确认 Search before asking
description: >
#### 你可以在这里提出一个使用/咨询问题,提问之前请确保:
- 1已经百度/谷歌搜索过你的问题,但是没有找到解答;
- 2已经在官网查询过[教程文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED_cn.md)与[FAQ](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/docs/tutorials/FAQ),但是没有找到解答;
- 3已经在[历史issue](https://github.com/PaddlePaddle/PaddleDetection/issues)中搜索过没有找到同类issue或issue未被解答。
#### You could ask a usage or consultation question here, before your start, please make sure:
- 1) You have searched your question on Baidu/Google, but found no answer;
- 2) You have checked the [tutorials](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/GETTING_STARTED.md), but found no answer;
- 3) You have searched [the existing and past issues](https://github.com/PaddlePaddle/PaddleDetection/issues), but found no similar issue or the issue has not been answered.
options:
- label: >
我已经搜索过问题但是没有找到解答。I have searched the question and found no related answer.
required: true
- type: textarea
id: question
attributes:
label: 请提出你的问题 Please ask your question
validations:
required: true

View File

@@ -0,0 +1,23 @@
name: 🧩 其他 Others
description: 提出其他问题。 Report any other non-support related issues.
labels: [type/others, status/new-issue]
body:
- type: markdown
attributes:
value: >
#### 你可以在这里提出任何前面几类模板不适用的问题,包括但不限于:优化性建议、框架使用体验反馈、版本兼容性问题、报错信息不清楚等。
#### You can report any issues that are not applicable to the previous types of templates, including but not limited to: enhancement suggestions, feedback on the use of the framework, version compatibility issues, unclear error information, etc.
- type: textarea
id: others
attributes:
label: 问题描述 Please describe your issue
validations:
required: true
- type: markdown
attributes:
value: >
感谢你的贡献 🎉! Thanks for your contribution 🎉!

paddle_detection/.gitignore vendored Normal file
View File

@@ -0,0 +1,88 @@
# Virtualenv
/.venv/
/venv/
# Byte-compiled / optimized / DLL files
__pycache__/
.ipynb_checkpoints/
*.py[cod]
# C extensions
*.so
# json file
*.json
# log file
*.log
# Distribution / packaging
/bin/
*build/
/develop-eggs/
*dist/
/eggs/
/lib/
/lib64/
/output/
/inference_model/
/output_inference/
/parts/
/sdist/
/var/
*.egg-info/
/.installed.cfg
/*.egg
/.eggs
# AUTHORS and ChangeLog will be generated while packaging
/AUTHORS
/ChangeLog
# BCloud / BuildSubmitter
/build_submitter.*
/logger_client_log
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
.tox/
.coverage
.cache
.pytest_cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Sphinx documentation
/docs/_build/
*.tar
*.pyc
.idea/
dataset/coco/annotations
dataset/coco/train2017
dataset/coco/val2017
dataset/voc/VOCdevkit
dataset/fruit/fruit-detection/
dataset/voc/test.txt
dataset/voc/trainval.txt
dataset/wider_face/WIDER_test
dataset/wider_face/WIDER_train
dataset/wider_face/WIDER_val
dataset/wider_face/wider_face_split
ppdet/version.py
# NPU meta folder
kernel_meta/
# MAC
*.DS_Store

View File

@@ -0,0 +1,44 @@
- repo: https://github.com/PaddlePaddle/mirrors-yapf.git
sha: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37
hooks:
- id: yapf
files: \.py$
- repo: https://github.com/pre-commit/pre-commit-hooks
sha: a11d9314b22d8f8c7556443875b731ef05965464
hooks:
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
files: (?!.*paddle)^.*$
- id: end-of-file-fixer
files: \.(md|yml)$
- id: trailing-whitespace
files: \.(md|yml)$
- repo: https://github.com/Lucas-C/pre-commit-hooks
sha: v1.0.1
hooks:
- id: forbid-crlf
files: \.(md|yml)$
- id: remove-crlf
files: \.(md|yml)$
- id: forbid-tabs
files: \.(md|yml)$
- id: remove-tabs
files: \.(md|yml)$
- repo: local
hooks:
- id: clang-format-with-version-check
name: clang-format
description: Format files with ClangFormat.
entry: bash ./.travis/codestyle/clang_format.hook -i
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto)$
- repo: local
hooks:
- id: cpplint-cpp-source
name: cpplint
description: Check C++ code style using cpplint.py.
entry: bash ./.travis/codestyle/cpplint_pre_commit.hook
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx)$

View File

@@ -0,0 +1,3 @@
[style]
based_on_style = pep8
column_limit = 80

View File

@@ -0,0 +1,35 @@
language: cpp
cache: ccache
sudo: required
dist: trusty
services:
- docker
os:
- linux
env:
- JOB=PRE_COMMIT
addons:
apt:
packages:
- git
- python
- python-pip
- python2.7-dev
ssh_known_hosts: 13.229.163.131
before_install:
- sudo pip install -U virtualenv pre-commit pip -i https://pypi.tuna.tsinghua.edu.cn/simple
- docker pull paddlepaddle/paddle:latest
- git pull https://github.com/PaddlePaddle/PaddleDetection develop
script:
- exit_code=0
- .travis/precommit.sh || exit_code=$(( exit_code | $? ))
# - docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c
# 'cd /py_unittest; sh .travis/unittest.sh' || exit_code=$(( exit_code | $? ))
- if [ $exit_code -eq 0 ]; then true; else exit 1; fi;
notifications:
email:
on_success: change
on_failure: always

View File

@@ -0,0 +1,4 @@
#!/bin/bash
set -e
clang-format $@

View File

@@ -0,0 +1,27 @@
#!/bin/bash
TOTAL_ERRORS=0
if [[ ! $TRAVIS_BRANCH ]]; then
# install cpplint on local machine.
if [[ ! $(which cpplint) ]]; then
pip install cpplint
fi
# diff files on local machine.
files=$(git diff --cached --name-status | awk '$1 != "D" {print $2}')
else
# diff files between PR and latest commit on Travis CI.
branch_ref=$(git rev-parse "$TRAVIS_BRANCH")
head_ref=$(git rev-parse HEAD)
files=$(git diff --name-status $branch_ref $head_ref | awk '$1 != "D" {print $2}')
fi
# The trick to remove deleted files: https://stackoverflow.com/a/2413151
for file in $files; do
if [[ $file =~ ^(patches/.*) ]]; then
continue;
else
cpplint --filter=-readability/fn_size,-build/include_what_you_use,-build/c++11 $file;
TOTAL_ERRORS=$(expr $TOTAL_ERRORS + $?);
fi
done
exit $TOTAL_ERRORS

View File

@@ -0,0 +1,21 @@
#!/bin/bash
function abort(){
echo "Your commit not fit PaddlePaddle code style" 1>&2
echo "Please use pre-commit scripts to auto-format your code" 1>&2
exit 1
}
trap 'abort' 0
set -e
cd `dirname $0`
cd ..
export PATH=/usr/bin:$PATH
pre-commit install
if ! pre-commit run -a ; then
ls -lh
git diff --exit-code
exit 1
fi
trap : 0

View File

@@ -0,0 +1,8 @@
# add python requirements for unit tests here; note that installing
# pycocotools directly is not supported in travis ci, so it is installed
# by compiling from source in unittest.sh
tqdm
cython
shapely
llvmlite==0.33
numba==0.50

View File

@@ -0,0 +1,47 @@
#!/bin/bash
abort(){
echo "Run unittest failed" 1>&2
echo "Please check your code" 1>&2
echo " 1. you can run unit tests by 'bash .travis/unittest.sh' locally" 1>&2
echo " 2. you can add python requirements in .travis/requirements.txt if you use new requirements in unit tests" 1>&2
exit 1
}
unittest(){
if [ $? != 0 ]; then
exit 1
fi
find "./ppdet" -name 'tests' -type d -print0 | \
xargs -0 -I{} -n1 bash -c \
'python -m unittest discover -v -s {}'
}
trap 'abort' 0
set -e
# install travis python dependencies exclude pycocotools
if [ -f ".travis/requirements.txt" ]; then
pip install -r .travis/requirements.txt
fi
# install pycocotools
if [ `pip list | grep pycocotools | wc -l` -eq 0 ]; then
# install git if needed
if [ -z "`which git`" ]; then
apt-get update
apt-get install -y git
fi;
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make install
python setup.py install --user
cd ../..
rm -rf cocoapi
fi
export PYTHONPATH=`pwd`:$PYTHONPATH
unittest .
trap : 0

paddle_detection/LICENSE Normal file
View File

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

paddle_detection/README.md Symbolic link
View File

@@ -0,0 +1 @@
README_cn.md

View File

@@ -0,0 +1,878 @@
简体中文 | [English](README_en.md)
<div align="center">
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/160532560-34cf7a1f-d950-435e-90d2-4b0a679e5119.png" align="middle" width = "800" />
</p>
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleDetection/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleDetection?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleDetection/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleDetection?color=ccf"></a>
</p>
</div>
## 💌目录
- [💌目录](#目录)
- [🌈简介](#简介)
- [📣最新进展](#最新进展)
- [👫开源社区](#开源社区)
- [✨主要特性](#主要特性)
- [🧩模块化设计](#模块化设计)
- [📱丰富的模型库](#丰富的模型库)
- [🎗️产业特色模型|产业工具](#️产业特色模型产业工具)
- [💡🏆产业级部署实践](#产业级部署实践)
- [🍱安装](#安装)
- [🔥教程](#教程)
- [🔑FAQ](#faq)
- [🧩模块组件](#模块组件)
- [📱模型库](#模型库)
- [⚖️模型性能对比](#️模型性能对比)
- [🖥️服务器端模型性能对比](#️服务器端模型性能对比)
- [⌚️移动端模型性能对比](#️移动端模型性能对比)
- [🎗️产业特色模型|产业工具](#️产业特色模型产业工具-1)
- [💎PP-YOLOE 高精度目标检测模型](#pp-yoloe-高精度目标检测模型)
- [💎PP-YOLOE-R 高性能旋转框检测模型](#pp-yoloe-r-高性能旋转框检测模型)
- [💎PP-YOLOE-SOD 高精度小目标检测模型](#pp-yoloe-sod-高精度小目标检测模型)
- [💫PP-PicoDet 超轻量实时目标检测模型](#pp-picodet-超轻量实时目标检测模型)
- [📡PP-Tracking 实时多目标跟踪系统](#pp-tracking-实时多目标跟踪系统)
- [PP-TinyPose 人体骨骼关键点识别](#pp-tinypose-人体骨骼关键点识别)
- [🏃🏻PP-Human 实时行人分析工具](#pp-human-实时行人分析工具)
- [🏎PP-Vehicle 实时车辆分析工具](#pp-vehicle-实时车辆分析工具)
- [💡产业实践范例](#产业实践范例)
- [🏆企业应用案例](#企业应用案例)
- [📝许可证书](#许可证书)
- [📌引用](#引用)
## 🌈简介
PaddleDetection是一个基于PaddlePaddle的目标检测端到端开发套件,在提供丰富的模型组件和测试基准的同时,注重端到端的产业落地应用,通过打造产业级特色模型|工具、建设产业应用范例等手段,帮助开发者实现数据准备、模型选型、模型训练、模型部署的全流程打通,快速进行落地应用。
主要模型效果示例如下(点击标题可快速跳转):
| [**通用目标检测**](#pp-yoloe-高精度目标检测模型) | [**小目标检测**](#pp-yoloe-sod-高精度小目标检测模型) | [**旋转框检测**](#pp-yoloe-r-高性能旋转框检测模型) | [**3D目标物检测**](https://github.com/PaddlePaddle/Paddle3D) |
| :--------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------: |
| <img src='https://user-images.githubusercontent.com/61035602/206095864-f174835d-4e9a-42f7-96b8-d684fc3a3687.png' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206095892-934be83a-f869-4a31-8e52-1074184149d1.jpg' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206111796-d9a9702a-c1a0-4647-b8e9-3e1307e9d34c.png' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206095622-cf6dbd26-5515-472f-9451-b39bbef5b1bf.gif' height="126px" width="180px"> |
| [**人脸检测**](#模型库) | [**2D关键点检测**](#pp-tinypose-人体骨骼关键点识别) | [**多目标追踪**](#pp-tracking-实时多目标跟踪系统) | [**实例分割**](#模型库) |
| <img src='https://user-images.githubusercontent.com/61035602/206095684-72f42233-c9c7-4bd8-9195-e34859bd08bf.jpg' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206100220-ab01d347-9ff9-4f17-9718-290ec14d4205.gif' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206111753-836e7827-968e-4c80-92ef-7a78766892fc.gif' height="126px" width="180px" > | <img src='https://user-images.githubusercontent.com/61035602/206095831-cc439557-1a23-4a99-b6b0-b6f2e97e8c57.jpg' height="126px" width="180px"> |
| [**车辆分析——车牌识别**](#pp-vehicle-实时车辆分析工具) | [**车辆分析——车流统计**](#pp-vehicle-实时车辆分析工具) | [**车辆分析——违章检测**](#pp-vehicle-实时车辆分析工具) | [**车辆分析——属性分析**](#pp-vehicle-实时车辆分析工具) |
| <img src='https://user-images.githubusercontent.com/61035602/206099328-2a1559e0-3b48-4424-9bad-d68f9ba5ba65.gif' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206095918-d0e7ad87-7bbb-40f1-bcc1-37844e2271ff.gif' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206100295-7762e1ab-ffce-44fb-b69d-45fb93657fa0.gif' height="126px" width="180px" > | <img src='https://user-images.githubusercontent.com/61035602/206095905-8255776a-d8e6-4af1-b6e9-8d9f97e5059d.gif' height="126px" width="180px"> |
| [**行人分析——闯入分析**](#pp-human-实时行人分析工具) | [**行人分析——行为分析**](#pp-human-实时行人分析工具) | [**行人分析——属性分析**](#pp-human-实时行人分析工具) | [**行人分析——人流统计**](#pp-human-实时行人分析工具) |
| <img src='https://user-images.githubusercontent.com/61035602/206095792-ae0ac107-cd8e-492a-8baa-32118fc82b04.gif' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206095778-fdd73e5d-9f91-48c7-9d3d-6f2e02ec3f79.gif' height="126px" width="180px"> | <img src='https://user-images.githubusercontent.com/61035602/206095709-2c3a209e-6626-45dd-be16-7f0bf4d48a14.gif' height="126px" width="180px"> | <img src="https://user-images.githubusercontent.com/61035602/206113351-cc59df79-8672-4d76-b521-a15acf69ae78.gif" height="126px" width="180px"> |
同时PaddleDetection提供了模型的在线体验功能用户可以选择自己的数据进行在线推理。
`说明`:考虑到服务器负载压力,在线推理均为CPU推理。完整的模型开发实例以及产业部署实践代码示例,请前往[🎗️产业特色模型|产业工具](#️产业特色模型产业工具-1)。
`传送门`[模型在线体验](https://www.paddlepaddle.org.cn/models)
<div align="center">
<p align="center">
<img src="https://user-images.githubusercontent.com/61035602/206896755-bd0cd498-1149-4e94-ae30-da590ea78a7a.gif" align="middle"/>
</p>
</div>
## 📣最新进展
💥 2024.6.27 **飞桨低代码开发工具 [PaddleX 3.0](https://github.com/paddlepaddle/paddlex) 重磅更新!**
- 低代码开发范式:支持目标检测模型全流程低代码开发,提供 Python API支持用户自定义串联模型
- 多硬件训推支持:支持英伟达 GPU、昆仑芯、昇腾和寒武纪等多种硬件进行模型训练与推理。
**🔥超越YOLOv8飞桨推出精度最高的实时检测器RT-DETR**
<div align="center">
<img src="https://github.com/PaddlePaddle/PaddleDetection/assets/17582080/196b0a10-d2e8-401c-9132-54b9126e0a33" height = "500" caption='' />
<p></p>
</div>
- `RT-DETR解读文章传送门`
- [《超越YOLOv8飞桨推出精度最高的实时检测器RT-DETR](https://mp.weixin.qq.com/s/o03QM2rZNjHVto36gcV0Yw)
- `代码传送门`[RT-DETR](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rtdetr)
## 👫开源社区
- **📑项目合作:** 如果您是企业开发者且有明确的目标检测垂类应用需求,请扫描如下二维码入群,并联系`群管理员AI`后可免费与官方团队展开不同层次的合作。
- **🏅️社区贡献:** PaddleDetection非常欢迎你加入到飞桨社区的开源建设中参与贡献方式可以参考[开源项目开发指南](docs/contribution/README.md)。
- **💻直播教程:** PaddleDetection会定期在飞桨直播间([B站:飞桨PaddlePaddle](https://space.bilibili.com/476867757)、[微信: 飞桨PaddlePaddle](https://mp.weixin.qq.com/s/6ji89VKqoXDY6SSGkxS8NQ)),针对发新内容、以及产业范例、使用教程等进行直播分享。
- **🎁加入社区:** **微信扫描二维码并填写问卷之后,可以及时获取如下信息,包括:**
- 社区最新文章、直播课等活动预告
- 往期直播录播&PPT
- 30+行人车辆等垂类高性能预训练模型
- 七大任务开源数据集下载链接汇总
- 40+前沿检测领域顶会算法
- 15+从零上手目标检测理论与实践视频课程
- 10+工业安防交通全流程项目实操(含源码)
<div align="center">
<img src="https://github.com/PaddlePaddle/PaddleDetection/assets/22989727/0466954b-ab4d-4984-bd36-796c37f0ee9c" width = "150" height = "150",caption='' />
<p>PaddleDetection官方交流群二维码</p>
</div>
## 📖 技术交流合作
- 飞桨低代码开发工具PaddleX—— 面向国内外主流AI硬件的飞桨精选模型一站式开发工具。包含如下核心优势
- 【产业高精度模型库】覆盖10个主流AI任务 40+精选模型,丰富齐全。
- 【特色模型产线】:提供融合大小模型的特色模型产线,精度更高,效果更好。
- 【低代码开发模式】:图形化界面支持统一开发范式,便捷高效。
- 【私有化部署多硬件支持】适配国内外主流AI硬件支持本地纯离线使用满足企业安全保密需要。
- PaddleX官网地址https://aistudio.baidu.com/intro/paddlex
- PaddleX官方交流频道https://aistudio.baidu.com/community/channel/610
- **🎈社区近期活动**
- **🔥PaddleDetection v2.6版本更新解读**
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/224244188-da8495fc-eea9-432f-bc2d-6f0144c2dde9.png" height = "250" caption='' />
<p></p>
</div>
- `v2.6版本版本更新解读文章传送门`[《PaddleDetection v2.6发布目标小数据缺标注累泛化差PP新员逐一应对](https://mp.weixin.qq.com/s/SLITj5k120d_fQc7jEO8Vw)
- **🏆半监督检测**
- `文章传送门`[CVPR 2023 | 单阶段半监督目标检测SOTAARSL](https://mp.weixin.qq.com/s/UZLIGL6va2KBfofC-nKG4g)
- `代码传送门`[ARSL](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det)
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/230522850-21873665-ba79-4f8d-8dce-43d736111df8.png" height = "250" caption='' />
<p></p>
</div>
- **👀YOLO系列专题**
- `文章传送门`[YOLOv8来啦YOLO内卷期模型怎么选9+款AI硬件如何快速部署深度解析](https://mp.weixin.qq.com/s/rPwprZeHEpmGOe5wxrmO5g)
- `代码传送门`[PaddleYOLO全系列](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/feature_models/PaddleYOLO_MODEL.md)
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/213202797-3a1b24f3-53c0-4094-bb31-db2f84438fbc.jpeg" height = "250" caption='' />
<p></p>
</div>
- **🎯少目标迁移学习专题**
- `文章传送门`[囿于数据少泛化性差PaddleDetection少样本迁移学习助你一键突围](https://mp.weixin.qq.com/s/dFEQoxSzVCOaWVZPb3N7WA)
- **⚽2022卡塔尔世界杯专题**
- `文章传送门`[世界杯决赛号角吹响趁周末来搭一套足球3D+AI量化分析系统吧](https://mp.weixin.qq.com/s/koJxjWDPBOlqgI-98UsfKQ)
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/208036574-f151a7ff-a5f1-4495-9316-a47218a6576b.gif" height = "250" caption='' />
<p></p>
</div>
- **🔍旋转框小目标检测专题**
- `文章传送门`[Yes, PP-YOLOE80.73mAP、38.5mAP旋转框、小目标检测能力双SOTA](https://mp.weixin.qq.com/s/6ji89VKqoXDY6SSGkxS8NQ)
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/208037368-5b9f01f7-afd9-46d8-bc80-271ccb5db7bb.png" height = "220" caption='' />
<p></p>
</div>
- **🎊YOLO Vision世界学术交流大会**
- **PaddleDetection**受邀参与首个以**YOLO为主题**的**YOLO-VISION**世界大会与全球AI领先开发者学习交流。
- `活动链接传送门`[YOLO-VISION](https://ultralytics.com/yolo-vision)
<div align="center">
<img src="https://user-images.githubusercontent.com/48054808/192301374-940cf2fa-9661-419b-9c46-18a4570df381.jpeg" width="400"/>
</div>
- **🏅️社区贡献**
- `活动链接传送门`[Yes, PP-YOLOE! 基于PP-YOLOE的算法开发](https://github.com/PaddlePaddle/PaddleDetection/issues/7345)
## ✨主要特性
#### 🧩模块化设计
PaddleDetection将检测模型解耦成不同的模块组件通过自定义模块组件组合用户可以便捷高效地完成检测模型的搭建。`传送门`[🧩模块组件](#模块组件)。
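下面给出一个极简的组合示意(仅为基于 `ppdet.core.workspace` 常见用法的草图,具体接口与配置字段以实际安装版本为准),展示配置文件如何把各模块组件拼装成完整检测模型:
```python
# 示意:通过配置文件组合 backbone/neck/head 等模块组件(接口以实际版本为准)
from ppdet.core.workspace import load_config, create

cfg = load_config('configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml')  # 读取模型配置
model = create(cfg.architecture)  # 按配置中的 architecture 字段实例化组装好的检测模型
print(type(model))
```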
#### 📱丰富的模型库
PaddleDetection支持大量的最新主流的算法基准以及预训练模型涵盖2D/3D目标检测、实例分割、人脸检测、关键点检测、多目标跟踪、半监督学习等方向。`传送门`[📱模型库](#模型库)、[⚖️模型性能对比](#️模型性能对比)。
#### 🎗️产业特色模型|产业工具
PaddleDetection打造产业级特色模型以及分析工具PP-YOLOE+、PP-PicoDet、PP-TinyPose、PP-HumanV2、PP-Vehicle等针对通用、高频垂类应用场景提供深度优化解决方案以及高度集成的分析工具降低开发者的试错、选择成本针对业务场景快速应用落地。`传送门`[🎗️产业特色模型|产业工具](#️产业特色模型产业工具-1)。
#### 💡🏆产业级部署实践
PaddleDetection整理工业、农业、林业、交通、医疗、金融、能源电力等AI应用范例打通数据标注-模型训练-模型调优-预测部署全流程,持续降低目标检测技术产业落地门槛。`传送门`[💡产业实践范例](#产业实践范例)、[🏆企业应用案例](#企业应用案例)。
<div align="center">
<p align="center">
<img src="https://user-images.githubusercontent.com/61035602/206431371-912a14c8-ce1e-48ec-ae6f-7267016b308e.png" align="middle" width="1280"/>
</p>
</div>
## 🍱安装
参考[安装说明](docs/tutorials/INSTALL_cn.md)进行安装。
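安装完成后,可用如下片段做一次简单自检(示意性代码,假设已按上述文档安装好 PaddlePaddle 与 PaddleDetection
```python
# 环境自检示意:确认 PaddlePaddle 可用、ppdet 可正常导入
import paddle
import ppdet  # noqa: F401

paddle.utils.run_check()                     # 检查 PaddlePaddle 安装与 GPU 是否可用
print('paddle version:', paddle.__version__)
```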
## 🔥教程
**深度学习入门教程**
- [零基础入门深度学习](https://www.paddlepaddle.org.cn/tutorials/projectdetail/4676538)
- [零基础入门目标检测](https://aistudio.baidu.com/aistudio/education/group/info/1617)
**快速开始**
- [快速体验](docs/tutorials/QUICK_STARTED_cn.md)
- [示例30分钟快速开发交通标志检测模型](docs/tutorials/GETTING_STARTED_cn.md)
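上述文档以命令行脚本tools/train.py为主这里补充一个与其核心逻辑等价的 Python 级训练草图(仅为示意,接口以实际安装版本为准,且假设已按文档准备好数据):
```python
# 训练流程的极简示意tools/train.py 的核心步骤大致如此)
from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

cfg = load_config('configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml')
trainer = Trainer(cfg, mode='train')
if cfg.get('pretrain_weights', None):
    trainer.load_weights(cfg.pretrain_weights)  # 加载配置中指定的预训练权重
trainer.train(validate=True)                    # 边训练边在验证集上评估
```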
**数据准备**
- [数据准备](docs/tutorials/data/README.md)
- [数据处理模块](docs/advanced_tutorials/READER.md)
**配置文件说明**
- [RCNN参数说明](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
- [PP-YOLO参数说明](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)
**模型开发**
- [新增检测模型](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- 二次开发
- [目标检测](docs/advanced_tutorials/customization/detection.md)
- [关键点检测](docs/advanced_tutorials/customization/keypoint_detection.md)
- [多目标跟踪](docs/advanced_tutorials/customization/pphuman_mot.md)
- [行为识别](docs/advanced_tutorials/customization/action_recognotion/)
- [属性识别](docs/advanced_tutorials/customization/pphuman_attribute.md)
**部署推理**
- [模型导出教程](deploy/EXPORT_MODEL.md)
- [模型压缩](https://github.com/PaddlePaddle/PaddleSlim)
- [剪裁/量化/蒸馏教程](configs/slim)
- [Paddle Inference部署](deploy/README.md)
- [Python端推理部署](deploy/python)
- [C++端推理部署](deploy/cpp)
- [Paddle Lite部署](deploy/lite)
- [Paddle Serving部署](deploy/serving)
- [ONNX模型导出](deploy/EXPORT_ONNX_MODEL.md)
- [推理benchmark](deploy/BENCHMARK_INFER.md)
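按上述文档用 tools/export_model.py 导出推理模型后,可以用 Paddle Inference 的 Python API 加载部署。下面是一个最小示意(模型路径为假设值,输入名的个数与顺序以实际导出的模型为准):
```python
# Paddle Inference 加载 PaddleDetection 导出模型的最小示意
import numpy as np
from paddle.inference import Config, create_predictor

model_dir = 'output_inference/ppyoloe_plus_crn_s_80e_coco'   # 假设的导出目录
config = Config(model_dir + '/model.pdmodel', model_dir + '/model.pdiparams')
config.enable_use_gpu(200, 0)        # 使用 0 号 GPU无 GPU 时可去掉此行
predictor = create_predictor(config)

# 这里用随机张量代替真实预处理结果PP-YOLOE 系列导出模型通常需要 image 与 scale_factor 两个输入
inputs = {'image': np.random.rand(1, 3, 640, 640).astype('float32'),
          'scale_factor': np.ones((1, 2), dtype='float32')}
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    handle.reshape(inputs[name].shape)
    handle.copy_from_cpu(inputs[name])
predictor.run()
boxes = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(boxes.shape)  # 通常为 [N, 6]:类别 id、置信度、x1、y1、x2、y2
```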
## 🔑FAQ
- [FAQ/常见问题汇总](docs/tutorials/FAQ)
## 🧩模块组件
<table align="center">
<tbody>
<tr align="center" valign="center">
<td>
<b>Backbones</b>
</td>
<td>
<b>Necks</b>
</td>
<td>
<b>Loss</b>
</td>
<td>
<b>Common</b>
</td>
<td>
<b>Data Augmentation</b>
</td>
</tr>
<tr valign="top">
<td>
<ul>
<li><a href="ppdet/modeling/backbones/resnet.py">ResNet</a></li>
<li><a href="ppdet/modeling/backbones/res2net.py">CSPResNet</a></li>
<li><a href="ppdet/modeling/backbones/senet.py">SENet</a></li>
<li><a href="ppdet/modeling/backbones/res2net.py">Res2Net</a></li>
<li><a href="ppdet/modeling/backbones/hrnet.py">HRNet</a></li>
<li><a href="ppdet/modeling/backbones/lite_hrnet.py">Lite-HRNet</a></li>
<li><a href="ppdet/modeling/backbones/darknet.py">DarkNet</a></li>
<li><a href="ppdet/modeling/backbones/csp_darknet.py">CSPDarkNet</a></li>
<li><a href="ppdet/modeling/backbones/mobilenet_v1.py">MobileNetV1</a></li>
<li><a href="ppdet/modeling/backbones/mobilenet_v3.py">MobileNetV1</a></li>
<li><a href="ppdet/modeling/backbones/shufflenet_v2.py">ShuffleNetV2</a></li>
<li><a href="ppdet/modeling/backbones/ghostnet.py">GhostNet</a></li>
<li><a href="ppdet/modeling/backbones/blazenet.py">BlazeNet</a></li>
<li><a href="ppdet/modeling/backbones/dla.py">DLA</a></li>
<li><a href="ppdet/modeling/backbones/hardnet.py">HardNet</a></li>
<li><a href="ppdet/modeling/backbones/lcnet.py">LCNet</a></li>
<li><a href="ppdet/modeling/backbones/esnet.py">ESNet</a></li>
<li><a href="ppdet/modeling/backbones/swin_transformer.py">Swin-Transformer</a></li>
<li><a href="ppdet/modeling/backbones/convnext.py">ConvNeXt</a></li>
<li><a href="ppdet/modeling/backbones/vgg.py">VGG</a></li>
<li><a href="ppdet/modeling/backbones/vision_transformer.py">Vision Transformer</a></li>
<li><a href="configs/convnext">ConvNext</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="ppdet/modeling/necks/bifpn.py">BiFPN</a></li>
<li><a href="ppdet/modeling/necks/blazeface_fpn.py">BlazeFace-FPN</a></li>
<li><a href="ppdet/modeling/necks/centernet_fpn.py">CenterNet-FPN</a></li>
<li><a href="ppdet/modeling/necks/csp_pan.py">CSP-PAN</a></li>
<li><a href="ppdet/modeling/necks/custom_pan.py">Custom-PAN</a></li>
<li><a href="ppdet/modeling/necks/fpn.py">FPN</a></li>
<li><a href="ppdet/modeling/necks/es_pan.py">ES-PAN</a></li>
<li><a href="ppdet/modeling/necks/hrfpn.py">HRFPN</a></li>
<li><a href="ppdet/modeling/necks/lc_pan.py">LC-PAN</a></li>
<li><a href="ppdet/modeling/necks/ttf_fpn.py">TTF-FPN</a></li>
<li><a href="ppdet/modeling/necks/yolo_fpn.py">YOLO-FPN</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="ppdet/modeling/losses/smooth_l1_loss.py">Smooth-L1</a></li>
<li><a href="ppdet/modeling/losses/detr_loss.py">Detr Loss</a></li>
<li><a href="ppdet/modeling/losses/fairmot_loss.py">Fairmot Loss</a></li>
<li><a href="ppdet/modeling/losses/fcos_loss.py">Fcos Loss</a></li>
<li><a href="ppdet/modeling/losses/gfocal_loss.py">GFocal Loss</a></li>
<li><a href="ppdet/modeling/losses/jde_loss.py">JDE Loss</a></li>
<li><a href="ppdet/modeling/losses/keypoint_loss.py">KeyPoint Loss</a></li>
<li><a href="ppdet/modeling/losses/solov2_loss.py">SoloV2 Loss</a></li>
<li><a href="ppdet/modeling/losses/focal_loss.py">Focal Loss</a></li>
<li><a href="ppdet/modeling/losses/iou_loss.py">GIoU/DIoU/CIoU</a></li>
<li><a href="ppdet/modeling/losses/iou_aware_loss.py">IoUAware</a></li>
<li><a href="ppdet/modeling/losses/sparsercnn_loss.py">SparseRCNN Loss</a></li>
<li><a href="ppdet/modeling/losses/ssd_loss.py">SSD Loss</a></li>
<li><a href="ppdet/modeling/losses/focal_loss.py">YOLO Loss</a></li>
<li><a href="ppdet/modeling/losses/yolo_loss.py">CT Focal Loss</a></li>
<li><a href="ppdet/modeling/losses/varifocal_loss.py">VariFocal Loss</a></li>
</ul>
</td>
<td>
</ul>
<li><b>Post-processing</b></li>
<ul>
<ul>
<li><a href="ppdet/modeling/post_process.py">SoftNMS</a></li>
<li><a href="ppdet/modeling/post_process.py">MatrixNMS</a></li>
</ul>
</ul>
<li><b>Training</b></li>
<ul>
<ul>
<li><a href="tools/train.py#L62">FP16 training</a></li>
<li><a href="docs/tutorials/DistributedTraining_cn.md">Multi-machine training </a></li>
</ul>
</ul>
<li><b>Common</b></li>
<ul>
<ul>
<li><a href="ppdet/modeling/backbones/resnet.py#L41">Sync-BN</a></li>
<li><a href="configs/gn/README.md">Group Norm</a></li>
<li><a href="configs/dcn/README.md">DCNv2</a></li>
<li><a href="ppdet/optimizer/ema.py">EMA</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="ppdet/data/transform/operators.py">Resize</a></li>
<li><a href="ppdet/data/transform/operators.py">Lighting</a></li>
<li><a href="ppdet/data/transform/operators.py">Flipping</a></li>
<li><a href="ppdet/data/transform/operators.py">Expand</a></li>
<li><a href="ppdet/data/transform/operators.py">Crop</a></li>
<li><a href="ppdet/data/transform/operators.py">Color Distort</a></li>
<li><a href="ppdet/data/transform/operators.py">Random Erasing</a></li>
<li><a href="ppdet/data/transform/operators.py">Mixup </a></li>
<li><a href="ppdet/data/transform/operators.py">AugmentHSV</a></li>
<li><a href="ppdet/data/transform/operators.py">Mosaic</a></li>
<li><a href="ppdet/data/transform/operators.py">Cutmix </a></li>
<li><a href="ppdet/data/transform/operators.py">Grid Mask</a></li>
<li><a href="ppdet/data/transform/operators.py">Auto Augment</a></li>
<li><a href="ppdet/data/transform/operators.py">Random Perspective</a></li>
</ul>
</td>
</tr>
</td>
</tr>
</tbody>
</table>
## 📱模型库
<table align="center">
<tbody>
<tr align="center" valign="center">
<td>
<b>2D Detection</b>
</td>
<td>
<b>Multi Object Tracking</b>
</td>
<td>
<b>KeyPoint Detection</b>
</td>
<td>
<b>Others</b>
</td>
</tr>
<tr valign="top">
<td>
<ul>
<li><a href="configs/faster_rcnn/README.md">Faster RCNN</a></li>
<li><a href="ppdet/modeling/necks/fpn.py">FPN</a></li>
<li><a href="configs/cascade_rcnn/README.md">Cascade-RCNN</a></li>
<li><a href="configs/rcnn_enhance">PSS-Det</a></li>
<li><a href="configs/retinanet/README.md">RetinaNet</a></li>
<li><a href="configs/yolov3/README.md">YOLOv3</a></li>
<li><a href="configs/yolof/README.md">YOLOF</a></li>
<li><a href="configs/yolox/README.md">YOLOX</a></li>
<li><a href="https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5">YOLOv5</a></li>
<li><a href="https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov6">YOLOv6</a></li>
<li><a href="https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7">YOLOv7</a></li>
<li><a href="https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov8">YOLOv8</a></li>
<li><a href="https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/rtmdet">RTMDet</a></li>
<li><a href="configs/ppyolo/README_cn.md">PP-YOLO</a></li>
<li><a href="configs/ppyolo#pp-yolo-tiny">PP-YOLO-Tiny</a></li>
<li><a href="configs/picodet">PP-PicoDet</a></li>
<li><a href="configs/ppyolo/README_cn.md">PP-YOLOv2</a></li>
<li><a href="configs/ppyoloe/README_legacy.md">PP-YOLOE</a></li>
<li><a href="configs/ppyoloe/README_cn.md">PP-YOLOE+</a></li>
<li><a href="configs/smalldet">PP-YOLOE-SOD</a></li>
<li><a href="configs/rotate/README.md">PP-YOLOE-R</a></li>
<li><a href="configs/ssd/README.md">SSD</a></li>
<li><a href="configs/centernet">CenterNet</a></li>
<li><a href="configs/fcos">FCOS</a></li>
<li><a href="configs/rotate/fcosr">FCOSR</a></li>
<li><a href="configs/ttfnet">TTFNet</a></li>
<li><a href="configs/tood">TOOD</a></li>
<li><a href="configs/gfl">GFL</a></li>
<li><a href="configs/gfl/gflv2_r50_fpn_1x_coco.yml">GFLv2</a></li>
<li><a href="configs/detr">DETR</a></li>
<li><a href="configs/deformable_detr">Deformable DETR</a></li>
<li><a href="configs/sparse_rcnn">Sparse RCNN</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/mot/jde">JDE</a></li>
<li><a href="configs/mot/fairmot">FairMOT</a></li>
<li><a href="configs/mot/deepsort">DeepSORT</a></li>
<li><a href="configs/mot/bytetrack">ByteTrack</a></li>
<li><a href="configs/mot/ocsort">OC-SORT</a></li>
<li><a href="configs/mot/botsort">BoT-SORT</a></li>
<li><a href="configs/mot/centertrack">CenterTrack</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/keypoint/hrnet">HRNet</a></li>
<li><a href="configs/keypoint/higherhrnet">HigherHRNet</a></li>
<li><a href="configs/keypoint/lite_hrnet">Lite-HRNet</a></li>
<li><a href="configs/keypoint/tiny_pose">PP-TinyPose</a></li>
</ul>
</td>
<td>
</ul>
<li><b>Instance Segmentation</b></li>
<ul>
<ul>
<li><a href="configs/mask_rcnn">Mask RCNN</a></li>
<li><a href="configs/cascade_rcnn">Cascade Mask RCNN</a></li>
<li><a href="configs/solov2">SOLOv2</a></li>
</ul>
</ul>
<li><b>Face Detection</b></li>
<ul>
<ul>
<li><a href="configs/face_detection">BlazeFace</a></li>
</ul>
</ul>
<li><b>Semi-Supervised Detection</b></li>
<ul>
<ul>
<li><a href="configs/semi_det">DenseTeacher</a></li>
</ul>
</ul>
<li><b>3D Detection</b></li>
<ul>
<ul>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">Smoke</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">CaDDN</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">PointPillars</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">CenterPoint</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">SequeezeSegV3</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">IA-SSD</a></li>
<li><a href="https://github.com/PaddlePaddle/Paddle3D">PETR</a></li>
</ul>
</ul>
<li><b>Vehicle Analysis Toolbox</b></li>
<ul>
<ul>
<li><a href="deploy/pipeline/README.md">PP-Vehicle</a></li>
</ul>
</ul>
<li><b>Human Analysis Toolbox</b></li>
<ul>
<ul>
<li><a href="deploy/pipeline/README.md">PP-Human</a></li>
<li><a href="deploy/pipeline/README.md">PP-HumanV2</a></li>
</ul>
</ul>
<li><b>Sport Analysis Toolbox</b></li>
<ul>
<ul>
<li><a href="https://github.com/PaddlePaddle/PaddleSports">PP-Sports</a></li>
</ul>
</td>
</tr>
</tbody>
</table>
## ⚖️模型性能对比
#### 🖥️服务器端模型性能对比
各模型结构和骨干网络的代表模型在COCO数据集上精度mAP和单卡Tesla V100上预测速度(FPS)对比图。
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/206434766-caaa781b-b922-481f-af09-15faac9ed33b.png" width="800"/>
</div>
<details>
<summary><b> 测试说明(点击展开)</b></summary>
- ViT为ViT-Cascade-Faster-RCNN模型COCO数据集mAP高达55.7%
- Cascade-Faster-RCNN为Cascade-Faster-RCNN-ResNet50vd-DCNPaddleDetection将其优化到COCO数据mAP为47.8%时推理速度为20FPS
- PP-YOLOE是对PP-YOLO v2模型的进一步优化L版本在COCO数据集mAP为51.6%Tesla V100预测速度78.1FPS
- PP-YOLOE+是对PP-YOLOE模型的进一步优化L版本在COCO数据集mAP为53.3%Tesla V100预测速度78.1FPS
- YOLOX和YOLOv5均为基于PaddleDetection复现算法YOLOv5代码在[PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO)中,参照[PaddleYOLO_MODEL](docs/feature_models/PaddleYOLO_MODEL.md)
- 图中模型均可在[📱模型库](#模型库)中获取
</details>
#### ⌚️移动端模型性能对比
各移动端模型在COCO数据集上精度mAP和高通骁龙865处理器上预测速度(FPS)对比图。
<div align="center">
<img src="https://user-images.githubusercontent.com/61035602/206434741-10460690-8fc3-4084-a11a-16fe4ce2fc85.png" width="550"/>
</div>
<details>
<summary><b> 测试说明(点击展开)</b></summary>
- 测试数据均使用高通骁龙865(4xA77+4xA55)处理器batch size为1, 开启4线程测试测试使用NCNN预测库测试脚本见[MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark)
- PP-PicoDet及PP-YOLO-Tiny为PaddleDetection自研模型可在[📱模型库](#模型库)中获取其余模型PaddleDetection暂未提供
</details>
## 🎗️产业特色模型|产业工具
产业特色模型产业工具是PaddleDetection针对产业高频应用场景打造的兼顾精度和速度的模型以及工具箱注重从数据处理-模型训练-模型调优-模型部署的端到端打通,且提供了实际生产环境中的实践范例代码,帮助拥有类似需求的开发者高效地完成产品开发落地应用。
该系列模型工具均以PP前缀命名具体介绍、预训练模型以及产业实践范例代码如下。
### 💎PP-YOLOE 高精度目标检测模型
<details>
<summary><b> 简介(点击展开)</b></summary>
PP-YOLOE是基于PP-YOLOv2的卓越的单阶段Anchor-free模型超越了多种流行的YOLO模型。PP-YOLOE避免了使用诸如Deformable Convolution或者Matrix NMS之类的特殊算子以使其能轻松地部署在多种多样的硬件上。其使用大规模数据集obj365预训练模型进行预训练可以在不同场景数据集上快速调优收敛。
`传送门`[PP-YOLOE说明](configs/ppyoloe/README_cn.md)。
`传送门`[arXiv论文](https://arxiv.org/abs/2203.16250)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 模型名称 | COCO精度mAP | V100 TensorRT FP16速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 |
| :---------- | :-------------: | :-------------------------: | :----------: | :-----------------------------------------------------: | :-------------------------------------------------------------------------------------: |
| PP-YOLOE+_l | 53.3 | 149.2 | 服务器 | [链接](configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) |
`传送门`[全部预训练模型](configs/ppyoloe/README_cn.md)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| ---- | ----------------- | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------- | --------------------------------------------------- |
| 农业 | 农作物检测 | 用于葡萄栽培中基于图像的监测和现场机器人技术提供了来自5种不同葡萄品种的实地实例 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) |
| 通用 | 低光场景检测 | 低光数据集使用ExDark包括从极低光环境到暮光环境等10种不同光照条件下的图片。 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) |
| 工业 | PCB电路板瑕疵检测 | 工业数据集使用PKU-Market-PCB该数据集用于印刷电路板PCB的瑕疵检测提供了6种常见的PCB缺陷 | [PP-YOLOE+ 下游任务](./configs/ppyoloe/application/README.md) | [下载链接](./configs/ppyoloe/application/README.md) |
</details>
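结合上表中的配置与权重,下面给出一个用 PP-YOLOE+_l 做单张图片预测的简单草图(与 tools/infer.py 的核心逻辑一致,仅为示意,接口以实际安装版本为准):
```python
# 用 PP-YOLOE+_l 预训练权重对 demo 图片做预测的示意
from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

cfg = load_config('configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml')
trainer = Trainer(cfg, mode='test')
trainer.load_weights('https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams')
trainer.predict(['demo/000000014439.jpg'],   # 仓库自带的示例图片
                draw_threshold=0.5, output_dir='output')
```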
### 💎PP-YOLOE-R 高性能旋转框检测模型
<details>
<summary><b> 简介(点击展开)</b></summary>
PP-YOLOE-R是一个高效的单阶段Anchor-free旋转框检测模型基于PP-YOLOE+引入了一系列改进策略来提升检测精度。根据不同的硬件对精度和速度的要求PP-YOLOE-R包含s/m/l/x四个尺寸的模型。在DOTA 1.0数据集上PP-YOLOE-R-l和PP-YOLOE-R-x在单尺度训练和测试的情况下分别达到了78.14mAP和78.28 mAP这在单尺度评估下超越了几乎所有的旋转框检测模型。通过多尺度训练和测试PP-YOLOE-R-l和PP-YOLOE-R-x的检测精度进一步提升至80.02mAP和80.73 mAP超越了所有的Anchor-free方法并且和最先进的Anchor-based的两阶段模型精度几乎相当。在保持高精度的同时PP-YOLOE-R避免使用特殊的算子例如Deformable Convolution或Rotated RoI Align使其能轻松地部署在多种多样的硬件上。
`传送门`[PP-YOLOE-R说明](configs/rotate/ppyoloe_r)。
`传送门`[arXiv论文](https://arxiv.org/abs/2211.02386)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 模型 | Backbone | mAP | V100 TRT FP16 (FPS) | RTX 2080 Ti TRT FP16 (FPS) | Params (M) | FLOPs (G) | 学习率策略 | 角度表示 | 数据增广 | GPU数目 | 每GPU图片数目 | 模型下载 | 配置文件 |
| :----------: | :------: | :---: | :-----------------: | :------------------------: | :--------: | :-------: | :--------: | :------: | :------: | :-----: | :-----------: | :---------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------: |
| PP-YOLOE-R-l | CRN-l | 80.02 | 69.7 | 48.3 | 53.29 | 281.65 | 3x | oc | MS+RR | 4 | 2 | [model](https://paddledet.bj.bcebos.com/models/ppyoloe_r_crn_l_3x_dota_ms.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota_ms.yml) |
`传送门`[全部预训练模型](configs/rotate/ppyoloe_r)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| ---- | ---------- | --------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| 通用 | 旋转框检测 | 手把手教你上手PP-YOLOE-R旋转框检测10分钟将脊柱数据集精度训练至95mAP | [基于PP-YOLOE-R的旋转框检测](https://aistudio.baidu.com/aistudio/projectdetail/5058293) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/5058293) |
</details>
### 💎PP-YOLOE-SOD 高精度小目标检测模型
<details>
<summary><b> 简介(点击展开)</b></summary>
PP-YOLOE-SOD(Small Object Detection)是PaddleDetection团队针对小目标检测提出的检测方案在VisDrone-DET数据集上单模型精度达到38.5mAP达到了SOTA性能。其分别提供了基于切图拼图流程优化的小目标检测方案以及基于原图模型算法优化的小目标检测方案。同时提供了数据集自动分析脚本只需输入数据集标注文件便可得到数据集统计结果辅助判断数据集是否是小目标数据集以及是否需要采用切图策略同时给出网络超参数参考值。
`传送门`[PP-YOLOE-SOD 小目标检测模型](configs/smalldet)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
- VisDrone数据集预训练模型
| 模型 | COCOAPI mAP<sup>val<br>0.5:0.95 | COCOAPI mAP<sup>val<br>0.5 | COCOAPI mAP<sup>test_dev<br>0.5:0.95 | COCOAPI mAP<sup>test_dev<br>0.5 | MatlabAPI mAP<sup>test_dev<br>0.5:0.95 | MatlabAPI mAP<sup>test_dev<br>0.5 | 下载 | 配置文件 |
| :------------------ | :-----------------------------: | :------------------------: | :----------------------------------: | :-----------------------------: | :------------------------------------: | :-------------------------------: | :---------------------------------------------------------------------------------------------: | :----------------------------------------------------------: |
| **PP-YOLOE+_SOD-l** | **31.9** | **52.1** | **25.6** | **43.5** | **30.25** | **51.18** | [下载链接](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_sod_crn_l_80e_visdrone.pdparams) | [配置文件](configs/smalldet/visdrone/ppyoloe_plus_sod_crn_l_80e_visdrone.yml) |
`传送门`[全部预训练模型](configs/smalldet)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| ---- | ---------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| 通用 | 小目标检测 | 基于PP-YOLOE-SOD的无人机航拍图像检测案例全流程实操。 | [基于PP-YOLOE-SOD的无人机航拍图像检测](https://aistudio.baidu.com/aistudio/projectdetail/5036782) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/5036782) |
</details>
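上文提到的“数据集自动分析”思路,可以用下面这个简化草图来理解(非官方脚本,仅演示如何从 COCO 格式标注文件统计小目标占比,标注路径为假设值):
```python
# 统计小目标COCO 定义面积 < 32*32占比的简化示意
from pycocotools.coco import COCO

coco = COCO('dataset/visdrone/annotations/train.json')   # 假设的 COCO 格式标注文件
areas = [ann['bbox'][2] * ann['bbox'][3]
         for ann in coco.loadAnns(coco.getAnnIds())]
small = sum(a < 32 * 32 for a in areas)
print('small-object ratio: {:.2%}'.format(small / len(areas)))
```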
### 💫PP-PicoDet 超轻量实时目标检测模型
<details>
<summary><b> 简介(点击展开)</b></summary>
全新的轻量级系列模型PP-PicoDet在移动端具有卓越的性能成为全新SOTA轻量级模型。
`传送门`[PP-PicoDet说明](configs/picodet/README.md)。
`传送门`[arXiv论文](https://arxiv.org/abs/2111.00902)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 模型名称 | COCO精度mAP | 骁龙865 四线程速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 |
| :-------- | :-------------: | :---------------------: | :------------: | :--------------------------------------------------: | :----------------------------------------------------------------------------------: |
| PicoDet-L | 36.1 | 39.7 | 移动端、嵌入式 | [链接](configs/picodet/picodet_l_320_coco_lcnet.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) |
`传送门`[全部预训练模型](configs/picodet/README.md)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| -------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| 智慧城市 | 道路垃圾检测 | 通过在市政环卫车辆上安装摄像头对路面垃圾检测并分析,实现对路面遗撒的垃圾进行监控,记录并通知环卫人员清理,大大提升了环卫人效。 | [基于PP-PicoDet的路面垃圾检测](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0) |
</details>
### 📡PP-Tracking 实时多目标跟踪系统
<details>
<summary><b> 简介(点击展开)</b></summary>
PaddleDetection团队提供了实时多目标跟踪系统PP-Tracking是基于PaddlePaddle深度学习框架的业界首个开源的实时多目标跟踪系统具有模型丰富、应用广泛和部署高效三大优势。 PP-Tracking支持单镜头跟踪(MOT)和跨镜头跟踪(MTMCT)两种模式针对实际业务的难点和痛点提供了行人跟踪、车辆跟踪、多类别跟踪、小目标跟踪、流量统计以及跨镜头跟踪等各种多目标跟踪功能和应用部署方式支持API调用和GUI可视化界面部署语言支持Python和C++部署平台环境支持Linux、NVIDIA Jetson等。
`传送门`[PP-Tracking说明](configs/mot/README.md)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 模型名称 | 模型简介 | 精度 | 速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 |
| :-------- | :----------------------------------: | :--------------------: | :-------: | :--------------------: | :--------------------------------------------------------: | :------------------------------------------------------------------------------------------------: |
| ByteTrack | SDE多目标跟踪算法 仅包含检测模型 | MOT-17 test: 78.4 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/bytetrack/bytetrack_yolox.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_det.pdparams) |
| FairMOT | JDE多目标跟踪算法 多任务联合学习方法 | MOT-16 test: 75.0 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [下载地址](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) |
| OC-SORT | SDE多目标跟踪算法 仅包含检测模型 | MOT-17 half val: 75.5 | - | 服务器、移动端、嵌入式 | [链接](configs/mot/ocsort/ocsort_yolox.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/mot/yolox_x_24e_800x1440_mix_mot_ch.pdparams) |
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| ---- | ---------- | -------------------------- | ---------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| 通用 | 多目标跟踪 | 快速上手单镜头、多镜头跟踪 | [PP-Tracking之手把手玩转多目标跟踪](https://aistudio.baidu.com/aistudio/projectdetail/3022582) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/3022582) |
</details>
### ⛷PP-TinyPose 人体骨骼关键点识别
<details>
<summary><b> 简介(点击展开)</b></summary>
PaddleDetection 中的关键点检测部分紧跟最先进的算法,包括 Top-Down 和 Bottom-Up 两种方法可以满足用户的不同需求。同时PaddleDetection 提供针对移动端设备优化的自研实时关键点检测模型 PP-TinyPose。
`传送门`[PP-TinyPose说明](configs/keypoint/tiny_pose)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 模型名称 | 模型简介 | COCO精度AP | 速度(FPS) | 推荐部署硬件 | 配置文件 | 模型下载 |
| :---------: | :----------------------------------: | :------------: | :-----------------------: | :------------: | :-----------------------------------------------------: | :--------------------------------------------------------------------------------------: |
| PP-TinyPose | 轻量级关键点算法<br/>输入尺寸256x192 | 68.8 | 骁龙865 四线程: 158.7 FPS | 移动端、嵌入式 | [链接](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [下载地址](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) |
`传送门`[全部预训练模型](configs/keypoint/README.md)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| ---- | ---- | ---------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| 运动 | 健身 | 提供从模型选型、数据准备、模型训练优化到后处理逻辑和模型部署的全流程可复用方案有效解决了复杂健身动作的高效识别打造AI虚拟健身教练 | [基于PP-TinyPose增强版的智能健身动作识别](https://aistudio.baidu.com/aistudio/projectdetail/4385813) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4385813) |
</details>
### 🏃🏻PP-Human 实时行人分析工具
<details>
<summary><b> 简介(点击展开)</b></summary>
PaddleDetection深入探索核心行业的高频场景提供了行人开箱即用分析工具支持图片/单镜头视频/多镜头视频/在线视频流多种输入方式广泛应用于智慧交通、智慧城市、工业巡检等领域。支持服务器端部署及TensorRT加速T4服务器上可达到实时。
PP-Human支持四大产业级功能五大异常行为识别、26种人体属性分析、实时人流计数、跨镜头ReID跟踪。
`传送门`[PP-Human行人分析工具使用指南](deploy/pipeline/README.md)。
</details>
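PP-Human 以 deploy/pipeline 下的脚本方式运行,下面是一个从 Python 中调用该流水线的示意(参数与配置文件以 deploy/pipeline 文档为准,视频路径为假设值):
```python
# 调用 PP-Human 行人分析流水线的示意
import subprocess

subprocess.run([
    'python', 'deploy/pipeline/pipeline.py',
    '--config', 'deploy/pipeline/config/infer_cfg_pphuman.yml',  # 行人分析默认配置
    '--video_file', 'test_video.mp4',                            # 假设的输入视频
    '--device', 'gpu',
], check=True)
```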
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 任务 | T4 TensorRT FP16: 速度FPS | 推荐部署硬件 | 模型下载 | 模型体积 |
| :----------------: | :---------------------------: | :----------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------: |
| 行人检测(高精度) | 39.8 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 行人跟踪(高精度) | 31.4 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 属性识别(高精度) | 单人 117.6 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_small_person_attribute_954_infer.zip) | 目标检测182M<br>属性识别86M |
| 摔倒识别 | 单人 100 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [关键点检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [基于关键点行为识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | 多目标跟踪182M<br>关键点检测101M<br>基于关键点行为识别21.8M |
| 闯入识别 | 31.4 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| 打架识别 | 50.8 | 服务器 | [视频分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M |
| 抽烟识别 | 340.1 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | 目标检测182M<br>基于人体id的目标检测27M |
| 打电话识别 | 166.7 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[基于人体id的图像分类](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | 目标检测182M<br>基于人体id的图像分类45M |
`传送门`[完整预训练模型](deploy/pipeline/README.md)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| 智能安防 | 摔倒检测 | 飞桨行人分析PP-Human中提供的摔倒识别算法采用了关键点+时空图卷积网络的技术,对摔倒姿势无限制、背景环境无要求。 | [基于PP-Human v2的摔倒检测](https://aistudio.baidu.com/aistudio/projectdetail/4606001) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4606001) |
| 智能安防 | 打架识别 | 本项目基于PaddleVideo视频开发套件训练打架识别模型然后将训练好的模型集成到PaddleDetection的PP-Human中助力行人行为分析。 | [基于PP-Human的打架识别](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1) |
| 智能安防 | 摔倒检测 | 基于PP-Human完成来客分析整体流程。使用PP-Human完成来客分析中非常常见的场景 1. 来客属性识别(单镜和跨境可视化2. 来客行为识别(摔倒识别)。 | [基于PP-Human的来客分析案例教程](https://aistudio.baidu.com/aistudio/projectdetail/4537344) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4537344) |
</details>
### 🏎PP-Vehicle 实时车辆分析工具
<details>
<summary><b> 简介(点击展开)</b></summary>
PaddleDetection深入探索核心行业的高频场景提供了车辆开箱即用分析工具支持图片/单镜头视频/多镜头视频/在线视频流多种输入方式广泛应用于智慧交通、智慧城市、工业巡检等领域。支持服务器端部署及TensorRT加速T4服务器上可达到实时。
PP-Vehicle囊括四大交通场景核心功能车牌识别、属性识别、车流量统计、违章检测。
`传送门`[PP-Vehicle车辆分析工具指南](deploy/pipeline/README.md)。
</details>
<details>
<summary><b> 预训练模型(点击展开)</b></summary>
| 任务 | T4 TensorRT FP16: 速度(FPS) | 推荐部署硬件 | 模型方案 | 模型体积 |
| :----------------: | :-------------------------: | :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------: |
| 车辆检测(高精度) | 38.9 | 服务器 | [目标检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
| 车辆跟踪(高精度) | 25 | 服务器 | [多目标跟踪](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
| 车牌识别 | 213.7 | 服务器 | [车牌检测](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz) <br> [车牌识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | 车牌检测3.9M <br> 车牌字符识别: 12M |
| 车辆属性 | 136.8 | 服务器 | [属性识别](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M |
`传送门`[完整预训练模型](deploy/pipeline/README.md)。
</details>
<details>
<summary><b> 产业应用代码示例(点击展开)</b></summary>
| 行业 | 类别 | 亮点 | 文档说明 | 模型下载 |
| -------- | ---------------- | ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| 智慧交通 | 交通监控车辆分析 | 本项目基于PP-Vehicle演示智慧交通中最刚需的车流量监控、车辆违停检测以及车辆结构化车牌、车型、颜色分析三大场景。 | [基于PP-Vehicle的交通监控分析系统](https://aistudio.baidu.com/aistudio/projectdetail/4512254) | [下载链接](https://aistudio.baidu.com/aistudio/projectdetail/4512254) |
</details>
## 💡产业实践范例
产业实践范例是PaddleDetection针对高频目标检测应用场景提供的端到端开发示例帮助开发者打通数据标注-模型训练-模型调优-预测部署全流程。
针对每个范例我们都通过[AI-Studio](https://ai.baidu.com/ai-doc/AISTUDIO/Tk39ty6ho)提供了项目代码以及说明,用户可以同步运行体验。
`传送门`[产业实践范例完整列表](industrial_tutorial/README.md)
- [基于PP-YOLOE-R的旋转框检测](https://aistudio.baidu.com/aistudio/projectdetail/5058293)
- [基于PP-YOLOE-SOD的无人机航拍图像检测](https://aistudio.baidu.com/aistudio/projectdetail/5036782)
- [基于PP-Vehicle的交通监控分析系统](https://aistudio.baidu.com/aistudio/projectdetail/4512254)
- [基于PP-Human v2的摔倒检测](https://aistudio.baidu.com/aistudio/projectdetail/4606001)
- [基于PP-TinyPose增强版的智能健身动作识别](https://aistudio.baidu.com/aistudio/projectdetail/4385813)
- [基于PP-Human的打架识别](https://aistudio.baidu.com/aistudio/projectdetail/4086987?contributionType=1)
- [基于Faster-RCNN的瓷砖表面瑕疵检测](https://aistudio.baidu.com/aistudio/projectdetail/2571419)
- [基于PaddleDetection的PCB瑕疵检测](https://aistudio.baidu.com/aistudio/projectdetail/2367089)
- [基于FairMOT实现人流量统计](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
- [基于YOLOv3实现跌倒检测](https://aistudio.baidu.com/aistudio/projectdetail/2500639)
- [基于PP-PicoDetv2 的路面垃圾检测](https://aistudio.baidu.com/aistudio/projectdetail/3846170?channelType=0&channel=0)
- [基于人体关键点检测的合规检测](https://aistudio.baidu.com/aistudio/projectdetail/4061642?contributionType=1)
- [基于PP-Human的来客分析案例教程](https://aistudio.baidu.com/aistudio/projectdetail/4537344)
- 持续更新中...
## 🏆企业应用案例
企业应用案例是企业在实际生产环境下落地应用PaddleDetection的方案思路。相比产业实践范例其更多强调整体方案设计思路可供开发者在项目方案设计中做参考。
`传送门`[企业应用案例完整列表](https://www.paddlepaddle.org.cn/customercase)
- [中国南方电网——变电站智慧巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2330)
- [国铁电气——轨道在线智能巡检系统](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2280)
- [京东物流——园区车辆行为识别](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2611)
- [中兴克拉—厂区传统仪表统计监测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2618)
- [宁德时代—动力电池高精度质量检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2609)
- [中国科学院空天信息创新研究院——高尔夫球场遥感监测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2483)
- [御航智能——基于边缘的无人机智能巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2481)
- [普宙无人机——高精度森林巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2121)
- [领邦智能——红外无感测温监控](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2615)
- [北京地铁——口罩检测](https://mp.weixin.qq.com/s/znrqaJmtA7CcjG0yQESWig)
- [音智达——工厂人员违规行为检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2288)
- [华夏天信——输煤皮带机器人智能巡检](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2331)
- [优恩物联网——社区住户分类支持广告精准投放](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2485)
- [螳螂慧视——室内3D点云场景物体分割与检测](https://www.paddlepaddle.org.cn/support/news?action=detail&id=2599)
- 持续更新中...
## 📝许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
## 📌引用
```
@misc{ppdet2019,
title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
year={2019}
}
```
View File
@@ -0,0 +1,541 @@
[简体中文](README_cn.md) | English
<div align="center">
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/160532560-34cf7a1f-d950-435e-90d2-4b0a679e5119.png" align="middle" width = "800" />
</p>
**A High-Efficient Development Toolkit for Object Detection based on [PaddlePaddle](https://github.com/paddlepaddle/paddle)**
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleDetection/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleDetection?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleDetection/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleDetection?color=ccf"></a>
</p>
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/205581915-aa8d6bee-5624-4aec-8059-76b5ebaf96f1.gif" width="800"/>
</div>
## <img src="https://user-images.githubusercontent.com/48054808/157793354-6e7f381a-0aa6-4bb7-845c-9acf2ecc05c3.png" width="20"/> Product Update
- 🔥 **2022.11.15: SOTA rotated object detector and small object detector based on PP-YOLOE**
- Rotated object detector [PP-YOLOE-R](configs/rotate/ppyoloe_r)
- SOTA Anchor-free rotated object detection model with high accuracy and efficiency
- A series of models, named s/m/l/x, for cloud and edge devices
- Avoids special operators so models can be easily deployed with TensorRT.
- Small object detector [PP-YOLOE-SOD](configs/smalldet)
- End-to-end detection pipeline based on sliced images
- SOTA model on VisDrone based on original images.
- 2022.8.26: PaddleDetection releases [release/2.5 version](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5)
- 🗳 Model features
- Release [PP-YOLOE+](configs/ppyoloe): Increased accuracy by a maximum of 2.4% mAP to 54.9% mAP, 3.75 times faster model training convergence rate, and up to 2.3 times faster end-to-end inference speed; improved generalization for multiple downstream tasks
- Release [PicoDet-NPU](configs/picodet) model which supports full quantization deployment of models; add [PicoDet](configs/picodet) layout analysis model
- Release [PP-TinyPose Plus](./configs/keypoint/tiny_pose/). With 9.1% AP accuracy improvement in physical exercise, dance, and other scenarios, our PP-TinyPose Plus supports unconventional movements such as turning to one side, lying down, jumping, and high lifts
- 🔮 Functions in different scenarios
- Release the pedestrian analysis tool [PP-Human v2](./deploy/pipeline). It introduces four new behavior recognition capabilities: fighting, telephoning, smoking, and trespassing. The underlying algorithm performance is optimized, covering three core algorithm capabilities: detection, tracking, and attribute recognition of pedestrians. Our model provides end-to-end development and model optimization strategies for beginners and supports online video streaming input.
- First release [PP-Vehicle](./deploy/pipeline), which has four major functions: license plate recognition, vehicle attribute analysis (color, model), traffic flow statistics, and violation detection. It is compatible with input formats, including pictures, online video streaming, and video. And we also offer our users a comprehensive set of tutorials for customization.
- 💡 Cutting-edge algorithms
- Release [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO), which covers classic and the latest models of the [YOLO family](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/docs/MODEL_ZOO_en.md): YOLOv3, PP-YOLOE (a real-time high-precision object detection model developed by Baidu PaddlePaddle), and cutting-edge detection algorithms such as YOLOv4, YOLOv5, YOLOX, YOLOv6, YOLOv7 and YOLOv8
- Newly add a high-precision detection model based on the [ViT](configs/vitdet) backbone network with 55.7% mAP on the COCO dataset; newly add the multi-object tracking model [OC-SORT](configs/mot/ocsort); newly add the [ConvNeXt](configs/convnext) backbone network.
- 📋 Industrial applications: Newly add [Smart Fitness](https://aistudio.baidu.com/aistudio/projectdetail/4385813), [Fighting recognition](https://aistudio.baidu.com/aistudio/projectdetail/4086987?channelType=0&channel=0), and [Visitor Analysis](https://aistudio.baidu.com/aistudio/projectdetail/4230123?channelType=0&channel=0).
- 2022.3.24: PaddleDetection released [release/2.4 version](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4)
- Release the high-performance SOTA object detection model [PP-YOLOE](configs/ppyoloe). It integrates cloud and edge devices and provides S/M/L/X versions. In particular, the L version reaches 51.4% mAP on the COCO test 2017 dataset and 78.1 FPS inference speed on a single Tesla V100. It supports mixed precision training and trains 33% faster than PP-YOLOv2. Its full range of model sizes can meet different hardware compute requirements and is adaptable to server GPUs, edge-device GPUs and other AI accelerator cards.
- Release the ultra-lightweight SOTA object detection model [PP-PicoDet Plus](configs/picodet) with a 2% improvement in accuracy and a 63% improvement in CPU inference speed. Add the PicoDet-XS model with only 0.7M parameters, and provide model sparsification and quantization for acceleration. No dedicated post-processing module is required on any hardware, simplifying deployment.
- Release the real-time pedestrian analysis tool [PP-Human](deploy/pphuman). It has four major functions: pedestrian tracking, visitor flow statistics, human attribute recognition and falling detection. Falling detection is optimized on real-life data and accurately recognizes various falling postures, adapting to different backgrounds, lighting conditions and camera angles.
- Add the [YOLOX](configs/yolox) object detection model with nano/tiny/S/M/L/X sizes. The X version reaches 51.8% mAP on the COCO val2017 dataset.
- [More releases](https://github.com/PaddlePaddle/PaddleDetection/releases)
## <img title="" src="https://user-images.githubusercontent.com/48054808/157795569-9fc77c85-732f-4870-9be0-99a7fe2cff27.png" alt="" width="20"> Brief Introduction
**PaddleDetection** is an end-to-end object detection development kit based on PaddlePaddle. Providing **over 30 model algorithms** and **over 300 pre-trained models**, it covers object detection, instance segmentation, keypoint detection and multi-object tracking. In particular, PaddleDetection offers **high-performance and lightweight** industrial SOTA models for **server and mobile** devices, champion solutions and cutting-edge algorithms. PaddleDetection provides various data augmentation methods, configurable network components, loss functions and other advanced optimization & deployment schemes. In addition to covering the whole process of data processing, model development, training, compression and deployment, it also provides rich cases and tutorials to accelerate the industrial application of these algorithms.
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/189122825-ee1c1db2-b5f9-42c0-88b4-7975e1ec239d.gif" width="800"/>
</div>
## <img src="https://user-images.githubusercontent.com/48054808/157799599-e6a66855-bac6-4e75-b9c0-96e13cb9612f.png" width="20"/> Features
- **Rich model library**: PaddleDetection provides over 250 pre-trained models including **object detection, instance segmentation, face recognition, multi-object tracking**. It covers a variety of **global competition champion** schemes.
- **Simple to use**: Modular design, decoupling each network component, easy for developers to build and try various detection models and optimization strategies, quick access to high-performance, customized algorithm.
- **Getting Through End to End**: PaddleDetection covers the whole pipeline end to end, from data augmentation, model construction, training and compression to deployment. It also supports multi-architecture, multi-device deployment for **cloud and edge** devices.
- **High Performance**: Thanks to the high-performance PaddlePaddle core, PaddleDetection has clear advantages in training speed and memory occupation. It also supports FP16 training and multi-machine training.
<div align="center">
<img src="https://user-images.githubusercontent.com/22989727/202131382-45fd2de6-3805-460e-a70c-66db7188d37c.png" width="800"/>
</div>
## <img title="" src="https://user-images.githubusercontent.com/48054808/157800467-2a9946ad-30d1-49a9-b9db-ba33413d9c90.png" alt="" width="20"> Exchanges
- If you have any question or suggestion, please give us your valuable input via [GitHub Issues](https://github.com/PaddlePaddle/PaddleDetection/issues)
Welcome to join PaddleDetection user groups on WeChat (scan the QR code, add and reply "D" to the assistant)
<div align="center">
<img src="https://user-images.githubusercontent.com/34162360/177678712-4655747d-4290-4ad9-b7a1-4564a5418ac6.jpg" width = "200" />
</div>
## <img src="https://user-images.githubusercontent.com/48054808/157827140-03ffaff7-7d14-48b4-9440-c38986ea378c.png" width="20"/> Kit Structure
<table align="center">
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Architectures</b>
</td>
<td>
<b>Backbones</b>
</td>
<td>
<b>Components</b>
</td>
<td>
<b>Data Augmentation</b>
</td>
</tr>
<tr valign="top">
<td>
<ul>
<details><summary><b>Object Detection</b></summary>
<ul>
<li>Faster RCNN</li>
<li>FPN</li>
<li>Cascade-RCNN</li>
<li>PSS-Det</li>
<li>RetinaNet</li>
<li>YOLOv3</li>
<li>YOLOF</li>
<li>YOLOX</li>
<li>YOLOv5</li>
<li>YOLOv6</li>
<li>YOLOv7</li>
<li>YOLOv8</li>
<li>RTMDet</li>
<li>PP-YOLO</li>
<li>PP-YOLO-Tiny</li>
<li>PP-PicoDet</li>
<li>PP-YOLOv2</li>
<li>PP-YOLOE</li>
<li>PP-YOLOE+</li>
<li>PP-YOLOE-SOD</li>
<li>PP-YOLOE-R</li>
<li>SSD</li>
<li>CenterNet</li>
<li>FCOS</li>
<li>FCOSR</li>
<li>TTFNet</li>
<li>TOOD</li>
<li>GFL</li>
<li>GFLv2</li>
<li>DETR</li>
<li>Deformable DETR</li>
<li>Swin Transformer</li>
<li>Sparse RCNN</li>
</ul></details>
<details><summary><b>Instance Segmentation</b></summary>
<ul>
<li>Mask RCNN</li>
<li>Cascade Mask RCNN</li>
<li>SOLOv2</li>
</ul></details>
<details><summary><b>Face Detection</b></summary>
<ul>
<li>BlazeFace</li>
</ul></details>
<details><summary><b>Multi-Object-Tracking</b></summary>
<ul>
<li>JDE</li>
<li>FairMOT</li>
<li>DeepSORT</li>
<li>ByteTrack</li>
<li>OC-SORT</li>
<li>BoT-SORT</li>
<li>CenterTrack</li>
</ul></details>
<details><summary><b>KeyPoint-Detection</b></summary>
<ul>
<li>HRNet</li>
<li>HigherHRNet</li>
<li>Lite-HRNet</li>
<li>PP-TinyPose</li>
</ul></details>
</ul>
</td>
<td>
<details><summary><b>Details</b></summary>
<ul>
<li>ResNet(&vd)</li>
<li>Res2Net(&vd)</li>
<li>CSPResNet</li>
<li>SENet</li>
<li>Res2Net</li>
<li>HRNet</li>
<li>Lite-HRNet</li>
<li>DarkNet</li>
<li>CSPDarkNet</li>
<li>MobileNetv1/v3</li>
<li>ShuffleNet</li>
<li>GhostNet</li>
<li>BlazeNet</li>
<li>DLA</li>
<li>HardNet</li>
<li>LCNet</li>
<li>ESNet</li>
<li>Swin-Transformer</li>
<li>ConvNeXt</li>
<li>Vision Transformer</li>
</ul></details>
</td>
<td>
<details><summary><b>Common</b></summary>
<ul>
<li>Sync-BN</li>
<li>Group Norm</li>
<li>DCNv2</li>
<li>EMA</li>
</ul> </details>
</ul>
<details><summary><b>KeyPoint</b></summary>
<ul>
<li>DarkPose</li>
</ul></details>
</ul>
<details><summary><b>FPN</b></summary>
<ul>
<li>BiFPN</li>
<li>CSP-PAN</li>
<li>Custom-PAN</li>
<li>ES-PAN</li>
<li>HRFPN</li>
</ul> </details>
</ul>
<details><summary><b>Loss</b></summary>
<ul>
<li>Smooth-L1</li>
<li>GIoU/DIoU/CIoU</li>
<li>IoUAware</li>
<li>Focal Loss</li>
<li>CT Focal Loss</li>
<li>VariFocal Loss</li>
</ul> </details>
</ul>
<details><summary><b>Post-processing</b></summary>
<ul>
<li>SoftNMS</li>
<li>MatrixNMS</li>
</ul> </details>
</ul>
<details><summary><b>Speed</b></summary>
<ul>
<li>FP16 training</li>
<li>Multi-machine training </li>
</ul> </details>
</ul>
</td>
<td>
<details><summary><b>Details</b></summary>
<ul>
<li>Resize</li>
<li>Lighting</li>
<li>Flipping</li>
<li>Expand</li>
<li>Crop</li>
<li>Color Distort</li>
<li>Random Erasing</li>
<li>Mixup </li>
<li>AugmentHSV</li>
<li>Mosaic</li>
<li>Cutmix </li>
<li>Grid Mask</li>
<li>Auto Augment</li>
<li>Random Perspective</li>
</ul> </details>
</td>
</tr>
</td>
</tr>
</tbody>
</table>
## <img src="https://user-images.githubusercontent.com/48054808/157801371-9a9a8c65-1690-4123-985a-e0559a7f9494.png" width="20"/> Model Performance
<details>
<summary><b> Performance comparison of Cloud models</b></summary>
The comparison of COCO mAP and FPS on Tesla V100 for representative models of each architecture and backbone.
<div align="center">
<img src="docs/images/fps_map.png" />
</div>
**Clarification**
- `ViT` stands for `ViT-Cascade-Faster-RCNN`, which has the highest COCO mAP of 55.7%
- `Cascade-Faster-RCNN` stands for `Cascade-Faster-RCNN-ResNet50vd-DCN`, which has been optimized in PaddleDetection to reach 20 FPS inference speed at 47.8% COCO mAP
- `PP-YOLOE` is an optimized `PP-YOLO v2`. It reaches 51.4% mAP on the COCO dataset and 78.1 FPS inference speed on Tesla V100
- `PP-YOLOE+` is an optimized `PP-YOLOE`. It reaches 53.3% mAP on the COCO dataset and 78.1 FPS inference speed on Tesla V100
- The models in the figure are available in the [model library](#模型库)
</details>
<details>
<summary><b> Performance comparison on mobile devices</b></summary>
The comparison of COCO mAP and FPS on a Qualcomm Snapdragon 865 processor for models on mobile devices.
<div align="center">
<img src="docs/images/mobile_fps_map.png" width=600/>
</div>
**Clarification**
- Tests were conducted on a Qualcomm Snapdragon 865 (4\*A77 + 4\*A55) processor with batch_size=1, 4 threads, and the NCNN inference library; the test script is available at [MobileDetBenchmark](https://github.com/JiweiMaster/MobileDetBenchmark)
- [PP-PicoDet](configs/picodet) and [PP-YOLO-Tiny](configs/ppyolo) are self-developed models of PaddleDetection, and other models are not tested yet.
</details>
## <img src="https://user-images.githubusercontent.com/48054808/157829890-a535b8a6-631c-4c87-b861-64d4b32b2d6a.png" width="20"/> Model libraries
<details>
<summary><b> 1. General detection</b></summary>
#### PP-YOLOE series (recommended scenarios: cloud GPUs such as NVIDIA V100 and T4, and edge devices such as the Jetson series)
| Model | COCO AccuracymAP | V100 TensorRT FP16 Speed(FPS) | Configuration | Download |
|:---------- |:------------------:|:-----------------------------:|:-------------------------------------------------------:|:----------------------------------------------------------------------------------------:|
| PP-YOLOE+_s | 43.9 | 333.3 | [link](configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams) |
| PP-YOLOE+_m | 50.0 | 208.3 | [link](configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_m_80e_coco.pdparams) |
| PP-YOLOE+_l | 53.3 | 149.2 | [link](configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams) |
| PP-YOLOE+_x | 54.9 | 95.2 | [link](configs/ppyoloe/ppyoloe_plus_crn_x_80e_coco.yml) | [download](https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_x_80e_coco.pdparams) |
#### PP-PicoDet series (recommended scenarios: mobile chips and x86 CPU devices, such as ARM CPU (RK3399, Raspberry Pi) and NPU (BITMAIN))
| Model | COCO AccuracymAP | Snapdragon 865 four-thread speed (ms) | Configuration | Download |
|:---------- |:------------------:|:-------------------------------------:|:-----------------------------------------------------:|:-------------------------------------------------------------------------------------:|
| PicoDet-XS | 23.5 | 7.81 | [Link](configs/picodet/picodet_xs_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_xs_320_coco_lcnet.pdparams) |
| PicoDet-S | 29.1 | 9.56 | [Link](configs/picodet/picodet_s_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_s_320_coco_lcnet.pdparams) |
| PicoDet-M | 34.4 | 17.68 | [Link](configs/picodet/picodet_m_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_m_320_coco_lcnet.pdparams) |
| PicoDet-L | 36.1 | 25.21 | [Link](configs/picodet/picodet_l_320_coco_lcnet.yml) | [Download](https://paddledet.bj.bcebos.com/models/picodet_l_320_coco_lcnet.pdparams) |
#### [Frontier detection algorithm](docs/feature_models/PaddleYOLO_MODEL.md)
| Model | COCO AccuracymAP | V100 TensorRT FP16 speed(FPS) | Configuration | Download |
|:-------- |:------------------:|:-----------------------------:|:--------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------:|
| [YOLOX-l](configs/yolox) | 50.1 | 107.5 | [Link](configs/yolox/yolox_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) |
| [YOLOv5-l](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5) | 48.6 | 136.0 | [Link](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov5/yolov5_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) |
| [YOLOv7-l](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7) | 51.0 | 135.0 | [Link](https://github.com/PaddlePaddle/PaddleYOLO/tree/develop/configs/yolov7/yolov7_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) |
#### Other general purpose models [doc](docs/MODEL_ZOO_en.md)
</details>
<details>
<summary><b> 2. Instance segmentation</b></summary>
| Model | Introduction | Recommended Scenarios | COCO Accuracy(mAP) | Configuration | Download |
|:----------------- |:-------------------------------------------------------- |:--------------------------------------------- |:--------------------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:|
| Mask RCNN | Two-stage instance segmentation algorithm | <div style="width: 50pt">Edge-Cloud end</div> | box AP: 41.4 <br/> mask AP: 37.5 | [Link](configs/mask_rcnn/mask_rcnn_r50_vd_fpn_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_2x_coco.pdparams) |
| Cascade Mask RCNN | Two-stage instance segmentation algorithm | <div style="width: 50pt">Edge-Cloud end</div> | box AP: 45.7 <br/> mask AP: 39.7 | [Link](configs/mask_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) |
| SOLOv2 | Lightweight single-stage instance segmentation algorithm | <div style="width: 50pt">Edge-Cloud end</div> | mask AP: 38.0 | [Link](configs/solov2/solov2_r50_fpn_3x_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) |
</details>
<details>
<summary><b> 3. Keypoint detection</b></summary>
| Model | Introduction | Recommended scenarios | COCO AccuracyAP | Speed | Configuration | Download |
|:-------------------- |:--------------------------------------------------------------------------------------------- |:--------------------------------------------- |:-----------------:|:---------------------------------:|:---------------------------------------------------------:|:-------------------------------------------------------------------------------------------:|
| HRNet-w32 + DarkPose | <div style="width: 130pt">Top-down Keypoint detection algorithm<br/>Input size: 384x288</div> | <div style="width: 50pt">Edge-Cloud end</div> | 78.3 | T4 TensorRT FP16 2.96ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_384x288.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) |
| HRNet-w32 + DarkPose | Top-down Keypoint detection algorithm<br/>Input size: 256x192 | Edge-Cloud end | 78.0 | T4 TensorRT FP16 1.75ms | [Link](configs/keypoint/hrnet/dark_hrnet_w32_256x192.yml) | [Download](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) |
| PP-TinyPose | Light-weight keypoint algorithm<br/>Input size: 256x192 | Mobile | 68.8 | Snapdragon 865 four-thread 6.30ms | [Link](configs/keypoint/tiny_pose/tinypose_256x192.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) |
| PP-TinyPose | Light-weight keypoint algorithm<br/>Input size: 128x96 | Mobile | 58.1 | Snapdragon 865 four-thread 2.37ms | [Link](configs/keypoint/tiny_pose/tinypose_128x96.yml) | [Download](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) |
#### Other keypoint detection models [doc](configs/keypoint)
</details>
<details>
<summary><b> 4. Multi-object tracking PP-Tracking</b></summary>
| Model | Introduction | Recommended scenarios | Accuracy | Configuration | Download |
|:--------- |:------------------------------------------------------------- |:--------------------- |:----------------------:|:-----------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:|
| ByteTrack | SDE Multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-17 half val: 77.3 | [Link](configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_det.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/deepsort/yolox_x_24e_800x1440_mix_det.pdparams) |
| FairMOT | JDE multi-object tracking algorithm multi-task learning | Edge-Cloud end | MOT-16 test: 75.0 | [Link](configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) | [Download](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) |
| OC-SORT | SDE multi-object tracking algorithm with detection model only | Edge-Cloud end | MOT-17 half val: 75.5 | [Link](configs/mot/ocsort/ocsort_yolox.yml) | - |
#### Other multi-object tracking models [docs](configs/mot)
</details>
<details>
<summary><b> 5. Industrial real-time pedestrian analysis tool: PP-Human</b></summary>
| Task | End-to-End Speedms | Model | Size |
|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:|
| Pedestrian detection (high precision) | 25.1ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| Pedestrian detection (lightweight) | 16.2ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| Pedestrian tracking (high precision) | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| Pedestrian tracking (lightweight) | 21.0ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| Attribute recognition (high precision) | Single person: 8.5ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M |
| Attribute recognition (lightweight) | Single person: 7.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip) | Object detection: 182M<br>Attribute recognition: 86M |
| Falling detection | Single person: 10ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking: 182M<br>Keypoint detection: 101M<br>Behavior detection based on key points: 21.8M |
| Intrusion detection | 31.8ms | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 182M |
| Fighting detection | 19.7ms | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) | 90M |
| Smoking detection | Single person: 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Object detection based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip) | Object detection: 182M<br>Object detection based on Human ID: 27M |
| Phoning detection | Single person ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip) | Object detection: 182M<br>Image classification based on Human ID: 45M |
Please refer to [docs](deploy/pipeline/README_en.md) for details.
</details>
<details>
<summary><b> 6. Industrial real-time vehicle analysis tool: PP-Vehicle</b></summary>
| Task | End-to-End Speed (ms) | Model | Size |
|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:|
| Vehicle detection (high precision) | 25.7ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
| Vehicle detection (lightweight) | 13.2ms | [object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_ppvehicle.zip) | 27M |
| Vehicle tracking (high precision) | 40ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_ppvehicle.zip) | 182M |
| Vehicle tracking (lightweight) | 25ms | [multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip) | 27M |
| Plate Recognition | 4.68ms | [plate detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_det_infer.tar.gz)<br>[plate recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/ch_PP-OCRv3_rec_infer.tar.gz) | Plate detection: 3.9M<br>Plate recognition: 12M |
| Vehicle attribute | 7.31ms | [attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/vehicle_attribute_model.zip) | 7.2M |
Please refer to [docs](deploy/pipeline/README_en.md) for details.
</details>
## <img src="https://user-images.githubusercontent.com/48054808/157828296-d5eb0ccb-23ea-40f5-9957-29853d7d13a9.png" width="20"/> Document tutorials
### Introductory tutorials
- [Installation](docs/tutorials/INSTALL_cn.md)
- [Quick start](docs/tutorials/QUICK_STARTED_cn.md)
- [Data preparation](docs/tutorials/data/README.md)
- [Getting Started on PaddleDetection](docs/tutorials/GETTING_STARTED_cn.md)
- [FAQ](docs/tutorials/FAQ)
### Advanced tutorials
- Configuration
- [RCNN Configuration](docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md)
- [PP-YOLO Configuration](docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md)
- Compression based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
- [Pruning/Quantization/Distillation Tutorial](configs/slim)
- [Inference deployment](deploy/README.md)
- [Export model for inference](deploy/EXPORT_MODEL.md)
- [Paddle Inference deployment](deploy/README.md)
- [Inference deployment with Python](deploy/python)
- [Inference deployment with C++](deploy/cpp)
- [Paddle-Lite deployment](deploy/lite)
- [Paddle Serving deployment](deploy/serving)
- [ONNX model export](deploy/EXPORT_ONNX_MODEL.md)
- [Inference benchmark](deploy/BENCHMARK_INFER.md)
- Advanced development
- [Data processing module](docs/advanced_tutorials/READER.md)
- [New object detection models](docs/advanced_tutorials/MODEL_TECHNICAL.md)
- Customization
- [Object detection](docs/advanced_tutorials/customization/detection.md)
- [Keypoint detection](docs/advanced_tutorials/customization/keypoint_detection.md)
- [Multiple object tracking](docs/advanced_tutorials/customization/pphuman_mot.md)
- [Action recognition](docs/advanced_tutorials/customization/action_recognotion/)
- [Attribute recognition](docs/advanced_tutorials/customization/pphuman_attribute.md)
### Courses
- **[Theoretical foundation] [Object detection 7-day camp](https://aistudio.baidu.com/aistudio/education/group/info/1617):** overview of object detection tasks, details of the RCNN and YOLO series of object detection algorithms, PP-YOLO optimization strategies and case sharing, and an introduction to and practice with anchor-free algorithms
- **[Industrial application] [AI Fast Track industrial object detection technology and application](https://aistudio.baidu.com/aistudio/education/group/info/23670):** state-of-the-art object detection algorithms, the real-time pedestrian analysis system PP-Human, and a breakdown and practice of industrial object detection applications
- **[Industrial features] 2022.3.26 [Smart City Industry Seven-Day Class](https://aistudio.baidu.com/aistudio/education/group/info/25620):** urban planning, urban governance, smart governance services, traffic management, and community governance
- **[Academic exchange] 2022.9.27 [YOLO Vision Event](https://www.youtube.com/playlist?list=PL1FZnkj4ad1NHVC7CMc3pjSQ-JRK-Ev6O):** PaddleDetection was invited to the first YOLO-themed event to exchange ideas with computer vision experts from around the world
### [Industrial tutorial examples](./industrial_tutorial/README.md)
- [Rotated object detection based on PP-YOLOE-R](https://aistudio.baidu.com/aistudio/projectdetail/5058293)
- [Aerial image detection based on PP-YOLOE-SOD](https://aistudio.baidu.com/aistudio/projectdetail/5036782)
- [Fall down recognition based on PP-Human v2](https://aistudio.baidu.com/aistudio/projectdetail/4606001)
- [Intelligent fitness recognition based on PP-TinyPose Plus](https://aistudio.baidu.com/aistudio/projectdetail/4385813)
- [Road litter detection based on PP-PicoDet Plus](https://aistudio.baidu.com/aistudio/projectdetail/3561097)
- [Visitor flow statistics based on FairMOT](https://aistudio.baidu.com/aistudio/projectdetail/2421822)
- [Guest analysis based on PP-Human](https://aistudio.baidu.com/aistudio/projectdetail/4537344)
- [More examples](./industrial_tutorial/README.md)
## <img title="" src="https://user-images.githubusercontent.com/48054808/157836473-1cf451fa-f01f-4148-ba68-b6d06d5da2f9.png" alt="" width="20"> Applications
- [Fitness app on android mobile](https://github.com/zhiboniu/pose_demo_android)
- [PP-Tracking GUI Visualization Interface](https://github.com/yangyudong2020/PP-Tracking_GUi)
## Recommended third-party tutorials
- [Deployment of PaddleDetection for Windows I ](https://zhuanlan.zhihu.com/p/268657833)
- [Deployment of PaddleDetection for Windows II](https://zhuanlan.zhihu.com/p/280206376)
- [Deployment of PaddleDetection on Jetson Nano](https://zhuanlan.zhihu.com/p/319371293)
- [How to deploy YOLOv3 model on Raspberry Pi for Helmet detection](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/yolov3_for_raspi.md)
- [Use SSD-MobileNetv1 for a project -- From dataset to deployment on Raspberry Pi](https://github.com/PaddleCV-FAQ/PaddleDetection-FAQ/blob/main/Lite%E9%83%A8%E7%BD%B2/ssd_mobilenet_v1_for_raspi.md)
## <img src="https://user-images.githubusercontent.com/48054808/157835981-ef6057b4-6347-4768-8fcc-cd07fcc3d8b0.png" width="20"/> Version updates
Please refer to the [release note](https://github.com/PaddlePaddle/Paddle/wiki/PaddlePaddle-2.3.0-Release-Note-EN) for more details about the updates.
## <img title="" src="https://user-images.githubusercontent.com/48054808/157835345-f5d24128-abaf-4813-b793-d2e5bdc70e5a.png" alt="" width="20"> License
PaddleDetection is provided under the [Apache 2.0 license](LICENSE).
## <img src="https://user-images.githubusercontent.com/48054808/157835796-08d4ffbc-87d9-4622-89d8-cf11a44260fc.png" width="20"/> Contribute your code
We appreciate your contributions and your feedback!
- Thanks to [Mandroide](https://github.com/Mandroide) for cleaning the code and unifying some function interfaces
- Thanks to [FL77N](https://github.com/FL77N/) for the `Sparse-RCNN` model
- Thanks to [Chen-Song](https://github.com/Chen-Song) for the `Swin Faster-RCNN` model
- Thanks to [yangyudong](https://github.com/yangyudong2020) and [hchhtc123](https://github.com/hchhtc123) for developing the PP-Tracking GUI interface
- Thanks to Shigure19 for developing the PP-TinyPose fitness APP
- Thanks to [manangoel99](https://github.com/manangoel99) for Wandb visualization support
## <img src="https://user-images.githubusercontent.com/48054808/157835276-9aab9d1c-1c46-446b-bdd4-5ab75c5cfa48.png" width="20"/> Citation
```
@misc{ppdet2019,
title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
year={2019}
}
```
@@ -0,0 +1,125 @@
# Live Q&A Session 1
### The full replay of the Q&A session can be downloaded and watched via this link: https://pan.baidu.com/s/168ouju4MxN5XJEb-GU1iAw (extraction code: 92mw)
## PaddleDetection framework/API questions
#### Q1. Could you explain warmup in more detail?
A1. Warmup is the process of ramping the learning rate from 0 up to the preset learning rate at the beginning of training. For how to configure it, see the [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/optimizer.py#L156); it can be specified in steps or in epochs. A sketch is shown below.
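Below is a minimal sketch of a linear warmup schedule, assuming a `LinearWarmup`-style rule in which the learning rate grows from `base_lr * start_factor` to `base_lr` over `warmup_steps` steps; the function name and default values are illustrative, not PaddleDetection's exact API.
```
def warmup_lr(step, base_lr=0.01, warmup_steps=1000, start_factor=0.001):
    """Linearly ramp the learning rate up to base_lr during the first warmup_steps."""
    if step >= warmup_steps:
        return base_lr
    alpha = step / warmup_steps
    return base_lr * (start_factor + (1.0 - start_factor) * alpha)

# Example: learning rate at a few points during warmup.
for s in (0, 250, 500, 1000):
    print(s, warmup_lr(s))
```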
#### Q2. Can pretrained weights still be used when the classes do not match?
A2. Yes. When the classes do not match, the model automatically skips loading the weights whose shapes do not match; the weights related to the number of classes are usually located in the head layers.
#### Q3. How is nms_eta used? The source code is not very clear about it and the API documentation does not explain it in detail.
A3. In crowded scenes, nms_eta dynamically adjusts the NMS threshold in every round, which avoids filtering out two heavily overlapping boxes that belong to different objects; see the [source code](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/detection/multiclass_nms_op.cc#L139) for details. The default value is 1, and it usually does not need to be changed. A sketch of the idea follows.
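A minimal NumPy sketch of greedy NMS with an eta-style adaptive threshold that mirrors the idea described above (after each kept box, the IoU threshold is multiplied by `eta` as long as it stays above 0.5); this is an illustration of the mechanism, not the operator's exact implementation.
```
import numpy as np

def nms_with_eta(boxes, scores, thresh=0.7, eta=0.9):
    """Greedy NMS; when eta < 1 the IoU threshold decays after every kept box."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep, adaptive = [], thresh
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= adaptive]
        if eta < 1.0 and adaptive > 0.5:
            adaptive *= eta  # relax the threshold so dense, overlapping objects survive
    return keep
```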
#### Q4. In anchor_cluster.py, is --size the model input size or the size of the images actually used?
A4. It is the image size used at inference time; you can generally refer to the image_shape setting in TestReader.
#### Q5. Why do the predicted coordinates sometimes contain negative values?
A5. Negative values are possible in the algorithm's output. First check whether the model's predictions look as expected. If they do, you can add a clip operation in post-processing to keep the output boxes inside the image; if they do not, the model is not trained well enough and you need to investigate or tune it further. A clipping sketch is shown below.
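A minimal NumPy sketch of clipping predicted boxes to the image boundary in post-processing; a generic illustration rather than PaddleDetection's built-in post-processing.
```
import numpy as np

def clip_boxes(boxes, im_h, im_w):
    """Clip [x1, y1, x2, y2] boxes so they stay inside an im_h x im_w image."""
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)  # x coordinates
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)  # y coordinates
    return boxes

print(clip_boxes(np.array([[-5.0, 10.0, 120.0, 700.0]]), im_h=480, im_w=640))
```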
#### Q6. For the PaddleDetection BlazeFace face-detection model, load_params has no parameter file during one-click prediction; where can it be downloaded?
A6. The BlazeFace models can be downloaded from the [model zoo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/face_detection#%E6%A8%A1%E5%9E%8B%E5%BA%93); if you want to deploy them, export the model following these [steps](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md).
## PP-YOLOE questions
#### Q1. When training PP-YOLOE the loss keeps increasing. Is this a dataset problem?
A1. You can check the following aspects:
1. Data: first confirm that the dataset is correct, including annotations, classes, and so on
2. Hyperparameters: adjust base_lr according to the batch size following the linear scaling rule (see the sketch after this list), and adjust warmup_iters according to the total number of epochs
3. Pretrained weights: load the official weights pretrained on the COCO dataset
4. Network structure: analyze the distribution of the boxes and adjust the DFL parameters appropriately
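A small sketch of the linear scaling rule mentioned in item 2; the reference configuration (base learning rate, batch size, and GPU count) is illustrative.
```
def scaled_lr(base_lr, base_batch_size, base_gpus, batch_size, gpus):
    """Linear scaling rule: the learning rate grows proportionally to the total batch size."""
    return base_lr * (batch_size * gpus) / (base_batch_size * base_gpus)

# Example: reference setting of 8 GPUs x batch size 8 at lr 0.01, now training on 4 GPUs x batch size 4.
print(scaled_lr(0.01, base_batch_size=8, base_gpus=8, batch_size=4, gpus=4))  # 0.0025
```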
#### Q2. Model selection: how should I choose between PicoDet and the PP-YOLO series?
A2. PicoDet is designed for mobile and other low-compute devices (ARM, x86, etc.), while PP-YOLO is designed for the server side (NVIDIA GPUs, Baidu Kunlun cards, etc.). For mobile phones and desktops without a GPU, prefer PicoDet; with high-compute devices such as NVIDIA GPUs, prefer the PP-YOLO series; in latency-insensitive scenarios that care more about high accuracy, prefer the PP-YOLO series.
#### Q3. Is it true that the BN parameters in ConvBNLayer never use L2Decay, while the rest of PP-YOLOE-s uses the 0.0005 L2Decay set in the config file?
A3. The backbone and neck of PP-YOLOE use ConvBNLayer, whose BN layers do not use L2Decay; the other parts use the globally configured L2Decay of 0.0005.
#### Q4. Is decay also not applied to the Conv bias in PP-YOLOE?
A4. The Conv layers in the backbone and neck of PP-YOLOE have no bias parameters; the Conv bias in the head uses the global decay.
#### Q5. Why is Paddle Inference used for speed testing instead of directly loading the model and timing it?
A5. Paddle Inference fuses the forward operators of the exported inference model to optimize speed, and the actual deployment process also uses Paddle Inference.
#### Q6. Do the PP-YOLOE series models use the same pre- and post-processing at deployment time?
A6. At deployment, the PP-YOLO series models all use the decode-resize-normalize-permute pre-processing pipeline; for post-processing, PP-YOLOv2 uses Matrix NMS while PP-YOLOE uses the ordinary NMS algorithm.
#### Q7. Does PP-YOLOE have any tuning strategies for small objects and class-imbalanced datasets?
A7. For small-object datasets, you can appropriately increase the PP-YOLOE input size and add attention mechanisms to the model; small-object detection based on PP-YOLOE is currently under development. For class imbalance, it can be handled from the data-sampling side; PP-YOLOE currently has no optimization specifically targeting class imbalance.
## PP-Human questions
#### Q1. Why does PP-Human report an error when predicting with an exported keypoint model that outputs 18 points instead of the official 17?
A1. The error is caused by a mismatch between the number of keypoints output by the keypoint model and the number expected by the action-recognition model. If you want to predict with an 18-point model, besides using the 18-point keypoint model you also need to build your own 18-point action-recognition model.
#### Q2. Why is window_size set to 50 in the officially exported model?
A2. The export setting matches the input-sequence length used for training and prediction. The datasets we mainly use are NTU, real-world data provided by companies, and so on. When training this model we analyzed the falling clips in these data, and most action clips last roughly 40-80 frames. Considering both the latency in practical use and the prediction quality, we chose 50, which on our data is long enough to describe a complete action without making the latency too large.
In general, the value of window_size is best chosen according to the actual actions and the device. For example, if on some device 50 frames are not enough to contain a complete action, the value should be increased; conversely, if some actions are very short and a 50-frame window contains too many unrelated actions and easily causes false recognition, the value can be reduced appropriately. A buffering sketch is shown below.
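A minimal sketch of how a fixed-length window of per-frame keypoints could be buffered before being fed to a skeleton-based action classifier; `WINDOW_SIZE`, the keypoint layout, and `classify` are illustrative placeholders rather than PP-Human's actual interface.
```
from collections import deque
import numpy as np

WINDOW_SIZE = 50      # number of frames per action clip
NUM_KEYPOINTS = 17    # each keypoint stored as (x, y, score)

def classify(clip):
    """Placeholder for a skeleton-based action classifier."""
    return "falling" if clip.mean() < 0.5 else "normal"

buffer = deque(maxlen=WINDOW_SIZE)

def on_new_frame(keypoints):
    """Push one frame of keypoints and classify once the window is full."""
    buffer.append(keypoints)
    if len(buffer) == WINDOW_SIZE:
        clip = np.stack(buffer)  # shape: (WINDOW_SIZE, NUM_KEYPOINTS, 3)
        return classify(clip)
    return None

for _ in range(60):
    result = on_new_frame(np.random.rand(NUM_KEYPOINTS, 3))
```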
#### Q3. How do I replace the detection, tracking, and keypoint models in PP-Human?
A3. The models we use are all exported from PaddleDetection models. In principle, every model used by PP-Human can be replaced directly, but note that the replacement must follow the same pipeline and the same pre- and post-processing.
#### Q4. Data annotation in PP-Human: recommended tools and annotation steps for detection, tracking, keypoints, actions, and attributes
A4. Annotation tools: detection - labelme, labelImg, cvat; tracking - darklabel, cvat; keypoints - labelme, cvat. Detection annotations can be converted to COCO format with tools/x2coco.py.
#### Q5. How do I change the labels in PP-Human (attribute and action recognition)?
A5. In PP-Human, action recognition is defined as a classification problem over skeleton keypoint sequences; the falling recognition we have open-sourced so far is a binary classification problem. For attributes, training is not yet open and is under construction.
#### Q6. Which PP-Human features support a single person and which support multiple people?
A6. PP-Human's features are built on one pipeline: detection -> tracking -> specific function. The specific function models currently process one person at a time, i.e. attributes, actions, and so on belong to each individual person in the image. Since every person in the image goes through this pipeline, both the single-person and multi-person cases are supported in the same way.
#### Q7. Does PP-Human support video-stream prediction and serving deployment?
A7. This is currently under development and will be supported in the next release.
#### Q8. When training PP-Human on my own dataset, how do I change the visualized label at test time? Without changing it, the label still shows "falling".
A8. The visualization function is at https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/visualize.py#L368; replace action_text with the desired class name during visualization.
#### Q9. Can keypoint detection be used to detect a continuous action, for example whether a fitness movement is performed correctly?
A9. Yes, this can be done based on keypoints, and there are different ways to approach it:
1. If the goal is to judge how standard a movement is and the movement can be described well, you can add hand-written logic on top of the coordinates produced by the keypoint model. We provide an Android fitness APP example: https://github.com/zhiboniu/pose_demo_android; the logic that judges each movement can be found at https://github.com/zhiboniu/pose_demo_android/blob/release/1.0/app/src/main/cpp/pose_action.cc.
2. When a movement is hard to describe with rules, you can follow the existing falling-detection case and train a model that recognizes fitness movements, but the requirements on data collection are higher.
#### Q10. In a production environment with occlusion, such as ladders, can keypoint detection be used to judge whether climbing up and down the ladder is compliant?
A10. It depends on how severe the occlusion is. If the occlusion is too heavy, the keypoint model's accuracy degrades significantly, which makes the action judgment unreliable. In addition, since the keypoint-based approach discards appearance information, it can work in lightly occluded scenes if the judgment only depends on the person's own motion; but if the ladder itself is a necessary element for deciding whether the action is compliant, this approach may not be the best choice.
#### Q11. Isn't the keypoint-based behavior recognition actually temporal action recognition?
A11. It is temporal action recognition. The per-frame keypoint coordinates within a time window are assembled into a temporal keypoint sequence, and the action-recognition model predicts the action class of that sequence.
## Detection algorithm questions
#### Q1. Small objects in large images, with inference also run on large images: how should they be pre-processed?
A1. Common ways to handle small objects are tiling the image and increasing the network input size. If you use an anchor-based detector, you can generate anchors by clustering the object sizes; see this [script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py) and the clustering sketch after Q2 below. Small-object detection based on PP-YOLOE is currently under development.
#### Q2. How should large objects, such as invoices, be detected?
A2. If you use an anchor-based detector, you can generate anchors by clustering the object sizes; see this [script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/tools/anchor_cluster.py) and the sketch below. You can also strengthen the deep features to improve detection of large objects.
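A minimal NumPy sketch of k-means clustering on ground-truth box widths and heights with an IoU-based distance, which is the common idea behind anchor clustering; this is an illustration, not the anchor_cluster.py implementation.
```
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes and anchors share a top-left corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, anchors), axis=1)  # nearest anchor by IoU
        anchors = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
    return anchors[np.argsort(anchors.prod(axis=1))]

wh = np.abs(np.random.randn(500, 2)) * 80 + 20  # fake ground-truth box sizes
print(kmeans_anchors(wh, k=9))
```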
#### Q3. During prediction there are a lot of boxes, some with confidence even below 0.1. How can such boxes be filtered out, i.e. already discard these extremely low-confidence predictions so that unnecessary computation is avoided during inference deployment and the inference speed is not affected?
A3. There are two filters in post-processing: first, the 100 highest-confidence boxes are taken for NMS; second, boxes are filtered by the configured threshold. If you can confirm that there are relatively few objects per image (fewer than 10), you can lower the top-100 value to 50 or less, which speeds up the NMS computation; you can also adjust the threshold, which affects the final detection precision and recall. A sketch is shown below.
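A minimal NumPy sketch of the two filters described above (a score threshold followed by a top-k cut before NMS); the function name and default values are illustrative.
```
import numpy as np

def prefilter(boxes, scores, score_thresh=0.05, keep_top_k=100):
    """Drop low-confidence boxes, then keep at most keep_top_k by score before NMS."""
    mask = scores >= score_thresh
    boxes, scores = boxes[mask], scores[mask]
    if len(scores) > keep_top_k:
        top = np.argsort(scores)[::-1][:keep_top_k]
        boxes, scores = boxes[top], scores[top]
    return boxes, scores

boxes, scores = np.random.rand(500, 4), np.random.rand(500)
print(prefilter(boxes, scores, score_thresh=0.1, keep_top_k=50)[1].shape)
```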
#### Q4. How should the ratio of positive to negative samples generally be designed?
A4. PaddleDetection supports training with negative samples: set allow_empty: true under TrainDataset. Tests on our datasets show that a negative-sample ratio of 0.3 improves the model most noticeably.
## Compression and deployment questions
#### Q1. After exporting an inference model trained with PaddleDetection, how should the pre- and post-processing code for inference deployment be written? Is there a reference tutorial?
A1. Most of the models in PaddleDetection currently support C++ inference. Different handling serves different purposes: for example, the PP-YOLOE speed test does not include post-processing, and PicoDet can be exported with or without NMS so that different third-party inference engines are supported.
object_detector.cc covers the pipeline for all detection models. The pre-processing is mostly decode-resize-normalize-permute (some networks add a padding step; see the sketch below). Most models keep the post-processing inside the model itself, and PicoDet additionally provides standalone NMS post-processing code.
The inputs of the detection models are standardized to image, im_shape, and scale_factor. If im_shape is not used in a model, the number of inputs is reduced, but the whole pre-processing flow requires no extra development.
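A minimal Python sketch of the decode-resize-normalize-permute pre-processing flow mentioned above; the target size and mean/std values are illustrative, and the real values come from the exported model's inference config.
```
import cv2
import numpy as np

def preprocess(image_path, target_size=(640, 640),
               mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Decode -> resize -> normalize -> permute; returns NCHW input and scale_factor."""
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)       # decode
    h, w = img.shape[:2]
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)  # resize
    scale_factor = np.array([target_size[1] / h, target_size[0] / w], dtype=np.float32)
    img = img.astype(np.float32) / 255.0
    img = (img - np.array(mean, np.float32)) / np.array(std, np.float32) # normalize
    img = img.transpose(2, 0, 1)[None, ...]   # permute HWC -> CHW and add a batch dim
    return img, scale_factor
```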
#### Q2. About TensorRT acceleration: FP16 does work on V100, but the timing seems off. On a 1080 Ti, running a single image 1000 times takes 50 s in FP32, yet on V100 FP16 takes 97 s?
A2. The reported speed of PP-YOLOE and other models is measured on V100 with TensorRT FP16. For speed testing, the following points can be checked:
1. Make sure warmup is set correctly when timing, so that the long start-up time does not distort the measurement
2. When TensorRT is enabled, building the engine file takes a long time; you can set use_static to True in https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L745 (see the sketch after this list)
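For reference, a sketch of enabling TensorRT FP16 with a cached (static) engine through the Paddle Inference Python API; the model path is a hypothetical export directory and the argument values are illustrative.
```
from paddle.inference import Config, PrecisionType, create_predictor

# Hypothetical exported model files; adjust the paths to your own export directory.
config = Config("output_inference/model.pdmodel", "output_inference/model.pdiparams")
config.enable_use_gpu(200, 0)                # 200 MB initial GPU memory pool, GPU id 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Half,       # FP16
    use_static=True,                         # serialize the engine so it is built only once
    use_calib_mode=False)
predictor = create_predictor(config)
```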
#### Q3. PaddleDetection already supports quantization-aware training (QAT) for some models. If I want to train another, new model, can QAT be used easily? If not, why is only a limited set of models supported, and why does QAT on other models always run into various problems?
A3. PaddleDetection has only released QAT configs for some models, but the other models support QAT as well; the config files simply do not cover them yet. If quantization reports an error, it is usually a configuration problem. For detection models it is generally recommended to skip the last conv of the head. If you want to skip quantization for certain layers, set skip_quant; see the [code](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/ppdet/modeling/heads/yolo_head.py#L97).
@@ -0,0 +1,47 @@
# Notes on the general detection benchmark test scripts
```
├── benchmark
│ ├── analysis_log.py
│ ├── prepare.sh
│ ├── README.md
│ ├── run_all.sh
│ ├── run_benchmark.sh
```
## Script description
### prepare.sh
Data preparation script that automatically downloads the required datasets and models
### run_all.sh
Main entry script that runs the benchmark for all covered models
### run_benchmark.sh
Single-model script that runs the benchmark for a specified model
## Docker runtime environment
* docker image: registry.baidubce.com/paddlepaddle/paddle:2.1.2-gpu-cuda10.2-cudnn7
* paddle = 2.1.2
* python = 3.7
## Running the benchmark
### Run all models
```
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
bash benchmark/run_all.sh
```
### Run a specified model
* Usage: bash run_benchmark.sh ${run_mode} ${batch_size} ${fp_item} ${max_epoch} ${model_name}
* model_name: faster_rcnn, fcos, deformable_detr, gfl, hrnet, higherhrnet, solov2, jde, fairmot
```
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
bash benchmark/prepare.sh
# single GPU
CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh sp 2 fp32 1 faster_rcnn
# multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark.sh mp 2 fp32 1 faster_rcnn
```
@@ -0,0 +1,48 @@
_BASE_: [
'../../configs/datasets/coco_detection.yml',
'../../configs/runtime.yml',
'../../configs/faster_rcnn/_base_/optimizer_1x.yml',
'../../configs/faster_rcnn/_base_/faster_rcnn_r50_fpn.yml',
]
weights: output/faster_rcnn_r50_fpn_1x_coco/model_final
worker_num: 2
TrainReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- RandomFlip: {prob: 0.5}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: true
drop_last: true
collate_batch: false
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
TestReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
pip install -U pip Cython
pip install -r requirements.txt
mv ./dataset/coco/download_coco.py . && rm -rf ./dataset/coco/* && mv ./download_coco.py ./dataset/coco/
# prepare lite train data
wget -nc -P ./dataset/coco/ https://paddledet.bj.bcebos.com/data/coco_benchmark.tar
cd ./dataset/coco/ && tar -xvf coco_benchmark.tar && mv -u coco_benchmark/* .
rm -rf coco_benchmark/
cd ../../
rm -rf ./dataset/mot/*
# prepare mot mini train data
wget -nc -P ./dataset/mot/ https://paddledet.bj.bcebos.com/data/mot_benchmark.tar
cd ./dataset/mot/ && tar -xvf mot_benchmark.tar && mv -u mot_benchmark/* .
rm -rf mot_benchmark/
@@ -0,0 +1,47 @@
# Use docker: paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7 paddle=2.1.2 python3.7
#
# Usage:
# git clone https://github.com/PaddlePaddle/PaddleDetection.git
# cd PaddleDetection
# bash benchmark/run_all.sh
log_path=${LOG_PATH_INDEX_DIR:-$(pwd)} # set by the benchmark system; when profiling is not needed, log_path points to the directory where the speed logs are stored
# run prepare.sh
bash benchmark/prepare.sh
model_name_list=(faster_rcnn fcos deformable_detr gfl hrnet higherhrnet solov2 jde fairmot)
fp_item_list=(fp32)
max_epoch=2
for model_item in ${model_name_list[@]}; do
for fp_item in ${fp_item_list[@]}; do
case ${model_item} in
faster_rcnn) bs_list=(1 8) ;;
fcos) bs_list=(2) ;;
deformable_detr) bs_list=(2) ;;
gfl) bs_list=(2) ;;
hrnet) bs_list=(64) ;;
higherhrnet) bs_list=(20) ;;
solov2) bs_list=(2) ;;
jde) bs_list=(4) ;;
fairmot) bs_list=(6) ;;
*) echo "wrong model_name"; exit 1;
esac
for bs_item in ${bs_list[@]}
do
run_mode=sp
log_name=detection_${model_item}_bs${bs_item}_${fp_item} # e.g. clas_MobileNetv1_mp_bs32_fp32_8
echo "index is speed, 1gpus, begin, ${log_name}"
CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh ${run_mode} ${bs_item} \
${fp_item} ${max_epoch} ${model_item} | tee ${log_path}/${log_name}_speed_1gpus 2>&1
sleep 60
run_mode=mp
log_name=detection_${model_item}_bs${bs_item}_${fp_item} # e.g. clas_MobileNetv1_mp_bs32_fp32_8
echo "index is speed, 8gpus, run_mode is multi_process, begin, ${log_name}"
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark.sh ${run_mode} \
${bs_item} ${fp_item} ${max_epoch} ${model_item}| tee ${log_path}/${log_name}_speed_8gpus8p 2>&1
sleep 60
done
done
done
@@ -0,0 +1,92 @@
#!/usr/bin/env bash
set -xe
# Usage: CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh ${run_mode} ${batch_size} ${fp_item} ${max_epoch} ${model_name}
python="python3.7"
# Parameter description
function _set_params(){
run_mode=${1:-"sp"} # sp|mp
batch_size=${2:-"2"}
fp_item=${3:-"fp32"} # fp32|fp16
max_epoch=${4:-"1"}
model_item=${5:-"model_item"}
run_log_path=${TRAIN_LOG_DIR:-$(pwd)}
# parameters required for log parsing
base_batch_size=${batch_size}
mission_name="目标检测" # task name ("object detection") consumed by the benchmark log-analysis tooling
direction_id="0"
ips_unit="images/s"
skip_steps=10 # number of steps to skip when parsing the log; the first few steps of some models are slow (required)
keyword="ips:" # keyword used to locate the lines that contain throughput data when parsing the log (required)
index="1"
model_name=${model_item}_bs${batch_size}_${fp_item}
device=${CUDA_VISIBLE_DEVICES//,/ }
arr=(${device})
num_gpu_devices=${#arr[*]}
log_file=${run_log_path}/${model_item}_${run_mode}_bs${batch_size}_${fp_item}_${num_gpu_devices}
}
function _train(){
echo "Train on ${num_gpu_devices} GPUs"
echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size"
# set runtime params
set_optimizer_lr_sp=" "
set_optimizer_lr_mp=" "
# parse model_item
case ${model_item} in
faster_rcnn) model_yml="benchmark/configs/faster_rcnn_r50_fpn_1x_coco.yml"
set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;;
fcos) model_yml="configs/fcos/fcos_r50_fpn_1x_coco.yml"
set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;;
deformable_detr) model_yml="configs/deformable_detr/deformable_detr_r50_1x_coco.yml" ;;
gfl) model_yml="configs/gfl/gfl_r50_fpn_1x_coco.yml"
set_optimizer_lr_sp="LearningRate.base_lr=0.001" ;;
hrnet) model_yml="configs/keypoint/hrnet/hrnet_w32_256x192.yml" ;;
higherhrnet) model_yml="configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml" ;;
solov2) model_yml="configs/solov2/solov2_r50_fpn_1x_coco.yml" ;;
jde) model_yml="configs/mot/jde/jde_darknet53_30e_1088x608.yml" ;;
fairmot) model_yml="configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml" ;;
*) echo "Undefined model_item"; exit 1;
esac
set_batch_size="TrainReader.batch_size=${batch_size}"
set_max_epoch="epoch=${max_epoch}"
set_log_iter="log_iter=1"
if [ ${fp_item} = "fp16" ]; then
set_fp_item="--fp16"
else
set_fp_item=" "
fi
case ${run_mode} in
sp) train_cmd="${python} -u tools/train.py -c ${model_yml} ${set_fp_item} \
-o ${set_batch_size} ${set_max_epoch} ${set_log_iter} ${set_optimizer_lr_sp}" ;;
mp) rm -rf mylog
train_cmd="${python} -m paddle.distributed.launch --log_dir=./mylog \
--gpus=${CUDA_VISIBLE_DEVICES} tools/train.py -c ${model_yml} ${set_fp_item} \
-o ${set_batch_size} ${set_max_epoch} ${set_log_iter} ${set_optimizer_lr_mp}"
log_parse_file="mylog/workerlog.0" ;;
*) echo "choose run_mode(sp or mp)"; exit 1;
esac
timeout 15m ${train_cmd} > ${log_file} 2>&1
if [ $? -ne 0 ];then
echo -e "${train_cmd}, FAIL"
export job_fail_flag=1
else
echo -e "${train_cmd}, SUCCESS"
export job_fail_flag=0
fi
kill -9 `ps -ef|grep 'python'|awk '{print $2}'`
if [ $run_mode = "mp" -a -d mylog ]; then
rm ${log_file}
cp mylog/workerlog.0 ${log_file}
fi
}
source ${BENCHMARK_ROOT}/scripts/run_model.sh # this script parses benchmark-compliant logs for performance data with analysis.py; for integration it can be downloaded from the benchmark repo: https://github.com/PaddlePaddle/benchmark/blob/master/scripts/run_model.sh; comment this line out if you only want to produce the training log without integration, but enable it again before submitting
_set_params $@
# _train # uncomment to only produce the training log without parsing it
_run # this function lives in run_model.sh and calls _train during execution; comment this line out if you only want to produce the training log without integration, but enable it again before submitting
@@ -0,0 +1,28 @@
# Cascade R-CNN: High Quality Object Detection and Instance Segmentation
## Model Zoo
| Backbone | Network type | Images/GPU | Lr schedule | Inference time (fps) | Box AP | Mask AP | Download | Config |
| :------------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
| ResNet50-FPN | Cascade Faster | 1 | 1x | ---- | 41.1 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml) |
| ResNet50-FPN | Cascade Mask | 1 | 1x | ---- | 41.8 | 36.3 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
## Citations
```
@article{Cai_2019,
title={Cascade R-CNN: High Quality Object Detection and Instance Segmentation},
ISSN={1939-3539},
url={http://dx.doi.org/10.1109/tpami.2019.2956516},
DOI={10.1109/tpami.2019.2956516},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Cai, Zhaowei and Vasconcelos, Nuno},
year={2019},
pages={11}
}
```
@@ -0,0 +1,40 @@
worker_num: 2
TrainReader:
sample_transforms:
- Decode: {}
- RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
- RandomFlip: {prob: 0.5}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: true
drop_last: true
collate_batch: false
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
TestReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
@@ -0,0 +1,40 @@
worker_num: 2
TrainReader:
sample_transforms:
- Decode: {}
- RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
- RandomFlip: {prob: 0.5}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: true
drop_last: true
collate_batch: false
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
TestReader:
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_transforms:
- PadBatch: {pad_to_stride: 32}
batch_size: 1
shuffle: false
drop_last: false
@@ -0,0 +1,97 @@
architecture: CascadeRCNN
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
CascadeRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: CascadeHead
mask_head: MaskHead
# post process
bbox_post_process: BBoxPostProcess
mask_post_process: MaskPostProcess
ResNet:
# index 0 stands for res2
depth: 50
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
FPN:
out_channel: 256
RPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
topk_after_collect: True
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
CascadeHead:
head: CascadeTwoFCHead
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
bbox_assigner: BBoxAssigner
BBoxAssigner:
batch_size_per_im: 512
bg_thresh: 0.5
fg_thresh: 0.5
fg_fraction: 0.25
cascade_iou: [0.5, 0.6, 0.7]
use_random: True
CascadeTwoFCHead:
out_channel: 1024
BBoxPostProcess:
decode:
name: RCNNBox
prior_box_var: [30.0, 30.0, 15.0, 15.0]
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
MaskHead:
head: MaskFeat
roi_extractor:
resolution: 14
sampling_ratio: 0
aligned: True
mask_assigner: MaskAssigner
share_bbox_feat: False
MaskFeat:
num_convs: 4
out_channel: 256
MaskAssigner:
mask_resolution: 28
MaskPostProcess:
binary_thresh: 0.5
@@ -0,0 +1,75 @@
architecture: CascadeRCNN
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
CascadeRCNN:
backbone: ResNet
neck: FPN
rpn_head: RPNHead
bbox_head: CascadeHead
# post process
bbox_post_process: BBoxPostProcess
ResNet:
# index 0 stands for res2
depth: 50
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
FPN:
out_channel: 256
RPNHead:
anchor_generator:
aspect_ratios: [0.5, 1.0, 2.0]
anchor_sizes: [[32], [64], [128], [256], [512]]
strides: [4, 8, 16, 32, 64]
rpn_target_assign:
batch_size_per_im: 256
fg_fraction: 0.5
negative_overlap: 0.3
positive_overlap: 0.7
use_random: True
train_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 2000
post_nms_top_n: 2000
topk_after_collect: True
test_proposal:
min_size: 0.0
nms_thresh: 0.7
pre_nms_top_n: 1000
post_nms_top_n: 1000
CascadeHead:
head: CascadeTwoFCHead
roi_extractor:
resolution: 7
sampling_ratio: 0
aligned: True
bbox_assigner: BBoxAssigner
BBoxAssigner:
batch_size_per_im: 512
bg_thresh: 0.5
fg_thresh: 0.5
fg_fraction: 0.25
cascade_iou: [0.5, 0.6, 0.7]
use_random: True
CascadeTwoFCHead:
out_channel: 1024
BBoxPostProcess:
decode:
name: RCNNBox
prior_box_var: [30.0, 30.0, 15.0, 15.0]
nms:
name: MultiClassNMS
keep_top_k: 100
score_threshold: 0.05
nms_threshold: 0.5
@@ -0,0 +1,19 @@
epoch: 12
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [8, 11]
- !LinearWarmup
start_factor: 0.001
steps: 1000
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
@@ -0,0 +1,8 @@
_BASE_: [
'../datasets/coco_instance.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_mask_rcnn_r50_fpn.yml',
'_base_/cascade_mask_fpn_reader.yml',
]
weights: output/cascade_mask_rcnn_r50_fpn_1x_coco/model_final
@@ -0,0 +1,18 @@
_BASE_: [
'../datasets/coco_instance.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_mask_rcnn_r50_fpn.yml',
'_base_/cascade_mask_fpn_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
weights: output/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco/model_final
ResNet:
depth: 50
variant: d
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
lr_mult_list: [0.05, 0.05, 0.1, 0.15]
@@ -0,0 +1,29 @@
_BASE_: [
'../datasets/coco_instance.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_mask_rcnn_r50_fpn.yml',
'_base_/cascade_mask_fpn_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
weights: output/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco/model_final
ResNet:
depth: 50
variant: d
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
lr_mult_list: [0.05, 0.05, 0.1, 0.15]
epoch: 24
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [12, 22]
- !LinearWarmup
start_factor: 0.1
steps: 1000
@@ -0,0 +1,8 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_rcnn_r50_fpn.yml',
'_base_/cascade_fpn_reader.yml',
]
weights: output/cascade_rcnn_r50_fpn_1x_coco/model_final
@@ -0,0 +1,18 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_rcnn_r50_fpn.yml',
'_base_/cascade_fpn_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
weights: output/cascade_rcnn_r50_vd_fpn_ssld_1x_coco/model_final
ResNet:
depth: 50
variant: d
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
lr_mult_list: [0.05, 0.05, 0.1, 0.15]
@@ -0,0 +1,29 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'_base_/optimizer_1x.yml',
'_base_/cascade_rcnn_r50_fpn.yml',
'_base_/cascade_fpn_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
weights: output/cascade_rcnn_r50_vd_fpn_ssld_2x_coco/model_final
ResNet:
depth: 50
variant: d
norm_type: bn
freeze_at: 0
return_idx: [0,1,2,3]
num_stages: 4
lr_mult_list: [0.05, 0.05, 0.1, 0.15]
epoch: 24
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [12, 22]
- !LinearWarmup
start_factor: 0.1
steps: 1000
@@ -0,0 +1,37 @@
English | [简体中文](README_cn.md)
# CenterNet (CenterNet: Objects as Points)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Citations](#Citations)
## Introduction
[CenterNet](http://arxiv.org/abs/1904.07850) is an anchor-free detector that models an object as a single point -- the center point of its bounding box. The detector uses keypoint estimation to find center points and regresses to all other object properties. The center-point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than the corresponding bounding-box based detectors.
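As a rough illustration of the decoding step implied above (find local peaks on the center heatmap, then expand each peak with the predicted width and height), here is a minimal NumPy/SciPy sketch; the tensor shapes and `down_ratio` are assumptions for illustration, not the repository's implementation.
```
import numpy as np
from scipy.ndimage import maximum_filter

def decode_centers(heatmap, wh, k=100, down_ratio=4):
    """Pick heatmap peaks as object centers and expand them into boxes.

    heatmap: (num_classes, H, W) center scores; wh: (2, H, W) predicted width/height.
    """
    peaks = heatmap * (heatmap == maximum_filter(heatmap, size=(1, 3, 3)))  # 3x3 peak picking
    cls, ys, xs = np.unravel_index(np.argsort(peaks, axis=None)[::-1][:k], peaks.shape)
    scores = peaks[cls, ys, xs]
    w, h = wh[0, ys, xs], wh[1, ys, xs]
    boxes = np.stack([xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], axis=1) * down_ratio
    return boxes, scores, cls

heatmap = np.random.rand(80, 128, 128)
wh = np.random.rand(2, 128, 128) * 10
boxes, scores, classes = decode_centers(heatmap, wh, k=5)
```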
## Model Zoo
### CenterNet Results on COCO-val 2017
| backbone | input shape | mAP | FPS | download | config |
| :--------------| :------- | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 512x512 | 37.4 | - | - | - |
| DLA-34 | 512x512 | 37.6 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_dla34_140e_coco.pdparams) | [config](./centernet_dla34_140e_coco.yml) |
| ResNet50 + DLAUp | 512x512 | 38.9 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_r50_140e_coco.pdparams) | [config](./centernet_r50_140e_coco.yml) |
| MobileNetV1 + DLAUp | 512x512 | 28.2 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv1_140e_coco.pdparams) | [config](./centernet_mbv1_140e_coco.yml) |
| MobileNetV3_small + DLAUp | 512x512 | 17 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_small_140e_coco.pdparams) | [config](./centernet_mbv3_small_140e_coco.yml) |
| MobileNetV3_large + DLAUp | 512x512 | 27.1 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_large_140e_coco.pdparams) | [config](./centernet_mbv3_large_140e_coco.yml) |
| ShuffleNetV2 + DLAUp | 512x512 | 23.8 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_shufflenetv2_140e_coco.pdparams) | [config](./centernet_shufflenetv2_140e_coco.yml) |
## Citations
```
@article{zhou2019objects,
title={Objects as points},
author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
journal={arXiv preprint arXiv:1904.07850},
year={2019}
}
```
@@ -0,0 +1,36 @@
简体中文 | [English](README.md)
# CenterNet (CenterNet: Objects as Points)
## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Citations](#Citations)
## Introduction
[CenterNet](http://arxiv.org/abs/1904.07850) is an anchor-free detector that represents an object as a single point, the center point of its bounding box. CenterNet locates the center point via keypoint estimation and regresses the other object properties. The center-point based approach is end-to-end trainable and more efficient than anchor-based detectors.
## Model Zoo
### CenterNet results on COCO-val 2017
| backbone | input shape | mAP | FPS | download | config |
| :--------------| :------- | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 512x512 | 37.4 | - | - | - |
| DLA-34 | 512x512 | 37.6 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_dla34_140e_coco.pdparams) | [config](./centernet_dla34_140e_coco.yml) |
| ResNet50 + DLAUp | 512x512 | 38.9 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_r50_140e_coco.pdparams) | [config](./centernet_r50_140e_coco.yml) |
| MobileNetV1 + DLAUp | 512x512 | 28.2 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv1_140e_coco.pdparams) | [config](./centernet_mbv1_140e_coco.yml) |
| MobileNetV3_small + DLAUp | 512x512 | 17 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_small_140e_coco.pdparams) | [config](./centernet_mbv3_small_140e_coco.yml) |
| MobileNetV3_large + DLAUp | 512x512 | 27.1 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_mbv3_large_140e_coco.pdparams) | [config](./centernet_mbv3_large_140e_coco.yml) |
| ShuffleNetV2 + DLAUp | 512x512 | 23.8 | - | [model](https://bj.bcebos.com/v1/paddledet/models/centernet_shufflenetv2_140e_coco.pdparams) | [config](./centernet_shufflenetv2_140e_coco.yml) |
## Citations
```
@article{zhou2019objects,
title={Objects as points},
author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
journal={arXiv preprint arXiv:1904.07850},
year={2019}
}
```
@@ -0,0 +1,22 @@
architecture: CenterNet
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/DLA34_pretrain.pdparams
CenterNet:
backbone: DLA
neck: CenterNetDLAFPN
head: CenterNetHead
post_process: CenterNetPostProcess
DLA:
depth: 34
CenterNetDLAFPN:
down_ratio: 4
CenterNetHead:
head_planes: 256
regress_ltrb: False
CenterNetPostProcess:
max_per_img: 100
regress_ltrb: False
@@ -0,0 +1,34 @@
architecture: CenterNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998
CenterNet:
backbone: ResNet
neck: CenterNetDLAFPN
head: CenterNetHead
post_process: CenterNetPostProcess
ResNet:
depth: 50
variant: d
return_idx: [0, 1, 2, 3]
freeze_at: -1
norm_decay: 0.
dcn_v2_stages: [3]
CenterNetDLAFPN:
first_level: 0
last_level: 4
down_ratio: 4
dcn_v2: False
CenterNetHead:
head_planes: 256
regress_ltrb: False
CenterNetPostProcess:
max_per_img: 100
regress_ltrb: False
@@ -0,0 +1,35 @@
worker_num: 4
TrainReader:
inputs_def:
image_shape: [3, 512, 512]
sample_transforms:
- Decode: {}
- FlipWarpAffine: {keep_res: False, input_h: 512, input_w: 512, use_random: True}
- CenterRandColor: {}
- Lighting: {eigval: [0.2141788, 0.01817699, 0.00341571], eigvec: [[-0.58752847, -0.69563484, 0.41340352], [-0.5832747, 0.00994535, -0.81221408], [-0.56089297, 0.71832671, 0.41158938]]}
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: False}
- Permute: {}
- Gt2CenterNetTarget: {down_ratio: 4, max_objs: 128}
batch_size: 16
shuffle: True
drop_last: True
use_shared_memory: True
EvalReader:
sample_transforms:
- Decode: {}
- WarpAffine: {keep_res: True, input_h: 512, input_w: 512}
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834]}
- Permute: {}
batch_size: 1
TestReader:
inputs_def:
image_shape: [3, 512, 512]
sample_transforms:
- Decode: {}
- WarpAffine: {keep_res: True, input_h: 512, input_w: 512}
- NormalizeImage: {mean: [0.40789655, 0.44719303, 0.47026116], std: [0.2886383 , 0.27408165, 0.27809834], is_scale: True}
- Permute: {}
batch_size: 1
@@ -0,0 +1,14 @@
epoch: 140
LearningRate:
base_lr: 0.0005
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [90, 120]
use_warmup: False
OptimizerBuilder:
optimizer:
type: Adam
regularizer: NULL
@@ -0,0 +1,9 @@
_BASE_: [
'../datasets/coco_detection.yml',
'../runtime.yml',
'_base_/optimizer_140e.yml',
'_base_/centernet_dla34.yml',
'_base_/centernet_reader.yml',
]
weights: output/centernet_dla34_140e_coco/model_final
Some files were not shown because too many files have changed in this diff.