Document detection
object_detection/.gitignore (vendored, new file, 18 lines)
@@ -0,0 +1,18 @@
loss/
data/
cache/
tf_cache/
debug/
results/

misc/outputs

evaluation/evaluate_object
evaluation/analyze_object

nnet/__pycache__/

*.swp

*.pyc
*.o*
object_detection/LICENSE (new file, 29 lines)
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2019, Princeton University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
object_detection/README.md (new file, 152 lines)
@@ -0,0 +1,152 @@
# CornerNet-Lite: Training, Evaluation and Testing Code
Code for reproducing results in the following paper:

[**CornerNet-Lite: Efficient Keypoint Based Object Detection**](https://arxiv.org/abs/1904.08900)
Hei Law, Yun Teng, Olga Russakovsky, Jia Deng
*arXiv:1904.08900*

## Getting Started
### Software Requirement
- Python 3.7
- PyTorch 1.0.0
- CUDA 10
- GCC 4.9.2 or above

### Installing Dependencies
Please first install [Anaconda](https://anaconda.org) and create an Anaconda environment using the provided package list `conda_packagelist.txt`.
```
conda create --name CornerNet_Lite --file conda_packagelist.txt --channel pytorch
```

After you create the environment, please activate it.
```
source activate CornerNet_Lite
```

### Compiling Corner Pooling Layers
Compile the C++ implementation of the corner pooling layers. (GCC 4.9.2 or above is required.)
```
cd <CornerNet-Lite dir>/core/models/py_utils/_cpools/
python setup.py install --user
```

### Compiling NMS
Compile the NMS code, which is originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/nms/cpu_nms.pyx) and [Soft-NMS](https://github.com/bharatsingh430/soft-nms/blob/master/lib/nms/cpu_nms.pyx).
```
cd <CornerNet-Lite dir>/core/external
make
```

### Downloading Models
In this repo, we provide models for the following detectors:
- [CornerNet-Saccade](https://drive.google.com/file/d/1MQDyPRI0HgDHxHToudHqQ-2m8TVBciaa/view?usp=sharing)
- [CornerNet-Squeeze](https://drive.google.com/file/d/1qM8BBYCLUBcZx_UmLT0qMXNTh-Yshp4X/view?usp=sharing)
- [CornerNet](https://drive.google.com/file/d/1e8At_iZWyXQgLlMwHkB83kN-AN85Uff1/view?usp=sharing)

Put the CornerNet-Saccade model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Saccade/`, the CornerNet-Squeeze model under `<CornerNet-Lite dir>/cache/nnet/CornerNet_Squeeze/` and the CornerNet model under `<CornerNet-Lite dir>/cache/nnet/CornerNet/`. (Note that we use an underscore instead of a dash in both directory names for CornerNet-Saccade and CornerNet-Squeeze.)

Note: The CornerNet model is the same as the one in the original [CornerNet repo](https://github.com/princeton-vl/CornerNet). We just ported it to this new repo.

### Running the Demo Script
After downloading the models, you should be able to use the detectors on your own images. We provide a demo script `demo.py` to test if the repo is installed correctly.
```
python demo.py
```
This script applies CornerNet-Saccade to `demo.jpg` and writes the results to `demo_out.jpg`.

In the demo script, the default detector is CornerNet-Saccade. You can modify the demo script to test different detectors. For example, if you want to test CornerNet-Squeeze:
```python
#!/usr/bin/env python

import cv2
from core.detectors import CornerNet_Squeeze
from core.vis_utils import draw_bboxes

detector = CornerNet_Squeeze()
image = cv2.imread("demo.jpg")

bboxes = detector(image)
image = draw_bboxes(image, bboxes)
cv2.imwrite("demo_out.jpg", image)
```

### Using CornerNet-Lite in Your Project
It is also easy to use CornerNet-Lite in your project. You will need to change the directory name from `CornerNet-Lite` to `CornerNet_Lite`. Otherwise, you won't be able to import CornerNet-Lite.
```
Your project
│   README.md
│   ...
│   foo.py
│
└───CornerNet_Lite
│
└───directory1
│
└───...
```

In `foo.py`, you can easily import CornerNet-Saccade by adding:
```python
import cv2
from CornerNet_Lite import CornerNet_Saccade

def foo():
    cornernet = CornerNet_Saccade()
    # CornerNet_Saccade is ready to use

    image = cv2.imread('/path/to/your/image')
    bboxes = cornernet(image)
```

If you want to train or evaluate the detectors on COCO, please move on to the following steps.

## Training and Evaluation

### Installing MS COCO APIs
```
mkdir -p <CornerNet-Lite dir>/data
cd <CornerNet-Lite dir>/data
git clone git@github.com:cocodataset/cocoapi.git coco
cd <CornerNet-Lite dir>/data/coco/PythonAPI
make install
```

### Downloading MS COCO Data
- Download the training/validation split we use in our paper from [here](https://drive.google.com/file/d/1dop4188xo5lXDkGtOZUzy2SHOD_COXz4/view?usp=sharing) (originally from [Faster R-CNN](https://github.com/rbgirshick/py-faster-rcnn/tree/master/data))
- Unzip the file and place `annotations` under `<CornerNet-Lite dir>/data/coco`
- Download the images (2014 Train, 2014 Val, 2017 Test) from [here](http://cocodataset.org/#download)
- Create 3 directories, `trainval2014`, `minival2014` and `testdev2017`, under `<CornerNet-Lite dir>/data/coco/images/` (see the sketch after this list)
- Copy the training/validation/testing images to the corresponding directories according to the annotation files
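For example, the three image directories can be created in one step (a sketch; `<CornerNet-Lite dir>` stands for your checkout path):
```
mkdir -p <CornerNet-Lite dir>/data/coco/images/{trainval2014,minival2014,testdev2017}
```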
To train and evaluate a network, you will need to create a configuration file, which defines the hyperparameters, and a model file, which defines the network architecture. The configuration file should be in JSON format and placed in `<CornerNet-Lite dir>/configs/`. Each configuration file should have a corresponding model file in `<CornerNet-Lite dir>/core/models/`: that is, if there is a `<model>.json` in `<CornerNet-Lite dir>/configs/`, there should be a `<model>.py` in `<CornerNet-Lite dir>/core/models/`. There is only one exception, which we will mention later.
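For example, the detectors shipped in this commit pair up as follows (paths exactly as they appear in this diff):
```
configs/CornerNet.json          <->  core/models/CornerNet.py
configs/CornerNet_Saccade.json  <->  core/models/CornerNet_Saccade.py
configs/CornerNet_Squeeze.json  <->  core/models/CornerNet_Squeeze.py
```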
### Training and Evaluating a Model
To train a model:
```
python train.py <model>
```

We provide the configuration files and the model files for CornerNet-Saccade, CornerNet-Squeeze and CornerNet in this repo. Please check the configuration files in `<CornerNet-Lite dir>/configs/`.

To train CornerNet-Saccade:
```
python train.py CornerNet_Saccade
```
Please adjust the batch size in `CornerNet_Saccade.json` to accommodate the number of GPUs that are available to you.
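As a sketch of how the two keys relate: in the shipped `CornerNet_Saccade.json`, `"batch_size"` is 48 with `"chunk_sizes": [12, 12, 12, 12]`, i.e. 12 images per GPU on 4 GPUs. `chunk_sizes` splits each batch across GPUs, so its entries should sum to `batch_size`. A hypothetical 2-GPU setting could therefore be:
```
"batch_size": 24,
"chunk_sizes": [12, 12]
```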
To evaluate the trained model:
```
python evaluate.py CornerNet_Saccade --testiter 500000 --split <split>
```

If you want to test different hyperparameters during evaluation and do not want to overwrite the original configuration file, you can do so by creating a configuration file with a suffix (`<model>-<suffix>.json`). There is no need to create `<model>-<suffix>.py` in `<CornerNet-Lite dir>/core/models/`.

To use the new configuration file:
```
python evaluate.py <model> --testiter <iter> --split <split> --suffix <suffix>
```

We also include a configuration file for CornerNet under the multi-scale setting, `CornerNet-multi_scale.json`, in this repo.

To use the multi-scale configuration file:
```
python evaluate.py CornerNet --testiter <iter> --split <split> --suffix multi_scale
```
object_detection/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
from .core.detectors import CornerNet, CornerNet_Squeeze, CornerNet_Saccade
from .core.vis_utils import draw_bboxes
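A minimal sketch of what this `__init__.py` exposes, assuming the checkout directory is importable as `object_detection` (the image path is a placeholder, and the pretrained model is expected under `cache/nnet/` as described in the README):
```python
import cv2

from object_detection import CornerNet_Saccade, draw_bboxes

detector = CornerNet_Saccade()         # loads the pretrained CornerNet-Saccade model from cache/nnet/
image = cv2.imread("demo.jpg")         # any BGR image read by OpenCV
bboxes = detector(image)               # dict mapping class name -> detections for that class
image = draw_bboxes(image, bboxes)     # overlay the detections on the image
cv2.imwrite("demo_out.jpg", image)
```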
object_detection/conda_packagelist.txt (new file, 81 lines)
@@ -0,0 +1,81 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
blas=1.0=mkl
bzip2=1.0.6=h14c3975_5
ca-certificates=2018.12.5=0
cairo=1.14.12=h8948797_3
certifi=2018.11.29=py37_0
cffi=1.11.5=py37he75722e_1
cuda100=1.0=0
cycler=0.10.0=py37_0
cython=0.28.5=py37hf484d3e_0
dbus=1.13.2=h714fa37_1
expat=2.2.6=he6710b0_0
ffmpeg=4.0=hcdf2ecd_0
fontconfig=2.13.0=h9420a91_0
freeglut=3.0.0=hf484d3e_5
freetype=2.9.1=h8a8886c_1
glib=2.56.2=hd408876_0
graphite2=1.3.12=h23475e2_2
gst-plugins-base=1.14.0=hbbd80ab_1
gstreamer=1.14.0=hb453b48_1
harfbuzz=1.8.8=hffaf4a1_0
hdf5=1.10.2=hba1933b_1
icu=58.2=h9c2bf20_1
intel-openmp=2019.0=118
jasper=2.0.14=h07fcdf6_1
jpeg=9b=h024ee3a_2
kiwisolver=1.0.1=py37hf484d3e_0
libedit=3.1.20170329=h6b74fdf_2
libffi=3.2.1=hd88cf55_4
libgcc-ng=8.2.0=hdf63c60_1
libgfortran-ng=7.3.0=hdf63c60_0
libglu=9.0.0=hf484d3e_1
libopencv=3.4.2=hb342d67_1
libopus=1.2.1=hb9ed12e_0
libpng=1.6.35=hbc83047_0
libstdcxx-ng=8.2.0=hdf63c60_1
libtiff=4.0.9=he85c1e1_2
libuuid=1.0.3=h1bed415_2
libvpx=1.7.0=h439df22_0
libxcb=1.13=h1bed415_1
libxml2=2.9.8=h26e45fe_1
matplotlib=3.0.2=py37h5429711_0
mkl=2018.0.3=1
mkl_fft=1.0.6=py37h7dd41cf_0
mkl_random=1.0.1=py37h4414c95_1
ncurses=6.1=hf484d3e_0
ninja=1.8.2=py37h6bb024c_1
numpy=1.15.4=py37h1d66e8a_0
numpy-base=1.15.4=py37h81de0dd_0
olefile=0.46=py37_0
opencv=3.4.2=py37h6fd60c2_1
openssl=1.1.1a=h7b6447c_0
pcre=8.42=h439df22_0
pillow=5.2.0=py37heded4f4_0
pip=10.0.1=py37_0
pixman=0.34.0=hceecf20_3
py-opencv=3.4.2=py37hb342d67_1
pycparser=2.18=py37_1
pyparsing=2.2.0=py37_1
pyqt=5.9.2=py37h05f1152_2
python=3.7.1=h0371630_3
python-dateutil=2.7.3=py37_0
pytorch=1.0.0=py3.7_cuda10.0.130_cudnn7.4.1_1
pytz=2018.5=py37_0
qt=5.9.7=h5867ecd_1
readline=7.0=h7b6447c_5
scikit-learn=0.19.1=py37hedc7406_0
scipy=1.1.0=py37hfa4b5c9_1
setuptools=40.2.0=py37_0
sip=4.19.8=py37hf484d3e_0
six=1.11.0=py37_1
sqlite=3.25.3=h7b6447c_0
tk=8.6.8=hbc83047_0
torchvision=0.2.1=py37_1
tornado=5.1=py37h14c3975_0
tqdm=4.25.0=py37h28b3542_0
wheel=0.31.1=py37_0
xz=5.2.4=h14c3975_4
zlib=1.2.11=ha838bed_2
object_detection/configs/CornerNet-multi_scale.json (new file, 54 lines)
@@ -0,0 +1,54 @@
{
    "system": {
        "dataset": "COCO",
        "batch_size": 49,
        "sampling_function": "cornernet",

        "train_split": "trainval",
        "val_split": "minival",

        "learning_rate": 0.00025,
        "decay_rate": 10,

        "val_iter": 100,

        "opt_algo": "adam",
        "prefetch_size": 5,

        "max_iter": 500000,
        "stepsize": 450000,
        "snapshot": 5000,

        "chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5],

        "data_dir": "./data"
    },

    "db": {
        "rand_scale_min": 0.6,
        "rand_scale_max": 1.4,
        "rand_scale_step": 0.1,
        "rand_scales": null,

        "rand_crop": true,
        "rand_color": true,

        "border": 128,
        "gaussian_bump": true,

        "input_size": [511, 511],
        "output_sizes": [[128, 128]],

        "test_scales": [0.5, 0.75, 1, 1.25, 1.5],

        "top_k": 100,
        "categories": 80,
        "ae_threshold": 0.5,
        "nms_threshold": 0.5,

        "merge_bbox": true,
        "weight_exp": 10,

        "max_per_image": 100
    }
}
object_detection/configs/CornerNet.json (new file, 52 lines)
@@ -0,0 +1,52 @@
{
    "system": {
        "dataset": "COCO",
        "batch_size": 49,
        "sampling_function": "cornernet",

        "train_split": "trainval",
        "val_split": "minival",

        "learning_rate": 0.00025,
        "decay_rate": 10,

        "val_iter": 100,

        "opt_algo": "adam",
        "prefetch_size": 5,

        "max_iter": 500000,
        "stepsize": 450000,
        "snapshot": 5000,

        "chunk_sizes": [4, 5, 5, 5, 5, 5, 5, 5, 5, 5],

        "data_dir": "./data"
    },

    "db": {
        "rand_scale_min": 0.6,
        "rand_scale_max": 1.4,
        "rand_scale_step": 0.1,
        "rand_scales": null,

        "rand_crop": true,
        "rand_color": true,

        "border": 128,
        "gaussian_bump": true,
        "gaussian_iou": 0.3,

        "input_size": [511, 511],
        "output_sizes": [[128, 128]],

        "test_scales": [1],

        "top_k": 100,
        "categories": 80,
        "ae_threshold": 0.5,
        "nms_threshold": 0.5,

        "max_per_image": 100
    }
}
object_detection/configs/CornerNet_Saccade.json (new file, 56 lines)
@@ -0,0 +1,56 @@
{
    "system": {
        "dataset": "COCO",
        "batch_size": 48,
        "sampling_function": "cornernet_saccade",

        "train_split": "trainval",
        "val_split": "minival",

        "learning_rate": 0.00025,
        "decay_rate": 10,

        "val_iter": 100,

        "opt_algo": "adam",
        "prefetch_size": 5,

        "max_iter": 500000,
        "stepsize": 450000,
        "snapshot": 5000,

        "chunk_sizes": [12, 12, 12, 12]
    },

    "db": {
        "rand_scale_min": 0.5,
        "rand_scale_max": 1.1,
        "rand_scale_step": 0.1,
        "rand_scales": null,

        "rand_full_crop": true,
        "gaussian_bump": true,
        "gaussian_iou": 0.5,

        "min_scale": 16,
        "view_sizes": [],

        "height_mult": 31,
        "width_mult": 31,

        "input_size": [255, 255],
        "output_sizes": [[64, 64]],

        "att_max_crops": 30,
        "att_scales": [[1, 2, 4]],
        "att_thresholds": [0.3],

        "top_k": 12,
        "num_dets": 12,
        "categories": 80,
        "ae_threshold": 0.3,
        "nms_threshold": 0.5,

        "max_per_image": 100
    }
}
object_detection/configs/CornerNet_Squeeze.json (new file, 54 lines)
@@ -0,0 +1,54 @@
{
    "system": {
        "dataset": "COCO",
        "batch_size": 55,
        "sampling_function": "cornernet",

        "train_split": "trainval",
        "val_split": "minival",

        "learning_rate": 0.00025,
        "decay_rate": 10,

        "val_iter": 100,

        "opt_algo": "adam",
        "prefetch_size": 5,

        "max_iter": 500000,
        "stepsize": 450000,
        "snapshot": 5000,

        "chunk_sizes": [13, 14, 14, 14],

        "data_dir": "./data"
    },

    "db": {
        "rand_scale_min": 0.6,
        "rand_scale_max": 1.4,
        "rand_scale_step": 0.1,
        "rand_scales": null,

        "rand_crop": true,
        "rand_color": true,

        "border": 128,
        "gaussian_bump": true,
        "gaussian_iou": 0.3,

        "input_size": [511, 511],
        "output_sizes": [[64, 64]],

        "test_scales": [1],
        "test_flipped": false,

        "top_k": 20,
        "num_dets": 100,
        "categories": 80,
        "ae_threshold": 0.5,
        "nms_threshold": 0.5,

        "max_per_image": 100
    }
}
object_detection/core/__init__.py (new file, 0 lines)
object_detection/core/base.py (new file, 39 lines)
@@ -0,0 +1,39 @@
import json

from .nnet.py_factory import NetworkFactory


class Base(object):
    def __init__(self, db, nnet, func, model=None):
        super(Base, self).__init__()

        self._db = db
        self._nnet = nnet
        self._func = func

        if model is not None:
            self._nnet.load_pretrained_params(model)

        self._nnet.cuda()
        self._nnet.eval_mode()

    def _inference(self, image, *args, **kwargs):
        return self._func(self._db, self._nnet, image.copy(), *args, **kwargs)

    def __call__(self, image, *args, **kwargs):
        categories = self._db.configs["categories"]
        bboxes = self._inference(image, *args, **kwargs)
        return {self._db.cls2name(j): bboxes[j] for j in range(1, categories + 1)}


def load_cfg(cfg_file):
    with open(cfg_file, "r") as f:
        cfg = json.load(f)

    cfg_sys = cfg["system"]
    cfg_db = cfg["db"]
    return cfg_sys, cfg_db


def load_nnet(cfg_sys, model):
    return NetworkFactory(cfg_sys, model)
object_detection/core/config.py (new file, 164 lines)
@@ -0,0 +1,164 @@
import os

import numpy as np


class SystemConfig(object):
    def __init__(self):
        self._configs = {}
        self._configs["dataset"] = None
        self._configs["sampling_function"] = "coco_detection"

        # Training Config
        self._configs["display"] = 5
        self._configs["snapshot"] = 400
        self._configs["stepsize"] = 5000
        self._configs["learning_rate"] = 0.001
        self._configs["decay_rate"] = 10
        self._configs["max_iter"] = 100000
        self._configs["val_iter"] = 20
        self._configs["batch_size"] = 1
        self._configs["snapshot_name"] = None
        self._configs["prefetch_size"] = 100
        self._configs["pretrain"] = None
        self._configs["opt_algo"] = "adam"
        self._configs["chunk_sizes"] = None

        # Directories
        self._configs["data_dir"] = "./data"
        self._configs["cache_dir"] = "./cache"
        self._configs["config_dir"] = "./config"
        self._configs["result_dir"] = "./results"

        # Split
        self._configs["train_split"] = "training"
        self._configs["val_split"] = "validation"
        self._configs["test_split"] = "testdev"

        # Rng
        self._configs["data_rng"] = np.random.RandomState(123)
        self._configs["nnet_rng"] = np.random.RandomState(317)

    @property
    def chunk_sizes(self):
        return self._configs["chunk_sizes"]

    @property
    def train_split(self):
        return self._configs["train_split"]

    @property
    def val_split(self):
        return self._configs["val_split"]

    @property
    def test_split(self):
        return self._configs["test_split"]

    @property
    def full(self):
        return self._configs

    @property
    def sampling_function(self):
        return self._configs["sampling_function"]

    @property
    def data_rng(self):
        return self._configs["data_rng"]

    @property
    def nnet_rng(self):
        return self._configs["nnet_rng"]

    @property
    def opt_algo(self):
        return self._configs["opt_algo"]

    @property
    def prefetch_size(self):
        return self._configs["prefetch_size"]

    @property
    def pretrain(self):
        return self._configs["pretrain"]

    @property
    def result_dir(self):
        result_dir = os.path.join(self._configs["result_dir"], self.snapshot_name)
        if not os.path.exists(result_dir):
            os.makedirs(result_dir)
        return result_dir

    @property
    def dataset(self):
        return self._configs["dataset"]

    @property
    def snapshot_name(self):
        return self._configs["snapshot_name"]

    @property
    def snapshot_dir(self):
        snapshot_dir = os.path.join(self.cache_dir, "nnet", self.snapshot_name)

        if not os.path.exists(snapshot_dir):
            os.makedirs(snapshot_dir)
        return snapshot_dir

    @property
    def snapshot_file(self):
        snapshot_file = os.path.join(self.snapshot_dir, self.snapshot_name + "_{}.pkl")
        return snapshot_file

    @property
    def config_dir(self):
        return self._configs["config_dir"]

    @property
    def batch_size(self):
        return self._configs["batch_size"]

    @property
    def max_iter(self):
        return self._configs["max_iter"]

    @property
    def learning_rate(self):
        return self._configs["learning_rate"]

    @property
    def decay_rate(self):
        return self._configs["decay_rate"]

    @property
    def stepsize(self):
        return self._configs["stepsize"]

    @property
    def snapshot(self):
        return self._configs["snapshot"]

    @property
    def display(self):
        return self._configs["display"]

    @property
    def val_iter(self):
        return self._configs["val_iter"]

    @property
    def data_dir(self):
        return self._configs["data_dir"]

    @property
    def cache_dir(self):
        if not os.path.exists(self._configs["cache_dir"]):
            os.makedirs(self._configs["cache_dir"])
        return self._configs["cache_dir"]

    def update_config(self, new):
        for key in new:
            if key in self._configs:
                self._configs[key] = new[key]
        return self
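A short sketch of how `SystemConfig` is consumed elsewhere in this commit (mirroring the pattern in `core/detectors.py` and `core/base.py`; the config path is one of the files shipped above):
```python
from core.base import load_cfg
from core.config import SystemConfig

# Read the JSON config and split it into its "system" and "db" sections.
cfg_sys, cfg_db = load_cfg("configs/CornerNet.json")

# update_config only overwrites keys that already exist in the defaults
# and returns self, so it can be chained onto the constructor.
sys_cfg = SystemConfig().update_config(cfg_sys)
print(sys_cfg.batch_size, sys_cfg.max_iter)  # values from the JSON, not the defaults
```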
object_detection/core/dbs/__init__.py (new file, 5 lines)
@@ -0,0 +1,5 @@
from .coco import COCO

datasets = {
    "COCO": COCO
}
object_detection/core/dbs/base.py (new file, 74 lines)
@@ -0,0 +1,74 @@
import os

import numpy as np


class BASE(object):
    def __init__(self):
        self._split = None
        self._db_inds = []
        self._image_ids = []

        self._mean = np.zeros((3,), dtype=np.float32)
        self._std = np.ones((3,), dtype=np.float32)
        self._eig_val = np.ones((3,), dtype=np.float32)
        self._eig_vec = np.zeros((3, 3), dtype=np.float32)

        self._configs = {}
        self._configs["data_aug"] = True

        self._data_rng = None

    @property
    def configs(self):
        return self._configs

    @property
    def mean(self):
        return self._mean

    @property
    def std(self):
        return self._std

    @property
    def eig_val(self):
        return self._eig_val

    @property
    def eig_vec(self):
        return self._eig_vec

    @property
    def db_inds(self):
        return self._db_inds

    @property
    def split(self):
        return self._split

    def update_config(self, new):
        for key in new:
            if key in self._configs:
                self._configs[key] = new[key]

    def image_ids(self, ind):
        return self._image_ids[ind]

    def image_path(self, ind):
        pass

    def write_result(self, ind, all_bboxes, all_scores):
        pass

    def evaluate(self, name):
        pass

    def shuffle_inds(self, quiet=False):
        if self._data_rng is None:
            self._data_rng = np.random.RandomState(os.getpid())

        if not quiet:
            print("shuffling indices...")
        rand_perm = self._data_rng.permutation(len(self._db_inds))
        self._db_inds = self._db_inds[rand_perm]
object_detection/core/dbs/coco.py (new file, 169 lines)
@@ -0,0 +1,169 @@
import os

import numpy as np

from .detection import DETECTION


# COCO bounding boxes are 0-indexed

class COCO(DETECTION):
    def __init__(self, db_config, split=None, sys_config=None):
        assert split is None or sys_config is not None
        super(COCO, self).__init__(db_config)

        self._mean = np.array([0.40789654, 0.44719302, 0.47026115], dtype=np.float32)
        self._std = np.array([0.28863828, 0.27408164, 0.27809835], dtype=np.float32)
        self._eig_val = np.array([0.2141788, 0.01817699, 0.00341571], dtype=np.float32)
        self._eig_vec = np.array([
            [-0.58752847, -0.69563484, 0.41340352],
            [-0.5832747, 0.00994535, -0.81221408],
            [-0.56089297, 0.71832671, 0.41158938]
        ], dtype=np.float32)

        self._coco_cls_ids = [
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13,
            14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24, 25, 27, 28, 31, 32, 33, 34, 35, 36,
            37, 38, 39, 40, 41, 42, 43, 44, 46, 47,
            48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
            58, 59, 60, 61, 62, 63, 64, 65, 67, 70,
            72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
            82, 84, 85, 86, 87, 88, 89, 90
        ]

        self._coco_cls_names = [
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
            'fire hydrant', 'stop sign', 'parking meter', 'bench',
            'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
            'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
            'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
            'snowboard', 'sports ball', 'kite', 'baseball bat',
            'baseball glove', 'skateboard', 'surfboard',
            'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
            'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
            'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
            'donut', 'cake', 'chair', 'couch', 'potted plant',
            'bed', 'dining table', 'toilet', 'tv', 'laptop',
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
            'oven', 'toaster', 'sink', 'refrigerator', 'book',
            'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
            'toothbrush'
        ]

        self._cls2coco = {ind + 1: coco_id for ind, coco_id in enumerate(self._coco_cls_ids)}
        self._coco2cls = {coco_id: cls_id for cls_id, coco_id in self._cls2coco.items()}
        self._coco2name = {cls_id: cls_name for cls_id, cls_name in zip(self._coco_cls_ids, self._coco_cls_names)}
        self._name2coco = {cls_name: cls_id for cls_name, cls_id in self._coco2name.items()}

        if split is not None:
            coco_dir = os.path.join(sys_config.data_dir, "coco")

            self._split = {
                "trainval": "trainval2014",
                "minival": "minival2014",
                "testdev": "testdev2017"
            }[split]
            self._data_dir = os.path.join(coco_dir, "images", self._split)
            self._anno_file = os.path.join(coco_dir, "annotations", "instances_{}.json".format(self._split))

            self._detections, self._eval_ids = self._load_coco_annos()
            self._image_ids = list(self._detections.keys())
            self._db_inds = np.arange(len(self._image_ids))

    def _load_coco_annos(self):
        from pycocotools.coco import COCO

        coco = COCO(self._anno_file)
        self._coco = coco

        class_ids = coco.getCatIds()
        image_ids = coco.getImgIds()

        eval_ids = {}
        detections = {}
        for image_id in image_ids:
            image = coco.loadImgs(image_id)[0]
            dets = []

            eval_ids[image["file_name"]] = image_id
            for class_id in class_ids:
                annotation_ids = coco.getAnnIds(imgIds=image["id"], catIds=class_id)
                annotations = coco.loadAnns(annotation_ids)
                category = self._coco2cls[class_id]
                for annotation in annotations:
                    det = annotation["bbox"] + [category]
                    det[2] += det[0]
                    det[3] += det[1]
                    dets.append(det)

            file_name = image["file_name"]
            if len(dets) == 0:
                detections[file_name] = np.zeros((0, 5), dtype=np.float32)
            else:
                detections[file_name] = np.array(dets, dtype=np.float32)
        return detections, eval_ids

    def image_path(self, ind):
        if self._data_dir is None:
            raise ValueError("Data directory is not set")

        db_ind = self._db_inds[ind]
        file_name = self._image_ids[db_ind]
        return os.path.join(self._data_dir, file_name)

    def detections(self, ind):
        db_ind = self._db_inds[ind]
        file_name = self._image_ids[db_ind]
        return self._detections[file_name].copy()

    def cls2name(self, cls):
        coco = self._cls2coco[cls]
        return self._coco2name[coco]

    def _to_float(self, x):
        return float("{:.2f}".format(x))

    def convert_to_coco(self, all_bboxes):
        detections = []
        for image_id in all_bboxes:
            coco_id = self._eval_ids[image_id]
            for cls_ind in all_bboxes[image_id]:
                category_id = self._cls2coco[cls_ind]
                for bbox in all_bboxes[image_id][cls_ind]:
                    bbox[2] -= bbox[0]
                    bbox[3] -= bbox[1]

                    score = bbox[4]
                    bbox = list(map(self._to_float, bbox[0:4]))

                    detection = {
                        "image_id": coco_id,
                        "category_id": category_id,
                        "bbox": bbox,
                        "score": float("{:.2f}".format(score))
                    }

                    detections.append(detection)
        return detections

    def evaluate(self, result_json, cls_ids, image_ids):
        from pycocotools.cocoeval import COCOeval

        if self._split == "testdev":
            return None

        coco = self._coco

        eval_ids = [self._eval_ids[image_id] for image_id in image_ids]
        cat_ids = [self._cls2coco[cls_id] for cls_id in cls_ids]

        coco_dets = coco.loadRes(result_json)
        coco_eval = COCOeval(coco, coco_dets, "bbox")
        coco_eval.params.imgIds = eval_ids
        coco_eval.params.catIds = cat_ids
        coco_eval.evaluate()
        coco_eval.accumulate()
        coco_eval.summarize()
        return coco_eval.stats[0], coco_eval.stats[12:]
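Worth noting about the two conversion sites above: COCO annotations store boxes as `[x, y, w, h]`, while this codebase works with corner coordinates. `_load_coco_annos` converts to `[x1, y1, x2, y2]` on load (`det[2] += det[0]`, `det[3] += det[1]`), and `convert_to_coco` reverses it before evaluation. A sketch of the round trip with placeholder numbers:
```python
# COCO format -> internal corners (as in _load_coco_annos)
x1, y1, w, h = 10.0, 20.0, 30.0, 40.0
x2, y2 = x1 + w, y1 + h    # (40.0, 60.0)

# internal corners -> COCO format (as in convert_to_coco)
w, h = x2 - x1, y2 - y1    # (30.0, 40.0)
```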
object_detection/core/dbs/detection.py (new file, 71 lines)
@@ -0,0 +1,71 @@
import numpy as np

from .base import BASE


class DETECTION(BASE):
    def __init__(self, db_config):
        super(DETECTION, self).__init__()

        # Configs for training
        self._configs["categories"] = 80
        self._configs["rand_scales"] = [1]
        self._configs["rand_scale_min"] = 0.8
        self._configs["rand_scale_max"] = 1.4
        self._configs["rand_scale_step"] = 0.2

        # Configs for both training and testing
        self._configs["input_size"] = [383, 383]
        self._configs["output_sizes"] = [[96, 96], [48, 48], [24, 24], [12, 12]]

        self._configs["score_threshold"] = 0.05
        self._configs["nms_threshold"] = 0.7
        self._configs["max_per_set"] = 40
        self._configs["max_per_image"] = 100
        self._configs["top_k"] = 20
        self._configs["ae_threshold"] = 1
        self._configs["nms_kernel"] = 3
        self._configs["num_dets"] = 1000

        self._configs["nms_algorithm"] = "exp_soft_nms"
        self._configs["weight_exp"] = 8
        self._configs["merge_bbox"] = False

        self._configs["data_aug"] = True
        self._configs["lighting"] = True

        self._configs["border"] = 64
        self._configs["gaussian_bump"] = False
        self._configs["gaussian_iou"] = 0.7
        self._configs["gaussian_radius"] = -1
        self._configs["rand_crop"] = False
        self._configs["rand_color"] = False
        self._configs["rand_center"] = True

        self._configs["init_sizes"] = [192, 255]
        self._configs["view_sizes"] = []

        self._configs["min_scale"] = 16
        self._configs["max_scale"] = 32

        self._configs["att_sizes"] = [[16, 16], [32, 32], [64, 64]]
        self._configs["att_ranges"] = [[96, 256], [32, 96], [0, 32]]
        self._configs["att_ratios"] = [16, 8, 4]
        self._configs["att_scales"] = [1, 1.5, 2]
        self._configs["att_thresholds"] = [0.3, 0.3, 0.3, 0.3]
        self._configs["att_nms_ks"] = [3, 3, 3]
        self._configs["att_max_crops"] = 8
        self._configs["ref_dets"] = True

        # Configs for testing
        self._configs["test_scales"] = [1]
        self._configs["test_flipped"] = True

        self.update_config(db_config)

        if self._configs["rand_scales"] is None:
            self._configs["rand_scales"] = np.arange(
                self._configs["rand_scale_min"],
                self._configs["rand_scale_max"],
                self._configs["rand_scale_step"]
            )
object_detection/core/detectors.py (new file, 52 lines)
@@ -0,0 +1,52 @@
from .base import Base, load_cfg, load_nnet
from .config import SystemConfig
from .dbs.coco import COCO
from .paths import get_file_path


class CornerNet(Base):
    def __init__(self):
        from .test.cornernet import cornernet_inference
        from .models.CornerNet import model

        cfg_path = get_file_path("..", "configs", "CornerNet.json")
        model_path = get_file_path("..", "cache", "nnet", "CornerNet", "CornerNet_500000.pkl")

        cfg_sys, cfg_db = load_cfg(cfg_path)
        sys_cfg = SystemConfig().update_config(cfg_sys)
        coco = COCO(cfg_db)

        cornernet = load_nnet(sys_cfg, model())
        super(CornerNet, self).__init__(coco, cornernet, cornernet_inference, model=model_path)


class CornerNet_Squeeze(Base):
    def __init__(self):
        from .test.cornernet import cornernet_inference
        from .models.CornerNet_Squeeze import model

        cfg_path = get_file_path("..", "configs", "CornerNet_Squeeze.json")
        model_path = get_file_path("..", "cache", "nnet", "CornerNet_Squeeze", "CornerNet_Squeeze_500000.pkl")

        cfg_sys, cfg_db = load_cfg(cfg_path)
        sys_cfg = SystemConfig().update_config(cfg_sys)
        coco = COCO(cfg_db)

        cornernet = load_nnet(sys_cfg, model())
        super(CornerNet_Squeeze, self).__init__(coco, cornernet, cornernet_inference, model=model_path)


class CornerNet_Saccade(Base):
    def __init__(self):
        from .test.cornernet_saccade import cornernet_saccade_inference
        from .models.CornerNet_Saccade import model

        cfg_path = get_file_path("..", "configs", "CornerNet_Saccade.json")
        model_path = get_file_path("..", "cache", "nnet", "CornerNet_Saccade", "CornerNet_Saccade_500000.pkl")

        cfg_sys, cfg_db = load_cfg(cfg_path)
        sys_cfg = SystemConfig().update_config(cfg_sys)
        coco = COCO(cfg_db)

        cornernet = load_nnet(sys_cfg, model())
        super(CornerNet_Saccade, self).__init__(coco, cornernet, cornernet_saccade_inference, model=model_path)
object_detection/core/external/.gitignore (vendored, new file, 7 lines)
@@ -0,0 +1,7 @@
bbox.c
bbox.cpython-35m-x86_64-linux-gnu.so
bbox.cpython-36m-x86_64-linux-gnu.so

nms.c
nms.cpython-35m-x86_64-linux-gnu.so
nms.cpython-36m-x86_64-linux-gnu.so
object_detection/core/external/Makefile (vendored, new file, 3 lines)
@@ -0,0 +1,3 @@
all:
	python setup.py build_ext --inplace
	rm -rf build
object_detection/core/external/__init__.py (vendored, new file, 0 lines)
object_detection/core/external/bbox.pyx (vendored, new file, 55 lines)
@@ -0,0 +1,55 @@
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------

cimport cython
import numpy as np
cimport numpy as np

DTYPE = np.float
ctypedef np.float_t DTYPE_t

def bbox_overlaps(
        np.ndarray[DTYPE_t, ndim=2] boxes,
        np.ndarray[DTYPE_t, ndim=2] query_boxes):
    """
    Parameters
    ----------
    boxes: (N, 4) ndarray of float
    query_boxes: (K, 4) ndarray of float
    Returns
    -------
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
    cdef DTYPE_t iw, ih, box_area
    cdef DTYPE_t ua
    cdef unsigned int k, n
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    ua = float(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps
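For reference, the quantity filled into `overlaps[n, k]` above is plain intersection-over-union with inclusive pixel coordinates (hence the `+ 1` terms); a minimal scalar sketch of the same arithmetic in Python:
```python
def iou(box, query):
    # boxes are [x1, y1, x2, y2] with inclusive coordinates, hence the +1
    iw = min(box[2], query[2]) - max(box[0], query[0]) + 1
    ih = min(box[3], query[3]) - max(box[1], query[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_b = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    area_q = (query[2] - query[0] + 1) * (query[3] - query[1] + 1)
    return iw * ih / float(area_b + area_q - iw * ih)
```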
object_detection/core/external/nms.pyx (vendored, new file, 281 lines)
@@ -0,0 +1,281 @@
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

import numpy as np
cimport numpy as np

cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
    return a if a >= b else b

cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
    return a if a <= b else b

def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
    cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
    cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
    cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
    cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
    cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]

    cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]

    cdef int ndets = dets.shape[0]
    cdef np.ndarray[np.int_t, ndim=1] suppressed = \
        np.zeros((ndets), dtype=np.int)

    # nominal indices
    cdef int _i, _j
    # sorted indices
    cdef int i, j
    # temp variables for box i's (the box currently under consideration)
    cdef np.float32_t ix1, iy1, ix2, iy2, iarea
    # variables for computing overlap with box j (lower scoring box)
    cdef np.float32_t xx1, yy1, xx2, yy2
    cdef np.float32_t w, h
    cdef np.float32_t inter, ovr

    keep = []
    for _i in range(ndets):
        i = order[_i]
        if suppressed[i] == 1:
            continue
        keep.append(i)
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = order[_j]
            if suppressed[j] == 1:
                continue
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (iarea + areas[j] - inter)
            if ovr >= thresh:
                suppressed[j] = 1

    return keep

def soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001,
             unsigned int method=0):
    cdef unsigned int N = boxes.shape[0]
    cdef float iw, ih, box_area
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov

    for i in range(N):
        maxscore = boxes[i, 4]
        maxpos = i

        tx1 = boxes[i, 0]
        ty1 = boxes[i, 1]
        tx2 = boxes[i, 2]
        ty2 = boxes[i, 3]
        ts = boxes[i, 4]

        pos = i + 1
        # get max box
        while pos < N:
            if maxscore < boxes[pos, 4]:
                maxscore = boxes[pos, 4]
                maxpos = pos
            pos = pos + 1

        # add max box as a detection
        boxes[i, 0] = boxes[maxpos, 0]
        boxes[i, 1] = boxes[maxpos, 1]
        boxes[i, 2] = boxes[maxpos, 2]
        boxes[i, 3] = boxes[maxpos, 3]
        boxes[i, 4] = boxes[maxpos, 4]

        # swap ith box with position of max box
        boxes[maxpos, 0] = tx1
        boxes[maxpos, 1] = ty1
        boxes[maxpos, 2] = tx2
        boxes[maxpos, 3] = ty2
        boxes[maxpos, 4] = ts

        tx1 = boxes[i, 0]
        ty1 = boxes[i, 1]
        tx2 = boxes[i, 2]
        ty2 = boxes[i, 3]
        ts = boxes[i, 4]

        pos = i + 1
        # NMS iterations, note that N changes if detection boxes fall below threshold
        while pos < N:
            x1 = boxes[pos, 0]
            y1 = boxes[pos, 1]
            x2 = boxes[pos, 2]
            y2 = boxes[pos, 3]
            s = boxes[pos, 4]

            area = (x2 - x1 + 1) * (y2 - y1 + 1)
            iw = (min(tx2, x2) - max(tx1, x1) + 1)
            if iw > 0:
                ih = (min(ty2, y2) - max(ty1, y1) + 1)
                if ih > 0:
                    ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
                    ov = iw * ih / ua  # iou between max box and detection box

                    if method == 1:  # linear
                        if ov > Nt:
                            weight = 1 - ov
                        else:
                            weight = 1
                    elif method == 2:  # gaussian
                        weight = np.exp(-(ov * ov) / sigma)
                    else:  # original NMS
                        if ov > Nt:
                            weight = 0
                        else:
                            weight = 1

                    boxes[pos, 4] = weight * boxes[pos, 4]

                    # if box score falls below threshold, discard the box by swapping with last box
                    # update N
                    if boxes[pos, 4] < threshold:
                        boxes[pos, 0] = boxes[N - 1, 0]
                        boxes[pos, 1] = boxes[N - 1, 1]
                        boxes[pos, 2] = boxes[N - 1, 2]
                        boxes[pos, 3] = boxes[N - 1, 3]
                        boxes[pos, 4] = boxes[N - 1, 4]
                        N = N - 1
                        pos = pos - 1

            pos = pos + 1

    keep = [i for i in range(N)]
    return keep

def soft_nms_merge(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001,
                   unsigned int method=0, float weight_exp=6):
    cdef unsigned int N = boxes.shape[0]
    cdef float iw, ih, box_area
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
    cdef float mx1, mx2, my1, my2, mts, mbs, mw

    for i in range(N):
        maxscore = boxes[i, 4]
        maxpos = i

        tx1 = boxes[i, 0]
        ty1 = boxes[i, 1]
        tx2 = boxes[i, 2]
        ty2 = boxes[i, 3]
        ts = boxes[i, 4]

        pos = i + 1
        # get max box
        while pos < N:
            if maxscore < boxes[pos, 4]:
                maxscore = boxes[pos, 4]
                maxpos = pos
            pos = pos + 1

        # add max box as a detection
        boxes[i, 0] = boxes[maxpos, 0]
        boxes[i, 1] = boxes[maxpos, 1]
        boxes[i, 2] = boxes[maxpos, 2]
        boxes[i, 3] = boxes[maxpos, 3]
        boxes[i, 4] = boxes[maxpos, 4]

        mx1 = boxes[i, 0] * boxes[i, 5]
        my1 = boxes[i, 1] * boxes[i, 5]
        mx2 = boxes[i, 2] * boxes[i, 6]
        my2 = boxes[i, 3] * boxes[i, 6]
        mts = boxes[i, 5]
        mbs = boxes[i, 6]

        # swap ith box with position of max box
        boxes[maxpos, 0] = tx1
        boxes[maxpos, 1] = ty1
        boxes[maxpos, 2] = tx2
        boxes[maxpos, 3] = ty2
        boxes[maxpos, 4] = ts

        tx1 = boxes[i, 0]
        ty1 = boxes[i, 1]
        tx2 = boxes[i, 2]
        ty2 = boxes[i, 3]
        ts = boxes[i, 4]

        pos = i + 1
        # NMS iterations, note that N changes if detection boxes fall below threshold
        while pos < N:
            x1 = boxes[pos, 0]
            y1 = boxes[pos, 1]
            x2 = boxes[pos, 2]
            y2 = boxes[pos, 3]
            s = boxes[pos, 4]

            area = (x2 - x1 + 1) * (y2 - y1 + 1)
            iw = (min(tx2, x2) - max(tx1, x1) + 1)
            if iw > 0:
                ih = (min(ty2, y2) - max(ty1, y1) + 1)
                if ih > 0:
                    ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
                    ov = iw * ih / ua  # iou between max box and detection box

                    if method == 1:  # linear
                        if ov > Nt:
                            weight = 1 - ov
                        else:
                            weight = 1
                    elif method == 2:  # gaussian
                        weight = np.exp(-(ov * ov) / sigma)
                    else:  # original NMS
                        if ov > Nt:
                            weight = 0
                        else:
                            weight = 1

                    mw = (1 - weight) ** weight_exp
                    mx1 = mx1 + boxes[pos, 0] * boxes[pos, 5] * mw
                    my1 = my1 + boxes[pos, 1] * boxes[pos, 5] * mw
                    mx2 = mx2 + boxes[pos, 2] * boxes[pos, 6] * mw
                    my2 = my2 + boxes[pos, 3] * boxes[pos, 6] * mw
                    mts = mts + boxes[pos, 5] * mw
                    mbs = mbs + boxes[pos, 6] * mw

                    boxes[pos, 4] = weight * boxes[pos, 4]

                    # if box score falls below threshold, discard the box by swapping with last box
                    # update N
                    if boxes[pos, 4] < threshold:
                        boxes[pos, 0] = boxes[N - 1, 0]
                        boxes[pos, 1] = boxes[N - 1, 1]
                        boxes[pos, 2] = boxes[N - 1, 2]
                        boxes[pos, 3] = boxes[N - 1, 3]
                        boxes[pos, 4] = boxes[N - 1, 4]
                        N = N - 1
                        pos = pos - 1

            pos = pos + 1

        boxes[i, 0] = mx1 / mts
        boxes[i, 1] = my1 / mts
        boxes[i, 2] = mx2 / mbs
        boxes[i, 3] = my2 / mbs

    keep = [i for i in range(N)]
    return keep
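The three `method` branches above are the score-decay variants from Soft-NMS; a compact Python sketch of just the weighting rule applied to each lower-scoring box (same symbols as the Cython code):
```python
import numpy as np

def soft_nms_weight(ov, method=0, Nt=0.3, sigma=0.5):
    """Score multiplier for a box whose IoU with the current max box is `ov`."""
    if method == 1:                        # linear decay above the overlap threshold
        return 1.0 - ov if ov > Nt else 1.0
    elif method == 2:                      # gaussian decay, applied regardless of Nt
        return np.exp(-(ov * ov) / sigma)
    else:                                  # method 0: hard suppression (classic NMS)
        return 0.0 if ov > Nt else 1.0
```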
object_detection/core/external/setup.py (vendored, new file, 24 lines)
@@ -0,0 +1,24 @@
from distutils.core import setup
from distutils.extension import Extension

import numpy
from Cython.Build import cythonize

extensions = [
    Extension(
        "bbox",
        ["bbox.pyx"],
        extra_compile_args=["-Wno-cpp", "-Wno-unused-function"]
    ),
    Extension(
        "nms",
        ["nms.pyx"],
        extra_compile_args=["-Wno-cpp", "-Wno-unused-function"]
    )
]

setup(
    name="coco",
    ext_modules=cythonize(extensions),
    include_dirs=[numpy.get_include()]
)
object_detection/core/models/CornerNet.py (new file, 73 lines)
@@ -0,0 +1,73 @@
import torch
import torch.nn as nn

from .py_utils import TopPool, BottomPool, LeftPool, RightPool
from .py_utils.losses import CornerNet_Loss
from .py_utils.modules import hg_module, hg, hg_net
from .py_utils.utils import convolution, residual, corner_pool


def make_pool_layer(dim):
    return nn.Sequential()


def make_hg_layer(inp_dim, out_dim, modules):
    layers = [residual(inp_dim, out_dim, stride=2)]
    layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
    return nn.Sequential(*layers)


class model(hg_net):
    def _pred_mod(self, dim):
        return nn.Sequential(
            convolution(3, 256, 256, with_bn=False),
            nn.Conv2d(256, dim, (1, 1))
        )

    def _merge_mod(self):
        return nn.Sequential(
            nn.Conv2d(256, 256, (1, 1), bias=False),
            nn.BatchNorm2d(256)
        )

    def __init__(self):
        stacks = 2
        pre = nn.Sequential(
            convolution(7, 3, 128, stride=2),
            residual(128, 256, stride=2)
        )
        hg_mods = nn.ModuleList([
            hg_module(
                5, [256, 256, 384, 384, 384, 512], [2, 2, 2, 2, 2, 4],
                make_pool_layer=make_pool_layer,
                make_hg_layer=make_hg_layer
            ) for _ in range(stacks)
        ])
        cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
        inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
        cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
        inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])

        hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_)

        tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
        br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])

        tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        for tl_heat, br_heat in zip(tl_heats, br_heats):
            torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
            torch.nn.init.constant_(br_heat[-1].bias, -2.19)

        tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
        br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])

        tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
        br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])

        super(model, self).__init__(
            hgs, tl_modules, br_modules, tl_heats, br_heats,
            tl_tags, br_tags, tl_offs, br_offs
        )

        self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1)
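A side note on the `-2.19` heatmap bias initialization used above (and in the other two models in this commit): it sets the initial per-location corner probability after the sigmoid to roughly 0.1, since log(0.1 / 0.9) ≈ -2.197. A quick check:
```python
import math

# sigmoid(-2.19): the initial probability each heatmap location predicts a corner
print(1.0 / (1.0 + math.exp(2.19)))  # ~0.1006
```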
object_detection/core/models/CornerNet_Saccade.py (new file, 93 lines)
@@ -0,0 +1,93 @@
import torch
import torch.nn as nn

from .py_utils import TopPool, BottomPool, LeftPool, RightPool
from .py_utils.losses import CornerNet_Saccade_Loss
from .py_utils.modules import saccade_net, saccade_module, saccade
from .py_utils.utils import convolution, residual, corner_pool


def make_pool_layer(dim):
    return nn.Sequential()


def make_hg_layer(inp_dim, out_dim, modules):
    layers = [residual(inp_dim, out_dim, stride=2)]
    layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
    return nn.Sequential(*layers)


class model(saccade_net):
    def _pred_mod(self, dim):
        return nn.Sequential(
            convolution(3, 256, 256, with_bn=False),
            nn.Conv2d(256, dim, (1, 1))
        )

    def _merge_mod(self):
        return nn.Sequential(
            nn.Conv2d(256, 256, (1, 1), bias=False),
            nn.BatchNorm2d(256)
        )

    def __init__(self):
        stacks = 3
        pre = nn.Sequential(
            convolution(7, 3, 128, stride=2),
            residual(128, 256, stride=2)
        )
        hg_mods = nn.ModuleList([
            saccade_module(
                3, [256, 384, 384, 512], [1, 1, 1, 1],
                make_pool_layer=make_pool_layer,
                make_hg_layer=make_hg_layer
            ) for _ in range(stacks)
        ])
        cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
        inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
        cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
        inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])

        att_mods = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(
                    convolution(3, 384, 256, with_bn=False),
                    nn.Conv2d(256, 1, (1, 1))
                ),
                nn.Sequential(
                    convolution(3, 384, 256, with_bn=False),
                    nn.Conv2d(256, 1, (1, 1))
                ),
                nn.Sequential(
                    convolution(3, 256, 256, with_bn=False),
                    nn.Conv2d(256, 1, (1, 1))
                )
            ]) for _ in range(stacks)
        ])
        for att_mod in att_mods:
            for att in att_mod:
                torch.nn.init.constant_(att[-1].bias, -2.19)

        hgs = saccade(pre, hg_mods, cnvs, inters, cnvs_, inters_)

        tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
        br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])

        tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        for tl_heat, br_heat in zip(tl_heats, br_heats):
            torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
            torch.nn.init.constant_(br_heat[-1].bias, -2.19)

        tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
        br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])

        tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
        br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])

        super(model, self).__init__(
            hgs, tl_modules, br_modules, tl_heats, br_heats,
            tl_tags, br_tags, tl_offs, br_offs, att_mods
        )

        self.loss = CornerNet_Saccade_Loss(pull_weight=1e-1, push_weight=1e-1)
117
object_detection/core/models/CornerNet_Squeeze.py
Normal file
117
object_detection/core/models/CornerNet_Squeeze.py
Normal file
@@ -0,0 +1,117 @@
import torch
import torch.nn as nn

from .py_utils import TopPool, BottomPool, LeftPool, RightPool

from .py_utils.losses import CornerNet_Loss
from .py_utils.modules import hg_module, hg, hg_net
from .py_utils.utils import convolution, corner_pool, residual

class fire_module(nn.Module):
    def __init__(self, inp_dim, out_dim, sr=2, stride=1):
        super(fire_module, self).__init__()
        self.conv1 = nn.Conv2d(inp_dim, out_dim // sr, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_dim // sr)
        self.conv_1x1 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=1, stride=stride, bias=False)
        self.conv_3x3 = nn.Conv2d(out_dim // sr, out_dim // 2, kernel_size=3, padding=1,
                                  stride=stride, groups=out_dim // sr, bias=False)
        self.bn2 = nn.BatchNorm2d(out_dim)
        self.skip = (stride == 1 and inp_dim == out_dim)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        conv1 = self.conv1(x)
        bn1 = self.bn1(conv1)
        conv2 = torch.cat((self.conv_1x1(bn1), self.conv_3x3(bn1)), 1)
        bn2 = self.bn2(conv2)
        if self.skip:
            return self.relu(bn2 + x)
        else:
            return self.relu(bn2)

def make_pool_layer(dim):
    return nn.Sequential()

def make_unpool_layer(dim):
    return nn.ConvTranspose2d(dim, dim, kernel_size=4, stride=2, padding=1)

def make_layer(inp_dim, out_dim, modules):
    layers = [fire_module(inp_dim, out_dim)]
    layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)]
    return nn.Sequential(*layers)

def make_layer_revr(inp_dim, out_dim, modules):
    layers = [fire_module(inp_dim, inp_dim) for _ in range(modules - 1)]
    layers += [fire_module(inp_dim, out_dim)]
    return nn.Sequential(*layers)

def make_hg_layer(inp_dim, out_dim, modules):
    layers = [fire_module(inp_dim, out_dim, stride=2)]
    layers += [fire_module(out_dim, out_dim) for _ in range(1, modules)]
    return nn.Sequential(*layers)

class model(hg_net):
    def _pred_mod(self, dim):
        return nn.Sequential(
            convolution(1, 256, 256, with_bn=False),
            nn.Conv2d(256, dim, (1, 1))
        )

    def _merge_mod(self):
        return nn.Sequential(
            nn.Conv2d(256, 256, (1, 1), bias=False),
            nn.BatchNorm2d(256)
        )

    def __init__(self):
        stacks = 2
        pre = nn.Sequential(
            convolution(7, 3, 128, stride=2),
            residual(128, 256, stride=2),
            residual(256, 256, stride=2)
        )
        hg_mods = nn.ModuleList([
            hg_module(
                4, [256, 256, 384, 384, 512], [2, 2, 2, 2, 4],
                make_pool_layer=make_pool_layer,
                make_unpool_layer=make_unpool_layer,
                make_up_layer=make_layer,
                make_low_layer=make_layer,
                make_hg_layer_revr=make_layer_revr,
                make_hg_layer=make_hg_layer
            ) for _ in range(stacks)
        ])
        cnvs = nn.ModuleList([convolution(3, 256, 256) for _ in range(stacks)])
        inters = nn.ModuleList([residual(256, 256) for _ in range(stacks - 1)])
        cnvs_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])
        inters_ = nn.ModuleList([self._merge_mod() for _ in range(stacks - 1)])

        hgs = hg(pre, hg_mods, cnvs, inters, cnvs_, inters_)

        tl_modules = nn.ModuleList([corner_pool(256, TopPool, LeftPool) for _ in range(stacks)])
        br_modules = nn.ModuleList([corner_pool(256, BottomPool, RightPool) for _ in range(stacks)])

        tl_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        br_heats = nn.ModuleList([self._pred_mod(80) for _ in range(stacks)])
        for tl_heat, br_heat in zip(tl_heats, br_heats):
            torch.nn.init.constant_(tl_heat[-1].bias, -2.19)
            torch.nn.init.constant_(br_heat[-1].bias, -2.19)

        tl_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])
        br_tags = nn.ModuleList([self._pred_mod(1) for _ in range(stacks)])

        tl_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])
        br_offs = nn.ModuleList([self._pred_mod(2) for _ in range(stacks)])

        super(model, self).__init__(
            hgs, tl_modules, br_modules, tl_heats, br_heats,
            tl_tags, br_tags, tl_offs, br_offs
        )

        self.loss = CornerNet_Loss(pull_weight=1e-1, push_weight=1e-1)
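The `fire_module` above swaps the plain 3x3 convolutions of a residual block for a SqueezeNet-style 1x1 squeeze followed by parallel 1x1 and depthwise 3x3 expand paths. A rough weight count for a hypothetical 256-to-256 block (sketch only; bias-free convolutions as defined above):

```
inp_dim = out_dim = 256
sr = 2

plain_3x3 = inp_dim * out_dim * 3 * 3                # plain 3x3 conv: 589,824 weights
squeeze = inp_dim * (out_dim // sr)                  # 1x1 squeeze:     32,768
expand_1x1 = (out_dim // sr) * (out_dim // 2)        # 1x1 expand:      16,384
expand_3x3 = (out_dim // 2) * 3 * 3                  # depthwise 3x3 (groups == in channels): 1,152
print(plain_3x3, squeeze + expand_1x1 + expand_3x3)  # 589824 50304 -- roughly an 11.7x reduction
```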
0
object_detection/core/models/__init__.py
Normal file
1
object_detection/core/models/py_utils/__init__.py
Normal file
@@ -0,0 +1 @@
from ._cpools import TopPool, BottomPool, LeftPool, RightPool
3
object_detection/core/models/py_utils/_cpools/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
build/
cpools.egg-info/
dist/
82
object_detection/core/models/py_utils/_cpools/__init__.py
Normal file
@@ -0,0 +1,82 @@
import bottom_pool
import left_pool
import right_pool
import top_pool
from torch import nn
from torch.autograd import Function

class TopPoolFunction(Function):
    @staticmethod
    def forward(ctx, input):
        output = top_pool.forward(input)[0]
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        output = top_pool.backward(input, grad_output)[0]
        return output

class BottomPoolFunction(Function):
    @staticmethod
    def forward(ctx, input):
        output = bottom_pool.forward(input)[0]
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        output = bottom_pool.backward(input, grad_output)[0]
        return output

class LeftPoolFunction(Function):
    @staticmethod
    def forward(ctx, input):
        output = left_pool.forward(input)[0]
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        output = left_pool.backward(input, grad_output)[0]
        return output

class RightPoolFunction(Function):
    @staticmethod
    def forward(ctx, input):
        output = right_pool.forward(input)[0]
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        output = right_pool.backward(input, grad_output)[0]
        return output

class TopPool(nn.Module):
    def forward(self, x):
        return TopPoolFunction.apply(x)

class BottomPool(nn.Module):
    def forward(self, x):
        return BottomPoolFunction.apply(x)

class LeftPool(nn.Module):
    def forward(self, x):
        return LeftPoolFunction.apply(x)

class RightPool(nn.Module):
    def forward(self, x):
        return RightPoolFunction.apply(x)
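Each of these wrappers computes a directional running max over the feature map; TopPool, for instance, replaces every pixel with the max over all pixels at or below it in the same column. A sanity check, assuming the extension has been built per the README; the `cummax`-based reference needs a newer PyTorch (>= 1.5) and is shown only to state the equivalence:

```
import torch
from core.models.py_utils import TopPool   # import path assumed from this commit's layout

x = torch.rand(1, 3, 8, 8)
# TopPool: out[..., i, j] == x[..., i:, j].max()
ref = torch.flip(torch.cummax(torch.flip(x, [2]), dim=2).values, [2])
print(torch.allclose(TopPool()(x), ref))   # True
```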
15
object_detection/core/models/py_utils/_cpools/setup.py
Normal file
@@ -0,0 +1,15 @@
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="cpools",
    ext_modules=[
        CppExtension("top_pool", ["src/top_pool.cpp"]),
        CppExtension("bottom_pool", ["src/bottom_pool.cpp"]),
        CppExtension("left_pool", ["src/left_pool.cpp"]),
        CppExtension("right_pool", ["src/right_pool.cpp"])
    ],
    cmdclass={
        "build_ext": BuildExtension
    }
)
80
object_detection/core/models/py_utils/_cpools/src/bottom_pool.cpp
Normal file
@@ -0,0 +1,80 @@
#include <torch/torch.h>

#include <vector>

std::vector<at::Tensor> pool_forward(
    at::Tensor input
) {
    // Initialize output
    at::Tensor output = at::zeros_like(input);

    // Get height
    int64_t height = input.size(2);

    output.copy_(input);

    for (int64_t ind = 1; ind < height; ind <<= 1) {
        at::Tensor max_temp  = at::slice(output, 2, ind, height);
        at::Tensor cur_temp  = at::slice(output, 2, ind, height);
        at::Tensor next_temp = at::slice(output, 2, 0, height - ind);
        at::max_out(max_temp, cur_temp, next_temp);
    }

    return {
        output
    };
}

std::vector<at::Tensor> pool_backward(
    at::Tensor input,
    at::Tensor grad_output
) {
    auto output = at::zeros_like(input);

    int32_t batch   = input.size(0);
    int32_t channel = input.size(1);
    int32_t height  = input.size(2);
    int32_t width   = input.size(3);

    auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
    auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong));

    auto input_temp = input.select(2, 0);
    max_val.copy_(input_temp);

    max_ind.fill_(0);

    auto output_temp      = output.select(2, 0);
    auto grad_output_temp = grad_output.select(2, 0);
    output_temp.copy_(grad_output_temp);

    auto un_max_ind = max_ind.unsqueeze(2);
    auto gt_mask  = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte));
    auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
    for (int32_t ind = 0; ind < height - 1; ++ind) {
        input_temp = input.select(2, ind + 1);
        at::gt_out(gt_mask, input_temp, max_val);

        at::masked_select_out(max_temp, input_temp, gt_mask);
        max_val.masked_scatter_(gt_mask, max_temp);
        max_ind.masked_fill_(gt_mask, ind + 1);

        grad_output_temp = grad_output.select(2, ind + 1).unsqueeze(2);
        output.scatter_add_(2, un_max_ind, grad_output_temp);
    }

    return {
        output
    };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def(
        "forward", &pool_forward, "Bottom Pool Forward",
        py::call_guard<py::gil_scoped_release>()
    );
    m.def(
        "backward", &pool_backward, "Bottom Pool Backward",
        py::call_guard<py::gil_scoped_release>()
    );
}
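The forward loop above doubles the offset (`ind <<= 1`) on each pass, so the per-column running max needs only O(log height) slice-wise `max_out` calls rather than height sequential ones: after the k-th pass each output row already covers a window of 2^k input rows. The same idea in a few lines of Python (illustrative sketch, one column):

```
vals = [3, 1, 4, 1, 5, 9, 2, 6]
out = vals[:]
ind = 1
while ind < len(out):
    # out[j] currently holds the max of a window of `ind` entries ending at j;
    # merging with the window ending `ind` positions earlier doubles the coverage.
    for j in range(len(out) - 1, ind - 1, -1):
        out[j] = max(out[j], out[j - ind])
    ind <<= 1
print(out)   # [3, 3, 4, 4, 5, 9, 9, 9] -- prefix maxima, as BottomPool computes per column
```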
80
object_detection/core/models/py_utils/_cpools/src/left_pool.cpp
Normal file
@@ -0,0 +1,80 @@
#include <torch/torch.h>

#include <vector>

std::vector<at::Tensor> pool_forward(
    at::Tensor input
) {
    // Initialize output
    at::Tensor output = at::zeros_like(input);

    // Get width
    int64_t width = input.size(3);

    output.copy_(input);

    for (int64_t ind = 1; ind < width; ind <<= 1) {
        at::Tensor max_temp  = at::slice(output, 3, 0, width - ind);
        at::Tensor cur_temp  = at::slice(output, 3, 0, width - ind);
        at::Tensor next_temp = at::slice(output, 3, ind, width);
        at::max_out(max_temp, cur_temp, next_temp);
    }

    return {
        output
    };
}

std::vector<at::Tensor> pool_backward(
    at::Tensor input,
    at::Tensor grad_output
) {
    auto output = at::zeros_like(input);

    int32_t batch   = input.size(0);
    int32_t channel = input.size(1);
    int32_t height  = input.size(2);
    int32_t width   = input.size(3);

    auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
    auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong));

    auto input_temp = input.select(3, width - 1);
    max_val.copy_(input_temp);

    max_ind.fill_(width - 1);

    auto output_temp      = output.select(3, width - 1);
    auto grad_output_temp = grad_output.select(3, width - 1);
    output_temp.copy_(grad_output_temp);

    auto un_max_ind = max_ind.unsqueeze(3);
    auto gt_mask  = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte));
    auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
    for (int32_t ind = 1; ind < width; ++ind) {
        input_temp = input.select(3, width - ind - 1);
        at::gt_out(gt_mask, input_temp, max_val);

        at::masked_select_out(max_temp, input_temp, gt_mask);
        max_val.masked_scatter_(gt_mask, max_temp);
        max_ind.masked_fill_(gt_mask, width - ind - 1);

        grad_output_temp = grad_output.select(3, width - ind - 1).unsqueeze(3);
        output.scatter_add_(3, un_max_ind, grad_output_temp);
    }

    return {
        output
    };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def(
        "forward", &pool_forward, "Left Pool Forward",
        py::call_guard<py::gil_scoped_release>()
    );
    m.def(
        "backward", &pool_backward, "Left Pool Backward",
        py::call_guard<py::gil_scoped_release>()
    );
}
80
object_detection/core/models/py_utils/_cpools/src/right_pool.cpp
Normal file
@@ -0,0 +1,80 @@
#include <torch/torch.h>

#include <vector>

std::vector<at::Tensor> pool_forward(
    at::Tensor input
) {
    // Initialize output
    at::Tensor output = at::zeros_like(input);

    // Get width
    int64_t width = input.size(3);

    output.copy_(input);

    for (int64_t ind = 1; ind < width; ind <<= 1) {
        at::Tensor max_temp  = at::slice(output, 3, ind, width);
        at::Tensor cur_temp  = at::slice(output, 3, ind, width);
        at::Tensor next_temp = at::slice(output, 3, 0, width - ind);
        at::max_out(max_temp, cur_temp, next_temp);
    }

    return {
        output
    };
}

std::vector<at::Tensor> pool_backward(
    at::Tensor input,
    at::Tensor grad_output
) {
    at::Tensor output = at::zeros_like(input);

    int32_t batch   = input.size(0);
    int32_t channel = input.size(1);
    int32_t height  = input.size(2);
    int32_t width   = input.size(3);

    auto max_val = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
    auto max_ind = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kLong));

    auto input_temp = input.select(3, 0);
    max_val.copy_(input_temp);

    max_ind.fill_(0);

    auto output_temp      = output.select(3, 0);
    auto grad_output_temp = grad_output.select(3, 0);
    output_temp.copy_(grad_output_temp);

    auto un_max_ind = max_ind.unsqueeze(3);
    auto gt_mask  = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kByte));
    auto max_temp = torch::zeros({batch, channel, height}, at::device(at::kCUDA).dtype(at::kFloat));
    for (int32_t ind = 0; ind < width - 1; ++ind) {
        input_temp = input.select(3, ind + 1);
        at::gt_out(gt_mask, input_temp, max_val);

        at::masked_select_out(max_temp, input_temp, gt_mask);
        max_val.masked_scatter_(gt_mask, max_temp);
        max_ind.masked_fill_(gt_mask, ind + 1);

        grad_output_temp = grad_output.select(3, ind + 1).unsqueeze(3);
        output.scatter_add_(3, un_max_ind, grad_output_temp);
    }

    return {
        output
    };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def(
        "forward", &pool_forward, "Right Pool Forward",
        py::call_guard<py::gil_scoped_release>()
    );
    m.def(
        "backward", &pool_backward, "Right Pool Backward",
        py::call_guard<py::gil_scoped_release>()
    );
}
80
object_detection/core/models/py_utils/_cpools/src/top_pool.cpp
Normal file
@@ -0,0 +1,80 @@
#include <torch/torch.h>

#include <vector>

std::vector<at::Tensor> top_pool_forward(
    at::Tensor input
) {
    // Initialize output
    at::Tensor output = at::zeros_like(input);

    // Get height
    int64_t height = input.size(2);

    output.copy_(input);

    for (int64_t ind = 1; ind < height; ind <<= 1) {
        at::Tensor max_temp  = at::slice(output, 2, 0, height - ind);
        at::Tensor cur_temp  = at::slice(output, 2, 0, height - ind);
        at::Tensor next_temp = at::slice(output, 2, ind, height);
        at::max_out(max_temp, cur_temp, next_temp);
    }

    return {
        output
    };
}

std::vector<at::Tensor> top_pool_backward(
    at::Tensor input,
    at::Tensor grad_output
) {
    auto output = at::zeros_like(input);

    int32_t batch   = input.size(0);
    int32_t channel = input.size(1);
    int32_t height  = input.size(2);
    int32_t width   = input.size(3);

    auto max_val = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
    auto max_ind = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kLong));

    auto input_temp = input.select(2, height - 1);
    max_val.copy_(input_temp);

    max_ind.fill_(height - 1);

    auto output_temp      = output.select(2, height - 1);
    auto grad_output_temp = grad_output.select(2, height - 1);
    output_temp.copy_(grad_output_temp);

    auto un_max_ind = max_ind.unsqueeze(2);
    auto gt_mask  = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kByte));
    auto max_temp = torch::zeros({batch, channel, width}, at::device(at::kCUDA).dtype(at::kFloat));
    for (int32_t ind = 1; ind < height; ++ind) {
        input_temp = input.select(2, height - ind - 1);
        at::gt_out(gt_mask, input_temp, max_val);

        at::masked_select_out(max_temp, input_temp, gt_mask);
        max_val.masked_scatter_(gt_mask, max_temp);
        max_ind.masked_fill_(gt_mask, height - ind - 1);

        grad_output_temp = grad_output.select(2, height - ind - 1).unsqueeze(2);
        output.scatter_add_(2, un_max_ind, grad_output_temp);
    }

    return {
        output
    };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def(
        "forward", &top_pool_forward, "Top Pool Forward",
        py::call_guard<py::gil_scoped_release>()
    );
    m.def(
        "backward", &top_pool_backward, "Top Pool Backward",
        py::call_guard<py::gil_scoped_release>()
    );
}
117
object_detection/core/models/py_utils/data_parallel.py
Normal file
@@ -0,0 +1,117 @@
import torch
from torch.nn.modules import Module
from torch.nn.parallel.parallel_apply import parallel_apply
from torch.nn.parallel.replicate import replicate
from torch.nn.parallel.scatter_gather import gather

from .scatter_gather import scatter_kwargs

class DataParallel(Module):
    r"""Implements data parallelism at the module level.

    This container parallelizes the application of the given module by
    splitting the input across the specified devices by chunking in the batch
    dimension. In the forward pass, the module is replicated on each device,
    and each replica handles a portion of the input. During the backwards
    pass, gradients from each replica are summed into the original module.

    The batch size should be larger than the number of GPUs used. It should
    also be an integer multiple of the number of GPUs so that each chunk is the
    same size (so that each GPU processes the same number of samples).

    See also: :ref:`cuda-nn-dataparallel-instead`

    Arbitrary positional and keyword inputs are allowed to be passed into
    DataParallel EXCEPT Tensors. All variables will be scattered on the dim
    specified (default 0). Primitive types will be broadcast, but all
    other types will be a shallow copy and can be corrupted if written to in
    the model's forward pass.

    Args:
        module: module to be parallelized
        device_ids: CUDA devices (default: all devices)
        output_device: device location of output (default: device_ids[0])

    Example::

        >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
        >>> output = net(input_var)
    """

    # TODO: update notes/cuda.rst when this class handles 8+ GPUs well

    def __init__(self, module, device_ids=None, output_device=None, dim=0, chunk_sizes=None):
        super(DataParallel, self).__init__()

        if not torch.cuda.is_available():
            self.module = module
            self.device_ids = []
            return

        if device_ids is None:
            device_ids = list(range(torch.cuda.device_count()))
        if output_device is None:
            output_device = device_ids[0]
        self.dim = dim
        self.module = module
        self.device_ids = device_ids
        self.chunk_sizes = chunk_sizes
        self.output_device = output_device
        if len(self.device_ids) == 1:
            self.module.cuda(device_ids[0])

    def forward(self, *inputs, **kwargs):
        if not self.device_ids:
            return self.module(*inputs, **kwargs)
        inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
        if len(self.device_ids) == 1:
            return self.module(*inputs[0], **kwargs[0])
        replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
        outputs = self.parallel_apply(replicas, inputs, kwargs)
        return self.gather(outputs, self.output_device)

    def replicate(self, module, device_ids):
        return replicate(module, device_ids)

    def scatter(self, inputs, kwargs, device_ids, chunk_sizes):
        return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)

    def parallel_apply(self, replicas, inputs, kwargs):
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

    def gather(self, outputs, output_device):
        return gather(outputs, output_device, dim=self.dim)

def data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None):
    r"""Evaluates module(input) in parallel across the GPUs given in device_ids.

    This is the functional version of the DataParallel module.

    Args:
        module: the module to evaluate in parallel
        inputs: inputs to the module
        device_ids: GPU ids on which to replicate module
        output_device: GPU location of the output. Use -1 to indicate the CPU.
            (default: device_ids[0])
    Returns:
        a Variable containing the result of module(input) located on
        output_device
    """
    if not isinstance(inputs, tuple):
        inputs = (inputs,)

    if device_ids is None:
        device_ids = list(range(torch.cuda.device_count()))

    if output_device is None:
        output_device = device_ids[0]

    inputs, module_kwargs = scatter_kwargs(inputs, module_kwargs, device_ids, dim)
    if len(device_ids) == 1:
        return module(*inputs[0], **module_kwargs[0])
    used_device_ids = device_ids[:len(inputs)]
    replicas = replicate(module, used_device_ids)
    outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids)
    return gather(outputs, output_device, dim)
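This vendored copy differs from `torch.nn.DataParallel` only in the extra `chunk_sizes` argument, which allows an uneven batch split across GPUs (for example a smaller slice on GPU 0, which also hosts gathered outputs and optimizer state). A usage sketch with hypothetical sizes (requires the listed GPUs):

```
import torch
import torch.nn as nn
from core.models.py_utils.data_parallel import DataParallel   # path assumed from this commit

net = nn.Linear(8, 2)   # stand-in for the real network
# Batch of 13 split unevenly over 4 GPUs:
para = DataParallel(net, device_ids=[0, 1, 2, 3], chunk_sizes=[1, 4, 4, 4])
out = para(torch.rand(13, 8).cuda())   # per-GPU sub-batches of 1, 4, 4, 4
```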
231
object_detection/core/models/py_utils/losses.py
Normal file
@@ -0,0 +1,231 @@
import torch
import torch.nn as nn

from .utils import _tranpose_and_gather_feat

def _sigmoid(x):
    return torch.clamp(x.sigmoid_(), min=1e-4, max=1 - 1e-4)

def _ae_loss(tag0, tag1, mask):
    num = mask.sum(dim=1, keepdim=True).float()
    tag0 = tag0.squeeze()
    tag1 = tag1.squeeze()

    tag_mean = (tag0 + tag1) / 2

    tag0 = torch.pow(tag0 - tag_mean, 2) / (num + 1e-4)
    tag0 = tag0[mask].sum()
    tag1 = torch.pow(tag1 - tag_mean, 2) / (num + 1e-4)
    tag1 = tag1[mask].sum()
    pull = tag0 + tag1

    mask = mask.unsqueeze(1) + mask.unsqueeze(2)
    mask = mask.eq(2)
    num = num.unsqueeze(2)
    num2 = (num - 1) * num
    dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)
    dist = 1 - torch.abs(dist)
    dist = nn.functional.relu(dist, inplace=True)
    dist = dist - 1 / (num + 1e-4)
    dist = dist / (num2 + 1e-4)
    dist = dist[mask]
    push = dist.sum()
    return pull, push

def _off_loss(off, gt_off, mask):
    num = mask.float().sum()
    mask = mask.unsqueeze(2).expand_as(gt_off)

    off = off[mask]
    gt_off = gt_off[mask]

    off_loss = nn.functional.smooth_l1_loss(off, gt_off, reduction="sum")
    off_loss = off_loss / (num + 1e-4)
    return off_loss

def _focal_loss_mask(preds, gt, mask):
    pos_inds = gt.eq(1)
    neg_inds = gt.lt(1)

    neg_weights = torch.pow(1 - gt[neg_inds], 4)

    pos_mask = mask[pos_inds]
    neg_mask = mask[neg_inds]

    loss = 0
    for pred in preds:
        pos_pred = pred[pos_inds]
        neg_pred = pred[neg_inds]

        pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2) * pos_mask
        neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights * neg_mask

        num_pos = pos_inds.float().sum()
        pos_loss = pos_loss.sum()
        neg_loss = neg_loss.sum()

        if pos_pred.nelement() == 0:
            loss = loss - neg_loss
        else:
            loss = loss - (pos_loss + neg_loss) / num_pos
    return loss

def _focal_loss(preds, gt):
    pos_inds = gt.eq(1)
    neg_inds = gt.lt(1)

    neg_weights = torch.pow(1 - gt[neg_inds], 4)

    loss = 0
    for pred in preds:
        pos_pred = pred[pos_inds]
        neg_pred = pred[neg_inds]

        pos_loss = torch.log(pos_pred) * torch.pow(1 - pos_pred, 2)
        neg_loss = torch.log(1 - neg_pred) * torch.pow(neg_pred, 2) * neg_weights

        num_pos = pos_inds.float().sum()
        pos_loss = pos_loss.sum()
        neg_loss = neg_loss.sum()

        if pos_pred.nelement() == 0:
            loss = loss - neg_loss
        else:
            loss = loss - (pos_loss + neg_loss) / num_pos
    return loss

class CornerNet_Saccade_Loss(nn.Module):
    def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss_mask):
        super(CornerNet_Saccade_Loss, self).__init__()

        self.pull_weight = pull_weight
        self.push_weight = push_weight
        self.off_weight = off_weight
        self.focal_loss = focal_loss
        self.ae_loss = _ae_loss
        self.off_loss = _off_loss

    def forward(self, outs, targets):
        tl_heats = outs[0]
        br_heats = outs[1]
        tl_tags = outs[2]
        br_tags = outs[3]
        tl_offs = outs[4]
        br_offs = outs[5]
        atts = outs[6]

        gt_tl_heat = targets[0]
        gt_br_heat = targets[1]
        gt_mask = targets[2]
        gt_tl_off = targets[3]
        gt_br_off = targets[4]
        gt_tl_ind = targets[5]
        gt_br_ind = targets[6]
        gt_tl_valid = targets[7]
        gt_br_valid = targets[8]
        gt_atts = targets[9]

        # focal loss
        focal_loss = 0

        tl_heats = [_sigmoid(t) for t in tl_heats]
        br_heats = [_sigmoid(b) for b in br_heats]

        focal_loss += self.focal_loss(tl_heats, gt_tl_heat, gt_tl_valid)
        focal_loss += self.focal_loss(br_heats, gt_br_heat, gt_br_valid)

        atts = [[_sigmoid(a) for a in att] for att in atts]
        atts = [[att[ind] for att in atts] for ind in range(len(gt_atts))]

        att_loss = 0
        for att, gt_att in zip(atts, gt_atts):
            att_loss += _focal_loss(att, gt_att) / max(len(att), 1)

        # tag loss
        pull_loss = 0
        push_loss = 0
        tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags]
        br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags]
        for tl_tag, br_tag in zip(tl_tags, br_tags):
            pull, push = self.ae_loss(tl_tag, br_tag, gt_mask)
            pull_loss += pull
            push_loss += push
        pull_loss = self.pull_weight * pull_loss
        push_loss = self.push_weight * push_loss

        off_loss = 0
        tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs]
        br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs]
        for tl_off, br_off in zip(tl_offs, br_offs):
            off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask)
            off_loss += self.off_loss(br_off, gt_br_off, gt_mask)
        off_loss = self.off_weight * off_loss

        loss = (focal_loss + att_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1)
        return loss.unsqueeze(0)

class CornerNet_Loss(nn.Module):
    def __init__(self, pull_weight=1, push_weight=1, off_weight=1, focal_loss=_focal_loss):
        super(CornerNet_Loss, self).__init__()

        self.pull_weight = pull_weight
        self.push_weight = push_weight
        self.off_weight = off_weight
        self.focal_loss = focal_loss
        self.ae_loss = _ae_loss
        self.off_loss = _off_loss

    def forward(self, outs, targets):
        tl_heats = outs[0]
        br_heats = outs[1]
        tl_tags = outs[2]
        br_tags = outs[3]
        tl_offs = outs[4]
        br_offs = outs[5]

        gt_tl_heat = targets[0]
        gt_br_heat = targets[1]
        gt_mask = targets[2]
        gt_tl_off = targets[3]
        gt_br_off = targets[4]
        gt_tl_ind = targets[5]
        gt_br_ind = targets[6]

        # focal loss
        focal_loss = 0

        tl_heats = [_sigmoid(t) for t in tl_heats]
        br_heats = [_sigmoid(b) for b in br_heats]

        focal_loss += self.focal_loss(tl_heats, gt_tl_heat)
        focal_loss += self.focal_loss(br_heats, gt_br_heat)

        # tag loss
        pull_loss = 0
        push_loss = 0
        tl_tags = [_tranpose_and_gather_feat(tl_tag, gt_tl_ind) for tl_tag in tl_tags]
        br_tags = [_tranpose_and_gather_feat(br_tag, gt_br_ind) for br_tag in br_tags]
        for tl_tag, br_tag in zip(tl_tags, br_tags):
            pull, push = self.ae_loss(tl_tag, br_tag, gt_mask)
            pull_loss += pull
            push_loss += push
        pull_loss = self.pull_weight * pull_loss
        push_loss = self.push_weight * push_loss

        off_loss = 0
        tl_offs = [_tranpose_and_gather_feat(tl_off, gt_tl_ind) for tl_off in tl_offs]
        br_offs = [_tranpose_and_gather_feat(br_off, gt_br_ind) for br_off in br_offs]
        for tl_off, br_off in zip(tl_offs, br_offs):
            off_loss += self.off_loss(tl_off, gt_tl_off, gt_mask)
            off_loss += self.off_loss(br_off, gt_br_off, gt_mask)
        off_loss = self.off_weight * off_loss

        loss = (focal_loss + pull_loss + push_loss + off_loss) / max(len(tl_heats), 1)
        return loss.unsqueeze(0)
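In `_ae_loss`, the pull term drags each object's two corner embeddings toward their mean, while the push term penalizes pairs of different objects whose mean embeddings are less than 1 apart. A toy evaluation (made-up values; `uint8` masks to match the PyTorch 1.0 environment this code targets):

```
import torch
from core.models.py_utils.losses import _ae_loss   # path assumed from this commit

# Two images, two objects each, scalar embedding per corner.
tl_tag = torch.tensor([[[0.0], [2.0]], [[0.0], [2.0]]])   # (batch, max_objects, 1)
br_tag = torch.tensor([[[0.2], [1.8]], [[0.2], [1.8]]])
mask = torch.ones(2, 2, dtype=torch.uint8)

pull, push = _ae_loss(tl_tag, br_tag, mask)
print(pull.item(), push.item())   # pull ~= 0.04; push ~= 0 (means 0.1 and 1.9 are > 1 apart)
```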
303
object_detection/core/models/py_utils/modules.py
Normal file
@@ -0,0 +1,303 @@
import torch
import torch.nn as nn

from .utils import residual, upsample, merge, _decode

def _make_layer(inp_dim, out_dim, modules):
    layers = [residual(inp_dim, out_dim)]
    layers += [residual(out_dim, out_dim) for _ in range(1, modules)]
    return nn.Sequential(*layers)

def _make_layer_revr(inp_dim, out_dim, modules):
    layers = [residual(inp_dim, inp_dim) for _ in range(modules - 1)]
    layers += [residual(inp_dim, out_dim)]
    return nn.Sequential(*layers)

def _make_pool_layer(dim):
    return nn.MaxPool2d(kernel_size=2, stride=2)

def _make_unpool_layer(dim):
    return upsample(scale_factor=2)

def _make_merge_layer(dim):
    return merge()

class hg_module(nn.Module):
    def __init__(
        self, n, dims, modules, make_up_layer=_make_layer,
        make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer,
        make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr,
        make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer
    ):
        super(hg_module, self).__init__()

        curr_mod = modules[0]
        next_mod = modules[1]

        curr_dim = dims[0]
        next_dim = dims[1]

        self.n = n
        self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod)
        self.max1 = make_pool_layer(curr_dim)
        self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod)
        self.low2 = hg_module(
            n - 1, dims[1:], modules[1:],
            make_up_layer=make_up_layer,
            make_pool_layer=make_pool_layer,
            make_hg_layer=make_hg_layer,
            make_low_layer=make_low_layer,
            make_hg_layer_revr=make_hg_layer_revr,
            make_unpool_layer=make_unpool_layer,
            make_merge_layer=make_merge_layer
        ) if n > 1 else make_low_layer(next_dim, next_dim, next_mod)
        self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod)
        self.up2 = make_unpool_layer(curr_dim)
        self.merg = make_merge_layer(curr_dim)

    def forward(self, x):
        up1 = self.up1(x)
        max1 = self.max1(x)
        low1 = self.low1(max1)
        low2 = self.low2(low1)
        low3 = self.low3(low2)
        up2 = self.up2(low3)
        merg = self.merg(up1, up2)
        return merg

class hg(nn.Module):
    def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_):
        super(hg, self).__init__()

        self.pre = pre
        self.hgs = hg_modules
        self.cnvs = cnvs

        self.inters = inters
        self.inters_ = inters_
        self.cnvs_ = cnvs_

    def forward(self, x):
        inter = self.pre(x)

        cnvs = []
        for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)):
            hg = hg_(inter)
            cnv = cnv_(hg)
            cnvs.append(cnv)

            if ind < len(self.hgs) - 1:
                inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv)
                inter = nn.functional.relu_(inter)
                inter = self.inters[ind](inter)
        return cnvs

class hg_net(nn.Module):
    def __init__(
        self, hg, tl_modules, br_modules, tl_heats, br_heats,
        tl_tags, br_tags, tl_offs, br_offs
    ):
        super(hg_net, self).__init__()

        self._decode = _decode

        self.hg = hg

        self.tl_modules = tl_modules
        self.br_modules = br_modules

        self.tl_heats = tl_heats
        self.br_heats = br_heats

        self.tl_tags = tl_tags
        self.br_tags = br_tags

        self.tl_offs = tl_offs
        self.br_offs = br_offs

    def _train(self, *xs):
        image = xs[0]
        cnvs = self.hg(image)

        tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)]
        br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)]
        tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)]
        br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)]
        tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)]
        br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)]
        tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)]
        br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)]
        return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs]

    def _test(self, *xs, **kwargs):
        image = xs[0]
        cnvs = self.hg(image)

        tl_mod = self.tl_modules[-1](cnvs[-1])
        br_mod = self.br_modules[-1](cnvs[-1])

        tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod)
        tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod)
        tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod)

        outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off]
        return self._decode(*outs, **kwargs), tl_heat, br_heat, tl_tag, br_tag

    def forward(self, *xs, test=False, **kwargs):
        if not test:
            return self._train(*xs, **kwargs)
        return self._test(*xs, **kwargs)

class saccade_module(nn.Module):
    def __init__(
        self, n, dims, modules, make_up_layer=_make_layer,
        make_pool_layer=_make_pool_layer, make_hg_layer=_make_layer,
        make_low_layer=_make_layer, make_hg_layer_revr=_make_layer_revr,
        make_unpool_layer=_make_unpool_layer, make_merge_layer=_make_merge_layer
    ):
        super(saccade_module, self).__init__()

        curr_mod = modules[0]
        next_mod = modules[1]

        curr_dim = dims[0]
        next_dim = dims[1]

        self.n = n
        self.up1 = make_up_layer(curr_dim, curr_dim, curr_mod)
        self.max1 = make_pool_layer(curr_dim)
        self.low1 = make_hg_layer(curr_dim, next_dim, curr_mod)
        self.low2 = saccade_module(
            n - 1, dims[1:], modules[1:],
            make_up_layer=make_up_layer,
            make_pool_layer=make_pool_layer,
            make_hg_layer=make_hg_layer,
            make_low_layer=make_low_layer,
            make_hg_layer_revr=make_hg_layer_revr,
            make_unpool_layer=make_unpool_layer,
            make_merge_layer=make_merge_layer
        ) if n > 1 else make_low_layer(next_dim, next_dim, next_mod)
        self.low3 = make_hg_layer_revr(next_dim, curr_dim, curr_mod)
        self.up2 = make_unpool_layer(curr_dim)
        self.merg = make_merge_layer(curr_dim)

    def forward(self, x):
        up1 = self.up1(x)
        max1 = self.max1(x)
        low1 = self.low1(max1)
        if self.n > 1:
            low2, mergs = self.low2(low1)
        else:
            low2, mergs = self.low2(low1), []
        low3 = self.low3(low2)
        up2 = self.up2(low3)
        merg = self.merg(up1, up2)
        mergs.append(merg)
        return merg, mergs

class saccade(nn.Module):
    def __init__(self, pre, hg_modules, cnvs, inters, cnvs_, inters_):
        super(saccade, self).__init__()

        self.pre = pre
        self.hgs = hg_modules
        self.cnvs = cnvs

        self.inters = inters
        self.inters_ = inters_
        self.cnvs_ = cnvs_

    def forward(self, x):
        inter = self.pre(x)

        cnvs = []
        atts = []
        for ind, (hg_, cnv_) in enumerate(zip(self.hgs, self.cnvs)):
            hg, ups = hg_(inter)
            cnv = cnv_(hg)
            cnvs.append(cnv)
            atts.append(ups)

            if ind < len(self.hgs) - 1:
                inter = self.inters_[ind](inter) + self.cnvs_[ind](cnv)
                inter = nn.functional.relu_(inter)
                inter = self.inters[ind](inter)
        return cnvs, atts

class saccade_net(nn.Module):
    def __init__(
        self, hg, tl_modules, br_modules, tl_heats, br_heats,
        tl_tags, br_tags, tl_offs, br_offs, att_modules, up_start=0
    ):
        super(saccade_net, self).__init__()

        self._decode = _decode

        self.hg = hg

        self.tl_modules = tl_modules
        self.br_modules = br_modules
        self.tl_heats = tl_heats
        self.br_heats = br_heats
        self.tl_tags = tl_tags
        self.br_tags = br_tags
        self.tl_offs = tl_offs
        self.br_offs = br_offs

        self.att_modules = att_modules
        self.up_start = up_start

    def _train(self, *xs):
        image = xs[0]

        cnvs, ups = self.hg(image)
        ups = [up[self.up_start:] for up in ups]

        tl_modules = [tl_mod_(cnv) for tl_mod_, cnv in zip(self.tl_modules, cnvs)]
        br_modules = [br_mod_(cnv) for br_mod_, cnv in zip(self.br_modules, cnvs)]
        tl_heats = [tl_heat_(tl_mod) for tl_heat_, tl_mod in zip(self.tl_heats, tl_modules)]
        br_heats = [br_heat_(br_mod) for br_heat_, br_mod in zip(self.br_heats, br_modules)]
        tl_tags = [tl_tag_(tl_mod) for tl_tag_, tl_mod in zip(self.tl_tags, tl_modules)]
        br_tags = [br_tag_(br_mod) for br_tag_, br_mod in zip(self.br_tags, br_modules)]
        tl_offs = [tl_off_(tl_mod) for tl_off_, tl_mod in zip(self.tl_offs, tl_modules)]
        br_offs = [br_off_(br_mod) for br_off_, br_mod in zip(self.br_offs, br_modules)]
        atts = [[att_mod_(u) for att_mod_, u in zip(att_mods, up)] for att_mods, up in zip(self.att_modules, ups)]
        return [tl_heats, br_heats, tl_tags, br_tags, tl_offs, br_offs, atts]

    def _test(self, *xs, no_att=False, **kwargs):
        image = xs[0]
        cnvs, ups = self.hg(image)
        ups = [up[self.up_start:] for up in ups]

        if not no_att:
            atts = [att_mod_(up) for att_mod_, up in zip(self.att_modules[-1], ups[-1])]
            atts = [torch.sigmoid(att) for att in atts]

        tl_mod = self.tl_modules[-1](cnvs[-1])
        br_mod = self.br_modules[-1](cnvs[-1])

        tl_heat, br_heat = self.tl_heats[-1](tl_mod), self.br_heats[-1](br_mod)
        tl_tag, br_tag = self.tl_tags[-1](tl_mod), self.br_tags[-1](br_mod)
        tl_off, br_off = self.tl_offs[-1](tl_mod), self.br_offs[-1](br_mod)

        outs = [tl_heat, br_heat, tl_tag, br_tag, tl_off, br_off]
        if not no_att:
            return self._decode(*outs, **kwargs), atts
        else:
            return self._decode(*outs, **kwargs)

    def forward(self, *xs, test=False, **kwargs):
        if not test:
            return self._train(*xs, **kwargs)
        return self._test(*xs, **kwargs)
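`hg_module` recurses `n` levels: `max1`/`low1` halve the resolution and move to the next channel width in `dims`, the bottom level applies `make_low_layer`, and `low3`/`up2` mirror the path back up, merging with the skip branch `up1` at every depth. A quick shape check with toy dimensions and the default layer factories:

```
import torch
from core.models.py_utils.modules import hg_module   # path assumed from this commit

mod = hg_module(2, [16, 32, 64], [1, 1, 1])
x = torch.rand(1, 16, 32, 32)
print(mod(x).shape)   # torch.Size([1, 16, 32, 32]) -- width and resolution are restored
```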
39
object_detection/core/models/py_utils/scatter_gather.py
Normal file
@@ -0,0 +1,39 @@
import torch
from torch.autograd import Variable
from torch.nn.parallel._functions import Scatter

def scatter(inputs, target_gpus, dim=0, chunk_sizes=None):
    r"""
    Slices variables into approximately equal chunks and
    distributes them across given GPUs. Duplicates
    references to objects that are not variables. Does not
    support Tensors.
    """

    def scatter_map(obj):
        if isinstance(obj, Variable):
            return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
        assert not torch.is_tensor(obj), "Tensors not supported in scatter."
        if isinstance(obj, tuple):
            return list(zip(*map(scatter_map, obj)))
        if isinstance(obj, list):
            return list(map(list, zip(*map(scatter_map, obj))))
        if isinstance(obj, dict):
            return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
        return [obj for _ in target_gpus]

    return scatter_map(inputs)

def scatter_kwargs(inputs, kwargs, target_gpus, dim=0, chunk_sizes=None):
    r"""Scatter with support for kwargs dictionary"""
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
    kwargs = scatter(kwargs, target_gpus, dim, chunk_sizes) if kwargs else []
    if len(inputs) < len(kwargs):
        inputs.extend([() for _ in range(len(kwargs) - len(inputs))])
    elif len(kwargs) < len(inputs):
        kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))])
    inputs = tuple(inputs)
    kwargs = tuple(kwargs)
    return inputs, kwargs
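`scatter_kwargs` mirrors the upstream helper but forwards `chunk_sizes` into `Scatter.apply`, so nested tuples/lists/dicts of Variables are split into the requested per-GPU slices. A sketch with hypothetical sizes (needs two visible GPUs):

```
import torch
from core.models.py_utils.scatter_gather import scatter_kwargs   # path assumed from this commit

xs = (torch.rand(10, 3),)
inputs, kwargs = scatter_kwargs(xs, None, [0, 1], chunk_sizes=[4, 6])
print([t[0].shape for t in inputs])   # [torch.Size([4, 3]), torch.Size([6, 3])]
```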
236
object_detection/core/models/py_utils/utils.py
Normal file
@@ -0,0 +1,236 @@
import torch
import torch.nn as nn

def _gather_feat(feat, ind, mask=None):
    dim = feat.size(2)
    ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
    feat = feat.gather(1, ind)
    if mask is not None:
        mask = mask.unsqueeze(2).expand_as(feat)
        feat = feat[mask]
        feat = feat.view(-1, dim)
    return feat

def _nms(heat, kernel=1):
    pad = (kernel - 1) // 2

    hmax = nn.functional.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad)
    keep = (hmax == heat).float()
    return heat * keep

def _tranpose_and_gather_feat(feat, ind):
    feat = feat.permute(0, 2, 3, 1).contiguous()
    feat = feat.view(feat.size(0), -1, feat.size(3))
    feat = _gather_feat(feat, ind)
    return feat

def _topk(scores, K=20):
    batch, cat, height, width = scores.size()

    topk_scores, topk_inds = torch.topk(scores.view(batch, -1), K)

    topk_clses = (topk_inds / (height * width)).int()

    topk_inds = topk_inds % (height * width)
    topk_ys = (topk_inds / width).int().float()
    topk_xs = (topk_inds % width).int().float()
    return topk_scores, topk_inds, topk_clses, topk_ys, topk_xs

def _decode(
    tl_heat, br_heat, tl_tag, br_tag, tl_regr, br_regr,
    K=100, kernel=1, ae_threshold=1, num_dets=1000, no_border=False
):
    batch, cat, height, width = tl_heat.size()

    tl_heat = torch.sigmoid(tl_heat)
    br_heat = torch.sigmoid(br_heat)

    # perform nms on heatmaps
    tl_heat = _nms(tl_heat, kernel=kernel)
    br_heat = _nms(br_heat, kernel=kernel)

    tl_scores, tl_inds, tl_clses, tl_ys, tl_xs = _topk(tl_heat, K=K)
    br_scores, br_inds, br_clses, br_ys, br_xs = _topk(br_heat, K=K)

    tl_ys = tl_ys.view(batch, K, 1).expand(batch, K, K)
    tl_xs = tl_xs.view(batch, K, 1).expand(batch, K, K)
    br_ys = br_ys.view(batch, 1, K).expand(batch, K, K)
    br_xs = br_xs.view(batch, 1, K).expand(batch, K, K)

    if no_border:
        tl_ys_binds = (tl_ys == 0)
        tl_xs_binds = (tl_xs == 0)
        br_ys_binds = (br_ys == height - 1)
        br_xs_binds = (br_xs == width - 1)

    if tl_regr is not None and br_regr is not None:
        tl_regr = _tranpose_and_gather_feat(tl_regr, tl_inds)
        tl_regr = tl_regr.view(batch, K, 1, 2)
        br_regr = _tranpose_and_gather_feat(br_regr, br_inds)
        br_regr = br_regr.view(batch, 1, K, 2)

        tl_xs = tl_xs + tl_regr[..., 0]
        tl_ys = tl_ys + tl_regr[..., 1]
        br_xs = br_xs + br_regr[..., 0]
        br_ys = br_ys + br_regr[..., 1]

    # all possible boxes based on top k corners (ignoring class)
    bboxes = torch.stack((tl_xs, tl_ys, br_xs, br_ys), dim=3)

    tl_tag = _tranpose_and_gather_feat(tl_tag, tl_inds)
    tl_tag = tl_tag.view(batch, K, 1)
    br_tag = _tranpose_and_gather_feat(br_tag, br_inds)
    br_tag = br_tag.view(batch, 1, K)
    dists = torch.abs(tl_tag - br_tag)

    tl_scores = tl_scores.view(batch, K, 1).expand(batch, K, K)
    br_scores = br_scores.view(batch, 1, K).expand(batch, K, K)
    scores = (tl_scores + br_scores) / 2

    # reject boxes based on classes
    tl_clses = tl_clses.view(batch, K, 1).expand(batch, K, K)
    br_clses = br_clses.view(batch, 1, K).expand(batch, K, K)
    cls_inds = (tl_clses != br_clses)

    # reject boxes based on distances
    dist_inds = (dists > ae_threshold)

    # reject boxes based on widths and heights
    width_inds = (br_xs < tl_xs)
    height_inds = (br_ys < tl_ys)

    if no_border:
        scores[tl_ys_binds] = -1
        scores[tl_xs_binds] = -1
        scores[br_ys_binds] = -1
        scores[br_xs_binds] = -1

    scores[cls_inds] = -1
    scores[dist_inds] = -1
    scores[width_inds] = -1
    scores[height_inds] = -1

    scores = scores.view(batch, -1)
    scores, inds = torch.topk(scores, num_dets)
    scores = scores.unsqueeze(2)

    bboxes = bboxes.view(batch, -1, 4)
    bboxes = _gather_feat(bboxes, inds)

    clses = tl_clses.contiguous().view(batch, -1, 1)
    clses = _gather_feat(clses, inds).float()

    tl_scores = tl_scores.contiguous().view(batch, -1, 1)
    tl_scores = _gather_feat(tl_scores, inds).float()
    br_scores = br_scores.contiguous().view(batch, -1, 1)
    br_scores = _gather_feat(br_scores, inds).float()

    detections = torch.cat([bboxes, scores, tl_scores, br_scores, clses], dim=2)
    return detections

class upsample(nn.Module):
    def __init__(self, scale_factor):
        super(upsample, self).__init__()
        self.scale_factor = scale_factor

    def forward(self, x):
        return nn.functional.interpolate(x, scale_factor=self.scale_factor)

class merge(nn.Module):
    def forward(self, x, y):
        return x + y

class convolution(nn.Module):
    def __init__(self, k, inp_dim, out_dim, stride=1, with_bn=True):
        super(convolution, self).__init__()

        pad = (k - 1) // 2
        self.conv = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(pad, pad), stride=(stride, stride), bias=not with_bn)
        self.bn = nn.BatchNorm2d(out_dim) if with_bn else nn.Sequential()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        conv = self.conv(x)
        bn = self.bn(conv)
        relu = self.relu(bn)
        return relu

class residual(nn.Module):
    def __init__(self, inp_dim, out_dim, k=3, stride=1):
        super(residual, self).__init__()
        p = (k - 1) // 2

        self.conv1 = nn.Conv2d(inp_dim, out_dim, (k, k), padding=(p, p), stride=(stride, stride), bias=False)
        self.bn1 = nn.BatchNorm2d(out_dim)
        self.relu1 = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(out_dim, out_dim, (k, k), padding=(p, p), bias=False)
        self.bn2 = nn.BatchNorm2d(out_dim)

        self.skip = nn.Sequential(
            nn.Conv2d(inp_dim, out_dim, (1, 1), stride=(stride, stride), bias=False),
            nn.BatchNorm2d(out_dim)
        ) if stride != 1 or inp_dim != out_dim else nn.Sequential()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        conv1 = self.conv1(x)
        bn1 = self.bn1(conv1)
        relu1 = self.relu1(bn1)

        conv2 = self.conv2(relu1)
        bn2 = self.bn2(conv2)

        skip = self.skip(x)
        return self.relu(bn2 + skip)

class corner_pool(nn.Module):
    def __init__(self, dim, pool1, pool2):
        super(corner_pool, self).__init__()
        self._init_layers(dim, pool1, pool2)

    def _init_layers(self, dim, pool1, pool2):
        self.p1_conv1 = convolution(3, dim, 128)
        self.p2_conv1 = convolution(3, dim, 128)

        self.p_conv1 = nn.Conv2d(128, dim, (3, 3), padding=(1, 1), bias=False)
        self.p_bn1 = nn.BatchNorm2d(dim)

        self.conv1 = nn.Conv2d(dim, dim, (1, 1), bias=False)
        self.bn1 = nn.BatchNorm2d(dim)
        self.relu1 = nn.ReLU(inplace=True)

        self.conv2 = convolution(3, dim, dim)

        self.pool1 = pool1()
        self.pool2 = pool2()

    def forward(self, x):
        # pool 1
        p1_conv1 = self.p1_conv1(x)
        pool1 = self.pool1(p1_conv1)

        # pool 2
        p2_conv1 = self.p2_conv1(x)
        pool2 = self.pool2(p2_conv1)

        # pool 1 + pool 2
        p_conv1 = self.p_conv1(pool1 + pool2)
        p_bn1 = self.p_bn1(p_conv1)

        conv1 = self.conv1(x)
        bn1 = self.bn1(conv1)
        relu1 = self.relu1(p_bn1 + bn1)

        conv2 = self.conv2(relu1)
        return conv2
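`_topk` flattens the (categories, height, width) heatmap, takes a global top-K, and unpacks class/row/column from each flat index; `_decode` then pairs every top-left corner with every bottom-right corner into K x K candidate boxes and suppresses pairs by class mismatch, embedding distance, and geometry. The index unpacking in plain numbers (hypothetical 80-class 64x64 heatmap):

```
categories, height, width = 80, 64, 64
flat_ind = 5 * height * width + 17 * width + 42   # class 5, y = 17, x = 42

cls = flat_ind // (height * width)
y = (flat_ind % (height * width)) // width
x = flat_ind % width
print(cls, y, x)   # 5 17 42
```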
0
object_detection/core/nnet/__init__.py
Normal file
137
object_detection/core/nnet/py_factory.py
Normal file
@@ -0,0 +1,137 @@
import torch
import torch.nn as nn

from ..models.py_utils.data_parallel import DataParallel

torch.manual_seed(317)

class Network(nn.Module):
    def __init__(self, model, loss):
        super(Network, self).__init__()

        self.model = model
        self.loss = loss

    def forward(self, xs, ys, **kwargs):
        preds = self.model(*xs, **kwargs)
        loss = self.loss(preds, ys, **kwargs)
        return loss

# for backward compatibility with old checkpoints:
# previously the model was wrapped by a DataParallel module
class DummyModule(nn.Module):
    def __init__(self, model):
        super(DummyModule, self).__init__()
        self.module = model

    def forward(self, *xs, **kwargs):
        return self.module(*xs, **kwargs)

class NetworkFactory(object):
    def __init__(self, system_config, model, distributed=False, gpu=None):
        super(NetworkFactory, self).__init__()

        self.system_config = system_config

        self.gpu = gpu
        self.model = DummyModule(model)
        self.loss = model.loss
        self.network = Network(self.model, self.loss)

        if distributed:
            from apex.parallel import DistributedDataParallel, convert_syncbn_model
            torch.cuda.set_device(gpu)
            self.network = self.network.cuda(gpu)
            self.network = convert_syncbn_model(self.network)
            self.network = DistributedDataParallel(self.network)
        else:
            self.network = DataParallel(self.network, chunk_sizes=system_config.chunk_sizes)

        total_params = 0
        for params in self.model.parameters():
            num_params = 1
            for x in params.size():
                num_params *= x
            total_params += num_params
        print("total parameters: {}".format(total_params))

        if system_config.opt_algo == "adam":
            self.optimizer = torch.optim.Adam(
                filter(lambda p: p.requires_grad, self.model.parameters())
            )
        elif system_config.opt_algo == "sgd":
            self.optimizer = torch.optim.SGD(
                filter(lambda p: p.requires_grad, self.model.parameters()),
                lr=system_config.learning_rate,
                momentum=0.9, weight_decay=0.0001
            )
        else:
            raise ValueError("unknown optimizer")

    def cuda(self):
        self.model.cuda()

    def train_mode(self):
        self.network.train()

    def eval_mode(self):
        self.network.eval()

    def _t_cuda(self, xs):
        if type(xs) is list:
            return [x.cuda(self.gpu, non_blocking=True) for x in xs]
        return xs.cuda(self.gpu, non_blocking=True)

    def train(self, xs, ys, **kwargs):
        xs = [self._t_cuda(x) for x in xs]
        ys = [self._t_cuda(y) for y in ys]

        self.optimizer.zero_grad()
        loss = self.network(xs, ys)
        loss = loss.mean()
        loss.backward()
        self.optimizer.step()

        return loss

    def validate(self, xs, ys, **kwargs):
        with torch.no_grad():
            xs = [self._t_cuda(x) for x in xs]
            ys = [self._t_cuda(y) for y in ys]

            loss = self.network(xs, ys)
            loss = loss.mean()
            return loss

    def test(self, xs, **kwargs):
        with torch.no_grad():
            xs = [self._t_cuda(x) for x in xs]
            return self.model(*xs, **kwargs)

    def set_lr(self, lr):
        print("setting learning rate to: {}".format(lr))
        for param_group in self.optimizer.param_groups:
            param_group["lr"] = lr

    def load_pretrained_params(self, pretrained_model):
        print("loading from {}".format(pretrained_model))
        with open(pretrained_model, "rb") as f:
            params = torch.load(f)
            self.model.load_state_dict(params)

    def load_params(self, iteration):
        cache_file = self.system_config.snapshot_file.format(iteration)
        print("loading model from {}".format(cache_file))
        with open(cache_file, "rb") as f:
            params = torch.load(f)
            self.model.load_state_dict(params)

    def save_params(self, iteration):
        cache_file = self.system_config.snapshot_file.format(iteration)
        print("saving model to {}".format(cache_file))
        with open(cache_file, "wb") as f:
            params = self.model.state_dict()
            torch.save(params, f)
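`NetworkFactory` keeps the model under a `module.` prefix (for checkpoint compatibility with the old DataParallel wrapping), instantiates the optimizer named in the config, and exposes one-call train/validate/test steps. A hypothetical driver loop (the config attributes and batch source are placeholders supplied by the surrounding training script, not part of this file):

```
# Sketch only: `system_config`, `model`, and `batches` are assumed here.
nnet = NetworkFactory(system_config, model)
nnet.cuda()
nnet.train_mode()
nnet.set_lr(system_config.learning_rate)

for iteration, (xs, ys) in enumerate(batches, 1):
    loss = nnet.train(xs, ys)          # zero_grad -> forward -> backward -> step
    if iteration % 5000 == 0:          # snapshot interval is illustrative
        nnet.save_params(iteration)
```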
8
object_detection/core/paths.py
Normal file
@@ -0,0 +1,8 @@
import pkg_resources

_package_name = __name__

def get_file_path(*paths):
    path = "/".join(paths)
    return pkg_resources.resource_filename(_package_name, path)
5
object_detection/core/sample/__init__.py
Normal file
@@ -0,0 +1,5 @@
from .cornernet import cornernet
from .cornernet_saccade import cornernet_saccade

def data_sampling_func(sys_configs, db, k_ind, data_aug=True, debug=False):
    return globals()[sys_configs.sampling_function](sys_configs, db, k_ind, data_aug, debug)
164
object_detection/core/sample/cornernet.py
Normal file
@@ -0,0 +1,164 @@
import math

import cv2
import numpy as np
import torch

from .utils import random_crop, draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_


def _resize_image(image, detections, size):
    detections = detections.copy()
    height, width = image.shape[0:2]
    new_height, new_width = size

    image = cv2.resize(image, (new_width, new_height))

    height_ratio = new_height / height
    width_ratio = new_width / width
    detections[:, 0:4:2] *= width_ratio
    detections[:, 1:4:2] *= height_ratio
    return image, detections


def _clip_detections(image, detections):
    detections = detections.copy()
    height, width = image.shape[0:2]

    detections[:, 0:4:2] = np.clip(detections[:, 0:4:2], 0, width - 1)
    detections[:, 1:4:2] = np.clip(detections[:, 1:4:2], 0, height - 1)
    keep_inds = ((detections[:, 2] - detections[:, 0]) > 0) & \
                ((detections[:, 3] - detections[:, 1]) > 0)
    detections = detections[keep_inds]
    return detections
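
# Detections are stored as rows of (x1, y1, x2, y2, category), so the strided
# slices above address both corners at once: `0:4:2` selects the x pair
# (columns 0 and 2) and `1:4:2` the y pair (columns 1 and 3). For example,
# `detections[:, 0:4:2] *= width_ratio` rescales x1 and x2 in one step.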


def cornernet(system_configs, db, k_ind, data_aug, debug):
    data_rng = system_configs.data_rng
    batch_size = system_configs.batch_size

    categories = db.configs["categories"]
    input_size = db.configs["input_size"]
    output_size = db.configs["output_sizes"][0]

    border = db.configs["border"]
    lighting = db.configs["lighting"]
    rand_crop = db.configs["rand_crop"]
    rand_color = db.configs["rand_color"]
    rand_scales = db.configs["rand_scales"]
    gaussian_bump = db.configs["gaussian_bump"]
    gaussian_iou = db.configs["gaussian_iou"]
    gaussian_rad = db.configs["gaussian_radius"]

    max_tag_len = 128

    # allocating memory
    images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32)
    tl_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    br_heatmaps = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    tl_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
    br_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
    tl_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64)
    br_tags = np.zeros((batch_size, max_tag_len), dtype=np.int64)
    tag_masks = np.zeros((batch_size, max_tag_len), dtype=np.uint8)
    tag_lens = np.zeros((batch_size,), dtype=np.int32)

    db_size = db.db_inds.size
    for b_ind in range(batch_size):
        if not debug and k_ind == 0:
            db.shuffle_inds()

        db_ind = db.db_inds[k_ind]
        k_ind = (k_ind + 1) % db_size

        # reading image
        image_path = db.image_path(db_ind)
        image = cv2.imread(image_path)

        # reading detections
        detections = db.detections(db_ind)

        # cropping an image randomly
        if not debug and rand_crop:
            image, detections = random_crop(image, detections, rand_scales, input_size, border=border)

        image, detections = _resize_image(image, detections, input_size)
        detections = _clip_detections(image, detections)

        width_ratio = output_size[1] / input_size[1]
        height_ratio = output_size[0] / input_size[0]

        # flipping an image randomly
        if not debug and np.random.uniform() > 0.5:
            image[:] = image[:, ::-1, :]
            width = image.shape[1]
            detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1

        if not debug:
            image = image.astype(np.float32) / 255.
            if rand_color:
                color_jittering_(data_rng, image)
            if lighting:
                lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec)
            normalize_(image, db.mean, db.std)
        images[b_ind] = image.transpose((2, 0, 1))

        for ind, detection in enumerate(detections):
            category = int(detection[-1]) - 1

            xtl, ytl = detection[0], detection[1]
            xbr, ybr = detection[2], detection[3]

            fxtl = (xtl * width_ratio)
            fytl = (ytl * height_ratio)
            fxbr = (xbr * width_ratio)
            fybr = (ybr * height_ratio)

            xtl = int(fxtl)
            ytl = int(fytl)
            xbr = int(fxbr)
            ybr = int(fybr)

            if gaussian_bump:
                width = detection[2] - detection[0]
                height = detection[3] - detection[1]

                width = math.ceil(width * width_ratio)
                height = math.ceil(height * height_ratio)

                if gaussian_rad == -1:
                    radius = gaussian_radius((height, width), gaussian_iou)
                    radius = max(0, int(radius))
                else:
                    radius = gaussian_rad

                draw_gaussian(tl_heatmaps[b_ind, category], [xtl, ytl], radius)
                draw_gaussian(br_heatmaps[b_ind, category], [xbr, ybr], radius)
            else:
                tl_heatmaps[b_ind, category, ytl, xtl] = 1
                br_heatmaps[b_ind, category, ybr, xbr] = 1

            tag_ind = tag_lens[b_ind]
            tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
            br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr]
            tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl
            br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr
            tag_lens[b_ind] += 1

    for b_ind in range(batch_size):
        tag_len = tag_lens[b_ind]
        tag_masks[b_ind, :tag_len] = 1

    images = torch.from_numpy(images)
    tl_heatmaps = torch.from_numpy(tl_heatmaps)
    br_heatmaps = torch.from_numpy(br_heatmaps)
    tl_regrs = torch.from_numpy(tl_regrs)
    br_regrs = torch.from_numpy(br_regrs)
    tl_tags = torch.from_numpy(tl_tags)
    br_tags = torch.from_numpy(br_tags)
    tag_masks = torch.from_numpy(tag_masks)

    return {
        "xs": [images],
        "ys": [tl_heatmaps, br_heatmaps, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags]
    }, k_ind
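
# Corner locations are flattened into 1D tag indices on the output grid
# (`ytl * output_width + xtl`), which the associative-embedding loss uses to
# gather per-corner embeddings; `tag_masks` marks the valid slots of the
# fixed-length (max_tag_len = 128) per-image buffers.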
293
object_detection/core/sample/cornernet_saccade.py
Normal file
@@ -0,0 +1,293 @@
import math

import cv2
import numpy as np
import torch

from .utils import draw_gaussian, gaussian_radius, normalize_, color_jittering_, lighting_, crop_image


def bbox_overlaps(a_dets, b_dets):
    a_widths = a_dets[:, 2] - a_dets[:, 0]
    a_heights = a_dets[:, 3] - a_dets[:, 1]
    a_areas = a_widths * a_heights

    b_widths = b_dets[:, 2] - b_dets[:, 0]
    b_heights = b_dets[:, 3] - b_dets[:, 1]
    b_areas = b_widths * b_heights

    return a_areas / b_areas
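
# Note: despite the name, this returns an area ratio rather than an IoU. It is
# used below as bbox_overlaps(cropped, original) to measure how much of each
# original box survives cropping and clipping; boxes with less than half of
# their area remaining are treated as invalid rather than positive targets.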


def clip_detections(border, detections):
    detections = detections.copy()

    y0, y1, x0, x1 = border
    det_xs = detections[:, 0:4:2]
    det_ys = detections[:, 1:4:2]
    np.clip(det_xs, x0, x1 - 1, out=det_xs)
    np.clip(det_ys, y0, y1 - 1, out=det_ys)

    keep_inds = ((det_xs[:, 1] - det_xs[:, 0]) > 0) & \
                ((det_ys[:, 1] - det_ys[:, 0]) > 0)
    keep_inds = np.where(keep_inds)[0]
    return detections[keep_inds], keep_inds


def crop_image_dets(image, dets, ind, input_size, output_size=None, random_crop=True, rand_center=True):
    if ind is not None:
        det_x0, det_y0, det_x1, det_y1 = dets[ind, 0:4]
    else:
        det_x0, det_y0, det_x1, det_y1 = None, None, None, None

    input_height, input_width = input_size
    image_height, image_width = image.shape[0:2]

    centered = rand_center and np.random.uniform() > 0.5
    if not random_crop or image_width <= input_width:
        xc = image_width // 2
    elif ind is None or not centered:
        xmin = max(det_x1 - input_width, 0) if ind is not None else 0
        xmax = min(image_width - input_width, det_x0) if ind is not None else image_width - input_width
        xrand = np.random.randint(int(xmin), int(xmax) + 1)
        xc = xrand + input_width // 2
    else:
        xmin = max((det_x0 + det_x1) // 2 - np.random.randint(0, 15), 0)
        xmax = min((det_x0 + det_x1) // 2 + np.random.randint(0, 15), image_width - 1)
        xc = np.random.randint(int(xmin), int(xmax) + 1)

    if not random_crop or image_height <= input_height:
        yc = image_height // 2
    elif ind is None or not centered:
        ymin = max(det_y1 - input_height, 0) if ind is not None else 0
        ymax = min(image_height - input_height, det_y0) if ind is not None else image_height - input_height
        yrand = np.random.randint(int(ymin), int(ymax) + 1)
        yc = yrand + input_height // 2
    else:
        ymin = max((det_y0 + det_y1) // 2 - np.random.randint(0, 15), 0)
        ymax = min((det_y0 + det_y1) // 2 + np.random.randint(0, 15), image_height - 1)
        yc = np.random.randint(int(ymin), int(ymax) + 1)

    image, border, offset = crop_image(image, [yc, xc], input_size, output_size=output_size)
    dets[:, 0:4:2] -= offset[1]
    dets[:, 1:4:2] -= offset[0]
    return image, dets, border


def scale_image_detections(image, dets, scale):
    height, width = image.shape[0:2]

    new_height = int(height * scale)
    new_width = int(width * scale)

    image = cv2.resize(image, (new_width, new_height))
    dets = dets.copy()
    dets[:, 0:4] *= scale
    return image, dets


def ref_scale(detections, random_crop=False):
    if detections.shape[0] == 0:
        return None, None

    if random_crop and np.random.uniform() > 0.7:
        return None, None

    ref_ind = np.random.randint(detections.shape[0])
    ref_det = detections[ref_ind].copy()
    ref_h = ref_det[3] - ref_det[1]
    ref_w = ref_det[2] - ref_det[0]
    ref_hw = max(ref_h, ref_w)

    if ref_hw > 96:
        return np.random.randint(low=96, high=255) / ref_hw, ref_ind
    elif ref_hw > 32:
        return np.random.randint(low=32, high=97) / ref_hw, ref_ind
    return np.random.randint(low=16, high=33) / ref_hw, ref_ind
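
# The three buckets mirror the CornerNet-Saccade object-size ranges: a
# reference object is rescaled so its longer side lands roughly in [96, 255)
# (large), [32, 97) (medium), or [16, 33) (small) pixels on the network input.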


def create_attention_mask(atts, ratios, sizes, detections):
    for det in detections:
        width = det[2] - det[0]
        height = det[3] - det[1]

        max_hw = max(width, height)
        for att, ratio, size in zip(atts, ratios, sizes):
            if max_hw >= size[0] and max_hw <= size[1]:
                x = (det[0] + det[2]) / 2
                y = (det[1] + det[3]) / 2
                x = (x / ratio).astype(np.int32)
                y = (y / ratio).astype(np.int32)
                att[y, x] = 1


def cornernet_saccade(system_configs, db, k_ind, data_aug, debug):
    data_rng = system_configs.data_rng
    batch_size = system_configs.batch_size

    categories = db.configs["categories"]
    input_size = db.configs["input_size"]
    output_size = db.configs["output_sizes"][0]
    rand_scales = db.configs["rand_scales"]
    rand_crop = db.configs["rand_crop"]
    rand_center = db.configs["rand_center"]
    view_sizes = db.configs["view_sizes"]

    gaussian_iou = db.configs["gaussian_iou"]
    gaussian_rad = db.configs["gaussian_radius"]

    att_ratios = db.configs["att_ratios"]
    att_ranges = db.configs["att_ranges"]
    att_sizes = db.configs["att_sizes"]

    min_scale = db.configs["min_scale"]
    max_scale = db.configs["max_scale"]
    max_objects = 128

    images = np.zeros((batch_size, 3, input_size[0], input_size[1]), dtype=np.float32)
    tl_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    br_heats = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    tl_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    br_valids = np.zeros((batch_size, categories, output_size[0], output_size[1]), dtype=np.float32)
    tl_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32)
    br_regrs = np.zeros((batch_size, max_objects, 2), dtype=np.float32)
    tl_tags = np.zeros((batch_size, max_objects), dtype=np.int64)
    br_tags = np.zeros((batch_size, max_objects), dtype=np.int64)
    tag_masks = np.zeros((batch_size, max_objects), dtype=np.uint8)
    tag_lens = np.zeros((batch_size,), dtype=np.int32)
    attentions = [np.zeros((batch_size, 1, att_size[0], att_size[1]), dtype=np.float32) for att_size in att_sizes]

    db_size = db.db_inds.size
    for b_ind in range(batch_size):
        if not debug and k_ind == 0:
            db.shuffle_inds()

        db_ind = db.db_inds[k_ind]
        k_ind = (k_ind + 1) % db_size

        image_path = db.image_path(db_ind)
        image = cv2.imread(image_path)

        orig_detections = db.detections(db_ind)
        keep_inds = np.arange(orig_detections.shape[0])

        # clip the detections
        detections = orig_detections.copy()
        border = [0, image.shape[0], 0, image.shape[1]]
        detections, clip_inds = clip_detections(border, detections)
        keep_inds = keep_inds[clip_inds]

        scale, ref_ind = ref_scale(detections, random_crop=rand_crop)
        scale = np.random.choice(rand_scales) if scale is None else scale

        orig_detections[:, 0:4:2] *= scale
        orig_detections[:, 1:4:2] *= scale

        image, detections = scale_image_detections(image, detections, scale)
        ref_detection = detections[ref_ind].copy()

        image, detections, border = crop_image_dets(image, detections, ref_ind, input_size, rand_center=rand_center)

        detections, clip_inds = clip_detections(border, detections)
        keep_inds = keep_inds[clip_inds]

        width_ratio = output_size[1] / input_size[1]
        height_ratio = output_size[0] / input_size[0]

        # flipping an image randomly
        if not debug and np.random.uniform() > 0.5:
            image[:] = image[:, ::-1, :]
            width = image.shape[1]
            detections[:, [0, 2]] = width - detections[:, [2, 0]] - 1
        create_attention_mask([att[b_ind, 0] for att in attentions], att_ratios, att_ranges, detections)

        if debug:
            dimage = image.copy()
            for det in detections.astype(np.int32):
                cv2.rectangle(dimage,
                              (det[0], det[1]),
                              (det[2], det[3]),
                              (0, 255, 0), 2
                              )
            cv2.imwrite('debug/{:03d}.jpg'.format(b_ind), dimage)
        overlaps = bbox_overlaps(detections, orig_detections[keep_inds]) > 0.5

        if not debug:
            image = image.astype(np.float32) / 255.
            color_jittering_(data_rng, image)
            lighting_(data_rng, image, 0.1, db.eig_val, db.eig_vec)
            normalize_(image, db.mean, db.std)
        images[b_ind] = image.transpose((2, 0, 1))

        for ind, (detection, overlap) in enumerate(zip(detections, overlaps)):
            category = int(detection[-1]) - 1

            xtl, ytl = detection[0], detection[1]
            xbr, ybr = detection[2], detection[3]

            det_height = int(ybr) - int(ytl)
            det_width = int(xbr) - int(xtl)
            det_max = max(det_height, det_width)

            valid = det_max >= min_scale

            fxtl = (xtl * width_ratio)
            fytl = (ytl * height_ratio)
            fxbr = (xbr * width_ratio)
            fybr = (ybr * height_ratio)

            xtl = int(fxtl)
            ytl = int(fytl)
            xbr = int(fxbr)
            ybr = int(fybr)

            width = detection[2] - detection[0]
            height = detection[3] - detection[1]

            width = math.ceil(width * width_ratio)
            height = math.ceil(height * height_ratio)

            if gaussian_rad == -1:
                radius = gaussian_radius((height, width), gaussian_iou)
                radius = max(0, int(radius))
            else:
                radius = gaussian_rad

            if overlap and valid:
                draw_gaussian(tl_heats[b_ind, category], [xtl, ytl], radius)
                draw_gaussian(br_heats[b_ind, category], [xbr, ybr], radius)

                tag_ind = tag_lens[b_ind]
                tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
                br_regrs[b_ind, tag_ind, :] = [fxbr - xbr, fybr - ybr]
                tl_tags[b_ind, tag_ind] = ytl * output_size[1] + xtl
                br_tags[b_ind, tag_ind] = ybr * output_size[1] + xbr
                tag_lens[b_ind] += 1
            else:
                draw_gaussian(tl_valids[b_ind, category], [xtl, ytl], radius)
                draw_gaussian(br_valids[b_ind, category], [xbr, ybr], radius)

    tl_valids = (tl_valids == 0).astype(np.float32)
    br_valids = (br_valids == 0).astype(np.float32)

    for b_ind in range(batch_size):
        tag_len = tag_lens[b_ind]
        tag_masks[b_ind, :tag_len] = 1

    images = torch.from_numpy(images)
    tl_heats = torch.from_numpy(tl_heats)
    br_heats = torch.from_numpy(br_heats)
    tl_regrs = torch.from_numpy(tl_regrs)
    br_regrs = torch.from_numpy(br_regrs)
    tl_tags = torch.from_numpy(tl_tags)
    br_tags = torch.from_numpy(br_tags)
    tag_masks = torch.from_numpy(tag_masks)
    tl_valids = torch.from_numpy(tl_valids)
    br_valids = torch.from_numpy(br_valids)
    attentions = [torch.from_numpy(att) for att in attentions]

    return {
        "xs": [images],
        "ys": [tl_heats, br_heats, tag_masks, tl_regrs, br_regrs, tl_tags, br_tags, tl_valids, br_valids, attentions]
    }, k_ind
178
object_detection/core/sample/utils.py
Normal file
@@ -0,0 +1,178 @@
import random

import cv2
import numpy as np


def grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


def normalize_(image, mean, std):
    image -= mean
    image /= std


def lighting_(data_rng, image, alphastd, eigval, eigvec):
    alpha = data_rng.normal(scale=alphastd, size=(3,))
    image += np.dot(eigvec, eigval * alpha)


def blend_(alpha, image1, image2):
    image1 *= alpha
    image2 *= (1 - alpha)
    image1 += image2


def saturation_(data_rng, image, gs, gs_mean, var):
    alpha = 1. + data_rng.uniform(low=-var, high=var)
    blend_(alpha, image, gs[:, :, None])


def brightness_(data_rng, image, gs, gs_mean, var):
    alpha = 1. + data_rng.uniform(low=-var, high=var)
    image *= alpha


def contrast_(data_rng, image, gs, gs_mean, var):
    alpha = 1. + data_rng.uniform(low=-var, high=var)
    blend_(alpha, image, gs_mean)


def color_jittering_(data_rng, image):
    functions = [brightness_, contrast_, saturation_]
    random.shuffle(functions)

    gs = grayscale(image)
    gs_mean = gs.mean()
    for f in functions:
        f(data_rng, image, gs, gs_mean, 0.4)


def gaussian2D(shape, sigma=1):
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]

    h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    return h


def draw_gaussian(heatmap, center, radius, k=1):
    diameter = 2 * radius + 1
    gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6)

    x, y = center

    height, width = heatmap.shape[0:2]

    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)

    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right]
    np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)


def gaussian_radius(det_size, min_overlap):
    height, width = det_size

    a1 = 1
    b1 = (height + width)
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1)
    r1 = (b1 - sq1) / (2 * a1)

    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2)
    r2 = (b2 - sq2) / (2 * a2)

    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3)
    r3 = (b3 + sq3) / (2 * a3)
    return min(r1, r2, r3)
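
# Each quadratic bounds the corner-shift radius r so that a box whose corners
# are jittered by r still has IoU >= min_overlap with the (w, h) ground truth.
# For instance, when both corners shift outward the enclosing box is
# (w + 2r) x (h + 2r), and requiring
#   w * h / ((w + 2r) * (h + 2r)) >= min_overlap
# gives 4*o*r^2 + 2*o*(h + w)*r + (o - 1)*w*h <= 0 with o = min_overlap, whose
# positive root is r3 above; r1 and r2 cover the inward and mixed shifts, and
# the minimum of the three roots is the safe radius.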


def _get_border(border, size):
    i = 1
    while size - border // i <= border // i:
        i *= 2
    return border // i


def random_crop(image, detections, random_scales, view_size, border=64):
    view_height, view_width = view_size
    image_height, image_width = image.shape[0:2]

    scale = np.random.choice(random_scales)
    height = int(view_height * scale)
    width = int(view_width * scale)

    cropped_image = np.zeros((height, width, 3), dtype=image.dtype)

    w_border = _get_border(border, image_width)
    h_border = _get_border(border, image_height)

    ctx = np.random.randint(low=w_border, high=image_width - w_border)
    cty = np.random.randint(low=h_border, high=image_height - h_border)

    x0, x1 = max(ctx - width // 2, 0), min(ctx + width // 2, image_width)
    y0, y1 = max(cty - height // 2, 0), min(cty + height // 2, image_height)

    left_w, right_w = ctx - x0, x1 - ctx
    top_h, bottom_h = cty - y0, y1 - cty

    # crop image
    cropped_ctx, cropped_cty = width // 2, height // 2
    x_slice = slice(cropped_ctx - left_w, cropped_ctx + right_w)
    y_slice = slice(cropped_cty - top_h, cropped_cty + bottom_h)
    cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, :]

    # crop detections
    cropped_detections = detections.copy()
    cropped_detections[:, 0:4:2] -= x0
    cropped_detections[:, 1:4:2] -= y0
    cropped_detections[:, 0:4:2] += cropped_ctx - left_w
    cropped_detections[:, 1:4:2] += cropped_cty - top_h

    return cropped_image, cropped_detections


def crop_image(image, center, size, output_size=None):
    if output_size is None:
        output_size = size

    cty, ctx = center
    height, width = size
    o_height, o_width = output_size
    im_height, im_width = image.shape[0:2]
    cropped_image = np.zeros((o_height, o_width, 3), dtype=image.dtype)

    x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width)
    y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height)

    left, right = ctx - x0, x1 - ctx
    top, bottom = cty - y0, y1 - cty

    cropped_cty, cropped_ctx = o_height // 2, o_width // 2
    y_slice = slice(cropped_cty - top, cropped_cty + bottom)
    x_slice = slice(cropped_ctx - left, cropped_ctx + right)
    cropped_image[y_slice, x_slice, :] = image[y0:y1, x0:x1, :]

    border = np.array([
        cropped_cty - top,
        cropped_cty + bottom,
        cropped_ctx - left,
        cropped_ctx + right
    ], dtype=np.float32)

    offset = np.array([
        cty - o_height // 2,
        ctx - o_width // 2
    ])

    return cropped_image, border, offset
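
# `border` records where the source pixels landed inside the padded crop
# (y0, y1, x0, x1 in crop coordinates) and `offset` is the top-left of the
# crop window in original-image coordinates, so detections can be shifted
# back and forth between the two coordinate frames.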
5
object_detection/core/test/__init__.py
Normal file
@@ -0,0 +1,5 @@
from .cornernet import cornernet
from .cornernet_saccade import cornernet_saccade

def test_func(sys_config, db, nnet, result_dir, debug=False):
    return globals()[sys_config.sampling_function](db, nnet, result_dir, debug=debug)
180
object_detection/core/test/cornernet.py
Normal file
@@ -0,0 +1,180 @@
import json
import os

import cv2
import numpy as np
import torch
from tqdm import tqdm

from ..external.nms import soft_nms, soft_nms_merge
from ..sample.utils import crop_image
from ..utils import Timer
from ..vis_utils import draw_bboxes


def rescale_dets_(detections, ratios, borders, sizes):
    xs, ys = detections[..., 0:4:2], detections[..., 1:4:2]
    xs /= ratios[:, 1][:, None, None]
    ys /= ratios[:, 0][:, None, None]
    xs -= borders[:, 2][:, None, None]
    ys -= borders[:, 0][:, None, None]
    np.clip(xs, 0, sizes[:, 1][:, None, None], out=xs)
    np.clip(ys, 0, sizes[:, 0][:, None, None], out=ys)
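
# Inverse of the test-time preprocessing: divide out the output-to-input
# ratios, subtract the padding border, and clip to the (rescaled) image size,
# mapping detections back onto the resized source image in place.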


def decode(nnet, images, K, ae_threshold=0.5, kernel=3, num_dets=1000):
    detections = nnet.test([images], ae_threshold=ae_threshold, test=True, K=K, kernel=kernel, num_dets=num_dets)[0]
    return detections.data.cpu().numpy()


def cornernet(db, nnet, result_dir, debug=False, decode_func=decode):
    debug_dir = os.path.join(result_dir, "debug")
    if not os.path.exists(debug_dir):
        os.makedirs(debug_dir)

    if db.split != "trainval2014":
        db_inds = db.db_inds[:100] if debug else db.db_inds
    else:
        db_inds = db.db_inds[:100] if debug else db.db_inds[:5000]

    num_images = db_inds.size
    categories = db.configs["categories"]

    timer = Timer()
    top_bboxes = {}
    for ind in tqdm(range(0, num_images), ncols=80, desc="locating kps"):
        db_ind = db_inds[ind]

        image_id = db.image_ids(db_ind)
        image_path = db.image_path(db_ind)
        image = cv2.imread(image_path)

        timer.tic()
        top_bboxes[image_id] = cornernet_inference(db, nnet, image)
        timer.toc()

        if debug:
            image_path = db.image_path(db_ind)
            image = cv2.imread(image_path)
            bboxes = {
                db.cls2name(j): top_bboxes[image_id][j]
                for j in range(1, categories + 1)
            }
            image = draw_bboxes(image, bboxes)
            debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind))
            cv2.imwrite(debug_file, image)
    print('average time: {}'.format(timer.average_time))

    result_json = os.path.join(result_dir, "results.json")
    detections = db.convert_to_coco(top_bboxes)
    with open(result_json, "w") as f:
        json.dump(detections, f)

    cls_ids = list(range(1, categories + 1))
    image_ids = [db.image_ids(ind) for ind in db_inds]
    db.evaluate(result_json, cls_ids, image_ids)
    return 0


def cornernet_inference(db, nnet, image, decode_func=decode):
    K = db.configs["top_k"]
    ae_threshold = db.configs["ae_threshold"]
    nms_kernel = db.configs["nms_kernel"]
    num_dets = db.configs["num_dets"]
    test_flipped = db.configs["test_flipped"]

    input_size = db.configs["input_size"]
    output_size = db.configs["output_sizes"][0]

    scales = db.configs["test_scales"]
    weight_exp = db.configs["weight_exp"]
    merge_bbox = db.configs["merge_bbox"]
    categories = db.configs["categories"]
    nms_threshold = db.configs["nms_threshold"]
    max_per_image = db.configs["max_per_image"]
    nms_algorithm = {
        "nms": 0,
        "linear_soft_nms": 1,
        "exp_soft_nms": 2
    }[db.configs["nms_algorithm"]]

    height, width = image.shape[0:2]

    height_scale = (input_size[0] + 1) // output_size[0]
    width_scale = (input_size[1] + 1) // output_size[1]

    im_mean = torch.cuda.FloatTensor(db.mean).reshape(1, 3, 1, 1)
    im_std = torch.cuda.FloatTensor(db.std).reshape(1, 3, 1, 1)

    detections = []
    for scale in scales:
        new_height = int(height * scale)
        new_width = int(width * scale)
        new_center = np.array([new_height // 2, new_width // 2])

        inp_height = new_height | 127
        inp_width = new_width | 127
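
        # `x | 127` sets the low 7 bits, rounding each dimension up to the
        # next multiple of 128 minus one; with the +1 below, the padded input
        # divides evenly by the network's downsampling factor.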

        images = np.zeros((1, 3, inp_height, inp_width), dtype=np.float32)
        ratios = np.zeros((1, 2), dtype=np.float32)
        borders = np.zeros((1, 4), dtype=np.float32)
        sizes = np.zeros((1, 2), dtype=np.float32)

        out_height, out_width = (inp_height + 1) // height_scale, (inp_width + 1) // width_scale
        height_ratio = out_height / inp_height
        width_ratio = out_width / inp_width

        resized_image = cv2.resize(image, (new_width, new_height))
        resized_image, border, offset = crop_image(resized_image, new_center, [inp_height, inp_width])

        resized_image = resized_image / 255.

        images[0] = resized_image.transpose((2, 0, 1))
        borders[0] = border
        sizes[0] = [int(height * scale), int(width * scale)]
        ratios[0] = [height_ratio, width_ratio]

        if test_flipped:
            images = np.concatenate((images, images[:, :, :, ::-1]), axis=0)
        images = torch.from_numpy(images).cuda()
        images -= im_mean
        images /= im_std

        dets = decode_func(nnet, images, K, ae_threshold=ae_threshold, kernel=nms_kernel, num_dets=num_dets)
        if test_flipped:
            dets[1, :, [0, 2]] = out_width - dets[1, :, [2, 0]]
            dets = dets.reshape(1, -1, 8)

        rescale_dets_(dets, ratios, borders, sizes)
        dets[:, :, 0:4] /= scale
        detections.append(dets)

    detections = np.concatenate(detections, axis=1)

    classes = detections[..., -1]
    classes = classes[0]
    detections = detections[0]

    # reject detections with negative scores
    keep_inds = (detections[:, 4] > -1)
    detections = detections[keep_inds]
    classes = classes[keep_inds]

    top_bboxes = {}
    for j in range(categories):
        keep_inds = (classes == j)
        top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32)
        if merge_bbox:
            soft_nms_merge(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, weight_exp=weight_exp)
        else:
            soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm)
        top_bboxes[j + 1] = top_bboxes[j + 1][:, 0:5]

    scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)])
    if len(scores) > max_per_image:
        kth = len(scores) - max_per_image
        thresh = np.partition(scores, kth)[kth]
        for j in range(1, categories + 1):
            keep_inds = (top_bboxes[j][:, -1] >= thresh)
            top_bboxes[j] = top_bboxes[j][keep_inds]
    return top_bboxes
406
object_detection/core/test/cornernet_saccade.py
Normal file
@@ -0,0 +1,406 @@
import json
import math
import os

import cv2
import numpy as np
import torch
import torch.nn as nn
from tqdm import tqdm

from ..external.nms import soft_nms
from ..utils import Timer
from ..vis_utils import draw_bboxes


def crop_image_gpu(image, center, size, out_image):
    cty, ctx = center
    height, width = size
    o_height, o_width = out_image.shape[1:3]
    im_height, im_width = image.shape[1:3]

    scale = o_height / max(height, width)
    x0, x1 = max(0, ctx - width // 2), min(ctx + width // 2, im_width)
    y0, y1 = max(0, cty - height // 2), min(cty + height // 2, im_height)

    left, right = ctx - x0, x1 - ctx
    top, bottom = cty - y0, y1 - cty

    cropped_cty, cropped_ctx = o_height // 2, o_width // 2
    out_y0, out_y1 = cropped_cty - int(top * scale), cropped_cty + int(bottom * scale)
    out_x0, out_x1 = cropped_ctx - int(left * scale), cropped_ctx + int(right * scale)

    new_height = out_y1 - out_y0
    new_width = out_x1 - out_x0
    image = image[:, y0:y1, x0:x1].unsqueeze(0)
    out_image[:, out_y0:out_y1, out_x0:out_x1] = nn.functional.interpolate(
        image, size=[new_height, new_width], mode='bilinear'
    )[0]

    return np.array([cty - height // 2, ctx - width // 2])


def remap_dets_(detections, scales, offsets):
    xs, ys = detections[..., 0:4:2], detections[..., 1:4:2]

    xs /= scales.reshape(-1, 1, 1)
    ys /= scales.reshape(-1, 1, 1)
    xs += offsets[:, 1][:, None, None]
    ys += offsets[:, 0][:, None, None]


def att_nms(atts, ks):
    pads = [(k - 1) // 2 for k in ks]
    pools = [nn.functional.max_pool2d(att, (k, k), stride=1, padding=pad) for k, att, pad in zip(ks, atts, pads)]
    keeps = [(att == pool).float() for att, pool in zip(atts, pools)]
    atts = [att * keep for att, keep in zip(atts, keeps)]
    return atts
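
# Max-pool based peak suppression: a location in an attention map survives
# only if it equals the maximum of its k x k neighborhood, the same trick
# used for keypoint heatmap NMS.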


def batch_decode(db, nnet, images, no_att=False):
    K = db.configs["top_k"]
    ae_threshold = db.configs["ae_threshold"]
    kernel = db.configs["nms_kernel"]
    num_dets = db.configs["num_dets"]

    att_nms_ks = db.configs["att_nms_ks"]
    att_ranges = db.configs["att_ranges"]

    num_images = images.shape[0]
    detections = []
    attentions = [[] for _ in range(len(att_ranges))]

    batch_size = 32
    for b_ind in range(math.ceil(num_images / batch_size)):
        b_start = b_ind * batch_size
        b_end = min(num_images, (b_ind + 1) * batch_size)

        b_images = images[b_start:b_end]
        b_outputs = nnet.test(
            [b_images], ae_threshold=ae_threshold, K=K, kernel=kernel,
            test=True, num_dets=num_dets, no_border=True, no_att=no_att
        )
        if no_att:
            b_detections = b_outputs
        else:
            b_detections = b_outputs[0]
            b_attentions = b_outputs[1]
            b_attentions = att_nms(b_attentions, att_nms_ks)
            b_attentions = [b_attention.data.cpu().numpy() for b_attention in b_attentions]

        b_detections = b_detections.data.cpu().numpy()

        detections.append(b_detections)
        if not no_att:
            for attention, b_attention in zip(attentions, b_attentions):
                attention.append(b_attention)

    if not no_att:
        attentions = [np.concatenate(atts, axis=0) for atts in attentions] if detections else None
    detections = np.concatenate(detections, axis=0) if detections else np.zeros((0, num_dets, 8))
    return detections, attentions


def decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh, ignore_same=False):
    att_ranges = db.configs["att_ranges"]
    att_ratios = db.configs["att_ratios"]
    input_size = db.configs["input_size"]

    next_ys, next_xs, next_scales, next_scores = [], [], [], []

    num_atts = atts[0].shape[0]
    for aind in range(num_atts):
        for att, att_range, att_ratio, att_scale in zip(atts, att_ranges, att_ratios, att_scales):
            ys, xs = np.where(att[aind, 0] > thresh)
            scores = att[aind, 0, ys, xs]

            ys = ys * att_ratio / scales[aind] + offsets[aind, 0]
            xs = xs * att_ratio / scales[aind] + offsets[aind, 1]

            keep = (ys >= 0) & (ys < height) & (xs >= 0) & (xs < width)
            ys, xs, scores = ys[keep], xs[keep], scores[keep]

            next_scale = att_scale * scales[aind]
            if (ignore_same and att_scale <= 1) or scales[aind] > 2 or next_scale > 4:
                continue

            next_scales += [next_scale] * len(xs)
            next_scores += scores.tolist()
            next_ys += ys.tolist()
            next_xs += xs.tolist()
    next_ys = np.array(next_ys)
    next_xs = np.array(next_xs)
    next_scales = np.array(next_scales)
    next_scores = np.array(next_scores)
    return np.stack((next_ys, next_xs, next_scales, next_scores), axis=1)


def get_ref_locs(dets):
    keep = dets[:, 4] > 0.5
    dets = dets[keep]

    ref_xs = (dets[:, 0] + dets[:, 2]) / 2
    ref_ys = (dets[:, 1] + dets[:, 3]) / 2

    ref_maxhws = np.maximum(dets[:, 2] - dets[:, 0], dets[:, 3] - dets[:, 1])
    ref_scales = np.zeros_like(ref_maxhws)
    ref_scores = dets[:, 4]

    large_inds = ref_maxhws > 96
    medium_inds = (ref_maxhws > 32) & (ref_maxhws <= 96)
    small_inds = ref_maxhws <= 32

    ref_scales[large_inds] = 192 / ref_maxhws[large_inds]
    ref_scales[medium_inds] = 64 / ref_maxhws[medium_inds]
    ref_scales[small_inds] = 24 / ref_maxhws[small_inds]

    new_locations = np.stack((ref_ys, ref_xs, ref_scales, ref_scores), axis=1)
    new_locations[:, 3] = 1
    return new_locations


def get_locs(db, nnet, image, im_mean, im_std, att_scales, thresh, sizes, ref_dets=True):
    att_ranges = db.configs["att_ranges"]
    att_ratios = db.configs["att_ratios"]
    input_size = db.configs["input_size"]

    height, width = image.shape[1:3]

    locations = []
    for size in sizes:
        scale = size / max(height, width)
        location = [height // 2, width // 2, scale]
        locations.append(location)

    locations = np.array(locations, dtype=np.float32)
    images, offsets = prepare_images(db, image, locations, flipped=False)

    images -= im_mean
    images /= im_std

    dets, atts = batch_decode(db, nnet, images)

    scales = locations[:, 2]
    next_locations = decode_atts(db, atts, att_scales, scales, offsets, height, width, thresh)

    rescale_dets_(db, dets)
    remap_dets_(dets, scales, offsets)

    dets = dets.reshape(-1, 8)
    keep = dets[:, 4] > 0.3
    dets = dets[keep]

    if ref_dets:
        ref_locations = get_ref_locs(dets)
        next_locations = np.concatenate((next_locations, ref_locations), axis=0)
        next_locations = location_nms(next_locations, thresh=16)
    return dets, next_locations, atts


def location_nms(locations, thresh=15):
    next_locations = []
    sorted_inds = np.argsort(locations[:, -1])[::-1]

    locations = locations[sorted_inds]
    ys = locations[:, 0]
    xs = locations[:, 1]
    scales = locations[:, 2]

    dist_ys = np.absolute(ys.reshape(-1, 1) - ys.reshape(1, -1))
    dist_xs = np.absolute(xs.reshape(-1, 1) - xs.reshape(1, -1))
    dists = np.minimum(dist_ys, dist_xs)
    ratios = scales.reshape(-1, 1) / scales.reshape(1, -1)
    while dists.shape[0] > 0:
        next_locations.append(locations[0])

        scale = scales[0]
        dist = dists[0]
        ratio = ratios[0]

        keep = (dist > (thresh / scale)) | (ratio > 1.2) | (ratio < 0.8)

        locations = locations[keep]

        scales = scales[keep]
        dists = dists[keep, :]
        dists = dists[:, keep]
        ratios = ratios[keep, :]
        ratios = ratios[:, keep]
    return np.stack(next_locations) if next_locations else np.zeros((0, 4))
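
# Greedy suppression over candidate (y, x, scale, score) zoom-in locations:
# after sorting by score, a lower-scoring location is kept only if it is far
# enough from every kept one (center distance above thresh / scale) or is
# meant for a sufficiently different zoom (scale ratio outside [0.8, 1.2]).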


def prepare_images(db, image, locs, flipped=True):
    input_size = db.configs["input_size"]
    num_patches = locs.shape[0]

    images = torch.cuda.FloatTensor(num_patches, 3, input_size[0], input_size[1]).fill_(0)
    offsets = np.zeros((num_patches, 2), dtype=np.float32)
    for ind, (y, x, scale) in enumerate(locs[:, :3]):
        crop_height = int(input_size[0] / scale)
        crop_width = int(input_size[1] / scale)
        offsets[ind] = crop_image_gpu(image, [int(y), int(x)], [crop_height, crop_width], images[ind])
    return images, offsets


def rescale_dets_(db, dets):
    input_size = db.configs["input_size"]
    output_size = db.configs["output_sizes"][0]

    ratios = [o / i for o, i in zip(output_size, input_size)]
    dets[..., 0:4:2] /= ratios[1]
    dets[..., 1:4:2] /= ratios[0]


def cornernet_saccade(db, nnet, result_dir, debug=False, decode_func=batch_decode):
    debug_dir = os.path.join(result_dir, "debug")
    if not os.path.exists(debug_dir):
        os.makedirs(debug_dir)

    if db.split != "trainval2014":
        db_inds = db.db_inds[:500] if debug else db.db_inds
    else:
        db_inds = db.db_inds[:100] if debug else db.db_inds[:5000]

    num_images = db_inds.size
    categories = db.configs["categories"]

    timer = Timer()
    top_bboxes = {}
    for k_ind in tqdm(range(0, num_images), ncols=80, desc="locating kps"):
        db_ind = db_inds[k_ind]

        image_id = db.image_ids(db_ind)
        image_path = db.image_path(db_ind)
        image = cv2.imread(image_path)

        timer.tic()
        top_bboxes[image_id] = cornernet_saccade_inference(db, nnet, image)
        timer.toc()

        if debug:
            image_path = db.image_path(db_ind)
            image = cv2.imread(image_path)
            bboxes = {
                db.cls2name(j): top_bboxes[image_id][j]
                for j in range(1, categories + 1)
            }
            image = draw_bboxes(image, bboxes)
            debug_file = os.path.join(debug_dir, "{}.jpg".format(db_ind))
            cv2.imwrite(debug_file, image)
    print('average time: {}'.format(timer.average_time))

    result_json = os.path.join(result_dir, "results.json")
    detections = db.convert_to_coco(top_bboxes)
    with open(result_json, "w") as f:
        json.dump(detections, f)

    cls_ids = list(range(1, categories + 1))
    image_ids = [db.image_ids(ind) for ind in db_inds]
    db.evaluate(result_json, cls_ids, image_ids)
    return 0


def cornernet_saccade_inference(db, nnet, image, decode_func=batch_decode):
    init_sizes = db.configs["init_sizes"]
    ref_dets = db.configs["ref_dets"]

    att_thresholds = db.configs["att_thresholds"]
    att_scales = db.configs["att_scales"]
    att_max_crops = db.configs["att_max_crops"]

    categories = db.configs["categories"]
    nms_threshold = db.configs["nms_threshold"]
    max_per_image = db.configs["max_per_image"]
    nms_algorithm = {
        "nms": 0,
        "linear_soft_nms": 1,
        "exp_soft_nms": 2
    }[db.configs["nms_algorithm"]]

    num_iterations = len(att_thresholds)

    im_mean = torch.cuda.FloatTensor(db.mean).reshape(1, 3, 1, 1)
    im_std = torch.cuda.FloatTensor(db.std).reshape(1, 3, 1, 1)

    detections = []
    height, width = image.shape[0:2]

    image = image / 255.
    image = image.transpose((2, 0, 1)).copy()
    image = torch.from_numpy(image).cuda(non_blocking=True)

    dets, locations, atts = get_locs(
        db, nnet, image, im_mean, im_std,
        att_scales[0], att_thresholds[0],
        init_sizes, ref_dets=ref_dets
    )

    detections = [dets]
    num_patches = locations.shape[0]

    num_crops = 0
    for ind in range(1, num_iterations + 1):
        if num_patches == 0:
            break

        if num_crops + num_patches > att_max_crops:
            max_crops = min(att_max_crops - num_crops, num_patches)
            locations = locations[:max_crops]

        num_patches = locations.shape[0]
        num_crops += locations.shape[0]
        no_att = (ind == num_iterations)

        images, offsets = prepare_images(db, image, locations, flipped=False)
        images -= im_mean
        images /= im_std

        dets, atts = decode_func(db, nnet, images, no_att=no_att)
        dets = dets.reshape(num_patches, -1, 8)

        rescale_dets_(db, dets)
        remap_dets_(dets, locations[:, 2], offsets)

        dets = dets.reshape(-1, 8)
        keeps = (dets[:, 4] > -1)
        dets = dets[keeps]

        detections.append(dets)

        if num_crops == att_max_crops:
            break

        if ind < num_iterations:
            att_threshold = att_thresholds[ind]
            att_scale = att_scales[ind]

            next_locations = decode_atts(
                db, atts, att_scale, locations[:, 2], offsets, height, width, att_threshold, ignore_same=True
            )

            if ref_dets:
                ref_locations = get_ref_locs(dets)
                next_locations = np.concatenate((next_locations, ref_locations), axis=0)
                next_locations = location_nms(next_locations, thresh=16)

            locations = next_locations
            num_patches = locations.shape[0]

    detections = np.concatenate(detections, axis=0)
    classes = detections[..., -1]

    top_bboxes = {}
    for j in range(categories):
        keep_inds = (classes == j)
        top_bboxes[j + 1] = detections[keep_inds][:, 0:7].astype(np.float32)
        keep_inds = soft_nms(top_bboxes[j + 1], Nt=nms_threshold, method=nms_algorithm, sigma=0.7)
        top_bboxes[j + 1] = top_bboxes[j + 1][keep_inds, 0:5]

    scores = np.hstack([top_bboxes[j][:, -1] for j in range(1, categories + 1)])
    if len(scores) > max_per_image:
        kth = len(scores) - max_per_image
        thresh = np.partition(scores, kth)[kth]
        for j in range(1, categories + 1):
            keep_inds = (top_bboxes[j][:, -1] >= thresh)
            top_bboxes[j] = top_bboxes[j][keep_inds]
    return top_bboxes
2
object_detection/core/utils/__init__.py
Normal file
@@ -0,0 +1,2 @@
from .tqdm import stdout_to_tqdm
from .timer import Timer
27
object_detection/core/utils/timer.py
Normal file
@@ -0,0 +1,27 @@
import time


class Timer(object):
    """A simple timer."""

    def __init__(self):
        self.total_time = 0.
        self.calls = 0
        self.start_time = 0.
        self.diff = 0.
        self.average_time = 0.

    def tic(self):
        # using time.time instead of time.clock because time.clock
        # does not normalize for multithreading
        self.start_time = time.time()

    def toc(self, average=True):
        self.diff = time.time() - self.start_time
        self.total_time += self.diff
        self.calls += 1
        self.average_time = self.total_time / self.calls
        if average:
            return self.average_time
        else:
            return self.diff
27
object_detection/core/utils/tqdm.py
Normal file
@@ -0,0 +1,27 @@
import contextlib
import sys

from tqdm import tqdm


class TqdmFile(object):
    dummy_file = None

    def __init__(self, dummy_file):
        self.dummy_file = dummy_file

    def write(self, x):
        if len(x.rstrip()) > 0:
            tqdm.write(x, file=self.dummy_file)


@contextlib.contextmanager
def stdout_to_tqdm():
    save_stdout = sys.stdout
    try:
        sys.stdout = TqdmFile(sys.stdout)
        yield save_stdout
    except Exception as exc:
        raise exc
    finally:
        sys.stdout = save_stdout
63
object_detection/core/vis_utils.py
Normal file
@@ -0,0 +1,63 @@
import cv2
import numpy as np


def draw_bboxes(image, bboxes, font_size=0.5, thresh=0.5, colors=None):
    """Draws bounding boxes on an image.

    Args:
        image: An image in OpenCV format.
        bboxes: A dictionary representing bounding boxes of different object
            categories, where the keys are the names of the categories and the
            values are the bounding boxes. The bounding boxes of each category
            should be stored in a 2D NumPy array, where each row is a bounding
            box (x1, y1, x2, y2, score).
        font_size: (Optional) Font size of the category names.
        thresh: (Optional) Only bounding boxes with scores above the threshold
            will be drawn.
        colors: (Optional) Color of bounding boxes for each category. If it is
            not provided, this function will use a random color for each
            category.

    Returns:
        An image with bounding boxes.
    """

    image = image.copy()
    for cat_name in bboxes:
        keep_inds = bboxes[cat_name][:, -1] > thresh
        cat_size = cv2.getTextSize(cat_name, cv2.FONT_HERSHEY_SIMPLEX, font_size, 2)[0]

        if colors is None:
            color = np.random.random((3,)) * 0.6 + 0.4
            color = (color * 255).astype(np.int32).tolist()
        else:
            color = colors[cat_name]

        for bbox in bboxes[cat_name][keep_inds]:
            bbox = bbox[0:4].astype(np.int32)
            if bbox[1] - cat_size[1] - 2 < 0:
                cv2.rectangle(image,
                              (bbox[0], bbox[1] + 2),
                              (bbox[0] + cat_size[0], bbox[1] + cat_size[1] + 2),
                              color, -1
                              )
                cv2.putText(image, cat_name,
                            (bbox[0], bbox[1] + cat_size[1] + 2),
                            cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1
                            )
            else:
                cv2.rectangle(image,
                              (bbox[0], bbox[1] - cat_size[1] - 2),
                              (bbox[0] + cat_size[0], bbox[1] - 2),
                              color, -1
                              )
                cv2.putText(image, cat_name,
                            (bbox[0], bbox[1] - 2),
                            cv2.FONT_HERSHEY_SIMPLEX, font_size, (0, 0, 0), thickness=1
                            )
            cv2.rectangle(image,
                          (bbox[0], bbox[1]),
                          (bbox[2], bbox[3]),
                          color, 2
                          )
    return image
BIN
object_detection/demo.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 316 KiB
13
object_detection/demo.py
Normal file
@@ -0,0 +1,13 @@
#!/usr/bin/env python

import cv2

from core.detectors import CornerNet_Saccade
from core.vis_utils import draw_bboxes

detector = CornerNet_Saccade()
image = cv2.imread("demo.jpg")

bboxes = detector(image)
image = draw_bboxes(image, bboxes)
cv2.imwrite("demo_out.jpg", image)
16
object_detection/doc_detect.py
Normal file
@@ -0,0 +1,16 @@
import numpy as np

from object_detection import CornerNet_Saccade
from util import image_util


def capture_target_area(image, target="book"):
    detector = CornerNet_Saccade()
    bboxes = detector(image)
    target_images = []
    keep_inds = bboxes[target][:, -1] > 0.5
    for bbox in bboxes[target][keep_inds]:
        bbox = bbox[0:4].astype(np.int32)
        bbox = np.clip(bbox, 0, None)
        target_images.append(image_util.capture(image, bbox))
    return target_images
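
# Usage sketch for the document-detection entry point (hypothetical image
# path; `image_util.capture` is assumed here to crop the bbox region from
# the image):
#
#   import cv2
#   pages = capture_target_area(cv2.imread("scan.jpg"), target="book")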
110
object_detection/evaluate.py
Normal file
@@ -0,0 +1,110 @@
#!/usr/bin/env python
import argparse
import importlib
import json
import os
import pprint

import torch

from core.config import SystemConfig
from core.dbs import datasets
from core.nnet.py_factory import NetworkFactory
from core.test import test_func

torch.backends.cudnn.benchmark = False


def parse_args():
    parser = argparse.ArgumentParser(description="Evaluation Script")
    parser.add_argument("cfg_file", help="config file", type=str)
    parser.add_argument("--testiter", dest="testiter",
                        help="test at iteration i",
                        default=None, type=int)
    parser.add_argument("--split", dest="split",
                        help="which split to use",
                        default="validation", type=str)
    parser.add_argument("--suffix", dest="suffix", default=None, type=str)
    parser.add_argument("--debug", action="store_true")

    args = parser.parse_args()
    return args


def make_dirs(directories):
    for directory in directories:
        if not os.path.exists(directory):
            os.makedirs(directory)


def test(db, system_config, model, args):
    split = args.split
    testiter = args.testiter
    debug = args.debug
    suffix = args.suffix

    result_dir = system_config.result_dir
    result_dir = os.path.join(result_dir, str(testiter), split)

    if suffix is not None:
        result_dir = os.path.join(result_dir, suffix)

    make_dirs([result_dir])

    test_iter = system_config.max_iter if testiter is None else testiter
    print("loading parameters at iteration: {}".format(test_iter))

    print("building neural network...")
    nnet = NetworkFactory(system_config, model)
    print("loading parameters...")
    nnet.load_params(test_iter)

    nnet.cuda()
    nnet.eval_mode()
    test_func(system_config, db, nnet, result_dir, debug=debug)


def main(args):
    if args.suffix is None:
        cfg_file = os.path.join("./configs", args.cfg_file + ".json")
    else:
        cfg_file = os.path.join("./configs", args.cfg_file + "-{}.json".format(args.suffix))
    print("cfg_file: {}".format(cfg_file))

    with open(cfg_file, "r") as f:
        config = json.load(f)

    config["system"]["snapshot_name"] = args.cfg_file
    system_config = SystemConfig().update_config(config["system"])

    model_file = "core.models.{}".format(args.cfg_file)
    model_file = importlib.import_module(model_file)
    model = model_file.model()

    train_split = system_config.train_split
    val_split = system_config.val_split
    test_split = system_config.test_split

    split = {
        "training": train_split,
        "validation": val_split,
        "testing": test_split
    }[args.split]

    print("loading all datasets...")
    dataset = system_config.dataset
    print("split: {}".format(split))
    testing_db = datasets[dataset](config["db"], split=split, sys_config=system_config)

    print("system config...")
    pprint.pprint(system_config.full)

    print("db config...")
    pprint.pprint(testing_db.configs)

    test(testing_db, system_config, model, args)


if __name__ == "__main__":
    args = parse_args()
    main(args)
260
object_detection/train.py
Normal file
@@ -0,0 +1,260 @@
#!/usr/bin/env python
import argparse
import importlib
import json
import os
import pprint
import queue
import threading
import traceback

import numpy as np
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.multiprocessing import Process, Queue
from tqdm import tqdm

from core.config import SystemConfig
from core.dbs import datasets
from core.nnet.py_factory import NetworkFactory
from core.sample import data_sampling_func
from core.utils import stdout_to_tqdm

torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True


def parse_args():
    parser = argparse.ArgumentParser(description="Training Script")
    parser.add_argument("cfg_file", help="config file", type=str)
    parser.add_argument("--iter", dest="start_iter",
                        help="train at iteration i",
                        default=0, type=int)
    parser.add_argument("--workers", default=4, type=int)
    parser.add_argument("--initialize", action="store_true")

    parser.add_argument("--distributed", action="store_true")
    parser.add_argument("--world-size", default=-1, type=int,
                        help="number of nodes of distributed training")
    parser.add_argument("--rank", default=0, type=int,
                        help="node rank for distributed training")
    parser.add_argument("--dist-url", default=None, type=str,
                        help="url used to set up distributed training")
    parser.add_argument("--dist-backend", default="nccl", type=str)

    args = parser.parse_args()
    return args


def prefetch_data(system_config, db, queue, sample_data, data_aug):
    ind = 0
    print("start prefetching data...")
    np.random.seed(os.getpid())
    while True:
        try:
            data, ind = sample_data(system_config, db, ind, data_aug=data_aug)
            queue.put(data)
        except Exception as e:
            traceback.print_exc()
            raise e


def _pin_memory(ts):
    if type(ts) is list:
        return [t.pin_memory() for t in ts]
    return ts.pin_memory()


def pin_memory(data_queue, pinned_data_queue, sema):
    while True:
        data = data_queue.get()

        data["xs"] = [_pin_memory(x) for x in data["xs"]]
        data["ys"] = [_pin_memory(y) for y in data["ys"]]

        pinned_data_queue.put(data)

        if sema.acquire(blocking=False):
            return
|
||||
|
||||
|
||||
def init_parallel_jobs(system_config, dbs, queue, fn, data_aug):
|
||||
tasks = [Process(target=prefetch_data, args=(system_config, db, queue, fn, data_aug)) for db in dbs]
|
||||
for task in tasks:
|
||||
task.daemon = True
|
||||
task.start()
|
||||
return tasks
|
||||
|
||||
|
||||
def terminate_tasks(tasks):
|
||||
for task in tasks:
|
||||
task.terminate()
|
||||
|
||||
|
||||
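To make the structure of these helpers easier to see in isolation, here is a minimal standalone sketch of the same producer/pinner pattern (toy tensors and queue sizes; `producer` and `pinner` are illustrative stand-ins for `prefetch_data` and `pin_memory`, and a CUDA build of PyTorch is assumed since pinning allocates page-locked memory):
```
import queue
import threading

import torch
from torch.multiprocessing import Process, Queue

def producer(data_queue):
    # stand-in for prefetch_data: endlessly sample and enqueue CPU batches
    while True:
        data_queue.put(torch.randn(2, 3))

def pinner(data_queue, pinned_queue, sema):
    # stand-in for pin_memory: copy each batch into page-locked memory so a
    # later .cuda(non_blocking=True) transfer can overlap with compute
    while True:
        pinned_queue.put(data_queue.get().pin_memory())
        if sema.acquire(blocking=False):  # set by the main thread to stop us
            return

if __name__ == "__main__":
    data_q, pinned_q = Queue(4), queue.Queue(4)
    stop = threading.Semaphore(0)  # starts "acquired", like the code above
    Process(target=producer, args=(data_q,), daemon=True).start()
    threading.Thread(target=pinner, args=(data_q, pinned_q, stop), daemon=True).start()
    batch = pinned_q.get()  # consume one pinned batch
    stop.release()          # let the pinning thread exit
```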
def train(training_dbs, validation_db, system_config, model, args):
    # reading arguments from the command line
    start_iter  = args.start_iter
    distributed = args.distributed
    world_size  = args.world_size
    initialize  = args.initialize
    gpu         = args.gpu
    rank        = args.rank

    # reading arguments from the json config file
    batch_size       = system_config.batch_size
    learning_rate    = system_config.learning_rate
    max_iteration    = system_config.max_iter
    pretrained_model = system_config.pretrain
    stepsize         = system_config.stepsize
    snapshot         = system_config.snapshot
    val_iter         = system_config.val_iter
    display          = system_config.display
    decay_rate       = system_config.decay_rate

    print("Process {}: building model...".format(rank))
    nnet = NetworkFactory(system_config, model, distributed=distributed, gpu=gpu)
    if initialize:
        nnet.save_params(0)
        exit(0)

    # queues storing data for training
    training_queue = Queue(system_config.prefetch_size)
    validation_queue = Queue(5)

    # queues storing pinned data for training
    pinned_training_queue = queue.Queue(system_config.prefetch_size)
    pinned_validation_queue = queue.Queue(5)

    # allocating resources for parallel reading
    training_tasks = init_parallel_jobs(system_config, training_dbs, training_queue, data_sampling_func, True)
    if val_iter:
        validation_tasks = init_parallel_jobs(system_config, [validation_db], validation_queue, data_sampling_func, False)

    training_pin_semaphore = threading.Semaphore()
    validation_pin_semaphore = threading.Semaphore()
    training_pin_semaphore.acquire()
    validation_pin_semaphore.acquire()

    training_pin_args = (training_queue, pinned_training_queue, training_pin_semaphore)
    training_pin_thread = threading.Thread(target=pin_memory, args=training_pin_args)
    training_pin_thread.daemon = True
    training_pin_thread.start()

    validation_pin_args = (validation_queue, pinned_validation_queue, validation_pin_semaphore)
    validation_pin_thread = threading.Thread(target=pin_memory, args=validation_pin_args)
    validation_pin_thread.daemon = True
    validation_pin_thread.start()

    if pretrained_model is not None:
        if not os.path.exists(pretrained_model):
            raise ValueError("pretrained model does not exist")
        print("Process {}: loading from pretrained model".format(rank))
        nnet.load_pretrained_params(pretrained_model)

    if start_iter:
        nnet.load_params(start_iter)
        learning_rate /= (decay_rate ** (start_iter // stepsize))
        nnet.set_lr(learning_rate)
        print("Process {}: training starts from iteration {} with learning_rate {}".format(rank, start_iter + 1, learning_rate))
    else:
        nnet.set_lr(learning_rate)

    if rank == 0:
        print("training start...")
    nnet.cuda()
    nnet.train_mode()
    with stdout_to_tqdm() as save_stdout:
        for iteration in tqdm(range(start_iter + 1, max_iteration + 1), file=save_stdout, ncols=80):
            training = pinned_training_queue.get(block=True)
            training_loss = nnet.train(**training)

            if display and iteration % display == 0:
                print("Process {}: training loss at iteration {}: {}".format(rank, iteration, training_loss.item()))
            del training_loss

            if val_iter and validation_db.db_inds.size and iteration % val_iter == 0:
                nnet.eval_mode()
                validation = pinned_validation_queue.get(block=True)
                validation_loss = nnet.validate(**validation)
                print("Process {}: validation loss at iteration {}: {}".format(rank, iteration, validation_loss.item()))
                nnet.train_mode()

            if iteration % snapshot == 0 and rank == 0:
                nnet.save_params(iteration)

            if iteration % stepsize == 0:
                learning_rate /= decay_rate
                nnet.set_lr(learning_rate)

    # sending the signal to kill the pinning threads
    training_pin_semaphore.release()
    validation_pin_semaphore.release()

    # terminating data fetching processes
    terminate_tasks(training_tasks)
    if val_iter:
        terminate_tasks(validation_tasks)
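A note on the shutdown choreography above: each pinning semaphore starts acquired, so the `sema.acquire(blocking=False)` check inside `pin_memory` fails on every pass until `train` calls `release()` after the loop; because that check sits after a blocking `get()`, a pinning thread only exits once one more batch flows through, and `daemon = True` is the backstop that keeps a stalled thread from blocking interpreter exit.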
def main(gpu, ngpus_per_node, args):
    args.gpu = gpu
    if args.distributed:
        args.rank = args.rank * ngpus_per_node + gpu
        dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                world_size=args.world_size, rank=args.rank)

    rank = args.rank

    cfg_file = os.path.join("./configs", args.cfg_file + ".json")
    with open(cfg_file, "r") as f:
        config = json.load(f)

    config["system"]["snapshot_name"] = args.cfg_file
    system_config = SystemConfig().update_config(config["system"])

    model_file = "core.models.{}".format(args.cfg_file)
    model_file = importlib.import_module(model_file)
    model = model_file.model()

    train_split = system_config.train_split
    val_split   = system_config.val_split

    print("Process {}: loading all datasets...".format(rank))
    dataset = system_config.dataset
    workers = args.workers
    print("Process {}: using {} workers".format(rank, workers))
    training_dbs = [datasets[dataset](config["db"], split=train_split, sys_config=system_config) for _ in range(workers)]
    validation_db = datasets[dataset](config["db"], split=val_split, sys_config=system_config)

    if rank == 0:
        print("system config...")
        pprint.pprint(system_config.full)

        print("db config...")
        pprint.pprint(training_dbs[0].configs)

        print("len of db: {}".format(len(training_dbs[0].db_inds)))
        print("distributed: {}".format(args.distributed))

    train(training_dbs, validation_db, system_config, model, args)


if __name__ == "__main__":
    args = parse_args()

    distributed = args.distributed
    world_size = args.world_size

    if distributed and world_size < 0:
        raise ValueError("world size must be greater than 0 in distributed training")

    ngpus_per_node = torch.cuda.device_count()
    if distributed:
        args.world_size = ngpus_per_node * args.world_size
        mp.spawn(main, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    else:
        main(None, ngpus_per_node, args)
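For orientation on the rank arithmetic: launching with `--world-size 2` on nodes with 4 GPUs each (numbers illustrative), `args.world_size` is rescaled to 2 * 4 = 8 total processes, `mp.spawn` starts one process per local GPU, and the process handling local GPU 3 on the node started with `--rank 1` computes global rank 1 * 4 + 3 = 7 inside `main`.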