Object detection

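This page collects paper-reading summaries, paper links, brief overviews of papers and algorithms, and links to code implementations. For a more comprehensive roundup of object detection, see: Object Detection - handong1587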

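Object detection means finding where the objects in a given image are and labeling their categories. It therefore has to answer two questions: where is the object, and what is it? Current mainstream object detection algorithms fall into three categories: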

◎ Object Detection in 20 Years: A Survey

Traditional object detection algorithms, such as sliding window + AdaBoost + cascade, and variants such as cascade + HOG/DPM + SVM;

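Two-stage algorithms (candidate-box extraction + classification): first extract candidate regions (RoIs), then classify them, mostly with deep learning methods. Examples: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, R-FCN.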

Algorithm	Main components	Notes
R-CNN	Selective search + CNN + SVM	Paper | Code
SPP-Net	Spatial pyramid pooling (SPP) layer	Paper | Code
Fast R-CNN	Selective search + CNN + RoI pooling	Paper | Code
Faster R-CNN	RPN + CNN + RoI pooling	Paper | Code
R-FCN		Paper | Code
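To make RoI pooling (used by Fast R-CNN and Faster R-CNN) concrete, here is a minimal NumPy sketch that max-pools one region of a feature map into a fixed-size grid, which is what lets differently sized RoIs feed the same fully connected layers. It assumes the RoI is already given in integer feature-map coordinates (the real layer first projects image-space boxes onto the feature map with a spatial scale); `roi_max_pool` is a hypothetical helper, not a library API. For real use, torchvision ships a batched version as `torchvision.ops.roi_pool`.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (C, H, W) feature map into a fixed (C, out_h, out_w) grid.

    roi = (x1, y1, x2, y2): inclusive integer coordinates on the feature map.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape[1], region.shape[2]
    out = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # bin rows: floor of the start, ceil of the end, so every bin is non-empty
        y_start, y_end = (i * h) // out_h, -(-((i + 1) * h) // out_h)
        for j in range(out_w):
            x_start, x_end = (j * w) // out_w, -(-((j + 1) * w) // out_w)
            out[:, i, j] = region[:, y_start:y_end, x_start:x_end].max(axis=(1, 2))
    return out


fm = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
print(roi_max_pool(fm, (0, 0, 3, 3)).shape)  # (1, 2, 2)
```

Whatever the RoI size, the output here is always C × 2 × 2, which is the property the detection head relies on.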

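One-stage, deep-learning-based regression methods, which treat both the positions and the categories of candidate boxes as a regression problem, such as YOLO, SSD, and DenseBox.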

Algorithm	Main components	Notes
YOLO	Anchor boxes, YOLO loss function	Paper (v3) | Code
SSD		Paper | Code
DenseBox		Paper | Code

Basic knowledge in Deep Learning
Traditional computer vision based
- Hand detection using multiple proposals
DL-based Two-Stage Object Detection
DL-based Single-Shot Object Detection
Face detection
- Viola-Jones methods
- MTCNN
Face alignment
- Regression-based
- Template fitting
OHEM
Tools
- Netron

Basic knowledge in Deep Learning

Metrics

Evaluation metrics for classification models: accuracy, precision, recall, F1, ROC curve, and AUC (in Chinese)
Let's evaluate classification model with ROC and PR curves

Confusion matrix

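PR curve: precision plotted against recall. The higher the curve (the closer to the top-right corner), the better the model. The area under the PR curve is the average precision (AP).
ROC curve: true positive rate (TPR) plotted against false positive rate (FPR) at different classification thresholds. AUC stands for "area under the curve"; a model whose predictions are 100% correct has an AUC of 1.0. The ROC curve is insensitive to class imbalance, because TPR and FPR are each computed within a single class (the positives and the negatives, respectively).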

N.B.

We are often concerned with Accuracy, Precision and Recall.
Sensitivity is another name for recall (the TPR).

Coding:
Python sklearn.metrics
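A minimal sketch of computing these metrics with `sklearn.metrics` on a toy binary problem. The labels, scores, and the 0.5 decision threshold are made-up assumptions for illustration, and `evaluate_binary` is a hypothetical helper. Note that sklearn's `average_precision_score` is the step-wise (non-interpolated) AP, not the VOC/COCO interpolated variants.

```python
from sklearn.metrics import (accuracy_score, average_precision_score, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)


def evaluate_binary(y_true, y_score, threshold=0.5):
    """Threshold-dependent metrics use hard labels; ROC AUC and AP use the raw scores."""
    y_pred = [int(s >= threshold) for s in y_score]
    return {
        # rows = true class, columns = predicted class: [[TN, FP], [FN, TP]]
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
        "recall": recall_score(y_true, y_pred),         # TP / (TP + FN), i.e. sensitivity / TPR
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),      # area under the TPR-vs-FPR curve
        "average_precision": average_precision_score(y_true, y_score),  # step-wise area under PR
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.1, 0.8, 0.4, 0.35, 0.6, 0.7, 0.2]
metrics = evaluate_binary(y_true, y_score)
print(metrics["confusion_matrix"])  # [[3, 1], [1, 3]]
```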

IoU

IoU is the ratio of the area of intersection to the area of union between the predicted bounding box and the ground-truth bounding box. Why use it? It measures how accurately an object is localized in the image, and it decides whether a detection counts as a true positive or a false positive. A common IoU threshold is 0.5.

$$ IoU=\frac{\text{Area of Overlap}}{\text{Area of Union}} $$

Coding:
bbox_iou(): takes bounding-box coordinates
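A minimal pure-Python sketch of the IoU formula above, assuming boxes in (x1, y1, x2, y2) corner format with continuous coordinates. `box_iou` is a hypothetical helper, not the linked `bbox_iou()`, whose signature and box format may differ. Some implementations (e.g. the PASCAL VOC devkit) add 1 to widths and heights because they treat coordinates as inclusive pixel indices.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # intersection rectangle (may be empty)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 4 + 4 - 1 = 7 -> 0.142857...
```

With the common 0.5 threshold, this pair would count as a false positive, since its IoU is about 0.14.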

mAP

Average precision (AP) averages the precision over recall values from 0 to 1. In general, AP is defined as the area under the PR curve: $AP=\int^1_0 p(r)dr$.

Maximum (interpolated) precision. To smooth the PR curve, the precision at each recall level is replaced with the maximum precision at that recall level or any higher one: $p_{interp}(r)=\max_{\hat{r}\geq r}p(\hat{r})$.

PASCAL VOC2008 uses the 11-point interpolated AP: recall is sampled at 0, 0.1, 0.2, ..., 0.9 and 1.0, and AP is the average of the interpolated (maximum) precision at these 11 recall values: $AP=\frac{1}{11}\sum_{r\in \{0, 0.1, ..., 1.0\}}p_{interp}(r)$.
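For PASCAL VOC2010-2012, AP is the area under the PR curve after the zigzags have been removed by interpolation (all-point interpolation). With $r_1<r_2<\dots<r_N$ the distinct recall levels and $r_0=0$:

$$ AP=\sum_{n=0}^{N-1}(r_{n+1}-r_{n})\,p_{interp}(r_{n+1}),\qquad p_{interp}(r_{n+1})=\max_{\hat{r}\geq r_{n+1}}p(\hat{r}) $$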
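A pure-Python sketch of both VOC variants, assuming the `recall` and `precision` lists were already computed from detections sorted by decreasing confidence (so recall never decreases). The all-point branch follows the "remove zigzags, then sum rectangles" recipe above; `voc_ap` is a hypothetical name for this sketch, not the official devkit code.

```python
def voc_ap(recall, precision, use_11_point=False):
    """PASCAL VOC-style AP from PR points sorted by decreasing detection confidence."""
    if use_11_point:
        # VOC2008 style: mean of p_interp(r) at r = 0, 0.1, ..., 1.0
        total = 0.0
        for i in range(11):
            t = i / 10
            total += max((p for r, p in zip(recall, precision) if r >= t), default=0.0)
        return total / 11
    # VOC2010-2012 style: add sentinels at recall 0 and 1
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # remove zigzags: p_interp(r_n) = max precision at recall >= r_n
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum of rectangles (r_{n+1} - r_n) * p_interp(r_{n+1}); zero-width steps add nothing
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# 3 ground-truth objects; detections ranked by score are TP, TP, FP, TP
recall = [1 / 3, 2 / 3, 2 / 3, 3 / 3]
precision = [1 / 1, 2 / 2, 2 / 3, 3 / 4]
print(round(voc_ap(recall, precision), 4))                     # 0.9167 (all-point)
print(round(voc_ap(recall, precision, use_11_point=True), 4))  # 0.9091 (11-point)
```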

COCO mAP uses a 101-point interpolated AP (recall sampled at 0, 0.01, ..., 1.0). AP is averaged over 10 IoU thresholds from 0.50 to 0.95 in steps of 0.05 (written .50:.05:.95) and over all 80 categories.
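A sketch of the two COCO averaging steps, assuming per-category, per-threshold recall/precision lists are already available (matching detections to ground truth at each IoU threshold is omitted). `ap_101_point` and `coco_map` are hypothetical helpers; the official pycocotools evaluator additionally handles object-area ranges and max-detection limits and skips categories without ground truth, none of which this sketch covers.

```python
# the 10 COCO IoU thresholds .50:.05:.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def ap_101_point(recall, precision):
    """101-point interpolated AP: mean of p_interp(r) at r = 0, 0.01, ..., 1.0."""
    total = 0.0
    for i in range(101):
        t = i / 100
        # p_interp(t): best precision at recall >= t (0 if recall never reaches t)
        total += max((p for r, p in zip(recall, precision) if r >= t), default=0.0)
    return total / 101


def coco_map(ap_table):
    """ap_table[c][k] = AP of category c at IOU_THRESHOLDS[k]; average over both axes."""
    values = [ap for per_category in ap_table for ap in per_category]
    return sum(values) / len(values)


print(IOU_THRESHOLDS)  # [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
```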

Understanding mAP through an example: mAP (mean Average Precision) for Object Detection
Official Detection evaluation
GitHub repo for evaluation: Cartucho/mAP

Non Maximum Suppression

◎ Hard NMS

◎ Hard and Soft NMS

Materials	Links
Paper	soft-NMS
Posts	Chinese blog post focusing on the differences | Easy-to-understand EN post
Implementations	Soft-NMS | Hard-NMS | Hard and Soft NMS
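For intuition, here is a pure-Python sketch of both variants. Hard NMS deletes every box whose IoU with a higher-scoring kept box exceeds a threshold, while Soft-NMS (the Gaussian variant from the soft-NMS paper) decays those boxes' scores by $e^{-\mathrm{IoU}^2/\sigma}$ instead of deleting them. Boxes are assumed to be (x1, y1, x2, y2); the 0.5 IoU threshold, σ = 0.5 and the 0.001 score floor are illustrative defaults, and all function names are hypothetical.

```python
import math


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # delete every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Return (kept indices in selection order, decayed scores)."""
    scores = list(scores)  # copy: the caller's scores are not modified
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        for i in remaining:
            # decay instead of delete: the larger the overlap, the stronger the decay
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep, scores


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(hard_nms(boxes, scores))              # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
print(soft_nms_gaussian(boxes, scores)[0])  # [0, 2, 1]: box 1 survives with a lowered score
```

Soft-NMS helps when two true objects overlap heavily: hard NMS would delete one of them outright, whereas Soft-NMS only lowers its score.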

Dataset and splits

Datasets

Training dataset: the samples used to fit the model. The model learns its weights and biases from the training set.
Validation dataset: the samples used to evaluate the model fit on the training set while training is still in progress. While tuning the model, we evaluate on the validation set frequently and regularly and use the results to adjust the hyperparameters. The validation set therefore affects the model parameters only indirectly, and its evaluation becomes increasingly biased the more it is used for tuning.
Test dataset: the samples used to provide an unbiased evaluation of the final, already trained model. The test set measures how competent the learned model is.
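A minimal sketch of a random hold-out split into the three sets described above. The 80/10/10 proportions and the seed are illustrative assumptions, and `train_val_test_split` is a hypothetical helper (scikit-learn's `train_test_split` produces one split per call, so it is typically called twice to get three sets).

```python
import random


def train_val_test_split(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # local RNG: does not disturb the global random state
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```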

Reference: About Train, Validation and Test Sets in Machine Learning

Splits and validation methods

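Formulae: [Machine Learning] Cross-Validation explained in detail (in Chinese)
With code: Machine learning interview question set - four cross-validation methods explained (in Chinese)
With examples: Training, validation, and test sets (with splitting methods and cross-validation) (in Chinese)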
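A sketch of plain k-fold cross-validation: shuffle the indices once, cut them into k folds, and let each fold serve as the validation set exactly once while the remaining folds are used for training. `k_fold_indices` is a hypothetical helper; in practice `sklearn.model_selection.KFold(n_splits=k, shuffle=True, random_state=seed)` does the same job.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) for each of k folds; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, val_idx
        start += size


for fold, (tr, va) in enumerate(k_fold_indices(10, k=3)):
    print(fold, len(tr), len(va))  # 0 6 4 / 1 7 3 / 2 7 3
```

The final score is usually the mean of the k validation scores, which uses every sample for validation once without touching the test set.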

Tips for coding

Random seeds for reproducibility

PyTorch random seeds (in Chinese)
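A common pattern for seeding every random number generator a PyTorch training script touches, given as a sketch; `seed_everything` is a hypothetical helper name. Making cuDNN deterministic can slow training down, and some GPU operations can remain nondeterministic even with these settings.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs so runs can be reproduced."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the RNG on all devices (CPU and CUDA)
    torch.cuda.manual_seed_all(seed)  # explicit for multi-GPU setups; ignored without a GPU
    # hash randomization is fixed at interpreter start, so this only affects child processes
    os.environ["PYTHONHASHSEED"] = str(seed)
    # trade cuDNN autotuning speed for deterministic convolution algorithms
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True
```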

Traditional computer vision based

Hand detection using multiple proposals

In general, this paper makes two contributions to the hand detection domain:

A two-stage hand detector.
A large dataset of images with ground-truth annotations for hands.

Review: Review: Hand detection using multiple proposals
Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.301.3602&rep=rep1&type=pdf

DL-based Two-Stage Object Detection

R-CNN

Fast R-CNN

Faster R-CNN

DL-based Single-Shot Object Detection

YOLO

Summary:

Unified prediction of bounding boxes.
Network architecture.
Design of the loss function.

Review: Review: You only look once (YOLOv1)
Paper: https://arxiv.org/abs/1506.02640
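To illustrate the unified prediction of bounding boxes, here is a sketch of the YOLOv1 output layout and box decoding, assuming the paper's PASCAL VOC setting (S = 7 grid, B = 2 boxes per cell, C = 20 classes, 448 × 448 input). Each cell predicts the box centre (x, y) as offsets inside the cell and the size (w, h) relative to the whole image; both function names are hypothetical.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Each of the S x S cells predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def decode_yolo_v1_box(row, col, x, y, w, h, S=7, img_w=448, img_h=448):
    """Convert one cell-relative prediction to absolute (x1, y1, x2, y2) pixel corners.

    (x, y): box centre as offsets in [0, 1] inside cell (row, col);
    (w, h): box size as fractions of the whole image.
    """
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


print(yolo_v1_output_shape())                          # (7, 7, 30)
print(decode_yolo_v1_box(3, 3, 0.5, 0.5, 0.25, 0.5))  # (168.0, 112.0, 280.0, 336.0)
```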

YOLOv2

Summary:

The main improvements this paper makes in YOLOv2 are:

Training at a higher image resolution.
Using anchor boxes (from Faster R-CNN) to predict bounding boxes.
Replacing the fully connected layers at the output of YOLO with convolutional layers.

The paper also proposes a method for combining datasets and a joint training algorithm, which are used to train a model (YOLO9000) on more than 9000 classes.
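How the anchor boxes are used: YOLOv2 predicts offsets $t_x, t_y, t_w, t_h$ relative to an anchor (prior) of size $p_w \times p_h$ in the grid cell whose top-left corner is $(c_x, c_y)$, and decodes them as $b_x=\sigma(t_x)+c_x$, $b_y=\sigma(t_y)+c_y$, $b_w=p_w e^{t_w}$, $b_h=p_h e^{t_h}$. A minimal sketch in grid-cell units; `decode_yolo_v2_box` is a hypothetical name.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_yolo_v2_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs into a box centre (bx, by) and size (bw, bh), in grid-cell units.

    (cx, cy): top-left corner of the responsible cell; (pw, ph): anchor (prior) size.
    """
    bx = sigmoid(tx) + cx   # sigmoid keeps the centre inside its own cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # log-space scaling of the anchor size
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_yolo_v2_box(0.0, 0.0, 0.0, 0.0, cx=5, cy=7, pw=1.5, ph=2.0))  # (5.5, 7.5, 1.5, 2.0)
```

Bounding the centre offset with a sigmoid is what keeps each prediction tied to its own cell, which the paper credits with making training more stable than unconstrained offsets.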

Results on COCO test-dev2015. From single shot multibox detector

Review: Review: YOLOv2
Paper: https://arxiv.org/abs/1612.08242
Official implementation: https://pjreddie.com/darknet/yolov2/

YOLOv3

Abstract:

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/.

Figure: APs on COCO

Review: Review: YOLOv3
YOLO series: YOLO v3 [in-depth analysis], consolidating the details (in Chinese)
Paper: https://arxiv.org/abs/1804.02767
Official implementation: https://pjreddie.com/darknet/yolo/
Third-party implementations:
https://github.com/ultralytics/yolov3 (PyTorch, easy to follow)
AlexeyAB / darknet (C, a Darknet fork; more advanced; YOLOv4)
ayooshkathuria / YOLO_v3_tutorial_from_scratch (PyTorch, from scratch)
DeNA/PyTorch_YOLOv3 (PyTorch, with good accompanying articles)

YOLOv4

AP@.50:.95

SSD

Object detection | SSD: principles and implementation (in Chinese)

Face detection

Viola-Jones methods

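A cascaded face detector that uses Haar-like features and AdaBoost to train its classifiers.
Good accuracy with real-time performance.
Degrades quickly in real-world scenarios (larger visual variations of human faces), even when more advanced features and classifiers are used.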

Paper: Robust real-time face detection
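What makes Viola-Jones real-time is the integral image: after one pass over the image, the sum of any rectangle costs four lookups, so a Haar-like feature (a difference between sums of adjacent rectangles) is cheap to evaluate at every position and scale. A pure-Python sketch of a two-rectangle feature; the function names are hypothetical, and AdaBoost training and the cascade itself are not shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (extra zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups, O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))                  # 5 + 6 + 8 + 9 = 28
print(haar_two_rect_horizontal(ii, 0, 0, 2, 3))  # (1 + 4 + 7) - (2 + 5 + 8) = -3
```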

MTCNN

Review: Review-MTCNN
open-face/mtcnn (a collection of implementations)
MTCNN_face_detection_alignment (the authors' original code) | Caffe & Matlab
mtcnn-pytorch Python & PyTorch | inference only
facenet-pytorch Python & PyTorch | Advanced
mxnet_mtcnn_face_detection Python & mxnet
https://github.com/davidsandberg/facenet/blob/master/src/align/detect_face.py (the widely used TensorFlow version)

Face alignment

Regression-based

Template fitting

OHEM

Paper: Training Region-based Object Detectors with Online Hard Example Mining
“hard-mining”, “hard examples”, … - Does “hard” mean anything specific in statistics when not applied to problem difficulty?
Hard example mining (Hard Mining) in deep learning - Zhihu article.
OHEM, which deep learning cannot ignore: Online Hard Example Mining - Zhihu article.
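The core idea of OHEM in a few lines of PyTorch: compute the loss of every RoI without reduction, keep only the highest-loss (hard) ones, and back-propagate through those alone. This is a simplified sketch that assumes a plain cross-entropy classification loss; the paper also includes the box-regression loss and applies NMS to the RoIs before selection so that overlapping hard examples are not counted twice, both omitted here. `ohem_cross_entropy` and `keep_num` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def ohem_cross_entropy(logits, targets, keep_num):
    """Mean cross-entropy over the keep_num hardest RoIs only.

    logits: (N, num_classes) scores for N RoIs; targets: (N,) class indices.
    """
    per_roi_loss = F.cross_entropy(logits, targets, reduction="none")  # shape (N,)
    keep_num = min(keep_num, per_roi_loss.numel())
    hard_losses, _ = per_roi_loss.topk(keep_num)  # largest losses = hard examples
    # gradients flow only through the selected RoIs; easy ones contribute nothing
    return hard_losses.mean()


logits = torch.tensor([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]], requires_grad=True)
targets = torch.tensor([0, 0, 0])
loss = ohem_cross_entropy(logits, targets, keep_num=2)
loss.backward()
print(logits.grad[0])  # all zeros: the easy, confidently correct RoI is ignored
```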

Tools & Resources

The most comprehensive introduction to neural network diagram drawing tools, bar none! (in Chinese)

Netron

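https://github.com/lutzroeder/netron draws neural network architecture diagrams and accepts models saved in many different file formats. For PyTorch, for example, opening the saved file (.pkl) of our three-stage network in Netron is enough to draw its diagram.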

Collection of CV Colab Notebooks

Top Computer Vision Google Colab Notebooks

Here is a list of the top Google Colab notebooks that use computer vision to solve complex problems such as object detection and classification.