Object Detection and Text Detection Algorithms: Faster R-CNN, CTPN

Author: Cache_wood | Published: 2021-12-12 17:22

Faster R-CNN: Object Detection Algorithm

Towards Real-Time Object Detection with Region Proposal Networks

R-CNN: Regions with CNN features

  1. Input image
  2. Extract region proposals (~2k)
  3. Compute CNN features
  4. Classify regions

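IoU (Intersection over Union)

A metric for how accurately objects are detected on a given dataset.

Predicted region: bounding boxes

Ground-truth bounding boxes (boxes drawn by hand on the training images to mark the approximate extent of each object to be detected)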

IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}
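
The ratio above can be sketched in a few lines of Python. This is only an illustration: the function name `iou` and the corner format (x1, y1, x2, y2) are assumptions and do not come from the original notes.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For two 2x2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so the IoU is 1/7.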

NMS (Non-Maximum Suppression)
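
The notes only name NMS, so what follows is a minimal sketch of the usual greedy variant: visit boxes in descending score order and keep a box only if it does not overlap an already kept box too strongly. The helper `_iou`, the function `nms`, and the default threshold of 0.5 are illustrative assumptions.

```python
def _iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format (hypothetical helper)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if no already kept box overlaps it above the threshold.
        if all(_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In practice a vectorized library routine would normally be used; the loop here is only meant to make the suppression rule explicit.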

Fast R-CNN

Selective search

Anchor, sliding window, feature extraction

RPN Loss

Cls label: binary classification (object or no object), assigned from the IoU between the ground-truth (gt) bounding box and the anchor box

Loc label
t_x^* = (x^*-x_a)/w_a, \quad t_y^* = (y^*-y_a)/h_a,\\ t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)

t_x = (x-x_a)/w_a, \quad t_y = (y-y_a)/h_a,\\ t_w = \log(w/w_a), \quad t_h = \log(h/h_a)
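
A small sketch of this parameterization, assuming boxes are stored as (center x, center y, width, height). The names `encode` and `decode` are illustrative; `decode` simply inverts the four formulas to recover a box from predicted offsets.

```python
import math


def encode(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h) of a box relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Inverse of encode: recover (x, y, w, h) from offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))
```

Applying `encode` to a ground-truth box gives the starred targets; applying it to a predicted box gives the unstarred values that the regression loss compares against them.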

Cls loss

Cross entropy

Loc Loss
z_i = 0.5(x_i-y_i)^2/\beta, \quad \text{if } |x_i-y_i|<\beta\\ z_i = |x_i-y_i|-0.5\beta, \quad \text{otherwise}
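
The piecewise definition (smooth L1) translates directly into code. This is an element-wise sketch; the function name `smooth_l1` and the default `beta=1.0` are assumptions made for illustration.

```python
def smooth_l1(x, y, beta=1.0):
    """Element-wise smooth L1 loss between a prediction x and a target y."""
    d = abs(x - y)
    if d < beta:
        # Quadratic region near zero: the gradient shrinks smoothly.
        return 0.5 * d * d / beta
    # Linear region: large errors are penalized less harshly than with L2.
    return d - 0.5 * beta
```

Both branches give 0.5 * beta at |x - y| = beta, so the loss is continuous where the two pieces meet.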

RoI Head (Region of Interest)

Mask R-CNN

L = L_{cls}+L_{box}+L_{mask}

To this we apply a per-pixel sigmoid, and define L_{mask} as the average binary cross-entropy loss. For an RoI associated with ground-truth class k, L_{mask} is only defined on the k-th mask (other mask outputs do not contribute to the loss).
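
A pure-Python sketch of L_{mask} for a single RoI, assuming the mask branch outputs K grids of raw logits (one m x m grid per class) and the ground-truth mask is an m x m grid of 0/1 values. The name `mask_loss` and the nested-list layout are assumptions made for illustration.

```python
import math


def mask_loss(mask_logits, gt_mask, k):
    """Average per-pixel binary cross-entropy on the k-th mask only.

    mask_logits: list of K grids (m x m) of raw logits, one grid per class.
    gt_mask:     m x m grid of 0/1 targets for the RoI.
    k:           index of the ground-truth class of the RoI.
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(mask_logits[k], gt_mask):
        for z, t in zip(logit_row, target_row):
            # Numerically stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count
```

Because only `mask_logits[k]` is read, the masks predicted for the other classes cannot influence the loss.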

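RoI Align: coordinates are not rounded to the feature-map grid; floating-point values are kept, and each small bin is further subdivided into sampling points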
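
A rough single-channel sketch of the RoI Align idea: keep the RoI in floating point, split it into bins, place a few sampling points inside each bin, read each point by bilinear interpolation, and average. The names `bilinear` and `roi_align`, the 2x2 output size, the 2x2 samples per bin, and the convention that integer coordinates are pixel centers are all assumptions; real implementations differ in these details.

```python
import math


def bilinear(feat, y, x):
    """Sample a 2-D feature map (list of rows) at floating-point (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, out_size=2, samples=2):
    """Pool roi = (x1, y1, x2, y2), in feature-map units, to out_size x out_size."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size  # no rounding: bin sizes stay fractional
    bin_h = (y2 - y1) / out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            acc = 0.0
            # Regularly spaced sampling points inside the bin.
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (by + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (bx + (sx + 0.5) / samples) * bin_w
                    acc += bilinear(feat, yy, xx)
            row.append(acc / (samples * samples))
        out.append(row)
    return out
```

On a feature map whose value equals its column index, an RoI spanning x from 0.5 to 2.5 pools to 1.0 and 2.0 in its two columns, which shows that the fractional offsets are preserved rather than snapped to the grid.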

CTPN: Text Detection Algorithm

Detecting Text in Natural Image with Connectionist Text Proposal Network

  • Detecting text in fine-scale proposals
  • Recurrent connectionist text proposals
  • Side-refinement

v_c = (c_y-c_y^a)/h^a\\ v_c^* = (c_y^*-c_y^a)/h^a\\ v_h = \log(h/h^a)\\ v_h^* = \log(h^*/h^a)
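
These vertical coordinates mirror the anchor parameterization used for box regression, restricted to the y-axis because each fine-scale proposal has a fixed width. A minimal sketch follows; the function names `encode_vertical` and `decode_vertical` are hypothetical.

```python
import math


def encode_vertical(c_y, h, c_y_a, h_a):
    """Relative vertical center and height (v_c, v_h) of a box w.r.t. an anchor."""
    return ((c_y - c_y_a) / h_a, math.log(h / h_a))


def decode_vertical(v_c, v_h, c_y_a, h_a):
    """Inverse mapping: recover the absolute center y and height."""
    return (c_y_a + v_c * h_a, h_a * math.exp(v_h))
```

Encoding the ground-truth box gives (v_c^*, v_h^*), and decoding the network outputs gives the predicted center and height of each proposal.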

Text line construction

o^* = (x^*_{side} -c^a_x)/w^a

Code

bounding box

CRNN: Text Recognition Algorithm

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

  • CRNN
  • Code
  • CTC
  • lexicon-based
  • lexicon-free

Feature sequence: each feature vector corresponds to a receptive field on the input image

CRNN and CTC
\pi = --hh-e-l-ll-oo--\\ B(\pi) = hello\\ p(l|y) = \sum_{\pi:B(\pi)=l} p(\pi|y), \quad p('hello'|y) = \sum_{\pi:B(\pi)='hello'} p(\pi|y)
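
The mapping B merges consecutive repeated symbols and then drops blanks. A short sketch, assuming '-' is the blank symbol as in the example path; the function name `collapse` is illustrative.

```python
def collapse(path, blank="-"):
    """CTC mapping B: merge consecutive repeats, then remove blank symbols."""
    out = []
    prev = None
    for ch in path:
        # Emit a symbol only when it starts a new run and is not the blank.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

The blank between the two runs of 'l' in the example path is what allows the double letter in "hello" to survive the merge.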

CTC Theory

p(l|x) = \sum_{\pi \in B^{-1}(l)} p(\pi|x)\\ h(x) = \arg\max_{l \in L^{\leq T}} p(l|x)\\ O^{ML}(S,N_w) = -\sum_{(x,z)\in S} \ln p(z|x) = -\sum_{(x,z) \in S} \ln \sum_{\pi \in B^{-1}(z)} p(\pi|x)
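
For very short sequences, p(l|x) can be evaluated exactly as written by enumerating every path and summing those that collapse to l. The sketch below is exponential in T and only meant to make the definition concrete; the names `collapse`, `label_prob_bruteforce`, and `ctc_loss`, and the per-time-step dictionaries `y[t]`, are assumptions.

```python
import math
from itertools import product


def collapse(path, blank="-"):
    """CTC mapping B: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


def label_prob_bruteforce(y, alphabet, label, blank="-"):
    """p(label | x) by summing p(path | x) over all paths with B(path) == label.

    y[t][c] is the network's probability of symbol c at time step t.
    """
    total = 0.0
    for path in product(alphabet, repeat=len(y)):
        if collapse(path, blank) == label:
            p = 1.0
            for t, ch in enumerate(path):
                p *= y[t][ch]  # outputs are conditionally independent per step
            total += p
    return total


def ctc_loss(y, alphabet, label, blank="-"):
    """Negative log-likelihood of one (x, z) pair, i.e. one term of O^{ML}."""
    return -math.log(label_prob_bruteforce(y, alphabet, label, blank))
```

With two time steps and uniform probabilities over the symbols '-' and 'a', the paths "a-", "-a", and "aa" all collapse to "a", giving p = 0.75.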

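So that every path has a unique, valid representation in the graph, node transitions obey the following constraints:

  1. Transitions only move forward in time and never back up the label sequence (stay on the same row or move down to the right); no other direction is allowed
  2. Identical characters must be separated by at least one blank
  3. Non-blank characters cannot be skipped
  4. A path must start at one of the first two symbols
  5. A path must end at one of the last two symbols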

forward-backward

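Define the forward probability \alpha_t(s) as the total probability of all prefix sub-paths that pass through node s at time t. For example, with _ denoting the blank:
\alpha_3(4) = p(\_ap)+p(aap)+p(a\_p)+p(app)

  • Case 1: the s-th symbol is the blank
    \alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1))·y^t_{seq(s)}

  • Case 2: the s-th symbol equals the (s-2)-th symbol
    \alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1))·y^t_{seq(s)}

  • Case 3: neither case 1 nor case 2
    \alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1)+\alpha_{t-1}(s-2))·y^t_{seq(s)}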
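
The three cases can be folded into one dynamic-programming loop over the blank-extended label sequence. This is a plain-probability sketch (no log-space rescaling, so it would underflow on long inputs) using 0-based indices. The function name `ctc_forward` and the per-time-step dictionaries `y[t]` are assumptions.

```python
def ctc_forward(y, label, blank="-"):
    """p(label | x) via the forward recursion; y[t][c] = prob of symbol c at step t."""
    # Extended sequence: a blank before, between, and after the label symbols.
    seq = [blank]
    for ch in label:
        seq += [ch, blank]
    T, S = len(y), len(seq)
    alpha = [[0.0] * S for _ in range(T)]
    # A path may start only at one of the first two symbols.
    alpha[0][0] = y[0][seq[0]]
    if S > 1:
        alpha[0][1] = y[0][seq[1]]
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]
            if s >= 1:
                total += alpha[t - 1][s - 1]
            # Case 3 only: skipping from s-2 is allowed when seq[s] is not a
            # blank and differs from seq[s-2].
            if s >= 2 and seq[s] != blank and seq[s] != seq[s - 2]:
                total += alpha[t - 1][s - 2]
            alpha[t][s] = total * y[t][seq[s]]
    # A path may end only at one of the last two symbols.
    p = alpha[T - 1][S - 1]
    if S > 1:
        p += alpha[T - 1][S - 2]
    return p
```

For three time steps with uniform probabilities over '-' and 'a', the label "aa" has only the path "a-a", so the function returns 0.125. The skip transition is correctly blocked between the two identical characters.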

