Implementing YOLO in Less Than 30 Lines of Python Code!

Author: 14e61d025165 | Published: 2019-04-04 14:36

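Original article:

https://towardsdatascience.com/you-only-look-once-yolo-implementing-yolo-in-less-than-30-lines-of-python-code-97fb9835bfd2

"You Only Look Once" (YOLO) is a real-time object detection algorithm that avoids spending too much time generating region proposals. Rather than localizing objects perfectly, it prioritizes speed and recognition.

Architectures such as Faster R-CNN are accurate, but the models themselves are quite complex, with multiple outputs that are each a potential source of error. Even once trained, they are still not fast enough to run in real time.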

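Imagine a self-driving car looking at this street. It is critical for a self-driving car to be able to detect the location of the objects around it, such as pedestrians, cars, and traffic lights. Above all, this detection must happen in near real time so that the car can drive safely down the street. The car does not always need to know what all of these objects are; it just needs to know not to hit them. It does, however, need to recognize traffic lights, bicycles, and pedestrians so that it can follow the rules of the road correctly. In the image below, I used the YOLO algorithm to locate and classify different objects, with a bounding box locating each object and a corresponding class label.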

YOLO in action

The obvious next question is: how does YOLO work?

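Suppose we have a CNN that has been trained to recognize several classes, including traffic lights, cars, people, and trucks. We give it two types of anchor boxes, a tall one and a wide one, so that it can handle overlapping objects of different shapes. Once the CNN has been trained, we can detect objects in images by feeding it new test images.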

Setting up the neural network

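What is an anchor box? YOLO works well for multiple objects when each object is associated with one grid cell. But when objects overlap, a single grid cell may actually contain the center points of two different objects. We can use anchor boxes to allow one grid cell to detect multiple objects.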

Anchor boxes in action

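In the image above, a person and a car overlap, so part of the car is occluded. We can also see that the centers of both bounding boxes, the car's and the pedestrian's, fall in the same grid cell. Since the output vector of each grid cell can only have one class, it would be forced to pick either the car or the person. But by defining anchor boxes, we can create a longer grid cell vector and associate multiple classes with each grid cell.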

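Anchor boxes have a defined aspect ratio, and they try to detect objects that fit well into a box with that ratio. For example, since we are detecting wide cars and standing people, we define one anchor box roughly the shape of a car, wider than it is tall. We define another anchor box that can fit a standing person, taller than it is wide.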
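To make the aspect-ratio idea concrete, here is a minimal sketch of how an object's box might be matched to the anchor whose shape fits it best. The anchor sizes, the `shape_iou` and `best_anchor` helpers, and the example boxes are all assumptions made up for illustration; the article does not specify how the matching is done, and comparing shapes by overlap is just one common choice.

```python
# Hypothetical anchor shapes as (width, height), in arbitrary units.
ANCHORS = {"wide": (2.0, 1.0), "tall": (1.0, 2.0)}


def shape_iou(box_a, box_b):
    """Overlap ratio of two (width, height) shapes that share the same center."""
    inter = min(box_a[0], box_b[0]) * min(box_a[1], box_b[1])
    union = box_a[0] * box_a[1] + box_b[0] * box_b[1] - inter
    return inter / union


def best_anchor(box_wh):
    """Return the name of the anchor whose shape overlaps the box the most."""
    return max(ANCHORS, key=lambda name: shape_iou(ANCHORS[name], box_wh))


car = (1.8, 0.9)     # wider than it is tall
person = (0.5, 1.7)  # taller than it is wide
print(best_anchor(car), best_anchor(person))
```

With these made-up numbers the car lands on the wide anchor and the person on the tall one, which is the behavior the paragraph above describes.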

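The test image is first broken up into a grid, and the network then produces output vectors, one per grid cell. These vectors tell us whether a cell contains an object, what class the object is, and the object's bounding box. Since we are using two anchor boxes, we get two predicted anchor boxes for each grid cell. In practice, most of the predicted anchor boxes have a very low PC value (the probability that an object is present).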
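As a rough illustration of what such a per-cell vector could look like, the sketch below assumes a layout of `[pc, bx, by, bw, bh, class scores...]` repeated once per anchor box. That layout, the `decode_cell` helper, and the sample numbers are assumptions chosen for clarity, not something the article specifies.

```python
CLASSES = ["traffic light", "car", "person", "truck"]
NUM_ANCHORS = 2
VALUES_PER_ANCHOR = 5 + len(CLASSES)  # pc, bx, by, bw, bh, then one score per class


def decode_cell(cell_vector):
    """Split one grid cell's flat output vector into per-anchor predictions."""
    if len(cell_vector) != NUM_ANCHORS * VALUES_PER_ANCHOR:
        raise ValueError("unexpected cell vector length")
    predictions = []
    for a in range(NUM_ANCHORS):
        chunk = cell_vector[a * VALUES_PER_ANCHOR:(a + 1) * VALUES_PER_ANCHOR]
        pc, bx, by, bw, bh = chunk[:5]
        class_scores = chunk[5:]
        # Pick the class with the highest score for this anchor.
        best = max(range(len(CLASSES)), key=lambda i: class_scores[i])
        predictions.append({"pc": pc, "box": (bx, by, bw, bh), "label": CLASSES[best]})
    return predictions


# One cell holding a car (first anchor) and a person (second anchor).
cell = [0.9, 0.5, 0.5, 0.8, 0.4, 0, 1, 0, 0,
        0.8, 0.5, 0.5, 0.2, 0.7, 0, 0, 1, 0]
for prediction in decode_cell(cell):
    print(prediction)
```

With four classes and two anchors, each cell carries 18 values under this assumed layout, which is how a single cell can report both the car and the person.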

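After producing these output vectors, we use non-maximum suppression to eliminate unlikely bounding boxes. For each class, non-maximum suppression removes the bounding boxes whose PC value is below a given threshold.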

What is non-maximum suppression (NMS)?

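YOLO uses non-maximum suppression (NMS) to keep only the best bounding box. The first step in NMS is to remove all predicted bounding boxes whose detection probability is less than a given NMS threshold. In the code below, we set this NMS threshold to 0.6. This means that all predicted bounding boxes with a detection probability below 0.6 will be removed.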
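This first step can be sketched in a few lines. The snippet is an illustration only and is not the article's own code: the `filter_by_score` helper and the dictionary layout of a detection are assumptions, while the 0.6 threshold comes from the text.

```python
NMS_THRESHOLD = 0.6  # value quoted in the text


def filter_by_score(detections, threshold=NMS_THRESHOLD):
    """Keep only detections whose probability is not below the threshold."""
    return [d for d in detections if d["score"] >= threshold]


detections = [
    {"label": "car", "score": 0.92},
    {"label": "person", "score": 0.35},
    {"label": "truck", "score": 0.60},
]
kept = filter_by_score(detections)
print([d["label"] for d in kept])  # ['car', 'truck']
```

A box scoring exactly 0.6 is kept here, since the text only removes boxes with a probability less than the threshold.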

What is the intersection over union (IOU) threshold?

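After removing all predicted bounding boxes with a low detection probability, the second step in NMS is to select the bounding box with the highest detection probability and eliminate all bounding boxes whose intersection over union (IOU) with it is higher than a given IOU threshold. In the code below, we set this IOU threshold to 0.4. This means that all predicted bounding boxes with an IOU greater than 0.4 relative to the best bounding box will be removed.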
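For reference, IOU is the area where two boxes overlap divided by the area they cover together. A minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates (a representation chosen here for illustration, not taken from the article):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
print(iou((0, 0, 2, 2), (5, 5, 6, 6)))  # no overlap
```

Identical boxes give an IOU of 1.0 and disjoint boxes give 0.0, so a threshold of 0.4 removes boxes that overlap the best box substantially.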

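It then selects the bounding box with the highest PC value and removes the bounding boxes that are too similar to it. This repeats until all non-maximum bounding boxes have been removed for every class. The end result looks like the image below, where we can see that YOLO has effectively detected many objects in the image, such as a car and a person.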
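Putting the two steps together, the whole per-class procedure could be sketched as follows. This is a plain-Python illustration under stated assumptions, not the article's implementation: the `non_max_suppression` function, the detection dictionaries with `id`, `label`, `score`, and `box` keys, and the corner-format boxes are all hypothetical. Only the thresholds 0.6 and 0.4 come from the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.4):
    """Per class: drop low scores, then keep the best box and drop its near-duplicates."""
    kept = []
    for label in sorted({d["label"] for d in detections}):
        candidates = sorted(
            (d for d in detections
             if d["label"] == label and d["score"] >= score_threshold),
            key=lambda d: d["score"],
            reverse=True,
        )
        while candidates:
            best = candidates.pop(0)  # highest remaining score for this class
            kept.append(best)
            # Remove boxes that overlap the chosen box by more than the threshold.
            candidates = [d for d in candidates
                          if iou(best["box"], d["box"]) <= iou_threshold]
    return kept


detections = [
    {"id": "A", "label": "car", "score": 0.9, "box": (0, 0, 10, 10)},
    {"id": "B", "label": "car", "score": 0.8, "box": (1, 1, 11, 11)},
    {"id": "C", "label": "car", "score": 0.7, "box": (20, 20, 30, 30)},
    {"id": "D", "label": "car", "score": 0.5, "box": (40, 40, 50, 50)},
    {"id": "P", "label": "person", "score": 0.85, "box": (1, 1, 11, 11)},
]
print([d["id"] for d in non_max_suppression(detections)])  # ['A', 'C', 'P']
```

Box B is suppressed because it overlaps the higher-scoring car A, D falls below the score threshold, and the person P survives even though it overlaps A because suppression runs separately for each class.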
