Image Recognition

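To our brains, visual recognition seems particularly easy. Humans effortlessly tell a lion from a jaguar, read a road sign, or recognize a face. For a computer, however, these are hard problems: they only look easy because our brains are very good at understanding images.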

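In the past few years, the field of machine learning has made tremendous progress on these hard problems. In particular, a kind of model called a deep convolutional neural network handles difficult visual recognition tasks well, matching or even exceeding human performance in some domains.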

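Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. They have released a succession of models, each improving on the last and each setting a new state-of-the-art result: QuocNet, AlexNet, Inception (GoogLeNet), BN-Inception-v2. Researchers both inside and outside Google have published papers describing all of these models, but the results are still hard to reproduce. We are now taking the next step by releasing code for running image recognition on our latest model, Inception-v3.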

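Inception-v3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, in which a model tries to classify entire images into 1000 classes, such as "zebra", "Dalmatian", and "dishwasher". For example, here are the results of AlexNet classifying some images: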

[Image: example AlexNet classification results]

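To compare models, we measure how often the correct answer is not among the model's 5 most likely predictions, called the "top-5 error rate". AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation data set; Inception (GoogLeNet), BN-Inception-v2, and Inception-v3 reached 6.67%, 4.9%, and 3.46%, respectively.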
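As a rough illustration of the metric, the following sketch computes a top-5 error rate in plain Python. The function name `top5_error_rate` and the toy scores are assumptions made up for this example; they are not ImageNet data or part of any library.

```python
def top5_error_rate(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    if not true_labels or len(score_rows) != len(true_labels):
        raise ValueError("need the same non-zero number of score rows and labels")
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        # Indices of the k largest scores, best first.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)


# Toy data: 3 examples, 6 classes (made-up scores for illustration).
toy_scores = [
    [0.50, 0.20, 0.12, 0.09, 0.06, 0.03],  # true class 0 is ranked first
    [0.01, 0.04, 0.10, 0.15, 0.20, 0.50],  # true class 0 is ranked last of 6
    [0.10, 0.30, 0.20, 0.17, 0.15, 0.08],  # true class 3 is ranked third
]
toy_labels = [0, 0, 3]
print(top5_error_rate(toy_scores, toy_labels))
```

Only the second example misses, because its true class is ranked sixth, so one of the three toy examples counts as a top-5 error.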

Environment

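1. Python 3.6, TensorFlow, and the necessary libraries such as NumPy (official installation steps: https://www.tensorflow.org/install). The code in this post uses the TensorFlow 1.x API.

2. Download the pretrained Inception-v3 model from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz

3. After downloading, extract the archive to a directory, for example: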

[Image: contents of the extracted model directory]
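Steps 2 and 3 can also be scripted. The sketch below uses only the Python standard library; the helper names `download` and `extract` are assumptions introduced for this example, and the archive is assumed to be an ordinary gzip-compressed tar file.

```python
import tarfile
import urllib.request
from pathlib import Path

MODEL_URL = "http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz"


def download(url, dest_dir):
    """Download url into dest_dir (skipped if already present); return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract(archive, dest_dir):
    """Unpack a .tgz archive into dest_dir and return the names of its members."""
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(str(dest_dir))
    return names
```

With the directory used in the rest of this post, a call would look like `extract(download(MODEL_URL, "D:/tf/model"), "D:/tf/model")`.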

Coding

Mapping the output to class names

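Given: the result of recognition with Inception-v3 is a list. Its useful information is how closely the image resembles each class of images seen in training, together with that class's number (each image class is identified by a class number).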

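1. The mapping between the class number (node_id) and the ID string n******** (uid) is in the file imagenet_2012_challenge_label_map_proto.pbtxt. (This is probably because a model can recognize only a subset of all classes: a uid can identify any class, while Inception-v3 recognizes only 1000 of them.)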

[Image: excerpt of imagenet_2012_challenge_label_map_proto.pbtxt]

2. The mapping between the ID string n******** (uid) and the class name is in the file imagenet_synset_to_human_label_map.txt.

[Image: excerpt of imagenet_synset_to_human_label_map.txt]
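To make the two lookups concrete, here is a minimal sketch that parses small in-memory samples laid out like the two files described above and chains node_id to uid to name. The sample entries only imitate the file formats and should be treated as illustrative assumptions, and the regex-based helpers `parse_label_map` and `parse_synsets` are hypothetical names for this example.

```python
import re

# Illustrative samples in the two formats (values assumed for the example).
LABEL_MAP_SAMPLE = '''entry {
  target_class: 449
  target_class_string: "n01440764"
}
entry {
  target_class: 450
  target_class_string: "n01443537"
}
'''
SYNSET_SAMPLE = "n01440764\ttench, Tinca tinca\nn01443537\tgoldfish, Carassius auratus\n"

# One match per entry: the class number followed by its quoted uid.
ENTRY_RE = re.compile(r'target_class:\s*(\d+)\s*target_class_string:\s*"(n\d+)"')


def parse_label_map(text):
    """Return {node_id: uid} from label-map text."""
    return {int(node_id): uid for node_id, uid in ENTRY_RE.findall(text)}


def parse_synsets(text):
    """Return {uid: human-readable name} from tab-separated lines."""
    pairs = (line.split("\t", 1) for line in text.splitlines() if "\t" in line)
    return {uid: name for uid, name in pairs}


sample_node_id_to_uid = parse_label_map(LABEL_MAP_SAMPLE)
sample_uid_to_human = parse_synsets(SYNSET_SAMPLE)
sample_node_id_to_name = {
    node_id: sample_uid_to_human[uid] for node_id, uid in sample_node_id_to_uid.items()
}
print(sample_node_id_to_name)
```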

So, to get the English name from the return value, we need the following processing:

#############################################################################
# Date: 2019/3/24
# Compiled by: 章伟杰
# Purpose: convert node_id to uid, then uid to human (the English name), and
#   save the resulting dict to node2human.txt so the mapping is not recomputed.
#############################################################################
label_lookup_path = 'D:/tf/model/imagenet_2012_challenge_label_map_proto.pbtxt'
uid_lookup_path = 'D:/tf/model/imagenet_synset_to_human_label_map.txt'

# Step 1: map class number 1-1000 (node_id) to ID string n******** (uid)
with open(label_lookup_path, 'r') as f:
    proto_as_ascii = f.readlines()
node_id_to_uid = {}
for line in proto_as_ascii:
    if line.startswith('  target_class:'):
        # Class number 1-1000
        target_class = int(line.split(': ')[1])
    if line.startswith('  target_class_string:'):
        # ID string n********
        target_class_string = line.split(': ')[1]
        # Drop the line ending and the surrounding quotes, then store the pair
        node_id_to_uid[target_class] = target_class_string.strip().strip('"')

# Step 2: map ID string n******** (uid) to class name (human, in English)
with open(uid_lookup_path, 'r') as f:
    proto_as_ascii_lines = f.readlines()
uid_to_human = {}
# Read the data line by line
for line in proto_as_ascii_lines:
    # Remove the newline
    line = line.strip('\n')
    # Split on the tab character
    parsed_items = line.split('\t')
    # ID string n********
    uid = parsed_items[0]
    # Class name
    human_string = parsed_items[1]
    # Store the uid -> class name pair
    uid_to_human[uid] = human_string

# Step 3: map class number 1-1000 directly to class name
node_id_to_name = {}
for key, val in node_id_to_uid.items():
    # Look up the class name for this uid
    name = uid_to_human[val]
    node_id_to_name[key] = name
print(node_id_to_name)
# Save the dict as text so later runs can load it instead of rebuilding it
with open('node2human.txt', 'w') as f:
    f.write(str(node_id_to_name))

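We now have a direct mapping from class number to class name, which looks roughly like the following. The benefit of building the mapping up front is that, for Inception-v3, we do not have to recompute it on every run.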

[Image: the resulting mapping from class number to class name]
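The compute-once idea can be wrapped in a small helper. This is only a sketch: `load_or_build_mapping` and its `build_fn` parameter are hypothetical names, and it assumes the mapping is saved as a Python dict literal (as `str(dict)` produces) and read back with `ast.literal_eval`, which accepts only literals and does not execute code the way `eval` does.

```python
import ast
import os


def load_or_build_mapping(cache_path, build_fn):
    """Return the dict cached at cache_path, or build it, save it, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "r") as f:
            # Parse the saved dict literal without evaluating arbitrary code.
            return ast.literal_eval(f.read())
    mapping = build_fn()
    with open(cache_path, "w") as f:
        f.write(str(mapping))
    return mapping
```

The first call runs `build_fn` and writes the file; later calls with the same path only read the file.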

Now for the real coding

Here we use an image that ships with Inception-v3:

cropped_panda.jpg

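import ast
import os

# Suppress TensorFlow's C++ log messages (set before importing TensorFlow)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import numpy as np
import tensorflow as tf

# Load the node_id -> name mapping computed earlier.
# ast.literal_eval parses the saved dict literal without executing code.
with open('node2human.txt', 'r') as f:
    node_id_to_name = ast.literal_eval(f.read())

# Load Google's pretrained model into the default graph (TensorFlow 1.x API)
with tf.gfile.GFile('D:/tf/model/classify_image_graph_def.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    # The image is a JPEG; feed its raw bytes to the graph's decode node
    with tf.gfile.GFile('D:/tf/model/cropped_panda.jpg', 'rb') as f:
        image_data = f.read()
    predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)  # flatten the result to a 1-D array

    # Indices of the 5 highest scores, highest first
    top_k = predictions.argsort()[-5:][::-1]
    for node_id in top_k:
        # Class name (placeholder if the id has no entry in the mapping)
        human_string = node_id_to_name.get(node_id, 'unknown')
        # Confidence score for this class
        score = predictions[node_id]
        print('%s (score = %.5f)' % (human_string, score))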

We get the correct result.
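The selection step `predictions.argsort()[-5:][::-1]` sorts the indices by ascending score, keeps the last five, and reverses them so the best comes first. A dependency-free sketch of the same selection is shown below, using `heapq.nlargest` from the standard library in place of NumPy. The helper names and the toy scores and names are assumptions for illustration, not output from the model.

```python
import heapq


def top_k(scores, k=5):
    """Return (index, score) pairs for the k highest scores, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(i, scores[i]) for i in best]


def format_predictions(scores, node_id_to_name, k=5):
    """Format the top-k classes in the same style as the script above."""
    lines = []
    for node_id, score in top_k(scores, k):
        # Fall back to a placeholder for ids missing from the mapping.
        name = node_id_to_name.get(node_id, "unknown")
        lines.append("%s (score = %.5f)" % (name, score))
    return lines


# Made-up scores and names for illustration only.
toy_scores = [0.02, 0.70, 0.05, 0.20, 0.03]
toy_names = {1: "giant panda", 3: "lesser panda"}
for line in format_predictions(toy_scores, toy_names, k=2):
    print(line)
```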