Common datasets: introduction and format conversion

Author: 晓智AI | Published 2018-10-09 21:04


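Background

This post summarizes the datasets commonly used in deep learning.

Semantic segmentation datasets

1. COCO dataset

COCO (Common Objects in Context) is a large-scale dataset for image recognition, segmentation, and captioning. It is used in several competitions; the one most relevant here is the detection track, because part of it is dedicated to the segmentation problem.

COCO2014 category summary

COCO object detection annotations are stored as JSON. The content is essentially a dictionary whose information is organized as nested dictionaries and lists.
Mapping between COCO ids and class names: there are 80 classes in total, but the ids run up to 90, so the id range has gaps.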

    person  # 1
    vehicle  #8
        {bicycle
         car
         motorcycle
         airplane
         bus
         train
         truck
         boat}
    outdoor  #5
        {traffic light
        fire hydrant
        stop sign
        parking meter
        bench}
    animal  #10
        {bird
        cat
        dog
        horse
        sheep
        cow
        elephant
        bear
        zebra
        giraffe}
    accessory  #5
        {backpack
        umbrella
        handbag
        tie
        suitcase
        }
    sports  #10
        {frisbee
        skis
        snowboard
        sports ball
        kite
        baseball bat
        baseball glove
        skateboard
        surfboard
        tennis racket
        }
    kitchen  #7
        {bottle
        wine glass
        cup
        fork
        knife
        spoon
        bowl
        }
    food  #10
        {banana
        apple
        sandwich
        orange
        broccoli
        carrot
        hot dog
        pizza
        donut
        cake
        }
    furniture  #6
        {chair
        couch
        potted plant
        bed
        dining table
        toilet
        }
    electronic  #6
        {tv
        laptop
        mouse
        remote
        keyboard
        cell phone
        }
    appliance  #5
        {microwave
        oven
        toaster
        sink
        refrigerator
        }
    indoor  #7
        {book
        clock
        vase
        scissors
        teddy bear
        hair drier
        toothbrush
        }

coco_id_name_map={1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane',
                   6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light',
                   11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter', 15: 'bench',
                   16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow',
                   22: 'elephant', 23: 'bear', 24: 'zebra', 25: 'giraffe', 27: 'backpack',
                   28: 'umbrella', 31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee',
                   35: 'skis', 36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat',
                   40: 'baseball glove', 41: 'skateboard', 42: 'surfboard', 43: 'tennis racket',
                   44: 'bottle', 46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon',
                   51: 'bowl', 52: 'banana', 53: 'apple', 54: 'sandwich', 55: 'orange',
                   56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza', 60: 'donut',
                   61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 67: 'dining table',
                   70: 'toilet', 72: 'tv', 73: 'laptop', 74: 'mouse', 75: 'remote', 76: 'keyboard',
                   77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster', 81: 'sink',
                   82: 'refrigerator', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors',
                   88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush'}
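
Because the ids are sparse (80 classes spread over 1 to 90), training code often remaps them to contiguous indices. The following is a minimal sketch of that remapping, assuming a small hypothetical subset of the map above; `build_contiguous_maps` is an illustrative helper name and not part of pycocotools.

```python
def build_contiguous_maps(id_name_map):
    """Map sparse category ids to 0..N-1 (ascending id order) and back."""
    sorted_ids = sorted(id_name_map)
    id_to_index = {cat_id: idx for idx, cat_id in enumerate(sorted_ids)}
    index_to_id = {idx: cat_id for cat_id, idx in id_to_index.items()}
    return id_to_index, index_to_id


# Hypothetical subset of coco_id_name_map, chosen to include a gap (id 12 is unused).
subset = {10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter'}
id_to_index, index_to_id = build_contiguous_maps(subset)
print(id_to_index)
```

Keeping the reverse map around makes it possible to write predictions back with the original COCO ids.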

COCO2017 category summary

It contains 80 object categories plus a background label (0): ['background = 0','person=1', 'bicycle=2', 'car=3', 'motorcycle=4', 'airplane=5', 'bus=6', 'train=7', 'truck=8', 'boat=9', 'traffic light=10', 'fire hydrant=11', 'stop sign=13', 'parking meter=14', 'bench=15', 'bird=16', 'cat=17', 'dog=18', 'horse=19', 'sheep=20', 'cow=21', 'elephant=22', 'bear=23', 'zebra=24', 'giraffe=25', 'backpack=27', 'umbrella=28', 'handbag=31', 'tie=32', 'suitcase=33', 'frisbee=34', 'skis=35', 'snowboard=36', 'sports ball=37', 'kite=38', 'baseball bat=39', 'baseball glove=40', 'skateboard=41', 'surfboard=42', 'tennis racket=43', 'bottle=44', 'wine glass=46', 'cup=47', 'fork=48', 'knife=49', 'spoon=50', 'bowl=51', 'banana=52', 'apple=53', 'sandwich=54', 'orange=55', 'broccoli=56', 'carrot=57', 'hot dog=58', 'pizza=59', 'donut=60', 'cake=61', 'chair=62', 'couch=63', 'potted plant=64', 'bed=65', 'dining table=67', 'toilet=70', 'tv=72', 'laptop=73', 'mouse=74', 'remote=75', 'keyboard=76', 'cell phone=77', 'microwave=78', 'oven=79', 'toaster=80', 'sink=81', 'refrigerator=82', 'book=84', 'clock=85', 'vase=86', 'scissors=87', 'teddy bear=88', 'hair drier=89', 'toothbrush=90'].

The 91 stuff categories, followed by a catch-all 'other' label: ['banner=92', 'blanket=93', 'branch=94', 'bridge=95', 'building-other=96', 'bush=97', 'cabinet=98', 'cage=99', 'cardboard=100', 'carpet=101', 'ceiling-other=102', 'ceiling-tile=103', 'cloth=104', 'clothes=105', 'clouds=106', 'counter=107', 'cupboard=108', 'curtain=109', 'desk-stuff=110', 'dirt=111', 'door-stuff=112', 'fence=113', 'floor-marble=114', 'floor-other=115', 'floor-stone=116', 'floor-tile=117', 'floor-wood=118', 'flower=119', 'fog=120', 'food-other=121', 'fruit=122', 'furniture-other=123', 'grass=124', 'gravel=125', 'ground-other=126', 'hill=127', 'house=128', 'leaves=129', 'light=130', 'mat=131', 'metal=132', 'mirror-stuff=133', 'moss=134', 'mountain=135', 'mud=136', 'napkin=137', 'net=138', 'paper=139', 'pavement=140', 'pillow=141', 'plant-other=142', 'plastic=143', 'platform=144', 'playingfield=145', 'railing=146', 'railroad=147', 'river=148', 'road=149', 'rock=150', 'roof=151', 'rug=152', 'salad=153', 'sand=154', 'sea=155', 'shelf=156', 'sky-other=157', 'skyscraper=158', 'snow=159', 'solid-other=160', 'stairs=161', 'stone=162', 'straw=163', 'structural-other=164', 'table=165', 'tent=166', 'textile-other=167', 'towel=168', 'tree=169', 'vegetable=170', 'wall-brick=171', 'wall-concrete=172', 'wall-other=173', 'wall-panel=174', 'wall-stone=175', 'wall-tile=176', 'wall-wood=177', 'water-other=178', 'waterdrops=179', 'window-blind=180', 'window-other=181', 'wood=182', 'other=183'].

The dataset provides 118,287 training images, 5,000 validation images, and more than 40,670 test images. Because of its scale it is now very widely used and matters a great deal to the field. The competition results have been announced at ECCV workshops together with the ImageNet results. Its main features are:
1) Object segmentation
2) Recognition in context
3) Superpixel stuff segmentation
4) 330K images (>200K labeled)
5) 1.5 million object instances
6) 80 object categories
7) 91 stuff categories
8) 5 captions per image
9) 250,000 people with keypoints

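COCO annotations cover not only category and location but also textual captions that describe each image. Its open release has driven major progress in image segmentation and semantic understanding over the past two or three years, and it has become close to a "standard" benchmark for evaluating image understanding algorithms. Note that the COCO API for semantic (stuff) segmentation must be downloaded from: https://github.com/nightrome/cocostuffapi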
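
To make the dictionary-and-list layout of the annotation files concrete, here is a minimal sketch that uses a tiny hand-written stand-in instead of a real file. The top-level keys `images`, `annotations`, and `categories` and the fields `image_id`, `category_id`, and `bbox` follow the COCO instances format; all values are made up for illustration.

```python
from collections import defaultdict

# Hand-written stand-in for a COCO instances JSON (values are illustrative only).
coco_like = {
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 100.0, 200.0]},
        {"id": 11, "image_id": 1, "category_id": 18, "bbox": [300.0, 50.0, 80.0, 60.0]},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
}

# Category id -> name lookup built from the "categories" list.
id_to_name = {c["id"]: c["name"] for c in coco_like["categories"]}

# Group annotations by image so each image's objects can be looked up directly.
anns_by_image = defaultdict(list)
for ann in coco_like["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

names_in_image_1 = [id_to_name[a["category_id"]] for a in anns_by_image[1]]
print(names_in_image_1)
```

With a real file, `coco_like` would come from `json.load` on the annotation JSON, and pycocotools builds similar indexes internally.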

Code: extract the COCO captions (each image has 5 caption sentences)

import os
from pycocotools.coco import COCO

dataDir = './coco2017'
dataType = 'val2017'  # or 'train2017'
# initialize the COCO API for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir, dataType)
coco_caps = COCO(annFile)

imgIdsall = coco_caps.getImgIds()
print(len(imgIdsall))

# one .txt file per image is written here, e.g. ./val2017/<image name>.txt
outDir = './' + dataType
os.makedirs(outDir, exist_ok=True)

for i in imgIdsall:
    img = coco_caps.loadImgs([i])[0]
    print(img)

    # file name without the extension
    name = os.path.splitext(img['file_name'])[0]
    path = os.path.join(outDir, name + '.txt')

    # load the caption annotations of this image and write one caption per line
    annIds = coco_caps.getAnnIds(imgIds=img['id'])
    anns = coco_caps.loadAnns(annIds)
    with open(path, 'w') as f:
        for ann in anns:
            caption = ann['caption'].strip()
            print(caption)
            f.write(caption + '\n')

Code: read file names from a given text file, then copy those files from a source directory into a target folder

# -*- coding: utf-8 -*-
import os
import shutil


def re_mycopyfile(srcfile, dstdir):
    # the 12 characters before the 4-character extension (the COCO image number)
    newname = srcfile[-16:-4]
    if not os.path.isfile(srcfile):
        print("%s not exist!" % srcfile)
        return
    if not os.path.exists(dstdir):
        os.makedirs(dstdir)  # create the target directory
    dstfile = os.path.join(dstdir, newname + '.txt')
    shutil.copyfile(srcfile, dstfile)  # copy the file
    print("copy %s -> %s" % (srcfile, dstfile))


if __name__ == '__main__':
    path1 = "/home/henry/Files/ICCV2019/cocostuffapi/PythonAPI/trainls.txt"  # list of files to copy
    path2 = "/home/henry/Files/ICCV2019/cocostuffapi/PythonAPI/train2017all/"  # source directory
    path3 = "/home/henry/Files/ICCV2019/cocostuffapi/PythonAPI/train2017/"  # target directory
    path4 = "/home/henry/Files/ICCV2019/cocostuffapi/PythonAPI/trainnew.txt"  # renamed file list

    # pass 1: copy every listed file into the target directory
    count = 0
    with open(path1, 'r') as f:
        for line in f:
            name = line.strip()
            if not name:
                continue
            count += 1
            print(count, name)
            re_mycopyfile(path2 + name, path3)

    # pass 2: write a new list in which each file name is replaced by a
    # zero-padded counter (the original computed the padding only once,
    # outside the loop, so the width drifted as the counter grew)
    name_long = 6
    count = 0
    with open(path1, 'r') as f, open(path4, 'a+') as fp:
        for line in f:
            count += 1
            out_words = line.strip().split('/')
            out_words[-1] = str(count).zfill(name_long - 1) + '.txt'
            fp.write("/".join(out_words) + "\n")

2. VOC2007 dataset

Category summary

    aeroplane
    bicycle
    bird
    boat
    bottle
    bus
    car
    cat
    chair
    cow
    diningtable
    dog
    horse
    motorbike
    person
    pottedplant
    sheep
    sofa
    train
    tvmonitor

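Converting the MSCOCO format to the VOC format
Reference: converting the COCO dataset to the VOC dataset format
First generate the COCO_train.json file; the category list can be edited to keep only the classes you need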

#-*- coding:utf-8-*-
import json
className = {  # COCO category id -> name
    1:'person',
    2:'bicycle',
    3:'car',
    4:'motorcycle',
    5:'airplane',
    6:'bus',
    7:'train',
    8:'truck',
    9:'boat',
    10:'traffic light',
    11:'fire hydrant',
    13:'stop sign',
    14:'parking meter',
    15:'bench',
    16:'bird',
    17:'cat',
    18:'dog',
    19:'horse',
    20:'sheep',
    21:'cow',
    22:'elephant',
    23:'bear',
    24:'zebra',
    25:'giraffe',
    27:'backpack',
    28:'umbrella',
    31:'handbag',
    32:'tie',
    33:'suitcase',
    34:'frisbee',
    35:'skis',
    36:'snowboard',
    37:'sports ball',
    38:'kite',
    39:'baseball bat',
    40:'baseball glove',
    41:'skateboard',
    42:'surfboard',
    43:'tennis racket',
    44:'bottle',
    46:'wine glass',
    47:'cup',
    48:'fork',
    49:'knife',
    50:'spoon',
    51:'bowl',
    52:'banana',
    53:'apple',
    54:'sandwich',
    55:'orange',
    56:'broccoli',
    57:'carrot',
    58:'hot dog',
    59:'pizza',
    60:'donut',
    61:'cake',
    62:'chair',
    63:'couch',
    64:'potted plant',
    65:'bed',
    67:'dining table',
    70:'toilet',
    72:'tv',
    73:'laptop',
    74:'mouse',
    75:'remote',
    76:'keyboard',
    77:'cell phone',
    78:'microwave',
    79:'oven',
    80:'toaster',
    81:'sink',
    82:'refrigerator',
    84:'book',
    85:'clock',
    86:'vase',
    87:'scissors',
    88:'teddy bear',
    89:'hair drier',
    90:'toothbrush',
}
classNum = [1,2,3,4,5,6,7,8,9,10,
11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,
71,72,73,74,75,76,77,78,79,80,
81,82,83,84,85,86,87,88,89,90]

cocojson = "/home/ouc/data1/liuhongzhi/AttnGAN/dataset/coco2014/annotations/instances_train2014.json"


def writeNum(records):
    # "w" instead of "a+": appending on a second run would leave invalid JSON
    with open("COCO_train.json", "w") as f:
        json.dump(records, f)


inputfile = []
cnt = 0
with open(cocojson, "r") as f:
    allData = json.load(f)
    data = allData["annotations"]
    print(data[1])
    print("read ready")
for i in data:
    # keep only the wanted ids that also have a class name
    if i['category_id'] in classNum and i['category_id'] in className:
        inner = {
            "filename": str(i["image_id"]).zfill(12),
            "name": className[i["category_id"]],
            "bndbox": i["bbox"]  # COCO order: [x, y, width, height]
        }
        inputfile.append(inner)
        cnt = cnt + 1
        if cnt % 10000 == 0:
            print("id : " + str(cnt))
writeNum(inputfile)
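
The `bndbox` field stored above keeps the COCO order `[x, y, width, height]`, while VOC expects corner coordinates. A minimal sketch of that conversion follows; `coco_bbox_to_voc` is a hypothetical helper name and the record values are made up.

```python
def coco_bbox_to_voc(bbox):
    """Convert COCO [x, y, width, height] to VOC (xmin, ymin, xmax, ymax) integers."""
    x, y, w, h = bbox
    return int(x), int(y), int(x + w), int(y + h)


# Illustrative record in the same shape as the entries written to COCO_train.json.
record = {"filename": "000000000009", "name": "bowl", "bndbox": [10.5, 20.25, 100.0, 50.5]}
voc_box = coco_bbox_to_voc(record["bndbox"])
print(voc_box)
```

Truncating with `int` matches what the XML generation step does with these values.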

Next, use the records of the selected categories to copy the required images into a target directory; these become the training images

# -*- coding: utf-8 -*-
# @Time    : 2018/03/09 10:46
# @Author  : SyGoing
# @Site    :
# @File    : getimagesbyID.py
# @Software: PyCharm
import json
import os
import cv2

with open("COCO_train.json", "r") as f:
    data = json.load(f)
    print("read ready")

# unique image file names that carry at least one selected annotation
nameStr = set()
for i in data:
    imgName = "COCO_train2014_" + str(i["filename"]) + ".jpg"
    nameStr.add(imgName)
print(len(nameStr))

path = "/home/ouc/data1/liuhongzhi/AttnGAN/dataset/coco2014/images/train2014/"
savePath = "/home/ouc/data1/liuhongzhi/yolo2-pytorch/datasets/COCO/VOC2007/JPEGImages/"
os.makedirs(savePath, exist_ok=True)

count = 0
for file in nameStr:
    img = cv2.imread(path + file)
    if img is None:
        # cv2.imread returns None for missing or unreadable files
        print("cannot read " + path + file)
        continue
    cv2.imwrite(savePath + file, img)
    count = count + 1
    print('num: ' + str(count) + '     ' + file)

Then, for the selected image IDs, generate the VOC-style XML files in the Annotations folder

#-*- coding:utf-8-*-

import xml.dom
import xml.dom.minidom
import os
# from PIL import Image
import cv2
import json

# XML file conventions


_IMAGE_PATH = '/home/ouc/data1/liuhongzhi/yolo2-pytorch/datasets/COCO/VOC2007/JPEGImages/'

_INDENT = ' ' * 4  # four spaces; '' * 4 would be an empty string
_NEW_LINE = '\n'
_FOLDER_NODE = 'COCO2014'
_ROOT_NODE = 'annotation'
_DATABASE_NAME = 'LOGODection'
_ANNOTATION = 'COCO2014'
_AUTHOR = 'SyGoing_CSDN'
_SEGMENTED = '0'
_DIFFICULT = '0'
_TRUNCATED = '0'
_POSE = 'Unspecified'

# _IMAGE_COPY_PATH= 'JPEGImages'
_ANNOTATION_SAVE_PATH = '/home/ouc/data1/liuhongzhi/yolo2-pytorch/datasets/COCO/VOC2007/Annotations/'


# _IMAGE_CHANNEL= 3

# wrap the creation of an element node that holds one text value
def createElementNode(doc, tag, attr):
    element_node = doc.createElement(tag)

    # create a text node
    text_node = doc.createTextNode(attr)

    # attach the text node as a child of the element node
    element_node.appendChild(text_node)

    return element_node


# wrap the creation and attachment of a child node
def createChildNode(doc, tag, attr, parent_node):
    child_node = createElementNode(doc, tag, attr)

    parent_node.appendChild(child_node)


# the object node needs special handling
def createObjectNode(doc, attrs):
    object_node = doc.createElement('object')

    midname = attrs['name']

    # Uncomment to relabel every non-person class as 'car';
    # with these lines commented out, all categories are kept.
    #if midname != 'person':
    #    midname = 'car'

    createChildNode(doc, 'name', midname,
                    object_node)

    createChildNode(doc, 'pose',
                    _POSE, object_node)

    createChildNode(doc, 'truncated',
                    _TRUNCATED,object_node)

    createChildNode(doc, 'difficult',
                    _DIFFICULT,object_node)

    bndbox_node = doc.createElement('bndbox')

    createChildNode(doc, 'xmin',str(int(attrs['bndbox'][0])),
                    bndbox_node)

    createChildNode(doc, 'ymin',str(int(attrs['bndbox'][1])),
                    bndbox_node)

    createChildNode(doc, 'xmax',str(int(attrs['bndbox'][0] + attrs['bndbox'][2])),
                    bndbox_node)

    createChildNode(doc, 'ymax',str(int(attrs['bndbox'][1] + attrs['bndbox'][3])),
                    bndbox_node)

    object_node.appendChild(bndbox_node)

    return object_node


# write the document to an XML file
def writeXMLFile(doc, filename):
    with open('tmp.xml', 'w') as tmpfile:
        doc.writexml(tmpfile, addindent=' ' * 4, newl='\n', encoding='utf-8')

    # drop the first line (the XML declaration added by default) and any blank lines
    with open('tmp.xml') as fin, open(filename, 'w') as fout:
        lines = fin.readlines()
        for line in lines[1:]:
            if line.split():
                fout.write(line)


if __name__ == "__main__":
    # read the image list
    img_path = "/home/ouc/data1/liuhongzhi/yolo2-pytorch/datasets/COCO/VOC2007/JPEGImages/"
    fileList = os.listdir(img_path)
    if len(fileList) == 0:
        raise SystemExit("no images found in " + img_path)

    with open("COCO_train.json", "r") as f:
        ann_data = json.load(f)

    if not os.path.exists(_ANNOTATION_SAVE_PATH):
        os.makedirs(_ANNOTATION_SAVE_PATH)

    for imageName in fileList:
        # file name without the extension; str.strip(".jpg") removes characters
        # from both ends rather than the suffix, so splitext is used instead
        saveName = os.path.splitext(imageName)[0]
        xml_file_name = os.path.join(_ANNOTATION_SAVE_PATH, saveName + '.xml')

        img = cv2.imread(os.path.join(img_path, imageName))
        if img is None:
            print("cannot read " + os.path.join(img_path, imageName))
            continue
        height, width, channel = img.shape
        print(saveName, height, width, channel)

        my_dom = xml.dom.getDOMImplementation()
        doc = my_dom.createDocument(None, _ROOT_NODE, None)

        # root node
        root_node = doc.documentElement

        # folder node
        createChildNode(doc, 'folder', _FOLDER_NODE, root_node)

        # filename node
        createChildNode(doc, 'filename', saveName + '.jpg', root_node)

        # source node and its children
        source_node = doc.createElement('source')
        createChildNode(doc, 'database', _DATABASE_NAME, source_node)
        createChildNode(doc, 'annotation', _ANNOTATION, source_node)
        createChildNode(doc, 'image', 'flickr', source_node)
        createChildNode(doc, 'flickrid', 'NULL', source_node)
        root_node.appendChild(source_node)

        # owner node and its children
        owner_node = doc.createElement('owner')
        createChildNode(doc, 'flickrid', 'NULL', owner_node)
        createChildNode(doc, 'name', _AUTHOR, owner_node)
        root_node.appendChild(owner_node)

        # size node
        size_node = doc.createElement('size')
        createChildNode(doc, 'width', str(width), size_node)
        createChildNode(doc, 'height', str(height), size_node)
        createChildNode(doc, 'depth', str(channel), size_node)
        root_node.appendChild(size_node)

        # segmented node
        createChildNode(doc, 'segmented', _SEGMENTED, root_node)

        # object nodes: one per annotation record that belongs to this image
        for ann in ann_data:
            imgName = "COCO_train2014_" + str(ann["filename"])
            if saveName == imgName:
                root_node.appendChild(createObjectNode(doc, ann))

        # write the XML file
        print(xml_file_name)
        writeXMLFile(doc, xml_file_name)
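
For comparison, the same kind of annotation file can be sketched with the standard library's `xml.etree.ElementTree`, which writes no XML declaration when serializing to a string, so there is no first line to strip. This is a reduced sketch rather than a replacement for the script above: `build_voc_xml` is a hypothetical helper, only a subset of the VOC fields is written, and the sample object is made up.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, depth, objects):
    """Build a reduced VOC-style annotation tree from COCO-style [x, y, w, h] boxes."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    for tag, value in (('width', width), ('height', height), ('depth', depth)):
        ET.SubElement(size, tag).text = str(value)
    for obj in objects:
        node = ET.SubElement(root, 'object')
        ET.SubElement(node, 'name').text = obj['name']
        ET.SubElement(node, 'difficult').text = '0'
        x, y, w, h = obj['bndbox']
        box = ET.SubElement(node, 'bndbox')
        corners = (('xmin', x), ('ymin', y), ('xmax', x + w), ('ymax', y + h))
        for tag, value in corners:
            ET.SubElement(box, tag).text = str(int(value))
    return root


# Illustrative input: one made-up object on a 640x480 RGB image.
sample_objects = [{'name': 'person', 'bndbox': [10.0, 20.0, 100.0, 200.0]}]
root = build_voc_xml('example.jpg', 640, 480, 3, sample_objects)
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```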

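Finally, generate train.txt, which lists the names of all training images. The directory path and the file extension must be removed from each entry so that only the image name remains.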

find ./JPEGImages -name '*.jpg'  > train.txt
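
The `find` command writes entries that still include the path and the `.jpg` suffix. A minimal sketch of producing the bare names directly is shown here; `write_image_list` is a hypothetical helper, and the demo runs on a temporary folder with empty placeholder files instead of the real JPEGImages directory.

```python
import os
import tempfile


def write_image_list(image_dir, list_path):
    """Write the extension-free names of all .jpg files in image_dir, one per line."""
    stems = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(image_dir)
        if name.lower().endswith('.jpg')
    )
    with open(list_path, 'w') as f:
        for stem in stems:
            f.write(stem + '\n')
    return stems


# Demo on a temporary folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = os.path.join(tmp, 'JPEGImages')
    os.makedirs(img_dir)
    for name in ('b.jpg', 'a.jpg', 'notes.txt'):
        open(os.path.join(img_dir, name), 'w').close()
    list_file = os.path.join(tmp, 'train.txt')
    write_image_list(img_dir, list_file)
    with open(list_file) as f:
        listed = f.read().split()
print(listed)
```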

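3. Cityscapes dataset

The Cityscapes dataset, driven mainly by Mercedes-Benz, provides image segmentation data for autonomous-driving scenes and is used to evaluate how well vision algorithms understand urban street scenes. It is also commonly used by image translation algorithms such as Pix2pix and CycleGAN.

Cityscapes contains street scenes from 50 European cities, covering different scenes, backgrounds, and seasons, annotated with the label ids 0 to 33: {'unlabeled'=0 , 'ego vehicle'=1 , 'rectification border'=2 , 'out of roi'=3 , 'static'=4 , 'dynamic'=5 , 'ground'=6 , 'road'=7 , 'sidewalk'=8 , 'parking'=9 , 'rail track'=10 , 'building'=11 , 'wall'=12 , 'fence'=13 , 'guard rail'=14 , 'bridge'=15 , 'tunnel'=16 , 'pole'=17 , 'polegroup'=18 , 'traffic light'=19 , 'traffic sign'=20 , 'vegetation'=21 , 'terrain'=22 , 'sky'=23 , 'person'=24 , 'rider'=25 , 'car'=26 , 'truck'=27 , 'bus'=28 , 'caravan'=29 , 'trailer'=30 , 'train'=31 , 'motorcycle'=32 , 'bicycle'=33 }. Only 19 of these classes are used in evaluation, so the labels are mapped to 19 classes for training, and the 19 classes must be mapped back to the original label ids before predictions are uploaded to the evaluation server. An account is required to download the data.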
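
The two mappings described above can be sketched in plain Python as follows. The id-to-train-id table is written from memory of the convention used in the official cityscapesScripts and should be treated as an assumption to check against that repository; the helper names and the value 255 for ignored pixels are likewise illustrative.

```python
# Assumed label id -> train id table (19 evaluated classes); verify before use.
ID_TO_TRAIN_ID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}
TRAIN_ID_TO_ID = {train_id: label_id for label_id, train_id in ID_TO_TRAIN_ID.items()}
IGNORE = 255  # train id given to every class that is not evaluated


def to_train_ids(label_map):
    """Map a 2D list of label ids to train ids for training."""
    return [[ID_TO_TRAIN_ID.get(v, IGNORE) for v in row] for row in label_map]


def to_label_ids(train_map):
    """Map a 2D list of train ids back to label ids for submission (unknown -> 0)."""
    return [[TRAIN_ID_TO_ID.get(v, 0) for v in row] for row in train_map]


label_map = [[7, 8, 0], [26, 33, 4]]   # road, sidewalk, unlabeled / car, bicycle, static
train_map = to_train_ids(label_map)
restored = to_label_ids(train_map)
print(train_map)
print(restored)
```

On real label images the same lookup is usually done with an array lookup table instead of nested lists.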

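Cityscapes has two evaluation settings, fine and coarse. The former provides 5,000 finely annotated images; the latter provides the same 5,000 finely annotated images plus 20,000 coarsely annotated ones. Performance is measured with the PASCAL VOC style intersection-over-union (IoU) score. The 5,000 finely annotated images are split into 2,975 for training, 500 for validation, and 1,525 for testing. The test annotations are not public, so predictions must be uploaded to the evaluation server to obtain the mIoU.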
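
As an illustration of the metric, the following minimal sketch computes mean IoU from flattened prediction and ground-truth label lists through a confusion matrix. It is not the official evaluation script; `mean_iou` and the `ignore_index` default of 255 are assumptions, and the sample labels are made up.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in pred or target; ignored pixels are skipped."""
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up flattened labels for two classes; the last pixel is ignored.
target = [0, 0, 1, 1, 255]
pred = [0, 1, 1, 1, 0]
miou = mean_iou(pred, target, num_classes=2)
print(miou)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.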
