Viewing the structure of a dataset in Python (implementing switch-case with a dict)

Author: 铁佛爷 | Published 2018-11-07 11:54

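Machine learning work often involves handling datasets stored as json, mat, h5, and other formats, each with its own label structure.
Knowing a dataset's structure, formats, and types makes the data much easier to process.
I wrote a fairly general-purpose program for this.
Here it is used to inspect the JSON annotations of the MS COCO dataset; data at the same nesting level is printed with the same indentation.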

# -*- coding: utf-8 -*-
"""
Created on Tue Nov  6 22:23:17 2018

@author: BigFly
"""
import json

def process_dict(obj,level):
    print("<dict>")
    for key in obj.keys():
        print("  "*level, key, end=":   ")  # print the key, then its value on the same line
        process(obj[key],level+1)
        
def process_list(obj,level):
    print("<list>"," len=",len(obj))
    samplenum=1 # number of items to show for each list
    for idx in range(min(samplenum,len(obj))):
        print("  "*level, "item",idx, end=":   ")
        process(obj[idx], level+1)
    if len(obj)>samplenum:
        print("  "*level, "item ...")
        
def process_str(obj,level):
    print("<str>",obj)
    
def process_num(obj,level):
    print("<num>",obj)
    
switch={dict  :  process_dict,   # type -> handler; this dict plays the role of switch-case
        list  :  process_list,
        str   :  process_str,
        int   :  process_num,
        float :  process_num }

def process(obj,level=0):
    obj_typ=type(obj)
    handler=switch.get(obj_typ)  # None if this type has no entry in the table
    if handler is None:
        print("ERROR: NO ", obj_typ)
    else:
        handler(obj,level+1)


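# Pick one annotation file; the second assignment used to silently override the first.
# The output below comes from the val2017 file.
path="E:\\dataset\\MSCOCO\\annotations_trainval2017\\annotations\\instances_val2017.json"
#path="E:\\dataset\\MSCOCO\\annotations_trainval2017\\annotations\\instances_train2017.json"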

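with open(path, encoding="utf-8") as f:  # the with-block closes the file afterwards
    jsonstr=f.read()  # read the whole file; readline() only works because COCO stores the JSON on one line
print("jsonstr",type(jsonstr),len(jsonstr))
annotations=json.loads(jsonstr)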

# inspect the structure of annotations
process(annotations) #['licenses', 'categories', 'annotations', 'info', 'images']

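Five types are handled here (dict, list, str, int, float). To handle another type, add an entry to the table following the same pattern.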
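For example, JSON true/false load as Python `bool` and null loads as `None`, and neither is in the table. The lookup uses the exact `type(obj)`, so even though `bool` subclasses `int`, `True` would land in the error branch. A minimal sketch of adding both, assuming hypothetical handlers `process_bool` and `process_none` and a hypothetical `describe` function; unlike the original, the handlers return strings instead of printing, so the result is easy to check:

```python
import json

def process_num(obj, level):
    return "<num> %s" % obj

def process_str(obj, level):
    return "<str> %s" % obj

# Hypothetical handlers for the two JSON value types the original table lacks
def process_bool(obj, level):
    return "<bool> %s" % obj

def process_none(obj, level):
    return "<none>"

switch = {
    str: process_str,
    int: process_num,
    float: process_num,
    bool: process_bool,        # type(True) is bool, not int, so it needs its own entry
    type(None): process_none,  # NoneType has no builtin name, so use type(None)
}

def describe(obj, level=0):
    # Exact-type lookup, same as the original process()
    handler = switch.get(type(obj))
    if handler is None:
        return "ERROR: NO %s" % type(obj)
    return handler(obj, level + 1)

for value in json.loads('[1, 2.5, "a", true, null]'):
    print(describe(value))
```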
Python has no switch-case statement (a match statement only arrived in Python 3.10), but a dict can emulate one.
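Outside of type dispatch, the same trick works for any set of cases: map each case label to a function, and let `dict.get` returning `None` play the role of `default:`. A minimal sketch with a hypothetical `calculate` function and `OPERATIONS` table, using functions from the standard `operator` module:

```python
import operator

# Each "case" label maps to the function that handles it
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def calculate(op, a, b):
    # dict.get returns None for an unknown key; that branch acts as "default:"
    func = OPERATIONS.get(op)
    if func is None:
        raise ValueError("unsupported operator: %r" % op)
    return func(a, b)

print(calculate("+", 2, 3))  # 5
print(calculate("/", 7, 2))  # 3.5
```

Adding a case is one new dict entry, which is the same extension pattern the program above uses for new types.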

Output (from instances_val2017.json):

<dict>
   licenses:   <list>  len= 8
       item 0:   <dict>
           name:   <str> Attribution-NonCommercial-ShareAlike License
           id:   <num> 1
           url:   <str> http://creativecommons.org/licenses/by-nc-sa/2.0/
       item ...
   categories:   <list>  len= 80
       item 0:   <dict>
           supercategory:   <str> person
           name:   <str> person
           id:   <num> 1
       item ...
   annotations:   <list>  len= 36781
       item 0:   <dict>
           id:   <num> 1768
           bbox:   <list>  len= 4
               item 0:   <num> 473.07
               item ...
           image_id:   <num> 289343
           iscrowd:   <num> 0
           area:   <num> 702.1057499999998
           category_id:   <num> 18
           segmentation:   <list>  len= 1
               item 0:   <list>  len= 134
                   item 0:   <num> 510.66
                   item ...
       item ...
   info:   <dict>
       version:   <str> 1.0
       date_created:   <str> 2017/09/01
       description:   <str> COCO 2017 Dataset
       year:   <num> 2017
       contributor:   <str> COCO Consortium
       url:   <str> http://cocodataset.org
   images:   <list>  len= 5000
       item 0:   <dict>
           file_name:   <str> 000000397133.jpg
           id:   <num> 397133
           date_captured:   <str> 2013-11-14 17:02:52
           license:   <num> 4
           height:   <num> 427
           flickr_url:   <str> http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg
           coco_url:   <str> http://images.cocodataset.org/val2017/000000397133.jpg
           width:   <num> 640
       item ...

The output clearly shows that annotations is a dict with 5 keys, along with the type and contents of each entry.
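To try the same walk without downloading COCO, here is a sketch that keeps the dict-based dispatch but collects the lines into a list instead of printing them, and indents by one step per nesting level (the code above effectively adds two, because both the handler and process() increment level). The function names `walk`, `walk_dict`, `walk_list`, `walk_leaf` and the sample JSON are hypothetical, made up for illustration:

```python
import json

def walk_dict(obj, prefix, level, lines):
    lines.append("  " * level + prefix + "<dict>")
    for key, value in obj.items():
        walk(value, key + ": ", level + 1, lines)

def walk_list(obj, prefix, level, lines, samplenum=1):
    lines.append("  " * level + prefix + "<list> len=%d" % len(obj))
    # Only show the first samplenum items, as in the original process_list
    for idx, item in enumerate(obj[:samplenum]):
        walk(item, "item %d: " % idx, level + 1, lines)
    if len(obj) > samplenum:
        lines.append("  " * (level + 1) + "item ...")

def walk_leaf(obj, prefix, level, lines):
    tag = "str" if isinstance(obj, str) else "num"
    lines.append("  " * level + prefix + "<%s> %s" % (tag, obj))

# Type -> handler table, the dict-based "switch"
HANDLERS = {dict: walk_dict, list: walk_list, str: walk_leaf, int: walk_leaf, float: walk_leaf}

def walk(obj, prefix="", level=0, lines=None):
    """Return one line per visited node; nested values are indented two spaces per level."""
    if lines is None:
        lines = []
    handler = HANDLERS.get(type(obj))
    if handler is None:
        lines.append("  " * level + prefix + "ERROR: NO %s" % type(obj))
    else:
        handler(obj, prefix, level, lines)
    return lines

# Hypothetical sample, loosely shaped like a COCO annotation file
sample = '{"info": {"year": 2017}, "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}]}'
for line in walk(json.loads(sample)):
    print(line)
```

Returning a list rather than printing makes it easy to write the structure to a file or compare two datasets line by line.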

Article link: https://www.haomeiwen.com/subject/dfnmxqtx.html