Inspecting a dataset's structure in Python (using a dict to implement switch-ca

Author: 铁佛爷 | Published 2018-11-07 11:54

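Machine learning work often involves datasets in a variety of formats, such as json, mat, and h5, each with its own nested label structure.
Knowing a dataset's structure, format, and types makes the data easier to process.
I wrote a general-purpose program for this. Here it is used to inspect the JSON annotations of the MSCOCO dataset; data at the same nesting level is printed with the same indentation.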

    # -*- coding: utf-8 -*-
    """
    Created on Tue Nov  6 22:23:17 2018
    
    @author: BigFly
    """
    import json
    
    def process_dict(obj,level):
        print("<dict>")
        for key in obj.keys():
            print("  "*level, "\"%s\""%(key), end=":   ")
            process(obj[key],level+1)
            
    def process_list(obj,level):
        print("<list>"," len=",len(obj))
        samplenum=1 # number of items to show for each list
        for idx in range(min(samplenum,len(obj))):
            print("  "*level, "item",idx, end=":   ")
            process(obj[idx], level+1)
        if len(obj)>samplenum:
            print("  "*level, "item ...")
            
    def process_str(obj,level):
        print("<str>",obj)
        
    def process_num(obj,level):
        print("<num>",obj)
        
    switch={type({}) :  process_dict,
            type([]) :  process_list,
            type("") :  process_str,
            type(1)  :  process_num,
            type(1.0) : process_num }
    
    def process(obj,level=0):
        obj_typ=type(obj)
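        # Look up the handler for this exact type; unknown types are reported, not raised.
        handler=switch.get(obj_typ)
        if handler is None:
            print("ERROR: NO ", obj_typ)
        else:
            handler(obj,level+1)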
    
    
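    # Choose one annotation file to inspect (val2017 has 5000 images, matching the output shown below).
    path="E:\\dataset\\MSCOCO\\annotations_trainval2017\\annotations\\instances_val2017.json"
    # path="E:\\dataset\\MSCOCO\\annotations_trainval2017\\annotations\\instances_train2017.json"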
    
    # Read the entire file (not just its first line) and parse it as JSON.
    with open(path, encoding="utf-8") as f:
        jsonstr=f.read()
    print("jsonstr",type(jsonstr),len(jsonstr))
    annotations=json.loads(jsonstr)
    
    # Inspect the structure of annotations
    process(annotations) # top-level keys: ['licenses', 'categories', 'annotations', 'info', 'images']
    
    

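The code handles five types (dict, list, str, int, and float). To handle another type, add a handler function and a matching entry to switch in the same way.
Python had no switch-case statement at the time of writing (the match statement arrived later, in Python 3.10), but a dict that maps each type to its handler function achieves the same effect.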
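
As a rough sketch of how that extension might look, the block below adds entries for bool and None (JSON true/false/null) and uses dict.get to supply a default branch. The describe_* function names and the handlers table are hypothetical, chosen only for this illustration; they return strings instead of printing so the result is easy to check.

```python
def describe_dict(obj):
    return "<dict> with %d keys" % len(obj)

def describe_list(obj):
    return "<list> len=%d" % len(obj)

def describe_str(obj):
    return "<str> %s" % obj

def describe_num(obj):
    return "<num> %s" % obj

def describe_bool(obj):
    return "<bool> %s" % obj

def describe_none(obj):
    return "<null>"

def describe_unknown(obj):
    # Plays the role of the "default" branch of a switch statement.
    return "<unhandled %s>" % type(obj).__name__

# Dispatch table keyed by exact type, in the spirit of the article's switch dict.
# bool needs its own entry because type(True) is bool, not int.
handlers = {
    dict: describe_dict,
    list: describe_list,
    str: describe_str,
    int: describe_num,
    float: describe_num,
    bool: describe_bool,
    type(None): describe_none,
}

def describe(obj):
    # dict.get returns the fallback handler when the type has no entry.
    return handlers.get(type(obj), describe_unknown)(obj)

if __name__ == "__main__":
    for value in [{"a": 1}, [1, 2, 3], "person", 18, 473.07, True, None, (1, 2)]:
        print(describe(value))
```

Because the lookup uses the exact type, subclasses are not matched automatically; that mirrors the behaviour of the original switch dict and may or may not be what a given dataset needs.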

Output of running the inspection script (the process(annotations) call):

    <dict>
       licenses:   <list>  len= 8
           item 0:   <dict>
               name:   <str> Attribution-NonCommercial-ShareAlike License
               id:   <num> 1
               url:   <str> http://creativecommons.org/licenses/by-nc-sa/2.0/
           item ...
       categories:   <list>  len= 80
           item 0:   <dict>
               supercategory:   <str> person
               name:   <str> person
               id:   <num> 1
           item ...
       annotations:   <list>  len= 36781
           item 0:   <dict>
               id:   <num> 1768
               bbox:   <list>  len= 4
                   item 0:   <num> 473.07
                   item ...
               image_id:   <num> 289343
               iscrowd:   <num> 0
               area:   <num> 702.1057499999998
               category_id:   <num> 18
               segmentation:   <list>  len= 1
                   item 0:   <list>  len= 134
                       item 0:   <num> 510.66
                       item ...
           item ...
       info:   <dict>
           version:   <str> 1.0
           date_created:   <str> 2017/09/01
           description:   <str> COCO 2017 Dataset
           year:   <num> 2017
           contributor:   <str> COCO Consortium
           url:   <str> http://cocodataset.org
       images:   <list>  len= 5000
           item 0:   <dict>
               file_name:   <str> 000000397133.jpg
               id:   <num> 397133
               date_captured:   <str> 2013-11-14 17:02:52
               license:   <num> 4
               height:   <num> 427
               flickr_url:   <str> http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg
               coco_url:   <str> http://images.cocodataset.org/val2017/000000397133.jpg
               width:   <num> 640
           item ...
    

The output clearly shows that annotations is a dict with 5 keys, along with the type and a sample value of each entry.
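
Once the layout is known, the list-of-dict sections can be indexed by their id fields. The following is only a minimal sketch: the small annotations dict is a hand-written stand-in that copies a few values from the output above (most fields omitted), and the names images_by_id, category_names, and anns_by_image are assumptions made for this example, not part of the original script.

```python
from collections import defaultdict

# Tiny stand-in for the loaded JSON; values copied from the printed structure,
# with most fields and the long lists left out.
annotations = {
    "categories": [{"supercategory": "person", "name": "person", "id": 1}],
    "images": [{"file_name": "000000397133.jpg", "id": 397133,
                "height": 427, "width": 640}],
    "annotations": [{"id": 1768, "image_id": 289343,
                     "category_id": 18, "iscrowd": 0}],
}

# id -> record lookups for the sections that are lists of dicts
images_by_id = {img["id"]: img for img in annotations["images"]}
category_names = {cat["id"]: cat["name"] for cat in annotations["categories"]}

# Group annotation records by the image they belong to
anns_by_image = defaultdict(list)
for ann in annotations["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

print(images_by_id[397133]["file_name"])
print(category_names.get(18, "unknown"))  # category 18 is not in this truncated sample
print(len(anns_by_image[289343]))
```

With the real file, the same three lookups would be built from the full lists rather than from this truncated sample.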
