JSON Files and How to Process Them

Author: 桀雫 | Published: 2022-04-19 10:48
    • First, an online conversion tool: https://blog.csdn.net/pigs_dream/article/details/119118903
    • For what JSON is and what the data format looks like, see this article: https://www.jb51.net/article/77660.htm
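      In fact, JSON and XML serve the same purpose: both store information, just in different ways. JSON data can be structured in many different ways, and there are several ways to parse it in Python. A web search mostly turns up json.loads(), json.dump() and their relatives. Python's json module follows the JSON specification strictly, though, so a file that is not fully compliant, such as my array-form file below, fails to parse. Before parsing a JSON file, it is worth checking on the bejson site whether the file is valid. For example, a string value must not be broken across lines, and both field names and string values must be wrapped in double quotes.
      Figure 1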
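    • The JSON discussed above is general-purpose JSON. GIS also has its own specific JSON formats; see this article: https://blog.csdn.net/gislaozhang/article/details/113616526
      These are Esri JSON and GeoJSON. Their advantage is that they can be converted directly into vector data. Standard Esri JSON can be converted in ArcGIS with the JSON To Features tool, and GeoJSON can be converted to vector data with GDAL. If your JSON follows neither the Esri JSON nor the GeoJSON format, however, it cannot be converted to vector data directly.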
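    • How do you get vector data, then? First, check whether your JSON file contains longitude and latitude. If it does, convert the JSON file to CSV, read the location field, and generate points from the coordinates; that gives you vector data. When I converted my JSON to CSV, the file was in array form and the coordinate value was even split across two lines (Figure 1), which is not valid JSON. The data looks like this: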
    [{
        "page_url":"http://restapi.amap.com/v3/place/text?key=9ad8a68e24924e15dd48ef37003f5cf2&types=060300&city=341222&citylimit=true&children=1&offset=25&page=7&extensions=all&output=JSON",
        "page_save_time":"2017-09-25 18:55:03",
        "pcode":"340000",
        "type":"购物服务;家电电子卖场;家电电子卖场",
        "photos":[
            
        ],
        "page_county":"341222",
        "poiweight":"",
        "typecode":"060300",
        "page_fetch_time":"2017-08-13 11:59:44",
        "adname":"太和县",
        "citycode":"1558",
        "children":[
            
        ],
        "doc_class":2005.0,
        "tel":"",
        "id":"7#20170813#59ae5c7c39f9a67d0f544dd8f095f498",
        "tag":"",
        "entr_location":"",
        "doc_item":20050101,
        "page_size":19857.0,
        "site_domain":"restapi.amap.com",
        "adcode":"341222",
        "pname":"安徽省",
        "biz_type":"",
        "cityname":"阜阳市",
        "postcode":"",
        "business_area":"",
        "site_name":"高德地图",
        "site_ip":"106.11.208.130",
        "name":"海尔统帅电器体验中心",
        "shopid":"",
        "navi_poiid":"",
        "page_city":"341200",
        "task_name":"高德地图poi采集",
        "distance":"",
        "page_publish_time":"2017-08-13 11:59:44",
        "doc_subclass":200501.0,
        "biz_ext":{
            "cost":"",
            "rating":""
        },
        "importance":"",
        "page_province":"340000",
        "recommend":"0",
        "task_id":"lbs.amap.com.poi",
        "doc_type":20.0,
        "discount_num":"0",
        "gridcode":"4915641922",
        "shopinfo":"0",
        "task_group":"2017-08-10",
        "alias":"",
        "spider_ip":"192.168.21.83",
        "event":"",
        "indoor_map":"0",
        "email":"",
        "timestamp":"",
        "website":"",
        "address":"人民北路与光明路交叉口北150米",
        "match":"0",
        "indoor_data":{
            "cmsid":"",
            "truefloor":"",
            "cpid":"",
            "floor":""
        },
        "exit_location":"",
        "location":"115.624885,
            33.181075",
        "groupbuy_num":"0"
    }]
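
To check where a file like the one above departs from the JSON specification without an online validator, Python's own json module can report the position of the first problem it finds. The following is a minimal sketch; the helper name find_json_error and the shortened sample text are made up for illustration.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the first problem found
        return err.lineno, err.colno, err.msg
    return None


# Shortened, hypothetical sample: the "location" string contains a raw
# line break, like the record above.
sample = '[{\n    "name": "demo",\n    "location": "115.624885,\n        33.181075"\n}]'

print(find_json_error(sample))
print(find_json_error('[{"location": "115.624885,33.181075"}]'))
```

For the broken sample, the reported line is the one where the string value is interrupted by the line break; the valid one-line version returns None.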
    

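I tried both load and loads from the json module, and both failed to parse the file. Then I tried the online JSON-to-CSV converter (the first link above), and to my surprise it worked, so I figured the problem was in my code. I searched again and found: https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv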
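
A likely reason for the failure is the raw line break inside the location string: by default the json module does not allow control characters inside strings. The standard library can be told to tolerate them with strict=False. This is only a sketch on a shortened, made-up record; whether it copes with every quirk of the full file is an assumption.

```python
import json

# Shortened, hypothetical record with a raw line break inside "location"
raw = '[{"cityname": "阜阳市", "location": "115.624885,\n            33.181075"}]'

# Default (strict) parsing rejects control characters inside strings
try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# strict=False allows control characters such as a line break inside strings
records = json.loads(raw, strict=False)

# Remove the whitespace that ended up inside the coordinate string
for rec in records:
    rec["location"] = "".join(rec["location"].split())

print(strict_ok, records[0]["location"])
```

The same keyword works when reading from a file object, as in json.load(fp, strict=False).
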
In the end, pandas' read_json did the job! The code is as follows:

    import pandas as pd

    # File path
    path = r"D:\05CBD\validation\dataverse_files\POIMap\POI2017.json"

    # Open the file and keep only the columns we need.
    # dtype=False keeps codes such as '010' as strings instead of letting
    # pandas infer numeric types, which would drop the leading zero.
    with open(path, encoding='utf-8') as fp:
        df = pd.read_json(fp, dtype=False)
    df = df[['typecode', 'citycode', 'cityname', 'location']]

    # Strip the stray line breaks, tabs and spaces inside location
    df['location'] = df['location'].str.replace(r'\s+', '', regex=True)
    #select_city = ['110000','430100','500000','210200','350100','440100','320100','310000','440300','120000','410100']

    # Keep only the rows for the selected cities. copy() matters here:
    # it returns an independent copy rather than a view of df.
    select_city = ['010','0731','023','0411','0591','020','025','021','0755','022','0371']
    dftest = df[df['citycode'].isin(select_city)].copy()

    # Split the coordinates into two columns (longitude, latitude)
    df1 = dftest['location'].str.split(',', expand=True)

    # Add the lon/lat columns and drop the original location column
    dftest['lon'] = df1[0]
    dftest['lat'] = df1[1]
    dftest = dftest.drop(columns='location')

    print('ok')
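
The remaining step mentioned earlier, generating points from the longitude and latitude, can also be done by writing the rows out as GeoJSON, which GDAL-based tools can read. Below is a minimal sketch using only the standard library. The rows list and the helper rows_to_geojson are hypothetical stand-ins for the filtered table, not part of the code I ran.

```python
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from dicts with lon/lat."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property
        props = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude]
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, shaped like the table with separate lon/lat columns
rows = [
    {"typecode": "060300", "citycode": "1558", "cityname": "阜阳市",
     "lon": "115.624885", "lat": "33.181075"},
]

collection = rows_to_geojson(rows)

# ensure_ascii=False keeps the Chinese text readable in the output
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
print(geojson_text)
```

With a DataFrame, dftest.to_dict('records') should give rows in this shape, and geojson_text can then be saved to a .geojson file with UTF-8 encoding.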
    

Link to this article: https://www.haomeiwen.com/subject/ynksertx.html