07 - Data Extraction - jsonpath

Author: Vanna_bot | Published 2018-12-27 21:12

    jsonpath is used to extract values from deeply nested JSON data.
    See the official jsonpath documentation for the full syntax.

    Installation
    pip install jsonpath
    
    Syntax
    JSONPath    Description
    $           Root node
    . or []     Child node
    ..          Recursive descent: selects every matching node, at any depth
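    To make the difference between child access and recursive descent concrete, here is a minimal pure-Python sketch. The helper `find_all` and the sample `data` are hypothetical and only illustrate the idea; this is not how the jsonpath library is implemented, and the library may order its results differently in some cases.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth (similar in spirit to `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: nested dicts or lists may contain the key again
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Made-up sample data, for illustration only
data = {"shop": {"items": [{"price": 1.5}, {"price": 2.0}], "fee": {"price": 0.5}}}

# Child access, comparable to "$.shop.fee.price": one fixed path from the root
print(data["shop"]["fee"]["price"])  # 0.5

# Recursive descent, comparable to "$..price": every "price", wherever it sits
print(find_all(data, "price"))  # [1.5, 2.0, 0.5]
```

    The `.` operator follows one known path, while `..` searches the whole subtree, which is why it is convenient when the exact nesting is unknown or inconsistent.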
    Usage

    The root node of a dictionary is its outermost pair of braces.
    jsonpath.jsonpath() returns a list of results.
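    One caveat worth checking against your installed version: the `jsonpath` package on PyPI appears to return `False`, not an empty list, when a query matches nothing. Under that assumption, a small wrapper keeps loops and indexing safe. The helper name `as_list` is hypothetical, and the values passed to it below are stand-ins for what `jsonpath.jsonpath(...)` might return.

```python
def as_list(result):
    """Return the match list, or an empty list when the query matched nothing."""
    # Assumption: a no-match query yields False instead of []
    return result if isinstance(result, list) else []


# Stand-in for a successful query such as "$.store.bicycle.price"
print(as_list([19.95]))  # [19.95]

# Stand-in for a query that matched nothing
print(as_list(False))  # []

# Iterating is now safe in both cases; this loop body simply never runs
for price in as_list(False):
    print(price)
```

    In real code the call would look like `as_list(jsonpath.jsonpath(dict_data, "$..price"))`.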

    import jsonpath
    
    dict_data = { "store": {
        "book": [
          { "category": "reference",
            "author": "Nigel Rees",
            "title": "Sayings of the Century",
            "price": 8.95
          },
          { "category": "fiction",
            "author": "Evelyn Waugh",
            "title": "Sword of Honour",
            "price": 12.99
          },
          { "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
          },
          { "category": "fiction",
            "author": "J. R. R. Tolkien",
            "title": "The Lord of the Rings",
            "isbn": "0-395-19395-8",
            "price": 22.99
          }
        ],
        "bicycle": {
          "color": "red",
          "price": 19.95
        }
      }
    }
    
    print(jsonpath.jsonpath(dict_data, "$.store.bicycle.price"))
    # Output: [19.95]
    print(jsonpath.jsonpath(dict_data, "$..price"))
    # Output: [8.95, 12.99, 8.99, 22.99, 19.95]
    
    Exercise

    Scrape the data for Western films listed under bilibili's Movies category.

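    import json
    import jsonpath
    import requests
    
    # Ranking endpoint used in this exercise (tid=145, page 1)
    url = "https://api.bilibili.com/archive_rank/getarchiverankbypartion?jsonp=jsonp&tid=145&pn=1"
    
    headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"}
    
    response = requests.get(url, headers=headers)
    
    # Parse the JSON response body into a dict
    html_dict = json.loads(response.content)
    
    # Collect every title under data -> archives; fall back to an empty
    # list in case the query matches nothing and returns a falsy value
    movies = jsonpath.jsonpath(html_dict, "$..data..archives..title") or []
    
    # Open the file once and append one title per line
    with open("bilibili.txt", "a", encoding="utf-8") as f:
        for title in movies:
            f.write(title + "\n")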
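    To see which part of the response the path in this exercise walks through, here is an offline sketch that needs neither the network nor the jsonpath package. The `sample` dict is a hand-made stand-in: only the data -> archives -> title nesting is taken from the path used in the exercise, and the titles are invented.

```python
# Hand-made stand-in for the API response (invented values)
sample = {
    "data": {
        "archives": [
            {"title": "Example film A"},
            {"title": "Example film B"},
        ]
    }
}

# Plain-Python equivalent of "$..data..archives..title" for this exact shape:
# step into "data", then "archives", then read "title" from each entry
titles = [archive["title"] for archive in sample["data"]["archives"]]

print(titles)  # ['Example film A', 'Example film B']
```

    The jsonpath expression is more forgiving than the list comprehension, because `..` still finds the titles if extra levels sit between these keys.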
    
