Web Scraping in Practice (5 Case Studies)

Author: 天道酬勤_FUN | Published: 2017-04-18 12:05

    Case 1: Scraping a JD.com product page

    Product link: https://item.jd.com/2967929.html

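    import requests

    url = "https://item.jd.com/2967929.html"
    try:
        # A timeout keeps the script from hanging on an unresponsive server.
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        print(r.text[:1000])  # first 1000 characters of the page
    except requests.RequestException:
        print("Scrape failed")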
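
The same sequence (request, status check, encoding guess, error handling) recurs in every case in this article, so it may be convenient to wrap it in a helper. The sketch below is one possible way to do that. The function name `fetch_text` and the 10-second default timeout are assumptions made for illustration and do not come from the original article.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    `fetch_text` and its default timeout are hypothetical choices for this sketch.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # turn 4xx/5xx responses into exceptions
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, malformed URLs and HTTP error statuses.
        return None
```

With this helper, `fetch_text("https://item.jd.com/2967929.html")` would return either the page text or `None`. Returning `None` instead of printing leaves the decision about error reporting to the caller.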
    

    Case 2: Scraping an Amazon product page

    Product link: https://www.amazon.cn/gp/product/B01M8L5Z3Y

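    import requests

    url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
    try:
        # Send a browser-like User-Agent instead of the python-requests default.
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, headers=kv, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text[1000:2000])  # characters 1000-1999 of the page
    except requests.RequestException:
        print("Scrape failed")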
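
The only difference from Case 1 is the `headers` argument. By default `requests` identifies itself with a `python-requests/<version>` User-Agent, which some sites appear to reject. The sketch below builds the request without sending it, so the header that would go on the wire can be inspected offline. It is only an illustration of the idea; the variable names `default_ua`, `prepared` and `sent_ua` are assumptions.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
kv = {'user-agent': 'Mozilla/5.0'}

# What requests would announce if no User-Agent header were supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Build (but do not send) the request to see the header that would be used.
prepared = requests.Request("GET", url, headers=kv).prepare()
sent_ua = prepared.headers["User-Agent"]  # header lookup is case-insensitive
print(sent_ua)
```

After a real request, the headers that were actually sent can be checked in the same way through `r.request.headers`.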
    

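    Case 3: Submitting search keywords to Baidu and 360

    Keyword submission interfaces of the search engines

    Baidu keyword interface: http://www.baidu.com/s?wd=keyword
    360 keyword interface: http://www.so.com/s?q=keyword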

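    import requests

    keyword = "Python"
    try:
        kv = {'wd': keyword}  # Baidu expects the keyword in the wd parameter
        r = requests.get("http://www.baidu.com/s", params=kv, timeout=10)
        print(r.request.url)  # the URL actually requested, query string included
        r.raise_for_status()
        print(len(r.text))
    except requests.RequestException:
        print("Scrape failed")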
    
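    import requests

    keyword = "Python"
    try:
        kv = {'q': keyword}  # 360 expects the keyword in the q parameter
        r = requests.get("http://www.so.com/s", params=kv, timeout=10)
        print(r.request.url)  # the URL actually requested, query string included
        r.raise_for_status()
        print(len(r.text))
    except requests.RequestException:
        print("Scrape failed")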
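
Both scripts rely on `params` to turn a dictionary into a query string. To make that step visible without touching the network, the sketch below builds the same kind of URL with the standard library's `urllib.parse.urlencode`. `requests` performs its own encoding, so treat this as an approximation of what it sends rather than a description of its internals. The `ENGINES` table and the `build_search_url` function are hypothetical names introduced here.

```python
from urllib.parse import urlencode

# Base URL and keyword parameter name for each engine, taken from the two
# scripts above. The table itself is an illustrative assumption.
ENGINES = {
    "baidu": ("http://www.baidu.com/s", "wd"),
    "360": ("http://www.so.com/s", "q"),
}


def build_search_url(engine, keyword):
    """Return the search URL for `keyword` on the given engine."""
    base, param = ENGINES[engine]
    # urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    return base + "?" + urlencode({param: keyword})


print(build_search_url("baidu", "Python"))
print(build_search_url("360", "Python 爬虫"))
```

Keeping the engine-specific details in one table means that supporting another engine would only require a new entry, not a third copy of the script.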
    

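    Case 4: Downloading and saving an image from the web

    Image links on the web generally look like: http://www.example.com/picture.jpg
    Example site: National Geographic (国家地理)
    Pick a page that displays a photo:
    http://www.nationalgeographic.com.cn/photography/photo_of_the_day/3921.html
    URL of the image on that page: http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg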

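    import os
    import requests

    url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
    root = "D:/pics/"
    path = root + url.split('/')[-1]  # name the local file after the last URL segment
    try:
        os.makedirs(root, exist_ok=True)  # create the target directory if missing
        if not os.path.exists(path):
            r = requests.get(url, timeout=10)
            r.raise_for_status()  # do not save an error page as an image
            with open(path, 'wb') as f:  # the with block closes the file itself
                f.write(r.content)
            print("File saved")
        else:
            print("File already exists")
    except (requests.RequestException, OSError):
        print("Scrape failed")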
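
Taking the last `/`-separated segment of the URL works for the image above, but it would keep any query string (for example `picture.jpg?size=large`) as part of the file name. The sketch below shows one way to derive the local path more defensively using only the standard library. `local_path_for` is a hypothetical helper, and the relative directory `pics` is a placeholder rather than the `D:` path used in the script.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_path_for(url, root):
    """Map an image URL to a file path under `root` (hypothetical helper)."""
    # urlsplit separates the path from any ?query or #fragment part.
    url_path = urlsplit(url).path
    filename = posixpath.basename(url_path)
    if not filename:
        raise ValueError("URL does not end in a file name: " + url)
    return os.path.join(root, filename)


print(local_path_for(
    "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg",
    "pics"))
print(local_path_for("http://www.example.com/picture.jpg?size=large", "pics"))
```

`os.path.join` is used for the local side so that the separator matches the operating system, while `posixpath` is used for the URL side because URL paths always use `/`.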
    

    Case 5: Automatically looking up the location of an IP address

    Query interface: http://m.ip138.com/ip.asp?ip=ipaddress

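    import requests

    url = "http://m.ip138.com/ip.asp?ip="
    try:
        # The IP address to look up is appended to the query interface.
        r = requests.get(url + '202.204.80.112', timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(r.text[-500:])  # last 500 characters of the page
    except requests.RequestException:
        print("Scrape failed")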
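
Because the IP address is concatenated straight into the URL, it can help to validate it before sending anything. The sketch below uses the standard library's `ipaddress` module for that check. `build_query_url` and `BASE_URL` are hypothetical names, and restricting the input to IPv4 is an assumption based on the example address, not something the article states about the service.

```python
import ipaddress

BASE_URL = "http://m.ip138.com/ip.asp?ip="


def build_query_url(ip):
    """Return the lookup URL for `ip`, raising ValueError if it is malformed."""
    # IPv4Address raises AddressValueError (a ValueError subclass) on bad input.
    address = ipaddress.IPv4Address(ip)
    return BASE_URL + str(address)


print(build_query_url("202.204.80.112"))
```

A malformed address such as `202.204.80.999` is then rejected locally with a `ValueError` instead of being sent to the server.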
    

