Python in Practice, Assignment 1-4: Fetching Data from a Dynamic Web Page

Author: 浮生只言片语 | Published: 2017-05-24 21:58

    Task:

    Collect the image links from the first 20 pages of https://knewone.com/discover?page= and download the images to a local folder.

    Result:

    Snip20170524_1.png

    Code:

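    import os
    import urllib.request
    
    import requests
    from bs4 import BeautifulSoup
    
    folderPath = '/Users/FS/Desktop/test/'
    os.makedirs(folderPath, exist_ok=True)  # make sure the target folder exists
    # The task asks for the first 20 pages, so the range has to end at 21.
    urls = ['https://knewone.com/discover?page={}'.format(i) for i in range(1, 21)]
    
    imageUrls = []
    for url in urls:
        print(url)
        wb_data = requests.get(url)
        soup = BeautifulSoup(wb_data.text, 'lxml')
        images = soup.select('#wrapper > div > section > div > div.hits_group-things.clearfix > article > header > a > img')
        for image in images:
            src = image.get('src')  # separate name, so the page URL is not overwritten
            if not src:
                continue
            # Keep only the part of the link before '!'.
            imageUrls.append(src.split('!')[0])
            print(src)
    
    for imageUrl in imageUrls:
        # Name the file after the last path segment of the URL; the last 10 characters alone may collide.
        urllib.request.urlretrieve(imageUrl, os.path.join(folderPath, os.path.basename(imageUrl)))
        print('Done')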
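
The download loop above names each file after a slice of its URL, so two images whose links end the same way would overwrite each other. The sketch below shows one possible way to avoid that. `local_name` is a hypothetical helper that is not part of the original assignment, and the `example.com` links are made-up placeholders; the real links on knewone.com may look different.

```python
import hashlib
import os
from urllib.parse import urlsplit


def local_name(image_url):
    """Return a file name that differs for every distinct URL (hypothetical helper)."""
    # Take only the path, so a '?query' part never ends up in the file name.
    path = urlsplit(image_url).path
    # Drop anything after '!', mirroring the split('!') step in the script above.
    base = os.path.basename(path.split('!')[0]) or 'image'
    stem, ext = os.path.splitext(base)
    # A short hash of the full URL keeps names apart even when the base names match.
    digest = hashlib.sha256(image_url.encode('utf-8')).hexdigest()[:8]
    return '{}_{}{}'.format(stem, digest, ext)


if __name__ == '__main__':
    # Placeholder URLs, used only to show the shape of the result.
    print(local_name('https://example.com/a/photo.jpg!middle'))
    print(local_name('https://example.com/b/photo.jpg!middle'))
```

In the script, the result could be used as `os.path.join(folderPath, local_name(imageUrl))` in place of the slice-based name.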
    
