Scraping Images with BeautifulSoup

Author: 无神 | Published 2017-11-22 21:19 | Read 1260 times

Use BeautifulSoup to scrape the images from a web page and save them to a specified folder.

1. Download the page content with urllib.request
2. Use BeautifulSoup to find all the image URLs
3. Set the output folder
4. Call urllib.request.urlretrieve to download each image
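
Step 2 can be tried on a small inline HTML string without touching the network. The sketch below is only an illustration, not the script from this post: `SAMPLE_HTML`, `PAGE_URL`, and `extract_image_urls` are hypothetical names, and it assumes Python's built-in `html.parser` rather than `lxml` so that no extra parser has to be installed. It also resolves relative `src` values with `urljoin`, which goes slightly beyond the four steps listed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content and address, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <img src="https://example.com/a.jpg">
  <img src="/static/b.png">
  <img alt="no src attribute">
  <img src="">
</body></html>
"""
PAGE_URL = "https://example.com/gallery/index.html"


def extract_image_urls(html, page_url):
    """Return absolute URLs for every <img> tag that has a non-empty src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")  # None when the attribute is missing
        if src:
            # Resolve relative paths against the address of the page
            urls.append(urljoin(page_url, src))
    return urls


if __name__ == "__main__":
    for image_url in extract_image_urls(SAMPLE_HTML, PAGE_URL):
        print(image_url)
```

Reading the attribute with `img.get("src")` instead of `img["src"]` avoids a `KeyError` on tags that carry no `src` at all.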

# -*- coding: utf-8 -*-

import urllib.request
from bs4 import BeautifulSoup
import os

'''
Download images with BeautifulSoup
1. Download the page content with urllib.request
2. Use BeautifulSoup to find all the image URLs
3. Set the output folder
4. Call urllib.request.urlretrieve to download each image
'''

def grap_image():
    # Download the page; the with block closes the response automatically
    url = 'https://baike.baidu.com/item/%E6%9D%A8%E5%AD%90%E5%A7%97/10966877?fr=aladdin'
    with urllib.request.urlopen(url) as html:
        content = html.read()

    # Find every <img> tag with BeautifulSoup
    html_soup = BeautifulSoup(content, 'lxml')
    all_img_links = html_soup.find_all('img')
    print(all_img_links)

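    # Set the output folder, creating it if it does not exist
    path = os.getcwd()
    new_path = os.path.join(path, 'pictures')
    if not os.path.isdir(new_path):
        os.mkdir(new_path)

    # Download the images
    image_counter = 1
    for img_link in all_img_links:
        # Some <img> tags have no src attribute; .get() returns None for those
        img_url = img_link.get('src')
        # urlretrieve needs an absolute URL, so skip empty or relative values
        if not img_url or not img_url.startswith('http'):
            continue
        # os.path.join picks the right separator on Windows and Unix-like systems
        file_name = '%s.jpg' % image_counter
        urllib.request.urlretrieve(img_url, os.path.join(new_path, file_name))
        image_counter += 1
    print('Image download complete')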

if __name__ == '__main__':

    grap_image()
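
The script names every file `N.jpg`, whatever format the image actually has. One possible refinement, shown here as a sketch under assumptions rather than as part of the original script, is to take the extension from the image URL and fall back to `.jpg` when the URL has none. The helper `build_file_path` and the example URLs are hypothetical.

```python
import os
from urllib.parse import urlparse


def build_file_path(folder, index, img_url, default_ext=".jpg"):
    """Build folder/<index><ext>, taking <ext> from the URL path when present."""
    url_path = urlparse(img_url).path  # drops the query string and fragment
    ext = os.path.splitext(url_path)[1].lower()
    if not ext:
        ext = default_ext  # assumption: treat extension-less URLs as JPEG
    return os.path.join(folder, "%d%s" % (index, ext))


if __name__ == "__main__":
    print(build_file_path("pictures", 1, "https://example.com/img/cat.PNG?size=large"))
    print(build_file_path("pictures", 2, "https://example.com/img/12345"))
```

The returned path could be passed as the second argument to `urllib.request.urlretrieve`. The extension in a URL is only a hint, so a stricter version would inspect the response's Content-Type header instead.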