Python Web Scraper: Scraping Images from a Website

Author: Fitz916 | Source: published 2017-08-12 12:42, read 284 times

    The site we are scraping today is

    http://www.4j4j.cn/beauty/index.html
    

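    Open the site, go to the beauty pictures section, and open the browser's developer tools to find the URLs we need: the links that lead to each detail page.
    Next, click into any entry, for example the second one, where more of her pictures are shown.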
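The full script at the end of this post prepends base_url to each href taken from the index page, which suggests those hrefs are site-relative. A minimal sketch, assuming hrefs of that shape, of resolving them with the standard library's `urljoin` instead of string concatenation. The helper name `to_absolute` and the sample hrefs are hypothetical, not taken from the site:

```python
from urllib.parse import urljoin

# Same index URL as in the article.
INDEX_URL = 'http://www.4j4j.cn/beauty/index.html'


def to_absolute(href, page_url=INDEX_URL):
    """Resolve an href found on page_url into an absolute URL."""
    return urljoin(page_url, href)


if __name__ == '__main__':
    # Hypothetical hrefs: root-relative, page-relative, and already absolute.
    samples = (
        '/beauty/example.html',
        'example.html',
        'http://www.4j4j.cn/other/example.html',
    )
    for href in samples:
        print(to_absolute(href))
```

Unlike `base_url + href`, `urljoin` also copes with hrefs that are already absolute or that are relative to the current page, so it may be the safer choice if the site mixes link styles.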


    There we can see the image URLs we want. Once we have a URL, requests.get(url) downloads the image, and we then save it to disk:

                with open('girl_%d.jpg' % i, 'wb') as fp:
                    fp.write(res.content)
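The save step can also be wrapped in a small helper. This is a rough sketch under assumptions, not the author's code: the function name `save_image` and its parameters are hypothetical, and it keeps the article's `girl_%d.jpg` naming. It takes the raw bytes (for example `res.content`) and a target directory, so it does not depend on the current working directory:

```python
from pathlib import Path


def save_image(content, directory, index):
    """Write raw image bytes to <directory>/girl_<index>.jpg and return the path."""
    target_dir = Path(directory)
    # Create the directory if needed; do not fail when it already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / ('girl_%d.jpg' % index)
    target.write_bytes(content)
    return target
```

Inside the download loop this could be called as `save_image(res.content, save_path, i)` after checking `res.status_code == 200`.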
    

    This scraper is simple, so I won't go into more detail. It mainly uses requests and BeautifulSoup.
    The complete code is attached below.

    #!/usr/bin/env python3
    # -*- coding:utf-8 -*-
    
    import requests
    import os
    from bs4 import BeautifulSoup
    
    base_url = 'http://www.4j4j.cn'
    index_url = 'http://www.4j4j.cn/beauty/index.html'
    
    # Collect the detail-page URL and title of each entry on the index page
    def get_url_list():
        response = requests.get(index_url)
        response.encoding = 'utf-8'
        html = BeautifulSoup(response.text, 'html.parser')
        data = html.find('div', {'class': 'beautiful_pictures_show'}).find_all('span')
        result = [(item.find('a')['href'], item.find('a').get_text()) for item in data]
        return result
    
    # Download the images and save them locally
    def get_img(beauty_url, title):
        save_path = '/Users/mocokoo/Documents/py_file/%s' % title
        # makedirs with exist_ok=True does not fail when the script is re-run
        os.makedirs(save_path, exist_ok=True)
        response = requests.get(beauty_url)
        response.encoding = 'utf-8'
        html = BeautifulSoup(response.text, 'html.parser')
        data = html.find('div', {'class': 'beauty_details_imgs_box'})
        girls = data.find_all('img')
        i = 1
        for girl in girls:
            girl_url = girl['src']
            res = requests.get(girl_url)
            # Image data is binary, so no text encoding needs to be set
            if res.status_code == 200:
                # Write into save_path directly instead of changing the working directory
                with open(os.path.join(save_path, 'girl_%d.jpg' % i), 'wb') as fp:
                    fp.write(res.content)
                i += 1
    
    
    def get_page():
        url_list = get_url_list()
        for url in url_list:
            beauty_url = base_url+url[0]
            title = url[1]
            get_img(beauty_url=beauty_url, title=title)
    
    if __name__ == '__main__':
        get_page()
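One thing the script leaves open is that each directory name comes straight from the link text on the page. If a title ever contains a path separator or another character the file system rejects, creating the directory would fail or land somewhere unexpected. A minimal sketch of a sanitizing step, assuming a conservative set of characters to replace. The helper name `safe_dirname` and the fallback name are hypothetical:

```python
import re


def safe_dirname(title, fallback='untitled'):
    """Turn a scraped title into a name that is safe to use as a directory."""
    # Replace path separators, characters Windows forbids, and control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', title)
    # Drop leading and trailing spaces or dots, which some systems reject.
    cleaned = cleaned.strip(' .')
    return cleaned or fallback
```

With this in place, the save path could be built from `safe_dirname(title)` rather than from `title` directly.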
    
    
    
