I Used Python to Scrape the Pretty-Girl Pictures You All Like

Author: jack_jt_z | Source: posted 2019-02-25 14:24

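I happened to get a notification telling me I have been on Jianshu for two years now. That seemed to call for a treat, so I used Python to scrape tens of GB of pictures of pretty women from an image site (I finished the code, left it running as a test, and fell asleep). Something this good deserves to be shared, O(∩_∩)O.
Below is a simple implementation. The variable and function names are whatever got it running and have not been tidied up; clean them up yourself if you need to: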

image2local:

import requests
import time
from lxml import etree
import os

# Where the images are saved
dir = 'xxxxxx'

# Base URL of the site
image_host = 'https://www.27270.com'

# Fetch one list page and return the gallery links found on it
def get_list(page_detail=''):
    # Download list page number page_detail
    page = requests.get('https://www.27270.com/ent/meinvtupian/list_11_{0}.html'.format(page_detail))
    # Parse the HTML and pull out the gallery links
    image_urls = etree.HTML(page.text)
    pages = image_urls.xpath('/html/body/div[2]/div[7]/ul/li/a[2]/@href')
    print(pages)

    return pages

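# Download one gallery page; return False once the gallery has no more pages
def getEntityUrl(url):
    # Request the page at the given site-relative address
    page = requests.get(image_host+url)
    image_urls = etree.HTML(page.content)

    try:
        image = image_urls.xpath('//*[@id="picBody"]/p/a[1]/img/@src')[0]
        next_url = image_urls.xpath('//*[@id="nl"]/a/@href')[0]
        title = image_urls.xpath('/html/body/div[2]/div[2]/h1/text()')[0]
    except (IndexError, AttributeError):
        # An expected element is missing, so treat this as the end of the gallery
        return False

    # A '##' at index 2 or later in the next-page link ends the gallery;
    # otherwise save this page's image
    if next_url.find('##')>1:
        return False
    image2local(image,title,next_url)
    return True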

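# Download one image into a folder named after the gallery title
def image2local(url,title,name):
    # Keep only the part of the title before the first '('
    title = title.split('(')[0]
    # os.path.join works whether or not dir ends with a path separator
    folder = os.path.join(dir, title)
    os.makedirs(folder, exist_ok=True)
    try:
        image = requests.get(url)
    except requests.RequestException:
        # Skip this image but let the caller carry on with the gallery
        return True
    with open(os.path.join(folder, '{0}.jpg'.format(name)),'wb') as f:
        f.write(image.content)
    return True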

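if __name__ == '__main__':
    # Walk list pages 1 to 214
    for i in range(1,215):
        detail_urls = get_list(i)
        for image in detail_urls:
            # Gallery pages are numbered by inserting _1, _2, ... before the extension
            base, ext = os.path.splitext(image)
            num = 1
            result = True
            # Keep requesting pages until getEntityUrl reports the end
            while result:
                page_url = '{0}_{1}{2}'.format(base, num, ext)
                num = num+1
                result = getEntityUrl(page_url)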
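
One thing worth tidying if you reuse this: `image2local` creates a folder named after the page title, and a title scraped from a web page can contain characters that some file systems refuse (Windows rejects `\ / : * ? " < > |`, for example). Here is a minimal sketch of a cleanup step, assuming you want to keep the title readable. `safe_folder_name` and its `fallback` parameter are hypothetical names of mine, not part of the script above.

```python
import re

# Characters that Windows rejects in file and directory names,
# plus ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_folder_name(title, fallback='untitled'):
    """Turn a scraped page title into a name that is safe to use as a folder.

    Hypothetical helper, not part of the original script.
    """
    # Same first step as image2local: keep the text before the first '('.
    name = title.split('(')[0]
    # Replace illegal characters, then trim surrounding spaces and
    # trailing dots (Windows also dislikes names ending in '.' or ' ').
    name = _ILLEGAL.sub('_', name).strip().rstrip('. ')
    # Never return an empty name.
    return name or fallback


if __name__ == '__main__':
    # Made-up title, only for illustration.
    print(safe_folder_name('Sunset: beach/sea (3)'))
```

Calling it on the title right before building the folder path would be enough; everything else in the script can stay as it is.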

I was learning Python as I wrote this, so please go easy on me.
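
For anyone tidying the script up: it names every saved file after the link to the next page and always appends `.jpg`, whatever the image really is. A rough sketch of one alternative is below, assuming you would rather number the files and keep the extension found in the image URL. `image_file_name`, the list of accepted extensions, and the example URLs are my own assumptions, not something taken from the site.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions accepted as-is; anything else falls back to '.jpg'.
# This list is an assumption, adjust it to what the site actually serves.
_KNOWN_EXTS = ('.jpg', '.jpeg', '.png', '.gif', '.webp')


def image_file_name(image_url, index):
    """Build a local file name such as '003.png' from an image URL.

    Hypothetical helper, not part of the original script.
    """
    # Look only at the path, so query strings do not end up in the name.
    path = urlsplit(image_url).path
    ext = posixpath.splitext(path)[1].lower()
    if ext not in _KNOWN_EXTS:
        ext = '.jpg'
    # Zero-pad the index so the files sort in page order.
    return '{0:03d}{1}'.format(index, ext)


if __name__ == '__main__':
    # Made-up URL, only for illustration.
    print(image_file_name('https://example.com/a/b/pic.PNG?x=1', 3))
```

The page counter from the main loop could serve as `index`, which also keeps the files in gallery order.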