Hardcore Python Web Scraper

Author: 一热心市民小胡 | Published: 2018-08-23 23:03

Scraping pictures of pretty girls? Handsome guys? Naive.

import os
import requests
from urllib import request
from bs4 import BeautifulSoup
count = 0
picUrl = []   # cover image URLs
nums = []     # text of each item's <date> tag, used as the number in the file name
names = []    # titles, taken from the <img> title attribute

# Send a browser-like User-Agent with every page request
head = {'User-Agent': 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Safari/535.19'}

# Collect cover URL, number and title from search result pages 1 and 2
for page in range(1, 3):
    url = "https://javmoo.com/cn/search/abp/page/" + str(page)
    req = request.Request(url, headers=head)
    response = request.urlopen(req)
    html = response.read()
    bs = BeautifulSoup(html, 'lxml')
    movieList = bs.find_all('div', attrs={'class': 'item'})
    for tagLi in movieList:
        count += 1
        picUrl.append(tagLi.img.attrs['src'])
        nums.append(tagLi.find('date').get_text())
        names.append(tagLi.img.attrs['title'])
    print(count)

# Download once, after all pages are parsed, so page 1 is not fetched twice
os.makedirs('./img/', exist_ok=True)
for (name, img, num) in zip(names, picUrl, nums):
    r = requests.get(img, stream=True)
    # A '/' in a title would otherwise be read as a directory separator
    imageName = ('[' + num + ']' + name + '.png').replace('/', '_')
    with open('./img/%s' % imageName, 'wb') as f:
        for chunk in r.iter_content(chunk_size=128):
            f.write(chunk)
    print('saved %s' % imageName)
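
The file names above are built from scraped titles, and titles can contain characters that are reserved in file names on some systems (for example ':' or '?' on Windows). Here is a minimal sketch of one way to clean them up before calling open(). The helper name safe_image_name and the exact set of replaced characters are my own assumptions, not part of the original script.

```python
import re

# Assumed set of unsafe characters: the ones Windows rejects in file names,
# which also covers '/' on POSIX systems, plus ASCII control characters.
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_image_name(num, name, ext='.png'):
    """Build '[num]name' + ext with unsafe characters replaced by '_'.

    Hypothetical helper for illustration only.
    """
    stem = '[' + num + ']' + name
    # Replace each unsafe character, then trim spaces and dots from both ends
    stem = _INVALID.sub('_', stem).strip(' .')
    return stem + ext


if __name__ == '__main__':
    print(safe_image_name('ABP-001', 'a/b: c?'))
```

In the download loop it would stand in for the string concatenation that produces imageName.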