import requests
from lxml import etree
from bs4 import BeautifulSoup as bf

# Chapter pages look like:
# https://www.soxscc.com/SuiTangWoLaoPoShiChangSunWuGou/157152.html
# https://www.soxscc.com/SuiTangWoLaoPoShiChangSunWuGou/157153.html
# /SuiTangWoLaoPoShiChangSunWuGou/864881.html
url = 'https://www.soxscc.com/SuiTangWoLaoPoShiChangSunWuGou/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"}
resp = requests.get(url, headers=headers)
resp_xpath = etree.HTML(resp.text)
# Grab the relative href of every chapter from the table of contents
hrefs = resp_xpath.xpath("//div[@id='novel150661']//dd/a/@href")
for i in range(400):  # first 400 chapters
    url = 'https://www.soxscc.com' + hrefs[i]
    resp = requests.get(url, headers=headers)
    # BeautifulSoup returns the chapter body as a single string
    soup = bf(resp.text, 'lxml')
    content = soup.find('div', class_='content').get_text()
    # The chapter title is still pulled out with XPath
    r_x = etree.HTML(resp.text)
    title = r_x.xpath("//div[@class='read_title']/h1/text()")
    output = "\n{}\n\n{}-----------\n"
    outputs = output.format(title[0], content)
    print(outputs)
    with open('biquge.txt', 'a', encoding='utf-8') as f:
        f.write(outputs)
At first I scraped the novel purely with XPath, but that ran into a problem: once I had the content, it would not write to the file cleanly.
output = "\n{}\n\n{}-----------\n"
Writing to the file with this format string, the problem was that every line of text came out attached to a chapter title, because the content arrived as a list of text fragments and each fragment got formatted separately. In the end I switched to BeautifulSoup to parse the content (see the sketch below).
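A minimal sketch of the difference, assuming the chapter body still sits in <div class='content'> as in the code above and using one of the chapter URLs from the comments: lxml's text() yields one string per text node, i.e. a list of paragraph fragments, while BeautifulSoup's get_text() concatenates them into a single string up front.

import requests
from lxml import etree
from bs4 import BeautifulSoup

url = 'https://www.soxscc.com/SuiTangWoLaoPoShiChangSunWuGou/157152.html'
headers = {"User-Agent": "Mozilla/5.0"}
resp = requests.get(url, headers=headers)

# Pure XPath: text() returns a list with one string per text node,
# so each paragraph fragment is a separate entry, not one block of text.
fragments = etree.HTML(resp.text).xpath("//div[@class='content']/text()")
content_from_xpath = '\n'.join(s.strip() for s in fragments)

# BeautifulSoup: get_text() already merges all the text nodes into one string.
soup = BeautifulSoup(resp.text, 'lxml')
content_from_bs = soup.find('div', class_='content').get_text()

With the fragments joined first, a single format() call per chapter writes the title exactly once, which is the same effect get_text() gives for free.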