Finally finished the code that scrapes every chapter's href, so there's no need to hunt for patterns in the chapter numbers anymore. Just run it and the novel downloads straight into a txt file.
I had learned a bit of xpath before and tried it for a while, but I'm not fluent with it, so I went with plain for loops instead, five of them in total.
import requests
from bs4 import BeautifulSoup as bf

url = 'https://www.soxscc.com/MangHuangJi/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"}

# fetch the table-of-contents page
html = requests.get(url, headers=headers)
soup = bf(html.text, 'lxml')

# the chapter list sits in <div class="novel_list" id="novel4451">
t1 = soup.find('div', class_='novel_list', id='novel4451')
t2 = t1.findAll('dl')

output = "{}\n{}\n\n\n\n\n"  # chapter title, then body, then blank lines between chapters
for dl in t2:
    t3 = dl.findAll('dd')
    for dd in t3:
        t4 = dd.findAll('a')
        for a in t4:
            t5 = a.get('href')                    # relative link to one chapter
            url1 = 'https://www.soxscc.com' + t5  # make it absolute
            res = requests.get(url1, headers=headers)
            page = bf(res.text, 'lxml')
            title = page.find('h1').string
            contental = page.findAll('div', class_='content')
            for div in contental:
                contents = div.get_text()
                outputs = output.format(title, contents)
                # open once per chapter instead of once per character
                with open('biquge.txt', 'a', encoding='utf-8') as f:
                    for ch in outputs:  # write the chapter out character by character
                        f.write(ch)
It really does download continuously now; it's at almost seven hundred chapters already.
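One tweak worth considering (my addition, not part of the original script): pause a little between chapter requests so several hundred fetches in a row don't hit the site too hard. A minimal sketch:

import time

# right after requests.get(url1, headers=headers) in the innermost loop:
time.sleep(0.5)  # assumed half-second pause between chapters; adjust to taste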

Scraping this didn't take anything fancy, just requests and BeautifulSoup. The rest is finding tags with find and findAll, looping over the results with for, and then with open to write everything into the txt file. xpath and regular expressions would probably be more convenient to use, but even without them this kind of problem can still be solved.
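For comparison, here is what the tag hunting could look like with xpath through lxml. This is just a sketch assuming the same page structure (the novel4451 div and the dl/dd/a nesting); I haven't run it against the site:

import requests
from lxml import etree

url = 'https://www.soxscc.com/MangHuangJi/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"}

tree = etree.HTML(requests.get(url, headers=headers).text)
# one xpath expression stands in for the three nested for loops over dl, dd and a
hrefs = tree.xpath('//div[@id="novel4451"]//dd/a/@href')

Each entry in hrefs would then be joined with https://www.soxscc.com and fetched the same way as above.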