Biquge Novel Crawler / TXT Download (Python)

Author: InPieces | Published: 2019-08-06 22:44

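I'm new to web scraping and also a big fan of web novels. Having learned just a little, I couldn't wait to try it on something real, so I wrote this very simple novel crawler. It only works for novels on one specific site (https://www.biquge5200.cc/).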

Without further ado, here is the code:

import os
import re
import time

import requests
from bs4 import BeautifulSoup

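def to_chinese(str1):  # extract the Chinese text from a string
    # Heuristic: keep only characters that take 3 bytes in UTF-8. This covers
    # common Chinese characters and full-width punctuation, and drops ASCII
    # (HTML tags, digits, Latin letters).
    return "".join(ch for ch in str1 if len(ch.encode("utf-8")) == 3)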

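def hebing(path, list1=None):  # merge the chapter files into one txt
    # Avoid a mutable default argument; treat "no list" as empty.
    list1 = list1 or []
    # Explicit UTF-8 so the result does not depend on the OS default encoding.
    with open(path + "汇总.txt", 'w', encoding="utf-8") as f1:
        for name in list1:
            with open(name, 'r', encoding="utf-8") as f2:
                f1.write(f2.read())

def download(url):  # download a single chapter
    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    zjmc = soup.h1.get_text()  # chapter title
    text = zjmc + "\n"
    text = text + to_chinese(str(soup.p))
    # re.compile('p') matches every tag whose name contains "p", not only <p>.
    # The first 20 matches are skipped; that offset is specific to this site's page layout.
    for h, tag in enumerate(soup.find_all(re.compile('p')), start=1):
        if h > 20:
            text = text + to_chinese(str(tag)) + "\n"
    return text

# URL of the novel's table-of-contents page; swap in another book
# (only from https://www.biquge5200.cc/)
url = "https://www.biquge5200.cc/34_34637/"
book_num = url.split("/")[-2]

# Create the output directory ("小说" means "novels")
path = "D:/小说/" + book_num + "/"
os.makedirs(path, exist_ok=True)

# Collect each chapter's URL into the list zhangjie
r = requests.get(url, timeout=10)
text = r.text
# Keep only the part of the page between the "正文" (main text) marker
# and the "新书推荐" (new book recommendations) marker
text = text.split("正文")[1]
text = text.split("新书推荐")[0]
soup = BeautifulSoup(text, "html.parser")
zhangjie = []
pathlist = []

for link in soup.find_all('a'):
    href = link.get('href')
    if href:  # skip anchors without an href
        zhangjie.append(href)

for i, chapter_url in enumerate(zhangjie, start=1):
    chapter_path = path + str(i) + ".txt"
    pathlist.append(chapter_path)
    text = download(chapter_url)
    with open(chapter_path, 'w', encoding="utf-8") as f:
        f.write(text)
    time.sleep(0.6)  # pause between requests to go easy on the server
    print("Chapter " + str(i) + " downloaded!")

print("Merging... please wait...")
hebing(path, pathlist)
print('Merge complete!\nSaved to drive D ("D:/小说")')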
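
A note on to_chinese: it keeps every character that is 3 bytes long in UTF-8, which is a rough stand-in for "Chinese text". As a possible alternative, here is a minimal sketch that picks characters by Unicode range with a regular expression. The function name extract_chinese and the exact ranges are my own assumptions for illustration; they are not part of the script above.

```python
import re

# Assumed ranges (chosen for this sketch, not taken from the original script):
#   U+4E00-U+9FFF  CJK Unified Ideographs (common Chinese characters)
#   U+3000-U+303F  CJK Symbols and Punctuation (e.g. "。", "、")
#   U+FF00-U+FFEF  Halfwidth and Fullwidth Forms (e.g. "，", "！")
_CHINESE_RE = re.compile(r"[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]+")


def extract_chinese(html_fragment):
    """Return only the characters of html_fragment that fall in the ranges above."""
    return "".join(_CHINESE_RE.findall(html_fragment))


if __name__ == "__main__":
    # Tags, Latin letters and digits are dropped; Chinese text and
    # full-width punctuation are kept.
    print(extract_chinese("<p>第一章，开始！abc123</p>"))  # prints: 第一章，开始！
```

One difference to be aware of: curly quotes such as “ ” (U+201C, U+201D) are 3 bytes in UTF-8, so to_chinese keeps them, while the ranges in this sketch do not. If dialogue marks matter, the pattern would need to be extended.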

Sample run: