re_糗事百科demo

Author: 蜗牛不牛不知道 | Published: 2020-04-18 16:54

# encoding: utf-8

import re
import requests

def parse_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'
    }
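    # headers must be passed by keyword; the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)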
    text = response.text
    # re.DOTALL (alias re.S) makes '.' match newlines too, so a match can span several lines
    contents = re.findall(r'<div\sclass="content">.*?<span>(.*?)</span>',text,re.DOTALL)
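    duanzi = []
    for content in contents:
        # Strip any remaining HTML tags (e.g. <br/>) from the captured text
        x = re.sub(r'<.*?>','',content)
        duanzi.append(x.strip())
        print(x.strip())
        print('='*50)
    # Return the cleaned jokes so callers can use them, not just see them printed
    return duanzi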


def main():
    # range(1, 5) covers pages 1 through 4
    for x in range(1,5):
        url = 'https://www.qiushibaike.com/text/page/%s/' % x
        parse_page(url)

if __name__ == '__main__':
    main()
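
The two regex steps in parse_page can be tried offline, without any network request. The sketch below runs the same patterns against a small hand-written HTML string. SAMPLE_HTML and extract_jokes are hypothetical names introduced only for this illustration, and the sample markup is an assumption about the page structure, not a copy of the real site.

```python
import re

# Hypothetical sample markup (an assumption; the real page may differ).
SAMPLE_HTML = '''
<div class="content">
<span>
First line<br/>Second line
</span>
</div>
<div class="content">
<span>Another <b>short</b> joke</span>
</div>
'''

# Same pattern as in parse_page.
PATTERN = r'<div\sclass="content">.*?<span>(.*?)</span>'


def extract_jokes(html):
    """Return the cleaned text of every content block found in html."""
    # re.DOTALL lets '.' cross the newlines between <div> and <span>.
    contents = re.findall(PATTERN, html, re.DOTALL)
    # Remove leftover tags, then trim surrounding whitespace.
    return [re.sub(r'<.*?>', '', content).strip() for content in contents]


jokes = extract_jokes(SAMPLE_HTML)
for joke in jokes:
    print(joke)

# Without re.DOTALL, '.' stops at each newline, so nothing matches here.
without_dotall = re.findall(PATTERN, SAMPLE_HTML)
print(without_dotall)
```

Note that re.sub deletes tags without inserting anything in their place, so text on either side of a <br/> ends up joined together.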
