2019-10-09 Joke Site Scraper (Practicing requests)

Author: 小楼主 | Published 2019-10-09 23:21
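```python
import re

import requests


def get_one_page(url):
    """Download one listing page and return its HTML as text."""
    # Desktop-browser User-Agent so the site serves its normal HTML page
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
    # A timeout keeps the script from hanging forever on a stalled connection
    res = requests.get(url, headers=headers, timeout=10)
    # Fail loudly on 4xx/5xx instead of silently parsing an error page
    res.raise_for_status()
    return res.text


def parse_one_page(html):
    """Yield one dict per joke block found in the page."""
    # The text inside <i> is taken as the author and the text of the
    # following <a> link as the joke content; re.S lets '.' match
    # newlines, because each block spans several lines of markup.
    pattern = re.compile(
        r'<div class="one-cont".*?<i>(.*?)</i>.*?<a href=.*?>(.*?)</a>.*?</div>',
        re.S)
    for author, content in pattern.findall(html):
        yield {
            'author': author.strip(),
            'content': content.strip()
        }


def main():
    url = 'https://www.xiaohua.com/duanzi?page=1'
    html = get_one_page(url)
    for item in parse_one_page(html):
        print(item)


if __name__ == '__main__':
    main()
```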
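The regex can be checked offline, without fetching anything from the site. This is a minimal sketch that runs the same pattern against a small hand-written HTML string. `SAMPLE_HTML` is hypothetical markup written to fit the pattern and is not copied from xiaohua.com. The live page's structure may differ.

```python
import re

# Same pattern as the scraper: <i> text -> author, <a> text -> content
PATTERN = re.compile(
    r'<div class="one-cont".*?<i>(.*?)</i>.*?<a href=.*?>(.*?)</a>.*?</div>',
    re.S)


def parse_one_page(html):
    """Yield one dict per matched joke block."""
    for author, content in PATTERN.findall(html):
        yield {'author': author.strip(), 'content': content.strip()}


# Hypothetical markup shaped to fit the pattern; the real page may differ
SAMPLE_HTML = '''
<div class="one-cont">
  <p class="user"><i>alice</i></p>
  <p class="fonts"><a href="/duanzi/1">First joke</a></p>
</div>
<div class="one-cont">
  <p class="user"><i>bob</i></p>
  <p class="fonts"><a href="/duanzi/2">Second joke</a></p>
</div>
'''

if __name__ == '__main__':
    for item in parse_one_page(SAMPLE_HTML):
        print(item)
```

With this sample, each `<div class="one-cont">` block yields exactly one dict. The non-greedy `.*?` parts are still fragile. If a block has no `<i>` tag, the match can run on into the next block and pair the wrong author with the wrong joke. A real HTML parser is more robust than regex for this kind of page.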

Article link: https://www.haomeiwen.com/subject/exbnpctx.html