[Python Scraper] Qiubai (Qiushibaike): Text Section


Author: d1b0f55d8efb | Source: published 2017-09-14 13:32, read 20 times

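Qiubai (Qiushibaike): text section
https://www.qiushibaike.com/text/
Scrape the author info (avatar / nickname / gender / age),
the post content, the "funny" count, and the comment count.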

My own scraping code

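# __author__: 'cuiwnehao'
# coding: utf-8
from bs4 import BeautifulSoup
import requests

url = 'https://www.qiushibaike.com/text/'
req = requests.get(url)
req.encoding = 'utf-8'
html = req.text
soup = BeautifulSoup(html, 'lxml')
# Each post is expected to sit in a <div> whose classes include "article"
infos = soup.find_all('div', class_="article")
# print(len(infos))
for info in infos:
    comments = info.find('span', class_='stats-comments')
    # Skip any block that lacks an expected tag; otherwise .text fails with
    # AttributeError: 'NoneType' object has no attribute 'text'
    if info.h2 is None or info.span is None or info.find('i') is None:
        continue
    if comments is None or comments.find('i') is None:
        continue

    zuozhe = info.h2.text                    # author nickname
    neirong = info.span.text                 # post content
    haoxiaoshu = info.find('i').text         # "funny" count
    pinglunshu = comments.find('i').text     # comment count

    print(zuozhe)
    print(neirong)
    print(haoxiaoshu)
    print(pinglunshu)
    print("------------------------------------------------------")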
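
The script above prints the nickname, content, and two counters, while the task list also asks for gender and age. Here is a minimal sketch of that step. It assumes (this is not confirmed anywhere in the post) that the page marks gender with a badge whose class list looks like `articleGender manIcon` or `articleGender womenIcon`, and that the badge text is the age. `parse_gender_age` is a hypothetical helper name, and it works on plain Python values so it can be tried without hitting the site.

```python
def parse_gender_age(class_names, text):
    """Turn a gender badge's class list and text into (gender, age).

    class_names: list of CSS classes, e.g. what BeautifulSoup returns from
                 tag.get('class'); ASSUMED to look like
                 ['articleGender', 'manIcon'] or ['articleGender', 'womenIcon'].
    text:        the badge text, ASSUMED to hold the age.
    Either argument may be None when the author block has no badge.
    """
    gender = None
    for name in class_names or []:
        if name == 'manIcon':
            gender = 'male'
        elif name == 'womenIcon':
            gender = 'female'

    text = (text or '').strip()
    age = int(text) if text.isdigit() else None
    return gender, age


# A badge with both pieces of information
print(parse_gender_age(['articleGender', 'womenIcon'], ' 28 '))
# No badge at all, e.g. an anonymous author
print(parse_gender_age(None, None))
```

Inside the loop, the two arguments would come from whichever tag actually carries the badge in the live page source, so check the markup before relying on the class names used here.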


Reader comments

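  • 倔强的潇洒小姐: I followed your code and got this error:
    neirong = info.span.text
    AttributeError: 'NoneType' object has no attribute 'text'
    d1b0f55d8efb: Take a look at the page source. The tags your loop selects are probably not matched precisely.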
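
The error in this thread comes from how BeautifulSoup handles a missing tag: `info.span` returns `None` when the block has no `<span>` inside it, and `None` has no `.text`. One way to keep the loop running is a small guard. The sketch below uses a hypothetical helper, `text_or_default`, and a stand-in object in place of a real tag so it runs without any network access or third-party package.

```python
from types import SimpleNamespace


def text_or_default(node, default=''):
    """Return node.text with surrounding whitespace removed.

    If node is None (the tag was not found), return `default`
    instead of raising AttributeError.
    """
    if node is None:
        return default
    return node.text.strip()


# Stand-in for a tag that was found (anything with a .text attribute works)
fake_span = SimpleNamespace(text='\n some post text \n')
print(text_or_default(fake_span))

# Stand-in for a tag that was NOT found
print(text_or_default(None, default='<no content>'))
```

In the scraper this would read `neirong = text_or_default(info.span)`. The guard only hides the symptom, though. As the reply says, the real fix is to compare the selector with the page source and match the intended tag more precisely.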


Article link: https://www.haomeiwen.com/subject/uunrsxtx.html