Scraping Multiple Pages of the Same Topic from a Website

Author: javen_spring | Published: 2020-04-23 16:04 | Views: 0

Example code (the spots where mistakes are easy to make are noted in the comments):

import requests
from bs4 import BeautifulSoup


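headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}
## Define the headers once so every request in the for loop below sends the same ones

film_page=[]  ## Empty list that collects the results from every page
for i in range(1,11):
    url='https://movie.......com/top250?start={}&filter='.format((i-1)*25)
    ## Loop over pages 1-10 of the site; the start offset advances by 25 per page (0, 25, ..., 225)
    res=requests.get(url,headers=headers)  ## The headers are required!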
    
    bs=BeautifulSoup(res.text,'html.parser')
    parent=bs.find_all('div',class_='item')
##    print(parent)
##    print(type(parent))
##    print(len(parent))   ## Uncomment these three lines to check for possible errors
    
    for ele in parent:
        seq=ele.find('div',class_='pic').find('em',class_='').text.strip()
        film=ele.find('div',class_='hd').find('a').find('span',class_='title').text.strip()
        score=ele.find('div',class_='star').find('span',class_='rating_num').text.strip()
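        try:
            recommend=ele.find('p',class_='quote').find('span',class_='inq').text.strip()
        except AttributeError:
            recommend=''  ## Reset here; otherwise the previous film's tagline is reused (or a NameError is raised on the first film)
            print('Film {} has no tagline'.format(film))  ## Some entries lack this element, which would otherwise raise an error
        link=ele.find('div',class_='hd').find('a')['href']
        film_page+=['No.:',seq,'Title:',film,'Score:',score,'Tagline:',recommend,'Link:',link]    ## film_page collects the results of every iteration in one list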
print(film_page)
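
The pagination arithmetic and the missing-tagline pitfall from the script above can also be sketched in isolation, without touching the network. This is only a minimal sketch under stated assumptions, not the author's method: `BASE_URL` is a hypothetical placeholder (the article elides the real host), and `page_urls` and `make_record` are illustrative helper names that do not appear in the original.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the article elides the real host name.
BASE_URL = "https://example.com/top250"
PER_PAGE = 25  # mirrors the (i-1)*25 offset used in the script above


def page_urls(base_url, pages, per_page=PER_PAGE):
    """Return one URL per page, using a zero-based `start` offset."""
    return [
        "{}?{}".format(base_url, urlencode({"start": page * per_page, "filter": ""}))
        for page in range(pages)
    ]


def make_record(seq, film, score, link, recommend=None):
    """Bundle one film's fields into a dict; a missing tagline becomes ''."""
    return {
        "seq": seq,
        "film": film,
        "score": score,
        "recommend": recommend or "",
        "link": link,
    }


urls = page_urls(BASE_URL, 10)
print(urls[0])   # first page, start=0
print(urls[-1])  # tenth page, start=225
```

Storing each film as a dict rather than appending labels and values to one flat list keeps the fields of a film together, which tends to make a later export to CSV or JSON simpler.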

Article link: https://www.haomeiwen.com/subject/vxyaihtx.html