Scrapy Practice, Part 1: Scraping News from 泡面小镇 (pmtown.com)

Author: 金融界审核大表哥 | Published: 2018-08-23 09:37

    import scrapy


    class mingyan(scrapy.Spider):
        name = "paomian"
        start_urls = [
            'http://www.pmtown.com/archives/category/早报'
        ]

        def parse(self, response):
            # Each <li> in the article list is one news item.
            for v in response.css('ul.article-list li'):
                lianjie = v.css('a::attr(href)')[0].extract()
                # Skip the first 5 characters of the link's title attribute.
                title = v.css('a::attr(title)')[0].extract()[5:]
                detail = v.css('p::text')[0].extract()
                image = v.css('div.item-img>a>img::attr(src)').extract_first()
                # Fall back to the string 'null' when the item has no image.
                img = image if image is not None else 'null'
                yield {
                    'title': title,
                    'introduction': detail,
                    'detailUrl': lianjie,
                    'imageUrl': img,
                }

            # Look up the "next page" link in the pagination bar.
            dt = response.css('#wrap div.main.container div.content div.sec-panel.archive-list div.pagination.clearfix a.next')
            next_page = dt.css('a::attr(href)').extract_first()
            print('-------->%s' % next_page)
            if next_page is not None:
                # Resolve the href against the current page and crawl it with the same callback.
                nexthref = response.urljoin(next_page)
                yield scrapy.Request(nexthref, callback=self.parse, dont_filter=True)

    # scrapy crawl 'paomian' -o paomian.json
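
The pagination step depends on `response.urljoin`, which resolves a possibly relative `href` against the URL of the current page. As far as I know it builds on the standard library's `urllib.parse.urljoin`, so the resolution can be tried offline. This is only a rough sketch: `resolve_next` is a helper name I made up, and the example.com URLs are placeholders, not real pmtown.com paths.

```python
from urllib.parse import urljoin

# Hypothetical URL for illustration only; the real site's paths may differ.
current_page = "http://www.example.com/archives/category/news"


def resolve_next(page_url, href):
    """Return an absolute URL for the next page, or None when there is no link."""
    if href is None:
        return None
    return urljoin(page_url, href)


# A root-relative href is resolved against the host of the current page.
print(resolve_next(current_page, "/archives/category/news/page/2"))
# An href that is already absolute comes back unchanged.
print(resolve_next(current_page, "http://www.example.com/archives/category/news/page/3"))
# No "next" link on the last page: nothing to follow.
print(resolve_next(current_page, None))
```

Passing the resolved URL to `scrapy.Request` matters because a request needs an absolute URL, while an `href` taken straight from the page may be relative.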

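The spider above scrapes the latest daily posts from the 早报 (morning news) category of 泡面小镇. It is for practice only. If it infringes on anything, please contact me and I will delete the article.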
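
Running the spider with `-o paomian.json` should leave a JSON array with one object per yielded item. Here is a small sketch of how that output could be checked afterwards. The sample records are invented and only mirror the four keys the spider yields, and `count_missing_images` is a hypothetical helper, not part of Scrapy.

```python
import json

# Made-up sample in the shape the spider yields; not real scraped data.
sample = """[
  {"title": "A", "introduction": "x",
   "detailUrl": "http://www.example.com/1",
   "imageUrl": "http://www.example.com/1.jpg"},
  {"title": "B", "introduction": "y",
   "detailUrl": "http://www.example.com/2",
   "imageUrl": "null"}
]"""


def count_missing_images(items):
    """Count items whose imageUrl holds the 'null' placeholder string."""
    return sum(1 for item in items if item["imageUrl"] == "null")


items = json.loads(sample)
print(len(items), count_missing_images(items))
```

For a real run, the sample string can be swapped for the file contents, for example `json.load(open("paomian.json", encoding="utf-8"))`. Note that the spider stores the string `'null'` rather than a JSON `null`, so the comparison is against a string.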
