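Python version: 3.6
See the example on the official homepage: https://scrapy.org/
This is a first attempt at a Scrapy spider, based on the official example with one small change: the scraped titles are also written to the file a.txt
so that they are saved.
Save the code as p.py: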
import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    # Site being crawled: https://blog.scrapinghub.com
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        # Open a.txt in append mode for each page, so that later pages
        # never write to a file that has already been closed.
        with open('a.txt', 'ab') as file:
            for title in response.css('h2.entry-title'):
                text = title.css('a ::text').extract_first()
                if text is None:
                    # No link text found for this heading; skip it
                    continue
                yield {'title': text}
                # Write the title to the file, followed by a newline
                file.write(text.encode() + b'\n')
        # Follow the link to the previous page of posts
        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
Run the scrapy command:
$ scrapy runspider p.py
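As an alternative to writing the file inside parse, the same idea can be sketched as a Scrapy item pipeline, which Scrapy calls once per yielded item and which gets open_spider / close_spider hooks for opening and closing the file exactly once. The class name TitleFilePipeline and its path parameter are made up for illustration and are not part of the original post; this is a minimal sketch, not the official example.

```python
class TitleFilePipeline:
    """Hypothetical pipeline: appends each item's 'title' to a text file."""

    def __init__(self, path='a.txt'):
        # 'path' is an illustrative parameter; it defaults to the
        # a.txt file used in this post.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: open the file once.
        self.file = open(self.path, 'ab')

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes: close the file once.
        self.file.close()

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        title = item.get('title')
        if title is not None:
            self.file.write(title.encode('utf-8') + b'\n')
        # Return the item so that any later pipelines still receive it.
        return item
```

For Scrapy to use such a class it has to be registered in the ITEM_PIPELINES setting; parse then only needs to yield the {'title': ...} items and no longer touches the file itself.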