Scrapy Summary

Author: 丶君为红颜酔 | Published: 2018-09-11 18:25

Installing Scrapy

Anaconda3-5.0.1-Windows-x86_64 (http://www.scrapyd.cn/download/125.html)
...

Creating a project

scrapy startproject mingyan

Creating a spider

scrapy genspider dytt8 http://www.dytt8.net/html/gndy/jddy/20160320/50523.html

Writing the project

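-- mingyan.py
import scrapy


class mingyan(scrapy.Spider):  # must inherit from scrapy.Spider
    name = "mingyan"  # the name used by `scrapy crawl`

    def start_requests(self):
        urls = [
            'http://lab.scrapyd.cn/page/1/',
            'http://lab.scrapyd.cn/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

        ### tag (variant: start from a single URL and crawl one tag only)
        # url = 'http://lab.scrapyd.cn/'
        # tag = getattr(self, 'tag', None)  # the tag value, i.e. the argument passed in when crawling
        # if tag is not None:  # if a tag was given, rebuild the url
        #     url = url + 'tag/' + tag  # with tag=爱情, url = "http://lab.scrapyd.cn/tag/爱情"
        # yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]  # page number taken from the URL, e.g. "1"
        filename = 'mingyan-%s.html' % page
        with open(filename, 'wb') as f:  # response.body is bytes, so write in binary mode
            f.write(response.body)
        self.log('Saved file: %s' % filename)

        ### next page (next_page must first be extracted from the response)
        # if next_page is not None:
        #     next_page = response.urljoin(next_page)
        #     yield scrapy.Request(next_page, callback=self.parse)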
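The next-page step relies on response.urljoin, which resolves a possibly relative link against the URL of the current page. For the simple case the standard library's urljoin behaves the same way, so the logic can be tried outside Scrapy. This is only a sketch: the helper name next_page_url and the href values are hypothetical, and how the link is extracted from the page is not shown in the notes.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Return the absolute URL of the next page, or None when there is no link."""
    if href is None:  # last page: no "next" link was found
        return None
    return urljoin(current_url, href)  # resolve relative links against the current page


# Hypothetical href values, for illustration only.
print(next_page_url('http://lab.scrapyd.cn/page/1/', '/page/2/'))
print(next_page_url('http://lab.scrapyd.cn/page/1/', None))
```

Inside parse, a non-None result would be passed to scrapy.Request(..., callback=self.parse) so that the same callback handles every page.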


-- xx.item
import scrapy
class MyscrapyItem(scrapy.Item):
    student_id = scrapy.Field()
    student_name = scrapy.Field()



Running the project

scrapy crawl mingyan
scrapy runspider scrapy_cn.py (runs a single spider file, without a project)
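The tag value that the spider reads with getattr is supplied on the command line with the -a option, for example scrapy crawl mingyan -a tag=爱情. Scrapy passes such arguments to the spider's constructor, which stores them as attributes. The sketch below imitates that with a hypothetical stand-in class (FakeSpider is not part of Scrapy) so the URL construction can be checked without running a crawl; build_start_url is likewise an illustrative name.

```python
class FakeSpider:
    """Hypothetical stand-in for scrapy.Spider: keyword arguments become attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def build_start_url(spider, base='http://lab.scrapyd.cn/'):
    """Build the start URL, narrowing it to one tag when the spider has a tag attribute."""
    tag = getattr(spider, 'tag', None)  # None when no -a tag=... was given
    if tag is not None:
        return base + 'tag/' + tag
    return base


print(build_start_url(FakeSpider(tag='爱情')))  # as if run with -a tag=爱情
print(build_start_url(FakeSpider()))            # no argument: crawl from the base URL
```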

Debugging

scrapy shell http://lab.scrapyd.cn
>>> response.css('title')
(the debugging result is printed here)

Downloading a page

scrapy fetch http://www.scrapyd.cn >3.html

Python file I/O

filename = 'mingyan-%s.html' % page 
with open(filename, 'wb') as f:
    f.write(response.body)
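The snippet above depends on a response object, so it cannot be run on its own. A minimal sketch of the same steps with plain values follows. The function name save_page, the sample body, and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile


def save_page(url, body, directory):
    """Write body (bytes) to mingyan-<page>.html inside directory and return the path."""
    page = url.split("/")[-2]  # 'http://lab.scrapyd.cn/page/1/' gives '1'
    filename = 'mingyan-%s.html' % page
    path = os.path.join(directory, filename)
    with open(path, 'wb') as f:  # binary mode, because the body is bytes
        f.write(body)
    return path


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_page('http://lab.scrapyd.cn/page/1/', b'<html></html>', tmp)
    print(os.path.basename(saved))
```

The split relies on the URL ending with a slash; without the trailing slash, index -2 would return 'page' instead of the page number.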
