Python Crawler - Scrapy Framework: Simulated Login with Scrapy


Author: 复苏的兵马俑 | Published 2020-04-27 16:24

      Sending a POST request: sometimes we need to POST data when making a request. In that case, use `FormRequest`, a subclass of `Request`, which makes it easy to submit form data. If you want the spider to send a POST request as soon as it starts, override the `start_requests(self)` method in the spider class; the URLs in `start_urls` will then no longer be used.

    1. Create the project

    D:\学习笔记\Python学习\Python_Crawler>scrapy startproject renrenLogin
    New Scrapy project 'renrenLogin', using template directory 'c:\python38\lib\site-packages\scrapy\templates\project', created in:
        D:\学习笔记\Python学习\Python_Crawler\renrenLogin
    
    You can start your first spider with:
        cd renrenLogin
        scrapy genspider example example.com
    

    2. Create the spider

    D:\学习笔记\Python学习\Python_Crawler>cd renrenLogin
    D:\学习笔记\Python学习\Python_Crawler\renrenLogin>scrapy genspider renren "renren.com"
    Created spider 'renren' using template 'basic' in module:
      renrenLogin.spiders.renren
    

    3. Implementation

      A) settings.py configuration:

    ROBOTSTXT_OBEY = False
    
    DOWNLOAD_DELAY = 1
    
    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.9 Safari/537.36',
    }
    

      B) start.py:

    from scrapy import cmdline
    cmdline.execute("scrapy crawl renren".split())
    

      C) renren.py:

    # -*- coding: utf-8 -*-
    import scrapy
    
    
    class RenrenSpider(scrapy.Spider):
        name = 'renren'
        allowed_domains = ['renren.com']
        start_urls = ['http://renren.com/']  # unused: start_requests below overrides it
    
        def start_requests(self):
            url = "http://www.renren.com/PLogin.do"
            data = {"email": "kevin19851228@gmail.com", "password": "1qaz@WSX"}
            request = scrapy.FormRequest(url, formdata=data, callback=self.parse_page)
            yield request
    
        def parse_page(self, response):
            # with open('renren.html', 'w', encoding='utf-8') as fp:
            #     fp.write(response.text)
            request = scrapy.Request(url="http://www.renren.com/880151247/profile", callback=self.parse_profile)
            yield request
    
        def parse_profile(self, response):
            with open('dpProfile.html', 'w', encoding='utf-8') as fp:
                fp.write(response.text)
    

    4. Notes:

      1) To send a POST request, use `scrapy.FormRequest`, which makes it easy to supply the form data;
      2) To send a POST request as soon as the spider starts, override the `start_requests` method and issue the POST request there.


        Original link: https://www.haomeiwen.com/subject/lsxuwhtx.html