Web Crawler Notes

Author: 我是电饭煲 | Source: published 2020-02-11 14:27, read 0 times

    Notes on integrating with the Weibo API

    https://www.cnblogs.com/Zender/p/7229650.html?utm_source=itdadao&utm_medium=referral

    Process for cracking the Geetest captcha on bilibili login

    1. Find the URL that the validate value is sent to after a successful login



    Setting a proxy for Firefox in selenium

    https://www.cnblogs.com/lgh344902118/p/6339378.html
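
    The linked article is not reproduced here. As a minimal sketch, assuming Selenium 4 with geckodriver installed, a manual proxy can be set through Firefox preferences. The function names below are hypothetical helpers for this note, and the host and port are placeholders.

```python
def firefox_proxy_prefs(host, port):
    """Return the Firefox preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def make_firefox_with_proxy(host, port):
    """Start Firefox through the given proxy (needs selenium and geckodriver)."""
    # Imported lazily so the preference helper above works without selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_proxy_prefs(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

    Calling make_firefox_with_proxy("127.0.0.1", 8080) would return a driver whose traffic goes through that proxy, assuming one is listening there.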

    Geetest captcha recognition

    https://www.jianshu.com/p/3726581d218a?from=singlemessage

    Tutorial: setting a proxy in scrapy

    https://www.bilibili.com/video/av47963962/?p=49
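
    The video's own code is not reproduced here. One common approach, sketched below under assumptions, is a downloader middleware that sets request.meta["proxy"], which scrapy's built-in HttpProxyMiddleware then honors. The class name FixedProxyMiddleware and the PROXY_URL setting are hypothetical names chosen for this sketch.

```python
class FixedProxyMiddleware:
    """Downloader middleware that routes requests through one proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a hypothetical setting name, e.g. set in settings.py
        # as PROXY_URL = "http://127.0.0.1:8080".
        return cls(crawler.settings.get("PROXY_URL"))

    def process_request(self, request, spider):
        # Leave requests alone if they already carry their own proxy.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        return None  # None tells scrapy to continue processing the request
```

    To enable it, the class would be listed in DOWNLOADER_MIDDLEWARES in settings.py with an order number below that of HttpProxyMiddleware so that it runs first, for example {"ArticleSpider.middlewares.FixedProxyMiddleware": 350}. The module path is an assumption based on the project name used later in these notes.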

    Slider captcha recognition

    http://www.360doc.com/content/17/0623/11/5315_665775043.shtml
    https://www.zhihu.com/question/32209043
    https://blog.csdn.net/u012067766/article/details/79793264
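
    The linked write-ups are not reproduced here. A frequently described approach has two parts: compare the full background with the image that shows the gap to find the gap's x offset, then drag the slider along a track that speeds up and slows down instead of moving at constant speed. The sketch below illustrates both under simplifying assumptions: images are nested lists of grayscale values (in practice they would come from screenshots), and the threshold and acceleration ratio are arbitrary illustrative values.

```python
def find_gap_offset(full, gapped, threshold=60, start_x=0):
    """Return the x of the first column where the two images differ."""
    height, width = len(full), len(full[0])
    for x in range(start_x, width):
        for y in range(height):
            if abs(full[y][x] - gapped[y][x]) > threshold:
                return x
    return None  # no column differs by more than the threshold


def build_track(distance, accel_ratio=0.6):
    """Split distance into per-step moves: speed up first, then slow down."""
    track, moved, step = [], 0, 1
    while moved < distance:
        if moved < distance * accel_ratio:
            step += 2  # accelerating phase
        else:
            step = max(1, step - 3)  # decelerating phase, at least 1 px
        step_now = min(step, distance - moved)  # never overshoot the target
        track.append(step_now)
        moved += step_now
    return track
```

    Each value in the track could then be replayed as one ActionChains move_by_offset(step, 0) call while the slider button is held down.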

    How to pass arguments to a scrapy.Request callback

    https://www.jianshu.com/p/f451b496d1ae

    scrapy XPath tutorial

    https://blog.csdn.net/mouday/article/details/80455560

    selenium XPath tutorial

    https://blog.csdn.net/u012941152/article/details/83011110

    Tutorial: fetching structured data from AJAX requests

    https://www.jianshu.com/p/1e35bcb1cf21
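
    The linked tutorial is not reproduced here. The general idea is that an AJAX endpoint usually returns JSON, so it can be requested directly and parsed without rendering the page. A minimal standard-library sketch follows. The response layout (data, items, title) is a hypothetical example, and whether a site checks headers such as X-Requested-With is an assumption that varies per site.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Request an AJAX endpoint directly and decode its JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",  # some sites check this
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload):
    """Collect the 'title' of each item under payload['data']['items']."""
    return [item["title"] for item in payload.get("data", {}).get("items", [])]
```

    The endpoint URL itself is typically found in the browser's network panel by filtering for XHR requests while the page loads more data.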

    Generating a scrapy project

    scrapy startproject ArticleSpider
    

    Simulated login with scrapy

    https://www.jianshu.com/p/830ca5623211

    Tutorial: debugging and running scrapy in IDEA

    https://blog.csdn.net/u014738683/article/details/78072484

    Tutorial: getting elements with scrapy selectors

    https://www.cnblogs.com/my8100/p/scrapy_selectors.html

    Converting HTML to Markdown

    https://github.com/matthewwithanm/python-markdownify

    Tutorial: crawling HTTPS sites

    https://www.bilibili.com/video/av19956343/?p=144

    An easy-to-follow crawler tutorial

    https://www.bilibili.com/video/av9784617/?p=5

    Steps for using selenium

    selenium notes

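    ## Imports used by the snippets below (self.__browser is a selenium WebDriver)
    import time
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    ## Open a page
    self.__browser.get(href)
    ## Get the handle of the current window
    handle1 = self.__browser.current_window_handle
    ## Get an attribute value
    href = elem.get_attribute('href')
    ## Hover over an element (perform() is required to actually run the action)
    actions = ActionChains(self.__browser)
    actions.move_to_element(private_letter).perform()
    ## Trigger a button with Enter (when elem.click() has no effect)
    elem.send_keys(Keys.ENTER)
    ## Fill in a text box
    text.clear()
    text.send_keys(msg)
    ## Refresh the page
    self.__browser.refresh()
    ## Match a class attribute that contains spaces
    ## (find_element(By..., ...) replaces the find_element_by_* helpers, which were removed in Selenium 4.3)
    weibo_elem = weibo_elem.find_element(By.CSS_SELECTOR, "[class='WB_feed_detail clearfix']")
    ## Get the HTML inside a given tag
    content = self.__browser.find_element(By.XPATH, '//section[@id="reviewArticleSection"]').get_attribute('innerHTML')
    ## Click an <a href="javascript:void(0);"> link
    elem.click()
    ## Using a CSS selector on a child element
    elem = elem.find_element(By.CSS_SELECTOR, "div[class='Header-login-wrap']")
    ## Access content inside an iframe
    iframe = self.__browser.find_element(By.XPATH, "//iframe[@id='login-passport-frame']")
    self.__browser.switch_to.frame(iframe)
    time.sleep(3)
    elem = self.__browser.find_element(By.XPATH, "//div[@class='scanicon-toLogin js-qrcode-switch']")
    ## Switch from the iframe back to the main document
    self.__browser.switch_to.default_content()
    ## Switch to a newly opened page
    windows = self.__browser.current_window_handle  # handle of the current page
    all_handles = self.__browser.window_handles  # handles of all open pages
    for handle in all_handles:  # iterate over every handle
        if handle != windows:  # skip the page we are already on
            self.__browser.switch_to.window(handle)  # switch to the other page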
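
    The window-switching loop above can be wrapped in a small helper. This is a sketch, not part of selenium: switch_to_new_window is a hypothetical name, and it only relies on the driver exposing current_window_handle, window_handles, and switch_to.window. Unlike the loop, which ends on the last other handle, it stops at the first one.

```python
def switch_to_new_window(driver):
    """Switch to the first window handle that is not the current one.

    Returns the handle switched to, or None when no other window is open.
    """
    current = driver.current_window_handle
    for handle in driver.window_handles:
        if handle != current:
            driver.switch_to.window(handle)
            return handle
    return None
```

    Returning the handle lets the caller keep it and switch back later with driver.switch_to.window(...).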
    

    scrapy notes

    ## Get the text of an element
    title = response.selector.xpath('//section//h2[@class="article-title"]/text()').extract_first()
    

