Data Acquisition - Web Crawler Practice

Author: Fitz_Lee | Published 2018-07-08 20:04

    Introductory articles on web crawlers

    https://zhuanlan.zhihu.com/p/24669128
    https://zhuanlan.zhihu.com/p/24769534
    https://zhuanlan.zhihu.com/p/25200262
    https://zhuanlan.zhihu.com/p/26257790

    User-Agent and dynamic IP (proxy) settings

    http://lawtech0902.com/2017/06/11/scrapy-useragent-proxyip/
    https://zhuanlan.zhihu.com/p/29733174
    https://github.com/hellysmile/fake-useragent
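    A minimal sketch of the rotation idea from the links above, written as a Scrapy downloader middleware. The class name, the User-Agent strings and the proxy addresses are hypothetical placeholders (the proxies point at localhost and will not work as-is); the fake-useragent package linked above is one way to obtain real UA strings.

```python
import random

# Hypothetical User-Agent pool; swap in real strings (e.g. from fake-useragent).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/11.1.1 Safari/605.1.15",
]

# Placeholder proxy addresses, not real proxies.
PROXIES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]


class RandomUserAgentProxyMiddleware:
    """Downloader middleware: random User-Agent and proxy for every outgoing request."""

    def __init__(self, user_agents=None, proxies=None):
        self.user_agents = list(user_agents or USER_AGENTS)
        self.proxies = list(proxies if proxies is not None else PROXIES)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header sent with this request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Scrapy sends a request through the proxy stored in request.meta["proxy"].
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None lets Scrapy continue with the other middlewares and the download.
        return None
```

    To enable it, register the class in `settings.py`, e.g. `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentProxyMiddleware": 543}`, where `myproject` stands for your own project package.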

    Download delays and disabling cookies

    https://blkstone.github.io/2016/03/02/crawler-anti-anti-cheat/
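    In Scrapy, the delay and cookie advice maps onto a few project settings. A sketch of the relevant `settings.py` lines, assuming a standard Scrapy project; the numbers are illustrative, not values taken from the linked article.

```python
# settings.py (fragment); the values are illustrative, tune them per site.

# Wait about 2 seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2
# Vary each wait between 0.5x and 1.5x DOWNLOAD_DELAY so the timing looks less mechanical.
RANDOMIZE_DOWNLOAD_DELAY = True
# Neither store nor send cookies, so the site cannot link requests through a session cookie.
COOKIES_ENABLED = False
# Also keep the number of parallel requests to one domain low.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
```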

    Handling Ajax with PhantomJS and Selenium

    https://my.oschina.net/lewisgong/blog/872257
    https://chaycao.github.io/2016/08/19/Scrapy-Selenium-Phantomjs/
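    The core of the Selenium approach is to load the page in a real browser and read the DOM only after the Ajax content has arrived. A minimal sketch, assuming `driver` is an already-created Selenium WebDriver (PhantomJS is no longer maintained, so a headless Chrome or Firefox driver is the usual replacement); the function name and the hand-written polling loop are illustrative, and Selenium's own `WebDriverWait` with `expected_conditions.presence_of_element_located` does the same job in production code.

```python
import time


def fetch_after_ajax(driver, url, css_selector, timeout=10.0, poll=0.5):
    """Open url and return the rendered HTML once css_selector matches an element.

    `driver` is assumed to be a Selenium WebDriver; only get(), find_elements()
    and page_source are used.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator string behind Selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{css_selector!r} did not appear within {timeout}s")
        time.sleep(poll)
```

    The returned HTML can then be parsed like a static page, for example with Scrapy's `Selector(text=html)`.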

    Page parsing: Beautiful Soup, XPath, CSS selectors

    https://cuiqingcai.com/1319.html

    Python

    Installing lxml

    https://pypi.org/project/lxml/#files
    pip install lxml-4.2.1-cp27-cp27m-win_amd64.whl
    https://blog.csdn.net/g1apassz/article/details/46574963
    https://blog.csdn.net/acingdreamer/article/details/53348649

    Upgrading pip

    pip install --upgrade pip

    Creating and using requirements.txt

    https://blog.csdn.net/orangleliu/article/details/60958525

    Referencing modules through the Python path

    https://blog.csdn.net/tony_wong/article/details/18044273
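    A common way to import a module from a directory that is not on the Python path is to add that directory to `sys.path` at runtime. A small sketch; the helper name is hypothetical, and setting the `PYTHONPATH` environment variable achieves the same without code changes.

```python
import os
import sys


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path (once) so its modules become importable."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Front of the list: this directory wins over same-named modules found later.
        sys.path.insert(0, directory)
    return directory
```

    For example, a script in `project/tools/` could call `add_to_sys_path(os.path.join(os.path.dirname(__file__), ".."))` before `import mymodule`, where `mymodule` stands for a hypothetical module in `project/`.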

    Scrapy installation error: Microsoft Visual C++ 14.0 is required...

    https://blog.csdn.net/nima1994/article/details/74931621?locationNum=10&fps=1

    Scrapy shell

    https://blog.csdn.net/laoyang360/article/details/52809927

    Scrapy runtime error: ImportError: No module named win32api

    https://blog.csdn.net/u013687632/article/details/57075514

    XPath

    https://blog.csdn.net/manongpengzai/article/details/77109600
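    To keep the example self-contained, the sketch below uses the standard library's `xml.etree.ElementTree`, which understands only a subset of XPath (paths and `[@attr='value']` predicates) and needs well-formed markup; the sample page and function name are made up. With lxml installed, `lxml.html.fromstring(page).xpath("//div[@class='item']/a/@href")` accepts full XPath 1.0 on real-world HTML.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page (ElementTree is an XML parser, not an HTML parser).
PAGE = """<html><body>
  <div class="item"><a href="/a">First</a></div>
  <div class="item"><a href="/b">Second</a></div>
  <div class="ad"><a href="/ad">Sponsored</a></div>
</body></html>"""


def extract_items(page):
    """Return (title, href) pairs for links inside <div class="item"> elements."""
    root = ET.fromstring(page)
    # ElementTree's XPath subset: './/' = any depth, [@class='item'] = attribute filter.
    return [(a.text, a.get("href")) for a in root.findall(".//div[@class='item']/a")]
```

    Here `extract_items(PAGE)` skips the `ad` block and yields the two item links in document order.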

    Python logging

    https://blog.csdn.net/chosen0ne/article/details/7319306
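    A minimal logging setup for a crawler script using the standard `logging` module; the function name, logger name and format string are arbitrary choices. Inside a Scrapy spider, `self.logger` is already provided, and the `LOG_LEVEL` and `LOG_FILE` settings control Scrapy's own output.

```python
import logging

LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s: %(message)s"


def get_crawler_logger(name, level=logging.INFO, stream=None):
    """Return a logger that writes formatted records to `stream` (default: stderr)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once, so calling this twice does not duplicate every line.
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger
```

    Typical use: `log = get_crawler_logger("crawler")`, then `log.info("fetched %s", url)`; `log.debug(...)` calls stay silent until the level is lowered to `logging.DEBUG`.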

    Scrapy LinkExtractor

    https://www.jianshu.com/p/ff9125650697
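    Scrapy's `LinkExtractor` collects the links on a page, resolves them to absolute URLs and keeps only those matching its `allow` patterns. To show what that involves, here is a rough standard-library sketch; the class and function names are hypothetical, and in a Scrapy project you would use `LinkExtractor(allow=...)` from `scrapy.linkextractors` and call its `extract_links(response)` instead.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, allow=None):
    """Return unique absolute URLs from `html`, optionally filtered by the regex `allow`."""
    parser = _HrefCollector()
    parser.feed(html)
    parser.close()
    pattern = re.compile(allow) if allow else None
    seen, links = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if pattern and not pattern.search(url):
            continue
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

    In a `CrawlSpider`, the real extractor is usually wrapped in a `Rule(LinkExtractor(allow=...), callback=...)` so matching links are followed automatically.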

    Running the spider

    From the project's root directory, run the following command to start the spider (dmoz is the spider's name):
    scrapy crawl dmoz


Article link: https://www.haomeiwen.com/subject/xpfrrftx.html