(Optional) Create virtual environment
prefer Python 3
mkvirtualenv --python=/usr/bin/python3 python3
check the pip version to make sure Python 3 is being used
pip --version
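A sample of what that prints (version and path will differ on your machine); the python version in parentheses is the part to check:
pip 9.0.1 from /home/user/.virtualenvs/python3/lib/python3.5/site-packages (python 3.5)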
Steps
scrapy startproject name
scrapy genspider botname domain (a spider skeleton is sketched after this list)
ROBOTSTXT_OBEY in settings.py should be True so the spider only crawls permitted pages and is a good web citizen
- inside project folder
scrapy crawl botname
- test in shell
- scrapy crawl botname -o xx.json (or .csv) to export the results to a file
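For reference, scrapy genspider produces a skeleton roughly like the one below; the domain and the parse body here are illustrative additions, not generated output:

import scrapy

class BotnameSpider(scrapy.Spider):
    name = 'botname'                    # the name used by scrapy crawl botname
    allowed_domains = ['example.com']   # illustrative domain
    start_urls = ['http://example.com/']

    def parse(self, response):
        # yield one dict per page; the field and xpath are illustrative
        yield {
            'title': response.xpath('//title/text()').extract_first(),
        }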
Shell to debug and test
scrapy shell
- test that the url is valid - fetch(url)
- check the fetched html in a browser - view(response) (view() takes the response object, not response.body)
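A typical session, with an illustrative url:

scrapy shell
>>> fetch('http://example.com/')   # loads the page into response if the url is valid
>>> response.status                # 200 means success
>>> view(response)                 # opens the fetched html in your browser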
Alternative xpath testing tool
http://www.freeformatter.com/xpath-tester.html
Xpath docs
selectors work on the response; a selector, as the name suggests, selects html content:
from scrapy.selector import Selector
Since this is such a common operation, response.selector.xpath() is shortened to response.xpath()
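A minimal sketch, with an illustrative html string:

from scrapy.selector import Selector

sel = Selector(text='<html><body><h1>Hi</h1></body></html>')
sel.xpath('//h1/text()').extract()   # ['Hi']
# inside a spider callback, the shorthand does the same thing:
# response.xpath('//h1/text()').extract()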
Extra
css can also be used as a selector (response.css()), but xpath is the official way
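Both of these select the same node, for example:

response.css('title::text').extract_first()
response.xpath('//title/text()').extract_first()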
//name or //*
- selects every instance of the html tag name, or every element, anywhere in the document
text()
- text content in unicode
'//name[1]' - xpath indexing is 1-based; in python you can instead take ('//name')[0] from the result list; use either (see the sketch after this list)
.
- prefix for relative selection when extracting from a sub-selector rather than the response, e.g. './/name'; the // can also just be omitted
@
- attribute grabbing
if an itemprop attribute exists, prefer it over class when extracting
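A sketch tying these together; the tags, attributes and markup are assumptions, and response is whatever the shell or a spider callback gives you:

response.xpath('//h1').extract()           # every <h1> anywhere in the page
response.xpath('//h1/text()').extract()    # their text content, as unicode
response.xpath('//a/@href').extract()      # attribute grabbing with @
response.xpath('//h1[1]')                  # xpath's 1-based index...
response.xpath('//h1')[0]                  # ...or python's 0-based; on simple pages these pick the same node
# relative selection inside a sub-selector: prefix with . (or omit the //)
for product in response.xpath('//div[@itemprop="name"]'):
    product.xpath('.//span/text()').extract_first()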
Tools to get xpath fast -
https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl