In Python crawler development, programs can be divided by complexity into two forms: crawler projects and standalone crawler files (single scripts).
Using Scrapy improves development efficiency in either case; both workflows are sketched below.
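A minimal sketch of the two workflows (the project and script names here are illustrative, not from the original text):
- Crawler project: scrapy startproject myproject generates a full project skeleton (settings, pipelines, a spiders directory).
- Crawler file: scrapy runspider my_spider.py runs a single self-contained spider script without creating a project.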
Scrapy installation steps
Recommended installation order:
0. Connect through a VPN, or download the packages locally and install them from files.
1. First, upgrade pip: python -m pip install --upgrade pip (online installation recommended).
2. Install wheel: pip install wheel.
3. Install lxml (offline install: download the wheel from the web, then in cmd change to the download directory and type pip install lxml followed by <Tab> to complete the wheel's filename).
4. Install Twisted (offline install in the same way: pip install tw then <Tab>).
5. pip install scrapy, or pin a specific version, e.g. pip install scrapy==1.1.0rc3.
A combined example session is sketched after this list.
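Putting the steps together, a minimal sketch of the offline session (the download directory and wheel filenames below are illustrative; use the wheels that match your Python version and architecture):

python -m pip install --upgrade pip
pip install wheel
cd C:\Downloads
pip install lxml-3.8.0-cp35-cp35m-win_amd64.whl
pip install Twisted-17.9.0-cp35-cp35m-win_amd64.whl
pip install scrapy

Running scrapy version afterwards is a quick check that the installation succeeded.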
Common errors
D:\python>scrapy shell
2017-11-03 15:24:25 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-11-03 15:24:25 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-11-03 15:24:25 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
Traceback (most recent call last):
  File "d:\python\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\python\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "d:\python\lib\site-packages\scrapy\cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "d:\python\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "d:\python\lib\site-packages\scrapy\cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "d:\python\lib\site-packages\scrapy\commands\shell.py", line 67, in run
    crawler.engine = crawler._create_engine()
  File "d:\python\lib\site-packages\scrapy\crawler.py", line 102, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "d:\python\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "d:\python\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "d:\python\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "d:\python\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "d:\python\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
    mod = import_module(module)
  File "d:\python\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "d:\python\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 20, in <module>
    from twisted.web.client import ResponseFailed
  File "d:\python\lib\site-packages\twisted\web\client.py", line 42, in <module>
    from twisted.internet.endpoints import HostnameEndpoint, wrapClientTLS
  File "d:\python\lib\site-packages\twisted\internet\endpoints.py", line 41, in <module>
    from twisted.internet.stdio import StandardIO, PipeAddress
  File "d:\python\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
    from twisted.internet import _win32stdio
  File "d:\python\lib\site-packages\twisted\internet\_win32stdio.py", line 9, in <module>
    import win32api
ImportError: No module named 'win32api'
Solution:
http://blog.csdn.net/olanlanxiari/article/details/48196255
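The traceback shows that Twisted's Windows stdio support (twisted.internet._win32stdio) imports win32api, which is provided by the pywin32 package, so Scrapy on Windows also needs pywin32 installed. A minimal sketch of the commonly cited fix (assuming pip can reach PyPI; the linked post covers the manual-installer route):

pip install pywin32
scrapy shell

If a pywin32 wheel is not available for your Python build, the pypiwin32 package or the standalone pywin32 installer are the usual alternatives; re-run scrapy shell afterwards to confirm the ImportError is gone.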