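Tools and Libraries Used for Python Web Scraping
2018-04-23 00:40:58 彭世瑜
Copyright notice: This is the author's original article. Reposting is welcome; please credit the source: https://blog.csdn.net/mouday/article/details/80045172
Tools and libraries to install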
python https://www.python.org/
pycharm https://www.jetbrains.com/pycharm/
Both can be downloaded and installed directly from their official websites.
urllib re
>>> from urllib.request import urlopen
>>> response = urlopen("http://www.baidu.com")
>>> response
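The snippet above only exercises urllib, so here is a minimal sketch of how re might be combined with it to pull a value out of the downloaded page. The helper names `extract_title` and `fetch_title` and the title pattern are illustrative assumptions, not part of the original post, and a regular expression is only a rough tool for HTML.

```python
import re
from urllib.request import urlopen

# Hypothetical pattern for illustration: captures the text of the first <title> tag.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def fetch_title(url, timeout=10):
    """Download a page with urlopen and extract its title with re."""
    with urlopen(url, timeout=timeout) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

Calling `fetch_title("http://www.baidu.com")` would need network access; `extract_title` can be tried on any HTML string offline.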
requests http://cn.python-requests.org/zh_CN/latest/
>>> import requests
>>> response = requests.get("http://www.baidu.com")
>>> response
selenium https://www.seleniumhq.org/
chromedriver
Official Google site: https://sites.google.com/a/chromium.org/chromedriver/downloads
Taobao mirror: https://npm.taobao.org/mirrors/chromedriver/
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("http://www.baidu.com")
>>> driver.get("https://www.python.org")
>>> html = driver.page_source
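Before starting Chrome through selenium, it can help to confirm that the chromedriver executable is reachable. The sketch below assumes the driver is expected on PATH, which is how older Selenium releases locate it unless a path is passed explicitly; newer releases may find or download a driver on their own. The helper names are hypothetical.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None."""
    return shutil.which(name)


def driver_status(name="chromedriver"):
    """Describe whether the driver was found (assumes a PATH-based setup)."""
    path = find_driver(name)
    if path is None:
        return f"{name} not found on PATH"
    return f"{name} found at {path}"
```

Printing `driver_status()` before calling `webdriver.Chrome()` gives a quick hint about whether a failure is a setup problem.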
phantomjs http://phantomjs.org/
>>> from selenium import webdriver
>>> # Note: webdriver.PhantomJS was removed in Selenium 4; this needs Selenium 3.x
>>> driver = webdriver.PhantomJS()
>>> driver.get("http://www.baidu.com")
>>> html = driver.page_source
lxml http://lxml.de/
beautifulsoup4 https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
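>>> from bs4 import BeautifulSoup as BS
>>> # The original markup was lost in extraction; this value is an assumption borrowed from the pyquery example
>>> html = "<html><h1>title</h1></html>"
>>> soup = BS(html, "lxml")
>>> soup.h1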
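The example above only uses lxml as the parser behind BeautifulSoup. As a rough sketch, lxml can also be used directly with XPath; the sample markup and variable names below are made up for illustration.

```python
from lxml import html as lxml_html

# Hypothetical sample page, not taken from the original post.
page = (
    "<html><body><h1>title</h1>"
    "<a href='/a'>A</a><a href='/b'>B</a></body></html>"
)

# Parse the string into an element tree.
tree = lxml_html.fromstring(page)

# xpath() returns a list: text nodes for text(), attribute values for @href.
heading = tree.xpath("//h1/text()")
links = tree.xpath("//a/@href")
```

Both results are lists, so a missing element shows up as an empty list rather than an exception.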
pyquery https://pythonhosted.org/pyquery/
>>> from pyquery import PyQuery as pq
>>> html = "<html><h1>title</h1></html>"
>>> doc = pq(html)
>>> doc("html").text()
'title'
>>> doc("h1").text()
'title'
mysql https://dev.mysql.com/downloads/mysql/
redis https://redis.io/
mongodb https://www.mongodb.com/
On macOS, these can be installed with brew https://docs.brew.sh/
pymysql https://pypi.org/project/PyMySQL/
>>> import pymysql
>>> conn = pymysql.connect(host="localhost", user="root", password="123456", port=3306, db="demo")
>>> cursor = conn.cursor()
>>> sql = "select * from mytable"
>>> cursor.execute(sql)
3
>>> cursor.fetchone()
(1, datetime.date(2018, 4, 14))
>>> cursor.close()
>>> conn.close()
pymongo http://api.mongodb.com/python/current/index.html
>>> import pymongo
>>> client = pymongo.MongoClient("localhost")
>>> db = client["newtestdb"]
>>> # insert() is deprecated since PyMongo 3.0 and removed in 4.0; use insert_one() there
>>> db["table"].insert({"name": "Tom"})
ObjectId('5adcb250d7696c839a251658')
>>> db["table"].find_one({"name": "Tom"})
{'_id': ObjectId('5adcb250d7696c839a251658'), 'name': 'Tom'}
redis
>>> import redis
>>> r = redis.Redis("localhost", 6379)
>>> r.set("name", "Tom")
True
>>> r.get("name")
b'Tom'
Web framework packages:
flask http://docs.jinkan.org/docs/flask/
django https://www.djangoproject.com/
jupyter http://jupyter.org/
Run: jupyter notebook
Shortcut to insert a new cell below (in command mode): b
pip install requests selenium beautifulsoup4 pyquery pymysql pymongo redis flask django jupyter
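After installing, a quick check like the following sketch can show which libraries are still missing. It relies on the fact that the pip name and the import name sometimes differ (beautifulsoup4 is imported as bs4); the mapping and the helper name `missing_packages` are assumptions for illustration, and jupyter is left out because it is used as a command-line tool.

```python
from importlib.util import find_spec

# pip package name -> top-level import name (they are not always the same).
PACKAGES = {
    "requests": "requests",
    "selenium": "selenium",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "django": "django",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    ]
```

An empty list from `missing_packages()` suggests every listed library is importable in the current environment.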