The cluster already has CDH and Hive installed, so this counts as a mixed deployment. I started configuring Presto from scratch following various tutorials and kept hitting errors, including:
- Discovery service could not be found
io.airlift.discovery.client.CachingServiceSelector
Cannot connect to discovery server
- Newer Presto releases complaining that the JDK 8 minor version is too old
Presto requires Java 8u151+
and so on, until I switched to this quick-and-dirty setup guide for version 0.177:
https://axsauze.github.io/hadoop-overview/section-7/7-8.html
Server installation and configuration
Run the steps below on nodes 4, 5, and 6 of the test cluster; node 4 acts as the coordinator and the other two as workers.
cd /home/work/
# Download Presto
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.177/presto-server-0.177.tar.gz
# Extract it
tar -xvf presto-server-0.177.tar.gz
cd presto-server-0.177
# Download the sample configuration files
wget http://media.sundog-soft.com/hadoop/presto-hdp-config.tgz
tar -xvf presto-hdp-config.tgz
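Presto reads its configuration from the etc/ directory under the installation directory, so before going further it is worth confirming the unpacked files ended up where Presto expects (a quick hedged check; the exact layout of the downloaded tarball may differ):
# node.properties, jvm.config, config.properties, log.properties and catalog/ should be present
ls etc/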
Review the configuration files
cat node.properties
node.environment=production # must be identical on every node
node.id=f7c4bf3c-dbb4-4807-baae-9b7e41807bc4 # must be unique on every node
node.data-dir=/var/presto/data # create this directory if it does not exist
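Since node.id must be unique on every node and /var/presto/data usually does not exist yet, something like this can be run on each of the three nodes before starting the server (a minimal sketch, run from the directory containing node.properties; uuidgen is assumed to be available):
# Patch in a fresh, node-specific id and create the data directory
sed -i "s/^node.id=.*/node.id=$(uuidgen)/" node.properties
mkdir -p /var/presto/data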
cat log.properties
com.facebook.presto=WARN
cat jvm.config
# Adjust to your machine's resources
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
cat config.properties
# coordinator configuration
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8090
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://10.1.249.55:8090 # internal address of the coordinator
# worker configuration
coordinator=false
# node-scheduler.include-coordinator=true
http-server.http.port=8090
query.max-memory=10GB
query.max-memory-per-node=1GB
#discovery-server.enabled=true
discovery.uri=http://10.1.249.55:8090 # internal address of the coordinator
Edit the metastore URI in the Hive connector configuration
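One way to get this onto the workers is to copy the whole installation from node 4 and then adjust the per-node pieces (a sketch; worker-node-5 is a placeholder hostname, not one from this cluster):
# Copy the installation to a worker node (repeat for the other worker)
scp -r /home/work/presto-server-0.177 worker-node-5:/home/work/
# Then on the worker: set coordinator=false, comment out discovery-server.enabled,
# give it a unique node.id, and keep discovery.uri pointing at node 4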
cat catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://bigdata-hadoop-test-datanode02.com:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
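Before starting anything, it can save time to confirm that each Presto node can reach the metastore Thrift port configured above (a quick check, assuming nc is available):
# Verify the Hive metastore port is reachable from this node
nc -zv bigdata-hadoop-test-datanode02.com 9083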
Starting the server
Start the coordinator (which also hosts the discovery service) first:
bin/launcher start
Then start each worker node with the same command.
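To confirm the processes came up and the workers registered, check the launcher status and the server log under node.data-dir, and ask the coordinator which nodes it sees via its /v1/node REST endpoint (a sketch based on the paths and addresses configured above):
# On each node: check the launcher and watch the server log
bin/launcher status
tail -f /var/presto/data/var/log/server.log
# From any node: list the nodes the coordinator currently sees
curl http://10.1.249.55:8090/v1/node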
Installing the client
Install it on node 4:
cd bin
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.177/presto-cli-0.177-executable.jar
# And rename it
mv presto-cli-0.177-executable.jar presto
# and make it executable
chmod +x presto
Usage
bin/presto --server 127.0.0.1:8090 --catalog hive
# with only one worker started so far
presto> select * from test.t_employee;
id | emp_name | dep_name | salary | age
----+----------+------------+--------+-----
12 | Matthew | Management | 4500.0 | 20
13 | Matthew | Management | 4600.0 | 60
1 | Matthew | Management | 4500.0 | 55
2 | Olivia | Management | 4400.0 | 61
3 | Grace | Management | 4000.0 | 42
4 | Jim | Production | 3700.0 | 35
5 | Alice | Production | 3500.0 | 24
6 | Michael | Production | 3600.0 | 28
7 | Tom | Production | 3800.0 | 35
8 | Kevin | Production | 4000.0 | 52
9 | Elvis | Service | 4100.0 | 40
10 | Sophia | Sales | 4300.0 | 36
11 | Samantha | Sales | 4100.0 | 38
(13 rows)
Query 20190517_081300_00005_xwftv, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
0:01 [13 rows, 377B] [10 rows/s, 305B/s]
# after starting the second worker and reconnecting the client
presto> select count(*) from peiyou4.t_employee;
_col0
-------
13
(1 row)
Query 20190517_083441_00006_xwftv, FINISHED, 3 nodes
Splits: 20 total, 20 done (100.00%)
0:01 [13 rows, 377B] [9 rows/s, 289B/s]
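The node count shown in the query summary can also be cross-checked against Presto's built-in system catalog, for example via the CLI's --execute flag (a quick sketch):
# List the nodes currently registered with the coordinator
bin/presto --server 127.0.0.1:8090 --execute "SELECT * FROM system.runtime.nodes"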
Finally, a Python connection example (the prestodb module here is from the presto-python-client package):
import prestodb

# Connect through the coordinator's HTTP port using the hive catalog
conn = prestodb.dbapi.connect(
    host='localhost',
    port=8090,
    user='me',
    catalog='hive',
    schema='test',
)
cur = conn.cursor()
cur.execute('SELECT * FROM test.t_employee')
rows = cur.fetchall()
for row in rows:
    print(row)
Done.