Preface
I recently took over data-processing code written by a business team. Since they are not familiar with big-data tooling, their whole pipeline pulls data from Presto over JDBC, processes it with pandas and other Python libraries, and then delivers the results to the business side. To make that code easier to debug, I set up a local Presto installation today.
Presto Architecture

Installation Steps
1. The official download links are below; this walkthrough uses presto-0.266. Two packages are needed:
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.266/presto-server-0.266.tar.gz
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.266/presto-cli-0.266-executable.jar
- Extract the server package and set up the CLI tool (the CLI is a self-executing jar, not a tarball, so it only needs to be copied and renamed)
tar -zxvf presto-server-0.266.tar.gz -C ~/bigdata
cp presto-cli-0.266-executable.jar ~/bigdata/presto-server-0.266/bin
# inside the presto-server-0.266/bin directory
mv presto-cli-0.266-executable.jar presto
chmod +x presto
- Create the required directories
# created under ~/bigdata to hold Presto's data; keeping it outside the presto-server directory makes future upgrades easier
mkdir presto-directory
# created under presto-server-0.266
mkdir etc
2. Create the required files under etc
- vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff-001
node.data-dir=/Users/dengpengfei/bigdata/presto-directory
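node.id must be unique per node and stay stable across restarts; the all-f value above is just a placeholder-style example. A quick way to generate a real one is with a UUID (a minimal sketch):

```python
import uuid

# generate an identifier once and paste it into node.properties;
# node.id must remain the same across restarts of this node
node_id = str(uuid.uuid4())
print("node.id={}".format(node_id))
```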
- vim jvm.config
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
- vim config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=2GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
- vim log.properties
com.facebook.presto=INFO
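The memory settings in config.properties have to be mutually consistent: query.max-memory-per-node cannot exceed query.max-total-memory-per-node, and the per-node limits must fit inside the JVM heap (-Xmx4G in jvm.config above), or the server refuses to start. A rough sanity check, assuming plain `<number>GB` values only (real Presto accepts more unit spellings):

```python
def parse_gb(value):
    # assumes values like "2GB"; strips the trailing G/B characters
    return float(value.rstrip("GB"))

# values copied from the config.properties above
config = {
    "query.max-memory-per-node": "1GB",
    "query.max-total-memory-per-node": "2GB",
}
heap_gb = 4.0  # from -Xmx4G in jvm.config

per_node = parse_gb(config["query.max-memory-per-node"])
total_per_node = parse_gb(config["query.max-total-memory-per-node"])

assert per_node <= total_per_node, "max-memory-per-node exceeds max-total-memory-per-node"
assert total_per_node < heap_gb, "per-node memory limit does not fit in the JVM heap"
```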
3. Catalog configuration
- Create the catalog directory
# created under the etc directory
mkdir catalog
- vim mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=123456
- vim hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/Users/dengpengfei/bigdata/hadoop-3.2.2/etc/hadoop/core-site.xml,/Users/dengpengfei/bigdata/hadoop-3.2.2/etc/hadoop/hdfs-site.xml
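Catalog files are plain key=value Java properties, so they can also be generated programmatically if you manage several environments; a throwaway sketch (the settings mirror the mysql.properties above, and the temp directory stands in for etc/catalog):

```python
import os
import tempfile

def write_catalog(directory, name, props):
    # write a Presto catalog file (<name>.properties) as key=value lines
    path = os.path.join(directory, name + ".properties")
    with open(path, "w") as f:
        for key, value in props.items():
            f.write("{}={}\n".format(key, value))
    return path

catalog_dir = tempfile.mkdtemp()  # use etc/catalog in a real install
path = write_catalog(catalog_dir, "mysql", {
    "connector.name": "mysql",
    "connection-url": "jdbc:mysql://localhost:3306",
    "connection-user": "root",
})
content = open(path).read()
```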
4. Start the Hive metastore
# run from the Hive installation directory
nohup bin/hive --service metastore >/dev/null 2>&1 &
Note: hive-env.sh must have the Hadoop installation configured, or the metastore will not start.
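Since the metastore is backgrounded with output discarded, a quick way to confirm it actually came up is to probe its thrift port (9083 from hive.properties above); a small helper, assuming nothing more than a TCP connect check:

```python
import socket

def port_open(host, port, timeout=1.0):
    # returns True if something is accepting connections on host:port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# after starting the metastore, port_open("localhost", 9083) should be True
```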
5. Start Presto
- Launch command
# run from presto-server-0.266, in the foreground
bin/launcher run
or
# run from presto-server-0.266, in the background
bin/launcher start
- Run the client
# from the presto-server-0.266 directory
bin/presto
- Run a SQL query
dengpengfei@192 presto-server-0.266 % bin/presto
presto> select * from hive.userdb.employee;
id | name | salary | desc
----+-------+--------+------------
1 | david | 10000 | 一月份工资
(1 row)
Query 20211207_153723_00002_nxyy2, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
142ms [1 rows, 30B] [7 rows/s, 211B/s]
6. Connecting to Presto from Python (pyhive provides the DB-API-style access that the JDBC pipeline mentioned in the preface relies on)
import logging
import os
import traceback

import pandas as pd
from pyhive import presto

from config.presto_conf import *  # provides HOST, PORT, USERNAME


class PrestoHelper(object):
    logger = logging.getLogger("PrestoHelper")

    def __init__(self):
        self.conn = presto.connect(host=HOST, port=PORT, username=os.environ.get("USER", USERNAME))
        self.cursor = self.conn.cursor()
        print('presto connected')

    def get_last_item(self, sql):
        # return the largest value in the first column, or None on failure
        try:
            self.cursor.execute(sql)
            ret = self.cursor.fetchall()
            return sorted([r[0] for r in ret])[-1]
        except Exception:
            self.logger.warning("Failed to execute: {}".format(sql))
            return None

    def get_all_item(self, sql):
        # fetch all rows, each normalized to a plain list
        try:
            self.cursor.execute(sql)
            ret = self.cursor.fetchall()
            return [[rr for rr in r] for r in ret]
        except Exception:
            self.logger.warning("Failed to execute: {}".format(sql))
            return None

    def iterate_sql_result_as_dict(self, sql):
        # stream rows as {column_name: value} dicts
        self.cursor.execute(sql)
        try:
            columns = [d[0] for d in self.cursor.description]
        except Exception:
            self.logger.warning("Failed to execute: {}".format(sql))
            return
        for doc in self.cursor:
            # build a fresh dict per row so callers can safely keep references
            yield dict(zip(columns, doc))

    def get_df(self, sql):
        # fetch the result set into a pandas DataFrame with proper column names
        try:
            ret = self.get_all_item(sql)
            df = pd.DataFrame(ret)
            df.columns = [d[0] for d in self.cursor.description]
            return df
        except Exception:
            traceback.print_exc()
            self.logger.error("Failed to execute: {}".format(sql))
            return None


if __name__ == "__main__":
    helper = PrestoHelper()
    for x in helper.iterate_sql_result_as_dict("select * from hive.userdb.employee"):
        print(x)
Output:
/Users/dengpengfei/PycharmProjects/PySpark/venv/bin/python /Users/dengpengfei/PycharmProjects/brunei-message-manage/utils/presto_helper.py
presto connected
{'id': 1, 'name': 'david', 'salary': 10000, 'desc': '一月份工资'}
Process finished with exit code 0
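The row-to-dict logic in iterate_sql_result_as_dict can be exercised without a live Presto by faking the two DB-API pieces it relies on, cursor.description and row iteration; a minimal sketch (FakeCursor and its sample rows are made up for illustration):

```python
class FakeCursor(object):
    # mimics a DB-API cursor: .description holds column metadata,
    # and iterating the cursor yields row tuples
    description = [("id",), ("name",), ("salary",)]
    _rows = [(1, "david", 10000), (2, "alice", 12000)]

    def __iter__(self):
        return iter(self._rows)

def rows_as_dicts(cursor):
    # same pattern as PrestoHelper.iterate_sql_result_as_dict:
    # pair each row with the column names, one fresh dict per row
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))

rows = list(rows_as_dicts(FakeCursor()))
```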
Summary
This post walked through setting up a local Presto environment and finished with a small Python query example.