Environment notes: integrating the CarbonData Thrift Server with OSS - environment setup
1. Start the CarbonData Thrift Server
/home/carbondata/spark-2.2.1-bin-hadoop2.7/bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer $SPARK_HOME/carbonlib/apache-carbondata-1.6.0-SNAPSHOT-bin-spark2.2.1-hadoop2.7.2.jar <table-path-on-s3> <access-key> <secret-key> <s3-endpoint>
Here access-key, secret-key, s3-endpoint, and table-path-on-s3 are the four basic elements described in the CarbonData OSS bucket management section.
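The four parameters can be kept in one place with a small wrapper script. A minimal sketch; the bucket path, keys, and endpoint below are hypothetical placeholders, not real credentials, and the assembled command is only printed for review:

```shell
#!/bin/sh
# Hypothetical values -- substitute your own OSS bucket details.
TABLE_PATH="s3a://demo20190203/carbon/session/data/store"
ACCESS_KEY="YOUR_ACCESS_KEY"
SECRET_KEY="YOUR_SECRET_KEY"
ENDPOINT="oss-cn-hangzhou.aliyuncs.com"   # example endpoint format only
CARBON_JAR="$SPARK_HOME/carbonlib/apache-carbondata-1.6.0-SNAPSHOT-bin-spark2.2.1-hadoop2.7.2.jar"

# Assemble the spark-submit invocation from step 1.
CMD="$SPARK_HOME/bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  $CARBON_JAR $TABLE_PATH $ACCESS_KEY $SECRET_KEY $ENDPOINT"

echo "$CMD"   # print for review; run it once the values are filled in
```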
2. Connect to the Thrift Server with Beeline
1) Beeline command
cd $SPARK_HOME
./bin/beeline -u jdbc:hive2://localhost:10000
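Besides the interactive session above, Beeline can run statements non-interactively with `-f` (script file) or `-e` (inline SQL). A minimal sketch that stages the statements used in the following steps in a file; the file path is an arbitrary choice, and the Beeline invocation is commented out because it needs the Thrift Server from step 1 to be running:

```shell
# Stage the SQL from the following steps in a script file.
cat > /tmp/carbon_demo.sql <<'EOF'
CREATE TABLE IF NOT EXISTS test_table (id STRING, name STRING, city STRING, age INT)
STORED AS carbondata LOCATION 's3a://demo20190203/carbon/session/data/store';
SELECT city, avg(age), sum(age) FROM test_table GROUP BY city;
EOF

# Execute it against the running Thrift Server (requires the server from step 1):
# $SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -f /tmp/carbon_demo.sql
```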
2) Create a table in Beeline
CREATE TABLE IF NOT EXISTS test_table (id STRING, name STRING, city STRING, age INT) STORED AS carbondata LOCATION 's3a://demo20190203/carbon/session/data/store';
The data in OSS after table creation: (screenshot omitted)
3) Load data into OSS from Beeline
LOAD DATA INPATH 'hdfs://localhost:9000/tmp/sample.csv' INTO TABLE test_table;
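The CSV must match the four-column schema of test_table (id, name, city, age). A minimal sketch that generates such a file, with made-up sample rows, and stages it at the HDFS path the LOAD DATA statement above expects; the HDFS step is commented out because it requires a running HDFS at localhost:9000:

```shell
# Create a CSV whose columns match test_table: id, name, city, age.
cat > /tmp/sample.csv <<'EOF'
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

# Copy it into HDFS so LOAD DATA INPATH can find it
# (requires a running HDFS at localhost:9000):
# hdfs dfs -put -f /tmp/sample.csv hdfs://localhost:9000/tmp/sample.csv
```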
The data in OSS after the load: (screenshots omitted)
4) Query the data in Beeline
SELECT city, avg(age), sum(age) FROM test_table GROUP BY city;
The query result: (screenshot omitted)
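The GROUP BY query computes, per city, the average and the sum of age. The same aggregation can be sketched locally with awk over a CSV in the table's column order (id, name, city, age); the file and its rows below are made-up sample data:

```shell
# Made-up sample data in the table's column layout.
printf 'id,name,city,age\n1,david,shenzhen,31\n2,eason,shenzhen,27\n3,jarry,wuhan,35\n' > /tmp/agg_demo.csv

# Per-city avg(age) and sum(age), skipping the header row;
# $3 is the city column, $4 is the age column.
awk -F, 'NR > 1 { sum[$3] += $4; cnt[$3]++ }
         END { for (c in sum) printf "%s\t%.1f\t%d\n", c, sum[c]/cnt[c], sum[c] }' /tmp/agg_demo.csv
```

Note that awk's `for (c in sum)` iterates in unspecified order, whereas SQL output order is likewise unspecified without an ORDER BY clause.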