Sample Java class:
org.apache.spark.examples.sql.hive.JavaSparkHiveExample
A few modifications to the stock example:
SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark Hive jar Example")
    // the master is passed on the spark-submit command line instead:
    //.master("spark://hadoopnode3:7077")
    .config("spark.executor.memory", "512m")
    //.config("spark.sql.warehouse.dir", warehouseLocation)
    .enableHiveSupport()
    .getOrCreate();
spark.sql("use myhive");
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
// LOAD DATA LOCAL INPATH reads the driver's local filesystem; load from HDFS instead:
//spark.sql("LOAD DATA LOCAL INPATH 'resources/kv1.txt' INTO TABLE src");
spark.sql("LOAD DATA INPATH '/resources/kv1.txt' INTO TABLE src");
kv1.txt is a simple two-column, space-separated data file; after the job runs it disappears from /resources automatically, because LOAD DATA INPATH moves the file into the table's warehouse directory rather than copying it.
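For context, the rest of the stock JavaSparkHiveExample is left unchanged. Its query section looks roughly like the sketch below (reconstructed from the Spark example source, not copied from this build); these are the three queries whose results appear in the log output further down:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Print every row of src (the "show at JavaSparkHiveExample.java:87" job):
spark.sql("SELECT * FROM src").show();
// Count the rows; 502 after two loads, as shown below:
spark.sql("SELECT COUNT(*) FROM src").show();
// Render the low keys as strings ("Key: 0, Value: val_0", ...):
Dataset<Row> sqlDF = spark.sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key");
Dataset<String> stringsDS = sqlDF.map(
    (MapFunction<Row, String>) row -> "Key: " + row.get(0) + ", Value: " + row.get(1),
    Encoders.STRING());
stringsDS.show();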
1. Set up the Java project in Eclipse
2. Export a jar file from Eclipse
a. Right-click the Java file and export it to a jar file: hiveTesting.jar
b. Upload the txt/JSON files the example needs:
hadoop fs -put ./resources /resources
3. Upload the jar to a testbed node on the Hadoop cluster:
/tmp/hiveTesting.jar
- spark-submit:
spark-submit --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample --master spark://hadoopnode3:7077 /tmp/hiveTesting.jar
Because one of the Hadoop nodes was down, --master points directly at the standalone master here rather than submitting to the YARN cluster.
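With every node up, the same jar could go through YARN instead; a hedged alternative invocation (not what was run here) would be:

spark-submit --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample --master yarn --deploy-mode cluster /tmp/hiveTesting.jar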
2019-01-04 12:21:58 INFO metastore:376 - Trying to connect to metastore with URI thrift://hadoopnode3:9083
2019-01-04 12:21:58 INFO metastore:472 - Connected to metastore.
2019-01-04 12:21:58 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.2.28.8:53935) with ID 0
2019-01-04 12:21:58 INFO BlockManagerMasterEndpoint:54 - Registering block manager 10.2.28.8:54120 with 93.3 MB RAM, BlockManagerId(0, 10.2.28.8, 54120, None)
2019-01-04 12:22:00 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.2.135.104:46928) with ID 1
2019-01-04 12:22:00 INFO BlockManagerMasterEndpoint:54 - Registering block manager 10.2.135.104:35993 with 93.3 MB RAM, BlockManagerId(1, 10.2.135.104, 35993, None)
2019-01-04 12:22:19 INFO SessionState:641 - Created local directory: /opt/hive/iotmp/dd87c362-9b6e-4f83-b21a-14196a6cd64d_resources
2019-01-04 12:22:19 INFO SessionState:641 - Created HDFS directory: /user/hive/tmp/hadoop/dd87c362-9b6e-4f83-b21a-14196a6cd64d
2019-01-04 12:22:19 INFO SessionState:641 - Created local directory: /opt/hive/iotmp/root/dd87c362-9b6e-4f83-b21a-14196a6cd64d
2019-01-04 12:22:19 INFO SessionState:641 - Created HDFS directory: /user/hive/tmp/hadoop/dd87c362-9b6e-4f83-b21a-14196a6cd64d/_tmp_space.db
2019-01-04 12:22:19 INFO HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
2019-01-04 12:22:21 INFO SQLStdHiveAccessController:95 - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=dd87c362-9b6e-4f83-b21a-14196a6cd64d, clientType=HIVECLI]
2019-01-04 12:22:21 INFO metastore:291 - Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
2019-01-04 12:22:21 INFO metastore:376 - Trying to connect to metastore with URI thrift://hadoopnode3:9083
2019-01-04 12:22:21 INFO metastore:472 - Connected to metastore.
2019-01-04 12:22:21 INFO metastore:376 - Trying to connect to metastore with URI thrift://hadoopnode3:9083
2019-01-04 12:22:21 INFO metastore:472 - Connected to metastore.
2019-01-04 12:22:22 ERROR KeyProviderCache:87 - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
2019-01-04 12:22:22 INFO Hive:2641 - Renaming src: hdfs://ns1/resources/kv1.txt, dest: hdfs://ns1/user/hive/warehouse/myhive.db/src/kv1_copy_1.txt, Status:true
2019-01-04 12:23:04 INFO CodeGenerator:54 - Code generated in 263.536255 ms
2019-01-04 12:23:04 INFO CodeGenerator:54 - Code generated in 34.751521 ms
2019-01-04 12:23:05 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 618.1 KB, free 911.7 MB)
2019-01-04 12:23:06 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 54.9 KB, free 911.6 MB)
2019-01-04 12:23:06 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hadoopnode3:41685 (size: 54.9 KB, free: 912.2 MB)
2019-01-04 12:23:06 INFO SparkContext:54 - Created broadcast 0 from
2019-01-04 12:23:07 INFO FileInputFormat:249 - Total input paths to process : 2
2019-01-04 12:23:07 INFO SparkContext:54 - Starting job: show at JavaSparkHiveExample.java:87
2019-01-04 12:23:07 INFO DAGScheduler:54 - Got job 0 (show at JavaSparkHiveExample.java:87) with 1 output partitions
2019-01-04 12:23:07 INFO DAGScheduler:54 - Final stage: ResultStage 0 (show at JavaSparkHiveExample.java:87)
2019-01-04 12:23:07 INFO DAGScheduler:54 - Parents of final stage: List()
2019-01-04 12:23:07 INFO DAGScheduler:54 - Missing parents: List()
2019-01-04 12:23:07 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[6] at show at JavaSparkHiveExample.java:87), which has no missing parents
2019-01-04 12:23:07 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 11.8 KB, free 911.6 MB)
2019-01-04 12:23:07 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.8 KB, free 911.6 MB)
2019-01-04 12:23:07 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hadoopnode3:41685 (size: 5.8 KB, free: 912.2 MB)
2019-01-04 12:23:07 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
2019-01-04 12:23:07 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at show at JavaSparkHiveExample.java:87) (first 15 tasks are for partitions Vector(0))
2019-01-04 12:23:07 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 1 tasks
2019-01-04 12:23:07 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, 10.2.28.8, executor 0, partition 0, ANY, 7903 bytes)
2019-01-04 12:23:08 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on 10.2.28.8:54120 (size: 5.8 KB, free: 93.3 MB)
2019-01-04 12:23:08 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 10.2.28.8:54120 (size: 54.9 KB, free: 93.2 MB)
2019-01-04 12:23:30 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 22968 ms on 10.2.28.8 (executor 0) (1/1)
2019-01-04 12:23:30 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-01-04 12:23:30 INFO DAGScheduler:54 - ResultStage 0 (show at JavaSparkHiveExample.java:87) finished in 23.171 s
2019-01-04 12:23:30 INFO DAGScheduler:54 - Job 0 finished: show at JavaSparkHiveExample.java:87, took 23.556848 s
+---+-------+
|key| value|
+---+-------+
|238|val_238|
| 86| val_86|
|311|val_311|
| 27| val_27|
|165|val_165|
|409|val_409|
|255|val_255|
|278|val_278|
| 98| val_98|
|484|val_484|
|265|val_265|
|193|val_193|
|401|val_401|
|150|val_150|
|273|val_273|
|224|val_224|
|369|val_369|
| 66| val_66|
|128|val_128|
|213|val_213|
+---+-------+
only showing top 20 rows
2019-01-04 12:23:59 INFO DAGScheduler:54 - Job 1 finished: show at JavaSparkHiveExample.java:97, took 27.595771 s
+--------+
|count(1)|
+--------+
| 502|
+--------+
2019-01-04 12:24:03 INFO DAGScheduler:54 - Job 5 finished: show at JavaSparkHiveExample.java:111, took 0.074884 s
+--------------------+
| value|
+--------------------+
|Key: 0, Value: val_0|
|Key: 0, Value: val_0|
|Key: 0, Value: val_0|
|Key: 2, Value: val_2|
|Key: 4, Value: val_4|
|Key: 5, Value: val_5|
|Key: 5, Value: val_5|
|Key: 5, Value: val_5|
|Key: 8, Value: val_8|
|Key: 9, Value: val_9|
+--------------------+
2019-01-04 12:24:04 INFO AbstractConnector:318 - Stopped Spark@6230c6fb{HTTP/1.1,[http/1.1]}{10.2.28.8:4041}
2019-01-04 12:24:04 INFO SparkUI:54 - Stopped Spark web UI at http://hadoopnode3:4041
2019-01-04 12:24:04 INFO StandaloneSchedulerBackend:54 - Shutting down all executors
2019-01-04 12:24:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asking each executor to shut down
2019-01-04 12:24:04 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-01-04 12:24:04 INFO MemoryStore:54 - MemoryStore cleared
2019-01-04 12:24:04 INFO BlockManager:54 - BlockManager stopped
2019-01-04 12:24:04 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-01-04 12:24:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-01-04 12:24:04 INFO SparkContext:54 - Successfully stopped SparkContext
2019-01-04 12:24:04 INFO ShutdownHookManager:54 - Shutdown hook called
2019-01-04 12:24:04 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-80628156-cd36-433a-813d-71dccf299d09
2019-01-04 12:24:04 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e17b4714-922b-4e8d-bcfd-00d2ca298b1a
The job was run twice. Each run's LOAD DATA INPATH moves /resources/kv1.txt into the table directory, and Hive renames the second file to kv1_copy_1.txt to avoid a collision (see the "Renaming src" line in the log above). The resulting HDFS listing:
[hadoop@hadoopnode3 ~]$ hadoop fs -ls /user/hive/warehouse/myhive.db/src
Found 2 items
-rwxr-xr-x 2 hadoop supergroup 5812 2019-01-04 11:43 /user/hive/warehouse/myhive.db/src/kv1.txt
-rwxr-xr-x 2 hadoop supergroup 16 2019-01-04 12:14 /user/hive/warehouse/myhive.db/src/kv1_copy_1.txt
[hadoop@hadoopnode3 ~]$ hadoop fs -cat /user/hive/warehouse/myhive.db/src/kv1_copy_1.txt
888 999
666 777
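Since LOAD DATA INPATH consumes the source file, it has to be re-uploaded before each additional run; a sketch (the kv1_copy_2.txt name for a third run is an assumption based on Hive's _copy_N convention, not observed output):

hadoop fs -put ./resources/kv1.txt /resources/
# a third spark-submit run would then presumably add kv1_copy_2.txt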