This article covers two ways to connect Spark to Hive: through spark-shell, and through a remote connection from IDEA.
1. spark-shell
1.1. Copy the configuration files
- Copy hive/conf/hive-site.xml (which carries the metastore connection settings; see section 2.2) to spark/conf/
- Copy the MySQL driver jar from hive/lib/ to spark/jars/
Alternatively, instead of copying the jar, you can point spark-shell at it with the --driver-class-path option:
--driver-class-path path/mysql-connector-java-5.1.13-bin.jar
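For example, a full launch would then look like this (the jar path is a placeholder; point it at wherever your connector jar actually lives):

./bin/spark-shell --driver-class-path path/mysql-connector-java-5.1.13-bin.jar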
1.2. Start spark-shell
Once the shell is up, Hive can be queried through the built-in spark session:
spark.sql("show databases").show()
spark.sql("use test")
spark.sql("select * from student").show()
Output:
[hadoop@hadoop1 spark-2.3.0-bin-hadoop2.7]$ ./bin/spark-shell
2018-09-04 11:43:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop1:4040
Spark context available as 'sc' (master = local[*], app id = local-1536032600945).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show()
2018-09-04 11:43:54 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
| test|
+------------+
scala> spark.sql("use test")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select * from student").show()
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 张三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
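The same results can also be obtained through the DataFrame API instead of SQL strings; a minimal sketch, run in the same spark-shell session (spark-shell pre-imports spark.implicits._, so the $ column syntax works without extra imports):

scala> spark.table("test.student").filter($"sage" > 23).select("sname", "sage").show()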
2. Connecting to Hive from IDEA
This section connects to a remote Hive. If Hive is not deployed yet, refer to the earlier post on installing the Hive environment; note that HDFS must be started first.
2.1. Add the dependencies
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency><!-- MySQL database driver -->
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.40</version>
</dependency>
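If the project is built with sbt rather than Maven, the equivalent declarations would look roughly like this (a sketch matching the versions above):

// build.sbt -- sbt equivalent of the Maven dependencies above
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0",
  "org.apache.spark" %% "spark-sql"  % "2.3.0",
  "org.apache.spark" %% "spark-hive" % "2.3.0",
  "mysql"             % "mysql-connector-java" % "5.1.40"
)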
2.2. Copy the configuration file
Copy hive-site.xml into the project's resources directory:
hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
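The file above lets Spark open the metastore database directly over JDBC. If a standalone Hive metastore service is running instead, the client could point at it via hive.metastore.uris; a sketch, assuming the service runs on hadoop1 at the conventional metastore port 9083:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
    <description>Thrift URI for the remote metastore service</description>
</property>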
2.3. Write the code
import org.apache.spark.sql.SparkSession

object HiveSupport {
  def main(args: Array[String]): Unit = {
    //val warehouseLocation = "D:\\workspaces\\idea\\hadoop"
    val spark =
      SparkSession.builder()
        .appName("HiveSupport")
        .master("local[2]")
        // Not needed once hive-site.xml has been copied; with a local Hive,
        // this option can set the location of metastore_db
        //.config("spark.sql.warehouse.dir", warehouseLocation)
        .enableHiveSupport() // enable Hive support
        .getOrCreate()
    //spark.sparkContext.setLogLevel("WARN") // set the log output level
    import spark.implicits._
    import spark.sql

    sql("show databases")
    sql("use test")
    sql("select * from student").show()

    Thread.sleep(150 * 1000) // keep the app alive for a while (e.g. to inspect the web UI)
    spark.stop()
  }
}
Output:
+----+-----+---+----+-----+
| sno|sname|sex|sage|sdept|
+----+-----+---+----+-----+
|1001| 张三| 男| 22| 高一|
|1002| 李四| 女| 25| 高二|
+----+-----+---+----+-----+
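Since the session has Hive support enabled, query results can also be written back to Hive; a minimal sketch (the target table name test.student_backup is hypothetical):

// appended inside main() after the queries above;
// "test.student_backup" is a hypothetical target table
sql("select * from student")
  .write
  .mode("overwrite") // replace the target table if it already exists
  .saveAsTable("test.student_backup")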