Connecting Spark to Hive: processing Hive data with Spark SQL

Author: 会飞的蜗牛66666 | Published 2019-03-29 15:00
    To connect Spark to Hive, first place three configuration files in the project's resources directory (the resource folder in IDEA):
    core-site.xml: pull it down from the cluster environment.
    hdfs-site.xml: pull it down from the cluster environment.
    hive-site.xml: its contents are shown below.
    <configuration>
        <property>
            <name>hive.exec.scratchdir</name>
            <value>/user/hive/tmp</value>
        </property>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
        <property>
            <name>hive.querylog.location</name>
            <value>/user/hive/log</value>
        </property>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://knowyou-hdp-02:9083</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://knowyou-hdp-01:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive</value>
        </property>
    </configuration>
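
    If you prefer not to ship hive-site.xml with the application, the metastore address can also be set directly on the SparkSession builder. A minimal sketch, assuming the same thrift URI as in the file above (swap in your own metastore host); the JDBC properties in hive-site.xml are used by the metastore service itself, not by the Spark client:

    import org.apache.spark.sql.SparkSession

    // Sketch: point Spark at the remote Hive metastore without putting
    // hive-site.xml on the classpath. The thrift URI is the one from the
    // config above; replace it with your own metastore address.
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("metastore-config-sketch")
      .config("hive.metastore.uris", "thrift://knowyou-hdp-02:9083")
      .enableHiveSupport()
      .getOrCreate()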

    pom.xml configuration; keep the dependency versions consistent with the cluster environment:
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.2.0</version>
            <scope>compile</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
    </dependencies>
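
    Because all three artifacts must share one Spark version and one Scala binary version (the _2.11 suffix), it can help to centralize them in Maven properties. An optional sketch; the property names here are my own choice, not from the original pom:

    <properties>
        <scala.binary.version>2.11</scala.binary.version>
        <spark.version>2.2.0</spark.version>
    </properties>

    <!-- then each dependency can reference them, for example: -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>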
    

    Program test (runs successfully):
    import org.apache.spark.sql.SparkSession

    object SparkHive {

      def main(args: Array[String]): Unit = {

        // Build a SparkSession with Hive support; the hive-site.xml on the
        // classpath tells Spark where the metastore lives.
        val spark = SparkSession
          .builder()
          .master("local[*]")
          .appName("aaa")
          .enableHiveSupport()
          .getOrCreate()

        // Reduce log noise so the query output is easy to read.
        spark.sparkContext.setLogLevel("ERROR")

        // Query an existing Hive table and print the result.
        val sql = "select * from default.sparkdemo"
        spark.sql(sql).show()

        spark.stop()
      }
    }
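
    Writing results back to Hive uses the same session. A short follow-up sketch, assuming a hypothetical target table default.sparkdemo_copy (the name is illustrative, not from the original post):

    import org.apache.spark.sql.SaveMode

    // Persist the query result as a new Hive table; SaveMode.Overwrite
    // replaces the table if it already exists.
    spark.sql("select * from default.sparkdemo")
      .write
      .mode(SaveMode.Overwrite)
      .saveAsTable("default.sparkdemo_copy")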
