Setting up Hive on Spark (CDH)

Author: 阿武z | Published 2018-08-04 11:40

    Hive and Spark versions have a strict compatibility mapping, so the Spark build must match the Hive release.

    Apache Hive → required Spark version:
    master → 2.3.0
    3.0.x  → 2.3.0
    2.3.x  → 2.0.0
    2.2.x  → 1.6.0
    2.1.x  → 1.6.0
    2.0.x  → 1.5.0
    1.2.x  → 1.3.1
    1.1.x  → 1.2.0

    For the CDH Hive/Spark correspondence, see:

    http://archive.cloudera.com/cdh5/cdh/5/
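The Apache mapping above can be captured in a small lookup helper for scripting version checks. This is only a sketch of the table as code; the function name `spark_version_for_hive` is made up:

```shell
#!/bin/sh
# Hypothetical helper: given an Apache Hive release line, print the
# Spark version it requires, per the compatibility table above.
spark_version_for_hive() {
  case "$1" in
    master|3.0.x) echo "2.3.0" ;;
    2.3.x)        echo "2.0.0" ;;
    2.2.x|2.1.x)  echo "1.6.0" ;;
    2.0.x)        echo "1.5.0" ;;
    1.2.x)        echo "1.3.1" ;;
    1.1.x)        echo "1.2.0" ;;
    *)            echo "unknown" ;;
  esac
}

spark_version_for_hive 2.1.x   # prints 1.6.0
```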

    Build environment setup

    Download Scala 2.11

    Download link

    # Add environment variables
    vim /etc/profile
    export SCALA_HOME=/root/scala-2.11.12
    export PATH=$PATH:$SCALA_HOME/bin
    
    Download Maven (version 3.3 or later)

    Download link

    # Add environment variables
    vim /etc/profile
    export MAVEN_HOME=/root/apache-maven-3.5.3
    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m -XX:MaxPermSize=2014M"
    export PATH=$PATH:$MAVEN_HOME/bin
    
    source /etc/profile
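Because the Spark build requires Maven 3.3 or newer, a quick pre-flight check before kicking off a long build can save time. A minimal sketch, assuming GNU `sort -V` is available; `version_ge` is a made-up helper, and the found version is hard-coded here for illustration:

```shell
#!/bin/sh
# version_ge A B -> true when A >= B under natural version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

MVN_REQUIRED=3.3.0
MVN_FOUND=3.5.3   # in practice, parse this from: mvn -v

if version_ge "$MVN_FOUND" "$MVN_REQUIRED"; then
  echo "maven $MVN_FOUND ok"
else
  echo "maven $MVN_FOUND too old (need >= $MVN_REQUIRED)" >&2
fi
```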
    

    Download the Spark source and build it

    Download link

    Check the cluster's Hadoop version (hadoop version)
    Build against that specific Hadoop version
    # Pin the Hadoop version and do not bundle Hive
    ./make-distribution.sh --name hadoop2-without-hive --tgz -Pyarn -Phadoop-provided -Phadoop-2.6 -Porc-provided -Dhadoop.version=2.6.0-cdh5.14.2
    

    The build produces spark-1.6.0-bin-hadoop2-without-hive.tgz
    Extract spark-1.6.0-bin-hadoop2-without-hive.tgz to a directory (e.g. /root/spark-1.6.0-bin-hadoop2-without-hive)

    Add the Spark configuration files
    • Spark files on HDFS
    sudo -u hdfs hdfs dfs -mkdir -p /spark/jars
    sudo -u hdfs hdfs dfs -mkdir -p /spark/log/envent-log
    # Upload the assembly jar
    hdfs dfs -put /root/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /user/root
    sudo -u hdfs hdfs dfs -mv /user/root/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /spark/jars
    sudo -u hdfs hdfs dfs -chown hdfs /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
    sudo -u hdfs hdfs dfs -chmod 777 /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
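The four staging commands above are easy to mistype, so one option is a small dry-run wrapper that derives every HDFS path from the jar name and only prints the commands for review. This is a hypothetical helper (`stage_spark_assembly` is not part of any tool) and it never touches HDFS itself:

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the staging steps above: given the
# local assembly jar path, print the exact HDFS commands that would run.
stage_spark_assembly() {
  jar_path=$1                        # local path to the spark-assembly jar
  jar_name=$(basename "$jar_path")
  echo "hdfs dfs -put $jar_path /user/root"
  echo "sudo -u hdfs hdfs dfs -mv /user/root/$jar_name /spark/jars"
  echo "sudo -u hdfs hdfs dfs -chown hdfs /spark/jars/$jar_name"
  echo "sudo -u hdfs hdfs dfs -chmod 777 /spark/jars/$jar_name"
}

stage_spark_assembly /root/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
```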
    
    • spark-env.sh
    vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-env.sh
    export JAVA_HOME=/usr/java/default
    export SPARK_HOME=/root/spark-1.6.0-bin-hadoop2-without-hive
    export HADOOP_HOME=/usr/lib/hadoop
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export YARN_CONF_DIR=/etc/hadoop/conf
    export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/*
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=17777 -Dspark.history.fs.logDirectory=hdfs://xiwu-cluster/spark/log/envent-log"
    
    • spark-defaults.conf
    vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-defaults.conf
    spark.yarn.archive hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs://xiwu-cluster/spark/log/envent-log
    spark.serializer        org.apache.spark.serializer.KryoSerializer
    spark.driver.memory     1g
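Before restarting anything, a quick sanity check can confirm that the fragment contains every key relied on here. A sketch with an assumed helper name (`check_keys`) and an assumed temp-file path:

```shell
#!/bin/sh
# Illustrative check: confirm each expected key appears at the start of a
# line in the given spark-defaults.conf file.
check_keys() {
  for key in spark.yarn.archive spark.eventLog.enabled spark.eventLog.dir \
             spark.serializer spark.driver.memory; do
    grep -q "^$key" "$1" || { echo "missing: $key"; return 1; }
  done
  echo "spark-defaults.conf ok"
}

conf=/tmp/spark-defaults-check.conf   # assumed scratch path
cat > "$conf" <<'EOF'
spark.yarn.archive hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://xiwu-cluster/spark/log/envent-log
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.driver.memory     1g
EOF
check_keys "$conf"
```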
    

    Edit hive-site.xml and add the following properties

    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <property>
      <name>hive.enable.spark.execution.engine</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.home</name>
      <value>/root/spark-1.6.0-bin-hadoop2-without-hive</value>
    </property>
    <property>
      <name>spark.yarn.jar</name>
      <value>hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar</value>
    </property>
    <property>
      <name>spark.master</name>
      <value>yarn-cluster</value>
    </property>
    <property>
      <name>spark.serializer</name>
      <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>
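A malformed hive-site.xml can fail in confusing ways, so it can help to confirm the snippet parses as XML once it sits inside its `<configuration>` root. An illustrative check (the file path is assumed, and only two of the properties above are shown):

```shell
#!/bin/sh
# Illustrative: write a trimmed hive-site.xml and verify it parses,
# printing each property name/value pair.
cat > /tmp/hive-site-check.xml <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>yarn-cluster</value>
  </property>
</configuration>
EOF

python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse("/tmp/hive-site-check.xml").getroot()
for prop in root.findall("property"):
    print(prop.findtext("name"), "=", prop.findtext("value"))
EOF
```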
    

    Official documentation
    Spark build guide
    Hive on Spark guide


        Original link: https://www.haomeiwen.com/subject/wropvftx.html