Adding spark-sql Support to CDH


Author: Bloo_m | Published 2017-10-23 19:42

The cluster was previously set up with CDH 5.2; now CDH needs spark-sql support. For the cluster setup itself, see the CDH offline installation guide.

I: Prepare the Environment

jdk1.7.0_79
scala2.10.4
maven3.3.9
spark-1.1.0.tgz
Configure the environment variables as follows, then apply them with source /etc/profile:
export JAVA_HOME=/usr/local/jdk1
export M2_HOME=/usr/local/maven
export SCALA_HOME=/usr/local/scala
export PATH=$JAVA_HOME/bin:$M2_HOME/bin:$SCALA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
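
After sourcing the profile, a quick sanity check that each tool resolves to the expected version (paths and exact version strings depend on your install):

$ source /etc/profile
$ java -version     # expect 1.7.0_79
$ mvn -v            # expect Apache Maven 3.3.9
$ scala -version    # expect 2.10.4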

A pre-built Spark is already on hand (screenshot in the original post: spark-hadoop).

II: Build the Spark Source

1. Raise the memory available to Maven, since the build is involved and time-consuming:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

2. Unpack the source and build it (Spark 1.1.0 has no hadoop-2.5 profile; Hadoop 2.5.x builds use -Phadoop-2.4, and Scala 2.10 is the default):
nohup mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0 -Phive -Phive-thriftserver -DskipTests clean package > ./spark-mvn-$(date +%Y%m%d%H).log 2>&1 &
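
The build runs in the background under nohup, so follow it through the log; when it finishes, the assembly jar installed in the next section should sit under assembly/target (a sketch; the exact jar name depends on the versions built against):

$ tail -f spark-mvn-*.log
$ ls assembly/target/scala-2.10/spark-assembly-*.jar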

III: Install the Spark Assembly

1. Copy the assembly jar
Copy the built assembly jar into the CDH parcel's jars directory:
$cp spark-assembly-1.1.0-hadoop2.4.0.jar /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/

2. Replace the assembly jar under CDH's Spark
Update the symlinks:
$ cd /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/assembly/lib
$ ln -s ../../../jars/spark-assembly-1.1.0-hadoop2.4.0.jar spark-assembly-1.1.0-hadoop2.4.0.jar
$ ln -s spark-assembly-1.1.0-hadoop2.4.0.jar spark-assembly.jar
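
A quick listing confirms the link chain now points at the new jar (jar name assumed from step 1):

$ ls -l /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/assembly/lib/
# spark-assembly.jar -> spark-assembly-1.1.0-hadoop2.4.0.jar
# spark-assembly-1.1.0-hadoop2.4.0.jar -> ../../../jars/spark-assembly-1.1.0-hadoop2.4.0.jar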

3. Copy the spark-sql launcher script
Back up CDH's copy, then copy the script from the Spark distribution's bin directory into CDH's Spark bin directory:
$ mv /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql.bak
$ cp /root/spark-1.1.0-bin-hadoop2.4/bin/spark-sql /opt/cloudera/parcels/CDH/lib/spark/bin/

4. Configure the environment variables:
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/opt/cloudera/parcels/CDH/bin/hadoop
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin
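
After source /etc/profile, confirm that the PATH picks up the patched CDH script:

$ which spark-sql
# expect /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql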

5. Upload the assembly jar to HDFS
Copy the assembly jar to the /user/spark/share/lib directory on HDFS and change its permissions to 755.
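A sketch of the upload, reusing the jar name from step 1 (adjust to match your build):

$ hdfs dfs -mkdir -p /user/spark/share/lib
$ hdfs dfs -put /opt/cloudera/parcels/CDH/jars/spark-assembly-1.1.0-hadoop2.4.0.jar /user/spark/share/lib/
$ hdfs dfs -chmod 755 /user/spark/share/lib/spark-assembly-1.1.0-hadoop2.4.0.jar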

6. Configure in Cloudera Manager
Log in to CM and point Spark's service-wide setting for the assembly jar at its HDFS path. (Screenshots in the original post cover the service-wide setting, the advanced configuration, and the client configuration.)
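
If you prefer setting it on the client side rather than in the CM UI, Spark 1.x also honors the spark.yarn.jar property; a sketch for spark-defaults.conf, assuming the HDFS path from step 5:

spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.1.0-hadoop2.4.0.jar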

7. Run spark-sql
(Screenshot in the original post: running SQL.)
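
A minimal smoke test (any simple HiveQL statement will do):

$ spark-sql -e "show databases;"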

IV: Turn Off spark-sql's INFO Messages

1. Back up log4j.properties
In the $SPARK_HOME/conf directory:
$cp /opt/cloudera/parcels/CDH/lib/spark/conf/log4j.properties /opt/cloudera/parcels/CDH/lib/spark/conf/log4j.properties.bak

2. Open log4j.properties and change the INFO on its second line to WARN.
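For reference, the stock Spark log4j.properties template starts roughly like this; only the level on the rootCategory line changes (shown here already set to WARN):

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n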

3. If you see "local class incompatible: stream classdesc serialVersionUID = 5017373498943810947, local class serialVersionUID = 18257903091306170"
Fix: the client's class version does not match the server's. Upload the client's jar to HDFS and point the Spark configuration at it (steps 5 and 6 above).
