1. Preparation
1.1 Components and source downloads
Component | Version |
---|---|
CentOS | CentOS 6.4 |
JDK | jdk-8u80-linux-x64.tar.gz |
Maven | apache-maven-3.6.1-bin.tar.gz |
Scala | scala-2.11.6.tgz |
Hadoop | hadoop-2.6.0-cdh5.15.1-src.tar.gz |
⚠️ Version requirements: JDK 1.8+, Maven 3.5.4+, Scala 2.11.x
1.2 Environment setup: the same as in the hadoop-2.6.0-cdh5.15.1 source build guide
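Before building, it is worth a quick sanity check that the toolchain on the build host meets these requirements (assuming JDK, Maven and Scala are already installed and on PATH):
[root@hadoop001 ~]# java -version
[root@hadoop001 ~]# mvn -version
[root@hadoop001 ~]# scala -version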
2. Compiling Spark
The Spark version compiled here is 2.4.2; the source package for this version can be downloaded from the official site.
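For reference, assuming the standard Apache release archive layout, the source tarball can be fetched directly from the command line (URL shown for illustration; verify it against the official download page). Here it is saved under the filename used in the next step:
[root@hadoop001 ~]# wget -O /home/hadoop/soft/spark-2.4.2.tar.gz https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz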
2.1 Upload and extract the source package
[root@hadoop001 ~]# tar -zxvf /home/hadoop/soft/spark-2.4.2.tar.gz -C /home/hadoop/source/
2.2 Modify configuration files
2.2.1 The pom.xml in the source package
[root@hadoop001 ~]# vim /home/hadoop/source/spark-2.4.2/pom.xml
<!-- Add the CDH repository inside the existing <repositories> section -->
<repository>
  <id>cloudera</id>
  <name>cloudera repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
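Optionally, a quick check that the Cloudera repository is reachable from the build host can save a long wait on failed dependency downloads later (a minimal sketch; the returned status line may vary):
[root@hadoop001 ~]# curl -sI https://repository.cloudera.com/artifactory/cloudera-repos/ | head -n 1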
2.2.2 Modify make-distribution.sh (optional)
To speed up the build, edit make-distribution.sh:
[root@hadoop001 ~]# vim /home/hadoop/source/spark-2.4.2/dev/make-distribution.sh
# Comment out the following configuration
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
# | grep -v "INFO"\
# | fgrep --count "<id>hive</id>";\
# # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# # because we use "set -o pipefail"
# echo -n)
After the commented-out lines, add the following configuration:
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.15.1
SPARK_HIVE=1
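A quick, illustrative way to confirm the hard-coded values took effect:
[root@hadoop001 ~]# grep -nE '^(VERSION|SCALA_VERSION|SPARK_HADOOP_VERSION|SPARK_HIVE)=' /home/hadoop/source/spark-2.4.2/dev/make-distribution.sh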
2.3 Build
- Run the build command
[root@hadoop001 ~]# cd ~/source/spark-2.4.2/
[root@hadoop001 spark-2.4.2]# ./dev/make-distribution.sh --name 2.6.0-cdh5.15.1 --tgz -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.15.1
- Option explanations
1) --name specifies the suffix of the resulting "spark-2.4.2-<suffix>" package name; by convention this is the Hadoop version
2) --tgz packages the build as a tar.gz archive; required here
3) -Pyarn enables YARN support in the resulting package
4) -Phive -Phive-thriftserver enables Hive support and the Hive Thrift server
5) -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.15.1 builds against the specified Hadoop (CDH) version
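If the build fails with out-of-memory errors, the Spark build documentation suggests giving Maven more memory; exporting MAVEN_OPTS before running the script may help (values below are illustrative):
[root@hadoop001 spark-2.4.2]# export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"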
- Check the build result
The final package produced by the make-distribution.sh script is placed in the spark-2.4.2 directory.
# Move the built package to /home/hadoop/soft
[root@hadoop001 spark-2.4.2]# mv /home/hadoop/source/spark-2.4.2/spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz /home/hadoop/soft/
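A simple check that the tarball is now in place (illustrative):
[root@hadoop001 spark-2.4.2]# ls -lh /home/hadoop/soft/spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz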
3. Deployment and verification
3.1 Deploy spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz
- Extract and configure environment variables
# Extract
[root@hadoop001 soft]# tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.15.1.tgz -C /home/hadoop/app/
# Configure environment variables
[root@hadoop001 soft]# vi ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.15.1
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
# Apply the configuration
[root@hadoop001 soft]# source ~/.bash_profile
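A quick check that the shell now resolves the new installation (illustrative):
[root@hadoop001 soft]# echo $SPARK_HOME
[root@hadoop001 soft]# which spark-shell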
- Run a test
[root@hadoop001 spark-2.4.2]# spark-shell --master local[2]
19/08/16 09:12:57 WARN Utils: Your hostname, i-oj1j9ghi resolves to a loopback address: 127.0.0.1; using 10.160.4.55 instead (on interface eth0)
19/08/16 09:12:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/08/16 09:12:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://10.160.4.55:4040
Spark context available as 'sc' (master = local[2], app id = local-1565917995440).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.2
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Seeing output like the above indicates the deployment succeeded.
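As an additional smoke test, the bundled examples can be run from the command line; a minimal sketch using the run-example script shipped in Spark's bin directory (the job should finish by printing an approximation of Pi):
[root@hadoop001 ~]# run-example SparkPi 10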