I. Downloading the Spark 2.4.2 Source Code
Download URL: https://archive.apache.org/dist/spark/spark-2.4.2/

cd /home/hadoop/soul/soft/source
wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz

Extract it:
tar -zxvf spark-2.4.2.tgz
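Optionally, verify the integrity of the downloaded tarball. This is my own addition, not in the original steps; it assumes a .sha512 file is published alongside the release (usual for Apache archives) and compares the two hashes by eye, since Apache checksum files are not always in sha512sum -c format.

wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz.sha512
# print both hashes and compare them manually
sha512sum spark-2.4.2.tgz
cat spark-2.4.2.tgz.sha512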
II. Spark 2.4.2 Build Documentation and Prerequisites
From the official Spark 2.4.2 build documentation:
The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.5.4 and Java 8. Note that support for Java 7 was removed as of Spark 2.2.0.
1. So per the official docs, we need Java 1.8+ and Maven 3.5.4+.
2. You’ll need to configure Maven to use more memory than usual by setting MAVEN_OPTS:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
If you don’t add these parameters to MAVEN_OPTS, you may see errors and warnings like the following:
[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.12/classes...
[ERROR] Java heap space -> [Help 1]
If you hit these errors during the build, set the Maven options as shown above.
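To avoid re-exporting MAVEN_OPTS in every new shell, you can persist it in ~/.bash_profile, the same file this walkthrough uses for other environment variables later on (a small convenience of my own, not required by the official docs):

# append the setting to ~/.bash_profile and reload it
echo 'export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"' >> ~/.bash_profile
source ~/.bash_profile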
3. Environment check
[hadoop@hadoop000 source]$ java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
----------------------------- the Scala version needs to be 2.12 ---------------------------------------
[hadoop@hadoop000 source]$ scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
---------------------------------------------------------------------------------------
[hadoop@hadoop000 source]$ mvn -version
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /home/hadoop/soul/app/apache-maven-3.6.1
Java version: 1.8.0_201, vendor: Oracle Corporation, runtime: /home/hadoop/soul/app/jdk1.8/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
III. Building the Source
1. Edit pom.xml and add the Cloudera repository, since the cluster runs a CDH distribution.
Change into the extracted source directory:
/home/hadoop/soul/soft/source/spark-2.4.2
vim pom.xml
Around line 245 (inside the existing <repositories> block), add:
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
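As an optional sanity check of my own (not part of the original steps), you can ask Maven to resolve one CDH artifact directly from the Cloudera repository before starting the long build; dependency:get accepts a remote repository in id::layout::url form:

# resolve the CDH hadoop-client artifact to confirm the repository is reachable
mvn dependency:get \
  -DgroupId=org.apache.hadoop \
  -DartifactId=hadoop-client \
  -Dversion=2.6.0-cdh5.7.0 \
  -DremoteRepositories=cloudera::default::https://repository.cloudera.com/artifactory/cloudera-repos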
2. Edit dev/make-distribution.sh to speed up the build.
Comment out lines 128-146 (the version-detection block, sketched after the variable list below) and add:
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1

VERSION: the Spark version being built
SCALA_VERSION: the Scala binary version
SPARK_HADOOP_VERSION: the Hadoop version to build against
SPARK_HIVE: whether Spark is built with Hive support (1 = yes)
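For context, the block you are commenting out derives each of these values by launching a full Maven run, which is what makes the unmodified script slow; paraphrased from dev/make-distribution.sh in Spark 2.4.x, it looks roughly like this:

# each value below costs one mvn invocation, hence the hardcoding above
VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null \
    | grep -v "INFO" | grep -v "WARNING" | tail -n 1)
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null \
    | grep -v "INFO" | grep -v "WARNING" | tail -n 1)
SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null \
    | grep -v "INFO" | grep -v "WARNING" | tail -n 1)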
3. Run the build:
./dev/make-distribution.sh \
--name 2.6.0-cdh5.7.0 \
--tgz \
-Phadoop-2.6 \
-Phive \
-Phive-thriftserver \
-Pyarn \
-Pkubernetes \
-Dhadoop.version=2.6.0-cdh5.7.0
How long this takes depends on your network. My first two builds kept failing; a couple of days later I retried during off-peak hours and it succeeded straight away. If you have a VPN, keep it on while building.
4. When the build finishes, a spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz is generated in the root of the extracted Spark source directory.

5. Deploy the built package
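First extract the tarball into the app directory (the destination below is inferred from the SPARK_HOME setting that follows):

# unpack the freshly built distribution
tar -zxvf spark-2.4.2-bin-2.6.0-cdh5.7.0.tgz -C /home/hadoop/soul/app/

Then add the following to ~/.bash_profile and reload it: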
export SPARK_HOME=/home/hadoop/soul/app/spark-2.4.2-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
source ~/.bash_profile
[hadoop@hadoop000 ~]$ spark-shell --master local[2]
19/05/02 14:59:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop000:4040
Spark context available as 'sc' (master = local[2], app id = local-1556780387603).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.2
      /_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
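As a further smoke test, you can run the bundled SparkPi example; the examples jar name below assumes the Scala 2.11 binary version set in step 2, so adjust it if your build differs:

# run the SparkPi example on a local master as a quick end-to-end check
spark-submit --master local[2] \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.2.jar 10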