1. Download the source
https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
2. Extract the source
tar -xzvf spark-2.4.2.tgz
3. Patch the version detection
Edit dev/make-distribution.sh; the detection block sits around lines 128~146:

#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing
#    # because we use "set -o pipefail"
#    echo -n)

# Hardcode the values so the build skips the slow Maven detection above
VERSION=2.4.2                          # Spark version
SCALA_VERSION=2.11                     # Scala binary version
SPARK_HADOOP_VERSION=2.6.0-cdh5.16.1   # target Hadoop version
SPARK_HIVE=1                           # 1 = build with Hive support
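With these values hardcoded, the name of the final artifact becomes predictable: make-distribution.sh assembles it as spark-$VERSION-bin-$NAME.tgz, where NAME is the value passed later via --name. A minimal sketch of that naming logic, using the values from this guide:

```shell
# Sketch: how the output tarball name is assembled from the hardcoded
# VERSION and the --name flag given to make-distribution.sh in the build step.
VERSION=2.4.2
NAME=cdh5.16.1                         # value of --name in step 5
TARBALL="spark-${VERSION}-bin-${NAME}.tgz"
echo "$TARBALL"                        # prints spark-2.4.2-bin-cdh5.16.1.tgz
```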
4. Edit the repositories in pom.xml, adding the Aliyun and Cloudera repositories
<repositories>
  <!-- This should be at the top: it makes Maven try the central repo first
       and then the others, and hence resolve dependencies faster.
  <repository>
    <id>central</id>
    <name>Maven Repository</name>
    <url>https://repo.maven.apache.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository> -->
  <repository>
    <id>maven-ali</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
      <checksumPolicy>fail</checksumPolicy>
    </snapshots>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
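If plugin downloads are still slow, the same Aliyun mirror can also be registered for Maven plugins. A config sketch: the <pluginRepositories> block below is an addition not present in the stock pom.xml, placed as a sibling of <repositories>:

```xml
<pluginRepositories>
  <pluginRepository>
    <id>maven-ali-plugins</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </pluginRepository>
</pluginRepositories>
```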
5. Build
./dev/make-distribution.sh \
--name cdh5.16.1 \
--tgz \
-Dhadoop.version=2.6.0-cdh5.16.1 \
-Phadoop-2.6 \
-Phive \
-Phive-thriftserver \
-Pyarn
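On success, the distribution tarball is written to the source root. A quick check, assuming the values used in this guide (VERSION=2.4.2 and --name cdh5.16.1; adjust the file name if yours differ):

```shell
# Post-build sanity check, run from the Spark source root.
DIST=spark-2.4.2-bin-cdh5.16.1.tgz
if [ -f "$DIST" ]; then
  echo "build OK: $DIST"
else
  echo "artifact not found: $DIST (check the build log)" >&2
fi
```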