Building Spark 2.4.0 from Source


Author: 井地儿 | Published 2019-03-17 14:19

    Building Spark from Source

    Downloading the Source

    Download the latest Spark source from GitHub:
    https://github.com/apache/spark
    (Note: the master branch used in the logs below identifies itself as 3.0.0-SNAPSHOT; to build the 2.4.0 release named in the title, check out its release tag instead.)

    Apache Maven

    The Maven-based build has the following requirements:
    Maven: 3.5.4+
    Java: 8+
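    Before building, you can check that the local Maven and Java meet these minimums. A small sketch (the `version_ok` helper is illustrative, not part of Spark): `sort -V` orders version strings numerically, so if the required minimum sorts first, the installed version satisfies it.

```shell
#!/bin/sh
# Check whether an installed version meets a required minimum.
# sort -V sorts version strings numerically; if the minimum sorts
# first (or the two are equal), the requirement is satisfied.
version_ok() {
  required="$1"
  installed="$2"
  [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]
}

version_ok 3.5.4 3.6.0   && echo "Maven 3.6.0 meets the 3.5.4 minimum"
version_ok 1.8 1.8.0_181 && echo "Java 1.8.0_181 meets the Java 8 minimum"
version_ok 3.5.4 3.3.9   || echo "Maven 3.3.9 is too old"
```

    In practice you would feed in the versions reported by `mvn -version` and `java -version`.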

    Setting Maven's Memory Usage

    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
    

    Without this setting, the build may fail with:

    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.11/classes...
    [ERROR] Java heap space -> [Help 1]
    

    build/mvn

    Spark ships a self-contained Maven build script, build/mvn, which automatically downloads and installs the Maven, Scala, and Zinc versions required for the build.
    Build command:

    ./build/mvn -DskipTests clean package
    

    On macOS, if you have switched your shell from bash to zsh without setting the JAVA_HOME environment variable in .zshrc, the build may fail with:

    Cannot run program "/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/bin/javac": error=2, No such file or directory
    

    Setting JAVA_HOME in ~/.zshrc fixes this.
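    For example, macOS ships a `/usr/libexec/java_home` utility that resolves the installed JDK's path, so one way to configure it (a config sketch; the JDK major version matches the error above) is to add the following to `~/.zshrc`:

```shell
# ~/.zshrc -- resolve the installed JDK 8 home and export it for the Spark build.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PATH="$JAVA_HOME/bin:$PATH"
```

    Run `source ~/.zshrc` (or open a new terminal) before re-running the build.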

    building...

    stefan@localhost  ~/Documents/workspace/code/spark   master  ./build/mvn -DskipTests clean package
    Using `mvn` from path: /Users/stefan/Documents/workspace/code/spark/build/apache-maven-3.6.0/bin/mvn
    [INFO] Scanning for projects...
    ...
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
    [INFO]
    [INFO] Spark Project Parent POM ........................... SUCCESS [  4.010 s]
    [INFO] Spark Project Tags ................................. SUCCESS [  7.204 s]
    [INFO] Spark Project Sketch ............................... SUCCESS [  6.099 s]
    [INFO] Spark Project Local DB ............................. SUCCESS [  3.870 s]
    [INFO] Spark Project Networking ........................... SUCCESS [  8.308 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  3.860 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [  6.418 s]
    [INFO] Spark Project Launcher ............................. SUCCESS [  5.159 s]
    [INFO] Spark Project Core ................................. SUCCESS [02:01 min]
    [INFO] Spark Project ML Local Library ..................... SUCCESS [  5.823 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [  8.543 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [ 21.891 s]
    [INFO] Spark Project Catalyst ............................. SUCCESS [01:15 min]
    [INFO] Spark Project SQL .................................. SUCCESS [02:28 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [01:13 min]
    [INFO] Spark Project Tools ................................ SUCCESS [  1.534 s]
    [INFO] Spark Project Hive ................................. SUCCESS [ 56.505 s]
    [INFO] Spark Project REPL ................................. SUCCESS [  5.497 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [  4.034 s]
    [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [  6.713 s]
    [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  2.156 s]
    [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [  9.314 s]
    [INFO] Spark Project Examples ............................. SUCCESS [ 14.136 s]
    [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  3.357 s]
    [INFO] Spark Avro ......................................... SUCCESS [  5.773 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  10:09 min
    [INFO] Finished at: 2019-03-17T11:11:29+08:00
    [INFO] ------------------------------------------------------------------------
    

    Building a Runnable Distribution

    Spark provides a script for building a runnable distribution: ./dev/make-distribution.sh

    The meaning of each option can be viewed with:

    ./dev/make-distribution.sh --help

    ✘ stefan@localhost  ~/Documents/workspace/code/spark   master  ./dev/make-distribution.sh --help
    +++ dirname ./dev/make-distribution.sh
    ++ cd ./dev/..
    ++ pwd
    + SPARK_HOME=/Users/didi/Documents/workspace/code/spark
    + DISTDIR=/Users/didi/Documents/workspace/code/spark/dist
    + MAKE_TGZ=false
    + MAKE_PIP=false
    + MAKE_R=false
    + NAME=none
    + MVN=/Users/didi/Documents/workspace/code/spark/build/mvn
    + ((  1  ))
    + case $1 in
    + exit_with_usage
    + echo 'make-distribution.sh - tool for making binary distributions of Spark'
    make-distribution.sh - tool for making binary distributions of Spark
    + echo ''
    
    + echo usage:
    usage:
    + cl_options='[--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>]'
    + echo 'make-distribution.sh [--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>] <maven build options>'
    make-distribution.sh [--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>] <maven build options>
    + echo 'See Spark'\''s "Building Spark" doc for correct Maven options.'
    See Spark's "Building Spark" doc for correct Maven options.
    + echo ''
    
    + exit 1
    

    Build command:

    ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
    

    The command above builds the Spark distribution tarball along with the Python pip and R packages. Make sure R is installed locally before running it.
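    With `--tgz`, the script assembles the runnable distribution under `dist/` and packages it as a tarball named `spark-<version>-bin-<name>.tgz`, where `<name>` is the value passed to `--name`. A sketch of the naming convention, using values from this build:

```shell
# The distribution tarball follows the pattern spark-<version>-bin-<name>.tgz.
VERSION="3.0.0-SNAPSHOT"   # version reported in the build log above
NAME="custom-spark"        # from --name custom-spark
echo "spark-${VERSION}-bin-${NAME}.tgz"
# -> spark-3.0.0-SNAPSHOT-bin-custom-spark.tgz
```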

    Specifying the Hadoop Version and Enabling YARN

    You can build against a specific Hadoop version via the hadoop.version property; if unspecified, Spark builds against Hadoop 2.6.X by default.
    Build commands:

    # Apache Hadoop 2.6.X
    ./build/mvn -Pyarn -DskipTests clean package
    
    # Apache Hadoop 2.7.X and later
    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.7 -DskipTests clean package
    

    building...

    ✘ stefan@localhost  ~/Documents/workspace/code/spark   master  ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.7 -DskipTests clean package
    Using `mvn` from path: /Users/didi/Documents/workspace/code/spark/build/apache-maven-3.6.0/bin/mvn
    [WARNING]
    [WARNING] Some problems were encountered while building the effective toolchains
    [WARNING] expected START_TAG or END_TAG not TEXT (position: TEXT seen ...</toolchain>\n   \n  -->z\n\n</... @103:3)  @ line 103, column 3
    [WARNING]
    [INFO] Scanning for projects...
    ...
    [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
    [INFO]
    [INFO] Spark Project Parent POM ........................... SUCCESS [  4.564 s]
    [INFO] Spark Project Tags ................................. SUCCESS [  8.780 s]
    [INFO] Spark Project Sketch ............................... SUCCESS [  6.256 s]
    [INFO] Spark Project Local DB ............................. SUCCESS [  5.063 s]
    [INFO] Spark Project Networking ........................... SUCCESS [  8.652 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.215 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [  7.210 s]
    [INFO] Spark Project Launcher ............................. SUCCESS [01:07 min]
    [INFO] Spark Project Core ................................. SUCCESS [02:13 min]
    [INFO] Spark Project ML Local Library ..................... SUCCESS [  6.008 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [  8.864 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [ 22.931 s]
    [INFO] Spark Project Catalyst ............................. SUCCESS [01:35 min]
    [INFO] Spark Project SQL .................................. SUCCESS [02:23 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [01:17 min]
    [INFO] Spark Project Tools ................................ SUCCESS [  0.616 s]
    [INFO] Spark Project Hive ................................. SUCCESS [01:09 min]
    [INFO] Spark Project REPL ................................. SUCCESS [  7.165 s]
    [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  9.303 s]
    [INFO] Spark Project YARN ................................. SUCCESS [ 24.783 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [  3.523 s]
    [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [  7.028 s]
    [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  1.989 s]
    [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [  9.736 s]
    [INFO] Spark Project Examples ............................. SUCCESS [ 14.508 s]
    [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  3.328 s]
    [INFO] Spark Avro ......................................... SUCCESS [  7.217 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  12:39 min
    [INFO] Finished at: 2019-03-17T13:17:52+08:00
    [INFO] ------------------------------------------------------------------------
    

    Building with Hive and JDBC Support

    To integrate Spark SQL with Hive, including Hive's JDBC server and CLI, enable the -Phive and -Phive-thriftserver profiles; by default, Spark builds against Hive 1.2.1.

    Build command:

    # With Hive 1.2.1 support
    ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
    

    building...

     stefan@localhost  ~/Documents/workspace/code/spark   master  ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
    Using `mvn` from path: /Users/didi/Documents/workspace/code/spark/build/apache-maven-3.6.0/bin/mvn
    [WARNING]
    [WARNING] Some problems were encountered while building the effective toolchains
    [WARNING] expected START_TAG or END_TAG not TEXT (position: TEXT seen ...</toolchain>\n   \n  -->z\n\n</... @103:3)  @ line 103, column 3
    [WARNING]
    [INFO] Scanning for projects...
    ...
    [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
    [INFO]
    [INFO] Spark Project Parent POM ........................... SUCCESS [  4.719 s]
    [INFO] Spark Project Tags ................................. SUCCESS [  8.717 s]
    [INFO] Spark Project Sketch ............................... SUCCESS [  6.270 s]
    [INFO] Spark Project Local DB ............................. SUCCESS [  3.983 s]
    [INFO] Spark Project Networking ........................... SUCCESS [  7.893 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.385 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [  6.898 s]
    [INFO] Spark Project Launcher ............................. SUCCESS [  5.493 s]
    [INFO] Spark Project Core ................................. SUCCESS [02:13 min]
    [INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.281 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [ 10.138 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [ 26.678 s]
    [INFO] Spark Project Catalyst ............................. SUCCESS [02:23 min]
    [INFO] Spark Project SQL .................................. SUCCESS [04:46 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [01:21 min]
    [INFO] Spark Project Tools ................................ SUCCESS [  1.319 s]
    [INFO] Spark Project Hive ................................. SUCCESS [01:04 min]
    [INFO] Spark Project REPL ................................. SUCCESS [  5.929 s]
    [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  6.662 s]
    [INFO] Spark Project YARN ................................. SUCCESS [ 21.103 s]
    [INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 21.623 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [  3.794 s]
    [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [  6.660 s]
    [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  2.034 s]
    [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [  8.895 s]
    [INFO] Spark Project Examples ............................. SUCCESS [ 14.781 s]
    [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  3.565 s]
    [INFO] Spark Avro ......................................... SUCCESS [  5.989 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  15:08 min
    [INFO] Finished at: 2019-03-17T13:46:54+08:00
    [INFO] ------------------------------------------------------------------------
    

    Packaging without Hadoop Dependencies for YARN

    Building with the hadoop-provided profile excludes Hadoop's dependencies from the packaged artifacts.
    Build command:

    ./build/mvn -Dhadoop-provided -DskipTests clean package
    

    building...

    stefan@localhost  ~/Documents/workspace/code/spark   master  ./build/mvn -Dhadoop-provided -DskipTests clean package
    Using `mvn` from path: /Users/didi/Documents/workspace/code/spark/build/apache-maven-3.6.0/bin/mvn
    [WARNING]
    [WARNING] Some problems were encountered while building the effective toolchains
    [WARNING] expected START_TAG or END_TAG not TEXT (position: TEXT seen ...</toolchain>\n   \n  -->z\n\n</... @103:3)  @ line 103, column 3
    [WARNING]
    [INFO] Scanning for projects...
    [INFO] ------------------------------------------------------------------------
    ...
    [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
    [INFO]
    [INFO] Spark Project Parent POM ........................... SUCCESS [  5.056 s]
    [INFO] Spark Project Tags ................................. SUCCESS [  8.136 s]
    [INFO] Spark Project Sketch ............................... SUCCESS [  5.885 s]
    [INFO] Spark Project Local DB ............................. SUCCESS [  4.064 s]
    [INFO] Spark Project Networking ........................... SUCCESS [ 13.564 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.083 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [ 16.586 s]
    [INFO] Spark Project Launcher ............................. SUCCESS [  6.701 s]
    [INFO] Spark Project Core ................................. SUCCESS [02:19 min]
    [INFO] Spark Project ML Local Library ..................... SUCCESS [  7.171 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [  9.424 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [ 32.804 s]
    [INFO] Spark Project Catalyst ............................. SUCCESS [01:31 min]
    [INFO] Spark Project SQL .................................. SUCCESS [02:52 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [01:41 min]
    [INFO] Spark Project Tools ................................ SUCCESS [  0.879 s]
    [INFO] Spark Project Hive ................................. SUCCESS [01:14 min]
    [INFO] Spark Project REPL ................................. SUCCESS [  4.553 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [  4.331 s]
    [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 10.777 s]
    [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  2.870 s]
    [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 26.260 s]
    [INFO] Spark Project Examples ............................. SUCCESS [ 25.948 s]
    [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  4.794 s]
    [INFO] Spark Avro ......................................... SUCCESS [  8.309 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  13:03 min
    [INFO] Finished at: 2019-03-17T14:13:57+08:00
    [INFO] ------------------------------------------------------------------------
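    A hadoop-provided build expects Hadoop's jars to be supplied at runtime. Per Spark's "Hadoop Free" build documentation, this is typically configured in `conf/spark-env.sh` (a config sketch, assuming the `hadoop` launcher of an existing Hadoop installation is on the PATH); note that the upstream docs activate this as a Maven profile with `-Phadoop-provided`:

```shell
# conf/spark-env.sh -- point a Hadoop-free Spark build at an existing
# Hadoop installation by putting Hadoop's jars on Spark's classpath.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```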
    
    

    This covers several common ways to build Spark from source.

    Verifying the Build

    Start spark-shell:

    ✘ didi@localhost  ~/Documents/workspace/code/spark   master  ./bin/spark-shell
    19/03/17 13:58:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://bogon:4040
    Spark context available as 'sc' (master = local[*], app id = local-1552805589600).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
          /_/
    
    Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala>
    

    The spark-shell REPL starting successfully confirms that the build works.
    A follow-up article will cover developing and testing Spark source code locally.
    Reference: http://spark.apache.org/docs/latest/building-spark.html

    Source: https://www.haomeiwen.com/subject/zfuymqtx.html