(1) Compiling Spark 2.3.1

Author: 白面葫芦娃92 | Published 2018-09-18 09:27

    1. Download Spark
    Link: http://spark.apache.org/downloads.html (choose the source code package, spark-2.3.1.tgz)


    2. Upload spark-2.3.1.tgz to the ~/software directory.
    3. Extract spark-2.3.1.tgz into the ~/app directory:
    [hadoop@hadoop001 software]$ tar -zxvf spark-2.3.1.tgz -C ~/app
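    Steps 1 to 3 can also be scripted. The sketch below is not part of the original procedure: it assumes the source tarball can be fetched with curl from the Apache release archive at archive.apache.org/dist/spark/spark-<version>/spark-<version>.tgz rather than through the download page, and the function names spark_source_url and fetch_and_extract are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumption): Spark source releases are kept under
# https://archive.apache.org/dist/spark/spark-<version>/spark-<version>.tgz

# Print the assumed archive URL for a Spark source release, e.g. 2.3.1.
spark_source_url() {
  local version="$1"
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s.tgz\n' "$version" "$version"
}

# Download the tarball into ~/software (step 2) unless it is already there,
# then extract it into ~/app (step 3).
fetch_and_extract() {
  local version="$1"
  local tarball="$HOME/software/spark-${version}.tgz"
  mkdir -p "$HOME/software" "$HOME/app"
  if [ ! -f "$tarball" ]; then
    # -f: fail on HTTP errors, -L: follow redirects; remove partial downloads
    curl -fL -o "$tarball" "$(spark_source_url "$version")" || { rm -f "$tarball"; return 1; }
  fi
  tar -zxf "$tarball" -C "$HOME/app"
}
```

    Usage: fetch_and_extract 2.3.1 should leave the sources in ~/app/spark-2.3.1, the same result as the manual steps.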
    4. Prerequisites for building Spark:
    1) Maven 3.3.9 or newer
    2) Java 8+
    3) Scala 2.11.8
    4) Git
    Verify that each component is installed:
    [hadoop@hadoop001 software]$ java -version
    java version "1.8.0_45"
    Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
    [hadoop@hadoop001 software]$ mvn -version
    Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
    Maven home: /home/hadoop/app/apache-maven-3.3.9
    Java version: 1.8.0_45, vendor: Oracle Corporation
    Java home: /usr/java/jdk1.8.0_45/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
    [hadoop@hadoop001 software]$ scala
    Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
    Type in expressions for evaluation. Or try :help.
    scala> 
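    The version checks above can be bundled into one script. This is a minimal sketch that assumes GNU sort (for sort -V version ordering) and the java/mvn output formats shown above; the helper names are hypothetical, and Scala is not checked here.

```shell
#!/usr/bin/env bash
# Sketch: check the prerequisites from step 4 (Maven 3.3.9+, Java 8+, Git).
# Assumes GNU coreutils for `sort -V`.

# Succeeds if version $2 is >= version $1 (dotted version comparison).
version_at_least() {
  local required="$1" actual="$2"
  [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" = "$required" ]
}

# Reads `mvn -version` output on stdin and prints e.g. 3.3.9.
parse_maven_version() {
  awk '/^Apache Maven/ { print $3; exit }'
}

# Reads `java -version` output on stdin and prints e.g. 1.8.0_45.
parse_java_version() {
  awk -F '"' '/version/ { print $2; exit }'
}

# Prints one line per missing or too-old tool; returns 1 if any check fails.
check_prereqs() {
  local ok=0 mvn_v java_v
  mvn_v=$(mvn -version 2>/dev/null | parse_maven_version)
  java_v=$(java -version 2>&1 | parse_java_version)   # java prints to stderr
  if [ -z "$mvn_v" ] || ! version_at_least 3.3.9 "$mvn_v"; then
    echo "Maven 3.3.9+ required (found: ${mvn_v:-none})" >&2; ok=1
  fi
  if [ -z "$java_v" ] || ! version_at_least 1.8 "$java_v"; then
    echo "Java 8+ required (found: ${java_v:-none})" >&2; ok=1
  fi
  command -v git >/dev/null 2>&1 || { echo "git not found" >&2; ok=1; }
  return "$ok"
}
```

    sort -V compares each numeric field as a number, so 3.10.0 counts as newer than 3.3.9, which a plain string comparison would get wrong.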
    

    5. Add MAVEN_OPTS to your environment variables so that Maven has enough memory for the build:
    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
    If you don’t add these parameters to MAVEN_OPTS, you may see errors and warnings like the following:

    [INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.11/classes...
    [ERROR] Java heap space -> [Help 1]
    

    You can fix these problems by setting the MAVEN_OPTS variable as discussed before.
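    To make step 5 repeatable, the export can be appended to the profile only when it is not already there. A minimal sketch; ensure_line and MAVEN_OPTS_LINE are hypothetical names, and ~/.bash_profile is assumed to be the profile file, as in step 6.

```shell
#!/usr/bin/env bash
# Sketch: add the MAVEN_OPTS export to a profile file exactly once.

MAVEN_OPTS_LINE='export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"'

# Append $1 to file $2 unless an identical line already exists.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  # -x: whole-line match, -F: fixed string (no regex interpretation)
  if ! grep -qxF -- "$line" "$file"; then
    # If the file does not end with a newline, add one before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}
```

    Usage: ensure_line "$MAVEN_OPTS_LINE" ~/.bash_profile, then source ~/.bash_profile. Running it a second time leaves the file unchanged.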
    6. Confirm that all environment variables are set, then reload the profile:

    [hadoop@hadoop001 ~]$ vi .bash_profile
    export JAVA_HOME=/usr/java/jdk1.8.0_45
    export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0
    export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0
    export MVN_HOME=/home/hadoop/app/apache-maven-3.3.9
    export FINDBUGS_HOME=/home/hadoop/app/findbugs-1.3.9
    export PROTOC_HOME=/usr/local/protobuf
    export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0
    export SCALA_HOME=/home/hadoop/app/scala-2.11.8
    export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
    
    export PATH=$FLUME_HOME/bin:$SCALA_HOME/bin:$SQOOP_HOME/bin:$PROTOC_HOME/bin:$FINDBUGS_HOME/bin:$MVN_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
    [hadoop@hadoop001 ~]$ source .bash_profile
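    One way to confirm that the profile took effect is to check that each *_HOME variable is set and points to an existing directory. A sketch using bash indirect expansion; check_home_vars is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: verify that the named variables are set and point to directories.
check_home_vars() {
  local name value status=0
  for name in "$@"; do
    value="${!name:-}"   # indirect expansion: value of the variable named $name
    if [ -z "$value" ]; then
      echo "$name is not set" >&2; status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2; status=1
    fi
  done
  return "$status"
}
```

    Usage: check_home_vars JAVA_HOME MVN_HOME SCALA_HOME HADOOP_HOME && echo "$MAVEN_OPTS"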
    

    7. Set the Maven local repository

    [hadoop@hadoop001 conf]$ cd /home/hadoop/app/apache-maven-3.3.9/conf
    [hadoop@hadoop001 conf]$ cat settings.xml
     <!-- localRepository
       | The path to the local repository maven will use to store artifacts.
       |
       | Default: ${user.home}/.m2/repository
      <localRepository>/path/to/local/repo</localRepository>
      -->
      <localRepository>/home/hadoop/maven_repo</localRepository>
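    Because the default settings.xml also contains a commented-out <localRepository> example, a plain grep would report two values. The sketch below prints only the active one by skipping <!-- ... --> comments; it assumes the element sits on a single line, and local_repo_from_settings is a hypothetical name. For a one-off build, Maven also accepts -Dmaven.repo.local=/home/hadoop/maven_repo on the command line instead of editing settings.xml.

```shell
#!/usr/bin/env bash
# Sketch: print the <localRepository> value outside XML comments.
local_repo_from_settings() {
  awk '
    {
      line = $0; out = ""
      # Strip comment text, tracking comments that span several lines.
      while (length(line) > 0) {
        if (in_comment) {
          i = index(line, "-->")
          if (i == 0) {
            line = ""
          } else {
            line = substr(line, i + 3)
            in_comment = 0
          }
        } else {
          i = index(line, "<!--")
          if (i == 0) {
            out = out line
            line = ""
          } else {
            out = out substr(line, 1, i - 1)
            line = substr(line, i + 4)
            in_comment = 1
          }
        }
      }
      # 17 = length("<localRepository>"), 18 = length("</localRepository>")
      if (match(out, /<localRepository>[^<]*<\/localRepository>/)) {
        print substr(out, RSTART + 17, RLENGTH - 35)
        exit
      }
    }
  ' "$1"
}
```

    Usage: local_repo_from_settings /home/hadoop/app/apache-maven-3.3.9/conf/settings.xml should print /home/hadoop/maven_repo; create that directory with mkdir -p if it does not exist yet.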
    

    8. Install Git

    [hadoop@hadoop001 ~]$ sudo yum install git
    

    9. Configure make-distribution.sh

    [hadoop@hadoop001 ~]$ cd /home/hadoop/app/spark-2.3.1/dev
    [hadoop@hadoop001 dev]$ vi make-distribution.sh
    
    #VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
    #SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | tail -n 1)
    #SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | tail -n 1)
    #SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | fgrep --count "<id>hive</id>";\
    #    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
    #    # because we use "set -o pipefail"
    #    echo -n)
    
    VERSION=2.3.1
    SCALA_VERSION=2.11
    SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
    SPARK_HIVE=1
    

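    (The commented-out block calls Maven to detect these four values at build time, which is slow. Hard-coding them skips the detection and saves time; the values should match the build options used in the final step.)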
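    The step 9 edit can also be applied with sed instead of vi. This sketch assumes GNU sed (for sed -i) and that the detection block in dev/make-distribution.sh starts with the VERSION=$("$MVN" help:evaluate line and ends with the echo -n) line, as in the excerpt above; check the file before running it. hardcode_versions is a hypothetical name, and it keeps a .bak copy of the script.

```shell
#!/usr/bin/env bash
# Sketch: comment out the Maven detection block and hard-code the values.
hardcode_versions() {
  local script="$1"
  # Refuse to run twice: the unpatched VERSION=$("$MVN" ... line must exist.
  if ! grep -q '^VERSION=\$("\$MVN" help:evaluate' "$script"; then
    echo "$script: detection block not found (already patched?)" >&2
    return 1
  fi
  cp "$script" "$script.bak"
  # 1st expression: prefix every line of the block with '#'.
  # 2nd expression: after the (now commented) last line, append the values.
  sed -i \
    -e '/^VERSION=\$("\$MVN" help:evaluate/,/^ *echo -n)$/ s/^/#/' \
    -e '/^# *echo -n)$/ a\
VERSION=2.3.1\
SCALA_VERSION=2.11\
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0\
SPARK_HIVE=1' \
    "$script"
}
```

    Usage: hardcode_versions ~/app/spark-2.3.1/dev/make-distribution.sh. A second run refuses to patch the already-edited file instead of appending the values again.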
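    10. Add the Cloudera repository to pom.xml, inside the existing <repositories> element, because the CDH Hadoop artifacts (2.6.0-cdh5.7.0) are hosted there rather than in Maven Central

    [hadoop@hadoop001 spark-2.3.1]$ vi pom.xml
    
        <repository>
            <id>cloudera</id>
            <name>cloudera Repository</name>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    

    11. Build Spark

    [hadoop@hadoop001 spark-2.3.1]$ ./dev/make-distribution.sh \
        --name 2.6.0-cdh5.7.0 \
        --tgz \
        -Dhadoop.version=2.6.0-cdh5.7.0 \
        -Phadoop-2.6 \
        -Phive -Phive-thriftserver \
        -Pyarn

    --name sets the suffix of the package name, --tgz packs the result into a .tgz archive, -Dhadoop.version selects the CDH Hadoop version, and the -P options enable the hadoop-2.6, hive, hive-thriftserver and yarn profiles.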
    
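    The build can be wrapped in a script that also checks for the resulting archive. This sketch assumes that, with --tgz, make-distribution.sh writes spark-<VERSION>-bin-<NAME>.tgz into the Spark source root, where VERSION is the value hard-coded in step 9 and NAME is the --name argument; build_spark and expected_tgz_name are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: run the build from this article and check for the package.

# Assumed archive name produced by make-distribution.sh --tgz.
expected_tgz_name() {
  printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

build_spark() {
  local spark_src="$1" version="2.3.1" name="2.6.0-cdh5.7.0"
  (
    cd "$spark_src" || exit 1
    ./dev/make-distribution.sh \
      --name "$name" \
      --tgz \
      -Dhadoop.version=2.6.0-cdh5.7.0 \
      -Phadoop-2.6 \
      -Phive -Phive-thriftserver \
      -Pyarn
  ) || return 1
  local tgz="$spark_src/$(expected_tgz_name "$version" "$name")"
  [ -f "$tgz" ] && echo "Built: $tgz"
}
```

    Usage: build_spark ~/app/spark-2.3.1 should end by printing the path of spark-2.3.1-bin-2.6.0-cdh5.7.0.tgz; if the file is missing, the function returns non-zero.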

    Article link: https://www.haomeiwen.com/subject/qzxabftx.html