(1) Compiling Spark 2.3.1

Author: 白面葫芦娃92 | Published 2018-09-18 09:27

1. Download Spark
Download the source package from: http://spark.apache.org/downloads.html


2. Upload spark-2.3.1.tgz to the ~/software directory
3. Extract spark-2.3.1.tgz into the ~/app directory
[hadoop@hadoop001 software]$ tar -zxvf spark-2.3.1.tgz -C ~/app
4. Prerequisites for building Spark:
1) Maven 3.3.9 or newer
2) Java 8+
3) Scala 2.11.8
4) Git
Verify that each component is installed:
[hadoop@hadoop001 software]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[hadoop@hadoop001 software]$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /home/hadoop/app/apache-maven-3.3.9
Java version: 1.8.0_45, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_45/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
[hadoop@hadoop001 software]$ scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions for evaluation. Or try :help.
scala> 

5. Add the following setting to your environment variables:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
If you don’t add these parameters to MAVEN_OPTS, you may see errors and warnings like the following:

[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.11/classes...
[ERROR] Java heap space -> [Help 1]

You can fix these problems by setting the MAVEN_OPTS variable as discussed before.
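Before starting a long build it can be worth failing fast if the heap setting is missing. A minimal sketch (not part of the original walkthrough) that checks the variable set above:

```shell
# Sketch: warn early if MAVEN_OPTS lacks a heap size before building.
MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"  # as exported above
case "$MAVEN_OPTS" in
  *-Xmx*) echo "MAVEN_OPTS heap size configured" ;;
  *)      echo "warning: no -Xmx in MAVEN_OPTS" ;;
esac
```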
6. Confirm that all environment variables are set

[hadoop@hadoop001 ~]$ vi .bash_profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0
export MVN_HOME=/home/hadoop/app/apache-maven-3.3.9
export FINDBUGS_HOME=/home/hadoop/app/findbugs-1.3.9
export PROTOC_HOME=/usr/local/protobuf
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

export PATH=$FLUME_HOME/bin:$SCALA_HOME/bin:$SQOOP_HOME/bin:$PROTOC_HOME/bin:$FINDBUGS_HOME/bin:$MVN_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
[hadoop@hadoop001 ~]$ source .bash_profile

7. Configure Maven's local repository

[hadoop@hadoop001 conf]$ cd /home/hadoop/app/apache-maven-3.3.9/conf
[hadoop@hadoop001 conf]$ cat settings.xml
 <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  <localRepository>/path/to/local/repo</localRepository>
  -->
  <localRepository>/home/hadoop/maven_repo</localRepository>
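Maven creates this directory automatically on first use, but creating it up front makes it easy to confirm the path and check that the partition has enough space for the many artifacts a Spark build downloads. A small sketch (the `/tmp` path here is a stand-in for `/home/hadoop/maven_repo`):

```shell
# Sketch: pre-create the local repository directory from settings.xml.
# REPO is a demo stand-in; in this walkthrough it is /home/hadoop/maven_repo.
REPO=/tmp/maven_repo_demo
mkdir -p "$REPO"
[ -d "$REPO" ] && echo "local repo ready: $REPO"
```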

8. Install Git

[hadoop@hadoop001 ~]$ sudo yum install git

9. Configure make-distribution.sh

[hadoop@hadoop001 ~]$ cd /home/hadoop/app/spark-2.3.1/dev
[hadoop@hadoop001 dev]$ vi make-distribution.sh

#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)

VERSION=2.3.1
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1

(By default the script detects these values by invoking Maven, which is slow; hardcoding them effectively pins the values and saves time.)
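For illustration (using simulated `mvn` output, since actually running Maven here is what the hardcoding avoids), this is what the commented-out detection pipeline does: `mvn help:evaluate` prints `[INFO]` log lines plus the requested value, and `grep -v "INFO" | tail -n 1` keeps only the value:

```shell
# Simulated output of: mvn help:evaluate -Dexpression=project.version
mvn_output='[INFO] Scanning for projects...
[INFO] Building Spark Project Parent POM 2.3.1
2.3.1'
# The pipeline from make-distribution.sh: drop [INFO] lines, keep the last line
VERSION=$(printf '%s\n' "$mvn_output" | grep -v "INFO" | tail -n 1)
echo "detected VERSION=$VERSION"
```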
10. Add the Cloudera repository to pom.xml

[hadoop@hadoop001 spark-2.3.1]$ vi pom.xml 

 <repository>
       <id>cloudera</id>
       <name>cloudera Repository</name>
       <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

11. Build Spark

[hadoop@hadoop001 spark-2.3.1]$ ./dev/make-distribution.sh \
> --name 2.6.0-cdh5.7.0 \
> --tgz \
> -Dhadoop.version=2.6.0-cdh5.7.0 \
> -Phadoop-2.6 \
> -Phive -Phive-thriftserver \
> -Pyarn
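make-distribution.sh names the resulting tarball from the Spark version and the `--name` argument, in the form `spark-$VERSION-bin-$NAME.tgz`, and leaves it in the source root. A small sketch of the naming with the values used above:

```shell
# Sketch: the artifact name produced by make-distribution.sh,
# derived from VERSION (hardcoded earlier) and the --name argument.
VERSION=2.3.1
NAME=2.6.0-cdh5.7.0
TGZ="spark-$VERSION-bin-$NAME.tgz"
echo "expected artifact: $TGZ"
```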

