Trying Out the Stable Spark 2.1 Release on HDP 2.5

Author: biggeng | Published 2017-01-10 20:17

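HDP 2.5 not only supports Spark 1.6.2 but also ships a preview of Spark 2.0. The Spark community has since released Spark 2.1. Support for Spark 2.1 in the HDP stack itself will probably have to wait for a later Hortonworks release. I tried running the community release of Spark 2.1 on HDP 2.5; the deployment procedure follows.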

Prerequisites

Install HDP 2.5 and use Ambari to deploy the Spark2 preview.
Download Spark 2.1 from the Spark community (the package built for Hadoop 2.7 and later).
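As a rough sketch of the download step, the snippet below builds the tarball name and archive URL for a Spark 2.1 package built against Hadoop 2.7. The exact version (2.1.0), the mirror (archive.apache.org), and the helper names spark_tarball and spark_url are assumptions for illustration, not details from the original post. The actual fetch is left commented out.

```shell
#!/usr/bin/env bash
# Sketch only: the version and mirror are assumptions; adjust as needed.
SPARK_VERSION="2.1.0"
HADOOP_PROFILE="hadoop2.7"

# Print the tarball name for the chosen Spark version and Hadoop profile.
spark_tarball() {
  echo "spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"
}

# Print the download URL on the Apache archive (assumed mirror).
spark_url() {
  echo "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/$(spark_tarball)"
}

echo "Would download: $(spark_url)"
# To actually fetch the package, uncomment the next line:
# curl -fLO "$(spark_url)"
```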

Steps to deploy Spark 2.1

Extract the Spark 2.1 package under the HDP installation directory.
Replace the contents of conf/spark-env.sh with the following.

#!/usr/bin/env bash

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read in YARN client mode
#SPARK_EXECUTOR_INSTANCES="2" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512M" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: default)
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.

# Generic options for the daemons used in the standalone deploy mode

# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}

# Where log files are stored.(Default:${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs
export SPARK_LOG_DIR=/var/log/spark2

# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR=/var/run/spark2

#Memory for Master, Worker and history server (default: 1024MB)
export SPARK_DAEMON_MEMORY=1024m

# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER

# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0

export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}

# The java implementation to use.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
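Because this spark-env.sh hard-codes several HDP and JDK locations, checking that they exist on the node before starting anything may save a failed launch. The following is a minimal sketch: check_dirs is a hypothetical helper (not part of Spark or HDP), and the paths are simply the ones used in the file above.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given directories are missing.
# check_dirs is a hypothetical helper, not part of Spark or HDP.
# It returns the number of missing directories (0 means all present).
check_dirs() {
  local missing=0
  local d
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok       $d"
    else
      echo "MISSING  $d"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Paths taken from the spark-env.sh shown above.
check_dirs \
  /usr/hdp/current/spark2-historyserver/conf \
  /var/log/spark2 \
  /var/run/spark2 \
  /usr/hdp/current/hadoop-client \
  /usr/hdp/current/hadoop-client/conf \
  /usr/jdk64/jdk1.8.0_60 \
  || echo "Some directories are missing; fix them before starting Spark."
```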

Replace the contents of conf/spark-defaults.conf with the following.

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address ochadoop02.jcloud.local:18081
spark.yarn.queue default
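To confirm that the event log settings agree with each other, one could read the values back from the file. This is a sketch under assumptions: conf_get is a hypothetical helper, it expects whitespace-separated "key value" lines as in the listing above, and the file location conf/spark-defaults.conf is relative to wherever Spark 2.1 was extracted.

```shell
#!/usr/bin/env bash
# Sketch: read one property from a spark-defaults.conf style file.
# conf_get is a hypothetical helper; it assumes whitespace-separated
# "key value" lines and prints the value of the first matching key.
conf_get() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == k { print $2; exit }' "$file"
}

# Assumed location, relative to the extracted Spark 2.1 directory.
CONF="conf/spark-defaults.conf"
if [ -f "$CONF" ]; then
  echo "event log dir:   $(conf_get "$CONF" spark.eventLog.dir)"
  echo "history log dir: $(conf_get "$CONF" spark.history.fs.logDirectory)"
  echo "history UI port: $(conf_get "$CONF" spark.history.ui.port)"
fi
```

The two directory values should be identical, since the History Server reads from the location that applications write their event logs to.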

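Open Ambari and, in the YARN configuration, disable yarn.timeline-service.enabled.
Open Ambari and, in the MapReduce configuration, change /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar to /usr/hdp/2.5.0.0-1245/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-1245.jar, that is, hard-code the HDP version in place of ${hdp.version}.
Restart the affected components through Ambari.
Submit SparkPi to verify that the deployment succeeded.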
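The final verification step might look like the sketch below, which submits the bundled SparkPi example to YARN in client mode. The install location in SPARK_HOME, the examples jar name (which assumes Spark 2.1.0 built for Scala 2.11), and the helper name run_sparkpi are assumptions for illustration; the original post does not give the exact command.

```shell
#!/usr/bin/env bash
# Sketch: submit the bundled SparkPi example to YARN in client mode.
# SPARK_HOME and the jar name are assumptions (Spark 2.1.0, Scala 2.11).
SPARK_HOME="${SPARK_HOME:-/usr/hdp/spark-2.1.0-bin-hadoop2.7}"
EXAMPLES_JAR="$SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar"

# run_sparkpi is a hypothetical wrapper.
# $1: number of partitions for the Pi estimate (defaults to 10).
run_sparkpi() {
  "$SPARK_HOME/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    "$EXAMPLES_JAR" "${1:-10}"
}

# Submit only when this Spark installation is actually present.
if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
  run_sparkpi 10
else
  echo "spark-submit not found under $SPARK_HOME; nothing submitted."
fi
```

If the job succeeds, the driver output includes a line beginning with "Pi is roughly", and the finished application should then be listed in the Spark2 History Server on port 18081.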