Spark Development -- YARN Cluster Mode (Part 5)

Author: 无剑_君 | Published 2019-12-02 17:31

I. YARN Cluster Mode

  In Spark on YARN mode, Spark applications run on top of a YARN cluster: YARN's resource scheduler launches the executors inside containers, which then run the tasks the driver distributes to them. To run Spark jobs on YARN, first start the YARN cluster, then submit jobs with spark-shell or spark-submit.
  Before submitting a job, HADOOP_CONF_DIR or YARN_CONF_DIR must be configured in spark-env.sh; a minimal sketch follows.
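For reference, this is what those two lines look like in spark-env.sh (a sketch; the paths assume the Hadoop installation directory used later in this article):

# spark-env.sh -- point Spark at the Hadoop/YARN client configuration
HADOOP_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop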

  1. Cluster layout
Server  IP address       Software           Services                                  Role
master  192.168.247.131  JDK, Scala, Spark  ResourceManager, NameNode, DataNode       master node
slave1  192.168.247.132  JDK, Scala, Spark  NodeManager, SecondaryNameNode, DataNode  worker node
slave2  192.168.247.130  JDK, Scala, Spark  NodeManager, DataNode                     worker node
  2. Host mapping (/etc/hosts)
192.168.247.131  master
192.168.247.132  slave1
192.168.247.130  slave2

  3. Set up passwordless SSH

II. Prerequisites

1. Install Java 8

/usr/lib/jvm/java-8-openjdk-amd64/

2. Scala 2.12.10

root@master:~# scala
Welcome to Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.

scala> :quit

3. Install Hadoop

Both the HDFS and YARN modules are needed. HDFS is required because Spark stores its jar packages on HDFS at runtime.

# Download Hadoop
root@master:~# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz

# Extract
root@master:~# tar -zxvf hadoop-2.9.2.tar.gz -C /usr/local
# Configure environment variables
root@master:~# vi /etc/profile
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

III. Download and Install Spark

The Spark installation acts as a YARN client and is used to submit jobs.

  1. Download and extract
    Download page: http://spark.apache.org/downloads.html
# Download
root@master:~# wget https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

# Extract
root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local

  2. Configure environment variables
# Configure environment variables
root@master:~# vi /etc/profile
# Contents
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export PATH=$PATH:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin

# Apply the environment variables immediately
root@master:~# source /etc/profile

  3. Configure Hadoop
# Add the Java environment variable
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh
# Contents
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

root@master:~# scp  /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh                                                                                                                                                    100% 4991     4.8MB/s   00:00    
root@master:~# scp  /usr/local/hadoop-2.9.2/etc/hadoop/hadoop-env.sh root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hadoop-env.sh   

# core-site.xml 
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml 
# Contents
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/hadoop/tmp</value>
        </property>
        <property>
                 <name>hadoop.proxyuser.root.hosts</name>
                 <value>*</value>
        </property>
        <property>   
                 <name>hadoop.proxyuser.root.groups</name>
                 <value>*</value>
        </property>
</configuration>

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml                                                                                                                                                    100% 1258   330.8KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/core-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
core-site.xml  

# hdfs-site.xml 
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml 
# Contents
<configuration>
        <!-- NameNode HTTP address -->
        <property>
                <name>dfs.namenode.http-address</name>
                <value>master:50070</value>
        </property>
        <!-- SecondaryNameNode HTTP address -->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>slave1:50090</value>
        </property>
        <!-- NameNode storage directory -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/hadoop/tmp/name</value>
        </property>
        <!-- DataNode storage directory -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/hadoop/tmp/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml                                                                                                                                                    100% 1576   440.6KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/hdfs-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
hdfs-site.xml   

# Edit the slaves file
root@master:~# vi /usr/local/hadoop-2.9.2/etc/hadoop/slaves 
master
slave1
slave2

root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves                                                                                                                                                           100%   21     1.4KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/slaves root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
slaves 
# Enable passwordless SSH to the local machine
root@master:~/.ssh# cp id_rsa.pub authorized_keys

# Format HDFS
root@master:~# hdfs namenode -format

  4. Configure spark-env.sh
# Hadoop classpath for Spark
export SPARK_DIST_CLASSPATH=/usr/local/hadoop-2.9.2
# Hostname the master binds to
SPARK_MASTER_HOST=master
# Port the master listens on; workers connect to the master on this port
SPARK_MASTER_PORT=7077
# Port for the master's Spark web UI
SPARK_MASTER_WEBUI_PORT=8080
# Memory available to each worker
SPARK_WORKER_MEMORY=1g

  5. Configure slaves
slave1
slave2

  6. Configure the history server
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf
# Contents
# history
spark.master=spark://master:7077
# Enable event logging
spark.eventLog.enabled=true
# Event log directory
spark.eventLog.dir=hdfs://master:9000/spark/log/historyEventLog
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Driver memory
spark.driver.memory=1g
# Directory the history server reads event logs from
spark.history.fs.logDirectory=hdfs://master:9000/spark/log/historyEventLog

spark.history.ui.port=18080
spark.history.fs.update.interval=10s
# Number of application UIs to retain; above this limit the oldest applications are removed
spark.history.retainedApplications=50
spark.history.fs.cleaner.enabled=false
# How often the cleaner runs and how long logs are kept
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
spark.history.ui.acls.enable=false

root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf                                                                                                                                              100% 2091     2.5MB/s   00:00    
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-defaults.conf root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-defaults.conf  

Note: spark.eventLog.dir and spark.history.fs.logDirectory must point to the same path.
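A quick sanity check once HDFS is up and the directory has been created (see the next section); a sketch using the path configured above:

# Verify the shared event-log directory exists on HDFS
root@master:~# hadoop fs -ls /spark/log/historyEventLog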

IV. Start the Cluster

  1. Start the master
root@master:~# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
root@master:~# jps
111570 Master
111661 Jps

  2. Start the workers
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
126165 Jps
125909 Worker

root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7#  ./sbin/start-slave.sh spark://master:7077
starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
7572 Worker
7656 Jps

  3. Web UI
    http://192.168.247.131:8080/
    (screenshot: Spark master web console)
  4. Start the history server

# Start HDFS
root@master:~# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave1.out
master: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-slave1.out
root@master:~# jps
11874 NameNode
111570 Master
12583 Jps
12157 DataNode
# Create the event-log directory on HDFS
root@master:~# hadoop fs -mkdir -p /spark/log/historyEventLog

# Start the Spark history server
root@master:~# start-history-server.sh 
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out
root@master:~# jps
11874 NameNode
111570 Master
13975 HistoryServer
14057 Jps
12157 DataNode

View the history logs in the history server web UI (port 18080, as configured above).

V. Verification and Testing


VI. YARN Mode

In YARN mode, Spark delegates resource scheduling to YARN directly; in standalone mode, resource scheduling is handled by Spark's own Master and Worker processes.
In YARN mode, YARN's ResourceManager, NodeManagers and containers handle resource scheduling, and Spark's ApplicationMaster and executors run inside containers.
The Spark ApplicationMaster negotiates resources with YARN,
the Spark driver coordinates with the executors to schedule tasks,
and Spark's own Master and Worker are gone, replaced by YARN's ResourceManager (RM) and NodeManager (NM); the executors and the driver remain.

YARN-mode execution flow:
(1) The client submits the application via SparkSubmit.
(2) The RM launches Spark's ApplicationMaster (AM), which handles resource negotiation between Spark and YARN.
(3) The AM requests resources from the RM to launch executors.
(4) The RM gathers resource information from the cluster (the NMs).
(5) The RM returns the resource information to the AM, and the driver inside the AM decides where tasks will run.
(6) The driver splits the job into tasks and sends them to the executors.
(7) The executors run the tasks and notify the driver when they are done.
(8) The driver, through the AM, tells the RM to reclaim the resources.
(9) The executors, containers, driver and ApplicationMaster all release their resources and exit.
(10) Only YARN's RM and NM remain, and the result is printed on the client.
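To watch this flow from YARN's side while a job is running, the YARN CLI can be used (a sketch; the application ID shown is the example one from the run later in this article):

# List applications currently known to the RM; the Spark AM appears once the job is accepted
root@master:~# yarn application -list
# Show the status report of one application
root@master:~# yarn application -status application_1575253969569_0006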

VII. Deploy Modes

(1) Client mode (development and testing)

  In client mode, the driver process starts on the submitting client, and the client process stays alive until the application finishes.


(diagram: client mode)
  1. Configure spark-env.sh
# Configure spark-env.sh
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Add the following settings
HADOOP_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-2.9.2/etc/hadoop

# Distribute Spark to the other nodes
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave1:/usr/local/
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave2:/usr/local/

  2. Distribute Scala
# Distribute Scala
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave1:/usr/local/
root@master:~# scp -r /usr/local/scala-2.12.10/ root@slave2:/usr/local/

# Distribute the environment variables
root@master:~# scp -r /etc/profile root@slave1:/etc/
profile 
root@master:~# scp -r /etc/profile root@slave2:/etc/
profile   

  3. Configure Hadoop (yarn-site.xml)
# Edit yarn-site.xml
# Add the following
<!-- Whether to run a thread that checks the physical memory each task uses and kills tasks that exceed their allocation; default is true -->
<property>
     <name>yarn.nodemanager.pmem-check-enabled</name>
     <value>false</value> 
</property>

<!-- Whether to run a thread that checks the virtual memory each task uses and kills tasks that exceed their allocation; default is true -->
<property>
     <name>yarn.nodemanager.vmem-check-enabled</name>
     <value>false</value>
</property>

# Distribute to the other nodes
root@master:~# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave1:/usr/local/
root@master:~# scp -r /usr/local/spark-2.4.4-bin-hadoop2.7/ root@slave2:/usr/local/
# Distribute the Hadoop configuration
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3128   294.2KB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml   
  4. Start Hadoop
# Start Hadoop
# On master
root@master:~# start-dfs.sh
root@master:~# start-yarn.sh

  5. Submit a job
# Submit the job
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 1g \
--executor-memory 512m \
--executor-cores 1 \
 /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar \
10
# Result
Pi is roughly 3.1428957144785725

Notes:
--master yarn: use YARN for resource scheduling
--deploy-mode client: run in client mode

  6. Check the result:
    http://master:8088
    (screenshot: YARN web UI)
  7. spark-shell
    spark-shell must be run in client mode.
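Attempting cluster mode with the shell is rejected by spark-submit's argument checks (a sketch of what to expect; exact wording may vary by version):

root@master:~# spark-shell --master yarn --deploy-mode cluster
Error: Cluster deploy mode is not applicable to Spark shells.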

# Create a data file
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# cat /root/wordcount.txt 
hello tom
hello jerry
hello kitty
hello world
hello tom
hello marquis
hello jone
# Upload the data file to HDFS
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# hadoop fs -put /root/wordcount.txt /wordcount

# Count the lines with spark-shell on YARN
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-shell --master yarn --deploy-mode client
19/12/02 11:11:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/12/02 11:12:24 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = yarn, app id = application_1575253969569_0004).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val lines  = sc.textFile("/wordcount")
lines: org.apache.spark.rdd.RDD[String] = /wordcount MapPartitionsRDD[1] at textFile at <console>:24

scala> lines.count()
res0: Long = 7                                                                  

scala> lines.first()
res1: String = hello tom


(2) Cluster mode (production)

yarn-cluster mode does not support spark-shell or spark-sql.
  In cluster mode, the driver process starts on one of the cluster's nodes (inside the ApplicationMaster container), and the client process can exit as soon as it has finished submitting the job, without waiting for the application to complete.

(diagram: cluster mode)
  1. Log configuration
    Because in cluster mode the result can only be viewed in the logs, history logging must be configured.
# Non-HA mode
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
 <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
</property>
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master:9000/user/hadoop/yarn-logs/</value>
</property>
# yarn.nodemanager.remote-app-log-dir is where aggregated logs are stored. It can be a local path or an HDFS path; HDFS is recommended. When stored on HDFS, the directory must already exist.

# HA mode
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
</property>
<property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
</property>
<property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>hdfs://hadoopha/user/hadoop/yarn-logs/</value>
</property>

# Create the directory
root@master~ # hadoop fs -mkdir -p /user/hadoop/yarn-logs

# hadoopha is the dfs.nameservices value configured in hdfs-site.xml. Note: in HA mode do not append a port number.

# Distribute the file
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave2:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3365     4.9MB/s   00:00    
root@master:~# scp /usr/local/hadoop-2.9.2/etc/hadoop/yarn-site.xml root@slave1:/usr/local/hadoop-2.9.2/etc/hadoop/
yarn-site.xml                                                                                                                                                          100% 3365     7.0MB/s   00:00  
  2. Start the history services
# Start the Hadoop (MapReduce) job history server
root@master~ # mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.9.2/logs/mapred-root-historyserver-master.out
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
14144 JobHistoryServer
11298 NodeManager
11063 ResourceManager
21272 QuorumPeerMain
14475 Jps
22508 NameNode
23293 DFSZKFailoverController
22767 DataNode
# Start the Spark history server
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-history-server.sh 

  3. Submit the job
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./bin/spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --executor-cores 1 --class org.apache.spark.examples.SparkPi  /usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar 10

19/12/02 11:23:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/02 11:24:07 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
19/12/02 11:24:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/12/02 11:24:07 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/12/02 11:24:07 INFO yarn.Client: Setting up container launch context for our AM
19/12/02 11:24:07 INFO yarn.Client: Setting up the launch environment for our AM container
19/12/02 11:24:07 INFO yarn.Client: Preparing resources for our AM container
19/12/02 11:24:07 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/12/02 11:24:09 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_libs__1592300848359771379.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_libs__1592300848359771379.zip
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/spark-examples_2.11-2.4.4.jar
19/12/02 11:24:13 INFO yarn.Client: Uploading resource file:/tmp/spark-253add86-aa04-4d50-8034-87963da2a896/__spark_conf__836577871397246590.zip -> hdfs://hadoopha/user/root/.sparkStaging/application_1575253969569_0006/__spark_conf__.zip
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls to: root
19/12/02 11:24:13 INFO spark.SecurityManager: Changing view acls groups to: 
19/12/02 11:24:13 INFO spark.SecurityManager: Changing modify acls groups to: 
19/12/02 11:24:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/12/02 11:24:14 INFO yarn.Client: Submitting application application_1575253969569_0006 to ResourceManager
19/12/02 11:24:14 INFO impl.YarnClientImpl: Submitted application application_1575253969569_0006
19/12/02 11:24:15 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:15 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1575257054794
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:24:16 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:17 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:18 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:19 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:20 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:21 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:22 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:23 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:24 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:25 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:26 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:27 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:28 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:29 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:30 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:31 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:32 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:33 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:34 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:35 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:36 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:37 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:38 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:39 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:40 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:41 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:42 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:43 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:44 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:45 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:46 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:47 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:48 INFO yarn.Client: Application report for application_1575253969569_0006 (state: ACCEPTED)
19/12/02 11:24:49 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:49 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: master
         ApplicationMaster RPC port: 36121
         queue: default
         start time: 1575257054794
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:24:50 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:51 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:52 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:53 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:54 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:55 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:56 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:57 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:58 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:24:59 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:00 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:01 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:02 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:03 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:04 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:05 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:06 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:07 INFO yarn.Client: Application report for application_1575253969569_0006 (state: RUNNING)
19/12/02 11:25:08 INFO yarn.Client: Application report for application_1575253969569_0006 (state: FINISHED)
19/12/02 11:25:08 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: master
         ApplicationMaster RPC port: 36121
         queue: default
         start time: 1575257054794
         final status: SUCCEEDED
         tracking URL: http://master:8088/proxy/application_1575253969569_0006/
         user: root
19/12/02 11:25:09 INFO util.ShutdownHookManager: Shutdown hook called
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-488a412e-9786-4242-9768-3df83c89078c
19/12/02 11:25:09 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-253add86-aa04-4d50-8034-87963da2a896

Result
History logging must be configured: the result is viewed in the application logs.
(screenshots: job result in the logs)
Note: the Spark history service must be running.
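Once log aggregation is enabled, the result can also be pulled from the command line with YARN's log CLI (a sketch; the application ID is the one from the run above):

# Fetch the aggregated container logs for the finished application;
# the "Pi is roughly ..." line appears in the driver (ApplicationMaster) stdout
root@master:~# yarn logs -applicationId application_1575253969569_0006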

(3) Differences between the two modes

Client mode: the driver runs on the client, and the application's results are displayed on the client, so it suits applications that print their results (such as spark-shell).
Cluster mode: the driver runs inside YARN, and the application's results are not displayed on the client, so it is best for applications that save their results to external storage (such as HDFS, Redis or MySQL) rather than writing to stdout; the client terminal only shows the YARN job's basic status. A sketch of such a job follows.
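As an illustration, here is a hypothetical minimal Scala sketch (not from the original article) of a cluster-mode-friendly word count that saves its result to HDFS instead of printing it; the input path /wordcount is the file uploaded earlier, while the output path and object name are assumptions:

import org.apache.spark.sql.SparkSession

object WordCountToHdfs {
  def main(args: Array[String]): Unit = {
    // Intended to run via: spark-submit --master yarn --deploy-mode cluster ...
    val spark = SparkSession.builder().appName("WordCountToHdfs").getOrCreate()
    val counts = spark.sparkContext
      .textFile("hdfs://master:9000/wordcount")   // input uploaded earlier in this article
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Persist the result to HDFS so it can be read back from the client afterwards
    counts.saveAsTextFile("hdfs://master:9000/output/wordcount-counts")
    spark.stop()
  }
}

After the job finishes, the counts can be read from the client with hadoop fs -cat /output/wordcount-counts/part-*.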

VIII. Common Problems

  1. WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    Solution:
# Create a directory on HDFS
root@master:~# hadoop fs -mkdir -p /home/hadoop/spark_jars
# Upload Spark's jars
root@master:~# hadoop fs -put /usr/local/spark-2.4.4-bin-hadoop2.7/jars/* /home/hadoop/spark_jars
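For the warning to go away, spark.yarn.jars also has to point at the uploaded jars. A sketch of the spark-defaults.conf entry (the HDFS path is the directory created above; the NameNode address matches core-site.xml):

# spark-defaults.conf
spark.yarn.jars=hdfs://master:9000/home/hadoop/spark_jars/*.jar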

  2. Out of memory
# Disable the virtual-memory check in yarn-site.xml (avoids jobs being killed when the virtual machines are short on memory)
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name> 
  <value>false</value> 
 </property>
