Yarn Resources

Author: 小KKKKKKKK | Published: 2021-06-22 09:18

1.Roles in Yarn

1.1.Client

The client accepts job requests. After receiving one, it contacts the RM (Resource Manager) and asks it to generate a Job ID for the job.

1.2.Resource Manager

The master node. It manages the compute resources of the entire cluster and allocates them to applications.

1.3.Node Manager

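A compute node that launches containers according to its configured settings. The NM (Node Manager) periodically sends heartbeats to the RM to report its health. It also manages the lifecycle of its Containers, monitors each Container's resource usage, and manages logs and the auxiliary services used by different applications.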

1.4.Application Master

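Manages an application running on Yarn. The AM (Application Master) negotiates resources with the RM's scheduler and communicates with the NMs to run the corresponding tasks. The RM allocates containers to the AM, and those containers are used to run tasks. The AM also tracks the application's status and monitors the progress of its containers.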

1.5.Container

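The container is the basic unit of resource allocation in YARN and carries a certain amount of memory and CPU. A container grants the AM the right to use a specific amount of resources on a specific host. The AM itself also runs in a container: the first container allocated to the application.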

2.Job flow on Yarn

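2.1.The client receives a job and sends a request to the RM (Resource Manager), which assigns the job a Job ID and sets its state to New;
2.2.The client then submits the job's details to the RM. The RM saves them and changes the job's state to Submitted;
2.3.The RM passes the job to the scheduler. The scheduler checks the client's permissions and whether there are enough resources to run the AM (Application Master), then sets the job's state to Accepted;
2.4.The RM allocates a Container for the AM, starts the AM in it, and changes the job's state to Running;
2.5.Once the AM has started, it coordinates with the RM, requests from the RM the resources needed to run the program, and checks status periodically;
2.6.If the job completes as expected, its state becomes Finished. If a failure occurs while it runs, its state becomes Failed. If the client kills the job, its state becomes Killed.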
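
The state changes in the steps above can be sketched as a small state machine. This is only an illustration of the flow as described here: the class name, the job ID, and the choice to allow Failed and Killed from every non-final state are assumptions, and the real ResourceManager tracks additional intermediate states.

```python
# Allowed transitions, following the states named in steps 2.1 to 2.6.
# Assumption: a job may fail or be killed from any non-final state.
TRANSITIONS = {
    "NEW": {"SUBMITTED", "FAILED", "KILLED"},
    "SUBMITTED": {"ACCEPTED", "FAILED", "KILLED"},
    "ACCEPTED": {"RUNNING", "FAILED", "KILLED"},
    "RUNNING": {"FINISHED", "FAILED", "KILLED"},
    "FINISHED": set(),
    "FAILED": set(),
    "KILLED": set(),
}


class Job:
    """Hypothetical stand-in for a job tracked by the RM."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "NEW"          # step 2.1: a new job starts in New
        self.history = ["NEW"]

    def move_to(self, new_state):
        # Reject any transition the table above does not list.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


job = Job("job_0001")  # hypothetical ID
for state in ("SUBMITTED", "ACCEPTED", "RUNNING", "FINISHED"):
    job.move_to(state)
print(job.history)
```

Keeping the transitions in a table makes it easy to see that Finished, Failed and Killed are terminal: no further move is accepted once a job reaches one of them.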

3.Resource management on Yarn

3.1.Viewing resources on Yarn

Taking CDH as an example, open the Yarn service, click Web UI --> Resource Manager Web UI, and go to the Yarn web page.

The page shows that Yarn has 141.52 GB of memory and 160 virtual CPUs available.

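3.2.Determining the resources available to the cluster

Search the Yarn configuration for yarn.nodemanager.resource.memory-mb.
Each node can use 35.38 GB of memory and Yarn has 4 NMs, so the memory available to Yarn is 35.38 GB * 4 = 141.52 GB.
This setting is the total physical memory an NM may use, which is also the physical memory that can be handed out to containers.
Search the configuration page for yarn.nodemanager.resource.cpu-vcores.
This setting is the total number of virtual CPUs an NM may use, which is also the number of virtual CPUs that can be handed out to containers.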
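
As a quick check of the arithmetic above, the cluster totals can be derived from the per-node settings. The figures are the ones quoted in this section; the per-node vcore count is not stated in the article and is derived here on the assumption that all 4 NMs are configured identically.

```python
NODE_MANAGERS = 4
MEMORY_PER_NM_GB = 35.38   # yarn.nodemanager.resource.memory-mb, expressed in GB
TOTAL_VCORES = 160         # total reported by the Yarn web page

# Cluster memory is the per-NM memory summed over all NMs.
total_memory_gb = round(NODE_MANAGERS * MEMORY_PER_NM_GB, 2)

# Assumption: identical NMs, so each one contributes an equal share of vcores.
vcores_per_nm = TOTAL_VCORES // NODE_MANAGERS

print(total_memory_gb, "GB across the cluster")
print(vcores_per_nm, "vcores per NM (derived)")
```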

3.3.Viewing scheduler resources

Go to Yarn Web --> scheduler.

The scheduler page shows the following for this cluster:

3.3.1.Scheduler type: Fair Scheduler;

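CDH offers three scheduler types, selected through yarn.resourcemanager.scheduler.class in the Yarn configuration page:
FairScheduler, the fair scheduler, aims to give every application a fair share of resources. No resources need to be reserved in advance; the FairScheduler dynamically rebalances resources across all running jobs;
FifoScheduler, the first-in-first-out scheduler, places applications in a single queue in submission order. When allocating resources it serves the application at the head of the queue first, moves on to the next one only after that application's demand is met, and so on;
CapacityScheduler, the capacity scheduler, keeps a dedicated queue for small jobs. Reserving that queue sets aside part of the cluster in advance, so large jobs finish later than they would under the FifoScheduler;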
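
The difference between FIFO and fair allocation can be illustrated with a toy model. This is a deliberately simplified sketch, not how Yarn implements its schedulers: it handles a single resource (say, GB of memory), ignores queues, weights and preemption, and both function names are made up for the example.

```python
def fifo_allocate(total, demands):
    """Serve applications strictly in submission order."""
    grants = []
    remaining = total
    for demand in demands:
        grant = min(demand, remaining)  # head of the queue takes what it needs
        grants.append(grant)
        remaining -= grant
    return grants


def fair_allocate(total, demands):
    """Split resources evenly, redistributing what satisfied apps leave over."""
    grants = [0.0] * len(demands)
    remaining = float(total)
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 1e-9:
        share = remaining / len(active)  # equal share for each unsatisfied app
        still_hungry = []
        for i in active:
            give = min(share, demands[i] - grants[i])
            grants[i] += give
            remaining -= give
            if demands[i] - grants[i] > 1e-9:
                still_hungry.append(i)
        if len(still_hungry) == len(active):
            break  # nobody was satisfied this round, so the pool is spent
        active = still_hungry
    return grants


demands = [80, 30, 50]  # three applications, in submission order
print(fifo_allocate(100, demands))
print(fair_allocate(100, demands))
```

With 100 units available, FIFO grants 80, 20 and 0, so the last application waits, while the fair split gives roughly 35, 30 and 35, so every application makes progress.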

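3.3.2.Minimum allocation: 2 GB of memory and 1 virtual CPU;

3.3.3.Maximum allocation: 32 GB of memory and 40 virtual CPUs;

Four settings define these limits:

Setting	Description
yarn.scheduler.minimum-allocation-mb	Minimum memory per allocation. A request for less than this value (2 GB on this cluster) is raised to it.
yarn.scheduler.maximum-allocation-mb	Maximum memory per allocation. A request for more than this value throws InvalidResourceRequestException.
yarn.scheduler.minimum-allocation-vcores	Minimum number of virtual CPUs per allocation.
yarn.scheduler.maximum-allocation-vcores	Maximum number of virtual CPUs per allocation.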
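
The effect of these four limits on a container request can be sketched as follows. The function and exception class are hypothetical stand-ins written for this example, using the limits observed on this cluster; real schedulers may additionally round requests up in fixed steps, which is left out here.

```python
class InvalidResourceRequest(Exception):
    """Stand-in for YARN's InvalidResourceRequestException."""


# Limits taken from the scheduler page of this cluster.
MIN_MB, MAX_MB = 2 * 1024, 32 * 1024
MIN_VCORES, MAX_VCORES = 1, 40


def normalize_request(memory_mb, vcores):
    """Apply the min/max allocation limits to a single container request."""
    # Requests above either maximum are rejected outright.
    if memory_mb > MAX_MB or vcores > MAX_VCORES:
        raise InvalidResourceRequest(
            f"requested {memory_mb} MB / {vcores} vcores exceeds "
            f"{MAX_MB} MB / {MAX_VCORES} vcores"
        )
    # Requests below a minimum are raised to that minimum.
    return max(memory_mb, MIN_MB), max(vcores, MIN_VCORES)


print(normalize_request(512, 1))    # too small: raised to the minimum
print(normalize_request(4096, 2))   # within limits: unchanged
```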

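4.Spark on Yarn

In day-to-day production, Spark programs are submitted to Yarn to run.

4.1.Two run modes

1.Cluster mode: the Driver runs inside the Application Master;
2.Client mode: the Driver runs on the machine from which the Spark program is submitted;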

4.2.Launch parameters

Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
  --exclude-packages          Comma-separated list of groupId:artifactId, to exclude while
                              resolving the dependencies provided in --packages to avoid
                              dependency conflicts.
  --repositories              Comma-separated list of additional remote repositories to
                              search for the maven coordinates given with --packages.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor. File paths of these files
                              in executors can be accessed via SparkFiles.get(fileName).

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --proxy-user NAME           User to impersonate when submitting the application.
                              This argument does not work with --principal / --keytab.

  --help, -h                  Show this help message and exit.
  --verbose, -v               Print additional debug output.
  --version,                  Print the version of current Spark.

 Cluster deploy mode only:
  --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                              (Default: 1).

 Spark standalone or Mesos with cluster deploy mode only:
  --supervise                 If given, restarts the driver on failure.
  --kill SUBMISSION_ID        If given, kills the driver specified.
  --status SUBMISSION_ID      If given, requests the status of the driver specified.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 Spark standalone and YARN only:
  --executor-cores NUM        Number of cores per executor. (Default: 1 in YARN mode,
                              or all available cores on the worker in standalone mode)

 YARN-only:
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
                              If dynamic allocation is enabled, the initial number of
                              executors will be at least NUM.
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
  --principal PRINCIPAL       Principal to be used to login to KDC, while running on
                              secure HDFS.
  --keytab KEYTAB             The full path to the file that contains the keytab for the
                              principal specified above. This keytab will be copied to
                              the node running the Application Master via the Secure
                              Distributed Cache, for renewing the login tickets and the
                              delegation tokens periodically.

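Commonly used parameters:
--master: where to run; one of local, yarn, spark://host:port, mesos://host:port, k8s://https://host:port;
--deploy-mode: where the driver runs; either cluster or client;
--num-executors: number of executors to launch for the Spark application;
--executor-cores: number of virtual CPUs per executor;
--executor-memory: memory per executor;
--driver-memory: memory for the driver;
--driver-cores: number of virtual CPUs for the driver;
--jars: dependency jars required by the application jar, given as absolute paths separated by commas;
--class: the application's main class;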
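
To show how these parameters fit together, here is a small sketch that assembles a spark-submit command line for Yarn. The helper function, the jar paths and the class name are all made up for illustration; only the flags and their defaults come from the help text above.

```python
import shlex


def build_spark_submit(app_jar, main_class, *, deploy_mode="cluster",
                       num_executors=2, executor_cores=1,
                       executor_memory="1G", driver_memory="1024M",
                       driver_cores=1, jars=()):
    """Assemble a spark-submit argument list for a job running on Yarn."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError("deploy_mode must be 'cluster' or 'client'")
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--driver-memory", driver_memory,
    ]
    if deploy_mode == "cluster":
        # The help text lists --driver-cores as applying to cluster mode only.
        cmd += ["--driver-cores", str(driver_cores)]
    if jars:
        # Dependency jars are passed as one comma-separated value.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app_jar)  # the application jar comes after all options
    return cmd


# Hypothetical paths and class name.
cmd = build_spark_submit(
    "/opt/apps/etl.jar", "com.example.EtlJob",
    num_executors=4, executor_cores=2, executor_memory="4G",
    jars=["/opt/libs/a.jar", "/opt/libs/b.jar"],
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to something like subprocess.run.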