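Background
We want to label the machines in our cluster and run different workloads on different machines, so that we can serve customers at different service tiers. The queue layout has two leaf queues under root: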
      root
     /    \
default    perjob
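YARN scheduling modes
We run Hadoop 3.1.4. YARN ships three schedulers: FifoScheduler, CapacityScheduler, and FairScheduler; the latter two are the ones commonly used. Because we had not needed node labels before, we had always used FairScheduler, which is relatively simple to configure. Node labels, however, can only be used with CapacityScheduler.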
Configuring yarn-site.xml
<!-- Use CapacityScheduler as the scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<!--value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value-->
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<!-- Enable node labels -->
<property>
<name>yarn.node-labels.enabled</name>
<value>true</value>
</property>
<!-- Where node labels are stored -->
<property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>hdfs://node1:9900/yn/node-labels/</value>
</property>
<!-- Enable the scheduler monitor used for resource preemption -->
<property>
<name>yarn.resourcemanager.scheduler.monitor.enable</name>
<value>true</value>
</property>
<!-- Fraction of resources that may be preempted in a single round; the default is 0.1 -->
<property>
<name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
<value>0.3</value>
</property>
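Configuring capacity-scheduler.xml
This scheduler has a very large number of settings and is the most complex of the three. The official documentation is very detailed, but it is hard to follow without an overall picture first. The configuration below can be used as-is to replace the default capacity-scheduler.xml.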
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,perjob</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>60</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.maximum-capacity</name>
<value>80</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
<value>SE</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
<value>SE</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.accessible-node-labels</name>
<value>AP</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.default-node-label-expression</name>
<value>AP</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels.SE.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.accessible-node-labels.SE.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels.AP.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.accessible-node-labels.AP.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>5</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.user-limit-factor</name>
<value>5</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.default-application-priority</name>
<value>10</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.perjob.default-application-priority</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.leaf-queue-template.ordering-policy</name>
<value>fair</value>
</property>
</configuration>
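Capacity Scheduler expects the capacities of the direct children of a parent queue to add up to 100 (here 60 for default plus 40 for perjob). The following is a minimal sketch, not part of Hadoop, that checks this before the file is deployed. The helper names `load_properties` and `child_capacity_sum` and the trimmed-down `SAMPLE` string are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity."

# Trimmed-down copy of the queue settings above, used only for illustration.
SAMPLE = """
<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>default,perjob</value></property>
  <property><name>yarn.scheduler.capacity.root.default.capacity</name><value>60</value></property>
  <property><name>yarn.scheduler.capacity.root.perjob.capacity</name><value>40</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def child_capacity_sum(props, parent="root"):
    """Sum the default-partition capacity of the parent's direct children."""
    raw = props.get(f"{PREFIX}{parent}.queues", "")
    children = [q.strip() for q in raw.split(",") if q.strip()]
    # A child without an explicit capacity is counted as 0 in this sketch.
    return sum(float(props.get(f"{PREFIX}{parent}.{c}.capacity", "0")) for c in children)


if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print("root children capacity sum:", child_capacity_sum(props))
```

To check a real file, read its contents and pass the text to `load_properties` instead of `SAMPLE`.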
Configuring labels
Create the two labels, SE and AP:
yarn rmadmin -addToClusterNodeLabels "SE,AP";
Label the nodes:
yarn rmadmin -replaceLabelsOnNode "node1=SE node2=AP node3=AP";
Refresh the queue configuration:
yarn rmadmin -refreshQueues
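Conclusion
Before the configuration worked, I hit a rather tricky problem: after I submitted a Flink job, it stayed in the ACCEPTED state indefinitely, and the YARN ResourceManager log showed no related exception. So how do you see why scheduling is failing? I found out by accident that the Scheduler page of the ResourceManager web UI offers a "Dump scheduler logs" option.
Clicking it writes yarn-capacity-scheduler-debug.log to the Hadoop log directory.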
2021-09-02 15:29:18,687 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Trying to assign containers to child-queue of root
2021-09-02 15:29:18,687 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Failed to assign to queue: root nodePatrition: AP, usedResources: <memory:0, vCores:0>, clusterResources: <memory:110592, vCores:96>, reservedResources: <memory:0, vCores:0>, maxLimitCapacity: <memory:0, vCores:0>, currTotalUsed:<memory:0, vCores:0>
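In a large dump, lines like the one above are easy to miss. Below is a minimal sketch that pulls the queue, the partition, and `maxLimitCapacity` out of each "Failed to assign" line and reports the ones whose limit is zero. The regular expression is written against the log line quoted above (including its "nodePatrition" spelling) and is an assumption about the format; other Hadoop versions may log differently. The function name `zero_limit_failures` is hypothetical.

```python
import re

# Matches the "Failed to assign" debug line quoted above; accepts both the
# "nodePatrition" spelling seen in that log and the corrected "nodePartition".
FAILED_ASSIGN = re.compile(
    r"Failed to assign to queue: (?P<queue>\S+) "
    r"nodePa(?:tri|rti)tion: (?P<partition>[^,]*), "
    r".*?maxLimitCapacity: <memory:(?P<memory>\d+), vCores:(?P<vcores>\d+)>"
)

# The failing line from the article, used as sample input.
SAMPLE_LINE = (
    "2021-09-02 15:29:18,687 DEBUG org.apache.hadoop.yarn.server.resourcemanager."
    "scheduler.capacity.AbstractCSQueue: Failed to assign to queue: root "
    "nodePatrition: AP, usedResources: <memory:0, vCores:0>, "
    "clusterResources: <memory:110592, vCores:96>, "
    "reservedResources: <memory:0, vCores:0>, "
    "maxLimitCapacity: <memory:0, vCores:0>, currTotalUsed:<memory:0, vCores:0>"
)


def zero_limit_failures(lines):
    """Yield (queue, partition) for failed assignments whose limit is zero."""
    for line in lines:
        match = FAILED_ASSIGN.search(line)
        if match and int(match["memory"]) == 0 and int(match["vcores"]) == 0:
            yield match["queue"], match["partition"]


if __name__ == "__main__":
    for queue, partition in zero_limit_failures([SAMPLE_LINE]):
        print(f"queue {queue} has zero capacity on partition {partition!r}")
```

To scan a real dump, open yarn-capacity-scheduler-debug.log and pass the file object to `zero_limit_failures`, since iterating a file yields its lines.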
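With the source code on GitHub, looking up AbstractCSQueue quickly leads to the line that emits this log message.
Many of the values in the log are still at their default of 0, which is why no resources can be allocated. In my case, I had not set yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.capacity, so nothing could ever be allocated. This property defaults to 0, as the official documentation explains in detail.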
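To see why one missing entry is enough to block allocation, here is a simplified model, not the scheduler's actual code: it multiplies the per-label capacity percentages from root down to the target queue and treats any unset entry as 0, matching the documented default. The function name `label_share` and the two dictionaries are hypothetical and exist only for illustration.

```python
def label_share(props, queue_path, label):
    """Fraction of a label's partition a queue can reach (simplified model).

    Multiplies the <label> capacity of every queue on the path from root
    down to queue_path. An unset capacity counts as 0.
    """
    parts = queue_path.split(".")
    share = 1.0
    for depth in range(1, len(parts) + 1):
        path = ".".join(parts[:depth])
        key = f"yarn.scheduler.capacity.{path}.accessible-node-labels.{label}.capacity"
        share *= float(props.get(key, "0")) / 100.0
    return share


# The AP entries from the capacity-scheduler.xml shown earlier.
CONFIGURED = {
    "yarn.scheduler.capacity.root.accessible-node-labels.AP.capacity": "100",
    "yarn.scheduler.capacity.root.perjob.accessible-node-labels.AP.capacity": "100",
}

# Same queue, but with the root-level AP capacity left unset.
MISSING_ROOT = {
    "yarn.scheduler.capacity.root.perjob.accessible-node-labels.AP.capacity": "100",
}

if __name__ == "__main__":
    print("configured:", label_share(CONFIGURED, "root.perjob", "AP"))
    print("root entry missing:", label_share(MISSING_ROOT, "root.perjob", "AP"))
```

In this model a single unset level zeroes the whole product, which is consistent with the zero maxLimitCapacity reported for queue root on partition AP in the debug log.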
After setting it, run yarn rmadmin -refreshQueues to reload capacity-scheduler.xml.
A healthy yarn-capacity-scheduler-debug.log looks like this:
2021-09-03 08:04:54,261 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: User limit computation for deployer, in queue: perjob, userLimitPercent=100, userLimitFactor=5.0, required=<memory:512, vCores:1>, consumed=<memory:0, vCores:0>, user-limit-resource=<memory:512, vCores:1>, queueCapacity=<memory:512, vCores:1>, qconsumed=<memory:0, vCores:0>, currentCapacity=<memory:512, vCores:1>, activeUsers=0.0, clusterCapacity=<memory:51200, vCores:32>, resourceByLabel=<memory:51200, vCores:32>, usageratio=0.0, Partition=SE, resourceUsed=<memory:512, vCores:1>, maxUserLimit=<memory:2560, vCores:5>, userWeight=1.0
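With that, the Capacity Scheduler setup is finally complete. It took two days of on-and-off digging, which was no small effort. I'm treating myself to a drumstick this weekend as a reward.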