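Today I set up a 3-node YARN cluster whose hosts are named hadoop-master, hadoop-slave1, and hadoop-slave2. YARN kept failing to start, so I looked at yarn-root-nodemanager-hadoop-slave2.log in the logs directory on hadoop-slave2. It contained the following: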
2018-12-30 08:00:26,246 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042
2018-12-30 08:00:26,270 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : hadoop-slave2:43533
2018-12-30 08:00:26,277 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at hadoop-master/172.16.30.41:8031
2018-12-30 08:00:26,304 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2018-12-30 08:00:26,357 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2018-12-30 08:00:26,376 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2018-12-30 08:00:27,461 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-master/172.16.30.41:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-12-30 08:00:28,462 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-master/172.16.30.41:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The log shows that the NodeManager on hadoop-slave2 started successfully but could never connect to the ResourceManager on hadoop-master (172.16.30.41:8031).
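A quick way to tell whether anything is listening on the ResourceManager port at all is to attempt a plain TCP connection from the slave node. The following is only a minimal sketch: `can_connect` is a hypothetical helper name, and the host and port are simply the ones that appear in the NodeManager log. A failed connection does not by itself say why the port is closed; it only suggests that the problem is worth investigating on the hadoop-master side.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and knows nothing about the YARN RPC protocol.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

On hadoop-slave2, calling `can_connect("hadoop-master", 8031)` would return False while the ResourceManager is not listening on that port.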
Next I checked yarn-root-resourcemanager-hadoop-master.log on hadoop-master and found the following:
2018-12-30 08:00:20,539 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid resource scheduler vcores allocation configuration, yarn.scheduler.minimum-allocation-vcores=0, yarn.scheduler.maximum-allocation-vcores=1, min and max should be greater than 0, max should be no smaller than min.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid resource scheduler vcores allocation configuration, yarn.scheduler.minimum-allocation-vcores=0, yarn.scheduler.maximum-allocation-vcores=1, min and max should be greater than 0, max should be no smaller than min.
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.validateConf(CapacityScheduler.java:213)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:321)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:395)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:740)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1137)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:300)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1421)
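The log shows that the ResourceManager failed to start because the minimum number of CPU vcores that can be allocated to a container was set to 0, while Hadoop requires this value to be greater than 0. Since the ResourceManager never finished initializing, the NodeManager had nothing to connect to on port 8031, which explains the retries in the first log. The parameter was configured in yarn-site.xml as follows: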
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>0</value>
</property>
Change it to 1, then restart YARN so that the ResourceManager picks up the new value:
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
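The rule quoted in the exception message (min and max must be greater than 0, and max must be no smaller than min) can be checked before starting the cluster. The sketch below is not Hadoop's own validation code; `read_properties` and `check_vcores` are hypothetical helpers, and the sample XML simply reproduces the two values reported in the error (0 and 1). It assumes both properties are set explicitly in the file and does not fall back to Hadoop's defaults.

```python
import xml.etree.ElementTree as ET

MIN_KEY = "yarn.scheduler.minimum-allocation-vcores"
MAX_KEY = "yarn.scheduler.maximum-allocation-vcores"


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_vcores(min_vcores, max_vcores):
    """Return a list of problems; an empty list means the pair is valid."""
    problems = []
    if min_vcores <= 0:
        problems.append(f"{MIN_KEY}={min_vcores} must be greater than 0")
    if max_vcores <= 0:
        problems.append(f"{MAX_KEY}={max_vcores} must be greater than 0")
    if max_vcores < min_vcores:
        problems.append(f"{MAX_KEY}={max_vcores} is smaller than {MIN_KEY}={min_vcores}")
    return problems


if __name__ == "__main__":
    # Sample input mirroring the values from the ResourceManager error.
    sample = """<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>1</value>
  </property>
</configuration>"""
    props = read_properties(sample)
    for problem in check_vcores(int(props[MIN_KEY]), int(props[MAX_KEY])):
        print(problem)
```

With the sample values the script reports a single problem, the minimum of 0. With the minimum changed to 1, `check_vcores(1, 1)` returns an empty list.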