YARN and HBase Logs

Author: Grey____ | Published 2019-01-16 16:28

    With a default CDH installation the logs live under /var/log, so that is the most convenient place to look first.

    YARN

    To view the log of a specific application ID: yarn logs -applicationId application_1546927165868_0023
    To see which application IDs have aggregated logs, list the aggregation directory: hdfs dfs -ls /tmp/logs/root/logs
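
    For long-running jobs the aggregated log can be large, so it is often easier to dump it to a local file and search there. A minimal sketch, assuming log aggregation writes to /tmp/logs/root/logs as above (the output file name app_0023.log is just an example):

    # list the application IDs whose logs have been aggregated to HDFS
    hdfs dfs -ls /tmp/logs/root/logs

    # dump one application's full log to a local file for easier searching
    yarn logs -applicationId application_1546927165868_0023 > app_0023.log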

    Example:

    After setting up CDH I ran the wordcount example to try out MapReduce, and it failed:

    root@hadoop-slave1:/home/zhanqian/input# hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar wordcount /input /output/wordcount1
    19/01/09 10:24:17 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/10.0.0.81:8032
    19/01/09 10:24:18 INFO input.FileInputFormat: Total input paths to process : 2
    19/01/09 10:24:18 INFO mapreduce.JobSubmitter: number of splits:2
    19/01/09 10:24:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1546927165868_0024
    19/01/09 10:24:19 INFO impl.YarnClientImpl: Submitted application application_1546927165868_0024
    19/01/09 10:24:19 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1546927165868_0024/
    19/01/09 10:24:19 INFO mapreduce.Job: Running job: job_1546927165868_0024
    19/01/09 10:24:28 INFO mapreduce.Job: Job job_1546927165868_0024 running in uber mode : false
    19/01/09 10:24:28 INFO mapreduce.Job:  map 0% reduce 0%
    19/01/09 10:24:28 INFO mapreduce.Job: Job job_1546927165868_0024 failed with state FAILED due to: Application application_1546927165868_0024 failed 2 times due to AM Container for appattempt_1546927165868_0024_000002 exited with  exitCode: 1
    For more detailed output, check application tracking page:http://hadoop-master:8088/proxy/application_1546927165868_0024/Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1546927165868_0024_02_000001
    Exit code: 1
    Stack trace: ExitCodeException exitCode=1: 
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
            at org.apache.hadoop.util.Shell.run(Shell.java:507)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
            at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    
    
    Container exited with a non-zero exit code 1
    

    The error above does not give any useful information, and opening the URL in the hint does not help either; there you can find a hadoop-cmf-yarn-JOBHISTORY-hadoop-master.log.out file, but it contains nothing useful, and the errors in it are not the root cause, only knock-on effects. At this point you need to look at the YARN application log with yarn logs -applicationId application_1546927165868_0024.
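
    Since the aggregated log contains the output of every container it can be long; a minimal sketch for pulling out the interesting lines (the grep pattern is only a suggestion):

    yarn logs -applicationId application_1546927165868_0024 | grep -iE "error|exception|caused by" -A 3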

    To see the list of currently running YARN applications: yarn application -list
    To kill a particular application, pass its application ID to the kill command: yarn application -kill application_1437456051228_1725
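
    When many jobs are running, it helps to narrow the list down before killing anything; a small sketch (the name filter "wordcount" is just an example):

    # show only applications in the RUNNING state
    yarn application -list -appStates RUNNING

    # find the application ID by (part of) its name, then kill it
    yarn application -list | grep wordcount
    yarn application -kill application_1437456051228_1725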

    Executor logs

    When running Spark in cluster/client mode, the output looked like the following, with no exception or error at all.

    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Registering RDD 1 (map at UserAction.scala:598)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Got job 0 (collect at UserAction.scala:609) with 1 output partitions
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (collect at UserAction.scala:609)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at map at UserAction.scala:598), which has no missing parents
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 365.9 MB)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 365.9 MB)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.81.77.67:17664 (size: 2.3 KB, free: 366.3 MB)
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at map at UserAction.scala:598) (first 15 tasks are for partitions Vector(0))
    16-11-2018 15:14:36 CST noah-dp-spark INFO - 18/11/16 15:14:36 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
    16-11-2018 15:14:37 CST noah-dp-spark INFO - 18/11/16 15:14:37 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
    16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.81.174.117:39678) with ID 1
    16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
    16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop-slave1:46294 with 366.3 MB RAM, BlockManagerId(1, hadoop-slave1, 46294, None)
    16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop-slave1, executor 1, partition 0, RACK_LOCAL, 5811 bytes)
    16-11-2018 15:14:41 CST noah-dp-spark INFO - 18/11/16 15:14:41 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-slave1:46294 (size: 2.3 KB, free: 366.3 MB)
    16-11-2018 15:14:43 CST noah-dp-spark INFO - 18/11/16 15:14:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-slave1:46294 (size: 32.8 KB, free: 366.3 MB)
    

    The next step is to find the logs. The job was stuck on the hadoop-slave1 node, so we go to hadoop-slave1 to look for the log information.
    In Spark on YARN mode each executor corresponds to one YARN container, so on the executor's node run ps -ef | grep spark.yarn.app.container.log.dir. If several applications may be running on that node, filter further by application ID. This command returns the executor's process information, which includes the log path, for example:

    -Djava.io.tmpdir=/data1/hadoop/yarn/local/usercache/ocdp/appcache/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002/tmp '
    -Dspark.history.ui.port=18080' '-Dspark.driver.port=59555' 
    -Dspark.yarn.app.container.log.dir=/data1/hadoop/yarn/log/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002
    
    

    In other words, this executor's log is in the /data1/hadoop/yarn/log/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002 directory. With that, we have found the executor's runtime logs.
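
    Inside that container log directory YARN normally writes the executor output to stdout and stderr files; a minimal sketch for inspecting them (the file names are the usual YARN container log files, not taken from this particular run):

    cd /data1/hadoop/yarn/log/application_1521424748238_0051/container_e07_1521424748238_0051_01_000002
    ls -l
    # follow the executor's error output while the task is running
    tail -f stderr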

    I also hit another problem: when starting in cluster mode the job failed after about 14 seconds, and when I wanted to look at the container logs they had already been deleted, because by default they are removed as soon as the run finishes. In CDH I changed the YARN setting yarn.nodemanager.delete.debug-delay-sec = 1000; with that change in place you can still see the debug logs after the run has completed.
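
    After redeploying the configuration you can check on a worker node that the new value is actually in effect; a sketch, assuming the NodeManager's Cloudera Manager process directory follows the same naming pattern as the RegionServer one shown below:

    # -A1 also prints the <value> line that follows the property name
    grep -A1 "yarn.nodemanager.delete.debug-delay-sec" /opt/cm-5.15.1/run/cloudera-scm-agent/process/*-yarn-NODEMANAGER/yarn-site.xml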

    HBase
    1. First check the stdout and stderr role logs in CDH (Cloudera Manager).
    2. If nothing turns up there, look in the process directory of the relevant role, e.g. /opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER/logs (a sketch for locating this directory follows the log output below).
      Earlier a RegionServer crashed right after starting, and the error only showed up here; the cause was the default 50 MB heap, which was too small:
    Thu Feb 21 16:02:21 CST 2019
    JAVA_HOME=/opt/jdk/jdk1.8.0_181
    using /opt/jdk/jdk1.8.0_181 as JAVA_HOME
    using 5 as CDH_VERSION
    using  as HBASE_HOME
    using /opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER as HBASE_CONF_DIR
    using /opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER as HADOOP_CONF_DIR
    using  as HADOOP_HOME
    CONF_DIR=/opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER
    CMF_CONF_DIR=/opt/cm-5.15.1/etc/cloudera-scm-agent
    Thu Feb 21 16:02:21 CST 2019 Starting znode cleanup thread with HBASE_ZNODE_FILE=/opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER/znode25911 for regionserver
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /tmp/hbase_hbase-REGIONSERVER-409cef9ec8084db201a877a119f4f55e_pid25911.hprof ...
    Heap dump file created [62523093 bytes in 0.327 secs]
    #
    # java.lang.OutOfMemoryError: Java heap space
    # -XX:OnOutOfMemoryError="kill -9 %p
    /opt/cm-5.15.1/lib/cmf/service/common/killparent.sh"
    #   Executing /bin/sh -c "kill -9 25911
    /opt/cm-5.15.1/lib/cmf/service/common/killparent.sh"...
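
    Because Cloudera Manager creates a new numbered process directory every time a role is restarted, the most recent one is usually the right place to look. A minimal sketch for locating it (directory layout taken from the path above):

    # most recently modified RegionServer process directory
    ls -dt /opt/cm-5.15.1/run/cloudera-scm-agent/process/*-hbase-REGIONSERVER | head -1

    # its stdout/stderr and role logs
    ls /opt/cm-5.15.1/run/cloudera-scm-agent/process/444-hbase-REGIONSERVER/logs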
    
