Spark2x on YARN Log Configuration Explained

Author: super_wing | Published 2019-12-18 20:03

    Overview

    Log configuration for Spark on YARN falls into two categories:

    1. Spark on YARN client mode
    2. Spark on YARN cluster mode

    The following sections cover each in turn.

    Log Configuration in Spark on YARN Client Mode

    In client mode, a Spark application consists of three parts: the
    driver, the application master, and the executors. This mode is typically used in test environments.

    • driver: can be thought of as the client side of the Spark application; it runs in the process that submits the job.

    • application master: requests resources from the YARN ResourceManager, allocates them to the tasks, and starts/stops tasks.

    • executor: runs in a container on a NodeManager node and executes the actual tasks (its container logs can be pulled afterwards with the yarn logs sketch below).
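
    Since the application master and executors run in YARN containers, their logs end up on the NodeManager hosts and, if log aggregation (yarn.log-aggregation-enable) is turned on, in HDFS. A minimal sketch for retrieving them; the application ID is a placeholder to substitute with your own:

     # fetch all aggregated container logs (AM + executors) for one application
     yarn logs -applicationId <application_id>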

    With those roles in mind, here is the log configuration for each:

    • driver side
     # client mode: the driver's log4j config is passed via spark.driver.extraJavaOptions;
     # --files ships the local log4j.properties into the YARN containers
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
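
    Because the client-mode driver JVM starts on the submitting machine, the driver can also read the configuration straight from the local filesystem; a variant of the command above, assuming the same example path:

     # point the local driver at an absolute file: URL instead of the shipped copy
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/home/hadoop/spark-workspace/log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar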
    
    • application master side
     # client mode: the AM is a separate process from the driver and is configured
     # via spark.yarn.am.extraJavaOptions (this property only applies in client mode)
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.yarn.am.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
    
    • executor side
     # executor log4j config is passed via spark.executor.extraJavaOptions;
     # --files uploads the local log4j.properties into each executor container
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
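
    In practice the three settings are usually passed together in a single submit rather than in three separate ones; a sketch reusing this article's example class and paths:

     # configure driver, AM, and executor logging in one client-mode submit
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.yarn.am.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar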
    

    Log Configuration in Spark on YARN Cluster Mode

    In cluster mode, a Spark application consists of two parts: the
    driver and the executors. This mode is typically used in production.

    • driver: takes on both the client role and the application master's responsibilities, and runs inside a container on a NodeManager node.

    • executor: runs in a container on a NodeManager node and executes the actual tasks.

    With those roles in mind, here is the log configuration for each:

    • driver side
     # cluster mode: the driver runs inside the application master's container,
     # so its log4j config is also passed via spark.driver.extraJavaOptions
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
    
    • executor side
     # executor log4j config, same as in client mode
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
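
    As in client mode, the two settings can be combined into one submit. Note that spark.yarn.am.extraJavaOptions is not needed here, since in cluster mode the driver runs inside the application master. A sketch with this article's example paths:

     # configure driver and executor logging in one cluster-mode submit
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar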
    

    Contents of the Log Configuration File

    In client mode, a template for the driver's log configuration is:

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    # Set the default spark-shell log level to WARN. When running the spark-shell, the
    # log level for this class is used to overwrite the root logger's log level, so that
    # the user can have different defaults for the shell and regular Spark apps.
    log4j.logger.org.apache.spark.repl.Main=WARN
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.spark_project.jetty=WARN
    log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR
    # SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
    

    Console output is used here so the driver's logs can be watched conveniently in the submitting terminal.
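
    If it is unclear whether the file was actually picked up, log4j 1.x can report its own configuration loading; a quick check reusing the client-mode driver command from above (-Dlog4j.debug is a standard log4j 1.x switch):

     # log4j prints which configuration file it loads when log4j.debug is set
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=log4j.properties" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar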

    Other Log Configurations

    log4j.rootLogger=INFO,rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=${log}/abc.log
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=[%d] %p %m (%c)%n
    log4j.appender.rolling.MaxFileSize=2KB
    log4j.appender.rolling.MaxBackupIndex=10
    

    A rolling appender like this is recommended so that oversized log files do not fill up the disk. (The 2KB MaxFileSize above is only for demonstration; real values are usually far larger.)
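
    Note that the ${log} placeholder in the File path is resolved by log4j from a Java system property named log, so that property has to be supplied alongside the log4j options. A sketch for the executor side; the ./logs value is an assumed example directory, relative to the container working directory:

     # supply both the log4j config and the ${log} directory as system properties
     spark-submit \
      --class com.hm.spark.Application \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog=./logs" \
      --files /home/hadoop/spark-workspace/log4j.properties \
      /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar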
