(19) Configuring Spark to Save Application Run Logs

Author: 白面葫芦娃92 | Published 2018-09-19 16:18

    In production, application run logs must be preserved; otherwise, when a job fails there is no way to troubleshoot it.
    The official documentation says the following about monitoring:
    You can access this interface by simply opening http://<driver-node>:4040 in a web browser. If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).
    Note that this information is only available for the duration of the application by default. To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application. This configures Spark to log Spark events that encode the information displayed in the UI to persisted storage.
    How do we set this up?

    [hadoop@hadoop000 ~]$ cd $SPARK_HOME/conf
    [hadoop@hadoop000 conf]$ vi spark-defaults.conf
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://hadoop000:9000/directory
    [hadoop@hadoop000 conf]$ vi spark-env.sh
    SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:9000/directory"
    [hadoop@hadoop000 ~]$ cd $SPARK_HOME/sbin
    [hadoop@hadoop000 sbin]$ ./start-history-server.sh
    starting org.apache.spark.deploy.history.HistoryServer, logging to /home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-hadoop000.out
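
    One caveat worth noting: Spark does not create the event log directory itself, so if hdfs://hadoop000:9000/directory does not already exist, applications will fail at startup when event logging is enabled. Create it once up front:

    [hadoop@hadoop000 conf]$ hadoop fs -mkdir -p hdfs://hadoop000:9000/directory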
    
    spark.history.ui.port defaults to 18080; to change it, set it in SPARK_HISTORY_OPTS as well, e.g. SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777"
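
    Multiple -D options can be combined in a single SPARK_HISTORY_OPTS string. A minimal sketch for spark-env.sh (7777 is just an example value):

    SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.fs.logDirectory=hdfs://hadoop000:9000/directory"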

    Run a job to verify:

    ./spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master local[2] \
    /home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.1.jar \
    3

    ./spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    /home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.1.jar \
    3
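
    Once a job finishes, you can confirm the history server picked it up, either in a browser at http://hadoop000:18080 or through its REST API. A quick sketch (the host and default port 18080 assume the setup above):

    [hadoop@hadoop000 sbin]$ curl http://hadoop000:18080/api/v1/applications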
    

    We configured the logs to be saved on HDFS above; let's take a look:

    [hadoop@hadoop000 sbin]$ hadoop fs -ls /directory
    Found 2 items
    -rwxrwx---   1 hadoop supergroup      39964 2018-09-19 23:56 /directory/application_1537370027569_0002
    -rwxrwx---   1 hadoop supergroup      38568 2018-09-19 23:54 /directory/local-1537372441615
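
    Each uncompressed event log is plain text with one JSON event per line, so you can peek at it directly. A sketch using the local-* file from the listing above:

    [hadoop@hadoop000 sbin]$ hadoop fs -cat /directory/local-1537372441615 | head -3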
    
    To keep event logs from accumulating indefinitely on HDFS, you can have the history server clean them up periodically, as sketched below.
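
    A minimal sketch using the history server's cleaner settings, added to SPARK_HISTORY_OPTS in spark-env.sh (the interval and max age shown here are the documented defaults, used as example values):

    SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:9000/directory -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"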

    Event logs can also be compressed; the relevant setting is spark.eventLog.compress (see the configuration page of the official docs, or the source code, for details).
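
    A sketch of enabling it in spark-defaults.conf (in Spark 2.x the codec used is whatever spark.io.compression.codec is set to, lz4 by default):

    spark.eventLog.compress          true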

    There are other monitoring options as well; see the official documentation for details.
