In production, job run logs must be preserved; otherwise, when something goes wrong there is no way to troubleshoot it.
The official docs say this about log monitoring:
You can access this interface by simply opening http://<driver-node>:4040 in a web browser. If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).
Note that this information is only available for the duration of the application by default. To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application. This configures Spark to log Spark events that encode the information displayed in the UI to persisted storage.
How do we set this up?
[hadoop@hadoop000 ~]$ cd $SPARK_HOME/conf
[hadoop@hadoop000 conf]$ vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop000:9000/directory
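Note that Spark does not create the event log directory for you, so make it on HDFS first or job submission may fail (the path matches the config above):
[hadoop@hadoop000 conf]$ hadoop fs -mkdir -p hdfs://hadoop000:9000/directory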
[hadoop@hadoop000 conf]$ vi spark-env.sh
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:9000/directory"
[hadoop@hadoop000 ~]$ cd $SPARK_HOME/sbin
[hadoop@hadoop000 sbin]$ ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-hadoop000.out
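To confirm the history server is actually up, check for its process with jps (the UI should then respond on the default port 18080):
[hadoop@hadoop000 sbin]$ jps | grep HistoryServer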
spark.history.ui.port defaults to 18080. To change it, set it in SPARK_HISTORY_OPTS as well, e.g. SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777"
Run a job to verify, first in local mode and then on YARN:
./spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[2] \
/home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.1.jar \
3
./spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
/home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.1.jar \
3
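After both jobs complete, they should show up in the history server UI at http://hadoop000:18080 (host and default port assumed from the setup above).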
We configured the logs to be stored on HDFS earlier; let's take a look:
[hadoop@hadoop000 sbin]$ hadoop fs -ls /directory
Found 2 items
-rwxrwx--- 1 hadoop supergroup 39964 2018-09-19 23:56 /directory/application_1537370027569_0002
-rwxrwx--- 1 hadoop supergroup 38568 2018-09-19 23:54 /directory/local-1537372441615
To keep event logs from piling up on HDFS, you can configure the history server to clean them up periodically, for example:
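A minimal sketch reusing the SPARK_HISTORY_OPTS convention from spark-env.sh above; the cleaner keys are standard history server settings, and the interval/max-age values here are only illustrative:
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:9000/directory \
-Dspark.history.fs.cleaner.enabled=true \
-Dspark.history.fs.cleaner.interval=1d \
-Dspark.history.fs.cleaner.maxAge=7d"
With these settings the history server checks once a day and deletes event logs older than 7 days.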
Event logs can also be compressed; the official docs don't cover this, but the details are in the source code.
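The relevant key appears to be spark.eventLog.compress (the codec falls back to spark.io.compression.codec); enable it in spark-defaults.conf alongside the settings above:
spark.eventLog.compress true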
There are other monitoring approaches as well; see the official docs for details.
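For example, the history server also exposes a REST API under /api/v1, which can list completed applications programmatically (host and port assume this setup):
[hadoop@hadoop000 ~]$ curl http://hadoop000:18080/api/v1/applications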