References
24/7 Spark Streaming on YARN in Production
Long-running Spark Streaming Jobs on YARN Cluster (not read yet)
Configuration
- Spark configuration: number of executors, cores per executor, etc.
- YARN configuration
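As a rough illustration of the Spark-side settings listed above, here is a minimal Python sketch that assembles a `spark-submit` command line for YARN. The helper name `build_spark_submit`, the default sizes, the jar name and the class name are all hypothetical; only the `spark-submit` flags and the two `spark.yarn.*` settings are real Spark options, and the chosen values are placeholders rather than recommendations.

```python
from typing import List


def build_spark_submit(app_jar: str, main_class: str,
                       num_executors: int = 4,
                       executor_cores: int = 2,
                       executor_memory: str = "4g") -> List[str]:
    """Assemble a spark-submit argument list for a YARN cluster-mode job.

    The default sizes are illustrative assumptions, not tuned values.
    """
    if num_executors < 1 or executor_cores < 1:
        raise ValueError("num_executors and executor_cores must be positive")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        # Allow the application to be re-attempted after a failure; YARN's own
        # yarn.resourcemanager.am.max-attempts caps this value.
        "--conf", "spark.yarn.maxAppAttempts=4",
        # Only count attempt failures that happened within the last hour.
        "--conf", "spark.yarn.am.attemptFailuresValidityInterval=1h",
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    print(" ".join(build_spark_submit("streaming-app.jar",
                                      "com.example.StreamingApp")))
```

Keeping the command as a list (rather than one string) makes it easy to pass to `subprocess.run` without shell quoting problems.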
Backpressure
When the downstream stage is under pressure, it signals the upstream system to lower its input rate.
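In Spark Streaming this behaviour is switched on with the `spark.streaming.backpressure.enabled` setting. To make the idea concrete, here is a toy Python sketch of a rate controller. It is a simple proportional rule, not Spark's actual estimator (which is PID-based); the function name `next_ingest_rate` and the rate bounds are assumptions made for illustration.

```python
def next_ingest_rate(current_rate: float,
                     batch_interval_s: float,
                     processing_time_s: float,
                     min_rate: float = 100.0,
                     max_rate: float = 10000.0) -> float:
    """Return the records/sec the source may deliver for the next batch.

    If the last batch took longer to process than the batch interval, the
    rate is scaled down proportionally; if it finished early, the rate is
    scaled up. The result is clamped to [min_rate, max_rate].
    """
    if batch_interval_s <= 0 or processing_time_s <= 0:
        raise ValueError("intervals must be positive")
    scaled = current_rate * (batch_interval_s / processing_time_s)
    return max(min_rate, min(max_rate, scaled))


if __name__ == "__main__":
    # A 10 s batch that needed 20 s to process: halve the input rate.
    print(next_ingest_rate(1000.0, 10.0, 20.0))
```

The lower bound keeps the stream from stalling completely, and the upper bound plays a role similar to a hard per-receiver or per-partition rate cap.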
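Monitoring
- Source monitoring, e.g. for Kafka, with tools such as Burrow (LinkedIn) or KafkaOffsetMonitor
- Spark monitoring
* Spark's built-in metrics, which can be fed into monitoring systems such as Graphite or Ganglia
* Custom metrics, using the spark-metrics library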
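For the built-in Spark metrics mentioned above, a Graphite sink is configured through `metrics.properties`. The Python sketch below only renders such a file; the helper name `graphite_sink_properties`, the host name and the prefix are hypothetical, while the `*.sink.graphite.*` keys and the `GraphiteSink` class come from Spark's metrics configuration. Port 2003 is assumed because it is Graphite's usual plaintext port.

```python
def graphite_sink_properties(host: str, port: int = 2003,
                             period_s: int = 10, prefix: str = "") -> str:
    """Render metrics.properties lines that enable Spark's Graphite sink."""
    lines = [
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={host}",
        f"*.sink.graphite.port={port}",
        # Report every `period_s` seconds.
        f"*.sink.graphite.period={period_s}",
        "*.sink.graphite.unit=seconds",
    ]
    if prefix:
        # Optional prefix prepended to every metric name.
        lines.append(f"*.sink.graphite.prefix={prefix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "graphite.example.internal" is a placeholder host name.
    print(graphite_sink_properties("graphite.example.internal",
                                   prefix="streaming"))
```

The rendered file can then be shipped with the job and referenced through the `spark.metrics.conf` setting.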
Logging
- Use RollingFileAppender to keep logs from piling up
* Using a custom log4j configuration requires several steps:
* Upload a custom log4j.properties into the working directory of each container of the application: --files hdfs:///path/to/log4j-yarn.properties
* Override the driver and executor log4j.properties using Java system properties.
- Better options
There are more advanced methods for dealing with YARN logs (a detailed discussion is out of scope of this article):
* Install the ELK stack and configure a Logstash log4j appender (for a short elaboration, see the Logging section in the Spark Streaming on YARN blog post).
* Use a log4j SMTPAppender to send email alerts in case of errors (needs fine-tuning to prevent spam).
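To tie the RollingFileAppender advice to the custom log4j steps above, here is a hedged Python sketch that renders a rolling-file `log4j.properties` and the matching `spark-submit` options. It assumes log4j 1.x property syntax (newer Spark releases ship log4j 2, which uses a different format); the helper names and the size and backup limits are illustrative assumptions, and the HDFS path is the placeholder from the notes above.

```python
from typing import List

# File name taken from the --files example in the notes above.
LOG4J_FILE = "log4j-yarn.properties"


def rolling_log4j_properties(max_file_size: str = "50MB",
                             max_backup_index: int = 5) -> str:
    """Render a log4j 1.x config that rolls logs inside the YARN log dir."""
    lines = [
        "log4j.rootLogger=INFO, rolling",
        "log4j.appender.rolling=org.apache.log4j.RollingFileAppender",
        "log4j.appender.rolling.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.rolling.layout.ConversionPattern=[%d] %p %m (%c)%n",
        # Roll over at this size and keep only a bounded number of old files.
        f"log4j.appender.rolling.maxFileSize={max_file_size}",
        f"log4j.appender.rolling.maxBackupIndex={max_backup_index}",
        # Writing into the container log dir keeps YARN log handling working.
        "log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/spark.log",
        "log4j.appender.rolling.encoding=UTF-8",
    ]
    return "\n".join(lines) + "\n"


def log4j_submit_args(files_uri: str) -> List[str]:
    """spark-submit options that ship the file and point both JVMs at it."""
    java_opt = f"-Dlog4j.configuration={LOG4J_FILE}"
    return [
        "--files", files_uri,
        "--conf", f"spark.driver.extraJavaOptions={java_opt}",
        "--conf", f"spark.executor.extraJavaOptions={java_opt}",
    ]


if __name__ == "__main__":
    print(rolling_log4j_properties())
    print(" ".join(log4j_submit_args("hdfs:///path/to/" + LOG4J_FILE)))
```

Because `--files` places the file in each container's working directory, the system property can reference it by bare file name.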