Spark in Production

Author: Bitson | Published: 2018-12-19 23:39

References

24/7 Spark Streaming on YARN in Production

Long-running Spark Streaming Jobs on YARN Cluster (not read yet)

兑吧:从自建HBase迁移到阿里云HBase实战经验 (Duiba: hands-on experience migrating from self-hosted HBase to Alibaba Cloud HBase)

Configuration

- Spark configuration: number of executors, cores per executor, and so on

- YARN configuration
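As a rough illustration of where these settings end up, the sketch below assembles a `spark-submit` command line in Python. The flags `--num-executors`, `--executor-cores` and `--executor-memory` and the two `spark.yarn.*` properties are standard Spark-on-YARN options. The jar name, the sizes and the retry values are placeholders chosen for this example, not recommendations from the referenced articles.

```python
from typing import Dict, List


def build_submit_args(app: str, num_executors: int, executor_cores: int,
                      executor_memory: str, yarn_conf: Dict[str, str]) -> List[str]:
    """Assemble a spark-submit command for a YARN cluster-mode job."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
    ]
    # YARN-related settings are passed as plain --conf key=value pairs.
    for key, value in sorted(yarn_conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Placeholder values: tune them for the actual cluster and workload.
yarn_conf = {
    "spark.yarn.maxAppAttempts": "4",
    "spark.yarn.am.attemptFailuresValidityInterval": "1h",
}
command = build_submit_args("streaming-job.jar", 4, 2, "4g", yarn_conf)
print(" ".join(command))
```

Keeping the sizing in one function makes it easier to review executor counts and retry limits together before a job is deployed.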

Backpressure

When the downstream comes under pressure, it signals the upstream system to lower its input rate.
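In Spark Streaming this mechanism is switched on through configuration. The sketch below renders the relevant properties as `--conf` flags. `spark.streaming.backpressure.enabled` and `spark.streaming.kafka.maxRatePerPartition` are real Spark properties. The rate of 1000 records per second per partition is an assumed value for illustration, and the helper `to_conf_flags` is hypothetical.

```python
from typing import Dict, List


def to_conf_flags(conf: Dict[str, str]) -> List[str]:
    """Render a property map as spark-submit --conf flags, in sorted key order."""
    flags: List[str] = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


backpressure_conf = {
    # Let Spark adapt the ingestion rate to the observed batch processing times.
    "spark.streaming.backpressure.enabled": "true",
    # Upper bound per Kafka partition (records/second); the value is an assumption.
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}
flags = to_conf_flags(backpressure_conf)
print(" ".join(flags))
```

The per-partition maximum still matters with backpressure enabled, since it caps the rate that the controller can choose.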

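Monitoring

- Source monitoring, for example Kafka, using tools such as Burrow (LinkedIn) or KafkaOffsetMonitor

- Spark monitoring

    - Spark's built-in metrics, which can be fed into monitoring systems such as Graphite or Ganglia

    - Custom metrics using spark-metrics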
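For the built-in metrics, Spark reads sink definitions from a `metrics.properties` file. The sketch below generates a minimal Graphite sink configuration. The property keys and the `GraphiteSink` class name follow Spark's metrics system. The host name, port, reporting period and prefix are assumptions for this example, and `render_graphite_sink` is a hypothetical helper.

```python
def render_graphite_sink(host: str, port: int, prefix: str,
                         period_seconds: int = 10) -> str:
    """Return metrics.properties content that sends all metrics to Graphite."""
    lines = [
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={host}",
        f"*.sink.graphite.port={port}",
        f"*.sink.graphite.period={period_seconds}",
        "*.sink.graphite.unit=seconds",
        f"*.sink.graphite.prefix={prefix}",
    ]
    return "\n".join(lines) + "\n"


# Host, port and prefix are placeholders for a real Graphite endpoint.
properties_text = render_graphite_sink("graphite.example.com", 2003, "streaming_job")
print(properties_text, end="")
```

One way to use the generated file is to ship it with `--files` and point `spark.metrics.conf` at it, though the exact wiring depends on the deployment.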

Logging

- Use RollingFileAppender to keep logs from piling up

* Using a custom log4j configuration requires several settings:

* Upload a custom log4j.properties into the working directory of each container of the application: --files hdfs:///path/to/log4j-yarn.properties

* Override the driver's and executors' log4j.properties using system properties.
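The two steps above can be sketched as follows, assuming log4j 1.x (the version the quoted advice targets; newer Spark releases use log4j2 and need a different file format). The appender name `rolling`, the size limits and the log pattern are illustrative choices. `${spark.yarn.app.container.log.dir}` is the property Spark on YARN provides for the container log directory. Passing only the file name to `-Dlog4j.configuration` assumes the file was shipped with `--files` and is resolvable from the container's working directory.

```python
from typing import List


def rolling_log4j_properties(max_file_size: str = "50MB", max_backups: int = 5) -> str:
    """Return log4j 1.x properties with a size-bounded RollingFileAppender."""
    lines = [
        "log4j.rootLogger=INFO, rolling",
        "log4j.appender.rolling=org.apache.log4j.RollingFileAppender",
        # Write into the YARN container log directory so YARN can manage the files.
        "log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log",
        f"log4j.appender.rolling.MaxFileSize={max_file_size}",
        f"log4j.appender.rolling.MaxBackupIndex={max_backups}",
        "log4j.appender.rolling.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.rolling.layout.ConversionPattern=%d{ISO8601} %-5p %c{1}: %m%n",
    ]
    return "\n".join(lines) + "\n"


def log4j_submit_flags(properties_path: str) -> List[str]:
    """Build spark-submit flags that ship the file and point both JVMs at it."""
    file_name = properties_path.rsplit("/", 1)[-1]
    jvm_option = f"-Dlog4j.configuration={file_name}"
    return [
        "--files", properties_path,
        "--conf", f"spark.driver.extraJavaOptions={jvm_option}",
        "--conf", f"spark.executor.extraJavaOptions={jvm_option}",
    ]


log4j_text = rolling_log4j_properties()
log4j_flags = log4j_submit_flags("hdfs:///path/to/log4j-yarn.properties")
print(log4j_text, end="")
print(" ".join(log4j_flags))
```

With these limits each container keeps at most the active log file plus five rolled backups, so disk usage per container stays bounded.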

- Better options

There are more advanced methods for dealing with YARN logs (a detailed discussion is out of scope of this article):

* Install the ELK stack and configure a Logstash log4j appender (for a short elaboration, see the Logging section in the Spark Streaming on YARN blog post).

* Use a log4j SMTPAppender to send E-Mail alerts in case of errors (needs fine-tuning to prevent spam).

