美文网首页
spark Broadcast

spark Broadcast

作者: clive0x | 来源:发表于2019-06-25 21:53 被阅读0次

drive端写入,val broadcastVar = sc.broadcast(Array(1, 2, 3))

-〉env.broadcastManager.newBroadcast[T](value, isLocal)

  -〉TorrentBroadcastFactory:new TorrentBroadcast[T](value_, id)

在driver端的BlockManager中存储。

使用时调用:

broadcastVar.value broadcast仅是对value的包装,存储在driver/executors中,存储时可以指定racking(spark.storage.replication.topologyFile和spark.storage.replication.topologyMapper),按spark.broadcast.blockSize(4M默认)分chunk存储,存储到本机、本rack、其它rack。

入口类BroadcastManager没有getValue或者value,只能在Driver中使用,如rdd.Map(_ ==broadcastVar.value)等

有遍文章不错:

https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/sparksqlshufflepartitions_draft.html

相关文章

网友评论

      本文标题:spark Broadcast

      本文链接:https://www.haomeiwen.com/subject/chdxcctx.html