Spark Broadcast

Author: clive0x | Published: 2019-06-25 21:53

    The write happens on the driver: val broadcastVar = sc.broadcast(Array(1, 2, 3))

    -> env.broadcastManager.newBroadcast[T](value, isLocal)

      -> TorrentBroadcastFactory: new TorrentBroadcast[T](value_, id)

    The value is then stored in the driver-side BlockManager.

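    To read the value, call broadcastVar.value.

    A broadcast is only a wrapper around the value, which is stored on the driver and the executors. Rack awareness can be configured for storage (spark.storage.replication.topologyFile and spark.storage.replication.topologyMapper). The value is split into chunks of spark.broadcast.blockSize (4 MB by default) and stored on the local host, the local rack, and other racks.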
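The chunking step can be illustrated without Spark. The following is a minimal sketch in plain Scala, assuming the value has already been serialized to a byte array. The names `BroadcastChunkingSketch`, `chunk`, and `reassemble` are hypothetical and do not come from Spark's source; TorrentBroadcast's real implementation also handles serialization, compression, and block IDs, which are omitted here.

```scala
object BroadcastChunkingSketch {
  // Hypothetical helper: split a serialized payload into pieces of at most
  // blockSize bytes, mirroring the role of spark.broadcast.blockSize.
  def chunk(bytes: Array[Byte], blockSize: Int): Vector[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    bytes.grouped(blockSize).toVector
  }

  // Hypothetical helper: a reader rebuilds the payload by concatenating
  // the chunks in their original order.
  def reassemble(chunks: Seq[Array[Byte]]): Array[Byte] =
    Array.concat(chunks: _*)
}
```

With a block size of 4 and a 10-byte payload, `chunk` yields pieces of 4, 4, and 2 bytes, in the same way that a 10 MB value would be cut into three blocks at the 4 MB default.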

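    The entry class BroadcastManager has no getValue or value method, so a broadcast can only be created on the driver and is read through the handle it returns, for example in a closure such as rdd.map(_ == broadcastVar.value).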
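A minimal end-to-end sketch of this pattern, assuming Spark 2.x or later is on the classpath and a local master is acceptable. The object name `BroadcastUsageSketch`, the app name, and the sample data are illustrative choices, not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastUsageSketch {
  def run(): Array[Int] = {
    val conf = new SparkConf()
      .setAppName("broadcast-usage-sketch")   // illustrative name
      .setMaster("local[2]")                  // assumption: local test run
      .set("spark.broadcast.blockSize", "4m") // the default, set explicitly
    val sc = new SparkContext(conf)
    try {
      // Created on the driver; only the lightweight handle is captured
      // by the closure below.
      val broadcastVar = sc.broadcast(Array(1, 2, 3))

      // The closure runs in tasks, which read the value through the handle.
      val kept = sc.parallelize(1 to 6)
        .filter(x => broadcastVar.value.contains(x))
        .collect()

      // Release the broadcast blocks once the result has been collected.
      broadcastVar.destroy()
      kept
    } finally {
      sc.stop()
    }
  }
}
```

Calling `BroadcastUsageSketch.run()` keeps only the elements of the RDD that also appear in the broadcast array.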

    A good article on the topic:

    https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/sparksqlshufflepartitions_draft.html
