Spark BlockManager

Author: clive0x | Published 2019-06-29 22:48

BlockManager is the main class Spark uses to store blocks. What it has in common with HDFS:

When a block is stored with three replicas, the copies are placed on the local machine, within the local rack, and on another machine.

Where it differs from HDFS:

Data goes to memory first; if the StorageLevel also specifies disk, blocks spill to disk when memory runs short.
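Both behaviours (memory first with disk fallback, plus replication) are selected through the StorageLevel passed to persist. A minimal usage sketch, with an illustrative object name and toy data:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object StorageLevelDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("storage-level-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // MEMORY_AND_DISK_2: two replicas, stored in memory first and spilled to disk when memory is short.
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
    rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

    println(rdd.count())  // the first action materializes and caches the blocks
    spark.stop()
  }
}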

The main classes:

1. BlockInfoManager, which manages the mapping from BlockId to BlockInfo:

private[this] val infos = new mutable.HashMap[BlockId, BlockInfo]

It also provides per-block read/write locks that protect operations on each block.
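The idea is a read/write lock per block: multiple concurrent readers, a single writer, and a writer must wait until all readers have released the block. Below is a heavily simplified standalone sketch of that pattern; it is not the real BlockInfoManager code, and every name in it is illustrative:

import scala.collection.mutable

// Simplified stand-in for BlockInfo: number of readers and the task holding the write lock (-1 = none).
class SimpleBlockInfo(var readerCount: Int = 0, var writerTask: Long = -1L)

class SimpleBlockInfoManager {
  private val infos = new mutable.HashMap[String, SimpleBlockInfo]

  def registerBlock(blockId: String): Unit = synchronized {
    infos.getOrElseUpdate(blockId, new SimpleBlockInfo())
  }

  // Wait until no writer holds the block, then register one more reader.
  def lockForReading(blockId: String): Option[SimpleBlockInfo] = synchronized {
    infos.get(blockId).map { info =>
      while (info.writerTask != -1L) wait()
      info.readerCount += 1
      info
    }
  }

  // Wait until no reader or writer holds the block, then take the write lock for this task.
  def lockForWriting(blockId: String, taskId: Long): Option[SimpleBlockInfo] = synchronized {
    infos.get(blockId).map { info =>
      while (info.writerTask != -1L || info.readerCount > 0) wait()
      info.writerTask = taskId
      info
    }
  }

  // Release whichever lock the caller holds and wake up any waiting tasks.
  def unlock(blockId: String): Unit = synchronized {
    infos.get(blockId).foreach { info =>
      if (info.writerTask != -1L) info.writerTask = -1L
      else if (info.readerCount > 0) info.readerCount -= 1
      notifyAll()
    }
  }
}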

2. BlockManagerMasterEndpoint, which manages the BlockManagers across the cluster. It maintains the BlockManagerId => BlockManagerInfo mapping, among other state; a BlockManagerInfo describes a single BlockManager and the blocks stored on it. Its main fields (a sketch of how they are kept in sync follows below):

// Mapping from block manager id to the block manager's information.
private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]

// Mapping from executor ID to block manager ID.
private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]

// Mapping from block id to the set of block managers that have the block.
private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
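When an executor registers its BlockManager or reports a block's status, the master endpoint keeps these three maps consistent with one another, and answering a location query is then a simple lookup. A simplified sketch of that bookkeeping, with stand-in types and method names rather than the real Spark signatures:

import scala.collection.mutable

// Illustrative stand-ins for the real Spark types.
case class BlockManagerId(executorId: String, host: String, port: Int)
case class BlockId(name: String)

class SimpleMasterBookkeeping {
  private val blockManagerInfo = new mutable.HashMap[BlockManagerId, mutable.HashSet[BlockId]]
  private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]
  private val blockLocations = new mutable.HashMap[BlockId, mutable.HashSet[BlockManagerId]]

  // Registration: remember which BlockManager serves which executor.
  def register(id: BlockManagerId): Unit = {
    blockManagerIdByExecutor(id.executorId) = id
    blockManagerInfo.getOrElseUpdate(id, new mutable.HashSet[BlockId])
  }

  // A block status report: record the block on that manager and index its location.
  def updateBlockInfo(id: BlockManagerId, blockId: BlockId, stillStored: Boolean): Unit = {
    val blocks = blockManagerInfo.getOrElseUpdate(id, new mutable.HashSet[BlockId])
    val locations = blockLocations.getOrElseUpdate(blockId, new mutable.HashSet[BlockManagerId])
    if (stillStored) {
      blocks += blockId
      locations += id
    } else {
      blocks -= blockId
      locations -= id
      if (locations.isEmpty) blockLocations.remove(blockId)
    }
  }

  // getLocations(blockId) becomes a simple lookup in blockLocations.
  def getLocations(blockId: BlockId): Seq[BlockManagerId] =
    blockLocations.get(blockId).map(_.toSeq).getOrElse(Seq.empty)
}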

Inside the BlockManagerInfo class:

// Mapping from block id to its status.
private val _blocks = new JHashMap[BlockId, BlockStatus]

// Cached blocks held by this BlockManager. This does not include broadcast blocks.
private val _cachedBlocks = new mutable.HashSet[BlockId]
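Note the comment on _cachedBlocks: broadcast blocks are excluded, so it only tracks cached RDD-style data. A tiny illustrative sketch of how a status update might maintain both fields (not the real updateBlockInfo code; the block-id prefix check is an assumption used for illustration):

import scala.collection.mutable

// Illustrative: track per-block status and the subset of cached (non-broadcast) blocks.
class SimpleBlockManagerInfo {
  private val _blocks = new mutable.HashMap[String, String]  // blockId -> status, a stand-in for BlockStatus
  private val _cachedBlocks = new mutable.HashSet[String]

  def recordBlock(blockId: String, stored: Boolean): Unit = {
    if (stored) {
      _blocks(blockId) = "STORED"
      // Broadcast blocks are deliberately kept out of the cached set.
      if (!blockId.startsWith("broadcast_")) _cachedBlocks += blockId
    } else {
      _blocks.remove(blockId)
      _cachedBlocks -= blockId
    }
  }

  def cachedBlocks: Set[String] = _cachedBlocks.toSet
}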

Finally there are the MemoryStore and the DiskStore, with MemoryStore preferred. This preference for memory is a big part of why Spark outperforms MapReduce: MapReduce (at least MRv1) writes map output to disk, shuffles it to the reduce side, and then runs an external sort there.
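The put path reflects that preference: try the MemoryStore first, and fall back to the DiskStore only when memory is insufficient and the StorageLevel allows disk. A simplified sketch of that decision, with illustrative classes rather than the real BlockManager internals:

import scala.collection.mutable

// Illustrative in-memory store with a capacity limit.
class SimpleMemoryStore(maxBytes: Long) {
  private val data = new mutable.HashMap[String, Array[Byte]]
  private var used = 0L

  def put(blockId: String, bytes: Array[Byte]): Boolean = synchronized {
    if (used + bytes.length > maxBytes) {
      false  // not enough memory for this block
    } else {
      data(blockId) = bytes
      used += bytes.length
      true
    }
  }
}

// Illustrative "disk" store (a map standing in for files on disk).
class SimpleDiskStore {
  private val data = new mutable.HashMap[String, Array[Byte]]
  def put(blockId: String, bytes: Array[Byte]): Unit = synchronized { data(blockId) = bytes }
}

// Memory-first put: go to disk only when memory is full and the storage level allows disk.
class SimpleBlockStore(memory: SimpleMemoryStore, disk: SimpleDiskStore) {
  def putBlock(blockId: String, bytes: Array[Byte], useDisk: Boolean): Boolean = {
    if (memory.put(blockId, bytes)) {
      true
    } else if (useDisk) {
      disk.put(blockId, bytes)
      true
    } else {
      false  // the block could not be stored at all
    }
  }
}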
