Spark BlockManager


Author: clive0x | Published 2019-06-29 22:48

BlockManager is the main class Spark uses to store blocks. Where it resembles HDFS:

When data is stored with three replicas, the copies go to the local machine, a machine in the same rack, and a machine elsewhere.

Where it differs from HDFS:

Data is written to memory first; if the StorageLevel also allows disk, blocks spill to disk when memory runs short.
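
For example (a minimal sketch, not from the original post), the StorageLevel chosen when persisting an RDD is what decides whether this disk fallback is allowed:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object StorageLevelDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("storage-level-demo")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        val rdd = sc.parallelize(1 to 1000000)

        // MEMORY_ONLY: partitions that do not fit in memory are simply not cached.
        // MEMORY_AND_DISK: partitions that do not fit in memory are spilled to the DiskStore.
        rdd.persist(StorageLevel.MEMORY_AND_DISK)

        // Force materialization so the BlockManager actually stores the blocks.
        println(rdd.count())

        spark.stop()
      }
    }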

The main classes:

1. BlockInfoManager, which maintains the mapping from BlockId to BlockInfo:

    private[this] val infos = new mutable.HashMap[BlockId, BlockInfo]

It also provides read/write locks to protect operations on each block.
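
The real locking logic lives inside BlockInfoManager itself; as a rough illustration of the idea only (a simplified sketch, not Spark's implementation), a per-block shared/exclusive lock table could look like this:

    import scala.collection.mutable

    // Simplified sketch of per-block read/write locking. Spark's BlockInfoManager also
    // records which task attempt holds each lock and supports non-blocking acquisition;
    // this toy version only shows the shared (read) / exclusive (write) pattern.
    class SimpleBlockLockManager[Info] {
      private class Entry(val info: Info) {
        var readers: Int = 0
        var writerHeld: Boolean = false
      }
      private val infos = new mutable.HashMap[String, Entry]

      def register(blockId: String, info: Info): Unit = synchronized {
        infos.getOrElseUpdate(blockId, new Entry(info))
      }

      // Wait until no writer holds the block, then take a shared read lock.
      def lockForReading(blockId: String): Option[Info] = synchronized {
        infos.get(blockId).map { e =>
          while (e.writerHeld) wait()
          e.readers += 1
          e.info
        }
      }

      // Wait until the block has no readers and no writer, then take the exclusive write lock.
      def lockForWriting(blockId: String): Option[Info] = synchronized {
        infos.get(blockId).map { e =>
          while (e.writerHeld || e.readers > 0) wait()
          e.writerHeld = true
          e.info
        }
      }

      // Release whichever lock is currently held on the block.
      def unlock(blockId: String): Unit = synchronized {
        infos.get(blockId).foreach { e =>
          if (e.writerHeld) e.writerHeld = false
          else if (e.readers > 0) e.readers -= 1
          notifyAll()
        }
      }
    }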

2. BlockManagerMasterEndpoint, which manages all the BlockManagers in the cluster.

It holds, among other things, the BlockManagerId => BlockManagerInfo mapping; a BlockManagerInfo describes a single BlockManager and the blocks stored on it.

    // Mapping from block manager id to the block manager's information.
    private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]

    // Mapping from executor ID to block manager ID.
    private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]

    // Mapping from block id to the set of block managers that have the block.
    private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
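
To make this bookkeeping concrete, here is a rough sketch (hypothetical helper names, not Spark's actual BlockManagerMasterEndpoint code) of how the three maps could be kept consistent when an executor registers or reports a block update:

    import scala.collection.mutable

    // Illustrative sketch only. Spark's real updateBlockInfo also tracks storage levels,
    // memory/disk sizes, handles re-registration and replies over RPC.
    object MasterBookkeepingSketch {
      case class BlockManagerId(executorId: String, host: String, port: Int)
      case class BlockId(name: String)

      private val blockManagerInfo = new mutable.HashMap[BlockManagerId, mutable.HashSet[BlockId]]
      private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]
      private val blockLocations = new mutable.HashMap[BlockId, mutable.HashSet[BlockManagerId]]

      def registerBlockManager(id: BlockManagerId): Unit = {
        blockManagerIdByExecutor(id.executorId) = id
        blockManagerInfo.getOrElseUpdate(id, new mutable.HashSet[BlockId])
      }

      def updateBlockInfo(bmId: BlockManagerId, blockId: BlockId, stored: Boolean): Unit = {
        // 1. Record the block on the reporting BlockManager.
        val blocksOnManager = blockManagerInfo.getOrElseUpdate(bmId, new mutable.HashSet[BlockId])
        if (stored) blocksOnManager += blockId else blocksOnManager -= blockId

        // 2. Record which BlockManagers currently hold this block.
        val locations = blockLocations.getOrElseUpdate(blockId, new mutable.HashSet[BlockManagerId])
        if (stored) locations += bmId else locations -= bmId
        if (locations.isEmpty) blockLocations.remove(blockId)
      }
    }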

The BlockManagerInfo class:

    // Mapping from block id to its status.
    private val _blocks = new JHashMap[BlockId, BlockStatus]

    // Cached blocks held by this BlockManager. This does not include broadcast blocks.
    private val _cachedBlocks = new mutable.HashSet[BlockId]
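
A rough sketch of the same idea from the BlockManagerInfo side (hypothetical code, not Spark's): every reported block lands in _blocks, while only non-broadcast blocks are counted as cached:

    import scala.collection.mutable

    // Illustrative sketch of BlockManagerInfo-style bookkeeping: _blocks holds the latest
    // status of every block on this BlockManager, while _cachedBlocks tracks only cached
    // (non-broadcast) blocks, matching the comment in the code above.
    object BlockManagerInfoSketch {
      case class BlockStatus(memSize: Long, diskSize: Long) {
        def isStored: Boolean = memSize > 0 || diskSize > 0
      }

      private val _blocks = new mutable.HashMap[String, BlockStatus]
      private val _cachedBlocks = new mutable.HashSet[String]

      def updateBlockInfo(blockId: String, status: BlockStatus): Unit = {
        if (status.isStored) {
          _blocks(blockId) = status
          // Broadcast blocks (ids like "broadcast_0") are not treated as cached blocks.
          if (!blockId.startsWith("broadcast_")) _cachedBlocks += blockId
        } else {
          // A block that is no longer stored is dropped from both collections.
          _blocks.remove(blockId)
          _cachedBlocks -= blockId
        }
      }
    }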

Finally there are the MemoryStore and the DiskStore. The MemoryStore is preferred, and this memory-first design is a large part of why Spark outperforms MapReduce. In MapReduce (at least V1), the map side writes its output to disk, and after the shuffle the reduce side performs an external sort.
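
The memory-first behaviour can be pictured with a small sketch (placeholder traits, not Spark's MemoryStore/DiskStore API): try the MemoryStore first, and only spill to the DiskStore when the StorageLevel permits disk:

    // Simplified sketch of the memory-first policy. MemoryStoreLike / DiskStoreLike are
    // placeholders, not Spark's actual classes.
    object PutPolicySketch {
      trait MemoryStoreLike { def putBytes(blockId: String, bytes: Array[Byte]): Boolean }
      trait DiskStoreLike { def putBytes(blockId: String, bytes: Array[Byte]): Unit }

      case class Level(useMemory: Boolean, useDisk: Boolean)

      def putBlock(blockId: String,
                   bytes: Array[Byte],
                   level: Level,
                   memoryStore: MemoryStoreLike,
                   diskStore: DiskStoreLike): Boolean = {
        if (level.useMemory && memoryStore.putBytes(blockId, bytes)) {
          // Stored in memory.
          true
        } else if (level.useDisk) {
          // Memory not allowed or full: spill to disk.
          diskStore.putBytes(blockId, bytes)
          true
        } else {
          // e.g. MEMORY_ONLY with memory exhausted: the block is not stored at all.
          false
        }
      }
    }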
