美文网首页
LSM Compaction Strategy

LSM Compaction Strategy

作者: chenfh5 | 来源:发表于2018-12-19 19:48 被阅读10次

    Outline

    0. Overview
    1. Amplification
    2. Size-tiered Compaction
    3. Leveled Compaction
    4. Summary
    5. Lucene Merge Policy
    6. Reference
    

    Overview

    Compaction operations are expensive in terms of CPU, memory, and Disk I/O,而由于immutable特质,该操作在LSM架构上有必不可少。

    Log Structured Merge (LSM)

    data过来之后会写到memory table (MemTable),当mem满了之后,会flush到disk形成不可变的immutable Sorted String Table (SSTable)。当SSTable太多,os所打开的文件句柄也会过多,所以此时需要将多个同质的SSTable合并成一个SSTable。

    leveldb architecture

    Amplification

    • 写放大:一份数据被顺序写几次(还是那一份数据,只是需要一直往下流)。第一次写到L0,之后由于compaction可能在L1~LN各写一次
    • 读放大:一个query会在多个sstable,而这些sstable分布在L0~LN
    • 空间放大:一份原始数据被copy/overwritten(一份用于服务query,一份用于snapshot之后用于compaction)

    Size-tiered Compaction

    • Triggered when the system has enough similarly sized SSTables, merged together to form one larger sstable. A disadvantage of this strategy is that very large SSTable will stay behind for a long time and utilize a lot of disk space (recommended for write-intensive workloads)
    • 每一个tiers的单片大小逐渐变大,但是每一个tiers的sstables数量一致
    • 如果某一个tier满了(即sstables数量达到阈值)就会进行compaction,从而将该tier的所有数据merge为一个然后丢给下一个tier作为下一个tier的一个sstable。而在这个merge的过程,会copy一份原数据snapshot用于merge,merge之后再删除
    tiered (num same,size grow)

    Leveled Compaction

    • Triggered when unleveled SSTables (newly flushed SSTable files in Level 0) exceeds 4 (recommended for read-intensive workloads)
    • 每一个tier里面的 sstable大小都是一致的,区别是每一个tier的sstable数量是逐渐变大的(一个数量级)
    • tier1里面的sstables会跟tier2的sstables一起进行merge操作,最终在tier2(量大者)上形成一个有序的sstable
    leveled (num grow,size same)

    Summary

    Size-tiered Compaction vs. Leveled Compaction
    • data in one SSTable which is later modified or deleted in another SSTable wastes space as both tables are present in the system
    • when data is split across many SSTables, read requests are processed slower as many SSTables need to be read
    Scylla compaction summary

    Lucene Merge Policy

    这里同样有三个放大问题,

    • 写放大(doc在segment之间的迁移)
    • 读放大(doc不同版本在不同segment,而打开一个segment需要一个indexReader)
    • 空间放大(segments tmp空间)
      lucene write-once vs. random write

    Reference

    相关文章

      网友评论

          本文标题:LSM Compaction Strategy

          本文链接:https://www.haomeiwen.com/subject/krklkqtx.html