美文网首页
邮件翻译(pg位图索引相关问题解答)

邮件翻译(pg位图索引相关问题解答)

作者: M_lear | 来源:发表于2022-10-27 15:38 被阅读0次

    原邮件内容

    Bitmap Heap Scan阶段做了什么?

    A plain indexscan fetches one tuple-pointer at a time from the index, and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order. The bitmap scan improves locality of reference to the table at the cost of more bookkeeping overhead to manage the "bitmap" data structure --- and at the cost that the data is no longer retrieved in index order, which doesn't matter for your query but would matter if you said ORDER BY.

    普通索引扫描每次从索引中获取一个元组指针,并立即访问表中的该元组。位图扫描一次性从索引中获取所有元组指针,使用内存中的“位图”数据结构对它们进行排序,然后以物理元组位置顺序访问表元组。位图扫描改善了访问表时的局部性,代价是更多的簿记开销来管理“位图”数据结构——并且代价是数据不再按索引顺序检索,这对你的查询无关紧要,但是如果你使用 ORDER BY 就会有影响。


    个人总结:位图扫描的优化点在于,把随机IO优化为顺序IO。对于磁盘访问,随机IO是一个很值得优化的点。


    位图扫描

    什么是Recheck condition,为什么需要它?

    If the bitmap gets too large we convert it to "lossy" style, in which we only remember which pages contain matching tuples instead of remembering each tuple individually. When that happens, the table-visiting phase has to examine each tuple on the page and recheck the scan condition to see which tuples to return.

    如果位图变得太大,我们将其转换为“有损”风格,在这种风格中,我们只记住哪些页面包含匹配的元组,而不是单独记住每个元组。当这种情况发生时,表访问阶段必须检查页面上的每个元组,并重新检查扫描条件,以确定返回哪个元组。

    我认为Bitmap Index Scan只有在计划中有两个或更多适用的索引时才会使用,所以我不明白为什么现在要使用它?

    True, we can combine multiple bitmaps via AND/OR operations to merge results from multiple indexes before visiting the table ... but it's still potentially worthwhile even for one index. A rule of thumb is that plain indexscan wins for fetching a small number of tuples, bitmap scan wins for a somewhat larger number of tuples, and seqscan wins if you're fetching a large percentage of the whole table.

    的确,我们可以通过 AND/OR 操作组合多个位图,在访问表之前合并来自多个索引的结果...但是即使对于一个索引来说,这仍然是值得的。一个经验法则是,普通【索引扫描】适用于获取少量的元组,【位图扫描】适用于获取较大数量的元组,如果你要获取整个表的大部分内容,则最好使用【顺序扫描】。


    个人感触:这里其实点醒我,不要盲目迷信索引。对于pg的堆表存储结构,走索引获取数据,需要先IO索引,再IO数据表,且都是随机IO。当需要获取的结果涵盖表的大部分内容时,还不如走全表扫描。

    相关文章

      网友评论

          本文标题:邮件翻译(pg位图索引相关问题解答)

          本文链接:https://www.haomeiwen.com/subject/wmuttdtx.html