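What Is a Learned Index?
The idea comes from The Case for Learned Index Structures by Tim Kraska's team. For sorted one-dimensional keys, a model learns the cumulative distribution function (CDF) of the keys, so the predicted position of a key among N keys is roughly CDF(key) × N.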
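To make the CDF idea concrete, here is a minimal sketch, assuming sorted unique numeric keys and a single least-squares line as the model. The names `fit_linear_cdf` and `lookup` are hypothetical and do not come from the paper, which uses richer models. The line predicts a position, and the largest training error bounds a small window for the final binary search.

```python
import bisect

def fit_linear_cdf(keys):
    """Fit a least-squares line mapping key -> position in the sorted list.

    Assumes `keys` is sorted, non-empty, and free of duplicates.
    Returns (slope, intercept, error_bound).
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    # The worst prediction error on the stored keys bounds the search window.
    max_err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, int(max_err) + 1

def lookup(keys, model, key):
    """Return the position of `key`, or None if it is not stored."""
    slope, intercept, err = model
    n = len(keys)
    guess = int(round(slope * key + intercept))
    lo = min(max(0, guess - err), n)
    hi = max(lo, min(n, guess + err + 1))
    # Binary search only inside the window around the predicted position.
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None
```

The more closely the model follows the real key distribution, the smaller the error bound and the shorter the final search.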
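The Difficulty of Moving from One-Dimensional to Multi-Dimensional Learned Indexes
The same MIT team quickly carried the technique into several other research directions: hash maps, Bloom filters, and multi-dimensional indexes have all been rebuilt around it. The packaging changes but the substance does not; the key ingredient is still the RMI (Recursive Model Index). Here we focus on multi-dimensional data, and on spatial data in particular (roughly, GPS coordinates, moving-object trajectories, and the like).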
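Since RMI is named as the core ingredient, a rough two-stage sketch may help. It assumes sorted unique keys and plain linear models at both levels; the class `TwoStageRMI` and its parameter `num_leaves` are illustrative names, not the authors' implementation. The root model routes a key to one leaf model, and that leaf predicts the position with its own error bound.

```python
import bisect

def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return slope, my - slope * mx

class TwoStageRMI:
    def __init__(self, keys, num_leaves=4):
        # Assumes `keys` is sorted, non-empty, and free of duplicates.
        self.keys = keys
        self.num_leaves = num_leaves
        self.root = fit_line(keys, list(range(len(keys))))
        # Route every key through the root to decide which leaf trains on it.
        buckets = [[] for _ in range(num_leaves)]
        for pos, k in enumerate(keys):
            buckets[self._leaf_id(k)].append((k, pos))
        self.leaves = []
        for b in buckets:
            if not b:
                self.leaves.append((0.0, 0.0, 0))  # leaf that saw no keys
                continue
            slope, icpt = fit_line([k for k, _ in b], [p for _, p in b])
            err = max(abs(p - (slope * k + icpt)) for k, p in b)
            self.leaves.append((slope, icpt, int(err) + 1))

    def _leaf_id(self, key):
        slope, icpt = self.root
        pos = slope * key + icpt
        leaf = int(pos * self.num_leaves / len(self.keys))
        return min(max(leaf, 0), self.num_leaves - 1)

    def lookup(self, key):
        """Return the position of `key`, or None if it is not stored."""
        slope, icpt, err = self.leaves[self._leaf_id(key)]
        n = len(self.keys)
        guess = int(round(slope * key + icpt))
        lo = min(max(0, guess - err), n)
        hi = max(lo, min(n, guess + err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None
```

Each leaf only has to fit a slice of the key distribution, which is why a hierarchy of simple models can cope with skew that a single line cannot.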
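LISA
This transformation is computationally expensive, and the approach only pays off for disk-based R-trees, where CPU time is negligible compared with I/O cost.
Projecting multi-dimensional data to one dimension
LISA focuses on minimizing disk I/O by using a lattice regression model to map two-dimensional spatial data to one dimension.
Its goal is disk I/O optimization, which is an easy motivation to justify.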
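As a generic illustration of projecting two dimensions onto one, the sketch below uses a Z-order (Morton) code. This is a swapped-in, well-known technique and is not LISA's mapping function. The helper names, the 16-bit grid resolution, and the longitude/latitude bounds are all assumptions made for the example.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: bits of x go to even positions, bits of y to odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def to_grid(value, lo, hi, bits=16):
    """Scale a coordinate in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)

def spatial_key(lon, lat, bits=16):
    """Map a (longitude, latitude) pair to a single integer key."""
    gx = to_grid(lon, -180.0, 180.0, bits)
    gy = to_grid(lat, -90.0, 90.0, bits)
    return interleave_bits(gx, gy, bits)
```

Once every point has a one-dimensional key, the points can be sorted by that key and handed to a one-dimensional learned index.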
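Flood and Tsunami
Flood builds on RMI and explores how to build a learned index over multi-dimensional data: it projects the multi-dimensional data to one dimension, reuses RMI, and is backed by a cost model.
Tsunami builds on Flood and additionally targets skewed query workloads and correlated data, which improves the results.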
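The grid idea behind this kind of layout can be sketched in two dimensions, under heavy simplification: split the x axis into equal-frequency columns, sort each column by y, and answer a range query by visiting only the overlapping columns. The class `GridColumns` is hypothetical. It uses exact quantiles where a learned CDF model would go, and it picks the column count by hand instead of through a cost model.

```python
import bisect

class GridColumns:
    def __init__(self, points, num_columns=4):
        # Assumes `points` is a non-empty list of (x, y) tuples.
        xs = sorted(p[0] for p in points)
        n = len(xs)
        # Equal-frequency column boundaries on x (stand-in for a learned CDF).
        self.bounds = [xs[(i * n) // num_columns] for i in range(1, num_columns)]
        self.columns = [[] for _ in range(num_columns)]
        for x, y in points:
            self.columns[bisect.bisect_right(self.bounds, x)].append((y, x))
        for col in self.columns:
            col.sort()  # within a column, points are ordered by y

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Return all points with x_lo <= x <= x_hi and y_lo <= y <= y_hi."""
        first = bisect.bisect_right(self.bounds, x_lo)
        last = bisect.bisect_right(self.bounds, x_hi)
        out = []
        for col in self.columns[first:last + 1]:
            # Jump to the first point with y >= y_lo, then scan until y > y_hi.
            start = bisect.bisect_left(col, (y_lo, float("-inf")))
            for y, x in col[start:]:
                if y > y_hi:
                    break
                if x_lo <= x <= x_hi:
                    out.append((x, y))
        return out
```

Choosing how many columns to use is a trade-off between scanning too many cells and scanning too many points per cell, which is the kind of decision a cost model is meant to make.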
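IF-X
An R-tree has two entirely different kinds of nodes: leaf nodes and non-leaf (internal) nodes. An internal node stores the MBR (minimum bounding rectangle) of each child together with a pointer to that child, while a leaf node stores only data, i.e. points.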
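The two node kinds described above can be written down as a small sketch. The class and function names are illustrative and not taken from any particular R-tree library, and insertion and node splitting are left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]

@dataclass
class MBR:
    """Minimum bounding rectangle, inclusive on all sides."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, p: Point) -> bool:
        return self.min_x <= p[0] <= self.max_x and self.min_y <= p[1] <= self.max_y

@dataclass
class LeafNode:
    """A leaf stores only data points."""
    points: List[Point]

@dataclass
class InternalNode:
    """An internal node stores (child MBR, child pointer) pairs."""
    entries: List[Tuple[MBR, "Node"]]

Node = Union[LeafNode, InternalNode]

def mbr_of_points(points: List[Point]) -> MBR:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))

def contains_point(node: Node, p: Point) -> bool:
    """Point query: descend only into children whose MBR covers p."""
    if isinstance(node, LeafNode):
        return p in node.points
    return any(mbr.contains(p) and contains_point(child, p)
               for mbr, child in node.entries)
```

Because internal nodes hold only rectangles and pointers while leaves hold the data, the two levels are natural candidates to be treated differently when deciding where a prediction model can help.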
One particular issue on making a learning-augmented spatial index is to understand which part of the spatial index can be augmented with prediction models.
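This sentence puts it well: we need to understand which part of a spatial index can be strengthened by a prediction model.
IF-X does not take the query workload or its distribution into account.
Open question: in what way is it "interpolation-friendly"?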
References
[1] The Case for Learned Index Structures, SIGMOD, 2018
[2] The Potential of Learned Index Structures for Index Compression, ADCS, 2018
[3] ASLM: Adaptive Single Layer Model for Learned Index, DASFAA, 2019
[4] Interpolation-friendly B-trees: Bridging the Gap Between Algorithmic and Learned Indexes, EDBT, 2019
[5] Learned Index for Spatial Queries, MDM, 2019
[6] Considerations for handling updates in learned index structures, SIGMOD, 2019
[7] Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads, VLDB, 2020
[8] The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds, VLDB, 2020
[9] Benchmarking Learned Indexes, VLDB, 2020
[10] Function Interpolation for Learned Index Structures, ADC, 2020
[11] "The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries", EDBT, 2020
[12] A Tutorial on Learned Multi-dimensional Indexes, SIGSPATIAL, 2020
[13] Why Are Learned Indexes So Effective, ICML, 2020
[14] From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees, OSDI, 2020
[15] XIndex: a scalable learned index for multicore data storage, PPoPP, 2020
[16] ALEX: An Updatable Adaptive Learned Index, SIGMOD, 2020
[17] CDFShop: Exploring and Optimizing Learned Index Structures, SIGMOD, 2020
[18] Spatial Queries Based on Learned Index, SPATIALDI, 2020
[19] The Case for Learned Spatial Indexes, VLDB, 2020
[20] Updatable Learned Index with Precise Positions, VLDB, 2021
[21] Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction, EDBT, 2021
[22] How Does Updatable Learned Index Perform on Non-Volatile Main Memory, ICDE, 2021
[23] RUSLI: Real-time Updatable Spline Learned Index, SIGMOD, 2021
[24] A Tailored Regression for Learned Indexes: Logarithmic Error Regression, SIGMOD, 2021
[25] Effectively Learning Spatial Indices, VLDB, 2020
[26] LISA: A Learned Index Structure for Spatial Data, SIGMOD, 2020
[27] Learning Multi-Dimensional Indexes, SIGMOD, 2020
[28] SageDB: A Learned Database System, CIDR, 2019
[29] AI Meets Database: AI4DB and DB4AI, SIGMOD, 2021
[30] SIA: Optimizing Queries using Learned Predicates, SIGMOD, 2021
[31] LEA: A Learned Encoding Advisor for Column Stores, SIGMOD, 2021
[32] Instance-Optimized Data Layouts for Cloud Analytics Workloads, SIGMOD, 2021
[33] Learning Algorithms for Automatic Data Structure Design, SIGMOD, 2021
[34] Towards a Benchmark for Learned Systems, ICDE, 2021