(2022.06.12 Sun)
B-tree即B树,是多路搜索树的一种,常用于数据库的实现。
多路搜索树 Multiway Search tree
定义是有序树的一个节点,如果有d个子节点,则称其为d-node。多路搜索树是一个有序树,满足如下性质:
- 每个内部节点包含至少两个节点,即每个内部节点都是d-node,其中
- 每个树的内部d-node 有子树,保存了d-1个k-v对有序集合,其中
- 定义,对的以子树节点中的,,有
一个d-node保存了个常规key。外部节点external node仅仅作为占位符,不保存任何数据,用None
引用表示。根据上面定义,多路搜索树的k-v对的个数与外部节点数有如下关系:
一个有n个元素的多路搜索树有n+1个外部节点。
在多路搜索树中搜索
从树的根节点开始,在d-node ,比较搜索的key 和上的key 。如果,则搜索完成。否则,我们在的子节点上搜索。如果到达外部节点,我们直到树中没有key ,搜索失败。
多路搜索树的实现
对于非二叉树的一般树实现,节点的子节点保存为一个容器。
We discussed a linked data structure for representing a general tree.
This representation can also be used for a multiway search tree. When using a general tree to implement a multiway search tree, we must store at each node one or more key-value pairs associated with that node. That is, we need to store with
a reference to some collection that stores the items for .
During a search for key in a multiway search tree, the primary operation needed when navigating a node is finding the smallest key at that node that is greater than or equal to . For this reason, it is natural to model the information at a node itself as a sorted map, allowing use of the find_ge(k)
method. We say such
a map serves as a secondary data structure to support the primary data structure represented by the entire multiway search tree. This reasoning may at first seem like a circular argument, since we need a representation of a (secondary) ordered
map to represent a (primary) ordered map. We can avoid any circular dependence, however, by using the bootstrapping technique, where we use a simple solution to a problem to create a new, more advanced solution. In the context of a multiway search tree, a natural choice for the secondary structure at each node is the SortedTableMap. Because we want to determine the associated value in case of a match for key , and otherwise the
corresponding child ci such that , we recommend having each key in the secondary structure map to the pair . With such a realization of a
multiway search tree , processing a d-node while searching for an item of with key can be performed using a binary search operation in time. Let
denote the maximum number of children of any node of , and let denote the height of . The search time in a multiway search tree is therefore .
If is a constant, the running time for performing a search is .
The primary efficiency goal for a multiway search tree is to keep the height as small as possible. We next discuss a strategy that caps at 4 while guaranteeing a height that is logarithmic in , the total number of items stored in the map.
(2, 4)-tree (2,4)树
(2,4)树是多路搜索树的一种,也叫2-3-4树或2-4树,满足如下两个属性:
- 尺寸属性:每个内部节点最多有4个子节点,最少2个,这也是该数据结构名字的来源
- 深度属性:所有外部节点有相同的高度
假定外部节点都是None
。(2, 4)树保证了每个节点的二级数据结构足够小,同时保证了多路主树的平衡。
(2, 4)树的高度有如下定理:
Proposition:存储了n个元素的(2, 4)树的高度是
证明该定理可以考虑两种阶段情况,即每个内部节点都只有2个子节点,和有4个子节点。当每个内部节点有2个子节点,则形成了二叉树,对于外部节点的个数与高度满足,或;每个内部节点有4个节点,则外部节点的高度与节点个数满足,或。这两种极端情况使得真实的高度所在的范围是,得证。
该定理表明尺寸和深度属性对于保持多路搜索树平衡来说已经足够。此外,这个定理也显示出对(2, 4)树做检索的时间复杂度为,且二级存储结构的实现对于设计来说影响不大因为最大子节点数为常数。
插入和删除
执行对(2, 4)树的插入之前,首先执行检索。如果树中没有key ,则搜索终止于外部节点。设外部节点的父/母节点是。在中插入新元素,并在的左侧加入新的子节点。这种插入方法并没有破坏深度属性,因为我们在已经存在的外部节点中加入了新的外部节点,但却破坏了尺寸属性。节点之前是4-node,插入后成为5-node,使得不再是(2, 4)树。这种与尺寸属性的冲突称作在点的overflow。
设的子节点为,是保存在的keys。为避免overflow的影响,在上执行拆分(split)操作如下:
- 用和代替,使得
- 是3-node,子节点,存储keys 和
- 是2-node,子节点,存储keys
- 如果是的根节点,创建一个新的根节点,否则将设为的父/母节点
- 将key 插入,使其成为和的父/母节点,所以如果是的第个子节点,则和相应的成为的和个子节点。
对节点的拆分导致新的overflow在的父/母节点,可以在继续执行拆分。拆分操作或者消除overflow效果,或者将overflow传播到当前节点的父/母节点。
性能分析
最多为4,初始对新key 的所搜在每一层的时间是,因此总的时间是,考虑到树的高度是。对于单个节点插入新key的操作运行时间。执行多次split操作的次数被树的高度限制,所以插入阶段的运行时间是。(2, 4)树执行插入操作的总时间的是。
insert demo case
insert from empty demo
删除:placeholder
(2, 4)树的性能分析
(2, 4)树的渐进性能(asymptotic performance)保证了多数情况下的log边界。
- 有个节点的(2, 4)树的高度是
- 拆分、转换和融合操作的时间是
- 检索、插入、删除需要访问个节点
B-tree
B-tree/B树是一种(a, b)树,阶B树满足和。
Reference
1 Goodrich and etc., Data Structures and Algorithms in Python
网友评论