N-d Selection (MA.hashing 多属性散列)

作者: Kevifunau | 来源:发表于2018-10-29 21:05 被阅读0次

N-d Selection (MA.hashing 多属性散列)
d3之操作元素
Vue2.x + Element table 多列选中、批量选中
CSS3 多列
columns 多列布局
CSS3之多列布局
CSS3 多列
css3多列，用户界面
element-ui table跨页保存或者显示已经选中的值
散列 & 线性散列

Hashing for N-d Selection

For a pmr query like:
select * from R where A1 = C1 and A2 = C2 and ....An=Cn
if one $a_i$ is the hash key, query is O(1)
if no $a_i$ is the hash key, need to use linear scan
对于多属性的查询操作，只要有一个属性有散列索引，查询就是O（1）。

如果没有一个属性上有散列索引，那么我们可以依据所有的属性来构建一个的hash key --- Multi-attribute hashing

Multi-attribute hashing 多属性散列：

多属性散列key 的长度
use d-bit hash values （file size = b = $2^d$ pages）
长度为d的比特位表示一个散列key。
多属性散列由 $a_1, a_2,...a_n$ 每个属性值的散列值的某些比特位组成。
组成规则遵循 check vector 。
e.g. CV = <(1,0),(2.0),(3,0),(1,1),(2,1),(3,1),(1,2),(2,2)>
check vector（从左往右拼接）

比如某条记录 ('John Smith','BSC(CompSci)',1990, 99.5)
d = 6
cv = <(1,0), (1,1), (2,0), (3,0), (1,2), (2,1), (3,1), (1,3), ...>
hash1('John Smith') = ...0101010110110100
hash2('BSc(CompSci)') = ...1011111101101111
hash3(1990) = ...0001001011000000
最终该条记录的散列值就是
第1属性的第0 个比特位+ 第1属性的第1个比特位+ 第2个属性的第0个比特位+....
所以 hash( ('John Smith','BSC(CompSci)',1990, 99.5)) = 110100

当部分属性没有被查询时,比如 select * from R where name = "Green"

image.png
这种情况，需要查询的key就变成了100, 101, 110 ,111 四种可能。
假设 (a，b，？) -->会被表示成类似0001*1*111 的形式，但是二进制没有*这个东西，所以引入 stars ，将0001*1*111 转化为 bit0001010111, stars0000101000, stars中的 1 位表示 bits 中应该为*的位置。

Multi-attribute hashing 多属性散列的复杂度

多属性散列的查询复杂度和该query 中没有涉及到的属性数量有关。
比如某个query 散列key为 01001*11*1*1, 存在3个*，那么就需要查 $2^3$ 次。
所以复杂度为 $2^S$ , $S = \sum_{i \not\in Q }d_i$ .

Multi-attribute hashing 多属性散列的优化

多属性散列的优化办法就是优化 check vector cv，尽可能给容易被查的属性在散列key中分配多一些的比特位。

Consider a table with four attributes:
表 b = $10^4$ , 所以散列key长度为 log(10000,2) = 14 bits.
(branch, account, name, amount) 简写为（br,ac,nm,amt）
以下是所有可能的Query types 和相应的复杂度和用户查询概率。

image.png