美文网首页
8. 全域哈希和完全哈希

8. 全域哈希和完全哈希

作者: 邢昱 | 来源:发表于2018-08-13 17:41 被阅读0次

A fundamental weakness of hashing:
For any choice of hash function, there exists a bad set of keys that all hash to the same slot.
避免方法:Randomness
The idea is that you choose a hash function at random.
The name of the scheme is universal hashing.

Universal Hashing

Def. Let U be a universe of keys and H be a finite collection of hash functions mapping U to the slots in our hash table.
H is universal: if ∀x:∀y: x∈U ∧ y∈U ∧ x≠y, | {h∈H: h(x) = h(y)} | = |H|/m

Theorem

If we choose h randomly from the set of hash functions H and then we suppose we're hashing n keys into m slots in table T. Then for given key x, the expected number of collision with x = n/m = α(the load factor of the table)

proof

Let Cx be the random variable denoting the total number of collisions of keys in T with x.
C_{xy}=\begin{cases} 1, h(x) = h(y)\\ 0, otherwise \end{cases}
Note: E[Cxy] = 1/m and Cx = \sum_{y∈T*{x}}{C_{xy}}

Constructing a universal hash function

Let m be prime. Decompose key k into r + 1 digits:
k=<k0, k1, ..., kr> where 0 < ki <= m-1
Pick a = <a0, a1, ..., ar>, each ai is chosen randomly from {0, 1, ..., m-1}
Define ha(k) = \left(\sum_{i = 0}^r{a_ik_i}\right)mod(m)
How big is H? |H| = mr+1

Theorem

H is universal

proof

Let x = <x0, x1, ..., xr>
y = <y0, y1, ..., yr> be distinct keys.

They differ in at least one digit, without loss of generality position 0. For how many ha ∈ H do x and y collide?
Must have ha(x) = ha(y).
⇒\sum_{i=0}^r{a_ix_i}\equiv\sum_{i=0}^r{a_iy_i}(mod(m))
⇒\sum_{i=0}^r{a_i(x_i-y_i)}\equiv0(mod(m))
⇒a_0(x_0-y_0) + \sum_{i=1}^r{a_i(x_i-y_i)}\equiv0(mod(m))
⇒a_0(x_0-y_0) \equiv -\sum_{i=1}^r{a_i(x_i-y_i)}(mod(m))
⇒a_0 \equiv (-\sum_{i=1}^r{a_i(x_i-y_i)})*(x_0-y_0)^{-1}(mod(m))
Thus for any choices of a1, a2, ... , ar, exactly 1 of the m choices for a0 causes x and y to collide, and no collision for other m-1 choices for a0.
has that cause x, y to collides = mr = |H|/m

Number theory fact

Let m be prime for any z∈Zm(integers mod m) such that z \not\equiv 0, ∃ unique z-1 ∈ Zm such that z*z-1 \equiv 1 (mod(m))

Perfect Hashing

Problem: Given n keys, construct a static hash table of size m = O(n) such that search takes O(1) time in the worst case.
Idea: 2-level scheme with universal hashing at both levels.
No collision at level 2.
The reason why we don't have collisions in the second level:
If there are ni item that hash to level one slot i, then we're going to use mi = ni2 in the level two hash table. Under these circumstances, it's easy to find hash functions such that there are no collisions.

Level 2 analysis

Theorem

Hash n keys into m = n2 slots, using a random hash function in a universal set H. Then the expected number of collision is less than on half.

proof

The probability that two given keys collide under h is 1/(n2).
There are n(n-1)/2 pairs of keys.
E[#collisions] = (n(n-1)/2)*(1/(n2)) = n(n-1)/(2n2) < 1/2

Markov's Inequality

For random variable X \geq 0, Pr{X \geq t} \leq E[X]/t
proof:
E[x] = \sum_{x=0}^\infty{x*Pr\{X = \})} \geq \sum_{x=t}^\infty{x*Pr\{X = x\}}
\geq \sum_{x=t}^\infty{t*Pr\{X = x\}} = t*Pr\{x \geq t\}

Corollary

Pr{no collisions} \geq 1/2
proof Pr{\geq 1 cooision} \leq E[#collisoins]/1 < 1/2
To find a good level-2 hash function, just test a few at random. Find one quickly, since at least half will work.

Analysis of storage

For level 1, choose m = n, and let ni be the random variable for the number for #keys that hash to slot i in T. Let mi = ni2 slots in each level 2 table S sub i.
E[total storage] = n + E[\sum_{i=0}^{m-1}{θ(n_i^2)}] = θ(n) by bucket sort analysis.

(⇒⇔↔¬∧∨∀∃∈⊂⊃∪∩→← )

相关文章

  • 8. 全域哈希和完全哈希

    A fundamental weakness of hashing:For any choice of hash ...

  • Algorithms 算法学习笔记20180416

    今天主要学习的是MIT课程算法导论中的"哈希表",“全域哈希和完全哈希”这两课。老师的讲解是层层深入,循循善诱的。...

  • MIT算法导论八 全域哈希和完全哈希

    - 全域哈希- 完全哈希 普通哈希的一个缺点:对任意的hash函数h,总存在一组keys,让他们都映射到同一个槽里...

  • 哈希表

    映射(Map) 和 集合(Set) 哈希表(HashTable)、哈希函数(Hash Function)、哈希碰撞...

  • 【区块链】哈希算法在比特币系统作用

    比特币地址是由公钥经过单向的加密哈希算法生成。被广播的交易会有哈希值,每个区块也会有哈希值。 哈希算法和哈希值究竟...

  • 哈希函数和哈希表

    哈希函数 定义 输入域是无穷的,输出域S是有限的。 输入参数一旦确定,返回值一定是相同的,不存在随机性;多个不同的...

  • 哈希 IN 哈希

    具体实例: 把下面的哈希值进行转换成哈希in哈希 转换后: 打印hash,哈希不能直接打印,必须在foreach循...

  • 哈希表

    哈希=散列哈希法=散列法,对应的哈希表=散列表什么是哈希法?哈希法思想:首先在元素的关键字k和元素的存储位置p之间...

  • 左神初级算法课程第六讲笔记-哈希

    问题一:哈希函数和哈希表 哈希函数的性质:①输入域无穷大;②输出域有穷尽;③哈希函数不是随机的,多次相同输入计算返...

  • Java数据结构_哈希表_基本概念

    本文目标 哈希表的基本概念,哈希冲突,哈希函数 什么是哈希表 哈希表也叫做散列表(hash有剁碎的意思)哈希表是空...

网友评论

      本文标题:8. 全域哈希和完全哈希

      本文链接:https://www.haomeiwen.com/subject/ijjybftx.html