1.生成8个随机数,最好为质数
2.将value添加到BitSet中,分别使用8个随机数作为seed,对value进行hash后的result,
然后BitSet.add(Bitset.size&result,true)
3.判断一个othValue是否在bloom中,同样的逻辑,将othValue与以8个质数为seed进行hash,
得到Bitset.size&result,判断8个结果是否都在bloom中,
如果全在,则表示存在。误报率千万分之一
代码实现:
public class BloomFilter {
int[] seeds = new int[]{3, 5, 7, 11, 13, 31, 37, 61};
Hash[] funcs = new Hash[seeds.length];
int length = 2 << 28;
BitSet bitSet = new BitSet(length);
public BloomFilter() {
for (int i = 0; i < funcs.length; i++) {
funcs[i] = new Hash(length, seeds[i]);
}
}
public void addValue(String value) {
for (Hash fun : funcs) {
bitSet.set(fun.hash(value), true);
}
}
public static void main(String[] args) {
BloomFilter filter = new BloomFilter();
filter.addValue("www.baidu.com");
filter.addValue("www.sohu.com");
System.out.println(filter.contains("www.baidu.com"));
System.out.println(filter.contains("www.sina.com"));
}
private boolean contains(String value) {
boolean result = true;
for (Hash fun : funcs) {
if (result)
result = result & bitSet.get(fun.hash(value));
}
return result;
}
}
public class Hash {
int size;
int seed;
public Hash(int size, int seed) {
this.size = size;
this.seed = seed;
}
public int hash(String value) {
int len = value.length();
int result = 0;
for (int i = 0; i < len; i++) {
result = result * seed + value.charAt(i);
}
return (size - 1) & result;
}
}
网友评论