美文网首页
数据结构与算法分析 第5章总结

数据结构与算法分析 第5章总结

作者: fjxCode | 来源:发表于2018-07-11 17:02 被阅读0次

5.2散列函数

关键字被散列函数映射为表中的某个数,称为映射。好的散列函数均匀地分配关键字。两个关键字散列到同一个值时叫冲突。

一个较好的散列函数:(注意溢出可能会引起负数,再被上tableSize)

[if !supportLists]l [endif]散列步骤:素进制哈希码、表大小除留余数、溢出补偿。

                  publicstatic int hash(String key,int tableSize){

                                    inthashVal=0;

                                    for(inti=0;i

                                                      hashVal=37*hashVal+key.charAt(i);

                                    hashVal%=tableSize;

                                    if(hashVal<0)

                                                      hashVal+=tableSize;

                                    returnhashVal;

                  }

剩下来的问题是冲突解决:分离链接法和开放定址法。

5.3分离链接法

分离链接法将散列到同一值的元素保存在链表中。使用标准库的LinkedList,新插入元素很可能被再次访问,插入到链表前端。

分离链接表的由键取值的过程是:以键的哈希值为索引取数组中的链表,在链表中找出值。

分离链接哈希表的类SeparateChainingHashTable:

[if !supportLists]l [endif]变量:链表数组theLists、容量currentSize、DEFAULT_TABLE_SIZE=101。

标准外部方法:仍为标准容器的判长空置空添删容遍。

[if !supportLists]l [endif]内部实现方法:取哈希值myhash、再哈希rehash。

[if !supportLists]l [endif]编程细节:标准方法str.hashCode()的默认素进制31、默认表大小101。

[if !supportLists]l [endif]add方法2个要点:添加元素前检查包含并头插,添加后元素数量大于哈希表大小需要再哈希。

[if !supportLists]l [endif]remove方法记得更新theSize即可。

[if !supportLists]l [endif]分离链接的再哈希方法resh:先保留旧表地址,再重申一个超过原哈希表大小2倍以上素数的新表,theSize置0(因为add方法会再次更新theSize),再将旧表元素全部添加到新表。

public class SeparateChainingHashTable {

                  privatestatic final int DEFAULT_TABLE_SIZE=101;

                  privateLinkedList[] theList;

                  privateint theSize;

                  publicSeparateChainingHashTable(){

                                    theList=(LinkedList[])newLinkedList[DEFAULT_TABLE_SIZE];

                                    for(inti=0;i

                                                      theList[i]=newLinkedList();

                                    theSize=0;

                  }

                                    publicSeparateChainingHashTable(int minTableSize){

                                    Random rand=new Random();

                                    int bits=0;

                                    while(minTableSize!=0){

                                                      bits++;

                                                      minTableSize=minTableSize>>1;

                                    }

                                    intsetSize=BigInteger.probablePrime(bits+1,rand).intValue();

                                    theList=(LinkedList[])newLinkedList[setSize];

                                    for(inti=0;i

                                                      theList[i]=newLinkedList();

                                    theSize=0;

                  }

                  publicvoid makeEmpty(){

                                    for(inti=0;i

                                                      theList[i].clear();

                                    theSize=0;

                  }

                  publicvoid insert(AnyType x){

                                    LinkedListwhitchList=theList[myhash(x)];

                                    if(!whitchList.contains(x)){

                                                      whitchList.addFirst(x);

            theSize++;

                                                      if(theSize>theList.length)

                                                                        rehash();

                                    }

                  }

                  publicvoid remove(AnyType x){

                                    LinkedListwhitchList=theList[myhash(x)];

                                    if(whitchList.contains(x)){

                                                      whitchList.remove(x);

                                                      theSize--;

                                    }

                  }

                  publicboolean contains(AnyType x){

                                    LinkedListwhitchList=theList[myhash(x)];

                                    returnwhitchList.contains(x);

                  }

                  privateint myhash(AnyType x){

                                    int hashVal=x.hashCode();

                                    hashVal%=theList.length;

                                    if(hashVal<0)

                                                      hashVal+=theList.length;

                                    returnhashVal;

                  }

                                    privatevoid rehash(){

                                    intminNewLength=2*theList.length;

                                    intbits=0;

                                    while(minNewLength!=0){

                                                      bits++;

                                                      minNewLength=minNewLength>>1;

                                    }

                                    Randomrand=new Random();

                                    intnewLength=BigInteger.probablePrime(bits+1,rand).intValue();

                                    LinkedList[]oldList=theList;

                                    theList=(LinkedList[])newLinkedList[newLength];

                                    for(inti=0;i

                                                      theList[i]=newLinkedList();

                                    theSize=0;

                                    for(inti=0;i

                                                      for(AnyTypex:oldList[i])

                                                                        insert(x);

                  }

}

5.4开放定址的散列表

分离链接的散列表的缺点是链表分配空间耗时长,同时还要求附加的数据结构实现。

开放定址散列表的理论基础(三种探测方法):

[if !supportLists]l [endif]探测散列表公式:[if !vml]

[endif] 

[if !supportLists]l  [endif]装填因子为f的线性探测预期插入和不成功查找次数(1+1/(1-f)^2)/2,成功查找次数(1+1/(1-f))/2。使用无链接的探测散列表装填因子应低于0.5,才能保证插入和不成功查找的预期次数小于3。

[if !supportLists]l [endif]定理:使用平方探测f(i)=i^2,素表的平方探测,对于不过半装填必能插入新元素。

(即使装填数仅仅比表大小多1个,也可能装填失败)

[if !supportLists]l [endif]"平方探测的偏移量"应每次控制后步进2,从而可以避免乘除法运算。

[if !supportLists]l [endif]聚集:散列到区块中的任何关键字都需要多次试选单元才能被添加,这称为一次聚集。散列到同一位置上的元素将探测相同的备选单元,这称为二次聚集。平方探测仅排除了一次聚焦。解决二次聚集的方法是双散列。

[if !supportLists]l [endif]。双散列公式f(i)=i*hash2(x)。第二散列函数hash2(x)=Q-(xmodQ)。双散列对于期望探测次数的降低不大,且耗计算量,实际情况中更多采用平方探测。

[if !supportLists]l [endif]探测散列表使用活动标记的原因:探测散列表不能执行标准的删除操作,否则依赖当前元素的后续冲突元素将无法删除(类似于断链)。解决的办法的使用惰性删除,即将元素和活动标记封装在一起。

元素HashEntry状态有三种:空null(EMPTY)、非空且活动(ACTIVE)、非空非活动(DELETE)。意义在于惰性删除避免分配空间。

[if !supportLists]l [endif]探测散列表的再散列的策略:(1)当表装填到一半时,(2)遇到插入失败时,(3)达到一定装填因子时(途中策略)。通常选取合适装填因子的第3种方案。

[if !supportLists]l [endif] 

平方探测散列表QuadraticProbingHashTable:

[if !supportLists]l [endif]HashEntry的内部类、HashEntry的数组theArray、元素数theSize、默认哈希表大小DEFAULT_TABLE_SIZE=11。

开放定址的散列表将关键字封闭为带有活动标记的HashEntry类。

标准方法同容器类。

[if !supportLists]l [endif]内部实现方法:哈希计算myhash、再哈希rehash。用于初始化和插入的扩容allocateArray、项的活动判断isActive、散列冲突解决方法findPos。(无链散列表有"算哈/重哈/散冲/活动项"的内部实现方法)

[if !supportLists]l [endif]构造方法:只有数组实现的构造与置空的代码是重复的,借用置空方法实现构造。

[if !supportLists]l [endif]内部实现的活动项方法isActive:元素非空且活动为true。

[if !supportLists]l [endif]内部实现的无散冲定位方法findPos:冲突解释为哈希索引处非空且元素不等,冲突则偏移,溢出则减去哈希表大小。偏移量为常为线性探测,偏移量初值为1每次偏移步进2则为平方探测。

[if !supportLists]l [endif]包含方法contains:需要非空且活动。

[if !supportLists]l [endif]插入方法insert:先包含检测,再加元素并更新theSize,"元素数量"超过"半数哈希表大小"则再哈希。

[if !supportLists]l [endif]删除方法remove:将"非空且活动"置为"非空非活动"即可。

[if !supportLists]l [endif]内部实现的重哈方法rehash:保存旧表地址,以超过原哈希表两倍大小的素数扩容新表。先将theSize置0再将旧表元素添加到新表。

public class QuadraticProbingHashTable {

                  privatestatic class HashEntry{

                                    privateAnyType element;

                                    privateboolean isActive;

                                    publicHashEntry(AnyType x){

                                                      this.element=x;this.isActive=true;

                                    }

                                    publicHashEntry(AnyType x,boolean i){

                                                      this.element=x;this.isActive=i;

                                    }

                  }

                  privatestatic final int DEFAULT_TABLE_SIZE=11;

                  privateHashEntry[] theArray;

                  privateint theSize;

                  publicQuadraticProbingHashTable(){

                                    this(DEFAULT_TABLE_SIZE);

                  }

                  publicQuadraticProbingHashTable(int minTableSize){

                                    theArray=newHashEntry[minTableSize];

                                    makeEmpty();

                  }

                  publicvoid makeEmpty(){

                                    for(inti=0;i

                                                      theArray[i]=null;

                                    theSize=0;

                  }

                  publicboolean contains(AnyType x){

                                    intcurrentPos=findPos(x);

                                    returnisActive(currentPos);

                  }

                  publicvoid insert(AnyType x){

                                    intcurrentPos=findPos(x);

                                    if(isActive(currentPos))

                                                      return;

                                    theArray[currentPos]=newHashEntry(x,true);

                                    if(++theSize>theArray.length/2)

                                                      rehash();

                  }

                  publicvoid remove(AnyType x){

                                    intcurrentPos=findPos(x);

                                    if(isActive(currentPos))

                                                      theArray[currentPos].isActive=false;

                  }

                  privateint myhash(AnyType x){

                                    inthashVal=x.hashCode();

                                    hashVal%=theArray.length;

                                    if(hashVal<0)

                                                      hashVal+=theArray.length;

                                    returnhashVal;

                  }

                  privatevoid rehash(){

                                    intminNewLength=2*theArray.length;

                                    intbits=1;

                                    while(minNewLength!=0){

                                                      bits++;

                                                      minNewLength=minNewLength>>1;

                                    }

                                    Randomrand=new Random();

                                    intnewLength=BigInteger.probablePrime(bits+1,rand).intValue();

                                    HashEntry[]oldArray=theArray;

                                    theArray=(HashEntry[])newHashEntry[newLength];

                                    for(inti=0;i

                                                      theArray[i]=null;

                                    theSize=0;

                                    for(inti=0;i

                                                      if(oldArray[i]!=null&&oldArray[i].isActive==true)

                                                                        insert(oldArray[i].element);                                   

                  }

                  privateint findPos(AnyType x){

                                    intcurrentPos=myhash(x);

                                    intoffset=1;

                                    while(theArray[currentPos]!=null&&!theArray[currentPos].element.equals(x)){

                                                      currentPos+=offset;//注意正是因为惰性的删除才保证了如此简单的非空检查。

                                                      offset+=2;//平方探测

                                                      if(currentPos>=theArray.length)

                                                                        currentPos-=theArray.length;

                                    }

                                    returncurrentPos;

                  }

                  privateboolean isActive(int currentPos){

                                    returntheArray[currentPos]!=null&&theArray[currentPos].isActive;

                  }

}

5.6标准库的散列表

HashMap的性能常优于TreeMap。

关键字必需定义equals()和hashCode()方法。如String类自带hash变量闪存散列代码,当String被修改时hash变量被重置为0。

开放问题:大整数的桶排序是否可以改进为对哈希值排序?

5.7可扩展散列

分离链接散列和探测散列,问题在于一次查找可能访问多个区块。并且再散列的巨大代价为O(N)次磁盘访问。

可扩散列用于数据太大以致不能一次装入主存。对比B树的实现,将根D装入主存,而对磁盘内容进行索引,关键问题是降低分支系数和树结构的分裂和扩展。

可扩展散列将"哈希值的前D个比特数"存为根目录的一个元素。根目录的最大项数2^D(但不能超过块大小)。

树结构的分裂扩展只会导致目录的重写,而不会导致其它树叶访问,避免磁盘读写。

目录期望大小为2^D或O(N^(1+1/M)/M),过小的M(每片树叶最多存储的元素数)会导致目录过大。则需要做二级索引,也就是让树叶包含指向记录的链而不是实际的链,而且无可避免地使用二次磁盘访问。

散列的最坏情况来自于实现错误。二叉查找树的最坏情形来自于"有序的输入"。平衡查找树的实现代价较高。因此如果不需要排序(返回最大值或最小值)或不确定输入是否有序时可以使用散列数据结构。

应用:

[if !supportLists]l [endif]编译器使用散列跟踪源码中声明的变量,即符号表(symbol table)。

[if !supportLists]l [endif]游戏编程中,通过计算基于位置的散列跟踪和存储一些简单的运动路径。当同样的位置再次出现可以避免昂贵的重复计算。

[if !supportLists]l [endif]在线拼写检查程序。

相关文章

网友评论

      本文标题:数据结构与算法分析 第5章总结

      本文链接:https://www.haomeiwen.com/subject/odahpftx.html