美文网首页
数据结构与算法分析 第5章总结

数据结构与算法分析 第5章总结

作者: fjxCode | 来源:发表于2018-07-11 17:02 被阅读0次

    5.2散列函数

    关键字被散列函数映射为表中的某个数,称为映射。好的散列函数均匀地分配关键字。两个关键字散列到同一个值时叫冲突。

    一个较好的散列函数:(注意溢出可能会引起负数,再被上tableSize)

    [if !supportLists]l [endif]散列步骤:素进制哈希码、表大小除留余数、溢出补偿。

                      publicstatic int hash(String key,int tableSize){

                                        inthashVal=0;

                                        for(inti=0;i

                                                          hashVal=37*hashVal+key.charAt(i);

                                        hashVal%=tableSize;

                                        if(hashVal<0)

                                                          hashVal+=tableSize;

                                        returnhashVal;

                      }

    剩下来的问题是冲突解决:分离链接法和开放定址法。

    5.3分离链接法

    分离链接法将散列到同一值的元素保存在链表中。使用标准库的LinkedList,新插入元素很可能被再次访问,插入到链表前端。

    分离链接表的由键取值的过程是:以键的哈希值为索引取数组中的链表,在链表中找出值。

    分离链接哈希表的类SeparateChainingHashTable:

    [if !supportLists]l [endif]变量:链表数组theLists、容量currentSize、DEFAULT_TABLE_SIZE=101。

    标准外部方法:仍为标准容器的判长空置空添删容遍。

    [if !supportLists]l [endif]内部实现方法:取哈希值myhash、再哈希rehash。

    [if !supportLists]l [endif]编程细节:标准方法str.hashCode()的默认素进制31、默认表大小101。

    [if !supportLists]l [endif]add方法2个要点:添加元素前检查包含并头插,添加后元素数量大于哈希表大小需要再哈希。

    [if !supportLists]l [endif]remove方法记得更新theSize即可。

    [if !supportLists]l [endif]分离链接的再哈希方法resh:先保留旧表地址,再重申一个超过原哈希表大小2倍以上素数的新表,theSize置0(因为add方法会再次更新theSize),再将旧表元素全部添加到新表。

    public class SeparateChainingHashTable {

                      privatestatic final int DEFAULT_TABLE_SIZE=101;

                      privateLinkedList[] theList;

                      privateint theSize;

                      publicSeparateChainingHashTable(){

                                        theList=(LinkedList[])newLinkedList[DEFAULT_TABLE_SIZE];

                                        for(inti=0;i

                                                          theList[i]=newLinkedList();

                                        theSize=0;

                      }

                                        publicSeparateChainingHashTable(int minTableSize){

                                        Random rand=new Random();

                                        int bits=0;

                                        while(minTableSize!=0){

                                                          bits++;

                                                          minTableSize=minTableSize>>1;

                                        }

                                        intsetSize=BigInteger.probablePrime(bits+1,rand).intValue();

                                        theList=(LinkedList[])newLinkedList[setSize];

                                        for(inti=0;i

                                                          theList[i]=newLinkedList();

                                        theSize=0;

                      }

                      publicvoid makeEmpty(){

                                        for(inti=0;i

                                                          theList[i].clear();

                                        theSize=0;

                      }

                      publicvoid insert(AnyType x){

                                        LinkedListwhitchList=theList[myhash(x)];

                                        if(!whitchList.contains(x)){

                                                          whitchList.addFirst(x);

                theSize++;

                                                          if(theSize>theList.length)

                                                                            rehash();

                                        }

                      }

                      publicvoid remove(AnyType x){

                                        LinkedListwhitchList=theList[myhash(x)];

                                        if(whitchList.contains(x)){

                                                          whitchList.remove(x);

                                                          theSize--;

                                        }

                      }

                      publicboolean contains(AnyType x){

                                        LinkedListwhitchList=theList[myhash(x)];

                                        returnwhitchList.contains(x);

                      }

                      privateint myhash(AnyType x){

                                        int hashVal=x.hashCode();

                                        hashVal%=theList.length;

                                        if(hashVal<0)

                                                          hashVal+=theList.length;

                                        returnhashVal;

                      }

                                        privatevoid rehash(){

                                        intminNewLength=2*theList.length;

                                        intbits=0;

                                        while(minNewLength!=0){

                                                          bits++;

                                                          minNewLength=minNewLength>>1;

                                        }

                                        Randomrand=new Random();

                                        intnewLength=BigInteger.probablePrime(bits+1,rand).intValue();

                                        LinkedList[]oldList=theList;

                                        theList=(LinkedList[])newLinkedList[newLength];

                                        for(inti=0;i

                                                          theList[i]=newLinkedList();

                                        theSize=0;

                                        for(inti=0;i

                                                          for(AnyTypex:oldList[i])

                                                                            insert(x);

                      }

    }

    5.4开放定址的散列表

    分离链接的散列表的缺点是链表分配空间耗时长,同时还要求附加的数据结构实现。

    开放定址散列表的理论基础(三种探测方法):

    [if !supportLists]l [endif]探测散列表公式:[if !vml]

    [endif] 

    [if !supportLists]l  [endif]装填因子为f的线性探测预期插入和不成功查找次数(1+1/(1-f)^2)/2,成功查找次数(1+1/(1-f))/2。使用无链接的探测散列表装填因子应低于0.5,才能保证插入和不成功查找的预期次数小于3。

    [if !supportLists]l [endif]定理:使用平方探测f(i)=i^2,素表的平方探测,对于不过半装填必能插入新元素。

    (即使装填数仅仅比表大小多1个,也可能装填失败)

    [if !supportLists]l [endif]"平方探测的偏移量"应每次控制后步进2,从而可以避免乘除法运算。

    [if !supportLists]l [endif]聚集:散列到区块中的任何关键字都需要多次试选单元才能被添加,这称为一次聚集。散列到同一位置上的元素将探测相同的备选单元,这称为二次聚集。平方探测仅排除了一次聚焦。解决二次聚集的方法是双散列。

    [if !supportLists]l [endif]。双散列公式f(i)=i*hash2(x)。第二散列函数hash2(x)=Q-(xmodQ)。双散列对于期望探测次数的降低不大,且耗计算量,实际情况中更多采用平方探测。

    [if !supportLists]l [endif]探测散列表使用活动标记的原因:探测散列表不能执行标准的删除操作,否则依赖当前元素的后续冲突元素将无法删除(类似于断链)。解决的办法的使用惰性删除,即将元素和活动标记封装在一起。

    元素HashEntry状态有三种:空null(EMPTY)、非空且活动(ACTIVE)、非空非活动(DELETE)。意义在于惰性删除避免分配空间。

    [if !supportLists]l [endif]探测散列表的再散列的策略:(1)当表装填到一半时,(2)遇到插入失败时,(3)达到一定装填因子时(途中策略)。通常选取合适装填因子的第3种方案。

    [if !supportLists]l [endif] 

    平方探测散列表QuadraticProbingHashTable:

    [if !supportLists]l [endif]HashEntry的内部类、HashEntry的数组theArray、元素数theSize、默认哈希表大小DEFAULT_TABLE_SIZE=11。

    开放定址的散列表将关键字封闭为带有活动标记的HashEntry类。

    标准方法同容器类。

    [if !supportLists]l [endif]内部实现方法:哈希计算myhash、再哈希rehash。用于初始化和插入的扩容allocateArray、项的活动判断isActive、散列冲突解决方法findPos。(无链散列表有"算哈/重哈/散冲/活动项"的内部实现方法)

    [if !supportLists]l [endif]构造方法:只有数组实现的构造与置空的代码是重复的,借用置空方法实现构造。

    [if !supportLists]l [endif]内部实现的活动项方法isActive:元素非空且活动为true。

    [if !supportLists]l [endif]内部实现的无散冲定位方法findPos:冲突解释为哈希索引处非空且元素不等,冲突则偏移,溢出则减去哈希表大小。偏移量为常为线性探测,偏移量初值为1每次偏移步进2则为平方探测。

    [if !supportLists]l [endif]包含方法contains:需要非空且活动。

    [if !supportLists]l [endif]插入方法insert:先包含检测,再加元素并更新theSize,"元素数量"超过"半数哈希表大小"则再哈希。

    [if !supportLists]l [endif]删除方法remove:将"非空且活动"置为"非空非活动"即可。

    [if !supportLists]l [endif]内部实现的重哈方法rehash:保存旧表地址,以超过原哈希表两倍大小的素数扩容新表。先将theSize置0再将旧表元素添加到新表。

    public class QuadraticProbingHashTable {

                      privatestatic class HashEntry{

                                        privateAnyType element;

                                        privateboolean isActive;

                                        publicHashEntry(AnyType x){

                                                          this.element=x;this.isActive=true;

                                        }

                                        publicHashEntry(AnyType x,boolean i){

                                                          this.element=x;this.isActive=i;

                                        }

                      }

                      privatestatic final int DEFAULT_TABLE_SIZE=11;

                      privateHashEntry[] theArray;

                      privateint theSize;

                      publicQuadraticProbingHashTable(){

                                        this(DEFAULT_TABLE_SIZE);

                      }

                      publicQuadraticProbingHashTable(int minTableSize){

                                        theArray=newHashEntry[minTableSize];

                                        makeEmpty();

                      }

                      publicvoid makeEmpty(){

                                        for(inti=0;i

                                                          theArray[i]=null;

                                        theSize=0;

                      }

                      publicboolean contains(AnyType x){

                                        intcurrentPos=findPos(x);

                                        returnisActive(currentPos);

                      }

                      publicvoid insert(AnyType x){

                                        intcurrentPos=findPos(x);

                                        if(isActive(currentPos))

                                                          return;

                                        theArray[currentPos]=newHashEntry(x,true);

                                        if(++theSize>theArray.length/2)

                                                          rehash();

                      }

                      publicvoid remove(AnyType x){

                                        intcurrentPos=findPos(x);

                                        if(isActive(currentPos))

                                                          theArray[currentPos].isActive=false;

                      }

                      privateint myhash(AnyType x){

                                        inthashVal=x.hashCode();

                                        hashVal%=theArray.length;

                                        if(hashVal<0)

                                                          hashVal+=theArray.length;

                                        returnhashVal;

                      }

                      privatevoid rehash(){

                                        intminNewLength=2*theArray.length;

                                        intbits=1;

                                        while(minNewLength!=0){

                                                          bits++;

                                                          minNewLength=minNewLength>>1;

                                        }

                                        Randomrand=new Random();

                                        intnewLength=BigInteger.probablePrime(bits+1,rand).intValue();

                                        HashEntry[]oldArray=theArray;

                                        theArray=(HashEntry[])newHashEntry[newLength];

                                        for(inti=0;i

                                                          theArray[i]=null;

                                        theSize=0;

                                        for(inti=0;i

                                                          if(oldArray[i]!=null&&oldArray[i].isActive==true)

                                                                            insert(oldArray[i].element);                                   

                      }

                      privateint findPos(AnyType x){

                                        intcurrentPos=myhash(x);

                                        intoffset=1;

                                        while(theArray[currentPos]!=null&&!theArray[currentPos].element.equals(x)){

                                                          currentPos+=offset;//注意正是因为惰性的删除才保证了如此简单的非空检查。

                                                          offset+=2;//平方探测

                                                          if(currentPos>=theArray.length)

                                                                            currentPos-=theArray.length;

                                        }

                                        returncurrentPos;

                      }

                      privateboolean isActive(int currentPos){

                                        returntheArray[currentPos]!=null&&theArray[currentPos].isActive;

                      }

    }

    5.6标准库的散列表

    HashMap的性能常优于TreeMap。

    关键字必需定义equals()和hashCode()方法。如String类自带hash变量闪存散列代码,当String被修改时hash变量被重置为0。

    开放问题:大整数的桶排序是否可以改进为对哈希值排序?

    5.7可扩展散列

    分离链接散列和探测散列,问题在于一次查找可能访问多个区块。并且再散列的巨大代价为O(N)次磁盘访问。

    可扩散列用于数据太大以致不能一次装入主存。对比B树的实现,将根D装入主存,而对磁盘内容进行索引,关键问题是降低分支系数和树结构的分裂和扩展。

    可扩展散列将"哈希值的前D个比特数"存为根目录的一个元素。根目录的最大项数2^D(但不能超过块大小)。

    树结构的分裂扩展只会导致目录的重写,而不会导致其它树叶访问,避免磁盘读写。

    目录期望大小为2^D或O(N^(1+1/M)/M),过小的M(每片树叶最多存储的元素数)会导致目录过大。则需要做二级索引,也就是让树叶包含指向记录的链而不是实际的链,而且无可避免地使用二次磁盘访问。

    散列的最坏情况来自于实现错误。二叉查找树的最坏情形来自于"有序的输入"。平衡查找树的实现代价较高。因此如果不需要排序(返回最大值或最小值)或不确定输入是否有序时可以使用散列数据结构。

    应用:

    [if !supportLists]l [endif]编译器使用散列跟踪源码中声明的变量,即符号表(symbol table)。

    [if !supportLists]l [endif]游戏编程中,通过计算基于位置的散列跟踪和存储一些简单的运动路径。当同样的位置再次出现可以避免昂贵的重复计算。

    [if !supportLists]l [endif]在线拼写检查程序。

    相关文章

      网友评论

          本文标题:数据结构与算法分析 第5章总结

          本文链接:https://www.haomeiwen.com/subject/odahpftx.html