Hash Tables

Author: null12 | Published: 2018-03-22 10:16

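    I. Definition

    A hash table is a data structure that maps each key to an integer, uses that integer as an array index, and stores or retrieves the record at that position.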

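    II. Basic Idea

    The heart of a hash table is the hashing algorithm: how to turn a key of arbitrary type into an array index. A lookup in a hash table usually takes two steps:

    1. Use the hash function to convert the search key into an array index.
    2. Access the array at that index to obtain the value associated with the key.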
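
    The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the class name TwoStepLookup, the table size 11, and the sample keys are made up for this example, and collisions are deliberately not handled (the sample keys were chosen so that none occur).

```java
// Minimal sketch of a two-step lookup; not a complete hash table.
final class TwoStepLookup {
    private static final int M = 11;                // table size (assumed for illustration)
    private final String[] values = new String[M];  // one slot per index; collisions are not handled

    // Step 1: turn the key into an array index in the range 0..M-1.
    static int hash(Integer key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    void put(Integer key, String value) {
        values[hash(key)] = value;
    }

    // Step 2: access the slot at that index to obtain the value.
    String get(Integer key) {
        return values[hash(key)];
    }

    public static void main(String[] args) {
        TwoStepLookup table = new TwoStepLookup();
        table.put(15, "a");   // 15 % 11 = 4
        table.put(27, "b");   // 27 % 11 = 5
        table.put(42, "c");   // 42 % 11 = 9
        System.out.println(table.get(27)); // prints "b"
    }
}
```

    Because the sketch never compares keys, two keys that share an index would overwrite each other, which is exactly the collision problem the later sections address.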

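    III. Hash Functions

    Ideally, a hash function would map every distinct key to a unique index. In practice, because the array must fit in a limited amount of memory, collisions occur: two different keys map to the same index. The two common ways to resolve collisions are separate chaining and linear probing.

    A good hash function satisfies the following conditions:

    1. Consistency: equal keys must produce the same hash value.
    2. Efficiency: it is cheap to compute.
    3. Uniformity: it distributes the keys evenly across the indices.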
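
    In Java, the consistency condition corresponds to keeping hashCode() in step with equals(). The sketch below shows one way to do that; the Point class and its fields are hypothetical and exist only for this example.

```java
import java.util.Objects;

// Hypothetical key type used only to illustrate the consistency condition.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return x == that.x && y == that.y;
    }

    // Computed from exactly the fields that equals() compares,
    // so equal points always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

    If hashCode() used a field that equals() ignores, two equal keys could land at different indices and a lookup could miss a key that is actually in the table.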

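    For an array of size M, an ideal hash function sends any key to each of the indices 0 to M-1 with equal probability. Different key types call for different hash functions; a common one is described below.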

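    1. Modular hashing (the division-remainder method)
    The steps are:
    ① Choose an array of size M, where M should be prime.
    ② For any positive integer key k, use k % M as the hash value.

    Why should M be prime?
    When M is not prime, the hash value may depend on only part of the key, so the values are spread less evenly. For example, if M = 10^k, then N % M for a positive integer N is just the last k digits of N, and the higher digits are ignored. A prime M lets every digit of the key influence the result.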
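
    The effect of the table size can be seen with a small experiment. In the sketch below, the class name ModularHashingDemo, the sample keys, and the two sizes 100 and 97 are assumptions chosen for illustration: the keys are all multiples of 100, so a table of size 10^2 sends every one of them to index 0, while the prime 97 spreads them out.

```java
import java.util.HashSet;
import java.util.Set;

final class ModularHashingDemo {
    // Division-remainder hash of a non-negative integer key k for table size m.
    static int hash(int k, int m) {
        return k % m;
    }

    // Number of distinct indices that the given keys occupy for table size m.
    static int distinctIndices(int[] keys, int m) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) {
            used.add(hash(k, m));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000};
        System.out.println(distinctIndices(keys, 100)); // prints 1: every key ends in "00"
        System.out.println(distinctIndices(keys, 97));  // prints 10: indices 3, 6, ..., 30
    }
}
```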

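    IV. Collision Resolution

    4.1 Separate chaining

    Basic idea:
    Each of the M array entries points to a linked list, and each node of that list stores a key-value pair whose key hashes to that entry's index. The N keys therefore end up in M lists, with an average list length of N/M.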
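
    As a quick check of the N/M figure, the following sketch distributes N integer keys over M chains and reports the chain lengths. It is not the implementation discussed in this article: the class name ChainLengthDemo is hypothetical, the chains are plain java.util.LinkedList objects, and the keys 0..N-1 are an assumed, perfectly regular input.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainLengthDemo {
    // Distribute the keys 0..n-1 over m chains and return the length of each chain.
    static int[] chainLengths(int n, int m) {
        List<LinkedList<Integer>> chains = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            chains.add(new LinkedList<>());
        }
        for (int key = 0; key < n; key++) {
            int index = (Integer.hashCode(key) & 0x7fffffff) % m;
            chains.get(index).add(key);
        }
        int[] lengths = new int[m];
        for (int i = 0; i < m; i++) {
            lengths[i] = chains.get(i).size();
        }
        return lengths;
    }

    // Average chain length, which always equals n/m because every key sits in exactly one chain.
    static double averageLength(int[] lengths) {
        int total = 0;
        for (int len : lengths) {
            total += len;
        }
        return (double) total / lengths.length;
    }
}
```

    With n = 70 and m = 7 every chain holds 10 keys. Real key sets are less regular, so individual chains vary in length even though the average stays at N/M.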

    [Figure 4-1: Separate chaining]

    Source code for separate chaining:

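    // Queue and SequentialSearchST are not defined in this listing; they are assumed
    // to come from the algs4 library (algs4.jar on the classpath).
    import edu.princeton.cs.algs4.Queue;
    import edu.princeton.cs.algs4.SequentialSearchST;

    public class SeparateChainingHashST<Key, Value> {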
        private static final int INIT_CAPACITY = 4;
        private int n;                                // number of key-value pairs
        private int m;                                // hash table size
        private SequentialSearchST<Key, Value>[] st;  // array of linked-list symbol tables
    
        /**
         * Initializes an empty symbol table.
         */
        public SeparateChainingHashST() {
            this(INIT_CAPACITY);
        } 
    
        /**
         * Initializes an empty symbol table with {@code m} chains.
         * @param m the initial number of chains
         */
        public SeparateChainingHashST(int m) {
            this.m = m;
            st = (SequentialSearchST<Key, Value>[]) new SequentialSearchST[m];
            for (int i = 0; i < m; i++)
                st[i] = new SequentialSearchST<Key, Value>();
        } 
    
        // resize the hash table to have the given number of chains,
        // rehashing all of the keys
        private void resize(int chains) {
            SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<Key, Value>(chains);
            for (int i = 0; i < m; i++) {
                for (Key key : st[i].keys()) {
                    temp.put(key, st[i].get(key));
                }
            }
            this.m  = temp.m;
            this.n  = temp.n;
            this.st = temp.st;
        }
    
        // hash value between 0 and m-1
        private int hash(Key key) {
            return (key.hashCode() & 0x7fffffff) % m;
        } 
    
        /**
         * Returns the number of key-value pairs in this symbol table.
         *
         * @return the number of key-value pairs in this symbol table
         */
        public int size() {
            return n;
        } 
    
        /**
         * Returns true if this symbol table is empty.
         *
         * @return {@code true} if this symbol table is empty;
         *         {@code false} otherwise
         */
        public boolean isEmpty() {
            return size() == 0;
        }
    
        /**
         * Returns true if this symbol table contains the specified key.
         *
         * @param  key the key
         * @return {@code true} if this symbol table contains {@code key};
         *         {@code false} otherwise
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public boolean contains(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to contains() is null");
            return get(key) != null;
        } 
    
        /**
         * Returns the value associated with the specified key in this symbol table.
         *
         * @param  key the key
         * @return the value associated with {@code key} in the symbol table;
         *         {@code null} if no such value
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public Value get(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to get() is null");
            int i = hash(key);
            return st[i].get(key);
        } 
    
        /**
         * Inserts the specified key-value pair into the symbol table, overwriting the old 
         * value with the new value if the symbol table already contains the specified key.
         * Deletes the specified key (and its associated value) from this symbol table
         * if the specified value is {@code null}.
         *
         * @param  key the key
         * @param  val the value
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public void put(Key key, Value val) {
            if (key == null) throw new IllegalArgumentException("first argument to put() is null");
            if (val == null) {
                delete(key);
                return;
            }
    
            // double table size if average length of list >= 10
            if (n >= 10*m) resize(2*m);
    
            int i = hash(key);
            if (!st[i].contains(key)) n++;
            st[i].put(key, val);
        } 
    
        /**
         * Removes the specified key and its associated value from this symbol table     
         * (if the key is in this symbol table).    
         *
         * @param  key the key
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public void delete(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to delete() is null");
    
            int i = hash(key);
            if (st[i].contains(key)) n--;
            st[i].delete(key);
    
            // halve table size if average length of list <= 2
            if (m > INIT_CAPACITY && n <= 2*m) resize(m/2);
        } 
    
        // return keys in symbol table as an Iterable
        public Iterable<Key> keys() {
            Queue<Key> queue = new Queue<Key>();
            for (int i = 0; i < m; i++) {
                for (Key key : st[i].keys())
                    queue.enqueue(key);
            }
            return queue;
        } 
    }
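
    The hash method in the listing masks the sign bit with & 0x7fffffff before taking the remainder. The sketch below illustrates why, using a hypothetical class SignMaskDemo: hashCode() may be negative, the % operator in Java keeps the sign of the dividend, and Math.abs does not help for Integer.MIN_VALUE.

```java
final class SignMaskDemo {
    // Remainder taken directly: negative for a negative hash code.
    static int naive(int hashCode, int m) {
        return hashCode % m;
    }

    // Sign bit cleared first, so the result is always in the range 0..m-1.
    static int masked(int hashCode, int m) {
        return (hashCode & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        System.out.println(naive(-7, 5));                    // prints -2, not a valid array index
        System.out.println(masked(-7, 5) >= 0);              // prints true
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0); // prints true: abs overflows
        System.out.println(masked(Integer.MIN_VALUE, 5));    // prints 0
    }
}
```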
    

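    4.2 Linear probing

    Basic idea:
    Store the N key-value pairs in an array of size M, where M > N, so the array always has more slots than inserted pairs and empty slots are available to resolve collisions. Hash tables built on this strategy are collectively called open-addressing hash tables.

    Steps:

    1. Use the hash function to compute the key's index in the array.
    2. If the key stored there equals the search key, return its value. Otherwise continue with the next slot (index + 1, wrapping around to the start at the end of the array) until the key is found or an empty slot is reached, which means the key is not in the table.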
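
    The probe sequence can be traced with a small sketch. Everything here is an assumption made for illustration: the class name LinearProbingTrace, integer-only keys, a fixed table of size 7, and no resizing or deletion (so the caller must keep at least one slot empty).

```java
final class LinearProbingTrace {
    private final Integer[] keys; // null marks an empty slot
    private final int m;          // table size

    LinearProbingTrace(int m) {
        this.m = m;
        this.keys = new Integer[m];
    }

    private int hash(int key) {
        return (key & 0x7fffffff) % m;
    }

    // Insert a key and return the index where it ends up.
    // Assumes the table never becomes full; otherwise the loop would not terminate.
    int insert(int key) {
        int i = hash(key);
        while (keys[i] != null && keys[i].intValue() != key) {
            i = (i + 1) % m; // slot taken by another key: try the next one, wrapping around
        }
        keys[i] = key;
        return i;
    }

    // Return the index of the key, or -1 if an empty slot is reached first.
    int find(int key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].intValue() == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTrace table = new LinearProbingTrace(7);
        System.out.println(table.insert(10)); // prints 3 (10 % 7 = 3)
        System.out.println(table.insert(17)); // prints 4 (slot 3 taken)
        System.out.println(table.insert(24)); // prints 5 (slots 3 and 4 taken)
        System.out.println(table.insert(5));  // prints 6 (slot 5 taken)
        System.out.println(table.insert(12)); // prints 0 (slots 5 and 6 taken, wraps around)
        System.out.println(table.find(3));    // prints -1 (probes 3,4,5,6,0, then empty slot 1)
    }
}
```

    The keys 10, 17 and 24 all hash to index 3 and form one cluster, which is why later insertions and the unsuccessful search have to walk past several occupied slots.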
    [Figure 4-2-1: Linear probing]

    Source code for linear probing:

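    // Queue is not defined in this listing; it is assumed to come from the
    // algs4 library (algs4.jar on the classpath).
    import edu.princeton.cs.algs4.Queue;

    public class LinearProbingHashST<Key, Value> {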
        private static final int INIT_CAPACITY = 4;
        private int n;           // number of key-value pairs in the symbol table
        private int m;           // size of linear probing table
        private Key[] keys;      // the keys
        private Value[] vals;    // the values
    
        /**
         * Initializes an empty symbol table.
         */
        public LinearProbingHashST() {
            this(INIT_CAPACITY);
        }
    
        /**
         * Initializes an empty symbol table with the specified initial capacity.
         *
         * @param capacity the initial capacity
         */
        public LinearProbingHashST(int capacity) {
            m = capacity;
            n = 0;
            keys = (Key[])   new Object[m];
            vals = (Value[]) new Object[m];
        }
    
        /**
         * Returns the number of key-value pairs in this symbol table.
         *
         * @return the number of key-value pairs in this symbol table
         */
        public int size() {
            return n;
        }
    
        /**
         * Returns true if this symbol table is empty.
         *
         * @return {@code true} if this symbol table is empty;
         *         {@code false} otherwise
         */
        public boolean isEmpty() {
            return size() == 0;
        }
    
        /**
         * Returns true if this symbol table contains the specified key.
         *
         * @param  key the key
         * @return {@code true} if this symbol table contains {@code key};
         *         {@code false} otherwise
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public boolean contains(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to contains() is null");
            return get(key) != null;
        }
    
        // hash function for keys - returns value between 0 and m-1
        private int hash(Key key) {
            return (key.hashCode() & 0x7fffffff) % m;
        }
    
        // resizes the hash table to the given capacity by re-hashing all of the keys
        private void resize(int capacity) {
            LinearProbingHashST<Key, Value> temp = new LinearProbingHashST<Key, Value>(capacity);
            for (int i = 0; i < m; i++) {
                if (keys[i] != null) {
                    temp.put(keys[i], vals[i]);
                }
            }
            keys = temp.keys;
            vals = temp.vals;
            m    = temp.m;
        }
    
        /**
         * Inserts the specified key-value pair into the symbol table, overwriting the old 
         * value with the new value if the symbol table already contains the specified key.
         * Deletes the specified key (and its associated value) from this symbol table
         * if the specified value is {@code null}.
         *
         * @param  key the key
         * @param  val the value
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public void put(Key key, Value val) {
            if (key == null) throw new IllegalArgumentException("first argument to put() is null");
    
            if (val == null) {
                delete(key);
                return;
            }
    
            // double table size if 50% full
            if (n >= m/2) resize(2*m);
    
            int i;
            for (i = hash(key); keys[i] != null; i = (i + 1) % m) {
                if (keys[i].equals(key)) {
                    vals[i] = val;
                    return;
                }
            }
            keys[i] = key;
            vals[i] = val;
            n++;
        }
    
        /**
         * Returns the value associated with the specified key.
         * @param key the key
         * @return the value associated with {@code key};
         *         {@code null} if no such value
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public Value get(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to get() is null");
            for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
                if (keys[i].equals(key))
                    return vals[i];
            return null;
        }
    
        /**
         * Removes the specified key and its associated value from this symbol table     
         * (if the key is in this symbol table).    
         *
         * @param  key the key
         * @throws IllegalArgumentException if {@code key} is {@code null}
         */
        public void delete(Key key) {
            if (key == null) throw new IllegalArgumentException("argument to delete() is null");
            if (!contains(key)) return;
    
            // find position i of key
            int i = hash(key);
            while (!key.equals(keys[i])) {
                i = (i + 1) % m;
            }
    
            // delete key and associated value
            keys[i] = null;
            vals[i] = null;
    
            // rehash all keys in same cluster
            i = (i + 1) % m;
            while (keys[i] != null) {
                // delete keys[i] and vals[i] and reinsert
                Key   keyToRehash = keys[i];
                Value valToRehash = vals[i];
                keys[i] = null;
                vals[i] = null;
                n--;
                put(keyToRehash, valToRehash);
                i = (i + 1) % m;
            }
    
            n--;
    
            // halves size of array if it's 12.5% full or less
            if (n > 0 && n <= m/8) resize(m/2);
    
            assert check();
        }
    
        /**
         * Returns all keys in this symbol table as an {@code Iterable}.
         * To iterate over all of the keys in the symbol table named {@code st},
         * use the foreach notation: {@code for (Key key : st.keys())}.
         *
         * @return all keys in this symbol table
         */
        public Iterable<Key> keys() {
            Queue<Key> queue = new Queue<Key>();
            for (int i = 0; i < m; i++)
                if (keys[i] != null) queue.enqueue(keys[i]);
            return queue;
        }
    
        // integrity check - don't check after each put() because
        // integrity not maintained during a delete()
        private boolean check() {
    
            // check that hash table is at most 50% full
            if (m < 2*n) {
                System.err.println("Hash table size m = " + m + "; number of key-value pairs n = " + n);
                return false;
            }
    
            // check that each key in table can be found by get()
            for (int i = 0; i < m; i++) {
                if (keys[i] == null) continue;
                else if (get(keys[i]) != vals[i]) {
                    System.err.println("get[" + keys[i] + "] = " + get(keys[i]) + "; vals[i] = " + vals[i]);
                    return false;
                }
            }
            return true;
        }
    }
    

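    Performance analysis:
    The performance of an open-addressing hash table depends on α = N/M, called the load factor of the table (0 ≤ α < 1).
    The average cost of linear probing depends on the size of the clusters that the entries form once they are inserted. A cluster is a run of consecutive occupied slots; the shorter the clusters, the better the performance.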

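    According to the mathematical analysis, in a linear-probing hash table of size M that holds N = αM keys:
    The average number of probes for a search hit is approximately:

    $$\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$$

    The average number of probes for a search miss is approximately:

    $$\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)$$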

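    When α is about 0.5, a search hit takes about 1.5 probes and a search miss about 2.5 probes.
    In other words, as the table approaches full (α close to 1) the number of probes grows very large, whereas for α < 0.5 a hit needs fewer than 1.5 probes and a miss fewer than 2.5 on average. This is why the implementation above doubles the array once it is half full.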
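
    To see how quickly the cost grows, the sketch below evaluates the standard linear-probing approximations, 1/2 (1 + 1/(1-α)) for a search hit and 1/2 (1 + 1/(1-α)^2) for a search miss, at a few load factors. These formulas agree with the 1.5 and 2.5 figures quoted for α = 0.5; the class name ProbeCostDemo and the chosen values of α are assumptions for illustration.

```java
final class ProbeCostDemo {
    // Approximate average probes for a search hit at load factor alpha (0 <= alpha < 1).
    static double hit(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Approximate average probes for a search miss at load factor alpha (0 <= alpha < 1).
    static double miss(double alpha) {
        return 0.5 * (1 + 1 / ((1 - alpha) * (1 - alpha)));
    }

    public static void main(String[] args) {
        double[] alphas = {0.25, 0.5, 0.75, 0.9};
        for (double alpha : alphas) {
            System.out.printf("alpha=%.2f hit=%.2f miss=%.2f%n", alpha, hit(alpha), miss(alpha));
        }
    }
}
```

    At α = 0.9 the estimate is already about 5.5 probes for a hit and about 50.5 for a miss, compared with 1.5 and 2.5 at α = 0.5.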

    Article link: https://www.haomeiwen.com/subject/zdboqftx.html