Java Collections Source Code Analysis (1): HashMap

Author: 努力做一个最懒的程序员 | Published 2019-06-20 12:05

[Figure: simplified Collection family tree (Collection简易族谱.png)]

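Java's collections framework is a workhorse for storing data in everyday development. Its main interfaces are List, Set, and Map (strictly speaking, Map belongs to the framework but does not extend Collection), and concrete classes such as ArrayList, LinkedList, HashMap, Hashtable, and HashSet grew out of different needs. This article looks at the source code of HashMap.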

Main Text

I. What Is Hashing

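A hash function takes an input of arbitrary length (also called the pre-image) and transforms it, through a hashing algorithm, into an output of fixed length; that output is the hash value. The transformation is a compressing map: the space of hash values is usually far smaller than the space of inputs, so different inputs can hash to the same output, and the input cannot be uniquely recovered from the hash value. Put simply, it is a function that compresses a message of any length into a fixed-length message digest.
Common hash functions include the division (remainder) method, the multiplication method, and the mid-square method.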
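
As a rough illustration of the first method in that list, the sketch below implements a division (remainder) hash and shows two different inputs landing on the same output. The class and method names (SimpleHashes, divisionHash) are made up for this example and do not come from the JDK.

```java
final class SimpleHashes {
    // Division (remainder) method: map any int key into [0, tableSize).
    // Math.floorMod keeps the result non-negative for negative keys.
    static int divisionHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        // Two different inputs, one output: a hash collision.
        System.out.println(divisionHash(17, 16)); // 1
        System.out.println(divisionHash(33, 16)); // 1
    }
}
```

Since 17 and 33 both map to 1, the output alone cannot tell you which input produced it, which is the "compressing map" property described above.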

II. What Is HashMap

HashMap is a collection (a hash table) for storing key-value data: the key is used for lookup, and the value is the data being stored.

III. Structure of HashMap

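HashMap is a hash table built on an array plus linked lists (separate chaining). Insertion, deletion, and lookup in a hash table are very fast: when there is no hash collision (a collision occurs when hashing an element yields a storage slot that is already occupied by another element), a single index computation locates the slot, so the time complexity is O(1).
The source code (JDK 1.7 here) shows that the backbone of HashMap is an array of Entry. HashMap's Entry implements the Map.Entry interface, holds the fields hash, key, value, and next, and overrides hashCode and equals.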

    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        ...
    }

[Figure: Java 7 HashMap structure diagram (Java7 HashMap结构图.png)]

IV. Properties of HashMap

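Property	Reason
Both key and value may be null	A null key is handled separately (putForNullKey stores it in bucket 0), and values are simply held by the wrapping Entry object without being inspected
Keys are unique	If the new key is the same object as, or equals(), an existing key, the existing value is overwritten
Unordered	Entries are not stored from array index 0 onward; the position comes from a bitwise AND of the key's hash and (array length - 1)
Not thread-safe	Reads and writes use no locks or other synchronization
New entries always go to the head of the chain (JDK 1.7)	Writing at the head avoids walking the chain, which keeps insertion cheap
Array length is always a power of two	The index is hash & (length - 1); when length is a power of two, length - 1 is all 1 bits, so the AND spreads indices evenly over the table
...	...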
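
A few of the properties in this table can be checked directly against java.util.HashMap. The snippet below is only a small demonstration; the class name HashMapTraits and the sample keys are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class HashMapTraits {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "null key is allowed"); // null key
        map.put("k", null);                   // null value
        map.put("a", "first");
        map.put("a", "second");               // same key: the value is overwritten
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());   // 3
        System.out.println(map.get("a")); // second
    }
}
```

Note that get("k") returns null even though the key is present, so containsKey is the reliable way to tell a stored null value from a missing key.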

V. Storing Values

[Figure: Java 7 HashMap put flow (Java7 HashMap存值.png)]

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        // Compute the hash value
        int hash = hash(key);
        // Derive the target index from the hash value and the array length
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
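            // A match needs an equal hash AND (same reference or equals()), so how the key class
            // overrides hashCode()/equals() decides whether two keys overwrite each other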
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        // Bitwise-AND the key's hash with (array length - 1) to get the index
        return h & (length-1);
    }

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        // A null key always goes into slot 0 of the array
        addEntry(0, null, value, 0);
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        // Wrap the new key and value in an Entry and make it the head of the chain
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
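
To see why indexFor wants a power-of-two length, the sketch below counts how many buckets h & (length - 1) can actually reach for a given length. IndexForDemo and reachableBuckets are names invented for this example; only the indexFor expression is taken from the source above.

```java
import java.util.Set;
import java.util.TreeSet;

final class IndexForDemo {
    // Same expression as HashMap.indexFor in JDK 1.7.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collect every index produced for the hash values 0..1023.
    static Set<Integer> reachableBuckets(int length) {
        Set<Integer> used = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            used.add(indexFor(h, length));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(reachableBuckets(16)); // all 16 buckets, 0 through 15
        System.out.println(reachableBuckets(10)); // [0, 1, 8, 9]
    }
}
```

With length 10 the mask is 9 (binary 1001), so only four buckets are ever used and the other six stay empty; with length 16 the mask is 1111 and every bucket is reachable.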

VI. Retrieving Values

[Figure: Java 7 HashMap get flow (Java7 HashMap取值.png)]

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        // For a null key, walk the chain in slot 0 of the array directly
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        // Use the key's hash value to jump straight to the right bucket
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
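
The put and get paths shown so far can be condensed into a toy map for experimenting. This is a simplified sketch, not the JDK implementation: MiniChainedMap is an invented name, the table is fixed at 16 buckets, there is no resizing, and the hash is the raw hashCode() without any perturbation.

```java
import java.util.Objects;

final class MiniChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    // A null key hashes to 0, so it lands in bucket 0.
    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = h & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value; // existing key: overwrite and return the old value
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // head insertion, as in JDK 1.7
        size++;
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

The strings "Aa" and "BB" have the same hashCode(), so putting both into this map exercises the chain walk in get: both end up in one bucket and are told apart only by equals().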

VII. Resizing

    void addEntry(int hash, K key, V value, int bucketIndex) {
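        // Condition 1: size has reached the threshold, where
        // threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Condition 2: the target bucket is already occupied (a hash collision)
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Each resize doubles the previous length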
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // MAXIMUM_CAPACITY = 1 << 30
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        // Move the existing entries into the new array
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    // Recompute the hash value
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                // Compute the entry's position in the new array
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
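
Because the length is a power of two and resize doubles it, indexFor can only send an entry to one of two places in the new array. The check below is a small sketch of that observation; ResizeSplitDemo and staysOrShifts are invented names, and the range of hash values tested is arbitrary.

```java
final class ResizeSplitDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if, for every hash tried, doubling the table leaves the entry at its
    // old index or moves it to old index + oldCapacity. oldCapacity must be a power of two.
    static boolean staysOrShifts(int oldCapacity) {
        int newCapacity = oldCapacity * 2;
        for (int h = -5000; h <= 5000; h++) {
            int oldIndex = indexFor(h, oldCapacity);
            int newIndex = indexFor(h, newCapacity);
            // The single extra bit exposed by the larger mask decides the outcome.
            boolean shifted = (h & oldCapacity) != 0;
            int expected = shifted ? oldIndex + oldCapacity : oldIndex;
            if (newIndex != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // 5 -> 5
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // 5 -> 21
        System.out.println(staysOrShifts(16)); // true
    }
}
```

Hashes 5 and 21 share bucket 5 in a 16-slot table; after doubling, 5 stays put while 21 moves to 5 + 16 = 21.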

Removal works much like retrieval, so it is not covered again here.

VIII. Comparison with HashMap in JDK 1.8

1. Data structure optimization

[Figure: Java 8 HashMap structure diagram (Java8 HashMap结构图.png)]

The former Entry node is renamed Node, and a red-black tree is introduced along with a new red-black tree node type, TreeNode.

    // Red-black tree node
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
        ...
    }

New constants related to the red-black tree:

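    // Treeify threshold for a bin: when storing data, a linked list longer than this value is converted into a red-black tree
    static final int TREEIFY_THRESHOLD = 8;

    // Untreeify threshold for a bin: during resize() (when storage positions are recomputed),
    // a red-black tree left with at most 6 nodes after the split is converted back into a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimum table capacity for treeification: bins are converted to trees only when the table capacity is at least this value;
    // otherwise a bin with too many nodes triggers a resize instead of treeification.
    // To avoid conflicts between the resize and treeify thresholds, this value must be at least 4 * TREEIFY_THRESHOLD
    static final int MIN_TREEIFY_CAPACITY = 64;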

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // Check first whether the bin is already a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // The chain now holds more than 8 nodes: hand the bin to treeifyBin
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
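
How TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY interact in putVal can be summarized as a small decision function. This is a paraphrase under assumptions, not JDK code: TreeifyDecision, Action, and afterInsert are invented names, and the rule "resize instead of treeify when the table is shorter than MIN_TREEIFY_CAPACITY" reflects what treeifyBin does in JDK 1.8.

```java
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // nodesInBin: number of nodes in the bin's linked list after the new node was appended.
    // tableLength: current length of the bucket array.
    static Action afterInsert(int nodesInBin, int tableLength) {
        if (nodesInBin <= TREEIFY_THRESHOLD) {
            return Action.NONE; // chain is still short enough
        }
        // Long chain in a small table: growing the table is preferred over building a tree.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64)); // NONE
        System.out.println(afterInsert(9, 16)); // RESIZE
        System.out.println(afterInsert(9, 64)); // TREEIFY
    }
}
```

The `binCount >= TREEIFY_THRESHOLD - 1` test in putVal corresponds to the ninth node being appended, which is why the sketch compares against "more than 8 nodes".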

2. Hash algorithm optimization

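    /**
     * Analysis 1: hash(key)
     * Purpose: compute the hash code (hash value) of the key passed in.
     * The implementation differs between JDK 1.7 and 1.8, but the idea is the same: a perturbation
     * function that spreads the hash codes derived from keys more evenly, so that fewer distinct
     * keys end up in the same bucket.
     * JDK 1.7 applies 9 perturbation steps = 4 shifts + 5 XORs
     * JDK 1.8 simplifies this to 2 perturbation steps = 1 shift + 1 XOR
     */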

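      // JDK 1.7: turn the key into a hash value = hashCode() + 4 shifts + 5 XORs (9 perturbation steps)
      // (the alternative String hashing branch of the original method is omitted here)
      final int hash(Object k) {
        int h = hashSeed;
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
      }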

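      // JDK 1.8: turn the key into a hash value = hashCode() + 1 shift + 1 XOR (2 perturbation steps)
      // 1. Take the hash code: h = key.hashCode()
      // 2. Mix the high bits into the low bits: h ^ (h >>> 16)
      static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
            // a. When key == null the hash is 0, which is why a HashMap key may be null.
            // Note: Hashtable, by contrast, calls hashCode() on the key directly, so a null key throws a NullPointerException; Hashtable keys therefore cannot be null.
            // b. When key != null, compute the key's hashCode() (call it h), then perturb it: XOR (^) h with h shifted right by 16 bits.
            // Because the capacity is a power of two and doubles on resize, an entry's new index has only two possibilities: unchanged, or old index + old array length. JDK 1.8's resize can therefore split each bucket by testing a single bit of the hash instead of recomputing every index.
      }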

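      /**
      * Analysis of the index function: indexFor(hash, table.length)
      * Note: this function exists only in JDK 1.7; JDK 1.8 has no such method (the same expression is written inline), but the principle is identical.
      * It is brought forward here for ease of explanation.
      */
      static int indexFor(int h, int length) {
           return h & (length-1);
           // AND (&) the perturbed hash with (array length - 1) to get the position in the table array (i.e. the array index)
      }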
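
The value of mixing the high bits into the low bits shows up with hash codes that differ only above bit 15. The sketch below applies the JDK 1.8 expression h ^ (h >>> 16) to two such values; SpreadDemo, spread, and index are names chosen for this example, and the two sample hash codes are arbitrary.

```java
final class SpreadDemo {
    // Same expression as the non-null branch of HashMap.hash in JDK 1.8.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int h1 = 0x10000;
        int h2 = 0x20000;
        // Without perturbation the low 4 bits are identical, so both land in bucket 0.
        System.out.println(index(h1, 16) + " " + index(h2, 16));                 // 0 0
        // After perturbation the high bits influence the index.
        System.out.println(index(spread(h1), 16) + " " + index(spread(h2), 16)); // 1 2
    }
}
```

In a 16-slot table only the lowest 4 bits pick the bucket, so without the shift-and-XOR step any information held in the upper half of the hash code would be ignored.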

3. Insertion strategy optimization

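Head insertion was replaced by tail insertion, and the order of operations changed from "resize, then insert" (JDK 1.7) to "insert, then resize" (JDK 1.8).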
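
The difference between the two insertion strategies is easiest to see on a single bucket's chain. The following is a minimal sketch with invented names (InsertionOrderDemo, headInsert, tailInsert); it models only the linked list, not the surrounding map.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertionOrderDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // JDK 1.7 style: the new node becomes the head of the chain.
    static Node headInsert(Node head, String key) {
        return new Node(key, head);
    }

    // JDK 1.8 style: the new node is appended after the last node.
    static Node tailInsert(Node head, String key) {
        Node node = new Node(key, null);
        if (head == null) {
            return node;
        }
        Node p = head;
        while (p.next != null) {
            p = p.next;
        }
        p.next = node;
        return head;
    }

    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node p = head; p != null; p = p.next) {
            out.add(p.key);
        }
        return out;
    }

    public static void main(String[] args) {
        Node byHead = null;
        Node byTail = null;
        for (String k : new String[] {"a", "b", "c"}) {
            byHead = headInsert(byHead, k);
            byTail = tailInsert(byTail, k);
        }
        System.out.println(keys(byHead)); // [c, b, a]
        System.out.println(keys(byTail)); // [a, b, c]
    }
}
```

Head insertion reverses arrival order within a bucket, while tail insertion preserves it.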
