HashMap底层实现及源码分析

作者: SevenLin1993 | 来源:发表于2018-07-21 16:35 被阅读4次

Java中HashMap底层实现原理(JDK1.8)源码分析
HashMap底层实现及源码分析
What？HashMap的实现原理？
HashMap源码分析
【老实李】JDK1.8中HashMap的红黑树
Android面试复习
[简单集合] HashSet源码分析
jdk源码浅读-HashSet
HashMap 和 HashSet 区别
HashMap源码分析

HashMap类继承关系

image.png

HashMap简介

先来看看JDK中对HashMap的描述：

Hash table based implementation of the Map interface.  
This implementation provides all of the optional map operations, and permits null values and the null key.  
(The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)  
This class makes no guarantees as to the order of the map; 
in particular, it does not guarantee that the order will remain constant over time.


基于哈希表的 Map 接口的实现。
此实现提供所有可选的映射操作，并允许使用 null 值和 null 键。(除了非同步和允许使用 null 之外，HashMap 类与 Hashtable 大致相同。)
此类不保证映射的顺序，特别是它不保证该顺序恒久不变。

HashMap数据结构

每一个数据元素集合其背后都有其对于的数据结构，而HashMap的数据结构是怎样的？

在JDK7中，HashMap的数据结构为：数组+链表

数组，在内存中使用连续的空间来存储数据，查询遍历速度快(O(1))，但是插入删除速度慢(O(n))

链表，在内存中使用一系列离散空间，其插入删除操作快(O(1))，但是查询遍历却比较慢(O(n))

HashMap就很好地将两种数据结构模型结合起来，用数组来保存链表，结合两者的优点，大致数据结构如图：

image.png

其中Table为Entry数组，长度为Capacity，其中第一个元素用于保存key为null的Entry。

/**
 * HashMap中Entry的内部实现
 */
static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

影响HashMap性能的两个重要参数：

`initial capacity`（初始容量）

capacity是哈希表的长度,而initial capacity（默认值为16）则是在创建哈希表的时候的初始容量。

`load factor`（加载因子）

load factor（默认值为0.75）,是哈希表在其容量自动增加之前可以达到多满的一种尺度。当哈希表中的条目数超出了加载因子与当前容量的乘积时，则要对该哈希表进行rehash操作（即重建内部数据结构），从而哈希表将具有大约两倍的桶数。

load factor默认使用0.75是对时间和空间成本上的一种折中，load factor越大空间开销越小，每个链表中存储的数据将会越多，而查询成本将会变大。所以，在使用HashMap的时候，尽可能地设置适当的初始容量，减少rehash的操作，提高性能。

HashMap的PUT操作（插入）

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

对PUT实现的描述大致如下：

先判断哈希表是否已经扩充，若没有对其扩充
如果key为null,将其放入哈希表的第一个位置
如果key不为null,使用hash函数对key进行哈希计算得到Hash值，通过Hash值能够确定其在哈希表的位置
进行保存操作，如果链表中已经存存在相同的key则进行替换并返回旧值，如果这是一个新的key则保存新值并返回null（在保存新的k-v的时候，如果map的长度超过threshold的话，将会进行rehash操作）,保存的新值总是在链表的头部，也就是位于指向数组的位置

HashMap的GET操作（查找）

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

对GET实现的描述大致如下：

如果key为null,则从哈希表的第一个元素中获取value
如果key不为null,则通过hash函数取到hash值，通过Hash值确定其在哈希表的位置取得Entry，对Entry遍历最终取到value

扩容机制（rehash实现，也就是resize方法）

将Map中的内容进行一次rehash之后放入一个更大的容量的数组，resize方法在Map的key数量达到threshold的时候自动调用。

如果当前的Map容量已经达到MAXIMUM_CAPACITY(HashMap的最大容量，JDK7中是2^30)时，将不会进行扩容操作，但是threshold会被设置为Integer.MAX_VALUE以防止继续调用resize方法

具体实现如下：

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

这里主要关注transfer方法，用于把旧数组拷贝到新数组：

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

遍历数组，将数组上的元素逐个拷贝到新数组的新元素上，拷贝之后每个Entry的数据会颠倒（也就是链头变成链尾）

并发同时访问HashMap

由于HashMap不是线程安全的，在多线程的场景中使用需要通过同步锁对其访问，或者通过Collections.synchronizedMap(new HashMap(...))对HashMap进行包装之后使用；
Java中也提供了专门用于多线程并发使用的HashMap,那就是java.util.ConcurrentHashMap,也能避免并发操作带来的问题

Java8之HashMap新变化

数据结构变化

Java8之后，HashMap内部的实现以及数据结构发生了一定的变化，数据存储变为：数组+链表+红黑树（多了红黑树），当链表的长度大于8的时候，改为红黑树。

其中链表的结构保持一致，只是类名从Entry改为Node,其中红黑树的内部实现如下：

/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    /**
     * Returns root of tree containing this node.
     */
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }

    /**
     * Ensures that the given root is the first node of its bin.
     */
    static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
        ...
    }

    /**
     * Finds the node starting at root p with the given hash and key.
     * The kc argument caches comparableClassFor(key) upon first use
     * comparing keys.
     */
    final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
        ...
    }

    /**
     * Calls find for root node.
     */
    final TreeNode<K,V> getTreeNode(int h, Object k) {
        ...
    }

    /**
     * Tie-breaking utility for ordering insertions when equal
     * hashCodes and non-comparable. We don't require a total
     * order, just a consistent insertion rule to maintain
     * equivalence across rebalancings. Tie-breaking further than
     * necessary simplifies testing a bit.
     */
    static int tieBreakOrder(Object a, Object b) {
        ...
    }

    /**
     * Forms tree of the nodes linked from this node.
     * @return root of tree
     */
    final void treeify(Node<K,V>[] tab) {
        ...
    }

    /**
     * Returns a list of non-TreeNodes replacing those linked from
     * this node.
     */
    final Node<K,V> untreeify(HashMap<K,V> map) {
        ...
    }

    /**
     * Tree version of putVal.
     */
    final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                   int h, K k, V v) {
        ...
    }

    /**
     * Removes the given node, that must be present before this call.
     * This is messier than typical red-black deletion code because we
     * cannot swap the contents of an interior node with a leaf
     * successor that is pinned by "next" pointers that are accessible
     * independently during traversal. So instead we swap the tree
     * linkages. If the current tree appears to have too few nodes,
     * the bin is converted back to a plain bin. (The test triggers
     * somewhere between 2 and 6 nodes, depending on tree structure).
     */
    final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                              boolean movable) {
        ...
    }

    /**
     * Splits nodes in a tree bin into lower and upper tree bins,
     * or untreeifies if now too small. Called only from resize;
     * see above discussion about split bits and indices.
     *
     * @param map the map
     * @param tab the table for recording bin heads
     * @param index the index of the table being split
     * @param bit the bit of hash to split on
     */
    final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
        ...
    }

    /* ------------------------------------------------------------ */
    // Red-black tree methods, all adapted from CLR

    static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
                                          TreeNode<K,V> p) {
        ...
    }

    static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root,
                                           TreeNode<K,V> p) {
        ...
    }

    static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
                                                TreeNode<K,V> x) {
        ...
    }

    static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root,
                                               TreeNode<K,V> x) {
        ...
    }

    /**
     * Recursive invariant check
     */
    static <K,V> boolean checkInvariants(TreeNode<K,V> t) {
        ...
    }
}

引入红黑树之后`PUT`的变化


public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

大致实现如下：

计算出key的hash值（hash算法和1.7有些变化,但是对于null值的除了还是一样，都是0）
先判断哈希表是否已经扩充，若没有对其扩充
根据hash值计算当前要插入的位置如果为空，则新建NODE并存储
如果当前要插入的位置不为空
- 判断key是否存在，如果存在则直接覆盖
- 如果不存在
  - 判断当前的存储结构是否是红黑树(TreeNode)，如果是红黑树，则执行红黑树的插入操作
  - 如果当前存储结构为链表(Node)，则判断长度是否已经达到8，如果达到8则使用红黑树替换链表并且执行插入操作；如果还没达到8则对链表执行插入操作
如果map的长度超过threshold的话，将会进行rehash操作

引入红黑树之后`GET`的变化

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}