美文网首页Java 杂谈java
HashMap底层实现及源码分析

HashMap底层实现及源码分析

作者: SevenLin1993 | 来源:发表于2018-07-21 16:35 被阅读4次

    HashMap类继承关系

    image.png

    HashMap简介

    先来看看JDK中对HashMap的描述:

    Hash table based implementation of the Map interface.  
    This implementation provides all of the optional map operations, and permits null values and the null key.  
    (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)  
    This class makes no guarantees as to the order of the map; 
    in particular, it does not guarantee that the order will remain constant over time.
    
    
    基于哈希表的 Map 接口的实现。
    此实现提供所有可选的映射操作,并允许使用 null 值和 null 键。(除了非同步和允许使用 null 之外,HashMap 类与 Hashtable 大致相同。)
    此类不保证映射的顺序,特别是它不保证该顺序恒久不变。
    

    HashMap数据结构

    每一个数据元素集合其背后都有其对于的数据结构,而HashMap的数据结构是怎样的?

    在JDK7中,HashMap的数据结构为:数组+链表

    数组,在内存中使用连续的空间来存储数据,查询遍历速度快(O(1)),但是插入删除速度慢(O(n))

    链表,在内存中使用一系列离散空间,其插入删除操作快(O(1)),但是查询遍历却比较慢(O(n))

    HashMap就很好地将两种数据结构模型结合起来,用数组来保存链表,结合两者的优点,大致数据结构如图:

    image.png

    其中Table为Entry数组,长度为Capacity,其中第一个元素用于保存keynullEntry

    /**
     * HashMap中Entry的内部实现
     */
    static class Entry<K,V> implements Map.Entry<K,V> {
            final K key;
            V value;
            Entry<K,V> next;
            int hash;
    
            /**
             * Creates new entry.
             */
            Entry(int h, K k, V v, Entry<K,V> n) {
                value = v;
                next = n;
                key = k;
                hash = h;
            }
    
            public final K getKey() {
                return key;
            }
    
            public final V getValue() {
                return value;
            }
    
            public final V setValue(V newValue) {
                V oldValue = value;
                value = newValue;
                return oldValue;
            }
    
            public final boolean equals(Object o) {
                if (!(o instanceof Map.Entry))
                    return false;
                Map.Entry e = (Map.Entry)o;
                Object k1 = getKey();
                Object k2 = e.getKey();
                if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                    Object v1 = getValue();
                    Object v2 = e.getValue();
                    if (v1 == v2 || (v1 != null && v1.equals(v2)))
                        return true;
                }
                return false;
            }
    
            public final int hashCode() {
                return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
            }
    
            public final String toString() {
                return getKey() + "=" + getValue();
            }
    
            /**
             * This method is invoked whenever the value in an entry is
             * overwritten by an invocation of put(k,v) for a key k that's already
             * in the HashMap.
             */
            void recordAccess(HashMap<K,V> m) {
            }
    
            /**
             * This method is invoked whenever the entry is
             * removed from the table.
             */
            void recordRemoval(HashMap<K,V> m) {
            }
        }
    

    影响HashMap性能的两个重要参数:

    initial capacity(初始容量)

    capacity是哈希表的长度,而initial capacity(默认值为16)则是在创建哈希表的时候的初始容量。

    load factor(加载因子)

    load factor(默认值为0.75),是哈希表在其容量自动增加之前可以达到多满的一种尺度。当哈希表中的条目数超出了加载因子与当前容量的乘积时,则要对该哈希表进行rehash操作(即重建内部数据结构),从而哈希表将具有大约两倍的桶数。

    load factor默认使用0.75是对时间和空间成本上的一种折中,load factor越大空间开销越小,每个链表中存储的数据将会越多,而查询成本将会变大。所以,在使用HashMap的时候,尽可能地设置适当的初始容量,减少rehash的操作,提高性能。

    HashMap的PUT操作(插入)

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
    
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
    

    PUT实现的描述大致如下:

    • 先判断哈希表是否已经扩充,若没有对其扩充
    • 如果keynull,将其放入哈希表的第一个位置
    • 如果key不为null,使用hash函数对key进行哈希计算得到Hash值,通过Hash值能够确定其在哈希表的位置
    • 进行保存操作,如果链表中已经存存在相同的key则进行替换并返回旧值,如果这是一个新的key则保存新值并返回null(在保存新的k-v的时候,如果map的长度超过threshold的话,将会进行rehash操作),保存的新值总是在链表的头部,也就是位于指向数组的位置

    HashMap的GET操作(查找)

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);
    
        return null == entry ? null : entry.getValue();
    }
    

    GET实现的描述大致如下:

    • 如果key为null,则从哈希表的第一个元素中获取value
    • 如果key不为null,则通过hash函数取到hash值,通过Hash值确定其在哈希表的位置取得Entry,对Entry遍历最终取到value

    扩容机制(rehash实现,也就是resize方法)

    将Map中的内容进行一次rehash之后放入一个更大的容量的数组,resize方法在Map的key数量达到threshold的时候自动调用。

    如果当前的Map容量已经达到MAXIMUM_CAPACITY(HashMap的最大容量,JDK7中是2^30)时,将不会进行扩容操作,但是threshold会被设置为Integer.MAX_VALUE以防止继续调用resize方法

    具体实现如下:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
    
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
    

    这里主要关注transfer方法,用于把旧数组拷贝到新数组:

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
    

    遍历数组,将数组上的元素逐个拷贝到新数组的新元素上,拷贝之后每个Entry的数据会颠倒(也就是链头变成链尾)

    并发同时访问HashMap

    由于HashMap不是线程安全的,在多线程的场景中使用需要通过同步锁对其访问,或者通过Collections.synchronizedMap(new HashMap(...))对HashMap进行包装之后使用;
    Java中也提供了专门用于多线程并发使用的HashMap,那就是java.util.ConcurrentHashMap,也能避免并发操作带来的问题

    Java8之HashMap新变化

    数据结构变化

    Java8之后,HashMap内部的实现以及数据结构发生了一定的变化,数据存储变为:数组+链表+红黑树(多了红黑树),当链表的长度大于8的时候,改为红黑树。

    其中链表的结构保持一致,只是类名从Entry改为Node,其中红黑树的内部实现如下:

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
    
        /**
         * Returns root of tree containing this node.
         */
        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }
    
        /**
         * Ensures that the given root is the first node of its bin.
         */
        static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
            ...
        }
    
        /**
         * Finds the node starting at root p with the given hash and key.
         * The kc argument caches comparableClassFor(key) upon first use
         * comparing keys.
         */
        final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
            ...
        }
    
        /**
         * Calls find for root node.
         */
        final TreeNode<K,V> getTreeNode(int h, Object k) {
            ...
        }
    
        /**
         * Tie-breaking utility for ordering insertions when equal
         * hashCodes and non-comparable. We don't require a total
         * order, just a consistent insertion rule to maintain
         * equivalence across rebalancings. Tie-breaking further than
         * necessary simplifies testing a bit.
         */
        static int tieBreakOrder(Object a, Object b) {
            ...
        }
    
        /**
         * Forms tree of the nodes linked from this node.
         * @return root of tree
         */
        final void treeify(Node<K,V>[] tab) {
            ...
        }
    
        /**
         * Returns a list of non-TreeNodes replacing those linked from
         * this node.
         */
        final Node<K,V> untreeify(HashMap<K,V> map) {
            ...
        }
    
        /**
         * Tree version of putVal.
         */
        final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                       int h, K k, V v) {
            ...
        }
    
        /**
         * Removes the given node, that must be present before this call.
         * This is messier than typical red-black deletion code because we
         * cannot swap the contents of an interior node with a leaf
         * successor that is pinned by "next" pointers that are accessible
         * independently during traversal. So instead we swap the tree
         * linkages. If the current tree appears to have too few nodes,
         * the bin is converted back to a plain bin. (The test triggers
         * somewhere between 2 and 6 nodes, depending on tree structure).
         */
        final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                                  boolean movable) {
            ...
        }
    
        /**
         * Splits nodes in a tree bin into lower and upper tree bins,
         * or untreeifies if now too small. Called only from resize;
         * see above discussion about split bits and indices.
         *
         * @param map the map
         * @param tab the table for recording bin heads
         * @param index the index of the table being split
         * @param bit the bit of hash to split on
         */
        final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
            ...
        }
    
        /* ------------------------------------------------------------ */
        // Red-black tree methods, all adapted from CLR
    
        static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
                                              TreeNode<K,V> p) {
            ...
        }
    
        static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root,
                                               TreeNode<K,V> p) {
            ...
        }
    
        static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
                                                    TreeNode<K,V> x) {
            ...
        }
    
        static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root,
                                                   TreeNode<K,V> x) {
            ...
        }
    
        /**
         * Recursive invariant check
         */
        static <K,V> boolean checkInvariants(TreeNode<K,V> t) {
            ...
        }
    }
    
    引入红黑树之后PUT的变化
    
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
    

    大致实现如下:

    • 计算出key的hash值(hash算法和1.7有些变化,但是对于null值的除了还是一样,都是0)
    • 先判断哈希表是否已经扩充,若没有对其扩充
    • 根据hash值计算当前要插入的位置如果为空,则新建NODE并存储
    • 如果当前要插入的位置不为空
      • 判断key是否存在,如果存在则直接覆盖
      • 如果不存在
        • 判断当前的存储结构是否是红黑树(TreeNode),如果是红黑树,则执行红黑树的插入操作
        • 如果当前存储结构为链表(Node),则判断长度是否已经达到8,如果达到8则使用红黑树替换链表并且执行插入操作;如果还没达到8则对链表执行插入操作
    • 如果map的长度超过threshold的话,将会进行rehash操作
    引入红黑树之后GET的变化
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
    

    大致实现如下:

    • 通过key计算出hash值
    • 通过hash值拿到哈希表中的对于节点,如果为null,返回null
    • 如果节点为红黑树,则通过红黑树遍历拿到对于的value
    • 如果节点为链表,则通过链表遍历拿到value
    引入红黑树对性能的提升(Java8之后)

    红黑树(Red–black tree)是一种自平衡二叉查找树,它可以在O(logn)时间内做查找,插入和删除

    所以,当使用链表(O(n))的时候,如果继续变成那么查找时间的增长梯度会比红黑树(O(logn))快,那么使用红黑树在n(大于8)比较大的情况下性能将会提升。

    总结

    • 本文这是对HashMap的底层实现和重要的两个API:PUTGET的大致描述
    • 如果想深入了解HashMap中hash算法、rehash操作等实现深入的话可以自行看JDK源码(源码真的很赞)
    • 对红黑树的实现感兴趣可以自行了解

    参考:
    Java中HashMap原理分析
    Java 8系列之重新认识HashMap

    相关文章

      网友评论

        本文标题:HashMap底层实现及源码分析

        本文链接:https://www.haomeiwen.com/subject/owihmftx.html