22.源码阅读（jdk1.6 HashMap源码和原理分析）

作者: 任振铭 | 来源:发表于2018-09-23 15:56 被阅读13次

22.源码阅读（jdk1.6 HashMap源码和原理分析）
ConcurrentHashMap 原理和源码分析(一)
Q&A-01 Java基础
HashMap源码分析——put和get（三）
java源码分析之LinkedHashMap
HashMap源码分析——put和get（二）
HashMap1.8源码分析
ConcurrentHashMap 源码分析（Java vers
java源码分析之HashMap（三）
Java集合框架—HashMap—源码研读1

HashMap 底层采用数组 + 链表的的实现方式来降低数据插入和查询的时间复杂度,理想状态下可以实现时间复杂度位O(1)，今天就从源码的角度看一下它是如何实现的。我们从它的两个关键方法put和get入手。

put方法

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        //数组中i位置存在Entry对象
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        //不存在则创建一个新的Entry加入数组，并将计数器加1
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

这里有两个关键方法hash和indexFor，hash()方法的作用是对key的hashCode值进行一系列的位运算，看下边的源码

/**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     * 注释的意思大概是说下边代码是对给定的哈希值应用补充散列函数
     * 我们只需要知道一点，经过这些位运算之后，hash值的散列效果会
     * 得到优化即可
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

再看indexFor方法,将得到的hash值与hashMap中数组的长度-1的值进行与运算，得到的就是这个元素将会被放置在数组中的哪个位置的index，并且绝不用担心发生数组越界之类的问题，原因请自行发现

   /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

我们再回到上边的put方法中，获取到当前元素将要插入的index后，会从当前的这个内置的数组中查找这个index位置是否已经存放了Entry，如果有则变量这个index位置的链表，判断是否有相同key值的元素存在（当前的这个数组指的就是在new HashMap的时候默认创建出来的一个Entry数组，它有一个默认长度，此时在没有进行put操作的时候，数组中元素都是null）如果此时是第一次put，那么数组中是空的，直接绕过这个for循环往下执行。如果进行put操作的时候，数组中存这个index，那么会进入这个for循环，拿到i这个位置的单链表，遍历链表中所有的Entry,如果有相同key的元素，则找到它，用新的value值替换旧value，并返回旧value

for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

继续往下，如果不存在，那么创建新的Entry对象
addEntry(hash, key, value, i)
hash：key值计算得到的hash值
i: 经过hash和indexFor方法计算得到的被put的key，value即将被放入数组中的位置

   /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        //拿到数组中bucketIndex位置的Entry对象，第一次执行，他是null
    Entry<K,V> e = table[bucketIndex];
        //new了一个Entry对象放在这个bucketIndex位置上
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        //判断是否已经达到了需要扩容的条件，如果是则对HashMap进行扩容
        if (size++ >= threshold)
            resize(2 * table.length);
    }

Entry构造方法中第四个参数很重要，我们知道HashMap是基于数组和单链表的，数组就是指的Entry数组，而单链表呢？就在Entry对象的这个next属性上

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
}

添加一个Entry到数组中某一位置后，会同时给它指定它的next，这个next就是它所指向的下一个Entry元素，这样，每次添加一个Entry都会把这个Entry加入到这个链表的第一个位置，它的next就是原来在第一个位置的Entry对象，这样就连成了一个链子

添加了第一个Entry到数组中某一index后，它的next指向的是一个null，因为在没有添加的时候，这个index的位置存放的就是null值

此时我们的数组中添加了第一个Entry，此时会判断当前数组的长度是否已经达到了极值（每次添加都会进行一次判断），如果是则对数组进行扩容，执行 resize(2 * table.length)可以看到，每次会扩容一倍的长度

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        //判断当前数组的长度是否已经达到了设置的MAXIMUM_CAPACITY
        //如果是则将threshold （达到这个值就满足扩容的条件），设置为int的最大值
        //那么此时hashMap停止扩容，可见，hashMap容量是有极限的
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        //创建一个新的Entry数组
        Entry[] newTable = new Entry[newCapacity];
        //将旧的数组中的值经过转换放入新的数组中
        transfer(newTable);
        //使用扩容后的数组
        table = newTable;
        //扩容阈值更新
        threshold = (int)(newCapacity * loadFactor);
    }

HashMap扩容的原理也很简单，因为它是基于数组实现的，所以所谓的扩容并不是将原有的数组长度扩大，而是创建了一个新的数组，这个数组的长度等于扩容后的长度，然后将旧数组中的元素转移到新数组中，旧的数组废弃，这个转移的过程仍然是经过了一系列的变换的，我们知道之前的数组中每一个put进来的元素的index都是经过一系列的位运算得到的，那么这里在进行转移的时候同样要再次进行运算得到新的index，因为这个index是基于hash值和数组长度生成的，hash值不会改变但是数组长度经过扩容后是不同了，所以要重新计算，否则会导致查询数据失败

   /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        //遍历旧的数组中所有的Entry对象
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            //拿到index位置的Entry后，如果这个Entry不为null，则循环遍历这个链表
            if (e != null) {
                src[j] = null;
                do {
                    //拿到链表上每一个Entry
                    Entry<K,V> next = e.next;
                    //计算这个Entry在扩容后数组中的index值
                    int i = indexFor(e.hash, newCapacity);
                    //设置这个Entry的next值
                    e.next = newTable[i];
                    //把这个Entry放在计算得到的index位置上
                    newTable[i] = e;
                    //赋值获取下一个Entry，直到这个链表遍历完成
                    e = next;
                } while (e != null);
            }
        }
    }

源码是这样的先后顺序放置的，如果你把顺序调换一下会更好理解。这里我们先调整一下代码的顺序如下（不影响结果，但是更易理解）。大概过程就是假设我第一个找到了index=1的位置的这个单链表（这个链表可能只有一个元素，也可能有多个），我会获取到这个位置的第一个Entry,计算得到它在新数组中的index,然后把这个Entry放在这个数组的index位置上，将它的next指定为原本这个位置的值，其实就是null,这是第一步，第二步就是获取到第一个Entry的next值，也就是链表中第二个位置的Entry，把这个Entry重新指定为e，重复上边的操作，这样一整个过程下来，旧的数组中的所有元素都被正确的放置在了新数组中的正确位置上，达到了扩容的效果

  do {
        //计算这个Entry在扩容后数组中的index值
        int i = indexFor(e.hash, newCapacity);
        //把这个Entry放在计算得到的index位置上
        newTable[i] = e;
       //设置这个Entry的next值
        e.next = newTable[i];

        //拿到链表上每一个Entry
        Entry<K,V> next = e.next;
        //赋值获取下一个Entry，直到这个链表遍历完成
        e = next;
  } while (e != null);

get方法

put看完之后get就简单多了。根据传入的key值，进行hash和indexFor运算，得到key在数组中的index，然后拿到数组中index位置的entry对象，再从这个位置的单链表中查找key值相同的value返回

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

总结一下：
1.HashMap底层基于数组和单链表
2.hash()和indexFor()方法作用很关键
3.如果你去看看native曾hashCode值获取的方式，你会得到两个结论(hashCode值并不是简单的地址值，而是地址值进行右移16位之后的一个值，参阅ndk底层c++代码实现)：

两个不同的对象 hashCode 值可能会相等，hashCode 值不相等的两个对象肯定不是同一对象。

附上手写的HashMap代码


public class MyHashMap<K,V> {
    
    public MapEntry<K,V>[] table;
    
    int size;
    
    /**
     * 扩容阈值,达到这个值时就要扩容
     */
    int threshold = 8;
    
    /**
     * 扩容因子
     */
    final float loadFactor = 0.75f; 
    
    public class MapEntry<K,V>{
        private K key;
        private V value;
        MapEntry<K,V> next;
        int hash;
        
        public MapEntry(int hash,K key,V value,MapEntry<K,V> next) {
            this.key = key;
            this.value = value;
            this.hash =hash;
            this.next = next;
        }
    }

    public V put(K key,V value) {
        if(table == null) {
            table = new MapEntry[8];
        }
        
        //判断key为null
        if(key == null) {
            return null;
        }
        
        int hash = hash(key);
        int index = getIndex(hash,table.length);
        System.out.println("key = "+key+" hash="+hash+" index="+ index);
        //判断这个key是否存在
        for (MapEntry<K,V> e = table[index]; e!=null;e = e.next) {
            Object k = null;
            if(e.hash == hash && ((k = e.key) == key) || (key.equals(k))){
                V oldV = e.value;
                e.value = value;
                return oldV;
            }
        }
        
        //添加新的
        addEntry(hash,key,value,index);
        return null;
    }
    
    private void addEntry(int hash, K key, V value, int index) {
        //hash值相等，两个对象不一定相等，两个对象不相等，hash值有可能相等
        //hashCode值 从native看，mask_bits(value()>> hash_shift,hash_mask))
        //是地址右移16位的结果
        if(size >= threshold && table[index] != null) {
            //右移一位，扩容一倍
            resize(size << 1);
            //重新计算index
            index = getIndex(hash,table.length);
        }
        //添加
        createEntry(hash,key,value,index);
        size++;
    }

    private void createEntry(int hash, K key, V value, int index) {
        MapEntry<K, V> newEntry = new MapEntry<K, V>(hash, key, value, table[index]);
        table[index] = newEntry;
    }

    private void resize(int newCapacity) {
        MapEntry<K, V> [] newTable = new MapEntry[newCapacity];
        //将原来数组中的内容经过转换之后放入新数组
        transform(newTable);
        table = newTable;
        threshold = (int) (newCapacity*loadFactor);
        System.out.println("扩容之后：newCapacity="+newCapacity+" threshold="+threshold);
    }

    /**
     * 重新计算散列
     * @param newTable
     */
    private void transform(MyHashMap<K, V>.MapEntry<K, V>[] newTable) {
        int newCapacity = newTable.length;
        for(MapEntry<K,V> e:table) {
            while(null != e) {
                MapEntry<K, V> next = e.next;
                int index = getIndex(e.hash,newCapacity);
                //把原来数组中的e放在新的数组中index位置
                newTable[index] = e;
                //将这个e插入到index位置的第一个，将原来index位置的e指定给e的next
                e.next =  newTable[index];
                //把原来数组中某index位置这条链表上的每一个元素都重新归位
                e = next;
            }
        }
    }

    /**
     * 通过hash值找到table的index
     * @param hash
     * @return
     * 
     * & : 两位同时为“1”，结果才为“1”，否则为0
     */ 
    private int getIndex(int hash,int length) {
        return hash & length-1;
    }

    /**
     * ^ : 就是相同为0不同为1
     * @param key
     * @return
     */
    private int hash(K key) {
        int h = 0;
        h = key.hashCode();
        int h16 = h>>>16;
        return (key == null)?0:(h^(h16));
    }

    public V get(K key) {
        if(key == null) {
            return null;
        }
        MapEntry<K, V> entry = getEntry(key);
        return entry == null?null:entry.value;
    }

    private MyHashMap<K, V>.MapEntry<K, V> getEntry(K key) {
        int hash = hash(key);
        int index = getIndex(hash,table.length);
        
        for (MapEntry<K,V> e = table[index]; e!=null;e = e.next) {
            Object k = null;
            if(e.hash == hash && ((k = e.key) == key) || (key.equals(k))){
                return e;
            }
        }
        return null;
    }

    public int getSize() {
        // TODO Auto-generated method stub
        return size;
    }
}

22.源码阅读（jdk1.6 HashMap源码和原理分析）
HashMap 底层采用数组 + 链表的的实现方式来降低数据插入和查询的时间复杂度,理想状态下可以实现时间复杂度位...
ConcurrentHashMap 原理和源码分析(一)
通过之前几篇文章《HashMap原理和源码分析》《HashTable原理和源码分析》《LinkedHashMap...
Q&A-01 Java基础
1、HashMap的源码，实现原理，JDK8中对HashMap做了怎样的优化？在JDK1.6，JDK1.7中，H...
HashMap源码分析——put和get（三）
HashMap源码分析——put和get（三）链接上一节： HashMap源码分析——put和get（二） ...
java源码分析之LinkedHashMap
相关文章java源码分析之HashMap（一）java源码分析之HashMap（二）java源码分析之HashMa...
HashMap源码分析——put和get（二）
HashMap源码分析——put和get（二）链接上一节：HashMap源码分析——put和get（一） 2...
HashMap1.8源码分析
本文主要有关HashMap1.8源码分析原理和过程 1：HashMap的原理，内部数据结构如何？底层使用哈希表...
ConcurrentHashMap 源码分析（Java vers
在我之前的文章《HashMap 源码分析》中分析了HashMap的源码，众所周知，ConcurrentHashMa...
java源码分析之HashMap（三）
相关文章java源码分析之HashMap（一）java源码分析之HashMap（二）https://blog.cs...
Java集合框架—HashMap—源码研读1
前言：本篇为HashMap源码研读系列第一篇,主要分析HashMap中put()方法的源码。阅读前需要对Hash...