聊聊HashMap原理

作者: 李孝伟 | 来源:发表于2018-03-28 23:00 被阅读50次

能力要进阶，必然就要多了解了解原理。看了一些Java的数据结构，简单的总结下HashMap原理。
所谓map就是一个个key、value对，value允许重复，key必须唯一；那HashMap从字面上看就是加入了Hash的Map。那HashMap到底是什么类型呢？

HashMap的底层结构是由数组和链表组成；每个对象都是Map.Entry对象，Entry存储着：hash、key、value、next（下一个Entry指针）
数组：寻址快，但是不方便删除和插入；
链表：方便删除和插入，但是遍历速度慢；
两者结合，就有事情发生。

HashMap初始化的时候是分配默认大小为16的数组table，每个数组里存放着Entry对象。最多的操作就是put() 和 get()；

说说put(k,v)：

1. 拿到key之后，先会计算一下key的hash值 ；

2. 再用这个hash值%（模）数组的长度length-1，这样就可以得到一个不超过数组长度的坐标i；

3. 这个坐标就是这个value要保存在数组上的位置。一般的数组保存都是table[i]=entry(hash,key,value,next)；但是不能保证index=hash%length后就一定都不一样，所以这种直接等于的方式就会覆盖，这种写入是不可取的；（index相同就是数据碰撞）

4. 那同一个坐标上的entry就用一个链表list保存，数组上保存最新插入的entry即链表头；先记录oldEntry=table[i]，然后将table[i]=newEntry ，然后用newEntry.next指向oldEntry；这里oldEntry与一个个newEntry就形成了一个链表；

当然这只是大流程，这里还有很多细节和逻辑，后面代码再续。

get()就更好理解了，通过key的hash值找到链表，再遍历列表，如果hash和key相等，就返回这个entry的value；

下图：纵列0～15是数组（默认16个），每个横是一个链表，如496是一个entry；

image

大体了解后，我们就一起看看代码，在此之前再说几点细节：

1. 为了快速遍历，就会尽可能的将entry保存在数组上（每个链表尽可能只有一个节点），所以当所有元素数快达到数组大小的时候，就会扩大数组；
2. 为了优化查找速度，数组大小一般是2的n次幂；……

初始化函数：initialCapacity 指定初始化大小（默认16），loadFactor 实际容纳元素因子默认0.75

  public HashMap(int initialCapacity, float loadFactor) {

     if (initialCapacity < 0)
         throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);

     if (initialCapacity > MAXIMUM_CAPACITY)
         initialCapacity = MAXIMUM_CAPACITY;

     if (loadFactor <= 0 || Float.isNaN(loadFactor))
         throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

     // Find a power of 2 >= initialCapacity

     int capacity = 1;
     //注意这个循环，设置一个2的n次幂数，大于指定初始值，后续再讲为何是2的n次幂
     while (capacity < initialCapacity)
         capacity <<= 1;

     this.loadFactor = loadFactor;
     //threshold 即当前hashmap的最大容量
     threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
     //table 即初始化的数组
     table = new Entry[capacity];

     useAltHashing = sun.misc.VM.isBooted() &&
             (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
     init();
 }

通过初始化函数，可以看出本质就是一个数组。

说下存储数据的元素：

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key; //键
    V value;//值
    Entry<K,V> next;//下一个entry指针
    final int hash;//hash值
    ……
}

认识下put(key,value)源码：

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
public V put(K key, V value) {
        //其允许存放null的key和null的value，当其key为null时，调用putForNullKey方法，放入到table[0]的这个位置
        if (key == null)
            return putForNullKey(value);
        //通过调用hash方法对key进行哈希，得到哈希之后的数值。该方法实现可以通过看源码，其目的是为了尽可能的让键值对可以分不到不同的桶中
        int hash = hash(key);
        //根据上一步骤中求出的hash得到在数组中是索引i
        int i = indexFor(hash, table.length);//这个很重要，看后面函数说明
        //如果i处的Entry不为null，则通过其next指针不断遍历e元素的下一个元素，如果找到相同hash和key则进行数据替换，put结束
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        //如果没有找到，则进行添加；
        modCount++;//注意这个参数，用来做线程安全保护的，后续再说明，相当于版本号
        addEntry(hash, key, value, i);
        return null;
}
//hash函数，对k的hashcode进行高位运算，减少hash冲突
final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }
        //得到k的hashcode值
        h ^= k.hashCode();
        //进行计算
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
}
/**
     * Returns index for hash code h.
     */
static int indexFor(int h, int length) {  
    return h & (length-1);
}
//这个函数就非常巧妙的用&代替了%运算，因为数组的大小总是2的n次幂；(详见初始化函数)
//&运算速率高于%运算

/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
void addEntry(int hash, K key, V value, int bucketIndex) {
      //size记录了当前hashmap的元素数，在createEntry函数有说明；
      //threshold是当前hashmap最大容量，由capacity*loadFactor计算来，详见初始化函数；
      //当达到上限时进行扩容，扩容是非常耗资源的，所以初始化尽量合适的大小
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);//扩容操作，多线程死循环的根源
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
}
//在bucketIndex位置上创建entry
void createEntry(int hash, K key, V value, int bucketIndex) {
        // 获取指定 bucketIndex 索引处的 Entry
        Entry<K,V> e = table[bucketIndex];
        // 将新创建的 Entry 放入 bucketIndex 索引处，并让新的 Entry.next 指向原来的 Entry
        table[bucketIndex] = new Entry<>(hash, key, value, e);//最新的entry当链表头
        size++;
}
void resize(int newCapacity)
{
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    ......
    //创建一个新的Hash Table
    Entry[] newTable = new Entry[newCapacity];
    //将Old Hash Table上的数据迁移到New Hash Table上
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}
void transfer(Entry[] newTable)
{
    Entry[] src = table;
    int newCapacity = newTable.length;
    //下面这段代码的意思是：
    //  从OldTable里摘一个元素出来，然后放到NewTable中，原顺序上1234会变成4321，当然因为扩容了，所以不会4个仍在一起。
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;//
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];// 这个是导致多线程死循环的根源！！！
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
多线程不安全的原因：比如一个链条上有entry1->entry2->entry3，当两个线程都在执行扩容时，假设entry1和entry3又都hash到了同一个链表上，线程一获取了entry1后挂起，线程二顺利完成扩容，即entry1->entry3，当线程一继续执行后，entry1.next=newtable[i] 即entry3；此时就产生了entry1.next=entry2,entry2.next=entry1死循环！！！
参考：https://coolshell.cn/articles/9606.html

//读取比较好理解
/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }
    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

Fast-Fail机制
还记得modCount参数么，每次put都会+1，这里就相当于hashmap的一个版本号，迭代过程中就会检测是否有变化，如果有就会抛出异常。

HashIterator() {
    expectedModCount = modCount;
    if (size > 0) { // advance to first entry
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)  
        ;
    }
}
final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();

再说下遍历方式：

//第一种比较高效
　　Map map = new HashMap();
　　Iterator iter = map.entrySet().iterator();
　　while (iter.hasNext()) {
　　Map.Entry entry = (Map.Entry) iter.next();
　　Object key = entry.getKey();
　　Object val = entry.getValue();
　　}

//第二种相对较低
Map map = new HashMap();
　　Iterator iter = map.keySet().iterator();
　　while (iter.hasNext()) {
　　Object key = iter.next();
　　Object val = map.get(key);
　　}

聊聊HashMap原理

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读