HashMap1.7

作者: virtual灬zzZ | 来源:发表于2022-01-24 18:15 被阅读0次

HashMap1.7
HashMap1.7
ConcurrentHashMap
HashMap1.7源码分析
大厂面试系列(十三):Java基础
HashMap及ConcurrentHashMap源码分析
hashmap1.7 循环链表图解
HashMap1.7和ConcurrentHashMap1.7原
HashMap1.7和1.8的实现原理
Hashmap1.7和1.8区别+ConcurrentHashm

1.7 HashMap

主要构成：

HashMap的K&V对，主要由内部类Entry来描述。

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

数据模型：

模型主要是采用数组+单链表组成，图形如下：

重要属性：

默认初始化容量
 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

最大的容量
static final int MAXIMUM_CAPACITY = 1 << 30;

默认负载因子，用于扩容
static final float DEFAULT_LOAD_FACTOR = 0.75f;

空数组，仅仅用于判断
static final Entry<?,?>[] EMPTY_TABLE = {};

数组部分
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

总容量大小
transient int size;

阈值，用于扩容做参考临界点， capacity * load factor
int threshold;

负载因子，绝顶阈值
final float loadFactor;

简要概括inti、put、get：

init
初始一个HashMap，可以传入容量大小和负载因子，不传容量大小的话默认是16，负载因子默认0.75，初始化的时候，仅仅是设置参数而已，还没赋予table数组的内存空间，即还没有new对象。真正赋予内存空间，是第一次执行put操作的时候，后续源码会详细讲解。
put
传入HashMap的是Key-Value对，进行put的时候，主要首先对Key进行hash扰动计算，计算出在数组的哪个下标index，然后采用头插法的方式进行添加，如果该index上的单链表，key存在就替换，不同就用头插法顶替。所谓头插法，即是新的不同Key数据会顶替原有在该数组这个下标的数据，然后这个新的Entry就会成为这个table[index]的新主人，而oldEntry就自然是newEntry的nextEntry。如果put的时候，size已经达到了阈值threshold，而且计算出的这个table[index]上的数据不为空，就会进行resize扩容。这是两个条件，缺一都不可进行扩容，这点需要注意。还有就是key可以是null，直接方法table[0]处。
get
get就比较简单点，因为key可以是null，如果是则直接在table[0]处遍历其单链表查找数据，如果不是null，就计算出key的hash，找到相应的index，然后在其单链表中找到匹配该key的value即可。

源码分析：

初始化函数：（仅仅赋予属性值）
不带这两个参数，会给默认的，最终还是调用以下代码。

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init(); 空函数
    }

put函数

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

从第一行就可以看到，如果数组table是空的，就会根据阈值初始化table，阈值一开始等于传入的capacity参数。查看inflateTavle(thresold)

private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);  取比自己大的最靠近的2次幂的数值,作为数组table的size

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);   threshold = capacity * loadfactor
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);

如果key==null，直接执行putNullKey方法，在table[0]处找key==null的，如果没找到就头插法，找到就替换并返回oldValue，addEntry() 有resize扩容，这里不展开。

private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

如果key不是null，就hash计算它所在的下标index，在tables[index]下，开始遍历其单链表，如果存在key一样，就替换返回oldValue，不存在就成为table[index]的新主人，原来的数据变为其的next。createEntry(int hash, K key, V value, int bucketIndex) 方法写得很清楚。

resize扩容：

void addEntry(int hash, K key, V value, int bucketIndex) {
         必须注意这个条件
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

必须要主要这个条件， **if ((size >= threshold) && (null != table[bucketIndex])) **，&& 两者必须为true才会resize。

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

首先是将oldTable的大小*2作为newTable的capacity，然后进行transfer()转移,最后就是计算出threshold，查看transfer()

rehash是true
void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

遍历oldTable，从oldTable的数组开始，逐个index及其的单链表下的所有元素都进行循环，key的rehash，整理到newTable中，这里进行个比喻，如果rehash和原来的一样，这些key入newTable的顺序是逆序的，即原来 1》2》3的顺序，如果它们rehash之后在newTable还是一样的下标的话，它们的顺序就是3》2》1。

关于hashMap不安全的原因

hashMap在并发操作的时候，是可能发生死链和丢失的，主要是在于扩容的时候，头插法是导致出现问题的要因，主要在transfer()方法中：

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next; ①
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

死链问题

首先说一下hash()方法：sun.misc.Hashing.stringHash32()会强制去主内存读数据

final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

首先我们试想下某个index的单链表如下：
3->7->5->null

假设最后正常扩容如下：
7->3->null
5->null

假设有两个线程A和B，线程A执行到 ① 处，就让出了CPU时间片，而这时线程B等到了CPU的控制权，它一下子就完成了全部操作，并把newTable赋予给了table全局变量，而线程A是浑然不知的，待A再拿到CPU控制权时，它边继续往下执行。

此时对于线程A，e=3，e.next =7，继续执行下去，前提需要说明，这时table全局变量已经是新的了，正常扩容的，所以 7.next=3，3.next=null，它们是enrty，是对象，现在线程操作的是它们的引用。这点需要明确。

对于A的newTable，现在是 3->null，e=7，接着(e!=null)循环while
7的hash和3一致，和线程B一样相同hash进入相同的table[index]，所以对于这个index，现在是e=7，e.next=3 ，接着 7->3->null，之后e=3，接着while循环
e=3，e.next= null，接着3->7->3，后续e=null，不再while循环，这时已经形成死链，3->7->3是绕圈。

后续会将线程A这个newTable赋予给全局变量table，如果调用put、get或者transfer时，如果命中此死链，将导致死循环！

数据丢失

假设一开始是如下所示

线程A和B存在如下顺序

线程A执行到第一处被挂起，此时e为3，next为5
线程B执行完，得到如下结果
此时线程A继续执行，将3放在指定位置，然后下次循环去放5，放完5由于5的next指向了null，故本次扩容结束，对于线程A和线程B，他俩有各自线程私有的newTable，其中A是正确的，但是线程A先执行了table=newTable进行赋值，线程B后执行，导致B覆盖了A，产生数据丢失

参考：
JDK1.7中HashMap导致的死链以及数据丢失问题
 HashMap1.7的死循环的产生
 这是一份全面 & 详细的HashMap 1.7源码分析指南

HashMap1.7
存储流程数组元素 & 链表节点的实现类 HashMap中的数组元素 & 链表节点采用 Entry类实现，如...
HashMap1.7
1.7 HashMap 主要构成： HashMap的K&V对，主要由内部类Entry来描述。数据模型：模型主要...
ConcurrentHashMap
HashMap 在并发环境下HashMap1.7是线程不安全的，在扩容时可能会造成环状链表HashTable 是...
HashMap1.7源码分析
HashMap在日常开发中基本是天天见的，而且都知道什么时候需要用HashMap，根据Key存取Value，但是存...
大厂面试系列(十三):Java基础
Java基础 hashmap1.7跟1.8？优化点？红黑树化为什么是8？退化为什么？ dp怎么玩？回溯怎么玩？递归...
HashMap及ConcurrentHashMap源码分析
HashMap hashMap1.7的数据结构 1.7的结构如下图，底层是一个大的Entry数组，每个数组元素为一...
hashmap1.7 循环链表图解
HashMap1.7和ConcurrentHashMap1.7原
面试的时候经常问到。这里对jdk1.7的HashMap和ConcurrentHashMap原理进行分析。下篇将详细...
HashMap1.7和1.8的实现原理
概述 HashMap基于Map接口实现，元素以键值对的方式存储，并且允许使用null键和null值，因为key不允...
Hashmap1.7和1.8区别+ConcurrentHashm
Hashmap JDK1.7中使用一个Entry数组来存储数据，用key的hashcode取模来决定key会被放到...