美文网首页
HashMap 1.8 源码简读

HashMap 1.8 源码简读

作者: tinyvampirepudg | 来源:发表于2019-07-13 06:57 被阅读0次

    HashMap 1.8 源码简读

    初始大小:默认为16

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    

    如果指定了大小,则会在初始化的时候将初始容量大小设置为大于等于指定初始值、且最小的2的整数次幂。

    原理:
    ①在HashMap的构造方法HashMap(int initialCapacity, float loadFactor)HashMap(int initialCapacity)中,会通过tableSizeFor方法将threshold的值置为2的整数次幂。
    ②在第一次调用put(K key, V value)方法添加元素时,会调用resize()方法来初始化table,而在resize()方法中,此时初始容量会被threshold的值代替。

    关键代码如下所示:

    public HashMap(int initialCapacity, float loadFactor) {
        ...
        this.threshold = tableSizeFor(initialCapacity);
    }
    

    请看resize()方法中的这行注释:initial capacity was placed in threshold

    final Node<K,V>[] resize() {
        ...
        if (oldCap > 0) {
            ...
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            ...
        }
        ...
    }
    

    如上所示,HashMap在指定了初始化容量之后,初始容器的大小会变为大于等于指定初始值、且最小的2的整数次幂。

    tableSizeFor方法如下所示:

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
    

    返回大于等于输入参数且最近的2的整数次幂的数.
    测试结果如下:

    tableSizeFor(0):1
    tableSizeFor(1):1
    tableSizeFor(2):2
    tableSizeFor(3):4
    tableSizeFor(4):4
    tableSizeFor(5):8
    tableSizeFor(6):8
    tableSizeFor(7):8
    tableSizeFor(8):8
    tableSizeFor(9):16
    

    装载因子和动态扩容:

    装载因子:默认为0.75,可在HashMap初始化时设置。

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    

    扩容规则是新的容量是原来容量的两倍。
    HashMap中table数组的初始化和动态扩容的方法是resize(),关于扩容容量的代码如下:
    注意这行代码:newCap = oldCap << 1

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        ...
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            ...
        else { // zero initial threshold signifies using defaults
            ...
        }
        ...
    }
    
    触发扩容操作的阈值:threshold

    计算规则:threshold = capacity * loadFactor
    变量定义如下:

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;
    

    扩容触发条件:数据数量大于阈值时。
    代码在resize()方法中:

    if (++size > threshold)
        resize();
    

    触发扩容操作以后,会在resize()方法中进行数据迁移。是一次性数据迁移,细节请看源码。

    散列冲突解决方法:链表法加红黑树

    链表转化成红黑树,红黑树转化成链表。
    1、链表转化成红黑树:
    HashMap#putVal方法中,当在该角标下,原有存储的链表结点个数大于等于7的时候,就将链表转化为红黑树。具体方法代码如下:

    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);
    

    具体的转换过程请看treeifyBin方法。

    2、红黑树转化成链表:
    当冲突以红黑树形态情况下,进行扩充时,将树转成两棵树,若树的的结点数小于等于UNTREEIFY_THRESHOLD,则转为链表形式。
    细节请看split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)方法。

    HashMap中的散列函数

    hash方法如下:

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    

    在插入或查找的时候,计算key被映射到的位置

    int index = hash(key) & (capacity - 1)
    

    JDK HashMap中hash函数的设计,确实很巧妙:

    首先hashcode本身是个32位整型值,这个值对于不同的对象必须保证唯一(Java规范),这也是大家常说的,重写equals必须重写hashcode的重要原因。

    获取对象的hashcode以后,先进行移位运算,然后再和自己做异或运算,及hashcode ^ (hashcode >>> 16),这一步甚是巧妙,是将高16位移到低16位,这样计算出来的整型值将“具有”高位和低位的性质。

    最后,用hash表当前的容量减去一,再和刚刚计算出来的整型值做位与运算。进行位与运算,很好理解,是为了计算出数组中的位置。但这里有个问题:为什么要用容量减去一?
    因为 A % B = A & (B - 1),所以(h ^ (h >>> 16)) & (capitity -1) = (h ^ (h >>> 16)) % capitity,可以看出这里本质上是使用了[除留余数法]。

    综上,可以看出,hashcode的随机性,加上移位异或运算,得到一个非常随机的hash值,再通过[除留余数法],得到index,整体的设计过程和“散列函数”设计原则非常吻合。

    参考:

    https://time.geekbang.org/column/article/64586 的第一条评论

    Java8 HashMap之tableSizeFor

    散列表——维基百科

    Java8-HashMap 详解

    相关文章

      网友评论

          本文标题:HashMap 1.8 源码简读

          本文链接:https://www.haomeiwen.com/subject/dughkctx.html