Java 8: How ConcurrentHashMap Is Implemented

Author: BlackJava | Published 2018-09-05 16:16


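More and more people use ConcurrentHashMap in place of HashMap, so I read through its source to learn from it. The internals turn out to be quite complex and very cleverly built; hats off to the JDK authors. We usually keep multiple threads at arm's length or lock them out altogether, whereas Doug Lea's design actually invites them in to help with the work. Below is my understanding from studying it; corrections are welcome.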

Features of the Java 8 ConcurrentHashMap

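1. Lazy initialization: the table is created on the first put, with a default size of 16.
2. Data is stored as an array of bins, each holding a linked list or a red-black tree.
3. put and remove are coordinated with CAS plus synchronized; get (reads) takes no lock.
4. Safe for concurrent use by multiple threads: put and remove on the same bin must first acquire that bin's lock.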
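From a caller's point of view, features 3 and 4 mean that threads can share one map without external locking. The usage sketch below (class and method names are my own, for illustration only) has several threads insert distinct keys and bump one shared key through `merge`, then checks that nothing was lost; it also shows that a null value is rejected. Thread timing varies between runs, but the final counts should not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative demo class (hypothetical name, not from the article)
class ChmUsageDemo {
    // `threads` threads each insert `perThread` distinct keys and also bump one
    // shared key with merge, which performs an atomic read-modify-write.
    static ConcurrentHashMap<String, Integer> run(int threads, int perThread)
            throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(); // table allocated on first put
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put("t" + id + "-" + i, i);       // disjoint keys: no lost inserts
                    map.merge("shared", 1, Integer::sum); // contended key: no lost increments
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) w.join();
        return map;
    }

    // putVal rejects null keys and null values with NullPointerException
    static boolean rejectsNullValue() {
        try {
            new ConcurrentHashMap<String, Integer>().put("k", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = run(4, 1000);
        System.out.println(map.size());         // 4001: 4 * 1000 distinct keys plus "shared"
        System.out.println(map.get("shared"));  // 4000
        System.out.println(rejectsNullValue()); // true
    }
}
```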
Let's go through the put implementation piece by piece.

put implementation -- part 1: initialization

  /**
 * Maps the specified key to the specified value in this table.
 * Neither the key nor the value can be null.
 *
 * <p>The value can be retrieved by calling the {@code get} method
 * with a key that is equal to the original key.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with {@code key}, or
 *         {@code null} if there was no mapping for {@code key}
 * @throws NullPointerException if the specified key or value is null
 */
public V put(K key, V value) {
    return putVal(key, value, false);
}

/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // null keys and null values are not allowed
    if (key == null || value == null) throw new NullPointerException();
    // re-mix the hash so its high bits also affect the bin index, reducing collisions
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            // table not created yet: initialize it
            tab = initTable();
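The body of `spread` is not shown above. In the JDK 8 source it XORs the high 16 bits of the hash code into the low 16 and clears the sign bit, so normal node hashes are never negative (negative hashes are reserved for special nodes such as `MOVED`). The small sketch below reproduces that formula together with the `(n - 1) & hash` bin index used by `putVal`; the class name `SpreadDemo` is hypothetical. It shows two hash codes that differ only in their high bits landing in the same bin of a 16-slot table until they are spread.

```java
// Illustrative class (hypothetical name); the constant and formula follow JDK 8's spread()
class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // usable bits of a normal node hash (sign bit cleared)

    // XOR the high 16 bits into the low 16, then clear the sign bit so that
    // normal hashes never look like the negative special values (MOVED, TREEBIN)
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Bin index for a power-of-two table length n, as computed in putVal
    static int binIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only in the high 16 bits
        System.out.println(binIndex(a, 16) + " " + binIndex(b, 16));                 // 0 0: same bin
        System.out.println(binIndex(spread(a), 16) + " " + binIndex(spread(b), 16)); // 1 2: different bins
        System.out.println(spread(-1));                                              // 2147418112 (0x7FFF0000)
    }
}
```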

The method that deserves a closer look here is initTable, which initializes the hash table.

initTable

   /**
 * Initializes table, using the size recorded in sizeCtl.
 */
private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    // initialize only while the table is still empty
    while ((tab = table) == null || tab.length == 0) {
        // sizeCtl < 0: another thread is already initializing the table
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        // CAS sizeCtl from sc to -1; the one thread that succeeds does the initialization
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                // re-check: the table may have been created before this thread won the CAS
                if ((tab = table) == null || tab.length == 0) {
                    // sc == 0 (no-arg constructor): use DEFAULT_CAPACITY, 16
                    // sc > 0: sizeCtl holds the initial capacity given to the constructor
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    // allocate the table
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    // threshold n - n/4, i.e. 0.75 * n
                    sc = n - (n >>> 2);
                }
            } finally {
                // publish the threshold; this also ends the -1 "initializing" state
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

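A note on sizeCtl. It is 0 by default, or holds the initial capacity if one was passed to the constructor. -1 means the table is being initialized. Any other negative value means a resize is in progress: the high 16 bits then hold a resize stamp and the low 16 bits hold 1 + the number of threads taking part in the resize. Once the table exists, sizeCtl holds the threshold used to decide when to grow. The other piece is compareAndSwapInt (U is a sun.misc.Unsafe instance), which takes four arguments: the object to operate on, the memory offset of the field (here SIZECTL, the offset of sizeCtl), the expected value, and the new value. If the field currently equals the expected value, it is set to the new value and the call returns true; otherwise nothing changes and it returns false. Since only one thread can move sizeCtl from sc to -1, only one thread ever enters the initialization branch.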
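To make the sizeCtl handshake concrete, here is a simplified model of `initTable` (hypothetical class `LazyTable`). It assumes `AtomicInteger.compareAndSet` as a stand-in for `U.compareAndSwapInt(this, SIZECTL, ...)` and a plain `Object[]` instead of `Node<K,V>[]`; the control flow otherwise mirrors the code above. Eight threads race to initialize, and the `allocations` counter (added only for this demo) shows that exactly one array is created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model (hypothetical class): AtomicInteger.compareAndSet stands in for
// U.compareAndSwapInt(this, SIZECTL, ...), and Object[] stands in for Node<K,V>[].
class LazyTable {
    static final int DEFAULT_CAPACITY = 16;
    final AtomicInteger sizeCtl = new AtomicInteger(0);    // 0: default size, -1: initializing, >0 later: threshold
    volatile Object[] table;
    final AtomicInteger allocations = new AtomicInteger(); // demo only: how many arrays were created

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null || tab.length == 0) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                                     // lost the race: wait for the winner
            } else if (sizeCtl.compareAndSet(sc, -1)) {             // only one thread gets past here at a time
                try {
                    if ((tab = table) == null || tab.length == 0) { // re-check after winning the CAS
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        allocations.incrementAndGet();
                        table = tab = new Object[n];
                        sc = n - (n >>> 2);                         // 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);                                // publish threshold, leave the -1 state
                }
                break;
            }
        }
        return tab;
    }

    // Starts `threads` threads that all call initTable at once
    static LazyTable race(int threads) throws InterruptedException {
        LazyTable t = new LazyTable();
        List<Thread> ts = new ArrayList<>();
        for (int i = 0; i < threads; i++) ts.add(new Thread(t::initTable));
        for (Thread th : ts) th.start();
        for (Thread th : ts) th.join();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        LazyTable t = race(8);
        System.out.println(t.allocations.get()); // 1
        System.out.println(t.table.length);      // 16
        System.out.println(t.sizeCtl.get());     // 12
    }
}
```

The `Thread.yield()` spin is acceptable here because initialization is short: losing threads simply re-read `table` until the winner publishes it.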

put implementation -- part 2: empty bin

        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }

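When the bin is empty, the node is inserted directly. No synchronized is used here; the insert is a CAS, which guarantees that only one thread succeeds in filling the bin. A thread whose CAS fails goes around the for (;;) loop again and re-reads the bin. A quick explanation of CAS: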

CAS: the lock-free workhorse (Compare And Swap)

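V: the variable to update
E: the expected value
N: the new value
If V equals E, V is set to N. If V differs from E, some other thread has already updated it, and the current thread does nothing.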
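The V/E/N description maps directly onto `AtomicReferenceArray.compareAndSet(i, expected, newValue)`. The sketch below (hypothetical class `CasBinDemo`) uses it as a stand-in for `casTabAt(tab, i, null, node)`: the first CAS into an empty slot succeeds, a second one fails, and when eight threads race for the same empty bin exactly one wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative model (hypothetical class): AtomicReferenceArray.compareAndSet plays the role
// of casTabAt(tab, i, null, node). V = bins[i], E = null, N = the new value.
class CasBinDemo {
    final AtomicReferenceArray<String> bins = new AtomicReferenceArray<>(16);

    boolean insertIfEmpty(int i, String value) {
        return bins.compareAndSet(i, null, value); // succeeds only if bin i is still empty
    }

    // `threads` threads race to fill the same empty bin; returns how many succeeded
    static int race(int threads, int bin) throws InterruptedException {
        CasBinDemo d = new CasBinDemo();
        AtomicInteger wins = new AtomicInteger();
        List<Thread> ts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String v = "t" + t;
            ts.add(new Thread(() -> {
                if (d.insertIfEmpty(bin, v)) wins.incrementAndGet();
                // in putVal, a loser would loop, re-read the now non-empty bin and take another branch
            }));
        }
        for (Thread th : ts) th.start();
        for (Thread th : ts) th.join();
        return wins.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasBinDemo d = new CasBinDemo();
        System.out.println(d.insertIfEmpty(3, "a")); // true: V was null, equal to E
        System.out.println(d.insertIfEmpty(3, "b")); // false: V is now "a"
        System.out.println(d.bins.get(3));           // a
        System.out.println(race(8, 5));              // 1
    }
}
```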

put implementation -- part 3: the bin's first node is the special value MOVED

         else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);

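Here we only outline it: if the head node's hash is MOVED, the head is a ForwardingNode and the table is being resized, so helpTransfer makes the current thread help with the resize. helpTransfer is fairly involved; we come back to it later.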

put implementation -- part 4: a regular bin

  else {
            V oldVal = null;
            // lock the bin by synchronizing on its head node
            synchronized (f) {
                // re-check that the head has not changed before the lock was taken
                if (tabAt(tab, i) == f) {
                    // hash >= 0: the bin is a linked list
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            // reached the end of the list: append the new node
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // tree bin: the head is a TreeBin that holds the tree root; its hash is TREEBIN (-2)
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        // insert into the red-black tree (not covered here; read up on red-black trees if needed)
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // binCount == 0: the head changed before the lock was taken, so loop and retry
            // binCount != 0: a node was inserted or updated in the list or tree
            if (binCount != 0) {
                // binCount reached TREEIFY_THRESHOLD (8): convert the list to a red-black tree
                if (binCount >= TREEIFY_THRESHOLD)
                    // treeifyBin also locks the bin; if the table is shorter than MIN_TREEIFY_CAPACITY (64) it resizes instead
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
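The `onlyIfAbsent` flag is the only difference between `put` and `putIfAbsent`: `put` calls `putVal(key, value, false)` and `putIfAbsent` calls `putVal(key, value, true)`, so in the branch above an existing node's `val` is overwritten only when the flag is false. A short usage sketch with the public API (the class name is hypothetical):

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative demo (hypothetical class name)
class OnlyIfAbsentDemo {
    static String[] run() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String r1 = map.put("k", "v1");         // null: no previous mapping
        String r2 = map.put("k", "v2");         // "v1": onlyIfAbsent = false, so the value is replaced
        String r3 = map.putIfAbsent("k", "v3"); // "v2": onlyIfAbsent = true, so the value is kept
        return new String[] { r1, r2, r3, map.get("k") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, v1, v2, v2]
    }
}
```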

put implementation -- part 5: updating the element count

    addCount(1L, binCount);
    return null;

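To sum up what put does: it first checks whether the table has been initialized and initializes it if not, then locates the bin. An empty bin receives the new node via CAS; a bin headed by a ForwardingNode means the current thread should help with the resize; otherwise the thread locks the bin and inserts or updates the entry in its linked list or red-black tree. Finally, addCount updates the element count.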

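Notice that put relies on two important methods we have not analyzed yet: helpTransfer (a thread helping with a resize) and addCount (updating the element count). Let's start with helpTransfer. First we need ForwardingNode, documented in the source as "A node inserted at head of bins during transfer operations." It is a special node placed at the head of a bin during a resize, and it signals that this bin has already been moved to the new table while the resize as a whole is not finished. A thread that runs into such a node is asked to help with part of the remaining work.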

helpTransfer

  /**
 * Helps transfer if a resize is in progress.
 */
final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
    Node<K,V>[] nextTab; int sc;
    // a resize is in progress and this bin (headed by a ForwardingNode) has already been moved
    if (tab != null && (f instanceof ForwardingNode) &&
        (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
        // 16-bit stamp identifying a resize of a table of this length
        int rs = resizeStamp(tab.length);
        while (nextTab == nextTable && table == tab &&
               (sc = sizeCtl) < 0) {
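            // While resizing, the high 16 bits of sizeCtl hold the resize stamp and the low 16 bits
            // hold 1 + the number of resizing threads. Stop without helping if the stamp differs,
            // the resize is finishing, the helper limit is reached, or no bins are left to claim.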
            if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                sc == rs + MAX_RESIZERS || transferIndex <= 0)
                break;
            // otherwise register as one more resizing thread (sc + 1) and help via transfer
            if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                transfer(tab, nextTab);
                break;
            }
        }
        return nextTab;
    }
    return table;
}
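The bit layout that `helpTransfer` checks is easier to see with numbers. The sketch below (hypothetical class `ResizeStampDemo`) copies JDK 8's `resizeStamp` formula and its constants `RESIZE_STAMP_BITS = 16` and `RESIZE_STAMP_SHIFT = 16`, builds the `sizeCtl` value that the first resizing thread installs in `addCount`, `(rs << RESIZE_STAMP_SHIFT) + 2`, and decodes it. For a table of length 16 the stamp is `27 | 0x8000` (16 has 27 leading zeros), and because bit 15 of the stamp lands in the sign bit, every resizing `sizeCtl` is negative.

```java
// Illustrative class (hypothetical name); constants and formulas follow the JDK 8 source
class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Same as resizeStamp(n): depends only on n, with bit 15 set
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // Value the first resizing thread CASes into sizeCtl (see addCount)
    static int startResize(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }

    static int stampOf(int sc) { return sc >>> RESIZE_STAMP_SHIFT; }

    // Low 16 bits hold 1 + the number of threads currently resizing
    static int resizers(int sc) { return (sc & 0xFFFF) - 1; }

    public static void main(String[] args) {
        int n = 16;
        int sc = startResize(n);
        System.out.println(resizeStamp(n));                // 32795
        System.out.println(sc < 0);                        // true: negative means "resizing"
        System.out.println(stampOf(sc) == resizeStamp(n)); // true
        System.out.println(resizers(sc));                  // 1: the thread that started the resize
        sc = sc + 1;                                       // a helper joins (the CAS sc -> sc + 1 above)
        System.out.println(resizers(sc));                  // 2
    }
}
```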

Next is the resize itself, transfer. This method is long, so we go through it in parts.

transfer -- part 1: preparing the resize

  /**
 * Moves and/or copies the nodes in each bin to new table. See
 * above for explanation.
 */
private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    int n = tab.length, stride;
    // stride: number of bins a thread claims at a time, (n >>> 3) / NCPU on multi-core machines
    // (the whole table on one CPU), but never less than MIN_TRANSFER_STRIDE (16)
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range
    // the thread that starts the resize allocates the new table, twice as large
    if (nextTab == null) {            // initiating
        try {
            @SuppressWarnings("unchecked")
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        nextTable = nextTab;
        // bins are handed out from the end of the table toward the front, starting at index n
        transferIndex = n;
    }
    int nextn = nextTab.length;
    // placed at the head of each old bin once that bin has been moved
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
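The stride formula is easy to evaluate by hand. A small sketch (hypothetical class `StrideDemo`), with `ncpu` passed in rather than read from the JDK's `NCPU` constant, which is `Runtime.getRuntime().availableProcessors()`:

```java
// Illustrative class (hypothetical name); formula from the start of JDK 8's transfer()
class StrideDemo {
    static final int MIN_TRANSFER_STRIDE = 16;

    // n: old table length; ncpu: number of available processors
    static int stride(int n, int ncpu) {
        int s = (ncpu > 1) ? (n >>> 3) / ncpu : n;
        return (s < MIN_TRANSFER_STRIDE) ? MIN_TRANSFER_STRIDE : s;
    }

    public static void main(String[] args) {
        System.out.println(stride(1024, 4));    // 32: (1024 / 8) / 4
        System.out.println(stride(64, 8));      // 16: (64 / 8) / 8 = 1 is raised to the minimum
        System.out.println(stride(1024, 1));    // 1024: one CPU, one chunk covering the whole table
        System.out.println(stride(1 << 20, 8)); // 16384
        System.out.println(stride(16, Runtime.getRuntime().availableProcessors())); // 16 on any machine
    }
}
```

On a single-CPU machine the stride is the whole table, so one thread does the entire transfer; on multi-core machines the table is cut into chunks of at least 16 bins that several threads can claim.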

transfer -- part 2: the core of the resize control

    boolean advance = true;
    // true once the final sweep before committing the new table has started
    boolean finishing = false; // to ensure sweep before committing nextTab
    // i: current bin; bound: lowest bin index of the range this thread has claimed
    for (int i = 0, bound = 0;;) {
        Node<K,V> f; int fh;
        // advance i (via --i) to the next bin of the claimed range,
        // claiming a new range once the current one is used up
        while (advance) {
            int nextIndex, nextBound;
            // next bin is still inside the claimed range (or we are in the final sweep)
            if (--i >= bound || finishing)
                advance = false;
            // transferIndex <= 0: no bins left to hand out
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            // claim a range by moving transferIndex down by stride;
            // this thread then handles bins nextBound .. nextIndex - 1
            else if (U.compareAndSwapInt
                     (this, TRANSFERINDEX, nextIndex,
                      nextBound = (nextIndex > stride ?
                                   nextIndex - stride : 0))) {
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        // this thread has no range left
        if (i < 0 || i >= n || i + n >= nextn) {
            int sc;
            // final sweep done: install the new table and set the threshold 2n - n/2 = 0.75 * 2n
            if (finishing) {
                nextTable = null;
                table = nextTab;
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }
            // this thread stops resizing: decrement the resizer count in sizeCtl
            if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                // other resizers still running: just leave. Only the last thread (sc - 2 == stamp << RESIZE_STAMP_SHIFT) sweeps and commits
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    return;
                finishing = advance = true;
                i = n; // recheck before commit
            }
        }
        // empty bin: CAS in the ForwardingNode to mark it as processed
        else if ((f = tabAt(tab, i)) == null)
            advance = casTabAt(tab, i, null, fwd);
        // head is already a ForwardingNode: this bin was processed, skip it
        else if ((fh = f.hash) == MOVED)
            advance = true; // already processed
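The range hand-out in the `advance` loop can be modeled on its own. In the sketch below (hypothetical class `RangeClaimDemo`), an `AtomicInteger` stands in for `transferIndex`; each successful CAS moves it down by `stride`, and the claiming thread then walks its chunk from `nextIndex - 1` down to `nextBound`. Running it single-threaded shows the chunk order for two table sizes; with several threads the same chunks would simply be shared among them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model (hypothetical class): AtomicInteger stands in for the transferIndex field
class RangeClaimDemo {
    final AtomicInteger transferIndex;
    final int stride;

    RangeClaimDemo(int n, int stride) {
        this.transferIndex = new AtomicInteger(n); // starts at the old table length
        this.stride = stride;
    }

    // Claims the next chunk like the advance loop: returns {bound, i}, or null when none are left.
    // The caller then processes bins i, i - 1, ..., bound.
    int[] claim() {
        while (true) {
            int nextIndex = transferIndex.get();
            if (nextIndex <= 0) return null; // every bin has been handed out
            int nextBound = (nextIndex > stride) ? nextIndex - stride : 0;
            if (transferIndex.compareAndSet(nextIndex, nextBound))
                return new int[] { nextBound, nextIndex - 1 };
            // CAS lost to another thread: re-read transferIndex and retry
        }
    }

    // Single-threaded walk over all chunks, formatted as "bound-i"
    static List<String> claimAll(int n, int stride) {
        RangeClaimDemo d = new RangeClaimDemo(n, stride);
        List<String> out = new ArrayList<>();
        int[] r;
        while ((r = d.claim()) != null) out.add(r[0] + "-" + r[1]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(claimAll(64, 16)); // [48-63, 32-47, 16-31, 0-15]
        System.out.println(claimAll(40, 16)); // [24-39, 8-23, 0-7]
    }
}
```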

transfer -- part 3: moving the nodes

        else {
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;
                    // linked-list bin
                    if (fh >= 0) {
                        int runBit = fh & n;
                        Node<K,V> lastRun = f;
                        // find lastRun: the start of the trailing run of nodes that share the same (hash & n)
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            int b = p.hash & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        // copy the nodes before lastRun one at a time, prepending each to ln or hn,
                        // so they end up in reverse order; the lastRun tail is reused as-is
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // low list goes to index i of the new table, high list to i + n
                        setTabAt(nextTab, i, ln);
                        setTabAt(nextTab, i + n, hn);
                        // mark the old bin as moved
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                    // tree bin
                    else if (f instanceof TreeBin) {
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> lo = null, loTail = null;
                        TreeNode<K,V> hi = null, hiTail = null;
                        int lc = 0, hc = 0;
                        for (Node<K,V> e = t.first; e != null; e = e.next) {
                            int h = e.hash;
                            TreeNode<K,V> p = new TreeNode<K,V>
                                (h, e.key, e.val, null, null);
                            if ((h & n) == 0) {
                                if ((p.prev = loTail) == null)
                                    lo = p;
                                else
                                    loTail.next = p;
                                loTail = p;
                                ++lc;
                            }
                            else {
                                if ((p.prev = hiTail) == null)
                                    hi = p;
                                else
                                    hiTail.next = p;
                                hiTail = p;
                                ++hc;
                            }
                        }
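                        // a half with UNTREEIFY_THRESHOLD (6) or fewer nodes becomes a plain list, otherwise a new TreeBin;
                        // if the other half is empty, the original TreeBin t is reused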
                        ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                            (hc != 0) ? new TreeBin<K,V>(lo) : t;
                        hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                            (lc != 0) ? new TreeBin<K,V>(hi) : t;
                        // place both halves in the new table
                        setTabAt(nextTab, i, ln);
                        setTabAt(nextTab, i + n, hn);
                        // mark the old bin as moved
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                }
            }
        }
    }
}
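The list branch of part 3 is the least obvious step, so here is a standalone sketch of it (hypothetical class `SplitBinDemo`, with a minimal node that holds only a hash; the real code also copies keys and values). Since the table doubles from `n` to `2n`, a node's new index is either `i` (when `hash & n == 0`) or `i + n`. In a length-16 table, hashes 1, 17, 33, 49 and 81 all sit in bin 1; `lastRun` is the node with hash 49, so nodes 49 and 81 are reused as the tail of the high list, while the nodes before it are copied and prepended, which reverses their order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical class) of the linked-list branch of transfer()
class SplitBinDemo {
    // Minimal immutable chain node: only a hash and a next pointer
    static final class Node {
        final int hash; final Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    static Node chain(int... hashes) { // builds a list in the given order
        Node head = null;
        for (int i = hashes.length - 1; i >= 0; i--) head = new Node(hashes[i], head);
        return head;
    }

    static List<Integer> hashes(Node p) {
        List<Integer> out = new ArrayList<>();
        for (; p != null; p = p.next) out.add(p.hash);
        return out;
    }

    // Splits bin f of an old table of length n into {ln, hn}
    static Node[] split(Node f, int n) {
        int runBit = f.hash & n;
        Node lastRun = f;
        for (Node p = f.next; p != null; p = p.next) { // find the trailing run with constant (hash & n)
            int b = p.hash & n;
            if (b != runBit) { runBit = b; lastRun = p; }
        }
        Node ln = (runBit == 0) ? lastRun : null;       // tail run is reused, not copied
        Node hn = (runBit == 0) ? null : lastRun;
        for (Node p = f; p != lastRun; p = p.next) {    // copy the earlier nodes, prepending each
            if ((p.hash & n) == 0) ln = new Node(p.hash, ln);
            else hn = new Node(p.hash, hn);
        }
        return new Node[] { ln, hn };                   // ln -> nextTab[i], hn -> nextTab[i + n]
    }

    public static void main(String[] args) {
        Node[] r = split(chain(1, 17, 33, 49, 81), 16);
        System.out.println(hashes(r[0])); // [33, 1]
        System.out.println(hashes(r[1])); // [17, 49, 81]
    }
}
```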

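That covers helping with a resize. To summarize: each thread entering transfer first claims a range of bins, then walks it bin by bin with --i. An empty bin gets a ForwardingNode; a bin that already starts with a ForwardingNode has been moved and is skipped; a list or tree bin is locked and its nodes are moved to the new table, after which a ForwardingNode is placed in the old bin as well.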

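Next is addCount. First, what the CounterCell array is for: when several threads update the element count at the same time, the CAS on baseCount can fail. Instead of retrying on that one contended field, a thread adds its delta to one of the CounterCells, so the current count is baseCount plus the values of all cells.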

addCount

  /**
 * Adds to count, and if table is too small and not already
 * resizing, initiates transfer. If already resizing, helps
 * perform transfer if work is available.  Rechecks occupancy
 * after a transfer to see if another resize is already needed
 * because resizings are lagging additions.
 *
 * @param x the count to add
 * @param check if <0, don't check resize, if <= 1 only check if uncontended
 */
private final void addCount(long x, int check) {
    CounterCell[] as; long b, s;
    // Fast path: counterCells has not been created and the CAS baseCount -> baseCount + x succeeds.
    // Otherwise (cells already in use, or the CAS lost a race) the delta goes to a CounterCell.
    if ((as = counterCells) != null ||
        !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        CounterCell a; long v; int m;
        boolean uncontended = true;
        // no cell for this thread yet, or the CAS on its cell failed too: fall back to fullAddCount
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
            !(uncontended =
              U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
            fullAddCount(x, uncontended);
            return;
        }
        if (check <= 1)
            return;
        s = sumCount();
    }
    if (check >= 0) {
        Node<K,V>[] tab, nt; int n, sc;
        // while the count has reached the threshold in sizeCtl, start or join a resize
        while (s >= (long)(sc = sizeCtl) && (tab = table) != null &&
               (n = tab.length) < MAXIMUM_CAPACITY) {
            int rs = resizeStamp(n);
            // sc < 0: a resize is already in progress; join it if work remains
            if (sc < 0) {
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                    transferIndex <= 0)
                    break;
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                    transfer(tab, nt);
            }
            // sc >= 0: start a new resize with sizeCtl = (stamp << RESIZE_STAMP_SHIFT) + 2, i.e. one resizer
            else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                         (rs << RESIZE_STAMP_SHIFT) + 2))
                transfer(tab, null);
            s = sumCount();
        }
    }
}
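The baseCount/CounterCell scheme is a striped counter; the JDK 8 source describes `CounterCell` as adapted from `LongAdder` and `Striped64`. Below is a much-simplified sketch of the idea (hypothetical class `StripedCounter`): try one CAS on a shared base, and on contention add to a randomly chosen cell instead; the total is the base plus all cells, just like `sumCount`. Unlike the real code, it has a fixed number of cells, always tries the base first, and picks the cell with `ThreadLocalRandom.nextInt` rather than the thread probe. For application code, `java.util.concurrent.atomic.LongAdder` already provides this.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.stream.IntStream;

// Simplified sketch (hypothetical class), not the JDK's CounterCell code
class StripedCounter {
    private final AtomicLong base = new AtomicLong(); // plays the role of baseCount
    private final AtomicLongArray cells;              // plays the role of counterCells

    StripedCounter(int stripes) { cells = new AtomicLongArray(stripes); }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) return; // uncontended: one CAS on the base
        // contended: put the delta into a randomly chosen cell instead of retrying the base
        int i = ThreadLocalRandom.current().nextInt(cells.length());
        cells.addAndGet(i, x);
    }

    long sum() { // like sumCount(): base plus every cell, read without locking
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) s += cells.get(i);
        return s;
    }

    public static void main(String[] args) {
        StripedCounter c = new StripedCounter(8);
        IntStream.range(0, 100_000).parallel().forEach(k -> c.add(1));
        System.out.println(c.sum()); // 100000
    }
}
```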

That wraps up put. A look at remove shows that its logic is essentially the same as put's.
Let's also look at a few other commonly used methods.

      /**
 * {@inheritDoc}
 */
public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

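So size() is not just baseCount: it also adds the CounterCell values mentioned above. baseCount holds the updates that succeeded with a direct CAS, and the CounterCells hold those diverted there under contention; only the two together give the size. Because nothing is locked while they are summed, the result under concurrent updates is an estimate.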

get

  /**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code key.equals(k)},
 * then this method returns {@code v}; otherwise it returns
 * {@code null}.  (There can be at most one such mapping.)
 *
 * @throws NullPointerException if the specified key is null
 */
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

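get is comparatively simple: reads take no lock at all. tabAt reads the bin with volatile semantics, and Node.val and Node.next are volatile, so a reader sees fully published nodes. A negative head hash (for example a ForwardingNode or a TreeBin) is handed to find, which searches the new table or the tree respectively.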

clear

  /**
 * Removes all of the mappings from this map.
 */
public void clear() {
    long delta = 0L; // negative number of deletions
    int i = 0;
    Node<K,V>[] tab = table;
    while (tab != null && i < tab.length) {
        int fh;
        Node<K,V> f = tabAt(tab, i);
        if (f == null)
            ++i;
        else if ((fh = f.hash) == MOVED) {
            // after helping with the resize, start again from the first bin of the new table
            tab = helpTransfer(tab, f);
            i = 0; // restart
        }
        else {
            // each non-empty bin is locked while it is cleared
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    // take the first node of the list or tree, then count the nodes
                    Node<K,V> p = (fh >= 0 ? f :
                                   (f instanceof TreeBin) ?
                                   ((TreeBin<K,V>)f).first : null);
                    while (p != null) {
                        --delta;
                        p = p.next;
                    }
                    // empty the bin
                    setTabAt(tab, i++, null);
                }
            }
        }
    }
    if (delta != 0L)
        // subtract the removed nodes from the count
        addCount(delta, -1);
}

That's all for this analysis. If anything here is wrong, please point it out!
