原子操作类-LongAdder、LongAccumulator、

作者: 王侦 | 来源:发表于2019-07-17 15:45 被阅读0次

1.Striped64

This class maintains a lazily-initialized table of atomically
updated variables, plus an extra "base" field. The table size
is a power of two. Indexing uses masked per-thread hash codes.
Nearly all declarations in this class are package-private,
accessed directly by subclasses.

Table entries are of class Cell; a variant of AtomicLong padded
(via @sun.misc.Contended) to reduce cache contention. Padding
is overkill for most Atomics because they are usually
irregularly scattered in memory and thus don't interfere much
with each other. But Atomic objects residing in arrays will
tend to be placed adjacent to each other, and so will most
often share cache lines (with a huge negative performance
impact) without this precaution.

In part because Cells are relatively large, we avoid creating
them until they are needed.  When there is no contention, all
updates are made to the base field.  Upon first contention (a
failed CAS on base update), the table is initialized to size 2.
The table size is doubled upon further contention until
reaching the nearest power of two greater than or equal to the
number of CPUS. Table slots remain empty (null) until they are
needed.

该类维护一个原子更新变量的延迟初始化表，以及一个额外的"base"域。表大小为2的幂。索引使用线程哈希码的掩码。

表条目使用类Cell，AtomicLong的填充（@sun.misc.Contended）变体以减少缓存竞争。大多数原子变量是不规则分布所以填充是多余的，但是对于数组来说，原子对象会相邻放置，因此在没有预防措施时大多数情况下会共享缓存行（对性能产生巨大的负面影响）。

延迟创建直到需要的时候，部分原因是因为Cell相对较大。当没有争用时，所有更新都在base域进行。在第一次争用时（base域CAS更新失败），表容量初始化为2。表大小会在进一步争用时加倍，直到达到最接近于（大于等于）CPU数目的2的幂。表槽保持为null直到需要使用时。

A single spinlock ("cellsBusy") is used for initializing and
resizing the table, as well as populating slots with new Cells.
There is no need for a blocking lock; when the lock is not
available, threads try other slots (or the base).  During these
retries, there is increased contention and reduced locality,
which is still better than alternatives.

The Thread probe fields maintained via ThreadLocalRandom serve
as per-thread hash codes. We let them remain uninitialized as
zero (if they come in this way) until they contend at slot
0. They are then initialized to values that typically do not
often conflict with others.  Contention and/or table collisions
are indicated by failed CASes when performing an update
operation. Upon a collision, if the table size is less than
the capacity, it is doubled in size unless some other thread
holds the lock. If a hashed slot is empty, and lock is
available, a new Cell is created. Otherwise, if the slot
exists, a CAS is tried.  Retries proceed by "double hashing",
using a secondary hash (Marsaglia XorShift) to try to find a
free slot.

自旋锁cellBusy用于初始化和扩容表，以及将新建的Cells填充到槽中。不需要阻塞锁，当锁不可用时，线程可以尝试其他槽或者base。在这些重试期间，争用会增加且局部性会降低，但是仍好于使用阻塞锁的方案。

线程probe字段通过ThreadLocalRandom作为线程哈希值来维护。保持未初始化（0）直到在槽0发生争用。然后就会被初始化为通常不会其他值冲突的值。争用或表冲突发生在执行更新操作CAS失败时。碰撞时，如果表大小小于容量，则大小加倍，除非其他线程持有锁。如果哈希槽为空，且锁可用，则创建Cell。否则，如果操作已有元素，则尝试CAS。使用双重哈希进行重试，使用辅助的hash（Marsaglia XorShift）尝试查找空闲的槽。

The table size is capped because, when there are more threads
than CPUs, supposing that each thread were bound to a CPU,
there would exist a perfect hash function mapping threads to
slots that eliminates collisions. When we reach capacity, we
search for this mapping by randomly varying the hash codes of
colliding threads.  Because search is random, and collisions
only become known via CAS failures, convergence can be slow,
and because threads are typically not bound to CPUS forever,
may not occur at all. However, despite these limitations,
observed contention rates are typically low in these cases.

It is possible for a Cell to become unused when threads that
once hashed to it terminate, as well as in the case where
doubling the table causes no thread to hash to it under
expanded mask.  We do not try to detect or remove such cells,
under the assumption that for long-running instances, observed
contention levels will recur, so the cells will eventually be
needed again; and for short-lived ones, it does not matter.

表大小是有限的，如果线程多于CPU数目时，通过随机改变冲突线程的哈希码来查找映射。因为搜索是随机的，碰撞仅由CAS失败导致，收敛可能变得很慢。但是，尽管存在这些限制，在这些情况下观察到的争用率通常较低。

Cell可能变为失效当曾经散列到该位置的线程终止了，以及当表扩容时没有线程散列到该位置。不会尝试检测或删除这种cells，假设对于长期运行的实例，观察到的竞争会重现，因此最终会需要这些cells；对于短期运行的实例，这无关紧要。

    /**
     * Returns the probe value for the current thread.
     * Duplicated from ThreadLocalRandom because of packaging restrictions.
     */
    static final int getProbe() {
        return UNSAFE.getInt(Thread.currentThread(), PROBE);
    }

获取的线程的threadLocalRandomProbe


    // The following three initially uninitialized fields are exclusively
    // managed by class java.util.concurrent.ThreadLocalRandom. These
    // fields are used to build the high-performance PRNGs in the
    // concurrent code, and we can not risk accidental false sharing.
    // Hence, the fields are isolated with @Contended.

    /** The current seed for a ThreadLocalRandom */
    @sun.misc.Contended("tlr")
    long threadLocalRandomSeed;

    /** Probe hash value; nonzero if threadLocalRandomSeed initialized */
    @sun.misc.Contended("tlr")
    int threadLocalRandomProbe;

    /** Secondary seed isolated from public ThreadLocalRandom sequence */
    @sun.misc.Contended("tlr")
    int threadLocalRandomSecondarySeed;

重点是longAccumulate方法：

    /**
     * Handles cases of updates involving initialization, resizing,
     * creating new Cells, and/or contention. See above for
     * explanation. This method suffers the usual non-modularity
     * problems of optimistic retry code, relying on rechecked sets of
     * reads.
     *
     * @param x the value
     * @param fn the update function, or null for add (this convention
     * avoids the need for an extra field or function in LongAdder).
     * @param wasUncontended false if CAS failed before call
     */
    final void longAccumulate(long x, LongBinaryOperator fn,
                              boolean wasUncontended) {
        int h;
        if ((h = getProbe()) == 0) {
            ThreadLocalRandom.current(); // force initialization
            h = getProbe();
            wasUncontended = true;
        }
        boolean collide = false;                // True if last slot nonempty
        for (;;) {
            Cell[] as; Cell a; int n; long v;
            if ((as = cells) != null && (n = as.length) > 0) {
                if ((a = as[(n - 1) & h]) == null) {
                    if (cellsBusy == 0) {       // Try to attach new Cell
                        Cell r = new Cell(x);   // Optimistically create
                        if (cellsBusy == 0 && casCellsBusy()) {
                            boolean created = false;
                            try {               // Recheck under lock
                                Cell[] rs; int m, j;
                                if ((rs = cells) != null &&
                                    (m = rs.length) > 0 &&
                                    rs[j = (m - 1) & h] == null) {
                                    rs[j] = r;
                                    created = true;
                                }
                            } finally {
                                cellsBusy = 0;
                            }
                            if (created)
                                break;
                            continue;           // Slot is now non-empty
                        }
                    }
                    collide = false;
                }
                else if (!wasUncontended)       // CAS already known to fail
                    wasUncontended = true;      // Continue after rehash
                else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                             fn.applyAsLong(v, x))))
                    break;
                else if (n >= NCPU || cells != as)
                    collide = false;            // At max size or stale
                else if (!collide)
                    collide = true;
                else if (cellsBusy == 0 && casCellsBusy()) {
                    try {
                        if (cells == as) {      // Expand table unless stale
                            Cell[] rs = new Cell[n << 1];
                            for (int i = 0; i < n; ++i)
                                rs[i] = as[i];
                            cells = rs;
                        }
                    } finally {
                        cellsBusy = 0;
                    }
                    collide = false;
                    continue;                   // Retry with expanded table
                }
                h = advanceProbe(h);
            }
            else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
                boolean init = false;
                try {                           // Initialize table
                    if (cells == as) {
                        Cell[] rs = new Cell[2];
                        rs[h & 1] = new Cell(x);
                        cells = rs;
                        init = true;
                    }
                } finally {
                    cellsBusy = 0;
                }
                if (init)
                    break;
            }
            else if (casBase(v = base, ((fn == null) ? v + x :
                                        fn.applyAsLong(v, x))))
                break;                          // Fall back on using base
        }
    }

分为以下几个步骤：

step1.cells已经正常初始化过了（应对LongAdder中槽为空以及cas累加失败，发生碰撞）
step1-1.如果此时槽中并没有Cell，则新建（使用cellsBusy锁）
step1-2.如果LongAdder发生cas累加碰撞，则标记wasUncontended为true，下一次就会使用advanceProbe重新找一个槽，并跳过此步骤
step1-3.使用cas更新到指定的槽中（如果更新失败，表明发生了碰撞，才会进行后面的步骤）
step1-4.如果数组长度达到最大值或者当前cells已经扩容，则将collide置为false
step1-5.如果collide为false，则置为true。
step1-6.如果能运行到此，表明collide为true，此时会进行扩容
step2.表明cells数组未初始化，使用cellsBusy锁将cells数组初始化为大小为2的数组，并将累加的数组放到其中的一个槽中
step3.前面两步尝试失败（即cells未完成初始化，且其他线程正在对其初始化时），尝试累加到base中

    @sun.misc.Contended static final class Cell {
        volatile long value;
        Cell(long x) { value = x; }

总结

1）发生碰撞时，使用双重哈希重新随机找一个槽（使用Marsaglia XorShift）；对每一个线程使用不同的种子，每个线程都累加SEEDER_INCREMENT = 0xbb67ae8584caa73bL
2）无争用情况使用base累加，发生争用时使用cells数组，并在适当情况下进行扩容。分担了负担
3）对于伪共享的处理，使用@sun.misc.Contended标识Cell
4）该技术同样用在了ConcurrentHashMap的addCount中了

2.LongAdder

One or more variables that together maintain an initially zero long 
sum. When updates (method add(long)) are contended across 
threads, the set of variables may grow dynamically to reduce 
contention. Method sum() (or, equivalently, longValue()) returns 
the current total combined across the variables maintaining the 
sum.

This class is usually preferable to AtomicLong when multiple 
threads update a common sum that is used for purposes such as 
collecting statistics, not for fine-grained synchronization control. 
Under low update contention, the two classes have similar 
characteristics. But under high contention, expected throughput of 
this class is significantly higher, at the expense of higher space 
consumption.

LongAdders can be used with a ConcurrentHashMap to maintain 
a scalable frequency map (a form of histogram or multiset). For 
example, to add a count to a 
ConcurrentHashMap<String,LongAdder> freqs, initializing if not 
already present, you can use freqs.computeIfAbsent(k -> new 
LongAdder()).increment();

This class extends Number, but does not define methods such as 
equals, hashCode and compareTo because instances are 
expected to be mutated, and so are not useful as collection keys.

一个或多个变量共同维护初始为0的long型和。当更新（add方法）发生争用时，变量集可能会增长以减少争用。方法sum或者longValue返回变量集保存的当前总和。

当用于多线程更新收集统计信息而不是用于细粒度同步控制时，该类要优于AtomicLong。在低更新争用时，两个类性能差不多。但在高争用的情况下，该类的吞吐量更高，代价是空间消耗更高。

LongAdders可以与ConcurrentHashMap一同使用，以维护可伸缩的频率映射（直方图或multiset形式）。例如，ConcurrentHashMap<String,LongAdder> freqs，如果不存在则进行初始化freqs.computeIfAbsent(k -> new
LongAdder()).increment();

该类继承自Number，但是没有定义equals, hashCode 和 compareTo方法，因为该类实例是可变的，因此不能用作集合的键。

示例：

public class LongAdderDemo {
    private static final int MAX_THREADS = 10;
    private static final int TASK_COUNT = 20;
    private static final int TARGET_COUNT = 10000000;

    private AtomicLong acount = new AtomicLong(0L);
    private LongAdder lacount = new LongAdder();
    private long count = 0;

    private static CountDownLatch cdlatomic = new CountDownLatch(TASK_COUNT);
    private static CountDownLatch cdladdr = new CountDownLatch(TASK_COUNT);


    public class AtomicThread implements Runnable {
        protected String name;
        protected long starttime;

        public AtomicThread(long starttime) {
            this.starttime = starttime;
        }

        @Override
        public void run() {
            long i = 0;
            while (i < TARGET_COUNT) {
                acount.incrementAndGet();
                i++;
            }
            cdlatomic.countDown();
        }
    }

    public void testAtomic() throws InterruptedException {
        ExecutorService exe = Executors.newFixedThreadPool(MAX_THREADS);
        long starttime = System.currentTimeMillis();
        AtomicThread atomic = new AtomicThread(starttime);
        for (int i = 0; i < TASK_COUNT; i++) {
            exe.submit(atomic);
        }
        cdlatomic.await();
        exe.shutdown();
        long v = acount.get();
        long endtime = System.currentTimeMillis();
        System.out.println("AtomicThread spend:" + (endtime - starttime) + "ms" + " v" + v);
    }

    public class LongAdderThread implements Runnable {
        protected String name;
        protected long starttime;

        public LongAdderThread(long starttime) {
            this.starttime = starttime;
        }

        @Override
        public void run() {
            long i = 0;
            while (i < TARGET_COUNT) {
                lacount.increment();
                i++;
            }
            cdladdr.countDown();
        }

    }

    public void testLongAdder() throws InterruptedException {
        ExecutorService exe = Executors.newFixedThreadPool(MAX_THREADS);
        long starttime = System.currentTimeMillis();
        LongAdderThread atomic = new LongAdderThread(starttime);
        for (int i = 0; i < TASK_COUNT; i++) {
            exe.submit(atomic);
        }
        cdladdr.await();
        exe.shutdown();
        long v = lacount.sum();
        long endtime = System.currentTimeMillis();
        System.out.println("LongAdderThread spend:" + (endtime - starttime) + "ms" + " v" + v);
    }

    public static void main(String[] args) throws InterruptedException {
        LongAdderDemo demo = new LongAdderDemo();
        demo.testAtomic();
        demo.testLongAdder();
    }
}

结果：

AtomicThread spend:3807ms v200000000
LongAdderThread spend:727ms v200000000

可见LongAdder明显要比AtomicLong要高效。

构造器：

    public LongAdder() {
    }

增加：

    public void increment() {
        add(1L);
    }

    public void add(long x) {
        Cell[] as; long b, v; int m; Cell a;
        if ((as = cells) != null || !casBase(b = base, b + x)) {
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended = a.cas(v = a.value, v + x)))
                longAccumulate(x, null, uncontended);
        }
    }

add的步骤总结：

step1.cells不为null表示已发送争用，为null时无争用，无争用首先尝试累加到base上面：casBase
step2.as == null || (m = as.length - 1) < 0表示cells数组还未初始化； (a = as[getProbe() & m]) == null表示槽位空。对于这两种情况，需要调用 longAccumulate(x, null, uncontended);进行初始化和创建Cell处理
step3.如果cells[]已经初始化过了，并且对应槽的Cell已经存在，则直接累加到该槽

求和：

    public long sum() {
        Cell[] as = cells; Cell a;
        long sum = base;
        if (as != null) {
            for (int i = 0; i < as.length; ++i) {
                if ((a = as[i]) != null)
                    sum += a.value;
            }
        }
        return sum;
    }

3.LongAccumulator

One or more variables that together maintain a running long value 
updated using a supplied function. When updates (method 
accumulate(long)) are contended across threads, the set of 
variables may grow dynamically to reduce contention. Method 
get() (or, equivalently, longValue()) returns the current value 
across the variables maintaining updates.

This class is usually preferable to AtomicLong when multiple 
threads update a common value that is used for purposes such 
as collecting statistics, not for fine-grained synchronization 
control. Under low update contention, the two classes have 
similar characteristics. But under high contention, expected 
throughput of this class is significantly higher, at the expense of 
higher space consumption.

The order of accumulation within or across threads is not 
guaranteed and cannot be depended upon, so this class is only 
applicable to functions for which the order of accumulation does 
not matter. The supplied accumulator function should be side-
effect-free, since it may be re-applied when attempted updates 
fail due to contention among threads. The function is applied with 
the current value as its first argument, and the given update as 
the second argument. For example, to maintain a running 
maximum value, you could supply Long::max along with 
Long.MIN_VALUE as the identity.

Class LongAdder provides analogs of the functionality of this 
class for the common special case of maintaining counts and 
sums. The call new LongAdder() is equivalent to new 
LongAccumulator((x, y) -> x + y, 0L.

This class extends Number, but does not define methods such as 
equals, hashCode and compareTo because instances are 
expected to be mutated, and so are not useful as collection keys.

使用一个函数维护一个long值。当发生争用时，变量集会动态增长。get方法（longValue）返回当前值。

累积的顺序无法保证，也不应该依赖顺序，因此该类只适用于对顺序无要求的函数。函数应该是无副作用，当发生争用失败时可以重新调用它。该函数将当前作为第一个参数，并将给定更新作为第二个参数。

也即accumulatorFunction要满足结合律和交换律。

    public LongAccumulator(LongBinaryOperator accumulatorFunction,
                           long identity) {
        this.function = accumulatorFunction;
        base = this.identity = identity;
    }

    public void accumulate(long x) {
        Cell[] as; long b, v, r; int m; Cell a;
        if ((as = cells) != null ||
            (r = function.applyAsLong(b = base, x)) != b && !casBase(b, r)) {
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended =
                  (r = function.applyAsLong(v = a.value, x)) == v ||
                  a.cas(v, r)))
                longAccumulate(x, function, uncontended);
        }
    }

    public long get() {
        Cell[] as = cells; Cell a;
        long result = base;
        if (as != null) {
            for (int i = 0; i < as.length; ++i) {
                if ((a = as[i]) != null)
                    result = function.applyAsLong(result, a.value);
            }
        }
        return result;
    }

使用示例：

public class LongAccumulatorDemo {

    // 找出最大值
    public static void main(String[] args) throws InterruptedException {
        LongAccumulator accumulator = new LongAccumulator(Long::max, Long.MIN_VALUE);
        Thread[] ts = new Thread[1000];

        for (int i = 0; i < 1000; i++) {
            ts[i] = new Thread(() -> {
                Random random = new Random();
                long value = random.nextLong();
                accumulator.accumulate(value); // 比较value和上一次的比较值，然后存储较大者
            });
            ts[i].start();
        }
        for (int i = 0; i < 1000; i++) {
            ts[i].join();
        }
        System.out.println(accumulator.longValue());
    }
}

原子操作类-LongAdder、LongAccumulator、

1.Striped64

2.LongAdder

3.LongAccumulator

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读