Preface

Translated from: Per-CPU reference counts - LWN.net

Main text

Reference counting is used by the kernel to know when a data structure is unused and can be disposed of. Most of the time, reference counts are represented by an atomic_t variable, perhaps wrapped by a structure like a kref. If references are added and removed frequently over an object's lifetime, though, that atomic_t variable can become a performance bottleneck. The 3.11 kernel will include a new per-CPU reference count mechanism designed to improve scalability in such situations.

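This mechanism was created by Kent Overstreet and is defined in <linux/percpu-refcount.h>. Typical usage embeds a percpu_ref structure in the data structure to be tracked. The counter is then initialized with:

    int percpu_ref_init(struct percpu_ref *ref, percpu_ref_release *release);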

Here release() is the function that is called when the reference count drops to zero; its type is:

    typedef void (percpu_ref_release)(struct percpu_ref *);
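As an illustration of the embedding pattern and the release callback, here is a minimal userspace sketch. The struct percpu_ref stand-in, the simplified container_of definition, struct my_object, my_object_release(), and release_demo() are all assumptions made so that the example compiles outside the kernel; only the typedef mirrors the signature shown above. In kernel code the real definitions would come from <linux/percpu-refcount.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions) so the sketch builds outside the kernel. */
struct percpu_ref { int placeholder; };
typedef void (percpu_ref_release)(struct percpu_ref *);

/* Simplified version of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical tracked object with the reference count embedded in it. */
struct my_object {
    int payload;
    struct percpu_ref ref;
};

static int my_object_freed; /* demo-only counter of released objects */

/* Release callback: recover the enclosing object from the embedded ref. */
static void my_object_release(struct percpu_ref *ref)
{
    struct my_object *obj = container_of(ref, struct my_object, ref);

    assert(obj->payload == 42); /* we really got the enclosing object back */
    free(obj);
    my_object_freed++;
}

/* Invokes the callback directly, as the last "put" would; returns the
 * number of objects released. */
static int release_demo(void)
{
    struct my_object *obj = malloc(sizeof(*obj));
    percpu_ref_release *release = my_object_release;

    assert(obj != NULL);
    obj->payload = 42;
    my_object_freed = 0;
    release(&obj->ref);
    return my_object_freed;
}
```

The callback receives only a pointer to the embedded percpu_ref, which is why the enclosing object has to be recovered with container_of() before it can be freed.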

percpu_ref_init() initializes the reference count to one. References are then added and removed with:

    void percpu_ref_get(struct percpu_ref *ref);
    void percpu_ref_put(struct percpu_ref *ref);

These functions operate on a per-CPU array of reference counters, so they will not cause cache-line bouncing across the system. There is one potential problem, though: percpu_ref_put() must determine whether the reference count has dropped to zero and call the release() function if so. Summing an array of per-CPU counters would be expensive, to the point that it would defeat the whole purpose. This problem is avoided with a simple observation: as long as the initial reference is held, the count cannot be zero, so percpu_ref_put() does not bother to check.

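For this approach to work, the reference count code must be told when the initial reference is being dropped. The thread that called percpu_ref_init() must therefore give up its reference by calling:

    void percpu_ref_kill(struct percpu_ref *ref);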

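After percpu_ref_kill() has been called, the mechanism behaves like an ordinary atomic_t counter: every time a reference is released, the counter is decremented and checked.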
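The two phases described above can be modeled in a few lines of userspace C. This is a single-threaded toy, not the kernel implementation: the toy_ref names, the fixed NR_SLOTS array standing in for the per-CPU counters, and the explicit cpu argument are all assumptions made for illustration. The real code must also cope with updates running concurrently on other CPUs, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4 /* stand-in for the number of CPUs (assumption) */

struct toy_ref;
typedef void (toy_release_fn)(struct toy_ref *);

struct toy_ref {
    long slot[NR_SLOTS];   /* per-"CPU" deltas; a single slot may go negative */
    atomic_long shared;    /* consulted only after toy_ref_kill() */
    atomic_bool dead;      /* true once the initial reference is dropped */
    toy_release_fn *release;
};

static void toy_ref_init(struct toy_ref *ref, toy_release_fn *release)
{
    for (int i = 0; i < NR_SLOTS; i++)
        ref->slot[i] = 0;
    atomic_init(&ref->shared, 1); /* the initial reference */
    atomic_init(&ref->dead, false);
    ref->release = release;
}

static void toy_ref_get(struct toy_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    if (!atomic_load(&ref->dead))
        ref->slot[cpu]++;                  /* fast path: local counter only */
    else
        atomic_fetch_add(&ref->shared, 1); /* slow path: shared counter */
}

static void toy_ref_put(struct toy_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < NR_SLOTS);
    if (!atomic_load(&ref->dead)) {
        ref->slot[cpu]--; /* no zero check: the initial reference is held */
        return;
    }
    if (atomic_fetch_sub(&ref->shared, 1) == 1)
        ref->release(ref); /* the count just reached zero */
}

static void toy_ref_kill(struct toy_ref *ref)
{
    long sum = 0;

    atomic_store(&ref->dead, true);
    for (int i = 0; i < NR_SLOTS; i++) { /* fold local deltas together */
        sum += ref->slot[i];
        ref->slot[i] = 0;
    }
    atomic_fetch_add(&ref->shared, sum);
    toy_ref_put(ref, 0); /* drop the initial reference */
}

static int toy_released; /* demo-only counter */

static void toy_on_release(struct toy_ref *ref)
{
    (void)ref;
    toy_released++;
}

/* Returns how many times the release callback ran. */
static int toy_demo(void)
{
    struct toy_ref ref;

    toy_released = 0;
    toy_ref_init(&ref, toy_on_release);
    toy_ref_get(&ref, 0);       /* taken on one "CPU" ... */
    toy_ref_put(&ref, 1);       /* ... and dropped on another */
    toy_ref_get(&ref, 2);       /* one reference still outstanding */
    toy_ref_kill(&ref);         /* switch to the shared counter */
    assert(toy_released == 0);  /* the outstanding reference keeps it alive */
    toy_ref_put(&ref, 3);       /* last reference: release runs */
    return toy_released;
}
```

Note that a reference taken on one slot and dropped on another leaves the individual slots at +1 and -1. Only the sum is meaningful, which is why nothing can be checked cheaply until the counters have been folded into the shared one.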

The performance benefits of a per-CPU reference count will clearly only be realized if most of the references to an object are added or removed while the initial reference is held. In practice that is often the case. This mechanism has found an initial use in the control group code; the comments in the header file claim that it is used by the asynchronous I/O code as well, but that is not the case in the current mainline.