c++原子编程

作者: 谭英智 | 来源:发表于2023-01-27 18:16 被阅读0次

2018-11-11 #C++ 内存模型#
C++并发编程 - 原子操作
CAS与内存屏障: c/c++的内联汇编(S0)
C++核心编程
C/C++与MATLAB混合编程
volatile关键字
学习C++所需看的书籍，初学小白的福音！
浅谈C/C++混合编程
C++_day06
c++学习笔记——第一天

并发编程的类型

wait-free

无论其他线程怎么操作，每一个线程可以在有限步骤完成自身的任务，典型的原子操作（非循环类）
lock-free

线程的同步不使用锁来保证数据的一致性，典型的cas循环
lock-base

线程使用锁来保证数据的一致性，典型的spinlock和mutex lock

lock-free是否代表性能高

片段1.1

std::atomic<unsigned long> sum;
void do_work(size_t N, unsigned long* a) {
 for (size_t i = 0; i < N; ++i) sum += a[i];
}

片段1.2

unsigned long sum(0); std::mutex M;
void do_work(size_t N, unsigned long* a) {
 unsigned long s = 0;
 for (size_t i = 0; i < N; ++i) s += a[i];
 std::lock_guard<std::mutex> L(M); sum += s;
}

atomic-lock-free-fast

片段1.2使用lock-base编程，通过各自线程使用普通变量自增，然后再通过mutex锁来保证结果的一致性

代码片段1.1使用wait-free编程，通过自增原子变量来保证一致性

然而原子变量的操作相比普通变量的操作要昂贵，导致性能远远不如lock-base编程

因此，代码的性能在于更优的算法，而不是简单的认为lock-free就是更快的编程模型

原子变量操作的原理

对于普通变量++x，发生了什么？

int x = 0;
thread 1       |  thread 2
++x;           |  ++x;

x = ?

实际执行如下：

int x = 0;
thread 1        |  thread 2
int tmp = x;//0 |  int tmp = x; //0
++tmp; //1      |  ++tmp; //1;
x = tmp; //1    |  x = tmp; //1;

x = 1

++x实际发生了read-modify-write

由于这三个操作非原子性，导致最后结果发生了data race

============================

那么原子变量又是怎样的呢？

atomic<int> x = 0;
thread 1       |  thread 2
++x;           |  ++x;

x = ?

实际执行如下：

int x = 0;
thread 1        |  thread 2
lock(&x);       |  lock(&x);
int tmp = x;//0 |  int tmp = x; //1
++tmp; //1      |  ++tmp; //2;
x = tmp; //1    |  x = tmp; //2;
unlock(&x);     |  unlock(&x);

x = 2

++x依然是read-modify-write，但是由于x是原子变量

所以会通过锁内存地址，保证操作x的原子性

最终结果保证了x的一致性

什么类型可以被atomic修饰

std::atomic<int> i;           //ok
std::atomic<double> x;        //ok
sturct S {long x; long y;};
std::atomic<S> s;             //ok
struct S {long x;}            //ok
struct S {long x; int y;}     //ok
struct S {int x; int y; int z;} //no
struct S {long x; long y; long z;} //no

可以通过函数 std::atomic::is_lock_free() 或者 is_always_lock_free() 来在运行时判断类型是否是lock-free

std::atomic<int>可以有哪些原子操作

std::atomic<int> x{0};
++x;            //atomic
x++;            //atomic
x += 1;         //atomic
x |= 2;         //atomic
x *= 2;         //compile fail
int y = x * 2;  //atomic load
x = y + 1;      //atomic store
x = x + 1;      //atomic load and atomic store
x = x * 2;      //atomic load and atomic store

std::atomic<T>可以有哪些操作

std::atomic<T> x;
T y = x.load(); // Same as T y = x;
x.store(y); // Same as x = y;

T z = x.exchange(y); // Atomically: z = x; x = y;

bool success = x.compare_exchange_strong(y, z);
// If x==y, make x=z and return true
// Otherwise, set y=x and return false

std::atomic<int> x;
x.fetch_add(y); // Same as x += y;
int z = x.fetch_add(y); // Same as z = (x += y) - y;

CAS为什么特殊

它可以把atomic变量不能原子完成的操作完成

例如x = x*2;

std::atomic<int> x{0};
int x0;
x0 = x;
while ( !x.compare_exchange_strong(x0, x0*2) );

原子变量的速度

atomic-atomic-speed

atomic-lock-speed

CAS的操作比atomic操作要复杂，所以性能比atomic低

mutex由于会带来上下文切换和线程随眠，所以性能最低

spinlock通过atomic read结合cas，利用atomic read 比 atomic write性能高来达到性能的最优

spinlock实现如下：

class Spinlock { 
    atomic<int> flag = 0;
    void lock() { 
        for (int i=0; 
                flag_.load(std::memory_order_relaxed) || flag_.exchange(1, std::memory_order_acquire);
                ++i) { 
            if (i == 8) { 
                lock_sleep(); 
                i = 0; 
            } 
        } 
    } 
    void lock_sleep() { 
        static const timespec ns = { 0, 1 }; // 1 nanosecond
        nanosleep(&ns, NULL); 
    } 
}

spinlock性能高于lock-free，是否代表lock-free已死

非也

这要看critical session耗时多大

如果耗时大，cas由于会把大部分操作放在critical session外面，CAS只会做基本的原子操作，所以耗时小

而lock-base编程则必须把所有操作都放入critical session，这导致如果操作耗时长，会产生长时间的lock，导致其他线程在空转很久

因此当critical session小的时候，spinlock会优于lock-free

当critical session大时，lock-free会优于spinlock

共享atomic变量带来的性能差异

atomic-share-speed

伪共享发生在不同变量share同一个cache line导致

CAS的实现

compare_exchange_strong

bool compare_exchange_strong(T& old_v, T new_v) {
 T tmp = value; // Current value of the atomic
 if (tmp != old_v) { old_v = tmp; return false; }
 Lock L; // Get exclusive access
 tmp = value; // value could have changed!
 if (tmp != olv_v) { old_v = tmp; return false; }
 value = new_v;
 return true;
}

使用double check来减少锁竞争

compare_exchange_weak

bool compare_exchange_weak(T& old_v, T new_v) {
 T tmp = value; // Current value of the atomic
 if (tmp != old_v) { old_v = tmp; return false; }
 TimedLock L; // Get exclusive access or fail
 if (!L.locked()) return false; // old_v is correct
 tmp = value; // value could have changed!
 if (tmp != olv_v) { old_v = tmp; return false; }
 value = new_v;
 return true;
}

通过double check减少锁竞争

使用超时锁来减少等待的时间

也因此会造成误判，是由于线程等待cpu锁超时，即使x相等，也会返回false

但也因此性能更高

memory barrier

代码编译时乱序

通过volatile和memory来保证编译器不被乱序
代码运行时乱序

通过读写屏障来保证运行时不被乱序

c++11的memory barrier

原子操作的默认barrier是 std::memory_order_seq_cst
x.store(1, std::memory_order_release);

在此原子操作前，上面的所有内存操作必须全部完成

x.load(std::memory_order_acquire);

在此原子操作后的所有操作，必须等本原子操作完成后才能完成

CAS的两个内存顺序：

bool compare_exchange_strong(T& old_v, T new_v,
memory_order on_success, memory_order on_failure) {
 T tmp = value.load(on_failure);
 if (tmp != old_v) { old_v = tmp; return false; }
 Lock L; // Get exclusive access
 tmp = value; // value could have changed!
 if (tmp != olv_v) { old_v = tmp; return false; }
 value.store(new_v, on_success);
 return true;
}

内存顺序的性能

atomic-memory-order-speed

网友评论

本文标题：c++原子编程

本文链接：https://www.haomeiwen.com/subject/brvehdtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

c++原子编程

并发编程的类型

lock-free是否代表性能高

原子变量操作的原理

什么类型可以被atomic修饰

std::atomic<int>可以有哪些原子操作

std::atomic<T>可以有哪些操作

CAS为什么特殊

原子变量的速度

spinlock性能高于lock-free，是否代表lock-free已死

共享atomic变量带来的性能差异

CAS的实现

memory barrier

c++11的memory barrier

内存顺序的性能

相关文章

2018-11-11 #C++ 内存模型#

C++并发编程 - 原子操作

CAS与内存屏障: c/c++的内联汇编(S0)

C++核心编程

C/C++与MATLAB混合编程

volatile关键字

学习C++所需看的书籍，初学小白的福音！

浅谈C/C++混合编程

C++_day06

c++学习笔记——第一天

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读