[Translation] The Go Memory Model

Author: Lin_Shao | Published: 2020-12-05 18:45

Original article: The Go Memory Model

Introduction

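The Go memory model specifies the conditions under which a read of a variable in one goroutine is guaranteed to observe the value produced by a write to the same variable in a different goroutine.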

Advice

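Programs that modify data being accessed simultaneously by multiple goroutines must serialize such access.

To serialize access, protect the data with channel operations or with other synchronization primitives such as those in the sync and sync/atomic packages.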
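As a minimal sketch of serialized access, the following guards a shared counter with a sync.Mutex. The Counter type and the countConcurrently helper are hypothetical names introduced only for this illustration; a channel or sync/atomic would serve equally well.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical shared value; every access goes through mu.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value returns the current count while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts `workers` goroutines that each call Inc once,
// waits for all of them, and returns the final count.
func countConcurrently(workers int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(100)) // prints: 100
}
```

Without the mutex, the increments would race and the final count could be anything up to 100.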

If you must read the rest of this document to understand the behavior of your program, you are being too clever.
Don't be clever.
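In other words, if you need this document to understand what your program does, the program itself is the problem. The author wants us to protect concurrent reads and writes with channels or synchronization primitives rather than infer program behavior from the memory model.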

Happens Before

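Within a single goroutine, reads and writes must behave as if they executed in the order specified by the program. Compilers and processors (CPUs) may reorder the reads and writes executed within a goroutine, but only when the reordering does not change the result of the program as seen by that goroutine [1]. Because of this reordering, the execution order observed by one goroutine may differ from the order perceived by another. For example, if one goroutine executes a = 1; b = 2;, another goroutine might observe the updated value of b before the updated value of a.

To specify the requirements on reads and writes, we define happens before [2], a partial order on the execution of memory operations in a Go program. If event e1 happens before event e2, then we say that e2 happens after e1. If e1 neither happens before nor happens after e2, then we say that e1 and e2 happen concurrently.

Within a single goroutine, the happens-before order is the order expressed by the program.

observe: when a read r returns exactly the value written by a write w, we say that r observes w.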

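A read r of a variable v is allowed to observe a write w to v if both of the following hold:

r does not happen before w (r happens after w, or the two happen concurrently).
There is no other write w' to v that happens after w but before r (no other write is ordered between w and r, although one may happen concurrently with them).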

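To guarantee that a read r of a variable v observes a particular write w to v, w must be the only write that r is allowed to observe. That is, r is guaranteed to observe w if both of the following hold:

w happens before r.
Any other write to v either happens before w or happens after r (no other write falls between w and r, whether ordered or concurrent).

This pair of conditions is much stricter than the first pair: it requires that no other write happens concurrently with w or r.

Within a single goroutine there is no concurrency, so the two pairs of conditions are equivalent: a read r always observes the most recent write w to v. When multiple goroutines access a shared variable v, they must use synchronization events to establish happens-before conditions that ensure r observes the desired w.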

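The initialization of a variable with the zero value for its type behaves as a write in the memory model.

Reads and writes of values larger than a single machine word behave as multiple machine-word-sized operations in an unspecified order. That is, for an object larger than one machine word (4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine), a read or a write consists of several operations that are not ordered with respect to one another, so their order cannot be predicted.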

Synchronization

Initialization

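Program initialization runs in a single goroutine, but that goroutine may create other goroutines, which run concurrently.

If a package p imports package q, the completion of q's init functions happens before the start of any of p's.

The start of the function main.main happens after all init functions have finished.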
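The ordering within one package can be sketched in a single file. This example only shows that package-level variable initializers and init run before main.main; the cross-package rule for p and q needs two packages and is not shown. The names trace, config, and loadConfig are hypothetical.

```go
package main

import "fmt"

// trace records the order in which the initialization steps run.
var trace []string

// config is set by a package-level initializer, which runs before init.
var config = loadConfig()

func loadConfig() string {
	trace = append(trace, "var")
	return "ready"
}

func init() {
	trace = append(trace, "init")
}

func main() {
	trace = append(trace, "main")
	fmt.Println(config, trace) // prints: ready [var init main]
}
```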

Goroutine creation

The go statement that starts a new goroutine happens before the goroutine's execution begins.

For example, in this program:

var a string

func f() {
    print(a)
}

func hello() {
    a = "hello, world"
    go f()
}

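Calling hello will print "hello, world" at some point in the future (perhaps after hello has returned). The output is guaranteed to be "hello, world" because the assignment to a happens before go f().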
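A runnable variant of the same idea is sketched below. The helloObserved function is a hypothetical rewrite of hello in which the goroutine reports what it read over a channel, so the caller can wait for it. The channel is there only to return the result; the guarantee that the goroutine sees the assignment comes from the go statement.

```go
package main

import "fmt"

// helloObserved assigns a, starts a goroutine that reads a, and returns
// the value that goroutine observed.
func helloObserved() string {
	var a string
	result := make(chan string)

	a = "hello, world" // happens before the go statement below
	go func() {
		result <- a // the new goroutine is guaranteed to see the assignment
	}()

	return <-result
}

func main() {
	fmt.Println(helloObserved()) // prints: hello, world
}
```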

Goroutine destruction

The exit of a goroutine is not guaranteed to happen before any event in the program. For example, in this program:

var a string

func hello() {
    go func() { a = "hello" }()
    print(a)
}

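The assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go func() { a = "hello" }() statement, because nothing requires its effect to become visible.

If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.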
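One possible fix is sketched here, assuming that waiting for the goroutine is acceptable. It uses sync.WaitGroup, which is my choice for the illustration; a channel or a mutex would work as well. The name helloSynced is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// helloSynced waits for the goroutine to finish before reading a.
func helloSynced() string {
	var a string
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done() // Done is ordered before the return of Wait
		a = "hello"
	}()

	wg.Wait() // once this returns, the write to a is visible here
	return a
}

func main() {
	fmt.Println(helloSynced()) // prints: hello
}
```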

Channel communication

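Channel communication is the main method of synchronization between goroutines. Each send on a particular channel is matched to a corresponding receive from that channel, usually in a different goroutine.

A send on a channel happens before the corresponding receive from that channel completes.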

var c = make(chan int, 10)
var a string

func f() {
    a = "hello, world"
    c <- 0
}

func main() {
    go f()
    <-c
    print(a)
}

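This program is guaranteed to print "hello, world". The write to a happens before the send on c (c <- 0), which happens before the corresponding receive in main (<-c) completes, which happens before print(a).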

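After a channel is closed, a receive from it returns the zero value.

The closing of a channel happens before a receive that returns a zero value because the channel is closed.

In the previous example, replacing c <- 0 with close(c) yields a program with the same guaranteed behavior.

A receive from an unbuffered channel happens before the send on that channel completes.

The following program is like the previous one, but with the send and receive statements swapped and using an unbuffered channel: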

var c = make(chan int)
var a string

func f() {
    a = "hello, world"
    <-c
}

func main() {
    go f()
    c <- 0
    print(a)
}

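This program is also guaranteed to print "hello, world". The write to a happens before the receive on c (<-c), which happens before the corresponding send (c <- 0) completes, which happens before the print [3].

If the channel were buffered (for example, c = make(chan int, 1)), the program would not be guaranteed to print "hello, world". (It might print the empty string, crash, or do something else.)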

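The kth receive on a channel with capacity C happens before the k+Cth send on that channel completes.

This rule generalizes the previous rule for unbuffered channels to buffered channels. It allows a counting semaphore to be modeled by a buffered channel: the number of items in the channel corresponds to the number of active uses, and the capacity of the channel corresponds to the maximum number of simultaneous uses. Sending an item acquires the semaphore, and receiving an item releases it. This is a common idiom for limiting concurrency.

The following program starts a goroutine for every entry in the work list, but the goroutines coordinate through the limit channel to ensure that at most three are running work functions at a time.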

var limit = make(chan int, 3)

func main() {
    for _, w := range work {
        go func(w func()) {
            limit <- 1
            w()
            <-limit
        }(w)
    }
    select{}
}
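The program above never exits and leaves work undefined. A self-contained sketch of the same semaphore idiom follows. The runLimited helper, its maxParallel parameter, and the peak counter are hypothetical additions that make the limit observable; they are not part of the original program.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited runs every function in work on its own goroutine, with at most
// maxParallel of them executing at once. It returns the highest number of
// functions that were observed running at the same time.
func runLimited(work []func(), maxParallel int) int {
	limit := make(chan struct{}, maxParallel) // buffered channel as semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex // guards running and peak
		running int
		peak    int
	)

	for _, w := range work {
		wg.Add(1)
		go func(w func()) {
			defer wg.Done()
			limit <- struct{}{} // acquire: blocks while the channel is full

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			w()

			mu.Lock()
			running--
			mu.Unlock()
			<-limit // release
		}(w)
	}

	wg.Wait()
	return peak
}

func main() {
	work := make([]func(), 10)
	for i := range work {
		work[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println(runLimited(work, 3)) // never prints more than 3
}
```

Waiting on a sync.WaitGroup instead of select{} lets main return once all work is done.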

Locks

The sync package implements two lock data types, sync.Mutex and sync.RWMutex.

For any sync.Mutex or sync.RWMutex variable l and n < m, call n of l.Unlock() happens before call m of l.Lock() returns.

var l sync.Mutex
var a string

func f() {
    a = "hello, world"
    l.Unlock()
}

func main() {
    l.Lock()
    go f()
    l.Lock()
    print(a)
}

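This program is guaranteed to print "hello, world". The first call to l.Unlock() (in f) happens before the second call to l.Lock() (in main) returns, which happens before the print.

For any call to l.RLock on a sync.RWMutex variable l, there is an n such that the l.RLock happens (returns) after call n to l.Unlock and the matching l.RUnlock happens before call n+1 to l.Lock.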
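In practice the RWMutex rule means that readers see everything written under an earlier write lock, and that a later writer waits for active readers to finish. A small sketch with a hypothetical Settings type:

```go
package main

import (
	"fmt"
	"sync"
)

// Settings is a hypothetical read-mostly store guarded by a sync.RWMutex.
type Settings struct {
	mu     sync.RWMutex
	values map[string]string
}

// NewSettings returns an empty store.
func NewSettings() *Settings {
	return &Settings{values: make(map[string]string)}
}

// Set takes the write lock, excluding all readers and other writers.
func (s *Settings) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[key] = value
}

// Get takes the read lock; any number of Get calls may run at once.
func (s *Settings) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.values[key]
	return v, ok
}

func main() {
	s := NewSettings()
	s.Set("greeting", "hello, world")

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _ := s.Get("greeting")
			fmt.Println(v) // each reader prints: hello, world
		}()
	}
	wg.Wait()
}
```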

Once

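The sync package provides a safe mechanism for initialization in the presence of multiple goroutines through the Once type. Multiple goroutines can call once.Do(f) for a particular f, but only one of them actually runs f(); the other calls block until f() has returned.

A single call of f() from once.Do(f) happens (returns) before any call of once.Do(f) returns.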

var a string
var once sync.Once

func setup() {
    a = "hello, world"
}

func doprint() {
    once.Do(setup)
    print(a)
}

func twoprint() {
    go doprint()
    go doprint()
}

In this program, calling twoprint calls setup exactly once. setup returns before either call of print, so the program prints "hello, world" twice.
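To make both claims checkable, the sketch below counts how often setup runs and collects what each goroutine reads. The printTwice helper and the calls counter are hypothetical; the sync.WaitGroup is only there so the caller can wait for both goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// printTwice runs two goroutines that both call once.Do(setup). It returns
// how many times setup ran and what each goroutine read from a.
func printTwice() (int, []string) {
	var (
		a     string
		calls int
		once  sync.Once
		wg    sync.WaitGroup
	)
	setup := func() {
		calls++ // executed by exactly one goroutine, inside once.Do
		a = "hello, world"
	}

	seen := make([]string, 2)
	for i := range seen {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			once.Do(setup)
			seen[i] = a // setup has returned before once.Do returns
		}(i)
	}
	wg.Wait()
	return calls, seen
}

func main() {
	calls, seen := printTwice()
	fmt.Println(calls, seen) // prints: 1 [hello, world hello, world]
}
```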

Incorrect synchronization

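Note that a read r may observe the value written by a write w that happens concurrently with r. Even if this occurs, it does not imply that reads happening after r will observe writes that happened before w.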

var a, b int

func f() {
    a = 1
    b = 2
}

func g() {
    print(b)
    print(a)
}

func main() {
    go f()
    g()
}

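In this program, g may print 2 and then 0. Nothing establishes a happens-before relationship between the writes in f and the reads in g, so observing b = 2 says nothing about whether a = 1 is visible.

This fact invalidates a few common idioms.

Double-checked locking is an attempt to avoid the overhead of synchronization. For example, the twoprint program above might be incorrectly written as: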

var a string
var done bool
var once sync.Once

func setup() {
    a = "hello, world"
    done = true
}

func doprint() {
    if !done {
        once.Do(setup)
    }
    print(a)
}

func twoprint() {
    go doprint()
    go doprint()
}

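But there is no guarantee that, in doprint, observing the write to done implies observing the write to a. This version can (incorrectly) print an empty string instead of "hello, world".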
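The simplest fix is to drop the done check and always call once.Do(setup). If a lock-free fast path is really wanted, the flag itself has to be accessed through sync/atomic. The sketch below shows that shape with a hypothetical lazyString type. It relies on the ordering guarantees that the current Go memory model documents for sync/atomic operations, which this translated version does not cover, so treat it as an illustration, not a replacement for sync.Once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyString is a hypothetical helper that builds a value at most once.
type lazyString struct {
	done uint32 // 0 until val is set; accessed only via sync/atomic
	mu   sync.Mutex
	val  string
}

// get returns the value, calling build the first time it is needed.
func (l *lazyString) get(build func() string) string {
	if atomic.LoadUint32(&l.done) == 0 { // first check, without the lock
		l.mu.Lock()
		if atomic.LoadUint32(&l.done) == 0 { // second check, under the lock
			l.val = build()
			atomic.StoreUint32(&l.done, 1) // publish only after val is written
		}
		l.mu.Unlock()
	}
	return l.val
}

func main() {
	var greeting lazyString
	build := func() string { return "hello, world" }
	fmt.Println(greeting.get(build)) // prints: hello, world
}
```

A reader that sees done == 1 through the atomic load is ordered after the atomic store, and therefore after the write to val. A plain bool gives no such ordering.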

Another incorrect idiom is busy waiting for a value, as in:

var a string
var done bool

func setup() {
    a = "hello, world"
    done = true
}

func main() {
    go setup()
    for !done {
    }
    print(a)
}

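As before, there is no guarantee that, in main, observing the write to done implies observing the write to a, so this program could print an empty string too. Worse, there is no guarantee that the write to done will ever be observed by main, since there are no synchronization events between the two goroutines. The loop in main is not guaranteed to finish.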
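A corrected version might replace the bool with a channel that setup closes, using the close rule from the Channel communication section. This is a sketch; the name waitForSetup is hypothetical.

```go
package main

import "fmt"

// waitForSetup starts the setup goroutine and blocks until it signals
// completion by closing done, then returns the value it wrote.
func waitForSetup() string {
	var a string
	done := make(chan struct{})

	go func() {
		a = "hello, world"
		close(done) // the close happens before the receive below returns
	}()

	<-done // blocks without spinning until setup has finished
	return a
}

func main() {
	fmt.Println(waitForSetup()) // prints: hello, world
}
```

Besides being correct, the blocking receive does not burn CPU the way the loop does.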

There are subtler variants on this theme, such as this program:

type T struct {
    msg string
}

var g *T

func setup() {
    t := new(T)
    t.msg = "hello, world"
    g = t
}

func main() {
    go setup()
    for g == nil {
    }
    print(g.msg)
}

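Even if main observes g != nil and exits its loop, there is no guarantee that it will observe the initialized value of g.msg.

In all these examples, the solution is the same: use explicit synchronization.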

Summary

Within a single goroutine, the happens-before order is the order expressed by the program.
If a package p imports package q, the completion of q's init functions happens before the start of any of p's.
The start of the function main.main happens after all init functions have finished.
The go statement that starts a new goroutine happens before the goroutine's execution begins.
The exit of a goroutine is not guaranteed to happen before any event in the program.
A send on a channel happens before the corresponding receive from that channel completes.
The closing of a channel happens before a receive that returns a zero value because the channel is closed.
A receive from an unbuffered channel happens before the send on that channel completes.
The kth receive on a channel with capacity C happens before the k+Cth send from that channel completes.
For any sync.Mutex or sync.RWMutex variable l and n < m, call n of l.Unlock() happens before call m of l.Lock() returns.
For any call to l.RLock on a sync.RWMutex variable l, there is an n such that the l.RLock happens (returns) after call n to l.Unlock and the matching l.RUnlock happens before call n+1 to l.Lock.
A single call of f() from once.Do(f) happens (returns) before any call of once.Do(f) returns.

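Translator's notes

These notes are my own understanding and may contain errors. Discussion and corrections are welcome.

[1] Instruction reordering is an optimization. The compiler may reorder the generated machine instructions based on context, and the CPU may reorder them dynamically at run time. Both aim to narrow the speed gap between memory and the CPU.
[2] The Java memory model also has a happens-before principle. The two are similar and are essentially the same idea: both constrain the order of reads and writes of shared variables under concurrency so that programs run correctly.
[3] An unbuffered channel plays a role similar to the memory barrier behind Java's volatile keyword: it sets a synchronization point between two concurrent goroutines, so events before the channel send and receive necessarily happen before events after them.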
