Go Memory Management

Author: 斜不靠谱 | Published 2019-12-10 11:32

    1. Memory Allocation Steps

    The main flow Go follows when allocating memory for an object:

    • object size > 32 KB: allocate directly from mheap.
    • object size < 16 bytes: allocate directly from mcache's tiny allocator. (tiny is actually just a pointer; we'll loosely call it an allocator here.)
    • 16 bytes <= object size <= 32 KB: allocate from the matching size class in mcache.
    • If the span for that size class in mcache has no free blocks left, request one from mcentral.
    • If mcentral has no free blocks either, request one from mheap, which splits off a span.
    • If mheap has no suitable span, request memory from the operating system.
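    The size-based routing above can be sketched as a small decision function. A minimal illustration: the thresholds match the runtime's maxTinySize/maxSmallSize, but `route` and its return strings are invented for this sketch.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // tiny-allocator threshold
	maxSmallSize = 32 << 10 // 32 KB small-object threshold
)

// route reports which allocator path a request of the given size
// (and pointer-freeness) would take, mirroring the steps above.
func route(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "mheap (large)"
	case noscan && size < maxTinySize:
		return "mcache tiny allocator"
	default:
		return "mcache size-class span (refill via mcentral/mheap)"
	}
}

func main() {
	fmt.Println(route(8, true))      // tiny path
	fmt.Println(route(1024, false))  // size-class path
	fmt.Println(route(64<<10, true)) // large path
}
```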

    1.1 Key Data Structures

    1.1.1 mcache

    We know that every goroutine runs bound to a P, and mcache is a per-P cache. An M can therefore work on its mcache without locking, because at any moment a P is bound to at most one M.
    The mcache structure:

    // Per-thread (in Go, per-P) cache for small objects.
    // No locking needed because it is per-thread (per-P).
    type mcache struct {
       // Tiny allocator: small objects under 16 bytes are allocated through tiny.
        tiny             uintptr
        tinyoffset       uintptr
        local_tinyallocs uintptr // number of tiny allocs not counted in other stats
    
        // The rest is not accessed on every malloc.
        alloc [_NumSizeClasses]*mspan // spans to allocate from
    
        stackcache [_NumStackOrders]stackfreelist
        .......
    }
    

    Here
    alloc [_NumSizeClasses]*mspan is an array of 67 pointers to mspan (_NumSizeClasses = 67); each element holds spans for blocks of one particular size. When allocating, the runtime selects the element of alloc that matches the object's size. The 67 block sizes are 0, 8 bytes, 16 bytes, …

    var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
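    One consequence of size classes is internal fragmentation: a request is rounded up to the next class size. A small sketch using a subset of the table above; `roundedSize` is a hypothetical helper written as a linear scan for clarity, whereas the runtime uses the precomputed size_to_class8/size_to_class128 lookup tables.

```go
package main

import "fmt"

// Subset of class_to_size from above (the full table has 67 entries).
var classToSize = []uintptr{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundedSize returns the block size actually used for a request,
// i.e. the smallest size class that fits.
func roundedSize(n uintptr) uintptr {
	for _, s := range classToSize[1:] {
		if n <= s {
			return s
		}
	}
	return 0 // beyond this subset of the table
}

func main() {
	for _, n := range []uintptr{1, 9, 33, 100} {
		r := roundedSize(n)
		fmt.Printf("request %3d -> block %3d (waste %d)\n", n, r, r-n)
	}
}
```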
    

    The mspan structure:

    type mspan struct {
        next *mspan     // next span in list, or nil if none
        prev *mspan     // previous span in list, or nil if none
        list *mSpanList // For debugging. TODO: Remove.
     // bitmap of free objects: a 1 bit means the slot is free
        allocCache uint64
        
        ...
        sizeclass   uint8      // size class
        ...
        elemsize    uintptr    // computed from sizeclass or from npages
        ...
    }
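    allocCache lets the runtime find a free slot with a single count-trailing-zeros instruction, which is what nextFreeFast does with Ctz64. A sketch of the idea using math/bits; `nextFree` here is an illustrative helper, not the runtime function.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextFree finds the index of the first free object in a 64-bit
// allocation cache where a 1 bit means "free".
func nextFree(allocCache uint64) (idx int, ok bool) {
	if allocCache == 0 {
		return 0, false // no free objects in this cache window
	}
	return bits.TrailingZeros64(allocCache), true
}

func main() {
	// Objects 0 and 1 allocated (bits cleared), object 2 free.
	cache := uint64(0b11111100)
	idx, ok := nextFree(cache)
	fmt.Println(idx, ok) // 2 true

	// Mark object 2 allocated by clearing its bit, then find the next.
	cache &^= 1 << idx
	idx, _ = nextFree(cache)
	fmt.Println(idx) // 3
}
```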
    

    1.1.2 mcentral

    When mcache runs out, it requests spans from mcentral. The mcentrals are stored inside mheap.

    type mcentral struct {
        lock      mutex      // multiple P's may compete for it, hence the lock
        sizeclass int32     // one mcentral per size class (67 of them)
        nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
        empty     mSpanList // list of spans with no free objects (or cached in an mcache)
    }
    
    type mSpanList struct {
        first *mspan
        last  *mspan
    }
    

    1.1.3 mheap

    type mheap struct {
        lock      mutex
        free      [_MaxMHeapList]mSpanList // free lists of given length
        freelarge mSpanList                // free lists length >= _MaxMHeapList
        busy      [_MaxMHeapList]mSpanList // busy lists of large objects of given length
        busylarge mSpanList                // busy lists of large objects length >= _MaxMHeapList
        sweepgen  uint32                   // sweep generation, see comment in mspan
        sweepdone uint32                   // all spans are swept
    
        // allspans is a slice of all mspans ever created. Each mspan
        // appears exactly once.
        //
        // The memory for allspans is manually managed and can be
        // reallocated and move as the heap grows.
        //
        // In general, allspans is protected by mheap_.lock, which
        // prevents concurrent access as well as freeing the backing
        // store. Accesses during STW might not hold the lock, but
        // must ensure that allocation cannot happen around the
        // access (since that may free the backing store).
        allspans []*mspan // all spans out there
    
        // spans is a lookup table to map virtual address page IDs to *mspan.
        // For allocated spans, their pages map to the span itself.
        // For free spans, only the lowest and highest pages map to the span itself.
        // Internal pages map to an arbitrary span.
        // For pages that have never been allocated, spans entries are nil.
        //
        // This is backed by a reserved region of the address space so
        // it can grow without moving. The memory up to len(spans) is
        // mapped. cap(spans) indicates the total reserved memory.
        spans []*mspan
    
        // sweepSpans contains two mspan stacks: one of swept in-use
        // spans, and one of unswept in-use spans. These two trade
        // roles on each GC cycle. Since the sweepgen increases by 2
        // on each cycle, this means the swept spans are in
        // sweepSpans[sweepgen/2%2] and the unswept spans are in
        // sweepSpans[1-sweepgen/2%2]. Sweeping pops spans from the
        // unswept stack and pushes spans that are still in-use on the
        // swept stack. Likewise, allocating an in-use span pushes it
        // on the swept stack.
        sweepSpans [2]gcSweepBuf
    
        // central free lists for small size classes.
        // the padding makes sure that the MCentrals are
        // spaced CacheLineSize bytes apart, so that each MCentral.lock
        // gets its own cache line.
        central [_NumSizeClasses]struct {
            mcentral mcentral
            pad      [sys.CacheLineSize]byte
        }
    .....
    }
    

    mheap_ is a global variable, initialized during system startup (in mallocinit()).

    • allspans []*mspan: all spans are obtained through mheap_, and every mspan ever allocated is recorded in allspans. The lock field in the struct guarantees concurrent safety.
    • central [_NumSizeClasses]…: the mcentral introduced earlier; one mcentral per block size class.
    • spans []*mspan: maps arena page numbers to their mspan.
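    The spans lookup is plain page arithmetic: subtract the arena start from the address and shift by the page size to index the table. A sketch; `pageID` is a hypothetical helper, with 8 KB pages as in the runtime.

```go
package main

import "fmt"

const pageShift = 13 // 8 KB pages, as in the Go runtime

// pageID computes the index into the spans lookup table for an
// address, given the start of the arena.
func pageID(addr, arenaStart uintptr) uintptr {
	return (addr - arenaStart) >> pageShift
}

func main() {
	arenaStart := uintptr(0xc000000000)
	// Two addresses on the same 8 KB page map to the same span slot.
	fmt.Println(pageID(arenaStart+0x2010, arenaStart)) // 1
	fmt.Println(pageID(arenaStart+0x3ff0, arenaStart)) // 1
	fmt.Println(pageID(arenaStart+0x4000, arenaStart)) // 2
}
```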

    1.2 Initialization

    runtime·rt0_go calls schedinit, which initializes the environment:

    func schedinit() {
      ...
      mallocinit()
      ...
      gcinit()
      ......
      // initialize the P's
      if procresize(procs) != nil {
            throw("unknown runnable goroutine during bootstrap")
        }
      .......
    }
    

    1.2.1 mallocinit

     // Abstraction layer over OS memory management.
      // Memory managed by the runtime can be in one of four states:
        // 1. None: unmapped and unmanaged; the default state of any region.
        // 2. Reserved: owned by the runtime, but accessing it faults; not yet counted toward the process's memory footprint.
        // 3. Prepared: accesses are undefined (they may fault, may unexpectedly return zeroes, etc.).
        // 4. Ready: safe to access.
      // For every OS there is a common set of helpers that transition regions between these states:
        // sysAlloc transitions an OS-chosen region of memory from None to Ready. It obtains a chunk of memory from the OS (typically around 100 KB or 1 MB) that can generally be used immediately.
        // sysFree transitions a memory region from any state to None; it releases memory back to the OS.
        // sysReserve transitions a memory region from None to Reserved.
        // sysMap transitions a memory region from Reserved to Prepared, ensuring it can be transitioned to Ready quickly.
        // sysUsed transitions a memory region from Prepared to Ready.
        // sysUnused transitions a memory region from Ready to Prepared; it tells the OS the process no longer needs the memory, so the OS may repurpose it.
        // sysFault transitions a memory region from Ready or Prepared to Reserved; accesses then fault. Used only for runtime debugging.
    func mallocinit() {
        .....
        // initialize mheap
        mheap_.init()
        // initialize the bootstrap mcache
        _g_ := getg()
        _g_.m.mcache = allocmcache()
        .....
    }
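    The four states and the sys* helpers above can be viewed as a small state machine. A minimal illustration, not runtime code; sysFree and sysFault are omitted for brevity since they can start from several states.

```go
package main

import "fmt"

type memState string

const (
	None     memState = "None"
	Reserved memState = "Reserved"
	Prepared memState = "Prepared"
	Ready    memState = "Ready"
)

// transitions maps each helper to its (from, to) state pair.
var transitions = map[string][2]memState{
	"sysAlloc":   {None, Ready},
	"sysReserve": {None, Reserved},
	"sysMap":     {Reserved, Prepared},
	"sysUsed":    {Prepared, Ready},
	"sysUnused":  {Ready, Prepared},
}

func main() {
	// The usual path for growing the heap: reserve, map, use.
	s := None
	for _, op := range []string{"sysReserve", "sysMap", "sysUsed"} {
		t := transitions[op]
		if s != t[0] {
			panic("invalid transition")
		}
		s = t[1]
		fmt.Println(op, "->", s)
	}
}
```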
    

    1.2.2 mheap_.init(): initializing mheap

    func (h *mheap) init() {
        h.treapalloc.init(unsafe.Sizeof(treapNode{}), nil, nil, &memstats.other_sys)
        h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
        h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
        h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
        h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
        h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)
    
        // Don't zero mspan allocations. Background sweeping can
        // inspect a span concurrently with allocating it, so it's
        // important that the span's sweepgen survive across freeing
        // and re-allocating a span to prevent background sweeping
        // from improperly cas'ing it from 0.
        //
        // This is safe because mspan contains no heap pointers.
        h.spanalloc.zero = false
    
        // h->mapcache needs no init
    
            // initialize the mcentrals
        for i := range h.central {
            h.central[i].mcentral.init(spanClass(i))
        }
    }
    

    1.2.3 allocmcache()

    func allocmcache() *mcache {
        var c *mcache
        systemstack(func() {
            lock(&mheap_.lock)
            c = (*mcache)(mheap_.cachealloc.alloc())
            c.flushGen = mheap_.sweepgen
            unlock(&mheap_.lock)
        })
        for i := range c.alloc {
            c.alloc[i] = &emptymspan
        }
        c.next_sample = nextSample()
        return c
    }
    

    1.2.4 procresize: initializing the P's

    func procresize(nprocs int32) *p {
    .......
        // ensure there are nprocs P's
        if nprocs > int32(len(allp)) {
            // Synchronize with retake, which could be running
            // concurrently since it doesn't run on a P.
            lock(&allpLock)
            if nprocs <= int32(cap(allp)) {
                allp = allp[:nprocs]
            } else {
                nallp := make([]*p, nprocs)
                // Copy everything up to allp's cap so we
                // never lose old allocated Ps.
                copy(nallp, allp[:cap(allp)])
                allp = nallp
            }
            unlock(&allpLock)
        }
    
        // initialize new P's
        for i := old; i < nprocs; i++ {
            pp := allp[i]
            if pp == nil {
                pp = new(p)
            }
            pp.init(i)
            atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
        }
    ......
    }
    
    • pp.init(i):
    // init initializes pp, which may be a freshly allocated p or a
    // previously destroyed p, and transitions it to status _Pgcstop.
    func (pp *p) init(id int32) {
        pp.id = id
        pp.status = _Pgcstop
        pp.sudogcache = pp.sudogbuf[:0]
        for i := range pp.deferpool {
            pp.deferpool[i] = pp.deferpoolbuf[i][:0]
        }
        pp.wbBuf.reset()
        if pp.mcache == nil {
            if id == 0 {
                // p0 uses m0's mcache
                if getg().m.mcache == nil {
                    throw("missing mcache?")
                }
                pp.mcache = getg().m.mcache // bootstrap
            } else {
                // other P's get a freshly allocated mcache
                pp.mcache = allocmcache()
            }
        }
        if raceenabled && pp.raceprocctx == 0 {
            if id == 0 {
                pp.raceprocctx = raceprocctx0
                raceprocctx0 = 0 // bootstrap
            } else {
                pp.raceprocctx = raceproccreate()
            }
        }
    }
    

    2 Memory Allocation

    Allocating memory goes through newobject:

    2.1 newobject

    func newobject(typ *_type) unsafe.Pointer {
        return mallocgc(typ.size, typ, true)
    }
    

    2.2 mallocgc

    mallocgc allocates an object of size bytes.
    Objects larger than 32 KB are allocated directly from the heap;
    everything else is allocated from the per-P cache.

    // Allocate an object of size bytes.
    // Small objects are allocated from the per-P cache's free lists.
    // Large objects (> 32 kB) are allocated straight from the heap.
    func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
            // _GCoff: GC not running; background sweeping, write barrier disabled
            // GC marking roots and workbufs: allocate black, write barrier ENABLED
            // _GCmarktermination GC mark termination: allocate black, P's help GC, write barrier ENABLED
        if gcphase == _GCmarktermination {
            throw("mallocgc called with gcphase == _GCmarktermination")
        }
            // size-0 allocations all return the same address (&zerobase)
        if size == 0 {
            return unsafe.Pointer(&zerobase)
        }
        ....
        // gomcache() returns getg().m.mcache, the current M's mcache
        c := gomcache()
        ....
        if size <= maxSmallSize {
          if noscan && size < maxTinySize {
              // no pointers and smaller than 16 bytes: use the mcache tiny allocator
              ....
              // if the current tiny block has enough room, allocate from it directly
              if off+size <= maxTinySize && c.tiny != 0 {
                    // The object fits into existing tiny block.
                    x = unsafe.Pointer(c.tiny + off)
                    // advance c.tinyoffset
                    c.tinyoffset = off + size
                    c.local_tinyallocs++
                    mp.mallocing = 0
                    releasem(mp)
                    return x
                }
               // otherwise grab a new tiny block and allocate size from it
               // Allocate a new maxTinySize block.
                span := c.alloc[tinySpanClass]
                v := nextFreeFast(span)
                if v == 0 {
                        // if the mcache span has no free slots, refill it (and flag a GC start)
                    v, _, shouldhelpgc = c.nextFree(tinySpanClass)
                }
                x = unsafe.Pointer(v)
                (*[2]uint64)(x)[0] = 0
                (*[2]uint64)(x)[1] = 0
                // See if we need to replace the existing tiny block with the new one
                // based on amount of remaining free space.
               // if the new block has more space left over than the old tiny block, make it the new tiny block
                if size < c.tinyoffset || c.tiny == 0 {
                    c.tiny = uintptr(x)
                    c.tinyoffset = size
                }
                size = maxTinySize
          } else {
                // size <= maxSmallSize && (!noscan || size >= maxTinySize): allocate from a mcache size class
                var sizeclass uint8
                if size <= smallSizeMax-8 {
                    sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
                } else {
                    sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
                }
                size = uintptr(class_to_size[sizeclass])
                spc := makeSpanClass(sizeclass, noscan)
                span := c.alloc[spc]
                v := nextFreeFast(span)
                if v == 0 {
                        // if the mcache span has no free slots, refill it (and flag a GC start)
                    v, span, shouldhelpgc = c.nextFree(spc)
                }
                x = unsafe.Pointer(v)
                // zero the block if requested
                if needzero && span.needzero != 0 {
                    memclrNoHeapPointers(unsafe.Pointer(v), size)
                }
          }
        } else {   // size > maxSmallSize (32 KB): allocate from mheap
            var s *mspan
            // every allocation from mheap flags a potential GC start
            shouldhelpgc = true
            systemstack(func() {
                s = largeAlloc(size, needzero, noscan)
            })
            s.freeindex = 1
            s.allocCount = 1
            x = unsafe.Pointer(s.base())
            size = s.elemsize
        }
      ....
      // possibly start a GC cycle
        if shouldhelpgc {
            if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
                gcStart(t)
            }
        }
      return x
    }
    
    • nextFree:
      returns a free slot in the span for spc; if none is free it refills the cache,
      and reports whether this allocation should trigger a GC cycle.
    func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
        s = c.alloc[spc]
        shouldhelpgc = false
        freeIndex := s.nextFreeIndex()
        if freeIndex == s.nelems {
            // The span is full.
            if uintptr(s.allocCount) != s.nelems {
                println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
                throw("s.allocCount != s.nelems && freeIndex == s.nelems")
            }
         // no free slot: refill from mheap_.central[spc].mcentral.cacheSpan()
            c.refill(spc)
         // flag a GC start
            shouldhelpgc = true
            s = c.alloc[spc]
    
            freeIndex = s.nextFreeIndex()
        }
    
        if freeIndex >= s.nelems {
            throw("freeIndex is not valid")
        }
    
        v = gclinkptr(freeIndex*s.elemsize + s.base())
        s.allocCount++
        if uintptr(s.allocCount) > s.nelems {
            println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount > s.nelems")
        }
        return
    }
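    The address computation in nextFree is plain indexing: the i-th object of a span lives at base + i*elemsize. A tiny illustration; `objectAddr` is a hypothetical helper with made-up base and element sizes.

```go
package main

import "fmt"

// objectAddr mirrors the address computation in nextFree:
// v = gclinkptr(freeIndex*s.elemsize + s.base()).
func objectAddr(base, elemsize, freeIndex uintptr) uintptr {
	return base + freeIndex*elemsize
}

func main() {
	base := uintptr(0x1000) // hypothetical span base
	fmt.Printf("%#x\n", objectAddr(base, 32, 0)) // 0x1000
	fmt.Printf("%#x\n", objectAddr(base, 32, 3)) // 0x1060
}
```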
    

    3 Garbage Collection

    Go uses tri-color marking for garbage collection:

    • Initially, every object is white.
    • Scan all objects reachable from the roots, mark them grey, and put them on a work queue.
    • Pop a grey object from the queue, mark the objects it references grey and enqueue them, then mark the object itself black.
    • Repeat the previous step until the grey queue is empty. The remaining white objects are garbage and can be reclaimed.
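    The steps above amount to a worklist algorithm. A minimal standalone sketch over an integer object graph; `mark` and the adjacency-list representation are invented for illustration, whereas the runtime traverses real object graphs with write barriers and per-P work buffers.

```go
package main

import "fmt"

// mark returns the set of black (reachable) objects. Grey objects
// live in a work queue; popping one makes it black, and its white
// children are shaded grey.
func mark(graph map[int][]int, roots []int) map[int]bool {
	black := make(map[int]bool)
	grey := make(map[int]bool)
	var queue []int
	for _, r := range roots {
		if !grey[r] {
			grey[r] = true
			queue = append(queue, r)
		}
	}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		black[obj] = true
		for _, child := range graph[obj] {
			if !black[child] && !grey[child] {
				grey[child] = true
				queue = append(queue, child)
			}
		}
	}
	return black // everything not in here stays white: garbage
}

func main() {
	// 4 and 5 are unreachable from root 1.
	graph := map[int][]int{1: {2}, 2: {3}, 4: {5}}
	reachable := mark(graph, []int{1})
	fmt.Println(reachable[3], reachable[5]) // true false
}
```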


      [Figure: GC phase timeline]

      A few notes on the figure above:

    • Traversal starts from the roots, which include global pointers and pointers on goroutine stacks.
    • Mark has two phases:
      • Traverse from the roots, marking objects grey, then drain the grey queue.
      • Re-scan globals and stacks. Because marking runs concurrently with the user program, new objects may be allocated during phase 1; these are recorded by the write barrier, and the re-scan checks them.
    • Stop The World happens twice:
      • First when GC is about to start, mainly for preparation work such as enabling the write barrier.
      • Second for the re-scan phase above. Without STW here, marking would never terminate.

    The GCPhase corresponding to each stage in the figure:

    • Off: _GCoff
    • Stack scan ~ Mark: _GCmark
    • Mark termination: _GCmarktermination

    3.1 The GC Cycle

    • sweep termination
    • mark
    • mark termination
    • sweep

    3.2 When does GC start?

    • When an allocation finds no free span in mcache
    • When a block larger than 32 KB is requested
    • When runtime.GC() is called explicitly
    • sysmon forces a GC at least once every forcegcperiod (2 minutes)
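    The heap-growth trigger can be approximated as: start a cycle once the live heap exceeds the heap marked live in the previous cycle by GOGC percent. A rough sketch; `triggerHeap` is a simplification of gcTrigger.test with kind gcTriggerHeap, and the real pacer accounts for much more.

```go
package main

import "fmt"

// triggerHeap reports whether a new GC cycle should start, given the
// current live heap, the heap marked at the end of the last cycle,
// and the GOGC percentage.
func triggerHeap(heapLive, heapMarked uint64, gogc int) bool {
	goal := heapMarked + heapMarked*uint64(gogc)/100
	return heapLive >= goal
}

func main() {
	// With GOGC=100 and 4 MB marked last cycle, the next GC
	// triggers around 8 MB of live heap.
	fmt.Println(triggerHeap(6<<20, 4<<20, 100)) // false
	fmt.Println(triggerHeap(8<<20, 4<<20, 100)) // true
}
```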

    3.3 gcStart

    gcStart begins a GC cycle, transitioning from _GCoff to _GCmark (when debug.gcstoptheworld == 0).

    func gcStart(trigger gcTrigger) {
      .....
      gcBgMarkStartWorkers()
      .......
    }
    

    3.3.1 gcBgMarkStartWorkers

    gcBgMarkStartWorkers prepares the background mark worker goroutines.
    These goroutines do not run until the mark phase.

    func gcBgMarkStartWorkers() {
        // Background marking is performed by per-P G's. Ensure that
        // each P has a background GC G.
            // create one mark worker per P; each worker blocks until woken via gcController.findRunnable
        for _, p := range allp {
            if p.gcBgMarkWorker == 0 {
                go gcBgMarkWorker(p)
                notetsleepg(&work.bgMarkReady, -1)
                noteclear(&work.bgMarkReady)
            }
        }
    }
    
    • gcBgMarkWorker:
    func gcBgMarkWorker(_p_ *p) {
        notewakeup(&work.bgMarkReady)
            for {
            // Go to sleep until woken by gcController.findRunnable.
            // We can't releasem yet since even the call to gopark
            // may be preempted.
            gopark(func(g *g, parkp unsafe.Pointer) bool {
                park := (*parkInfo)(parkp)
    
                // The worker G is no longer running, so it's
                // now safe to allow preemption.
                releasem(park.m.ptr())
    
                // If the worker isn't attached to its P,
                // attach now. During initialization and after
                // a phase change, the worker may have been
                // running on a different P. As soon as we
                // attach, the owner P may schedule the
                // worker, so this must be done after the G is
                // stopped.
                if park.attach != 0 {
                    p := park.attach.ptr()
                    park.attach.set(nil)
                    // cas the worker because we may be
                    // racing with a new worker starting
                    // on this P.
                    if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
                        // The P got a new worker.
                        // Exit this worker.
                        return false
                    }
                }
                return true
            }, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
        // runs after being woken by gcController.findRunnable
            systemstack(func() {
                // Mark our goroutine preemptible so its stack
                // can be scanned. This lets two mark workers
                // scan each other (otherwise, they would
                // deadlock). We must not modify anything on
                // the G stack. However, stack shrinking is
                // disabled for mark workers, so it is safe to
                // read from the G stack.
                casgstatus(gp, _Grunning, _Gwaiting)
                switch _p_.gcMarkWorkerMode {
                default:
                    throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
                case gcMarkWorkerDedicatedMode:
                    gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
                    if gp.preempt {
                        // We were preempted. This is
                        // a useful signal to kick
                        // everything out of the run
                        // queue so it can run
                        // somewhere else.
                        lock(&sched.lock)
                        for {
                            gp, _ := runqget(_p_)
                            if gp == nil {
                                break
                            }
                            globrunqput(gp)
                        }
                        unlock(&sched.lock)
                    }
                    // Go back to draining, this time
                    // without preemption.
                    gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
                case gcMarkWorkerFractionalMode:
                    gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
                case gcMarkWorkerIdleMode:
                    gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
                }
                casgstatus(gp, _Gwaiting, _Grunning)
            })
            }
      }
    }
    
    • The scheduler prefers GC workers when choosing what to run:
    func schedule() {
        .......
        if gp == nil && gcBlackenEnabled != 0 {
            gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
            tryWakeP = tryWakeP || gp != nil
        }
        .......
    }
    
    • findRunnableGCWorker:
    // returns the background mark worker for _p_ if it should be run
    func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
            ......
        // Run the background mark worker
        gp := _p_.gcBgMarkWorker.ptr()
        casgstatus(gp, _Gwaiting, _Grunnable)
        if trace.enabled {
            traceGoUnpark(gp, 0)
        }
        return gp
    }
    
    • Once woken, gcBgMarkWorker calls gcDrain to do the actual mark work.

    3.4 Marking

    • gcDrain
      gcStart prepares N mark workers; each worker runs the same flow:
    • On the first mark pass, markroot enqueues all pointers in the root set.
    • Pop an object from gcw and scan it with scanobject; a popped object is black.
    • While scanning, use the object's type information to identify pointer fields; any referenced object not yet marked is greyed and enqueued via greyobject.
    • Each worker keeps pulling work from gcw until it is empty.

    gcDrain scans the roots and the objects in the work buffers, marking objects until no work remains:

    func gcDrain(gcw *gcWork, flags gcDrainFlags) {
        // Drain root marking jobs.
        if work.markrootNext < work.markrootJobs {
            for !(preemptible && gp.preempt) {
                job := atomic.Xadd(&work.markrootNext, +1) - 1
                if job >= work.markrootJobs {
                    break
                }
              // mark one root job
                markroot(gcw, job)
                if check != nil && check() {
                    goto done
                }
            }
        }
        ....
    }
    

    References

    Golang 内存管理
    Golang 垃圾回收剖析
    Go 垃圾回收

    Recommended

    Highly recommended: legendtkl's blog
    http://legendtkl.com/

          Original link: https://www.haomeiwen.com/subject/zygwgctx.html