探索Go内存管理(分配)

作者: Love语鬼 | 来源:发表于2017-09-14 11:12 被阅读749次

探索Go内存管理(分配)
图解 Go 内存管理器的内存分配策略
go-内存机制(3)
探索iOS内存分配
Golang源码之图解内存分配
Go内存分配机制总结
Go 内存管理 -- 内存分配二
Go 内存管理 -- 内存分配一
GO内存管理与分配
Go语言——内存管理

基于1.8.3版本，64位Linux操作系统

1、概述

Go内存管理基于tcmalloc，使用连续虚拟地址，以页(8k)为单位、多级缓存进行管理；

在分配内存时，需要对size进行对齐处理，根据best-fit找到合适的mspan，对未用完的内存还会拆分成其他大小的mspan继续使用

在new一个object时(忽略逃逸分析)，根据object的size做不同的分配策略：

极小对象(size<16byte)直接在当前P的mcache上的tiny缓存上分配;
小对象(16byte <= size <= 32k)在当前P的mcache上对应slot的空闲列表中分配，无空闲列表则会继续向mcentral申请(还是没有则向mheap申请)；
大对象(size>32k)直接通过mheap申请。

2、数据结构

2.1 mspan

mspan并不直接拥有内存空间，它负责管理起始地址为startAddr、级别(预分配页个数)为sizeclass的连续地址空间。

摘取重点内容(下同)
type mspan struct {
    //双向链表
    next *mspan     
    prev *mspan    
    //起始地址
    startAddr     uintptr   
    //包含多少页
    npages        uintptr   
    //
    stackfreelist gclinkptr 
    //有多少对象
    nelems uintptr 
    //gc相关
    sweepgen    uint32
    //级别
    sizeclass   uint8      
    //已被mcache使用
    incache     bool      
    //状态
    state       mSpanState
}

2.2 mcache

Go为Per-thread (in Go, per-P)分配了mcache管理结构，所以对其操作是不需要锁的，每个mcache有大小为67的mspan数组，存储不同级别大小的mspan

type mcache struct {
    tiny             uintptr
    tinyoffset       uintptr
    alloc [_NumSizeClasses]*mspan
    stackcache [_NumStackOrders]stackfreelist
    ...
}

2.3 mcentral

mcentral集中管理，当在mcache申请失败的时候，会向mcentral申请；mcentral有个关键方法cacheSpan()，它是整个分配的核心算法

type mcentral struct {
    lock      mutex
    sizeclass int32
    nonempty  mSpanList 
    empty     mSpanList 
}

2.4 mheap

mheap是真实拥有虚拟地址的结构，同时拥有67个级别的mcentral，以及所有分配的mspan。

// _NumSizeClasses := 67
// _MaxMHeapList := 128

type mheap struct {
    lock      mutex
    //size < 128 * 8k(1M)的可用mspanList
    free      [_MaxMHeapList]mSpanList 
    //size >= 128 * 8k(1M)的可用mspanList
    freelarge mSpanList 
    busy      [_MaxMHeapList]mSpanList 
    busylarge mSpanList       
    
    //gc相关
    sweepgen  uint32                   
    sweepdone uint32                
    //所有的mspan
    allspans []*mspan 
    //页到span的查找表
    spans []*mspan

    //位图
    bitmap         uintptr 
    bitmap_mapped  uintptr

    //真实申请的内存起始地址
    arena_start    uintptr
    //真实申请的内存目前可用起始地址
    arena_used     uintptr 
    //真实申请的内存结束地址
    arena_end      uintptr

    //分级的mcentral
    central [_NumSizeClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize]byte
    }
    ...
}

2.5 四者的关系示图

关系示图

3、初始化

初始化时，Go向系统申请预留一段连续虚拟地址，大小(64位机器上)为512M(spans_mapped)+16G(bitmap_mapped)+512G(arena)

向系统申请预留的连续地址空间
+----------+-----------+-----------------------------+
|  spans   |   bitmap  |           arena             |
|  512M    |     16G   |            512G             |
+----------+-----------+-----------------------------+

mheap的初始化在func mallocinit()中，而mallocinit被schedinit()调用
/src/runtime/proc.go

// The bootstrap sequence is:
//
//  call osinit
//  call schedinit
//  make & queue new G
//  call runtime·mstart
//
// The new G calls runtime·main.

mallocinit的逻辑为:

func mallocinit() {
        // 0. 检查系统/硬件信息，bala bala

        // 1. 计算预留空间大小
        arenaSize := round(_MaxMem, _PageSize)
        bitmapSize = arenaSize / (sys.PtrSize * 8 / 2)
        spansSize = arenaSize / _PageSize * sys.PtrSize
        spansSize = round(spansSize, _PageSize)

        // 2. 尝试预留地址(区分不同平台 略)
        for i := 0; i <= 0x7f; i++ {
            ...
            pSize = bitmapSize + spansSize + arenaSize + _PageSize
            p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
        }

        // 3. 初始化部分mheap中变量
        p1 := round(p, _PageSize)
        spansStart := p1
        mheap_.bitmap = p1 + spansSize + bitmapSize
        mheap_.arena_start = p1 + (spansSize + bitmapSize)
        mheap_.arena_end = p + pSize
        mheap_.arena_used = p1 + (spansSize + bitmapSize)
        mheap_.arena_reserved = reserved
      
        // 4. 其他部分初始化，67个mcentral在这里初始化
        mheap_.init(spansStart, spansSize)
        _g_ := getg()
        _g_.m.mcache = allocmcache()
}

mheap的初始化方法

// Initialize the heap.
func (h *mheap) init(spansStart, spansBytes uintptr) {
    // 0. xxalloc.init
        // 1. free、busy init
    for i := range h.free {
        h.free[i].init()
        h.busy[i].init()
    }
    h.freelarge.init()
    h.busylarge.init()

    // 2. mcentral初始化
    for i := range h.central {
        h.central[i].mcentral.init(int32(i))
    }
    
    // 3. spans初始化
    sp := (*slice)(unsafe.Pointer(&h.spans))
    sp.array = unsafe.Pointer(spansStart)
    sp.len = 0
    sp.cap = int(spansBytes / sys.PtrSize)
}

mcentral的初始化比较简单，设置自己的级别，同时将两个mspanList初始化

而mcache的初始化在func procresize(nprocs int32) *p中，procresize也在schedinit()中调用，顺序在mallocinit()之后，也就是说发生在mheap于mcentral的初始化后面

func procresize(nprocs int32) *p {
    // 0. bala bala

    // 1. 初始化P
    for i := int32(0); i < nprocs; i++ {
        pp := allp[i]

        //初始化每个P的mcache
        if pp.mcache == nil {
            if old == 0 && i == 0 {
                if getg().m.mcache == nil {
                    throw("missing mcache?")
                }
                pp.mcache = getg().m.mcache 
            } else {
                pp.mcache = allocmcache()
            }
        }
    }
}

而allocmcache比较简单

func allocmcache() *mcache {
    lock(&mheap_.lock)
    c := (*mcache)(mheap_.cachealloc.alloc())
    unlock(&mheap_.lock)
    for i := 0; i < _NumSizeClasses; i++ {
        c.alloc[i] = &emptymspan
    }
    c.next_sample = nextSample()
    return c
}

至此，管理结构mheap、67个mcentral及每个P的mcache都初始化完毕，接下来进入重点--分配阶段。

4、分配

前面说过，在分配对象内存时，根据对象的大小分为3个级别：极小、小、大；在这里我们假设关闭内联优化，即没有逃逸的存在。当new一个对象时，调用的是:

func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}

mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    
    dataSize := size
    c := gomcache()
    var x unsafe.Pointer
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    if size <= maxSmallSize {
        if noscan && size < maxTinySize {
            // 极小对象
        } else {
            // 小对象
    } else {
        // 大对象
    }
}

我们将针对这三类一一分析

首先是极小对象(<16byte)

off := c.tinyoffset
// 地址对齐
if size&7 == 0 {
    off = round(off, 8)
} else if size&3 == 0 {
    off = round(off, 4)
} else if size&1 == 0 {
    off = round(off, 2)
}

//若之前tiny剩余空间够用，则将极小对象拼在一起
if off+size <= maxTinySize && c.tiny != 0 {
    // The object fits into existing tiny block.
    x = unsafe.Pointer(c.tiny + off)
    c.tinyoffset = off + size
    c.local_tinyallocs++
    mp.mallocing = 0
    releasem(mp)
    return x
}

//不若，则申请新的mspan
// Allocate a new maxTinySize block.
span := c.alloc[tinySizeClass]
v := nextFreeFast(span)
if v == 0 {
    v, _, shouldhelpgc = c.nextFree(tinySizeClass)
}
x = unsafe.Pointer(v)
(*[2]uint64)(x)[0] = 0
(*[2]uint64)(x)[1] = 0

// 新申请的剩余空间大于之前的剩余空间
if size < c.tinyoffset || c.tiny == 0 {
    c.tiny = uintptr(x)
    c.tinyoffset = size
}
size = maxTinySize

其中nextFreeFast和nextFree先跳过去，因为小对象分配时也会使用到，之后一并分析；下面是极小对象分配的示意图
先是有足够剩余空间，那么对齐都直接利用(为了便于说明问题，tinyoffset用箭头指向表示)

剩余空间够用

如果没有足够空间，则申请新的，若必要修正tiny及tinyoffset的值

剩余空间不足

接着分析小对象(16byte <= size <= 32k)
介于16b到32k之间大小的对象分配比较复杂，可以结合文末的流程图，便于记忆

var sizeclass uint8
if size <= smallSizeMax-8 {
    sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
} else {
    sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
}
size = uintptr(class_to_size[sizeclass])
span := c.alloc[sizeclass]
v := nextFreeFast(span)
if v == 0 {
    v, span, shouldhelpgc = c.nextFree(sizeclass)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
    memclrNoHeapPointers(unsafe.Pointer(v), size)
}

首先计算申请对象的sizeclass，以此找到对应大小的mspan；然后找到可用的地址。这里面有两个重要的方法nextFreeFast和nextFree：

// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
    //计算s.allocCache从低位起有多少个0
    theBit := sys.Ctz64(s.allocCache) 
    if theBit < 64 {
        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            //更新位图、可用游标
            s.allocCache >>= (theBit + 1)
            s.freeindex = freeidx
            //根据result和s.elemsize起始地址计算v
            v := gclinkptr(result*s.elemsize + s.base())
            s.allocCount++
            return v
        }
    }
    return 0
}

重点是当mcache没有可用地址时，通过nextFree向mcentral甚至mheap申请

func (c *mcache) nextFree(sizeclass uint8) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    s = c.alloc[sizeclass]
    shouldhelpgc = false
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
        ...
        //重新填充当前的mcache
        systemstack(func() {
            c.refill(int32(sizeclass))
        })
        shouldhelpgc = true
        s = c.alloc[sizeclass]

        freeIndex = s.nextFreeIndex()
    }
    ...
    ...
}

向mcentral是通过refill来实现的

func (c *mcache) refill(sizeclass int32) *mspan {
    _g_ := getg()
    _g_.m.locks++
    // 想mcentral归还当前的mspan
    s := c.alloc[sizeclass]
    if uintptr(s.allocCount) != s.nelems {
        throw("refill of span with free space remaining")
    }
    if s != &emptymspan {
        s.incache = false
    }

    // 获取新的, mcentral.cacheSpan()重点分析
    s = mheap_.central[sizeclass].mcentral.cacheSpan()
    ...
    c.alloc[sizeclass] = s
    _g_.m.locks--
    return s
}

下面是一个很长的调用链路...

func (c *mcentral) cacheSpan() *mspan {
    ...
retry:
    var s *mspan
    //先从非空列表中找
    for s = c.nonempty.first; s != nil; s = s.next {
        ...
        goto havespan
    }
    //没有则从空列表中找
    for s = c.empty.first; s != nil; s = s.next {
           ...
            goto retry
        }
    //实在没有，那么申请吧        
    s = c.grow()
    if s == nil {
        return nil
    }
    
havespan:
      //
    return s
}

// 由mcentral申请
func (c *mcentral) grow() *mspan {
    ...
    s := mheap_.alloc(npages, c.sizeclass, false, true)
    ...
    return s
}

//由mheap申请
func (h *mheap) alloc(npage uintptr, sizeclass int32, large bool, needzero bool) *mspan {
    ...
    systemstack(func() {
        s = h.alloc_m(npage, sizeclass, large)
    })
    ...
    return s
}

func (h *mheap) alloc_m(npage uintptr, sizeclass int32, large bool) *mspan {
    ...
    s := h.allocSpanLocked(npage)
    ...
    return s
}

//Best-fit算法
func (h *mheap) allocSpanLocked(npage uintptr) *mspan {
    //先从128页以内(1M)的free列表中寻找
    for i := int(npage); i < len(h.free); i++ {
        list = &h.free[i]
        ...
    }

    // Best-fit 对于大对象申请也会用到这个方法
    //基本思路是找到最小可以满足需求的mspan，如果有多个，选择地址最小的
    list = &h.freelarge
    s = h.allocLarge(npage)
    if s == nil {
        //如果mheap也没有空间了，向系统申请吧
        if !h.grow(npage) {
            return nil
        }
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }

HaveSpan:
    //转移s
    list.remove(s)
    if s.inList() {
        throw("still in list")
    }
    //对于申请到的内存大于想要的，将其拆分，避免浪费
    if s.npages > npage {
        ...
        h.freeSpanLocked(t, false, false, s.unusedsince)
        s.state = _MSpanFree
    }
    
    return s
}

//向系统申请空间
func (h *mheap) grow(npage uintptr) bool {
    //计算页数
    npage = round(npage, (64<<10)/_PageSize)
    ask := npage << _PageShift
    if ask < _HeapAllocChunk {
        ask = _HeapAllocChunk
    }

    v := h.sysAlloc(ask)
    ...
}

func (h *mheap) sysAlloc(n uintptr) unsafe.Pointer {
  ...
  //    sysReserve调用mmap预留空间，至此调用链结束
  p := uintptr(sysReserve(unsafe.Pointer(h.arena_end), p_size, &reserved))
  ...           
}

最后是大对象

var s *mspan
shouldhelpgc = true
systemstack(func() {
    // largeAlloc会调用mheap_.alloc，这个方法在小对象申请时已经追踪过
    s = largeAlloc(size, needzero)
})
s.freeindex = 1
s.allocCount = 1
x = unsafe.Pointer(s.base())
size = s.elemsize