golang map源码浅析

作者: wxxhfg | 来源:发表于2022-05-03 21:18 被阅读0次

声明

本文采用版本为： go1.17.5
本文仅供自己学习使用，不做商业用途。

map 的结构：

hmap

hmap结构体定义

golang 中 map的实现结构为：哈希表 + 链表。其中链表，作用是当发生hash冲突时，拉链法生成的结点。

// A header for a Go map.
type hmap struct {
    // map中的元素个数
    count     int 
    // map当前的状态， 如： 正在写， 正在遍历
    flags     uint8
    // map中 buckets的对数 log_2， 如 buckets的容量为 2^B
    B         uint8  
    // overflow 的数量
    noverflow  uint16
    // hash种子， 用来生成hash值
    hash0     uint32 // hash seed
    // 指向map的 hash table。 hash table的长度为2^B
    buckets    unsafe.Pointer 
    // 指向扩容时的原 buckets， 新的buckets会申请新的空间。
    // 扩容的时候，buckets 长度会是 oldbuckets 的两倍
    oldbuckets unsafe.Pointer
    // 当前map的扩容进度（扩容搬迁到哪一个cell了）
    nevacuate  uintptr 

    extra *mapextra // optional fields
}

hmap 结构示意图

map 结构图

可以看到， []bmap 是一个hash table，每一个 bmap是我们常说的“桶”。经过hash 函数计算出来相同的hash值，放到相同的桶中。一个 bmap中可以存放 8个元素，如果多出8个，则生成新的结点，尾接到队尾。

bmap

bmap 结构定义

// A bucket for a Go map.
type bmap struct {
   // tophash generally contains the top byte of the hash value
   // for each key in this bucket. If tophash[0] < minTopHash,
   // tophash[0] is a bucket evacuation state instead.
   tophash [bucketCnt]uint8
   // Followed by bucketCnt keys and then bucketCnt elems.
   // NOTE: packing all the keys together and then all the elems together makes the
   // code a bit more complicated than alternating key/elem/key/elem/... but it allows
   // us to eliminate padding which would be needed for, e.g., map[int64]int8.
   // Followed by an overflow pointer.

   // 编译过程中会在后续中添加
   //keys       8个key
   //values     8个 value
   //padding    填充部分
   //overflow   指向后续结点
}

以上是只是静态文件 src/runtime/map.go 中的定义。 实际上编译期间会给它加料，动态地创建一个新的结构：

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    values   [8]valuetype
    pad      uintptr
    overflow uintptr
}

bmap 结构示意图

bmap

上图就是 bmap的内存模型，HOB Hash 指的就是 top hash。注意到 key 和 value 是各自放在一起的，并不是 key/value/key/value/... 这样的形式。源码里说明这样的好处是在某些情况下可以省略掉 padding 字段，节省内存空间。

每个 bmap设计成最多只能放 8 个 key-value 对，如果有第 9 个 key-value 落入当前的 bmap，那就需要再构建一个 bmap，通过 overflow 指针连接起来。

创建map

map创建方法:

ageMp := make(map[string]int)
// 指定 map 长度
ageMp := make(map[string]int, 8)

// ageMp 为 nil，不能向其添加元素，会直接panic
var ageMp map[string]int

我们实际上是通过调用的 makemap ，来创建map的。实际工作只是初始化了hmap中的各种字段，如：设置B的大小，设置hash 种子 hash 0.

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    // 判断是否还有空间可以申请
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    // 初始化时， 随机生成hash种子
    h.hash0 = fastrand()

    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    // 初始化合适的B的大小，
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    // 初始化hash table
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}


// 得到以个 1 << B 的大小的数，同时判断是否超过了负载因子 6.5
func overLoadFactor(count int, B uint8) bool {
    //     13 * （2^B / 2） = 6.5 * 2^B
    //     map的容量count / 桶的数量 2^B  > 6.5
    return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

注意 :

makemap 返回是*hmap 指针，即 map 是引用对象，对map的操作会影响到结构体内部。

查找map元素

使用方式

package main

import "fmt"

func main() {
    // 初始化 map
    ageMap := make(map[string]int)
    // 插入元素
    ageMap["wxx"] = 24

    // 不带 comma 用法
    age1 := ageMap["a"]
    fmt.Println(age1) // 0

    // 带 comma 用法
    age2, ok := ageMap["b"] // 0， false
    fmt.Println(age2, ok)
}

对应的是下面两种方法

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
// mapaccess2 和 mapaccess1 功能相同 但是多返回一个是否搜索到的bool值
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

hash 函数如何确定

type maptype struct {
    typ    _type
    key    *_type
    elem   *_type
    bucket *_type // internal type representing a hash bucket
    // function for hashing keys (ptr to key, seed) -> hash
    hasher     func(unsafe.Pointer, uintptr) uintptr
    keysize    uint8  // size of key slot
    elemsize   uint8  // size of elem slot
    bucketsize uint16 // size of bucket
    flags      uint32
}

map的key的类型，实现了自己的hash 方式。每种类型实现hash函数方式不一样。

正常定位key

key 经过哈希计算后得到hash值，共 64 个 bit 位。其中后B 个bit位置，用来定位当前元素落在哪一个桶里，高8个bit 为当前 hash 值的top hash。实际上定位key的过程是一个双重循环的过程，外层循环遍历所有的overflow，内层循环遍历当前bmap 中的 8个元素。

使用低 B 为定位是在哪一个 bucket
遍历当前结点中的8个 cell是否有满足top hash的。如果有则定位完成，否则进入第3步
遍历链表的结点，如果定位到了，则返回值。否则返回零值。

举例说明：如果当前 B 的值为 5，那么buckets 的长度为 2^5 = 32。假设有个key 经过hash函数计算后，得到的hash结果为：

10010111 | 000011110110110010001111001010100010010110010101010 │ 00110

第B位 hash & B == 01010 == 6,定位到6号桶
高8位 tophash 10010111 == 151,遍历匹配bmap 中的tophash

外层遍历bucket 中的链表

双层循环遍历key

内层循环遍历 bmap中的8个 cell

keys定位到桶

扩容时定位key

建议先不看此部分内容，看完后续 修改 map中元素 -> 扩容 操作后再回头看此部分内容。

扩容前的数据：

扩容前数据

等量扩容后的数据：

等量扩容后，查找方式和原本相同，不多做赘述。

等量扩容

两倍扩容后的数据

两倍扩容后，oldbuckets 的元素，可能被分配成了两部分。查找顺序如下：

遍历到old buckets的 bucket还没有被搬迁过，正常查询，返回找到的元素。
遍历到old buckets的 bucket 被搬迁过了，则到新的buckets中寻找元素。

两倍扩容

源码：

此处只分析 mapaccess1,。mapaccess2 相比 mapaccess1多添加了是否找到的bool值，有兴趣可自行看一下。

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // race 竞争检测
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapaccess1)
        racereadpc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    // 正确性检测
    if h == nil || h.count == 0 {
        if t.hashMightPanic() {
            t.hasher(key, 0) // see issue 23734
        }
        // 返回当前类型的0值
        return unsafe.Pointer(&zeroVal[0])
    }
    // 如果flags为 写 标识， 则同时读写冲突， 抛出panic。非协程安全
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    // 依据hash 种子生成 当前key的 hash值
    hash := t.hasher(key, uintptr(h.hash0))
    // m = 2^B - 1, 实际上的作用是为了 只取低B 位， 来找到相对应桶的位置
    m := bucketMask(h.B)
    // 找到桶的位置
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    // 正在扩容
    if c := h.oldbuckets; c != nil {
        // 非等量扩容（双倍扩容）
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            // 先 右移一位 得到原来old bucket的 mask的值
            m >>= 1
        }
        // 找到当前桶对应的 old bucket中的桶的位置
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        // 原来的 oldbucket 还没有搬迁，则在old bucket中找
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
bucketloop:
    // 双重for循环 外层遍历overflow（链表） 内层遍历桶
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            // 如果tophash 不相等，
            if b.tophash[i] != top {
                // 如果当前为cell 为空， 且当前结点后面没有cell或者overflow，则剪枝， 提前结束
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                // 找下一个cell
                continue
            }
            // 走到此处说明tophash 是相等的了， 需要进一步判断完整的hash值是否相等了。
            // k：key的地址 = 初始地址 + 偏移地址
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            // 如果key类型是指针，则解引用
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            // 判断当前cell的key 和 k的值是否相等
            // 如果key相等，则找到了对应的cell， 取值返回就可以了
            if t.key.equal(key, k) {
                // 找到对应的value的 地址 = 偏移地址 + key的类型大小 * 8 + i * value的类型大小
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e
            }
        }
    }
    // 遍历完成， 没有找到对应的cell，则返回零值
    return unsafe.Pointer(&zeroVal[0])
}

修改map元素

使用方式：

package main

import "fmt"

func main() {
    // 初始化 map
    ageMap := make(map[string]int)
    // 插入元素
    ageMap["wxx"] = 24
}

修改和插入元素

步骤如下：

定位key的位置，和 map的查询操作一样。
如果定位过程中发现当前map 正在进行扩容，则帮助进行扩容搬迁。否则当前插入没有意义。
如果当前map中已经存在该key，则修改value的值，返回
如果当前map中不存在该key，则在找到的第一个空的cell上插入该元素的 tophash， key， value。该步骤需要判断是否需要扩容，如果满足扩容条件则进行扩容

扩容条件：

每个bucket 中的数量大于 6.5（map的容量count / 桶的数量 2^B > 6.5）。此时太拥挤了，hashtable 冲突会比较严重
overflow的数量太多了了，查询效率太低。地广人稀，需要聚集当一起居住。
- 当 B <= 15时 noverflow >= 2^B
- 当 B > 15时， noverflow >= 2^15

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
// mapassign 插入元素 修改元素
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // 初始化检测
    if h == nil {
        panic(plainError("assignment to entry in nil map"))
    }
    // 竞争检测
    if raceenabled {
        callerpc := getcallerpc()
        pc := funcPC(mapassign)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled {
        msanread(key, t.key.size)
    }
    // 安全检测，如果当前有协程正在写，则panic ， 同时写错误
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    // 依据hash种子，生成hash值
    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    // 将flags 的标志为改变为正在写
    h.flags ^= hashWriting
    // 如果没有 buckets, 则new 一个
    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    // 找到对应的桶 索引为 index = hash & 2^B - 1，第index个桶
    bucket := hash & bucketMask(h.B)
    // 如果正在扩容
    if h.growing() {
        // 帮助扩容
        growWork(t, h, bucket)
    }
    // 找到对应桶的地址
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    // tophash : 取hash值的高八位
    top := tophash(hash)

    // 要插入的cell 的tophash[i] 的地址
    var inserti *uint8
    // 要插入cell 的key 的地址
    var insertk unsafe.Pointer
    // 要插入cell 的value的地址
    var elem unsafe.Pointer
bucketloop:
    // 双重for 循环，寻找对应的cell
    // 外层for循环 遍历overflow
    // 内层for 循环， 遍历bucket
    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            // tophash 不相等
            if b.tophash[i] != top {
                // 如果当前cell为空，且inserti == nil  ===》 找到的第一个可以容纳当前元素的cell，记录当前cell 的 i key value 的地址
                if isEmpty(b.tophash[i]) && inserti == nil {
                    // 记录tophash[] 中的i的地址
                    inserti = &b.tophash[i]
                    // 记录key 的地址
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    // 记录 value 的地址
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                // 如果当前cell为空， 且后面的没有cell 或 overflow了， 则说明后面不可能找了，直接跳出循环，
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            // 走到这里，说明tophash 相等了， 现在要比较完整hash了
            // 取到当前cell的key的地址
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            // 解应用
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            // 如果key 和 k不相等，则continue， 继续往后找
            if !t.key.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            // 如果map中 已经有了当前元素， 需要更新一下value
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            // 取到value的 地址
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            // 完成修改
            goto done
        }
        ovf := b.overflow(t)
        // 遍历完成了， 跳出循环。
        // 遍历完所有的cell 都没有找到。
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    // 判断一下 是否需要扩容
    // 扩容条件： 1. 当前map 没有在扩容
    //          2. 满足两个扩容条件中的一个
    //扩容条件 1：每个bucket 中的数量 大于 6.5（map的容量count / 桶的数量 2^B  > 6.5）。 此时太拥挤了，hashtable 冲突会比较严重
    //扩容条件 2：overflow的数量太多了， 查询效率太低。地广人稀， 需要聚集当一起居住
    //          当 B <= 15时 noverflow >= 2^B
    //          当 B > 15时， noverflow >= 2^15
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        // 标识一下 开始扩容。 hashGrow()只进行分配空间， 具体扩容在growWork()中
        hashGrow(t, h)
        // 扩容后，元素的位置可能会改变，需要重新遍历一次
        goto again // Growing the table invalidates everything, so try again
    }
    // 此时的情况， 遍历完了全部的Overflow的cell， 仍然没有找到合适的位置（前面的位置都满了）， 则新建一个overflow 放置在第一个cell里
    if inserti == nil {
        // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
        // 新建一个overflow
        newb := h.newoverflow(t, b)
        // tophash的 i 的地址
        inserti = &newb.tophash[0]
        // key的地址
        insertk = add(unsafe.Pointer(newb), dataOffset)
        // value的地址
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    // 解引用
    if t.indirectkey() {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    // 解引用
    if t.indirectelem() {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    // 设置key的值
    typedmemmove(t.key, insertk, key)
    // 设置tophash的值
    *inserti = top
    // map容量 + 1
    h.count++

done:
    // 判断是否有修改了 flag的协程，如果有则panic
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    // 恢复flag标识位
    h.flags &^= hashWriting
    // 解引用
    if t.indirectelem() {
        elem = *((*unsafe.Pointer)(elem))
    }
    // 返回value的值
    return elem
}

扩容

源码

扩容的标识：h.oldbuckets ！= nil

// 扩容，hashGrow 函数值是申请了新的内存， 以及将oldbucket上的flags 标识赋值给 新的buckets
func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

// 实际扩容
func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    // 搬迁元素， 将老bucket上的元素 搬迁到心bucket上
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    // 如果仍在扩容，则继续帮助扩容
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}


// evacuate 将oldbuckets上的元素 搬迁到 新的buckets上
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    // 找到oldbuckets 的 bucket的地址
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    // 计算oldbucket 的bucket的数量 2^B  B == 2  1 00. 高位1
    newbit := h.noldbuckets()
    // 当前没有搬迁过
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        // 如果是双倍扩容，则会由两个长为newbit（oldbuckets的长度）的数组， 低位为x 部分， 高位为 y 部分
        // 比如 原本是长度为2， 扩容后的长度为 2 + 2（4） 前一部分的长度为2的数组为 x 部分， 后一部分长度为2的数组为 y 部分
        var xy [2]evacDst
        // 默认只使用 x 部分
        x := &xy[0]
        // x部分的起始位置， 实际上是tophash[] 的起始位置
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        // x部分的key 的起始位置
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        // x部分的value 的起始位置
        x.e = add(x.k, bucketCnt*uintptr(t.keysize))
        // 双倍扩容才需要使用到 y 部分。 等量扩容只需要使用 x 部分
        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            // y 部分 和 x部分的对应的位置相差的距离 为一个 newbit的长度
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            // y 部分的keys 的起始位置
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            // y 部分的values 的起始位置
            y.e = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            // keys的起始位置
            k := add(unsafe.Pointer(b), dataOffset)
            // values的起始位置
            e := add(k, bucketCnt*uintptr(t.keysize))
            // 遍历bucket中的每一个 cell
            for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
                // oldbucket 的 tophash[i] 的tophash
                top := b.tophash[i]
                if isEmpty(top) {
                    // 做一下标记 当前cell 已经搬迁过了，继续下一个cell
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                // tophash 的状态由问题。 [0, minTopHash]是给用来标识cell状态的， 正常结点的tophash 不能 小于 minTopHash
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey() {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                // 双倍扩容
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/elem to bucket x or bucket y).
                    hash := t.hasher(k2, uintptr(h.hash0))
                    // 如果key 存的是一个 NaN ，重新计算hash。 但是没有意义， 每次重新计算可能hash值都会改变，不可唯一确定。
                    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
                        // If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        // 放置在 y 部分。
                        // 假设 b == 2，则newbit == 1 00 ， 即只需要和高位1 比较，即可确定元素最终是x、 y部分了。
                        // 高位 1 在 y 部分， 地位 0 在 x 部分
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }
                // 状态错误， panic
                if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
                    throw("bad evacuatedN")
                }
                // evacuatedX + 0/1
                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                // 最终存放地址
                dst := &xy[useY] // evacuation destination

                // 新的bucket满了，则分配一个新的overflow 放在一个cell上
                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                // 更新top hash值
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                // key 指针类型 则 copy 指针， 否则copy 值
                if t.indirectkey() {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy elem
                }
                if t.indirectelem() {
                    *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                } else {
                    typedmemmove(t.elem, dst.e, e)
                }
                // 索引 + 1
                dst.i++
                // These updates might push these pointers past the end of the
                // key or elem arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                // key 地址更新
                dst.k = add(dst.k, uintptr(t.keysize))
                // value 地址更新
                dst.e = add(dst.e, uintptr(t.elemsize))
            }
        }
        // Unlink the overflow buckets & clear key/elem to help GC.
        // 清除掉old bucket中的指针关系， 方便GC
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    // oldbucket == nevacuate 标识oldbucket的最后一个cell的搬迁工作完成了，即扩容完成
    if oldbucket == h.nevacuate {
        // 检查， 确保扩容完成了。 修改oldbucket == nil， 扩容完成
        advanceEvacuationMark(h, t, newbit)
    }
}

搬迁过程

搬迁过程中

搬迁中的查找

假设当前定位到了新的buckets的3号桶中，首先会判断oldbuckets中的对应的桶有没有被搬迁过。如果搬迁过了，不需要看原来的桶了，直接遍历新的buckets的3号桶。

内存对比图

扩容前：

扩容前数据

等量扩容结果

等量扩容
两倍扩容：

双倍扩容会将old buckets上的元素分配到x， y两个部key & 1 << B == 0 分配到x部分，key & 1 << B == 1 分配到y部分

两倍扩容

举例说明

注意：当前只对双倍扩容描述，等量扩容只是重新填充了一下元素，相对位置没有改变。

假设当前map 的B == 5，原本元素经过hash函数计算的 hash 值为：

10010111 | 000011110110110010001111001010100010010110010101011 │ 00110

rehash.png

因为双倍扩容之后 B = B + 1，此时B == 6。key & 1 << B == 1, 即当前元素rehash到高位，新buckets中 y 部分. 否则 key & 1 << B == 0 则rehash到低位，即x 部分。

删除map元素

定位key，和查找过程相似。
判断删除元素后当前元素的个数，如果个数为0，则重置hash种子，使得map 和以前的map更加的不同
当前overflow中只有当前元素了，要删除当前元素时，需要考虑一下后面overflow的cell的状态，将后续的状态全部修改

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    // race 检测
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapdelete)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    // map没有初始化 或者 容量位0 直接返回
    if h == nil || h.count == 0 {
        if t.hashMightPanic() {
            t.hasher(key, 0) // see issue 23734
        }
        return
    }
    // 写 标志位检查， 如果有协程正在写map， 则panic 同时写错误
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    // 依据hash种子 生成 hash值
    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write (delete).
    // 置为 写 标志位
    h.flags ^= hashWriting
    // 找到是第几个桶
    bucket := hash & bucketMask(h.B)
    // 正在扩容，则帮助搬迁
    if h.growing() {
        growWork(t, h, bucket)
    }
    // 找到对应桶的 地址
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    bOrig := b
    top := tophash(hash)
search:
    // 双重循环， 外层遍历链表， 内层遍历bucket
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break search
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey() {
                k2 = *((*unsafe.Pointer)(k2))
            }
            if !t.key.equal(key, k2) {
                continue
            }
            // Only clear key if there are pointers in it.
            // 如果是指针， 则将指向内容置为 nil
            if t.indirectkey() {
                *(*unsafe.Pointer)(k) = nil
            } else if t.key.ptrdata != 0 {
                memclrHasPointers(k, t.key.size)
            }
            e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            if t.indirectelem() {
                *(*unsafe.Pointer)(e) = nil
            } else if t.elem.ptrdata != 0 {
                memclrHasPointers(e, t.elem.size)
            } else {
                memclrNoHeapPointers(e, t.elem.size)
            }
            // 标识当前cell 为空
            b.tophash[i] = emptyOne
            // If the bucket now ends in a bunch of emptyOne states,
            // change those to emptyRest states.
            // It would be nice to make this a separate function, but
            // for loops are not currently inlineable.
            // 如果当前cell 是bucket的最后一个结点
            if i == bucketCnt-1 {
                // 后续overflow 中还有元素
                if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
                    // 不用继续遍历， 已经找到了
                    goto notLast
                }
            } else {
                // 后一个cell 中有元素， 则说明找到了
                if b.tophash[i+1] != emptyRest {
                    goto notLast
                }
            }

            // 走到这里， 说明当前cell 后面没有元素了
            // 这个for 循环是为了 改变当前cell后面的多余的元素。
            // 如 当前overflow中只有当前元素了，要删除当前元素时， 需要考虑一下后面overflow的cell的状态， 将后续的状态全部修改
            for {
                // 修改状态， 标识当前cell 后面没有元素了
                b.tophash[i] = emptyRest
                if i == 0 {
                    if b == bOrig {
                        break // beginning of initial bucket, we're done.
                    }
                    // Find previous bucket, continue at its last entry.
                    c := b
                    for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
                    }
                    i = bucketCnt - 1
                } else {
                    i--
                }
                if b.tophash[i] != emptyOne {
                    break
                }
            }
        notLast:
            // 容量 - 1
            h.count--
            // Reset the hash seed to make it more difficult for attackers to
            // repeatedly trigger hash collisions. See issue 25237.
            if h.count == 0 {
                // 如果 容量为空了，则重新生成hash 种子
                h.hash0 = fastrand()
            }
            break search
        }
    }

    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

遍历map

使用方式：

package main

import "fmt"

func main() {
 ageMp := make(map[string]int)
 ageMp["wxx"] = 1

 for name, age := range ageMp {
 fmt.Println(name, age)
 }
}

源码

// mapiterinit 初始化迭代器
// mapiternext 迭代器遍历
mapiterinit + mapiternext == 遍历

// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
    // 竞争检测
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
    }

    if h == nil || h.count == 0 {
        return
    }

    if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
        throw("hash_iter size incorrect") // see cmd/compile/internal/reflectdata/reflect.go
    }
    it.t = t
    it.h = h

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.ptrdata == 0 {
        // Allocate the current slice and remember pointers to both current and old.
        // This preserves all relevant overflow buckets alive even if
        // the table grows and/or overflow buckets are added to the table
        // while we are iterating.
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }

    // decide where to start
    // 随机取一个bucket开始
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {
        r += uintptr(fastrand()) << 31
    }
    it.startBucket = r & bucketMask(h.B)
    // 随机取一个cell 开始
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
        atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

可以看到，每一遍历生成迭代器的时候，会随机选取一个bucket 以及一个cell开始。从前往后遍历，再次遍历到起始位置时，遍历完成。

总结

map 返回的结构体是指针，即引用类型
map的内存结构是 hash table + 链表
map的定位实际上是双重for循环，在定位到bucket后，外层遍历overflow，内层遍历8个cell
map 扩容的标识 h.oldbuckets != nil
map 的扩容分成等量扩容和双倍扩容
map的扩容条件为负载因子大于 6.5 或者 overflow 的数量太多（ B <= 15时 noverflow >= 2^B ； B > 15时， noverflow >= 2^15）。前者对应的是元素分配太过于集中，hash冲突严重。后者对应的是元素分配太过于稀疏，地广人稀，查询效率低。
map 删除元素后，会重新生成hash种子，使得map 和之前更加得不同
map 的遍历是随机的，可能每次的遍历先后顺序不一样。

参考资料

https://www.qcrao.com/2019/05/22/dive-into-go-map/

https://draveness.me/golang/docs/part2-foundation/ch03-datastructure/golang-hashmap/

https://www.bilibili.com/video/BV1Q4411W7MR?spm_id_from=333.337.search-card.all.click

golang map源码浅析

声明

map 的结构：

hmap

hmap结构体定义

hmap 结构示意图

bmap

bmap 结构定义

bmap 结构示意图

创建map

查找map元素

hash 函数如何确定

正常定位key

扩容时定位key

源码：

修改map元素

修改和插入元素

扩容

源码

搬迁过程

内存对比图

举例说明

删除map元素

遍历map

源码

总结

参考资料

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读