美文网首页
Go数据结构01-数组与切片

Go数据结构01-数组与切片

作者: 王侦 | 来源:发表于2022-11-04 10:26 被阅读0次

    1.数组

    数组是切片和映射的基础数据结构。

    数组是一个长度固定的数据类型,用于存储一段具有相同类型元素的连续块。

    1.1 数组赋值是深拷贝

    在 Go 中,与 C 数组变量隐式作为指针使用不同,Go 数组是值类型,赋值和函数传参操作都会复制整个数组数据。

        var arr1 [3]string
        arr2 := [3]string{"red", "yellow", "green"}
        arr1 = arr2 //数组拷贝(深拷贝)
        fmt.Println(arr1)
        fmt.Println(arr2)
        arr1[0] = "black"
        fmt.Println(arr1)
        fmt.Println(arr2)
    

    结果:

    [red yellow green]
    [red yellow green]
    --------
    [black yellow green]
    [red yellow green]
    
    

    1.2 数组传参最好用指针

    var arr [1e6]int
    
    func foo1(arr [1e6]int) { //每次拷贝整个数组
        ...
    }
    
    func foo2(arr *[1e6]int) { //传递指针,效率更高
        ...
    }
    

    测试函数

    func testCopy() {
        arrayA := [2]int{100, 200}
        var arrayB [2]int
    
        arrayB = arrayA
    
        fmt.Printf("arrayA : %p , %v\n", &arrayA, arrayA)
        fmt.Printf("arrayB : %p , %v\n", &arrayB, arrayB)
    
        testArray(arrayA)
    }
    
    func testArray(x [2]int) {
        fmt.Printf("func Array : %p , %v\n", &x, x)
    }
    

    结果:

    arrayA : 0xc000084000 , [100 200]
    arrayB : 0xc000084010 , [100 200]
    func Array : 0xc0000180e0 , [100 200]
    

    2.切片

    切片是引用传递,所以它们不需要使用额外的内存并且比使用数组更有效率。

    切片本身并不是动态数组或者数组指针。它内部实现的数据结构通过指针引用底层数组,设定相关属性将数据读写操作限定在指定的区域内。切片本身是一个只读对象,其工作机制类似数组指针的一种封装。

    切片(slice)是对数组一个连续片段的引用,所以切片是一个引用类型(因此更类似于 C++ 中的 Vector 类型,或者 Python 中的 list 类型)。这个片段可以是整个数组,或者是由起始和终止索引标识的一些项的子集。需要注意的是,终止索引标识的项不包括在切片内。切片提供了一个与指向数组的动态窗口。

    给定项的切片索引可能比相关数组的相同元素的索引小。和数组不同的是,切片的长度可以在运行时修改,最小为 0 ,最大为相关数组的长度:切片是一个长度可变的数组。

    2.1 数据结构

    src/runtime/slice.go:

    type slice struct {
        array unsafe.Pointer
        len   int
        cap   int
    }
    

    切片的结构体由3部分构成,Pointer 是指向一个数组的指针,len 代表当前切片的长度,cap 是当前切片的容量。cap 总是大于等于 len 的。

    2.2 创建切片

    src/runtime/slice.go:

    func makeslice(et *_type, len, cap int) unsafe.Pointer {
        mem, overflow := math.MulUintptr(et.size, uintptr(cap))
        if overflow || mem > maxAlloc || len < 0 || len > cap {
            // NOTE: Produce a 'len out of range' error instead of a
            // 'cap out of range' error when someone does make([]T, bignumber).
            // 'cap out of range' is true too, but since the cap is only being
            // supplied implicitly, saying len is clearer.
            // See golang.org/issue/4085.
            mem, overflow := math.MulUintptr(et.size, uintptr(len))
            if overflow || mem > maxAlloc || len < 0 {
                panicmakeslicelen()
            }
            panicmakeslicecap()
        }
    
        return mallocgc(mem, et, true)
    }
    

    上图是用 make 函数创建的一个 len = 4, cap = 6 的切片。内存空间申请了6个 int 类型的内存大小。由于 len = 4,所以后面2个暂时访问不到,但是容量还是在的。这时候数组里面每个变量都是0 。


    这里是用字面量创建的一个 len = 6,cap = 6 的切片,这时候数组里面每个元素的值都初始化完成了。需要注意的是 [ ] 里面不要写数组的容量,因为如果写了个数以后就是数组了,而不是切片了。

    2.2.1 nil切片

    nil切片:var slice []int



    nil 切片被用在很多标准库和内置函数中,描述一个不存在的切片的时候,就需要用到 nil 切片。比如函数在发生异常的时候,返回的切片就是 nil 切片。nil 切片的指针指向 nil。

    2.2.2 空切片

    空切片一般会用来表示一个空的集合。比如数据库查询,一条结果也没有查到,那么就可以返回一个空切片。

    silce := make( []int , 0 )
    slice := []int{ }
    

    空切片和 nil 切片的区别在于,空切片指向的地址不是nil,指向的是一个内存地址,但是它没有分配任何内存空间,即底层元素包含0个元素。

    2.3 切片扩容

    详细内容请参考:https://segmentfault.com/a/1190000040413412

    func growslice(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice {
        oldLen := newLen - num
        if raceenabled {
            callerpc := getcallerpc()
            racereadrangepc(oldPtr, uintptr(oldLen*int(et.size)), callerpc, abi.FuncPCABIInternal(growslice))
        }
        if msanenabled {
            msanread(oldPtr, uintptr(oldLen*int(et.size)))
        }
        if asanenabled {
            asanread(oldPtr, uintptr(oldLen*int(et.size)))
        }
    
        if newLen < 0 {
            panic(errorString("growslice: len out of range"))
        }
    
        if et.size == 0 {
            // append should not create a slice with nil pointer but non-zero len.
            // We assume that append doesn't need to preserve oldPtr in this case.
            return slice{unsafe.Pointer(&zerobase), newLen, newLen}
        }
    
        newcap := oldCap
        doublecap := newcap + newcap
        if newLen > doublecap {
            newcap = newLen
        } else {
            const threshold = 256
            if oldCap < threshold {
                newcap = doublecap
            } else {
                // Check 0 < newcap to detect overflow
                // and prevent an infinite loop.
                for 0 < newcap && newcap < newLen {
                    // Transition from growing 2x for small slices
                    // to growing 1.25x for large slices. This formula
                    // gives a smooth-ish transition between the two.
                    newcap += (newcap + 3*threshold) / 4
                }
                // Set newcap to the requested cap when
                // the newcap calculation overflowed.
                if newcap <= 0 {
                    newcap = newLen
                }
            }
        }
    
        var overflow bool
        var lenmem, newlenmem, capmem uintptr
        // Specialize for common values of et.size.
        // For 1 we don't need any division/multiplication.
        // For goarch.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
        // For powers of 2, use a variable shift.
        switch {
        case et.size == 1:
            lenmem = uintptr(oldLen)
            newlenmem = uintptr(newLen)
            capmem = roundupsize(uintptr(newcap))
            overflow = uintptr(newcap) > maxAlloc
            newcap = int(capmem)
        case et.size == goarch.PtrSize:
            lenmem = uintptr(oldLen) * goarch.PtrSize
            newlenmem = uintptr(newLen) * goarch.PtrSize
            capmem = roundupsize(uintptr(newcap) * goarch.PtrSize)
            overflow = uintptr(newcap) > maxAlloc/goarch.PtrSize
            newcap = int(capmem / goarch.PtrSize)
        case isPowerOfTwo(et.size):
            var shift uintptr
            if goarch.PtrSize == 8 {
                // Mask shift for better code generation.
                shift = uintptr(sys.TrailingZeros64(uint64(et.size))) & 63
            } else {
                shift = uintptr(sys.TrailingZeros32(uint32(et.size))) & 31
            }
            lenmem = uintptr(oldLen) << shift
            newlenmem = uintptr(newLen) << shift
            capmem = roundupsize(uintptr(newcap) << shift)
            overflow = uintptr(newcap) > (maxAlloc >> shift)
            newcap = int(capmem >> shift)
            capmem = uintptr(newcap) << shift
        default:
            lenmem = uintptr(oldLen) * et.size
            newlenmem = uintptr(newLen) * et.size
            capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
            capmem = roundupsize(capmem)
            newcap = int(capmem / et.size)
            capmem = uintptr(newcap) * et.size
        }
    
        // The check of overflow in addition to capmem > maxAlloc is needed
        // to prevent an overflow which can be used to trigger a segfault
        // on 32bit architectures with this example program:
        //
        // type T [1<<27 + 1]int64
        //
        // var d T
        // var s []T
        //
        // func main() {
        //   s = append(s, d, d, d, d)
        //   print(len(s), "\n")
        // }
        if overflow || capmem > maxAlloc {
            panic(errorString("growslice: len out of range"))
        }
    
        var p unsafe.Pointer
        if et.ptrdata == 0 {
            p = mallocgc(capmem, nil, false)
            // The append() that calls growslice is going to overwrite from oldLen to newLen.
            // Only clear the part that will not be overwritten.
            // The reflect_growslice() that calls growslice will manually clear
            // the region not cleared here.
            memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
        } else {
            // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
            p = mallocgc(capmem, et, true)
            if lenmem > 0 && writeBarrier.enabled {
                // Only shade the pointers in oldPtr since we know the destination slice p
                // only contains nil pointers because it has been cleared during alloc.
                bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(oldPtr), lenmem-et.size+et.ptrdata)
            }
        }
        memmove(p, oldPtr, lenmem)
    
        return slice{p, newLen, newcap}
    }
    

    2.3.1 生成新数组

    因为原来数组的容量已经达到了最大值,再想扩容, Go 默认会先开一片内存区域,把原来的值拷贝过来,然后再执行 append() 操作。这种情况丝毫不影响原数组。

    测试扩容:

    func testGrow() {
        slice := []int{10, 20, 30, 40}
        newSlice := append(slice, 50)
        fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
        newSlice[1] += 10
        fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    }
    

    结果:

    Before slice = [10 20 30 40], Pointer = 0xc000008108, len = 4, cap = 4
    Before newSlice = [10 20 30 40 50], Pointer = 0xc000008120, len = 5, cap = 8
    After slice = [10 20 30 40], Pointer = 0xc000008108, len = 4, cap = 4
    After newSlice = [10 30 30 40 50], Pointer = 0xc000008120, len = 5, cap = 8
    
    

    Go 中切片扩容的策略是这样的:

    • 首先判断,如果新申请容量(cap)大于2倍的旧容量(old.cap),最终容量(newcap)就是新申请的容量(cap)
    • 否则判断,如果旧切片的长度小于1024,则最终容量(newcap)就是旧容量(old.cap)的两倍,即(newcap=doublecap)
    • 否则判断,如果旧切片长度大于等于1024,则最终容量(newcap)从旧容量(old.cap)开始循环增加原来的 1/4,即(newcap=old.cap,for {newcap += newcap/4})直到最终容量(newcap)大于等于新申请的容量(cap),即(newcap >= cap)
    • 如果最终容量(cap)计算值溢出,则最终容量(cap)就是新申请容量(cap)

    总结:

    • 如果期望容量大于当前容量的两倍就会使用期望容量;
    • 如果当前切片的长度小于 1024 就会将容量翻倍;
    • 如果当前切片的长度大于 1024 就会每次增加 25% 的容量,直到新容量大于期望容量;
    • 这里只是确定切片的大致容量,接下来还需要根据切片中元素的大小对它们进行对齐

    2.3.2 复用老数组,特别注意容易产生bug

    func testSameArray() {
        array := [4]int{10, 20, 30, 40}
        slice := array[0:2]
        newSlice := append(slice, 50)
        fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
        newSlice[1] += 10
        fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
        fmt.Printf("After array = %v\n", array)
    }
    

    结果

    Before slice = [10 20], Pointer = 0xc00008e180, len = 2, cap = 4
    Before newSlice = [10 20 50], Pointer = 0xc00008e198, len = 3, cap = 4
    After slice = [10 30], Pointer = 0xc00008e180, len = 2, cap = 4
    After newSlice = [10 30 50], Pointer = 0xc00008e198, len = 3, cap = 4
    After array = [10 30 50 40]
    

    在这种情况下,扩容以后并没有新建一个新的数组,扩容前后的数组都是同一个,这也就导致了新的切片修改了一个值,也影响到了老的切片了。并且 append() 操作也改变了原来数组里面的值。一个 append() 操作影响了这么多地方,如果原数组上有多个切片,那么这些切片都会被影响!无意间就产生了莫名的 bug!

    这种情况,由于原数组还有容量可以扩容,所以执行 append() 操作以后,会在原数组上直接操作,所以这种情况下,扩容以后的数组还是指向原来的数组。

    2.3.3 对比示例

    func testGrow1() {
        slice := []int{1, 2, 3, 4, 5}
        newSlice := slice[1:3] //cap(newSlice) : 4
        fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    
        newSlice = append(newSlice, 60) // slice当前结果为{1, 2, 3, 60, 5}
    
        //newSlice的append操作影响了slice的结果,可以尝试 newSlice := slice[1:3:3]再看看结果
        fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    }
    
    func testGrow2() {
        slice := []int{1, 2, 3, 4, 5}
        newSlice := slice[1:3:3] //cap(newSlice) : 4
        fmt.Printf("Before slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("Before newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    
        newSlice = append(newSlice, 60) // slice当前结果为{1, 2, 3, 60, 5}
        fmt.Printf("After slice = %v, Pointer = %p, len = %d, cap = %d\n", slice, &slice, len(slice), cap(slice))
        fmt.Printf("After newSlice = %v, Pointer = %p, len = %d, cap = %d\n", newSlice, &newSlice, len(newSlice), cap(newSlice))
    }
    

    结果:

    Before slice = [1 2 3 4 5], Pointer = 0xc00010e180, len = 5, cap = 5
    Before newSlice = [2 3], Pointer = 0xc00010e198, len = 2, cap = 4
    After slice = [1 2 3 60 5], Pointer = 0xc00010e180, len = 5, cap = 5
    After newSlice = [2 3 60], Pointer = 0xc00010e198, len = 3, cap = 4
    ----------
    Before slice = [1 2 3 4 5], Pointer = 0xc00010e210, len = 5, cap = 5
    Before newSlice = [2 3], Pointer = 0xc00010e228, len = 2, cap = 2
    After slice = [1 2 3 4 5], Pointer = 0xc00010e210, len = 5, cap = 5
    After newSlice = [2 3 60], Pointer = 0xc00010e228, len = 3, cap = 4
    
    

    2.4 切片复制

    func slicecopy(toPtr unsafe.Pointer, toLen int, fromPtr unsafe.Pointer, fromLen int, width uintptr) int {
        if fromLen == 0 || toLen == 0 {
            return 0
        }
    
        n := fromLen
        if toLen < n {
            n = toLen
        }
    
        if width == 0 {
            return n
        }
    
        size := uintptr(n) * width
        if raceenabled {
            callerpc := getcallerpc()
            pc := abi.FuncPCABIInternal(slicecopy)
            racereadrangepc(fromPtr, size, callerpc, pc)
            racewriterangepc(toPtr, size, callerpc, pc)
        }
        if msanenabled {
            msanread(fromPtr, size)
            msanwrite(toPtr, size)
        }
        if asanenabled {
            asanread(fromPtr, size)
            asanwrite(toPtr, size)
        }
    
        if size == 1 { // common case worth about 2x to do here
            // TODO: is this still worth it with new memmove impl?
            *(*byte)(toPtr) = *(*byte)(fromPtr) // known to be a byte pointer
        } else {
            memmove(toPtr, fromPtr, size)
        }
        return n
    }
    
    

    在这个方法中,slicecopy 方法会把源切片值(即 fm Slice )中的元素复制到目标切片(即 to Slice )中,并返回被复制的元素个数,copy 的两个类型必须一致。slicecopy 方法最终的复制结果取决于较短的那个切片,当较短的切片复制完成,整个复制过程就全部完成了。

    相关文章

      网友评论

          本文标题:Go数据结构01-数组与切片

          本文链接:https://www.haomeiwen.com/subject/ioyetdtx.html