Exploring iOS Internals 03: A Look at calloc

Author: superFool | Published 2021-07-27 17:46


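From "Exploring iOS Internals 01: How alloc Works" (《iOS底层原理探究01-alloc底层原理》) we know that when an OC object is allocated, it is ultimately created in _class_createInstanceFromZone, which in turn calls calloc to allocate the object's memory. Today we take a closer look at that function.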

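While tracing calloc we find that it no longer lives in libobjc but in libmalloc; the three ways of locating it are covered in the earlier articles. The libmalloc source can be downloaded from Apple's open-source site.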

With the source in hand, let's dig in. We call calloc directly from main and follow the flow.

Calling calloc directly from main

calloc calls _malloc_zone_calloc(default_zone, num_items, size, MZ_POSIX), so that is the function to look at next.

This default_zone is really a "fake" zone. It exists only to steer the program into the path that creates the real zone.

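The key line in _malloc_zone_calloc is line 1556: ptr = zone->calloc(zone, num_items, size)

Jumping to the definition only shows the declaration of calloc, a function pointer with no implementation, so the rest of the flow cannot be found by reading the source alone. We need another approach.
Set a breakpoint on that line, run until it stops there, and print zone->calloc with p. The function actually being called turns out to be default_zone_calloc, so we step into it next.
Here is what default_zone_calloc does: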

Bootstrap the creation of the real zone
Use the real zone to perform the calloc
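
The two steps above amount to dispatching through a function pointer stored in a placeholder zone. The following is a minimal sketch of that pattern, not libmalloc's code: sketch_zone_t, calloc_fn, bootstrap_zone and every other sketch_ name are hypothetical, and the C library's own calloc stands in for the real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified zone: just a function pointer and a name. */
typedef struct sketch_zone {
    void *(*calloc_fn)(struct sketch_zone *zone, size_t num_items, size_t size);
    const char *name;
} sketch_zone_t;

/* Stand-in for the real allocator behind the real zone. */
static void *real_calloc(sketch_zone_t *zone, size_t num_items, size_t size) {
    (void)zone;
    return calloc(num_items, size);
}

static sketch_zone_t real_zone = { real_calloc, "real" };
static sketch_zone_t *initialized_zone = NULL;

/* Plays the role of runtime_default_zone(): set up the real zone on first use. */
static sketch_zone_t *sketch_runtime_default_zone(void) {
    if (initialized_zone == NULL) {
        initialized_zone = &real_zone; /* one-time initialization would go here */
    }
    return initialized_zone;
}

/* Plays the role of default_zone_calloc(): forward the call to the real zone. */
static void *bootstrap_calloc(sketch_zone_t *zone, size_t num_items, size_t size) {
    (void)zone; /* the placeholder zone owns no memory of its own */
    sketch_zone_t *z = sketch_runtime_default_zone();
    assert(z != NULL);
    return z->calloc_fn(z, num_items, size);
}

static sketch_zone_t bootstrap_zone = { bootstrap_calloc, "bootstrap" };

/* Entry point: callers only ever go through the placeholder zone. */
static void *sketch_calloc(size_t num_items, size_t size) {
    return bootstrap_zone.calloc_fn(&bootstrap_zone, num_items, size);
}
```

Because the call goes through a pointer, reading the source statically only reveals the declaration, which is why the breakpoint was needed to see the concrete target.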
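The system has its own strategy for creating zones, and runtime_default_zone() is the entry point. Following the call chain, it eventually reaches _malloc_initialize. To keep the flow continuous we will study that function later and carry on with what happens next.
Call chain for zone creation
For zone->calloc(zone, num_items, size) here, printing it with po as before shows that the function actually called is nano_calloc.
Next, nano_calloc: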

static void *
nano_calloc(nanozone_t *nanozone, size_t num_items, size_t size)
{
    size_t total_bytes;

    if (calloc_get_size(num_items, size, 0, &total_bytes)) {
        return NULL;
    }
    // If the requested size is at most NANO_MAX_SIZE, allocate from the nanozone.
    if (total_bytes <= NANO_MAX_SIZE) {
        void *p = _nano_malloc_check_clear(nanozone, total_bytes, 1);
        if (p) {
            return p;
        } else {
            /* FALLTHROUGH to helper zone */
        }
    }
    // Otherwise (or if the nanozone allocation failed) go through helper_zone
    malloc_zone_t *zone = (malloc_zone_t *)(nanozone->helper_zone);
    return zone->calloc(zone, 1, total_bytes);
}

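If the requested size is at most NANO_MAX_SIZE, _nano_malloc_check_clear is called to carry out the allocation
Otherwise (or if that call returns NULL), the request goes to helper_zone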
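
The routing decision can be sketched on its own. This is an illustration under assumptions: sketch_route and the ROUTE_ values are hypothetical names, the overflow check uses the GCC/Clang builtin __builtin_mul_overflow (the real calloc_get_size may do this differently), and the threshold of 256 is the NANO_MAX_SIZE value reported in this article.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NANO_MAX_SIZE 256 /* NANO_MAX_SIZE as reported in this article */

typedef enum { ROUTE_FAIL, ROUTE_NANO, ROUTE_HELPER } sketch_route_t;

/*
 * Hypothetical helper: compute num_items * size with overflow detection,
 * then decide which allocator would be tried first for the request.
 */
static sketch_route_t sketch_route(size_t num_items, size_t size, size_t *total_bytes) {
    assert(total_bytes != NULL);
    if (__builtin_mul_overflow(num_items, size, total_bytes)) {
        return ROUTE_FAIL; /* the product does not fit in size_t */
    }
    return (*total_bytes <= SKETCH_NANO_MAX_SIZE) ? ROUTE_NANO : ROUTE_HELPER;
}
```

Note that in nano_calloc a request routed to the nanozone can still end up in helper_zone when _nano_malloc_check_clear returns NULL; the sketch only models the first decision.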

Here NANO_MAX_SIZE is 256 bytes.

The heart of the allocation is _nano_malloc_check_clear, so let's look at it in detail.

static void *
_nano_malloc_check_clear(nanozone_t *nanozone, size_t size, boolean_t cleared_requested)
{
    MALLOC_TRACE(TRACE_nano_malloc, (uintptr_t)nanozone, size, cleared_requested, 0);

    void *ptr;
    size_t slot_key;
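    // Size after rounding up to a multiple of 16 bytes. slot_key matters a lot: it equals slot_bytes/16 - 1 and is the second-dimension index into meta_data
    size_t slot_bytes = segregated_size_to_fit(nanozone, size, &slot_key); // Note slot_key is set here
    // Derive mag_index (the first-dimension index into meta_data) from _os_cpu_number
    mag_index_t mag_index = nano_mag_index(nanozone);
    // With the magazine for the current CPU and the slot computed from size, look up the metadata whose list caches previously freed blocks
    nano_meta_admin_t pMeta = &(nanozone->meta_data[mag_index][slot_key]);
    // Check for a previously freed block that can be reused directly; freed blocks are cached on a chained_block_s list
    // Each free likewise uses the index and slot to find pMeta, then pushes the freed block onto slot_LIFO.
    ptr = OSAtomicDequeue(&(pMeta->slot_LIFO), offsetof(struct chained_block_s, next));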
    if (ptr) {
    
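        // ... unrelated code omitted ...

        // A cached block exists: run pointer sanity checks on it, then return it below.
        // This branch is not taken on the very first malloc call.
    } else {
        // No freed block is cached, so obtain fresh memory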
        ptr = segregated_next_block(nanozone, pMeta, slot_bytes, mag_index);
    }

    if (cleared_requested && ptr) {
        memset(ptr, 0, slot_bytes); // TODO: Needs a memory barrier after memset to ensure zeroes land first?
    }
    return ptr;
}

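This function uses the CPU and the slot to pick an entry in meta_data, then checks the chained_block_s list for a previously freed block. If one exists, it is returned after pointer checks; otherwise the function falls back to the slot's metadata to carve out a new block or to map a new band.

slot_bytes is the size after rounding up to 16 bytes with segregated_size_to_fit()
slot_key equals slot_bytes/16 - 1 and is the second-dimension index into the array
These values are used to look up the cache of previously freed blocks
If a cached block is found, it is returned after pointer checks
If not, segregated_next_block is called to allocate memory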
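
The cache lookup described above is a per-size-class LIFO free list whose link lives inside the freed block itself. Below is a simplified, single-threaded sketch of that idea; the real code uses OSAtomicDequeue and a per-CPU magazine index, which are left out here, and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOTS 16 /* 256 / 16 size classes, following the article's numbers */

/* A freed block is reused as a list node: its first word stores the link. */
typedef struct sketch_block {
    struct sketch_block *next;
} sketch_block_t;

/* One list head per size class (the per-CPU dimension is omitted). */
static sketch_block_t *sketch_free_lists[SKETCH_SLOTS];

/* free path: push the block onto the list of its size class. */
static void sketch_cache_push(size_t slot_key, void *ptr) {
    assert(slot_key < SKETCH_SLOTS);
    sketch_block_t *b = ptr;
    b->next = sketch_free_lists[slot_key];
    sketch_free_lists[slot_key] = b;
}

/* malloc path: pop the most recently freed block, or NULL if none is cached. */
static void *sketch_cache_pop(size_t slot_key) {
    assert(slot_key < SKETCH_SLOTS);
    sketch_block_t *b = sketch_free_lists[slot_key];
    if (b != NULL) {
        sketch_free_lists[slot_key] = b->next;
    }
    return b;
}
```

A NULL result from the pop corresponds to the else branch in _nano_malloc_check_clear, where segregated_next_block takes over.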

A closer look at segregated_next_block

static MALLOC_INLINE void *
segregated_next_block(nanozone_t *nanozone, nano_meta_admin_t pMeta, size_t slot_bytes, unsigned int mag_index)
{
    while (1) {
        // End address of the memory currently available to this pMeta
        uintptr_t theLimit = pMeta->slot_limit_addr; // Capture the slot limit that bounds slot_bump_addr right now
        // Atomically add slot_bytes to pMeta->slot_bump_addr, moving it to the next address
        uintptr_t b = OSAtomicAdd64Barrier(slot_bytes, (volatile int64_t *)&(pMeta->slot_bump_addr));
        // Subtract the offset just added to get the address for this allocation
        b -= slot_bytes; // Atomic op returned addr of *next* free block. Subtract to get addr for *this* allocation.
        
        if (b < theLimit) {   // Did we stay within the bound of the present slot allocation?
            // The address is still within range, so return it
            return (void *)b; // Yep, so the slot_bump_addr this thread incremented is good to go
        } else {
            // Past the limit: check whether the slot is already exhausted
            if (pMeta->slot_exhausted) { // exhausted all the bands availble for this slot?
                pMeta->slot_bump_addr = theLimit;
                return 0;                 // We're toast
            } else {
                // One thread will grow the heap, others will see its been grown and retry allocation
                _malloc_lock_lock(&nanozone->band_resupply_lock[mag_index]);
                // re-check state now that we've taken the lock
                // Because of multithreading, check again whether the slot is exhausted
                if (pMeta->slot_exhausted) {
                    _malloc_lock_unlock(&nanozone->band_resupply_lock[mag_index]);
                    return 0; // Toast
                } else if (b < pMeta->slot_limit_addr) {
                    // b is now below the limit, so another thread has already mapped a new band; retry the while loop
                    _malloc_lock_unlock(&nanozone->band_resupply_lock[mag_index]);
                    continue; // ... the slot was successfully grown by first-taker (not us). Now try again.
                } else if (segregated_band_grow(nanozone, pMeta, slot_bytes, mag_index)) {
                    // A new band was mapped successfully; retry the while loop
                    _malloc_lock_unlock(&nanozone->band_resupply_lock[mag_index]);
                    continue; // ... the slot has been successfully grown by us. Now try again.
                } else {
                    pMeta->slot_exhausted = TRUE;
                    pMeta->slot_bump_addr = theLimit;
                    _malloc_lock_unlock(&nanozone->band_resupply_lock[mag_index]);
                    return 0;
                }
            }
        }
    }
}

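The job of this function is to hand out memory, which makes it important enough to walk through in detail. Loosely speaking it works like taking numbered seats in a row: each caller atomically claims the next seat number. If that seat is still inside the row, it is theirs (allocation succeeds). If it is past the end, the row has to be extended with a new band and the caller tries again, until either a seat is obtained or no more bands can be added (allocation fails).

theLimit = pMeta->slot_limit_addr is the end address of the memory currently available to this pMeta
b = OSAtomicAdd64Barrier(slot_bytes, (volatile int64_t *)&(pMeta->slot_bump_addr)); atomically adds slot_bytes to pMeta->slot_bump_addr, moving it to the next address
b -= slot_bytes; subtracts that offset again to get the address for this allocation; if it is still within range, it is returned with return (void *)b;
Otherwise pMeta->slot_exhausted is checked; if the slot is already exhausted, return 0; signals failure
After taking the lock, if b is now below pMeta->slot_limit_addr, another thread has already grown the slot, so the while loop retries
If it is not, segregated_band_grow(nanozone, pMeta, slot_bytes, mag_index) requests a new band, and on success the while loop retries
If the band request fails, pMeta->slot_exhausted = TRUE; pMeta->slot_bump_addr = theLimit; marks the slot as exhausted and the function returns 0
Locking and unlocking around the resupply keep all of this thread-safe

On the first call to segregated_next_block no band exists yet and nothing is cached either, so segregated_band_grow is called to map a new band.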
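
The fast path of segregated_next_block is a bump-pointer allocation. Here is a minimal sketch of that technique using C11 atomics rather than OSAtomicAdd64Barrier; the sketch_slot_t type and function names are hypothetical, and the lock-and-grow path is reduced to a simple failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pMeta fields the fast path uses. */
typedef struct {
    _Atomic uintptr_t bump_addr; /* next address to hand out */
    uintptr_t limit_addr;        /* first address past the usable region */
} sketch_slot_t;

static void sketch_slot_init(sketch_slot_t *s, void *base, size_t bytes) {
    atomic_init(&s->bump_addr, (uintptr_t)base);
    s->limit_addr = (uintptr_t)base + bytes;
}

static void *sketch_bump_alloc(sketch_slot_t *s, size_t slot_bytes) {
    assert(slot_bytes > 0);
    /*
     * atomic_fetch_add returns the value *before* the addition, which is
     * already the address of this allocation. OSAtomicAdd64Barrier returns
     * the value *after* it, which is why the original subtracts slot_bytes.
     */
    uintptr_t b = atomic_fetch_add(&s->bump_addr, slot_bytes);
    if (b < s->limit_addr) {
        return (void *)b;
    }
    /* Out of space. The real code would take a lock and try to grow the
     * band here; this sketch just clamps the pointer and reports failure. */
    atomic_store(&s->bump_addr, s->limit_addr);
    return NULL;
}
```

Because every thread gets a distinct old value from the atomic add, no two callers can receive the same block, and only the rare grow path needs a lock.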

A closer look at segregated_band_grow

boolean_t
segregated_band_grow(nanozone_t *nanozone, nano_meta_admin_t pMeta, size_t slot_bytes, unsigned int mag_index)
{
    // Union used to compute slot_current_base_addr
    nano_blk_addr_t u; // the compiler holds this in a register
    uintptr_t p, s;
    size_t watermark, hiwater;

    if (0 == pMeta->slot_current_base_addr) { // First encounter?
        // Use nano_blk_addr_t to compute slot_current_base_addr.
        u.fields.nano_signature = NANOZONE_SIGNATURE;
        u.fields.nano_mag_index = mag_index;
        u.fields.nano_band = 0;
        u.fields.nano_slot = (slot_bytes >> SHIFT_NANO_QUANTUM) - 1;
        u.fields.nano_offset = 0;
        
        // Read slot_current_base_addr back from the fields set above
        p = u.addr;
        pMeta->slot_bytes = (unsigned int)slot_bytes;
        pMeta->slot_objects = SLOT_IN_BAND_SIZE / slot_bytes;
    } else {
        p = pMeta->slot_current_base_addr + BAND_SIZE; // Growing, so stride ahead by BAND_SIZE

        u.addr = (uint64_t)p;
        if (0 == u.fields.nano_band) { // Did the band index wrap?
            return FALSE;
        }

        assert(slot_bytes == pMeta->slot_bytes);
    }
    pMeta->slot_current_base_addr = p;
    // BAND_SIZE = 1 << 21 = 2097152 bytes = 2 MB
    mach_vm_address_t vm_addr = p & ~((uintptr_t)(BAND_SIZE - 1)); // Address of the (2MB) band covering this (128KB) slot
    if (nanozone->band_max_mapped_baseaddr[mag_index] < vm_addr) {
    // The highest address mapped so far is still below the target address, so map a new band
#if !NANO_PREALLOCATE_BAND_VM
        // Obtain the next band to cover this slot
        // Path taken on macOS and the simulator
        // Request a new band by calling mach_vm_map, which goes through pmap.
        kern_return_t kr = mach_vm_map(mach_task_self(), &vm_addr, BAND_SIZE, 0, VM_MAKE_TAG(VM_MEMORY_MALLOC_NANO),
                MEMORY_OBJECT_NULL, 0, FALSE, VM_PROT_DEFAULT, VM_PROT_ALL, VM_INHERIT_DEFAULT);

        void *q = (void *)vm_addr;
        if (kr || q != (void *)(p & ~((uintptr_t)(BAND_SIZE - 1)))) { // Must get exactly what we asked for
            if (!kr) {
                mach_vm_deallocate(mach_task_self(), vm_addr, BAND_SIZE);
            }
            return FALSE;
        }
#endif
        nanozone->band_max_mapped_baseaddr[mag_index] = vm_addr;
    }

    // Randomize the starting allocation from this slot (introduces 11 to 14 bits of entropy)
    if (0 == pMeta->slot_objects_mapped) { // First encounter?
        pMeta->slot_objects_skipped = (malloc_entropy[1] % (SLOT_IN_BAND_SIZE / slot_bytes));
        pMeta->slot_bump_addr = p + (pMeta->slot_objects_skipped * slot_bytes);
    } else {
        pMeta->slot_bump_addr = p;
    }

    pMeta->slot_limit_addr = p + (SLOT_IN_BAND_SIZE / slot_bytes) * slot_bytes;
    pMeta->slot_objects_mapped += (SLOT_IN_BAND_SIZE / slot_bytes);

    u.fields.nano_signature = NANOZONE_SIGNATURE;
    u.fields.nano_mag_index = mag_index;
    u.fields.nano_band = 0;
    u.fields.nano_slot = 0;
    u.fields.nano_offset = 0;
    s = u.addr; // Base for this core.

    // Set the high water mark for this CPU's entire magazine, if this resupply raised it.
    watermark = nanozone->core_mapped_size[mag_index];
    hiwater = MAX(watermark, p - s + SLOT_IN_BAND_SIZE);
    nanozone->core_mapped_size[mag_index] = hiwater;

    return TRUE;
}

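nano_blk_addr_t u is the union used to compute slot_current_base_addr
On first use its fields are filled in, and u.addr then gives slot_current_base_addr
If the highest address mapped so far is still below the target address, a new band is mapped
macOS and the simulator take the mach_vm_map path
A new band is requested by calling mach_vm_map, which goes through pmap
In short, when segregated_band_grow is entered because the current band is not enough, mach_vm_map is used, by way of pmap, to map physical memory into a fresh range of virtual memory.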
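
The address arithmetic in segregated_band_grow can be illustrated with plain shifts and masks. This sketch takes the 2 MB band and 128 KB slot sizes from the source comments above; the function names and the region base used in the usage note are hypothetical, and the real code derives addresses from the nano_blk_addr_t bit-fields rather than from a helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BAND_SIZE         ((uint64_t)1 << 21) /* 2 MB band */
#define SKETCH_SLOT_IN_BAND_SIZE ((uint64_t)1 << 17) /* 128 KB per slot in a band */

/* Start of a given slot inside a given band, relative to a region base
 * that is assumed to be aligned to SKETCH_BAND_SIZE. */
static uint64_t sketch_slot_base(uint64_t region_base, unsigned band, unsigned slot) {
    return region_base
         + (uint64_t)band * SKETCH_BAND_SIZE
         + (uint64_t)slot * SKETCH_SLOT_IN_BAND_SIZE;
}

/* Same masking as `p & ~((uintptr_t)(BAND_SIZE - 1))` in the listing:
 * clear the low 21 bits to find the band that covers address p. */
static uint64_t sketch_band_base(uint64_t p) {
    return p & ~(SKETCH_BAND_SIZE - 1);
}

/* Number of whole blocks that fit in one slot, as in slot_objects. */
static uint64_t sketch_objects_per_slot(size_t slot_bytes) {
    assert(slot_bytes != 0);
    return SKETCH_SLOT_IN_BAND_SIZE / slot_bytes;
}
```

With a made-up base of 0x600000000000, slot 3 of band 0 starts 0x60000 bytes in, and masking that address recovers the band base again. Growing by BAND_SIZE moves to the same slot in the next band, which matches the `slot_current_base_addr + BAND_SIZE` step in the listing.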

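That completes the main malloc flow. Two functions were left unexplored along the way, so let's look at them now.
segregated_size_to_fit: the 16-byte alignment algorithm

//The two macro definitions are included here as well, since the analysis below uses them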
#define SHIFT_NANO_QUANTUM      4
#define NANO_REGIME_QUANTA_SIZE (1 << SHIFT_NANO_QUANTUM)   // 16

static MALLOC_INLINE size_t
segregated_size_to_fit(nanozone_t *nanozone, size_t size, size_t *pKey)
{
    size_t k, slot_bytes;

    if (0 == size) {
        size = NANO_REGIME_QUANTA_SIZE; // Historical behavior
    }
    k = (size + NANO_REGIME_QUANTA_SIZE - 1) >> SHIFT_NANO_QUANTUM; // round up and shift for number of quanta
    slot_bytes = k << SHIFT_NANO_QUANTUM;                           // multiply by power of two quanta size
    *pKey = k - 1;                                                  // Zero-based!

    return slot_bytes;
}

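Start with the macros above: SHIFT_NANO_QUANTUM is 4, and NANO_REGIME_QUANTA_SIZE is (1 << SHIFT_NANO_QUANTUM), that is 1 << 4, which equals 16.
if (0 == size) { size = NANO_REGIME_QUANTA_SIZE; // Historical behavior }
If the size passed in is 0, it is set to 16.
k = (size + NANO_REGIME_QUANTA_SIZE - 1) >> SHIFT_NANO_QUANTUM;
Substituting the values gives k = (size + 15) >> 4;
slot_bytes = k << SHIFT_NANO_QUANTUM;
Substituting the values gives slot_bytes = k << 4;
In other words, (size + 15) is shifted right by 4 bits and then left by 4 bits, which clears its low four bits. Adding 15 first means that any size that is not already a multiple of 16 carries over into the next multiple, while exact multiples stay unchanged. The net effect is that size is rounded up to the nearest multiple of 16.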
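
A few concrete values make the rounding easier to follow. The sketch below repeats the same arithmetic in a standalone function; sketch_size_to_fit and the SKETCH_ macros are hypothetical names, and the nanozone parameter of the original is dropped because the calculation does not use it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SHIFT   4
#define SKETCH_QUANTUM ((size_t)1 << SKETCH_SHIFT) /* 16 */

/* Round size up to a multiple of 16 and report the zero-based slot key. */
static size_t sketch_size_to_fit(size_t size, size_t *key) {
    assert(key != NULL);
    if (size == 0) {
        size = SKETCH_QUANTUM; /* a zero-byte request still gets one quantum */
    }
    size_t k = (size + SKETCH_QUANTUM - 1) >> SKETCH_SHIFT; /* number of quanta */
    *key = k - 1;                                           /* zero-based index */
    return k << SKETCH_SHIFT;                               /* back to bytes */
}
```

For example, a request for 40 bytes gives k = (40 + 15) >> 4 = 3, so slot_bytes is 48 and the key is 2; a request for exactly 16 bytes stays at 16 with key 0.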

A closer look at _malloc_initialize

static void
_malloc_initialize(void *context __unused)
{
    // ... unrelated code omitted ...
    // Create helper_zone
    malloc_zone_t *helper_zone = create_scalable_zone(0, malloc_debug_flags);
    // Create the nano zone
    if (_malloc_engaged_nano == NANO_V2) {
        zone = nanov2_create_zone(helper_zone, malloc_debug_flags);
    } else if (_malloc_engaged_nano == NANO_V1) {
        zone = nano_create_zone(helper_zone, malloc_debug_flags);
    }
    // If one of the branches above was taken, use the nano zone
    if (zone) {
        malloc_zone_register_while_locked(zone);
        malloc_zone_register_while_locked(helper_zone);

        // Must call malloc_set_zone_name() *after* helper and nano are hooked together.
        malloc_set_zone_name(zone, DEFAULT_MALLOC_ZONE_STRING);
        malloc_set_zone_name(helper_zone, MALLOC_HELPER_ZONE_STRING);
    } else {
        // Otherwise allocate through helper_zone
        zone = helper_zone;
        malloc_zone_register_while_locked(zone);
        malloc_set_zone_name(zone, DEFAULT_MALLOC_ZONE_STRING);
    }
    // Cache the default zone
    initial_default_zone = zone;
    // ...
}

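This function is fairly simple:

Create helper_zone
Create the nano zone
If one of the if / else if branches above was taken, use the nano zone
Otherwise use helper_zone to allocate memory
Cache the default zone

That's a wrap!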

Reference: iOS 高级之美（六）: malloc分析 ("The Beauty of Advanced iOS, Part 6: malloc Analysis")
