iOS Development: runtime (14): markgc.cpp Source Analysis

Author: kyson老师 | Published 2019-01-12 21:15


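This blog series is my set of notes from reading the source code. If you are an iOS developer who is also studying the runtime, I would be glad to compare notes. To make discussion easier I have set up a WeChat group (iOS技术讨论群, an iOS technical discussion group). To join, add me on WeChat: zhujinhui207407 (please include the note "ios" when you add me). My blog at http://www.kyson.cn is also updated regularly, and discussion is welcome there too.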


The full version of this article is available in my 小专栏 column: https://xiaozhuanlan.com/runtime

Background

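In the previous article we looked at the role markgc.cpp plays when the runtime is built: it renames the __mod_init_func section to __objc_init_func. We did not, however, look at how the file actually performs that rename. That is what we analyze today.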

[Figure: markgc.cpp]

After reading this article you will know:

how to set section data
how to read section data
how to set a section name

Analysis

As noted in earlier articles, sections live inside segments. Let's look at the layout of the runtime library once more in MachOView:

[Figure: layout of the runtime library file]

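So, to rename the __mod_init_func section to __objc_init_func, the plan is to walk the file until we find __mod_init_func and then change the section name with set_sectname.
With that plan in mind, let's read markgc.cpp from top to bottom.
First, the main function: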

int main(int argc, const char *argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (!processFile(argv[i])) return 1;
    }
    return 0;
}

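As the previous article showed, the arguments are the file names (with paths) of the library files.
Each file is handed to processFile, which, as the name says, processes it.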

bool processFile(const char *filename)
{
    if (debug) printf("file %s\n", filename);
    // open the file for reading and writing
    int fd = open(filename, O_RDWR);
    if (fd < 0) {
        printf("open %s: %s\n", filename, strerror(errno));
        return false;
    }
    
    struct stat st;
    // get the file status (we need the size)
    if (fstat(fd, &st) < 0) {
        printf("fstat %s: %s\n", filename, strerror(errno));
        return false;
    }
    // map the file into memory for processing
    void *buffer = mmap(NULL, (size_t)st.st_size, PROT_READ|PROT_WRITE, 
                        MAP_FILE|MAP_SHARED, fd, 0);
    if (buffer == MAP_FAILED) {
        printf("mmap %s: %s\n", filename, strerror(errno));
        return false;
    }
    // process the mapped file
    bool result = parse_fat((uint8_t *)buffer, (size_t)st.st_size);
    // remove the mapping
    munmap(buffer, (size_t)st.st_size);
    close(fd);
    return result;
}

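The comments I added above give the gist: the compiled runtime library is mapped into memory, processed there, and unmapped afterwards. As for why the file is mapped rather than read, one reason I know of is that with a MAP_SHARED mapping, reads and writes to the mapped region are reads and writes to the file's contents, so there is no need to read the file, modify a copy, and save it back.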
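To see the write-through behaviour in isolation, here is a minimal sketch that patches the first bytes of a file through a shared mapping and then reads the file back with ordinary I/O. The helper names (patchThroughMmap, makeTempFile, readFile) and the /tmp path are made up for this illustration and are not part of markgc.cpp. Unlike the quoted processFile, the sketch also closes the descriptor on the error paths.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: overwrite the first `len` bytes of a file in place
// through a shared mapping. Returns false on any error.
bool patchThroughMmap(const char *path, const char *bytes, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0 || (size_t)st.st_size < len) {
        close(fd);
        return false;
    }

    // MAP_SHARED is what makes stores reach the file; MAP_PRIVATE would
    // give us a copy-on-write view instead.
    void *buffer = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (buffer == MAP_FAILED) {
        close(fd);
        return false;
    }

    memcpy(buffer, bytes, len);           // a plain store into the mapping
    munmap(buffer, (size_t)st.st_size);   // no explicit write() is needed
    close(fd);
    return true;
}

// Hypothetical demo helper: create a temp file holding `content` and
// return its path (empty string on failure).
std::string makeTempFile(const std::string &content)
{
    char path[] = "/tmp/markgc_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return "";
    ssize_t n = write(fd, content.data(), content.size());
    close(fd);
    return n == (ssize_t)content.size() ? std::string(path) : "";
}

// Hypothetical demo helper: read a whole file back with stdio.
std::string readFile(const std::string &path)
{
    std::string out;
    FILE *f = fopen(path.c_str(), "rb");
    if (!f) return out;
    char buf[256];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) out.append(buf, n);
    fclose(f);
    return out;
}
```

Creating a file with makeTempFile("hello world"), calling patchThroughMmap on it with "HELLO" and a length of 5, and then calling readFile should show the patched bytes, even though the code never calls write() on the mapped data.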
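One call in processFile does the actual processing:

    bool result = parse_fat((uint8_t *)buffer, (size_t)st.st_size);

As the function name suggests, parse_fat parses fat files. As mentioned before, a fat file bundles one slice per supported processor architecture (mainly i386 and arm). Let's step into the function: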

bool parse_fat(uint8_t *buffer, size_t size)
{
    uint32_t magic;

    if (size < sizeof(magic)) {
        printf("file is too small\n");
        return false;
    }

    magic = *(uint32_t *)buffer;
    if (magic != FAT_MAGIC && magic != FAT_CIGAM) {
        /* Not a fat file */
        return parse_macho(buffer);
    } else {
        struct fat_header *fh;
        uint32_t fat_magic, fat_nfat_arch;
        struct fat_arch *archs;
        
        if (size < sizeof(struct fat_header)) {
            printf("file is too small\n");
            return false;
        }

        fh = (struct fat_header *)buffer;
        fat_magic = OSSwapBigToHostInt32(fh->magic);
        fat_nfat_arch = OSSwapBigToHostInt32(fh->nfat_arch);

        if (size < (sizeof(struct fat_header) + fat_nfat_arch * sizeof(struct fat_arch))) {
            printf("file is too small\n");
            return false;
        }

        archs = (struct fat_arch *)(buffer + sizeof(struct fat_header));

        /* Special case hidden CPU_TYPE_ARM64 */
        if (size >= (sizeof(struct fat_header) + (fat_nfat_arch + 1) * sizeof(struct fat_arch))) {
            if (fat_nfat_arch > 0
                && OSSwapBigToHostInt32(archs[fat_nfat_arch].cputype) == CPU_TYPE_ARM64) {
                fat_nfat_arch++;
            }
        }
        /* End special case hidden CPU_TYPE_ARM64 */

        if (debug) printf("%d fat architectures\n", 
                          fat_nfat_arch);

        for (uint32_t i = 0; i < fat_nfat_arch; i++) {
            uint32_t arch_cputype = OSSwapBigToHostInt32(archs[i].cputype);
            uint32_t arch_cpusubtype = OSSwapBigToHostInt32(archs[i].cpusubtype);
            uint32_t arch_offset = OSSwapBigToHostInt32(archs[i].offset);
            uint32_t arch_size = OSSwapBigToHostInt32(archs[i].size);

            if (debug) printf("cputype %d cpusubtype %d\n", 
                              arch_cputype, arch_cpusubtype);

            /* Check that slice data is after all fat headers and archs */
            if (arch_offset < (sizeof(struct fat_header) + fat_nfat_arch * sizeof(struct fat_arch))) {
                printf("file is badly formed\n");
                return false;
            }

            /* Check that the slice ends before the file does */
            if (arch_offset > size) {
                printf("file is badly formed\n");
                return false;
            }

            if (arch_size > size) {
                printf("file is badly formed\n");
                return false;
            }

            if (arch_offset > (size - arch_size)) {
                printf("file is badly formed\n");
                return false;
            }

            bool ok = parse_macho(buffer + arch_offset);
            if (!ok) return false;
        }
        return true;
    }
}
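The fat layout that parse_fat walks can be tried out without a real binary. The sketch below builds a synthetic fat header in memory and lists its slices using the same bounds checks. The constant and the field layout follow <mach-o/fat.h> but are redefined here so the code builds on any platform. Slice, listSlices and makeFat are names invented for this example. Instead of OSSwapBigToHostInt32 it reads the big-endian fields byte by byte, which is an assumption-free way to get the same values on any host. It does not handle the hidden ARM64 special case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static const uint32_t kFatMagic = 0xcafebabe;  // FAT_MAGIC
static const size_t kFatHeaderSize = 8;        // magic, nfat_arch
static const size_t kFatArchSize = 20;         // cputype, cpusubtype, offset, size, align

// Hypothetical summary of one architecture slice.
struct Slice { uint32_t cputype, offset, size; };

// Fat headers are stored big-endian whatever the host byte order is.
static uint32_t readBE32(const uint8_t *p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

static void writeBE32(std::vector<uint8_t> &out, uint32_t v)
{
    out.push_back(uint8_t(v >> 24));
    out.push_back(uint8_t(v >> 16));
    out.push_back(uint8_t(v >> 8));
    out.push_back(uint8_t(v));
}

// Validate the header and arch table, then collect the slices.
bool listSlices(const uint8_t *buf, size_t size, std::vector<Slice> &out)
{
    if (size < kFatHeaderSize) return false;
    if (readBE32(buf) != kFatMagic) return false;

    uint32_t narch = readBE32(buf + 4);
    // Divide rather than multiply so a huge narch cannot overflow.
    if (narch > (size - kFatHeaderSize) / kFatArchSize) return false;
    size_t tableEnd = kFatHeaderSize + size_t(narch) * kFatArchSize;

    for (uint32_t i = 0; i < narch; i++) {
        const uint8_t *a = buf + kFatHeaderSize + size_t(i) * kFatArchSize;
        Slice s = { readBE32(a), readBE32(a + 8), readBE32(a + 12) };
        // Slice data must start after the arch table...
        if (s.offset < tableEnd) return false;
        // ...and end before the file does.
        if (s.size > size || s.offset > size - s.size) return false;
        out.push_back(s);
    }
    return true;
}

// Build a synthetic fat file of `totalSize` bytes (demo helper).
std::vector<uint8_t> makeFat(const std::vector<Slice> &slices, size_t totalSize)
{
    std::vector<uint8_t> out;
    writeBE32(out, kFatMagic);
    writeBE32(out, uint32_t(slices.size()));
    for (const Slice &s : slices) {
        writeBE32(out, s.cputype);
        writeBE32(out, 0);           // cpusubtype, unused here
        writeBE32(out, s.offset);
        writeBE32(out, s.size);
        writeBE32(out, 0);           // align, unused here
    }
    if (out.size() < totalSize) out.resize(totalSize);
    return out;
}
```

Writing the last check as offset > size - size_of_slice, after first confirming that the slice size does not exceed the file size, is the same trick the original uses to avoid an overflowing addition.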

Most of parse_fat is validation. The call that does the real work is:

bool ok = parse_macho(buffer + arch_offset);

Having dealt with the fat wrapper, we now parse the Mach-O file itself:

template<typename P>
bool parse_macho(uint8_t *buffer)
{
    macho_header<P>* mh = (macho_header<P>*)buffer;
    uint8_t *cmds = (uint8_t *)(mh + 1);
    for (uint32_t c = 0; c < mh->ncmds(); c++) {
        macho_load_command<P>* cmd = (macho_load_command<P>*)cmds;
        cmds += cmd->cmdsize();
        if (cmd->cmd() == LC_SEGMENT  ||  cmd->cmd() == LC_SEGMENT_64) {
            doseg(buffer, (macho_segment_command<P>*)cmd);
        }
    }

    return true;
}
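Note that parse_fat calls parse_macho(buffer) with no template argument, while the definition quoted here is a template over P, so something has to choose P first. The excerpt does not show that step. A plausible way to do it, sketched below, is to switch on the magic number at the start of the Mach-O header. The magic values are the ones from <mach-o/loader.h>, while Layout, MachoKind, parseAs and parseAny are stand-ins invented for this example, not the types markgc.cpp uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Values from <mach-o/loader.h>, redefined so the sketch builds anywhere.
static const uint32_t kMagic32 = 0xfeedface;  // MH_MAGIC
static const uint32_t kCigam32 = 0xcefaedfe;  // MH_CIGAM (byte-swapped)
static const uint32_t kMagic64 = 0xfeedfacf;  // MH_MAGIC_64
static const uint32_t kCigam64 = 0xcffaedfe;  // MH_CIGAM_64 (byte-swapped)

// Hypothetical policy type standing in for the template parameter P:
// it records pointer width and whether fields need byte swapping.
template <int Bits, bool Swapped>
struct Layout {
    static constexpr int bits = Bits;
    static constexpr bool swapped = Swapped;
};

// What the demo "parser" reports back.
struct MachoKind { bool valid; int pointerBits; bool byteSwapped; };

// Stand-in for the templated parser: it only reports which layout it
// was instantiated with.
template <typename P>
MachoKind parseAs(const uint8_t *buffer)
{
    (void)buffer;
    return MachoKind{true, P::bits, P::swapped};
}

// Non-template entry point: read the magic in host byte order and pick
// the matching instantiation.
MachoKind parseAny(const uint8_t *buffer)
{
    uint32_t magic;
    memcpy(&magic, buffer, sizeof(magic));  // avoids an unaligned load
    switch (magic) {
    case kMagic64: return parseAs<Layout<64, false>>(buffer);
    case kMagic32: return parseAs<Layout<32, false>>(buffer);
    case kCigam64: return parseAs<Layout<64, true>>(buffer);
    case kCigam32: return parseAs<Layout<32, true>>(buffer);
    default:       return MachoKind{false, 0, false};  // not Mach-O
    }
}
```

With a dispatch like this, the same walking code serves 32-bit and 64-bit files in either byte order, because every field access goes through accessors such as ncmds() and cmdsize() supplied by the chosen layout.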

As the previous article showed, the ncmds field of the mach_header structure gives the number of load commands. MachOView displays the same value:

[Figure: mach_header fields shown in MachOView]
So parse_macho simply walks every load command and, for each LC_SEGMENT or LC_SEGMENT_64 command, calls

doseg(buffer, (macho_segment_command<P>*)cmd);

which looks like this:

template <typename P>
void doseg(uint8_t *start, macho_segment_command<P> *seg)
{
    if (debug) printf("segment name: %.16s, nsects %u\n",
                      seg->segname(), seg->nsects());
    macho_section<P> *sect = (macho_section<P> *)(seg + 1);
    for (uint32_t i = 0; i < seg->nsects(); ++i) {
        dosect(start, &sect[i]);
    }
}
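The two idioms used by parse_macho and doseg, (mh + 1) for "the first command sits right after the header" and (seg + 1) for "the sections sit right after the segment command", can be exercised on a small synthetic buffer. The structures below are deliberately simplified stand-ins with only the fields the walk needs. They are NOT the real Mach-O layouts, and every name in the sketch is invented for illustration. Like the original, the walk trusts cmdsize and nsects and does no bounds checking.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins (not the real Mach-O structures).
struct MiniHeader  { uint32_t ncmds; };
struct MiniCommand { uint32_t cmd; uint32_t cmdsize; };
struct MiniSegment { uint32_t cmd; uint32_t cmdsize; char segname[16]; uint32_t nsects; };
struct MiniSection { char sectname[16]; char segname[16]; uint32_t flags; };

static const uint32_t kSegmentCmd = 0x19;  // value of LC_SEGMENT_64
static const uint32_t kOtherCmd = 0x2;     // value of LC_SYMTAB, used as filler

template <typename T>
static T readAt(const std::vector<uint8_t> &buf, size_t off)
{
    T value;
    memcpy(&value, buf.data() + off, sizeof(T));
    return value;
}

template <typename T>
static void append(std::vector<uint8_t> &buf, const T &value)
{
    const uint8_t *p = reinterpret_cast<const uint8_t *>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Fill a fixed 16-byte name field; a 16-character name gets no NUL.
static void setName(char *field, const char *name)
{
    memset(field, 0, 16);
    memcpy(field, name, strnlen(name, 16));
}

// Collect "segment,section" names in file order.
std::vector<std::string> listSections(const std::vector<uint8_t> &buf)
{
    std::vector<std::string> names;
    MiniHeader mh = readAt<MiniHeader>(buf, 0);
    size_t cmdOff = sizeof(MiniHeader);                     // like (mh + 1)
    for (uint32_t c = 0; c < mh.ncmds; c++) {
        MiniCommand cmd = readAt<MiniCommand>(buf, cmdOff);
        if (cmd.cmd == kSegmentCmd) {
            MiniSegment seg = readAt<MiniSegment>(buf, cmdOff);
            size_t sectOff = cmdOff + sizeof(MiniSegment);  // like (seg + 1)
            for (uint32_t i = 0; i < seg.nsects; i++) {
                MiniSection s = readAt<MiniSection>(buf, sectOff + i * sizeof(MiniSection));
                names.push_back(std::string(s.segname, strnlen(s.segname, 16)) + "," +
                                std::string(s.sectname, strnlen(s.sectname, 16)));
            }
        }
        cmdOff += cmd.cmdsize;  // cmdsize covers the command plus its sections
    }
    return names;
}

// Demo image: one filler command, then a __DATA segment with two sections.
std::vector<uint8_t> makeImage()
{
    std::vector<uint8_t> buf;
    append(buf, MiniHeader{2});
    append(buf, MiniCommand{kOtherCmd, sizeof(MiniCommand)});
    MiniSegment seg = {};
    seg.cmd = kSegmentCmd;
    seg.cmdsize = sizeof(MiniSegment) + 2 * sizeof(MiniSection);
    setName(seg.segname, "__DATA");
    seg.nsects = 2;
    append(buf, seg);
    const char *sectNames[] = {"__data", "__mod_init_func"};
    for (const char *name : sectNames) {
        MiniSection s = {};
        setName(s.sectname, name);
        setName(s.segname, "__DATA");
        append(buf, s);
    }
    return buf;
}
```

Calling listSections(makeImage()) skips the filler command by its cmdsize and reports the two sections of the __DATA segment in order.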

doseg in turn iterates over all the sections of the segment and calls

dosect(start, &sect[i]);

Here we finally reach the function that renames the section, as we expected:

template <typename P>
void dosect(uint8_t *start, macho_section<P> *sect)
{
    if (debug) printf("section %.16s from segment %.16s\n",
                      sect->sectname(), sect->segname());

    // Strip S_MOD_INIT/TERM_FUNC_POINTERS. We don't want dyld to call 
    // our init funcs because it is too late, and we don't want anyone to 
    // call our term funcs ever.
    if (segnameStartsWith(sect->segname(), "__DATA")  &&  
        sectnameEquals(sect->sectname(), "__mod_init_func"))
    {
        // section type 0 is S_REGULAR
        sect->set_flags(sect->flags() & ~SECTION_TYPE);
        sect->set_sectname("__objc_init_func");
        if (debug) printf("disabled __mod_init_func section\n");
    }
    if (segnameStartsWith(sect->segname(), "__DATA")  &&  
        sectnameEquals(sect->sectname(), "__mod_term_func"))
    {
        // section type 0 is S_REGULAR
        sect->set_flags(sect->flags() & ~SECTION_TYPE);
        sect->set_sectname("__objc_term_func");
        if (debug) printf("disabled __mod_term_func section\n");
    }
}
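Two details of dosect are easy to miss: the section name is a fixed 16-byte field, and "__objc_init_func" is exactly 16 characters long, so the new name is stored with no NUL terminator (which is why the debug output uses %.16s). The sketch below reproduces the rename and the flag masking on a simplified section record. The flag values are those of <mach-o/loader.h>, while MiniSection, nameEquals, setName and disableInitSection are names made up for this example, and the comparison helpers are my assumption of how segnameStartsWith and sectnameEquals behave.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Values from <mach-o/loader.h>, redefined so the sketch builds anywhere.
static const uint32_t kSectionType = 0x000000ff;   // SECTION_TYPE mask
static const uint32_t kModInitFuncPointers = 0x9;  // S_MOD_INIT_FUNC_POINTERS
static const uint32_t kRegular = 0x0;              // S_REGULAR

// Hypothetical, simplified stand-in for a Mach-O section header.
struct MiniSection {
    char sectname[16];
    char segname[16];
    uint32_t flags;
};

// Compare a fixed 16-byte field that may lack a NUL terminator.
bool nameEquals(const char *field, const char *name)
{
    return strncmp(field, name, 16) == 0;
}

// Zero-fill the field, then copy at most 16 bytes into it.
void setName(char *field, const char *name)
{
    memset(field, 0, 16);
    memcpy(field, name, strnlen(name, 16));
}

// Mirror of the __mod_init_func branch of dosect. Returns true if the
// section was rewritten.
bool disableInitSection(MiniSection &sect)
{
    if (strncmp(sect.segname, "__DATA", 6) != 0) return false;  // prefix match
    if (!nameEquals(sect.sectname, "__mod_init_func")) return false;

    // Clear only the low 8 type bits; attribute bits above them survive.
    sect.flags &= ~kSectionType;
    setName(sect.sectname, "__objc_init_func");
    return true;
}
```

After the call, the low byte of flags is 0 (S_REGULAR) instead of S_MOD_INIT_FUNC_POINTERS, so the section is ordinary data as far as the loader is concerned. A second call on the same record returns false, because the name no longer matches.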

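That completes the analysis. To sum up: for a __mod_init_func or __mod_term_func section in a segment whose name starts with __DATA, dosect clears the section-type bits in flags, leaving type 0 (S_REGULAR), and renames the section to __objc_init_func or __objc_term_func. As the comment in the source explains, the goal is that dyld does not call the runtime's init functions itself and that nobody ever calls its term functions.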