13. Mach-O Files


Author: 白马啸红中 | Published 2021-03-01 14:57

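This post is fairly long. The earlier dyld article needs this background, and later posts will analyze this file format as well. Folding it into the "loose ends" notes would have made them far too long, so it gets its own article.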

1. Terminology

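Baidu's official definition: Mach-O is short for the Mach Object file format. It is a file format used for executables, object code, dynamic libraries, and core dumps. As a replacement for the a.out format, Mach-O offers greater extensibility and faster access to the information in the symbol table.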

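The community's definition: Mach-O is the executable file format on the Mac, comparable to PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux. The C, C++, Swift, and Objective-C code we write is ultimately compiled and linked into Mach-O executables.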

Keywords, in my own words: binary file; executable file; produced by compiling and linking; the final form of library files.

2. Mach-O File Types

#define MH_OBJECT   0x1     /* relocatable object file  (.o file) */
#define MH_EXECUTE  0x2     /* demand paged executable file  (executable paged in on demand) */
#define MH_FVMLIB   0x3     /* fixed VM shared library file  (shared library at a fixed virtual address) */
#define MH_CORE     0x4     /* core file  (core dump) */
#define MH_PRELOAD  0x5     /* preloaded executable file */
#define MH_DYLIB    0x6     /* dynamically bound shared library  (dynamic library) */
#define MH_DYLINKER 0x7     /* dynamic link editor  (the dynamic linker, dyld) */
#define MH_BUNDLE   0x8     /* dynamically bound bundle file  (bundle/plug-in loaded at runtime) */
#define MH_DYLIB_STUB   0x9     /* shared library stub for static
                       linking only, no section contents */
#define MH_DSYM     0xa     /* companion file with only debug
                       sections  (e.g. the Mach-O inside a .dSYM) */
#define MH_KEXT_BUNDLE  0xb     /* x86_64 kexts  (kernel extensions) */
#define MH_FILESET  0xc     /* a file composed of other Mach-Os to
                       be run in the same userspace sharing
                       a single linkedit. */
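As a small sketch, the `filetype` values above can be turned into readable names with a lookup table. `mh_filetype_name` is a hypothetical helper, not an Apple API, and it only knows the 12 values listed here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a Mach-O filetype (the MH_* values above)
 * to its name. Returns NULL for values not in the list. */
const char *mh_filetype_name(uint32_t filetype)
{
    static const char *const names[] = {
        NULL,             /* 0x0 is not a valid filetype */
        "MH_OBJECT",      /* 0x1 */
        "MH_EXECUTE",     /* 0x2 */
        "MH_FVMLIB",      /* 0x3 */
        "MH_CORE",        /* 0x4 */
        "MH_PRELOAD",     /* 0x5 */
        "MH_DYLIB",       /* 0x6 */
        "MH_DYLINKER",    /* 0x7 */
        "MH_BUNDLE",      /* 0x8 */
        "MH_DYLIB_STUB",  /* 0x9 */
        "MH_DSYM",        /* 0xa */
        "MH_KEXT_BUNDLE", /* 0xb */
        "MH_FILESET",     /* 0xc */
    };
    if (filetype >= sizeof names / sizeof names[0])
        return NULL;
    return names[filetype];
}
```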

3. Mach-O File Format

[Figure: Mach-O file format]

Header:

/*
 * Only the 64-bit header (used by arm64 and x86_64) is shown here.
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier  (CPU architecture) */
    cpu_subtype_t   cpusubtype; /* machine specifier  (CPU subtype/variant) */
    uint32_t    filetype;   /* type of file  (one of the 12 MH_* types above) */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags  (used by dyld when loading; defined in the same header as the filetype constants) */
    uint32_t    reserved;   /* reserved  (64-bit only) */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
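A minimal sketch of reading this header from the start of a thin (non-fat) 64-bit Mach-O image that is already in memory. `mh64_t` and `read_mach_header_64` are hypothetical local names that mirror `mach_header_64`, so the example also compiles where `<mach-o/loader.h>` is unavailable. The magic tells us the byte order: `MH_MAGIC_64` means the file matches the host, `MH_CIGAM_64` means every field must be byte-swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of mach_header_64. cpu_type_t/cpu_subtype_t are 32-bit
 * integers in the real header; uint32_t is used here for simplicity. */
typedef struct {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
} mh64_t;

#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00u) | ((v << 8) & 0xff0000u) | (v << 24);
}

/* Copy the header out of buf. Returns 0 on success, -1 if buf is too
 * short or the magic is not a 64-bit Mach-O magic. */
int read_mach_header_64(const uint8_t *buf, size_t len, mh64_t *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);          /* memcpy avoids unaligned reads */
    if (out->magic == MH_MAGIC_64)
        return 0;                           /* same byte order as the host */
    if (out->magic != MH_CIGAM_64)
        return -1;                          /* 32-bit, fat/universal, or not Mach-O */
    /* opposite byte order: swap every 32-bit field into host order */
    out->magic      = bswap32(out->magic);
    out->cputype    = bswap32(out->cputype);
    out->cpusubtype = bswap32(out->cpusubtype);
    out->filetype   = bswap32(out->filetype);
    out->ncmds      = bswap32(out->ncmds);
    out->sizeofcmds = bswap32(out->sizeofcmds);
    out->flags      = bswap32(out->flags);
    out->reserved   = bswap32(out->reserved);
    return 0;
}
```

After a successful read, `ncmds` and `sizeofcmds` are exactly what you need to walk the load commands that follow.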

Load commands:

/*
 * The load commands directly follow the mach_header.  The total size of all
 * of the commands is given by the sizeofcmds field in the mach_header.  All
 * load commands must have as their first two fields cmd and cmdsize.  The cmd
 * field is filled in with a constant for that command type.  Each command type
 * has a structure specifically for it.  The cmdsize field is the size in bytes
 * of the particular load command structure plus anything that follows it that
 * is a part of the load command (i.e. section structures, strings, etc.).  To
 * advance to the next load command the cmdsize can be added to the offset or
 * pointer of the current load command.  The cmdsize for 32-bit architectures
 * MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple
 * of 8 bytes (these are forever the maximum alignment of any load commands).
 * The padded bytes must be zero.  All tables in the object file must also
 * follow these rules so the file can be memory mapped.  Otherwise the pointers
 * to these tables will not work well or at all on some machines.  With all
 * padding zeroed like objects will compare byte for byte.
 */
struct load_command {
    uint32_t cmd;       /* type of load command */
    uint32_t cmdsize;   /* total size of command in bytes */
};
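The rule in that comment (start right after the header, then add `cmdsize` to reach the next command) is the whole traversal algorithm. A sketch, assuming a thin 64-bit file whose bytes are already in host byte order; `find_load_command` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH64_HEADER_SIZE 32u   /* sizeof(struct mach_header_64) */

/* Return the file offset of the first load command whose cmd equals
 * want, or -1 if it is absent or the command area is malformed.
 * ncmds and sizeofcmds come from the mach_header_64. */
long find_load_command(const uint8_t *buf, size_t len,
                       uint32_t ncmds, uint32_t sizeofcmds, uint32_t want)
{
    size_t off = MH64_HEADER_SIZE;
    size_t end = MH64_HEADER_SIZE + (size_t)sizeofcmds;
    if (end > len)
        return -1;
    for (uint32_t i = 0; i < ncmds; i++) {
        uint32_t lc[2];                    /* { cmd, cmdsize } */
        if (end - off < sizeof lc)
            return -1;
        memcpy(lc, buf + off, sizeof lc);
        /* on 64-bit, cmdsize must be a multiple of 8 and cover at
         * least cmd + cmdsize themselves */
        if (lc[1] < sizeof lc || lc[1] % 8 != 0 || lc[1] > end - off)
            return -1;
        if (lc[0] == want)
            return (long)off;
        off += lc[1];                      /* advance to the next command */
    }
    return -1;
}
```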

Let's verify this by opening JSONModel's Mach-O file in MachOView:

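[Figure: JSONModel's Mach-O file]
Because it was run on the simulator, the binary is x86_64. Expanding Load Commands and selecting one entry shows its cmd and cmdsize, here LC_ID_DYLIB and 64 respectively. The header fields can be checked the same way.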

Values of cmd:

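#define LC_SEGMENT  0x1 /* segment of this file to be mapped  (defines a segment, including its sections, that is mapped into the process's address space at load time) */
#define LC_SYMTAB   0x2 /* link-edit stab symbol table info  (locates the symbol table and string table; used by dyld when linking and by debuggers to map symbols back to source files. Local symbols are for debugging only; defined and undefined external symbols are used by the linker) */
#define LC_SYMSEG   0x3 /* link-edit gdb symbol table info (obsolete) */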
#define LC_THREAD   0x4 /* thread */
#define LC_UNIXTHREAD   0x5 /* unix thread (includes a stack) */
#define LC_LOADFVMLIB   0x6 /* load a specified fixed VM shared library */
#define LC_IDFVMLIB 0x7 /* fixed VM shared library identification */
#define LC_IDENT    0x8 /* object identification info (obsolete) */
#define LC_FVMFILE  0x9 /* fixed VM file inclusion (internal use) */
#define LC_PREPAGE      0xa     /* prepage command (internal use) */
#define LC_DYSYMTAB 0xb /* dynamic link-edit symbol table info  (extra information about the symbols in the symbol table, for dyld) */
#define LC_LOAD_DYLIB   0xc /* load a dynamically linked shared library  (a dependent dylib, with its name, version, etc.) */
#define LC_ID_DYLIB 0xd /* dynamically linked shared lib ident */
#define LC_LOAD_DYLINKER 0xe    /* load a dynamic linker  (path of dyld, normally /usr/lib/dyld) */
#define LC_ID_DYLINKER  0xf /* dynamic linker identification */
#define LC_PREBOUND_DYLIB 0x10  /* modules prebound for a dynamically */
                /*  linked shared library */
#define LC_ROUTINES 0x11    /* image routines */
#define LC_SUB_FRAMEWORK 0x12   /* sub framework */
#define LC_SUB_UMBRELLA 0x13    /* sub umbrella */
#define LC_SUB_CLIENT   0x14    /* sub client */
#define LC_SUB_LIBRARY  0x15    /* sub library */
#define LC_TWOLEVEL_HINTS 0x16  /* two-level namespace lookup hints */
#define LC_PREBIND_CKSUM  0x17  /* prebind checksum */
/*
 * load a dynamically linked shared library that is allowed to be missing
 * (all symbols are weak imported).
 */
#define LC_LOAD_WEAK_DYLIB (0x18 | LC_REQ_DYLD)

#define LC_SEGMENT_64   0x19    /* 64-bit segment of this file to be
                   mapped  (64-bit counterpart of LC_SEGMENT, including its sections) */
#define LC_ROUTINES_64  0x1a    /* 64-bit image routines */
#define LC_UUID     0x1b    /* the uuid  (unique ID of this Mach-O) */
#define LC_RPATH       (0x1c | LC_REQ_DYLD)    /* runpath additions  (@rpath search paths) */
#define LC_CODE_SIGNATURE 0x1d  /* local of code signature  (location of the code signature) */
#define LC_SEGMENT_SPLIT_INFO 0x1e /* local of info to split segments */
#define LC_REEXPORT_DYLIB (0x1f | LC_REQ_DYLD) /* load and re-export dylib */
#define LC_LAZY_LOAD_DYLIB 0x20 /* delay load of dylib until first use */
#define LC_ENCRYPTION_INFO 0x21 /* encrypted segment information */
#define LC_DYLD_INFO    0x22    /* compressed dyld information */
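#define LC_DYLD_INFO_ONLY (0x22|LC_REQ_DYLD)    /* compressed dyld information only  (file offsets and sizes, inside __LINKEDIT, of the rebase, bind, weak bind, lazy bind and export info; the LC_REQ_DYLD bit means dyld must understand this command for the program to run) */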
#define LC_LOAD_UPWARD_DYLIB (0x23 | LC_REQ_DYLD) /* load upward dylib */
#define LC_VERSION_MIN_MACOSX 0x24   /* build for MacOSX min OS version */
#define LC_VERSION_MIN_IPHONEOS 0x25 /* build for iPhoneOS min OS version */
#define LC_FUNCTION_STARTS 0x26 /* compressed table of function start addresses */
#define LC_DYLD_ENVIRONMENT 0x27 /* string for dyld to treat
                    like environment variable */
#define LC_MAIN (0x28|LC_REQ_DYLD) /* replacement for LC_UNIXTHREAD  (program entry point; dyld's _main reads this address and then jumps to it) */
#define LC_DATA_IN_CODE 0x29 /* table of non-instructions in __text  (data embedded in the code section) */
#define LC_SOURCE_VERSION 0x2A /* source version used to build binary */
#define LC_DYLIB_CODE_SIGN_DRS 0x2B /* Code signing DRs copied from linked dylibs */
#define LC_ENCRYPTION_INFO_64 0x2C /* 64-bit encrypted segment information  (offset and size of the encrypted range) */
#define LC_LINKER_OPTION 0x2D /* linker options in MH_OBJECT files */
#define LC_LINKER_OPTIMIZATION_HINT 0x2E /* optimization hints in MH_OBJECT files */
#define LC_VERSION_MIN_TVOS 0x2F /* build for AppleTV min OS version */
#define LC_VERSION_MIN_WATCHOS 0x30 /* build for Watch min OS version */
#define LC_NOTE 0x31 /* arbitrary data included within a Mach-O file */
#define LC_BUILD_VERSION 0x32 /* build for platform min OS version */
#define LC_DYLD_EXPORTS_TRIE (0x33 | LC_REQ_DYLD) /* used with linkedit_data_command, payload is trie */
#define LC_DYLD_CHAINED_FIXUPS (0x34 | LC_REQ_DYLD) /* used with linkedit_data_command */
#define LC_FILESET_ENTRY (0x35 | LC_REQ_DYLD) /* used with fileset_entry_command */

The commonly seen commands are annotated above. Taking LC_SEGMENT_64 as an example, its structure is:

/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space.  If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT_64 */
    uint32_t    cmdsize;    /* includes sizeof section_64 structs */
    char        segname[16];    /* segment name */
    uint64_t    vmaddr;     /* memory address of this segment */
    uint64_t    vmsize;     /* memory size of this segment */
    uint64_t    fileoff;    /* file offset of this segment */
    uint64_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};

/* Constants for the flags field of the segment_command */
#define SG_HIGHVM   0x1 /* the file contents for this segment is for
                   the high part of the VM space, the low part
                   is zero filled (for stacks in core files) */
#define SG_FVMLIB   0x2 /* this segment is the VM that is allocated by
                   a fixed VM library, for overlap checking in
                   the link editor */
#define SG_NORELOC  0x4 /* this segment has nothing that was relocated
                   in it and nothing relocated to it, that is
                   it maybe safely replaced without relocation*/
#define SG_PROTECTED_VERSION_1  0x8 /* This segment is protected.  If the
                       segment starts at file offset 0, the
                       first page of the segment is not
                       protected.  All other pages of the
                       segment are protected. */
#define SG_READ_ONLY    0x10 /* This segment is made read-only after fixups */
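Two details trip people up when printing a segment command: `segname` (like `sectname` in `section_64`) is a fixed 16-byte field that is not NUL-terminated when the name uses all 16 bytes, and `maxprot`/`initprot` are `vm_prot_t` bit masks. A sketch with hypothetical helper names; the `VM_PROT_*` values are those from `<mach/vm_prot.h>`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VM_PROT_READ    0x1  /* values from <mach/vm_prot.h> */
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Copy a 16-byte segname/sectname field into a C string. Shorter names
 * are NUL-padded; a full 16-character name has no terminator. */
void name16_to_cstr(const char name[16], char out[17])
{
    memcpy(out, name, 16);
    out[16] = '\0';
}

/* Render a vm_prot_t value (maxprot/initprot) as "rwx"-style text. */
void prot_to_str(int prot, char out[4])
{
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```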

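There are several kinds of LC_SEGMENT_64 segments:

[Figure: the different LC_SEGMENT_64 segments]
__PAGEZERO: the null-pointer trap segment. It is mapped at the bottom of the virtual address space (on 64-bit, typically the first 4 GB) with no access rights, so any NULL pointer dereference faults.
__TEXT: code and read-only data.
__DATA: readable and writable data.
__LINKEDIT: information dyld needs, including rebase, binding, and lazy-binding info.
The same segment names apply to the section_64 entries that follow.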
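Because each segment covers `[vmaddr, vmaddr + vmsize)`, finding the segment an address belongs to is a range check. `struct seg` and `segment_for_addr` are illustrative names that keep only the fields of `segment_command_64` needed here, and the addresses are link-time (unslid) ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the segment_command_64 fields used here. */
struct seg {
    const char *name;
    uint64_t vmaddr, vmsize;
};

/* Return the segment containing addr, or NULL if none does. */
const struct seg *segment_for_addr(const struct seg *segs, size_t n,
                                   uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= segs[i].vmaddr && addr - segs[i].vmaddr < segs[i].vmsize)
            return &segs[i];
    return NULL;
}
```

For example, with a hypothetical but typical layout where `__PAGEZERO` starts at `0x0` with size `0x100000000` and `__TEXT` starts at `0x100000000`, address `0` resolves to `__PAGEZERO`, which is why a NULL dereference traps there.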

Data

struct section_64 { /* for 64-bit architectures */
    char        sectname[16];   /* name of this section */
    char        segname[16];    /* segment this section goes in */
    uint64_t    addr;       /* memory address of this section */
    uint64_t    size;       /* size in bytes of this section */
    uint32_t    offset;     /* file offset of this section */
    uint32_t    align;      /* section alignment (power of 2) */
    uint32_t    reloff;     /* file offset of relocation entries */
    uint32_t    nreloc;     /* number of relocation entries */
    uint32_t    flags;      /* flags (section type and attributes)*/
    uint32_t    reserved1;  /* reserved (for offset or index) */
    uint32_t    reserved2;  /* reserved (for count or sizeof) */
    uint32_t    reserved3;  /* reserved */
};
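The low byte of `flags` (`SECTION_TYPE`) says how to interpret a section, and `offset`/`size` locate its bytes in the file; zero-fill sections such as `__bss` take no file space. A sketch using a subset of the type constants from `<mach-o/loader.h>` (other zero-fill variants like `S_GB_ZEROFILL` are ignored); `struct sect_view` and the helpers are hypothetical and assume a thin file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of section type constants from <mach-o/loader.h>. */
#define SECTION_TYPE               0x000000ffu
#define S_REGULAR                  0x0
#define S_ZEROFILL                 0x1
#define S_CSTRING_LITERALS         0x2
#define S_NON_LAZY_SYMBOL_POINTERS 0x6
#define S_LAZY_SYMBOL_POINTERS     0x7
#define S_SYMBOL_STUBS             0x8
#define S_MOD_INIT_FUNC_POINTERS   0x9
#define S_MOD_TERM_FUNC_POINTERS   0xa

/* The three section_64 fields this sketch needs. */
struct sect_view {
    uint64_t size;    /* section_64.size   */
    uint32_t offset;  /* section_64.offset */
    uint32_t flags;   /* section_64.flags  */
};

/* Pointer to the section's bytes in the file image, or NULL for
 * zero-fill sections and out-of-range offsets/sizes. */
const uint8_t *section_bytes(const uint8_t *file, size_t file_len,
                             const struct sect_view *s)
{
    if ((s->flags & SECTION_TYPE) == S_ZEROFILL)
        return NULL;
    if (s->offset > file_len || s->size > file_len - s->offset)
        return NULL;
    return file + s->offset;
}

/* Pointer sections such as __mod_init_func or __got are arrays of
 * 8-byte pointers on 64-bit. */
uint64_t pointer_count(const struct sect_view *s)
{
    return s->size / 8;
}
```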

When we looked at JSONModel's Mach-O file in MachOView earlier, there was a lot of section data after the Load Commands. This is what is called Data.

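[Figure: Data]
Like the LC_SEGMENT_64 commands in the Load Commands, Data comes in several kinds, grouped by segment:
__TEXT
__text: the program's executable code;
__stubs: indirect symbol stubs that jump through the lazy symbol pointer table;
__stub_helper: helper code used to bind lazy symbols;
__cstring: read-only C string literals, including some Objective-C strings and property names;
__objc_methname: Objective-C method names;
__objc_classname: Objective-C class names;
__objc_methtype: Objective-C method type encodings (signatures).
__DATA
__nl_symbol_ptr: non-lazy symbol pointer table, bound by dyld at load time;
__la_symbol_ptr: lazy symbol pointer table, bound on the first call;
__got: non-lazy global pointer table;
__mod_init_func: constructor (initializer) functions;
__cfstring: Objective-C string literals (CFString constants);
__objc_classrefs: list of referenced classes;
__mod_term_func: destructor (terminator) functions;
__objc_classlist: list of the classes in the program;
__objc_nlclslist: classes in the program that implement +load;
__objc_protolist: list of protocols;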
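`__cstring` (section type `S_CSTRING_LITERALS`) is simply NUL-terminated strings packed back to back, which is why MachOView can list them one per row. A sketch of indexing into it, assuming the section bytes have already been located via its `offset` and `size`; `cstring_at` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index-th string in a __cstring-style section, or NULL if
 * there are fewer strings or the last one is unterminated. */
const char *cstring_at(const char *sect, size_t size, size_t index)
{
    size_t off = 0;
    while (off < size) {
        const char *s = sect + off;
        const char *nul = memchr(s, '\0', size - off);
        if (nul == NULL)
            return NULL;                    /* unterminated tail */
        if (index-- == 0)
            return s;
        off = (size_t)(nul - sect) + 1;     /* skip past the terminator */
    }
    return NULL;
}
```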

Further along in the file there is data of other kinds:

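[Figure: data at the end of the file]
These are not section_64 structures. Each has a fixed format of its own and corresponds to one of the load commands above (for example LC_SYMTAB, LC_DYSYMTAB, LC_DYLD_INFO_ONLY, LC_FUNCTION_STARTS and LC_CODE_SIGNATURE), which records where the data sits in __LINKEDIT.
[Figure: the corresponding load commands]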
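As one example of this trailing data, `LC_SYMTAB` points at an array of `nlist_64` entries and a string table. A sketch of fetching a symbol's name, with both structs copied locally from `<mach-o/loader.h>` and `<mach-o/nlist.h>`; `symbol_name` is hypothetical and assumes a thin file in host byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct symtab_command {
    uint32_t cmd;      /* LC_SYMTAB */
    uint32_t cmdsize;  /* sizeof(struct symtab_command) */
    uint32_t symoff;   /* file offset of the nlist_64 array */
    uint32_t nsyms;    /* number of symbol table entries */
    uint32_t stroff;   /* file offset of the string table */
    uint32_t strsize;  /* string table size in bytes */
};

struct nlist_64 {
    uint32_t n_strx;   /* index into the string table */
    uint8_t  n_type;   /* type flag */
    uint8_t  n_sect;   /* section number or NO_SECT */
    uint16_t n_desc;
    uint64_t n_value;  /* value of the symbol (e.g. its address) */
};

/* Name of symbol i, or NULL if anything is out of range. */
const char *symbol_name(const uint8_t *file, size_t len,
                        const struct symtab_command *st, uint32_t i)
{
    struct nlist_64 nl;
    if (i >= st->nsyms)
        return NULL;
    uint64_t ent = (uint64_t)st->symoff + (uint64_t)i * sizeof nl;
    if (ent + sizeof nl > len || (uint64_t)st->stroff + st->strsize > len)
        return NULL;
    memcpy(&nl, file + ent, sizeof nl);
    if (nl.n_strx >= st->strsize)
        return NULL;
    const char *name = (const char *)file + st->stroff + nl.n_strx;
    /* the name must be terminated inside the string table */
    if (memchr(name, '\0', st->strsize - nl.n_strx) == NULL)
        return NULL;
    return name;
}
```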

Extension
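ASLR, short for Address Space Layout Randomization, is a security technique against buffer-overflow attacks. By randomizing the layout of the heap, the stack, shared-library mappings and other regions, it makes target addresses harder for an attacker to predict and prevents them from locating code directly. With ASLR, the base address at which a program or library is loaded into memory is random rather than fixed on every run, which makes cracking harder.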

Load Command中,LC_DYLD_INFO或者LC_DYLD_INFO_ONLY就是用来记录所有需要进行地址调整的位置。这样当程序被加载到内存时,加载器就会将需要调整的地址分别进行调整处理,以便转化为真实的内存地址。这个过程称之为基地址重定向(rebase)。

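As a result, the addresses lldb prints for a loaded Mach-O are not the link-time addresses stored in the file (and never physical addresses) but virtual addresses that include the ASLR slide.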
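The arithmetic behind those addresses is simple: the slide is the difference between where `__TEXT` was actually mapped and its `vmaddr` in the file, and rebasing adds that slide to every recorded pointer. On Apple platforms `_dyld_get_image_vmaddr_slide()` in `<mach-o/dyld.h>` reports the slide at runtime; the portable sketch below only models the arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* slide = actual load address of __TEXT - its link-time vmaddr */
uint64_t aslr_slide(uint64_t actual_text_addr, uint64_t text_vmaddr)
{
    return actual_text_addr - text_vmaddr;
}

/* Runtime address of something whose link-time address is unslid. */
uint64_t slid_address(uint64_t unslid, uint64_t slide)
{
    return unslid + slide;
}

/* Model of a rebase: each recorded 8-byte pointer holds an unslid
 * address, so add the slide in place. offsets index into data. */
void rebase_pointers(uint8_t *data, size_t len,
                     const size_t *offsets, size_t n, uint64_t slide)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t ptr;
        if (offsets[i] > len || len - offsets[i] < sizeof ptr)
            continue;                      /* skip out-of-range entries */
        memcpy(&ptr, data + offsets[i], sizeof ptr);
        ptr += slide;
        memcpy(data + offsets[i], &ptr, sizeof ptr);
    }
}
```

For instance, if `__TEXT` is linked at `0x100000000` and actually mapped at `0x104a3c000`, the slide is `0x4a3c000`, and a symbol linked at `0x100003f00` lives at `0x104a3ff00` at runtime.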

Link to this article: https://www.haomeiwen.com/subject/gynrfltx.html