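There is a lot to cover here. The earlier dyld discussion needed extending, and later articles will also analyze this file format. Covering it piecemeal would leave too many gaps, so Mach-O gets its own write-up.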
1. Terminology
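Official definition (Baidu): Mach-O is short for the Mach Object file format, a format used for executables, object code, dynamic libraries, and core dumps. Designed as a replacement for the a.out format, Mach-O offers greater extensibility and faster access to the information in the symbol table.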
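Community explanation: Mach-O is the executable file format on the Mac, analogous to the PE format (Portable Executable) on Windows and the ELF format (Executable and Linking Format) on Linux. The C, C++, Swift, and Objective-C code we write is ultimately compiled and linked into a Mach-O executable.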
Keywords, as I understand them: binary file, executable file, produced by compiling and linking, the final form that library files take.
2. Mach-O file types
#define MH_OBJECT 0x1 /* relocatable object file (.o) */
#define MH_EXECUTE 0x2 /* demand paged executable file (pages are loaded on demand) */
#define MH_FVMLIB 0x3 /* fixed VM shared library file (loaded at a fixed virtual address) */
#define MH_CORE 0x4 /* core file (core dump) */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library (dynamic library) */
#define MH_DYLINKER 0x7 /* dynamic link editor (the dynamic linker, dyld) */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static
linking only, no section contents */
#define MH_DSYM 0xa /* companion file with only debug
sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts (kernel extensions) */
#define MH_FILESET 0xc /* a file composed of other Mach-Os to
be run in the same userspace sharing
a single linkedit. */
3. Mach-O file format

Header:
/*
* Only the 64-bit header (the one used by `arm64`) is shown here.
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
    uint32_t magic; /* mach magic number identifier */
    cpu_type_t cputype; /* cpu specifier: the CPU architecture */
    cpu_subtype_t cpusubtype; /* machine specifier: the CPU subtype */
    uint32_t filetype; /* type of file: one of the 12 MH_ constants listed above */
    uint32_t ncmds; /* number of load commands */
    uint32_t sizeofcmds; /* the size of all the load commands */
    uint32_t flags; /* flags needed by dyld at load time; defined in the same header file as filetype */
    uint32_t reserved; /* reserved (present only in the 64-bit header) */
};
/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
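The two magic constants can be used to decide whether a buffer starts with a 64-bit Mach-O header and whether its fields need byte swapping. The following is a minimal sketch, not code from the system headers: `macho64_byte_order` is a hypothetical helper name, and the constants are redefined locally so the snippet compiles without `<mach-o/loader.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of the constants quoted above, so no Apple header is needed. */
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/*
 * Hypothetical helper (not part of any Apple API).
 * Returns  1 if buf starts with a 64-bit Mach-O magic in host byte order,
 *         -1 if the magic is byte-swapped relative to the host,
 *          0 if buf is too short or is not a 64-bit Mach-O header.
 */
static int macho64_byte_order(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (len < sizeof magic)
        return 0;
    /* memcpy avoids unaligned access and strict-aliasing problems. */
    memcpy(&magic, buf, sizeof magic);

    if (magic == MH_MAGIC_64)
        return 1;
    if (magic == MH_CIGAM_64)
        return -1;
    return 0;
}
```

A result of -1 means every multi-byte field that follows (cputype, ncmds, and so on) would have to be byte-swapped before use.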
Load commands:
/*
* The load commands directly follow the mach_header. The total size of all
* of the commands is given by the sizeofcmds field in the mach_header. All
* load commands must have as their first two fields cmd and cmdsize. The cmd
* field is filled in with a constant for that command type. Each command type
* has a structure specifically for it. The cmdsize field is the size in bytes
* of the particular load command structure plus anything that follows it that
* is a part of the load command (i.e. section structures, strings, etc.). To
* advance to the next load command the cmdsize can be added to the offset or
* pointer of the current load command. The cmdsize for 32-bit architectures
* MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple
* of 8 bytes (these are forever the maximum alignment of any load commands).
* The padded bytes must be zero. All tables in the object file must also
* follow these rules so the file can be memory mapped. Otherwise the pointers
* to these tables will not work well or at all on some machines. With all
* padding zeroed like objects will compare byte for byte.
*/
struct load_command {
uint32_t cmd; /* type of load command */
uint32_t cmdsize; /* total size of command in bytes */
};
Let's confirm this by opening the Mach-O file of JSONModel in MachOView:

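This is the x86_64 build. With Load Commands expanded and one entry selected, we can see that its cmd and cmdsize are LC_ID_DYLIB and 64 respectively. The header fields can be checked in the same way.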
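The header comment says that adding cmdsize to the current offset advances to the next load command. A minimal sketch of that walk follows, assuming the load commands have already been read into a byte buffer in host byte order. `find_load_command` is a hypothetical helper, and `struct load_command` is redeclared locally so the snippet compiles without Apple headers.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct quoted above. */
struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};

/*
 * Hypothetical helper: scans the load commands stored in
 * cmds[0 .. sizeofcmds) and returns the byte offset of the first command
 * whose cmd equals `wanted`, or -1 if it is absent or the data is malformed.
 * Assumes a 64-bit file, so cmdsize must be a multiple of 8.
 */
static long find_load_command(const unsigned char *cmds, uint32_t sizeofcmds,
                              uint32_t ncmds, uint32_t wanted)
{
    uint32_t offset = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        /* There must be room for at least the two common fields. */
        if (sizeofcmds - offset < sizeof lc)
            return -1;
        memcpy(&lc, cmds + offset, sizeof lc);

        /* Reject sizes that are too small, misaligned, or overrun the buffer. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize % 8 != 0 ||
            lc.cmdsize > sizeofcmds - offset)
            return -1;

        if (lc.cmd == wanted)
            return (long)offset;

        offset += lc.cmdsize;  /* advance to the next load command */
    }
    return -1;
}
```

With the values seen in MachOView, passing 0xd (LC_ID_DYLIB) as `wanted` would return the offset of the 64-byte command selected in the screenshot, relative to the start of the load commands.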
The cmd types:
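#define LC_SEGMENT 0x1 /* segment of this file to be mapped; defines a segment that is mapped into the process address space on load, including its sections */
#define LC_SYMTAB 0x2 /* link-edit stab symbol table info; defines the symbol table and string table, used when linking and by debuggers to map symbols to source files. Local symbols are used only for debugging, while defined and undefined external symbols are used by the linker */
#define LC_SYMSEG 0x3 /* link-edit gdb symbol table info (obsolete); describes the symbols used in the code */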
#define LC_THREAD 0x4 /* thread */
#define LC_UNIXTHREAD 0x5 /* unix thread (includes a stack) */
#define LC_LOADFVMLIB 0x6 /* load a specified fixed VM shared library */
#define LC_IDFVMLIB 0x7 /* fixed VM shared library identification */
#define LC_IDENT 0x8 /* object identification info (obsolete) */
#define LC_FVMFILE 0x9 /* fixed VM file inclusion (internal use) */
#define LC_PREPAGE 0xa /* prepage command (internal use) */
#define LC_DYSYMTAB 0xb /* dynamic link-edit symbol table info; gives dyld extra information about the symbols in the symbol table */
#define LC_LOAD_DYLIB 0xc /* load a dynamically linked shared library; a dynamic library this file depends on, with its name, version numbers, etc. */
#define LC_ID_DYLIB 0xd /* dynamically linked shared lib ident */
#define LC_LOAD_DYLINKER 0xe /* load a dynamic linker; the path of dyld */
#define LC_ID_DYLINKER 0xf /* dynamic linker identification */
#define LC_PREBOUND_DYLIB 0x10 /* modules prebound for a dynamically */
/* linked shared library */
#define LC_ROUTINES 0x11 /* image routines */
#define LC_SUB_FRAMEWORK 0x12 /* sub framework */
#define LC_SUB_UMBRELLA 0x13 /* sub umbrella */
#define LC_SUB_CLIENT 0x14 /* sub client */
#define LC_SUB_LIBRARY 0x15 /* sub library */
#define LC_TWOLEVEL_HINTS 0x16 /* two-level namespace lookup hints */
#define LC_PREBIND_CKSUM 0x17 /* prebind checksum */
/*
* load a dynamically linked shared library that is allowed to be missing
* (all symbols are weak imported).
*/
#define LC_LOAD_WEAK_DYLIB (0x18 | LC_REQ_DYLD)
#define LC_SEGMENT_64 0x19 /* 64-bit segment of this file to be
mapped; defines a segment that is mapped into the process address space on load, including its sections */
#define LC_ROUTINES_64 0x1a /* 64-bit image routines */
#define LC_UUID 0x1b /* the uuid; unique ID of this Mach-O */
#define LC_RPATH (0x1c | LC_REQ_DYLD) /* runpath additions; search paths for @rpath */
#define LC_CODE_SIGNATURE 0x1d /* local of code signature; code signature information */
#define LC_SEGMENT_SPLIT_INFO 0x1e /* local of info to split segments */
#define LC_REEXPORT_DYLIB (0x1f | LC_REQ_DYLD) /* load and re-export dylib */
#define LC_LAZY_LOAD_DYLIB 0x20 /* delay load of dylib until first use */
#define LC_ENCRYPTION_INFO 0x21 /* encrypted segment information */
#define LC_DYLD_INFO 0x22 /* compressed dyld information */
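#define LC_DYLD_INFO_ONLY (0x22|LC_REQ_DYLD) /* compressed dyld information only; records linking information, namely the offsets and sizes within __LINKEDIT of the dynamic-linking data (rebase, bind, weak bind, lazy bind, export info, etc.). ONLY means this command is required for the program to run. */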
#define LC_LOAD_UPWARD_DYLIB (0x23 | LC_REQ_DYLD) /* load upward dylib */
#define LC_VERSION_MIN_MACOSX 0x24 /* build for MacOSX min OS version */
#define LC_VERSION_MIN_IPHONEOS 0x25 /* build for iPhoneOS min OS version */
#define LC_FUNCTION_STARTS 0x26 /* compressed table of function start addresses */
#define LC_DYLD_ENVIRONMENT 0x27 /* string for dyld to treat
like environment variable */
#define LC_MAIN (0x28|LC_REQ_DYLD) /* replacement for LC_UNIXTHREAD; the program entry point. dyld's _main obtains this address and jumps to it */
#define LC_DATA_IN_CODE 0x29 /* table of non-instructions in __text */
#define LC_SOURCE_VERSION 0x2A /* source version used to build binary */
#define LC_DYLIB_CODE_SIGN_DRS 0x2B /* Code signing DRs copied from linked dylibs */
#define LC_ENCRYPTION_INFO_64 0x2C /* 64-bit encrypted segment information; offset and size of the encrypted content */
#define LC_LINKER_OPTION 0x2D /* linker options in MH_OBJECT files */
#define LC_LINKER_OPTIMIZATION_HINT 0x2E /* optimization hints in MH_OBJECT files */
#define LC_VERSION_MIN_TVOS 0x2F /* build for AppleTV min OS version */
#define LC_VERSION_MIN_WATCHOS 0x30 /* build for Watch min OS version */
#define LC_NOTE 0x31 /* arbitrary data included within a Mach-O file */
#define LC_BUILD_VERSION 0x32 /* build for platform min OS version */
#define LC_DYLD_EXPORTS_TRIE (0x33 | LC_REQ_DYLD) /* used with linkedit_data_command, payload is trie */
#define LC_DYLD_CHAINED_FIXUPS (0x34 | LC_REQ_DYLD) /* used with linkedit_data_command */
#define LC_FILESET_ENTRY (0x35 | LC_REQ_DYLD) /* used with fileset_entry_command */
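Several entries in the list are OR'ed with LC_REQ_DYLD, which the excerpt does not define. In `<mach-o/loader.h>` it is the high bit, 0x80000000, and marks commands that dyld must understand in order to run the image. The sketch below shows how the bit can be tested and stripped; `lc_requires_dyld` and `lc_base_value` are hypothetical helper names, and the constants are redefined locally.

```c
#include <stdint.h>

/* Value taken from <mach-o/loader.h>; it is not shown in the excerpt. */
#define LC_REQ_DYLD 0x80000000u

/* Local copies of two constants from the list. */
#define LC_DYLD_INFO 0x22u
#define LC_DYLD_INFO_ONLY (0x22u | LC_REQ_DYLD)

/* Hypothetical helper: nonzero if dyld is required to understand cmd. */
static int lc_requires_dyld(uint32_t cmd)
{
    return (cmd & LC_REQ_DYLD) != 0;
}

/* Hypothetical helper: the command number with the LC_REQ_DYLD bit removed. */
static uint32_t lc_base_value(uint32_t cmd)
{
    return cmd & ~LC_REQ_DYLD;
}
```

This is why LC_DYLD_INFO and LC_DYLD_INFO_ONLY share the number 0x22: they differ only in whether the required bit is set.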
Several of the most frequently seen cmd values are annotated in the list. LC_SEGMENT_64 deserves a closer look; its structure is:
/*
* The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
uint32_t cmd; /* LC_SEGMENT_64 */
uint32_t cmdsize; /* includes sizeof section_64 structs */
char segname[16]; /* segment name */
uint64_t vmaddr; /* memory address of this segment */
uint64_t vmsize; /* memory size of this segment */
uint64_t fileoff; /* file offset of this segment */
uint64_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
/* Constants for the flags field of the segment_command */
#define SG_HIGHVM 0x1 /* the file contents for this segment is for
the high part of the VM space, the low part
is zero filled (for stacks in core files) */
#define SG_FVMLIB 0x2 /* this segment is the VM that is allocated by
a fixed VM library, for overlap checking in
the link editor */
#define SG_NORELOC 0x4 /* this segment has nothing that was relocated
in it and nothing relocated to it, that is
it maybe safely replaced without relocation*/
#define SG_PROTECTED_VERSION_1 0x8 /* This segment is protected. If the
segment starts at file offset 0, the
first page of the segment is not
protected. All other pages of the
segment are protected. */
#define SG_READ_ONLY 0x10 /* This segment is made read-only after fixups */
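The comment on segment_command_64 says the section_64 structures directly follow the segment command and that their size is reflected in cmdsize. A small sketch of that relationship follows, assuming a segment command carries nothing besides its sections. The structs are redeclared locally (with `vm_prot_t` written as `int32_t`, which is an assumption about its underlying type), and `segment_cmdsize_is_consistent` is a hypothetical helper.

```c
#include <stdint.h>

/* Local redeclarations so the sketch compiles without Apple headers. */
struct segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    int32_t  maxprot;   /* vm_prot_t in the real header (assumed to be int) */
    int32_t  initprot;  /* vm_prot_t in the real header (assumed to be int) */
    uint32_t nsects;
    uint32_t flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/*
 * Hypothetical helper: checks that cmdsize covers exactly the segment
 * command plus nsects section_64 structures.
 */
static int segment_cmdsize_is_consistent(const struct segment_command_64 *seg)
{
    /* 64-bit arithmetic so a huge nsects cannot overflow the product. */
    uint64_t expected = (uint64_t)sizeof *seg +
                        (uint64_t)seg->nsects * sizeof(struct section_64);
    return seg->cmdsize == expected;
}
```

With these layouts the segment command is 72 bytes and each section is 80 bytes, so a segment with two sections would have a cmdsize of 232.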
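LC_SEGMENT_64 commands themselves come in several kinds:

__PAGEZERO: the null-pointer trap segment, mapped at the first page of the virtual address space to catch references through NULL pointers
__TEXT: the code segment and read-only data
__DATA: the readable and writable data segment
__LINKEDIT: information needed by dyld, including rebase, bind, and lazy-bind information

These segment names are also useful as a reference for the Section64 entries described later.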
Data:
struct section_64 { /* for 64-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint64_t addr; /* memory address of this section */
uint64_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
uint32_t reserved3; /* reserved */
};
In the earlier MachOView look at the JSONModel Mach-O file, a large amount of Section64 data follows the Load Commands. That is the so-called Data.

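As with the LC_SEGMENT_64 entries among the Load Commands, the sections fall into several kinds:
__TEXT:
  __text: the program's executable code;
  __stubs: indirect symbol stubs, used to jump through the lazy symbol pointer table;
  __stub_helper: helper routines for resolving lazily bound symbols;
  __cstring: read-only C strings, including some Objective-C strings and property names;
  __objc_methname: Objective-C method names;
  __objc_classname: Objective-C class names;
  __objc_methtype: Objective-C method signatures.
__DATA:
  __nl_symbol_ptr: the non-lazy symbol pointer table, bound by dyld at load time;
  __la_symbol_ptr: the lazy symbol pointer table, bound on first call;
  __got: the non-lazy global pointer table;
  __mod_init_func: constructor functions;
  __cfstring: Objective-C strings;
  __objc_classrefs: the list of referenced classes;
  __mod_term_func: destructor functions;
  __objc_classlist: the list of classes in the program;
  __objc_nlclslist: the classes in the program that implement their own +load method;
  __objc_protolist: the list of protocols.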
Further on in the file there are other kinds of data as well:

As for the Section64 kind of data, these are fixed structure types, and they correspond to the Load Command entries seen earlier:
Further notes:
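ASLR stands for Address Space Layout Randomization. It is a security technique aimed at buffer-overflow attacks: by randomizing the layout of linear regions such as the heap, the stack, and shared-library mappings, it makes target addresses harder to predict, which prevents an attacker from locating code directly and so blocks overflow attacks. With ASLR, the base address at which a program or library is loaded is no longer fixed but differs randomly from run to run.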
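Among the Load Commands, LC_DYLD_INFO or LC_DYLD_INFO_ONLY records every location whose address needs adjusting. When the program is loaded into memory, the loader adjusts each of these addresses so that they become the real in-memory addresses. This process is called rebasing (rebase).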
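As a result, the addresses lldb prints for a loaded Mach-O are not physical memory addresses but the logical addresses that result after the ASLR mapping is applied.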
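The rebase step can be pictured as adding one random offset to every recorded pointer. The sketch below is a simplified illustration only: the real rebase information referenced by LC_DYLD_INFO is not a plain array of indices, and `rebase_pointers`, the `locations` array, and the sample values are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: treats the image as an array of 64-bit words and adds
 * `slide` (the random ASLR offset) to each word whose index appears in
 * `locations`. Words not listed are left untouched.
 */
static void rebase_pointers(uint64_t *image_words, const size_t *locations,
                            size_t count, uint64_t slide)
{
    for (size_t i = 0; i < count; i++)
        image_words[locations[i]] += slide;
}
```

For example, with a slide of 0x4000, a stored pointer of 0x100003f00 becomes 0x100007f00, while a neighbouring non-pointer value that is not listed in `locations` keeps its original value.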