iOS Runtime 3: Data Structures and Memory Layout in Classes

Author: Trigger_o | Published 2022-08-30 18:33

    Class objects

    objc_class

    objc_class is defined in objc-runtime-new.h and inherits from objc_object, so from here on we will simply call it a class object.

    struct objc_class : objc_object {
        // Class ISA;
        Class superclass;
        cache_t cache;             // formerly cache pointer and vtable
        class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

        Class getSuperclass() const {...}
        void setSuperclass(Class newSuperclass) {...}

        class_rw_t *data() const {
            return bits.data();
        }
        void setData(class_rw_t *newData) {
            bits.setData(newData);
        }
        // ...
    };
    

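    Let's start with these members. The first member is isa, which is inherited from objc_object (hence commented out here). A class object's isa points to its metaclass.
    The second member is a Class pointer to the superclass.
    The third member, cache, is the cache.
    The fourth member, bits, holds the actual contents of the class.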

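    Next come getSuperclass() and setSuperclass(), which get and set the superclass.
    On arm64e, when ISA_SIGNING_SIGN_MODE is not NONE, the pointer is signed on set and authenticated on get.
    Otherwise superclass is assigned directly on set and returned directly on get.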

    The last two functions set and get a class_rw_t *, and both of them call methods on bits.
    So class_rw_t and class_data_bits_t are connected. Let's look at each of these two structs.

    class_data_bits_t

    #if __LP64__
    #define FAST_DATA_MASK          0x00007ffffffffff8UL
    #else
    #define FAST_DATA_MASK        0xfffffffcUL
    #endif
    
    struct class_data_bits_t {
        friend objc_class;
    
        uintptr_t bits;
    public:
        class_rw_t* data() const {
            return (class_rw_t *)(bits & FAST_DATA_MASK);
        }
        void setData(class_rw_t *newData)
        {
            ASSERT(!data()  ||  (newData->flags & (RW_REALIZING | RW_FUTURE)));
            // Set during realization or construction only. No locking needed.
            // Use a store-release fence because there may be concurrent
            // readers of data and data's contents.
            uintptr_t newBits = (bits & ~FAST_DATA_MASK) | (uintptr_t)newData;
            atomic_thread_fence(memory_order_release);
            bits = newBits;
        }
    
        // Get the class's ro data, even in the presence of concurrent realization.
        // fixme this isn't really safe without a compiler barrier at least
        // and probably a memory barrier when realizeClass changes the data field
        const class_ro_t *safe_ro() const {
            class_rw_t *maybe_rw = data();
            if (maybe_rw->flags & RW_REALIZED) {
                // maybe_rw is rw
                return maybe_rw->ro();
            } else {
                // maybe_rw is actually ro
                return (class_ro_t *)maybe_rw;
            }
        }
        // ...
    };
    

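    There is a single data member, bits, of type uintptr_t. This type is the same size as a pointer on the current platform (8 bytes on 64-bit), and it can indeed be used as a pointer.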

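    setData() and data() set and get the class_rw_t pointer. The comment says setData() is only called during realization or construction, so no lock is needed. A store-release fence is still used, because other threads may be reading data() and its contents concurrently.
    On set, bits is ANDed with ~FAST_DATA_MASK, which keeps only the bits outside the mask. The result is then ORed with newData to form the new bits.
    Take 32-bit as an example. FAST_DATA_MASK is 0xfffffffc, so ~FAST_DATA_MASK is 0x00000003. Say the last hex digit of bits is P and the last hex digit of newData is Q. The new last digit is then (P & 3) | Q, and all higher digits come straight from newData, whose low 2 bits are 0 because it is aligned. Likewise, get computes bits & 0xfffffffc, so the last digit becomes P & 0xc.
    On 64-bit, 0x00007ffffffffff8 in binary is 0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1000. That is, bits 3 through 46 (counting from 0) are set. As with isa, the class_rw_t pointer is stored in those bits, and the bits outside the mask hold flags.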
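    The masking done by setData() and data() can be replayed outside the runtime. The sketch below assumes a 64-bit (LP64) target and copies FAST_DATA_MASK from the excerpt above. DataBits is a hypothetical stand-in for class_data_bits_t, and the flag value 0x1 used when trying it out is made up. The store-release fence is omitted because the sketch is single-threaded.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(uintptr_t) == 8, "this sketch assumes an LP64 target");

// Value copied from the __LP64__ branch quoted in the article.
constexpr uintptr_t FAST_DATA_MASK = 0x00007ffffffffff8UL;

// Hypothetical stand-in for class_data_bits_t; the "pointer" is a raw address.
struct DataBits {
    uintptr_t bits = 0;

    // Like data(): keep only bits 3..46, where the class_rw_t pointer lives.
    uintptr_t data() const { return bits & FAST_DATA_MASK; }

    // Like setData(): keep the old bits outside the mask (the flags),
    // then OR in the new pointer, which must be 8-byte aligned and below 2^47.
    void setData(uintptr_t newData) {
        assert((newData & ~FAST_DATA_MASK) == 0);
        bits = (bits & ~FAST_DATA_MASK) | newData;
    }
};
```

    Setting a low flag bit first and then calling setData() twice shows that the flag survives both writes, while data() only ever returns the pointer part.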

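    Next is safe_ro(), which gets the class_ro_t. Its comment says it works even while the class is being realized concurrently. It also returns a const pointer, so the contents must not be modified.
    The function has two cases. If RW_REALIZED is set in flags, data() really is a class_rw_t, and the function returns maybe_rw->ro(), i.e. the class_ro_t is obtained from the class_rw_t.
    Otherwise the class has not been realized yet, and the pointer stored in bits actually points to the class_ro_t. In that case the function simply casts the pointer and returns it as a class_ro_t.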
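    The cast in safe_ro() works because both structs begin with a uint32_t flags field, so the first 4 bytes can be read before the real type is known. The toy model below illustrates this. ToyRo and ToyRw are hypothetical. RW_REALIZED is taken to be 1u << 31, which is my understanding of objc4, so treat it as an assumption. flags is read with memcpy so the sketch stays within standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed value of objc4's RW_REALIZED flag.
constexpr uint32_t RW_REALIZED = 1u << 31;

// Hypothetical stand-ins: the only property that matters is that
// both structs start with a uint32_t flags field.
struct ToyRo { uint32_t flags; const char *name; };
struct ToyRw { uint32_t flags; const ToyRo *ro; };
static_assert(offsetof(ToyRo, flags) == 0 && offsetof(ToyRw, flags) == 0,
              "flags must be the first field of both structs");

// Same decision as class_data_bits_t::safe_ro().
const ToyRo *safe_ro(const void *maybe_rw) {
    uint32_t flags;
    std::memcpy(&flags, maybe_rw, sizeof flags);  // read the shared first field
    if (flags & RW_REALIZED) {
        return static_cast<const ToyRw *>(maybe_rw)->ro;  // it really is an rw
    }
    return static_cast<const ToyRo *>(maybe_rw);          // it is actually an ro
}
```

    For an unrealized class, bits points straight at the ro and the function returns that pointer unchanged. After realization, it follows rw->ro instead.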

    class_rw_t

    Next, let's look at the structure of class_rw_t.

    struct class_rw_t {
        // Be warned that Symbolication knows the layout of this structure.
        uint32_t flags;
        uint16_t witness;
        explicit_atomic<uintptr_t> ro_or_rw_ext;
        Class firstSubclass;
        Class nextSiblingClass;
    private:
        using ro_or_rw_ext_t = objc::PointerUnion<const class_ro_t, class_rw_ext_t, PTRAUTH_STR("class_ro_t"), PTRAUTH_STR("class_rw_ext_t")>;
        const ro_or_rw_ext_t get_ro_or_rwe() const;
        void set_ro_or_rwe(const class_ro_t *ro);
        void set_ro_or_rwe(class_rw_ext_t *rwe, const class_ro_t *ro);
        class_rw_ext_t *extAlloc(const class_ro_t *ro, bool deep = false);
    public:
        void setFlags(uint32_t set);
        void clearFlags(uint32_t clear);
        void changeFlags(uint32_t set, uint32_t clear);
        class_rw_ext_t *ext() const;
        class_rw_ext_t *extAllocIfNeeded();
        class_rw_ext_t *deepCopy(const class_ro_t *ro);
        const class_ro_t *ro() const;
        void set_ro(const class_ro_t *ro);
        const method_array_t methods() const;
        const property_array_t properties() const;
        const protocol_array_t protocols() const;
    };
    

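    First, explicit_atomic. It derives from C++'s std::atomic and wraps a value so that multiple threads can access it safely without data races.

    Here using works like typedef: it declares ro_or_rw_ext_t as a type alias, so ro_or_rw_ext_t can now be used as a class name.
    PointerUnion is a class whose purpose is similar to a C union. It can represent several types, but holds only one of them at a time.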

    template <class T1, class T2, typename Auth1, typename Auth2>
    class PointerUnion {
        uintptr_t _value;
        // ...
    };
    

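    PointerUnion takes four template parameters: two pointee types, and two Auth types used for pointer authentication (generated here by PTRAUTH_STR). The types passed here are class_ro_t and class_rw_ext_t.
    So a ro_or_rw_ext_t holds either a class_rw_ext_t pointer or a class_ro_t pointer.
    PointerUnion provides is() to test which one it currently holds. The type is given as a template argument. For example, v.is<class_rw_ext_t *>() returns true if it currently holds a class_rw_ext_t.
    PointerUnion also provides get() to retrieve the value. It is called the same way as is() and returns a pointer of the requested type.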

    const ro_or_rw_ext_t get_ro_or_rwe() const {
        return ro_or_rw_ext_t{ro_or_rw_ext};
    }
    

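    This function constructs a ro_or_rw_ext_t from ro_or_rw_ext. PointerUnion has only the _value member, which here is initialized from ro_or_rw_ext.
    All later creation, reading and writing of the data goes through PointerUnion, i.e. through get_ro_or_rwe().
    Some places also create a temporary ro_or_rw_ext_t{ro_or_rw_ext} inline, for the same purpose of letting PointerUnion handle the value. A PointerUnion initialized from the same ro_or_rw_ext is equivalent to the one returned by get_ro_or_rwe().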

    const method_array_t methods() const {
        auto v = get_ro_or_rwe();
        if (v.is<class_rw_ext_t *>()) {
            return v.get<class_rw_ext_t *>(&ro_or_rw_ext)->methods;
        } else {
            return method_array_t{v.get<const class_ro_t *>(&ro_or_rw_ext)->baseMethods()};
        }
    }
    

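    The later member functions start by calling get_ro_or_rwe().
    Take methods() above, which returns the method list. It first gets the ro_or_rw_ext_t. If that holds a class_rw_ext_t, the function returns the struct's methods member. If it holds a class_ro_t, the function wraps the result of baseMethods() in a method_array_t and returns that.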
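    A stripped-down version of PointerUnion, together with the methods() dispatch, looks roughly like the following. This is a sketch, not objc4's code. ToyPointerUnion, ToyRo and ToyRwExt are hypothetical. The low bit of _value is assumed to be the tag, and pointer authentication and atomics are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct ToyRo    { const char *baseMethods; };  // stands in for class_ro_t
struct ToyRwExt { const char *methods; };      // stands in for class_rw_ext_t

template <class T1, class T2>
class ToyPointerUnion {
    uintptr_t _value;  // pointer bits, with the tag kept in bit 0
    static_assert(alignof(T1) >= 2 && alignof(T2) >= 2, "bit 0 must be free for the tag");

public:
    ToyPointerUnion(T1 *p) : _value(reinterpret_cast<uintptr_t>(p)) {}
    ToyPointerUnion(T2 *p) : _value(reinterpret_cast<uintptr_t>(p) | 1) {}

    // is<T2 *>() is true when the tag bit is set, is<T1 *>() when it is clear.
    template <class T>
    bool is() const {
        static_assert(std::is_same<T, T1 *>::value || std::is_same<T, T2 *>::value,
                      "T must be one of the two pointer types");
        return (_value & 1) == (std::is_same<T, T2 *>::value ? 1u : 0u);
    }

    // Strip the tag bit and hand back the pointer as the requested type.
    template <class T>
    T get() const {
        assert(is<T>());
        return reinterpret_cast<T>(_value & ~uintptr_t(1));
    }
};

using ro_or_rw_ext_t = ToyPointerUnion<ToyRo, ToyRwExt>;

// Same shape as class_rw_t::methods(): prefer the rw extension, fall back to ro.
const char *methods(const ro_or_rw_ext_t &v) {
    if (v.is<ToyRwExt *>()) {
        return v.get<ToyRwExt *>()->methods;
    }
    return v.get<ToyRo *>()->baseMethods;
}
```

    Because only one word is stored, switching from ro to rwe later is a single store of a new tagged value. That fits the explicit_atomic<uintptr_t> ro_or_rw_ext field.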

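    Finally, the size of class_rw_t. The fields add up to 4 + 2 + 8 + 8 + 8 = 30 bytes. However, ro_or_rw_ext has to be 8-byte aligned, so 2 bytes of padding follow witness, and the struct takes 32 bytes on 64-bit.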
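    The 30 to 32 arithmetic can be checked with a layout mirror. RwLayout is hypothetical and assumes a 64-bit target without indexed isa. It models Class as void * and uses plain uintptr_t in place of explicit_atomic<uintptr_t>, on the assumption that the atomic has the same size (true on common 64-bit platforms, and asserted below).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void *) == 8, "this mirror assumes a 64-bit target");
static_assert(sizeof(std::atomic<uintptr_t>) == sizeof(uintptr_t),
              "assumes the atomic wrapper adds no size");

// Hypothetical mirror of class_rw_t's data members.
struct RwLayout {
    uint32_t flags;          // offset 0, 4 bytes
    uint16_t witness;        // offset 4, 2 bytes, followed by 2 bytes of padding
    uintptr_t ro_or_rw_ext;  // offset 8, stands in for explicit_atomic<uintptr_t>
    void *firstSubclass;     // offset 16, Class
    void *nextSiblingClass;  // offset 24, Class
};

static_assert(offsetof(RwLayout, ro_or_rw_ext) == 8, "padding after witness");
static_assert(sizeof(RwLayout) == 32, "30 bytes of fields round up to 32");
```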

    class_rw_ext_t
    PointerUnion holds either a class_rw_ext_t pointer or a class_ro_t pointer, so let's look at these two structs.

    struct class_rw_ext_t {
        DECLARE_AUTHED_PTR_TEMPLATE(class_ro_t)
        class_ro_t_authed_ptr<const class_ro_t> ro;
        method_array_t methods;
        property_array_t properties;
        protocol_array_t protocols;
        char *demangledName;
        uint32_t version;
    };
    

    DECLARE_AUTHED_PTR_TEMPLATE declares a wrapper type template for a pointer. Here the type it declares is class_ro_t_authed_ptr.
    The macro is defined as follows:

    // ... (the opening #if of the pointer-signing branch is not shown in this excerpt)
    #define DECLARE_AUTHED_PTR_TEMPLATE(name)                      \
        template <typename T> using name ## _authed_ptr            \
            = WrappedPtr<T, PTRAUTH_STR(name)>;
    #else
    #define PTRAUTH_STR(name) PtrauthRaw
    #define DECLARE_AUTHED_PTR_TEMPLATE(name)                      \
        template <typename T> using name ## _authed_ptr = RawPtr<T>;
    #endif
    

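    Its purpose is pointer signing, for security. There are two definitions: one that signs the pointer (arm64e) and one that does not.
    Besides signing, the pointer is also wrapped. The signing branch uses the WrappedPtr struct below (the non-signing branch uses RawPtr). Its ptr member is the pointer that ultimately points to class_ro_t.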

    template<typename T, typename Auth>
    struct WrappedPtr {
    private:
        T *ptr;
        // ...
    };
    

    Besides ro, class_rw_ext_t also holds the method list, the property list and the protocol list.
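    The token pasting in DECLARE_AUTHED_PTR_TEMPLATE can be tried out in isolation, using only the non-signing branch. In this sketch, RawPtr, class_ro_t and ToyRwExt are hypothetical minimal stubs, not the objc4 definitions. Only the macro body is copied from the excerpt above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal stand-in for objc4's RawPtr.
template <typename T>
struct RawPtr {
    T *ptr;
    T *get() const { return ptr; }
};

// Non-signing branch from the excerpt: name ## _authed_ptr pastes tokens,
// so DECLARE_AUTHED_PTR_TEMPLATE(class_ro_t) declares class_ro_t_authed_ptr<T>.
#define DECLARE_AUTHED_PTR_TEMPLATE(name) \
    template <typename T> using name ## _authed_ptr = RawPtr<T>;

struct class_ro_t { unsigned flags; };  // hypothetical stub

// Same declaration pattern as class_rw_ext_t's first two lines.
struct ToyRwExt {
    DECLARE_AUTHED_PTR_TEMPLATE(class_ro_t)
    class_ro_t_authed_ptr<const class_ro_t> ro;
};
```

    Expanded inside the struct, the macro becomes a member alias template. ro is therefore a RawPtr<const class_ro_t> whose ptr points at the class_ro_t.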

    class_ro_t
    Roughly, it contains the following:

    struct class_ro_t {
        uint32_t flags;
        uint32_t instanceStart;
        uint32_t instanceSize;
    #ifdef __LP64__
        uint32_t reserved;
    #endif
        union {
            const uint8_t * ivarLayout;
            Class nonMetaclass;
        };
        explicit_atomic<const char *> name;
        void *baseMethodList;
        protocol_list_t * baseProtocols;
        const ivar_list_t * ivars;
        const uint8_t * weakIvarLayout;
        property_list_t *baseProperties;
        method_list_t *baseMethods();
        Class getNonMetaclass();
        const uint8_t *getIvarLayout();
        // ...
    };
    

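    ro and rw both have methods, protocols and properties, but with different types. class_ro_t stores single lists (method_list_t, protocol_list_t, property_list_t), while class_rw_ext_t stores method_array_t, property_array_t and protocol_array_t.
    class_ro_t is initialized together with the class and provides no functions for modifying it, hence "ro", read-only. At that point there is no class_rw_ext_t, so ro_or_rw_ext_t holds the ro in its place.
    class_rw_t, by contrast, is readable and writable. When a class_rw_ext_t is allocated (see extAllocIfNeeded()), its class_ro_t_authed_ptr<const class_ro_t> ro member points to the class_ro_t.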

    cache_t

    struct cache_t {
    private:
        explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
        union {
            struct {
                explicit_atomic<mask_t>    _maybeMask;
    #if __LP64__
                uint16_t                   _flags;
    #endif
                uint16_t                   _occupied;
            };
            explicit_atomic<preopt_cache_t *> _originalPreoptCache;
        };
        // ...
    };
    

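    _bucketsAndMaybeMask is an explicit_atomic, the wrapper around C++ atomic. The wrapped type is uintptr_t, which is the same size as a pointer, i.e. 8 bytes.

    Next comes a union, whose struct member uses mask_t: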

    #if __LP64__
    typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
    #else
    typedef uint16_t mask_t;
    #endif
    

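    mask_t is an alias that takes 4 bytes on 64-bit, so the struct is 4 + 2 + 2 = 8 bytes. preopt_cache_t * is a pointer, also 8 bytes, so the union is 8 bytes.
    cache_t is therefore 8 + 8 = 16 bytes in total. Starting at the class's address, the layout is isa_t (8 bytes) + superclass (8 bytes) + cache_t (16 bytes) + class_data_bits_t (8 bytes). This puts bits at offset 32 (0x20).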
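    These offsets can be confirmed with static_assert on a mirror of objc_class. ClassLayout and CacheLayout are hypothetical and assume LP64. The anonymous struct and union from cache_t are given names so the mirror stays in standard C++, and the atomics and the preopt pointer are modelled as plain integers and pointers of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t mask_t;  // the __LP64__ typedef shown above

// Hypothetical mirror of cache_t.
struct CacheLayout {
    uintptr_t _bucketsAndMaybeMask;   // 8 bytes
    union {
        struct {
            mask_t   _maybeMask;      // 4 bytes
            uint16_t _flags;          // 2 bytes
            uint16_t _occupied;       // 2 bytes
        } s;
        void *_originalPreoptCache;   // 8 bytes
    } u;                              // union: max(8, 8) = 8 bytes
};

// Hypothetical mirror of objc_class, isa included.
struct ClassLayout {
    void *isa;           // inherited from objc_object
    void *superclass;
    CacheLayout cache;
    uintptr_t bits;      // class_data_bits_t
};

static_assert(sizeof(void *) == 8, "64-bit only");
static_assert(sizeof(CacheLayout) == 16, "cache_t is 8 + 8 bytes");
static_assert(offsetof(ClassLayout, bits) == 0x20, "bits is 32 bytes in");
```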

    ivar_t
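    The ivars member of class_ro_t has type ivar_list_t, which inherits from entsize_list_tt.
    Many of the lists in ro and rw are either entsize_list_tt, or list_array_tt, which is built on top of entsize_list_tt. See the next article for details: https://www.jianshu.com/p/52080de84f38
    There is no need to go into detail in this article. It is enough to know that entsize_list_tt works like an array and has a get() method for fetching elements, e.g. get(0).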
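    What get(i) does can be sketched as follows. The elements start right after an 8-byte header (entsizeAndFlags, count), and element i sits i * entsize bytes after the first one. ToyEntsizeList and ToyIvar are hypothetical, and the elements are held in a fixed array for the demo. The FlagMask parameter mirrors the 0 that appears later in the lldb type entsize_list_tt<ivar_t, ivar_list_t, 0, PointerModifierNop>.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same field layout as the ivar_t shown below (32 bytes on 64-bit).
struct ToyIvar {
    int32_t *offset;
    const char *name;
    const char *type;
    uint32_t alignment_raw;
    uint32_t size;
};

// Hypothetical entsize list: an 8-byte header followed by N elements.
template <typename Element, uint32_t FlagMask, uint32_t N>
struct ToyEntsizeList {
    uint32_t entsizeAndFlags;  // element size, possibly with flag bits mixed in
    uint32_t count;
    Element storage[N];

    uint32_t entsize() const { return entsizeAndFlags & ~FlagMask; }

    // Element i lives i * entsize() bytes after the first element.
    const Element &get(uint32_t i) const {
        assert(i < count);
        const uint8_t *first = reinterpret_cast<const uint8_t *>(&storage[0]);
        return *reinterpret_cast<const Element *>(first + i * entsize());
    }
};
```

    The stride comes from the header instead of sizeof(Element). That is why lldb prints entsizeAndFlags = 32 next to count.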

    struct ivar_t {
        int32_t *offset;
        const char *name;
        const char *type;
        // alignment is sometimes -1; use alignment() instead
        uint32_t alignment_raw;
        uint32_t size;
    
        uint32_t alignment() const {
            if (alignment_raw == ~(uint32_t)0) return 1U << WORD_SHIFT;
            return 1 << alignment_raw;
        }
    };
    

    8 + 8 + 8 + 4 + 4 = 32 bytes. How these fields are used is covered below.
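    alignment() can be exercised on its own. The struct below copies ivar_t from the excerpt above. WORD_SHIFT = 3 (8-byte words on 64-bit) is an assumption, since its definition is not shown in this article.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t WORD_SHIFT = 3;  // assumption: 8-byte words on 64-bit

struct ivar_t {
    int32_t *offset;
    const char *name;
    const char *type;
    // alignment is sometimes -1; use alignment() instead
    uint32_t alignment_raw;  // log2 of the alignment, or ~0 meaning "use word size"
    uint32_t size;

    uint32_t alignment() const {
        if (alignment_raw == ~(uint32_t)0) return 1U << WORD_SHIFT;
        return 1 << alignment_raw;
    }
};

static_assert(sizeof(void *) != 8 || sizeof(ivar_t) == 32, "8+8+8+4+4 on 64-bit");
```

    With alignment_raw = 3, as in the lldb output later on, alignment() returns 1 << 3 = 8.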

    property_t

    struct property_t {
        const char *name;
        const char *attributes;
    };
    
    

    property_t holds only two string pointers, because it merely describes the property.

    Memory layout of instance variables

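    ivar_list_t exists only in class_ro_t, and rw has nothing related to ivars.
    However, when class_ro_t is initialized, the ivars are not created one by one. They are read from the Mach-O file. The only function in objc4 for adding an ivar is class_addIvar, which adds one dynamically, and it is not called when a class is loaded statically.
    Static loading is a complicated process. For ivars, we can start by observing them through the runtime.

    Before that, let's see what an ivar_t looks like in memory.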

    #import <Foundation/Foundation.h>

    @interface MyClass : NSObject
    {
        NSInteger _num;
    }
    @end

    @implementation MyClass

    - (instancetype)init {
        if (self = [super init]) {
            _num = 5;
        }
        return self;
    }

    @end

    int main(int argc, const char * argv[]) {
        @autoreleasepool {
            MyClass *my = [[MyClass alloc] init];
            NSLog(@"Hello, World!");
        }
        return 0;
    }
    

    Declare the class and set a breakpoint on the NSLog line.

    (lldb) p my.class
    (Class) $0 = 0x0000000100008118
    (lldb) p (class_data_bits_t *)$0 + 0x20
    (class_data_bits_t *) $1 = 0x0000000100008218
    

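    Starting from the class object's address, bits lies at an offset of 8 (isa) + 8 (superclass) + 16 (cache) = 32 bytes, which is 0x20 in hex. Adding that offset gives the class_data_bits_t *.
    Note, however, that the command above casts first and adds afterwards. lldb therefore scales 0x20 by sizeof(class_data_bits_t) (8 bytes) and actually adds 0x100, which is why it prints 0x0000000100008218. To add a byte offset, add it before casting: p (class_data_bits_t *)((uintptr_t)$0 + 0x20). That expression evaluates to 0x100008138.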

    (lldb) p (objc_class *)$0
    (objc_class *) $2 = 0x0000000100008118
    (lldb) p $2->data()
    (class_rw_t *) $3 = 0x0000000109412280
    (lldb) p $2->safe_ro()
    (const class_ro_t *) $4 = 0x0000000100008098
    

    Cast the Class to objc_class *, then fetch the class_rw_t and the class_ro_t.

    (lldb) p $3->ro_or_rw_ext
    (explicit_atomic<unsigned long>) $5= {
      std::__1::atomic<unsigned long> = {
        Value = 4295000216
      }
    }
    (lldb) p/x 4295000216
    (long) $6 = 0x0000000100008098
    (lldb) p $3->ro()
    (const class_ro_t *) $7 = 0x0000000100008098
    

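    Printing ro_or_rw_ext in rw shows that, at this point, it simply holds the address of ro (4295000216 is 0x100008098, the same as $4), so no class_rw_ext_t has been allocated. Calling ro() gives the same result.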

    (lldb) p *$3
    (class_rw_t) $8 = {
      flags = 2148007936
      witness = 1
      ro_or_rw_ext = {
        std::__1::atomic<unsigned long> = {
          Value = 4295000216
        }
      }
      firstSubclass = nil
      nextSiblingClass = 0x00007ff85e83b9c8
    }
    (lldb) p sizeof($3->flags)
    (unsigned long) $9 = 4
    (lldb) p sizeof($3->witness)
    (unsigned long) $10 = 2
    (lldb) p sizeof($3->ro_or_rw_ext)
    (unsigned long) $11 = 8
    (lldb) p sizeof($3->firstSubclass)
    (unsigned long) $12 = 8
    (lldb) p sizeof($3->nextSiblingClass)
    (unsigned long) $13 = 8
    (lldb) p sizeof(*$3)
    (unsigned long) $14 = 32
    

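    This prints the whole rw. The sizes also show alignment at work: the fields add up to 4 + 2 + 8 + 8 + 8 = 30 bytes, yet sizeof reports 32.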

    (lldb) p *$4
    (const class_ro_t) $15 = {
      flags = 128
      instanceStart = 8
      instanceSize = 16
      reserved = 0
       = {
        ivarLayout = 0x0000000000000000
        nonMetaclass = nil
      }
      name = {
        std::__1::atomic<const char *> = "MyClass" {
          Value = 0x0000000100003fa8 "MyClass"
        }
      }
      baseMethods = {
        ptr = nil
      }
      baseProtocols = nil
      ivars = 0x0000000100008070
      weakIvarLayout = 0x0000000000000000
      baseProperties = nil
      _swiftMetadataInitializer_NEVER_USE = {}
    }
    

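    Next, look at the contents of class_ro_t. name is the name of the class this ro belongs to. instanceStart = 8 is where this class's own ivars begin, right after NSObject's isa, and instanceSize = 16 is the 8-byte isa plus the 8-byte _num.
    ivars is non-null.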

    (lldb) p *$4->ivars
    (const ivar_list_t) $16 = {
      entsize_list_tt<ivar_t, ivar_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 32, count = 1)
    }
    (lldb) p $16->get(0)
    (ivar_t) $17 = {
      offset = 0x00000001000080e8
      name = 0x0000000100003fb0 "_num"
      type = 0x0000000100003fb5 "q"
      alignment_raw = 3
      size = 8
    }
    

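    Print ivars and fetch element 0.
    So where is the actual value of _num stored? It has to be found through offset, which is the member's offset relative to the instance. offset is itself a pointer, though: the address it points to holds the actual offset.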

    (lldb) p/x my
    (MyClass *) $18 = 0x0000000108e4c3b0
    (lldb) x/wx 0x00000001000080e8
    0x100008120: 0x00000008
    
    

    So _num is stored 8 bytes from the start of my. The object itself is only 8 bytes (the size of isa), so _num immediately follows it.

    (lldb) x/4gx $18
    0x108e4c3b0: 0x011d800100008129 0x0000000000000005
    0x108e4c3c0: 0x0000000108e4c490 0x0000000108e4c6d0
    

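    This reads 4 × 8 bytes of memory starting at my's address. The first word is isa, and the second holds the value of _num (5). The last two words lie beyond the 16-byte instance (instanceSize = 16) and do not belong to my.
    If the ivar were a pointer, these 8 bytes would hold that pointer.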
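    The double indirection can be simulated without the runtime: offset points at an int32_t, and that int32_t holds the byte offset. FakeInstance and readIvar are hypothetical. The 16-byte buffer mimics instanceSize = 16, the isa word and the value 5 are taken from the lldb output above, and memcpy is used for the reads to avoid aliasing problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte instance: 8 bytes of isa, then 8 bytes of _num.
struct alignas(8) FakeInstance { unsigned char bytes[16]; };

// ivar_t::offset points at an int32_t holding the real byte offset,
// so reading the ivar means: load *offsetPtr, then read at base + that offset.
int64_t readIvar(const FakeInstance &obj, const int32_t *offsetPtr) {
    int32_t off = *offsetPtr;
    assert(off >= 0 && off + sizeof(int64_t) <= sizeof obj.bytes);
    int64_t value;
    std::memcpy(&value, obj.bytes + off, sizeof value);
    return value;
}
```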

    Memory layout of properties

    @interface MyClass : NSObject
    
    @property(nonatomic, strong) NSNumber *number;
    @property(nonatomic, assign) NSInteger integer;
    @property(atomic, assign) NSInteger atomic;
    @property(nonatomic, copy) NSString *Str;
    @property(nonatomic, weak) NSObject *weak;
    @property(nonatomic, strong, readonly) NSObject *readonly;
    
    @end
    
    @implementation MyClass
    
    - (instancetype)init{
        if(self = [super init]){
            _readonly = NSObject.new;
        }
        return self;
    }
    
    @end
    

    This declares six properties, each with different attributes.

    (lldb) p my.class
    (Class) $0 = 0x0000000100008408
    (lldb) p (objc_class *)$0
    (objc_class *) $1 = 0x0000000100008408
    (lldb) p $1->data()
    (class_rw_t *) $2 = 0x0000000108e27090
    (lldb) p $2->ro()
    (const class_ro_t *) $3 = 0x0000000100008328
    (lldb) p *$3
    (const class_ro_t) $4 = {
      flags = 388
      instanceStart = 8
      instanceSize = 56
      reserved = 0
       = {
        ivarLayout = 0x0000000100003f46 "\U00000001!\U00000011"
        nonMetaclass = 0x0000000100003f46
      }
      name = {
        std::__1::atomic<const char *> = "MyClass" {
          Value = 0x0000000100003f3e "MyClass"
        }
      }
      baseMethods = {
        ptr = 0x00000001000080b8
      }
      baseProtocols = nil
      ivars = 0x00000001000081f8
      weakIvarLayout = 0x0000000100003f4a "A"
      baseProperties = 0x00000001000082c0
      _swiftMetadataInitializer_NEVER_USE = {}
    }
    (lldb) p $4.baseProperties
    (property_list_t *const) $5 = 0x00000001000082c0
    (lldb) p *$5
    (property_list_t) $6 = {
      entsize_list_tt<property_t, property_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 16, count = 6)
    }
    (lldb) p $6.get(0)
    (property_t) $7 = (name = "number", attributes = "T@\"NSNumber\",&,N,V_number")
    (lldb) p $6.get(1)
    (property_t) $8 = (name = "integer", attributes = "Tq,N,V_integer")
    (lldb) p $6.get(2)
    (property_t) $9 = (name = "atomic", attributes = "Tq,V_atomic")
    (lldb) p $6.get(3)
    (property_t) $10 = (name = "Str", attributes = "T@\"NSString\",C,N,V_Str")
    (lldb) p $6.get(4)
    (property_t) $11 = (name = "weak", attributes = "T@\"NSObject\",W,N,V_weak")
    (lldb) p $6.get(5)
    (property_t) $12 = (name = "readonly", attributes = "T@\"NSObject\",R,N,V_readonly")
    
    

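    property_t stores name and attributes, where attributes looks like T@"NSObject",R,N,V_readonly. The rules are:
    It starts with T, followed by the @encode type and a comma, e.g. q for NSInteger and @"NSNumber" for NSNumber *.
    Then come the attribute codes, separated by commas.
    It ends with V followed by an underscore and the property name. The underscore plus the property name is in fact the backing ivar, discussed below.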
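    The rules above are simple enough to parse with a small function. parseAttributes is a hypothetical helper, not a runtime API. It assumes no single attribute contains a comma, which holds for the examples in this article; other encodings are not checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PropertyAttrs {
    std::string type;                // text after the leading 'T'
    std::vector<std::string> flags;  // e.g. "&", "N", "C", "W", "R", "Gname"
    std::string ivar;                // text after the trailing 'V'
};

PropertyAttrs parseAttributes(const std::string &attrs) {
    PropertyAttrs out;
    size_t start = 0;
    while (start <= attrs.size()) {
        size_t comma = attrs.find(',', start);
        if (comma == std::string::npos) comma = attrs.size();
        std::string item = attrs.substr(start, comma - start);
        if (!item.empty()) {
            if (item[0] == 'T') out.type = item.substr(1);       // type encoding
            else if (item[0] == 'V') out.ivar = item.substr(1);  // backing ivar
            else out.flags.push_back(item);                      // attribute code
        }
        start = comma + 1;
    }
    return out;
}
```

    In real code, the Objective-C runtime's property_getAttributes() returns this string, and property_copyAttributeValue() looks up a single code.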

    Official documentation

    The attribute codes that can appear in attributes are roughly the following:


    [Figure: table of property attribute codes from the official documentation]

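    The documentation also gives some examples:
    Tc, Td, Ti and Tf are char, double, enum/int and float.
    Some cases need attention. For example, @property(getter=intGetFoo, setter=intSetFoo:) int intSetterGetter; is encoded as Ti,GintGetFoo,SintSetFoo:,VintSetterGetter.
    Pointer types get a ^ prefix: int * is T^i and void * is T^v.
    An id property is encoded as T@, i.e. no class name follows the @.
    And so on.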

    [Figure: further examples from the official documentation]

    So where are the property's actual structure and data stored?

    Continuing the lldb session above:

    (lldb) p $4.ivars
    (const ivar_list_t *const) $13 = 0x00000001000081f8
    (lldb) p *$13
    (const ivar_list_t) $14 = {
      entsize_list_tt<ivar_t, ivar_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 32, count = 6)
    }
    (lldb) p $14.get(0)
    (ivar_t) $15 = {
      offset = 0x00000001000083d8
      name = 0x0000000100003e90 "_number"
      type = 0x0000000100003f7a "@\"NSNumber\""
      alignment_raw = 3
      size = 8
    }
    (lldb) p $14.get(1)
    (ivar_t) $16 = {
      offset = 0x00000001000083e0
      name = 0x0000000100003e98 "_integer"
      type = 0x0000000100003f86 "q"
      alignment_raw = 3
      size = 8
    }
    

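    So each property also generates an ivar. ivar_list_t has count = 6 here, one per property, each named with a leading underscore.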

    That is not all: @property also generates a getter and a setter. These will be analyzed in later articles on methods, messaging and class loading.


    Article link: https://www.haomeiwen.com/subject/ceytnrtx.html