类的本质

前面我们分析了对象的创建与本质，对象的创建依赖于类，接下来我们继续探索类的本质。

类的创建

我们知道，对象是在运行时创建的，对象的创建依赖于类，那么类是在什么时候创建的呢？

我们可以有两种方法类验证：

lldb 打印类和元类的指针

我们通过在 main 函数处断点，此时 main 函数还未执行，通过 lldb 命令，可以在控制台输出类和元类的地址。

查看 Mach-O 文件

我们将 cmd + b 编译后的 Mach-O 文件在 MachOView 中打开，可以看到：

这说明类和元类在程序编译期就已经创建。

指针与内存偏移

在 OC 的世界里，数据在内存中的存储是以指针的形式存在的。这些指针大致可以分为：

普通指针
对象指针
数组指针

普通指针

OC 中普通指针是相对于值类型来说的：

int a = 10;
int b = 10;
NSLog(@"%d - %p", a, &a); // 10 - 0x7ffeefbff504
NSLog(@"%d - %p", b, &b); // 10 - 0x7ffeefbff500

值类型在内存中是值拷贝类型，所以 a和b 在内存中是两个不同的存在。

对象指针

对象指针很好理解，就是针对对象来说的：

SMPerson *p1 = [SMPerson alloc];
SMPerson *p2 = [SMPerson alloc];
NSLog(@"%@ - %p", p1, &p1); // <SMPerson: 0x10064ab40> - 0x7ffeefbff4f8
NSLog(@"%@ - %p", p2, &p2); // <SMPerson: 0x100638ad0> - 0x7ffeefbff4f0

数组指针

在编写程序时，数组是用的相当多的：

int d[4] = {1, 2, 3, 4};
int *q = d;
NSLog(@"%p - %p -%p", &d, &d[0], &d[1]); //0x7ffeefbff500 - 0x7ffeefbff500 -0x7ffeefbff504 
NSLog(@"%p - %p -%p", q, q + 1, q + 2); // 0x7ffeefbff550 - 0x7ffeefbff554 -0x7ffeefbff558
for (int i = 0;i < 4; i ++) {
    NSLog(@"%d - %d", d[i], *(q + i));
} 
/**
1 - 1
2 - 2
3 - 3
4 - 4
*/

在内存中示意图大致如下：

内存偏移

上面的几种指针，我们都可以通过访问内存地址取得数据：对象在内存中分配地址，我们可以通过首地址，并结合各类型所占字节长度偏移取得的数据。

上面的数组的例子最能体现我们要表达的内存偏移的概念：根据数组的数组的首地址，在结合 int 类型 4 字节长度的特性，我们可以每个元素存储的位置。

要解读类的本质，我们从 NSObject 开始：

@interface NSObject <NSObject> {
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wobjc-interface-ivars"
    Class isa  OBJC_ISA_AVAILABILITY;
#pragma clang diagnostic pop
}

NSObject 有一个 Class 类型的 isa 成员变量。

接下来用 clang 编译 main.m，输出 main.cpp 文件，查看 NSObject 的底层定义：

clang -rewrite-objc main.m -o main.cpp

打开 main.cpp，找到 NSObject：

#ifndef _REWRITER_typedef_NSObject
#define _REWRITER_typedef_NSObject
typedef struct objc_object NSObject;
typedef struct {} _objc_exc_NSObject;
#endif

struct NSObject_IMPL {
	Class isa;
};

NSObject 是一个 objc_object 结构体，同时定义了一个 NSObject_IMPL 结构体，里面有 isa 成员变量，对应上面类 NSObject 的 isa。

对于继承自 NSObject 的类：

@interface SMBook : NSObject {
    NSString *_author;
}

@property (nonatomic, copy) NSString *name;

- (void)sayHello;

@end

@implementation SMBook

- (void)sayHello {
    NSLog(@"Hello, world!");
}

+ (void)sell {
    NSLog(@"sell");
}

@end

我们也同样可以在 main.cpp 中看到：

#ifndef _REWRITER_typedef_SMBook
#define _REWRITER_typedef_SMBook
typedef struct objc_object SMBook;
typedef struct {} _objc_exc_SMBook;
#endif

extern "C" unsigned long OBJC_IVAR_$_SMBook$_name;
struct SMBook_IMPL {
	struct NSObject_IMPL NSObject_IVARS;
	NSString *_author;
	NSString *_name;
};

SMBook 同样是一个 objc_object 的结构体，因为继承自 NSObject，SMBook_IMPL 结构体中除其自身的属性外，还多了 NSObject_IVARS – 即继承 NSObject 的类都相当于有一个 isa。

NSObject 及其子类本质上都是 objc_object 结构体类型，所以类本质上也是一个对象，即万物皆对象。

类的结构

类在底层实现是一个结构体指针：

typedef struct objc_class *Class;

所以 Class 是一个8字节的指针类型。

继续看 objc_class 的结构：

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
}

objc_class 继承自 objc_object，这说明类也是一个对象。

注意这里的 Class ISA，这个 ISA 是针对优化后的：

inline Class 
objc_object::ISA() 
{
    assert(!isTaggedPointer()); 
#if SUPPORT_INDEXED_ISA
    if (isa.nonpointer) {
        uintptr_t slot = isa.indexcls;
        return classForIndex((unsigned)slot);
    }
    return (Class)isa.bits;
#else
    return (Class)(isa.bits & ISA_MASK);
#endif
}

现在看 objc_class 的结构：

ISA 表示元类
superclass 表示父类
cache_t，方法缓存重要结构体
bits，存储数据的结构体

类的存储

OC 中类一般都会有属性以及成员变量，他们在类中是如何存储的呢？

类的内存分布

首先需要我们对类的结构体的内存结构有一个清晰的认识：

结构体成员	内存大小
ISA	8
superclass	8
cache	16

ISA 和 superclass 很好理解，都是 Class 的指针类型，在64位结构下各占8个字节，这里我们着重看下 cache：

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;
}

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif

从上面的代码可以看出，cache 是 cache_t 的结构类型，其内部有3个成员变量，在 64 为架构模式下，结合内存对齐等，cache_t 占 8 + 4 + 4 = 16 个字节。

我们要读取类中成员变量和属性、方法等信息，需要读取 bits 中的值，结合上面讲的内存偏移，我们需要在类的首地址上偏移 32 个字节，用16进制表示为：0x20。

获取类的 `bits`

我们通过 LLDB 命令来探索类结构的第四个属性 bits。

@interface SMPerson : NSObject {
    NSString *_hobby;
}

@property (nonatomic, copy) NSString *nickName;

- (void)sayHello;

+ (void)sayHappy;

@end

SMPerson *p = [SMPerson alloc];
Class pClass = object_getClass(p);

我们先拿到 pClass，然后在控制台使用 LLDB 命令：

x/4xg pClass

我们需要得到 bits 指针的地址，需要进行指针偏移，即：

0x100001238 + 0x20 = 0x100001258

我们继续在控制填输入：

(lldb) po 0x100001258

会有如下输出：

objc[6727]: Attempt to use unknown class 0x10190d4a0.
4294971992

显然，bits 不是一个对象而是一个结构体，这里我们需要强转一下并得到如下输出：

(lldb) p (class_data_bits_t *)0x100001258
(class_data_bits_t *) $2 = 0x0000000100001258

解析 `class_rw_t`

OC 中类的属性、成员变量和方法等都存储在 class_rw_t 中，结合上面 objc_class 的结构：

class_rw_t *data() { 
    return bits.data();
}

struct class_data_bits_t 中：

struct class_data_bits_t {

    // Values are the FAST_ flags above.
    uintptr_t bits;
	class_rw_t* data() {
	    return (class_rw_t *)(bits & FAST_DATA_MASK);
	}
}

#define FAST_DATA_MASK          0x00007ffffffffff8UL

class_data_bits_t 占8个字节，即64位，其中从第 4~47 共 44 位表示 class_rw_t。

我们调用 $2->data()获得 class_rw_t：

(lldb) p $2->data()
(class_rw_t *) $3 = 0x000000010190d4a0

然后我们根据 libObjc 的源码中关于 class_rw_t 相关的定义：

struct class_rw_t {
    // Be warned that Symbolication knows the layout of this structure.
    uint32_t flags;
    uint32_t version;

    const class_ro_t *ro;

    method_array_t methods;
    property_array_t properties;
    protocol_array_t protocols;

    Class firstSubclass;
    Class nextSiblingClass;
    
    ...
}

我们在打印证实下是否这种结构：

这里我们需要留意几个关键成员变量：

const class_ro_t *ro 是一个不可变的属性
methods
properties
protocols

这里还有两点是需要注意的：

class_rw_t 中没有发现成员变量的列表
ro 存在的意义：
- class_rw_t 是可以在运行时拓展一些属性、方法和协议等内容
- class_ro_t 是在编译时就已经确定了的，存储类的成员变量、属性、方法和协议

现在我们已经获取到 class_rw_t 的值，下面我们就预测一下我们的属性、方法等存储在结构体的哪些变量上

预测属性应该定义在 properties 中：

接着我们查看 properties 中的内容：

预测方法应该定义在 methods 中：

在 method_list_t 中我们可以看到此时我们的方法有 3 个：

struct method_list_t : entsize_list_tt<method_t, method_list_t, 0x3> { }

method_list_t 继承自 entsize_list_tt， entsize_list_tt 实现了 first 和迭代器方法，我们可以通过 get 方法读取到数组中的元素：

(lldb) p $10.first
(method_t) $16 = {
  name = "sayHello"
  types = 0x0000000100000f85 "v16@0:8"
  imp = 0x0000000100000dc0 (objc-debug`-[SMPerson sayHello] at SMPerson.m:12)
}
(lldb) p $10.get(1)
(method_t) $17 = {
  name = "nickName"
  types = 0x0000000100000f8d "@16@0:8"
  imp = 0x0000000100000e20 (objc-debug`-[SMPerson nickName] at SMPerson.h:16)
}
(lldb) p $10.get(2)
(method_t) $18 = {
  name = "setNickName:"
  types = 0x0000000100000f95 "v24@0:8@16"
  imp = 0x0000000100000e50 (objc-debug`-[SMPerson setNickName:] at SMPerson.h:16)
}

预测协议应该定义在 protocols 中：

因为这里并没有实现任何协议，所以数组为空。

进行到这里，现在有个疑问是，我们的成员变量去哪里了呢？这就需要我们的 class_ro_t 出场了。上面已经说过，class_ro_t 在编译时就已经确定了成员变量、属性、方法和协议的布局，不考虑运行时动态添加方法等操作，我们应该在 class_ro_t 读取类的数据。这里的 ro 就是 read only 的意思了。

解析 `class_ro_t`

源码中 class_ro_t 的结构为：

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif

    const uint8_t * ivarLayout;
    
    const char * name;
    method_list_t * baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;
}

首先第一步是获取 class_ro_t：

(lldb) p $4.ro

属性存储在 baseProperties 中：
成员变量存储在 ivars 中：

成员变量为 _hobby 和 _nickName，为什么会有 _nickName 呢，他不是属性吗？这就是编译器会帮助我们给属性生成一个带下划线的成员变量了。
方法存储在 baseMethodList 中：

这里除了我们写的方法 sayHello 之外，还有 setNickName 和 nickName 方法。这是编译器帮助我们给属性生成的 setter 和 getter 方法。

类方法的存储

在上面的 baseMethodList 中，并没有发现我们的类方法 sayHappy，这说明类方法并不存储在此，那么类方法放在哪里呢？

我们知道在 OC 的世界中，万物皆对象，类也是对象，且类是元类的对象，那么我们是不是可以大胆猜测，类方法是存储在元类的 ro 中呢？下面我们就此来验证：

首先获得元类，类的 isa 指向元类，从之前的 isa 相关的知识：

我们在元类的 ro 中找到我们的类方法。

`class_rw_t` 与 `class_ro_t` 的联系与区别

根据上面的分析 class_rw_t 和 class_ro_t 中都存储了类的属性、方法等。为什么 class_rw_t 也能拿到这些信息呢？是因为执行了方法 realizeClassWithoutSwift：

static Class realizeClassWithoutSwift(Class cls) {
	
	...
	
	const class_ro_t *ro;
	class_rw_t *rw;
	
	ro = (const class_ro_t *)cls->data();
	if (ro->flags & RO_FUTURE) {
	    // This was a future class. rw data is already allocated.
	    rw = cls->data();
	    ro = cls->data()->ro;
	    cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
	} else {
	    // Normal class. Allocate writeable class data.
	    // 一般走这里
	    rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1);
	    rw->ro = ro;
	    rw->flags = RW_REALIZED|RW_REALIZING;
	    cls->setData(rw);
	}
	
	...
	
	// Attach categories
	methodizeClass(cls);
	
	return cls;
}

然后在调用 methodizeClass：

static void methodizeClass(Class cls)
{
    runtimeLock.assertLocked();

    bool isMeta = cls->isMetaClass();
    auto rw = cls->data();
    auto ro = rw->ro;

    // Methodizing for the first time
    if (PrintConnecting) {
        _objc_inform("CLASS: methodizing class '%s' %s", 
                     cls->nameForLogging(), isMeta ? "(meta)" : "");
    }

    // Install methods and properties that the class implements itself.
    method_list_t *list = ro->baseMethods();
    if (list) {
        prepareMethodLists(cls, &list, 1, YES, isBundleClass(cls));
        rw->methods.attachLists(&list, 1);
    }

    property_list_t *proplist = ro->baseProperties;
    if (proplist) {
        rw->properties.attachLists(&proplist, 1);
    }

    protocol_list_t *protolist = ro->baseProtocols;
    if (protolist) {
        rw->protocols.attachLists(&protolist, 1);
    }

    // Root classes get bonus method implementations if they don't have 
    // them already. These apply before category replacements.
    if (cls->isRootMetaclass()) {
        // root metaclass
        addMethod(cls, SEL_initialize, (IMP)&objc_noop_imp, "", NO);
    }

    // Attach categories.
    category_list *cats = unattachedCategoriesForClass(cls, true /*realizing*/);
    attachCategories(cls, cats, false /*don't flush caches*/);

    if (PrintConnecting) {
        if (cats) {
            for (uint32_t i = 0; i < cats->count; i++) {
                _objc_inform("CLASS: attached category %c%s(%s)", 
                             isMeta ? '+' : '-', 
                             cls->nameForLogging(), cats->list[i].cat->name);
            }
        }
    }
    
    if (cats) free(cats);
	
	...
}

在methodizeClass中，将 ro 中的方法、属性，遵循的协议、category 的方法都添加都 rw 中（注意这里只是将指针指向 ro 中对应的列表地址）。这样在运行期我们就可以在 rw 中拿到相应的信息了。

前面已经说过 ro 是在编译期就已经确定了的，而 rw 可以在运行期拓展方法等，现在我们就开看一个例子：

void run() {
    NSLog(@"running...");
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        SMPerson *p = [SMPerson alloc];
        Class pClass = object_getClass(p);
        class_addMethod(pClass, NSSelectorFromString(@"run"), (IMP)run, "v@:");
        [p performSelector:NSSelectorFromString(@"run")];
    }
    return 0;
}

现在我们开看 rw 与 ro 中的方法列表：

ro 中没有我们动态添加的方法，符合我们的预期，但是很奇怪的是，rw 里面的值变的很奇怪，留个坑 o(╥﹏╥)o

这里仍然有需要注意的点：

在没有动态添加方法时，ro 的 baseMethodList 与 rw 的 methods 的 list 指向的地址是相同的，不只是方法列表，属性列表指向的地址也是相同的，这说明运行时若没有动态添加属性或方法时，他们指向相同的地址
运行时动态添加方法等之后，rw 发生了变化

类的内存分布图

总结

类和元类创建于编译期
万物皆对象，类的元类的对象
class_ro_t 存储类的成员变量、属性、方法、协议等，是只读的
class_rw_t 可以在运行期进行拓展
实例方法存储在类中
类方法存储在元类中

类的创建

lldb 打印类和元类的指针

查看 Mach-O 文件

指针与内存偏移

普通指针

对象指针

数组指针

内存偏移

类的本质

类的本质

类的结构

类的存储

类的内存分布

获取类的 bits

解析 class_rw_t

解析 class_ro_t

类方法的存储

class_rw_t 与 class_ro_t 的联系与区别

类的内存分布图

总结

获取类的 `bits`

解析 `class_rw_t`

解析 `class_ro_t`

`class_rw_t` 与 `class_ro_t` 的联系与区别