The Java Object Model (the OOP-Klass Model)

Author: 程序员札记 | Published: 2022-06-30 08:52

1. Background

When we create an object with new in a Java program, do we know how that object is represented inside the JVM, and how much memory it occupies?

class A {
    int a;

    // main must live inside a class for the example to compile
    public static void main(String[] args) {
        A a = new A();
    }
}

2. The Java Object Model: OOP-Klass

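HotSpot does not map Java objects directly onto C++ objects. Instead, it implements its own Java object model, OOP-Klass.
Why split the object model in two? The HotSpot designers did not want every object to carry a C++ vtable (virtual function table) pointer, so they divided the model into klass and oop. An oop contains no virtual functions, while the klass holds the virtual function table and performs method dispatch. The design is modeled on the object model of the Strongtalk VM.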

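The two halves of the OOP-Klass model are defined as follows:

OOP stands for Ordinary Object Pointer. It looks like a pointer, but it is really an object hidden behind a pointer, and it represents the instance data of an object.
Klass holds the metadata and method information. It is the C++ representation of a Java class and describes that class.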

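When a class is loaded, the JVM creates an InstanceKlass object for it. Each instance of the class corresponds to an instanceOopDesc, which inherits from oopDesc and represents an ordinary Java object; every new of a Java object creates a new instanceOopDesc. The InstanceKlass lives in Metaspace, while the instanceOopDesc lives on the heap.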
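
The split described above can be illustrated with a toy model. This is only a sketch of the idea, not HotSpot's implementation: ToyKlass, ToyOop, and the string-keyed method table are hypothetical names invented for illustration. It shows one shared per-class metadata object that owns the method table, and per-object records that carry only a header and field values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Toy model of the oop/klass split. All names are illustrative, not HotSpot's. */
final class OopKlassSketch {

    /** Per-class metadata: created once per class and shared by every instance. */
    static final class ToyKlass {
        final String name;
        final String[] fieldNames; // field layout, common to all instances
        // Stand-in for the virtual function table: method name -> implementation.
        final Map<String, ToIntFunction<ToyOop>> methods = new HashMap<>();

        ToyKlass(String name, String... fieldNames) {
            this.name = name;
            this.fieldNames = fieldNames;
        }

        int fieldIndex(String field) {
            for (int i = 0; i < fieldNames.length; i++) {
                if (fieldNames[i].equals(field)) {
                    return i;
                }
            }
            throw new IllegalArgumentException("no such field: " + field);
        }
    }

    /** Per-object data: a header (mark + klass reference) followed by field values. */
    static final class ToyOop {
        long mark;            // stand-in for the mark word
        final ToyKlass klass; // stand-in for the klass pointer
        final int[] fields;   // instance data only; no method table in the object

        ToyOop(ToyKlass klass) {
            this.klass = klass;
            this.fields = new int[klass.fieldNames.length];
        }

        int get(String field) { return fields[klass.fieldIndex(field)]; }

        void put(String field, int value) { fields[klass.fieldIndex(field)] = value; }

        /** Dispatch goes through the klass, never through the object itself. */
        int invoke(String method) { return klass.methods.get(method).applyAsInt(this); }
    }

    public static void main(String[] args) {
        ToyKlass klassA = new ToyKlass("A", "a");
        klassA.methods.put("getA", oop -> oop.get("a"));

        ToyOop first = new ToyOop(klassA);
        ToyOop second = new ToyOop(klassA);
        first.put("a", 1);
        second.put("a", 2);

        System.out.println(first.klass == second.klass); // both objects share one klass
        System.out.println(first.invoke("getA") + " " + second.invoke("getA"));
    }
}
```

Because the method table sits in ToyKlass, adding more instances adds only header and field storage, which is the motivation the text gives for the split.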

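Because Java 8 introduced Metaspace, the object model implementation in OpenJDK 1.8 differs considerably from the one in 1.7. Data that used to live in PermGen was moved to Metaspace, so the corresponding C++ types now inherit from the MetaspaceObj class (defined in vm/memory/allocation.hpp), which marks them as Metaspace data.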

As the figure of an object's layout below shows, the instanceOopDesc is the object header, and the object's data follows immediately after it.

[Figure: layout of an object, with the instanceOopDesc header followed by the instance data]

Now consider a class with two instances, modelA and modelB, defined as follows:

class Model
{
    public static int a = 1;
    public int b;

    public Model(int b) {
        this.b = b;
    }

    public static void main(String[] args) {
        int c = 10;
        Model modelA = new Model(2);
        Model modelB = new Model(3);
    }
}

The resulting memory allocation looks like this:

[Figure: memory allocation for modelA and modelB, using direct-pointer access]

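An object can be located and accessed in one of two ways: through a handle or through a direct pointer. The figure above shows direct-pointer access. Direct pointers are faster because each access saves one pointer indirection. Handle access sets aside a handle pool in the heap; the reference on the stack stores the address of a handle, and the handle holds the addresses of the object's instance data and of its type data. The advantage of handles is that when an object is moved, only the instance-data address inside the handle has to be updated; the reference itself stays unchanged.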
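
The trade-off between the two access modes can be shown with a small simulation. This is a minimal sketch under simplifying assumptions: the heap is an int array whose indices stand in for addresses, a handle stores only the instance-data address (the type-data address is left out), and the class and method names are hypothetical.

```java
/** Simulates handle access versus direct-pointer access on a toy heap. */
final class AccessModesSketch {
    final int[] heap = new int[8];       // simulated heap; an index stands in for an address
    final int[] handlePool = new int[4]; // handle index -> current address of the object

    /** Direct pointer: the reference is the address, so one lookup suffices. */
    int readDirect(int address) {
        return heap[address];
    }

    /** Handle: the reference names a handle, so reading costs two lookups. */
    int readViaHandle(int handle) {
        return heap[handlePool[handle]];
    }

    /** Moves an object, as a compacting collector might, and returns its new address. */
    int move(int from, int to) {
        heap[to] = heap[from];
        heap[from] = 0;
        return to;
    }

    public static void main(String[] args) {
        AccessModesSketch vm = new AccessModesSketch();
        vm.heap[2] = 42;      // an "object" with payload 42 at address 2
        int directRef = 2;    // direct pointer to the object
        vm.handlePool[0] = 2; // handle 0 refers to the same object
        int handleRef = 0;    // handle-based reference

        int newAddress = vm.move(2, 5);
        vm.handlePool[0] = newAddress; // handle mode: patch one pool entry, references stay valid
        directRef = newAddress;        // direct mode: every reference to the object must be rewritten

        System.out.println(vm.readViaHandle(handleRef));
        System.out.println(vm.readDirect(directRef));
    }
}
```

In the sketch handleRef never changes across the move, while directRef has to be rewritten; in exchange, readDirect performs one array lookup where readViaHandle performs two.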

The file hotspot/share/oops/oopsHierarchy.hpp defines oop as follows:

typedef class oopDesc*                    oop;
typedef class   instanceOopDesc*            instanceOop;
typedef class   arrayOopDesc*               arrayOop;
typedef class     objArrayOopDesc*            objArrayOop;
typedef class     typeArrayOopDesc*           typeArrayOop;

instanceOopDesc is defined as follows:

// An instanceOop is an instance of a Java Class
// Evaluating "new HashTable()" will create an instanceOop.

class instanceOopDesc : public oopDesc {
 public:
  // aligned header size.
  static int header_size() { return sizeof(instanceOopDesc)/HeapWordSize; }

  // If compressed, the offset of the fields of the instance may not be aligned.
  static int base_offset_in_bytes() {
    // offset computation code breaks if UseCompressedClassPointers
    // only is true
    return (UseCompressedOops && UseCompressedClassPointers) ?
             klass_gap_offset_in_bytes() :
             sizeof(instanceOopDesc);
  }
};

The base_offset_in_bytes method returns the offset just past instanceOopDesc's own members (the object header); the memory after this offset stores the Java object's instance fields.
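
The branch in base_offset_in_bytes also suggests an answer to the size question from section 1. The sketch below is an estimate under stated assumptions, not a measurement: a 64-bit VM with an 8-byte mark word, a 4-byte compressed klass field or an 8-byte full one, 8-byte object alignment, and fields packed with no internal padding. The class and method names are hypothetical.

```java
/** Estimates header and instance sizes under the assumptions stated in the comments. */
final class ObjectSizeSketch {
    static final int MARK_WORD_BYTES = 8;    // assumption: 64-bit VM
    static final int NARROW_KLASS_BYTES = 4; // assumption: compressed class pointer width
    static final int FULL_KLASS_BYTES = 8;   // assumption: uncompressed class pointer width
    static final int OBJECT_ALIGNMENT = 8;   // assumption: 8-byte object alignment

    /** Mirrors the two branches of base_offset_in_bytes(). */
    static int baseOffsetInBytes(boolean compressedOops, boolean compressedClassPointers) {
        return (compressedOops && compressedClassPointers)
                ? MARK_WORD_BYTES + NARROW_KLASS_BYTES // fields may start in the klass gap
                : MARK_WORD_BYTES + FULL_KLASS_BYTES;  // fields start after the full header
    }

    /** Rounds size up to the next multiple of alignment. */
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /** Header plus field bytes, rounded up to the object alignment. */
    static int instanceSize(int fieldBytes, boolean compressedOops, boolean compressedClassPointers) {
        int base = baseOffsetInBytes(compressedOops, compressedClassPointers);
        return alignUp(base + fieldBytes, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        int intFieldBytes = 4; // class A has a single int field
        System.out.println("compressed:   " + instanceSize(intFieldBytes, true, true));   // 12 + 4 = 16
        System.out.println("uncompressed: " + instanceSize(intFieldBytes, false, false)); // 16 + 4 = 20, aligned to 24
    }
}
```

Under these assumptions an instance of class A takes 16 bytes with both compression flags on and 24 bytes with both off.
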
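As the instanceOopDesc definition suggests, the address of a primitive field is computed by adding an offset, measured in bytes, to the address of the instanceOopDesc. Each field's offset, initial value, and other attributes are stored in the _fields member of InstanceKlass. Given that address, the field value can be read or written directly.
Taking char as an example, oop.inline.hpp contains the following code: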

inline jchar oopDesc::char_field(int offset) const                  { return HeapAccess<>::load_at(as_oop(), offset);  }
inline void  oopDesc::char_field_put(int offset, jchar value)       { HeapAccess<>::store_at(as_oop(), offset, value); }
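
The "object address plus byte offset" scheme behind char_field and char_field_put can be imitated with a ByteBuffer standing in for heap memory. This is only a sketch of the addressing idea: the buffer, the object address 16, and the field offset 12 are made-up values, and HeapAccess itself is not modeled.

```java
import java.nio.ByteBuffer;

/** Imitates reading and writing a char field at (object address + offset). */
final class FieldOffsetSketch {
    final ByteBuffer memory = ByteBuffer.allocate(64); // simulated heap memory

    /** Counterpart of char_field: load two bytes at objectAddress + offset. */
    char charField(int objectAddress, int offset) {
        return memory.getChar(objectAddress + offset);
    }

    /** Counterpart of char_field_put: store two bytes at objectAddress + offset. */
    void charFieldPut(int objectAddress, int offset, char value) {
        memory.putChar(objectAddress + offset, value);
    }

    public static void main(String[] args) {
        FieldOffsetSketch heap = new FieldOffsetSketch();
        int objectAddress = 16; // hypothetical start of the object in the buffer
        int fieldOffset = 12;   // hypothetical offset of the field, just past an assumed 12-byte header

        heap.charFieldPut(objectAddress, fieldOffset, 'J');
        System.out.println(heap.charField(objectAddress, fieldOffset));
    }
}
```

The accessor never needs the field's name, only its offset, which is why the offsets can live once in the class metadata rather than in every object.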

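The as_oop function is defined as follows; it simply returns the object itself. The offset argument is the byte offset of the field being accessed: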

 protected:
  inline oop        as_oop() const { return const_cast<oopDesc*>(this); }

In access.hpp, HeapAccess<> derives from the Access class template and inherits its primitive accessors from it. The load_at accessor is excerpted below; store_at follows the same pattern:

  // * load_at: Load a value from an internal pointer relative to a base object.
  // Primitive heap accesses
  static inline AccessInternal::LoadAtProxy<decorators> load_at(oop base, ptrdiff_t offset) {
    verify_primitive_decorators<load_mo_decorators>();
    return AccessInternal::LoadAtProxy<decorators>(base, offset);
  }

The main part of the oopDesc class definition is as follows:

class oopDesc {
  friend class VMStructs;
  friend class JVMCIVMStructs;
 private:
  volatile markWord _mark;
  union _metadata {
    Klass*      _klass;
    narrowKlass _compressed_klass;
  } _metadata;

 public:
  // ... (omitted) ...
  inline Klass* klass() const;

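Here _metadata holds the pointer (full-width or compressed) to the object's Klass, in this case an InstanceKlass, while _mark stores the object's runtime state: its hash code, GC generational age, lock state, and so on.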

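hash: the hash code
age: the generational age
biased_lock: the biased-lock flag bit
lock: the lock-state flag bits
JavaThread*: the thread that holds the biased lock
epoch: the bias timestamp

The source is in markWord.hpp, where the comment describing _mark reads: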

// The markWord describes the header of an object.
//
// Bit-format of an object header (most significant first, big endian layout below):
//
//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused_gap:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused_gap:1   age:4    biased_lock:1 lock:2 (biased object)
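
The 64-bit "normal object" row above can be decoded with shifts and masks. The sketch below takes the bit positions directly from that comment, counting from the least significant bit: lock in bits 0-1, biased_lock in bit 2, age in bits 3-6, unused_gap in bit 7, and hash in bits 8-38. The class and its methods are hypothetical helpers, not HotSpot code, and they do not read the header of a real object.

```java
/** Encodes and decodes the 64-bit normal-object mark word layout from the comment above. */
final class MarkWordSketch {
    static final int BIASED_LOCK_SHIFT = 2; // after lock:2
    static final int AGE_SHIFT = 3;         // after biased_lock:1
    static final int HASH_SHIFT = 8;        // after age:4 and unused_gap:1

    static int lock(long mark)       { return (int) (mark & 0b11); }
    static int biasedLock(long mark) { return (int) ((mark >>> BIASED_LOCK_SHIFT) & 0b1); }
    static int age(long mark)        { return (int) ((mark >>> AGE_SHIFT) & 0b1111); }
    static int hash(long mark)       { return (int) ((mark >>> HASH_SHIFT) & 0x7FFFFFFF); }

    /** Packs the fields back into a mark word; out-of-range bits are masked off. */
    static long encode(int hash, int age, int biasedLock, int lock) {
        return ((long) (hash & 0x7FFFFFFF) << HASH_SHIFT)
                | ((long) (age & 0b1111) << AGE_SHIFT)
                | ((long) (biasedLock & 0b1) << BIASED_LOCK_SHIFT)
                | (lock & 0b11);
    }

    public static void main(String[] args) {
        // Made-up values: hash 0x1234ABCD, age 5, not biased, lock bits 01.
        long mark = encode(0x1234ABCD, 5, 0, 0b01);
        System.out.println("mark        = 0x" + Long.toHexString(mark));
        System.out.println("hash        = 0x" + Integer.toHexString(hash(mark)));
        System.out.println("age         = " + age(mark));
        System.out.println("biased_lock = " + biasedLock(mark));
        System.out.println("lock        = " + lock(mark));
    }
}
```

The 4-bit age field caps the generational age at 15, and the 31-bit hash field is why the stored hash never uses the sign bit.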

A figure found online (for 32-bit and 64-bit systems):

[Figure: mark word layout on 32-bit and 64-bit systems]

The _klass pointer refers to an InstanceKlass, which belongs to the following class hierarchy:

// The klass hierarchy is separate from the oop hierarchy.

class Klass;
class InstanceKlass;
class InstanceMirrorKlass;
class InstanceClassLoaderKlass;
class InstanceRefKlass;
class ArrayKlass;
class ObjArrayKlass;
class TypeArrayKlass;

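A klass represents metadata and inherits from the Metadata class, so items such as Method and ConstantPool appear in the klass hierarchy as member variables (or pointers).
A Klass object represents the metadata of one class (roughly the counterpart of a java.lang.Class object). It provides:

the language-level class object (method dictionary etc.)
VM dispatch behavior for the object

Both functions are combined into one C++ class.

Inheritance chain of Klass objects: xxxKlass < Klass < Metadata < MetaspaceObj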

class InstanceKlass: public Klass {
  // ... (omitted) ...
  // Method array.
  Array<Method*>* _methods;
  // Default Method Array, concrete methods inherited from interfaces
  Array<Method*>* _default_methods;
  // Interfaces (InstanceKlass*s) this class declares locally to implement.
  Array<InstanceKlass*>* _local_interfaces;
  // Interfaces (InstanceKlass*s) this class implements transitively.
  Array<InstanceKlass*>* _transitive_interfaces;
  // Int array containing the original order of method in the class file (for JVMTI).
  Array<int>*     _method_ordering;
  // Int array containing the vtable_indices for default_methods
  // offset matches _default_methods offset
  Array<int>*     _default_vtable_indices;

  // Instance and static variable information, starts with 6-tuples of shorts
  // [access, name index, sig index, initval index, low_offset, high_offset]
  // for all fields, followed by the generic signature data at the end of
  // the array. Only fields with generic signature attributes have the generic
  // signature data set in the array. The fields array looks like following:
  //
  // f1: [access, name index, sig index, initial value index, low_offset, high_offset]
  // f2: [access, name index, sig index, initial value index, low_offset, high_offset]
  //      ...
  // fn: [access, name index, sig index, initial value index, low_offset, high_offset]
  //     [generic signature index]
  //     [generic signature index]
  //     ...
  Array<u2>* _fields;
  // ... (omitted) ...
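
The 6-tuple layout described in the _fields comment can be walked with simple index arithmetic. The sketch below is a simplified illustration: it uses a Java char[] as a stand-in for Array<u2>, the sample values are made up, and joining low_offset and high_offset into one 32-bit value is an assumption about how the two halves relate. Any further version-specific encoding HotSpot applies to these shorts is not modeled.

```java
/** Walks a flat array of 16-bit slots laid out as 6-tuples, one tuple per field. */
final class FieldsArraySketch {
    static final int SLOTS_PER_FIELD = 6;
    // Slot positions inside one tuple, in the order given by the source comment.
    static final int ACCESS = 0;
    static final int NAME_INDEX = 1;
    static final int SIG_INDEX = 2;
    static final int INITVAL_INDEX = 3;
    static final int LOW_OFFSET = 4;
    static final int HIGH_OFFSET = 5;

    /** Returns one 16-bit slot of the given field's tuple. */
    static int slot(char[] fields, int fieldIndex, int slot) {
        return fields[fieldIndex * SLOTS_PER_FIELD + slot];
    }

    /** Assumption: the two offset shorts are the low and high halves of one 32-bit value. */
    static int packedOffset(char[] fields, int fieldIndex) {
        int low = slot(fields, fieldIndex, LOW_OFFSET);
        int high = slot(fields, fieldIndex, HIGH_OFFSET);
        return (high << 16) | low;
    }

    public static void main(String[] args) {
        // Two made-up field tuples; a real array may end with generic signature indices.
        char[] fields = {
            0x0001, 5, 6, 0, 12, 0,  // f1
            0x0009, 7, 6, 8, 112, 0, // f2
        };
        int fieldCount = 2; // must be known separately, since trailing data is not tuple-shaped

        for (int i = 0; i < fieldCount; i++) {
            System.out.println("field " + i
                    + ": access=0x" + Integer.toHexString(slot(fields, i, ACCESS))
                    + " name=" + slot(fields, i, NAME_INDEX)
                    + " sig=" + slot(fields, i, SIG_INDEX)
                    + " packedOffset=" + packedOffset(fields, i));
        }
    }
}
```

Storing fields as flat shorts keeps the metadata compact; the name and signature slots are indices into the constant pool rather than strings.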

As you can see, essentially everything a Java class needs is represented here.

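Starting with JDK 8, HotSpot VM removed PermGen, and the metadata that used to live there was moved to a separate space that the GC does not manage directly, called Metaspace. In HotSpot VM, a Klass is a metadata object that describes the type of a GC-managed object. Before JDK 8, class metadata lived in the GC-managed PermGen. Those xxxKlass objects (for example, instances of instanceKlass) were themselves GC-managed, so they in turn needed Klass objects to describe them, called xxxKlassKlass. Those needed describing too, which is why KlassKlass existed as the final descriptor of the xxxKlassKlass objects. This old official document covers the design of the former Klass / KlassKlass scheme: http://openjdk.java.net/groups/hotspot/docs/StorageManagement.html

A side note: the Klass / KlassKlass design descends from the class + metaclass approach used by popular Smalltalk implementations, and it reflects HotSpot VM's Smalltalk lineage. From JDK 8 onward, metadata is no longer managed directly by the GC, so Klass objects no longer need a KlassKlass to describe them, and the KlassKlass types were removed entirely.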
