FlatBuffers Serialization Process

Author: searchworld | Published: 2017-08-29 15:06

    FlatBuffers Introduction
    FlatBuffers Schema Parsing
    FlatBuffers Serialization Process
    FlatBuffers Deserialization Process

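    The previous article briefly covered the format of a FlatBuffers schema and the Java classes that flatc generates from it. This article shows how FlatBufferBuilder ties those generated classes together to serialize data into the binary format. Along the way it touches on how FlatBuffers represents data; see FlatBuffers internals for reference. Deserialization is covered in the next article.
    The code is based on SampleBinary.java from version 1.7.1, with some extra offset output added for the purposes of this explanation.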

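    import MyGame.Sample.Color;
    import MyGame.Sample.Equipment;
    import MyGame.Sample.Monster;
    import MyGame.Sample.Vec3;
    import MyGame.Sample.Weapon;
    import com.google.flatbuffers.FlatBufferBuilder;
    
    import java.nio.ByteBuffer;
    
    public class SampleBinary {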
        public static void main(String[] args) {
            FlatBufferBuilder builder = new FlatBufferBuilder(200);
            int weaponOneName = builder.createString("Sword");
            short weaponOneDamage = 3;
    
            int weaponTwoName = builder.createString("Axe");
            short weaponTwoDamage = 5;
    
            int sword = Weapon.createWeapon(builder, weaponOneName, weaponOneDamage);
            int axe = Weapon.createWeapon(builder, weaponTwoName, weaponTwoDamage);
    
            int name = builder.createString("Orc");
    
            byte[] treasure = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
            int inv = Monster.createInventoryVector(builder, treasure);
    
            int[] weas = {sword, axe};
            int weapons = Monster.createWeaponsVector(builder, weas);
    
            Monster.startPathVector(builder, 2);
            Vec3.createVec3(builder, 1, 2, 3);
            Vec3.createVec3(builder, 4, 5, 6);
            int path = builder.endVector();
    
            System.out.println(weaponOneName + "\tweaponOneName");
            System.out.println(weaponTwoName + "\tweaponTwoName");
            System.out.println(sword + "\tsword");
            System.out.println(axe + "\taxe");
            System.out.println(name + "\tname");
            System.out.println(inv + "\tinv");
            System.out.println(weapons + "\tweapons");
            System.out.println(path + "\tpath");
    
            Monster.startMonster(builder);
            int vec3 = Vec3.createVec3(builder, 1, 2, 3);
            System.out.println(vec3 + "\tvec3");
            Monster.addPos(builder, vec3);
            System.out.println(builder.offset() + "\taddPos");
            Monster.addName(builder, name);
            System.out.println(builder.offset() + "\taddName");
            Monster.addColor(builder, Color.Red);
            System.out.println(builder.offset() + "\taddColor");
            Monster.addHp(builder, (short) 500);
            System.out.println(builder.offset() + "\taddHp");
            Monster.addInventory(builder, inv);
            System.out.println(builder.offset() + "\taddInv");
            Monster.addWeapons(builder, weapons);
            System.out.println(builder.offset() + "\taddWeapons");
            Monster.addEquippedType(builder, Equipment.Weapon);
            System.out.println(builder.offset() + "\taddEquippedType");
            Monster.addEquipped(builder, axe);
            System.out.println(builder.offset() + "\taddEquipped");
            Monster.addPath(builder, path);
            System.out.println(builder.offset() + "\taddPath");
            int orc = Monster.endMonster(builder);
            System.out.println(builder.offset() + "\tendMonster");
            builder.finish(orc);
            System.out.println(builder.offset() + "\tfinish");
    
            ByteBuffer buffer = builder.dataBuffer();
            byte[] bytes = buffer.array();
            for (byte bb : bytes) {
                System.out.print(bb + " ");
            }
            System.out.println();
        }
    }
    

    Running it produces the following output:

    12  weaponOneName
    20  weaponTwoName
    32  sword
    52  axe
    60  name
    76  inv
    88  weapons
    116 path
    128 vec3
    128 addPos
    132 addName
    133 addColor
    136 addHp
    140 addInv
    144 addWeapons
    145 addEquippedType
    152 addEquipped
    156 addPath
    186 endMonster
    192 finish
    0 0 0 0 0 0 0 0 32 0 0 0 0 0 26 0 44 0 32 0 0 0 24 0 28 0 0 0 20 0 27 0 16 0 15 0 8 0 4 0 26 0 0 0 40 0 0 0 100 0 0 0 0 0 0 1 56 0 0 0 64 0 0 0 -12 1 0 0 72 0 0 0 0 0 -128 63 0 0 0 64 0 0 64 64 2 0 0 0 0 0 -128 64 0 0 -96 64 0 0 -64 64 0 0 -128 63 0 0 0 64 0 0 64 64 2 0 0 0 52 0 0 0 28 0 0 0 10 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0 3 0 0 0 79 114 99 0 -12 -1 -1 -1 0 0 5 0 24 0 0 0 8 0 12 0 8 0 6 0 8 0 0 0 0 0 3 0 12 0 0 0 3 0 0 0 65 120 101 0 5 0 0 0 83 119 111 114 100 0 0 0 
    

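    Mapping that output onto the final FlatBuffer layout gives the structure below. Offsets count back from the end of the buffer, and the 8 leading zeros of the dump (unused capacity, 200 - 192) are omitted: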

    offset range   region           bytes
    192..186       root_table       32 0 0 0 0 0
    186..160       Monster vtable   26 0 44 0 32 0 0 0 24 0 28 0 0 0 20 0 27 0 16 0 15 0 8 0 4 0
    160..116       Monster          26 0 0 0 40 0 0 0 100 0 0 0 0 0 0 1 56 0 0 0 64 0 0 0 -12 1 0 0 72 0 0 0 0 0 -128 63 0 0 0 64 0 0 64 64
    116..88        path             2 0 0 0 0 0 -128 64 0 0 -96 64 0 0 -64 64 0 0 -128 63 0 0 0 64 0 0 64 64
    88..76         weapons          2 0 0 0 52 0 0 0 28 0 0 0
    76..60         inv              10 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0
    60..52         name             3 0 0 0 79 114 99 0
    52..40         axe              -12 -1 -1 -1 0 0 5 0 24 0 0 0
    40..32         Weapon vtable    8 0 12 0 8 0 6 0
    32..20         sword            8 0 0 0 0 0 3 0 12 0 0 0
    20..12         weaponTwoName    3 0 0 0 65 120 101 0
    12..0          weaponOneName    5 0 0 0 83 119 111 114 100 0 0 0
    

    The code above is explained step by step below.

    1. Initializing FlatBufferBuilder

    FlatBufferBuilder builder = new FlatBufferBuilder(200);
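    As its name suggests, FlatBufferBuilder is the builder that constructs a FlatBuffer. At its core it is a little-endian ByteBuffer that holds the serialized data, plus a set of serialization methods. Data is written into the ByteBuffer from high addresses toward low addresses; the space field tracks how many bytes are still free, and (capacity - space) is the current offset, i.e. the distance from the end of the buffer.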
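    To make the space/offset bookkeeping concrete, here is a minimal sketch of writing from the end of a little-endian ByteBuffer toward the front. It is not the library's code: the class BackToFrontBuffer is hypothetical and leaves out alignment and growth, keeping only the `bb.putInt(space -= size, x)` pattern that the putInt listing in section 2 uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical minimal model of FlatBufferBuilder's back-to-front writing.
// No alignment and no growth: it only shows how `space` and offset() relate.
class BackToFrontBuffer {
    final ByteBuffer bb;
    int space; // bytes still free at the front (low addresses) of the buffer

    BackToFrontBuffer(int capacity) {
        bb = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
        space = capacity; // nothing written yet
    }

    // Offset measured from the END of the buffer.
    int offset() {
        return bb.capacity() - space;
    }

    // Move `space` down first, then write at that absolute index.
    void putInt(int x) {
        bb.putInt(space -= 4, x);
    }

    void putShort(short x) {
        bb.putShort(space -= 2, x);
    }

    public static void main(String[] args) {
        BackToFrontBuffer b = new BackToFrontBuffer(8);
        b.putInt(5);             // occupies bytes 4..7
        System.out.println(b.offset());
        b.putShort((short) 500); // occupies bytes 2..3
        System.out.println(b.offset());
        System.out.println(Arrays.toString(b.bb.array()));
    }
}
```

    Running main prints 4, then 6, then [0, 0, -12, 1, 5, 0, 0, 0]: the value written first ends up at the highest addresses, and offset() grows as space shrinks.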

    Constructor:
    public FlatBufferBuilder(int initial_size, ByteBufferFactory bb_factory) {
            if (initial_size <= 0) initial_size = 1;
            space = initial_size;
            this.bb_factory = bb_factory;
            bb = bb_factory.newByteBuffer(initial_size);
    }
    

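    The constructor ends up allocating a ByteBuffer of the requested size (a non-positive size is bumped to 1), with space initially equal to the whole capacity.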

    2. Adding primitive types

    Methods whose names start with put write data into the ByteBuffer, for example putInt:
    public void putInt (int x) { bb.putInt (space -= Constants.SIZEOF_INT, x); }
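    Methods whose names start with add usually come in two versions, and the three-argument version calls the one-argument version internally. For example, addInt: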
        /**
         * Add an `int` to a table at `o` into its vtable, with value `x` and default `d`.
         *
         * @param o The index into the vtable.
         * @param x An `int` to put into the buffer, depending on how defaults are handled. If
         * `force_defaults` is `false`, compare `x` against the default value `d`. If `x` contains the
         * default value, it can be skipped.
         * @param d An `int` default value to compare against when `force_defaults` is `false`.
         */
        public void addInt(int o, int x, int d) { if(force_defaults || x != d) { addInt(x); slot(o); } }
        /**
         * Add an `int` to the buffer, properly aligned, and grows the buffer (if necessary).
         *
         * @param x An `int` to put into the buffer.
         */
        public void addInt(int x) { prep(Constants.SIZEOF_INT, 0); putInt(x); }
    

    force_defaults and prep deserve a closer look here; slot is explained later, in the table section.

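    • To save space, FlatBuffers does not write a field whose value equals its default, which is why force_defaults is false by default. To force every value to be written, enable it with forceDefaults(true).
    • prep does two things: 1) all alignment, and 2) growing the buffer when there is not enough space left.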
       /**
        * Prepare to write an element of `size` after `additional_bytes`
        * have been written, e.g. if you write a string, you need to align such
        * the int length field is aligned to {@link Constants#SIZEOF_INT}, and
        * the string data follows it directly.  If all you need to do is alignment, `additional_bytes`
        * will be 0.
        *
        * @param size This is the size of the new element to write.
        * @param additional_bytes The padding size.
        */
        public void prep(int size, int additional_bytes) {
            // Track the biggest thing we've ever aligned to.
            if (size > minalign) minalign = size;
            // Find the amount of alignment needed such that `size` is properly
            // aligned after `additional_bytes`
            int align_size = ((~(bb.capacity() - space + additional_bytes)) + 1) & (size - 1);
            // Reallocate the buffer if needed.
            while (space < align_size + size + additional_bytes) {
                int old_buf_size = bb.capacity();
                bb = growByteBuffer(bb, bb_factory);
                space += bb.capacity() - old_buf_size;
            }
            pad(align_size);
        }
    

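    One way to read this method: after additional_bytes bytes have been written, another size bytes will follow, and it is this last size-byte element that must be aligned; size is the size of the value being added, e.g. 4 bytes for an int. The padding is computed by the line below. Because (~x) + 1 equals -x in two's complement and size is a power of two, -x & (size - 1) is the number of bytes needed to round x up to a multiple of size. The net effect is that once the padding and the additional_bytes have been written, offset is a multiple of size: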
    int align_size = ((~(bb.capacity() - space + additional_bytes)) + 1) & (size - 1);
    Growing the buffer happens here (each growth doubles the buffer size):

           while (space < align_size + size + additional_bytes) {
                int old_buf_size = bb.capacity();
                bb = growByteBuffer(bb, bb_factory);
                space += bb.capacity() - old_buf_size;
            }
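    The alignment expression and the growth loop can be checked with plain arithmetic. The sketch below uses hypothetical helpers alignSize and grow that copy the expressions from prep (growth is assumed to double the capacity, as stated above, and copying the old bytes is not modeled), evaluated at offsets taken from this article's sample output.

```java
// Hypothetical helpers that reproduce the arithmetic of prep(); no buffer involved.
class PrepMath {
    // Padding needed so that, after `additionalBytes` more bytes, the offset is a
    // multiple of `size` (a power of two). (~x) + 1 == -x in two's complement.
    static int alignSize(int offset, int additionalBytes, int size) {
        return ((~(offset + additionalBytes)) + 1) & (size - 1);
    }

    // Mirrors prep()'s growth loop, assuming each growth doubles the capacity.
    // Returns {newCapacity, newSpace}; copying the old bytes is not modeled.
    static int[] grow(int capacity, int space, int needed) {
        while (space < needed) {
            int old = capacity;
            capacity = old << 1;
            space += capacity - old; // the used part (capacity - space) is unchanged
        }
        return new int[] {capacity, space};
    }

    public static void main(String[] args) {
        System.out.println(alignSize(133, 0, 2)); // hp (short) after offset 133
        System.out.println(alignSize(60, 10, 4)); // inv: 10 bytes, then the int length
        System.out.println(alignSize(1, 5, 4));   // "Sword" after its 0 terminator
        System.out.println(alignSize(145, 0, 4)); // equipped (int offset) after offset 145
        int[] g = grow(16, 2, 20);
        System.out.println(g[0] + " " + g[1]);
    }
}
```

    main prints 1, 2, 2 and 3, matching the padding implied by the sample output for hp at 133, inv at 60, "Sword" after its terminator and equipped at 145, followed by 64 50: two doublings from 16, with the 14 used bytes unchanged.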
    

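    Now let's look at how Monster adds a primitive field. From the output above, the offset before hp is added is 133. Since hp is a short (2 bytes), one 0 byte of padding is written first for alignment, then the value 500, giving -12 1 0 in memory (the two bytes -12 1, read as little-endian, are 0x01F4 = 500), and the offset becomes 136.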

    Monster.addHp(builder, (short) 500);
    |
        public static void addHp(FlatBufferBuilder builder, short hp) { builder.addShort(2, hp, 100); }
        |
            public void addShort  (int o, short   x, int     d) { if(force_defaults || x != d) { addShort  (x); slot(o); } }
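    To tie addShort's default check to the bytes above, here is a sketch under stated assumptions: AddShortDemo is a hypothetical class that re-implements only the alignment part of prep, the put, and slot, assuming slot(o) stores the current offset() (which is what endObject's `vtableloc - vtable[i]` expects). It starts from the offset 133 shown in the output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical re-creation of addShort(o, x, d) as listed above: skip the value
// if it equals the default (unless forceDefaults), else align, write, and slot it.
// Buffer growth is not modeled; the capacity is assumed to be large enough.
class AddShortDemo {
    final ByteBuffer bb = ByteBuffer.allocate(200).order(ByteOrder.LITTLE_ENDIAN);
    int space = bb.capacity();
    boolean forceDefaults = false;
    final int[] vtable = new int[11]; // Monster has 11 vtable slots; hp is slot 2

    int offset() {
        return bb.capacity() - space;
    }

    // Alignment part of prep(): write zero padding bytes.
    void prep(int size, int additionalBytes) {
        int alignSize = ((~(offset() + additionalBytes)) + 1) & (size - 1);
        for (int i = 0; i < alignSize; i++) {
            bb.put(--space, (byte) 0);
        }
    }

    void addShort(int o, short x, int d) {
        if (forceDefaults || x != d) {
            prep(2, 0);
            bb.putShort(space -= 2, x);
            vtable[o] = offset(); // assumed behavior of slot(o)
        }
    }

    public static void main(String[] args) {
        AddShortDemo b = new AddShortDemo();
        b.space = b.bb.capacity() - 133;  // pretend 133 bytes were already written
        b.addShort(2, (short) 500, 100);  // like Monster.addHp(builder, (short) 500)
        byte[] a = b.bb.array();
        System.out.println(b.offset() + ": " + a[64] + " " + a[65] + " " + a[66]);
    }
}
```

    main prints 136: -12 1 0, i.e. one padding byte and then 500 in little-endian order, and vtable[2] ends up as 136. Calling addShort(2, (short) 100, 100) instead writes nothing unless forceDefaults is true.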
    

    3. Adding a vector

    Creating a vector is simple and takes three steps:

    • startVector, which mainly prepares the space the vector needs.
       /*
        * @param elem_size The size of each element in the array.
        * @param num_elems The number of elements in the array.
        * @param alignment The alignment of the array.
        */
        public void startVector(int elem_size, int num_elems, int alignment) {
            notNested();
            vector_num_elems = num_elems;
            prep(SIZEOF_INT, elem_size * num_elems);
            prep(alignment, elem_size * num_elems); // Just in case alignment > int.
            nested = true;
        }
    

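    Tables, strings and vectors cannot be built in a nested fashion. The nested flag tracks this: starting a new one while another is still open makes notNested throw an exception.
    In addition, after the elements a length field is written (so in memory it precedes them); the SIZEOF_INT in prep(SIZEOF_INT, elem_size * num_elems) refers to that length. Because alignment may be larger than an int, prep is called a second time. (Why not compare SIZEOF_INT with alignment first? Note that when alignment is 4 or less, the second call adds no padding, so it is merely redundant.)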

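    • Add the elements by calling the method that matches the vector's element type.
    • endVector, which resets the nesting flag, writes the vector's length, and returns the offset.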
       /**
        * Finish off the creation of an array and all its elements.  The array
        * must be created with {@link #startVector(int, int, int)}.
        *
        * @return The offset at which the newly created array starts.
        * @see #startVector(int, int, int)
        */
        public int endVector() {
            if (!nested)
                throw new AssertionError("FlatBuffers: endVector called without startVector");
            nested = false;
            putInt(vector_num_elems);
            return offset();
        }
    

    Monster creates a vector like this:

    byte[] treasure = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int inv = Monster.createInventoryVector(builder, treasure);
    |
        public static int createInventoryVector(FlatBufferBuilder builder, byte[] data) {
            builder.startVector(1, data.length, 1);
            // Add the elements last to first, since the buffer is filled from the end
            for (int i = data.length - 1; i >= 0; i--) builder.addByte(data[i]);
            return builder.endVector();
        }
    

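    Memory analysis:
    From the output, the offset before inv is 60. Adding the 10 bytes of treasure gives 70, and the 4-byte int length written last must be aligned to 4, i.e. start at 72. So 2 bytes of padding are written first (they show up after the data in the dump), the final bytes are 10 0 0 0 0 1 2 3 4 5 6 7 8 9 0 0, and the offset is 76.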
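    The same bytes can be rebuilt with a small stand-in for the builder. VectorDemo below is hypothetical: it copies startVector, addByte (reduced to prep plus a put) and endVector from the listings above, drops the nesting checks and buffer growth, and starts writing at offset 60 as in the sample.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical miniature builder reproducing startVector/addByte/endVector as listed
// above (no nesting checks, no growth). Used to rebuild the `inv` vector.
class VectorDemo {
    final ByteBuffer bb = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
    int space = bb.capacity();
    int vectorNumElems;

    int offset() { return bb.capacity() - space; }

    void prep(int size, int additionalBytes) {
        int alignSize = ((~(offset() + additionalBytes)) + 1) & (size - 1);
        for (int i = 0; i < alignSize; i++) bb.put(--space, (byte) 0);
    }

    void addByte(byte x) { prep(1, 0); bb.put(--space, x); }

    void startVector(int elemSize, int numElems, int alignment) {
        vectorNumElems = numElems;
        prep(4, elemSize * numElems);         // room for the int length written last
        prep(alignment, elemSize * numElems); // in case alignment > int
    }

    int endVector() {
        bb.putInt(space -= 4, vectorNumElems); // already aligned by startVector
        return offset();
    }

    // Same shape as the generated Monster.createInventoryVector.
    int createInventoryVector(byte[] data) {
        startVector(1, data.length, 1);
        for (int i = data.length - 1; i >= 0; i--) addByte(data[i]);
        return endVector();
    }

    // Bytes between two offsets, in memory (low-to-high address) order.
    byte[] region(int fromOffset, int toOffset) {
        int cap = bb.capacity();
        return Arrays.copyOfRange(bb.array(), cap - toOffset, cap - fromOffset);
    }

    public static void main(String[] args) {
        VectorDemo b = new VectorDemo();
        b.space = b.bb.capacity() - 60; // inv starts after offset 60
        byte[] treasure = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int inv = b.createInventoryVector(treasure);
        System.out.println(inv + " " + Arrays.toString(b.region(60, inv)));
    }
}
```

    main prints 76 [10, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0], matching the inv bytes in the dump.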

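    4. Adding a string

    A string is essentially a vector of bytes, so it is created almost exactly like a vector. The only difference is that strings are null-terminated, i.e. the last byte is 0; that is what addByte((byte)0) in the code does.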

       /**
        * Create a string in the buffer from an already encoded UTF-8 string in a ByteBuffer.
        *
        * @param s An already encoded UTF-8 string as a `ByteBuffer`.
        * @return The offset in the buffer where the encoded string starts.
        */
        public int createString(ByteBuffer s) {
            int length = s.remaining();
            addByte((byte)0);
            startVector(1, length, 1);
            bb.position(space -= length);
            bb.put(s);
            return endVector();
        }
    

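    Monster creates a string with int weaponOneName = builder.createString("Sword");. "Sword" is 5 bytes long; with the null terminator that makes 6 bytes, and the 4-byte int length 5 written last must be aligned to 4, so 2 padding bytes are added as well. In memory this is 5 0 0 0 83 119 111 114 100 0 0 0: of the three trailing zeros, the first is the terminator and the other two are padding.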
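    Likewise for strings: StringDemo is a hypothetical class that inlines the steps of createString(ByteBuffer) shown above (terminator, startVector, bulk copy, endVector) into one method, starting from an empty buffer, just as weaponOneName is the first thing written in the sample.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical miniature builder mirroring createString(ByteBuffer) as listed above.
class StringDemo {
    final ByteBuffer bb = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
    int space = bb.capacity();
    int vectorNumElems;

    int offset() { return bb.capacity() - space; }

    void prep(int size, int additionalBytes) {
        int alignSize = ((~(offset() + additionalBytes)) + 1) & (size - 1);
        for (int i = 0; i < alignSize; i++) bb.put(--space, (byte) 0);
    }

    int createString(String str) {
        ByteBuffer s = ByteBuffer.wrap(str.getBytes(StandardCharsets.UTF_8));
        int length = s.remaining();
        prep(1, 0);
        bb.put(--space, (byte) 0);             // addByte((byte) 0): null terminator
        vectorNumElems = length;               // startVector(1, length, 1)
        prep(4, length);
        prep(1, length);
        bb.position(space -= length);          // copy the UTF-8 bytes in one go
        bb.put(s);
        bb.putInt(space -= 4, vectorNumElems); // endVector(): length prefix
        return offset();
    }

    public static void main(String[] args) {
        StringDemo b = new StringDemo();
        int sword = b.createString("Sword");
        byte[] a = b.bb.array();
        int cap = b.bb.capacity();
        System.out.println(sword + " " + Arrays.toString(Arrays.copyOfRange(a, cap - sword, cap)));
    }
}
```

    main prints 12 [5, 0, 0, 0, 83, 119, 111, 114, 100, 0, 0, 0]. The offsets go: terminator to 1, two padding bytes to 3, five characters to 8, length to 12.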

    5. Struct types

      public static int createVec3(FlatBufferBuilder builder, float x, float y, float z) {
        builder.prep(4, 12);
        builder.putFloat(z);
        builder.putFloat(y);
        builder.putFloat(x);
        return builder.offset();
      }
    

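    A struct's values are written straight into the buffer with no extra processing, and since building a struct never involves nested creation, it can be inlined in other structures. Because the buffer is filled from the end, createVec3 writes z, y, x in that order, so in memory the fields appear in declaration order. vec3 is 0 0 -128 63 0 0 0 64 0 0 64 64 in memory, one 4-byte float per field (1.0, 2.0, 3.0).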
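    The byte pattern of vec3 can be reproduced with ByteBuffer alone. Vec3Demo.createVec3 below is a hypothetical stand-alone version of the generated method: it writes z, y, x back to front into a 12-byte little-endian buffer, which needs no padding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical stand-alone version of the generated createVec3: a struct is just its
// fields written back to front (z, then y, then x) with no vtable and no length.
class Vec3Demo {
    static byte[] createVec3(float x, float y, float z) {
        ByteBuffer bb = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        int space = bb.capacity(); // prep(4, 12) needs no padding in an empty buffer
        bb.putFloat(space -= 4, z);
        bb.putFloat(space -= 4, y);
        bb.putFloat(space -= 4, x);
        return bb.array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(createVec3(1, 2, 3)));
    }
}
```

    main prints [0, 0, -128, 63, 0, 0, 0, 64, 0, 0, 64, 64]; 1.0f, 2.0f and 3.0f are 0x3F800000, 0x40000000 and 0x40400000.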

    6. Table types

      public static int createWeapon(FlatBufferBuilder builder,
          int nameOffset,
          short damage) {
        builder.startObject(2);
        Weapon.addName(builder, nameOffset);
        Weapon.addDamage(builder, damage);
        return Weapon.endWeapon(builder);
      }
    
      public static int endWeapon(FlatBufferBuilder builder) {
        int o = builder.endObject();
        return o;
      }
    

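    Building a table has three parts:

    • startObject, whose argument is the number of fields in the table (a union field counts as two, because it also carries a type field)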
        public void startObject(int numfields) {
            notNested();
            if (vtable == null || vtable.length < numfields) vtable = new int[numfields];
            vtable_in_use = numfields;
            Arrays.fill(vtable, 0, vtable_in_use, 0);
            nested = true;
            object_start = offset();
        }
    

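    Its main job is to check for nested creation and to initialize the vtable and the offset where the object starts. Every table has a vtable that stores the offset of each field; recording those offsets is what the slot function mentioned above does. Tables whose vtables are identical share a single copy. (Can tables of different types share one? The comparison in endObject below looks only at the raw vtable contents, not at the table type, so it appears they can.)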

    • Adding field values
      public static void addName(FlatBufferBuilder builder, int nameOffset) { builder.addOffset(0, nameOffset, 0); }
      public static void addDamage(FlatBufferBuilder builder, short damage) { builder.addShort(1, damage, 0); }
    

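    Each add method already knows its field's position in the vtable, so the order in which they are called does not matter; only the order in which fields are declared does. This is why schema fields may only be appended at the end and deprecated fields cannot be removed, although fields can be renamed.
    As the nested flag in startObject shows, tables, strings and vectors cannot be inlined; they can only be referenced by offset, so they must be created before the table that refers to them (and ultimately before the root object). That is what addOffset is for. The value actually written is not the true offset from the end of the buffer but an offset relative to the position being written: off = offset() - off + SIZEOF_INT; SIZEOF_INT is added because the value being written is itself an int occupying SIZEOF_INT bytes.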

       /**
        * Adds an offset, relative to where it will be written.
        *
        * @param off The offset to add.
        */
        public void addOffset(int off) {
            prep(SIZEOF_INT, 0);  // Ensure alignment is already done.
            assert off <= offset();
            off = offset() - off + SIZEOF_INT;
            putInt(off);
        }
    

    Primitive fields simply use the corresponding add method.

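    • endObject, which finishes the object and writes its vtable into the buffer
      First, the layout of a table and its vtable.
      A table begins with an int offset: the vtable's position minus the table's own start position. Since the vtable can be anywhere, this value can be negative. After it come the values of the table's fields. For example, sword is at offset 32 and its vtable at offset 40, so the table begins with 8. sword has two fields: the short 3, and the offset to weaponOneName. The latter is stored at offset 24, and its distance to weaponOneName (at offset 12) is 12. So sword in memory is 8 0 0 0 0 0 3 0 12 0 0 0.
      A vtable is an array of shorts whose length is (number of fields + 2) * 2 bytes, where trailing fields that were never set are trimmed. The first entry is the vtable's size, including that entry itself; the second is the size of the table object it describes, including the offset to the vtable; the remaining entries are each field's offset relative to the start of the object. Take sword again: it has two fields, so its vtable is (2 + 2) * 2 = 8 bytes; sword in memory is 8 0 0 0 0 0 3 0 12 0 0 0, which is 12 bytes; 12 0 0 0 is at offset 8 relative to sword and 3 0 at offset 6, so the vtable in memory is 8 0 12 0 8 0 6 0.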
        public int endObject() {
            if (vtable == null || !nested)
                throw new AssertionError("FlatBuffers: endObject called without startObject");
            // Reserve room for the table's offset to its vtable
            addInt(0);
            // Record where the object (table) starts
            int vtableloc = offset();
            // Write out the current vtable.
            int i = vtable_in_use - 1;
            // Trim trailing zeroes.
            for (; i >= 0 && vtable[i] == 0; i--) {}
            int trimmed_size = i + 1;
            // Each field's offset relative to the start of the object
            for (; i >= 0 ; i--) {
                // Offset relative to the start of the table.
                short off = (short)(vtable[i] != 0 ? vtableloc - vtable[i] : 0);
                addShort(off);
            }
    
            final int standard_fields = 2; // The fields below:
            // Size of the object
            addShort((short)(vtableloc - object_start));
            // Size of the vtable
            addShort((short)((trimmed_size + standard_fields) * SIZEOF_SHORT));
    
            // Search for an existing vtable that matches the current one.
            // vtable sharing: reuse an existing vtable whose values are all identical
            int existing_vtable = 0;
            outer_loop:
            for (i = 0; i < num_vtables; i++) {
                int vt1 = bb.capacity() - vtables[i];
                int vt2 = space;
                short len = bb.getShort(vt1);
                if (len == bb.getShort(vt2)) {
                    for (int j = SIZEOF_SHORT; j < len; j += SIZEOF_SHORT) {
                        if (bb.getShort(vt1 + j) != bb.getShort(vt2 + j)) {
                            continue outer_loop;
                        }
                    }
                    existing_vtable = vtables[i];
                    break outer_loop;
                }
            }
    
            if (existing_vtable != 0) {
                // Found a match:
                // Remove the current vtable.
                space = bb.capacity() - vtableloc;
                // Point table to existing vtable.
                bb.putInt(space, existing_vtable - vtableloc);
            } else {
                // No match:
                // Add the location of the current vtable to the list of vtables.
                if (num_vtables == vtables.length) vtables = Arrays.copyOf(vtables, num_vtables * 2);
                // vtables records the offsets of all vtables written so far
                vtables[num_vtables++] = offset();
                // Point table to current vtable.
                // Store the offset to the vtable at the start of the table
                bb.putInt(bb.capacity() - vtableloc, offset() - vtableloc);
            }
    
            nested = false;
            return vtableloc;
        }
    

    endObject does three things: 1) builds the vtable; 2) looks for an identical existing vtable; 3) writes the table's offset to its vtable.
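    The bookkeeping in endObject can be reproduced without a buffer. VtableDemo is a hypothetical class that computes the vtable contents from the offsets slot recorded, the object size, and the int written at the table start, and it reuses an earlier vtable when the contents are identical. The offsets fed in below (startObject at 20, fields at 24 and 26, vtableloc 32 for sword; everything 20 bytes later for axe) are derived by hand from the sample output, not printed by it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of what endObject() computes (no bytes are written):
// the vtable contents, vtable sharing, and the soffset stored at the table start.
class VtableDemo {
    final List<Integer> vtableOffsets = new ArrayList<>(); // like the `vtables` array
    final List<short[]> vtableContents = new ArrayList<>();

    // slots[i] = offset() recorded by slot(i), or 0 if field i was never added.
    static short[] buildVtable(int objectStart, int vtableloc, int[] slots) {
        int trimmed = slots.length;
        while (trimmed > 0 && slots[trimmed - 1] == 0) trimmed--; // trim trailing zeroes
        short[] vt = new short[trimmed + 2];
        vt[0] = (short) ((trimmed + 2) * 2);       // vtable size in bytes
        vt[1] = (short) (vtableloc - objectStart); // object size, incl. the soffset
        for (int i = 0; i < trimmed; i++) {
            vt[i + 2] = (short) (slots[i] != 0 ? vtableloc - slots[i] : 0);
        }
        return vt;
    }

    // Returns the int stored at the table start: vtable offset minus table offset.
    int endObject(int objectStart, int vtableloc, int[] slots) {
        short[] vt = buildVtable(objectStart, vtableloc, slots);
        for (int i = 0; i < vtableContents.size(); i++) {
            if (Arrays.equals(vtableContents.get(i), vt)) {
                return vtableOffsets.get(i) - vtableloc; // reuse the existing vtable
            }
        }
        int vtableOffset = vtableloc + vt[0]; // offset() right after writing this vtable
        vtableOffsets.add(vtableOffset);
        vtableContents.add(vt);
        return vtableOffset - vtableloc;
    }

    public static void main(String[] args) {
        VtableDemo b = new VtableDemo();
        // sword: startObject at 20, name slotted at 24, damage at 26, vtableloc 32
        System.out.println(Arrays.toString(buildVtable(20, 32, new int[] {24, 26})));
        System.out.println(b.endObject(20, 32, new int[] {24, 26}));
        // axe: same layout 20 bytes later, so it shares sword's vtable
        System.out.println(b.endObject(40, 52, new int[] {44, 46}));
    }
}
```

    main prints [8, 12, 8, 6], then 8 for sword, then -12 for axe, which reuses sword's vtable; these are the 8 0 0 0 and -12 -1 -1 -1 at the start of the two tables in the dump. Feeding in Monster's slots (128, 0, 136, 132, 0, 140, 133, 144, 145, 152, 156, with object start 116 and vtableloc 160) yields the 26-byte Monster vtable and the value 26.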

    7. finish: complete the buffer and point at root_table

        /**
         * Finalize a buffer, pointing to the given `root_table`.
         *
         * @param root_table An offset to be added to the buffer.
         */
        public void finish(int root_table) {
            prep(minalign, SIZEOF_INT);
            addOffset(root_table);
            bb.position(space);
            finished = true;
        }
    

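    The last thing written to the FlatBuffer is an int that points to the start of root_table; since the buffer is filled from the end, this int sits at the very front of the finished data. It is needed for deserialization, which is covered in the next article.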
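    The relative offsets written by addOffset, and the root offset written by finish, can be checked with arithmetic alone. FinishDemo is a hypothetical offset-only model of those two methods. It assumes minalign is 4 for this sample, since 4 is the largest size passed to prep here (ints, floats and the Vec3 alignment).

```java
// Hypothetical offset-only model of addOffset() and finish() (no bytes are written).
class FinishDemo {
    static int alignSize(int offset, int additionalBytes, int size) {
        return ((~(offset + additionalBytes)) + 1) & (size - 1);
    }

    // addOffset(): value stored = offset() - target + SIZEOF_INT, where offset()
    // is taken after prep(SIZEOF_INT, 0) has aligned the buffer.
    static int relativeOffset(int alignedOffset, int target) {
        return alignedOffset - target + 4;
    }

    // finish(): returns {stored root offset, final offset()}.
    static int[] finish(int offset, int minalign, int rootTable) {
        offset += alignSize(offset, 4, minalign); // prep(minalign, SIZEOF_INT)
        offset += alignSize(offset, 0, 4);        // prep() inside addOffset
        int stored = relativeOffset(offset, rootTable);
        offset += 4;                              // putInt(stored)
        return new int[] {stored, offset};
    }

    public static void main(String[] args) {
        System.out.println(relativeOffset(20, 12)); // sword's name -> weaponOneName
        int[] f = finish(186, 4, 160);              // after endMonster, orc = 160
        System.out.println(f[0] + " " + f[1]);
    }
}
```

    main prints 12 (the value stored in sword's name field) and then 32 192: two padding bytes take offset 186 to 188, the root offset 188 - 160 + 4 = 32 is written, and the buffer ends at 192, as in the output. A reader finds root_table by adding 32 to the position of that int.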

    8. Union types

    Monster.addEquippedType(builder, Equipment.Weapon);
    Monster.addEquipped(builder, axe);
    

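    The only difference between a union and other field types is an extra method whose name ends in Type: the type must be set first, because a union can hold several types and otherwise there would be no way to tell which one it holds.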
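    The two union slots can be traced in the sample output too. UnionDemo is a hypothetical offset-only model: addType stands in for addEquippedType (assumed to skip the default type 0, NONE) and addValue for addEquipped, which stores an offset the way addOffset does. Slot numbers 8 and 9 are read off the Monster vtable in the dump.

```java
// Hypothetical offset-only model of adding the `equipped` union to Monster:
// first the type byte, then the value as an offset. Slots 8 and 9 are
// taken from the Monster vtable in the dump.
class UnionDemo {
    int offset;                      // simulated builder.offset()
    final int[] slots = new int[11]; // what slot(o) records for each Monster field

    UnionDemo(int startOffset) { offset = startOffset; }

    static int alignSize(int offset, int additionalBytes, int size) {
        return ((~(offset + additionalBytes)) + 1) & (size - 1);
    }

    // The type byte: skipped when it equals the assumed default 0 (NONE).
    void addType(int o, byte type) {
        if (type == 0) return;
        offset += 1;                 // a byte never needs padding
        slots[o] = offset;
    }

    // The union value: an int offset relative to where it is written; returns the stored value.
    int addValue(int o, int target) {
        offset += alignSize(offset, 0, 4);
        int stored = offset - target + 4;
        offset += 4;
        slots[o] = offset;
        return stored;
    }

    public static void main(String[] args) {
        UnionDemo m = new UnionDemo(144); // offset after addWeapons
        m.addType(8, (byte) 1);           // Equipment.Weapon
        int stored = m.addValue(9, 52);   // axe is at offset 52
        System.out.println(m.slots[8] + " " + m.slots[9] + " " + stored);
        System.out.println((160 - m.slots[8]) + " " + (160 - m.slots[9])); // vtable entries
    }
}
```

    main prints 145 152 100 and then 15 8: the type byte lands at offset 145, three padding bytes precede the union offset, the stored value 100 points at axe, and 15 and 8 are the equipped_type and equipped entries of the Monster vtable (... 15 0 8 0 ...).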

    Summary

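    Looking back at the final memory layout, FlatBuffers stores a complex type structure in one flat region of memory (a single ByteBuffer). Apart from the vtables and a few alignment gaps there is almost no overhead, so memory is used very efficiently.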


        Title: FlatBuffers Serialization Process

        Link: https://www.haomeiwen.com/subject/wmfgdxtx.html