protobuf

Author: 林凌风 | Published: 2022-05-19 10:27

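1. Protobuf encoding

Varint (variable-length) encoding is used for tags and for integer values.

Each tag carries the field number and the wire type: tag = (field_number << 3) | wire_type.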

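Integers:

int32, int64, uint32, uint64, sint32 and sint64 are all encoded as varints. A negative int32/int64 is sign-extended to 64 bits and therefore always takes 10 bytes. For that reason sint32/sint64 first map the value to an unsigned one with ZigZag encoding (shown here for sint32; sint64 shifts by 63 instead of 31):

(n << 1) ^ (n >> 31)

For n >= 0 this gives 2n. For negative n, read as an unsigned 32-bit result, it gives -2n - 1 = 2|n| - 1, so 0, -1, 1, -2, ... map to 0, 1, 2, 3, ...

A varint spends one byte per 7 bits, so any value of 2^28 or more needs 5 bytes. For values that are often that large, the fixed-length types fixed32/fixed64 (and sfixed32/sfixed64) are more compact.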
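To make the varint and ZigZag rules concrete, here is a minimal C++ sketch that uses only the standard library. AppendVarint and ZigZagEncode32 are hypothetical helpers written for illustration. They are not protobuf's own API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 payload bits per byte,
// least-significant group first, with the high bit set on every byte
// except the last one.
void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// ZigZag mapping used by sint32: (n << 1) ^ (n >> 31).
// The left shift is done on the unsigned value to avoid signed-overflow
// undefined behavior; n >> 31 is an arithmetic shift (all ones for a
// negative n), which C++20 guarantees and mainstream compilers already did.
uint32_t ZigZagEncode32(int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}
```

With these helpers, 300 encodes as the two bytes 0xAC 0x02. ZigZagEncode32(-1) is 1, which is a single byte on the wire. By contrast, int64_t{-1} written directly as a varint takes 10 bytes.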

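Floating point: fixed-length encoding. float takes 4 bytes (wire type 5) and double takes 8 bytes (wire type 1).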

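Tag-length-value (TLV) encoding is used for string, bytes, embedded messages and packed repeated fields.

For these fields the wire type in the tag is 2 (0b010). The tag is followed by a varint holding the payload length, and then by the payload itself:

string: UTF-8 encoded text.

bytes: the raw bytes, unchanged.

embedded message: the serialized fields of the sub-message.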
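Here is a sketch of how a tag and a length-delimited field fit together. It assumes the wire-type numbering from the protobuf encoding documentation: 0 for varint, 1 for 64-bit, 2 for length-delimited and 5 for 32-bit. All names here are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Wire types from the protobuf encoding documentation.
enum WireType : uint32_t { kVarint = 0, kI64 = 1, kLen = 2, kI32 = 5 };

void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// tag = (field_number << 3) | wire_type, written as a varint.
void AppendTag(uint32_t field_number, WireType type, std::vector<uint8_t>& out) {
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | type, out);
}

// Length-delimited (wire type 2) field: tag, varint length, raw payload.
// The payload is UTF-8 text for string, arbitrary bytes for bytes, or the
// already serialized fields of an embedded message.
void AppendLengthDelimited(uint32_t field_number, const std::string& payload,
                           std::vector<uint8_t>& out) {
  AppendTag(field_number, kLen, out);
  AppendVarint(payload.size(), out);
  out.insert(out.end(), payload.begin(), payload.end());
}
```

For field 2 holding the string "testing", this produces 0x12 0x07 followed by the seven ASCII bytes. The first byte, 0x12, is (2 << 3) | 2.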

2. Serialization

2.1 Computing the serialized size

The size is computed by the ByteSizeLong() function.

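1. Required fields are sized first.

The generated code first runs a bitmask test on the has-bits to check whether all required fields are present. If they are, their sizes are computed directly. If the test fails, each required field is checked individually and its size is computed.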

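string and bytes fields are sized first. Both use the same size function.

For int32, int64 and sint32/sint64 fields, the generated code uses the helpers with a PlusOne suffix. Each helper converts the value to an unsigned integer, computes its varint size, and adds 1. The extra 1 is the tag, which takes one byte for field numbers 1 to 15. The size of the tag on its own is given by TagSize(). For repeated members, the per-element size is computed without this extra tag byte.

fixed32 and fixed64 fields simply contribute 1 (the tag) plus their fixed width of 4 or 8 bytes.

bool: the value always occupies 1 byte, so the size is a constant (the tag plus 1). This holds in both proto2 and proto3.

enum: the PlusOne helper is not used. Instead, the 1 for the tag is added explicitly before the size function is called.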

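2. Repeated fields are sized next.

In proto3, packed encoding is on by default for repeated scalar numeric fields. Such a field is written as one tag, the total payload length, and then the values back to back (tag length value value ...). Repeated string/bytes fields cannot be packed, so every element is written as its own tag length value. In proto2, packing is off by default and the encoding is tag value tag value ...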

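3. Optional fields are sized next.

For optional fields (in proto3, those declared optional), bit operations on the has-bits first check whether any of them are set. The sizes of the fields that are present are then computed one by one.

4. Finally, the size of unknown fields is added.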
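The sizing rules above can be approximated with a few functions. This is a simplified sketch, not protobuf's generated code. FieldTagSize, Int32FieldSize, StringFieldSize and PackedInt32FieldSize are hypothetical names, and has-bits, proto3 default-value skipping and unknown fields are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bytes needed to encode v as a varint (at least 1).
std::size_t VarintSize(uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Size of the tag alone; the 3 wire-type bits never change it.
std::size_t FieldTagSize(uint32_t field_number) {
  return VarintSize(static_cast<uint64_t>(field_number) << 3);
}

// int32 is sign-extended to 64 bits before encoding, so a negative value
// always costs 10 bytes. For field numbers 1..15 the tag is one byte,
// which is where the "+1" of the PlusOne helpers comes from.
std::size_t Int32FieldSize(uint32_t field_number, int32_t value) {
  return FieldTagSize(field_number) +
         VarintSize(static_cast<uint64_t>(static_cast<int64_t>(value)));
}

std::size_t StringFieldSize(uint32_t field_number, const std::string& s) {
  return FieldTagSize(field_number) + VarintSize(s.size()) + s.size();
}

// Packed repeated int32: one tag and one length for the whole field;
// the elements themselves carry no tags.
std::size_t PackedInt32FieldSize(uint32_t field_number,
                                 const std::vector<int32_t>& values) {
  std::size_t payload = 0;
  for (int32_t v : values) {
    payload += VarintSize(static_cast<uint64_t>(static_cast<int64_t>(v)));
  }
  if (payload == 0) return 0;  // an empty repeated field is not written
  return FieldTagSize(field_number) + VarintSize(payload) + payload;
}
```

For example, an int32 field 1 holding 150 needs 3 bytes, while the same field holding -1 needs 11. A packed field 4 with {3, 270, 86942} needs 8 bytes.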

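2.2 Serialization

The entry point is SerializeToString().

Before anything is written, the output string is resized to the serialized size so that enough space is allocated. The buffer of that size is then passed to SerializeToArrayImpl.

The fields are then written to the stream in order of field number. Writes into the freshly initialized stream go byte by byte.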

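For an optional field, the output is the tag followed by the value.

Open question: why does proto3 call EnsureSpace only before serializing optional fields, while proto2 calls it before serializing every field?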

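For repeated fields:

In proto3, packed encoding is on by default. The payload size of each repeated field is cached in advance, and the generated code calls the WriteXXPacked family of functions.

In proto2, packed encoding is off by default, so the repeated elements are serialized in a loop, each with its own tag.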
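The difference between the two repeated layouts is easiest to see in the bytes. Here is a sketch for repeated uint32 values, with hypothetical function names. The generated code relies on the payload size cached during the size pass. This sketch instead builds the payload in a temporary buffer first:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Packed (proto3 default for repeated scalars): a single tag with wire
// type 2, the payload length, then the values back to back.
void AppendPackedUInt32(uint32_t field_number, const std::vector<uint32_t>& values,
                        std::vector<uint8_t>& out) {
  if (values.empty()) return;  // empty repeated fields are omitted
  std::vector<uint8_t> payload;
  for (uint32_t v : values) AppendVarint(v, payload);
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(payload.size(), out);
  out.insert(out.end(), payload.begin(), payload.end());
}

// Unpacked (proto2 default): one tag (wire type 0) before every element.
void AppendUnpackedUInt32(uint32_t field_number, const std::vector<uint32_t>& values,
                          std::vector<uint8_t>& out) {
  for (uint32_t v : values) {
    AppendVarint(static_cast<uint64_t>(field_number) << 3, out);
    AppendVarint(v, out);
  }
}
```

In packed form, field 4 with {3, 270, 86942} becomes 0x22 0x06 followed by six payload bytes. In unpacked form, every value is preceded by its own 0x20 tag.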

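2.2.2 string and bytes

Difference between string and bytes

In the C++ API, protobuf string and bytes fields are both implemented as std::string.

Their serialized and deserialized formats are identical. The only difference is that string fields also get a UTF-8 validity check.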
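Since the UTF-8 check is the only thing that separates string from bytes, it helps to see what such a check has to verify. This is a plain sketch of UTF-8 validation following RFC 3629. Protobuf ships its own optimized validator, and IsValidUtf8 is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal UTF-8 validity check: rejects stray continuation bytes, truncated
// sequences, overlong forms, surrogates (U+D800..U+DFFF) and code points
// above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  static const char32_t kMinForLength[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) {  // ASCII
      ++i;
      continue;
    }
    std::size_t len = 0;
    char32_t cp = 0;
    if ((c & 0xE0) == 0xC0) {
      len = 2;
      cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
      len = 3;
      cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
      len = 4;
      cp = c & 0x07;
    } else {
      return false;  // continuation byte or invalid lead byte
    }
    if (n - i < len) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp < kMinForLength[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
      return false;  // overlong, surrogate, or out of range
    }
    i += len;
  }
  return true;
}
```

A bytes field would skip this check entirely and accept any of the rejected inputs, for example the overlong sequence 0xC0 0xAF.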

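Serialization calls WriteStringMaybeAliased, which tests the following condition (shown with the branch-prediction macro expanded):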

(__builtin_expect(false || (size >= 128 || end_ - ptr + 16 - TagSize(num << 3) - 1 < size), false))

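When the condition is true, the slow path WriteStringMaybeAliasedOutline is called. The parts of the condition mean the following:

num << 3 turns the field number into a tag, with the wire-type bits left at 0. TagSize(num << 3) is therefore the number of bytes the tag occupies.

size >= 128 means the length no longer fits in a one-byte varint.

end_ - ptr + 16 is the space left in the buffer, including the 16 slop bytes. Subtracting the tag size and 1 byte for the length leaves the space available for the string data. If size is less than or equal to that space, the write cannot overflow. The equal case also fits, because the test uses <.

So whenever either side of the || is true, there may not be enough room to write the string inline. This is rare, and when it happens the slow path takes over.

On the normal path, the tag and the one-byte length are written, and the string bytes are copied into the stream with memcpy.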
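The fast-path test can be restated as a small predicate. This sketch assumes 16 bytes of slop past end_, as the quoted condition suggests. The parameter `remaining` stands in for end_ - ptr, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::ptrdiff_t kSlopBytes = 16;  // the "16" in the quoted condition

std::size_t VarintSize32(uint32_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Negation of the quoted condition: true when the string can be written on
// the inline fast path. `remaining` plays the role of end_ - ptr.
bool StringFitsFastPath(std::ptrdiff_t remaining, uint32_t field_number,
                        std::size_t size) {
  if (size >= 128) return false;  // length would need a multi-byte varint
  const std::ptrdiff_t tag_size =
      static_cast<std::ptrdiff_t>(VarintSize32(field_number << 3));
  const std::ptrdiff_t room = remaining + kSlopBytes - tag_size - 1;  // 1 = length byte
  return static_cast<std::ptrdiff_t>(size) <= room;  // equality still fits
}
```

Suppose no bytes are left before end_ and the field number is 1. A 14-byte string still fits exactly (16 - 1 tag byte - 1 length byte = 14), so the equal case takes the fast path. A 15-byte string does not.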

2.2.3 Embedded messages use the TLV format

tag + length (the sub-message's serialized size) + the sub-message's serialized bytes
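Here is a sketch of the nested TLV layout. It uses a hypothetical schema (message Inner { int32 a = 1; } embedded as field 3 of an outer message) and hypothetical helper names. Because the length comes before the payload, the inner size must be known before anything is written. This is one reason the size pass in 2.1 runs first:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Hypothetical schema, for illustration only:
//   message Inner { int32 a = 1; }
//   message Outer { Inner c = 3; }
std::vector<uint8_t> SerializeInner(int32_t a) {
  std::vector<uint8_t> out;
  AppendVarint(1u << 3, out);  // field 1, wire type 0
  AppendVarint(static_cast<uint64_t>(static_cast<int64_t>(a)), out);
  return out;
}

// Embedded message: tag (wire type 2), length of the inner bytes, inner bytes.
void AppendEmbedded(uint32_t field_number, const std::vector<uint8_t>& inner,
                    std::vector<uint8_t>& out) {
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(inner.size(), out);
  out.insert(out.end(), inner.begin(), inner.end());
}
```

With a = 150, the output is 0x1A 0x03 0x08 0x96 0x01: the outer tag, a length of 3, and then the inner field.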

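2.2.4 union (oneof)

The generated code switches on the active oneof case. It writes the tag, then a length if the member's type is length-delimited, and then the serialized value.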

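2.2.5 map: the map entries are iterated, and each one is serialized as an embedded message whose key is field 1 and whose value is field 2.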
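On the wire, a map field is equivalent to a repeated embedded entry message whose key is field 1 and whose value is field 2. Here is a sketch for a hypothetical map<string, int32> field, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Hypothetical field: map<string, int32> m = <field_number>;
// Each entry is written like an embedded message
// { string key = 1; int32 value = 2; }.
void AppendStringInt32Map(uint32_t field_number,
                          const std::map<std::string, int32_t>& m,
                          std::vector<uint8_t>& out) {
  for (const auto& [key, value] : m) {
    std::vector<uint8_t> entry;
    AppendVarint((1u << 3) | 2, entry);  // key: field 1, length-delimited
    AppendVarint(key.size(), entry);
    entry.insert(entry.end(), key.begin(), key.end());
    AppendVarint(2u << 3, entry);        // value: field 2, varint
    AppendVarint(static_cast<uint64_t>(static_cast<int64_t>(value)), entry);

    // The entry itself is a length-delimited field of the outer message.
    AppendVarint((static_cast<uint64_t>(field_number) << 3) | 2, out);
    AppendVarint(entry.size(), out);
    out.insert(out.end(), entry.begin(), entry.end());
  }
}
```

std::map iterates in key order, which keeps this sketch deterministic. Protobuf's own map type does not guarantee an iteration order, so the entry order in real output should not be relied on.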

2.2.6 bool: serialized as tag + value, where the value is a one-byte varint (0 or 1).

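2.3 Deserialization

Deserialization goes through ParseFromString(). Internally, different ParseFlags values are passed as template arguments to the parsing functions.

The generated parsing code is laid out according to the fields defined in the message. However, the order in which fields are actually parsed depends on the tag at the current pointer, that is, on the order in which the fields appear in the input.

1. First, set up the range of the input string to be read. If the input is no longer than 16 bytes, it is copied into the stream's built-in buffer and the address of that buffer is returned. Otherwise, the address of the string data itself is returned.

2. Starting from that address, read the tag value (a varint of at most 5 bytes).

3. Based on the tag value, jump to the code that reads the field's value.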
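Taken together, the three steps above amount to a tag-driven loop. Here is a minimal sketch for a hypothetical message { int32 a = 1; string s = 2; }. It uses plain bounds checks in place of the real parser's slop-byte machinery, and ParseSketch and its helpers are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical message for illustration: { int32 a = 1; string s = 2; }
struct SketchMsg {
  int32_t a = 0;
  std::string s;
};

// Reads a varint from [p, end), advancing p; nullopt if truncated or > 10 bytes.
std::optional<uint64_t> ReadVarint(const uint8_t*& p, const uint8_t* end) {
  uint64_t result = 0;
  for (int i = 0; i < 10; ++i) {
    if (p == end) return std::nullopt;
    const uint8_t byte = *p++;
    result |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Reads a varint length and checks that that many bytes remain.
std::optional<std::size_t> ReadLength(const uint8_t*& p, const uint8_t* end) {
  auto len = ReadVarint(p, end);
  if (!len || *len > static_cast<uint64_t>(end - p)) return std::nullopt;
  return static_cast<std::size_t>(*len);
}

// The order of the input, not of the schema, decides which branch runs next.
// Unknown fields are skipped according to their wire type.
bool ParseSketch(const uint8_t* p, const uint8_t* end, SketchMsg& msg) {
  while (p < end) {
    auto tag = ReadVarint(p, end);
    if (!tag || *tag > UINT32_MAX || (*tag >> 3) == 0) return false;
    const uint32_t field = static_cast<uint32_t>(*tag >> 3);
    const uint32_t wire = static_cast<uint32_t>(*tag & 7);
    if (field == 1 && wire == 0) {
      auto v = ReadVarint(p, end);
      if (!v) return false;
      msg.a = static_cast<int32_t>(*v);  // int32 keeps the low 32 bits
    } else if (field == 2 && wire == 2) {
      auto len = ReadLength(p, end);
      if (!len) return false;
      msg.s.assign(reinterpret_cast<const char*>(p), *len);
      p += *len;
    } else if (wire == 0) {
      if (!ReadVarint(p, end)) return false;
    } else if (wire == 2) {
      auto len = ReadLength(p, end);
      if (!len) return false;
      p += *len;
    } else if (wire == 1 || wire == 5) {
      const std::ptrdiff_t width = wire == 1 ? 8 : 4;
      if (end - p < width) return false;
      p += width;
    } else {
      return false;  // groups (wire types 3 and 4) are not handled here
    }
  }
  return true;
}
```

Fields may arrive in any order, and unknown fields are skipped by wire type. For example, the input 0x18 0x05 0x12 0x01 'x' 0x08 0x07 parses to s = "x" and a = 7.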

The handling of optional, repeated and required fields then depends on the field type:

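1. Integers: a varint is at most 10 bytes long. A longer one is reported as an error.

For sint32/sint64, the varint is read as usual and then ZigZag-decoded back into a signed value: (n >> 1) ^ (~(n & 1) + 1). Here ~(n & 1) + 1 equals -(n & 1): all ones when n is odd, and 0 when it is even.

For packed repeated fields, ReadPackedVarint is called.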

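2. string

Read the length of the string, a varint of at most 5 bytes.

Read the string data. If it does not run past the current buffer, assign is called directly. If it does, an additional function has to be called (worth a closer look: buffer_end_, ptr, kSlopBytes).

Check that the data is valid UTF-8.

For repeated string fields, the generated code changes if the field's tag takes more than 2 bytes.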

3. bool

4. union

5. map
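Here is a sketch of the integer rules from item 1: the 10-byte varint limit, the ZigZag decode formula exactly as written above, and reading a packed payload. The function names are hypothetical, and ReadPackedSInt32 only mimics the idea behind ReadPackedVarint:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a varint from [p, end), advancing p. Fails if the input is truncated
// or if the varint would be longer than 10 bytes.
std::optional<uint64_t> ReadVarint(const uint8_t*& p, const uint8_t* end) {
  uint64_t result = 0;
  for (int i = 0; i < 10; ++i) {
    if (p == end) return std::nullopt;
    const uint8_t byte = *p++;
    result |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;  // more than 10 bytes: malformed
}

// ZigZag decode, written as in the text: (n >> 1) ^ (~(n & 1) + 1).
// ~(n & 1) + 1 equals -(n & 1): all ones for odd n, zero for even n.
int32_t ZigZagDecode32(uint32_t n) {
  return static_cast<int32_t>((n >> 1) ^ (~(n & 1) + 1));
}

// Packed repeated sint32: the payload is just varints back to back,
// so keep reading until the end of the length-delimited range.
bool ReadPackedSInt32(const uint8_t* p, const uint8_t* end, std::vector<int32_t>& out) {
  while (p < end) {
    auto v = ReadVarint(p, end);
    if (!v) return false;
    out.push_back(ZigZagDecode32(static_cast<uint32_t>(*v)));
  }
  return true;
}
```

For example, the packed payload 0x01 0x02 0x03 decodes to {-1, 1, -2}. A varint whose first ten bytes all have the continuation bit set is rejected.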

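Call flow: deserialization happens in MergeFromImpl, which constructs a ParseContext ctx. ctx returns the start address of the serialized data, and parsing then goes through a virtual function whose implementation lives in the generated code. That function returns the value of ptr at which parsing stopped, and the returned pointer is checked to determine whether an error occurred.

How errors are detected: the parse succeeded if the returned ptr is non-null and last_tag_minus_1_ is 0. The exact condition differs between parsing from a string and parsing from a stream; see the comment below. A further function then checks that all required fields are present (proto3 has no required fields).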

// This variable is used to communicate how the parse ended, in order to
// completely verify the parsed data. A wire-format parse can end because of
// one of the following conditions:
// 1) A parse can end on a pushed limit.
// 2) A parse can end on End Of Stream (EOS).
// 3) A parse can end on 0 tag (only valid for toplevel message).
// 4) A parse can end on an end-group tag.
// This variable should always be set to 0, which indicates case 1. If the
// parse terminated due to EOS (case 2), it's set to 1. In case the parse
// ended due to a terminating tag (case 3 and 4) it's set to (tag - 1).
// This var doesn't really belong in EpsCopyInputStream and should be part of
// the ParseContext, but case 2 is most easily and optimally implemented in
// DoneFallback.
uint32_t last_tag_minus_1_ = 0;

StringPiece also shows up as a parameter type.

A few functions worth noting:

// ParseFromCodedStream() is implemented as Clear() followed by
// MergeFromCodedStream().

// The basic abstraction the parser is designed for is a slight modification
// of the ZeroCopyInputStream (ZCIS) abstraction. A ZCIS presents a serialized
// stream as a series of buffers that concatenate to the full stream.
// Pictorially a ZCIS presents a stream in chunks like so
// [---------------------------------------------------------------]
// [---------------------] chunk 1
//                      [----------------------------] chunk 2
//                                          chunk 3 [--------------]
// Where the '-' represent the bytes which are vertically lined up with the
// bytes of the stream. The proto parser requires its input to be presented
// similarly with the extra
// property that each chunk has kSlopBytes past its end that overlaps with the
// first kSlopBytes of the next chunk, or if there is no next chunk at least its
// still valid to read those bytes. Again, pictorially, we now have
// [---------------------------------------------------------------]
// [-------------------....] chunk 1
//                    [------------------------....] chunk 2
//                                    chunk 3 [------------------..**]
//                                                      chunk 4 [--****]
// Here '-' mean the bytes of the stream or chunk and '.' means bytes past the
// chunk that match up with the start of the next chunk. Above each chunk has
// 4 '.' after the chunk. In the case these 'overflow' bytes represents bytes
// past the stream, indicated by '*' above, their values are unspecified. It is
// still legal to read them (ie. should not segfault). Reading past the
// end should be detected by the user and indicated as an error.
// The reason for this, admittedly, unconventional invariant is to ruthlessly
// optimize the protobuf parser. Having an overlap helps in two important ways.
// Firstly it alleviates having to performing bounds checks if a piece of code
// is guaranteed to not read more than kSlopBytes. Secondly, and more
// importantly, the protobuf wireformat is such that reading a key/value pair is
// always less than 16 bytes. This removes the need to change to next buffer in
// the middle of reading primitive values. Hence there is no need to store and
// load the current position.
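The claim that a key/value pair always takes less than 16 bytes can be checked with some arithmetic. Here is a compile-time sketch. It assumes that kSlopBytes is 16, as the comment implies, and that tags and lengths are 32-bit varints. All constant names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Maximum varint length for an N-bit value: ceil(N / 7) bytes.
constexpr std::size_t MaxVarintBytes(std::size_t bits) { return (bits + 6) / 7; }

constexpr std::size_t kSlopBytes = 16;
constexpr std::size_t kMaxTagBytes = MaxVarintBytes(32);  // 5

// Worst-case "key + primitive value" for each wire type.
constexpr std::size_t kMaxVarintPair = kMaxTagBytes + MaxVarintBytes(64);  // 5 + 10
constexpr std::size_t kMaxFixed64Pair = kMaxTagBytes + 8;                  // 5 + 8
constexpr std::size_t kMaxFixed32Pair = kMaxTagBytes + 4;                  // 5 + 4
// Length-delimited: only the tag and the 32-bit length are read inline;
// the payload itself is copied separately.
constexpr std::size_t kMaxLenHeader = kMaxTagBytes + MaxVarintBytes(32);   // 5 + 5

static_assert(kMaxVarintPair < kSlopBytes, "varint key/value exceeds slop");
static_assert(kMaxFixed64Pair < kSlopBytes, "fixed64 key/value exceeds slop");
static_assert(kMaxFixed32Pair < kSlopBytes, "fixed32 key/value exceeds slop");
static_assert(kMaxLenHeader < kSlopBytes, "length header exceeds slop");
```

The worst case is a 5-byte tag followed by a 10-byte varint, 15 bytes in total, which still fits within the 16 slop bytes.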

// Advances to next buffer chunk returns a pointer to the same logical place
// in the stream as set by overrun. Overrun indicates the position in the slop
// region the parse was left (0 <= overrun <= kSlopBytes). Returns true if at
// limit, at which point the returned pointer maybe null if there was an
// error. The invariant of this function is that it's guaranteed that
// kSlopBytes bytes can be accessed from the returned ptr. This function might
// advance more buffers than one in the underlying ZeroCopyInputStream.
std::pair<const char*, bool> DoneFallback(int overrun, int depth);

// Advances to the next buffer, at most one call to Next() on the underlying
// ZeroCopyInputStream is made. This function DOES NOT match the returned
// pointer to where in the slop region the parse ends, hence no overrun
// parameter. This is useful for string operations where you always copy
// to the end of the buffer (including the slop region).
const char* Next();

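// overrun is the position in the slop region where the stream currently is
// (0 <= overrun <= kSlopBytes). It prevents flipping to the next buffer of the
// ZeroCopyInputStream when a parse will end within the last kSlopBytes of the
// current buffer. depth is the current depth of nested groups (or negative if
// the use case does not need careful tracking).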

inline const char* NextBuffer(int overrun, int depth);

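1. Protobuf uses an arena to allocate memory in large blocks. Allocations made by mutable accessors, and by other places that need memory, are carved out of a designated region, which reduces memory fragmentation. For strings, TaggedStringPtr is used to record where the string's memory was allocated.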
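The arena idea can be illustrated with a toy bump-pointer allocator. This is not google::protobuf::Arena, whose design is considerably more involved. ToyArena is a hypothetical class that only shows why carving objects out of large blocks avoids many small heap allocations. It never runs destructors, so it is only suitable for trivially destructible data:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>  // placement new, used when constructing objects in the arena
#include <vector>

class ToyArena {
 public:
  explicit ToyArena(std::size_t block_size = 4096) : block_size_(block_size) {}

  // Returns `size` bytes aligned to `align` (a power of two no larger than
  // alignof(std::max_align_t)). Starts a new block when the current one is full.
  void* Allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0 && align <= alignof(std::max_align_t));
    std::size_t offset = (used_ + align - 1) / align * align;  // round up
    if (blocks_.empty() || offset + size > current_size_) {
      current_size_ = size > block_size_ ? size : block_size_;
      // Array new of std::byte is aligned for any fundamental alignment.
      blocks_.push_back(std::make_unique<std::byte[]>(current_size_));
      offset = 0;
    }
    used_ = offset + size;
    return blocks_.back().get() + offset;
  }

  std::size_t BlockCount() const { return blocks_.size(); }

 private:
  std::size_t block_size_;
  std::size_t current_size_ = 0;  // size of the newest block
  std::size_t used_ = 0;          // bytes used in the newest block
  std::vector<std::unique_ptr<std::byte[]>> blocks_;  // all freed together
};
```

Everything allocated from a ToyArena is released at once when the arena is destroyed. This mirrors the arena's "free everything together" lifetime: individual objects are never freed one by one.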
