Reference
Impala wiki: Impala Row Batches
1. Terminology
- Value: a value (e.g. int, string, array, etc). All values in Impala are nullable internally.
- Fixed-length data: the part of a value that is fixed in size (e.g. a 32-bit integer, the 32-bit length + 64-bit pointer representing a string)
- Variable-length data: parts of a value that vary in length, e.g. string data, maps, arrays
- Slot: an area of memory that holds the fixed-length part of a value (e.g. INT, STRING) if not null
- Null indicators: a fixed-length bitstring that indicates whether slots are NULL
- Tuple: an array of slots, plus null indicators
- Row: a logical row comprised of a number of values. A row is comprised of multiple tuples and represented as a fixed-length array of pointers to tuples.
- RowBatch: a batch of rows, plus information about memory resources referenced by the rows.
- Operator/ExecNode: a physical query operator, e.g. aggregation, join, scan
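The slot, null-indicator, and tuple definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Impala's actual C++ code: the names `TupleLayout`, `write_tuple`, and `read_tuple` are hypothetical, the null-indicator bytes are assumed to sit at the start of the tuple, and the 64-bit "pointer" of a STRING slot is modeled as an offset into a `bytearray` that stands in for variable-length memory.

```python
import struct

# Assumed slot encodings (little-endian, no padding). STRING is a 64-bit
# "pointer" plus a 32-bit length; the pointer here is just an offset into
# a bytearray that stands in for variable-length memory.
SLOT_FORMATS = {"INT": "<i", "BIGINT": "<q", "STRING": "<QI"}


class TupleLayout:
    """Null-indicator bytes followed by one fixed-length slot per value."""

    def __init__(self, types):
        self.types = list(types)
        self.null_bytes = (len(self.types) + 7) // 8  # one bit per slot
        self.offsets = []
        offset = self.null_bytes
        for t in self.types:
            self.offsets.append(offset)
            offset += struct.calcsize(SLOT_FORMATS[t])
        self.byte_size = offset


def write_tuple(layout, values, var_data):
    """Encode values into a fixed-length tuple; string bytes go to var_data."""
    buf = bytearray(layout.byte_size)
    for i, (t, v) in enumerate(zip(layout.types, values)):
        fmt, off = SLOT_FORMATS[t], layout.offsets[i]
        if v is None:
            buf[i // 8] |= 1 << (i % 8)  # set the null bit, leave slot zeroed
        elif t == "STRING":
            data = v.encode("utf-8")
            ptr = len(var_data)          # where the string bytes will start
            var_data.extend(data)
            struct.pack_into(fmt, buf, off, ptr, len(data))
        else:
            struct.pack_into(fmt, buf, off, v)
    return bytes(buf)


def read_tuple(layout, buf, var_data):
    """Decode a tuple back into Python values, honoring the null bits."""
    values = []
    for i, t in enumerate(layout.types):
        fmt, off = SLOT_FORMATS[t], layout.offsets[i]
        if buf[i // 8] & (1 << (i % 8)):
            values.append(None)
        elif t == "STRING":
            ptr, length = struct.unpack_from(fmt, buf, off)
            values.append(bytes(var_data[ptr:ptr + length]).decode("utf-8"))
        else:
            values.append(struct.unpack_from(fmt, buf, off)[0])
    return values


layout = TupleLayout(["INT", "STRING"])
var_data = bytearray()
encoded = write_tuple(layout, [999, "hello"], var_data)
decoded = read_tuple(layout, encoded, var_data)
```

In this sketch the tuple itself always has the same size, whatever the string length, because only the pointer and length live in the slot; the string bytes live in `var_data`.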
2. Diagram
Here is an example memory layout for an (INT, STRING, BIGINT, STRING) row that is comprised of two tuples. The data is:
INT | STRING | BIGINT | STRING |
---|---|---|---|
999 | "hello" | NULL | NULL |
NULL | "hell" | 12345 | "world" |
NULL | "hell" | 12345 | "world" |
NULL | NULL | NULL | NULL |
The memory layout purposefully uses many features to illustrate how data can be shared between rows and tuples. Most batches have simpler layouts.
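As a rough illustration of how such sharing could look, the sketch below models the four example rows with plain Python objects. The particular sharing choices are assumptions made for the example, not a transcription of the figure: the two identical rows point at the same tuples, "hell" reuses the first four bytes of "hello", and one all-NULL tuple is referenced from two rows. `StringValue`, `resolve`, and `materialize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StringValue:
    """Fixed-length part of a string: an offset (stand-in pointer) and a length."""
    offset: int
    length: int


# Variable-length data referenced by every string slot in the batch.
var_data = b"helloworld"


def resolve(slot):
    """Turn a slot into a Python value, following string 'pointers'."""
    if isinstance(slot, StringValue):
        return var_data[slot.offset:slot.offset + slot.length].decode("utf-8")
    return slot  # ints pass through, None marks a NULL slot


# Tuples of the first kind: (INT, STRING)
t0_hello = [999, StringValue(0, 5)]    # "hello"
t0_hell = [None, StringValue(0, 4)]    # "hell" shares bytes with "hello"
t0_null = [None, None]

# Tuples of the second kind: (BIGINT, STRING)
t1_world = [12345, StringValue(5, 5)]  # "world"
t1_null = [None, None]

# Each row is a fixed-length array of references to tuples.
rows = [
    (t0_hello, t1_null),
    (t0_hell, t1_world),
    (t0_hell, t1_world),   # same tuples as the previous row
    (t0_null, t1_null),    # reuses the all-NULL second tuple of row 0
]


def materialize(row):
    """Flatten a row's tuples into the logical (INT, STRING, BIGINT, STRING) values."""
    return [resolve(slot) for tup in row for slot in tup]


table = [materialize(row) for row in rows]
```

Four logical rows are backed here by only five tuples and ten bytes of string data, which is the kind of reuse a row batch has to track when it records the memory its rows reference.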
[Figure: example row batch memory layout]