Understanding Slot & Tuple & Row & RowBatch

Author: GOGOYAO | Published: 2019-11-11 15:15

    References

    Impala wiki: Impala Row Batches

    1. Terminology

    • Value: a value (e.g. int, string, array, etc). All values in Impala are nullable internally.
    • Fixed-length data: the part of a value that is fixed in size (e.g. a 32-bit integer, the 32-bit length + 64-bit pointer representing a string)
    • Variable-length data: parts of a value that vary in length, e.g. string data, maps, arrays
    • Slot: an area of memory that holds the fixed-length part of a value (e.g. INT, STRING) if not null
    • Null indicators: a fixed-length bitstring that indicates whether slots are NULL
    • Tuple: an array of slots, plus null indicators
    • Row: a logical row comprised of a number of values. A row is comprised of multiple tuples and represented as a fixed-length array of pointers to tuples.
    • RowBatch: a batch of rows, plus information about memory resources referenced by the rows.
    • Operator/ExecNode: a physical query operator, e.g. aggregation, join, scan
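
The slot, null-indicator, and tuple definitions above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not Impala's actual implementation: the names `SLOT_FORMATS`, `encode_tuple`, and `decode_tuple` are hypothetical, the null indicators are placed in front of the slots, slots are packed without alignment padding, and the 64-bit string "pointer" is simulated as an offset into a separate variable-length buffer.

```python
import struct

# Hypothetical fixed-length slot encodings (little-endian, no padding).
SLOT_FORMATS = {
    "INT": "<i",      # 32-bit integer
    "BIGINT": "<q",   # 64-bit integer
    "STRING": "<iQ",  # 32-bit length + 64-bit "pointer" (here: an offset)
}


def encode_tuple(types, values, var_data):
    """Pack one tuple: null-indicator bits followed by fixed-length slots.

    String bytes are appended to var_data (the variable-length data);
    only the length and offset are stored in the slot itself.
    """
    null_bits = bytearray((len(types) + 7) // 8)
    slots = bytearray()
    for i, (type_name, value) in enumerate(zip(types, values)):
        fmt = SLOT_FORMATS[type_name]
        if value is None:
            null_bits[i // 8] |= 1 << (i % 8)
            # The slot still occupies its fixed size even when NULL.
            slots += bytes(struct.calcsize(fmt))
        elif type_name == "STRING":
            data = value.encode("utf-8")
            offset = len(var_data)
            var_data.extend(data)
            slots += struct.pack(fmt, len(data), offset)
        else:
            slots += struct.pack(fmt, value)
    return bytes(null_bits) + bytes(slots)


def decode_tuple(types, buf, var_data):
    """Read the values back, consulting the null indicators first."""
    pos = (len(types) + 7) // 8  # skip the null-indicator bytes
    values = []
    for i, type_name in enumerate(types):
        fmt = SLOT_FORMATS[type_name]
        is_null = (buf[i // 8] >> (i % 8)) & 1
        if is_null:
            values.append(None)
        elif type_name == "STRING":
            length, offset = struct.unpack_from(fmt, buf, pos)
            raw = bytes(var_data[offset:offset + length])
            values.append(raw.decode("utf-8"))
        else:
            values.append(struct.unpack_from(fmt, buf, pos)[0])
        pos += struct.calcsize(fmt)
    return values
```

For example, encoding `[999, "hello"]` as an (INT, STRING) tuple produces a buffer whose size does not depend on the string's length; the five bytes of `"hello"` live in `var_data`, which is why a tuple can be treated as a fixed-size array of slots.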

    2. Diagram

    Here is an example memory layout for an (INT, STRING, BIGINT, STRING) row that is comprised of two tuples. The data is:

    INT   STRING   BIGINT  STRING
    999   "hello"  NULL    NULL
    NULL  "hell"   12345   "world"
    NULL  "hell"   12345   "world"
    NULL  NULL     NULL    NULL

    The memory layout purposefully uses many features to illustrate how data can be shared between rows and tuples. Most batches have simpler layouts.

    [Figure: memory layout of the example row batch]
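
As a rough model of the example data, the sketch below represents each row as a fixed-length sequence of references to tuples and lets identical rows reference the same tuple objects. It is a simplification under assumptions the text does not confirm: the first tuple is assumed to hold the (INT, STRING) columns and the second the (BIGINT, STRING) columns, the class and function names (`TupleData`, `RowBatch`, `materialize`) are hypothetical, and the particular sharing shown is just one possibility rather than the layout in the original diagram.

```python
from dataclasses import dataclass, field


@dataclass
class TupleData:
    """One tuple: a list of slot values, where None marks a NULL slot."""
    slots: list


@dataclass
class RowBatch:
    """A batch of rows; each row is a fixed-length pair of tuple references."""
    rows: list = field(default_factory=list)


def materialize(row):
    """Flatten a row (a sequence of tuples) into its logical column values."""
    return [value for tup in row for value in tup.slots]


# Assumed split: tuple 0 = (INT, STRING), tuple 1 = (BIGINT, STRING).
left_a = TupleData([999, "hello"])
left_b = TupleData([None, "hell"])
left_null = TupleData([None, None])
right_b = TupleData([12345, "world"])
right_null = TupleData([None, None])

batch = RowBatch(rows=[
    (left_a, right_null),
    (left_b, right_b),
    (left_b, right_b),        # same tuples as the previous row, not copies
    (left_null, right_null),  # reuses the all-NULL right-hand tuple
])

for row in batch.rows:
    print(materialize(row))
```

Four logical rows are backed by only five tuple objects here, because a row stores references to tuples rather than the tuple contents themselves.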
