Overview (0.9.0)
Data in Druid is stored in a custom column format known as a segment. Segments are composed of different types of columns. Column.java
and the classes that extend it are a great place to start looking into the storage format.
Basic classes
ValueType
An enum with four values:
- FLOAT
- LONG
- STRING
- COMPLEX
IndexedInts
It has three main methods:
int size();
int get(int index);
void fill(int index, int[] toFill);
The main implementations are:
- EmptyIndexedInts
- IntBufferIndexedInts
- ListBasedIndexedInts
- VSizeIndexedInts
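- size() returns the number of elements still available to read or write in the underlying Buffer;
- get(index) reads the element at position index in that Buffer;
- fill(index, toFill), judging by its signature, is meant to copy values starting at index into the toFill array; none of the current implementations support it.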
Of these, ListBasedIndexedInts is backed by a List<Integer>.
As the other implementations show, some of them use Java NIO to operate on native memory.
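To make the contrast between the heap-backed and NIO-backed variants concrete, here is a minimal sketch. The names SketchIndexedInts, ListBackedInts and BufferBackedInts are hypothetical stand-ins written for this post, not Druid's classes; the buffer-backed one assumes that size() maps to the buffer's remaining() count, as described above.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.List;

// Hypothetical stand-in for the interface described above (not Druid's source).
interface SketchIndexedInts {
    int size();
    int get(int index);
    void fill(int index, int[] toFill);
}

// Backed by a List<Integer>, in the spirit of ListBasedIndexedInts.
final class ListBackedInts implements SketchIndexedInts {
    private final List<Integer> values;

    ListBackedInts(List<Integer> values) {
        this.values = values;
    }

    @Override public int size() { return values.size(); }
    @Override public int get(int index) { return values.get(index); }
    @Override public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }
}

// Backed by an NIO IntBuffer view over direct (off-heap) memory.
final class BufferBackedInts implements SketchIndexedInts {
    private final IntBuffer buffer;

    BufferBackedInts(int... values) {
        ByteBuffer direct = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        IntBuffer view = direct.asIntBuffer();
        view.put(values);
        view.flip(); // position = 0, limit = number of ints written
        this.buffer = view;
    }

    // Assumption: size is the number of ints between position and limit.
    @Override public int size() { return buffer.remaining(); }
    // Absolute read relative to the current position; the position is not moved.
    @Override public int get(int index) { return buffer.get(buffer.position() + index); }
    @Override public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }
}
```

Callers only see the interface, so the same row-reading code can work whether the ints live in a Java list or in a memory region outside the heap.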
ColumnCapabilities
Fields:
private ValueType type = null;
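private boolean dictionaryEncoded = false; // whether the column is dictionary encoded
private boolean runLengthEncoded = false; // whether the column is run-length encoded; run-length encoding exists in name only and can be ignored
private boolean hasInvertedIndexes = false; // whether the column has an inverted index
private boolean hasSpatialIndexes = false; // whether the column has a spatial index
private boolean hasMultipleValues = false; // whether the column has multi-value rows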
DictionaryEncodedColumn
Basic methods:
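public int length(); // total number of rows in the dictionary-encoded column
public boolean hasMultipleValues(); // whether any row holds multiple values
public int getSingleValueRow(int rowNum); // dictionary id stored in a single-value row
public IndexedInts getMultiValueRow(int rowNum); // dictionary ids stored in a multi-value row
public String lookupName(int id); // look up the value for a dictionary id; note that both null and empty are converted to null
public int lookupId(String name); // look up the dictionary id for a value
public int getCardinality(); // cardinality, i.e. the size of the dictionary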
Its only implementation, SimpleDictionaryEncodedColumn, has three fields:
private final IndexedInts column;
private final IndexedMultivalue<IndexedInts> multiValueColumn;
private final CachingIndexed<String> cachedLookups;
The interesting one is cachedLookups, which stores the dictionary.
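The relationship between rows, ids and the dictionary is easier to see in a small example. The sketch below is a hypothetical single-value column (SketchDictionaryColumn is not a Druid class); it assumes non-null input values and a sorted dictionary, so that lookupId can use binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-value dictionary-encoded column (not Druid's class).
final class SketchDictionaryColumn {
    private final List<String> dictionary; // sorted distinct values; list index = id
    private final int[] rowIds;            // one dictionary id per row

    // Assumption: rows contains no null values (TreeSet would reject them).
    SketchDictionaryColumn(List<String> rows) {
        this.dictionary = new ArrayList<>(new TreeSet<>(rows));
        this.rowIds = new int[rows.size()];
        for (int i = 0; i < rowIds.length; i++) {
            rowIds[i] = Collections.binarySearch(dictionary, rows.get(i));
        }
    }

    int length() { return rowIds.length; }
    int getSingleValueRow(int rowNum) { return rowIds[rowNum]; }
    String lookupName(int id) { return dictionary.get(id); }
    // A sorted dictionary allows binary search; a negative result means "not present".
    int lookupId(String name) { return Collections.binarySearch(dictionary, name); }
    int getCardinality() { return dictionary.size(); }
}
```

For the rows b, a, b, c the dictionary holds three entries while the column itself stores only four small integers, which is where the space saving comes from.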
CachingIndexed
The concrete class that holds the dictionary. It implements the Indexed interface, whose other main implementations are:
- GenericIndexed
- ArrayIndexed
- BufferIndexed
- ListIndexed
- VSizeIndexed
CachingIndexed wraps a given GenericIndexed and uses an LRU map, SizedLRUMap<Integer, T>, to hold the cachedValues.
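The wrap-and-cache idea can be sketched with the JDK alone. In the sketch below, SketchCachingIndexed is a hypothetical class, a plain List stands in for the wrapped GenericIndexed, and the cache is bounded by entry count via LinkedHashMap; how SizedLRUMap actually bounds its contents is not covered here, so treat the eviction rule as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU-cached lookup over an index-addressable store
// (not Druid's CachingIndexed).
final class SketchCachingIndexed<T> {
    private final List<T> delegate;            // stands in for the wrapped GenericIndexed
    private final Map<Integer, T> cachedValues;
    private int delegateReads = 0;             // counts cache misses, for illustration only

    SketchCachingIndexed(List<T> delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true keeps the least-recently-used entry first.
        this.cachedValues = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                return size() > maxEntries; // assumption: bound by entry count
            }
        };
    }

    T get(int index) {
        T cached = cachedValues.get(index);
        if (cached != null) {
            return cached;
        }
        T value = delegate.get(index); // miss: read (and, in a real store, deserialize)
        delegateReads++;
        if (value != null) {
            cachedValues.put(index, value);
        }
        return value;
    }

    int size() { return delegate.size(); }
    int delegateReads() { return delegateReads; }
}
```

The point of the cache is that reading a value from a byte buffer means deserializing it every time, while dictionary lookups tend to hit the same ids repeatedly.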
GenericIndexed
A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:
- byte 1: version (0x1)
- byte 2 == 0x1 => allowReverseLookup
- bytes 3-6 => numBytesUsed
- bytes 7-10 => numElements
- bytes 10-((numElements * 4) + 10): integers representing 'end' offsets of byte serialized values
- bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
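As a rough illustration of this layout, the sketch below writes a few strings into a ByteBuffer and reads one back by index. SketchGenericIndexedV1 is a hypothetical class, not Druid's code. It assumes UTF-8 string values, 0-based byte positions (so numElements sits at offset 6 and the header starts at offset 10), 'end' offsets measured from the start of the values region, and numBytesUsed counting everything after the numBytesUsed field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader/writer for the V1 layout listed above (not Druid's GenericIndexed).
final class SketchGenericIndexedV1 {
    static ByteBuffer write(String[] values, boolean sorted) {
        byte[][] encoded = new byte[values.length][];
        int valueBytes = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length; // length prefix + payload
        }
        // Assumption: numBytesUsed = numElements field + header + values.
        int numBytesUsed = Integer.BYTES + values.length * Integer.BYTES + valueBytes;
        ByteBuffer out = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        out.put((byte) 0x1);                  // version
        out.put((byte) (sorted ? 0x1 : 0x0)); // allowReverseLookup
        out.putInt(numBytesUsed);
        out.putInt(values.length);            // numElements
        int end = 0;
        for (byte[] bytes : encoded) {        // header: cumulative 'end' offsets
            end += Integer.BYTES + bytes.length;
            out.putInt(end);
        }
        for (byte[] bytes : encoded) {        // values: 4-byte length, then the bytes
            out.putInt(bytes.length);
            out.put(bytes);
        }
        out.flip();
        return out;
    }

    // Array-like lookup: find the value's start from the previous 'end' offset.
    static String get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6);
        if (index < 0 || index >= numElements) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int headerStart = 10;
        int valuesStart = headerStart + numElements * Integer.BYTES;
        int start = index == 0 ? 0 : buf.getInt(headerStart + (index - 1) * Integer.BYTES);
        int length = buf.getInt(valuesStart + start);
        byte[] bytes = new byte[length];
        ByteBuffer view = buf.duplicate();    // leave the caller's position untouched
        view.position(valuesStart + start + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because every lookup is two absolute reads plus a copy, random access by index is cheap, and if the values were written in sorted order a binary search over the same header gives the reverse lookup.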
Fields:
private final ByteBuffer theBuffer; // the underlying ByteBuffer storage
private final ObjectStrategy<T> strategy;
private final boolean allowReverseLookup;
private final int size; // the int read from theBuffer at its current position, i.e. the number of elements
private final int valuesOffset;
private final BufferIndexed bufferIndexed; // the inner class BufferIndexed
Column class
An interface; see the implementing class below for details.
SimpleColumn class
Fields:
private final ColumnCapabilities capabilities;
private final Supplier<DictionaryEncodedColumn> dictionaryEncodedColumn;
private final Supplier<RunLengthColumn> runLengthColumn;
private final Supplier<GenericColumn> genericColumn;
private final Supplier<ComplexColumn> complexColumn;
private final Supplier<BitmapIndex> bitmapIndex;
private final Supplier<SpatialIndex> spatialIndex;
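Holding each representation behind a Supplier means nothing is materialized until a caller asks for it. The sketch below shows the pattern with hypothetical names (SketchColumn and its two views are not Druid classes). It uses java.util.function.Supplier for self-containment and assumes that a null supplier means the column has no such representation.

```java
import java.util.function.Supplier;

// Hypothetical column holder: each view is produced on demand by a Supplier.
final class SketchColumn {
    private final Supplier<int[]> dictionaryIds;  // stands in for a dictionary-encoded view
    private final Supplier<long[]> genericValues; // stands in for a generic (numeric) view

    SketchColumn(Supplier<int[]> dictionaryIds, Supplier<long[]> genericValues) {
        this.dictionaryIds = dictionaryIds;
        this.genericValues = genericValues;
    }

    // Assumption: a null supplier means this column lacks that representation.
    int[] getDictionaryIds() {
        return dictionaryIds == null ? null : dictionaryIds.get();
    }

    long[] getGenericValues() {
        return genericValues == null ? null : genericValues.get();
    }
}
```

A string dimension would typically be built with only the dictionary supplier set, and a numeric metric with only the generic one.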