DefaultIndexingChain
前面几个章节主要介绍了IndexWriter内部各个关键操作的流程,本小节会介绍最核心的DWPT内部对文档进行索引构建的流程。Lucene内部索引构建最关键的概念是IndexingChain,顾名思义,链式的索引构建。为啥是链式的?这个和Lucene的整个索引体系结构有关系,Lucene提供了各种不同类型的索引类型,例如倒排、正排(列存)、StoreField、DocValues等。每个不同的索引类型对应不同的索引算法、数据结构以及文件存储,有些是列级别的,有些是文档级别的。所以一个文档写入后,需要被这么多种不同索引处理,有些索引会共享memory-buffer,有些则是完全独立的。基于这个架构,理论上Lucene是提供了扩展其他类型索引的可能性,顶级玩家也可以去尝试。
-
image.png
源码
private int processField(IndexableField field, long fieldGen, int fieldCount) throws IOException, AbortingException {
String fieldName = field.name();
IndexableFieldType fieldType = field.fieldType();
PerField fp = null;
if (fieldType.indexOptions() == null) {
throw new NullPointerException("IndexOptions must not be null (field: \"" + field.name() + "\")");
}
// Invert indexed fields:
if (fieldType.indexOptions() != IndexOptions.NONE) {
// if the field omits norms, the boost cannot be indexed.
if (fieldType.omitNorms() && field.boost() != 1.0f) {
throw new UnsupportedOperationException("You cannot set an index-time boost: norms are omitted for field '" + field.name() + "'");
}
fp = getOrAddField(fieldName, fieldType, true);
boolean first = fp.fieldGen != fieldGen;
//创建倒排索引
fp.invert(field, first);
if (first) {
fields[fieldCount++] = fp;
fp.fieldGen = fieldGen;
}
} else {
verifyUnIndexedFieldType(fieldName, fieldType);
}
// Add stored fields:
if (fieldType.stored()) {
if (fp == null) {
fp = getOrAddField(fieldName, fieldType, false);
}
if (fieldType.stored()) {
String value = field.stringValue();
if (value != null && value.length() > IndexWriter.MAX_STORED_STRING_LENGTH) {
throw new IllegalArgumentException("stored field \"" + field.name() + "\" is too large (" + value.length() + " characters) to store");
}
try {
//创建storeField
storedFieldsConsumer.writeField(fp.fieldInfo, field);
} catch (Throwable th) {
throw AbortingException.wrap(th);
}
}
}
DocValuesType dvType = fieldType.docValuesType();
if (dvType == null) {
throw new NullPointerException("docValuesType must not be null (field: \"" + fieldName + "\")");
}
if (dvType != DocValuesType.NONE) {
if (fp == null) {
fp = getOrAddField(fieldName, fieldType, false);
}
//创建docValue
indexDocValue(fp, dvType, field);
}
if (fieldType.pointDimensionCount() != 0) {
if (fp == null) {
fp = getOrAddField(fieldName, fieldType, false);
}
//创建point value
indexPoint(fp, field);
}
return fieldCount;
}
问题
- 创建正排(列存)没有看到,正排就是docvalue
- ponit value的作用是什么
网友评论