Elasticsearch Cache Overview

Author: persisting_ | Published 2019-01-06 17:28


1 Overview
2 Node-level caches
3 Index-level caches

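1 Overview

This article is only a survey of the caches in Elasticsearch; it does not cover how each cache is implemented or how to use it. Elasticsearch caches fall into two groups: node-level and index-level. From reading the source code, the node-level caches are the query cache IndicesQueryCache, the field data cache IndicesFieldDataCache, and the request cache IndicesRequestCache. The index-level cache is the filter cache BitsetFilterCache. The split used here depends on whether a single cache instance is maintained per node or each index maintains its own instance, so a cache classified as node-level in this article may still be managed through index-level objects.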

2 Node-level caches

As noted above, the node-level caches are the query cache IndicesQueryCache, the field data cache IndicesFieldDataCache, and the request cache IndicesRequestCache.

2.1 Query cache `IndicesQueryCache`

From the official documentation:

The query cache is responsible for caching the results of queries. There is one queries cache per node that is shared by all shards. The cache implements an LRU eviction policy: when a cache becomes full, the least recently used data is evicted to make way for new data. It is not possible to look at the contents being cached.

The query cache only caches queries which are being used in a filter context.
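To make the LRU eviction policy described in the quote concrete, here is a minimal sketch built on java.util.LinkedHashMap. The class name LruSketch and the entry-count limit are assumptions for illustration only; this is not how Lucene's LRUQueryCache is implemented, which also bounds memory usage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy LRU cache; not Lucene's LRUQueryCache. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("q1", "hits-1");
        cache.put("q2", "hits-2");
        cache.get("q1");            // touch q1, so q2 becomes the least recently used
        cache.put("q3", "hits-3");  // capacity exceeded, q2 is evicted
        System.out.println(cache.keySet()); // prints [q1, q3]
    }
}
```

Reading q1 before inserting q3 is what saves it from eviction; without that read, q1 would have been the entry dropped.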

The definition of IndicesQueryCache shows that it uses org.apache.lucene.search.LRUQueryCache to cache query results.

IndicesQueryCache is instantiated in the IndicesService constructor:

//IndicesService.IndicesService(...)
this.indicesQueryCache = new IndicesQueryCache(settings);

The relevant settings are listed below; the descriptions come from the official documentation or from source code comments:

indices.queries.cache.size:

Controls the memory size for the filter cache, defaults to 10%. Accepts either a percentage value, like 5%, or an exact value, like 512mb.

indices.queries.cache.count:

mostly a way to prevent queries from being the main source of memory usage of the cache

indices.queries.cache.all_segments:

enables caching on all segments instead of only the larger ones, for testing only
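The size setting above accepts either a percentage of the heap or an absolute value. The sketch below shows one way such a value could be resolved to bytes. CacheSizeSketch and resolveBytes are hypothetical names, the unit handling is deliberately limited to b, kb, mb and gb, and Elasticsearch's own setting parsers are not reproduced here.

```java
import java.util.Locale;

/** Hypothetical helper; it does not reproduce Elasticsearch's setting parsers. */
final class CacheSizeSketch {
    static long resolveBytes(String value, long heapBytes) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("%")) {
            // Percentage values are taken relative to the heap size
            double percent = Double.parseDouble(v.substring(0, v.length() - 1));
            return (long) (heapBytes * percent / 100.0);
        }
        // Two-letter units must be checked before the bare "b" suffix
        if (v.endsWith("kb")) return number(v, 2) * 1024L;
        if (v.endsWith("mb")) return number(v, 2) * 1024L * 1024L;
        if (v.endsWith("gb")) return number(v, 2) * 1024L * 1024L * 1024L;
        if (v.endsWith("b")) return number(v, 1);
        throw new IllegalArgumentException("unsupported size value [" + value + "]");
    }

    private static long number(String v, int suffixLength) {
        return Long.parseLong(v.substring(0, v.length() - suffixLength).trim());
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // assume a 4 GiB heap for the example
        System.out.println(resolveBytes("10%", heap));
        System.out.println(resolveBytes("512mb", heap));
    }
}
```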

When IndicesService creates an index, it passes IndicesQueryCache to the constructor of the IndexService being created. The call chain is:

IndicesService.createIndex->

IndicesService.createIndexService->

new IndexModule().newIndexService->

IndexService.IndexService()

In the chain above, new IndexModule().newIndexService uses the IndicesQueryCache passed in to instantiate a QueryCache object:

//IndexModule.newIndexService
final QueryCache queryCache;
//If index.queries.cache.enabled is set (it is enabled by default),
//create a caching QueryCache; otherwise create a DisabledQueryCache
if (indexSettings.getValue(INDEX_QUERY_CACHE_ENABLED_SETTING)) {
    BiFunction<IndexSettings, IndicesQueryCache, QueryCache> queryCacheProvider = forceQueryCacheProvider.get();
    if (queryCacheProvider == null) {
        queryCache = new IndexQueryCache(indexSettings, indicesQueryCache);
    } else {
        queryCache = queryCacheProvider.apply(indexSettings, indicesQueryCache);
    }
} else {
    queryCache = new DisabledQueryCache(indexSettings);
}

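IndexQueryCache is defined as follows; it is just a wrapper around IndicesQueryCache:

//IndexQueryCache
//Note the Javadoc below: IndexQueryCache is described as the index-level cache, but the actual caching is delegated to the node-level IndicesQueryCache.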
/**
 * The index-level query cache. This class mostly delegates to the node-level
 * query cache: {@link IndicesQueryCache}.
 */
public class IndexQueryCache extends AbstractIndexComponent implements QueryCache {

    final IndicesQueryCache indicesQueryCache;

    public IndexQueryCache(IndexSettings indexSettings, IndicesQueryCache indicesQueryCache) {
        super(indexSettings);
        this.indicesQueryCache = indicesQueryCache;
    }

    @Override
    public void close() throws ElasticsearchException {
        clear("close");
    }

    @Override
    public void clear(String reason) {
        logger.debug("full cache clear, reason [{}]", reason);
        indicesQueryCache.clearIndex(index().getName());
    }

    @Override
    public Weight doCache(Weight weight, QueryCachingPolicy policy) {
        return indicesQueryCache.doCache(weight, policy);
    }
}
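The delegation pattern in IndexQueryCache can be modelled in a few lines. In this sketch, NodeCache and IndexFacade are hypothetical stand-ins for the node-level and index-level classes, and the string key "index/query" is an assumption made to keep the example small; the real classes key their entries differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a node-level cache shared by per-index facades. */
final class SharedCacheSketch {
    /** Node level: one instance per node, entries keyed by "index/query". */
    static final class NodeCache {
        private final Map<String, String> entries = new HashMap<>();

        void put(String index, String query, String result) {
            entries.put(index + "/" + query, result);
        }

        String get(String index, String query) {
            return entries.get(index + "/" + query);
        }

        /** Drops only the entries that belong to the given index. */
        void clearIndex(String index) {
            entries.keySet().removeIf(key -> key.startsWith(index + "/"));
        }

        int size() {
            return entries.size();
        }
    }

    /** Index level: holds no data of its own, only a reference to the node cache. */
    static final class IndexFacade {
        private final String index;
        private final NodeCache nodeCache;

        IndexFacade(String index, NodeCache nodeCache) {
            this.index = index;
            this.nodeCache = nodeCache;
        }

        void put(String query, String result) { nodeCache.put(index, query, result); }

        String get(String query) { return nodeCache.get(index, query); }

        void clear() { nodeCache.clearIndex(index); }
    }

    public static void main(String[] args) {
        NodeCache node = new NodeCache();
        IndexFacade logs = new IndexFacade("logs", node);
        IndexFacade users = new IndexFacade("users", node);
        logs.put("status:500", "42 hits");
        users.put("age:30", "7 hits");
        logs.clear(); // only the entries of "logs" are removed from the shared cache
        System.out.println(node.size());
    }
}
```

Closing or clearing one index therefore leaves the cached entries of every other index on the node untouched, which mirrors what clear(...) does above through indicesQueryCache.clearIndex.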

The QueryCache created here is passed to the IndexService constructor, which uses it to instantiate an IndexCache. IndexService constructs IndexCache from two arguments: the QueryCache and the BitsetFilterCache covered in section 3.

//IndexService.IndexService(...)
this.bitsetFilterCache = new BitsetFilterCache(indexSettings, new BitsetCacheListener(this));
this.indexCache = new IndexCache(indexSettings, queryCache, bitsetFilterCache);

2.2 Field data cache `IndicesFieldDataCache`

From the official documentation:

The field data cache is used mainly when sorting on or computing aggregations on a field. It loads all the field values to memory in order to provide fast document based access to those values. The field data cache can be expensive to build for a field, so its recommended to have enough memory to allocate it, and to keep it loaded.

IndicesFieldDataCache is also instantiated in the IndicesService constructor, so it is a node-level cache:

//IndicesService.IndicesService(...)
this.indicesFieldDataCache = new IndicesFieldDataCache(settings, new IndexFieldDataCache.Listener() {
    @Override
    public void onRemoval(ShardId shardId, String fieldName, boolean wasEvicted, long sizeInBytes) {
        assert sizeInBytes >= 0 : "When reducing circuit breaker, it should be adjusted with a number higher or equal to 0 and not [" + sizeInBytes + "]";
        circuitBreakerService.getBreaker(CircuitBreaker.FIELDDATA).addWithoutBreaking(-sizeInBytes);
    }
});
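The listener above returns memory to the field data circuit breaker whenever an entry leaves the cache. The following sketch illustrates that accounting idea in isolation. RemovalListener, FieldCache and the AtomicLong counter are hypothetical stand-ins, and the assumption that the loader reserves the bytes before caching is a simplification of this sketch, not a description of the real loading path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of removal-driven memory accounting; not Elasticsearch code. */
final class RemovalAccountingSketch {
    /** Stand-in for a removal callback such as IndexFieldDataCache.Listener#onRemoval. */
    interface RemovalListener {
        void onRemoval(String fieldName, long sizeInBytes);
    }

    static final class FieldCache {
        private final Map<String, Long> sizes = new HashMap<>();
        private final RemovalListener listener;

        FieldCache(RemovalListener listener) {
            this.listener = listener;
        }

        void load(String fieldName, long sizeInBytes) {
            sizes.put(fieldName, sizeInBytes);
        }

        void remove(String fieldName) {
            Long size = sizes.remove(fieldName);
            if (size != null) {
                listener.onRemoval(fieldName, size); // report the bytes that were freed
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong reservedBytes = new AtomicLong(); // plays the role of the breaker's counter
        FieldCache cache = new FieldCache((field, bytes) -> reservedBytes.addAndGet(-bytes));

        reservedBytes.addAndGet(100); // assumption: the loader reserves memory before caching
        cache.load("title", 100);
        cache.remove("title");        // the listener hands the 100 bytes back
        System.out.println(reservedBytes.get());
    }
}
```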

The IndicesFieldDataCache constructor reads the following setting:

indices.fielddata.cache.size:

The max size of the field data cache, eg 30% of node heap space, or an absolute value, eg 12GB. Defaults to unbounded. Also see Field data circuit breaker.

The definition of IndicesFieldDataCache shows that it uses org.elasticsearch.common.cache.Cache internally to implement the field data cache.

When an IndexService is instantiated, IndicesService passes the IndicesFieldDataCache instance to the IndexService constructor, which in turn uses it to construct an IndexFieldDataService. IndexFieldDataService is the service each index maintains for caching field data.

//IndexService.IndexService(...)
this.indexFieldData = new IndexFieldDataService(indexSettings, indicesFieldDataCache, circuitBreakerService, mapperService);

The main entry point for cache operations on IndexFieldDataService is its getForField method:

//IndexFieldDataService
public <IFD extends IndexFieldData<?>> IFD getForField(MappedFieldType fieldType) {
    return getForField(fieldType, index().getName());
}

@SuppressWarnings("unchecked")
public <IFD extends IndexFieldData<?>> IFD getForField(MappedFieldType fieldType, String fullyQualifiedIndexName) {
    final String fieldName = fieldType.name();
    IndexFieldData.Builder builder = fieldType.fielddataBuilder(fullyQualifiedIndexName);

    IndexFieldDataCache cache;
    synchronized (this) {
        cache = fieldDataCaches.get(fieldName);
        if (cache == null) {
            String cacheType = indexSettings.getValue(INDEX_FIELDDATA_CACHE_KEY);
            if (FIELDDATA_CACHE_VALUE_NODE.equals(cacheType)) {
                //Call IndicesFieldDataCache.buildIndexFieldDataCache to build
                //the actual IndexFieldDataCache instance that caches field data
                cache = indicesFieldDataCache.buildIndexFieldDataCache(listener, index(), fieldName);
            } else if ("none".equals(cacheType)){
                cache = new IndexFieldDataCache.None();
            } else {
                throw new IllegalArgumentException("cache type not supported [" + cacheType + "] for field [" + fieldName + "]");
            }
            fieldDataCaches.put(fieldName, cache);
        }
    }

    return (IFD) builder.build(indexSettings, fieldType, cache, circuitBreakerService, mapperService);
}
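The core of getForField is a lazily populated, per-field map guarded by a lock. A stripped-down sketch of that pattern follows. PerFieldCacheSketch and FieldCache are hypothetical, and the cache type values "node" and "none" are assumed here to match the constants used in the method above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of lazy per-field cache creation; not Elasticsearch code. */
final class PerFieldCacheSketch {
    /** Minimal stand-in for an IndexFieldDataCache instance. */
    static final class FieldCache {
        final String kind;
        final String fieldName;

        FieldCache(String kind, String fieldName) {
            this.kind = kind;
            this.fieldName = fieldName;
        }
    }

    private final Map<String, FieldCache> fieldCaches = new HashMap<>();
    private final String cacheType;

    PerFieldCacheSketch(String cacheType) {
        this.cacheType = cacheType;
    }

    /** Creates the cache for a field on first use and reuses it afterwards. */
    synchronized FieldCache cacheFor(String fieldName) {
        FieldCache cache = fieldCaches.get(fieldName);
        if (cache == null) {
            if ("node".equals(cacheType)) {
                cache = new FieldCache("node-backed", fieldName);
            } else if ("none".equals(cacheType)) {
                cache = new FieldCache("no-op", fieldName);
            } else {
                throw new IllegalArgumentException("cache type not supported [" + cacheType + "]");
            }
            fieldCaches.put(fieldName, cache);
        }
        return cache;
    }

    public static void main(String[] args) {
        PerFieldCacheSketch service = new PerFieldCacheSketch("node");
        FieldCache first = service.cacheFor("price");
        FieldCache second = service.cacheFor("price");
        System.out.println(first == second); // the same instance is reused for the same field
    }
}
```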

Here is how IndicesFieldDataCache.buildIndexFieldDataCache builds the IndexFieldDataCache:

public IndexFieldDataCache buildIndexFieldDataCache(IndexFieldDataCache.Listener listener, Index index, String fieldName) {
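    //The cache argument is a member of the node-level IndicesFieldDataCache. So
    //although each IndexService creates its own IndexFieldDataService, the data
    //cached by the resulting IndexFieldDataCache still lives in the cache member
    //of the node-level IndicesFieldDataCache instance.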
    return new IndexFieldCache(logger, cache, index, fieldName, indicesFieldDataCacheListener, listener);
}

2.3 Request cache `IndicesRequestCache`

From the official documentation:

When a search request is run against an index or against many indices, each involved shard executes the search locally and returns its local results to the coordinating node, which combines these shard-level results into a “global” result set.

The shard-level request cache module caches the local results on each shard. This allows frequently used (and potentially heavy) search requests to return results almost instantly. The requests cache is a very good fit for the logging use case, where only the most recent index is being actively updated — results from older indices will be served directly from the cache.

IndicesRequestCache is instantiated in the IndicesService constructor:

//IndicesService.IndicesService(...)
this.indicesRequestCache = new IndicesRequestCache(settings);

The relevant settings are:

index.requests.cache.enable

Enables or disables the request cache; it can be set per index.

indices.requests.cache.size

The cache is managed at the node level, and has a default maximum size of 1% of the heap. (This setting caps the memory the request cache may use.)

indices.requests.cache.expire

From the official documentation:

Also, you can use the indices.requests.cache.expire setting to specify a TTL for cached results, but there should be no reason to do so. Remember that stale results are automatically invalidated when the index is refreshed. This setting is provided for completeness' sake only.
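The behaviour described in the quotes above, where shard-level results are cached and stale results are invalidated on refresh, can be sketched as follows. RequestCacheSketch is a hypothetical class, and it makes the simplifying assumption that every refresh invalidates every cached entry for the shard.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-shard request cache; not the IndicesRequestCache implementation. */
final class RequestCacheSketch {
    private final Map<String, String> cachedResults = new HashMap<>();
    private int executions = 0;

    /** Returns the cached shard-level result, running the search only on a miss. */
    String search(String request, Function<String, String> shardSearch) {
        String cached = cachedResults.get(request);
        if (cached == null) {
            executions++;
            cached = shardSearch.apply(request);
            cachedResults.put(request, cached);
        }
        return cached;
    }

    /** Simplification: every refresh is treated as having changed the shard's data. */
    void refresh() {
        cachedResults.clear();
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        RequestCacheSketch shard = new RequestCacheSketch();
        Function<String, String> search = request -> "result of " + request;
        shard.search("count by status", search);
        shard.search("count by status", search); // served from the cache
        System.out.println(shard.executions());
        shard.refresh();                         // cached results become stale
        shard.search("count by status", search); // executed again
        System.out.println(shard.executions());
    }
}
```

An index that is no longer written to never invalidates its entries in this model, which is why the quoted documentation calls the request cache a good fit for older log indices.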

3 Index-level caches

The index-level cache is the filter cache BitsetFilterCache.

BitsetFilterCache is instantiated in the IndexService constructor:

//IndexService.IndexService(...)
this.bitsetFilterCache = new BitsetFilterCache(indexSettings, new BitsetCacheListener(this));

BitsetFilterCache uses the following setting:

index.load_fixed_bitset_filters_eagerly

The following explanation comes from a GitHub issue:

FixedBitSetFilterCache is a data structure that is loaded eagerly in memory (by default) to support nested query/filter and nested aggregations. However, the problem is that it can cause it to use too much heap for it is loaded for all nested fields (regardless of whether these fields are being used). To prevent this from happening, a common configuration workaround is to set index.load_fixed_bitset_filters_eagerly: false in the yml of the nodes and restart them to prevent the nodes from running OOM when attempting to eagerly load the fixedbitsets.
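The trade-off between eager and lazy loading described in the issue can be illustrated with java.util.BitSet. BitsetLoadingSketch, its constructor flag, and the map from filter name to matching document IDs are all hypothetical; the sketch only shows why eager loading pays the memory cost for every filter up front, whether or not it is ever used.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of eager versus lazy bitset loading; not BitsetFilterCache. */
final class BitsetLoadingSketch {
    private final Map<String, int[]> matchingDocsByFilter;
    private final Map<String, BitSet> loaded = new HashMap<>();

    BitsetLoadingSketch(Map<String, int[]> matchingDocsByFilter, boolean loadEagerly) {
        this.matchingDocsByFilter = matchingDocsByFilter;
        if (loadEagerly) {
            // Eager mode builds a bitset for every known filter immediately
            for (String filter : matchingDocsByFilter.keySet()) {
                bitsetFor(filter);
            }
        }
    }

    /** Builds the bitset on first request and reuses it afterwards. */
    BitSet bitsetFor(String filter) {
        BitSet bits = loaded.get(filter);
        if (bits == null) {
            bits = new BitSet();
            for (int doc : matchingDocsByFilter.getOrDefault(filter, new int[0])) {
                bits.set(doc);
            }
            loaded.put(filter, bits);
        }
        return bits;
    }

    int loadedCount() {
        return loaded.size();
    }

    public static void main(String[] args) {
        Map<String, int[]> filters = new HashMap<>();
        filters.put("nested:comments", new int[] {0, 3, 7});
        filters.put("nested:tags", new int[] {1, 2});

        System.out.println(new BitsetLoadingSketch(filters, true).loadedCount());
        System.out.println(new BitsetLoadingSketch(filters, false).loadedCount());
    }
}
```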

After BitsetFilterCache is instantiated in the IndexService constructor, it is passed as an argument to build the IndexCache:

//IndexService.IndexService(...)
this.bitsetFilterCache = new BitsetFilterCache(indexSettings, new BitsetCacheListener(this));
this.indexCache = new IndexCache(indexSettings, queryCache, bitsetFilterCache);
