Analyzer
analyzer | logical name | description |
---|---|---|
standard analyzer | standard | standard tokenizer, standard filter, lower case filter, stop filter |
simple analyzer | simple | lower case tokenizer |
stop analyzer | stop | lower case tokenizer, stop filter |
keyword analyzer | keyword | 不分词,内容整体作为一个token(not_analyzed) |
pattern analyzer | whitespace | 正则表达式分词,默认匹配\W+ |
language analyzers | lang | 各种语言 |
snowball analyzer | snowball | standard tokenizer, standard filter, lower case filter, stop filter, snowball filter |
custom analyzer | custom | 至少需要指定一个 Tokenizer, 零个或多个Token Filter, 零个或多个Char Filter |
Character Filter
character filter | logical name | description |
---|---|---|
mapping char filter | mapping | 根据配置的映射关系替换字符 |
html strip char filter | html_strip | 去掉HTML元素 |
pattern replace char filter | pattern_replace | 用正则表达式处理字符串 |
Tokenizer
tokenizer | logical name | description |
---|---|---|
standard tokenizer | standard | |
edge ngram tokenizer | edgeNGram | |
keyword tokenizer | keyword | 不分词 |
letter analyzer | letter | 按单词分 |
lowercase analyzer | lowercase | letter tokenizer, lower case filter |
ngram analyzers | nGram | |
whitespace analyzer | whitespace | 以空格为分隔符拆分 |
pattern analyzer | pattern | 定义分隔符的正则表达式 |
uax email url analyzer | uax_url_email | 不拆分 url 和 email |
path hierarchy analyzer | path_hierarchy | 处理类似 /path/to/somthing 样式的字符串 |
Token Filter
token filter | logical name | description |
---|---|---|
standard filter | standard | |
ascii folding filter | asciifolding | |
length filter | length | 去掉太长或者太短的 |
lowercase filter | lowercase | 转成小写 |
ngram filter | nGram | |
edge ngram filter | edgeNGram | |
porter stem filter | porterStem | 波特词干算法 |
shingle filter | shingle | 定义分隔符的正则表达式 |
stop filter | stop | 移除 stop words |
word delimiter filter | word_delimiter | 将一个单词再拆成子分词 |
stemmer token filter | stemmer | |
stemmer override filter | stemmer_override | |
keyword marker filter | keyword_marker | |
keyword repeat filter | keyword_repeat | |
kstem filter | kstem | |
snowball filter | snowball | |
phonetic filter | phonetic | 插件 |
synonym filter | synonyms | 处理同义词 |
compound word filter | dictionary_decompounder, hyphenation_decompounder | 分解复合词 |
reverse filter | reverse | 反转字符串 |
elision filter | elision | 去掉缩略语 |
truncate filter | truncate | 截断字符串 |
unique filter | unique | |
pattern capture filter | pattern_capture | |
pattern replace filte | pattern_replace | 用正则表达式替换 |
trim filter | trim | 去掉空格 |
limit token count filter | limit | 限制 token 数量 |
hunspell filter | hunspell | 拼写检查 |
common grams filter | common_grams | |
normalization filter | arabic_normalization, persian_normalization |
网友评论