ElasticSearch Built-in Analyzers at a Glance

Author: 字母数字或汉字 | Published 2017-01-12 15:28 | Read 601 times

    Analyzer

    analyzer | logical name | description
    standard analyzer | standard | standard tokenizer, standard filter, lower case filter, stop filter
    simple analyzer | simple | lower case tokenizer
    stop analyzer | stop | lower case tokenizer, stop filter
    keyword analyzer | keyword | No tokenization; the entire input is kept as a single token (equivalent to not_analyzed)
    pattern analyzer | pattern | Splits text with a regular expression; the default pattern is \W+
    language analyzers | the language name (e.g. english, french) | Analyzers for specific languages
    snowball analyzer | snowball | standard tokenizer, standard filter, lower case filter, stop filter, snowball filter
    custom analyzer | custom | Requires exactly one tokenizer, plus zero or more token filters and zero or more char filters
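The custom analyzer row can be illustrated with a minimal Python sketch that builds the index-settings body declaring such an analyzer. Only the `type`, `tokenizer`, `char_filter` and `filter` keys follow the Elasticsearch settings format. The helper `custom_analyzer_settings`, the analyzer name `my_analyzer`, and the particular filters chosen are hypothetical and are used here for illustration only.

```python
import json


def custom_analyzer_settings(name, tokenizer, filters=(), char_filters=()):
    """Hypothetical helper: build index settings that declare one custom analyzer.

    A custom analyzer takes exactly one tokenizer (a single logical name here),
    plus optional lists of token filters and char filters, all referenced by
    their logical names from the tables in this article.
    """
    if not isinstance(tokenizer, str) or not tokenizer:
        raise ValueError("a custom analyzer needs exactly one tokenizer")
    analyzer = {"type": "custom", "tokenizer": tokenizer}
    if char_filters:
        analyzer["char_filter"] = list(char_filters)  # applied before tokenizing
    if filters:
        analyzer["filter"] = list(filters)  # applied to tokens, in order
    return {"settings": {"analysis": {"analyzer": {name: analyzer}}}}


body = custom_analyzer_settings(
    "my_analyzer",  # hypothetical analyzer name
    tokenizer="standard",
    filters=["lowercase", "asciifolding"],
    char_filters=["html_strip"],
)
print(json.dumps(body, indent=2))
```

The printed JSON could serve as the body of an index-creation request (`PUT /<index>`). Because `tokenizer` is a single string, the helper structurally enforces the one-tokenizer rule, while filters and char filters stay optional.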

    Character Filter

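    character filter | logical name | description
    mapping char filter | mapping | Replaces characters according to a configured mapping
    html strip char filter | html_strip | Strips HTML elements
    pattern replace char filter | pattern_replace | Rewrites the text with a regular-expression replacement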
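To make the mapping char filter more concrete, here is a rough Python approximation of what it does to the text before tokenization. The `key => value` rule strings follow the format of the filter's `mappings` setting. The emoticon rules, the helper names `parse_rules` and `apply_mapping`, and the longest-match-first scanning strategy are assumptions of this sketch; it does not describe how Lucene implements the filter.

```python
def parse_rules(rules):
    """Turn 'key => value' strings (the mappings format) into a dict."""
    table = {}
    for rule in rules:
        key, value = rule.split("=>", 1)
        table[key.strip()] = value.strip()
    return table


def apply_mapping(text, rules):
    """Scan left to right and replace the longest key that matches at each
    position. Characters with no matching key are copied unchanged."""
    table = parse_rules(rules)
    keys = sorted(table, key=len, reverse=True)  # try longer keys first
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if key and text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:  # no key matched at position i
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical char filter definition using the mapping rule format
char_filter = {"type": "mapping", "mappings": [":) => _happy_", ":( => _sad_"]}
print(apply_mapping("I am :) not :(", char_filter["mappings"]))
# I am _happy_ not _sad_
```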

    Tokenizer

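    tokenizer | logical name | description
    standard tokenizer | standard |
    edge ngram tokenizer | edgeNGram |
    keyword tokenizer | keyword | No tokenization
    letter tokenizer | letter | Splits text into words, breaking at any character that is not a letter
    lowercase tokenizer | lowercase | letter tokenizer, lower case filter
    ngram tokenizer | nGram |
    whitespace tokenizer | whitespace | Splits on whitespace
    pattern tokenizer | pattern | Splits on separators defined by a regular expression
    uax url email tokenizer | uax_url_email | Keeps URLs and email addresses as single tokens
    path hierarchy tokenizer | path_hierarchy | Handles path-like strings such as /path/to/something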
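Two of the tokenizers above are easiest to understand from their output. The Python sketch below approximates the path_hierarchy and edge n-gram tokenizers with their default settings. The function names are hypothetical, and the code ignores options such as `reverse`, `skip`, and `token_chars`.

```python
def path_hierarchy(path, delimiter="/"):
    """Approximates the path_hierarchy tokenizer: emit each prefix of the
    path that ends just before a delimiter, then the whole path."""
    tokens = []
    for i, ch in enumerate(path):
        if ch == delimiter and i > 0:  # a leading delimiter starts no prefix
            tokens.append(path[:i])
    tokens.append(path)
    return tokens


def edge_ngrams(text, min_gram=1, max_gram=2):
    """Approximates the edge n-gram tokenizer: prefixes of the input whose
    lengths run from min_gram to max_gram (Elasticsearch's defaults are 1
    and 2). With default settings the whole input is one token, so spaces
    are not split on."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


print(path_hierarchy("/path/to/something"))
# ['/path', '/path/to', '/path/to/something']
print(edge_ngrams("Quick Fox"))
# ['Q', 'Qu']
```

Every prefix emitted by path_hierarchy is a token, so a search for `/path/to` can match documents stored anywhere under that directory.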

    Token Filter

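    token filter | logical name | description
    standard filter | standard |
    ascii folding filter | asciifolding |
    length filter | length | Removes tokens that are too long or too short
    lowercase filter | lowercase | Converts tokens to lower case
    ngram filter | nGram |
    edge ngram filter | edgeNGram |
    porter stem filter | porterStem | Porter stemming algorithm
    shingle filter | shingle | Combines adjacent tokens into word n-grams (shingles)
    stop filter | stop | Removes stop words
    word delimiter filter | word_delimiter | Splits a single word further into subwords
    stemmer token filter | stemmer |
    stemmer override filter | stemmer_override |
    keyword marker filter | keyword_marker |
    keyword repeat filter | keyword_repeat |
    kstem filter | kstem |
    snowball filter | snowball |
    phonetic filter | phonetic | Provided by a plugin
    synonym filter | synonym | Handles synonyms
    compound word filter | dictionary_decompounder, hyphenation_decompounder | Decomposes compound words
    reverse filter | reverse | Reverses each token
    elision filter | elision | Removes elisions (e.g. the l' in l'avion)
    truncate filter | truncate | Truncates tokens to a fixed length
    unique filter | unique |
    pattern capture filter | pattern_capture |
    pattern replace filter | pattern_replace | Replaces text using a regular expression
    trim filter | trim | Removes leading and trailing whitespace
    limit token count filter | limit | Limits the number of tokens
    hunspell filter | hunspell | Dictionary-based stemming using Hunspell dictionaries
    common grams filter | common_grams |
    normalization filter | arabic_normalization, persian_normalization |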
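Token filters run in sequence, and each one consumes the previous filter's output. The Python sketch below mimics a few of the filters above on a list of already-tokenized strings. These are simplified, hypothetical stand-ins written for illustration: they ignore token positions and offsets. The tiny stop-word set is an assumption, not Elasticsearch's list.

```python
# Simplified, hypothetical stand-ins for a few token filters. Each takes and
# returns a list of token strings; positions and offsets are ignored.

def lowercase(tokens):
    """Like the lowercase filter: convert every token to lower case."""
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=frozenset({"a", "an", "the", "of"})):
    """Like the stop filter: drop tokens found in the stop-word set.
    This tiny set is an illustrative assumption."""
    return [t for t in tokens if t not in stopwords]


def length(tokens, min_len=0, max_len=2**31 - 1):
    """Like the length filter: keep tokens whose length is within bounds."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


def truncate(tokens, n=10):
    """Like the truncate filter: cut each token to at most n characters."""
    return [t[:n] for t in tokens]


def unique(tokens):
    """Like the unique filter: keep only the first occurrence of each token."""
    seen, out = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def shingle(tokens, size=2, output_unigrams=True):
    """Like the shingle filter with min and max shingle size both equal to
    `size`: emit each token, followed by the shingle starting at it."""
    out = []
    for i, t in enumerate(tokens):
        if output_unigrams:
            out.append(t)
        if i + size <= len(tokens):
            out.append(" ".join(tokens[i:i + size]))
    return out


# Filters run in order, each consuming the previous filter's output.
tokens = ["The", "QUICK", "Brown", "fox", "the", "quick", "fox"]
for f in (lowercase, stop, unique, shingle):
    tokens = f(tokens)
print(tokens)
# ['quick', 'quick brown', 'brown', 'brown fox', 'fox']
```

Order matters. If `stop` ran before `lowercase`, the capitalized "The" would not match the lower-case stop word and would survive into the shingles.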

          Article link: https://www.haomeiwen.com/subject/elndbttx.html