ES7 Token filters

Author: 逸章 | Published: 2020-05-03 16:57

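Token filters receive a stream of tokens from a tokenizer and can add, modify, or delete tokens. The three filters covered in the sections that follow are character filters, which instead preprocess the raw character stream before it reaches the tokenizer.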
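To make the order of the stages concrete, here is a rough Python sketch of an analysis chain: character filter, then tokenizer, then token filters. It is only an illustration and not how Elasticsearch is implemented; every function name, the stop-word set, and the synonym table are made up for this example.

```python
def ampersand_char_filter(text):
    # Character filter: rewrites the raw text before tokenization.
    return text.replace("&", "and")

def whitespace_tokenizer(text):
    # Tokenizer: splits the filtered text into tokens.
    return text.split()

def lowercase_filter(tokens):
    # Token filter that modifies tokens.
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=frozenset({"the", "and"})):
    # Token filter that deletes tokens.
    return [t for t in tokens if t not in stop_words]

def synonym_filter(tokens, synonyms=None):
    # Token filter that adds tokens (hypothetical synonym table).
    synonyms = synonyms or {"quick": ["fast"]}
    out = []
    for t in tokens:
        out.append(t)
        out.extend(synonyms.get(t, []))
    return out

def analyze(text):
    tokens = whitespace_tokenizer(ampersand_char_filter(text))
    for token_filter in (lowercase_filter, stop_filter, synonym_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("The QUICK & lazy fox"))
```

The point of the sketch is only the ordering: the character filter sees a string, while the token filters see a list of tokens.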

1. The HTML strip character filter ("type": "html_strip")

PUT my_index_name
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer_name": {
                    "tokenizer": "keyword",
                    "char_filter": ["my_char_filter_name"]
                }
            },
            "char_filter": {
                "my_char_filter_name": {
                    "type": "html_strip",
                    "escaped_tags": ["b"]
                }
            }
        }
    }
}

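Test as follows. Note that this request uses the built-in html_strip filter directly, so the escaped_tags setting defined above is not applied and the <b> tags are stripped as well: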

POST _analyze 
{
    "tokenizer": "keyword",
    "char_filter": ["html_strip"],
    "text": "<p>I'm so <b>happy</b>!</p>"
}
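
The effect of escaped_tags can be approximated locally with a short Python sketch. This is a simplified emulation and not the Lucene implementation: the regex-based tag matching, the BLOCK_TAGS set, and the function name strip_html are all assumptions made for illustration.

```python
import html
import re

# Tags treated as block-level in this sketch (an assumption, not Lucene's list).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")

def strip_html(text, escaped_tags=()):
    keep = {t.lower() for t in escaped_tags}

    def replace(match):
        name = match.group(1).lower()
        if name in keep:
            return match.group(0)  # escaped tags are left in the text
        # Block-level tags become a newline, other tags are dropped.
        return "\n" if name in BLOCK_TAGS else ""

    # Decode HTML entities such as &amp; after the tags are handled.
    return html.unescape(TAG_RE.sub(replace, text))

sample = "<p>I'm so <b>happy</b>!</p>"
print(repr(strip_html(sample)))                      # all tags stripped
print(repr(strip_html(sample, escaped_tags=["b"])))  # <b> preserved
```

With "escaped_tags": ["b"] the <b> element survives, while <p> is still replaced by a newline.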

2. The mapping character filter ("type": "mapping")

The mapping character filter replaces every occurrence of a key in the input string with the value associated with that key.

PUT my_index_name 
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer_name": {
                    "tokenizer": "keyword",
                    "char_filter": [
                        "my_char_filter_name"
                    ]
                }
            },
            "char_filter": {
                "my_char_filter_name": {
                    "type": "mapping",
                    "mappings": [
                        "٠ => 0",
                        "١ => 1",
                        "٢ => 2",
                        "٣ => 3",
                        "٤ => 4",
                        "٥ => 5",
                        "٦ => 6",
                        "٧ => 7",
                        "٨ => 8",
                        "٩ => 9"
                    ]
                }
            }
        }
    }
}

Test:

POST my_index_name/_analyze 
{
    "analyzer": "my_analyzer_name",
    "text": "My license plate is ٢٥٠١٥"
}
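
The key-to-value replacement can be sketched in Python as follows. This is a local approximation only: the rule parsing (splitting on "=>" and trimming whitespace) and the helper name apply_mappings are assumptions for this example, and the Arabic-Indic digits are generated from their code points (U+0660 to U+0669) to keep the source easy to read.

```python
import re

def apply_mappings(text, rules):
    # Parse "key => value" rules into a lookup table.
    table = {}
    for rule in rules:
        key, value = (part.strip() for part in rule.split("=>", 1))
        table[key] = value
    # Try longer keys first so that the longest matching key wins.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

# Same rules as the index settings: Arabic-Indic digit => ASCII digit.
rules = [f"{chr(0x0660 + d)} => {d}" for d in range(10)]
plate = "My license plate is " + "\u0662\u0665\u0660\u0661\u0665"
print(apply_mappings(plate, rules))
```

Because the analyzer uses the keyword tokenizer, the whole replaced string would come back as a single token.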

3. The pattern replace character filter ("type": "pattern_replace")

Examples:
1) Input: "aa bb aa bb", pattern="(aa)\s+(bb)", replacement="$1#$2"
   Output: "aa#bb aa#bb"
2) Input: "aa123bb", pattern="(aa)\d+(bb)", replacement="$1 $2"
   Output: "aa bb"
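
The two examples can be checked outside Elasticsearch with Python's re module. Elasticsearch uses Java regular expressions, where captured groups are referenced as $1, $2 in the replacement; the helper below (java_style_sub, a name invented here) converts that form to Python's \g<N> syntax. It is a sketch that only handles single-digit group references.

```python
import re

def java_style_sub(pattern, replacement, text):
    """Apply a regex replacement written with Java-style $N group references."""
    # Convert "$1" to Python's "\g<1>" form (single-digit groups only).
    py_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text)

print(java_style_sub(r"(aa)\s+(bb)", "$1#$2", "aa bb aa bb"))
print(java_style_sub(r"(aa)\d+(bb)", "$1 $2", "aa123bb"))
```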

PUT pattern_test5
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

Test data:

POST pattern_test5/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}
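
To see what the pattern does to the text before the standard tokenizer runs, the same regular expression can be tried in Python. This only approximates the character filter step (Python's re stands in for Java's regex engine, and replace_dashes is a name invented here); it does not emulate the tokenizer.

```python
import re

# (\d+)- matches digits followed by a dash; (?=\d) requires a digit after
# the dash without consuming it, so the next group of digits can match too.
PATTERN = re.compile(r"(\d+)-(?=\d)")

def replace_dashes(text):
    # "\g<1>_" is the Python spelling of the "$1_" replacement.
    return PATTERN.sub(r"\g<1>_", text)

print(replace_dashes("My credit card is 123-456-789"))
print(replace_dashes("Ext. 42- see note"))  # dash not followed by a digit
```

Replacing the dashes with underscores matters because the standard tokenizer would otherwise split the number at each dash.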

Article link: https://www.haomeiwen.com/subject/kltjghtx.html