Chinese and Pinyin Search Suggestions with Elasticsearch


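Author: 雨中漫步的北极熊 | Published: 2019-02-27 17:47

Search boxes on e-commerce sites such as JD.com (京东) and Taobao (淘宝) show a dropdown of suggested keywords as soon as you type 面膜 (face mask) or just mm, for example 面膜 面霜 or 面膜 补水. A recent project of mine needed this search-suggestion feature, and I ran into quite a few pitfalls while building it, so I am recording them here.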

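One common way to implement search suggestions is with a relational database such as MySQL or Oracle, using a SQL LIKE query for prefix matching. That works when the data set is small, but e-commerce platforms usually index very large amounts of data, so these queries become slow and the user experience suffers. The other approach is to use a search engine: because it indexes every token produced during analysis, lookups come back quickly.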

Inverted index

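Elasticsearch uses a structure called an inverted index for fast full-text search. An inverted index consists of the list of unique terms that appear across all documents and, for each term, the documents in which it appears.

A forward index maps each document ID to the fields of that document:

Doc ID    Term    Term
1         中国    中华人民共和国
2         中国    美国

An inverted index tokenizes the document fields and maps each term back to the IDs of the documents that contain it:

Term              Doc 1    Doc 2
中国              X        X
中华人民共和国    X        -
美国              -        X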

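Search suggestions built on Elasticsearch work the same way: an analyzer splits the text into tokens, the inverted index finds the documents that contain those tokens, and the matches are returned to the user for display.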
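
The forward-to-inverted transformation described above can be sketched in a few lines of plain Java. This is only an illustration of the data structure, not how Elasticsearch stores its index internally. The class and method names (InvertedIndexDemo, build, lookup, sampleForwardIndex) are made up for this example, and the Chinese terms are written as Unicode escapes so the file compiles regardless of source encoding: \u4e2d\u56fd is 中国, \u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd is 中华人民共和国, and \u7f8e\u56fd is 美国.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexDemo {
    // The three terms from the example tables, written as Unicode escapes.
    static final String ZHONGGUO = "\u4e2d\u56fd";                           // "China"
    static final String PRC = "\u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd";  // "People's Republic of China"
    static final String MEIGUO = "\u7f8e\u56fd";                             // "United States"

    /** Forward index: document ID -> terms the document contains. */
    static Map<Integer, List<String>> sampleForwardIndex() {
        Map<Integer, List<String>> forward = new LinkedHashMap<>();
        forward.put(1, Arrays.asList(ZHONGGUO, PRC));
        forward.put(2, Arrays.asList(ZHONGGUO, MEIGUO));
        return forward;
    }

    /** Inverts the forward index: term -> sorted set of document IDs. */
    static Map<String, Set<Integer>> build(Map<Integer, List<String>> forward) {
        Map<String, Set<Integer>> inverted = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> doc : forward.entrySet()) {
            for (String term : doc.getValue()) {
                inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return inverted;
    }

    /** Returns the IDs of the documents containing the term, or an empty set. */
    static Set<Integer> lookup(Map<String, Set<Integer>> inverted, String term) {
        return inverted.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> inverted = build(sampleForwardIndex());
        System.out.println(ZHONGGUO + " -> " + lookup(inverted, ZHONGGUO));
        System.out.println(MEIGUO + " -> " + lookup(inverted, MEIGUO));
    }
}
```

Looking up a term is a single map access, which is why a query does not have to scan every document.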

Integrating Elasticsearch with Spring Boot for search suggestions

pom.xml

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.1.RELEASE</version>
    </parent>

    <dependencies>

        <!-- Spring Boot Elasticsearch dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

        <!-- Spring Boot Web dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- JUnit -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>

        <!-- nlp-lang, used here for pinyin conversion -->
        <dependency>
            <groupId>org.nlpcn</groupId>
            <artifactId>nlp-lang</artifactId>
            <version>1.7.6</version>
        </dependency>
    </dependencies>

This sets up a simple Spring Boot project with Elasticsearch, plus the nlp-lang jar for pinyin conversion.

Annotated entity

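import java.io.Serializable;

import org.springframework.data.elasticsearch.annotations.CompletionField;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.core.completion.Completion;

@Document(indexName = "cityindex", type = "citysuggest")
public class CitySuggest implements Serializable {

  private static final long serialVersionUID = 1L;

  // Document id
  @Field(type = FieldType.Long)
  private Long id;

  // The keyword returned to the user
  @Field(type = FieldType.String)
  private String keyword;

  // Completion field that holds the suggestion inputs (keyword, full pinyin, pinyin initials)
  @CompletionField(analyzer = "ik_smart", searchAnalyzer = "ik_smart", payloads = false)
  private Completion suggesttag;
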

  public Long getId() {
    return id;
  }

  public void setId(Long id) {
    this.id = id;
  }

  public Completion getSuggesttag() {
    return suggesttag;
  }

  public void setSuggesttag(Completion suggesttag) {
    this.suggesttag = suggesttag;
  }

  public String getKeyword() {
    return keyword;
  }

  public void setKeyword(String keyword) {
    this.keyword = keyword;
  }
}

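Note: within one index, the id field must be annotated the same way, @Field(type=FieldType.Long), in every type; otherwise the suggestion field defined later will not work.
The suggestion field uses the Completion type, which corresponds to "type": "completion" in the official Elasticsearch documentation.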
This project uses Elasticsearch 2.3.3 with Spring Boot 1.5.1.RELEASE, the version pinned in pom.xml.

Tokenizing the text and building the suggestion index

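import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.admin.indices.analyze.AnalyzeAction;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequestBuilder;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse.AnalyzeToken;
import org.nlpcn.commons.lang.pinyin.Pinyin;
import org.springframework.data.elasticsearch.core.completion.Completion;

  public boolean updateSuggest(City city) {
    // Tokenize the city name with the ik_smart analyzer.
    AnalyzeRequestBuilder requestBuilder = new AnalyzeRequestBuilder(esClient, AnalyzeAction.INSTANCE, "cityindex", city.getCityname());
    requestBuilder.setAnalyzer("ik_smart");
    AnalyzeResponse response = requestBuilder.get();
    List<AnalyzeToken> tokens = response.getTokens();

    // Keep each distinct token that is at least two characters long.
    List<String> input = new ArrayList<String>();
    for (AnalyzeToken token : tokens) {
      String term = token.getTerm();
      if (term.length() >= 2 && !input.contains(term)) {
        input.add(term);
      }
    }

    // Build one suggestion document per keyword.
    List<CitySuggest> citySuggests = new ArrayList<CitySuggest>();
    for (int i = 0, j = input.size(); i < j; i++) {
      String keyword = input.get(i);

      // Inputs that should match: the keyword, its full pinyin, and its pinyin initials.
      List<String> itemInput = new ArrayList<String>();
      itemInput.add(keyword);
      itemInput.add(Pinyin.list2StringSkipNull(Pinyin.pinyin(keyword), ""));
      itemInput.add(Pinyin.list2StringSkipNull(Pinyin.firstChar(keyword), ""));

      // Completion takes its inputs as a String[].
      Completion completion = new Completion(itemInput.toArray(new String[0]));
      completion.setOutput(keyword);

      CitySuggest citySuggest = new CitySuggest();
      // Ids restart at 1 on every call, so a later call overwrites documents saved earlier under the same id.
      citySuggest.setId(i + 1L);
      citySuggest.setSuggesttag(completion);
      citySuggest.setKeyword(keyword);
      citySuggests.add(citySuggest);
    }

    for (CitySuggest citySuggest : citySuggests) {
      citySuggestRepository.save(citySuggest);
    }
    return true;
  }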
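
To make the keyword-processing step concrete, here is a minimal, dependency-free sketch of how one keyword expands into three completion inputs. It does not use nlp-lang: the PINYIN table is a tiny hard-coded stand-in that is assumed purely for illustration, and the names SuggestInputDemo, fullPinyin, initials, and inputsFor are hypothetical. Chinese characters are written as Unicode escapes (\u9762 is 面, \u819c is 膜, \u971c is 霜) so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuggestInputDemo {
    // Hypothetical stand-in for a pinyin library: character -> pinyin syllable.
    static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('\u9762', "mian");   // first character of the face-mask keyword
        PINYIN.put('\u819c', "mo");     // second character of the face-mask keyword
        PINYIN.put('\u971c', "shuang"); // second character of the face-cream keyword
    }

    /** Concatenates the pinyin of every known character; unknown characters are skipped. */
    static String fullPinyin(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            String py = PINYIN.get(c);
            if (py != null) {
                sb.append(py);
            }
        }
        return sb.toString();
    }

    /** Concatenates the first pinyin letter of every known character. */
    static String initials(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            String py = PINYIN.get(c);
            if (py != null) {
                sb.append(py.charAt(0));
            }
        }
        return sb.toString();
    }

    /** Returns the distinct inputs for one keyword: the keyword, full pinyin, initials. */
    static List<String> inputsFor(String term) {
        List<String> inputs = new ArrayList<>();
        inputs.add(term);
        for (String candidate : new String[] { fullPinyin(term), initials(term) }) {
            if (!candidate.isEmpty() && !inputs.contains(candidate)) {
                inputs.add(candidate);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        // The face-mask keyword expands to the keyword itself, "mianmo", and "mm".
        System.out.println(inputsFor("\u9762\u819c"));
    }
}
```

All three inputs point at the same output keyword, which is why typing either the Chinese text, the full pinyin, or just the initials can surface the same suggestion.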

Method for fetching suggestions

import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

  public List<String> suggest(String prefix) {
    CompletionSuggestionBuilder suggestion = SuggestBuilders.completionSuggestion("complete");
    suggestion.analyzer("ik_smart");
    // suggesttag is the completion field that holds the suggestion data
    suggestion.text(prefix).field("suggesttag");
    SearchResponse response = this.esClient.prepareSearch("cityindex").setTypes("citysuggest").addSuggestion(suggestion).execute().actionGet();

    List<String> suggestList = new ArrayList<String>();
    Suggest suggest = response.getSuggest();
    // No suggestion data: return an empty list rather than null
    if (suggest == null || suggest.getSuggestion("complete") == null) {
      return suggestList;
    }
    List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> list = suggest.getSuggestion("complete").getEntries();
    if (list == null) {
      return suggestList;
    }
    // Collect the text of every option from every entry
    for (Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option> e : list) {
      for (Suggest.Suggestion.Entry.Option option : e) {
        suggestList.add(option.getText().toString());
      }
    }
    return suggestList;
  }

Testing the endpoint

http://10.0.0.80:8080/api/city/suggest?content=mm

Sample response

{
    "result": 0,
    "msg": "获取数据成功",
    "nowtime": 1551255460698,
    "suggests": [
        "面膜"
    ]
}
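
Why does typing mm return 面膜? The observable behavior can be modeled with a small in-memory sketch: every input string points at an output keyword, and a query returns the distinct outputs of all inputs that start with the typed prefix. This is only a model under that assumption. Elasticsearch's completion suggester is implemented differently internally, and the names CompletionLookupDemo, index, suggest, and sample are hypothetical. Chinese keywords are written as Unicode escapes (\u9762\u819c is 面膜, \u9762\u971c is 面霜).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CompletionLookupDemo {
    // Sorted map: suggestion input -> output keywords registered for that input.
    private final NavigableMap<String, Set<String>> byInput = new TreeMap<>();

    /** Registers one output keyword under each of its inputs. */
    void index(String output, String... inputs) {
        for (String input : inputs) {
            byInput.computeIfAbsent(input, k -> new TreeSet<>()).add(output);
        }
    }

    /** Returns the distinct outputs whose inputs start with the prefix. */
    List<String> suggest(String prefix) {
        Set<String> outputs = new LinkedHashSet<>();
        // Inputs sharing a prefix are contiguous in sorted order, so stop at the first miss.
        for (Map.Entry<String, Set<String>> e : byInput.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            outputs.addAll(e.getValue());
        }
        return new ArrayList<>(outputs);
    }

    /** Two sample keywords, each with keyword, full pinyin, and initials as inputs. */
    static CompletionLookupDemo sample() {
        CompletionLookupDemo demo = new CompletionLookupDemo();
        demo.index("\u9762\u819c", "\u9762\u819c", "mianmo", "mm");     // face mask
        demo.index("\u9762\u971c", "\u9762\u971c", "mianshuang", "ms"); // face cream
        return demo;
    }

    public static void main(String[] args) {
        CompletionLookupDemo demo = sample();
        System.out.println(demo.suggest("mm"));   // only the face-mask keyword matches
        System.out.println(demo.suggest("mian")); // both keywords match on full pinyin
    }
}
```

Because the pinyin and initials are stored as extra inputs for the same output, the response contains the Chinese keyword even though the user typed Latin letters.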
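
Creating the mapping dynamically

The request below defines the same kind of completion field on a different index and type (gangyanindex/goodsuggest).
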
POST /gangyanindex/goodsuggest/_mapping
{
  "goodsuggest": {
    "properties": {
      "suggesttag": {
        "max_input_length": 50,
        "payloads": false,
        "analyzer": "ik_smart",
        "preserve_position_increments": true,
        "type": "completion",
        "preserve_separators": true
      },
      "id": {
        "index": "not_analyzed",
        "type": "string"
      },
      "keyword": {
        "type": "string"
      }
    }
  }
}


Link to this article: https://www.haomeiwen.com/subject/gjrfuqtx.html