17. Integrating ES into Django

Author: MononokeHime | Published 2018-06-14 13:05

How does ES implement search suggestions?



First, add a Completion field to the document type:

# es_operation.py
......
from elasticsearch_dsl import Completion, DocType

class JianshuType(DocType):
    suggest = Completion(analyzer="ik_max_word")
.......

However, passing ik_max_word directly like this raises an error, so we define our own analyzer class to work around it:

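# es_operation.py
......
from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer

class CustomAnalyzer(_CustomAnalyzer):
    def get_analysis_definition(self):
        # Return an empty definition so that no analyzer settings are emitted
        # for ik_max_word, which the IK plugin already provides on the server.
        return {}

# Refer to the existing ik_max_word analyzer by name, with a lowercase filter
ik_analyzer = CustomAnalyzer('ik_max_word', filter=['lowercase'])

class JianshuType(DocType):  # custom document class that inherits from DocType
    suggest = Completion(analyzer=ik_analyzer, search_analyzer=ik_analyzer)
.......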
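For orientation, the field definition above should correspond to roughly the following mapping fragment. This is only a sketch written as a plain Python dict, not output captured from elasticsearch_dsl: the helper name completion_mapping is hypothetical, and the mapping actually generated may differ between library versions.

```python
import json


def completion_mapping(analyzer_name):
    """Build the mapping fragment we would expect for the suggest field.

    Hypothetical helper for illustration; it does not call elasticsearch_dsl.
    """
    return {
        "suggest": {
            "type": "completion",
            # Both index-time and search-time analysis refer to the analyzer by name.
            "analyzer": analyzer_name,
            "search_analyzer": analyzer_name,
        }
    }


mapping = completion_mapping("ik_max_word")
print(json.dumps(mapping, indent=2))
```

Since the mapping refers to ik_max_word only by name, the IK analysis plugin has to be installed on the Elasticsearch node for the index to be created.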

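So how does each crawled item get turned into a suggest value? In the pipeline we define a function that generates suggestions from the item's fields (title and subjects, each with its own weight):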

# pipeline.py
from jianshu.es_operation import JianshuType
from elasticsearch_dsl.connections import connections

# Get the connection that JianshuType is configured to use
es = connections.get_connection(JianshuType._doc_type.using)

def gen_suggests(index, info_tuple):
    # Build the list of search suggestions from (text, weight) pairs
    user_words = set()  # words already emitted for an earlier field
    suggests = []
    for text, weight in info_tuple:
        if text:
            # Call the ES analyze API to tokenize the string
            words = es.indices.analyze(
                index=index,
                params={'filter': ["lowercase"]},
                body={'text': text, 'analyzer': "ik_max_word"},
            )
            # Keep tokens longer than one character
            analyzed_words = set(r["token"] for r in words['tokens'] if len(r["token"]) > 1)
            new_words = analyzed_words - user_words
        else:
            new_words = set()

        if new_words:
            # Remember these words so later (lower-weight) fields skip them
            user_words.update(new_words)
            suggests.append({"input": list(new_words), "weight": weight})

    return suggests

class JianshuESPipeline(object):

    def process_item(self, item, spider):
        jianshu = JianshuType()
        jianshu.title = item["title"]
        ......
        jianshu.suggest = gen_suggests(JianshuType._doc_type.index, ((jianshu.title, 10), (jianshu.subjects, 7)))
        jianshu.save()
        return item
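To see what the suggestion list looks like without a running cluster, the same logic can be sketched with a stand-in for the analyze call. Everything here is an assumption for illustration: fake_analyze is a hypothetical replacement for es.indices.analyze that just splits on whitespace and lowercases, whereas the real ik_max_word analyzer segments Chinese text very differently. The sketch records the words it has already used, so a word from a higher-weight field is not repeated under a lower weight.

```python
def fake_analyze(text):
    """Hypothetical stand-in for es.indices.analyze: whitespace split + lowercase."""
    return {"tokens": [{"token": word.lower()} for word in text.split()]}


def gen_suggests(info_tuple, analyze=fake_analyze):
    """Turn (text, weight) pairs into a list of completion suggest entries."""
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if not text:
            continue
        # Keep tokens longer than one character, as in the pipeline above.
        tokens = {t["token"] for t in analyze(text)["tokens"] if len(t["token"]) > 1}
        new_words = tokens - used_words
        if new_words:
            used_words |= new_words
            # sorted() only makes the output deterministic for this example.
            suggests.append({"input": sorted(new_words), "weight": weight})
    return suggests


result = gen_suggests((
    ("Django Elasticsearch tutorial", 10),  # title, higher weight
    ("Elasticsearch search a", 7),          # subjects, lower weight
))
print(result)
# [{'input': ['django', 'elasticsearch', 'tutorial'], 'weight': 10},
#  {'input': ['search'], 'weight': 7}]
```

Note that "elasticsearch" appears only under weight 10, and the single-character token "a" is dropped.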

Run the crawler, then open any document in head (the elasticsearch-head plugin) and check the value of its suggest field.
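Once documents carry a suggest field, the Django side can ask ES for completions. The following is a minimal sketch of the request body for the completion suggester, assuming Elasticsearch 5.x or later (where suggestions are requested through the search endpoint). The function name build_suggest_body, the suggestion name title_suggest, and the default size and fuzziness values are all hypothetical choices, not taken from the original project.

```python
import json


def build_suggest_body(prefix, field="suggest", size=10, fuzziness=2):
    """Build a completion-suggester request body for the given user input."""
    return {
        "suggest": {
            "title_suggest": {           # arbitrary name for this suggestion
                "prefix": prefix,        # what the user has typed so far
                "completion": {
                    "field": field,      # the Completion field defined earlier
                    "size": size,        # maximum number of suggestions
                    "fuzzy": {"fuzziness": fuzziness},  # tolerate small typos
                },
            }
        }
    }


body = build_suggest_body("dj")
print(json.dumps(body, indent=2))
```

A Django view could pass this body to the client's search call and read the options back from the title_suggest entry of the response.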
