How does ES implement search suggestions?
First, add a Completion field to the document type.
# es_operation.py
......
from elasticsearch_dsl import Completion

class JianshuType(DocType):
    suggest = Completion(analyzer="ik_max_word")
    .......
However, passing ik_max_word directly as the analyzer name raises an error, so we define our own analyzer class to work around it.
# es_operation.py
......
from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer

class CustomAnalyzer(_CustomAnalyzer):
    def get_analysis_definition(self):
        # ik_max_word is provided by the IK plugin, so emit no analysis definition
        return {}

ik_analyzer = CustomAnalyzer('ik_max_word', filter=['lowercase'])

class JianshuType(DocType):  # custom class that inherits from DocType
    suggest = Completion(analyzer=ik_analyzer, search_analyzer=ik_analyzer)
    .......
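How does each record scraped by the crawler become a suggest value? In the pipeline we define a function that generates suggestions from selected fields (title and subjects, each with its own weight).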
# pipeline.py
from jianshu.es_operation import JianshuType
from elasticsearch_dsl.connections import connections

es = connections.get_connection(JianshuType._doc_type.using)  # get the connection used by JianshuType

def gen_suggests(index, info_tuple):
    # Build the list of search suggestions from (text, weight) pairs
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if text:
            # Call the ES analyze API to tokenize the string
            words = es.indices.analyze(index=index, params={'filter': ["lowercase"]}, body={'text': text, 'analyzer': "ik_max_word"})
            # Keep only tokens longer than one character
            analyzed_words = set([r["token"] for r in words['tokens'] if len(r["token"]) > 1])
            new_words = analyzed_words - used_words
        else:
            new_words = set()
        if new_words:
            # Remember these words so a lower-weight field does not repeat them
            used_words.update(new_words)
            suggests.append({"input": list(new_words), "weight": weight})
    return suggests

class JianshuESPipeline(object):
    def process_item(self, item, spider):
        jianshu = JianshuType()
        jianshu.title = item["title"]
        ......
        jianshu.suggest = gen_suggests(JianshuType._doc_type.index, ((jianshu.title, 10), (jianshu.subjects, 7)))
        jianshu.save()
        return item
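To see how the weights and the deduplication interact without a running cluster, here is a minimal offline sketch of the same idea. The `fake_analyze` function is a hypothetical stand-in for `es.indices.analyze` (it only lowercases and splits on whitespace, which is not how ik_max_word tokenizes), and `gen_suggests_offline` is an illustrative name, not part of the project.

```python
def fake_analyze(text):
    """Hypothetical stand-in for es.indices.analyze: lowercase, split on whitespace."""
    return [{"token": tok} for tok in text.lower().split()]


def gen_suggests_offline(info_tuple, analyze=fake_analyze):
    """Build completion inputs from (text, weight) pairs, highest weight first."""
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if not text:
            continue  # skip empty or missing fields
        # Keep only tokens longer than one character
        tokens = {r["token"] for r in analyze(text) if len(r["token"]) > 1}
        new_words = tokens - used_words
        if new_words:
            # Record the words so a later, lower-weight field does not repeat them
            used_words |= new_words
            suggests.append({"input": sorted(new_words), "weight": weight})
    return suggests


example = gen_suggests_offline((("Python Scrapy Tutorial", 10), ("python crawler", 7)))
print(example)
```

Because "python" already appears in the title, it is suggested with weight 10 only; the second field contributes just "crawler" with weight 7.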
Run the crawler, then open any document in elasticsearch-head and inspect its suggest field to confirm that the suggestions were generated.
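Once documents carry a suggest field, the suggestions can be read back with a completion suggester query. The sketch below only builds the request body, assuming Elasticsearch 5 or later (where suggesters are sent through the search API). The helper name `build_suggest_body` and the suggester name `title_suggest` are hypothetical choices for illustration.

```python
def build_suggest_body(prefix, field="suggest", size=10, fuzziness=None):
    """Build a completion-suggester request body for the given prefix."""
    completion = {"field": field, "size": size}
    if fuzziness is not None:
        # Optional: tolerate small typos in the user's input
        completion["fuzzy"] = {"fuzziness": fuzziness}
    return {
        "suggest": {
            "title_suggest": {  # hypothetical suggester name
                "prefix": prefix,
                "completion": completion,
            }
        }
    }


body = build_suggest_body("pyth", fuzziness=1)
print(body)
# With a live connection this could be sent as, for example:
#   es.search(index=JianshuType._doc_type.index, body=body)
```

The matching options would then be expected under the `suggest` key of the response, ordered by the weights stored at index time.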