17. Integrating ES into Django

Author: MononokeHime | Published: 2018-06-14 13:05


How does ES implement search suggestions?


First, add a Completion field to the document type:

# es_operation.py
......
from elasticsearch_dsl import DocType, Completion

class JianshuType(DocType):
    suggest = Completion(analyzer="ik_max_word")
.......

However, passing ik_max_word to the field directly by name raises an error, so we define our own analyzer class to avoid it:

# es_operation.py
......
from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer


class CustomAnalyzer(_CustomAnalyzer):
    def get_analysis_definition(self):
        # ik_max_word is provided by the IK analysis plugin, so return an
        # empty definition instead of a custom analyzer definition.
        return {}


ik_analyzer = CustomAnalyzer('ik_max_word', filter=['lowercase'])


class JianshuType(DocType):  # custom document type that inherits from DocType
    suggest = Completion(analyzer=ik_analyzer, search_analyzer=ik_analyzer)
.......


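How does each record scraped by the crawler become a suggest value? In the pipeline we define a function that generates suggestions from the fields (title and subjects, each with its own weight):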

# pipeline.py
from jianshu.es_operation import JianshuType
from elasticsearch_dsl.connections import connections
es = connections.get_connection(JianshuType._doc_type.using)  # get the ES connection used by JianshuType

def gen_suggests(index, info_tuple):
    # Build the list of search suggestions from (text, weight) pairs
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if text:
            # Call the ES analyze API to tokenize the string
            words = es.indices.analyze(index=index, params={'filter': ["lowercase"]}, body={'text': text, 'analyzer': "ik_max_word"})
            analyzed_words = set([r["token"] for r in words['tokens'] if len(r["token"]) > 1])
            # Skip words already emitted for an earlier field
            new_words = analyzed_words - used_words
            used_words.update(new_words)
        else:
            new_words = set()

        if new_words:
            suggests.append({"input": list(new_words), "weight": weight})

    return suggests

class JianshuESPipeline(object):

    def process_item(self,item,spider):
        jianshu = JianshuType()
        jianshu.title = item["title"]
        ......
        jianshu.suggest = gen_suggests(JianshuType._doc_type.index,((jianshu.title,10),(jianshu.subjects,7)))
        jianshu.save()
        return item
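To see what shape the suggest value takes without a running ES node, here is a minimal sketch of the same logic. It assumes that the word set is meant to skip words already emitted for a higher-weight field. The `analyze` parameter and the `fake_analyze` tokenizer are hypothetical stand-ins for `es.indices.analyze` with `ik_max_word`; they are not part of the project.

```python
def gen_suggests(analyze, info_tuple):
    """Build completion-suggester inputs from (text, weight) pairs.

    `analyze` is a hypothetical callable standing in for
    es.indices.analyze: it takes a string and returns a list of tokens.
    """
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if not text:
            continue
        # Keep only tokens longer than one character, as in the pipeline.
        analyzed_words = {token for token in analyze(text) if len(token) > 1}
        new_words = analyzed_words - used_words
        if new_words:
            used_words |= new_words
            # sorted() only makes the output deterministic for this demo.
            suggests.append({"input": sorted(new_words), "weight": weight})
    return suggests


def fake_analyze(text):
    # Stand-in tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


suggests = gen_suggests(
    fake_analyze,
    (("Django Elasticsearch search", 10), ("Python Django", 7)),
)
print(suggests)
# [{'input': ['django', 'elasticsearch', 'search'], 'weight': 10}, {'input': ['python'], 'weight': 7}]
```

"django" appears only in the weight-10 entry because the first field already claimed it.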

Run the crawler, then open any document in head (the elasticsearch-head plugin) and check the value of its suggest field.
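Once the suggest field is populated, the Django side can query it with a completion suggester. The following is only a sketch under assumptions: the suggester name `my_suggest` and the helpers `build_suggest_body` and `extract_titles` are hypothetical, and the sample response is hand-written to illustrate the response shape rather than captured from a real cluster.

```python
def build_suggest_body(prefix, size=10):
    """Build a search body that asks the completion suggester for matches.

    "suggest" is the Completion field defined on JianshuType;
    "my_suggest" is an arbitrary label chosen for this sketch.
    """
    return {
        "suggest": {
            "my_suggest": {
                "prefix": prefix,
                "completion": {"field": "suggest", "size": size},
            }
        },
        "_source": ["title"],
    }


def extract_titles(response):
    """Collect the title of every suggested document from a response dict."""
    titles = []
    for entry in response.get("suggest", {}).get("my_suggest", []):
        for option in entry.get("options", []):
            titles.append(option["_source"]["title"])
    return titles


# Hand-written response used only to exercise extract_titles.
sample_response = {
    "suggest": {
        "my_suggest": [
            {
                "text": "dj",
                "offset": 0,
                "length": 2,
                "options": [
                    {"text": "django", "_score": 10.0, "_source": {"title": "Django basics"}},
                    {"text": "django", "_score": 7.0, "_source": {"title": "Django and ES"}},
                ],
            }
        ]
    }
}

body = build_suggest_body("dj", size=5)
titles = extract_titles(sample_response)
print(titles)
```

In a view, the body would be passed to the client, for example `es.search(index=..., body=body)`, and the returned titles sent back to the search box as JSON.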