Building a search service quickly with Jina

Author: 飘涯 | Published 2022-09-28 14:35

A neural search tool

Framework-specific syntax

Executor

Write the custom logic for your own Flow:

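import numpy as np
import torch
from docarray import DocumentArray
from jina import Executor, requests

class MyExecutor(Executor):
    # Default handler: serves every endpoint that has no dedicated handler.
    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        # Assumes the request carries at least two Documents.
        docs[0].text = 'hello, world!'
        docs[1].text = 'goodbye, world!'

    # Runs only for requests sent to /crunch-numbers.
    @requests(on='/crunch-numbers')
    def bar(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            # Attach a random 10x2 tensor to each Document.
            doc.tensor = torch.tensor(np.random.random([10, 2]))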
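
To make the routing rule concrete, here is a toy model in plain Python of how a bare @requests handler and an @requests(on=...) handler could be dispatched. It is only a sketch and not Jina's implementation: ToyExecutor, handle, DEFAULT and the dict-based documents are all hypothetical names made up for illustration.

```python
# Toy model of endpoint routing; this is NOT Jina's implementation.
from typing import Callable, Dict, List, Optional

DEFAULT = "__default__"  # hypothetical key for the catch-all handler


def requests(func: Optional[Callable] = None, *, on: Optional[str] = None):
    """Mark a method as a handler, either bare (@requests) or with on=..."""
    def mark(f: Callable) -> Callable:
        f._endpoint = on or DEFAULT
        return f
    return mark(func) if func is not None else mark


class ToyExecutor:
    def __init__(self) -> None:
        # Collect every marked method into an endpoint -> handler table.
        self._routes: Dict[str, Callable] = {}
        for name in dir(self):
            attr = getattr(self, name)
            endpoint = getattr(attr, "_endpoint", None)
            if endpoint is not None:
                self._routes[endpoint] = attr

    def handle(self, endpoint: str, docs: List[dict]) -> List[dict]:
        # Fall back to the catch-all handler when no exact match exists.
        handler = self._routes.get(endpoint, self._routes.get(DEFAULT))
        if handler is not None:
            handler(docs)
        return docs


class MyToyExecutor(ToyExecutor):
    @requests
    def foo(self, docs, **kwargs):
        for doc in docs:
            doc["text"] = "hello, world!"

    @requests(on="/crunch-numbers")
    def bar(self, docs, **kwargs):
        for doc in docs:
            doc["numbers"] = [0.0] * 3


executor = MyToyExecutor()
indexed = executor.handle("/index", [{}])             # no dedicated handler, so foo runs
crunched = executor.handle("/crunch-numbers", [{}])   # dedicated handler, so bar runs
```

A request to any endpoint without its own handler ends up in foo, while /crunch-numbers goes to bar only.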

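Flow

A Flow exposes an API endpoint with clearly defined inputs and outputs, which makes it quite flexible.

A single project can be built from several Flows working together.

Finished components can be published to the Hub and loaded quickly from there.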

Hub

JCloud

Examples:

01: A search system

Overall architecture

Input: movie title, description, genre
Output: a list of movies


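Workflow

Download the dataset
Load the dataset into a DocumentArray
Preprocess the DocumentArray (for example tokenization and sentence splitting), then generate vector representations.
Build the index
Encode the input, find the best matches in the index, and return them through the API.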
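
The steps above can be sketched without any framework. In the toy below, bag-of-words counts stand in for neural embeddings and a plain list stands in for the index; the three-movie dataset and the names encode, build_index and search are hypothetical, chosen only to mirror the workflow.

```python
# Toy end-to-end search: bag-of-words vectors stand in for neural embeddings.
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical three-movie dataset (title, description, genre).
MOVIES = [
    {"title": "Space Voyage", "description": "astronauts explore a distant planet", "genre": "sci-fi"},
    {"title": "Kitchen Wars", "description": "two chefs compete in a cooking contest", "genre": "comedy"},
    {"title": "Deep Blue", "description": "divers explore a sunken ship", "genre": "adventure"},
]


def encode(text: str) -> Counter:
    """Preprocess (lowercase, whitespace tokenization) and build a sparse vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(movies: List[dict]) -> List[Tuple[dict, Counter]]:
    """Encode every movie once and keep (movie, vector) pairs as the index."""
    return [(m, encode(" ".join([m["title"], m["description"], m["genre"]]))) for m in movies]


def search(index: List[Tuple[dict, Counter]], query: str, top_k: int = 2) -> List[str]:
    """Encode the query and return the titles of the closest movies."""
    q = encode(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [movie["title"] for movie, _ in ranked[:top_k]]


index = build_index(MOVIES)
results = search(index, "chefs cooking comedy", top_k=1)
```

In a real Jina setup the encode step would be a model-backed Executor and the index a dedicated indexer, but the shape of the pipeline stays the same.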

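02: Building a PDF search system

Workflow

Prepare the PDF data
Parse the PDFs; set up a Flow for PDF parsing
Process the text: sentence splitting and tokenization
Embedding
Build the index
Build the Flow for the query input; run the matching and return the nearest entries from the index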
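
The Flow in this section passes traversal paths such as "@c" and "@cc" to its Executors. The toy below shows the nesting those paths refer to: segmenting puts text chunks under the root document, and sentence splitting puts sentence chunks under each of those. Node, segment, sentencize and at_depth are hypothetical names, and the one-chunk-per-text-block layout is an assumption about what a PDF segmenter produces.

```python
# Toy nested document; a simplified stand-in for docarray's Document and chunks.
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str = ""
    chunks: List["Node"] = field(default_factory=list)


def segment(text_blocks: List[str]) -> Node:
    """Root document with one chunk per extracted text block (level "@c")."""
    return Node(chunks=[Node(text=block) for block in text_blocks])


def sentencize(root: Node) -> None:
    """Split every first-level chunk into sentence chunks (level "@cc")."""
    for block in root.chunks:
        parts = re.split(r"(?<=[.!?。！？])\s*", block.text)
        block.chunks = [Node(text=p.strip()) for p in parts if p.strip()]


def at_depth(root: Node, depth: int) -> List[Node]:
    """Depth 0 roughly matches "@r", depth 1 "@c", depth 2 "@cc"."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in node.chunks]
    return level


doc = segment(["Jina builds search. It uses Flows.", "Executors do the work."])
sentencize(doc)
sentences = [node.text for node in at_depth(doc, 2)]
```

This is why the sentencizer works on "@c" (it reads the text chunks) while the encoder and indexer work on "@cc" (they handle the sentences).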

from docarray import DocumentArray
from jina import Flow
docs = DocumentArray.from_files("pdf_data/*.pdf", recursive=True)
flow = (
    Flow()
    .add(
        uses="jinahub://PDFSegmenter",
        install_requirements=True,
        name="segmenter"
    )
    .add(
        uses="jinahub://SpacySentencizer",
        uses_with={"traversal_paths": "@c"},
        install_requirements=True,
        name="sentencizer",
    )
    .add(
        uses="jinahub://TransformerTorchEncoder",
        uses_with={"traversal_paths": "@cc"},
        install_requirements=True,
        name="encoder"
    )
    .add(
        uses="jinahub://SimpleIndexer",
        uses_with={"traversal_right": "@cc"},
        install_requirements=True,
        name="indexer"
    )
)
flow.plot()
with flow:
    docs = flow.index(docs, show_progress=True)

# Build the search Flow
search_flow = (
    Flow()
    .add(
        uses="jinahub://TransformerTorchEncoder",
        name="encoder"
    )
    .add(
        uses="jinahub://SimpleIndexer",
        uses_with={"traversal_right": "@cc"},
        name="indexer"
    )
)

search_term = "一种基于词向量的hownet表示方法"

from docarray import Document

query_doc = Document(text=search_term)

with search_flow:
    results = search_flow.search(query_doc, show_progress=True, return_results=True)

for match in results[0].matches:
    print(match.text)
    print(match.scores["cosine"].value)
    print("---")
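
A note on reading the printed scores: my understanding is that docarray reports the "cosine" score as a distance (1 minus the cosine similarity), so smaller values mean closer matches. Treat that as an assumption and check it against your installed version. The toy below only shows how such a distance behaves; cosine_distance and the sample vectors are hypothetical.

```python
# Cosine distance = 1 - cosine similarity; 0 means same direction, 2 means opposite.
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
# Best match first: sort by ascending distance.
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
```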

Original article: https://www.haomeiwen.com/subject/agscartx.html