Survey | A Briefing on NLP Cloud Services


Author: Cookie_JL | Published 2017-05-11 23:09

Breaking Down the NLP Steps


Source article:
Overview of Artificial Intelligence and Role of Natural Language Processing in Big Data
by Jagreet Kaur

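Comment: The source article presents these steps in a somewhat scattered way, so I have reorganized them according to my own understanding.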

Step 1. Syntax Analysis

1.1 Sentence Segmentation
1.2 Tokenization
1.3 Stemming / Lemmatization
1.4 Part-of-Speech Tagging
1.5 Parsing
1.6 Named Entity Recognition
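
To make steps 1.1 to 1.3 concrete, here is a minimal sketch using only the Python standard library. The function names and the suffix rules are illustrative assumptions, not taken from the source article; production systems rely on trained models or much richer rule sets.

```python
import re

def split_sentences(text):
    """Step 1.1: naive segmentation on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Step 1.2: split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def naive_stem(token):
    """Step 1.3: crude suffix stripping (hypothetical rules, for illustration only)."""
    lowered = token.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "is" survive.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered

text = "The cats jumped. They are sleeping now!"
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print(tokens, [naive_stem(t) for t in tokens])
```

Steps 1.4 to 1.6 (tagging, parsing, entity recognition) generally need statistical models, which is why they are usually consumed as a service or through a trained library rather than written by hand.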

Step 2. Semantic Analysis / Natural Language Understanding

2.1 Word Sense Understanding
2.2 Ambiguity Resolving
  2.2.1 Lexical Ambiguity
  2.2.2 Syntactic Ambiguity
  2.2.3 Semantic Ambiguity
  2.2.4 Anaphoric Ambiguity

Step 3. Pragmatics Analysis (Intent Understanding)

Step 4. Natural Language Generation

4.1 Text Planning
4.2 Sentence Planning
4.3 Realization
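
The three generation stages can be illustrated with a toy pipeline. This is only a sketch under my own assumptions: the fact format and the function names are hypothetical, and each stage is reduced to its simplest form (selecting content, grouping it into sentence specifications, and producing surface text).

```python
def plan_text(facts, topic):
    """Text planning: select the facts that are relevant to the topic."""
    return [f for f in facts if f["subject"] == topic]

def plan_sentences(selected):
    """Sentence planning: merge facts about one subject into a single sentence spec."""
    if not selected:
        return []
    return [{
        "subject": selected[0]["subject"],
        "predicates": [f["predicate"] for f in selected],
    }]

def realize(specs):
    """Realization: turn each sentence spec into a surface string."""
    sentences = []
    for spec in specs:
        preds = spec["predicates"]
        if len(preds) > 1:
            body = ", ".join(preds[:-1]) + " and " + preds[-1]
        else:
            body = preds[0]
        sentences.append(f"{spec['subject']} {body}.")
    return sentences

facts = [
    {"subject": "The API", "predicate": "detects the language"},
    {"subject": "The API", "predicate": "returns a confidence score"},
    {"subject": "The SDK", "predicate": "wraps the API"},
]
print(realize(plan_sentences(plan_text(facts, "The API"))))
```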



A Brief Look at Product Definitions from Major NLP Service Providers


1. Google Cloud Platform

1.1 Natural Language API

Notes: Audio input is supported only in combination with the Speech API.

  • Syntax Analysis
    Definition: Extract tokens and sentences, identify parts of speech (POS) and create dependency parse trees for each sentence.
    That is, syntax analysis, roughly covering steps 1.1 to 1.5 above.
  • Entity Analysis
    Definition: Inspects the given text for known entities (proper nouns such as public figures and landmarks; common nouns such as restaurant and stadium) and returns information about those entities.
    That is, entity recognition.
  • Sentiment Analysis
    Definition: Understand the overall sentiment expressed in a block of text. Identify the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral.
    That is, sentiment analysis.
  • Entity Sentiment Analysis
    Definition: Understand the sentiment for each mention of an entity within a block of text.
    That is, sentiment analysis at the entity level.
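
As a rough illustration of how the four features might be requested together, the sketch below builds a JSON request body without sending it. The field names (`document`, `features`, `encodingType`, and the `extract...` flags) reflect my understanding of the `documents:annotateText` REST method and should be treated as assumptions to check against the reference documentation linked below; the helper name `build_annotate_request` is hypothetical.

```python
import json

# Maps the four features listed above to the flags that, as assumed here,
# the documents:annotateText REST method accepts.
FEATURE_FLAGS = {
    "syntax": "extractSyntax",
    "entities": "extractEntities",
    "sentiment": "extractDocumentSentiment",
    "entity_sentiment": "extractEntitySentiment",
}

def build_annotate_request(text, features):
    """Build (but do not send) a request body asking for the given features."""
    unknown = [f for f in features if f not in FEATURE_FLAGS]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "features": {FEATURE_FLAGS[f]: True for f in features},
        "encodingType": "UTF8",
    }

payload = build_annotate_request(
    "Google Cloud is hosting a meetup in Tokyo.",
    ["syntax", "entities"],
)
print(json.dumps(payload, indent=2))
```

Sending the body additionally requires an authenticated HTTP POST, which is deliberately left out here.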

Reference:
https://cloud.google.com/natural-language/docs/

1.2 Cloud Translation API

  • Text Translation
  • Language Detection

Comment: Machine translation is really an application of NLP. It is listed briefly here to show how each vendor has laid out its product line.

Reference:
https://cloud.google.com/translate/

2. Microsoft Azure

2.1 Language Understanding Intelligent Service

Definition: Enable developers to build smart applications that can understand human language and react accordingly to user requests. Extract intents and entities that correspond to activities in the client application's logic.
That is, intent plus entity analysis.

Reference:
https://azure.microsoft.com/en-us/services/cognitive-services/language-understanding-intelligent-service/

2.2 Text Analytics API

  • Sentiment Analysis
    Definition: Extract features from POS tags and embedded words of the text, then use classification techniques to produce a score that reflects people's attitude.
    That is, sentiment analysis.
  • Key Phrase Extraction
    Definition: Extract key phrases to quickly identify the main points.

Notes: This technology comes from the NLP toolkit of Microsoft Office.

  • Language Detection
    Definition: The API returns the detected language and a numeric score to indicate the certainty. 120 languages are supported.

Reference:
https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/

2.3 Linguistic Analysis API

  • Sentence separation and tokenization
    Definition: Break the text into sentences and tokens.
    That is, sentence segmentation and tokenization.
  • Part-of-Speech Tagging
    That is, part-of-speech tagging.
  • Constituency Parsing
    Definition: Identify the phrases in the text. A phrase is a sequence of words that can be moved together or replaced as a whole while the sentence remains fluent and grammatical.
    That is, syntactic parsing.

Reference:
https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/

2.4 Bing Spell Check API

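Definition: Help users correct spelling errors, recognize the difference among names, brand names, and slang, as well as understand homophones as they're typing.

Notes: Unlike the conventional spell checker in Microsoft Word, Bing uses a third-generation system. Its updates and growth do not depend on dictionaries and the people who maintain them. Instead, it trains its algorithms with machine learning and statistical machine translation over a large volume of web searches and documents.

The API has two modes, Proof and Spell. Proof catches a higher share of grammatical errors but supports only American English.

Reference: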
https://azure.microsoft.com/en-us/services/cognitive-services/spell-check/

2.5 Microsoft Translator API

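Notes: At present, this API is still based on statistical machine translation (SMT). The technology has plateaued in terms of performance, and further breakthroughs in translation quality are hard to achieve. Translation based on deep neural networks (DNN) is waiting in the wings, but as of August 27 it was available only to users of the Microsoft Translator Speech API. Currently, Skype Translator uses the DNN translation engine, while Bing Translator uses the SMT translation engine.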

  • Text Translation API
  • Speech Translation API
    Definition: Transcribe conversational speech from one language into text of another language. The API also integrates text-to-speech capabilities to speak the translated text back.

Notes: The translation process includes using ASR to recognize the text of the source-language audio. On top of ASR, Microsoft applies a new technology called TrueText to clean up the recognized text. TrueText can filter out filler words, coughs, and profanity, and can also correct punctuation and capitalization.

  • Collaborative Translation Framework Reporting API
    Definition: Allows users to recommend alternative translations to those provided by Translator's automatic translation engine.
  • *Microsoft Translator Hub
    Definition: Lets developers customize a language pair for a specific domain (area of terminology and style) or build automatic translation for a language that is not yet covered by the Microsoft Translator API.
    It is an extension of the Microsoft Translator API and service.

Reference:
https://azure.microsoft.com/en-us/services/cognitive-services/translator-text-api/

2.6 Web Language Model API

Definition: Automate a variety of standard natural language processing tasks.

  • Word Breaking
  • Joint Probabilities
    Definition: Calculate how often a particular sequence of words appears together.
  • Conditional Probabilities
    Definition: Given a sequence of words, calculate how often a particular word tends to follow.
  • Next word completions
    Definition: Given a sequence of words, get the list of words most likely to follow.
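
The last three operations can be illustrated with a toy bigram model. This is a sketch of the underlying idea only, built on a tiny made-up corpus; it says nothing about how the service itself is implemented, and the function names are hypothetical.

```python
from collections import Counter

# Tiny made-up corpus; a real language model is trained on web-scale text.
corpus = "the cat sat on the mat . the cat ate the fish .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total_bigrams = sum(bigrams.values())

def joint_probability(first, second):
    """Relative frequency of the pair (first, second) among all adjacent pairs."""
    return bigrams[(first, second)] / total_bigrams

def conditional_probability(first, second):
    """Estimate P(second | first) as count(first, second) / count(first)."""
    if unigrams[first] == 0:
        return 0.0
    return bigrams[(first, second)] / unigrams[first]

def next_word_completions(first, k=3):
    """Up to k words seen most often immediately after `first`."""
    followers = Counter({w2: c for (w1, w2), c in bigrams.items() if w1 == first})
    return [word for word, _ in followers.most_common(k)]

print(joint_probability("the", "cat"))
print(conditional_probability("the", "cat"))
print(next_word_completions("the"))
```

In this corpus "the" occurs four times and is followed by "cat" twice, so the conditional estimate for "cat" after "the" is 0.5, and "cat" is the top completion.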

Reference:
https://azure.microsoft.com/en-us/services/cognitive-services/web-language-model/
