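Overview
The NLC (Natural Language Classifier) service uses machine learning algorithms to return the best-matching predefined classes for short text input. You create and train a classifier that links predefined classes to example texts, so that the service can apply those classes to new input.
Authentication
The service uses HTTP Basic Authentication, i.e. a username and password.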
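For readers calling the API from code rather than curl, here is a minimal sketch of what `-u "USERNAME":"PASSWORD"` amounts to on the wire: an `Authorization` header holding the Base64 encoding of `username:password`. The function name `basic_auth_header` is made up for this illustration.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication."""
    # Basic auth joins the credentials with a colon and Base64-encodes the result.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# USERNAME / PASSWORD are the same placeholders used in the curl commands.
header = basic_auth_header("USERNAME", "PASSWORD")
```

Base64 is an encoding, not encryption, so the credentials are only protected because the endpoint is served over HTTPS.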
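Create a classifier
curl command (the trailing ^ is the Windows cmd line-continuation character; use \ in a Unix shell)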
curl -u "USERNAME":"PASSWORD" ^
-F training_data=@weather_data_train.csv ^
-F training_metadata="{\"language\":\"en\",\"name\":\"atp-weather\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"
Response
{
"classifier_id" : "359f3fx202-nlc-223328",
"name" : "atp-weather",
"language" : "en",
"created" : "2017-07-25T03:20:16.451Z",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"status" : "Training",
"status_description" : "The classifier instance is in its training phase, not yet ready to accept classify requests"
}
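The two form fields in the create command can also be prepared in code, which avoids the hand-escaped quotes in `training_metadata`. The sketch below is only an illustration: the helper names are invented, the example rows are hypothetical (the real weather_data_train.csv is not shown in this post), and the one-`text,class`-pair-per-row CSV layout is an assumption, not something this post confirms.

```python
import csv
import io
import json


def build_training_metadata(name: str, language: str = "en") -> str:
    """Build the JSON string passed as the training_metadata form field."""
    return json.dumps({"language": language, "name": name})


def build_training_csv(examples) -> str:
    """Serialize (text, class_name) pairs as CSV, one pair per row (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, class_name in examples:
        # csv.writer quotes a field automatically when it contains a comma.
        writer.writerow([text, class_name])
    return buffer.getvalue()


metadata = build_training_metadata("atp-weather")

# Hypothetical rows, for illustration only.
sample_csv = build_training_csv([
    ("How hot will it be today?", "temperature"),
    ("Will it rain, or just be cloudy?", "conditions"),
])
```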
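**Note: as the response shows, the classifier is still in the Training state and cannot be used yet. Its status can be checked with the commands that follow.**
List classifiers
curl command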
curl -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"
Response
{
"classifiers" : [ {
"classifier_id" : "359f3fx202-nlc-223328",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"name" : "atp-weather",
"language" : "en",
"created" : "2017-07-25T03:20:16.451Z"
} ]
}
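Every later call needs the classifier_id, so a script would usually pull it out of this list by name. A minimal sketch, using the response above as input; `find_classifier_id` is a hypothetical helper, not part of any IBM SDK.

```python
import json

# The list response shown above, pasted in as a string.
LIST_RESPONSE = """
{
  "classifiers" : [ {
    "classifier_id" : "359f3fx202-nlc-223328",
    "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
    "name" : "atp-weather",
    "language" : "en",
    "created" : "2017-07-25T03:20:16.451Z"
  } ]
}
"""


def find_classifier_id(list_response: str, name: str):
    """Return the classifier_id of the first classifier with this name, or None."""
    for entry in json.loads(list_response).get("classifiers", []):
        if entry.get("name") == name:
            return entry["classifier_id"]
    return None


classifier_id = find_classifier_id(LIST_RESPONSE, "atp-weather")
```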
Get classifier information
curl command
curl -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328"
Response
{
"classifier_id" : "359f3fx202-nlc-223328",
"name" : "atp-weather",
"language" : "en",
"created" : "2017-07-25T03:20:16.451Z",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"status" : "Available",
"status_description" : "The classifier instance is now available and is ready to take classifier requests."
}
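A classifier can be in one of the following five states:
- 1 Non Existent: the classifier does not exist
- 2 Training: training is in progress
- 3 Failed: training failed
- 4 Available: ready to classify
- 5 Unavailable: not available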
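Since a new classifier starts in Training, a script typically polls the status until it becomes Available. The sketch below shows one way to structure that loop. The `fetch_status` callable is a stand-in for whatever performs the "get classifier information" request and returns the `status` field, and treating every state other than Training and Available as a reason to stop is a choice made for this sketch.

```python
import time


def wait_until_available(fetch_status, interval_seconds=10.0, max_attempts=60,
                         sleep=time.sleep):
    """Poll until the classifier is Available.

    Returns True once the status is "Available", False if max_attempts is
    exhausted while still "Training", and raises RuntimeError for any other
    status (Failed, Unavailable, Non Existent).
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status == "Available":
            return True
        if status != "Training":
            raise RuntimeError(f"classifier cannot be used, status: {status}")
        sleep(interval_seconds)
    return False


# Usage with a fake status source standing in for the real HTTP call.
fake_statuses = iter(["Training", "Training", "Available"])
ready = wait_until_available(lambda: next(fake_statuses), sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.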
Classify text with the classifier
curl command
- Classify "How hot will it be today?" with the GET method
curl -G -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify?text=How%20hot%20will%20it%20be%20today%3F"
- Classify "How hot will it be today?" with the POST method
curl -X POST -u "USERNAME":"PASSWORD" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"How hot will it be today?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"
Response
{
"classifier_id" : "359f3fx202-nlc-223328",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"text" : "How hot will it be today?",
"top_class" : "temperature",
"classes" : [ {
"class_name" : "temperature",
"confidence" : 0.9929586035651006
}, {
"class_name" : "conditions",
"confidence" : 0.007041396434899482
} ]
}
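The two request styles above differ only in where the text goes: percent-encoded in the query string for GET, or inside a JSON body for POST. A small sketch of building both, using only the standard library; the helper names are made up, and the base URL is the one from the curl commands.

```python
import json
from urllib.parse import quote

BASE_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"


def classify_get_url(classifier_id: str, text: str) -> str:
    """Build the GET classify URL with the text percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/" and "?".
    return f"{BASE_URL}/{classifier_id}/classify?text={quote(text, safe='')}"


def classify_post_body(text: str) -> bytes:
    """Build the JSON body sent with Content-Type: application/json."""
    return json.dumps({"text": text}).encode("utf-8")


url = classify_get_url("359f3fx202-nlc-223328", "How hot will it be today?")
body = classify_post_body("How hot will it be today?")
```

The POST form is the safer default for longer or punctuation-heavy text, since nothing has to be escaped by hand.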
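Use a word that does not appear in the classifier's training data (sleet, i.e. a mix of rain and snow).
The sentence deliberately follows a pattern found in the temperature class: "How xxx will it be today?"
The classifier still correctly assigns it to the conditions class.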
curl -X POST -u "username":"password" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"How sleet will it be today?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"
Response
{
"classifier_id" : "359f3fx202-nlc-223328",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"text" : "How sleet will it be today?",
"top_class" : "conditions",
"classes" : [ {
"class_name" : "conditions",
"confidence" : 0.89688785244637
}, {
"class_name" : "temperature",
"confidence" : 0.10311214755363002
} ]
}
Use input that is completely unrelated to the classifier: "it is atp's notebook?"
The result is poor: the temperature class still receives a confidence of about 83% (0.8255).
curl -X POST -u "USERNAME":"PASSWORD" ^
-H "Content-Type:application/json" ^
-d "{\"text\":\"it is atp's notebook?\"}" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328/classify"
Response
{
"classifier_id" : "359f3fx202-nlc-223328",
"url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/359f3fx202-nlc-223328",
"text" : "it is atp's notebook?",
"top_class" : "temperature",
"classes" : [ {
"class_name" : "temperature",
"confidence" : 0.8255246180698945
}, {
"class_name" : "conditions",
"confidence" : 0.1744753819301055
} ]
}
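To compare the three experiments side by side, the sketch below extracts the winning class and its confidence from each response. The data is copied from the responses above and trimmed to the `classes` field; `top_result` is a hypothetical helper.

```python
def top_result(response: dict):
    """Return (class_name, confidence) for the highest-confidence class."""
    best = max(response["classes"], key=lambda c: c["confidence"])
    return best["class_name"], best["confidence"]


# The three responses from this section, reduced to the fields used here.
RESULTS = {
    "How hot will it be today?": {"classes": [
        {"class_name": "temperature", "confidence": 0.9929586035651006},
        {"class_name": "conditions", "confidence": 0.007041396434899482},
    ]},
    "How sleet will it be today?": {"classes": [
        {"class_name": "conditions", "confidence": 0.89688785244637},
        {"class_name": "temperature", "confidence": 0.10311214755363002},
    ]},
    "it is atp's notebook?": {"classes": [
        {"class_name": "temperature", "confidence": 0.8255246180698945},
        {"class_name": "conditions", "confidence": 0.1744753819301055},
    ]},
}

for text, response in RESULTS.items():
    name, confidence = top_result(response)
    print(f"{text!r}: {name} ({confidence:.4f})")
```

In these three samples the unrelated sentence scores nearly as high (0.83) as the legitimate "sleet" question (0.90), so the confidence value alone does not separate the two cases here.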
Delete a classifier
curl command
curl -X DELETE -u "USERNAME":"PASSWORD" ^
"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/10D41B-nlc-1"
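Key points
- Confidence values represent percentages (the JSON responses give them as fractions between 0 and 1); the larger the value, the higher the confidence. A response contains at most 10 classes.
- If the training data has fewer than 10 classes, the confidence values sum to 100%. For example, if only two classes are defined, only two classes can be returned.
- One of the sample questions contains a word ("foggy") that the classifier was not trained on. No extra work is needed to handle such "missing" words; the classifier still scores well on them. Try other questions containing words that are absent from the training data, such as "sleet" or "storm".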
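The sum-to-100% point can be checked against the responses in this post. A minimal sketch, with the confidence values copied from the three classify responses; the helper names are invented for the illustration, and the comparison uses a tolerance because the values are floating-point numbers.

```python
import math


def confidences_sum_to_one(confidences) -> bool:
    """Check that the confidence values add up to 1.0 (i.e. 100%)."""
    return math.isclose(sum(confidences), 1.0, rel_tol=1e-9)


def as_percent(confidence: float) -> str:
    """Format a 0..1 confidence value as a percentage with one decimal."""
    return f"{confidence * 100:.1f}%"


# Confidence pairs from the three classify responses in this post.
hot = [0.9929586035651006, 0.007041396434899482]
sleet = [0.89688785244637, 0.10311214755363002]
notebook = [0.8255246180698945, 0.1744753819301055]

all_sum_to_one = all(confidences_sum_to_one(pair) for pair in (hot, sleet, notebook))
```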
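Open questions
- 1 Which languages are supported besides en?
- 2 Must the training data be a CSV file? Is the CSV layout itself fixed?
- 3 Can training data be added after a classifier has been created?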