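Google can rely on its search engine and the interactions from its many applications to train its AI systems,
but IBM doesn't have nearly as many consumer-facing applications and products. So how was the Watson AI system trained?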
https://www.zhihu.com/question/39279528#answer-27898289
……………………………………………………………………
1. Huge amounts of publicly available unstructured and semi-structured data are fed into the Watson database, much as a search engine builds its index. This phase is offline, i.e. it was completed before the Jeopardy! show.
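As a rough illustration of step 1, the sketch below builds an inverted index (each term mapped to the documents that contain it), the basic structure a search engine builds offline. It assumes naive lowercase tokenization, and `tokenize` and `build_index` are hypothetical names; Watson's real ingestion did far more processing than this.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Build an inverted index: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# Toy corpus standing in for the huge amount of ingested text.
docs = {
    "d1": "Toronto is the largest city in Canada.",
    "d2": "Chicago's O'Hare airport is named after a World War II hero.",
}
index = build_index(docs)
print(sorted(index["city"]))  # ['d1']
```

Because the index is built once ahead of time, answering a query later only needs cheap lookups instead of a scan over every document.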
2. While the show is broadcast live, each question is sent to Watson in text form at the same moment the human contestants see it. [Thanks to Marcus, see comment for the link]
3. The question text is used as a search query against the database, much like searching on Google. Only the few hundred best search results are kept.
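Step 3 can be sketched as a ranked keyword search that keeps only the top k results. This is a minimal sketch assuming a simple TF-IDF score over a toy corpus; `search` and the small value of `k` are illustrative, not Watson's actual search components (which kept a few hundred results, not two).

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(question, documents, k):
    """Rank documents by a simple TF-IDF match against the question; keep the top k."""
    n = len(documents)
    doc_terms = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    # Document frequency: how many documents contain each term.
    df = Counter()
    for terms in doc_terms.values():
        df.update(terms.keys())
    query = set(tokenize(question))
    scores = {}
    for doc_id, terms in doc_terms.items():
        # Rare terms (low df) count more; a term found in every document counts zero.
        score = sum(terms[t] * math.log(n / df[t]) for t in query if t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first, truncated to the k best results.
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = {
    "d1": "Toronto is a large city in Canada",
    "d2": "Chicago has two airports named after war heroes",
    "d3": "Chicago is a large city in Illinois",
}
print(search("This US city has airports named for a World War II hero", docs, k=2))
```

Here "d2" ranks first because it matches several rare question terms, while documents that share no terms with the question are dropped entirely.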
4. These search results, together with the question, are used to retrieve supporting evidence from the database.
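A minimal sketch of step 4, assuming the simplest possible rule: a passage counts as supporting evidence for a candidate answer if it mentions the candidate and shares at least `min_overlap` tokens with the question. The function name and threshold are hypothetical; DeepQA's real evidence retrieval was considerably more sophisticated.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of letter/digit tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_evidence(question, candidate, passages, min_overlap=2):
    """Return passages that contain every token of the candidate answer and share
    at least `min_overlap` tokens with the question (hypothetical threshold)."""
    q_terms = tokenize(question)
    c_terms = tokenize(candidate)
    evidence = []
    for passage in passages:
        p_terms = tokenize(passage)
        if c_terms <= p_terms and len(q_terms & p_terms) >= min_overlap:
            evidence.append(passage)
    return evidence

passages = [
    "O'Hare airport in Chicago is named for a World War II hero.",
    "Toronto Pearson airport is named for Lester Pearson.",
    "Chicago sits on Lake Michigan.",
]
question = "Its largest airport is named for a World War II hero"
print(retrieve_evidence(question, "Chicago", passages))  # only the first passage qualifies
```

The third passage mentions Chicago but says nothing related to the question, so it is not treated as evidence.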
5. Each search result, as a candidate answer to the question, forms a hypothesis. Each hypothesis is then evaluated against the retrieved evidence, and the answer is scored along many dimensions.
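To show what "scored along many dimensions" can look like, this sketch attaches several independent feature scores to each hypothesis. The three features (evidence count, term overlap, answer-type match), the `Hypothesis` class, and the `type_of` lookup are illustrative assumptions; Watson used a much larger set of scorers.

```python
import re
from dataclasses import dataclass, field

def tokenize(text):
    """Lowercase the text and return its set of letter/digit tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class Hypothesis:
    candidate: str                              # the candidate answer
    evidence: list                              # supporting passages retrieved for it
    scores: dict = field(default_factory=dict)  # feature name -> score

def score_hypothesis(question, hyp, expected_type, type_of):
    """Fill in one score per dimension. These three features are illustrative
    stand-ins, not the scorers Watson actually used."""
    q_terms = tokenize(question)
    # Dimension 1: how much supporting evidence was found.
    hyp.scores["evidence_count"] = float(len(hyp.evidence))
    # Dimension 2: best fraction of question tokens covered by a single passage.
    hyp.scores["term_overlap"] = max(
        (len(q_terms & tokenize(p)) / max(len(q_terms), 1) for p in hyp.evidence),
        default=0.0,
    )
    # Dimension 3: does the candidate's type match what the question asks for?
    hyp.scores["type_match"] = 1.0 if type_of.get(hyp.candidate) == expected_type else 0.0
    return hyp

question = "This city's largest airport is named for a war hero"
type_of = {"Chicago": "city", "O'Hare": "airport"}  # hypothetical type lookup
hyp = score_hypothesis(
    question,
    Hypothesis("Chicago", ["O'Hare airport in Chicago is named for a war hero"]),
    expected_type="city",
    type_of=type_of,
)
print(hyp.scores)
```

Keeping the dimensions separate, rather than collapsing them into one number immediately, lets a later stage decide how much each one should count.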
6. Using a merging algorithm, these answers, each scored along many dimensions, are ranked, and one of them comes out on top.
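Step 6 can be sketched as merging duplicate answers and then combining each answer's feature scores into a single confidence with a logistic function. The normalization rule, weights, and bias below are made-up assumptions for illustration; in DeepQA such combination weights are reportedly learned from training data of past questions and answers rather than set by hand.

```python
import math

def merge_and_rank(hypotheses, weights, bias):
    """Merge hypotheses that normalize to the same answer string, then rank them by a
    logistic confidence computed from a weighted sum of their feature scores."""
    merged = {}
    for answer, scores in hypotheses:
        key = answer.strip().lower()  # naive normalization; a real system does far more
        if key not in merged:
            merged[key] = dict(scores)
        else:
            # Keep the best value seen for each feature across merged duplicates.
            for name, value in scores.items():
                merged[key][name] = max(merged[key].get(name, value), value)
    ranked = []
    for key, scores in merged.items():
        z = bias + sum(weights.get(name, 0.0) * value for name, value in scores.items())
        confidence = 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)
        ranked.append((key, confidence))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

hyps = [
    ("Chicago", {"evidence_count": 3.0, "type_match": 1.0}),
    ("chicago ", {"evidence_count": 5.0, "type_match": 1.0}),
    ("Toronto", {"evidence_count": 2.0, "type_match": 1.0}),
    ("O'Hare", {"evidence_count": 4.0, "type_match": 0.0}),
]
weights = {"evidence_count": 0.5, "type_match": 2.0}  # hypothetical weights
print(merge_and_rank(hyps, weights, bias=-3.0))
```

The two "Chicago" variants merge into one answer, which then outranks "Toronto" (confidence 0.5) and "O'Hare" (wrong answer type).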
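7. If Watson is confident enough in its final answer, it will try to respond, after first converting the answer into a question, as Jeopardy! requires.
(Some background for anyone who hasn't watched the show: Jeopardy! uses a distinctive question-and-answer format, and its clues cover a very wide range of subjects, including history, literature, art, popular culture, science and technology, sports, geography, and wordplay. The clues are given in the form of answers, and contestants must respond briefly and correctly in the form of a question. In other words, unlike ordinary quiz shows, Jeopardy! asks in the form of answers and is answered in the form of questions. Contestants need knowledge of history, literature, politics, science, and popular culture, and must also be able to decode hidden meanings, irony, and riddles, the kind of complex reasoning computers are not good at.)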
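A sketch of the final decision in step 7: respond only if the top-ranked answer's confidence clears a threshold, and phrase the response as a question. The fixed 0.5 threshold and the "What is X?" template are simplifying assumptions; a person's name, for instance, would call for "Who is X?".

```python
def decide_response(ranked, threshold=0.5):
    """Given (answer, confidence) pairs sorted best-first, answer only when the top
    confidence reaches the threshold, phrased as a question as Jeopardy! requires."""
    if not ranked:
        return None
    answer, confidence = ranked[0]
    if confidence < threshold:
        return None  # not confident enough: stay silent
    return f"What is {answer}?"

print(decide_response([("Chicago", 0.82), ("Toronto", 0.50)]))  # What is Chicago?
print(decide_response([("Toronto", 0.30)]))                     # None
```

Staying silent when unsure matters because a wrong response costs points, so a low-confidence answer is worse than no answer.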
All that said, Watson is a complex system, and each step described above applies a variety of algorithms. On top of that, the whole system runs on a parallel platform so that it can produce an answer as quickly as possible.
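The parallel idea can be illustrated with Python's standard `concurrent.futures`: independent candidates are scored concurrently, and the results come back in input order. `score_candidate` is a hypothetical stand-in for an expensive scorer. Threads keep the example simple; for CPU-bound Python code, `ProcessPoolExecutor` or multiple machines would be the usual way to get a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def score_candidate(candidate):
    """Hypothetical stand-in for an expensive per-candidate scorer."""
    answer, evidence = candidate
    return answer, float(len(evidence))

def score_all_in_parallel(candidates, workers=4):
    """Score independent candidates concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_candidate, candidates))

candidates = [("Chicago", ["p1", "p2"]), ("Toronto", ["p3"]), ("Boston", [])]
print(score_all_in_parallel(candidates))
# [('Chicago', 2.0), ('Toronto', 1.0), ('Boston', 0.0)]
```

This works because each candidate is scored independently of the others, so the work splits cleanly across workers.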
For further information, google "DeepQA".