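Framing

This module explores how to frame a task as a machine learning problem, and introduces many of the basic vocabulary terms shared across a wide range of machine learning methods.

Estimated time: 2 minutes
Learning objectives

Refresh the fundamental machine learning terms.

Explore various uses of machine learning.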

0:00    Hi, my name is D. Sculley.
0:02    I'm one of the people who is coming to you from Google in order to present this Machine Learning Crash Course with TensorFlow APIs.
0:09    Now, before we dive in, let's take a second to remind ourselves of the basic framework that we are talking about in this class.
0:15    And that basic framework is supervised machine learning.
0:18    In supervised machine learning, we are learning to create models that combine inputs to produce useful predictions, even on previously unseen data.
0:28    Now, when we're training that model, we're providing it with labels.
0:33    And in the case of, say, email spam filtering, that label might be something like 'spam or not spam'.
0:40    It's the target that we're trying to predict.
0:43    The features are the way that we represent our data.
0:46    So features might be drawn from an email as, say, words in the email, or "to and from addresses", various pieces of routing or header information, any piece of information that we might extract from that email to represent it for our machine learning system.
1:03    An example is one piece of data.
1:05    For example, one email.
1:08    Now, that could be a labeled example, in which we have both the feature information represented in that email and the label value of 'spam or not spam'.
1:18    Maybe that's come from a user who has provided that to us.
1:21    Or we could have an unlabeled example, such as a piece of email for which we have feature information, but we don't yet know whether it is spam or not spam.
1:29    And likely what we are going to do is classify that to put it in the user's inbox or spam folder.
1:35    Finally, we have a model, and that model is the thing that is doing the predicting.
1:40    It's something that we're going to try and create through a process of learning from data.
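
The terms from the video can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own code: the `Example` class, the `extract_features`, `train`, and `predict` helpers, the sample emails, and the word-counting "model" are all hypothetical and were chosen for brevity. A real spam filter would use richer features and a proper learning algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One piece of data: here, one email."""
    features: dict               # how the email is represented
    label: Optional[str] = None  # "spam" or "not spam"; None if unlabeled


def extract_features(sender: str, body: str) -> dict:
    """Turn a raw email into features (a hypothetical, minimal choice)."""
    return {"from": sender, "words": body.lower().split()}


def train(labeled_examples):
    """Learn from labeled examples: count how often each word occurs per label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for ex in labeled_examples:
        counts[ex.label].update(ex.features["words"])
    return counts


def predict(model, example):
    """Predict a label by comparing the word counts seen under each label."""
    words = example.features["words"]
    spam_score = sum(model["spam"][w] for w in words)
    not_spam_score = sum(model["not spam"][w] for w in words)
    return "spam" if spam_score > not_spam_score else "not spam"


# Labeled examples: features plus a label (made-up data for illustration).
training_data = [
    Example(extract_features("promo@example.com", "win free money now"), "spam"),
    Example(extract_features("deals@example.com", "free prize claim now"), "spam"),
    Example(extract_features("alice@example.com", "meeting notes attached"), "not spam"),
    Example(extract_features("bob@example.com", "lunch tomorrow at noon"), "not spam"),
]

# The model is created by learning from data.
model = train(training_data)

# An unlabeled example: we have features but do not yet know the label.
unlabeled = Example(extract_features("unknown@example.com", "claim your free money"))
prediction = predict(model, unlabeled)
print(prediction)
```

Here the labeled examples carry both features and a label, the unlabeled example carries only features, and the model is whatever `train` returns and `predict` consumes.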
