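A key aspect of intelligence is versatility: the capability of doing many different things. Current AI systems excel at mastering a single skill, such as Go (AlphaGo), Jeopardy, or even helicopter aerobatics. But when you ask an AI system to handle a variety of seemingly simple problems, it struggles. A champion Jeopardy program cannot hold a conversation, and an expert helicopter controller for aerobatics cannot navigate in new, simple situations such as locating, navigating to, and hovering over a fire to put it out. In contrast, a human can act and adapt intelligently to a wide variety of new, unseen situations. How can we enable our artificial agents to acquire such versatility?
Several techniques are being developed to address these problems. In this post I survey them and discuss a recent technique from our lab called model-agnostic meta-learning. (See the research paper and the accompanying code.)
Current AI systems can master a complex skill from scratch, using a large amount of experience. But if we want our agents to acquire many skills and adapt to many environments, we cannot afford to train each skill from scratch. Instead, we need our agents to learn how to learn new tasks faster by reusing previous experience, rather than considering each new task in isolation. This approach of learning to learn, or meta-learning, is a key stepping stone towards versatile agents that can continually learn a wide variety of tasks throughout their lifetimes.
So, what is learning to learn, and what has it been used for?
Early approaches to meta-learning date back to the late 1980s and early 1990s, including Jürgen Schmidhuber's thesis and work by Yoshua and Samy Bengio. Recently, meta-learning has become a hot topic, with a flurry of papers, most commonly using the technique for hyperparameter and neural network optimization, finding good network architectures, few-shot image recognition, and fast reinforcement learning.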
Few-Shot Learning
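In 2015, Brenden Lake et al. published a paper that challenged modern machine learning methods to learn new concepts from one or a few instances of that concept. As an example, Lake suggested that humans can learn to identify "novel two-wheel vehicles" from a single picture (as shown in the figure below), whereas machines cannot generalize a concept from just a single image. (Humans can also draw a character in a new alphabet after seeing just one example.) Along with the paper, Lake introduced a dataset of handwritten characters, Omniglot, the "transpose" of MNIST, with 1623 character classes, each with 20 examples. Two deep learning models quickly followed with papers at ICML 2016 that used memory-augmented neural networks and sequential generative models, showing that deep models can learn to learn from a few examples, though not yet at the level of humans.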
How Recent Meta-learning Approaches Work
Meta-learning systems are trained by being exposed to a large number of tasks and are then tested in their ability to learn new tasks; an example of a task might be classifying a new image within 5 possible classes, given one example of each class, or learning to efficiently navigate a new maze with only one traversal through the maze. This differs from many standard machine learning techniques, which involve training on a single task and testing on held-out examples from that task.
Example meta-learning set-up for few-shot image classification, visual adapted from Ravi & Larochelle ‘17.
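The set-up described above can be made concrete with a small episode sampler. This is a minimal sketch, assuming the dataset is a dictionary from class name to a list of examples; the function name `sample_episode` and its parameters (`n_way`, `k_shot`, `n_query`) are illustrative choices, not taken from any particular library.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=1, rng=None):
    """Sample one few-shot task (an "episode").

    dataset: dict mapping class name -> list of examples (assumed layout).
    Returns (support, query), each a list of (example, label) pairs,
    where labels are re-indexed to 0..n_way-1 for this task only.
    """
    rng = rng or random.Random()
    # Pick the classes that define this task.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint support and query examples for the class.
        examples = rng.sample(dataset[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy usage: 10 classes with 20 placeholder "images" each.
toy_data = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(10)}
support_set, query_set = sample_episode(toy_data, n_way=5, k_shot=1, n_query=2)
```

During meta-training, episodes like this would be drawn repeatedly from the meta-training classes; at meta-test time they would be drawn from classes the model has never seen.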
During meta-learning, the model is trained to learn tasks in the meta-training set. There are two optimizations at play – the learner, which learns new tasks, and the meta-learner, which trains the learner. Methods for meta-learning have typically fallen into one of three categories: recurrent models, metric learning, and learning optimizers.
Meta-learning involves two modules: the meta-learner and the learner.
The main approaches are recurrent models, metric learning, and learned optimizers.
Recurrent Models
These approaches train a recurrent model, e.g. an LSTM, to take in the dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified.
Recurrent model approach for inputs x and corresponding labels y, figure from Santoro et al. '16.
The meta-learner uses gradient descent, whereas the learner simply rolls out the recurrent network. This approach is one of the most general approaches and has been used for few-shot classification and regression, and meta-reinforcement learning. Due to its flexibility, this approach also tends to be less (meta-)efficient than other methods because the learner network needs to come up with its learning strategy from scratch.
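To illustrate what "taking in the dataset sequentially" can look like, here is a minimal sketch of how one task might be flattened into a sequence for a recurrent learner. The encoding is an assumption made for illustration (features concatenated with a one-hot label, and an all-zero label slot for the inputs to be classified); published methods differ in the exact layout, and `build_sequence` is a hypothetical helper name.

```python
def one_hot(label, num_classes):
    """Return a one-hot list for an integer label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def build_sequence(support, queries, num_classes):
    """Flatten one task into the time steps fed to a recurrent learner.

    support: list of (features, label) pairs shown with their labels.
    queries: list of feature lists whose labels must be predicted.
    """
    # Labeled examples: features followed by the one-hot label.
    steps = [list(x) + one_hot(y, num_classes) for x, y in support]
    # New examples: the label slot is zeroed, so the recurrent state
    # has to carry what was learned from the labeled steps.
    steps += [list(x) + [0.0] * num_classes for x in queries]
    return steps

sequence = build_sequence([([0.1, 0.2], 1)], [[0.3, 0.4]], num_classes=3)
```

The learner's "learning" is then just the forward pass over these steps, while the meta-learner trains the recurrent weights across many such sequences.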
Metric Learning
This approach involves learning a metric space in which learning is particularly efficient. This approach has mostly been used for few-shot classification. Intuitively, if our goal is to learn from a small number of example images, then a simple approach is to compare the image that you are trying to classify with the example images that you have. But, as you might imagine, comparing images in pixel space won't work well. Instead, you can train a Siamese network or perform comparisons in a learned metric space. Like the previous approach, meta-learning is performed using gradient descent (or your favorite neural network optimizer), whereas the learner corresponds to a comparison scheme, e.g. nearest neighbors, in the meta-learned metric space. These approaches work quite well for few-shot classification, though they have yet to be demonstrated in other meta-learning domains such as regression or reinforcement learning.
For example, training an image-similarity network makes it possible to classify arbitrary classes.
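The learner side of this approach, a nearest-neighbor comparison in an embedding space, can be sketched in a few lines. This is only an illustration: `embed` below is a hypothetical fixed linear map standing in for the meta-learned embedding network, and `classify` is an assumed helper name.

```python
import math

def embed(x):
    """Hypothetical stand-in for a meta-learned embedding network.

    A real system would use a trained neural network here; this fixed
    linear map only keeps the example self-contained.
    """
    return [x[0] + x[1], x[0] - x[1]]

def classify(query, support, embed_fn=embed):
    """Label a query by its nearest support example in embedding space.

    support: list of (example, label) pairs, e.g. one per class.
    """
    q = embed_fn(query)
    best_label, best_dist = None, math.inf
    for x, label in support:
        # Euclidean distance between embedded points.
        d = math.dist(q, embed_fn(x))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

support = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
prediction = classify((4.0, 6.0), support)
```

Because the comparison itself has no trainable parameters, all of the learning happens in the embedding, which is what the meta-learner optimizes across tasks.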
Learning Optimizers
The final approach is to learn an optimizer. In this method, there is one network (the meta-learner) which learns to update another network (the learner) so that the learner effectively learns the task. This approach has been extensively studied for better neural network optimization. The meta-learner is typically a recurrent network so that it can remember how it previously updated the learner model. The meta-learner can be trained with reinforcement learning or supervised learning. Ravi & Larochelle recently demonstrated this approach's merit for few-shot image classification, presenting the view that the learner model is an optimization process that should be learned.
Learn an optimizer that guides the network's parameter updates.
Learning Initializations as Meta-Learning
Arguably, the biggest success story of transfer learning has been initializing vision network weights using ImageNet pre-training. In particular, when approaching any new vision task, the well-known paradigm is to first collect labeled data for the task, acquire a network pre-trained on ImageNet classification, and then fine-tune the network on the collected data using gradient descent. Using this approach, neural networks can more effectively learn new image-based tasks from modestly-sized datasets. However, pre-training only goes so far. Because the last layers of the network still need to be heavily adapted to the new task, datasets that are too small, as in the few-shot setting, will still cause severe overfitting. Furthermore, we unfortunately don't have an analogous pre-training scheme for non-vision domains such as speech, language, and control.¹ Is there something to learn from the remarkable success of ImageNet fine-tuning?
Pre-training a network is also a form of meta-learning.
However, so far it only works well in the vision domain.
Model-Agnostic Meta-Learning (MAML)
What if we directly optimized for an initial representation that can be effectively fine-tuned from a small number of examples? This is exactly the idea behind our recently-proposed algorithm, model-agnostic meta-learning (MAML). Like other meta-learning methods, MAML trains over a wide range of tasks. It trains for a representation that can be quickly adapted to a new task, via a few gradient steps. The meta-learner seeks to find an initialization that is not only useful for adapting to various problems, but also can be adapted quickly (in a small number of steps) and efficiently (using only a few examples). Below is a visualization: suppose we are seeking a set of parameters θ that are highly adaptable. During the course of meta-learning (the bold line), MAML optimizes for a set of parameters such that when a gradient step is taken with respect to a particular task i (the gray lines), the parameters are close to the optimal parameters θ*_i for task i.
Diagram of the MAML approach. Parameters are first learned across many tasks, after which only a few gradient updates are needed to adapt to a new task.
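The two nested updates can be sketched on a toy problem. This is a minimal sketch and not the paper's implementation: it assumes hypothetical 1-D linear-regression tasks y = slope * x and a single-parameter model y = w * x, so the derivative through the inner gradient step can be written by hand. The names `adapt` and `maml_train`, the step sizes `alpha` and `beta`, and the task distribution are all illustrative assumptions.

```python
import random

def mean_sq(xs):
    """Mean of x^2 over a batch of inputs."""
    return sum(x * x for x in xs) / len(xs)

def loss(w, xs, slope):
    """Mean squared error of the model y = w * x on the task y = slope * x."""
    return sum((w * x - slope * x) ** 2 for x in xs) / len(xs)

def grad(w, xs, slope):
    """Derivative of loss with respect to w."""
    return 2.0 * (w - slope) * mean_sq(xs)

def adapt(w, xs, slope, alpha):
    """Inner loop: one gradient step on a task's support inputs."""
    return w - alpha * grad(w, xs, slope)

def maml_train(w=0.0, alpha=0.1, beta=0.05, steps=500, tasks_per_step=4, seed=0):
    """Outer loop: move the initialization w so that one inner step works well."""
    rng = random.Random(seed)
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            slope = rng.uniform(2.0, 4.0)  # hypothetical task distribution
            support = [rng.uniform(-1.0, 1.0) for _ in range(5)]
            query = [rng.uniform(-1.0, 1.0) for _ in range(5)]
            w_task = adapt(w, support, slope, alpha)
            # Chain rule through the inner step:
            # d(w_task)/d(w) = 1 - 2 * alpha * mean(x^2) on the support set.
            d_adapt = 1.0 - 2.0 * alpha * mean_sq(support)
            # Gradient of the post-adaptation query loss w.r.t. the initial w.
            meta_grad += grad(w_task, query, slope) * d_adapt
        w -= beta * meta_grad / tasks_per_step
    return w

w_init = maml_train()
```

In this toy setting the meta-learned initialization settles near the middle of the slope range, so a single gradient step brings it closer to any sampled task than the same step taken from an arbitrary starting point. With a neural network, the chain rule through the inner step is usually left to automatic differentiation rather than derived by hand.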
This approach is quite simple, and has a number of advantages. It doesn’t make any assumptions on the form of the model. It is quite efficient – there are no additional parameters introduced for meta-learning and the learner’s strategy uses a known optimization process (gradient descent), rather than having to come up with one from scratch. Lastly, it can be easily applied to a number of domains, including classification, regression, and reinforcement learning.
Despite the simplicity of the approach, we were surprised to find that the method was able to substantially outperform a number of existing approaches on popular few-shot image classification benchmarks, Omniglot and MiniImageNet², including existing approaches that were much more complex or domain specific. Beyond classification, we also tried to learn how to adapt a simulated robot's behavior to different goals, akin to the motivation at the top of this blog post: versatility. To do so, we combined MAML with policy gradient methods for reinforcement learning. MAML discovered a policy which let a simulated robot adapt its locomotion direction and speed in a single gradient update.
The generality of the method — it can be combined with any model smooth enough for gradient-based optimization — makes MAML applicable to a wide range of domains and learning objectives beyond those explored in the paper.
We hope that MAML's simple approach for effectively teaching agents to adapt to a variety of scenarios will bring us one step closer towards developing versatile agents that can learn a variety of skills in real world settings.
I would like to thank Sergey Levine and Pieter Abbeel for their valuable feedback.
The last part of this post was based on the following research paper:
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. C. Finn, P. Abbeel, S. Levine. In ICML, 2017.
¹ That said, researchers have developed domain-agnostic initialization schemes that encourage well-conditioned gradients and use data-dependent normalization.
² Introduced by Vinyals et al. '16 and Ravi & Larochelle '17, the MiniImageNet benchmark is the same as Omniglot but uses real RGB images from a subset of the ImageNet dataset.
Berkeley meta-learning lecture slides:
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_16_meta_learning.pdf