Computer Scientists Discover Limits of Major Research Algorithm (原文链接)
计算机科学家发现主要研究算法的局限性(英文翻译)
封面原文:
Many aspects of modern applied research rely on a crucial algorithm called gradient descent. This is a procedure generally used for finding the largest or smallest values of a particular mathematical function — a process known as optimizing the function. It can be used to calculate anything from the most profitable way to manufacture a product to the best way to assign shifts to workers.
Yet despite this widespread usefulness, researchers have never fully understood which situations the algorithm struggles with most. Now, new work explains it, establishing that gradient descent, at heart, tackles a fundamentally difficult computational problem. The new result places limits on the type of performance researchers can expect from the technique in particular applications.
“There is a kind of worst-case hardness to it that is worth knowing about,” said Paul Goldberg of the University of Oxford, co-author of the work along with John Fearnley and Rahul Savani of the University of Liverpool and Alexandros Hollender of Oxford. The result received a Best Paper Award in June at the annual Symposium on Theory of Computing.
翻译:
现代应用研究的许多方面都依赖于一种称为梯度下降法算法的关键算法。它通常用于寻找特定数学函数的最大值或最小值——这个过程也被称为优化函数。它常被用于各方面的计算,包括如何使生产线的盈利达到最大,到分配工人们轮班的最佳方式。
然而,尽管有这种广泛的使用性,研究人员从未完全了解算法在哪些情况下最为困难。现在,一项新的研究解释了这个问题,希望通过建立梯度下降法,从本质上处理一个困难的计算性难题。而新的结果表明研究人员在特定应用中预想的某类性能是受到限制的。
“我们需要了解存在着一种最糟糕情况下的困难。”牛津大学(University of Oxford)的Paul Goldberg说,他与利物浦大学(University of Liverpool)的John Fearnley、Rahul Savani以及牛津大学的Alexandros Hollender共同完成了这项工作。这个结果在六月的计算机理论年度研讨会上获得了最佳论文奖。
原文:
You can imagine a function as a landscape, where the elevation of the land is equal to the value of the function (the “profit”) at that particular spot. Gradient descent searches for the function’s local minimum by looking for the direction of steepest ascent at a given location and searching downhill away from it. The slope of the landscape is called the gradient, hence the name gradient descent.
Gradient descent is an essential tool of modern applied research, but there are many common problems for which it does not work well. But before this research, there was no comprehensive understanding of exactly what makes gradient descent struggle and when — questions another area of computer science known as computational complexity theory helped to answer.
“A lot of the work in gradient descent was not talking with complexity theory,” said Costis Daskalakis of the Massachusetts Institute of Technology.
Computational complexity is the study of the resources, often computation time, required to solve or verify the solutions to different computing problems. Researchers sort problems into different classes, with all problems in the same class sharing some fundamental computational characteristics.
To take an example — one that’s relevant to the new paper — imagine a town where there are more people than houses and everyone lives in a house. You’re given a phone book with the names and addresses of everyone in town, and you’re asked to find two people who live in the same house. You know you can find an answer, because there are more people than houses, but it may take some looking (especially if they don’t share a last name).
翻译:
你可以把一个函数想象成一块地形,其中地面的高度相当于该点的函数值(“profit”)。梯度下降法搜索函数的局部极小值,方法是在给定的位置寻找最陡峭的上坡方向,然后在远离这个方向的下坡处搜索。这种地形的斜坡被称为梯度,因此被称为梯度下降法。
梯度下降法是现代应用研究的一个重要工具,但有很多公共问题它并不能很好地解决。但在本研究之前,我们并没有全面的理解这个问题——究竟是什么让梯度下降法处于困难以及它发生在什么时候,而通过计算科学的另一个领域:计算复杂性理论或许可以帮助回答它。
麻省理工学院的 Costis Daskalakis 说: “梯度下降法的很多工作都没有谈及到复杂性理论。”
计算复杂性是一门研究在不同的计算问题中解决方案所需求的资源,这通常是计算时间。研究人员将问题分成不同的类,同一类中的所有问题共享一些基本的计算特性。
举一个与新论文有关的例子,想象在一个人口多于房屋,每个人都住在一所房子里的城市。你有一个电话簿,上面有城里每个人的姓名和地址,然后要你找到住在同一所房子里的两个人。答案是确定的,因为人比房子多,但这可能需要花费一番功夫(特别是如果他们没有共同的姓氏)。
原文:
This question belongs to a complexity class called TFNP, short for “total function nondeterministic polynomial.” It is the collection of all computational problems that are guaranteed to have solutions and whose solutions can be checked for correctness quickly. The researchers focused on the intersection of two subsets of problems within TFNP.
The first subset is called PLS (polynomial local search). This is a collection of problems that involve finding the minimum or maximum value of a function in a particular region. These problems are guaranteed to have answers that can be found through relatively straightforward reasoning.
One problem that falls into the PLS category is the task of planning a route that allows you to visit some fixed number of cities with the shortest travel distance possible given that you can only ever change the trip by switching the order of any pair of consecutive cities in the tour. It’s easy to calculate the length of any proposed route and, with a limit on the ways you can tweak the itinerary, it’s easy to see which changes shorten the trip. You’re guaranteed to eventually find a route you can’t improve with an acceptable move — a local minimum.
The second subset of problems is PPAD (polynomial parity arguments on directed graphs). These problems have solutions that emerge from a more complicated process called Brouwer’s fixed point theorem. The theorem says that for any continuous function, there is guaranteed to be one point that the function leaves unchanged — a fixed point, as it’s known. This is true in daily life. If you stir a glass of water, the theorem guarantees that there absolutely must be one particle of water that will end up in the same place it started from.
The intersection of the PLS and PPAD classes itself forms a class of problems known as PLS int PPAD. It contains many natural problems relevant to complexity researchers. However, until now, researchers were unable to find a natural problem that’s complete for PLS int PPAD — meaning that it is an example of the hardest possible problems that fall within the class.
翻译:
这个问题属于一个叫做 TFNP 的复杂类,是“全函数非确定性多项式(total function nondeterministic polynomial)”的缩写。它是所有计算问题的集合,这些问题保证有解并且其解可以被快速检查是否正确。研究人员将重点放在 TFNP 中两个问题子集的交集上。
第一个子集称为 PLS (多项式局部搜索)。这是一系列问题的集合,涉及到在特定区域中查找函数的最小值或最大值。这些问题肯定有答案,可以通过相对简单的推理找到答案。
一个属于 PLS 类别的问题是规划路线的任务,这个路线允许你以最短的旅行距离访问一些固定数量的城市,假设你只能通过改变旅行团中任意一对连续城市的顺序来改变行程。计算任何提议的路线的长度都很容易,而且在调整行程的方式有限的情况下,很容易看出哪些变化会缩短行程。你最终肯定会找到一条无法用一个可接受的移动来改进的路线——一个局部的最小值。
问题的第二个子集是 PPAD (有向图上的多项式奇偶参数)。这些问题的解决方案来自一个更复杂的过程,叫做布劳威尔不动点定理。这个定理说,对于任何连续函数都保证有一个点不会改变,即一个不动点。这在日常生活中是正确的。如果你搅动一杯水,这个定理保证了一定有一个水分子,最终会停留在它开始的地方。
PLS 和 PPAD 类本身的交集形成了一类称为“PLS int PPAD”的问题。它包含了许多与复杂性研究者相关的自然问题。然而,到目前为止,研究人员还没有找到一个自然的问题完整适用于 “PLS int PPAD”——这意味着它是这类中最难的问题之一。
梯度下降法的复杂性原文:
Prior to this paper, the only known PLS int PPAD-complete problem was a rather artificial construction — a problem sometimes called “Either-Solution.” This problem glued together a complete problem from PLS and a complete problem from PPAD, forming something a researcher would be unlikely to encounter outside this context. In the new paper, the researchers proved that gradient descent is as hard as Either-Solution, making gradient descent itself PLS int PPAD-complete.
“[The nature of computation] is something that we as a species should try to understand deeply in all of its many forms. And I think that should be reason enough to be excited about this result,” said Tim Roughgarden of Columbia University.
None of this means that gradient descent will always struggle. In fact, it’s just as fast and effective as ever for most uses.
“There’s a slightly humorous stereotype about computational complexity that says what we often end up doing is taking a problem that is solved a lot of the time in practice and proving that it’s actually very difficult,” said Goldberg.
But the result does mean applied researchers shouldn’t expect gradient descent to provide precise solutions for some problems where precision is important.
翻译:
截至本文发布之前,唯一已知完整的PLS int PPAD问题也是一个人为构造的问题——亦被称为“二选一问题(Either-Solution)”。这个问题将一个来自PLS的完整问题和一个来自PPAD的完整问题粘在一起,形成了一个研究人员在此之外不太可能遇到的问题。在新的论文中,研究人员证明了梯度下降法和“二选一问题”是一样困难的,从而认为梯度下降法本身符合完整的PLS int PPAD。
“(计算的性质)是我们作为一个物种应该努力深入理解其所有形式的一部分,”哥伦比亚大学的 Tim Roughgarden 说, “这个结果足以让我们而感到兴奋。”
但这并不意味着梯度下降法将永远处于困难中。事实上,对于大多数使用者来说,它仍然是快速有效的。
“有一种对计算复杂性略带幽默的成见,认为我们费尽力气所做的事情,就是把一个在实际中花了很多时间解决的问题拿出来,然后证明它实实在在非常困难。”Goldberg说道。
但是这个研究结果确实表明,应用研究人员不应该期望梯度下降法能够为一些精确度很重要的问题提供精确的解决方案。
原文:
The question of precision speaks to the central concern of computational complexity — the evaluation of resource requirements. There is a fundamental link between precision and speed in many complexity questions. For an algorithm to be considered efficient, you must be able to increase the precision of a solution without paying a correspondingly high price in the amount of time it takes to find that solution. The new result means that for applications which require very precise solutions, gradient descent might not be a workable approach.
For example, gradient descent is often used in machine learning in ways that don’t require extreme precision. But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm. That’s not ideal, but it is not a deal breaker.
But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable.
“[It] puts the brakes on what [they] can possibly shoot for,” said Daskalakis.
They must, and in practice do, compromise somewhere. They either accept a less precise solution, limit themselves to slightly easier problems, or find a way to manage an unwieldy runtime.
But this is not to say a fast algorithm for gradient descent doesn’t exist. It might. But the result does mean that any such algorithm would immediately imply the existence of fast algorithms for all other problems in PLS int PPAD — a much higher bar than merely finding a fast algorithm for gradient descent itself.
“Many problems that may be some advance in mathematics could crack,” said Daskalakis. “That’s why we like to have a very natural problem like gradient descent that captures the complexity of the whole intersection.”
翻译:
精确性问题涉及到计算复杂性的核心问题——资源需求的评估。在许多复杂的问题中,精确和速度之间有着根本的联系。为了使算法被认为是有效的,必须要提高解决方案的精确度,而不是需要花费大量时间去寻找解决方案。新的结果意味着,对于需要非常精确解答的应用来说,梯度下降法可能不是一个可行的方法。
例如,梯度下降法通常用于机器学习,而不需要极端的精确度。但是一个机器学习的研究者可能想要把实验的精度提高一倍。在这种情况下,新的结果意味着他们可能需要将他们的梯度下降法算法的运行时间增加四倍。这虽然不理想,但也还可以接受。
但是对于其他应用,比如数值分析,研究人员可能需要使精度平方化。为了实现这样的改进,他们可能需要对梯度下降法的运行次数平方化,这会使得计算完全难以处理。
Daskalakis说: “(这)限制了(他们)可能达到的目标。”。
实际上,他们必须在某处妥协。要么接受不那么精确的解决方案,要么将自己限制在略简单的问题上,不然就找到一种方法来管理难处理的运行次数。
但这并不是说梯度下降法的快速算法不存在,它还是有可能的。这个研究结果只是表明,任何符合 PLS int PPAD 问题都会立即意味着存在着其他快速算法——这比只按梯度下降法找到一个快速算法的成本要高得多。
“许多问题都可以通过数学上的进步得以解决。”Daskalakis说,” 这就是为什么我们喜欢更自然的问题,像梯度下降法来捕捉整个交集的复杂性。”
网友评论