Many aspects of modern applied research rely on an important algorithm called gradient descent. This is a procedure commonly used to find the largest or smallest values of a particular mathematical function—a process known as optimizing the function. It can be used to calculate anything from the most profitable way to manufacture a product to the best way to assign shifts to workers.
Yet despite this widespread utility, researchers have never fully understood which situations the algorithm struggles with most. Now, new work explains it, establishing that gradient descent, at heart, tackles a fundamentally difficult computational problem. The new result places limits on the kind of performance researchers can expect from the technique in particular applications.
Original story Reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to increase public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
Paul Goldberg of the University of Oxford, who co-authored the work with John Fearnley and Rahul Savani of the University of Liverpool and Alexandros Hollender of Oxford, said the result shows "a kind of worst-case scenario worth knowing about." The paper received a Best Paper award at the annual Symposium on Theory of Computing in June.
You can imagine a function as a landscape, where the elevation of the land is equal to the value ("profit") of the function at that particular location. Gradient descent searches for the function's local minimum by finding the direction of steepest ascent at a given location and heading downhill away from it. The slope of the landscape is called the gradient, hence the name gradient descent.
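The procedure described above can be sketched in a few lines. This is a minimal illustration, not code from the paper; the function, starting point, and step size are all invented for the example.

```python
# Minimal gradient descent sketch: minimize f(x, y) = (x - 3)^2 + (y + 1)^2,
# a bowl-shaped landscape whose lowest point is at (3, -1).

def grad_f(x, y):
    """Gradient of f: the direction of steepest ascent at (x, y)."""
    return 2 * (x - 3), 2 * (y + 1)

def gradient_descent(x, y, step=0.1, iterations=200):
    """Repeatedly step downhill, i.e. against the gradient."""
    for _ in range(iterations):
        gx, gy = grad_f(x, y)
        x -= step * gx
        y -= step * gy
    return x, y

x, y = gradient_descent(0.0, 0.0)
print(round(x, 4), round(y, 4))  # approaches (3, -1)
```

On a smooth bowl like this one the method converges quickly; the new result concerns how badly it can behave on far less friendly landscapes.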
Gradient descent is an essential tool of modern applied research, but there are many common problems for which it does not work well. Prior to this research, however, there was no comprehensive understanding of exactly when and why gradient descent struggles, questions that another area of computer science, known as computational complexity theory, helped to answer.
“A lot of the work on gradient descent wasn’t talking with complexity theory,” said Costis Daskalakis of the Massachusetts Institute of Technology.
Computational complexity is the study of resources, often computation time, required to solve or verify solutions to various computing problems. Researchers classify problems into different classes, with all problems in the same class sharing some fundamental computational characteristics.
To take an example, one that is relevant to the new paper, imagine a town with more people than households, where everyone lives in some house. You are given a phone book with the names and addresses of everyone in town, and you are asked to find two people who live in the same house. You know you can find an answer, because there are more people than houses, but finding it may take some searching (especially if the two don’t share a last name).
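The phone-book search can be made concrete with a short sketch. The names and addresses here are invented; the point is that a solution is guaranteed by counting, yet finding it still requires scanning entries.

```python
# Sketch of the phone-book search: more people than households guarantees
# a shared address exists, but we still have to look for it.

def find_housemates(phone_book):
    """Return two people listed at the same address, or None if all differ."""
    seen = {}  # address -> first person found living there
    for name, address in phone_book:
        if address in seen:
            return seen[address], name
        seen[address] = name
    return None

book = [("Ada", "1 Elm St"), ("Bert", "2 Oak Ave"),
        ("Cora", "3 Pine Rd"), ("Dev", "1 Elm St")]
print(find_housemates(book))  # ('Ada', 'Dev')
```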
This question belongs to a complexity class called TFNP, short for “total function nondeterministic polynomial.” It is the collection of all computational problems that are guaranteed to have solutions and whose solutions can be quickly checked for correctness. The researchers focused on the intersection of two subsets of problems within TFNP.
The first subset is called PLS (Polynomial Local Search). This is the set of problems that involve finding the minimum or maximum value of a function in a particular region. Answers to these problems are guaranteed to exist and can be found through relatively straightforward reasoning.
One problem that falls into the PLS category is the task of planning a route that lets you visit some fixed number of cities with the shortest travel distance possible, given that you can only change the trip by switching the order of a pair of cities in the tour. The length of any proposed route is easy to calculate, and with a limited set of ways to alter the itinerary, it’s easy to see which changes shorten the trip. You’re guaranteed to eventually find a route you can’t improve with an allowable move: a local minimum.
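This kind of local search can be sketched directly. The code below uses segment-reversal moves (the classic 2-opt style of swap) on four invented cities; it is an illustration of the PLS flavor of problem, not an efficient solver.

```python
# Local search for a short tour: keep applying any allowable move
# (here, reversing a segment of the tour) that shortens the route,
# until no move improves it. The stopping point is a local minimum.
import itertools
import math

def tour_length(tour, coords):
    """Total length of the closed tour through the given cities."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def local_search(tour, coords):
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_length(candidate, coords) < tour_length(tour, coords) - 1e-9:
                tour, improved = candidate, True
    return tour  # no single reversal shortens it further

coords = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
print(local_search(["A", "C", "B", "D"], coords))  # an unimprovable tour
```

On four cities arranged in a square, the search quickly untangles the crossing and stops at the perimeter tour of length 4.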
The second subset is PPAD (Polynomial Parity Arguments on Directed graphs). These problems have solutions that emerge from a more complicated argument called Brouwer’s fixed point theorem. The theorem says that for any continuous function, there is guaranteed to be one point that the function leaves unchanged, a fixed point, as it’s known. This is true in daily life. If you stir a glass of water, the theorem guarantees that there absolutely must be at least one particle of water that will end up in the same place it started.
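In one dimension the fixed-point guarantee can be seen with a few lines of code. Any continuous function mapping [0, 1] into [0, 1] must cross the line y = x somewhere, so bisecting on f(x) - x locates a fixed point. The choice of cosine here is just an illustrative example.

```python
# Brouwer's theorem in one dimension: a continuous f from [0, 1] to
# itself has a point with f(x) = x. Since g(x) = f(x) - x is >= 0 at 0
# and <= 0 at 1, bisection on g homes in on such a point.
import math

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-10):
    """Locate x in [lo, hi] with f(x) = x by bisecting f(x) - x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid  # the crossing lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

x = fixed_point(math.cos)  # cos maps [0, 1] into [cos 1, 1], inside [0, 1]
print(round(x, 6))  # ≈ 0.739085, the point where cos(x) = x
```

In higher dimensions no such simple search exists, which is part of what makes PPAD problems hard: the theorem promises a fixed point but gives no efficient way to find it.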
The intersection of the PLS and PPAD classes itself forms a class of problems known as PPAD ∩ PLS. It contains many natural problems relevant to researchers. Until now, however, researchers had been unable to find a natural problem that is complete for PPAD ∩ PLS, meaning that it is an example of the very hardest problems in the class.
Prior to this paper, the only known PPAD ∩ PLS-complete problem was an artificial construction, a problem sometimes called “Either-Solution.” This problem glued together a complete problem from PLS and a complete problem from PPAD, something a researcher would be unlikely to encounter outside this context. In the new paper, the researchers proved that gradient descent is as hard as Either-Solution, making gradient descent itself PPAD ∩ PLS-complete.
“[The nature of computation] is something that we as a species should try to understand deeply in all its forms. And I think that should be a good reason to be excited about this result,” said Tim Roughgarden of Columbia University.
None of this means that gradient descent will always struggle. In fact, it remains as fast and effective as ever for most uses.
“There is a slightly humorous stereotype about computational complexity that says what we often end up doing is taking a problem that is solved a lot of the time in practice and proving that it’s actually very difficult,” said Goldberg.
But the result does mean that applied researchers should not expect gradient descent to provide precise solutions for some problems where precision matters.
The question of precision speaks to the central concern of computational complexity: the evaluation of resource requirements. There is a fundamental link between precision and speed in many complexity questions. For an algorithm to be considered efficient, you must be able to increase the precision of a solution without paying a correspondingly high price in the time it takes to find it. The new result means that for applications requiring very precise solutions, gradient descent may not be a workable approach.
For example, gradient descent is often used in machine learning in ways that don’t require extreme precision. But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm. That’s not ideal, but it’s not a deal breaker.
But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation altogether intractable.
“It puts a brake on what they can shoot for,” said Daskalakis.
They must, in practice, compromise somewhere. They either accept a less precise solution, limit themselves to slightly easier problems, or find a way to manage an unwieldy runtime.
But that’s not to say a fast algorithm for gradient descent doesn’t exist. It might. The result does mean, though, that any such algorithm would immediately imply the existence of fast algorithms for all the other problems in PPAD ∩ PLS.
“Many problems may be some advance in mathematics away from being solved,” said Daskalakis. “That’s why we like having a very natural problem, like gradient descent, that captures the complexity of the whole intersection.”