Very good extensive piece.
A Deeper Understanding of Deep Learning
By Don Monroe
Communications of the ACM, June 2022, Vol. 65 No. 6, Pages 19-20, DOI: 10.1145/3530686
Deep learning should not work as well as it seems to: according to traditional statistics and machine learning, any analysis that has too many adjustable parameters will overfit noisy training data, and then fail when faced with novel test data. In clear violation of this principle, modern neural networks often use vastly more parameters than data points, but they nonetheless generalize to new data quite well.
The shaky theoretical basis for generalization has been noted for many years. One proposal was that neural networks implicitly perform some sort of regularization—a statistical tool that penalizes the use of extra parameters. Yet efforts to formally characterize such an "implicit bias" toward smoother solutions have failed, said Roi Livni, an advanced lecturer in the department of electrical engineering of Israel's Tel Aviv University. "It might be that it's like a needle in a haystack, and if we look further, in the end we will find it. But it also might be that the needle is not there."
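For contrast, the implicit bias is well understood in the simplest overparameterized model: linear least squares with more parameters than data points. This is not from the article, and it says nothing about deep networks, which is where Livni's open question lies. Gradient descent started from w = 0 only ever moves along X^T times the residual, so w stays in the row space of X, and the one exact fit in that space is the minimum-norm interpolating solution. The sketch below checks this numerically. The sizes (5 points, 20 parameters), step size, iteration count, and helper names are hypothetical choices made for illustration.

```python
import random

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gradient_descent(X, y, lr=0.01, steps=20000):
    """Minimize 0.5 * ||Xw - y||^2 with plain gradient descent, starting from w = 0.

    lr = 0.01 keeps lr * (largest eigenvalue of X X^T) well below 2 for these
    sizes, so the iteration is stable.
    """
    Xt = transpose(X)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]  # Xw - y
        grad = matvec(Xt, residual)  # X^T (Xw - y), always in the row space of X
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def min_norm_interpolant(X, y):
    """Closed form w = X^T (X X^T)^{-1} y: the smallest-norm w with Xw = y."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]
    return matvec(transpose(X), solve(gram, y))

random.seed(0)
n, d = 5, 20  # hypothetical sizes: 5 data points, 20 parameters
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

w_gd = gradient_descent(X, y)
w_mn = min_norm_interpolant(X, y)
train_err = max(abs(p - t) for p, t in zip(matvec(X, w_gd), y))
gap = max(abs(a - b) for a, b in zip(w_gd, w_mn))
print(f"max training residual: {train_err:.1e}")
print(f"max |w_gd - w_min_norm|: {gap:.1e}")
```

Adding any vector from the null space of X to the minimum-norm solution gives another exact fit with a larger norm. Gradient descent from zero never selects one of those alternatives. That preference is the "implicit bias" in this linear setting.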
A Profusion of Parameters
Recent research has clarified that learning systems operate in an entirely different regime when they are highly overparameterized, such that more parameters let them generalize better. Moreover, this property is shared not just by neural networks but by more comprehensible methods, which makes more systematic analysis possible.
"People were kind of aware that there were two regimes," said Mikhail Belkin, a professor in the Halıcıoğlu Data Science Institute of the University of California, San Diego. However, "I think the clean separation definitely was not understood" prior to work he and colleagues published in 2019. "What you do in practice," such as forced regularization or early stopping of training, "mixes them up." ...