A good, short, relatively non-technical overview, written from a distinctly computer science perspective. It answers the questions many of us have: why are we here now, and why did we not achieve this two decades ago? And where should we expect to arrive in the coming years?
Deep Learning Hunts for Signals Among the Noise
Neural networks can deliver surprising and sometimes unwanted results.
By Chris Edwards, Communications of the ACM, June 2018, Vol. 61 No. 6, Pages 13-14, 10.1145/3204445
Over the past decade, advances in deep learning have transformed the fortunes of the artificial intelligence (AI) community. The neural network approach that researchers had largely written off by the end of the 1990s now seems likely to become the most widespread technology in machine learning. However, protagonists find it difficult to explain why deep learning often works well, but is prone to seemingly bizarre failures.
The success of deep learning rests on rapid improvements in computational power, driven by highly parallelized microprocessors, and on the discovery of ways to train networks with enormous numbers of virtual neurons assembled into dozens of linked layers. Before these advances, neural networks were limited to simple structures that were easily outclassed in image and audio classification tasks by other machine-learning architectures such as support vector machines.
Theorists have long assumed that networks with hundreds of thousands of neurons, and orders of magnitude more individually weighted connections between them, should suffer from a fundamental problem: over-parameterization. With so many weights determining how much each neuron influences its neighbors, the network could simply memorize the data used to train it. It would then correctly classify anything in the training set, but fail miserably when presented with new data.
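The over-parameterization worry can be seen in a minimal sketch (not from the article): a linear model with more weights than training examples can fit even purely random labels exactly, yet its predictions on fresh data are no better than chance.

```python
# Sketch of over-parameterization: 100 weights, only 20 training examples.
# The minimum-norm least-squares fit matches random labels perfectly on
# the training set but generalizes poorly, since there is no signal to learn.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_features = 20, 100            # far more parameters than examples
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.choice([0.0, 1.0], size=n_train)   # labels are pure noise

# Minimum-norm solution via the pseudo-inverse: fits the training set exactly.
w = np.linalg.pinv(X_train) @ y_train

train_error = np.abs(X_train @ w - y_train).max()

# Fresh data drawn the same way: the memorized weights carry no information.
X_test = rng.normal(size=(n_train, n_features))
y_test = rng.choice([0.0, 1.0], size=n_train)
test_error = np.abs(X_test @ w - y_test).mean()

print(f"max training error: {train_error:.2e}")  # essentially zero
print(f"mean test error:    {test_error:.2f}")   # far from zero
```

This is the classical picture the article says deep networks defy: by the same parameter-counting logic they should memorize, yet in practice they generalize.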
In practice, deep neural networks do not fall easily into over-parameterization; instead, they are surprisingly good at dealing with new data. When trained, they seem able to ignore the parts of training images that have little bearing on classification performance, rather than building synaptic connections to encode them.
Stefano Soatto, professor of computer science at the University of California, Los Angeles (UCLA), explains: "Most of the variability in images is irrelevant to the task. For instance, if you want to recognize a friend in a picture, you want to do so regardless of where she will be, how she will be dressed, whether she is partially occluded, what sensor will be used for the picture, etc. If you think of all the possible images of your friend, they are, for all practical purposes, infinite. So if you wanted a minimal representation—something that distills the essence of 'your friend' in every possible future image of her—that should be a much, much smaller object than an image."