/* ---- Google Analytics Code Below */

Thursday, February 28, 2019

Beyond Worse Case Analysis

A look at the worst case in performance.  This deals more with optimization methods, those that can find specific correct examples.    So for example sorting a file is given as a common problem.    There is a precise correct result that is the sorted file.  Now how long does it take to sort very large files, with many sort parameters?   This is different for deep learning methods, where the result depends on some chosen solution architecture, and is not expected to be optimally and provably best.   (But as the paper suggests, often is!)  A number of interesting problems often addressed, like classifiers, are mentioned.   But I don't often expect a classifier to produce optimal answers.

 Video intro by the author.

Paper is technical.

Beyond Worst-Case Analysis By Tim Roughgarden 

Communications of the ACM, March 2019, Vol. 62 No. 3, Pages 88-96   10.1145/3232535

Comparing different algorithms is hard. For almost any pair of algorithms and measure of algorithm performance like running time or solution quality, each algorithm will perform better than the other on some inputs.a For example, the insertion sort algorithm is faster than merge sort on already-sorted arrays but slower on many other inputs. When two algorithms have incomparable performance, how can we deem one of them "better than" the other? ....... '

The author also talks about current 'Deep learning':

" ... To illustrate some of the challenges, consider a canonical supervised learning problem, where a learning algorithm is given a dataset of object-label pairs and the goal is to produce a classifier that accurately predicts the label of as-yet-unseen objects (for example, whether or not an image contains a cat). Over the past decade, aided by massive datasets and computational power, deep neural networks have achieved impressive levels of performance across a range of prediction tasks.25 Their empirical success flies in the face of conventional wisdom in multiple ways. First, most neural network training algorithms use first-order methods (that is, variants of gradient descent) to solve nonconvex optimization problems that had been written off as computationally intractable. Why do these algorithms so often converge quickly to a local optimum, or even to a global optimum?q Second, modern neural networks are typically over-parameterized, meaning that the number of free parameters (weights and biases) is considerably larger than the size of the training dataset. Over-parameterized models are vulnerable to large generalization error (that is, overfitting), but state-of-the-art neural networks generalize shockingly well.40 How can we explain this? The answer likely hinges on special properties of both real-world datasets and the optimization algorithms used for neural network training (principally stochastic gradient descent)  .... 

 ... With algorithms increasingly dominating our world, the need to understand when and why they work has never been greater. The field of beyond worst-case analysis has already produced several striking results, but there remain many unexplained gaps between the theoretical and empirical performance of widely used algorithms. With so many opportunities for consequential research, I suspect the best work in the area is yet to come. .... " 

Further thinking the implications of this.   But it does make us think about how such algorithms should be used, and their inherent risk.

No comments: