
Saturday, June 29, 2019

Nearest Neighbor

A method we have worked with for practical purposes from very early on. In our applications we never needed the optimal answer, just a good one, because too much else in the context made our measures inaccurate.

Good Algorithms Make Good Neighbors, by Erica Klarreich
Communications of the ACM, July 2019, Vol. 62 No. 7, Pages 11-13, DOI: 10.1145/3329712

A host of different tasks—such as identifying the song in a database most similar to your favorite song, or the drug most likely to interact with a given molecule—have the same basic problem at their core: finding the point in a dataset that is closest to a given point. This "nearest neighbor" problem shows up all over the place in machine learning, pattern recognition, and data analysis, as well as many other fields.
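
To make the core problem concrete, here is a minimal brute-force sketch in Python (my illustration, not from the article); the toy "songs" feature vectors and the choice of Euclidean distance are assumptions:

import math

def euclidean(a, b):
    # Ordinary Euclidean (L2) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(points, query, dist=euclidean):
    # Exact linear scan: compare the query against every point.
    # Costs O(n * d) per query; fast algorithms aim to beat this.
    return min(points, key=lambda p: dist(p, query))

songs = [(0.2, 0.9), (0.8, 0.1), (0.5, 0.5)]  # toy feature vectors
print(nearest_neighbor(songs, (0.6, 0.4)))    # -> (0.5, 0.5)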

Yet the nearest neighbor problem is not really a single problem. Instead, it has as many different manifestations as there are different notions of what it means for data points to be similar. In recent decades, computer scientists have devised efficient nearest neighbor algorithms for a handful of different definitions of similarity: the ordinary Euclidean distance between points, and a few other distance measures.
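
The exhaustive scan above accommodates any of these notions simply by swapping the distance function; a sketch reusing nearest_neighbor and songs from the code above (the two measures shown are standard examples, not the article's list):

def manhattan(a, b):
    # L1 ("taxicab") distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # L-infinity distance: the largest single coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

print(nearest_neighbor(songs, (0.6, 0.4), dist=manhattan))
print(nearest_neighbor(songs, (0.6, 0.4), dist=chebyshev))

The hard part, which the article goes on to describe, is doing better than this exhaustive scan, and doing so without redesigning the algorithm for each new measure.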

However, "every time you needed to work with a new space or distance measure, you would kind of have to start from scratch" in designing a nearest neighbor algorithm, said Rasmus Pagh, a computer scientist at the IT University of Copenhagen. "Each space required some kind of craftsmanship."

Because distance measures are so varied, many computer scientists doubted these ad hoc methods would ever give way to a more general approach that could cover many different distance measures at once. Now, however, a team of five computer scientists has proven the doubters—who originally included themselves—were wrong.

In a pair of papers published last year (in the Proceedings of the ACM Symposium on Theory of Computing and the IEEE Annual Symposium on Foundations of Computer Science, respectively), the researchers set forth an efficient approximation algorithm for nearest neighbor search that covers a wide class of distance functions. Their algorithm finds, if not the very closest neighbor, then one that's almost as close, which is good enough for many applications.
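
"Almost as close" is commonly formalized as a c-approximate nearest neighbor: the algorithm may return any point whose distance to the query is within a factor c of the true minimum. A small checker for that guarantee, reusing the helpers from the sketches above (the factor and data are illustrative):

def is_c_approximate(points, query, candidate, c, dist=euclidean):
    # True if the candidate is within factor c of the true nearest distance.
    best = min(dist(p, query) for p in points)
    return dist(candidate, query) <= c * best

print(is_c_approximate(songs, (0.6, 0.4), (0.8, 0.1), c=3.0))  # -> True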

The distance functions covered by the new algorithm, called norms, "encompass the majority of interesting distance functions," said Piotr Indyk, a computer scientist at the Massachusetts Institute of Technology.
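
A norm assigns each vector a length and induces a distance via the length of the difference vector. As one illustrative slice of this class (not the papers' construction), the l_p family alone interpolates between several of the measures sketched above:

def p_norm_distance(a, b, p=2.0):
    # Distance induced by the l_p norm: d(a, b) = (sum_i |a_i - b_i|^p)^(1/p).
    # p = 2 is Euclidean, p = 1 is Manhattan; any p >= 1 yields a norm.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)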

The new algorithm is a big leap forward, said Pagh, who added, "I wouldn't have guessed such a general result was possible." ...

(Links to technical issues below)
