Sunday, April 07, 2019

Bucketing Data Elements by Use

Could this go further, as a means to value data elements in context? We tested something similar, which established the value of data in the context of its use.

Machine learning moves popular data elements into a bucket of their own
Counting search queries isn’t easy, but MIT CSAIL’s new LearnedSketch system for “frequency-estimation” aims to help.

By Adam Conner-Simons | MIT CSAIL 
April 3, 2019

If you look under the hood of the internet, you’ll find lots of gears churning along that make it all possible.

For example, take a company like AT&T. They have to intimately understand what internet data are going where so that they can better accommodate different levels of usage. But it isn’t practical to precisely monitor every packet of data, because companies simply don’t have unlimited amounts of storage space. (Researchers actually call this the “Britney Spears problem,” named for search engines’ long-running efforts to tally trending topics.)

Because of this, tech companies use special algorithms to roughly estimate the amount of traffic heading to different IP addresses. Traditional frequency-estimation algorithms involve “hashing,” or randomly splitting items into different buckets. But this approach discounts the fact that there are patterns that can be uncovered in high volumes of data, like why one IP address tends to generate more internet traffic than another. ...
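To make the bucketing idea concrete, here is a minimal Python sketch of the two approaches the excerpt contrasts. It is an illustration under stated assumptions, not the authors' implementation. The `CountMinSketch` class is a standard hashing-based estimator. The `LearnedSketch` class and its `is_heavy` predicate are hypothetical names: `is_heavy` stands in for whatever learned model predicts that an item is popular, and here it is just a lookup in a hand-picked set.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Hashing-based frequency estimator; it can overcount but never undercounts."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One deterministic hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the smallest one is the best guess.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


class LearnedSketch:
    """Hypothetical wrapper: predicted-popular items get buckets of their own."""

    def __init__(self, is_heavy, width, depth):
        self.is_heavy = is_heavy      # assumption: stand-in for a learned model
        self.exact = Counter()        # dedicated, collision-free buckets
        self.sketch = CountMinSketch(width, depth)

    def add(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] += count
        else:
            self.sketch.add(item, count)

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact[item]
        return self.sketch.estimate(item)


if __name__ == "__main__":
    popular = {"10.0.0.1"}  # made-up address, assumed to be flagged as popular
    learned = LearnedSketch(lambda ip: ip in popular, width=16, depth=3)
    stream = ["10.0.0.1"] * 1000 + [f"192.168.0.{i}" for i in range(200)]
    for ip in stream:
        learned.add(ip)
    print(learned.estimate("10.0.0.1"), learned.estimate("192.168.0.5"))
```

Because the popular address never shares a counter with anything else, its count is exact, and it no longer inflates the estimates of the rare addresses that would otherwise collide with it in the shared table.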

See:
https://openreview.net/pdf?id=r1lohoCqY7
Learning-Based Frequency Estimation Algorithms

By Chen-Yu Hsu, Piotr Indyk, Dina Katabi & Ali Vakilian