Wednesday, April 27, 2016

Big Data's Problem

Good piece with an apt warning. It's always about some form of sampling. Even if you could get every observation of the data, say because you are controlling an experiment, you won't have all the contextual metadata. Many people have told me that they have all the data, and that's why big data will work. That's better than the old days, when they said they had a handful of observations. But ...

Big Data’s Small Lie – The Limitation of Sampling and Approximation in Big Data Analysis by Alexander Gray

"Volume is the most prominent of big data's "3 Vs." Yet, the "big" in big data analysis is often a misnomer. Most big data analysis doesn't look at a complete, large dataset. Instead, it looks at a subsample and works on approximations, which prevents enterprises from getting the most valuable insight from their data. ..."

