
Tuesday, August 23, 2016

Data Snooping in Practice

Good piece on the topic, though I had never called it that; I thought it was something about security. Another example of the lack of standards in the industry: too many terms have come from different disciplines. Here it is described as a bias of human nature. Also known as data dredging, data fishing, and p-hacking.

" ....Data snooping is essentially the practice of finding patterns in data that don’t actually reflect the real world. Data scientists may know it by other names, like overfitting the curve or confusing the noise for the signal. The simple definition makes it sound like data snooping would be fairly easy to avoid. However, because of the way the human brain works and how it’s wired to spot connections in seemingly disparate pieces of data and events, it’s one of the most difficult biases to eliminate.

Data scientists are particularly prone to data snooping bias when they’re doing freeform exploratory data analysis, as opposed to attempting to prove or disprove a hypothesis before digging into the data. Traditionally, the best way to eliminate the data snooping bias is to institute strict controls in their experiments before they begin. Chasing interesting results once the experiment has started is a good way to fall victim to the snoops.

Over the years, data snooping has been one of the toughest biases to correct for in the world of applied statistics. In particular, data scientists and statisticians who work in the financial field are more prone to data snooping than in other industries, argues MIT professor Andrew Lo.  .... " 
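
To make the idea in the excerpt concrete, here is a minimal sketch of my own (not from the article) of how freeform exploration turns up patterns that are not real. It draws a random target and many random, unrelated features, then counts how many look "significant". The function names, the feature and sample counts, and the seed are all hypothetical; the 0.361 cutoff is assumed as the approximate two-sided 5% critical value for a Pearson correlation with 30 samples.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def snoop(n_features=200, n_samples=30, threshold=0.361, seed=0):
    """Search pure noise for 'significant' features, then retest the best one.

    Returns (number of in-sample hits, best in-sample r, out-of-sample r).
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Target and features are independent noise: no real relationship exists.
    target = noise()
    features = [noise() for _ in range(n_features)]

    # Exploratory step: test every feature and keep whatever looks interesting.
    in_sample = [pearson(f, target) for f in features]
    hits = [i for i, r in enumerate(in_sample) if abs(r) > threshold]
    best = max(range(n_features), key=lambda i: abs(in_sample[i]))

    # Holdout step: fresh draws stand in for new data on the "winning" feature.
    out_of_sample = pearson(noise(), noise())
    return len(hits), in_sample[best], out_of_sample


if __name__ == "__main__":
    hits, best_r, holdout_r = snoop()
    print(f"'significant' noise features: {hits}")
    print(f"best in-sample r: {best_r:.3f}")
    print(f"same feature on fresh data: {holdout_r:.3f}")
```

With a 5% threshold, roughly one feature in twenty should clear the bar by chance alone, and the strongest of them generally will not hold up on fresh data. Fixing the hypothesis before looking, or keeping a holdout set that the exploration never touches, is the kind of control the excerpt is talking about.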
