Monday, August 01, 2016

Machine Learning and Data Leakage

I admit I knew the concept, but not the term itself: data leakage, the use of data from outside a training set to build a model. Well, surprise: in practice this is always the case. It is usually needed to adjust models to changes in context that did not even exist when the training data set was gathered. In general we are not seeking fundamental truths; as one collaborator once said: "This is not Physics we are doing." Interesting related thoughts in this article.
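
To make the definition concrete, here is a minimal sketch in Python of one common way outside data slips into a model: preparing the training values with statistics that also include held-out values. The toy numbers and the helper name `standardize` are made up for illustration, and this shows only one simple form of leakage, not a general treatment.

```python
from statistics import mean, pstdev

def standardize(values, reference):
    """Scale values using the mean and standard deviation of reference."""
    mu = mean(reference)
    sigma = pstdev(reference)
    return [(v - mu) / sigma for v in values]

# Hypothetical toy data: the held-out values come from a shifted context.
train = [1.0, 2.0, 3.0, 4.0]
held_out = [10.0, 12.0]

# No leakage: the scaling statistics come from the training set only.
clean = standardize(train, reference=train)

# Leakage: held-out values influence the statistics used to prepare
# the training data, so the model indirectly "sees" them.
leaky = standardize(train, reference=train + held_out)

print(clean)
print(leaky)
```

The two printed lists differ, which is the whole point: whatever is fitted on the second one has been shaped by data that was never in the training set.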

