Determining influence is a very frequent need in business analytics. It can guide us toward what data to collect, and it is often a prelude to further analysis. I have always thought it would be useful to have a more standard way of measuring it. Now there is the I-score. I am reading the paper and plan to follow the work.
In PhysOrg News: Researchers at Princeton, Columbia and Harvard have created a new method to analyze big data that better predicts outcomes in health care, politics and other fields.
The study appears this week in the journal Proceedings of the National Academy of Sciences. (Abstract of technical paper)
In previous studies, the researchers showed that significant variables might not be predictive and that good predictors might not appear statistically significant. This posed an important question: how can we find highly predictive variables if not through a guideline of statistical significance? Common approaches to prediction include using a significance-based criterion for evaluating variables to use in models and evaluating variables and models simultaneously for prediction using cross-validation or independent test data.
In an effort to reduce the error rate with those methods, the researchers proposed a new measure called the influence score, or I-score, to better measure a variable's ability to predict. They found that the I-score is effective in differentiating between noisy and predictive variables in big data and can significantly improve the prediction rate. For example, the I-score improved the prediction rate in breast cancer data from 70 percent to 92 percent. The I-score can be applied in a variety of fields, including terrorism, civil war, elections and financial markets. ... "
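To make the idea concrete for myself, here is a minimal sketch of a partition-based influence score. It is my own assumption and is not taken from the article above. It follows the form I understand from the authors' earlier work: discrete explanatory variables split the observations into cells, and a cell raises the score when its mean outcome departs from the overall mean, weighted by the squared cell size. The function name `i_score` and the normalization by n times the outcome variance are my choices, so check the paper for the exact definition.

```python
import numpy as np


def i_score(X, y):
    """Hypothetical partition-based influence score (illustrative sketch only).

    X: discrete explanatory variables, shape (n,) or (n, k).
    y: outcomes, shape (n,).
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]

    n = len(y)
    y_bar = y.mean()
    var = y.var()  # population variance; the normalization is an assumption
    if var == 0:
        return 0.0

    # Each distinct row of X defines one cell of the partition.
    _, cell = np.unique(X, axis=0, return_inverse=True)
    cell = cell.ravel()

    counts = np.bincount(cell)             # n_j: observations per cell
    means = np.bincount(cell, weights=y) / counts  # mean outcome per cell

    # Sum of n_j^2 * (cell mean - overall mean)^2, scaled by n * variance.
    return float(np.sum(counts**2 * (means - y_bar) ** 2) / (n * var))


y = [0, 0, 1, 1]
print(i_score([[0], [0], [1], [1]], y))  # variable that separates the outcomes: 2.0
print(i_score([[0], [1], [0], [1]], y))  # variable unrelated to the outcomes: 0.0
```

On this toy data the variable that separates the outcomes scores well above the unrelated one, which is the kind of contrast between predictive and noisy variables the article describes. It says nothing about how the score behaves on real data.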