
Saturday, June 02, 2018

Machine Learning vs Statistical Modeling

Good thoughts here; it's not always best to follow the hype. Fairly technical, but useful.

Road Map for Choosing Between Statistical Modeling and Machine Learning  by FHarrell

It is often good to let the data speak. But you must be comfortable in assuming that the data are speaking rationally. Data can fool you.

Whether using statistical modeling or machine learning, work with a methodologist who knows what she is doing, and don't begin an analysis without ample subject matter input.

Data analysis methods may be described by their areas of applications, but for this article I’m using definitions that are strictly methods-oriented. A statistical model (SM) is a data model that incorporates probabilities for the data generating mechanism and has identified unknown parameters that are usually interpretable and of special interest, e.g., effects of predictor variables and distributional parameters about the outcome variable. The most commonly used SMs are regression models, which potentially allow for a separation of the effects of competing predictor variables. SMs include ordinary regression, Bayesian regression, semiparametric models, generalized additive models, longitudinal models, time-to-event models, penalized regression, and others. Penalized regression includes ridge regression, lasso, and elastic net. Contrary to what some machine learning (ML) researchers believe, SMs easily allow for complexity (nonlinearity and second-order interactions) and an unlimited number of candidate features (if penalized maximum likelihood estimation or Bayesian models with sharp skeptical priors are used). It is especially easy, using regression splines, to allow every continuous predictor to have a smooth nonlinear effect....
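To make the last point of the excerpt concrete, here is a minimal sketch of a penalized regression spline in plain Python. It is not taken from the article: the cubic truncated-power basis, the knot locations, the ridge penalty value `lam`, the simulated sine-curve data, and every function name are assumptions chosen only for illustration. In practice one would more likely reach for an established statistics package than hand-rolled linear algebra.

```python
import math
import random


def spline_basis(x, knots):
    """Cubic truncated-power basis for one continuous predictor (assumed basis)."""
    return [x, x ** 2, x ** 3] + [max(x - k, 0.0) ** 3 for k in knots]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def fit_ridge_spline(xs, ys, knots, lam):
    """Ridge-penalized least squares on the spline basis.

    The intercept is left unpenalized; every spline coefficient is shrunk
    toward zero by adding lam to its diagonal entry of X'X.
    """
    X = [[1.0] + spline_basis(x, knots) for x in xs]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    for i in range(1, p):
        XtX[i][i] += lam
    return solve(XtX, Xty)


def predict(beta, x, knots):
    """Evaluate the fitted smooth curve at a single x."""
    return sum(b * v for b, v in zip(beta, [1.0] + spline_basis(x, knots)))


# Simulated data (an assumption for illustration): a sine curve plus noise.
rng = random.Random(0)
knots = [0.25, 0.5, 0.75]
xs = [rng.random() for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]

beta = fit_ridge_spline(xs, ys, knots, lam=1e-3)

# Compare the fitted curve with the true curve on an interior grid.
grid = [0.05 + 0.9 * i / 50 for i in range(51)]
rmse = math.sqrt(
    sum((predict(beta, x, knots) - math.sin(2 * math.pi * x)) ** 2 for x in grid)
    / len(grid)
)
print(f"RMSE against the true curve: {rmse:.3f}")
```

Raising `lam` shrinks the spline coefficients harder and flattens the curve; lowering it toward zero approaches an ordinary unpenalized spline fit. The same pattern extends to several predictors by concatenating one basis per continuous variable.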
