Yes, often a problem, and often mainly for lack of sufficient data in context. Seems Google knows this too. Key issue: credibility and value. Note the COVID example. Technical.
Googlers Speak Out on the Scourge of ML Underspecification
Oliver Peckham in Datanami
A few days ago, 40 authors (all but a handful hailing from Google) published a 59-page paper (https://arxiv.org/pdf/2011.03395.pdf). The topic at hand: why so many machine learning models, borne out by internal testing, go on to fail spectacularly in real-world applications. The answer, the Googlers say, is underspecification: a blight on machine learning that, they stress, requires substantive solutions.
“An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain,” they write. In plain language: an underspecified model can arrive at many reasonably accurate explanations for why a dataset looks the way it does. The problem arises when researchers assume that all of those explanations are equally valid based solely on the model’s training results, without accounting for real-world factors that may have escaped the model’s training process. In those situations, the authors say, “ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains.”
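To make the idea concrete, here is a small, hypothetical Python sketch (not from the paper): two predictors are trained on the same data, tie on a held-out set from the training domain, and then diverge sharply once a spurious correlation flips at deployment. The feature names and data-generating process are invented for illustration.

```python
# Toy illustration of underspecification (hypothetical data, not the paper's).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    """Label drives a stable 'core' feature; a 'spurious' feature tracks the
    label with probability `spurious_corr` (high in training, low after shift)."""
    y = rng.integers(0, 2, n)
    core = y + rng.normal(0, 0.4, n)                       # stable signal
    spurious = np.where(rng.random(n) < spurious_corr, y, 1 - y).astype(float)
    return np.column_stack([core, spurious]), y

X_train, y_train = make_data(5000, spurious_corr=0.9)
X_iid,   y_iid   = make_data(5000, spurious_corr=0.9)      # held-out, same domain
X_shift, y_shift = make_data(5000, spurious_corr=0.1)      # deployment domain

# Two predictors with equivalently strong held-out performance: one leans on
# the core feature, the other on the spurious one.
m_core = LogisticRegression().fit(X_train[:, [0]], y_train)
m_spur = LogisticRegression().fit(X_train[:, [1]], y_train)

print("held-out accuracy:  ", m_core.score(X_iid[:, [0]], y_iid),
      m_spur.score(X_iid[:, [1]], y_iid))                  # roughly tied (~0.9)
print("deployment accuracy:", m_core.score(X_shift[:, [0]], y_shift),
      m_spur.score(X_shift[:, [1]], y_shift))              # ~0.9 vs ~0.1
```

By the held-out score alone the two models look interchangeable; nothing in the training-domain evaluation reveals that one of them depends on a relationship that does not survive deployment.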
By way of illustration, the Googlers highlight examples spanning “computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics.” In epidemiology, for instance, they discuss how early data from an epidemic (such as the COVID-19 pandemic) is easily explained by a variety of models that do not substantively account for major factors – such as the gradually diminishing number of susceptible people in an area as an epidemic infects (and then renders immune) larger and larger portions of the populace.
“Importantly, during the early stages of an epidemic … the parameters of the model are underspecified by this training task,” they write. “This is because, at this stage, the number of susceptible is approximately constant at the total population size (N), and the number of infections grows approximately exponentially.”
As a result, they say, “arbitrary choices in the learning process” determine which parameters are deemed most predictive by the model, despite different models predicting “peak infection numbers, for example, that are orders of magnitude apart.”
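The mechanics of that example are easy to reproduce. In a standard SIR model, infections evolve as dI/dt = βSI/N − γI, so while S ≈ N the infected count grows roughly as I0·e^((β−γ)t) and early data pins down only the difference β − γ. The sketch below (illustrative parameters and a crude integrator, not the paper’s code) simulates two parameter pairs with the same difference: their early trajectories are nearly identical, but their peaks differ by roughly two orders of magnitude.

```python
# Illustrative SIR simulation showing underspecification of (beta, gamma)
# by early epidemic data. All numbers are made up for demonstration.
import numpy as np

def simulate_sir(beta, gamma, N=1_000_000, I0=10.0, days=365, dt=0.1):
    """Crude forward-Euler SIR integration; returns infected count per day."""
    S, I = N - I0, I0
    daily = []
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * S * I / N * dt   # new infections this step
            new_rec = gamma * I * dt          # recoveries this step
            S, I = S - new_inf, I + new_inf - new_rec
        daily.append(I)
    return np.array(daily)

# Both pairs share beta - gamma = 0.1, so while S ~ N the infected count grows
# as I0 * exp(0.1 * t) in both cases: early data cannot tell them apart.
mild   = simulate_sir(beta=0.90, gamma=0.80)  # R0 ~ 1.1: small eventual peak
severe = simulate_sir(beta=0.12, gamma=0.02)  # R0 = 6: a far larger peak

print("infected at day 30:", round(mild[29]), "vs", round(severe[29]))   # near-equal
print("peak infected:     ", round(mild.max()), "vs", round(severe.max()))  # ~100x apart
```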
The paper, titled “Underspecification Presents Challenges for Credibility in Modern Machine Learning,” is available in full on arXiv.