
Tuesday, September 15, 2020

Inconsistent Benchmarking Found

Important finding. Further classification of the forms of inconsistency would also be useful for pre-checking new papers later.
Researchers find ‘inconsistent’ benchmarking across 3,867 AI research papers, by Kyle Wiggers in VentureBeat

The metrics used to benchmark AI and machine learning models often inadequately reflect those models’ true performances. That’s according to a preprint study from researchers at the Institute for Artificial Intelligence and Decision Support in Vienna, which analyzed data in over 3,000 model performance results from the open source web-based platform Papers with Code. They claim that alternative, more appropriate metrics are rarely used in benchmarking and that the reporting of metrics is inconsistent and unspecific, leading to ambiguities.

Benchmarking is an important driver of progress in AI research. A task (or tasks) and the metrics associated with it (or them) can be perceived as an abstraction of a problem the scientific community aims to solve. Benchmark data sets are conceptualized as fixed representative samples for tasks to be solved by a model. But while benchmarks covering a range of tasks including machine translation, object detection, or question-answering have been established, the coauthors of the paper claim some — like accuracy (i.e., the ratio of correctly predicted samples to the total number of samples) — emphasize certain aspects of performance at the expense of others. ... "
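To make the point about accuracy concrete, here is a minimal sketch on made-up data. It uses the definition of accuracy quoted above (correctly predicted samples divided by total samples). The comparison metric, balanced accuracy (the mean of per-class recall), is my own choice for illustration; the excerpt does not say which alternative metrics the paper has in mind. The labels and the "always predict the majority class" model are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Ratio of correctly predicted samples to the total number of samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (illustrative alternative, not from the paper)."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of all samples that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# Hypothetical model that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The model never finds a single positive example, yet accuracy alone reports 95%. Reporting only that number would hide exactly the kind of performance aspect the quoted passage is describing.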
