Much more at the link; a useful consideration.
Microsoft Research Blog
Assessing AI system performance: thinking beyond models to deployment contexts
Published September 26, 2022
By Cecily Morrison, Principal Research Manager; Martin Grayson, Principal Research Software Development Engineer; Camilla Longden, Senior Research Software Development Engineer
AI systems are becoming increasingly complex as we move from visionary research to deployable technologies such as self-driving cars, clinical predictive models, and novel accessibility devices. Unlike with singular AI models, it is more difficult to assess whether these more complex AI systems are performing consistently and as intended to realize human benefit.
What makes an AI system complex?
How do we know when these more advanced systems are ‘good enough’ for their intended use? When assessing the performance of AI models, we often rely on aggregate performance metrics such as percentage accuracy. But this ignores the many, often human, elements that make up an AI system.
Our research on what it takes to build forward-looking, inclusive AI experiences has demonstrated that getting to ‘good enough’ requires multiple performance assessment approaches at different stages of the development lifecycle, based upon realistic data and key user needs (figure 1).
Shifting emphasis gradually from iterative adjustments in the AI models themselves toward approaches that improve the AI system as a whole has implications not only for how performance is assessed, but also for who should be involved in the performance assessment process. Engaging (and training) non-technical domain experts earlier (i.e., for choosing test data or defining experience metrics) and in a larger capacity throughout the development lifecycle can enhance relevance, usability, and reliability of the AI system.
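To make the concern about aggregate metrics concrete, here is a minimal, hypothetical sketch in Python (the user groups and outcomes are invented for illustration, not taken from the article) of how a single accuracy number can mask very different experiences for different groups of users, which is part of why disaggregated and experience-oriented measures matter alongside model accuracy.

# Minimal sketch: aggregate accuracy vs. per-group accuracy (all data invented).
from collections import defaultdict

# Each record: (user_group, prediction_was_correct) -- hypothetical outcomes.
records = [
    ("indoor_users", True), ("indoor_users", True), ("indoor_users", True),
    ("indoor_users", True), ("indoor_users", True),
    ("outdoor_users", True), ("outdoor_users", False), ("outdoor_users", False),
]

# One aggregate number for the whole system.
print(f"aggregate accuracy: {sum(ok for _, ok in records) / len(records):.0%}")

# Disaggregating by user group shows where the system falls short for some users.
by_group = defaultdict(list)
for group, ok in records:
    by_group[group].append(ok)
for group, outcomes in by_group.items():
    print(f"{group}: {sum(outcomes) / len(outcomes):.0%}")

In this toy example the aggregate score looks reasonable (75%), while one group of users sees the system fail most of the time; involving domain experts in choosing test data and defining such metrics is exactly the kind of practice the article argues for.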