The Eponymous Pickle: Performance


Wednesday, April 05, 2023

As AI Continues to Surpass Human Performance, it's Time to Reevaluate Tests

Evaluating performance by AI and by humans. Implications? Most interesting. Well worth a closer look.

As AI continues to surpass human performance, it's time to reevaluate tests, says expert
By Shana Lynch, Stanford University


How good is AI? According to most of the technical performance benchmarks we have today, it's nearly perfect. But that doesn't mean most artificial intelligence tools work the way we want them to, says Vanessa Parli, associate director of research programs at the Stanford Institute for Human-Centered AI and a member of the AI Index steering committee.

She cites the current popular example of ChatGPT. "There's been a lot of excitement, and it meets some of these benchmarks quite well," she said. "But when you actually use the tool, it gives incorrect answers, says things we don't want it to say, and is still difficult to interact with."

In the newest AI Index, published on April 3, a team of independent researchers analyzed over 50 benchmarks in vision, language, speech, and more to find out that AI tools are able to score extremely high on many of these evaluations.

"Most of the benchmarks are hitting a point where we cannot do much better, 80-90% accuracy," she said. "We really need to be thinking about how we, as humans and society, want to interact with AI, and develop new benchmarks from there."

In this conversation, Parli explains more about the benchmarking trends she sees from the AI Index.

What do you mean by benchmark?

A benchmark is essentially a goal for the AI system to hit. It's a way of defining what you want your tool to do, and then working toward that goal. One example is HAI Co-Director Fei-Fei Li's ImageNet, a dataset of over 14 million images. Researchers run their image classification algorithms on ImageNet as a way to test their system. The goal is to correctly identify as many of the images as possible.

What did the AI Index study find regarding these benchmarks?

We looked across multiple technical benchmarks that have been created over the past dozen years (around vision, around language, etc.) and evaluated the state-of-the-art result in each benchmark year over year. So, for each benchmark, were researchers able to beat the score from last year? Did they meet it? Or was there no progress at all? We looked at ImageNet, a language benchmark called SuperGLUE, a hardware benchmark called MLPerf, and more; some 50 were analyzed and over 20 made it into the report.

And what did you find in your research?

In earlier years, people were improving significantly on the past year's state of the art or best performance. This year across the majority of the benchmarks, we saw minimal progress to the point we decided not to include some in the report. For example, the best image classification system on ImageNet in 2021 had an accuracy rate of 91%; 2022 saw only a 0.1 percentage point improvement.

So we're seeing a saturation among these benchmarks—there just isn't really any improvement to be made.

Additionally, while some benchmarks are not hitting the 90% accuracy range, they are beating the human baseline. For example, the Visual Question Answering Challenge tests AI systems with open-ended textual questions about images. This year, the top performing model hit 84.3% accuracy. Human baseline is about 80%. ... '
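To make the year-over-year comparison concrete for myself, here is a minimal Python sketch of that check. The function names and the 0.5-point saturation cutoff are my own assumptions, not anything from the AI Index; the ImageNet figures take the quoted 91% as exactly 91.0 plus the quoted 0.1-point gain, and the human baseline takes "about 80%" as 80.0.

```python
from typing import Dict, List, Tuple


def year_over_year_gains(best_scores: Dict[int, float]) -> List[Tuple[int, float]]:
    """Return (year, gain in percentage points over the previous recorded year)."""
    years = sorted(best_scores)
    return [
        (cur, round(best_scores[cur] - best_scores[prev], 2))
        for prev, cur in zip(years, years[1:])
    ]


def is_saturated(best_scores: Dict[int, float], min_gain: float = 0.5) -> bool:
    """Hypothetical rule: the latest gain is below min_gain percentage points."""
    gains = year_over_year_gains(best_scores)
    return bool(gains) and gains[-1][1] < min_gain


# Best ImageNet accuracy per year, as read from the excerpt above (assumed exact).
imagenet = {2021: 91.0, 2022: 91.1}
print(year_over_year_gains(imagenet))
print(is_saturated(imagenet))

# Visual Question Answering: top model vs. the approximate human baseline.
vqa_top, vqa_human = 84.3, 80.0
print(round(vqa_top - vqa_human, 1), "points above the human baseline")
```

The cutoff is the judgment call here: a benchmark near its ceiling and a benchmark that is simply hard could both show small gains, so the number alone does not say which case applies.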

Saturday, October 01, 2022

Assessing AI System Performance

Much more at the link, useful consideration.

Microsoft Research Blog

Assessing AI system performance: thinking beyond models to deployment contexts

Published September 26, 2022

By Cecily Morrison, Principal Research Manager; Martin Grayson, Principal Research Software Development Engineer; Camilla Longden, Senior Research Software Development Engineer

AI systems are becoming increasingly complex as we move from visionary research to deployable technologies such as self-driving cars, clinical predictive models, and novel accessibility devices. Unlike singular AI models, it is more difficult to assess whether these more complex AI systems are performing consistently and as intended to realize human benefit.

What makes an AI system complex?

How do we know when these more advanced systems are ‘good enough’ for their intended use? When assessing the performance of AI models, we often rely on aggregate performance metrics like percentage of accuracy. But this ignores the many, often human elements, that make up an AI system.

Our research on what it takes to build forward-looking, inclusive AI experiences has demonstrated that getting to ‘good enough’ requires multiple performance assessment approaches at different stages of the development lifecycle, based upon realistic data and key user needs (figure 1).

Shifting emphasis gradually from iterative adjustments in the AI models themselves toward approaches that improve the AI system as a whole has implications not only in terms of how performance is assessed, but who should be involved in the performance assessment process. Engaging (and training) non-technical domain experts earlier (i.e., for choosing test data or defining experience metrics) and in a larger capacity throughout the development lifecycle can enhance relevance, usability, and reliability of the AI system. ... (more )

Thursday, September 03, 2020

Improving Performance Engineering with Machine Learning

Had never heard this positioned quite this way; nice idea. It needs more detail to explain how it has been done, but it is a great start. We did a form of this with business process modelling, integrating machine learning to determine how elements of the process performed. Not sure if this is quite the same thing. Like the anomaly reference.

Machine Learning: How it Improves Performance Engineering in DSC Posted by Ryan Williamson

Enterprise software, as well as other kinds, remains a complicated endeavor, thus necessitating the use of modern means to gauge, analyze, and adapt its performance. One of the most popular technologies in the performance engineering market right now is machine learning, since it has demonstrated an unparalleled ability to help foresee performance issues and fix them. When used in the right manner, this combination can also help performance engineering teams steer clear of issues altogether. That is because machine learning can interpret and analyze data in real time, thus delivering valuable insights about the system’s performance.

However, if you are going to truly leverage machine learning’s abilities in the context of performance engineering, it is first essential to understand the basics. Through this article, let’s discuss the kind of performance anomalies one can encounter.

Point anomalies: This is when there is only one data point that is distinct from the entire set.
Contextual anomalies: In this scenario, the anomaly is contextual, i.e., exists only in a particular context.
Collective anomalies: This refers to a data set that exhibits signs of an anomaly. .... "

Saturday, August 22, 2020

AI versus Human Perception Performance

Interesting challenge because we always emphasize that we need to be able to measure something to use/improve it. Which leads to our design of the measurement system. My response is that you can build measurement systems for particular use contexts. Reading the noted paper which emphasises " .. compare deep neural networks and the human vision system ... " which is a very broad statement for the problem. Like the discussion here.

Why AI and human perception are too complex to be compared By Ben Dickson in TNW
Human-level performance. Human-level accuracy. Those are terms you hear a lot from companies developing artificial intelligence systems, whether it’s facial recognition, object detection, or question answering. And to their credit, the recent years have seen many great products powered by AI algorithms, mostly thanks to advances in machine learning and deep learning.

But many of these comparisons only take into account the end-result of testing the deep learning algorithms on limited data sets. This approach can create false expectations about AI systems and yield dangerous results when they are entrusted with critical tasks.

In a recent study, a group of researchers from various German organizations and universities has highlighted the challenges of evaluating the performance of deep learning in processing visual data. In their paper, titled, “The Notorious Difficulty of Comparing Human and Machine Perception,” the researchers highlight the problems in current methods that compare deep neural networks and the human vision system. ... "

In their research, the scientists conducted a series of experiments that dig beneath the surface of deep learning results and compare them to the workings of the human visual system. Their findings are a reminder that we must be cautious when comparing AI to humans, even if it shows equal or better performance on the same task. ... "

Monday, April 06, 2020

Amazon Develops AI to Improve Knowledge Graph Performance

More about knowledge graphs. Manipulating them quickly can be key to providing intelligence at the edge for assistance.

Amazon researchers develop AI that improves knowledge graph performance
By Kyle Wiggers in VentureBeat

In a new study researchers at Amazon describe a technique that factors in information about knowledge graphs to perform entity alignment, which entails determining which elements of different graphs refer to the same “entities” (which might be anything from products to song titles). The idea is to improve computational efficiency while at the same time improving performance, speeding up graph-related tasks like product searches on Amazon and question answering via Alexa.

The work, which was accepted to the 2020 Web Conference, might also benefit graphs beyond Amazon, such as those that underpin social networks like Facebook and Twitter, as well as graphs used by enterprises to organize various digital catalogs. .... "

Saturday, September 07, 2019

A/B Testing for Startups

A/B Testing .... of experimentation for Startups. Intriguing paper.

Experimentation and Startup Performance: Evidence from A/B Testing by Rembrand Koning, Sharique Hasan, and Aaron Chatterji in HBS Working Knowledge

Is experimentation the right strategy for startups? This analysis of the adoption of A/B testing technology by 35,000 global startups provides evidence that a strategy based on repeated experimentation will improve performance over time. However, the benefits of experimentation vary. Experimentation helps younger startups “fail faster,” while older firms may discover new, high-growth products.

Author Abstract
Recent work argues that experimentation is the appropriate framework for entrepreneurial strategy. We investigate this proposition by exploiting the time-varying adoption of A/B testing technology, which has drastically reduced the cost of experimentally testing business ideas. This paper provides the first evidence of how digital experimentation affects the performance of a large sample of high-technology startups using data that tracks their growth, technology use, and product launches. We find that, despite its prominence in the business press, relatively few firms have adopted A/B testing.

However, among those that do, we find increased performance on several critical dimensions, including page views and new product features. Furthermore, A/B testing is positively related to tail outcomes, with younger ventures failing faster and older firms being more likely to scale. Firms with experienced managers also derive more benefits from A/B testing. Our results inform the emerging literature on entrepreneurial strategy and how digitization and data-driven decision-making are shaping strategy .... '

Thursday, April 18, 2019

DJs Spin Code

I changed the title. In the future they will likely not write code; they will create algorithms through some interface other than 'writing code'. Coding, as it has developed, is far too inefficient and error-prone.

DJs of the Future Don't Spin Records—They Write Code
in Wired By Michael Calore

Artists in the underground electronic music culture are performing live-coding shows or "algoraves," in which they program software algorithms to create new forms of music. Musicians synthesize individual sounds on their computers, then direct the software to string those sounds together based on a set of predefined rules; the end product has the artist's signature, but is algorithmically sculpted. When the same routine is run again, the song will sound familiar and contain the same elements, but the composition will be structured differently. Performances often are enhanced with screens displaying the running code as trippy visuals. A popular venue for this emergent art form is the Algorithmic Art Assembly, a two-day festival in San Francisco ..... "

Wednesday, June 06, 2018

How To Build a CX Dashboard that Drives Performance

Some useful thoughts.

How To Build a CX Dashboard that Drives Performance By Sue Duris in CustomerThink

Organizations are always looking for that magic pill that proves a particular initiative delivers value, and, more importantly, ROI.

I receive a lot of questions on what are the best metrics to use to monitor CX performance and drive CX optimization. That’s a tough question because each company is different. It depends on your customer mix, the outcomes you desire, and other factors.

One point is clear though and that is you must understand your data and what is behind the numbers before you can take action to optimize the CX.

The old management adage, you can’t manage what you don’t measure, applies here.

In a 2016 study conducted by Forrester, 39% of respondents admitted that they don’t regularly ask customers about their interactions. Worse, 77% don’t regularly track the drivers of CX in their organization. Without measuring CX, companies can’t know what customers care about and where the CX can be improved.

Yet, even when companies are paying attention to metrics, they are doing nothing with the data. In the same survey, Forrester reported that 79% of respondents don’t act on CX metrics or share them with all employees regularly. That means leadership and staff don’t know what is broken and what to do to make improvements.

What are companies to do?

Putting together an effective CX dashboard is a great place to start.

A CX dashboard provides many benefits.

- It provides a snapshot to help you take the pulse of your CX at a particular moment in time.

- A CX dashboard is a great tool to help you make sense of all your data holistically, in an easy-to-understand way. It uses the data to tell a complete story in a visually appealing manner, to illustrate what is going on with your CX. This visibility into your CX serves as an alert to pinpoint opportunities where you can improve the CX.

- It helps you determine trends so you can monitor progress.

- It raises awareness of CX across the entire organization so employees understand the importance of CX and how it is contributing to the achievement of your business outcomes.

- A CX dashboard gives you a roadmap to help you prioritize so you focus on what’s important.

The main goal for CX leaders is to transform customers into lifetime brand advocates. Thinking of a CX dashboard in those terms will help you be successful.

So, what are the best practices for building a CX dashboard that drives performance?

1. Begin with the end in mind.

What does CX success look like? What business and customer outcomes do you want? Knowing the results you want will help you determine the inputs and outputs you need to achieve them.

Make sure your dashboard is relevant and actionable.

Also, make sure you are regularly monitoring the performance of your dashboard. Look at trends and conduct root cause analyses. This will help you understand what is behind the numbers, so you know why numbers are improving or declining. Without this intelligence, you will not know how to improve the CX. .... "

Wednesday, May 23, 2018

Benchmark Suite for Assessing Machine Learning

Ultimately key to making this work. Measuring the results, starting with benchmarks.

How to Evaluate Machine Learning? U of Toronto Research Supports Latest Benchmark Initiative
U of Toronto News By Nina Haikara

An industrial-academic consortium that includes Google, the University of Toronto (U of T) in Canada, and Harvard and Stanford universities is developing a new benchmark suite for assessing machine learning (ML) performance. U of T's Gennady Pekhimenko says the MLPerf consortium is investigating two benchmarking areas--an "open" category in which any model can be applied to a fixed dataset, and a "closed" category in which both model and datasets are fixed, making execution time, power requirements, and design-cost evaluations helpful. Pekhimenko notes his laboratory has developed an open source benchmark suite called TBD (To Be Determined) as a training benchmark for deep neural networks. "We're interested in understanding how well available hardware and software perform, but we also look at both hardware and software efficiency," he says. "We then provide hints to the ML developers, so they can make their networks more efficient, and hence develop new algorithms and insights faster." .... '

Friday, July 21, 2017

AI Exceeding Human Performance

Predictions are cheap. Machines have been faster than humans for some time, but performance can be very different when they are inserted into a set of tasks that need to be done. Thoughtful ideas here:

Intelligent Machines
Experts Predict When Artificial Intelligence Will Exceed Human Performance

Trucking will be computerized long before surgery, computer scientists say. by Emerging Technology from the arXiv May 31, 2017

Artificial intelligence is changing the world and doing it at breakneck speed. The promise is that intelligent machines will be able to do every task better and more cheaply than humans. Rightly or wrongly, one industry after another is falling under its spell, even though few have benefited significantly so far.

And that raises an interesting question: when will artificial intelligence exceed human performance? More specifically, when will a machine do your job better than you?

Today, we have an answer of sorts thanks to the work of Katja Grace at the Future of Humanity Institute at the University of Oxford and a few pals. To find out, these guys asked the experts. They surveyed the world’s leading researchers in artificial intelligence by asking them when they think intelligent machines will better humans in a wide range of tasks. And many of the answers are something of a surprise. .... "

Thursday, June 01, 2017

Understanding Classification Performance

As long as you are doing any kind of machine learning, you need to measure and understand its performance. Since I am doing that now, I noted Jason Brownlee's piece below:

How to Report Classifier Performance with Confidence Intervals
by Jason Brownlee in Machine Learning Process

Once you choose a machine learning algorithm for your classification problem, you need to report the performance of the model to stakeholders. ... This is important so that you can set the expectations for the model on new data. ... A common mistake is to report the classification accuracy of the model alone. .... "
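As a reminder to myself of what reporting accuracy with an interval can look like, here is a minimal sketch using the Wilson score interval for a proportion. This is one standard choice I am picking for illustration, not necessarily the approach in Brownlee's piece; the function name and the 88-out-of-100 test result are made up.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for classification accuracy (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical result: 88 of 100 held-out examples classified correctly.
low, high = wilson_interval(88, 100)
print(f"accuracy 0.88, approx. 95% interval [{low:.3f}, {high:.3f}]")

# The same accuracy on ten times the data gives a narrower interval.
low_big, high_big = wilson_interval(880, 1000)
print(f"width with n=100: {high - low:.3f}, with n=1000: {high_big - low_big:.3f}")
```

The point for stakeholders is the width: the same 88% means something quite different on 100 test examples than on 1,000.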

Monday, May 08, 2017

Gartner on Performance

Yes, we are fallible. Worth understanding how.

Optimize Performance through a Positive Interplay of Cognition, Emotion and Creativity
by Manjunath Bhat | April 21, 2017

It has long been said that technology is not a cure for human fallibility. In this blog post, I delve upon one of the leading causes for human fallibilities in the digital age – the overheads associated with emotional “buffer overflow” and “context switch”. ... "

Monday, January 09, 2017

Performance Management

McKinsey on Performance Management.

Ahead of the curve: The future of performance management
By Boris Ewenstein, Bryan Hancock, and Asmus Komm

Most corporate performance-management systems don’t work today, because they are rooted in models for specializing and continually optimizing discrete work tasks. These models date back more than a century, to Frederick W. Taylor.

Over the next 100 years, performance-management systems evolved but did not change fundamentally. A measure like the number of pins produced in a single day could become a more sophisticated one, such as a balanced scorecard of key performance indicators (KPIs) that link back to overarching company goals. What began as a simple mechanistic principle acquired layers of complexity over the decades as companies tried to adapt industrial-era performance systems to ever-larger organizations and more complicated work. .... "

Wednesday, September 21, 2016

GE Buys Machine Analytics Firm

Today, GE Digital, the company’s software arm, said it acquired Meridium, Inc., a leading developer of asset performance management (APM) software for machine-heavy industries such as oil, gas, electricity and chemicals. The deal values Meridium, based in Roanoke, Virginia, at $495 million.

GE first invested in Meridium in 2014, buying a quarter of the company. Today it purchased the remaining stake. “As we forge ahead in the Industrial Internet journey, APM is clearly the first application that can leverage the Predix platform to help industrial customers benefit from increased productivity,” said Bill Ruh, CEO of GE Digital. ... "

Monday, June 27, 2016

IoT for Product Performance

Using IoT Data to Understand How Your Products Perform

" .... Yet it would be a mistake to think the IoT is a game only for high rollers and crack technologists. Our research and client engagement experience has shown us that generating strong returns from the digital sensors, wireless communications devices, digital cameras installed in buildings and other smart, connected devices does not come down to writing big checks or being technologically savvy. The companies with the greatest value from IoT to date are the best at dealing with how products are performing for customers. .... "

Sunday, June 26, 2016

Performance Metrics

Bernard Marr concisely writes on the 'little data' of performance metrics. This links well with my constant refrain, the data, big or otherwise, however cleverly analyzed, is only good as far as it relates to business process. And the key measure of process is performance. See my previous writings on KPI at the link below.

Tuesday, August 18, 2015

Tracking Human vs Cognitive Computing Performance

In the LinkedIn CSIG (Cognitive Systems Institute) Group, a request for more information on tracking people vs machine job performance levels, via Jim Spohrer. There is considerable additional information at the link about related work to date and goals. Ultimately this creates a kind of 'Rosetta Stone' of links between system and people performance components of real jobs. Want to join in? Add yourself to the group.

CSIG "Algorithm" and tracking cognitive system performance levels
The CSIG (Cognitive Systems Institute Group) is interested in tracking cognitive computing components and their performance relative to novice and expert human performance levels.

CSIG mission is to develop cognitive assistants for all occupations to boost creativity and productivity of people in smart service systems. This requires creating an inventory of cognitive components - the capability of cognitive systems compared to human performance measures. .... "

Tuesday, February 17, 2015

Product Development with Virtual Prototypes

In February IEEE Computing Now:

Product Development with Virtual Prototypes
Douglass Post discusses how the use of virtual prototypes analyzed with physics-based performance prediction tools is a potential game changer for product development. .... "

Thursday, February 05, 2015

Expert is not Good Enough: Inserting Advice

Professor Wayne Gray of Rensselaer Polytechnic Institute gave an excellent talk today at the Cognitive Systems Institute Thursday talk series: Expert is not good enough! Asymptotes, Plateaus, and Limits to Everyday Human Performance. Slides here. (Audio to follow here). Lots of good links to research in the performance space.

Also there is an ongoing LinkedIn conversation that I am participating in. This relates to how cognitive systems will be able to contribute to human performance. I am also adding the subtopic of "Advisory Insertion", which is becoming more important as advisers like Siri and Cortana start to emerge and evolve. How will this phenomenon add to human performance?

Wednesday, February 04, 2015

Strap Software and Analytics Platform

Strap. Brought to my attention. A local startup of interest. More to follow. " ... The only platform that goes everywear ... "

" ... Strap is a software and analytics platform that unleashes the potential of wearables for developers, enterprises, and brands. The flagship offering, Strap Metrics, is a cross-platform analytics SDK for wearable applications that runs on smart wearable devices, and offers unique data insights about users and app performance. Strap was founded in early 2014 by Steve Caldwell, Patrick Henshaw, and Joey Brennan, and is headquartered in Cincinnati, Ohio. ... "
