
Thursday, December 10, 2020

Updating AI Product Performance

Nicely done piece from NVIDIA Developers.

Updating AI Product Performance from Throughput to Time-To-Solution

By Shar Narasimhan | November 23, 2020  Tags: data center, Machine Learning and AI, MLPerf, NGC

Data scientists and researchers work toward solving the grand challenges of humanity with AI projects such as developing autonomous cars or nuclear fusion energy research. They depend on powerful, high-performance AI platforms as essential tools to conduct their work. Even enterprise-grade AI implementation efforts—adding intelligent video analytics to existing video camera streams, image classification to picture searches, or conversational AI to call center support—require incredibly accurate AI models across diverse network types that can deliver meaningful results at high throughput and low latency.

That’s the key. To be useful and productive, the model must not only have high throughput, it must also make correct predictions at that high throughput. In everyday terms, it’s just not useful to advertise a sports car with a top speed over 200 mph if it can only maintain that speed for a few seconds and never completes a journey from point A to point B.

If you’re driving from Los Angeles to New York, you want a vehicle that can speedily and reliably make the trip from start to finish. Having a fast car that sputters out in the desert or mountains shortly after setting out isn’t the type of platform to depend on. Even if you’re an avid skier who’s brought your skis with you, getting stranded in the wilderness in the Rockies at the height of winter is going to get challenging quickly.

The next time you hear a slick promotion for a sports car that has a scorching top speed, hold onto your wallet for a second and ask yourself, “Will this get me to my destination or leave me stranded in the middle of nowhere?”

The same basic principle applies to training neural networks in AI. As obvious as this may sound, the AI industry continues to pay a lot of attention to just throughput. Every new announcement on the market starts off with a speed claim on ResNet-50, a well-established image classification network. But it’s one thing to run that network at high throughput (images per second, in the ResNet-50 case) for a brief period; the real question is whether the network is converging and making accurate predictions so it can be deployed as an actual product or service. That is, does the network reach its destination?

Here’s what convergence means in a neural network. In implementing AI, you have two essential steps: training and inference. Training a network is the first step, where you teach a network how to make predictions based on your dataset. Inference is the process of deploying a trained network, where you use it in production to make predictions. To train a neural network, you feed in vast amounts of data and it starts to look for patterns, much like how the human brain operates. The network has converged when its predictions reach the target accuracy—that is its destination, and the time it takes to get there is the time-to-solution.
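
To make the distinction concrete, here is a minimal sketch (not NVIDIA's benchmark code) of training versus inference, and of reporting time-to-solution rather than raw throughput. It assumes PyTorch and uses a tiny synthetic dataset purely for illustration; the target accuracy and model are made up.

    # Minimal sketch: train until a target accuracy is reached, then run inference.
    # Assumes PyTorch; data, model, and target accuracy are hypothetical.
    import time
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(2000, 20)                 # hypothetical training features
    y = (X.sum(dim=1) > 0).long()             # hypothetical binary labels

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    TARGET_ACC = 0.95                         # the "destination": accuracy the model must reach
    start = time.time()
    samples_seen = 0
    for epoch in range(100):                  # training: learn patterns from the data
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        samples_seen += len(X)
        acc = (model(X).argmax(dim=1) == y).float().mean().item()
        if acc >= TARGET_ACC:                 # converged: stop the clock
            break

    elapsed = time.time() - start
    print(f"throughput ~{samples_seen / elapsed:.0f} samples/sec")
    print(f"time-to-solution: {elapsed:.2f} s to reach {acc:.2%} accuracy")

    # Inference: the trained model makes a prediction on new, unseen data.
    new_sample = torch.randn(1, 20)
    with torch.no_grad():
        print("prediction:", model(new_sample).argmax(dim=1).item())

The point of the sketch is that the two printed numbers answer different questions: throughput says how fast the car goes, while time-to-solution says whether and when it arrives.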

For example, consider an image classification network being trained to classify houses. There can be hundreds or thousands of layers with billions of parameters, each comprising a series of equations that build up to the statistical probability or likelihood of a specific type of prediction. You feed in a dataset of house images and the network scans the images for patterns. This starts to update weights in probability equations that drive predictions.  ... " 
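
As a rough illustration of that weight-updating step, here is a minimal sketch, with made-up numbers, of how one training example nudges the weights of a simple probability equation (a single logistic unit) toward the correct "house" prediction. Real networks repeat this across billions of parameters and millions of images; nothing here comes from the original article.

    # One gradient-descent update on a logistic unit; all values are hypothetical.
    import numpy as np

    w = np.array([0.2, -0.1, 0.05])          # hypothetical weights
    x = np.array([1.0, 0.5, -0.3])           # features from one house image (made up)
    label = 1.0                              # 1 = "house"

    p = 1.0 / (1.0 + np.exp(-w @ x))         # predicted probability of "house"
    grad = (p - label) * x                   # gradient of the log-loss w.r.t. the weights
    w -= 0.5 * grad                          # update: shift weights toward the correct answer

    print(f"probability before update: {p:.3f}")
    print(f"probability after update:  {1.0 / (1.0 + np.exp(-w @ x)):.3f}")

Each pass over the data repeats updates like this until the predictions stop improving, which is when the network has converged.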
