Friday, October 05, 2018

Free Open Source Training Data

Always looking for good open source data, especially for testing models.  William Vorhies of DSC talks about this, with sources and links.  Good examples:

Lots of Free Open Source Datasets to Make Your AI Better
Posted by William Vorhies

Summary:  There are several approaches to reducing the cost of training data for AI, one of which is to get it for free.  Here are some excellent sources.

Recently we wrote that training data (not just data in general) is the new oil.  It’s the difficulty and expense of acquiring labeled training data that causes many deep learning projects to be abandoned. 
It also matters a great deal just how good you want your new deep learning app to be.  A 2016 study by Goodfellow, Bengio and Courville concluded you could get ‘acceptable’ performance with about 5,000 labeled examples per category BUT it would take 10 Million labeled examples per category to “match or exceed human performance”. 

There are a number of technologies coming up through research now that promise more accurate auto labeling to make creating training data less costly and time consuming.  Snorkel from the Stanford Dawn Project is one we covered recently.  This area is getting a lot of research attention.  ... " 
