Monday, September 23, 2019

Snorkel for Building Data for ML

This was new to me, but handling and selecting the data is the most important aspect of machine learning projects. In a recent project it accounted for over 75% of the resource effort, and it is likely to account for even more of the ongoing maintenance effort. Worth a good look.

Introducing Snorkel
How this Tiny Project Solves One of the Major Problems in Real World Machine Learning Solutions

By Jesus Rodriguez, in Towards Data Science.

"Building high-quality training datasets is one of the most difficult challenges of machine learning solutions in the real world. Disciplines like deep learning have helped us to build more accurate models but, to do so, they require vastly larger volumes of training data. Now, saying that effective machine learning requires a lot of training data is like saying that “you need a lot of money to be rich”. It’s true, but it doesn’t make it less painful to get there. In many of the machine learning projects we work on at Invector Labs, our customers spend significantly more time collecting and labeling training datasets than building machine learning models. Last year, we came across a small project created by artificial intelligence (AI) researchers from Stanford University that provides a programming model for the creation of training datasets. Ever since, Snorkel has become a regular component of our machine learning implementations. ..."
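
To give a feel for what a "programming model for the creation of training datasets" can look like, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API. The idea it illustrates is programmatic labeling: you write small heuristic functions that each vote on a label or abstain, then combine the votes. The function names, the spam/ham task, and the heuristics are all hypothetical, and the simple majority vote is my own stand-in for the more sophisticated way Snorkel combines noisy votes.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each heuristic ("labeling function") votes on a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def weak_label(text):
    """Combine heuristic votes by simple majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between labels: no confident answer
    return counts[0][0]


def build_training_set(texts):
    """Keep only the examples that received a non-abstain weak label."""
    labeled = [(t, weak_label(t)) for t in texts]
    return [(t, y) for t, y in labeled if y != ABSTAIN]
```

Calling `build_training_set` on a list of raw messages returns (text, label) pairs for the messages the heuristics could agree on, which could then be fed to an ordinary classifier. The appeal is that improving the dataset means editing a few functions rather than relabeling examples by hand.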
