
Tuesday, January 22, 2019

Labeled Datasets for Use AND Value

Note how this links to other goals, like understanding the value of datasets as an asset. Why not use labeling as a means of attaching value analyses as well? Labels are usually assigned for business purposes, and those labels often do not work when linked to a specific analytic approach.

The Data Scientist’s Holy Grail — Labeled Data Sets  #ODSC  in Medium

The Holy Grail for data scientists is the ability to obtain labeled data sets for the purpose of training a supervised machine learning algorithm. An algorithm’s ability to “learn” is based on training it using a labeled training set — having known response variable values that correspond to a number of predictor variable values.
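To make the idea of a labeled training set concrete, here is a minimal sketch in Python. It is not from the article: the data points, the labels "A" and "B", and the choice of a 1-nearest-neighbor rule are all assumptions picked for illustration. Each training row pairs predictor values with a known response value, and the prediction for a new point is the label of the closest training row.

```python
import math

# Labeled training set: each row pairs predictor values with a known response.
# The numbers and labels are made up for illustration.
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]

def predict(predictors):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_set, key=lambda row: math.dist(row[0], predictors))
    return nearest[1]

print(predict((1.2, 1.4)))
print(predict((5.5, 5.0)))
```

Without the known response values in `training_set`, there would be nothing for `predict` to return, which is why the labels matter so much.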

There are a number of common and maybe not-so-common methods for labeling a data set. In this article, we’ll run down a short list of such methods and then you can choose the best for your specific circumstances.

Readily Available Labeled Data Sets: 

Sometimes, labeled datasets are readily available as a byproduct of on-going business operations. For example, if a company is trying to predict customer churn (a very common classification problem), the company’s data assets will likely contain the label values: “churned,” or “not-churned” based on the customer’s account history. The company knows when the customer canceled their account, thus generating a churn transaction.
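As a rough sketch of how such a label might fall out of account history, the Python below derives "churned" or "not-churned" from a cancellation date. The record layout, the field names (`customer_id`, `opened`, `canceled`), and the cutoff date are hypothetical; a real system would read these from its own transaction data.

```python
from datetime import date

# Hypothetical account records; the field names are assumptions for illustration.
accounts = [
    {"customer_id": 1, "opened": date(2017, 3, 1), "canceled": date(2018, 6, 15)},
    {"customer_id": 2, "opened": date(2017, 9, 10), "canceled": None},
    {"customer_id": 3, "opened": date(2018, 1, 5), "canceled": date(2018, 12, 30)},
]

def churn_label(account, as_of):
    """Return "churned" if the account was canceled on or before as_of."""
    canceled = account["canceled"]
    if canceled is not None and canceled <= as_of:
        return "churned"
    return "not-churned"

# Label every account as of a chosen cutoff date.
as_of = date(2018, 12, 31)
labels = {a["customer_id"]: churn_label(a, as_of) for a in accounts}
print(labels)
```

The cutoff date matters: the same account can be "not-churned" for an earlier cutoff and "churned" for a later one, so the label should be computed as of the date the training data is meant to represent.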

Sometimes, the label is not readily available and must be acquired or derived. For example, in a real estate application that wishes to predict the monthly rental value of a residential apartment building, the desired label may only come from a laborious process conducted by problem domain experts who can determine the value based on their industry knowledge. Sometimes finding label values can be time-consuming and labor-intensive, especially if a large amount of labeled data is needed for the project. ... " 
