
Sunday, December 13, 2020

Thinking about Approaching Tidy Data

Below is an intro to the concept. We laid out and used similar ideas, and this piece organizes them well. The concept was first stated by Hadley Wickham in his paper. It is hard to fully achieve because of context, but very useful.

What is Tidy Data?  

A must-know concept for Data Scientists. Outline by Benedict Neo in Towards Data Science.

Introduction

There’s a popular saying in Data Science that goes like this: “Data Scientists spend up to 80% of the time on data cleaning and 20 percent of their time on actual data analysis”. The origin of this quote goes back to 2003, in Dasu and Johnson’s book, Exploratory Data Mining and Data Cleaning, and it is still true to this day.

In a typical Data Science project, from importing your data to communicating your results, tidying your data is a crucial aspect in making your workflow more productive and efficient. ... 

The process of tidying data thus creates what’s known as tidy data, an ideal first formulated by Hadley Wickham in his paper. So my article will largely be a summary of the paper, an extraction of its essence, if you will.

What is Tidy Data?

From the paper, the definition given is:

“Tidy datasets provide a standardized way to link the structure of a dataset (its physical layout) with its semantics (its meaning).”

To break down this definition, you first have to understand what structure and semantics mean. ...
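As a rough illustration of linking layout to meaning (my own sketch, not taken from the article): in Wickham's paper a tidy layout gives each variable its own column and each observation its own row. The small Python example below reshapes a "wide" table, where treatment names are hidden in column headers, into that long form. The data, the `tidy` helper and the column names are all hypothetical, made up for this example.

```python
# Hypothetical messy data: one row per person, one column per treatment.
# The variable "treatment" lives in the column headers, not in a column.
messy = [
    {"name": "Ana", "treatment_a": 5, "treatment_b": 7},
    {"name": "Ben", "treatment_a": 3, "treatment_b": None},
]


def tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into long rows: one observation per output row."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            out.append({id_key: row[id_key], var_name: key, value_name: value})
    return out


# Each output row is now one (name, treatment, result) observation.
tidy_rows = tidy(messy, "name", "treatment", "result")
for r in tidy_rows:
    print(r)
```

With the table in this shape, the physical layout matches the meaning: asking "what was Ben's result under treatment_b?" becomes a filter on rows rather than a lookup by column name. In practice a library routine such as a melt or pivot-longer operation would do this reshaping; the hand-written loop is only there to show the idea.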
