
Thursday, September 01, 2016

Value of Quick Descriptive Statistics

Just received from Jason Brownlee:

" ... Hi, you can learn a lot about your dataset by reviewing some basic descriptive statistics. ... The R platform provides a seemingly unlimited array of functions for poking and prodding your dataset in order to learn just that little bit more.   .. " 

My View:  I am a big proponent of scanning data in this way.  It checks both the correctness of the data and its basic descriptive measures.  It also lets you quickly add some basic visualization, which in turn helps you look for subtler patterns.  I also note that this does not need to be done in R; it can be done in Excel, Tableau, Spotfire, SPSS or many other tools.  In my experience, descriptive methods have saved me a large amount of later, more complex effort.  Do try it. ...

Jason Brownlee sends along a note on how to do a simple descriptive review of data in R.  The list is below.  Jason summarizes:  " .. Below is my list of the 8 descriptive statistics I recommend you look at when reviewing your dataset in R:

1. Peek at the first few rows of your data
2. Review the number of rows and columns you have.
3. Review the data types of each column
4. Take a look at the class distribution (for classification problems)
5. Calculate a simple 5-number summary for each column
6. Review the standard deviations for each numerical column
7. Check the skewness of each column, handy to see what transforms to apply
8. Review the correlations between attributes

Learn the exact snippet of code to use for each statistic in the blog post:
  http://machinelearningmastery.com/descriptive-statistics-examples-with-r/
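
As a rough illustration of the eight checks listed above, here is a minimal sketch in Python using only the standard library. It is not Jason's R code (that is in the linked post). The toy dataset, the column names, and the helper functions (five_number_summary, skewness, pearson, describe) are all made up for this example, and the skewness shown is the simple moment-based version, which may differ slightly from what a given R package reports.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset, made up only for this illustration.
ROWS = [
    {"length": 5.1, "width": 3.5, "label": "a"},
    {"length": 4.9, "width": 3.0, "label": "a"},
    {"length": 6.3, "width": 3.3, "label": "b"},
    {"length": 5.8, "width": 2.7, "label": "b"},
    {"length": 7.1, "width": 3.0, "label": "b"},
    {"length": 5.0, "width": 3.6, "label": "a"},
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def skewness(values):
    """Moment-based (population) skewness; assumes the values are not all equal."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def describe(rows, numeric_cols, label_col):
    """Collect the eight quick checks into one dictionary."""
    cols = {name: [r[name] for r in rows] for name in rows[0]}
    return {
        "head": rows[:3],                                    # 1. first few rows
        "shape": (len(rows), len(cols)),                     # 2. rows and columns
        "types": {n: type(v[0]).__name__ for n, v in cols.items()},  # 3. data types
        "class_counts": dict(Counter(cols[label_col])),      # 4. class distribution
        "summary": {c: five_number_summary(cols[c]) for c in numeric_cols},  # 5.
        "stdev": {c: statistics.stdev(cols[c]) for c in numeric_cols},       # 6.
        "skewness": {c: skewness(cols[c]) for c in numeric_cols},            # 7.
        "correlations": {                                    # 8. pairwise correlations
            (a, b): pearson(cols[a], cols[b])
            for i, a in enumerate(numeric_cols)
            for b in numeric_cols[i + 1:]
        },
    }


for name, value in describe(ROWS, ["length", "width"], "label").items():
    print(f"{name}: {value}")
```

The point is only that each of the eight items is a line or two of work, whatever the tool. On real data you would load a file rather than type rows in by hand, and the types check here only looks at the first row.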

Nicely done.  For more writing by Jason, see:   http://machinelearningmastery.com/author/jasonb/

And see his e-book on the subject.

Will report on the book in September.
