" ... Hi, you can learn a lot about your dataset by reviewing some basic descriptive statistics. ... The R platform provides a seemingly unlimited array of functions for poking and prodding your dataset in order to learn just that little bit more. ... "
My View: I am a big proponent of scanning data in this way. It checks the data for correctness and gives you basic descriptive measures at the same time. It also makes it easy to add some basic visualizations, which can reveal subtler patterns. Note that none of this needs to be done in R; it can be done in Excel, Tableau, Spotfire, SPSS, or many other tools. In my experience, descriptive methods have saved me a great deal of later, more complex work. Do try it. ...
1. Peek at the first few rows of your data
2. Review the number of rows and columns you have
3. Review the data types of each column
4. Take a look at the class distribution (for classification problems)
5. Calculate a simple 5-number summary for each column
6. Review the standard deviations for each numerical column
7. Check the skewness of each numerical column, which helps you decide what transforms to apply
8. Review the correlations between attributes
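Because the same eight checks can be run outside R, here is a minimal sketch of them in Python using only the standard library. It is not Jason's code and not a port of his R snippets. It assumes the data is already loaded as a list of dicts, one per row, with numeric values already parsed. The `rows` toy dataset and all function names (`describe`, `five_number_summary`, and so on) are hypothetical, for illustration only. Column types are inferred from the first row alone, and skewness uses the simple moment formula without the small-sample corrections some R packages apply.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy dataset: one dict per row, numeric values already parsed.
rows = [
    {"width": 1.0, "length": 2.0, "label": "a"},
    {"width": 2.0, "length": 4.0, "label": "a"},
    {"width": 3.0, "length": 6.0, "label": "b"},
    {"width": 4.0, "length": 8.0, "label": "b"},
    {"width": 10.0, "length": 20.0, "label": "b"},
]


def peek(rows, n=5):
    """Step 1: the first n rows."""
    return rows[:n]


def shape(rows):
    """Step 2: (number of rows, number of columns)."""
    return (len(rows), len(rows[0]) if rows else 0)


def column_types(rows):
    """Step 3: type name of each column, inferred from the first row only."""
    return {col: type(val).__name__ for col, val in rows[0].items()}


def class_distribution(rows, target):
    """Step 4: count and percentage of each class label in the target column."""
    counts = Counter(row[target] for row in rows)
    return {label: (c, 100.0 * c / len(rows)) for label, c in counts.items()}


def numeric_columns(rows):
    """Names of columns whose first-row value is an int or float (not bool)."""
    return [
        col for col, val in rows[0].items()
        if isinstance(val, (int, float)) and not isinstance(val, bool)
    ]


def five_number_summary(values):
    """Step 5: (min, Q1, median, Q3, max), quartiles interpolated inclusively."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def skewness(values):
    """Step 7: moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5  # raises ZeroDivisionError for a constant column


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(rows, target=None):
    """Run all eight checks and collect the results in one dict."""
    cols = numeric_columns(rows)
    data = {c: [row[c] for row in rows] for c in cols}
    return {
        "head": peek(rows),
        "shape": shape(rows),
        "types": column_types(rows),
        "classes": class_distribution(rows, target) if target else None,
        "summary": {c: five_number_summary(v) for c, v in data.items()},
        # Step 6: sample standard deviation (divides by n - 1).
        "stdev": {c: statistics.stdev(v) for c, v in data.items()},
        "skew": {c: skewness(v) for c, v in data.items()},
        # Step 8: pairwise Pearson correlations between numeric columns.
        "corr": {(a, b): pearson(data[a], data[b]) for a in cols for b in cols},
    }


if __name__ == "__main__":
    for key, value in describe(rows, target="label").items():
        print(key, value)
```

Each check is a small separate function so it maps onto one numbered step. For real work in Python, a pandas DataFrame offers `head()`, `shape`, `dtypes`, `describe()`, `std()`, `skew()` and `corr()`; note that pandas' `skew()` applies a bias adjustment, so its values can differ slightly from the simple formula above.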
The blog post gives the exact snippet of code to use for each statistic:
Nicely done. For more writing by Jason, see: http://machinelearningmastery.com/author/jasonb/
Also see his e-book on the subject.
I will report on the book in September.