Nicely done, non-technical piece
Visualizing Outliers by Nathan Yau in Flowingdata
Visualizing data that looks like it came straight out of a Statistics 101 textbook is nice and all, for teaching and learning purposes. You gotta learn to stand before you can run a marathon. Once you’re ready for the real data though, which is fuzzier and more irregular, you run into data points that don’t quite fit in with the rest. The outliers.
There are various ways to incorporate outliers into your visualization, but you have to understand them first.
Why is the outlier there in the first place? Maybe it’s a recording error or a kink in methodology. For example, PornHub claimed that a disproportionate percentage of traffic came from Kansas. However, location was based on IP addresses, and any locations that could not be identified defaulted to the center of the country. That spot was in Kansas.
Sometimes outliers might be an exception or something extraordinary. We see this in sports a lot, like when Stephen Curry broke the single-season three-point record. Or when Usain Bolt ran faster than everyone.
In one case, the outlier is noise relative to the rest of the data. In the other, the outlier deserves a closer look.
With your own data, figure out which is which first. Then decide if the outlier belongs in the background or foreground. The visualization options below will be much more useful. ...