
Tuesday, November 27, 2018

Simpson's Paradox Again

I mentioned this topic once before; here is a more data-oriented explanation, less technical and more practical for real-world problems. Everyone should understand this, but it is rare to find even a data scientist who does. Nicely done.

Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data

Understanding a statistical phenomenon and the importance of asking why
Imagine you and your partner are trying to find the perfect restaurant for a pleasant dinner. Knowing this process can lead to hours of arguments, you seek out the oracle of modern life: online reviews. Doing so, you find your choice, Carlo’s Restaurant, is recommended by a higher percentage of both men and women than your partner’s selection, Sophia’s Restaurant. However, just as you are about to declare victory, your partner, using the same data, triumphantly states that since Sophia’s is recommended by a higher percentage of all users, it is the clear winner.

What is going on? Who’s lying here? Has the review site got the calculations wrong? In fact, both you and your partner are right and you have unknowingly entered the world of Simpson’s Paradox, where a restaurant can be both better and worse than its competitor, exercise can lower and increase the risk of disease, and the same dataset can be used to prove two opposing arguments. Instead of going out to dinner, perhaps you and your partner should spend the evening discussing this fascinating statistical phenomenon.

Simpson’s Paradox occurs when trends that appear when a dataset is separated into groups reverse when the data are aggregated. In the restaurant recommendation example, it really is possible for Carlo’s to be recommended by a higher percentage of both men and women than Sophia’s but to be recommended by a lower percentage of all reviewers. Before you declare this to be lunacy, here is the table to prove it. ... "
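
To see the reversal in numbers, here is a minimal Python sketch. The review counts are made up for illustration and are not the table from the original article (which is cut off above); the helper names `group_rate` and `overall_rate` are likewise my own.

```python
# Hypothetical review counts, invented for illustration only.
# Each entry is (number who recommended, total reviewers in that group).
reviews = {
    "Carlo's": {"men": (180, 360), "women": (36, 40)},
    "Sophia's": {"men": (3, 10), "women": (320, 400)},
}


def group_rate(restaurant, group):
    """Share of reviewers in one group who recommend the restaurant."""
    recommended, total = reviews[restaurant][group]
    return recommended / total


def overall_rate(restaurant):
    """Share of all reviewers (groups pooled) who recommend the restaurant."""
    counts = reviews[restaurant].values()
    recommended = sum(r for r, _ in counts)
    total = sum(t for _, t in counts)
    return recommended / total


for name in reviews:
    men = group_rate(name, "men")
    women = group_rate(name, "women")
    pooled = overall_rate(name)
    print(f"{name}: men {men:.1%}, women {women:.1%}, overall {pooled:.1%}")
```

With these assumed counts, Carlo's leads among men (50% vs. 30%) and among women (90% vs. 80%), yet trails overall (54% vs. about 78.8%). The reversal comes from the mix of reviewers: most of Carlo's reviews here come from men, who recommend less often at both restaurants, while most of Sophia's come from women.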
