
How to Be Fooled by Data (Without Even Realising It)

  • Writer: thepearlyn8
  • Feb 28, 2025

Ever looked at data and thought, "Cool, that makes sense," only to take a closer look and go, "Wait… what?!" Welcome to Simpson’s Paradox, where numbers seem to play mind games.

Let’s talk kidney stone treatments. Imagine doctors arguing over which method works best. One treatment looks like the clear winner—until you break the patients into groups, and suddenly, the “worse” option is actually better. How?! This is the kind of statistical weirdness that makes data analysis both frustrating and ridiculously fun.

So, grab a drink (preferably not one that causes kidney stones), and let’s unravel this sneaky little paradox together!


The Kidney Stone Treatment Dilemma

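A well-known 1986 study compared two kidney stone treatments: Treatment A (open surgery) and Treatment B (percutaneous nephrolithotomy, a less invasive procedure), with 350 patients each.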

If you looked at the overall success rates:

  • Treatment A: 78%

  • Treatment B: 83%

At first glance, Treatment B seems better, right? Not so fast.

When We Break It Down...

When we split the patients into two groups—small stones and large stones—something interesting happens:

  • For small stones:

    • Treatment A: 93% success

    • Treatment B: 87% success

    • (A is better)

  • For large stones:

    • Treatment A: 73% success

    • Treatment B: 69% success

    • (A is still better)

Hold up. Treatment A wins in both categories, but when we combine the data, Treatment B somehow looks better overall?! That’s Simpson’s Paradox in action.
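To see the reversal with actual arithmetic, here is a minimal Python sketch. The success counts are an assumption: they are the counts commonly quoted for this dataset (for example, 81 successes out of 87 small-stone patients for Treatment A) and are consistent with the percentages and group sizes in this post, but the post itself does not state them. The `DATA` layout and the helper names are purely illustrative.

```python
# Assumed data: (successes, patients) per treatment and stone size.
# Group sizes match this post; success counts are the commonly quoted
# figures that reproduce its percentages.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(successes: int, patients: int) -> float:
    """Fraction of patients whose treatment succeeded."""
    return successes / patients


def pooled_rate(groups: dict[str, tuple[int, int]]) -> float:
    """Overall rate: add up successes and patients across groups first."""
    total_successes = sum(s for s, _ in groups.values())
    total_patients = sum(n for _, n in groups.values())
    return total_successes / total_patients


# Subgroup comparison: A wins both.
for size in ("small", "large"):
    a = success_rate(*DATA["A"][size])
    b = success_rate(*DATA["B"][size])
    print(f"{size} stones: A = {a:.0%}, B = {b:.0%}")

# Pooled comparison: B wins.
print(f"overall: A = {pooled_rate(DATA['A']):.0%}, B = {pooled_rate(DATA['B']):.0%}")
# small stones: A = 93%, B = 87%
# large stones: A = 73%, B = 69%
# overall: A = 78%, B = 83%
```

Nothing about either treatment changes between the two comparisons; only the way the patients are added up does.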


The Real Culprit: Sample Size Imbalance

Observation:

  • Both treatments have the same total sample size: 350 patients each.

  • For Small Stones, however, Treatment B has a much larger group (270) than Treatment A (87).

  • For Large Stones, Treatment A handles significantly more patients (263) than Treatment B (80).


How Sample Size Skews Aggregated Data:

  • Treatment B appears better overall in aggregated data because:

    • Most of its patients (270 of 350) are in the Small Stones category, where success rates are naturally high.

  • Treatment A is disadvantaged because most of its patients (263 of 350) have Large Stones, a more complex and failure-prone category.
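The effect of group sizes can be written as a weighted average: each treatment's overall rate is its small-stone rate and its large-stone rate, weighted by the share of its 350 patients in each group. This worked check uses only the rounded percentages and group sizes given in this post, so the results are approximate:

```latex
\begin{aligned}
\text{Overall} &= w_{\text{small}}\, p_{\text{small}} + w_{\text{large}}\, p_{\text{large}},
  \qquad w_{\text{small}} + w_{\text{large}} = 1 \\
\text{Overall}_A &= \tfrac{87}{350}(0.93) + \tfrac{263}{350}(0.73)
  \approx 0.249(0.93) + 0.751(0.73) \approx 0.78 \\
\text{Overall}_B &= \tfrac{270}{350}(0.87) + \tfrac{80}{350}(0.69)
  \approx 0.771(0.87) + 0.229(0.69) \approx 0.83
\end{aligned}
```

About three quarters of Treatment A's weight sits on its 73% large-stone rate, while about three quarters of Treatment B's weight sits on its 87% small-stone rate, which is enough to flip the overall comparison.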


So what’s going on here? The trick is in the distribution of patients.

  • Treatment A was used more for large stone patients (harder to treat).

  • Treatment B was used more for small stone patients (easier to treat).

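Since small stones have a higher success rate with either treatment, Treatment B's overall rate is pulled up by its many small-stone patients, while Treatment A's is pulled down by its many large-stone patients. That is how Treatment B looks better overall, even though Treatment A is actually better for both groups when you look at them separately.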

Lesson learned: Always check whether your data is unevenly spread across subgroups before drawing conclusions!
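One way to act on that lesson, sketched in Python: first tabulate what share of each treatment's patients falls in each stone-size group, then recompute both overall rates as if the two treatments had treated the same mix of patients. The second step is a standard technique called direct standardization; using the pooled mix of all 700 patients as the shared reference is one choice among several. The success counts are assumed (the commonly quoted figures for this dataset, consistent with the percentages above), and the helper names are hypothetical.

```python
# Group sizes are from this post; success counts are assumed
# (the commonly quoted figures for this dataset).
PATIENTS = {"A": {"small": 87, "large": 263}, "B": {"small": 270, "large": 80}}
SUCCESSES = {"A": {"small": 81, "large": 192}, "B": {"small": 234, "large": 55}}
GROUPS = ("small", "large")


def case_mix(treatment: str) -> dict[str, float]:
    """Share of a treatment's patients that falls in each group."""
    total = sum(PATIENTS[treatment].values())
    return {g: PATIENTS[treatment][g] / total for g in GROUPS}


def standardized_rate(treatment: str, reference_mix: dict[str, float]) -> float:
    """Weight this treatment's subgroup success rates by a shared reference mix."""
    return sum(
        reference_mix[g] * SUCCESSES[treatment][g] / PATIENTS[treatment][g]
        for g in GROUPS
    )


# Step 1: check for imbalance. Very different mixes are a warning sign.
for t in PATIENTS:
    mix = case_mix(t)
    print(f"Treatment {t}: {mix['small']:.0%} small, {mix['large']:.0%} large")

# Step 2: compare both treatments on the same (pooled) mix of patients.
all_patients = {g: PATIENTS["A"][g] + PATIENTS["B"][g] for g in GROUPS}
grand_total = sum(all_patients.values())
pooled_mix = {g: all_patients[g] / grand_total for g in GROUPS}
for t in PATIENTS:
    print(f"Treatment {t}, same case mix: {standardized_rate(t, pooled_mix):.0%}")
```

With these numbers the case mixes come out at roughly 25%/75% for Treatment A versus 77%/23% for Treatment B, and on a shared mix Treatment A scores about 83% against about 78% for Treatment B, matching what the subgroups already said.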


Key Takeaways

  1. Aggregated data can be misleading – always check subgroups before drawing conclusions.

  2. Sample size imbalance can create statistical illusions – when most of one option's cases come from a single subgroup, that subgroup dominates its overall rate.

  3. Simpson’s Paradox is everywhere – from medical studies to business decisions and even sports!
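Since these takeaways apply well beyond kidney stones, here is a small, hypothetical helper (not part of any library) that flags a Simpson's-style reversal in any two-option comparison split into subgroups: it reports a reversal when one option wins every subgroup but loses once the subgroups are pooled. It is a sketch under simple assumptions: data comes as (successes, total) pairs, there is at least one subgroup, and it ignores ties and statistical significance. The example counts are the commonly quoted ones for the kidney stone data, not figures stated in this post.

```python
from collections.abc import Mapping

Counts = tuple[int, int]  # (successes, total) for one option in one subgroup


def rate(successes: int, total: int) -> float:
    return successes / total


def pooled(option: Mapping[str, Counts]) -> float:
    """Overall rate after adding up all subgroups."""
    return sum(s for s, _ in option.values()) / sum(n for _, n in option.values())


def is_simpson_reversal(
    option_x: Mapping[str, Counts], option_y: Mapping[str, Counts]
) -> bool:
    """True if one option strictly wins every subgroup but strictly loses pooled.

    Both mappings must cover the same (non-empty) set of subgroup names.
    """
    if option_x.keys() != option_y.keys():
        raise ValueError("both options need the same subgroups")

    x_wins_all = all(rate(*option_x[g]) > rate(*option_y[g]) for g in option_x)
    y_wins_all = all(rate(*option_y[g]) > rate(*option_x[g]) for g in option_x)

    if x_wins_all:
        return pooled(option_x) < pooled(option_y)
    if y_wins_all:
        return pooled(option_y) < pooled(option_x)
    return False


# Example with the commonly quoted kidney stone counts (assumed).
treatment_a = {"small": (81, 87), "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}
print(is_simpson_reversal(treatment_a, treatment_b))  # True
```

Swapping in your own subgroups (regions, age bands, seasons) works the same way, as long as both options are split by the same categories.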

So next time you see a "clear" trend in data, take a step back and dig deeper. Numbers don’t lie... but they can definitely be misleading!

Final Thoughts

Data is like a magic trick—it can mislead you if you don’t pay close attention. The next time you find yourself in a numbers debate, channel your inner detective and look beyond the surface!



 
 
 
