MAD vs Tukey: Choosing the Right Outlier Detection Method
Not all outlier detection methods are created equal. Learn when to use MAD (Median Absolute Deviation) versus Tukey's 1.5×IQR method, how each works, and which performs better for different data distributions.
1. What Are Outliers and Why Do They Matter?
Outliers are data points that deviate significantly from the rest of your dataset. They can represent:
- Data entry errors: Typos, misplaced decimal points, or incorrect measurements
- Rare events: Legitimate but unusual observations (e.g., a student scoring 100% on a difficult test)
- Measurement errors: Equipment malfunctions or environmental factors
- True anomalies: Real but exceptional values that require investigation
Detecting outliers is crucial because they can:
- Skew your statistics: Outliers can dramatically affect the mean and standard deviation
- Mislead your analysis: They can hide patterns or create false patterns
- Require investigation: Understanding why outliers exist can reveal important insights
2. Tukey's 1.5×IQR Method Explained
Tukey's method (also called the 1.5×IQR rule) is the most commonly used outlier detection method for box plots. It was developed by John Tukey in the 1970s as part of exploratory data analysis.
How It Works
- Calculate Q1 (first quartile) and Q3 (third quartile)
- Calculate IQR (Interquartile Range) = Q3 - Q1
- Calculate the lower fence = Q1 - 1.5 × IQR
- Calculate the upper fence = Q3 + 1.5 × IQR
- Any data point < lower fence or > upper fence is considered an outlier
💡 Example
If Q1 = 20, Q3 = 40, then IQR = 20
Lower fence = 20 - 1.5 × 20 = -10
Upper fence = 40 + 1.5 × 20 = 70
Any value < -10 or > 70 is an outlier.
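The steps above translate directly into code. Here is a minimal Python sketch, assuming quartiles computed with `statistics.quantiles(..., method="inclusive")`, which uses linear interpolation; other quartile conventions (including Tukey's hinges) can move the fences slightly. The function names `tukey_fences` and `tukey_outliers` are illustrative, not part of PlotNerd.

```python
from statistics import quantiles


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(data: list[float], k: float = 1.5) -> list[float]:
    """Return the values that fall outside Tukey's fences.

    Assumption: quartiles use linear interpolation (method="inclusive");
    other quartile conventions can move the fences slightly.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    lower, upper = tukey_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]


# Worked example from the text: Q1 = 20, Q3 = 40, so IQR = 20.
print(tukey_fences(20, 40))  # (-10.0, 70.0)
```

The multiplier `k` is exposed as a parameter only for experimentation; the standard rule uses 1.5.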
Pros and Cons
✅ Advantages
- Simple and intuitive
- Widely understood and accepted
- Works well for symmetric data
- Standard in box plot visualization
- Fast to calculate
❌ Limitations
- Assumes symmetric distribution
- Can flag too many points in skewed data
- Less robust than MAD: the quartiles themselves shift once roughly a quarter of the data is contaminated
- May miss outliers in skewed distributions
3. MAD (Median Absolute Deviation) Method Explained
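MAD (Median Absolute Deviation) is a robust outlier detection method that works better than Tukey's method for skewed or asymmetric data. It is built entirely from medians (the median of the data and the median of the absolute deviations from it), so up to half the data can be contaminated before its scale estimate breaks down, compared with roughly a quarter for the IQR.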
How It Works
- Calculate the median of your data
- Calculate the absolute deviations from the median: |value - median|
- Calculate the MAD = median of absolute deviations
- Calculate each point's modified Z-score: 0.6745 × (value - median) / MAD
- Any point with |modified Z-score| > threshold (typically 3.5) is an outlier
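Written out, these steps are the modified Z-score of Iglewicz and Hoaglin. The constant 0.6745 (approximately the 0.75 quantile of the standard normal distribution) rescales MAD so that the score is comparable to an ordinary Z-score when the data are normal. Whether a given tool applies exactly this constant is an assumption worth checking.

```latex
\tilde{x} = \operatorname{median}_i(x_i), \qquad
\mathrm{MAD} = \operatorname{median}_i\bigl(\lvert x_i - \tilde{x} \rvert\bigr), \qquad
M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}, \qquad
x_i \text{ is flagged if } \lvert M_i \rvert > 3.5
```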
💡 Example
If median = 25, MAD = 5, threshold = 3.5
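For a value of 45: modified Z-score = 0.6745 × (45 - 25) / 5 ≈ 2.70 (the 0.6745 factor rescales MAD to match the standard deviation for normal data). Since 2.70 < 3.5, 45 is not flagged.
For a value of 55: modified Z-score = 0.6745 × (55 - 25) / 5 ≈ 4.05
Since |4.05| > 3.5, this value is an outlier.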
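A minimal Python sketch of the MAD method, assuming the standard modified Z-score with the 0.6745 constant and a default threshold of 3.5. The names `modified_z` and `mad_outliers` are illustrative, not PlotNerd's API. The last two lines reproduce the worked example with median 25 and MAD 5: with the constant included, 45 scores about 2.70 and is not flagged, while 55 scores about 4.05 and is.

```python
from statistics import median

# Approximately the 0.75 quantile of the standard normal distribution.
# MAD / 0.6745 estimates the standard deviation when the data are normal.
CONSISTENCY = 0.6745


def modified_z(x: float, med: float, mad: float) -> float:
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    return CONSISTENCY * (x - med) / mad


def mad_outliers(data: list[float], threshold: float = 3.5) -> list[float]:
    """Return the values whose |modified Z-score| exceeds the threshold."""
    med = median(data)
    mad = median(abs(x - med) for x in data)
    if mad == 0:
        # Happens when most values equal the median; scores are undefined.
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [x for x in data if abs(modified_z(x, med, mad)) > threshold]


# Worked example from the text: median = 25, MAD = 5.
print(round(modified_z(45, 25, 5), 3))  # 2.698 -> below 3.5, not flagged
print(round(modified_z(55, 25, 5), 3))  # 4.047 -> above 3.5, flagged
```

The zero-MAD check is a design choice for this sketch: raising an error makes the degenerate case visible instead of silently dividing by zero.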
Pros and Cons
✅ Advantages
- Robust to outliers (uses median, not mean)
- Works well for skewed data
- Less sensitive to extreme values
- Better for asymmetric distributions
- More accurate for non-normal data
❌ Limitations
- Less well-known than Tukey's method
- Slightly more complex to explain
- Requires choosing a threshold (typically 3.5)
- May be too conservative for some applications
4. Side-by-Side Comparison
| Aspect | Tukey (1.5×IQR) | MAD |
|---|---|---|
| Basis | Quartiles (Q1, Q3) | Median and absolute deviations |
| Best For | Symmetric, normal-like distributions | Skewed, asymmetric distributions |
| Robustness | Moderate (quartiles tolerate roughly 25% contamination) | High (medians tolerate up to 50% contamination) |
| Complexity | Simple (easy to explain) | Moderate (requires threshold) |
| Popularity | Very common (box plot standard) | Less common (growing in use) |
| Threshold | Fixed (1.5 × IQR) | Configurable (typically 3.5) |
5. When to Use Each Method
✅ Use Tukey's Method When:
- Your data is approximately symmetric
- You're creating standard box plots
- You need a simple, widely-understood method
- Your audience expects traditional box plots
- You're working with normally-distributed data
- You want consistency with standard practices
✅ Use MAD Method When:
- Your data is skewed or asymmetric
- You have many outliers that might affect quartiles
- You need a more robust method
- You're working with non-normal distributions
- You want better accuracy for skewed data
- You're analyzing data with potential contamination
6. Practical Examples
Example 1: Symmetric Data (Tukey Preferred)
Scenario: Test scores from a well-designed exam (approximately normal distribution).
Data:
75, 78, 80, 82, 85, 87, 90, 92, 95, 98
Result: Both methods work well, but Tukey's method is simpler and more standard for this case.
→ Try this example in Outlier Calculator (switch between methods) →
Example 2: Skewed Data (MAD Preferred)
Scenario: Income data (right-skewed distribution with a few high earners).
Data:
30, 35, 40, 45, 50, 55, 60, 65, 70, 200
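Result: In this small sample, both methods flag 200. With linearly interpolated quartiles (Q1 = 41.25, Q3 = 63.75), Tukey's upper fence is 97.5; with MAD (median 52.5, MAD 12.5), 200 has a modified Z-score of about 7.96, far above 3.5. The difference matters more on larger right-skewed samples, where Tukey's fences can flag many legitimate high earners.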
→ Try this example in Outlier Calculator (compare methods) →
Example 3: Data with Many Outliers
Scenario: Sensor readings with potential measurement errors.
Data:
12.1, 12.3, 12.5, 12.7, 12.9, 13.1, 13.3, 50.0, 55.0, 60.0
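Result: MAD method is more robust here because three of the ten readings are contaminated. With linearly interpolated quartiles, those readings pull Q3 up to about 40.8 and Tukey's upper fence to about 83.2, so Tukey flags none of them. The median (13.0) and MAD (0.6) stay anchored to the clean readings, so 50.0, 55.0, and 60.0 receive modified Z-scores above 40 and are all flagged.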
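To compare the three examples side by side outside PlotNerd, the sketch below runs both methods on the same datasets. It assumes linearly interpolated quartiles for Tukey and the 0.6745-scaled modified Z-score with a 3.5 threshold for MAD. PlotNerd's exact quartile convention may differ, so treat this as an illustration rather than a reproduction of the calculator.

```python
from statistics import median, quantiles


def tukey_outliers(data: list[float], k: float = 1.5) -> list[float]:
    # Assumption: quartiles via linear interpolation (method="inclusive").
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


def mad_outliers(data: list[float], threshold: float = 3.5) -> list[float]:
    med = median(data)
    mad = median(abs(x - med) for x in data)
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    # Modified Z-score: 0.6745 * (x - median) / MAD.
    return [x for x in data if abs(0.6745 * (x - med) / mad) > threshold]


EXAMPLES = {
    "symmetric test scores": [75, 78, 80, 82, 85, 87, 90, 92, 95, 98],
    "right-skewed income": [30, 35, 40, 45, 50, 55, 60, 65, 70, 200],
    "contaminated sensor": [12.1, 12.3, 12.5, 12.7, 12.9, 13.1, 13.3, 50.0, 55.0, 60.0],
}

for name, data in EXAMPLES.items():
    print(f"{name}: Tukey={tukey_outliers(data)} MAD={mad_outliers(data)}")
```

Under these assumptions, both methods flag nothing in the test scores and flag 200 in the income data; in the sensor data, Tukey flags nothing while MAD flags 50.0, 55.0, and 60.0.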
→ Try this example in Outlier Calculator (test both methods) →
7. FAQ
Q: Which method is more accurate?
A: Neither is universally more accurate. Tukey's method is better for symmetric, normal-like distributions, while MAD method is better for skewed or asymmetric data. The "best" method depends on your data's distribution.
Q: Can I use both methods in PlotNerd?
A: Yes! PlotNerd's Outlier Calculator allows you to switch between Tukey and MAD methods in real-time. Simply select your preferred method from the dropdown in the results panel, and the chart will update instantly. This lets you compare how each method identifies outliers in your data.
Q: What's the MAD threshold in PlotNerd?
A: PlotNerd uses a default threshold of 3.5 for MAD outlier detection, which is the standard in statistical literature. This means any data point with a modified Z-score greater than 3.5 (in absolute value) is considered an outlier.
Q: Should I remove outliers after detecting them?
A: Not necessarily! Outliers can be legitimate data points that require investigation. Before removing them, consider:
- Are they data entry errors? (If yes, correct or remove)
- Are they rare but legitimate events? (Keep them, but note them)
- Do they represent important insights? (Investigate further)
- Do they significantly affect your analysis? (Consider robust methods)
Q: Can I use different methods for different groups in a grouped box plot?
A: For consistency, PlotNerd uses the same outlier detection method for all groups in a grouped box plot. This ensures fair comparison across groups. You can switch the method, but it will apply to all groups simultaneously.
8. Conclusion
Choosing between Tukey's 1.5×IQR and MAD outlier detection methods depends on your data's characteristics:
- Use Tukey's method for symmetric, normal-like distributions and standard box plots
- Use MAD method for skewed, asymmetric data or when you need more robust outlier detection
With PlotNerd, you can easily compare both methods in real-time, seeing how each identifies outliers in your specific dataset. This helps you choose the most appropriate method for your analysis.
Ready to Test Both Methods?
Try PlotNerd's outlier detection calculator to see how Tukey and MAD methods compare on your data.
Launch Outlier Calculator
📚 Related Articles
- → Complete Guide to IQR Method Outlier Detection
- → How to Read a Box Plot: A Simple Guide for Students and Analysts
- → How to Compare Multiple Groups with Grouped Box Plots
- → Understanding Notched Box Plots: Statistical Significance Visualization
- → Why Are There So Many Quartile Methods? A Deep Dive into Tukey's Hinges
🛠️ Related Tools
- → Outlier Calculator: Compare Tukey and MAD methods side-by-side
- → Tukey Hinges Calculator: Calculate quartiles and create box plots