When a data set has outliers, some spread measures are unreliable.
To understand this, let's consider the following data set:
Most of the numbers in this data set lie close together, with one value far from the rest (an outlier). Therefore, we'd expect the spread of this group to be fairly small.
Let's now compute the range and mean absolute deviation (MAD) for this data set:
- We compute the range by subtracting the smallest value from the largest value. Because of the outlier, the range is large. Therefore, the range is not a reliable measure of the data set's spread.
- Next, let's compute the MAD. First, we compute the mean, and notice that the outlier pulls it away from the rest of the data. Then we compute the MAD, which is the average absolute distance of each value from the mean. Because the outlier shifts the mean and also lies far from it, the MAD becomes very large. Therefore, the MAD is not a reliable measure of the data set's spread.
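The two computations above can be sketched in Python. The data set here is hypothetical (mostly small values plus one large outlier), and the function names `data_range` and `mean_absolute_deviation` are illustrative, not part of any library.

```python
from statistics import mean


def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """MAD: the average absolute distance of each value from the mean."""
    m = mean(values)
    return mean(abs(x - m) for x in values)


# Hypothetical data set: mostly small values plus one outlier (100).
data = [2, 3, 4, 4, 5, 6, 6, 100]

print(data_range(data))               # 98
print(mean(data))                     # 16.25 (pulled up by the outlier)
print(mean_absolute_deviation(data))  # 20.9375
```

Seven of the eight values sit between 2 and 6, yet both the range and the MAD come out large because of the single outlier.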
Let's now compute the interquartile range (IQR) of our data set:
Dividing the sorted data set into a lower half and an upper half, we find the lower quartile (Q1), which is the median of the lower half, and the upper quartile (Q3), which is the median of the upper half. So, the interquartile range is IQR = Q3 − Q1.
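A minimal Python sketch of the median-of-halves method described above, on a hypothetical data set. It assumes that, when the number of values is odd, the overall median is left out of both halves; other quartile conventions (for example, the default method of `statistics.quantiles`) can give slightly different values. The helper names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves method.

    Assumption: when the count is odd, the overall median is excluded
    from both halves. Requires at least two values.
    """
    s = sorted(values)
    n = len(s)
    lower_half = s[: n // 2]        # values before the middle position
    upper_half = s[(n + 1) // 2 :]  # values after the middle position
    return median(lower_half), median(upper_half)


def interquartile_range(values):
    q1, q3 = quartiles(values)
    return q3 - q1


# Hypothetical data set: mostly small values plus one outlier (100).
data = [2, 3, 4, 4, 5, 6, 6, 100]

print(quartiles(data))            # (3.5, 6.0)
print(interquartile_range(data))  # 2.5
```

The outlier only occupies the last position of the upper half, so it barely moves Q3, and the IQR stays small.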
This example shows that the IQR gives a more reliable measure of the spread than the range and the MAD, because the IQR depends only on the middle half of the data.
To summarize:
- The range is sensitive to outliers.
- The MAD is sensitive to outliers. Also, since it depends on the mean, the MAD is sensitive to skew.
- The IQR is resistant to outliers, and it is also resistant to skew.
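To see these sensitivities side by side, the sketch below computes all three spread measures for a hypothetical data set with and without one outlier. The function name `spread_measures` is illustrative, and the quartiles use the median-of-halves convention (overall median excluded when the count is odd).

```python
from statistics import mean, median


def spread_measures(values):
    """Return (range, MAD, IQR) for a list of at least two numbers."""
    s = sorted(values)
    n = len(s)
    rng = s[-1] - s[0]
    m = mean(s)
    mad = mean(abs(x - m) for x in s)
    # Median-of-halves quartiles (overall median excluded when n is odd).
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return rng, mad, q3 - q1


# Hypothetical data: identical except that the largest value becomes an outlier.
without_outlier = [2, 3, 4, 4, 5, 6, 6, 7]
with_outlier = [2, 3, 4, 4, 5, 6, 6, 100]

for label, values in [("without outlier", without_outlier), ("with outlier", with_outlier)]:
    print(label, spread_measures(values))
# without outlier (5, 1.375, 2.5)
# with outlier (98, 20.9375, 2.5)
```

Replacing a single value jumps the range from 5 to 98 and the MAD from 1.375 to about 20.9, while the IQR stays at 2.5.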
We should also bear in mind the following:
- If a data set is symmetric and has no outliers, we use the mean to describe the center and the MAD to describe the spread.
- If a data set contains outliers or is skewed, we use the median to describe the center and the IQR to describe the spread.
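The rule of thumb above can be written as a small helper. This is a sketch: the function name `describe` and its `symmetric_without_outliers` flag are hypothetical, and the caller is assumed to judge the shape (for example, from a dot plot or box plot), since the function does not detect skew or outliers itself.

```python
from statistics import mean, median


def describe(values, symmetric_without_outliers):
    """Return (center_name, center, spread_name, spread).

    Symmetric data with no outliers: mean and MAD.
    Otherwise: median and IQR (median-of-halves quartiles).
    """
    s = sorted(values)
    n = len(s)
    if symmetric_without_outliers:
        m = mean(s)
        return "mean", m, "MAD", mean(abs(x - m) for x in s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return "median", median(s), "IQR", q3 - q1


print(describe([1, 2, 3, 4, 5], True))          # ('mean', 3, 'MAD', 1.2)
print(describe([1, 2, 2, 3, 3, 4, 20], False))  # ('median', 3, 'IQR', 2)
```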
In practice, the range is rarely used because it's so sensitive to outliers and only considers two values in the data set.
In which of the following distributions might it be preferable to use the interquartile range (IQR) instead of the mean absolute deviation (MAD) to measure the spread?
The mean absolute deviation (MAD) is sensitive to skew and outliers, whereas the interquartile range (IQR) is resistant to both.
So, we should use the IQR instead of the MAD to measure the spread of a distribution that is skewed or contains outliers.
Among the given options, every distribution is symmetric with no outliers except one, which is right-skewed. That right-skewed distribution is the one for which the IQR is preferable.
The box plot below shows the distribution of some students' heights.
Which of the following statements are true?
I. The distribution is right-skewed.
II. The interquartile range (IQR) is sensitive to the distribution's skew.
III. The mean absolute deviation (MAD) is the best measure of the distribution's spread.
First, let's recall the following facts.
The range is sensitive to outliers, while the mean absolute deviation (MAD) is sensitive to skew and outliers.
The interquartile range (IQR) is resistant to skew and outliers.
With that in mind, let's examine each of the statements.
Statement I is true. Our distribution is right-skewed.
Statement II is false. The IQR is not sensitive to the skew of our distribution.
Statement III is false. Since the distribution is skewed, the IQR, not the MAD, is the best measure of the spread of the given data.
Therefore, the correct answer is "I only."
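Statement I rests on reading skew from the box plot. One common rule of thumb compares the right part of the plot (median to maximum) with the left part (minimum to median): a longer right part suggests right skew. The sketch below applies that rule to a hypothetical list of heights; the function names are illustrative, and the rule is a rough visual heuristic rather than a formal test.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) with median-of-halves quartiles."""
    s = sorted(values)
    n = len(s)
    return s[0], median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :]), s[-1]


def box_plot_skew(values):
    """Rough rule of thumb: compare median-to-max with min-to-median."""
    low, _q1, med, _q3, high = five_number_summary(values)
    left, right = med - low, high - med
    if right > left:
        return "right-skewed"
    if left > right:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical student heights in centimeters.
heights = [150, 152, 153, 155, 156, 160, 170, 185]
print(box_plot_skew(heights))  # right-skewed
```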
The dot plot above shows the distribution of the distance of some students' houses to their school, rounded to the nearest mile.
Which of the following statements are true?
I. The distribution is skewed.
II. The mean absolute deviation (MAD) is not sensitive to the skew of our distribution.
III. The interquartile range (IQR) is the best measure of the distribution's spread.
a. I only
b. I and III only
c. II only
d. I and II only
e. II and III only
The box plot above shows the distribution of the number of siblings some students in a class have.
Which of the following statements are true?
I. The distribution is symmetric.
II. The mean absolute deviation (MAD) is sensitive to the distribution's skew.
III. The interquartile range (IQR) is the best measure of the distribution's spread.
a. I and II only
b. II and III only
c. I only
d. II only
e. III only