10 Measures of Dispersion

Measures of dispersion are statistical tools used to describe the spread or variability within a data set. Unlike measures of central tendency (mean, median, mode), which summarize the data with a single value representing its center, measures of dispersion show how “spread out” the data points are. Knowing the variability indicates how well a central measure represents the data: the smaller the spread, the more typical the central value is of the individual observations. The primary measures of dispersion include the Range, Interquartile Range (IQR), Variance, Standard Deviation, and Mean Absolute Deviation (MAD).

10.1 Range

The range is the simplest measure of dispersion and is calculated as the difference between the maximum and minimum values in the data set.

Example: For the data set {1, 2, 4, 7, 9}, the range is \(9 - 1 = 8\).
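As a minimal sketch, the range can be computed in Python with the built-in `max` and `min` functions. The variable names are illustrative and not part of the original text.

```python
# Range of the example data set: maximum minus minimum.
data = [1, 2, 4, 7, 9]

data_range = max(data) - min(data)
print(data_range)  # 8
```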

10.1.1 Interquartile Range (IQR)

The IQR measures the middle spread of the data, essentially covering the central 50% of data points. It is the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

Example: For the data set {1, 2, 4, 7, 9}, where Q1 is 2 and Q3 is 7, the IQR is \(7 - 2 = 5\).
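Quartile values depend on the interpolation convention used, so different tools can report different values of Q1 and Q3 for small data sets. The sketch below assumes the `"inclusive"` method of `statistics.quantiles` from the Python standard library, which reproduces Q1 = 2 and Q3 = 7 for this example. The default `"exclusive"` method can give different quartiles for the same data.

```python
import statistics

data = [1, 2, 4, 7, 9]

# n=4 splits the data into quartiles and returns the three cut points.
# method="inclusive" interpolates between the minimum and maximum of the data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print(q1, q3, iqr)  # 2.0 7.0 5.0
```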

10.2 Variance

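Variance is the average of the squared differences from the mean, so it indicates how far the data points typically lie from the mean. The formula differs slightly between populations and samples: the population variance divides by the number of observations \(N\), while the sample variance divides by \(n - 1\) to correct for the bias introduced by estimating the mean from the same data.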

Population Variance (\(\sigma^2\)): \(\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\)
Sample Variance (\(s^2\)): \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)

Example: For the data set {1, 2, 4, 7, 9}, with a mean of 4.6, the sample variance is calculated as follows:

\[s^2 = \frac{(1-4.6)^2 + (2-4.6)^2 + (4-4.6)^2 + (7-4.6)^2 + (9-4.6)^2}{5-1}\]

\[= \frac{12.96 + 6.76 + 0.36 + 5.76 + 19.36}{4} = \frac{45.2}{4} = 11.3\]
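The arithmetic can be checked numerically. The following sketch computes the sum of squared deviations by hand, then compares the result with `statistics.variance` (sample) and `statistics.pvariance` (population) from the Python standard library. The variable names are illustrative.

```python
import statistics

data = [1, 2, 4, 7, 9]
n = len(data)
mean = sum(data) / n  # 23 / 5 = 4.6

# Sum of squared deviations from the mean.
squared_devs = sum((x - mean) ** 2 for x in data)

sample_var = squared_devs / (n - 1)  # divide by n - 1 for a sample
population_var = squared_devs / n    # divide by N for a population

print(round(sample_var, 4), round(population_var, 4))  # 11.3 9.04

# The standard library functions agree with the manual calculation.
assert abs(sample_var - statistics.variance(data)) < 1e-9
assert abs(population_var - statistics.pvariance(data)) < 1e-9
```

For the same data, the sample variance is always larger than the population variance because the divisor is smaller.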

10.2.1 Standard Deviation

The standard deviation is the square root of the variance and provides a measure of dispersion in the same units as the data. It is one of the most commonly used measures of dispersion because it is easily interpreted.

Population Standard Deviation (\(\sigma\)): \(\sigma = \sqrt{\sigma^2}\)
Sample Standard Deviation (\(s\)): \(s = \sqrt{s^2}\)

Example: Continuing from the variance example, where the sample variance of {1, 2, 4, 7, 9} is \(s^2 = 45.2 / 4 = 11.3\), the sample standard deviation is \(\sqrt{11.3} \approx 3.36\).
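A short sketch using the Python standard library, where `statistics.stdev` implements the sample formula and `statistics.pstdev` the population formula. The variable names are illustrative.

```python
import math
import statistics

data = [1, 2, 4, 7, 9]

sample_sd = statistics.stdev(data)       # square root of the sample variance
population_sd = statistics.pstdev(data)  # square root of the population variance

print(round(sample_sd, 2), round(population_sd, 2))  # 3.36 3.01

# Squaring the standard deviation recovers the variance.
assert math.isclose(sample_sd ** 2, statistics.variance(data))
```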

10.2.2 Absolute Deviation / Mean Absolute Deviation (MAD)

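The mean absolute deviation is the average distance between each data point and the mean, ignoring the direction (positive or negative) of the difference. Because the deviations are not squared, it is less sensitive to extreme values than the variance or standard deviation.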

Example: For the data set {1, 2, 4, 7, 9} with a mean of 4.6, the MAD is calculated as follows:

\[\text{MAD} = \frac{|1-4.6| + |2-4.6| + |4-4.6| + |7-4.6| + |9-4.6|}{5}\]

\[= \frac{3.6 + 2.6 + 0.6 + 2.4 + 4.4}{5} = \frac{13.6}{5} = 2.72\]
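The Python standard library's `statistics` module does not include a mean absolute deviation function, so the sketch below computes it directly from the definition. The variable names are illustrative.

```python
data = [1, 2, 4, 7, 9]
mean = sum(data) / len(data)  # 4.6

# Mean absolute deviation about the mean: the average of |x_i - mean|.
mad = sum(abs(x - mean) for x in data) / len(data)

print(round(mad, 2))  # 2.72
```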

10.2.3 Summary

Measures of dispersion are crucial in statistical analysis for understanding the variability within a data set. They complement measures of central tendency by providing a fuller picture of the data’s distribution. The choice of measure depends on the characteristics of the data and the objectives of the analysis. Variance and standard deviation are particularly useful in many statistical analyses, including statistical modeling and hypothesis testing, while the range and IQR provide quick insights into data spread. The MAD offers an alternative that is less affected by outliers than the variance and standard deviation, because the deviations are not squared.