The standard deviation (SD) is a widely used measure in statistics that quantifies the amount of variation or dispersion of a set of values. A low standard deviation indicates that the data points tend to be close to the mean (also called the average) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
Definition
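Mathematically, the standard deviation is the square root of the variance, that is, the root mean square of the deviations of the data points from their mean. It can be read as a typical distance of a data point from the mean, although it is not the plain average of those distances (that quantity is the mean absolute deviation). Unlike the variance, the standard deviation is expressed in the same units as the data itself, which makes it easier to interpret in practical contexts.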
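Using notation introduced here for illustration ($N$ values $x_1, \dots, x_N$ with mean $\mu$ and variance $\sigma^2$), the definition for a complete data set can be written as:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}$$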
Calculation
Calculating the standard deviation involves the following steps, carried out in order:
- Calculate the mean of the data set.
- Determine the deviation of each data point from the mean (subtract the mean from each value).
- Square each deviation (so that positive and negative deviations do not cancel when summed, and so that larger deviations carry more weight).
- Sum the squared deviations.
- Divide by the number of data points (for population standard deviation) or by the number of data points minus one (for sample standard deviation) to get the variance.
- Take the square root of the variance to obtain the standard deviation.
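The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function name `standard_deviation`, its `sample` flag, and the example data are assumptions made for this sketch.

```python
import math

def standard_deviation(values, sample=False):
    """Compute the standard deviation step by step.

    `sample=False` divides by n (population form);
    `sample=True` divides by n - 1 (sample form).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")

    mean = sum(values) / n                        # 1. mean of the data set
    deviations = [x - mean for x in values]       # 2. deviation of each point
    squared = [d * d for d in deviations]         # 3. square each deviation
    total = sum(squared)                          # 4. sum the squared deviations
    variance = total / (n - 1 if sample else n)   # 5. divide to get the variance
    return math.sqrt(variance)                    # 6. square root of the variance

# Hypothetical example data: mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))                         # 2.0
print(round(standard_deviation(data, sample=True), 3))  # 2.138
```

For this data the population variance is 32 / 8 = 4, so the population standard deviation is exactly 2; dividing by 7 instead gives the slightly larger sample value.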
Population vs. Sample Standard Deviation
A crucial distinction exists between the population standard deviation (often denoted by $\sigma$, sigma) and the sample standard deviation (often denoted by $s$).
- Population standard deviation is used when the entire data set of interest (the population) is available.
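- Sample standard deviation is used when only a subset of the data (a sample) is available, and the goal is to estimate the standard deviation of the larger population from which the sample was drawn. The formula for sample standard deviation uses $n-1$ in the denominator instead of $n$ (where $n$ is the number of data points). This adjustment, known as Bessel's correction, makes the sample variance an unbiased estimate of the population variance when the observations are independent. The sample standard deviation $s$, being the square root of that variance, still tends to slightly underestimate $\sigma$, although the effect is small for large samples.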
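As a quick illustration of the difference, Python's standard `statistics` module offers both forms: `pstdev` divides by $n$ and `stdev` divides by $n-1$. The data values below are made up for the example.

```python
import statistics

# Hypothetical measurements, treated first as a whole population
# and then as a sample drawn from a larger one.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population form: divide by n
s = statistics.stdev(values)       # sample form: divide by n - 1

print(f"{sigma:.3f}")  # 2.000
print(f"{s:.3f}")      # 2.138
```

The sample figure is always at least as large as the population figure for the same data, and the gap shrinks as $n$ grows.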
Interpretation
- A small standard deviation implies that the data points are clustered closely around the mean, indicating high consistency or low variability.
- A large standard deviation suggests that the data points are spread out over a wider range, indicating high variability or low consistency.
For data that follows a normal distribution (bell curve), the standard deviation has specific interpretative properties, often described by the empirical rule (or 68-95-99.7 rule):
- Approximately 68% of the data falls within one standard deviation of the mean.
- Approximately 95% of the data falls within two standard deviations of the mean.
- Approximately 99.7% of the data falls within three standard deviations of the mean.
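These percentages can be checked numerically. For a normal distribution, the probability of falling within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$. The sketch below evaluates that expression with the standard library; the helper name `coverage_within` is hypothetical.

```python
import math

def coverage_within(k):
    """Probability that a normally distributed value lies within
    k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The rule applies only to data that is at least approximately normal; skewed or heavy-tailed data can depart noticeably from these figures.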
Relationship to Variance
The standard deviation is directly related to the variance. Variance is the average of the squared differences from the mean (with $n-1$ in place of $n$ as the divisor in the sample case). Because the standard deviation is the square root of the variance, it returns the measure to the original units of the data. For example, if the data are in metres, the variance is in square metres and the standard deviation is again in metres, which makes it more intuitive for practical interpretation.
Applications
Standard deviation is a fundamental concept used across various fields:
- Quality Control: To monitor the consistency of product manufacturing.
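- Finance: To measure the volatility of investment returns; a higher standard deviation is commonly interpreted as higher risk.
- Engineering: To assess the precision (repeatability) of measurements or processes. Accuracy, meaning closeness to the true value, has to be assessed separately because the standard deviation does not reveal systematic error.
- Science and Research: To report the variability of experimental results and, through quantities such as the standard error, to support tests of statistical significance.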
- Social Sciences: To understand the spread of demographic data, survey responses, or test scores.