p-value
In statistical hypothesis testing, the p-value (probability value) is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct. It is a measure of evidence against the null hypothesis.
Equivalently, the p-value is the smallest significance level at which the null hypothesis would be rejected for the observed data. A smaller p-value indicates stronger evidence against the null hypothesis, because results at least as extreme as those observed would be less likely to occur if the null hypothesis were true.
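The definition can be sketched in symbols. The notation here is introduced only for illustration: $T$ is the test statistic, $t_{\text{obs}}$ its observed value, and $H_0$ the null hypothesis. The two-sided form assumes that the distribution of $T$ under $H_0$ is symmetric about zero.

```latex
p_{\text{right-tailed}} = \Pr\left(T \ge t_{\text{obs}} \mid H_0\right),
\qquad
p_{\text{two-sided}} = \Pr\left(|T| \ge |t_{\text{obs}}| \mid H_0\right)
```

Under this notation, a test at significance level $\alpha$ rejects $H_0$ exactly when $p \le \alpha$, which is why the p-value can be read as the smallest level at which the observed data would lead to rejection.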
Interpretation
The p-value is often compared to a pre-determined significance level, denoted as α (alpha). Common values for α are 0.05, 0.01, and 0.10.
- If the p-value is less than or equal to α, the null hypothesis is rejected. The observed results are then called statistically significant, meaning that results at least this extreme would be unlikely to occur by chance alone if the null hypothesis were true.
- If the p-value is greater than α, the null hypothesis is not rejected. This does not mean that the null hypothesis is true. It means only that the observed data does not provide sufficient evidence to reject it. For example, the study may lack sufficient statistical power to detect a real effect, or other issues may be present.
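The decision rule above can be sketched in a few lines of Python. The function name `decide` and the example p-value of 0.03 are hypothetical and chosen only for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Reject when p <= alpha. Otherwise the test is inconclusive,
    # which is not the same as accepting the null hypothesis.
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


# The same hypothetical p-value judged against three common levels.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, decide(0.03, alpha))
```

With these inputs, a p-value of 0.03 leads to rejection at α = 0.10 and α = 0.05 but not at α = 0.01, which shows that the conclusion depends on a threshold that has to be fixed before looking at the data.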
Common Misconceptions
- The p-value is not the probability that the null hypothesis is true. It is the probability of observing the data (or more extreme data) given that the null hypothesis is true.
- A statistically significant result (small p-value) does not necessarily mean that the result is practically significant. The effect size should also be considered.
- A large p-value does not prove the null hypothesis. It indicates only that the data does not provide sufficient evidence against it.
- P-values should not be used in isolation to make decisions. They should be considered along with other factors, such as the study design, the sample size, and the potential for bias.
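The gap between statistical and practical significance can be illustrated with a small sketch. It assumes a one-sample z-test with a known standard deviation, which is a simplification chosen for illustration. The helper name `z_test_two_sided` and all numbers are hypothetical. The same very small effect (0.01 standard deviations) is tested at three sample sizes.

```python
from math import erfc, sqrt


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p = erfc(abs(z) / sqrt(2.0))
    return z, p


# Hypothetical data: an observed mean shift of 0.01 with sigma = 1.
effect = 0.01
for n in (100, 10_000, 1_000_000):
    z, p = z_test_two_sided(sample_mean=effect, mu0=0.0, sigma=1.0, n=n)
    print(f"n={n:>9,}  effect={effect}  z={z:6.2f}  p={p:.3g}")
```

The effect size is identical in every row. Only the sample size changes, yet the p-value moves from far above 0.05 to far below it, which is why a p-value should be reported together with the effect size and the sample size.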
Calculation
The p-value is calculated based on the test statistic and the sampling distribution of the test statistic under the null hypothesis. The specific method for calculating the p-value depends on the statistical test being used (e.g., t-test, chi-square test, ANOVA).
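As a minimal sketch of such a calculation, consider an exact binomial test, used here only as an example and not as one of the tests named above. The test statistic is the number of heads in a series of coin tosses, and its sampling distribution under the null hypothesis of a fair coin is Binomial(n, 0.5). The function name `binomial_p_value_upper` and the data (15 heads in 20 tosses) are hypothetical.

```python
from math import comb


def binomial_p_value_upper(successes: int, trials: int, p0: float = 0.5) -> float:
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, p0)."""
    # Sum the null probabilities of every outcome at least as extreme
    # as the observed count.
    return sum(
        comb(trials, k) * p0**k * (1.0 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 heads in 20 tosses, null hypothesis "the coin is fair".
p_one_sided = binomial_p_value_upper(15, 20)
print(f"one-sided p-value: {p_one_sided:.4f}")  # prints 0.0207
```

The sum covers 21,700 of the 2^20 equally likely toss sequences, so the one-sided p-value is about 0.0207. Other tests follow the same pattern with a different test statistic and a different null distribution.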
Criticism
The use of p-values has been subject to considerable criticism in recent years. Concerns have been raised about their potential for misinterpretation, misuse, and contribution to the "replication crisis" in science. Alternatives to relying solely on p-values are increasingly being promoted, such as focusing on effect sizes, confidence intervals, and Bayesian methods.