DFFITS
DFFITS, short for "difference in fits," is a statistic used in regression diagnostics to identify influential observations. It measures how strongly the i-th observation influences its own fitted value. Specifically, it is the change in the predicted value for the i-th observation when that observation is removed from the dataset and the regression model is refitted, scaled by an estimate of the standard error of that fitted value.
DFFITS can be calculated using the formula:
$$\text{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_{ii}}}$$
Where:
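- $\hat{y}_i$ is the predicted value for the i-th observation from the model fitted to the full dataset.
- $\hat{y}_{i(i)}$ is the predicted value for the i-th observation from the model refitted with the i-th observation removed.
- $s_{(i)}$ is the estimated standard deviation of the error term (the residual standard error) from the model refitted without the i-th observation.
- $h_{ii}$ is the leverage of the i-th observation: the i-th diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$, where $X$ is the design matrix. It measures how far the observation's predictor values lie from those of the other observations and equals the sensitivity of the fitted value to the observed response, $\partial \hat{y}_i / \partial y_i$. High leverage alone does not make a point influential.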
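The definition can be computed literally by refitting the model once per observation. The sketch below is a minimal illustration that assumes ordinary least squares via NumPy and a design matrix `X` that already contains an intercept column and has full column rank; the function name `dffits_loo` is hypothetical. Because it performs n refits, it is mainly useful for small datasets or for checking faster formulas.

```python
import numpy as np

def dffits_loo(X, y):
    """DFFITS by brute-force leave-one-out refitting (hypothetical helper).

    X: (n, p) design matrix, assumed to include the intercept column
       and to have full column rank.
    y: (n,) response vector.
    """
    n, p = X.shape
    # Full-data OLS fit and fitted values y_hat_i
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    # Leverages h_ii: squared row norms of Q from the reduced QR of X
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ beta_i
        # s_(i): residual standard error of the refit (n - 1 - p degrees of freedom)
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))
        # Prediction for observation i from the model that never saw it
        y_hat_i = X[i] @ beta_i
        out[i] = (y_hat[i] - y_hat_i) / (s_i * np.sqrt(h[i]))
    return out
```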
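DFFITS combines the leverage and residual information of a data point. Because $\hat{y}_i - \hat{y}_{i(i)} = h_{ii} e_i / (1 - h_{ii})$, where $e_i$ is the ordinary residual of the i-th observation, the statistic can be rewritten as $\text{DFFITS}_i = t_i \sqrt{h_{ii} / (1 - h_{ii})}$, where $t_i = e_i / (s_{(i)}\sqrt{1 - h_{ii}})$ is the externally studentized residual; it is therefore large when an observation both fits poorly and has high leverage. A large absolute value of DFFITS indicates that the observation has a strong influence on its own predicted value. A common rule of thumb is that an observation is considered influential if $|\text{DFFITS}_i| > 2\sqrt{p/n}$, where p is the number of parameters in the regression model (including the intercept) and n is the number of observations. This threshold is a screening guideline rather than a formal test: it flags observations that may warrant further investigation or special treatment in the analysis.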
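For ordinary least squares, refitting n times is unnecessary: since $\hat{y}_i - \hat{y}_{i(i)} = h_{ii} e_i/(1 - h_{ii})$ and $s_{(i)}^2 = \left(\text{SSE} - e_i^2/(1 - h_{ii})\right)/(n - p - 1)$, DFFITS follows from a single fit as $t_i\sqrt{h_{ii}/(1-h_{ii})}$. The sketch below assumes NumPy, a full-rank design matrix `X` that includes the intercept column, and every $h_{ii} < 1$. The names `dffits` and `flag_influential` and the seeded example data (with one planted high-leverage outlier) are hypothetical. It applies the $2\sqrt{p/n}$ rule of thumb with p taken as the number of columns of `X`.

```python
import numpy as np

def dffits(X, y):
    """DFFITS for all observations from one OLS fit (hypothetical helper).

    Uses DFFITS_i = t_i * sqrt(h_ii / (1 - h_ii)), where t_i is the
    externally studentized residual. Assumes X has full column rank
    and every h_ii < 1.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(X)            # reduced QR: Q is n x p
    h = np.sum(Q**2, axis=1)          # leverages h_ii
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                  # ordinary residuals e_i
    sse = e @ e
    # s_(i)^2, the leave-one-out residual variance, without refitting
    s2_i = (sse - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_i * (1 - h))   # externally studentized residuals
    return t * np.sqrt(h / (1 - h))

def flag_influential(X, y):
    """Return DFFITS, the 2*sqrt(p/n) cutoff, and a boolean flag array."""
    n, p = X.shape
    d = dffits(X, y)
    cutoff = 2 * np.sqrt(p / n)
    return d, cutoff, np.abs(d) > cutoff

if __name__ == "__main__":
    # Illustrative data: a straight line plus noise, with one planted outlier
    rng = np.random.default_rng(1)
    x = rng.normal(size=30)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
    x[0], y[0] = 4.0, -5.0  # hypothetical high-leverage point far off the line
    X = np.column_stack([np.ones_like(x), x])
    d, cutoff, flagged = flag_influential(X, y)
    print(f"cutoff = {cutoff:.3f}")  # 2*sqrt(2/30), about 0.516
    print("flagged indices:", np.flatnonzero(flagged))
```

The closed form costs one QR decomposition instead of n refits, which matters for large n; for small problems, its results can be cross-checked against a direct leave-one-out refit.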