DFFITS

DFFITS, short for "difference in fits," is a regression diagnostic used to identify influential observations. It measures how much the fitted value for the i-th observation changes when that observation is removed from the dataset and the regression model is refitted. The change is scaled by the estimated standard error of the fit at that point, so values are comparable across observations.

DFFITS can be calculated using the formula:

DFFITS_i = (ŷ_i - ŷ_i(i)) / (s_(i) * √(h_ii))

Where:

ŷ_i is the predicted value for the i-th observation using the full dataset.
ŷ_i(i) is the predicted value for the i-th observation when the i-th observation has been removed from the dataset and the model refitted.
s_(i) is the estimated standard deviation of the error term when the i-th observation has been removed from the dataset.
h_ii is the leverage of the i-th observation, which reflects how far its predictor values lie from those of the other observations and therefore how strongly its response pulls on its own fitted value. It is the i-th diagonal element of the hat matrix H = X(XᵀX)⁻¹Xᵀ.
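The definition above can be followed literally by refitting the model once per observation. The following is a minimal sketch under the assumption of an ordinary least squares fit with NumPy; the function name `dffits_by_refitting` and the demo data `X_demo` and `y_demo` are hypothetical and used only for illustration.

```python
import numpy as np

def dffits_by_refitting(X, y):
    """DFFITS from explicit leave-one-out refits.

    X is the n-by-p design matrix (include a column of ones for the
    intercept); y is the response vector of length n.
    """
    n, p = X.shape

    # Leverages: h_ii = x_i^T (X^T X)^{-1} x_i, the hat-matrix diagonal.
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    # Fitted values from the full dataset.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta

    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X_del, y_del = X[keep], y[keep]

        # Refit without observation i.
        beta_del, *_ = np.linalg.lstsq(X_del, y_del, rcond=None)

        # s_(i): residual standard deviation of the refit,
        # with (n - 1) observations and p parameters.
        resid = y_del - X_del @ beta_del
        s_del = np.sqrt(resid @ resid / (n - 1 - p))

        # Prediction for observation i from the model fitted without it.
        y_hat_del = X[i] @ beta_del

        out[i] = (y_hat[i] - y_hat_del) / (s_del * np.sqrt(h[i]))
    return out

# Illustrative data (made up): a near-linear trend whose last point is shifted.
x_demo = np.arange(10.0)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0])
y_demo = 1.0 + 2.0 * x_demo + noise
y_demo[-1] += 10.0
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])

print(dffits_by_refitting(X_demo, y_demo))
```

Refitting n times is slow for large datasets, but it mirrors the formula term by term and is useful as a check on faster implementations.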

DFFITS combines the leverage and residual information of a data point. This is explicit in the equivalent form DFFITS_i = t_i * √(h_ii / (1 - h_ii)), where t_i is the externally studentized residual: an observation scores high when it has a large residual, high leverage, or both. A large absolute value of DFFITS indicates that the observation has a strong influence on its own predicted value. A common rule of thumb flags an observation as influential if |DFFITS_i| > 2√(p/n), where p is the number of parameters in the regression model (including the intercept) and n is the number of observations. The threshold is a guideline for identifying observations that warrant further investigation, not a formal test.
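In practice DFFITS is usually computed without refitting, using the standard identity DFFITS_i = t_i * √(h_ii / (1 - h_ii)), where t_i is the externally studentized residual. The sketch below applies that identity and the 2√(p/n) rule of thumb, assuming an ordinary least squares fit with NumPy; the function names `dffits_closed_form` and `flag_influential` and the demo data are hypothetical.

```python
import numpy as np

def dffits_closed_form(X, y):
    """DFFITS from a single fit, via leverages and studentized residuals."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                         # leverages h_ii
    e = y - H @ y                          # ordinary residuals

    # Deleted variance estimate s_(i)^2 without refitting:
    # (n - p - 1) * s_(i)^2 = SSE - e_i^2 / (1 - h_ii)
    s2_del = (e @ e - e**2 / (1 - h)) / (n - p - 1)

    # Externally studentized residuals t_i.
    t = e / np.sqrt(s2_del * (1 - h))
    return t * np.sqrt(h / (1 - h))

def flag_influential(X, y):
    """Return DFFITS values, the 2*sqrt(p/n) cutoff, and a boolean mask."""
    n, p = X.shape
    d = dffits_closed_form(X, y)
    cutoff = 2 * np.sqrt(p / n)
    return d, cutoff, np.abs(d) > cutoff

# Illustrative data (made up): a near-linear trend whose last point is shifted.
x_demo = np.arange(10.0)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0])
y_demo = 1.0 + 2.0 * x_demo + noise
y_demo[-1] += 10.0
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])

d, cutoff, flags = flag_influential(X_demo, y_demo)
print("cutoff:", cutoff)
print("flagged indices:", np.flatnonzero(flags))
```

Forming the full n-by-n hat matrix keeps the sketch short; for large n it would be cheaper to compute only its diagonal.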
