Dummy variable (statistics)

In statistics and econometrics, a dummy variable (also known as an indicator variable, design variable, or Boolean indicator) is a numerical variable used to represent categorical data in regression analysis and other statistical models. It takes the value 1 when an observation belongs to a particular category of a qualitative variable and 0 when it does not. The closely related technique of one-hot encoding creates one such variable for every category.

The purpose of using dummy variables is to allow qualitative information to be included and analyzed within quantitative models. Without dummy variables, it would be impossible to directly incorporate non-numerical categorical variables such as gender, region, or occupation into a regression equation.

When a categorical variable has k distinct categories and the model includes an intercept, k-1 dummy variables are created. One category is chosen as the 'reference' or 'baseline' category, and its effect is absorbed into the intercept term of the regression model. Including all k dummy variables alongside the intercept would cause perfect multicollinearity (the 'dummy variable trap'): for every observation the k dummy variables sum to 1, which duplicates the constant column already supplied by the intercept.
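
The k-1 rule and the dummy variable trap can be illustrated with a short sketch. It assumes NumPy is available; the `region` data and the helper name `dummy_columns` are made up for illustration. The design matrix with an intercept and k-1 dummies has full column rank, while adding the dummy for the reference category adds a column without adding rank.

```python
import numpy as np

def dummy_columns(values, reference):
    """Return (levels, matrix) with one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    matrix = np.array(
        [[1.0 if v == lev else 0.0 for lev in levels] for v in values]
    )
    return levels, matrix

# Hypothetical categorical variable with k = 3 categories.
region = ["north", "south", "west", "south", "north", "west"]
levels, D = dummy_columns(region, reference="north")

# Intercept column plus k-1 = 2 dummy columns.
intercept = np.ones((len(region), 1))
X_ok = np.hstack([intercept, D])

# Dummy variable trap: also include a dummy for the reference category.
north = np.array([[1.0 if v == "north" else 0.0] for v in region])
X_trap = np.hstack([X_ok, north])

rank_ok = np.linalg.matrix_rank(X_ok)      # equals the number of columns
rank_trap = np.linalg.matrix_rank(X_trap)  # lower than the number of columns

print(levels, X_ok.shape[1], rank_ok)
print(X_trap.shape[1], rank_trap)
```

Because the extra column equals the intercept column minus the other two dummies, the coefficients of the trapped model are not uniquely determined.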

Dummy variables allow researchers to estimate the effects of different categories relative to the reference category. The coefficient on each dummy variable represents the difference in the expected value of the dependent variable between that category and the reference category, holding all other variables constant.
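
As a minimal sketch of this interpretation, the example below fits an ordinary least squares model with NumPy on made-up wage data, with "clerk" as the assumed reference category. There are no other covariates here, so each dummy coefficient should equal that group's mean minus the reference group's mean, and the intercept should equal the reference group's mean.

```python
import numpy as np

# Hypothetical data: wages for three occupations.
occupation = ["clerk", "clerk", "engineer", "engineer", "teacher", "teacher"]
wage = np.array([20.0, 22.0, 35.0, 37.0, 27.0, 29.0])

# Two dummies for the k - 1 = 2 non-reference categories.
is_engineer = np.array([1.0 if o == "engineer" else 0.0 for o in occupation])
is_teacher = np.array([1.0 if o == "teacher" else 0.0 for o in occupation])

# Design matrix: intercept, engineer dummy, teacher dummy.
X = np.column_stack([np.ones(len(wage)), is_engineer, is_teacher])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
intercept, b_engineer, b_teacher = beta

# Group means for comparison with the fitted coefficients.
mean_clerk = wage[:2].mean()
mean_engineer = wage[2:4].mean()
mean_teacher = wage[4:].mean()

print(intercept, mean_clerk)
print(b_engineer, mean_engineer - mean_clerk)
print(b_teacher, mean_teacher - mean_clerk)
```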

Dummy variables can also be used to represent interactions between categorical and continuous variables. This is achieved by multiplying the dummy variable by the continuous variable, creating an interaction term. The coefficient on the interaction term represents the difference in the slope of the continuous variable for that particular category compared to the reference category.
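
The interaction term can be sketched as follows, again assuming NumPy and using noise-free synthetic data so the fitted coefficients can be checked against the generating lines. The variable names `x`, `d`, and `y` are illustrative. The reference group (d = 0) follows y = 1 + 2x and the other group (d = 1) follows y = 3 + 5x.

```python
import numpy as np

# Continuous predictor and a 0/1 dummy for group membership.
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Synthetic outcome: each group has its own intercept and slope.
y = np.where(d == 1.0, 3.0 + 5.0 * x, 1.0 + 2.0 * x)

# Model: y = b0 + b1*x + b2*d + b3*(d*x)
X = np.column_stack([np.ones_like(x), x, d, d * x])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# b1 is the slope in the reference group; b1 + b3 is the slope when d = 1.
print(b0, b1, b2, b3)
```

Here b2 captures the difference in intercepts between the groups and b3 the difference in slopes, so the slope for the d = 1 group is recovered as b1 + b3.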

The choice of which category to use as the reference category is arbitrary and does not affect the overall fit of the model. However, the interpretation of the coefficients on the dummy variables will change depending on the chosen reference category.
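
A short sketch of this point, assuming NumPy and made-up wage data: the same model is fitted twice with different reference categories. The helper name `fit` is hypothetical. The fitted values should agree, while the intercept and dummy coefficients differ because they are measured against a different baseline.

```python
import numpy as np

# Hypothetical data: wages for three occupations.
occupation = ["clerk", "clerk", "engineer", "engineer", "teacher", "teacher"]
wage = np.array([20.0, 22.0, 35.0, 37.0, 27.0, 29.0])

def fit(reference):
    """Fit OLS with an intercept and dummies for all non-reference levels."""
    levels = [lev for lev in sorted(set(occupation)) if lev != reference]
    columns = [np.ones(len(wage))]
    for lev in levels:
        columns.append(np.array([1.0 if o == lev else 0.0 for o in occupation]))
    X = np.column_stack(columns)
    beta = np.linalg.lstsq(X, wage, rcond=None)[0]
    coefficients = dict(zip(["intercept"] + levels, beta))
    return coefficients, X @ beta

coef_clerk, fitted_clerk = fit("clerk")
coef_teacher, fitted_teacher = fit("teacher")

# Same fitted values, different coefficients.
print(np.allclose(fitted_clerk, fitted_teacher))
print(coef_clerk)
print(coef_teacher)
```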

Dummy variables are a fundamental tool in statistical modeling. By allowing categorical data to be included and analyzed, they extend the applicability and explanatory power of regression models and other statistical techniques.