dummy variables


In quantitative data analysis, researchers are sometimes interested in the implications of a non-interval-level variable for a dependent variable, as for example in the case of the relationship between sex and income. Although regression analysis normally requires data measured at the interval level, it is possible to include non-interval variables in a multiple regression by creating appropriate so-called dummy variables. In the example just cited, this would involve coding men as 1 and women as 0; or, where the independent variable comprises more than two (let us say n) categories, creating a set of n − 1 dummy variables. Thus, if the independent variable ‘social class’ contains the four categories ‘upper’, ‘middle’, ‘working’, and ‘none’, then in order to include it in a multiple regression analysis three dummy variables would have to be created: v1 ‘upper’ (coded 1) or ‘not upper’ (coded 0); v2 ‘middle’ (coded 1) or ‘not middle’ (coded 0); v3 ‘working’ (coded 1) or ‘not working’ (coded 0). The fourth category needs no dummy variable of its own, since it is identified by the combination 000 on the other three. Each category in the social class variable now has a unique combination of zeros and ones by means of which it can be identified. In a regression analysis involving dummy variables the resulting regression coefficients are then treated as if they were based on variables measured at the interval level, with each coefficient read as the difference between its category and the omitted (all-zero) category, other variables held constant. See also MEASUREMENT.
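The coding scheme described in this entry can be sketched in a few lines of Python. This is only an illustration: the names `CATEGORIES`, `REFERENCE`, and `dummy_code` are hypothetical and do not come from the entry, and the choice of ‘none’ as the all-zero category simply follows the social class example.

```python
# Illustrative sketch only; all names here are assumptions, not part of the entry.
CATEGORIES = ["upper", "middle", "working", "none"]
REFERENCE = "none"  # the category represented by the combination 000


def dummy_code(value, categories=CATEGORIES, reference=REFERENCE):
    """Return the n - 1 dummy values (each 0 or 1) for one observation."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # One dummy per non-reference category, in the order listed above
    # (v1 = upper, v2 = middle, v3 = working).
    return [1 if value == c else 0 for c in categories if c != reference]


for social_class in CATEGORIES:
    print(social_class, dummy_code(social_class))
# upper [1, 0, 0]
# middle [0, 1, 0]
# working [0, 0, 1]
# none [0, 0, 0]
```

Only three columns are produced for the four categories. A fourth dummy would add no information, because it is fully determined by the other three, and including it alongside a regression constant would make the coefficients impossible to estimate separately.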