Recall that the regression equation for predicting an outcome variable (y) on the basis of a predictor variable (x) can be written simply as y = b0 + b1*x, where b0 and b1 are the regression beta coefficients, representing the intercept and the slope, respectively.

Suppose that we wish to investigate differences in salaries between males and females. Based on the gender variable, we can create a new dummy variable that takes the value:

- 1 if a person is male,
- 0 if a person is female,

and use this variable as a predictor in the regression equation, leading to the following model: salary = b0 + b1 * dummy. The coefficients can be interpreted as follows:

- b0 is the average salary among females,
- b0 + b1 is the average salary among males,
- b1 is the average difference in salary between males and females.

For simple demonstration purposes, the following example models the salary difference between males and females by computing a simple linear regression model on the Salaries data set. R creates the dummy variables automatically when the model is fitted, e.g. `model <- lm(salary ~ sex, data = Salaries)`. The intercept row of the coefficient table reads `(Intercept) 101002 4809 21.00 2.68e-66`, and the sexMale coefficient is estimated at 14088.

From the output above, the average salary for females is estimated to be 101002, whereas males are estimated a total of 101002 + 14088 = 115090. The p-value for the dummy variable sexMale is very significant, suggesting that there is statistical evidence of a difference in average salary between the genders.

The contrasts() function returns the coding that R has used to create the dummy variables: `contrasts(Salaries$sex)`. R has created a sexMale dummy variable that takes on the value 1 if the sex is Male, and 0 otherwise. The decision to code males as 1 and females as 0 (the baseline) is arbitrary, and has no effect on the regression computation, but it does alter the interpretation of the coefficients.

You can use the function relevel() to set the baseline category to males, for example `relevel(Salaries$sex, ref = "Male")`. The coefficient in the refitted regression is then labelled sexFemale, and the fact that it is negative indicates that being a Female is associated with a decrease in salary (relative to Males). Now the estimates for b0 and b1 are 115090 and -14088, respectively, leading once again to a prediction of an average salary of 115090 for males and a prediction of 115090 - 14088 = 101002 for females.

Alternatively, instead of a 0/1 coding scheme, we could create a dummy variable coded -1 (male) / 1 (female). If the categorical variable is coded as -1 and 1, then a positive regression coefficient is subtracted from the group coded as -1 and added to the group coded as 1.
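A runnable sketch of the 0/1-coding steps above, i.e. fitting the model, inspecting the dummy coding, and releveling the baseline. The original text does not name the package that provides the Salaries data set; the carData package is assumed here.

```r
# Load the Salaries data set (assumption: it ships with the carData package)
data("Salaries", package = "carData")

# Fit the simple regression; R turns the factor `sex` into a 0/1 dummy
# (sexMale), with Female as the default baseline level
model <- lm(salary ~ sex, data = Salaries)
summary(model)$coefficients

# Show the dummy coding R used for the sex factor
contrasts(Salaries$sex)

# Make Male the baseline category and refit; the coefficient is now
# sexFemale and should equal the previous sexMale estimate with the sign flipped
Salaries$sex <- relevel(Salaries$sex, ref = "Male")
model2 <- lm(salary ~ sex, data = Salaries)
summary(model2)$coefficients
```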
The following question concerns two-stage least squares (2SLS) estimation when one of the instruments is unobserved. Consider the reduced form for Y2:

(2r) Y2 = f0 + f1 * X1 + f2 * X2 + f3 * X3 + f4 * X4 + u2

I want to estimate the structural equation (1) using a 2SLS approach, whereby X3 and X4 are the instruments for Y2. As discussed in the above-mentioned post, omitting X4 as an instrument in the first-stage regression (i.e., the reduced form for Y2) does not result in biased estimates in the second stage of the 2SLS estimation of (1); it only implies a loss in efficiency. However, in my case, the variable X4 is unobservable but correlated with one of the exogenous variables of the structural equation for Y1, e.g., X1. This implies that something like an "omitted variables problem" arises in the first-stage regression, whereby the coefficient for X1, i.e., f1, is biased because it also captures (at least partly) the effect of X4 on Y2 (i.e., f4).

My question now is: are the second-stage coefficients in the 2SLS regression of (1) still unbiased in this setting, even if f1 in the first-stage regression is inflated by the unobserved effect of X4? My guess would be that, for the predicted values of Y2, it does not matter whether the fraction of the variance of Y2 explained by X4 enters the fitted values "directly", i.e., by observing X4 and estimating f4, or "indirectly", i.e., by means of an inflated coefficient f1.
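The structural equation (1) is not reproduced here, so the small R simulation below is only a sketch of the setting under an assumed specification, Y1 = g0 + g1*Y2 + g2*X1 + g3*X2 + u1, with X3 and X4 as excluded instruments and X4 unobserved. All coefficient values and variable constructions are illustrative choices, not taken from the original question.

```r
set.seed(1)
n  <- 5000

X1 <- rnorm(n)
X4 <- 0.7 * X1 + rnorm(n)   # unobserved instrument, correlated with X1
X2 <- rnorm(n)
X3 <- rnorm(n)
u  <- rnorm(n)              # common shock that makes Y2 endogenous in (1)

# Reduced form (2r): Y2 depends on X1..X4 plus noise
Y2 <- 1 + 0.5*X1 + 0.5*X2 + 0.8*X3 + 0.8*X4 + u + rnorm(n)

# Assumed structural equation (1); the true effect of Y2 on Y1 is 1.0 here
Y1 <- 2 + 1.0*Y2 + 0.3*X1 + 0.3*X2 + u + rnorm(n)

# First stage without the unobserved X4: part of X4's effect is picked up
# by X1's coefficient because X1 and X4 are correlated
first  <- lm(Y2 ~ X1 + X2 + X3)
coef(first)

# Second stage: replace Y2 with its first-stage fitted values
# (manual 2SLS point estimates; the lm() standard errors are not valid 2SLS SEs)
Y2_hat <- fitted(first)
second <- lm(Y1 ~ Y2_hat + X1 + X2)
coef(second)
```

Comparing coef(second) with the assumed structural values gives a quick empirical check of the intuition in the question, though a simulation under one particular data-generating process is of course not a proof.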