One question that can sometimes be addressed with multiple OLS regression is whether one variable (or more) serves as a mediator for another. That is, does the effect of X1 on Y operate, at least in part, through X1's effect on X2? Put differently, is X2 a mechanism through which X1 affects Y? An example helps. Suppose you find that educational attainment has a positive effect on health. Theory and related empirical research suggest that at least part of that relationship is mediated by income: more education generally leads to higher income, which in turn helps people attain or preserve better health. In this scenario, income is a causal factor in its own right but also a mediating mechanism through which education affects health.

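To determine whether, and to what degree, this is happening, estimate the relationship between X1 and Y in an initial model, then add the hypothesized mediating variable (X2) in a second model. If X2 mediates the relationship, it will take over some or all of X1's original effect: X1's coefficient shrinks toward zero (partial mediation) or falls to roughly zero (full mediation). Any mediating effect should also make sense theoretically with respect to your variables. Because regress drops observations with missing values on any listed variable, restrict the first model to cases with nonmissing X2 so that both models use the same sample and their coefficients are comparable.

. regress y x1 if !missing(x2)

. regress y x1 x2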
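For linear models fit by OLS on the same observations, the change in X1's coefficient between the two models is exactly the indirect effect through X2. The sketch below shows why. The labels c, c', a, and b are introduced here for illustration, and the auxiliary regression of X2 on X1 is an extra step that is not one of the two commands above.

```latex
\begin{aligned}
Y   &= i_1 + c\,X_1 + e_1               && \text{(model 1: regress y x1)}\\
X_2 &= i_2 + a\,X_1 + e_2               && \text{(auxiliary: regress x2 x1)}\\
Y   &= i_3 + c'\,X_1 + b\,X_2 + e_3     && \text{(model 2: regress y x1 x2)}\\[4pt]
c   &= c' + a\,b \quad\Longrightarrow\quad c - c' = a\,b
\end{aligned}
```

In the sample, the OLS residuals e_3 are uncorrelated with X1. Regressing both sides of the third equation on X1 therefore gives c = c' + ab. X1's coefficient shrinks only to the extent that X1 predicts X2 (a) and X2 predicts Y (b), so if either is zero, adding X2 leaves X1's coefficient unchanged. This identity does not carry over exactly to nonlinear models such as logit.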

Beyond comparing X1's coefficient across models, two Stata commands can help quantify how much mediation is occurring: Stata's built-in sem command and the user-written binary_mediation command.

sem

Stata's sem (structural equation modeling) command can be used to calculate the portion of X1's total effect on Y that is mediated by X2. In the generic command below, mv is your mediating variable, iv the more distal independent variable, and dv the ultimate dependent variable. The estat teffects command that follows decomposes the results into direct, indirect, and total effects. From that output, calculate the proportion of X1's total effect on Y that is mediated by dividing X1's indirect effect (coefficient) on Y by X1's total effect on Y. UCLA's page on this topic is helpful.

. sem (mv <- iv) (dv <- mv iv)

. estat teffects
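As a sketch of where these numbers come from, let a be the coefficient on iv in the mv equation, b the coefficient on mv in the dv equation, and c' the coefficient on iv in the dv equation. This notation is ours, not Stata's. For this model, estat teffects reports c' as the direct effect of iv on dv, a·b as the indirect effect, and their sum as the total effect. The proportion mediated is then:

```latex
\text{proportion mediated}
  = \frac{\text{indirect effect}}{\text{total effect}}
  = \frac{a\,b}{c' + a\,b}
```

If the direct and indirect effects have opposite signs, this ratio can fall outside the 0 to 1 range and is hard to read as a "proportion."

You may also want a standard error for the ratio. One option not covered on this page is nlcom after sem, for example nlcom (_b[mv:iv]*_b[dv:mv]) / (_b[mv:iv]*_b[dv:mv] + _b[dv:iv]). Confirm the coefficient names with sem, coeflegend before relying on them.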

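If your iv is categorical, you will need to generate dummy variables first and include all of them except one, which serves as the reference group. With a three-category variable, for example, the tab command below creates catvar_1, catvar_2, and catvar_3, and catvar_1 is left out as the reference.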

. tab catvar, gen(catvar_)

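. sem (mv <- catvar_2 catvar_3) (dv <- mv catvar_2 catvar_3)

For example, with ppov as the mediator, pfairpoor as the outcome, and dummies for a three-category minestat variable:

. sem (ppov <- minestat_2 minestat_3) (pfairpoor <- ppov minestat_2 minestat_3)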

. estat teffects

In the output, locate the indirect effect of the iv of interest (e.g., minestat_2) on the dv and divide it by that variable's total effect. The result is the proportion of that variable's effect that is mediated by your mediating variable.
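The division described above applies to each dummy separately. Let a_k be the coefficient on dummy k in the mediator equation, c'_k its coefficient in the outcome equation, and b the mediator's coefficient in the outcome equation. This notation is ours. The proportion mediated for dummy k is:

```latex
\text{proportion mediated}_k = \frac{a_k\, b}{c'_k + a_k\, b}, \qquad k = 2, 3
```

In the minestat example, a_2 is the coefficient on minestat_2 in the ppov equation, b is the coefficient on ppov in the pfairpoor equation, and c'_2 is the coefficient on minestat_2 in the pfairpoor equation. Because b is shared, the dummies differ only through their own a_k and c'_k. Each effect is measured relative to the omitted reference category (minestat_1).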

binary_mediation

The user-written binary_mediation command offers another route and calculates the proportion for you. When your iv is categorical, run the command once per category: place one dummy in the iv() option and the remaining dummy or dummies in the cv() (control variables) option. A now-defunct webpage (PDF copy) walks you through the command.

. binary_mediation, dv(depvar) mv(medvar) iv(indvar) cv(controlvars)

For the minestat example, run the command once for each dummy:

. binary_mediation, dv(pfairpoor) mv(ppov) iv(minestat_2) cv(minestat_3)

. binary_mediation, dv(pfairpoor) mv(ppov) iv(minestat_3) cv(minestat_2)

The binary_mediation output reports the proportion of the total effect that is mediated, so no hand calculation is needed.