Page: Multiple Logistic Regression -- Weighted Sample and Diagnostics

This page discusses various issues that one confronts when conducting multiple logistic regression, as well as how to diagnose those issues and what remedies you might pursue. Multiple logistic regression, though powerful, makes certain assumptions about your data and can be vulnerable to a variety of issues sometimes present in data. That said, there are almost always issues with quantitative data analysis, and resolutions are not always possible. Still, our analyses can help to reveal patterned relationships. Below you will see a discussion of the following issues (including ways to diagnose and "resolve" them):

  • Weighted Sample 
  • Multicollinearity 
  • Outliers 
  • Heteroskedasticity 
  • Model Specification 
  • Goodness of Fit

 

Weighted sample. With survey data, it’s not uncommon for some groups to have been purposely oversampled so as to garner a sufficient number of respondents in the sample. We don’t want estimates, however, to be based on their overrepresentation in the sample. With “weights,” we are able to readjust estimates so that they reflect population characteristics. Be sure to check the study information associated with a data set to see if weights are recommended. In many cases, one can simply declare the sampling weight variable (you’ll see a “weights” tab in the dialogue box). In other cases, you may want to declare the survey design using the Stata menu system (particularly if there is a complex survey design). For now, a simple sampling weight adjustment will look like this:

  • . logistic dv iv1 iv2 iv3 [pweight = weightvar]

If called for, you'll want to run your models using the appropriate weight(s) before turning to diagnostics.
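If the data set instead has a complex survey design (strata, clusters), a minimal sketch using svyset and the svy: prefix might look like the following. Here psuvar, stratavar, and weightvar are placeholder names; check the study documentation for the actual design variables.

  • . svyset psuvar [pweight = weightvar], strata(stratavar)
  • . svy: logistic dv iv1 iv2 iv3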

 

Multiple Logistic Regression – Diagnosing Issues and Possible Remedies:

 

Multicollinearity – this issue presents itself when two or more of your independent variables are so highly correlated with one another that Stata is unable to parse out their independent effects. In effect, the highly correlated IVs "wash each other out." This is a real problem when it comes to getting results that make sense. The issue isn't that multicollinearity produces bad regression coefficients; it's that it inflates those coefficients' standard errors, causing issues with statistical significance (as SEs get bigger, p-values get bigger as well). The model is still good; you'll just have difficulty identifying "significant" effects even when they're there. I find the vif command the most convenient diagnostic, even though it doesn't run following a logistic regression.

  • Diagnosis
    • Since vif doesn't run following a logistic regression, you'll need to re-run your model as an OLS regression first (regress). Don't worry -- it will run; just ignore the results. Following that, you can run the vif command. Since vif only assesses the degree of collinearity among the IVs (not their relationship to the DV), using OLS here is not a problem. A worked sequence appears after this list.
    • . vif   (run this variance inflation factor test immediately after the OLS version of your logistic regression model). There are no cut-and-dried rules for what counts as a high VIF, but generally anything near or above 5 is of potential concern, and anything near 10 almost certainly indicates a collinearity problem. Sample size comes into play as well: the smaller the sample, the more we may be concerned about a VIF as low as 4 or even less.
  • What to do about multicollinearity if you detect it -- you have options:
    • Drop one of the suspect IVs; or run them in alternating models. See if separately they are significant, but not significant when together. Choose one or the other to represent the underlying factor. Make sure this is conceptually ok. 
    • Create a composite variable if appropriate (only if they speak to the same construct). See the page on constructing composite variables.
    • Leave them in and simply note the issue. This issue doesn't make the model worse in terms of its ability to explain variation in the DV, and the regression coefficients are unaffected; it only complicates your identification of statistically significant causal variables.
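Putting the diagnosis and remedies together, here is a minimal sketch of the full sequence. The names dv, iv1, iv2, iv3, and scale1 are placeholders, and the composite step is only one option among those above.

  • . logistic dv iv1 iv2 iv3           // the model of interest
  • . regress dv iv1 iv2 iv3            // same model as OLS, solely so vif will run; ignore the output
  • . vif                               // VIFs near/above 5 warrant a closer look
  • . corr iv1 iv2                      // pairwise check of two suspect IVs
  • . egen scale1 = rowmean(iv1 iv2)    // one possible composite, only if conceptually justified
  • . logistic dv scale1 iv3            // re-run with the composite in place of the collinear pair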

 

Outliers (influential cases) – Although there are ways to identify outliers for logistic regression within Stata (Pearson residuals, deviance residuals, and Pregibon leverage), these are often not examined.
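If you do want to examine them, a minimal sketch using Stata's post-estimation predict options (run immediately after your logistic model; r, d, and h are placeholder variable names) might look like this:

  • . logistic dv iv1 iv2 iv3
  • . predict r, residuals              // Pearson residuals
  • . predict d, deviance               // deviance residuals
  • . predict h, hat                    // Pregibon leverage
  • . list dv iv1 iv2 iv3 if abs(r) > 2 & !missing(r)   // flag cases with large Pearson residuals
  • . scatter h r                       // plot leverage against residuals to spot influential cases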

 

Heteroskedasticity -- For logistic regression, the problem of detecting and dealing with heteroskedasticity has been deemed virtually "unsolvable" and thus not worth worrying about.

 

Model Specification/Goodness of Fit – As with OLS regression, we often want to know how well our model fits the data in logistic regression. Unlike OLS regression, however, which provides a convenient and intuitive statistic on the fit of the model (R-squared), logistic regression provides no such clear-cut indication of fit. The results do show the "log likelihood chi-square" (labeled "LR chi2" in Stata output) and the "pseudo R-square" for each model. The pseudo R-square, however, is not equivalent to the OLS R-square, and there is widespread disagreement on what it actually means. In fact, there are many alternative pseudo R-square statistics, and they often differ considerably despite being based on the same model. As a result, many people advocate, as I do, for ignoring it.

The log likelihood chi-square, on the other hand, is a bit more useful, though perhaps not all that intuitive. It offers an omnibus test of whether the model as a whole is statistically significant: it is twice the difference between the log likelihood of the current model and the log likelihood of the intercept-only model. Although the log likelihood statistic is reported as a matter of convention, by itself it doesn't tell us much (other than whether the model is better than nothing).
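To make the LR chi-square concrete, here is a minimal sketch that reproduces it by hand using estimates store and lrtest (variable names are placeholders; logit and logistic fit the same underlying model):

  • . quietly logit dv                  // intercept-only model
  • . estimates store null
  • . quietly logit dv iv1 iv2 iv3      // full model
  • . lrtest null .                     // LR chi2 = 2*(ll_full - ll_null); matches the logistic header

There are other goodness of fit tests, though, that are commonly run for logistic regressions. Each of these is run following the logistic regression model.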

  • The linktest is used to detect specification error. The idea is that if a model is properly specified, one should not be able to find any additional statistically significant predictors except by chance. Here, one looks at the _hatsq result. If its p-value is less than .05, this suggests our model is incorrectly specified. If it is greater than .05, the model appears to be well specified.
    • . linktest    
  • Hosmer and Lemeshow Goodness of Fit Test -- The idea of this test is that the predicted and observed frequencies should match closely, and that the more closely they match, the better the fit. Since the test relies on the creation of a contingency table (behind the scenes), we often combine the patterns formed by the IVs into 10 groups and form a 2-by-10 contingency table. Note: if you have a smaller sample, you might opt to use fewer groupings (e.g., 4). Note: with this test, a higher p-value indicates a better fit (e.g., a p-value of .33 indicates a model that fits the data well), while a p-value < .05 indicates that the model fit is poor.
    • . estat gof, group(10) table
  • The ROC curve -- is an evaluative tool relating to how well the model fits the data. It returns a graph and a "c-statistic" (below the graph), which is a summary measure of how well your model discriminates between cases (1) and non-cases (0). A model with good discrimination ability (good fit) will show a curve that hugs the top left-hand corner of the plot, whereas a model with no discrimination ability has a ROC curve close to the 45-degree line. Thus, the area under the curve ranges from 0.5 (no discrimination ability) to 1 (perfect discrimination). A c-statistic (concordance) of 0.5 would show that your model does no better than random chance at predicting when a case is 1 versus 0; the further the c-statistic rises above 0.5, the better your model does relative to chance.
    • . lroc 
  • What to do about a poorly specified model -- you have options.
    • Determine whether one or more IVs have been left out of the model. Don't go fishing. Variable inclusion should always be driven by theory and an understanding of the literature.
    • Assess whether any of your IVs don't belong in the model. You can often "trim" your model of variables that appear to have little or no influence (a sketch follows this list). Be careful, though, as sometimes "no effect" is an important finding to show/report.
    • Don't sweat it. Acknowledge that future research is needed to better understand the phenomenon (almost always the case). In fact, many people who work with logistic regression do not stress over the fit of the model, focusing instead on the individual variable results.
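As one aid to the "trimming" option above, a joint Wald test can assess whether a set of seemingly weak IVs contributes anything collectively. In this sketch, iv2 and iv3 are placeholders for the candidates being considered for removal:

  • . logistic dv iv1 iv2 iv3
  • . test iv2 iv3                      // joint test; a large p-value is consistent with trimming both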