The Four Domains
There is a lot to learn in this course, and it can be challenging to process it all. With that in mind, it may help to envision four interrelated domains that we will work in throughout the course. Keep these domains in mind as we proceed. Some days you will labor within a single domain; other days you may work on tasks from several domains. Whatever the situation, it is always valuable to keep perspective and see the forest for the trees.
1. The sociological enterprise and quantitative research. This is the big-picture domain. It pairs a clear-eyed sense of how sociologists think with the work of translating theories, concepts, and research questions into a quantitative research study. Put simply, your research question will be guided by sociological thinking, by the literature (one or more theoretical frameworks and prior empirical research), and by a clear sense of the concepts at play in your causal narrative and how you might measure them. This domain is what distinguishes sociological research from mere data science. Many people (and artificial intelligence tools) can "run numbers"; not everyone knows how to appropriately interrogate or develop their measures, use the literature to inform their data analyses, and think critically about the results in order to build and refine theory.
2. Data identification, collection, and/or compilation. Much of the "labor" in quantitative research revolves around acquiring data. In some cases this involves survey design and data collection (including sampling strategies). In other cases it involves using secondary data sets (e.g., from repositories like ICPSR or from government agencies). In still other instances you may construct a data set in another program, such as GIS software, by compiling data from a variety of sources. As you are probably aware, you must be careful and deliberate in designing a survey or selecting a data set so that you end up with measures of your concepts that are as valid and reliable as possible (minimizing measurement error). Additionally, because you are often working with sample data (directly or indirectly), remember that sampling error is inherent to your data and that it affects how you interpret and communicate your findings.
3. Data preparation. Another labor-intensive part of the quantitative research process is cleaning and preparing the data (also known as data wrangling). This may involve cloning variables, naming or renaming variables, recoding categorical variables, collapsing categories, adding labels, generating new variables, and so on. You want your data set to be as clear and clean as possible, containing the best possible measures of your concepts (again, minimizing measurement error within the limits of your data set). Fortunately, Stata makes data preparation fairly easy. It is also important to document every data preparation or manipulation step in your do-file (your script of commands) for transparency and reproducibility. You'll come to realize that while clean, prepared data sets tend to be tidy in the same way, dirty data sets can be dirty in a thousand different ways.
4. Statistical analysis, graphing, and presentation of results. This involves using Stata (or some other program) to analyze prepared data. Stata makes many different analyses possible, and your choice of analysis and the design of your models will depend on your decisions in the domains above (e.g., types of measures, the nature of your research question). Choose your estimation technique carefully and make modeling decisions with an eye toward reducing estimation error; a bad model will produce bad results. We will develop proficiency in the most widely used bivariate and multiple regression techniques. Your analyses must also be presented in ways that are appropriate and accessible to your audience(s). This may involve precisely crafted narrative, statistics, graphs, or some combination of the three. You'll learn to communicate your results both in conventional ways (e.g., statistical significance) and in ways that do not rely on the conventions of statistical significance. Finally, you must relate these results back to the literature that informed your study.