So you can’t randomize people into THAT condition? Now what?
Let’s say you’re investigating the impact of smoking on social outcomes like depression, poverty, or quality of life. Your IRB, with good reason, won’t allow random assignment of smoking status to your participants.
But how can you begin to overcome the self-selected nature of smoking among your study participants? What if self-selection is driving differences in outcomes? Well, one way is to use propensity score matching and analysis as a framework for your investigation.
The propensity score is the probability of group assignment conditional on observed baseline characteristics. In this way, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects.
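To make the core idea concrete, here is a minimal sketch in Python (not the webinar's software, and using simulated, hypothetical data): estimate each subject's propensity score with logistic regression, then match each treated subject to the nearest control on that score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Simulated baseline covariates and a self-selected "treatment"
# whose probability depends on those covariates (hypothetical data).
n = 1000
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))

# Step 1: the propensity score is the estimated probability of
# treatment conditional on the observed covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: 1-to-1 nearest-neighbor matching on the score
# (with replacement, for simplicity).
treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]
matches = control_idx[
    np.abs(ps[treated_idx][:, None] - ps[control_idx][None, :]).argmin(axis=1)
]

# Conditional on the score, covariate distributions should be similar:
print("treated covariate means:", X[treated_idx].mean(axis=0))
print("matched control means:  ", X[matches].mean(axis=0))
```

Dedicated tools (for example, the MatchIt package in R) add calipers, matching without replacement, and balance diagnostics on top of this basic logic.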
In this webinar, we’ll describe broadly what this method is and discuss different matching methods that can be used to create balanced samples of “treated” and “non-treated” participants. Finally, we’ll discuss some specific software resources for performing these analyses.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
(more…)
This webinar will present the steps to apply a type of latent class analysis for longitudinal data commonly known as the growth mixture model (GMM). This family of models is a natural extension of the latent variable model. GMM combines longitudinal data analysis and latent class analysis to estimate the probability that each case belongs to one of several latent trajectories with different model parameters. A brief (not exhaustive) list of steps to prepare, analyze, and interpret a GMM will be presented. A published case will be described to exemplify an application of GMM and its complexity.
Finally, an alternative approach to GMM will be presented in which the longitudinal model is a linear mixed effects model (also known as a hierarchical linear model or multilevel model). The idea is the same as in GMM with growth curve modeling, namely that latent class membership specifies distinct unobserved trajectories. These models are equivalent to GMM and are sometimes referred to as heterogeneous linear mixed effects models, underlining the idea that the sample may not come from one single homogeneous population, but potentially from a mixture of distributions.
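To give a flavor of the idea (and only as an approximation, not a true GMM, which estimates everything simultaneously), here is a two-stage sketch in Python with simulated data: fit a growth curve to each subject, then treat the estimated intercepts and slopes as data for a finite mixture whose components play the role of latent trajectory classes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)

# Simulate two latent trajectory classes: flat vs. increasing.
n_subjects, n_waves = 200, 5
t = np.arange(n_waves)
true_class = rng.integers(0, 2, n_subjects)
slope = np.where(true_class == 0, 0.0, 1.2)
y = 2.0 + slope[:, None] * t + rng.normal(scale=0.5, size=(n_subjects, n_waves))

# Stage 1: per-subject growth curves (OLS intercept and slope).
T = np.column_stack([np.ones(n_waves), t])
coefs = np.linalg.lstsq(T, y.T, rcond=None)[0].T  # columns: intercept, slope

# Stage 2: finite mixture over the growth parameters; each mixture
# component stands in for one latent trajectory class.
mix = GaussianMixture(n_components=2, random_state=0).fit(coefs)
print("class means (intercept, slope):\n", mix.means_)
print("posterior class probabilities, first 3 subjects:\n",
      mix.predict_proba(coefs)[:3].round(2))
```

Software built for GMM proper (Mplus, or the lcmm package in R) estimates the trajectories and the class memberships jointly rather than in two stages.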
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
(more…)
Correspondence analysis is a powerful exploratory multivariate technique for categorical variables, particularly useful when those variables have many levels. It is a data analysis tool that characterizes associations between the levels of two or more categorical variables using graphical representations of the information in a contingency table.
This presentation will give a brief introduction and overview of the use of correspondence analysis, including a review of chi-square analysis, and examples interpreting both simple and multiple correspondence plots.
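For readers who like to see the machinery, here is a minimal sketch in Python of simple correspondence analysis on a small hypothetical contingency table; the plotting coordinates come straight from the singular value decomposition of the standardized residuals.

```python
import numpy as np

# Hypothetical contingency table (rows and columns are two
# categorical variables, e.g., hair color by eye color).
N = np.array([[32., 11., 10.],
              [53., 50., 25.],
              [10., 30., 15.]])

P = N / N.sum()                      # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses

# Standardized residuals; the chi-square statistic is n * sum(S**2),
# which ties correspondence analysis back to the chi-square test.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
print("chi-square:", N.sum() * (S ** 2).sum())

# The SVD of the residuals yields the principal coordinates
# that are plotted in a correspondence map.
U, d, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * d) / np.sqrt(r)[:, None]
col_coords = (Vt.T * d) / np.sqrt(c)[:, None]
print("row coordinates (first 2 dims):\n", row_coords[:, :2])
print("column coordinates (first 2 dims):\n", col_coords[:, :2])
```

In practice, packages such as the ca package in R or FactoMineR wrap this computation and also handle the multiple correspondence case.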
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
(more…)
P-values are the fundamental tools used in most inferential data analyses (more…)
Author: Trent Buskirk, PhD.
In my last article, we got a bit comfortable with the notion of errors in surveys. We discussed sampling errors, which occur because we take a random sample rather than a complete census.
If you ever had to admit error, sampling error is the type to admit. Polls admit this sort of error frequently by reporting the margin of error. The margin of error is the sampling error (the standard error of the estimate) multiplied by a distributional value, and it can be used to create a confidence interval.
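As a quick worked illustration with made-up numbers, here is how a margin of error and confidence interval come together for an estimated proportion:

```python
import math

p_hat = 0.52  # estimated proportion from the sample (hypothetical)
n = 1000      # sample size
z = 1.96      # distributional value for 95% confidence

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard (sampling) error
moe = z * se                             # margin of error, about 0.031

print(f"margin of error: +/- {moe:.3f}")
print(f"95% CI: ({p_hat - moe:.3f}, {p_hat + moe:.3f})")
```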
But there are some other types of error that can occur in the survey context that, while influential, are far less visible. They are generally referred to as non-sampling error.
These errors are associated not with sample-to-sample variability but with sources like selection biases, frame coverage issues, and measurement errors. These are not the kind of errors you want in your survey.
In theory, it is possible to have an estimator that has little sampling error associated with it. That looks good on the surface, but this estimator may yield poor information due to non-sampling errors.
For example, a high rate of non-response may mean that the people who opt out differ systematically from those who respond, biasing estimates.
Likewise, a scale or set of items on the survey could have known measurement error. They may be imprecise in their measurement of the construct of interest or they may measure that construct better for some populations than others. Again, this can bias estimates.
Frame coverage error occurs when the sampling frame does not quite match the target population. This leads to the sample including individuals who aren’t in the target population, missing individuals who are, or both.
A perspective called the Total Survey Error Framework allows researchers to evaluate estimates with respect to both sampling and non-sampling errors. It can be very useful in choosing a sampling design that minimizes errors as a whole.
So when you think about errors and how they might come about in surveys, don’t forget about the non-sampling variety – those that could come as a result of non-response, measurement, or coverage.
Author: Trent Buskirk, PhD.
As in history, literature, criminology, and many other areas, context is important in statistics. Knowing where your data comes from gives clues about what you can do with that data and what inferences you can make from it.
In survey samples, context is critical because it tells you how the sample was selected and from what population it was selected. (more…)