Shaping CME with Predictive Modeling: Part 2
Welcome back! In the first installment of the outcomes blog, I provided a bit of background on predictive modeling and its importance in CME. In this post, I’ll review the types of data that can be used in predictive modeling and focus on the predictive models that are most relevant to CME.
Types of Data
Before I get into the specific types of models, let’s first differentiate between types of data, as this is one of the considerations that will determine which type of model you use.
- Categorical: Data consist of categories with no particular order (eg, neurologist, oncologist, psychiatrist).
- Ordinal: Data typically reflect some sort of scale, like a Likert scale (rate on a scale of 1-5, or excellent/very good/good/fair/poor).
- Interval: Data are “continuous,” ie, can take on any value in an interval, and the difference between two values is meaningful (eg, difference in age between 30 and 40 years is the same as the difference between 50 and 60 years, at least on paper!).
- Ratio: Data are also continuous (not to be confused with a ratio, eg, 1:3) and have the properties of an interval variable, but with a true zero: a value of 0 means none of the quantity is present. For example, if years of education were 0, that would mean there was no education. By contrast, a temperature of 0 degrees Celsius or Fahrenheit does not mean an absence of temperature, so temperature on those scales is not considered a ratio variable.
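To make the interval-versus-ratio distinction concrete, here is a small Python sketch. The numbers are invented for illustration, and the Celsius-to-Kelvin comparison is an added assumption (it is not part of the list above); it is only there to show that "twice as much" holds up when 0 truly means "none."

```python
# Interval vs ratio: a minimal illustration with made-up values.

def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return celsius + 273.15

# Interval property: equal differences mean the same thing anywhere on the scale.
age_gap_younger = 40 - 30
age_gap_older = 60 - 50
print(age_gap_younger == age_gap_older)  # True

# Ratio property: 16 years of education really is twice 8 years,
# because 0 years means no education at all.
education_ratio = 16 / 8
print(education_ratio)  # 2.0

# Celsius has an arbitrary zero, so the same two temperatures give a
# different "ratio" as soon as the scale changes.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.035
```

The difference between the two temperatures is 10 degrees on either scale, which is why temperature in Celsius still qualifies as interval data.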
Back to the models. Regression is a category of predictive models that is probably the most widely used across disciplines. It is a method for predicting values of a “response” or “criterion” variable from the values of one or more “predictor” variables. The two most common forms of regression are linear and logistic.
- Linear regression is used when the response variable is continuous and, in the simplest case, the predictor is as well (eg, predicting blood pressure from age). Ratio and interval data are appropriate for linear regression, and some would argue that ordinal data such as Likert-scale ratings can, under certain conditions, also be treated as continuous and therefore used in linear regression. An example in CME might be predicting self-rated confidence (on a scale from 1-10) from years in practice.
- Logistic regression may be more appropriate for much of what we do in CME outcomes. Logistic regression is used when the response variable is categorical, eg, correct/incorrect, good/fair/poor, yes/no. The predictor variable(s) can be categorical, ordinal, interval, or ratio. In CME, we are often interested in how well our participants learned the material, so we ask questions related to knowledge, which are often scored as correct/incorrect. For example, you may be interested in whether age, specialty, activity format, or any number of demographic, evaluation, or other variables predict performance on a particular knowledge question. As a reminder, ordinal data are not always considered “continuous,” so when the response is ordinal, logistic regression may be more appropriate than linear regression.
- Poisson regression is used when the response variable is a count, typically of an event that is rare (eg, the number of incomplete activities).
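For readers who like to see the mechanics, here is a rough Python sketch of the first two model types, using only the standard library. Everything in it is an assumption for illustration: the data are invented, the function names (`fit_linear`, `fit_logistic`) are hypothetical, and a real analysis would normally rely on a statistics package rather than hand-rolled fitting. Poisson regression is left out to keep the sketch short.

```python
import math

def fit_linear(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of P(y = 1) = sigmoid(b0 + b1 * x), one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Linear: self-rated confidence (1-10) from years in practice (made-up data).
years_conf = [1, 3, 5, 8, 12, 15, 20, 25]
confidence = [3, 4, 4, 5, 6, 7, 8, 9]
intercept, slope = fit_linear(years_conf, confidence)
print(f"confidence = {intercept:.2f} + {slope:.2f} * years")

# Logistic: knowledge question scored correct (1) or incorrect (0), made-up data.
years_quiz = [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 18, 20, 22, 25, 30]
correct = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(years_quiz, correct)
prob_novice = sigmoid(b0 + b1 * 2)
prob_veteran = sigmoid(b0 + b1 * 25)
print(f"P(correct) at 2 years: {prob_novice:.2f}, at 25 years: {prob_veteran:.2f}")
```

The key difference is in what comes out: the linear model returns a predicted score on the confidence scale, while the logistic model returns a probability of answering correctly.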
In general terms, ANOVA (analysis of variance) and linear regression are two sides of the same coin. So, if you have a model with a continuous response variable and categorical predictors, you are essentially running an ANOVA.
CHAID (chi-square automatic interaction detection) is what we use at CMEO, under the name PredictCME. CHAID is often used in data mining and can handle both continuous and categorical data. We love it, and have had excellent responses from the CME, medical, and scientific communities.
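To give a flavor of the idea behind CHAID, here is a deliberately simplified Python sketch of one step: choosing which predictor to split on by running a chi-square test of each predictor against the outcome. This is not how PredictCME is implemented; the data, the variable names, and the `best_split` helper are all hypothetical. Full CHAID also merges similar categories, adjusts p-values for multiple comparisons, and repeats the process within each branch, none of which is shown here. To stay within the standard library, the sketch only handles two-category predictors and outcomes.

```python
import math
from collections import Counter

def chi_square(rows, predictor, outcome):
    """Pearson chi-square statistic and degrees of freedom for two columns."""
    observed = Counter((r[predictor], r[outcome]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[outcome] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

def p_value_1dof(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def best_split(rows, predictors, outcome):
    """Return (predictor, p_value) for the predictor with the smallest p-value."""
    results = {}
    for name in predictors:
        stat, dof = chi_square(rows, name, outcome)
        if dof != 1:
            raise ValueError("this sketch only supports 2x2 tables")
        results[name] = p_value_1dof(stat)
    winner = min(results, key=results.get)
    return winner, results[winner]

# Made-up counts: (format, specialty, result, number of participants).
counts = [
    ("live", "neurologist", "correct", 30),
    ("live", "neurologist", "incorrect", 10),
    ("live", "psychiatrist", "correct", 28),
    ("live", "psychiatrist", "incorrect", 12),
    ("online", "neurologist", "correct", 18),
    ("online", "neurologist", "incorrect", 22),
    ("online", "psychiatrist", "correct", 16),
    ("online", "psychiatrist", "incorrect", 24),
]
rows = [
    {"format": f, "specialty": s, "result": r}
    for f, s, r, k in counts
    for _ in range(k)
]

winner, p = best_split(rows, ["format", "specialty"], "result")
print(winner)  # format
```

In this invented data set, activity format is far more strongly associated with getting the question right than specialty is, so it would be the first split in the tree.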
I could spend several blogs on predictive modeling alone. As you can see, there’s a lot to cover, and I have only skimmed the surface! I’m hoping this post and Part 1 have given you some food for thought and will provide at least a starting point for you to explore conducting your own predictive modeling.
Stay tuned for the next installment of the CME Outfitters best practices blog, during which my colleague, Beth Brillinger, CHCP, our Director of Accreditation, will be discussing best practices in accreditation.