Interaction Effects

2 minute read

  • Create and interpret contingency tables
  • Create and interpret interaction plots

Announcements:

  • Homework is posted.
  • Data set is about 400 MB, let me know if your computer starts going super slow.

live notes

Categorical Data

What is categorical data? Categorical data takes on a finite, usually small, set of distinct values. R represents these as factors, but they might show up as character vectors.

ed = c("high school", "bachelors", "graduate")
edf = factor(ed)
sex = factor(c("male", "female", "other"))

Categorical data can also be ordered.

edo = factor(ed, levels = ed, ordered = TRUE)

How to make continuous data categorical? Use cut to cut it into pieces.

x = seq(from = 1.1, to = 10, length.out = 50)
cx = cut(x, breaks = 1:10)

Contingency tables

Let’s load the data for your next homework. It takes a while!

covid = read.csv("~/data/COVID-19_Case_Surveillance_Public_Use_Data.csv")

Q: The age_group column is an example of what kind of data? A: conceptually, it should be an ordered factor We haven’t converted it, so it’s a character.

table(covid$age_group)
class(covid$age_group)

We’re also interested in whether patients had preexisting serious medical conditions.

table(covid$medcond_yn)

Let’s look at both of these variables together. Here is an example of a contingency table, a table of counts.

t1 = table(covid[, c("age_group", "medcond_yn")])

The areas in this plot represent relatively how many observations fall into each category.

plot(t1)

Response and predictors

Many models in statistics and machine learning are based on the idea of using one or more variables to predict a different variable of interest, called the response.

Let’s look at hospitalization from COVID as a response.

Right now this is a character vector.

class(covid$hosp_yn)

Q: How are we going to make it a numeric vector? A: An easy way is to just make it binary.

covid$hospital = as.numeric(covid$hosp_yn == "Yes")

Interaction plots

In statistics, an interaction effect is when two or more variables have a non additive effect in the value of some response.

We can look at hospitalizations as a function of age and sex to find out if there are interaction effects between these two variables. For example, it might be the case that females between 20 and 29 have unusually low rates of hospitalization, and that would provide evidence of an interaction effect.

with(covid, 
interaction.plot(age_group, sex, hospital)
)

The way to interpret an interaction plot is if each line is a vertical translation of all the other lines, then there is no evidence of an interaction effect. If the lines have completely different shapes, then that provides evidence of an interaction effect.

This particular plot shows some mild interaction effects. Take great care in making causal statements. For example, it might well be the case that sex doesn’t have much to do with COVID, rather, sex might be highly correlated with a type of medical condition that does affect vulnerability to COVID.

Group Activity

Discuss in your groups, and come up with examples of:

1) unordered categorical variable 2) ordered categorical variable 3) two variables that may well have an interaction effect in producing a response

Updated: