A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades Math 31 . Propensity score adjustment was necessary since the data are observational and the characteristics of students who voluntarily enroll in PAL may differ in ways that may, independently of PAL, impact course grade compared to students who do not enroll in PAL. In propensity score analysis, variables related to both likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's likelihood of enrolling in PAL.
First Add Some R Packages to the Workspace\ Caution: warning messages are suppressed to reduce clutter in the output.
defaultW <- getOption("warn")
options(warn = -1)
library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)
options(warn = defaultW)
The script contains a user-defined function called id.vars.zero.freq to identify variables causing complete separation in logistic regression.
source("IdentifyCompleteSepVars2.R")
Make sure the PAL datafile is in the same directory as this RMarkdown file. The file which includes data through the Spring 2019 semester has 1,099,371 rows and 174 columns. The total of the grd.pt.unt column is 2,237,555.
load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371 174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555
Each instance of a course taken by a student is included in the data file for the semester during which it was taken and every semester thereafter. 'test1' below filters so the dataset only includes the semester during which the course was taken.
test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 31"
math.indx <- test1 & test2
mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 5654 174
# [1] 5654 174
table(mydat$palN)
##
## 0 1 2
## 5153 30 471
# 0 1 2
# 5153 30 471
The course.seq variables indicates how many times a student has taken a course prior to the current attempt. To filter on the first attempt at a course, we set course.seq to 0.
raw.table = data.frame(class=character(),
nonPALavg=numeric(),
PALavg=numeric(),
Diff=numeric(),
NonPAL_Num= integer(),
PAL_Num=integer(),
CompletePAL=numeric(),
TermPALStart=integer(),
row.names=NULL,
stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 31"
for (i in 1:length(math.classes))
{
curr.class = math.classes[i]
temp = subset(mydat, course==curr.class & course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
temp = subset(temp, term.code>= pal.start)
x=tapply(temp$grd.pt.unt,temp$palN,
mean, na.rm=T) %>%
as.numeric %>%
round(2)
y=table(temp$palN) %>% as.numeric
raw.table[i, 'class' ] = curr.class
raw.table[i, c(2:4,7)]=c(x[1], x[3],x[3]-x[1],
round(y[3]/sum(y),2))
raw.table[i, c(5,6,8)]= c(y[1], y[3], pal.start)
#participation by ethnicity
eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) #remove a few students who started PAL but did not finish
colnames(eth.tables[[i]])<- c("nonPAL","PAL")
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 31`
## palN
## eth.erss nonPAL PAL
## African American 63 15
## Asian 598 112
## Foreign 68 19
## Hispanic 536 119
## Native American 5 1
## Pacific Islander 10 6
## Two or More Races 106 28
## Unknown 88 17
## White 524 82
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)
# percent.eth.within.PAL.and.within.non
# each row is 100%
percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)
percent.PAL.and.non.within.ethnicity
## $`MATH 31`
## palN
## eth.erss nonPAL PAL
## African American 0.81 0.19
## Asian 0.84 0.16
## Foreign 0.78 0.22
## Hispanic 0.82 0.18
## Native American 0.83 0.17
## Pacific Islander 0.62 0.38
## Two or More Races 0.79 0.21
## Unknown 0.84 0.16
## White 0.86 0.14
# formatted table
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
class | nonPALavg | PALavg | Diff | NonPAL_Num | PAL_Num | CompletePAL | TermPALStart |
---|---|---|---|---|---|---|---|
MATH 31 | 2.27 | 2.57 | 0.3 | 2034 | 401 | 0.16 | 2148 |
table(mydat$palN)
##
## 0 1 2
## 5153 30 471
temp = subset(mydat, course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
mydat = subset(temp, term.code>= pal.start)
dim(mydat)
## [1] 2460 174
table(mydat$palN)
##
## 0 1 2
## 2034 25 401
# number of students who completed Math 31 PAL goes from 471 to 401 (students repeating Math 31 removed)
It would be inaccurate to include these students in the control group since they are receiving a form of supplemental instruction (SI). However, the SI-model for Math 30L and 31L differs from PAL so they are not included in the treatment group either. Rather, these students are removed entirely from the analysis. 116 rows removed.
library(readxl)
Math_30L_31L <- read_excel("Math 30L-31L.xlsx")
Math_30L_31L$Semester.Number = as.numeric(Math_30L_31L$Semester.Number)
# Extract Lecture Course name, i.e. 'Math 31' for join
Math_30L_31L$Course.Lecture = substr(Math_30L_31L$Course, start = 1, stop=7)
#exclude rows that match from the Math 30L and 31L dataframe
table(mydat$palN)
##
## 0 1 2
## 2034 25 401
mydat = anti_join(mydat, Math_30L_31L, by=c("emplid"="Emplid","term.code"="Semester.Number", "course"="Course.Lecture"))
table(mydat$palN)
##
## 0 1 2
## 1918 25 401
dim(mydat)
## [1] 2344 174
Remove 17 variables having over 25% of values missing in order to retain a larger pool of PAL and non-PAL students. The threshold was chosen considering the need to keep the sat.math variable since no other proxy for math preparation/ability is in the dataset. None of the removed variables seem likely to be confounders. We will only include students with no missing values for the subset of variables specified in the 'Filter to relevant variables' section below in this analysis.
plot_missing(mydat, missing_only = TRUE)
zero_missing_vars <- mydat %>%
summarise(across( ,list(function(x) sum(is.na(x))))) %>%
gather() %>%
arrange(value)
zero_missing_vars <- zero_missing_vars %>%
filter(zero_missing_vars$value != 0)
zero_missing_vars$var.name = substr(zero_missing_vars$key,1,nchar(zero_missing_vars$key)-2)
zero_missing_vars <- mydat %>%
dplyr::select(zero_missing_vars$var.name)
missing_vars <- profile_missing(zero_missing_vars)
missing_vars
## feature num_missing pct_missing
## 1 gender 1 0.0004266212
## 2 hous.coh.term.flg 5 0.0021331058
## 3 censusMajor 7 0.0029863481
## 4 campus.gpa 7 0.0029863481
## 5 total.gpa 7 0.0029863481
## 6 term.units.attemptedCensus 7 0.0029863481
## 7 state 8 0.0034129693
## 8 career.num 8 0.0034129693
## 9 load.code 8 0.0034129693
## 10 bot.level 8 0.0034129693
## 11 enroll.status 8 0.0034129693
## 12 term.gpa 8 0.0034129693
## 13 higher.ed.gpa 8 0.0034129693
## 14 csus.gpa 8 0.0034129693
## 15 withdraw_code 8 0.0034129693
## 16 unt_taken_prgrss 8 0.0034129693
## 17 unt_passd_prgrss 8 0.0034129693
## 18 tot_cumulative 8 0.0034129693
## 19 cur_resident_terms 8 0.0034129693
## 20 cum_resident_terms 8 0.0034129693
## 21 adm.area 9 0.0038395904
## 22 veteran 9 0.0038395904
## 23 hs.eng.units 9 0.0038395904
## 24 hs.math.units 9 0.0038395904
## 25 hs.phys.units 9 0.0038395904
## 26 hs.bio.units 9 0.0038395904
## 27 hs.lab.units 9 0.0038395904
## 28 hs.forLang.units 9 0.0038395904
## 29 hs.socstud.units 9 0.0038395904
## 30 hs.perfarts.units 9 0.0038395904
## 31 hs.elective.units 9 0.0038395904
## 32 ge.critical.thinking.status 9 0.0038395904
## 33 ge.english.comp.status 9 0.0038395904
## 34 ge.math.status 9 0.0038395904
## 35 ge.oral.comm.status 9 0.0038395904
## 36 acad.plan 9 0.0038395904
## 37 pass.flg 17 0.0072525597
## 38 pass.flgN 17 0.0072525597
## 39 acad.stand 19 0.0081058020
## 40 coh 25 0.0106655290
## 41 coh.term 25 0.0106655290
## 42 base.time.coh 26 0.0110921502
## 43 enrl.flg 26 0.0110921502
## 44 enrl.flgERS 26 0.0110921502
## 45 pst.flg 26 0.0110921502
## 46 pst.flgERS 26 0.0110921502
## 47 grad.flg 26 0.0110921502
## 48 grad.flgERS 26 0.0110921502
## 49 rtn.flg 26 0.0110921502
## 50 rtn.flgERS 26 0.0110921502
## 51 ttgF 26 0.0110921502
## 52 maxUnits 26 0.0110921502
## 53 tot_cumulative.start 26 0.0110921502
## 54 admis.plan 26 0.0110921502
## 55 stem.final.flg 26 0.0110921502
## 56 hs.grad.yr 27 0.0115187713
## 57 pell.coh.term.flg 30 0.0127986348
## 58 stem.base.flg 31 0.0132252560
## 59 m.rmd 35 0.0149317406
## 60 m.rmd.detail 35 0.0149317406
## 61 e.rmd 35 0.0149317406
## 62 e.rmd.detail 35 0.0149317406
## 63 rmd.flg 35 0.0149317406
## 64 rmd.subj.flg 35 0.0149317406
## 65 eth.erss 38 0.0162116041
## 66 urm.flgERS 38 0.0162116041
## 67 zip 39 0.0166382253
## 68 mother.ed 106 0.0452218430
## 69 m.rmd.admin 211 0.0900170648
## 70 m.rmd.admin.detail 211 0.0900170648
## 71 father.ed 215 0.0917235495
## 72 plan.college 330 0.1407849829
## 73 plan.college.desc 330 0.1407849829
## 74 plan.dept 330 0.1407849829
## 75 plan.deptAbbr 330 0.1407849829
## 76 plan.degree 330 0.1407849829
## 77 plan.type 330 0.1407849829
## 78 tot.passd.prgrss.start 377 0.1608361775
## 79 tot.taken.prgrss.start 377 0.1608361775
## 80 hs.gpa 409 0.1744880546
## 81 admit.term 439 0.1872866894
## 82 sat.math.score 586 0.2500000000
## 83 sat.math.flg 586 0.2500000000
## 84 sat.verbal.score 586 0.2500000000
## 85 sat.verbal.flg 586 0.2500000000
## 86 sat.test.date 586 0.2500000000
## 87 deg.plan1 1729 0.7376279863
## 88 grad.term 1729 0.7376279863
## 89 ttg 1729 0.7376279863
## 90 grad.termERS 1736 0.7406143345
## 91 pledge.term 1739 0.7418941980
## 92 fys.term.code 1890 0.8063139932
## 93 fys.grd 1890 0.8063139932
## 94 fys.rpt.flg 1890 0.8063139932
## 95 treat.section 1918 0.8182593857
## 96 trf.gpaADM 1939 0.8272184300
## 97 deg.plan2 2329 0.9936006826
## 98 withdraw_reason 2334 0.9957337884
## 99 deg.plan3 2343 0.9995733788
## 100 Instructor_02 2344 1.0000000000
## 101 deg.plan4 2344 1.0000000000
## 102 deg.plan5 2344 1.0000000000
## 103 deg.plan6 2344 1.0000000000
sum(missing_vars$pct_missing>0.25) # remove 17 variables with >25% missing values, none appear to be confounders
## [1] 17
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.25, "feature"])
dim(mydat)
## [1] 2344 174
mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2344 157
table(mydat$palN)
##
## 0 1 2
## 1918 25 401
Some variables may be single-values and are, thus, unrelated to propensity score. There were 13 single-valued variables in this dataset which were removed.
single.category.vars = sapply(mydat, function(x) length(unique(x))==1)
sum(single.category.vars)
## [1] 13
names(mydat)[single.category.vars]
## [1] "course" "component" "units" "course.numeric"
## [5] "div" "course.seq" "rpt.flg" "pass"
## [9] "c2s" "base.time.course" "years" "pass.term.flg"
## [13] "passN"
# remove the 13 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2344 144
table(mydat$palN)
##
## 0 1 2
## 1918 25 401
Analysis includes 70 observations from learning communities
Create new variables: delay since high school and cumulative percent of units passed. Collapse sparse categories and other miscellaneous clean up of data. Sparse categories can cause complete separation in logistic regression and are only predictive for a few students.
yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))
mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)
sum(is.na(mydat$delay.from.hs)) #27 missing values
## [1] 27
# remove students who did not complete PAL
dim(mydat)
## [1] 2344 145
mydat=subset(mydat, palN!=1)
#recode palN to 0=nonPAL and 1=PAL, as numeric values
dim(mydat)
## [1] 2319 145
mydat$palN = ifelse(mydat$palN==2, 1, 0)
#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"
mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"
mydat$m.rmd = droplevels(mydat$m.rmd)
mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"
mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"
mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
##
## Not Remedial in Math Remedial in Math
## 2168 116
# create feature, proportion of cumulative units taken that were passed
# 29% of values are missing, do not use this variable
# mydat$cum.percent.units.passed = mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start
# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)
table(mydat$load.code, PAL=mydat$palN)
## PAL
## 0 1
## F 1695 374
## OTHER 223 27
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
## PAL
## 0 1
## OTHER 305 54
## PERM 1613 347
# code instructor as alphabetic letter for anonymity
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))
instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)
mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))
key.instr.code = cbind(as.character(instructor.vec),
1:num.instr)
Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It's important to include all variables correlated with outcome even if it is uncertain whether they are related to likeihood of enrolling in PAL. This allows for a more precise estimate of the treatment effect.
myvars <- c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed','m.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',"AP_CALAB", "AP_CALAB.flg",'median.income','med.inc.flg','grd.pt.unt',"load.code",'bot.level','censusMajor','term.units.attemptedCensus', 'stem.base.flg','hous.coh.term.flg',
"enroll.status",
'palN',"pell.coh.term.flg","course.age","acad.stand",'delay.from.hs','addr.type','higher.ed.gpa.start','adm.area', 'higher.ed.gpa.start.flg')
mydat = mydat[,myvars]
# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd', "AP_CALAB.flg","load.code",'bot.level','censusMajor','stem.base.flg','hous.coh.term.flg',
"enroll.status","pell.coh.term.flg","acad.stand",'addr.type','adm.area', 'higher.ed.gpa.start.flg')
mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)
plot_missing(mydat, missing_only = TRUE)
table(mydat$palN)
##
## 0 1
## 1918 401
mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 1616 33
table(mydat$palN)
##
## 0 1
## 1323 293
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sum(sep.cols) # 3 variables causing complete separation
## [1] 3
names(mydat)[sep.cols]
## [1] "eth.erss" "censusMajor" "acad.stand"
# collapse censusMajor by course since the degree of collapsing will differ markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.1, update = TRUE)
with(mydat, table(censusMajor, palN))
## palN
## censusMajor 0 1
## Chemistry 108 26
## Civil Engineering 231 50
## Computer Engineering 146 30
## Computer Science 285 50
## Electrical and Electronic Engineering 111 23
## Mechanical Engineering 265 87
## OTHER 177 27
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
## palN
## eth.erss 0 1
## African American 45 12
## Asian 448 85
## Hispanic 355 93
## OTHER 81 26
## Two or More Races 69 22
## White 325 55
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))
table(mydat$palN)
##
## 0 1
## 1323 293
# Removed recoded variables
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)
## [1] 0
names(mydat)[sep.cols]
## character(0)
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]
Subjectively identified three potential confounders to force the model to retain: sat.math.score, gender, and eth.erss. Stepwise variable selection will be used to select which of the variables currently in the PAL dataset to include in the propensity model.
min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
biggest.formula= formula(glm(palN ~sat.math.score+gender+eth.erss+coh+fys.flg+mother.ed+e.rmd+m.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor, data=mydat, family="binomial"))
step.model= step(min.model,
direction="forward",
scope=biggest.formula,
trace=FALSE, k=2)
step.formula = formula(step.model)
#Propensity scores
propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values
match.math = Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)
Check to see if matching resulted in good covariate balance between the PAL and nonPAL groups. Covariate balance is reasonable with the absolute standardized mean difference below 0.10 for all covariates in the propensity model.
m31.bal= bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m31.bal
## Balance Measures
## Type Diff.Un Diff.Adj
## sat.math.score Contin. -0.3645 0.0040
## gender_Male Binary -0.0390 0.0242
## eth.erss_African American Binary 0.0069 -0.0140
## eth.erss_Asian Binary -0.0485 -0.0234
## eth.erss_Hispanic Binary 0.0491 0.0200
## eth.erss_OTHER Binary 0.0275 0.0088
## eth.erss_Two or More Races Binary 0.0229 0.0167
## eth.erss_White Binary -0.0579 -0.0080
## censusMajor_Chemistry Binary 0.0071 -0.0405
## censusMajor_Civil Engineering Binary -0.0040 0.0002
## censusMajor_Computer Engineering Binary -0.0080 0.0056
## censusMajor_Computer Science Binary -0.0448 0.0205
## censusMajor_Electrical and Electronic Engineering Binary -0.0054 0.0069
## censusMajor_Mechanical Engineering Binary 0.0966 -0.0063
## censusMajor_OTHER Binary -0.0416 0.0137
## higher.ed.gpa.start Contin. 0.0390 -0.0067
## bot.level_Freshman Binary -0.0095 -0.0097
## bot.level_Junior Binary 0.0127 0.0197
## bot.level_Senior Binary -0.0268 -0.0043
## bot.level_Sophomore Binary 0.0235 -0.0056
## course.age Contin. 0.0892 -0.0185
## hs.gpa Contin. -0.1420 -0.0208
## pell.coh.term.flg_Pell Binary 0.0708 0.0288
##
## Sample sizes
## Control Treated
## All 1323. 293
## Matched (ESS) 316.39 292
## Matched (Unweighted) 590. 292
## Unmatched 733. 0
## Discarded 0. 1
love.plot(m31.bal,binary = "raw", stars = "std", var.order = "unadjusted",
thresholds = c(m = .1), abs = F)
The estimated increase in the mean grade of students in PAL over those not in PAL after correcting for self-selection biases is .
est.effect=summary(match.math)
##
## Estimate... 0.39715
## AI SE...... 0.0986
## T-stat..... 4.0279
## p.val...... 5.627e-05
##
## Original number of observations.............. 1616
## Original number of treated obs............... 293
## Matched number of observations............... 292
## Matched number of observations (unweighted). 770
##
## Caliper (SDs)........................................ 0.25
## Number of obs dropped by 'exact' or 'caliper' 1
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights
mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)
diff.means = match.math$est
std.error = match.math$se
num.uniq.contrl.matched = length(unique(match.math$index.control))
num.trt = match.math$wnobs
approx.p.val = (1-pnorm(diff.means/std.error)) #right-tail
results = data.frame(course=c("Math 31"), mean.gpa.PAL=round(mean.trt,2), mean.gpa.nonPAL=round(mean.cntrl,2), difference.mean.gpa=round(diff.means,2), standard.error=round(std.error,2),p.val=round(approx.p.val,4),number.control=num.uniq.contrl.matched, number.treated=num.trt)
kable(results, caption = "Propensity-scored Adjusted Comparison of PAL and non-PAL Grades")
course | mean.gpa.PAL | mean.gpa.nonPAL | difference.mean.gpa | standard.error | p.val | number.control | number.treated |
---|---|---|---|---|---|---|---|
Math 31 | 2.53 | 2.14 | 0.4 | 0.1 | 0 | 590 | 292 |
print(psens(match.math, Gamma=2, GammaInc = 0.1) )
##
## Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
##
## Unconfounded estimate .... 0
##
## Gamma Lower bound Upper bound
## 1.0 0 0.0000
## 1.1 0 0.0000
## 1.2 0 0.0000
## 1.3 0 0.0009
## 1.4 0 0.0105
## 1.5 0 0.0603
## 1.6 0 0.1985
## 1.7 0 0.4266
## 1.8 0 0.6696
## 1.9 0 0.8484
## 2.0 0 0.9441
##
## Note: Gamma is Odds of Differential Assignment To
## Treatment Due to Unobserved Factors
##
Note that in the above table, 1.5 in the first column marks the first row where 0.05 is between the Lower and Upper bounds. This means that an unknown confounder which increases the odds of being in PAL by more than 1.5 is enough to change the treatment effect from significant to non-significant. The next code block generates the effect on the odds ratio of each variable in the propensity score. Thus, if there is an unknown confounder that has an effect on the propensity score similar to major or higher.ed.gpa.start then the PAL effect would become non-significant. Thus, this finding is moderately sensitive to unknown confounders. It is possible a variable like the number of hours per week a student works which is not in our dataset is a confounder which could reverse the statistical significance of this analysis.
print(sort(exp(abs(propensity.model$coefficients))))
## sat.math.score
## 1.003920
## censusMajorElectrical and Electronic Engineering
## 1.037258
## eth.erssHispanic
## 1.102326
## censusMajorCivil Engineering
## 1.122719
## eth.erssAsian
## 1.139046
## eth.erssWhite
## 1.155434
## censusMajorComputer Engineering
## 1.157027
## course.age
## 1.160383
## censusMajorComputer Science
## 1.181197
## pell.coh.term.flgPell
## 1.263025
## bot.levelSophomore
## 1.322104
## hs.gpa
## 1.349711
## genderMale
## 1.409533
## eth.erssOTHER
## 1.483459
## censusMajorMechanical Engineering
## 1.485078
## higher.ed.gpa.start
## 1.535626
## bot.levelJunior
## 1.597130
## censusMajorOTHER
## 1.598893
## eth.erssTwo or More Races
## 1.695647
## bot.levelSenior
## 5.178004
## (Intercept)
## 8.556757