A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades in Math 30. Propensity score adjustment is necessary because the data are observational: students who voluntarily enroll in PAL may differ from students who do not in ways that affect course grade independently of PAL. In propensity score analysis, variables related to both the likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's estimated probability of enrolling in PAL.
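As a minimal sketch of the approach (not run here; the actual propensity model is fit near the end of this script after stepwise variable selection), confounders predict treatment, and the fitted probability of treatment is the propensity score:
# Sketch only, using three of the confounders that appear later in this script:
# ps.model <- glm(palN ~ sat.math.score + gender + eth.erss,
#                 data = mydat, family = binomial)
# mydat$propensity <- fitted(ps.model)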
First Add Some R Packages to the Workspace. Caution: warning messages are suppressed to reduce clutter in the output.
defaultW <- getOption("warn")
options(warn = -1)
library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)
options(warn = defaultW)
The script sources a user-defined function, id.vars.zero.freq, which identifies variables causing complete separation in logistic regression.
source("IdentifyCompleteSepVars2.R")
Make sure the PAL data file is in the same directory as this RMarkdown file. The file, which includes data through the Spring 2019 semester, has 1,099,371 rows and 174 columns. The total of the grd.pt.unt column is 2,237,555.
load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371 174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555
Each instance of a course taken by a student is included in the data file for the semester during which it was taken and for every semester thereafter. The filter 'test1' below restricts the dataset to rows from the semester during which the course was taken.
test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 30"
math.indx <- test1 & test2
mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 5882 174
table(mydat$palN) # 704 Math 30 students completed PAL
##
## 0 1 2
## 5143 35 704
The course.seq variable indicates how many times a student has taken a course prior to the current attempt. To filter on the first attempt at a course, we set course.seq to 0.
raw.table = data.frame(class=character(),
nonPALavg=numeric(),
PALavg=numeric(),
Diff=numeric(),
NonPAL_Num= integer(),
PAL_Num=integer(),
CompletePAL=numeric(),
TermPALStart=integer(),
row.names=NULL,
stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 30"
for (i in 1:length(math.classes)) {
  curr.class = math.classes[i]
  temp = subset(mydat, course==curr.class & course.seq==0)
  pal.start = min(unique(temp$term.code[temp$palN==2]))
  # only include terms after PAL start term
  temp = subset(temp, term.code >= pal.start)
  x = tapply(temp$grd.pt.unt, temp$palN, mean, na.rm=T) %>%
    as.numeric %>%
    round(2)
  y = table(temp$palN) %>% as.numeric
  raw.table[i, 'class'] = curr.class
  raw.table[i, c(2:4,7)] = c(x[1], x[3], x[3]-x[1],
                             round(y[3]/sum(y), 2))
  raw.table[i, c(5,6,8)] = c(y[1], y[3], pal.start)
  # participation by ethnicity
  eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) # drop palN==1: students who started PAL but did not finish
  colnames(eth.tables[[i]]) <- c("nonPAL","PAL")
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 30`
## palN
## eth.erss nonPAL PAL
## African American 86 31
## Asian 636 194
## Foreign 93 20
## Hispanic 680 172
## Native American 4 3
## Pacific Islander 20 6
## Two or More Races 132 31
## Unknown 103 29
## White 526 138
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)
# percent.eth.within.PAL.and.within.non
# each row is 100%
percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)
percent.PAL.and.non.within.ethnicity
## $`MATH 30`
## palN
## eth.erss nonPAL PAL
## African American 0.74 0.26
## Asian 0.77 0.23
## Foreign 0.82 0.18
## Hispanic 0.80 0.20
## Native American 0.57 0.43
## Pacific Islander 0.77 0.23
## Two or More Races 0.81 0.19
## Unknown 0.78 0.22
## White 0.79 0.21
# formatted table
library(knitr)
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
| class   | nonPALavg | PALavg | Diff | NonPAL_Num | PAL_Num | CompletePAL | TermPALStart |
|---------|-----------|--------|------|------------|---------|-------------|--------------|
| MATH 30 | 2.23      | 2.68   | 0.45 | 2310       | 626     | 0.21        | 2138         |
table(mydat$palN)
##
## 0 1 2
## 5143 35 704
temp = subset(mydat, course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
mydat = subset(temp, term.code>= pal.start)
dim(mydat)
## [1] 2966 174
table(mydat$palN)
##
## 0 1 2
## 2310 30 626
# number of students who completed Math 30 PAL goes from 704 to 626 (repeat attempts and pre-PAL terms removed)
Students enrolled in the Math 30L or 31L support courses receive a form of supplemental instruction (SI), so it would be inaccurate to include them in the control group. However, the SI model for Math 30L and 31L differs from PAL, so they do not belong in the treatment group either. Instead, these students are removed from the analysis entirely; 236 rows are removed.
library(readxl)
Math_30L_31L <- read_excel("Math 30L-31L.xlsx")
Math_30L_31L$Semester.Number = as.numeric(Math_30L_31L$Semester.Number)
# Extract Lecture Course name, i.e. 'Math 30' for join
Math_30L_31L$Course.Lecture = substr(Math_30L_31L$Course, start = 1, stop=7)
#exclude rows that match from the Math 30L and 31L dataframe
table(mydat$palN)
##
## 0 1 2
## 2310 30 626
mydat = anti_join(mydat, Math_30L_31L, by=c("emplid"="Emplid","term.code"="Semester.Number", "course"="Course.Lecture"))
table(mydat$palN)
##
## 0 1 2
## 2076 30 624
dim(mydat)
## [1] 2730 174
Remove 19 variables that have over 20% of values missing in order to retain a larger pool of PAL and non-PAL students. None of the removed variables seems likely to be a confounder. Only students with no missing values for the subset of variables specified in the 'Filter to relevant variables' section below will be included.
plot_missing(mydat, missing_only = TRUE)
# count missing values per variable and keep only variables with missing values
vars.with.missing <- mydat %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  gather(key = "var.name", value = "num.missing") %>%
  arrange(num.missing) %>%
  filter(num.missing != 0)
missing_vars <- profile_missing(dplyr::select(mydat, all_of(vars.with.missing$var.name)))
missing_vars
## feature num_missing pct_missing
## 1 gender 1 0.0003663004
## 2 adm.area 1 0.0003663004
## 3 veteran 1 0.0003663004
## 4 hs.eng.units 1 0.0003663004
## 5 hs.math.units 1 0.0003663004
## 6 hs.phys.units 1 0.0003663004
## 7 hs.bio.units 1 0.0003663004
## 8 hs.lab.units 1 0.0003663004
## 9 hs.forLang.units 1 0.0003663004
## 10 hs.socstud.units 1 0.0003663004
## 11 hs.perfarts.units 1 0.0003663004
## 12 hs.elective.units 1 0.0003663004
## 13 ge.critical.thinking.status 1 0.0003663004
## 14 ge.english.comp.status 1 0.0003663004
## 15 ge.math.status 1 0.0003663004
## 16 ge.oral.comm.status 1 0.0003663004
## 17 career.num 3 0.0010989011
## 18 load.code 3 0.0010989011
## 19 bot.level 3 0.0010989011
## 20 enroll.status 3 0.0010989011
## 21 term.gpa 3 0.0010989011
## 22 higher.ed.gpa 3 0.0010989011
## 23 csus.gpa 3 0.0010989011
## 24 withdraw_code 3 0.0010989011
## 25 unt_taken_prgrss 3 0.0010989011
## 26 unt_passd_prgrss 3 0.0010989011
## 27 tot_cumulative 3 0.0010989011
## 28 cur_resident_terms 3 0.0010989011
## 29 cum_resident_terms 3 0.0010989011
## 30 acad.plan 3 0.0010989011
## 31 hous.coh.term.flg 6 0.0021978022
## 32 censusMajor 7 0.0025641026
## 33 campus.gpa 7 0.0025641026
## 34 total.gpa 7 0.0025641026
## 35 term.units.attemptedCensus 7 0.0025641026
## 36 state 9 0.0032967033
## 37 acad.stand 13 0.0047619048
## 38 coh 15 0.0054945055
## 39 coh.term 15 0.0054945055
## 40 admis.plan 17 0.0062271062
## 41 stem.final.flg 17 0.0062271062
## 42 base.time.coh 20 0.0073260073
## 43 enrl.flg 20 0.0073260073
## 44 enrl.flgERS 20 0.0073260073
## 45 pst.flg 20 0.0073260073
## 46 pst.flgERS 20 0.0073260073
## 47 grad.flg 20 0.0073260073
## 48 grad.flgERS 20 0.0073260073
## 49 rtn.flg 20 0.0073260073
## 50 rtn.flgERS 20 0.0073260073
## 51 ttgF 20 0.0073260073
## 52 maxUnits 20 0.0073260073
## 53 tot_cumulative.start 20 0.0073260073
## 54 hs.grad.yr 20 0.0073260073
## 55 pell.coh.term.flg 21 0.0076923077
## 56 stem.base.flg 23 0.0084249084
## 57 pass.flg 29 0.0106227106
## 58 pass.flgN 29 0.0106227106
## 59 eth.erss 31 0.0113553114
## 60 urm.flgERS 31 0.0113553114
## 61 zip 38 0.0139194139
## 62 m.rmd 55 0.0201465201
## 63 m.rmd.detail 55 0.0201465201
## 64 e.rmd 55 0.0201465201
## 65 e.rmd.detail 55 0.0201465201
## 66 rmd.flg 55 0.0201465201
## 67 rmd.subj.flg 55 0.0201465201
## 68 mother.ed 111 0.0406593407
## 69 father.ed 246 0.0901098901
## 70 m.rmd.admin 292 0.1069597070
## 71 m.rmd.admin.detail 292 0.1069597070
## 72 hs.gpa 302 0.1106227106
## 73 plan.college 452 0.1655677656
## 74 plan.college.desc 452 0.1655677656
## 75 plan.dept 452 0.1655677656
## 76 plan.deptAbbr 452 0.1655677656
## 77 plan.degree 452 0.1655677656
## 78 plan.type 452 0.1655677656
## 79 sat.math.score 525 0.1923076923
## 80 sat.math.flg 525 0.1923076923
## 81 sat.verbal.score 525 0.1923076923
## 82 sat.verbal.flg 525 0.1923076923
## 83 sat.test.date 525 0.1923076923
## 84 admit.term 541 0.1981684982
## 85 tot.passd.prgrss.start 794 0.2908424908
## 86 tot.taken.prgrss.start 794 0.2908424908
## 87 treat.section 1838 0.6732600733
## 88 pledge.term 1906 0.6981684982
## 89 deg.plan1 2034 0.7450549451
## 90 grad.term 2034 0.7450549451
## 91 ttg 2034 0.7450549451
## 92 grad.termERS 2038 0.7465201465
## 93 fys.term.code 2104 0.7706959707
## 94 fys.grd 2104 0.7706959707
## 95 fys.rpt.flg 2104 0.7706959707
## 96 trf.gpaADM 2434 0.8915750916
## 97 deg.plan2 2716 0.9948717949
## 98 withdraw_reason 2720 0.9963369963
## 99 Instructor_02 2730 1.0000000000
## 100 deg.plan3 2730 1.0000000000
## 101 deg.plan4 2730 1.0000000000
## 102 deg.plan5 2730 1.0000000000
## 103 deg.plan6 2730 1.0000000000
sum(missing_vars$pct_missing>0.2) # remove 19 variables with >20% missing values, none appear to be confounders
## [1] 19
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.2, "feature"])
dim(mydat)
## [1] 2730 174
mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2730 155
table(mydat$palN)
##
## 0 1 2
## 2076 30 624
Some variables are single-valued (constant) and thus carry no information for the propensity score. There were 10 single-valued variables in this dataset, which were removed.
single.category.vars = sapply(mydat, function(x) length(unique(x))==1)
sum(single.category.vars)
## [1] 10
names(mydat)[single.category.vars]
## [1] "course" "units" "course.numeric" "div"
## [5] "course.seq" "rpt.flg" "c2s" "base.time.course"
## [9] "years" "pass.term.flg"
# remove the 10 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2730 145
table(mydat$palN)
##
## 0 1 2
## 2076 30 624
Note: 400 observations in this dataset come from learning communities (362 of them in PAL).
Create new variables: delay since high school and cumulative percent of units passed. Collapse sparse categories and perform other miscellaneous data cleanup. Sparse categories can cause complete separation in logistic regression and are predictive for only a few students.
yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))
mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)
sum(is.na(mydat$delay.from.hs)) # 20 missing values
## [1] 20
# remove students who did not complete PAL
dim(mydat)
## [1] 2730 146
mydat=subset(mydat, palN!=1)
dim(mydat)
## [1] 2700 146
# recode palN to 0=nonPAL and 1=PAL, as numeric values
mydat$palN = ifelse(mydat$palN==2, 1, 0)
#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"
mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"
mydat$m.rmd = droplevels(mydat$m.rmd)
mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"
mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"
mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
##
## Not Remedial in Math Remedial in Math
## 2482 163
# create feature, proportion of cumulative units taken that were passed
# 29% of values are missing, do not use this variable
# mydat$cum.percent.units.passed = mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start
# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)
table(mydat$load.code, PAL=mydat$palN)
## PAL
## 0 1
## F 1838 580
## OTHER 238 44
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
## PAL
## 0 1
## OTHER 334 97
## PERM 1742 527
# code instructor as alphabetic letter for anonymity
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))
instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)
mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))
key.instr.code = cbind(as.character(instructor.vec),
1:num.instr)
Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It is important to include all variables correlated with the outcome even if it is uncertain whether they are related to the likelihood of enrolling in PAL; doing so allows a more precise estimate of the treatment effect.
myvars <- c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed','m.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',"AP_CALAB", "AP_CALAB.flg",'median.income','med.inc.flg','grd.pt.unt',"load.code",'bot.level','censusMajor','term.units.attemptedCensus', 'stem.base.flg','hous.coh.term.flg',
"enroll.status",
'palN',"pell.coh.term.flg","course.age","acad.stand",'delay.from.hs','addr.type','higher.ed.gpa.start','adm.area', 'higher.ed.gpa.start.flg')
mydat = mydat[,myvars]
# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd', "AP_CALAB.flg","load.code",'bot.level','censusMajor','stem.base.flg','hous.coh.term.flg',
"enroll.status","pell.coh.term.flg","acad.stand",'addr.type','adm.area', 'higher.ed.gpa.start.flg')
mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)
plot_missing(mydat, missing_only = TRUE)
table(mydat$palN)
##
## 0 1
## 2076 624
mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 2021 33
table(mydat$palN)
##
## 0 1
## 1510 511
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sum(sep.cols)
## [1] 3
names(mydat)[sep.cols]
## [1] "censusMajor" "enroll.status" "acad.stand"
# collapse censusMajor by course since the degree of collapsing will differ markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.1, update = TRUE)
with(mydat, table(censusMajor, palN))
## palN
## censusMajor 0 1
## Chemistry 128 41
## Civil Engineering 217 90
## Computer Engineering 185 63
## Computer Science 292 106
## Electrical and Electronic Engineering 97 34
## Mathematics 58 21
## Mechanical Engineering 314 99
## OTHER 183 42
## Undeclared 36 15
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
## palN
## eth.erss 0 1
## African American 56 26
## Asian 490 167
## Hispanic 428 143
## OTHER 109 41
## Two or More Races 91 23
## White 336 111
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))
# remove 12 rows with med.inc.flg=="Missing", which caused complete separation
miss.indx <- mydat$med.inc.flg=="Missing"
mydat <- mydat[!miss.indx, ]
mydat <- subset(mydat, select=-c(med.inc.flg))
table(mydat$palN)
##
## 0 1
## 1501 508
# remove the original variables that were recoded above
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)
## [1] 0
names(mydat)[sep.cols]
## character(0)
# remove unnecessary variables
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]
Subjective judgment identified three potential confounders that the model is forced to retain: sat.math.score, gender, and eth.erss. Forward stepwise variable selection is then used to choose which of the remaining variables in the PAL dataset to include in the propensity model.
min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
biggest.formula= formula(glm(palN ~sat.math.score+gender+eth.erss+coh+fys.flg+mother.ed+m.rmd+e.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor, data=mydat, family="binomial"))
step.model= step(min.model,
direction="forward",
scope=biggest.formula,
trace=FALSE, k=2)
step.formula = formula(step.model)
#Propensity scores
propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values
# match each PAL student to the nearest non-PAL student on propensity score
# (ATT estimand, M=1 match per treated unit, caliper of 0.25 standard
# deviations of the propensity score)
match.math = Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,
                   BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)
Check to see if matching resulted in good covariate balance between the PAL and nonPAL groups. Covariate balance is reasonable.
m30.bal= bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m30.bal
## Balance Measures
## Type Diff.Un Diff.Adj
## sat.math.score Contin. -0.0561 -0.0414
## gender_Male Binary -0.0220 -0.0012
## eth.erss_African American Binary 0.0145 0.0007
## eth.erss_Asian Binary 0.0030 -0.0151
## eth.erss_Hispanic Binary -0.0030 0.0082
## eth.erss_OTHER Binary 0.0095 0.0082
## eth.erss_Two or More Races Binary -0.0154 0.0065
## eth.erss_White Binary -0.0086 -0.0086
## course.age Contin. -0.3067 0.0203
## acad.stand.Good.indicator Binary 0.0582 -0.0089
## bot.level_Freshman Binary 0.1300 0.0155
## bot.level_Junior Binary -0.0342 0.0055
## bot.level_Senior Binary -0.0288 0.0028
## bot.level_Sophomore Binary -0.0670 -0.0238
## sat.verbal.score Contin. -0.1116 -0.0296
## stem.base.flg_STEM Binary -0.0034 0.0001
## delay.from.hs Contin. -0.0664 0.0154
## median.income Contin. 0.0409 -0.0097
##
## Sample sizes
## Control Treated
## All 1501. 508
## Matched (ESS) 514.94 506
## Matched (Unweighted) 870. 506
## Unmatched 631. 0
## Discarded 0. 2
love.plot(m30.bal,binary = "raw", stars = "std", var.order = "unadjusted",
thresholds = c(m = .1), abs = F)
The estimated increase in the mean grade of students in PAL over those not in PAL, after correcting for self-selection bias, is 0.24 (p < 0.001).
est.effect=summary(match.math)
##
## Estimate... 0.23778
## AI SE...... 0.071962
## T-stat..... 3.3042
## p.val...... 0.0009526
##
## Original number of observations.............. 2009
## Original number of treated obs............... 508
## Matched number of observations............... 506
## Matched number of observations (unweighted). 1502
##
## Caliper (SDs)........................................ 0.25
## Number of obs dropped by 'exact' or 'caliper' 2
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights
mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)
diff.means = match.math$est
std.error = match.math$se
num.uniq.contrl.matched = length(unique(match.math$index.control))
num.trt = match.math$wnobs
approx.p.val = (1-pnorm(diff.means/std.error)) #right-tail
results = data.frame(course=c("Math 30"), mean.gpa.PAL=round(mean.trt,2), mean.gpa.nonPAL=round(mean.cntrl,2), difference.mean.gpa=round(diff.means,2), standard.error=round(std.error,2),p.val=round(approx.p.val,4),number.control=num.uniq.contrl.matched, number.treated=num.trt)
kable(results, caption = "Propensity-scored Adjusted Comparison of PAL and non-PAL Grades")
| course  | mean.gpa.PAL | mean.gpa.nonPAL | difference.mean.gpa | standard.error | p.val | number.control | number.treated |
|---------|--------------|-----------------|---------------------|----------------|-------|----------------|----------------|
| Math 30 | 2.66         | 2.42            | 0.24                | 0.07           | 5e-04 | 870            | 506            |
print(psens(match.math, Gamma=2, GammaInc = 0.1) )
##
## Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
##
## Unconfounded estimate .... 0
##
## Gamma Lower bound Upper bound
## 1.0 0 0.0000
## 1.1 0 0.0002
## 1.2 0 0.0164
## 1.3 0 0.1853
## 1.4 0 0.5991
## 1.5 0 0.9063
## 1.6 0 0.9898
## 1.7 0 0.9994
## 1.8 0 1.0000
## 1.9 0 1.0000
## 2.0 0 1.0000
##
## Note: Gamma is Odds of Differential Assignment To
## Treatment Due to Unobserved Factors
##
Note that in the above table, 1.3 in the first column marks the first row where 0.05 falls between the lower and upper bounds. This means an unknown confounder that increases the odds of enrolling in PAL by a factor of more than 1.3 would be enough to change the treatment effect from significant to non-significant. The next code block displays the odds-ratio magnitude of each variable in the propensity model. If there were an unknown confounder with an effect on the propensity score similar to that of bot.level (class level at the start of the course) or ethnicity, the PAL effect would become non-significant. This finding is therefore moderately sensitive to unknown confounders. For example, a variable not in our dataset, such as the number of hours per week a student works, could be a confounder strong enough to reverse the statistical significance of this analysis.
print(sort(exp(abs(propensity.model$coefficients))))
## median.income sat.math.score
## 1.000004 1.001379
## sat.verbal.score genderMale
## 1.001786 1.163296
## delay.from.hs eth.erssOTHER
## 1.178764 1.267830
## stem.base.flgSTEM course.age
## 1.268959 1.286637
## bot.levelSophomore bot.levelJunior
## 1.413761 1.456977
## eth.erssWhite eth.erssHispanic
## 1.603320 1.654398
## eth.erssAsian eth.erssTwo or More Races
## 1.676861 2.105995
## acad.stand.Good.indicator1 bot.levelSenior
## 2.330094 3.245961
## (Intercept)
## 196.880770
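As a quick follow-up (a sketch, not part of the original script), the odds-ratio magnitudes above can be compared directly against the Gamma = 1.3 sensitivity threshold identified earlier:
or.magnitudes <- sort(exp(abs(coef(propensity.model))))
or.magnitudes <- or.magnitudes[names(or.magnitudes) != "(Intercept)"]
# covariates whose association with PAL enrollment is at least as strong
# as the Gamma = 1.3 threshold
names(or.magnitudes[or.magnitudes >= 1.3])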