A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades in Math 29. Propensity score adjustment was necessary because the data are observational: students who voluntarily enroll in PAL may differ from students who do not in ways that, independently of PAL, affect course grade. In propensity score analysis, variables related to both the likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's estimated probability of enrolling in PAL.
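As a toy illustration of the idea (hypothetical data, not the model fit later in this script), the propensity score is simply the fitted probability from a logistic regression of the treatment indicator on confounders:
# Toy illustration (hypothetical data): the propensity score is the fitted
# probability of treatment from a logistic regression on confounders.
set.seed(123)
toy = data.frame(hs.gpa   = runif(200, 2, 4),
                 enrolled = rbinom(200, 1, 0.3))
ps.fit = glm(enrolled ~ hs.gpa, data = toy, family = binomial)
toy$propensity = fitted(ps.fit)   # each student's estimated P(enroll)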
First Add Some R Packages to the Workspace

Caution: warning messages are suppressed to reduce clutter in the output.
defaultW <- getOption("warn")
options(warn = -1)
library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)
options(warn = defaultW)
The script sourced below contains a user-defined function, id.vars.zero.freq, to identify variables causing complete separation in logistic regression.
source("IdentifyCompleteSepVars2.R")
Make sure the PAL data file is in the same directory as this RMarkdown file. The file, which includes data through the Spring 2019 semester, has 1,099,371 rows and 174 columns. The total of the grd.pt.unt column is 2,237,555.
load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371 174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555
Each instance of a course taken by a student is included in the data file for the semester during which it was taken and for every semester thereafter. 'test1' below keeps only the record from the semester during which the course was taken, and 'test2' keeps only Math 29 records. The Math 29 dataset should contain 4876 rows and 174 columns.
test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 29"
math.indx <- test1 & test2
mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 4876 174
table(mydat$palN) # 651 Math 29 students completed PAL
##
## 0 1 2
## 4172 53 651
The course.seq variable indicates how many times a student has taken a course prior to the current attempt. To filter to first attempts at the course, we set course.seq to 0.
raw.table = data.frame(class=character(),
nonPALavg=numeric(),
PALavg=numeric(),
Diff=numeric(),
NonPAL_Num= integer(),
PAL_Num=integer(),
CompletePAL=numeric(),
TermPALStart=integer(),
row.names=NULL,
stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 29"
for (i in 1:length(math.classes))
{
curr.class = math.classes[i]
temp = subset(mydat, course==curr.class & course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
temp = subset(temp, term.code>= pal.start)
x=tapply(temp$grd.pt.unt,temp$palN,
mean, na.rm=T) %>%
as.numeric %>%
round(2)
y=table(temp$palN) %>% as.numeric
raw.table[i, 'class' ] = curr.class
raw.table[i, c(2:4,7)]=c(x[1], x[3],x[3]-x[1],
round(y[3]/sum(y),2))
raw.table[i, c(5,6,8)]= c(y[1], y[3], pal.start)
#participation by ethnicity
eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) #remove a few students who started PAL but did not finish
colnames(eth.tables[[i]])<- c("nonPAL","PAL")
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 29`
## palN
## eth.erss nonPAL PAL
## African American 104 29
## Asian 561 165
## Foreign 76 14
## Hispanic 630 191
## Native American 2 2
## Pacific Islander 18 5
## Two or More Races 112 38
## Unknown 84 28
## White 405 121
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)
# percent.eth.within.PAL.and.within.non
# each row is 100%
percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)
percent.PAL.and.non.within.ethnicity
## $`MATH 29`
## palN
## eth.erss nonPAL PAL
## African American 0.78 0.22
## Asian 0.77 0.23
## Foreign 0.84 0.16
## Hispanic 0.77 0.23
## Native American 0.50 0.50
## Pacific Islander 0.78 0.22
## Two or More Races 0.75 0.25
## Unknown 0.75 0.25
## White 0.77 0.23
# formatted table
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
class | nonPALavg | PALavg | Diff | NonPAL_Num | PAL_Num | CompletePAL | TermPALStart |
---|---|---|---|---|---|---|---|
MATH 29 | 2.04 | 2.4 | 0.36 | 2014 | 596 | 0.22 | 2128 |
table(mydat$palN)
##
## 0 1 2
## 4172 53 651
temp = subset(mydat, course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
mydat = subset(temp, term.code>= pal.start)
dim(mydat)
## [1] 2649 174
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
# number of students who completed Math 29 PAL goes from 651 to 596 (students repeating Math 29 removed)
Remove the 26 variables having over 17% of values missing in order to retain a larger pool of PAL and non-PAL students. None of the removed variables appear likely to be confounders. We will only include students with no missing values for the subset of variables specified in the 'Filter to relevant variables' section below.
plot_missing(mydat, missing_only = TRUE)
# Count missing values per variable, keep the columns of mydat with at
# least one missing value, and profile just those columns.
vars_with_missing <- mydat %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  gather() %>%
  arrange(value) %>%
  filter(value != 0)
missing_vars <- profile_missing(mydat %>% dplyr::select(all_of(vars_with_missing$key)))
missing_vars
## feature num_missing pct_missing
## 1 adm.area 1 0.0003775009
## 2 veteran 1 0.0003775009
## 3 hs.eng.units 1 0.0003775009
## 4 hs.math.units 1 0.0003775009
## 5 hs.phys.units 1 0.0003775009
## 6 hs.bio.units 1 0.0003775009
## 7 hs.lab.units 1 0.0003775009
## 8 hs.forLang.units 1 0.0003775009
## 9 hs.socstud.units 1 0.0003775009
## 10 hs.perfarts.units 1 0.0003775009
## 11 hs.elective.units 1 0.0003775009
## 12 ge.critical.thinking.status 1 0.0003775009
## 13 ge.english.comp.status 1 0.0003775009
## 14 ge.math.status 1 0.0003775009
## 15 ge.oral.comm.status 1 0.0003775009
## 16 career.num 1 0.0003775009
## 17 load.code 1 0.0003775009
## 18 bot.level 1 0.0003775009
## 19 enroll.status 1 0.0003775009
## 20 term.gpa 1 0.0003775009
## 21 higher.ed.gpa 1 0.0003775009
## 22 csus.gpa 1 0.0003775009
## 23 withdraw_code 1 0.0003775009
## 24 unt_taken_prgrss 1 0.0003775009
## 25 unt_passd_prgrss 1 0.0003775009
## 26 tot_cumulative 1 0.0003775009
## 27 cur_resident_terms 1 0.0003775009
## 28 cum_resident_terms 1 0.0003775009
## 29 acad.plan 1 0.0003775009
## 30 hous.coh.term.flg 2 0.0007550019
## 31 addr.type 2 0.0007550019
## 32 country 2 0.0007550019
## 33 censusMajor 3 0.0011325028
## 34 campus.gpa 3 0.0011325028
## 35 total.gpa 3 0.0011325028
## 36 term.units.attemptedCensus 3 0.0011325028
## 37 coh 8 0.0030200076
## 38 coh.term 8 0.0030200076
## 39 base.time.coh 8 0.0030200076
## 40 enrl.flg 8 0.0030200076
## 41 enrl.flgERS 8 0.0030200076
## 42 pst.flg 8 0.0030200076
## 43 pst.flgERS 8 0.0030200076
## 44 grad.flg 8 0.0030200076
## 45 grad.flgERS 8 0.0030200076
## 46 rtn.flg 8 0.0030200076
## 47 rtn.flgERS 8 0.0030200076
## 48 ttgF 8 0.0030200076
## 49 maxUnits 8 0.0030200076
## 50 tot_cumulative.start 8 0.0030200076
## 51 stem.final.flg 8 0.0030200076
## 52 state 9 0.0033975085
## 53 acad.stand 9 0.0033975085
## 54 admis.plan 9 0.0033975085
## 55 hs.grad.yr 9 0.0033975085
## 56 pell.coh.term.flg 10 0.0037750094
## 57 stem.base.flg 10 0.0037750094
## 58 eth.erss 25 0.0094375236
## 59 urm.flgERS 25 0.0094375236
## 60 zip 25 0.0094375236
## 61 pass.flg 32 0.0120800302
## 62 pass.flgN 32 0.0120800302
## 63 mother.ed 83 0.0313325783
## 64 m.rmd 124 0.0468101170
## 65 m.rmd.detail 124 0.0468101170
## 66 e.rmd 124 0.0468101170
## 67 e.rmd.detail 124 0.0468101170
## 68 rmd.flg 124 0.0468101170
## 69 rmd.subj.flg 124 0.0468101170
## 70 father.ed 225 0.0849377123
## 71 hs.gpa 225 0.0849377123
## 72 m.rmd.admin 309 0.1166477916
## 73 m.rmd.admin.detail 309 0.1166477916
## 74 sat.math.score 434 0.1638354096
## 75 sat.math.flg 434 0.1638354096
## 76 sat.verbal.score 434 0.1638354096
## 77 sat.verbal.flg 434 0.1638354096
## 78 sat.test.date 434 0.1638354096
## 79 plan.college 634 0.2393355983
## 80 plan.college.desc 634 0.2393355983
## 81 plan.dept 634 0.2393355983
## 82 plan.deptAbbr 634 0.2393355983
## 83 plan.degree 634 0.2393355983
## 84 plan.type 634 0.2393355983
## 85 admit.term 756 0.2853907135
## 86 tot.passd.prgrss.start 824 0.3110607777
## 87 tot.taken.prgrss.start 824 0.3110607777
## 88 treat.section 1604 0.6055115138
## 89 fys.term.code 1897 0.7161192903
## 90 fys.grd 1897 0.7161192903
## 91 fys.rpt.flg 1897 0.7161192903
## 92 pledge.term 1948 0.7353718384
## 93 deg.plan1 2025 0.7644394111
## 94 grad.term 2025 0.7644394111
## 95 ttg 2025 0.7644394111
## 96 grad.termERS 2035 0.7682144205
## 97 trf.gpaADM 2393 0.9033597584
## 98 deg.plan2 2637 0.9954699887
## 99 withdraw_reason 2639 0.9962249906
## 100 Instructor_02 2649 1.0000000000
## 101 deg.plan3 2649 1.0000000000
## 102 deg.plan4 2649 1.0000000000
## 103 deg.plan5 2649 1.0000000000
## 104 deg.plan6 2649 1.0000000000
sum(missing_vars$pct_missing>0.17) # remove 26 variables with >17% missing values, none appear to be confounders
## [1] 26
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.17, "feature"])
dim(mydat)
## [1] 2649 174
mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2649 148
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
Some variables are single-valued and thus cannot be related to the propensity score. There were 11 single-valued variables in this dataset, which were removed.
single.category.vars = sapply(mydat, function(x) length(unique(x))==1)
sum(single.category.vars)
## [1] 11
names(mydat)[single.category.vars]
## [1] "course" "component" "units" "course.numeric"
## [5] "div" "course.seq" "rpt.flg" "c2s"
## [9] "base.time.course" "years" "pass.term.flg"
# remove the 11 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2649 137
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
(Note: 356 of the PAL observations are from learning communities.)
Create new variables: delay since high school and cumulative percent of units passed. Collapse sparse categories and perform other miscellaneous data cleanup. Sparse categories can cause complete separation in logistic regression and are predictive for only a few students.
yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))
mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)
sum(is.na(mydat$delay.from.hs)) #9 missing values
## [1] 9
# remove students who did not complete PAL
dim(mydat)
## [1] 2649 138
mydat=subset(mydat, palN!=1)
dim(mydat)
## [1] 2610 138
# recode palN to 0=nonPAL and 1=PAL, as numeric values
mydat$palN = ifelse(mydat$palN==2, 1, 0)
#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"
mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"
mydat$m.rmd = droplevels(mydat$m.rmd)
mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"
mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"
mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
##
## Not Remedial in Math Remedial in Math
## 2153 338
# create feature, proportion of cumulative units taken that were passes
# 824 of 2649 values missing (see profiling above), do not use this variable
# mydat$cum.percent.units.passed = mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start
# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)
table(mydat$load.code, PAL=mydat$palN)
## PAL
## 0 1
## F 1819 554
## OTHER 195 42
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
## PAL
## 0 1
## OTHER 293 87
## PERM 1721 509
# code instructor as alphabetic letter for anonymity
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))
instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)
mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))
key.instr.code = cbind(as.character(instructor.vec),
1:num.instr)
Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It is important to include all variables correlated with the outcome, even if it is uncertain whether they are related to the likelihood of enrolling in PAL, because doing so allows a more precise estimate of the treatment effect.
myvars = c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed',
           'm.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',
           'AP_CALAB','AP_CALAB.flg','median.income','med.inc.flg','grd.pt.unt',
           'load.code','bot.level','censusMajor','term.units.attemptedCensus',
           'stem.base.flg','hous.coh.term.flg','enroll.status','palN',
           'pell.coh.term.flg','course.age','acad.stand','delay.from.hs',
           'addr.type','higher.ed.gpa.start','adm.area','higher.ed.gpa.start.flg')
mydat = mydat[,myvars]
# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd',
                'AP_CALAB.flg','load.code','bot.level','censusMajor',
                'stem.base.flg','hous.coh.term.flg','enroll.status',
                'pell.coh.term.flg','acad.stand','addr.type','adm.area',
                'higher.ed.gpa.start.flg')
mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)
plot_missing(mydat, missing_only = TRUE)
table(mydat$palN)
##
## 0 1
## 2014 596
mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 1982 33
table(mydat$palN)
##
## 0 1
## 1507 475
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sum(sep.cols) # 2 variables causing complete separation
## [1] 2
names(mydat)[sep.cols]
## [1] "censusMajor" "enroll.status"
# collapse censusMajor by course since the degree of collapsing will differ markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.30, update = T)
with(mydat, table(censusMajor, palN))
## palN
## censusMajor 0 1
## Chemistry 177 51
## Civil Engineering 166 74
## Computer Engineering 157 66
## Computer Science 231 67
## Mechanical Engineering 278 97
## OTHER 498 120
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
## palN
## eth.erss 0 1
## African American 86 26
## Asian 455 141
## Hispanic 486 153
## OTHER 103 35
## Two or More Races 83 32
## White 294 88
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))
# remove the original variables that were recoded above
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)
## [1] 0
names(mydat)[sep.cols]
## character(0)
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]
Three potential confounders were subjectively identified and forced into the model: sat.math.score, gender, and eth.erss. Stepwise variable selection was then used to choose which of the remaining variables in the PAL dataset to include in the propensity model. A separate model was built for each course so that the relevant predictors could be course-specific.
# higher.ed.gpa.start.flg is single-valued, remove
min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
biggest.formula = formula(glm(palN ~ sat.math.score+gender+eth.erss+coh+fys.flg+
    mother.ed+e.rmd+m.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+
    median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+
    hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+
    higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor,
    data=mydat, family="binomial"))
step.model = step(min.model,
                  direction = "forward",
                  scope = biggest.formula,
                  trace = FALSE, k = 2)  # k = 2 corresponds to AIC-based selection
step.formula = formula(step.model)
#Propensity scores
propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values
# Match each PAL student to the non-PAL student(s) with the closest propensity
# score (ATT estimand), within a caliper of 0.25 SDs of the propensity score
match.math = Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,
                   BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)
Check whether matching resulted in good covariate balance between the PAL and non-PAL groups. Covariate balance is reasonable: the absolute standardized mean difference is below 0.1 for every covariate in the propensity model.
m29.bal= bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m29.bal
## Balance Measures
## Type Diff.Un Diff.Adj
## sat.math.score Contin. 0.0329 0.0343
## gender_Male Binary 0.0312 0.0273
## eth.erss_African American Binary -0.0023 0.0068
## eth.erss_Asian Binary -0.0051 -0.0107
## eth.erss_Hispanic Binary -0.0004 0.0158
## eth.erss_OTHER Binary 0.0053 -0.0048
## eth.erss_Two or More Races Binary 0.0123 -0.0006
## eth.erss_White Binary -0.0098 -0.0065
## course.age Contin. -0.3118 0.0143
## acad.stand.Good.indicator Binary 0.0430 0.0017
## median.income Contin. 0.1237 -0.0320
## fys.flg_FYS Binary 0.0064 -0.0052
## fys.flg_FYS_EOP Binary -0.0261 -0.0144
## fys.flg_FYS_LCOM Binary -0.0387 0.0106
## fys.flg_No FYS Binary 0.0583 0.0090
##
## Sample sizes
## Control Treated
## All 1507. 475
## Matched (ESS) 443.97 475
## Matched (Unweighted) 850. 475
## Unmatched 657. 0
love.plot(m29.bal,binary = "raw", stars = "std", var.order = "unadjusted",
thresholds = c(m = .1), abs = F)
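For reference, bal.tab standardizes the mean difference of each continuous covariate by a standard deviation; for the ATT estimand, cobalt's default denominator is the standard deviation in the treated group. A minimal hand computation of the unadjusted value (a sketch; smd is a helper defined here, not part of cobalt):
# Minimal sketch: unadjusted standardized mean difference for a continuous
# covariate, with the treated-group SD in the denominator (cobalt's default
# for the ATT estimand)
smd = function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}
smd(mydat$course.age, mydat$palN)  # should be close to the -0.3118 Diff.Un above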
The estimated increase in the mean grade of students in PAL over those not in PAL, after correcting for self-selection bias, is shown below.
est.effect=summary(match.math)
##
## Estimate... 0.33988
## AI SE...... 0.07677
## T-stat..... 4.4272
## p.val...... 9.5468e-06
##
## Original number of observations.............. 1982
## Original number of treated obs............... 475
## Matched number of observations............... 475
## Matched number of observations (unweighted). 1401
##
## Caliper (SDs)........................................ 0.25
## Number of obs dropped by 'exact' or 'caliper' 0
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights
mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)
diff.means = match.math$est
std.error = match.math$se
num.uniq.contrl.matched = length(unique(match.math$index.control))
num.trt = match.math$wnobs
approx.p.val = (1-pnorm(diff.means/std.error)) #right-tail
results = data.frame(course = c("Math 29"),
                     mean.gpa.PAL = round(mean.trt, 2),
                     mean.gpa.nonPAL = round(mean.cntrl, 2),
                     difference.mean.gpa = round(diff.means, 2),
                     standard.error = round(std.error, 2),
                     p.val = round(approx.p.val, 4),
                     number.control = num.uniq.contrl.matched,
                     number.treated = num.trt)
kable(results, caption = "Propensity-scored Adjusted Comparison of PAL and non-PAL Grades")
course | mean.gpa.PAL | mean.gpa.nonPAL | difference.mean.gpa | standard.error | p.val | number.control | number.treated |
---|---|---|---|---|---|---|---|
Math 29 | 2.39 | 2.05 | 0.34 | 0.08 | 0 | 850 | 475 |
print(psens(match.math, Gamma = 2, GammaInc = 0.1))
##
## Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
##
## Unconfounded estimate .... 0
##
## Gamma Lower bound Upper bound
## 1.0 0 0.0000
## 1.1 0 0.0000
## 1.2 0 0.0003
## 1.3 0 0.0121
## 1.4 0 0.1254
## 1.5 0 0.4524
## 1.6 0 0.8003
## 1.7 0 0.9597
## 1.8 0 0.9954
## 1.9 0 0.9997
## 2.0 0 1.0000
##
## Note: Gamma is Odds of Differential Assignment To
## Treatment Due to Unobserved Factors
##
Note that in the above table, Gamma = 1.4 marks the first row where 0.05 falls between the lower and upper bounds. This means that an unobserved confounder that increases the odds of enrolling in PAL by a factor of more than 1.4 would be enough to change the treatment effect from significant to non-significant. The next code block shows the effect of each variable in the propensity model on the odds of PAL enrollment (as odds ratios). If an unknown confounder had an effect on the propensity score similar to that of course.age (age at the beginning of the course), the PAL effect would become non-significant; this finding is therefore sensitive to unknown confounders. For example, a variable not in our dataset, such as the number of hours per week a student works, could be a confounder strong enough to reverse the statistical significance of this analysis.
print(sort(exp(abs(propensity.model$coefficients))))
## median.income sat.math.score
## 1.000006 1.001313
## eth.erssAsian fys.flgNo FYS
## 1.025765 1.028264
## eth.erssWhite eth.erssHispanic
## 1.044620 1.063914
## genderMale eth.erssOTHER
## 1.078107 1.152969
## eth.erssTwo or More Races fys.flgFYS_LCOM
## 1.218335 1.287753
## course.age acad.stand.Good.indicator1
## 1.361760 1.577097
## fys.flgFYS_EOP (Intercept)
## 2.257320 103.579026
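The critical Gamma can also be pulled out of the psens result programmatically instead of being read off the table. A minimal sketch, assuming (as in current versions of rbounds) that psens returns its table in a $bounds data frame with columns Gamma, 'Lower bound', and 'Upper bound':
# Minimal sketch: smallest Gamma at which the upper-bound p-value exceeds
# 0.05, i.e., the confounder strength that could overturn significance
sens.out = psens(match.math, Gamma = 2, GammaInc = 0.1)
min(sens.out$bounds$Gamma[sens.out$bounds[["Upper bound"]] > 0.05])  # 1.4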