A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades in Math 29. Propensity score adjustment was necessary because the data are observational: students who voluntarily enroll in PAL may differ from students who do not in ways that, independently of PAL, affect course grade. In propensity score analysis, variables related to both the likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's estimated probability of enrolling in PAL.
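As a toy illustration of the idea (hypothetical data, not the model fit later in this script), the propensity score is simply the fitted probability from a logistic regression of the treatment indicator on confounders:
# Toy illustration (hypothetical data): the propensity score is the fitted
# probability of treatment from a logistic regression on confounders.
set.seed(123)
toy = data.frame(hs.gpa   = runif(200, 2, 4),
                 enrolled = rbinom(200, 1, 0.3))
ps.fit = glm(enrolled ~ hs.gpa, data = toy, family = binomial)
toy$propensity = fitted(ps.fit)   # each student's estimated P(enroll)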
First Add Some R Packages to the Workspace

Caution: warning messages are suppressed to reduce clutter in the output.
defaultW <- getOption("warn")
options(warn = -1)
library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)
options(warn = defaultW)
The script sourced below contains a user-defined function, id.vars.zero.freq, to identify variables causing complete separation in logistic regression.
source("IdentifyCompleteSepVars2.R")
Make sure the PAL data file is in the same directory as this RMarkdown file. The file, which includes data through the Spring 2019 semester, has 1,099,371 rows and 174 columns. The total of the grd.pt.unt column is 2,237,555.
load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371 174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555
Each instance of a course taken by a student is included in the data file for the semester during which it was taken and for every semester thereafter. 'test1' below keeps only the record from the semester during which the course was taken, and 'test2' keeps only Math 29 records. The Math 29 dataset should contain 4876 rows and 174 columns.
test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 29"
math.indx <- test1 & test2
mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 4876 174
table(mydat$palN) # 651 Math 29 students completed PAL
##
## 0 1 2
## 4172 53 651
The course.seq variable indicates how many times a student has taken a course prior to the current attempt. To filter to first attempts at the course, we set course.seq to 0.
raw.table = data.frame(class=character(),
nonPALavg=numeric(),
PALavg=numeric(),
Diff=numeric(),
NonPAL_Num= integer(),
PAL_Num=integer(),
CompletePAL=numeric(),
TermPALStart=integer(),
row.names=NULL,
stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 29"
for (i in 1:length(math.classes))
{
curr.class = math.classes[i]
temp = subset(mydat, course==curr.class & course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
temp = subset(temp, term.code>= pal.start)
x=tapply(temp$grd.pt.unt,temp$palN,
mean, na.rm=T) %>%
as.numeric %>%
round(2)
y=table(temp$palN) %>% as.numeric
raw.table[i, 'class' ] = curr.class
raw.table[i, c(2:4,7)]=c(x[1], x[3],x[3]-x[1],
round(y[3]/sum(y),2))
raw.table[i, c(5,6,8)]= c(y[1], y[3], pal.start)
#participation by ethnicity
eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) #remove a few students who started PAL but did not finish
colnames(eth.tables[[i]])<- c("nonPAL","PAL")
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 29`
## palN
## eth.erss nonPAL PAL
## African American 104 29
## Asian 561 165
## Foreign 76 14
## Hispanic 630 191
## Native American 2 2
## Pacific Islander 18 5
## Two or More Races 112 38
## Unknown 84 28
## White 405 121
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)
# percent.eth.within.PAL.and.within.non
# each row is 100%
percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)
percent.PAL.and.non.within.ethnicity
## $`MATH 29`
## palN
## eth.erss nonPAL PAL
## African American 0.78 0.22
## Asian 0.77 0.23
## Foreign 0.84 0.16
## Hispanic 0.77 0.23
## Native American 0.50 0.50
## Pacific Islander 0.78 0.22
## Two or More Races 0.75 0.25
## Unknown 0.75 0.25
## White 0.77 0.23
# formatted table
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
class | nonPALavg | PALavg | Diff | NonPAL_Num | PAL_Num | CompletePAL | TermPALStart |
---|---|---|---|---|---|---|---|
MATH 29 | 2.04 | 2.4 | 0.36 | 2014 | 596 | 0.22 | 2128 |
table(mydat$palN)
##
## 0 1 2
## 4172 53 651
temp = subset(mydat, course.seq==0)
pal.start=min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
mydat = subset(temp, term.code>= pal.start)
dim(mydat)
## [1] 2649 174
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
# number of students who completed Math 29 PAL goes from 651 to 596 (students repeating Math 29 removed)
Remove the 26 variables having over 17% of values missing in order to retain a larger pool of PAL and non-PAL students. None of the removed variables appear likely to be confounders. We will only include students with no missing values for the subset of variables specified in the 'Filter to relevant variables' section below.
plot_missing(mydat, missing_only = TRUE)
# Count missing values per variable, keep the columns of mydat with at
# least one missing value, and profile just those columns.
vars_with_missing <- mydat %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  gather() %>%
  arrange(value) %>%
  filter(value != 0)
missing_vars <- profile_missing(mydat %>% dplyr::select(all_of(vars_with_missing$key)))
missing_vars
## feature num_missing pct_missing
## 1 adm.area 1 0.0003775009
## 2 veteran 1 0.0003775009
## 3 hs.eng.units 1 0.0003775009
## 4 hs.math.units 1 0.0003775009
## 5 hs.phys.units 1 0.0003775009
## 6 hs.bio.units 1 0.0003775009
## 7 hs.lab.units 1 0.0003775009
## 8 hs.forLang.units 1 0.0003775009
## 9 hs.socstud.units 1 0.0003775009
## 10 hs.perfarts.units 1 0.0003775009
## 11 hs.elective.units 1 0.0003775009
## 12 ge.critical.thinking.status 1 0.0003775009
## 13 ge.english.comp.status 1 0.0003775009
## 14 ge.math.status 1 0.0003775009
## 15 ge.oral.comm.status 1 0.0003775009
## 16 career.num 1 0.0003775009
## 17 load.code 1 0.0003775009
## 18 bot.level 1 0.0003775009
## 19 enroll.status 1 0.0003775009
## 20 term.gpa 1 0.0003775009
## 21 higher.ed.gpa 1 0.0003775009
## 22 csus.gpa 1 0.0003775009
## 23 withdraw_code 1 0.0003775009
## 24 unt_taken_prgrss 1 0.0003775009
## 25 unt_passd_prgrss 1 0.0003775009
## 26 tot_cumulative 1 0.0003775009
## 27 cur_resident_terms 1 0.0003775009
## 28 cum_resident_terms 1 0.0003775009
## 29 acad.plan 1 0.0003775009
## 30 hous.coh.term.flg 2 0.0007550019
## 31 addr.type 2 0.0007550019
## 32 country 2 0.0007550019
## 33 censusMajor 3 0.0011325028
## 34 campus.gpa 3 0.0011325028
## 35 total.gpa 3 0.0011325028
## 36 term.units.attemptedCensus 3 0.0011325028
## 37 coh 8 0.0030200076
## 38 coh.term 8 0.0030200076
## 39 base.time.coh 8 0.0030200076
## 40 enrl.flg 8 0.0030200076
## 41 enrl.flgERS 8 0.0030200076
## 42 pst.flg 8 0.0030200076
## 43 pst.flgERS 8 0.0030200076
## 44 grad.flg 8 0.0030200076
## 45 grad.flgERS 8 0.0030200076
## 46 rtn.flg 8 0.0030200076
## 47 rtn.flgERS 8 0.0030200076
## 48 ttgF 8 0.0030200076
## 49 maxUnits 8 0.0030200076
## 50 tot_cumulative.start 8 0.0030200076
## 51 stem.final.flg 8 0.0030200076
## 52 state 9 0.0033975085
## 53 acad.stand 9 0.0033975085
## 54 admis.plan 9 0.0033975085
## 55 hs.grad.yr 9 0.0033975085
## 56 pell.coh.term.flg 10 0.0037750094
## 57 stem.base.flg 10 0.0037750094
## 58 eth.erss 25 0.0094375236
## 59 urm.flgERS 25 0.0094375236
## 60 zip 25 0.0094375236
## 61 pass.flg 32 0.0120800302
## 62 pass.flgN 32 0.0120800302
## 63 mother.ed 83 0.0313325783
## 64 m.rmd 124 0.0468101170
## 65 m.rmd.detail 124 0.0468101170
## 66 e.rmd 124 0.0468101170
## 67 e.rmd.detail 124 0.0468101170
## 68 rmd.flg 124 0.0468101170
## 69 rmd.subj.flg 124 0.0468101170
## 70 father.ed 225 0.0849377123
## 71 hs.gpa 225 0.0849377123
## 72 m.rmd.admin 309 0.1166477916
## 73 m.rmd.admin.detail 309 0.1166477916
## 74 sat.math.score 434 0.1638354096
## 75 sat.math.flg 434 0.1638354096
## 76 sat.verbal.score 434 0.1638354096
## 77 sat.verbal.flg 434 0.1638354096
## 78 sat.test.date 434 0.1638354096
## 79 plan.college 634 0.2393355983
## 80 plan.college.desc 634 0.2393355983
## 81 plan.dept 634 0.2393355983
## 82 plan.deptAbbr 634 0.2393355983
## 83 plan.degree 634 0.2393355983
## 84 plan.type 634 0.2393355983
## 85 admit.term 756 0.2853907135
## 86 tot.passd.prgrss.start 824 0.3110607777
## 87 tot.taken.prgrss.start 824 0.3110607777
## 88 treat.section 1604 0.6055115138
## 89 fys.term.code 1897 0.7161192903
## 90 fys.grd 1897 0.7161192903
## 91 fys.rpt.flg 1897 0.7161192903
## 92 pledge.term 1948 0.7353718384
## 93 deg.plan1 2025 0.7644394111
## 94 grad.term 2025 0.7644394111
## 95 ttg 2025 0.7644394111
## 96 grad.termERS 2035 0.7682144205
## 97 trf.gpaADM 2393 0.9033597584
## 98 deg.plan2 2637 0.9954699887
## 99 withdraw_reason 2639 0.9962249906
## 100 Instructor_02 2649 1.0000000000
## 101 deg.plan3 2649 1.0000000000
## 102 deg.plan4 2649 1.0000000000
## 103 deg.plan5 2649 1.0000000000
## 104 deg.plan6 2649 1.0000000000
sum(missing_vars$pct_missing>0.17) # remove 26 variables with >17% missing values, none appear to be confounders
## [1] 26
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.17, "feature"])
dim(mydat)
## [1] 2649 174
mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2649 148
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
Some variables are single-valued and thus cannot be related to the propensity score. There were 11 single-valued variables in this dataset, which were removed.
single.category.vars = sapply(mydat, function(x) length(unique(x))==1)
sum(single.category.vars)
## [1] 11
names(mydat)[single.category.vars]
## [1] "course" "component" "units" "course.numeric"
## [5] "div" "course.seq" "rpt.flg" "c2s"
## [9] "base.time.course" "years" "pass.term.flg"
# remove the 11 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2649 137
table(mydat$palN)
##
## 0 1 2
## 2014 39 596
(Note: 356 of the PAL observations are from learning communities.)
Create new variables: delay since high school and cumulative percent of units passed. Collapse sparse categories and perform other miscellaneous data cleanup. Sparse categories can cause complete separation in logistic regression and are predictive for only a few students.
yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))
mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)
sum(is.na(mydat$delay.from.hs)) #9 missing values
## [1] 9
# remove students who did not complete PAL
dim(mydat)
## [1] 2649 138
mydat=subset(mydat, palN!=1)
dim(mydat)
## [1] 2610 138
# recode palN to 0=nonPAL and 1=PAL, as numeric values
mydat$palN = ifelse(mydat$palN==2, 1, 0)
#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"
mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"
mydat$m.rmd = droplevels(mydat$m.rmd)
mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"
mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"
mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
##
## Not Remedial in Math Remedial in Math
## 2153 338
# create feature, proportion of cumulative units taken that were passes
# 824 of 2649 values missing (see profiling above), do not use this variable
# mydat$cum.percent.units.passed = mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start
# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)
table(mydat$load.code, PAL=mydat$palN)
## PAL
## 0 1
## F 1819 554
## OTHER 195 42
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
## PAL
## 0 1
## OTHER 293 87
## PERM 1721 509
# code instructor as alphabetic letter for anonymity
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))
instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)
mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))
key.instr.code = cbind(as.character(instructor.vec),
1:num.instr)
Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It is important to include all variables correlated with the outcome, even if it is uncertain whether they are related to the likelihood of enrolling in PAL, because doing so allows a more precise estimate of the treatment effect.
myvars = c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed',
           'm.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',
           'AP_CALAB','AP_CALAB.flg','median.income','med.inc.flg','grd.pt.unt',
           'load.code','bot.level','censusMajor','term.units.attemptedCensus',
           'stem.base.flg','hous.coh.term.flg','enroll.status','palN',
           'pell.coh.term.flg','course.age','acad.stand','delay.from.hs',
           'addr.type','higher.ed.gpa.start','adm.area','higher.ed.gpa.start.flg')
mydat = mydat[,myvars]
# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd',
                'AP_CALAB.flg','load.code','bot.level','censusMajor',
                'stem.base.flg','hous.coh.term.flg','enroll.status',
                'pell.coh.term.flg','acad.stand','addr.type','adm.area',
                'higher.ed.gpa.start.flg')
mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)
plot_missing(mydat, missing_only = TRUE)
table(mydat$palN)
##
## 0 1
## 2014 596
mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 1982 33
table(mydat$palN)
##
## 0 1
## 1507 475
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sum(sep.cols) # 2 variables causing complete separation
## [1] 2
names(mydat)[sep.cols]
## [1] "censusMajor" "enroll.status"
# collapse censusMajor by course since the degree of collapsing will differ markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.30, update = T)
with(mydat, table(censusMajor, palN))
## palN
## censusMajor 0 1
## Chemistry 177 51
## Civil Engineering 166 74
## Computer Engineering 157 66
## Computer Science 231 67
## Mechanical Engineering 278 97
## OTHER 498 120
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
## palN
## eth.erss 0 1
## African American 86 26
## Asian 455 141
## Hispanic 486 153
## OTHER 103 35
## Two or More Races 83 32
## White 294 88
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))
# remove the original variables that were recoded above
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]
sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)
## [1] 0
names(mydat)[sep.cols]
## character(0)
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]
Three potential confounders were subjectively identified and forced into the model: sat.math.score, gender, and eth.erss. Stepwise variable selection was then used to choose which of the remaining variables in the PAL dataset to include in the propensity model. A separate model was built for each course so that the relevant predictors could be course-specific.
# higher.ed.gpa.start.flg is single-valued, remove
min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
biggest.formula = formula(glm(palN ~ sat.math.score+gender+eth.erss+coh+fys.flg+
    mother.ed+e.rmd+m.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+
    median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+
    hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+
    higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor,
    data=mydat, family="binomial"))
step.model = step(min.model,
                  direction = "forward",
                  scope = biggest.formula,
                  trace = FALSE, k = 2)  # k = 2 corresponds to AIC-based selection
step.formula = formula(step.model)
#Propensity scores
propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values
# Match each PAL student to the non-PAL student(s) with the closest propensity
# score (ATT estimand), within a caliper of 0.25 SDs of the propensity score
match.math = Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,
                   BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)
Check whether matching resulted in good covariate balance between the PAL and non-PAL groups. Covariate balance is reasonable: the absolute standardized mean difference is below 0.1 for every covariate in the propensity model.
m29.bal= bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m29.bal
## Balance Measures
## Type Diff.Un Diff.Adj
## sat.math.score Contin. 0.0329 0.0343
## gender_Male Binary 0.0312 0.0273
## eth.erss_African American Binary -0.0023 0.0068
## eth.erss_Asian Binary -0.0051 -0.0107
## eth.erss_Hispanic Binary -0.0004 0.0158
## eth.erss_OTHER Binary 0.0053 -0.0048
## eth.erss_Two or More Races Binary 0.0123 -0.0006
## eth.erss_White Binary -0.0098 -0.0065
## course.age Contin. -0.3118 0.0143
## acad.stand.Good.indicator Binary 0.0430 0.0017
## median.income Contin. 0.1237 -0.0320
## fys.flg_FYS Binary 0.0064 -0.0052
## fys.flg_FYS_EOP Binary -0.0261 -0.0144
## fys.flg_FYS_LCOM Binary -0.0387 0.0106
## fys.flg_No FYS Binary 0.0583 0.0090
##
## Sample sizes
## Control Treated
## All 1507. 475
## Matched (ESS) 443.97 475
## Matched (Unweighted) 850. 475
## Unmatched 657. 0
love.plot(m29.bal,binary = "raw", stars = "std", var.order = "unadjusted",
thresholds = c(m = .1), abs = F)
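For reference, bal.tab standardizes the mean difference of each continuous covariate by a standard deviation; for the ATT estimand, cobalt's default denominator is the standard deviation in the treated group. A minimal hand computation of the unadjusted value (a sketch; smd is a helper defined here, not part of cobalt):
# Minimal sketch: unadjusted standardized mean difference for a continuous
# covariate, with the treated-group SD in the denominator (cobalt's default
# for the ATT estimand)
smd = function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}
smd(mydat$course.age, mydat$palN)  # should be close to the -0.3118 Diff.Un above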
The estimated increase in the mean grade of students in PAL over those not in PAL, after correcting for self-selection bias, is shown below.
est.effect=summary(match.math)
##
## Estimate... 0.33988
## AI SE...... 0.07677
## T-stat..... 4.4272
## p.val...... 9.5468e-06
##
## Original number of observations.............. 1982
## Original number of treated obs............... 475
## Matched number of observations............... 475
## Matched number of observations (unweighted). 1401
##
## Caliper (SDs)........................................ 0.25
## Number of obs dropped by 'exact' or 'caliper' 0
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights
mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)
diff.means = match.math$est
std.error = match.math$se
num.uniq.contrl.matched = length(unique(match.math$index.control))
num.trt = match.math$wnobs
approx.p.val = (1-pnorm(diff.means/std.error)) #right-tail
results = data.frame(course = c("Math 29"),
                     mean.gpa.PAL = round(mean.trt, 2),
                     mean.gpa.nonPAL = round(mean.cntrl, 2),
                     difference.mean.gpa = round(diff.means, 2),
                     standard.error = round(std.error, 2),
                     p.val = round(approx.p.val, 4),
                     number.control = num.uniq.contrl.matched,
                     number.treated = num.trt)
kable(results, caption = "Propensity-scored Adjusted Comparison of PAL and non-PAL Grades")
course | mean.gpa.PAL | mean.gpa.nonPAL | difference.mean.gpa | standard.error | p.val | number.control | number.treated |
---|---|---|---|---|---|---|---|
Math 29 | 2.39 | 2.05 | 0.34 | 0.08 | 0 | 850 | 475 |
print(psens(match.math, Gamma = 2, GammaInc = 0.1))
##
## Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
##
## Unconfounded estimate .... 0
##
## Gamma Lower bound Upper bound
## 1.0 0 0.0000
## 1.1 0 0.0000
## 1.2 0 0.0003
## 1.3 0 0.0121
## 1.4 0 0.1254
## 1.5 0 0.4524
## 1.6 0 0.8003
## 1.7 0 0.9597
## 1.8 0 0.9954
## 1.9 0 0.9997
## 2.0 0 1.0000
##
## Note: Gamma is Odds of Differential Assignment To
## Treatment Due to Unobserved Factors
##
Note that in the above table, Gamma = 1.4 marks the first row where 0.05 falls between the lower and upper bounds. This means that an unobserved confounder that increases the odds of enrolling in PAL by a factor of more than 1.4 would be enough to change the treatment effect from significant to non-significant. The next code block shows the effect of each variable in the propensity model on the odds of PAL enrollment (as odds ratios). If an unknown confounder had an effect on the propensity score similar to that of course.age (age at the beginning of the course), the PAL effect would become non-significant; this finding is therefore sensitive to unknown confounders. For example, a variable not in our dataset, such as the number of hours per week a student works, could be a confounder strong enough to reverse the statistical significance of this analysis.
print(sort(exp(abs(propensity.model$coefficients))))
## median.income sat.math.score
## 1.000006 1.001313
## eth.erssAsian fys.flgNo FYS
## 1.025765 1.028264
## eth.erssWhite eth.erssHispanic
## 1.044620 1.063914
## genderMale eth.erssOTHER
## 1.078107 1.152969
## eth.erssTwo or More Races fys.flgFYS_LCOM
## 1.218335 1.287753
## course.age acad.stand.Good.indicator1
## 1.361760 1.577097
## fys.flgFYS_EOP (Intercept)
## 2.257320 103.579026
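The critical Gamma can also be pulled out of the psens result programmatically instead of being read off the table. A minimal sketch, assuming (as in current versions of rbounds) that psens returns its table in a $bounds data frame with columns Gamma, 'Lower bound', and 'Upper bound':
# Minimal sketch: smallest Gamma at which the upper-bound p-value exceeds
# 0.05, i.e., the confounder strength that could overturn significance
sens.out = psens(match.math, Gamma = 2, GammaInc = 0.1)
min(sens.out$bounds$Gamma[sens.out$bounds[["Upper bound"]] > 0.05])  # 1.4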