Executive Summary

A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades in Math 31. Propensity score adjustment was necessary because the data are observational: students who voluntarily enroll in PAL may differ from students who do not in ways that, independently of PAL, affect course grade. In propensity score analysis, variables related to both the likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's estimated probability of enrolling in PAL.
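As a minimal sketch of this idea (using a hypothetical data frame and hypothetical confounders; the actual model for this analysis is built in the 'Build a Logistic Regression Model for Propensity Score' section below):

# hypothetical sketch: the propensity score is the fitted probability from a
# logistic regression of treatment status on confounders
ps.model <- glm(pal ~ hs.gpa + sat.math.score + gender,
                data = students, family = binomial)
students$propensity <- ps.model$fitted.values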

Data Analysis

First, Add Some R Packages to the Workspace

Caution: warning messages are suppressed to reduce clutter in the output.

# save the current warning setting, then suppress warnings while loading packages
defaultW <- getOption("warn") 
options(warn = -1) 


library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)

options(warn = defaultW)  # restore the original warning setting

Source a script

The script contains a user-defined function, id.vars.zero.freq, that identifies variables causing complete separation in logistic regression (i.e., variables with a category whose members all fall into a single treatment group).

source("IdentifyCompleteSepVars2.R")
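The script itself is not reproduced in this report, but a hypothetical sketch of the logic such a function presumably implements (an assumption, not the actual code) is:

# Hypothetical sketch of id.vars.zero.freq's logic: for each categorical
# variable, cross-tabulate its levels against the treatment column and flag
# the variable if any level has zero frequency in either treatment group.
id.vars.zero.freq.sketch <- function(dat, treat.col) {
  sapply(seq_along(dat), function(j) {
    x <- dat[[j]]
    if (!is.factor(x) && !is.character(x)) return(FALSE)
    any(table(x, dat[[treat.col]]) == 0)
  })
}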

Import the Data

Make sure the PAL data file is at the location referenced in the load() command below. The file, which includes data through the Spring 2019 semester, has 1,099,371 rows and 174 columns. The sum of the grd.pt.unt column is 2,237,555.

load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371     174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555

Subset Data for Math 31

Each instance of a course taken by a student is included in the data file for the semester during which it was taken and for every semester thereafter. The logical vector 'test1' below filters the dataset so it only includes each course record for the semester during which the course was taken.

test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 31"

math.indx <- test1 & test2

mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 5654  174
table(mydat$palN)  
## 
##    0    1    2 
## 5153   30  471

Compare the Mean GPA of PAL and non-PAL Students in Math 31 without Propensity Score Adjustment

The course.seq variable indicates how many times a student has taken a course prior to the current attempt. To filter to first attempts only, we set course.seq to 0.

raw.table = data.frame(class=character(),
                       nonPALavg=numeric(),
                       PALavg=numeric(), 
                       Diff=numeric(), 
                       NonPAL_Num= integer(),
                       PAL_Num=integer(),
                       CompletePAL=numeric(),
                       TermPALStart=integer(),
                       row.names=NULL,
                       stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 31"
for (i in 1:length(math.classes)) {
      curr.class = math.classes[i]
      temp = subset(mydat, course==curr.class & course.seq==0)
      pal.start=min(unique(temp$term.code[temp$palN==2]))
      # only include terms after PAL start term
      temp = subset(temp, term.code>= pal.start)
      
      # mean course grade by PAL status; palN codes:
      # 0 = non-PAL, 1 = enrolled in PAL but did not complete, 2 = completed PAL
      x=tapply(temp$grd.pt.unt, temp$palN, 
               mean, na.rm=T) %>% 
        as.numeric %>% 
        round(2)
      
      y=table(temp$palN) %>% as.numeric
      
      # x[1]/y[1] refer to non-PAL students; x[3]/y[3] to PAL completers
      raw.table[i, 'class' ] = curr.class
      raw.table[i, c(2:4,7)]=c(x[1], x[3], x[3]-x[1],
                        round(y[3]/sum(y),2))
      raw.table[i, c(5,6,8)]= c(y[1], y[3], pal.start)
      
      #participation by ethnicity
      eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) #remove a few students who started PAL but did not finish
      colnames(eth.tables[[i]])<- c("nonPAL","PAL")
     
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 31`
##                    palN
## eth.erss            nonPAL PAL
##   African American      63  15
##   Asian                598 112
##   Foreign               68  19
##   Hispanic             536 119
##   Native American        5   1
##   Pacific Islander      10   6
##   Two or More Races    106  28
##   Unknown               88  17
##   White                524  82
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)

# percent.eth.within.PAL.and.within.non

# each row is 100%
percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)

percent.PAL.and.non.within.ethnicity
## $`MATH 31`
##                    palN
## eth.erss            nonPAL  PAL
##   African American    0.81 0.19
##   Asian               0.84 0.16
##   Foreign             0.78 0.22
##   Hispanic            0.82 0.18
##   Native American     0.83 0.17
##   Pacific Islander    0.62 0.38
##   Two or More Races   0.79 0.21
##   Unknown             0.84 0.16
##   White               0.86 0.14
# formatted table
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)

class     nonPALavg   PALavg   Diff   NonPAL_Num   PAL_Num   CompletePAL   TermPALStart
MATH 31        2.27     2.57    0.3         2034       401          0.16           2148

Remove students repeating Math 31 and data from before PAL for Math 31 started

table(mydat$palN)
## 
##    0    1    2 
## 5153   30  471
temp = subset(mydat, course.seq==0)
pal.start = min(unique(temp$term.code[temp$palN==2]))
# only include terms after the PAL start term
mydat = subset(temp, term.code >= pal.start)
dim(mydat)
## [1] 2460  174
table(mydat$palN) 
## 
##    0    1    2 
## 2034   25  401
# the number of students who completed Math 31 PAL drops from 471 to 401
# (students repeating Math 31 removed)

Remove students enrolled in Math 31L

It would be inaccurate to include these students in the control group since they are receiving a form of supplemental instruction (SI). However, the SI model for Math 30L and 31L differs from PAL, so these students are not included in the treatment group either; rather, they are removed from the analysis entirely. This removes 116 rows.

library(readxl)

Math_30L_31L <- read_excel("Math 30L-31L.xlsx")
Math_30L_31L$Semester.Number = as.numeric(Math_30L_31L$Semester.Number)

#  Extract Lecture Course name, i.e. 'Math 31' for join
Math_30L_31L$Course.Lecture = substr(Math_30L_31L$Course, start = 1, stop=7)

#exclude rows that match from the Math 30L and 31L dataframe
table(mydat$palN)
## 
##    0    1    2 
## 2034   25  401
mydat = anti_join(mydat, Math_30L_31L, by=c("emplid"="Emplid","term.code"="Semester.Number", "course"="Course.Lecture"))
table(mydat$palN)
## 
##    0    1    2 
## 1918   25  401
dim(mydat)
## [1] 2344  174

Analyze missingness

Seventeen variables with over 25% of their values missing were removed in order to retain a larger pool of PAL and non-PAL students. The threshold was chosen with the need to keep the sat.math variable in mind, since the dataset contains no other proxy for math preparation/ability. None of the removed variables seem likely to be confounders. Only students with no missing values for the variables specified in the 'Filter to relevant variables' section below are included in the analysis.

plot_missing(mydat, missing_only = TRUE)

# count the missing values in every column; across() appends "_1" to each
# variable name, which is stripped off below
zero_missing_vars <- mydat %>% 
  summarise(across(everything(), list(function(x) sum(is.na(x))))) %>%
  gather() %>%
  arrange(value)

# keep only the variables that have at least one missing value
zero_missing_vars <- zero_missing_vars %>%
  filter(value != 0)

# strip the "_1" suffix to recover the original variable names
zero_missing_vars$var.name = substr(zero_missing_vars$key, 1, nchar(zero_missing_vars$key)-2)

zero_missing_vars <- mydat %>%
  dplyr::select(zero_missing_vars$var.name)

missing_vars <- profile_missing(zero_missing_vars)
missing_vars
##                         feature num_missing  pct_missing
## 1                        gender           1 0.0004266212
## 2             hous.coh.term.flg           5 0.0021331058
## 3                   censusMajor           7 0.0029863481
## 4                    campus.gpa           7 0.0029863481
## 5                     total.gpa           7 0.0029863481
## 6    term.units.attemptedCensus           7 0.0029863481
## 7                         state           8 0.0034129693
## 8                    career.num           8 0.0034129693
## 9                     load.code           8 0.0034129693
## 10                    bot.level           8 0.0034129693
## 11                enroll.status           8 0.0034129693
## 12                     term.gpa           8 0.0034129693
## 13                higher.ed.gpa           8 0.0034129693
## 14                     csus.gpa           8 0.0034129693
## 15                withdraw_code           8 0.0034129693
## 16             unt_taken_prgrss           8 0.0034129693
## 17             unt_passd_prgrss           8 0.0034129693
## 18               tot_cumulative           8 0.0034129693
## 19           cur_resident_terms           8 0.0034129693
## 20           cum_resident_terms           8 0.0034129693
## 21                     adm.area           9 0.0038395904
## 22                      veteran           9 0.0038395904
## 23                 hs.eng.units           9 0.0038395904
## 24                hs.math.units           9 0.0038395904
## 25                hs.phys.units           9 0.0038395904
## 26                 hs.bio.units           9 0.0038395904
## 27                 hs.lab.units           9 0.0038395904
## 28             hs.forLang.units           9 0.0038395904
## 29             hs.socstud.units           9 0.0038395904
## 30            hs.perfarts.units           9 0.0038395904
## 31            hs.elective.units           9 0.0038395904
## 32  ge.critical.thinking.status           9 0.0038395904
## 33       ge.english.comp.status           9 0.0038395904
## 34               ge.math.status           9 0.0038395904
## 35          ge.oral.comm.status           9 0.0038395904
## 36                    acad.plan           9 0.0038395904
## 37                     pass.flg          17 0.0072525597
## 38                    pass.flgN          17 0.0072525597
## 39                   acad.stand          19 0.0081058020
## 40                          coh          25 0.0106655290
## 41                     coh.term          25 0.0106655290
## 42                base.time.coh          26 0.0110921502
## 43                     enrl.flg          26 0.0110921502
## 44                  enrl.flgERS          26 0.0110921502
## 45                      pst.flg          26 0.0110921502
## 46                   pst.flgERS          26 0.0110921502
## 47                     grad.flg          26 0.0110921502
## 48                  grad.flgERS          26 0.0110921502
## 49                      rtn.flg          26 0.0110921502
## 50                   rtn.flgERS          26 0.0110921502
## 51                         ttgF          26 0.0110921502
## 52                     maxUnits          26 0.0110921502
## 53         tot_cumulative.start          26 0.0110921502
## 54                   admis.plan          26 0.0110921502
## 55               stem.final.flg          26 0.0110921502
## 56                   hs.grad.yr          27 0.0115187713
## 57            pell.coh.term.flg          30 0.0127986348
## 58                stem.base.flg          31 0.0132252560
## 59                        m.rmd          35 0.0149317406
## 60                 m.rmd.detail          35 0.0149317406
## 61                        e.rmd          35 0.0149317406
## 62                 e.rmd.detail          35 0.0149317406
## 63                      rmd.flg          35 0.0149317406
## 64                 rmd.subj.flg          35 0.0149317406
## 65                     eth.erss          38 0.0162116041
## 66                   urm.flgERS          38 0.0162116041
## 67                          zip          39 0.0166382253
## 68                    mother.ed         106 0.0452218430
## 69                  m.rmd.admin         211 0.0900170648
## 70           m.rmd.admin.detail         211 0.0900170648
## 71                    father.ed         215 0.0917235495
## 72                 plan.college         330 0.1407849829
## 73            plan.college.desc         330 0.1407849829
## 74                    plan.dept         330 0.1407849829
## 75                plan.deptAbbr         330 0.1407849829
## 76                  plan.degree         330 0.1407849829
## 77                    plan.type         330 0.1407849829
## 78       tot.passd.prgrss.start         377 0.1608361775
## 79       tot.taken.prgrss.start         377 0.1608361775
## 80                       hs.gpa         409 0.1744880546
## 81                   admit.term         439 0.1872866894
## 82               sat.math.score         586 0.2500000000
## 83                 sat.math.flg         586 0.2500000000
## 84             sat.verbal.score         586 0.2500000000
## 85               sat.verbal.flg         586 0.2500000000
## 86                sat.test.date         586 0.2500000000
## 87                    deg.plan1        1729 0.7376279863
## 88                    grad.term        1729 0.7376279863
## 89                          ttg        1729 0.7376279863
## 90                 grad.termERS        1736 0.7406143345
## 91                  pledge.term        1739 0.7418941980
## 92                fys.term.code        1890 0.8063139932
## 93                      fys.grd        1890 0.8063139932
## 94                  fys.rpt.flg        1890 0.8063139932
## 95                treat.section        1918 0.8182593857
## 96                   trf.gpaADM        1939 0.8272184300
## 97                    deg.plan2        2329 0.9936006826
## 98              withdraw_reason        2334 0.9957337884
## 99                    deg.plan3        2343 0.9995733788
## 100               Instructor_02        2344 1.0000000000
## 101                   deg.plan4        2344 1.0000000000
## 102                   deg.plan5        2344 1.0000000000
## 103                   deg.plan6        2344 1.0000000000
sum(missing_vars$pct_missing>0.25)  # 17 variables with >25% missing values; none appear to be confounders
## [1] 17
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.25, "feature"])
dim(mydat)
## [1] 2344  174
mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2344  157
table(mydat$palN)
## 
##    0    1    2 
## 1918   25  401

Remove single-valued variables

Some variables are single-valued and are thus unrelated to the propensity score. There were 13 single-valued variables in this dataset, which were removed.

single.category.vars = sapply(mydat, function(x) length(unique(x))==1)

sum(single.category.vars)
## [1] 13
names(mydat)[single.category.vars]
##  [1] "course"           "component"        "units"            "course.numeric"  
##  [5] "div"              "course.seq"       "rpt.flg"          "pass"            
##  [9] "c2s"              "base.time.course" "years"            "pass.term.flg"   
## [13] "passN"
# remove the 13 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2344  144
table(mydat$palN)
## 
##    0    1    2 
## 1918   25  401

Cohorted sections (numbered 80-83) were retained. The analysis includes 70 observations from learning communities.

Data Cleaning and Feature Engineering

Create new variables: delay since high school and cumulative percent of units passed. Collapse sparse categories and perform other miscellaneous data cleanup. Sparse categories can cause complete separation in logistic regression and are predictive for only a few students.

# extract the four-digit year from coh.term
yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))

mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)

sum(is.na(mydat$delay.from.hs))  #27 missing values
## [1] 27
# remove students who did not complete PAL
dim(mydat)
## [1] 2344  145
mydat=subset(mydat, palN!=1)

#recode palN to 0=nonPAL and 1=PAL, as numeric values
dim(mydat)
## [1] 2319  145
mydat$palN = ifelse(mydat$palN==2, 1, 0)

#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"

mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"

mydat$m.rmd = droplevels(mydat$m.rmd)

mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"

mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"

mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
## 
## Not Remedial in Math     Remedial in Math 
##                 2168                  116
# create feature, proportion of cumulative units taken that were passed
# 29% of values are missing, do not use this variable
# mydat$cum.percent.units.passed =  mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start

# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)

table(mydat$load.code, PAL=mydat$palN)
##        PAL
##            0    1
##   F     1695  374
##   OTHER  223   27
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
##        PAL
##            0    1
##   OTHER  305   54
##   PERM  1613  347
# recode instructor as an anonymous numeric ID
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))

instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)

mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))

# key.instr.code retains the mapping from instructor names to anonymous IDs
key.instr.code = cbind(as.character(instructor.vec),
                       1:num.instr)

Filter to relevant variables

Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It is important to include all variables correlated with the outcome, even when it is uncertain whether they are related to the likelihood of enrolling in PAL, because doing so allows for a more precise estimate of the treatment effect.

myvars <- c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed',
            'm.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',
            'AP_CALAB','AP_CALAB.flg','median.income','med.inc.flg','grd.pt.unt',
            'load.code','bot.level','censusMajor','term.units.attemptedCensus',
            'stem.base.flg','hous.coh.term.flg','enroll.status','palN',
            'pell.coh.term.flg','course.age','acad.stand','delay.from.hs',
            'addr.type','higher.ed.gpa.start','adm.area','higher.ed.gpa.start.flg')

mydat = mydat[,myvars]

# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd',
                'AP_CALAB.flg','load.code','bot.level','censusMajor','stem.base.flg',
                'hous.coh.term.flg','enroll.status','pell.coh.term.flg','acad.stand',
                'addr.type','adm.area','higher.ed.gpa.start.flg')


mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)

plot_missing(mydat, missing_only = TRUE)

table(mydat$palN)
## 
##    0    1 
## 1918  401

Subset to Complete Cases Only in the Math 31 Data

mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 1616   33
table(mydat$palN)
## 
##    0    1 
## 1323  293

Coarsen some factors to prevent complete separation in logistic regression

sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))

sum(sep.cols)  # 3 variables causing complete separation
## [1] 3
names(mydat)[sep.cols]
## [1] "eth.erss"    "censusMajor" "acad.stand"
# collapse sparse censusMajor categories; collapsing is done per course since
# the degree of collapsing differs markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.1, update = TRUE)
with(mydat, table(censusMajor, palN))
##                                        palN
## censusMajor                               0   1
##   Chemistry                             108  26
##   Civil Engineering                     231  50
##   Computer Engineering                  146  30
##   Computer Science                      285  50
##   Electrical and Electronic Engineering 111  23
##   Mechanical Engineering                265  87
##   OTHER                                 177  27
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
##                    palN
## eth.erss              0   1
##   African American   45  12
##   Asian             448  85
##   Hispanic          355  93
##   OTHER              81  26
##   Two or More Races  69  22
##   White             325  55
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))

table(mydat$palN)
## 
##    0    1 
## 1323  293

Remove redundant original versions of recoded variables

# remove the original versions of the variables recoded above
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]

Check again for Complete Separation

sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)  
## [1] 0
names(mydat)[sep.cols]
## character(0)
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]

Build a Logistic Regression Model for Propensity Score

Three potential confounders were subjectively identified and forced into the model: sat.math.score, gender, and eth.erss. Forward stepwise variable selection is then used to choose which of the remaining variables in the PAL dataset to include in the propensity model.

min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
 
biggest.formula= formula(glm(palN ~sat.math.score+gender+eth.erss+coh+fys.flg+mother.ed+e.rmd+m.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor, data=mydat, family="binomial"))

# forward stepwise selection starting from the forced-in confounders,
# using AIC (k = 2) as the selection criterion
step.model= step(min.model,
                 direction="forward",
                 scope=biggest.formula,
                 trace=FALSE, k=2)


step.formula = formula(step.model)

#Propensity scores

propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values

Propensity Score Matching

match.math =  Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)
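Match() performs 1-to-1 nearest-neighbor matching (M=1) on the estimated propensity score, estimates the average treatment effect on the treated (ATT), and discards treated students without a control match within a caliper of 0.25 standard deviations of the propensity score.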

Balance check

Check whether matching resulted in good covariate balance between the PAL and non-PAL groups. Covariate balance is reasonable, with the absolute standardized mean difference below 0.10 for all covariates in the propensity model.
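For reference, the standardized mean difference that bal.tab reports for a continuous covariate is the difference in group means divided by a standardizing standard deviation (typically the treated-group standard deviation for ATT estimands). A minimal hand-computed sketch, with hypothetical vectors x.treated and x.control, not cobalt's implementation:

# hypothetical sketch of a standardized mean difference for a continuous
# covariate; x.treated and x.control hold the covariate values in each group
smd.sketch <- function(x.treated, x.control) {
  (mean(x.treated) - mean(x.control)) / sd(x.treated)
}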

m31.bal=    bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m31.bal
## Balance Measures
##                                                      Type Diff.Un Diff.Adj
## sat.math.score                                    Contin. -0.3645   0.0040
## gender_Male                                        Binary -0.0390   0.0242
## eth.erss_African American                          Binary  0.0069  -0.0140
## eth.erss_Asian                                     Binary -0.0485  -0.0234
## eth.erss_Hispanic                                  Binary  0.0491   0.0200
## eth.erss_OTHER                                     Binary  0.0275   0.0088
## eth.erss_Two or More Races                         Binary  0.0229   0.0167
## eth.erss_White                                     Binary -0.0579  -0.0080
## censusMajor_Chemistry                              Binary  0.0071  -0.0405
## censusMajor_Civil Engineering                      Binary -0.0040   0.0002
## censusMajor_Computer Engineering                   Binary -0.0080   0.0056
## censusMajor_Computer Science                       Binary -0.0448   0.0205
## censusMajor_Electrical and Electronic Engineering  Binary -0.0054   0.0069
## censusMajor_Mechanical Engineering                 Binary  0.0966  -0.0063
## censusMajor_OTHER                                  Binary -0.0416   0.0137
## higher.ed.gpa.start                               Contin.  0.0390  -0.0067
## bot.level_Freshman                                 Binary -0.0095  -0.0097
## bot.level_Junior                                   Binary  0.0127   0.0197
## bot.level_Senior                                   Binary -0.0268  -0.0043
## bot.level_Sophomore                                Binary  0.0235  -0.0056
## course.age                                        Contin.  0.0892  -0.0185
## hs.gpa                                            Contin. -0.1420  -0.0208
## pell.coh.term.flg_Pell                             Binary  0.0708   0.0288
## 
## Sample sizes
##                      Control Treated
## All                  1323.       293
## Matched (ESS)         316.39     292
## Matched (Unweighted)  590.       292
## Unmatched             733.         0
## Discarded               0.         1
love.plot(m31.bal,binary = "raw", stars = "std", var.order = "unadjusted", 
          thresholds = c(m = .1), abs = F)

Propensity-score Adjusted Mean GPA of PAL and non-PAL Students

The estimated increase in the mean grade of students in PAL over those not in PAL, after correcting for self-selection bias, is 0.40 (standard error 0.10), as shown in the matching summary below.

est.effect=summary(match.math)
## 
## Estimate...  0.39715 
## AI SE......  0.0986 
## T-stat.....  4.0279 
## p.val......  5.627e-05 
## 
## Original number of observations..............  1616 
## Original number of treated obs...............  293 
## Matched number of observations...............  292 
## Matched number of observations  (unweighted).  770 
## 
## Caliper (SDs)........................................   0.25 
## Number of obs dropped by 'exact' or 'caliper'  1
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights

# weighted mean course grade within the matched treated and control samples
mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)


diff.means = match.math$est

std.error =  match.math$se
num.uniq.contrl.matched =  length(unique(match.math$index.control))

num.trt = match.math$wnobs

approx.p.val =  (1-pnorm(diff.means/std.error)) # one-sided (right-tail) p-value

results = data.frame(course=c("Math 31"), mean.gpa.PAL=round(mean.trt,2), mean.gpa.nonPAL=round(mean.cntrl,2), difference.mean.gpa=round(diff.means,2), standard.error=round(std.error,2),p.val=round(approx.p.val,4),number.control=num.uniq.contrl.matched, number.treated=num.trt)

kable(results, caption = "Propensity-score Adjusted Comparison of PAL and non-PAL Grades")

Propensity-score Adjusted Comparison of PAL and non-PAL Grades

course    mean.gpa.PAL   mean.gpa.nonPAL   difference.mean.gpa   standard.error   p.val   number.control   number.treated
Math 31           2.53              2.14                   0.4              0.1       0              590              292

Sensitivity Analysis
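A Rosenbaum sensitivity analysis assesses how strongly an unmeasured confounder would have to affect the odds of PAL enrollment (Gamma) before the statistical significance of the estimated effect could be overturned.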

print(psens(match.math, Gamma=2, GammaInc = 0.1))
## 
##  Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
##  
## Unconfounded estimate ....  0 
## 
##  Gamma Lower bound Upper bound
##    1.0           0      0.0000
##    1.1           0      0.0000
##    1.2           0      0.0000
##    1.3           0      0.0009
##    1.4           0      0.0105
##    1.5           0      0.0603
##    1.6           0      0.1985
##    1.7           0      0.4266
##    1.8           0      0.6696
##    1.9           0      0.8484
##    2.0           0      0.9441
## 
##  Note: Gamma is Odds of Differential Assignment To
##  Treatment Due to Unobserved Factors 
## 

Note that in the above table, Gamma = 1.5 marks the first row where 0.05 falls between the lower and upper bounds. This means that an unknown confounder which increases the odds of enrolling in PAL by a factor of more than 1.5 would be enough to change the treatment effect from statistically significant to non-significant. The next code block displays the odds-ratio magnitude (the exponentiated absolute coefficient) of each variable in the propensity model. If an unknown confounder had an effect on PAL enrollment similar to that of major or higher.ed.gpa.start, the PAL effect would become non-significant. Thus, this finding is moderately sensitive to unknown confounders; a variable not in our dataset, such as the number of hours per week a student works, could be a confounder that reverses the statistical significance of this analysis.

print(sort(exp(abs(propensity.model$coefficients))))
##                                   sat.math.score 
##                                         1.003920 
## censusMajorElectrical and Electronic Engineering 
##                                         1.037258 
##                                 eth.erssHispanic 
##                                         1.102326 
##                     censusMajorCivil Engineering 
##                                         1.122719 
##                                    eth.erssAsian 
##                                         1.139046 
##                                    eth.erssWhite 
##                                         1.155434 
##                  censusMajorComputer Engineering 
##                                         1.157027 
##                                       course.age 
##                                         1.160383 
##                      censusMajorComputer Science 
##                                         1.181197 
##                            pell.coh.term.flgPell 
##                                         1.263025 
##                               bot.levelSophomore 
##                                         1.322104 
##                                           hs.gpa 
##                                         1.349711 
##                                       genderMale 
##                                         1.409533 
##                                    eth.erssOTHER 
##                                         1.483459 
##                censusMajorMechanical Engineering 
##                                         1.485078 
##                              higher.ed.gpa.start 
##                                         1.535626 
##                                  bot.levelJunior 
##                                         1.597130 
##                                 censusMajorOTHER 
##                                         1.598893 
##                        eth.erssTwo or More Races 
##                                         1.695647 
##                                  bot.levelSenior 
##                                         5.178004 
##                                      (Intercept) 
##                                         8.556757