Executive Summary

A propensity score analysis was conducted to assess the effect of PAL supplemental instruction on course grades in Math 30. Propensity score adjustment was necessary since the data are observational: the characteristics of students who voluntarily enroll in PAL may differ from those of students who do not in ways that, independently of PAL, impact course grade. In propensity score analysis, variables related to both the likelihood of PAL enrollment and course grade (confounders) are used in a logistic regression model to obtain a propensity score, which is a student's estimated probability of enrolling in PAL. PAL students are then matched to non-PAL students with similar propensity scores, and the mean course grades of the matched groups are compared.

Data Analysis

First Add Some R Packages to the Workspace

Caution: warning messages are suppressed to reduce clutter in the output.

defaultW <- getOption("warn")  # save the current warning setting
options(warn = -1)             # suppress warnings while loading packages


library(rbounds)
library(readr)
library(tidyr)
library(tidyverse)
library(MatchIt)
library(cobalt)
library(backports)
library(DataExplorer)
library(readxl)
library(Matching)

options(warn = defaultW)  # restore the original warning setting

Source a script

The script contains a user-defined function called id.vars.zero.freq to identify variables causing complete separation in logistic regression.

source("IdentifyCompleteSepVars2.R")

Import the Data

The PAL data file is loaded from the path below; adjust the path if the file is stored elsewhere. The file, which includes data through the Spring 2019 semester, has 1,099,371 rows and 174 columns. The total of the grd.pt.unt column is 2,237,555.

load("~/PAL data analysis/PALdatafull.Rdata")
dim(PALdatafull)
## [1] 1099371     174
sum(PALdatafull$grd.pt.unt)
## [1] 2237555

Subset Data for Math 30

Each instance of a course taken by a student is included in the data file for the semester during which it was taken and every semester thereafter. The indicator 'test1' below keeps only the semester during which the course was actually taken, and 'test2' restricts the data to Math 30.

test1 <- PALdatafull$base.time.course==1
test2 <- PALdatafull$course == "MATH 30"

math.indx <- test1 & test2

mydat <- PALdatafull[math.indx, ]
dim(mydat)
## [1] 5882  174
table(mydat$palN)  # palN codes: 0 = no PAL, 1 = enrolled in PAL but did not complete, 2 = completed PAL (704 students)
## 
##    0    1    2 
## 5143   35  704

Compare the mean GPA for PAL and non-PAL students in Math 30 without Propensity Score Adjustment

The course.seq variable indicates how many times a student has taken the course prior to the current attempt. To restrict the data to each student's first attempt, we set course.seq to 0.

raw.table = data.frame(class=character(),
                       nonPALavg=numeric(),
                       PALavg=numeric(), 
                       Diff=numeric(), 
                       NonPAL_Num= integer(),
                       PAL_Num=integer(),
                       CompletePAL=numeric(),
                       TermPALStart=integer(),
                       row.names=NULL,
                       stringsAsFactors = FALSE)
eth.tables = list()
math.classes = "MATH 30"
for (i in 1:length(math.classes))
    {
      curr.class = math.classes[i]
      temp = subset(mydat, course==curr.class & course.seq==0)
      pal.start=min(unique(temp$term.code[temp$palN==2]))
      # only include terms after PAL start term
      temp = subset(temp, term.code>= pal.start)
      
      x=tapply(temp$grd.pt.unt,temp$palN, 
               mean, na.rm=T) %>% 
        as.numeric %>% 
        round(2)
      
      y=table(temp$palN) %>% as.numeric
      
      raw.table[i, 'class' ] = curr.class
      raw.table[i, c(2:4,7)]=c(x[1], x[3],x[3]-x[1],
                        round(y[3]/sum(y),2))
      raw.table[i, c(5,6,8)]= c(y[1], y[3], pal.start)
      
      #participation by ethnicity
      eth.tables[[i]] <- with(temp, table(eth.erss, palN)[,-c(2)]) #remove a few students who started PAL but did not finish
      colnames(eth.tables[[i]])<- c("nonPAL","PAL")
     
}
names(eth.tables) <- math.classes
#counts
eth.tables
## $`MATH 30`
##                    palN
## eth.erss            nonPAL PAL
##   African American      86  31
##   Asian                636 194
##   Foreign               93  20
##   Hispanic             680 172
##   Native American        4   3
##   Pacific Islander      20   6
##   Two or More Races    132  31
##   Unknown              103  29
##   White                526 138
# each column is 100%
# percent.eth.within.PAL.and.within.non = lapply(eth.tables, prop.table, margin=2) %>% lapply(round, digits=2)

# percent.eth.within.PAL.and.within.non

# each row is 100%
 percent.PAL.and.non.within.ethnicity = lapply(eth.tables, prop.table, margin=1) %>% lapply(round, digits=2)

 percent.PAL.and.non.within.ethnicity
## $`MATH 30`
##                    palN
## eth.erss            nonPAL  PAL
##   African American    0.74 0.26
##   Asian               0.77 0.23
##   Foreign             0.82 0.18
##   Hispanic            0.80 0.20
##   Native American     0.57 0.43
##   Pacific Islander    0.77 0.23
##   Two or More Races   0.81 0.19
##   Unknown             0.78 0.22
##   White               0.79 0.21
# formatted table
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
kable(raw.table, caption = "Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)")
Raw Comparison of PAL and non-PAL Grades (No Propensity Adjustment)

| class   | nonPALavg | PALavg | Diff | NonPAL_Num | PAL_Num | CompletePAL | TermPALStart |
|---------|-----------|--------|------|------------|---------|-------------|--------------|
| MATH 30 | 2.23      | 2.68   | 0.45 | 2310       | 626     | 0.21        | 2138         |

Remove students repeating Math 30 and data from semesters before PAL for Math 30 started

table(mydat$palN)
## 
##    0    1    2 
## 5143   35  704
temp = subset(mydat, course.seq==0)
pal.start = min(unique(temp$term.code[temp$palN==2]))
# only include terms after PAL start term
mydat = subset(temp, term.code >= pal.start)
dim(mydat)
## [1] 2966  174
table(mydat$palN)
## 
##    0    1    2 
## 2310   30  626
# number of students who completed Math 30 PAL goes from 704 to 626 (students repeating Math 30 removed)

Remove students enrolled in Math 30L

It would be inaccurate to include these students in the control group since they are receiving a form of supplemental instruction (SI). However, the SI model for Math 30L and 31L differs from PAL, so these students are not included in the treatment group either; rather, they are removed entirely from the analysis. This removes 236 rows.

library(readxl)

Math_30L_31L <- read_excel("Math 30L-31L.xlsx")
Math_30L_31L$Semester.Number = as.numeric(Math_30L_31L$Semester.Number)

#  Extract Lecture Course name, i.e. 'Math 30' for join
Math_30L_31L$Course.Lecture = substr(Math_30L_31L$Course, start = 1, stop=7)

#exclude rows that match from the Math 30L and 31L dataframe
table(mydat$palN)
## 
##    0    1    2 
## 2310   30  626
mydat = anti_join(mydat, Math_30L_31L, by=c("emplid"="Emplid","term.code"="Semester.Number", "course"="Course.Lecture"))
table(mydat$palN)
## 
##    0    1    2 
## 2076   30  624
dim(mydat)
## [1] 2730  174

Analyze missingness

Remove 19 variables having over 20% of values missing in order to retain a larger pool of PAL and non-PAL students. None of the removed variables seem likely to be confounders. We will only include students with no missing values for the subset of variables specified in the 'Filter to relevant variables' section below.

plot_missing(mydat, missing_only = TRUE)

# count missing values for each variable (across() appends '_1' to each name)
zero_missing_vars <- mydat %>% 
  summarise(across(everything(), list(function(x) sum(is.na(x))))) %>%
  gather() %>%
  arrange(value)

# keep only the variables that have at least one missing value
zero_missing_vars <- zero_missing_vars %>%
  filter(value != 0)

# strip the '_1' suffix added by across() to recover the original variable names
zero_missing_vars$var.name = substr(zero_missing_vars$key, 1, nchar(zero_missing_vars$key)-2)

# subset the data to just those variables with missingness
zero_missing_vars <- mydat %>%
  dplyr::select(all_of(zero_missing_vars$var.name))

missing_vars <- profile_missing(zero_missing_vars)
missing_vars
##                         feature num_missing  pct_missing
## 1                        gender           1 0.0003663004
## 2                      adm.area           1 0.0003663004
## 3                       veteran           1 0.0003663004
## 4                  hs.eng.units           1 0.0003663004
## 5                 hs.math.units           1 0.0003663004
## 6                 hs.phys.units           1 0.0003663004
## 7                  hs.bio.units           1 0.0003663004
## 8                  hs.lab.units           1 0.0003663004
## 9              hs.forLang.units           1 0.0003663004
## 10             hs.socstud.units           1 0.0003663004
## 11            hs.perfarts.units           1 0.0003663004
## 12            hs.elective.units           1 0.0003663004
## 13  ge.critical.thinking.status           1 0.0003663004
## 14       ge.english.comp.status           1 0.0003663004
## 15               ge.math.status           1 0.0003663004
## 16          ge.oral.comm.status           1 0.0003663004
## 17                   career.num           3 0.0010989011
## 18                    load.code           3 0.0010989011
## 19                    bot.level           3 0.0010989011
## 20                enroll.status           3 0.0010989011
## 21                     term.gpa           3 0.0010989011
## 22                higher.ed.gpa           3 0.0010989011
## 23                     csus.gpa           3 0.0010989011
## 24                withdraw_code           3 0.0010989011
## 25             unt_taken_prgrss           3 0.0010989011
## 26             unt_passd_prgrss           3 0.0010989011
## 27               tot_cumulative           3 0.0010989011
## 28           cur_resident_terms           3 0.0010989011
## 29           cum_resident_terms           3 0.0010989011
## 30                    acad.plan           3 0.0010989011
## 31            hous.coh.term.flg           6 0.0021978022
## 32                  censusMajor           7 0.0025641026
## 33                   campus.gpa           7 0.0025641026
## 34                    total.gpa           7 0.0025641026
## 35   term.units.attemptedCensus           7 0.0025641026
## 36                        state           9 0.0032967033
## 37                   acad.stand          13 0.0047619048
## 38                          coh          15 0.0054945055
## 39                     coh.term          15 0.0054945055
## 40                   admis.plan          17 0.0062271062
## 41               stem.final.flg          17 0.0062271062
## 42                base.time.coh          20 0.0073260073
## 43                     enrl.flg          20 0.0073260073
## 44                  enrl.flgERS          20 0.0073260073
## 45                      pst.flg          20 0.0073260073
## 46                   pst.flgERS          20 0.0073260073
## 47                     grad.flg          20 0.0073260073
## 48                  grad.flgERS          20 0.0073260073
## 49                      rtn.flg          20 0.0073260073
## 50                   rtn.flgERS          20 0.0073260073
## 51                         ttgF          20 0.0073260073
## 52                     maxUnits          20 0.0073260073
## 53         tot_cumulative.start          20 0.0073260073
## 54                   hs.grad.yr          20 0.0073260073
## 55            pell.coh.term.flg          21 0.0076923077
## 56                stem.base.flg          23 0.0084249084
## 57                     pass.flg          29 0.0106227106
## 58                    pass.flgN          29 0.0106227106
## 59                     eth.erss          31 0.0113553114
## 60                   urm.flgERS          31 0.0113553114
## 61                          zip          38 0.0139194139
## 62                        m.rmd          55 0.0201465201
## 63                 m.rmd.detail          55 0.0201465201
## 64                        e.rmd          55 0.0201465201
## 65                 e.rmd.detail          55 0.0201465201
## 66                      rmd.flg          55 0.0201465201
## 67                 rmd.subj.flg          55 0.0201465201
## 68                    mother.ed         111 0.0406593407
## 69                    father.ed         246 0.0901098901
## 70                  m.rmd.admin         292 0.1069597070
## 71           m.rmd.admin.detail         292 0.1069597070
## 72                       hs.gpa         302 0.1106227106
## 73                 plan.college         452 0.1655677656
## 74            plan.college.desc         452 0.1655677656
## 75                    plan.dept         452 0.1655677656
## 76                plan.deptAbbr         452 0.1655677656
## 77                  plan.degree         452 0.1655677656
## 78                    plan.type         452 0.1655677656
## 79               sat.math.score         525 0.1923076923
## 80                 sat.math.flg         525 0.1923076923
## 81             sat.verbal.score         525 0.1923076923
## 82               sat.verbal.flg         525 0.1923076923
## 83                sat.test.date         525 0.1923076923
## 84                   admit.term         541 0.1981684982
## 85       tot.passd.prgrss.start         794 0.2908424908
## 86       tot.taken.prgrss.start         794 0.2908424908
## 87                treat.section        1838 0.6732600733
## 88                  pledge.term        1906 0.6981684982
## 89                    deg.plan1        2034 0.7450549451
## 90                    grad.term        2034 0.7450549451
## 91                          ttg        2034 0.7450549451
## 92                 grad.termERS        2038 0.7465201465
## 93                fys.term.code        2104 0.7706959707
## 94                      fys.grd        2104 0.7706959707
## 95                  fys.rpt.flg        2104 0.7706959707
## 96                   trf.gpaADM        2434 0.8915750916
## 97                    deg.plan2        2716 0.9948717949
## 98              withdraw_reason        2720 0.9963369963
## 99                Instructor_02        2730 1.0000000000
## 100                   deg.plan3        2730 1.0000000000
## 101                   deg.plan4        2730 1.0000000000
## 102                   deg.plan5        2730 1.0000000000
## 103                   deg.plan6        2730 1.0000000000
 sum(missing_vars$pct_missing>0.2)  # remove 19 variables with >20% missing values, none appear to be confounders
## [1] 19
vars.missing.data = as.character(missing_vars[missing_vars$pct_missing>0.2, "feature"])
dim(mydat)
## [1] 2730  174
 mydat = mydat[,!(names(mydat) %in% vars.missing.data)]
dim(mydat)
## [1] 2730  155
 table(mydat$palN)
## 
##    0    1    2 
## 2076   30  624

Remove single-valued variables

Some variables take only a single value and thus provide no information for the propensity score. There were 10 single-valued variables in this dataset, which were removed.

single.category.vars = sapply(mydat, function(x) length(unique(x))==1)

sum(single.category.vars)
## [1] 10
names(mydat)[single.category.vars]
##  [1] "course"           "units"            "course.numeric"   "div"             
##  [5] "course.seq"       "rpt.flg"          "c2s"              "base.time.course"
##  [9] "years"            "pass.term.flg"
# remove the 10 variables just listed
keep.vars = names(which(single.category.vars==FALSE))
mydat= mydat[, keep.vars]
dim(mydat)
## [1] 2730  145
table(mydat$palN)
## 
##    0    1    2 
## 2076   30  624

Retain cohorted sections (numbered 80-83).

These learning-community sections account for 400 observations, 362 of them in PAL; a hypothetical tally sketch follows.
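No code for this step appears in the file, so the following is only a hypothetical sketch of how the tally could be reproduced; the column name 'section' is an assumption for illustration and is not confirmed by the data, so the call is left commented out.

# Hypothetical sketch (the column name 'section' is assumed, not confirmed):
# count PAL and non-PAL observations in the cohorted sections 80-83
# with(subset(mydat, section %in% 80:83), table(palN))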

Data Cleaning and Feature Engineering

Create a new variable: delay between high school graduation and taking the course. (A second planned feature, the cumulative percent of units passed, was ultimately not used because 29% of its input values are missing; see the commented-out code below.) Collapse sparse categories and perform other miscellaneous clean-up of the data. Sparse categories can cause complete separation in logistic regression and are only predictive for a few students.

yr.course.taken = as.numeric(gsub(".*([0-9]{4})","\\1",mydat$coh.term))

mydat$delay.from.hs = ifelse(!is.na(yr.course.taken) & !is.na(mydat$hs.grad.yr), yr.course.taken-mydat$hs.grad.yr, NA)

sum(is.na(mydat$delay.from.hs))  # 20 missing values
## [1] 20
# remove students who did not complete PAL
dim(mydat)
## [1] 2730  146
mydat=subset(mydat, palN!=1)

#recode palN to 0=nonPAL and 1=PAL, as numeric values
dim(mydat)
## [1] 2700  146
mydat$palN = ifelse(mydat$palN==2, 1, 0)

#clean up category names in m.rmd and e.rmd
mydat$m.rmd[mydat$m.rmd=="Not Remedial\nin Math"]="Not Remedial in Math"

mydat$m.rmd[mydat$m.rmd=="Remedial\nin Math"]="Remedial in Math"

mydat$m.rmd = droplevels(mydat$m.rmd)

mydat$e.rmd[mydat$e.rmd=="Not Remedial\nin English"]="Not Remedial in English"

mydat$e.rmd[mydat$e.rmd=="Remedial\nin English"]="Remedial in English"

mydat$e.rmd = droplevels(mydat$e.rmd)
table(mydat$m.rmd)
## 
## Not Remedial in Math     Remedial in Math 
##                 2482                  163
# create feature, proportion of cumulative units taken that were passed
# 29% of values are missing, do not use this variable
# mydat$cum.percent.units.passed =  mydat$tot.passd.prgrss.start/mydat$tot.taken.prgrss.start

# collapse sparse categories
mydat=group_category(data = mydat, feature = "load.code", threshold = 0.05, update = TRUE)

table(mydat$load.code, PAL=mydat$palN)
##        PAL
##            0    1
##   F     1838  580
##   OTHER  238   44
mydat=group_category(data = mydat, feature = "addr.type", threshold = 0.05, update = TRUE)
table(mydat$addr.type, PAL=mydat$palN)
##        PAL
##            0    1
##   OTHER  334   97
##   PERM  1742  527
# code each instructor as an anonymized numeric label
mydat$Instructor_01=droplevels(factor(mydat$Instructor_01))

instructor.vec = sort(unique(mydat$Instructor_01))
num.instr = length(instructor.vec)

mydat$Instructor_01 = factor(mydat$Instructor_01, levels = instructor.vec, labels=as.character(1:num.instr))

key.instr.code = cbind(as.character(instructor.vec),
                       1:num.instr)

Filter to relevant variables

Subjective judgment was used to narrow the pool of variables down to those likely to be confounders. It is important to include all variables correlated with the outcome even if it is uncertain whether they are related to the likelihood of enrolling in PAL; doing so allows for a more precise estimate of the treatment effect.

myvars <- c('emplid','term.code','coh','gender','eth.erss','fys.flg','mother.ed','m.rmd','e.rmd','sat.math.score','sat.verbal.score','hs.gpa',"AP_CALAB", "AP_CALAB.flg",'median.income','med.inc.flg','grd.pt.unt',"load.code",'bot.level','censusMajor','term.units.attemptedCensus',  'stem.base.flg','hous.coh.term.flg',
 "enroll.status",
   'palN',"pell.coh.term.flg","course.age","acad.stand",'delay.from.hs','addr.type','higher.ed.gpa.start','adm.area', 'higher.ed.gpa.start.flg')

mydat = mydat[,myvars]

# Convert categorical variables to factor type
factor.vars = c('coh','gender','eth.erss','fys.flg','mother.ed','e.rmd','m.rmd', "AP_CALAB.flg","load.code",'bot.level','censusMajor','stem.base.flg','hous.coh.term.flg',
 "enroll.status","pell.coh.term.flg","acad.stand",'addr.type','adm.area', 'higher.ed.gpa.start.flg')

mydat[,factor.vars] <- lapply(mydat[,factor.vars],as.factor)

plot_missing(mydat, missing_only = TRUE)

table(mydat$palN)
## 
##    0    1 
## 2076  624

Subset on Complete Cases Only in the Math 30 Data

mydat = droplevels(mydat[complete.cases(mydat), ])
dim(mydat)
## [1] 2021   33
table(mydat$palN)
## 
##    0    1 
## 1510  511

Coarsen some factors to prevent complete separation in logistic regression

sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))

sum(sep.cols)  
## [1] 3
names(mydat)[sep.cols]
## [1] "censusMajor"   "enroll.status" "acad.stand"
# collapse censusMajor by course since the degree of collapsing will differ markedly among courses
mydat = group_category(data = mydat, feature = "censusMajor", threshold = 0.1, update = TRUE)
with(mydat, table(censusMajor, palN))
##                                        palN
## censusMajor                               0   1
##   Chemistry                             128  41
##   Civil Engineering                     217  90
##   Computer Engineering                  185  63
##   Computer Science                      292 106
##   Electrical and Electronic Engineering  97  34
##   Mathematics                            58  21
##   Mechanical Engineering                314  99
##   OTHER                                 183  42
##   Undeclared                             36  15
# collapse eth.erss categories, remove others causing complete separation
mydat = group_category(data = mydat, feature = "eth.erss", threshold = 0.05, update = TRUE)
with(mydat, table(eth.erss, palN))
##                    palN
## eth.erss              0   1
##   African American   56  26
##   Asian             490 167
##   Hispanic          428 143
##   OTHER             109  41
##   Two or More Races  91  23
##   White             336 111
mydat$enroll.status.Continuing.indicator = factor(ifelse(mydat$enroll.status=="Continuing", 1, 0))
mydat$acad.stand.Good.indicator = factor(ifelse(mydat$acad.stand=="Good Standing", 1, 0))

# remove the 12 rows with med.inc.flg=="Missing"; this sparse category caused complete separation
miss.indx <-  mydat$med.inc.flg=="Missing"
mydat <- mydat[!miss.indx, ]
mydat <- subset(mydat, select=-c(med.inc.flg))
table(mydat$palN)
## 
##    0    1 
## 1501  508

Remove original variables that have recoded versions

# remove the original variables; their recoded indicator versions are retained
mydat = mydat[ , !(names(mydat) %in% c("enroll.status","acad.stand"))]

Check again for Complete Separation

sep.cols = id.vars.zero.freq(mydat, treat.col= which(names(mydat)=="palN"))
sep.cols
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sum(sep.cols)  
## [1] 0
names(mydat)[sep.cols]
## character(0)
# remove unnecessary variables
mydat = mydat[ ,!names(mydat) %in% c("emplid", "term.code")]

Build a Logistic Regression Model for Propensity Score

Three potential confounders were subjectively identified and forced into the model: sat.math.score, gender, and eth.erss. Forward stepwise selection (AIC-based, k=2) is then used to decide which of the remaining variables in the PAL dataset to include in the propensity model.

min.model = glm(palN ~ sat.math.score+gender+eth.erss, data=mydat, family=binomial)
 
biggest.formula= formula(glm(palN ~sat.math.score+gender+eth.erss+coh+fys.flg+mother.ed+m.rmd+e.rmd+sat.verbal.score+hs.gpa+AP_CALAB+AP_CALAB.flg+median.income+load.code+bot.level+term.units.attemptedCensus+stem.base.flg+hous.coh.term.flg+pell.coh.term.flg+course.age+delay.from.hs+addr.type+higher.ed.gpa.start+adm.area+acad.stand.Good.indicator+censusMajor, data=mydat, family="binomial"))

step.model= step(min.model,
                        direction="forward",
                        scope=biggest.formula,
                        trace=FALSE, k=2)


step.formula = formula(step.model)

#Propensity scores

propensity.model = glm(step.formula, data=mydat, family="binomial")
mydat$propensity = propensity.model$fitted.values

Propensity Score Matching

# 1-to-1 nearest-neighbor matching (with replacement) on the propensity score,
# estimating the average treatment effect on the treated (ATT); matched pairs
# must lie within a caliper of 0.25 standard deviations of the propensity score
match.math = Match(Y=mydat$grd.pt.unt, Tr=mydat$palN, X=mydat$propensity,
                   BiasAdjust = F, estimand = "ATT", M=1, caliper=0.25)

Balance check

Check whether matching resulted in good covariate balance between the PAL and non-PAL groups. Balance is reasonable: all adjusted mean differences (Diff.Adj) are well below the conventional 0.1 threshold.

m30.bal = bal.tab(match.math, step.formula, data=mydat, un=TRUE, quick=FALSE)
m30.bal
## Balance Measures
##                               Type Diff.Un Diff.Adj
## sat.math.score             Contin. -0.0561  -0.0414
## gender_Male                 Binary -0.0220  -0.0012
## eth.erss_African American   Binary  0.0145   0.0007
## eth.erss_Asian              Binary  0.0030  -0.0151
## eth.erss_Hispanic           Binary -0.0030   0.0082
## eth.erss_OTHER              Binary  0.0095   0.0082
## eth.erss_Two or More Races  Binary -0.0154   0.0065
## eth.erss_White              Binary -0.0086  -0.0086
## course.age                 Contin. -0.3067   0.0203
## acad.stand.Good.indicator   Binary  0.0582  -0.0089
## bot.level_Freshman          Binary  0.1300   0.0155
## bot.level_Junior            Binary -0.0342   0.0055
## bot.level_Senior            Binary -0.0288   0.0028
## bot.level_Sophomore         Binary -0.0670  -0.0238
## sat.verbal.score           Contin. -0.1116  -0.0296
## stem.base.flg_STEM          Binary -0.0034   0.0001
## delay.from.hs              Contin. -0.0664   0.0154
## median.income              Contin.  0.0409  -0.0097
## 
## Sample sizes
##                      Control Treated
## All                  1501.       508
## Matched (ESS)         514.94     506
## Matched (Unweighted)  870.       506
## Unmatched             631.         0
## Discarded               0.         2
love.plot(m30.bal,binary = "raw", stars = "std", var.order = "unadjusted", 
          thresholds = c(m = .1), abs = F)

Propensity-score adjusted mean GPA of PAL and non-PAL students

The estimated increase in the mean grade of students in PAL over those not in PAL, after correcting for self-selection bias, is 0.24 grade points (see the Estimate in the output below).

est.effect=summary(match.math)
## 
## Estimate...  0.23778 
## AI SE......  0.071962 
## T-stat.....  3.3042 
## p.val......  0.0009526 
## 
## Original number of observations..............  2009 
## Original number of treated obs...............  508 
## Matched number of observations...............  506 
## Matched number of observations  (unweighted).  1502 
## 
## Caliper (SDs)........................................   0.25 
## Number of obs dropped by 'exact' or 'caliper'  2
indx.trt = match.math$index.treated
indx.cntr = match.math$index.control
wts = match.math$weights

mean.trt=sum(mydat[indx.trt, 'grd.pt.unt']*wts)/sum(wts)
mean.cntrl=sum(mydat[indx.cntr, 'grd.pt.unt']*wts)/sum(wts)


diff.means = match.math$est

std.error =  match.math$se
num.uniq.contrl.matched =  length(unique(match.math$index.control))

num.trt = match.math$wnobs

approx.p.val =  (1-pnorm(diff.means/std.error)) #right-tail

results = data.frame(course=c("Math 30"), mean.gpa.PAL=round(mean.trt,2), mean.gpa.nonPAL=round(mean.cntrl,2), difference.mean.gpa=round(diff.means,2), standard.error=round(std.error,2),p.val=round(approx.p.val,4),number.control=num.uniq.contrl.matched, number.treated=num.trt)

kable(results, caption = "Propensity-scored Adjusted Comparison of PAL and non-PAL Grades")
Propensity-scored Adjusted Comparison of PAL and non-PAL Grades
course mean.gpa.PAL mean.gpa.nonPAL difference.mean.gpa standard.error p.val number.control number.treated
Math 30 2.66 2.42 0.24 0.07 5e-04 870 506

Sensitivity Analysis

A Rosenbaum sensitivity analysis assesses how strong an unobserved confounder would have to be to change the conclusion of the matched analysis.

print(psens(match.math, Gamma=2, GammaInc = 0.1))
## 
##  Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
##  
## Unconfounded estimate ....  0 
## 
##  Gamma Lower bound Upper bound
##    1.0           0      0.0000
##    1.1           0      0.0002
##    1.2           0      0.0164
##    1.3           0      0.1853
##    1.4           0      0.5991
##    1.5           0      0.9063
##    1.6           0      0.9898
##    1.7           0      0.9994
##    1.8           0      1.0000
##    1.9           0      1.0000
##    2.0           0      1.0000
## 
##  Note: Gamma is Odds of Differential Assignment To
##  Treatment Due to Unobserved Factors 
## 

Note that in the above table, Gamma = 1.3 marks the first row where 0.05 falls between the lower and upper bounds (at Gamma = 1.2 the upper bound is still only 0.0164). This means an unknown confounder that increases the odds of enrolling in PAL by a factor of 1.3 or more would be enough to change the treatment effect from significant to non-significant. The next code block shows the multiplicative effect of each variable in the propensity model on the odds of PAL enrollment. Several variables, such as bot.level (class level at the beginning of the term) and ethnicity, have effects of this magnitude, so an unknown confounder with a comparable effect on the propensity score would render the PAL effect non-significant. This finding is therefore moderately sensitive to unknown confounders: a variable not in our dataset, such as the number of hours per week a student works, could plausibly reverse the statistical significance of this analysis.

  print(sort(exp(abs(propensity.model$coefficients))))
##              median.income             sat.math.score 
##                   1.000004                   1.001379 
##           sat.verbal.score                 genderMale 
##                   1.001786                   1.163296 
##              delay.from.hs              eth.erssOTHER 
##                   1.178764                   1.267830 
##          stem.base.flgSTEM                 course.age 
##                   1.268959                   1.286637 
##         bot.levelSophomore            bot.levelJunior 
##                   1.413761                   1.456977 
##              eth.erssWhite           eth.erssHispanic 
##                   1.603320                   1.654398 
##              eth.erssAsian  eth.erssTwo or More Races 
##                   1.676861                   2.105995 
## acad.stand.Good.indicator1            bot.levelSenior 
##                   2.330094                   3.245961 
##                (Intercept) 
##                 196.880770