Stat128 Lecture Schedule
Tentative schedule for lecture topics
Course Description: Computer methods for accessing, transforming, summarizing, graphing, and making statistical inferences from data. Focus is on open-source, command-line software, but menu-driven statistical software may be introduced. Students will learn to apply computer methods to solve problems selected from the areas of modeling, simulation, inference and statistical learning. The intent of this course is to provide students with the software skills needed for statistical work in industry or academia. 3 units.
Idea: Learn to program through simple experiments. Lectures are going to be very interactive.
- August 31
introduction
- Intro activity
- R overview
- Design goals
- pros and cons
- difference between R and RStudio
- basic plots, for example, hist()
- R / RStudio essentials
- calling functions
- evaluating expressions
- reading error messages
- getting help
- local help and Stack Overflow
- variables
- assignment
- workspace
- CRAN, packages, and how to use them.
- September 7
R Markdown
- Monday, September 7th is Labor Day Holiday
- R Markdown
- package installation
- markdown syntax
- running code blocks
- HTML output
- summarizing univariate data
- standard statistics: range, median, mean, etc.
- histograms and bin width
- preview of filtering by condition
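As a rough illustration of this week's summary-statistics and bin-width topics, the sketch below uses simulated data. The seed, sample size, and the bin widths of 10 and 2 are arbitrary assumptions chosen for the example, not values from the course.

```r
# Simulated data; seed, sample size, and distribution are illustrative assumptions.
set.seed(128)
x <- rnorm(200, mean = 50, sd = 10)

# Standard summary statistics.
stats <- c(min = min(x), median = median(x), mean = mean(x), max = max(x))
print(stats)

# Bin edges that cover the whole range of x, rounded out to multiples of 10.
lower <- floor(min(x) / 10) * 10
upper <- ceiling(max(x) / 10) * 10

# Same data, two bin widths. plot = FALSE returns the counts without drawing;
# drop it to see how the bin width changes the apparent shape.
h_wide   <- hist(x, breaks = seq(lower, upper, by = 10), plot = FALSE)
h_narrow <- hist(x, breaks = seq(lower, upper, by = 2),  plot = FALSE)

# Preview of filtering by condition: keep only the values above the mean.
above <- x[x > mean(x)]
length(above)
```

Wide bins smooth over detail and narrow bins emphasize noise, so it is usually worth trying more than one width on the same data.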
- September 14
programming essentials
- intro to writing functions
- plotting mathematical functions, simple examples like polynomials
- booleans
- boolean logic
- comparison operators
- filters
- types 1
- statistical data types vs programming types
- type hierarchy
- vectors
- Special values: NA, Inf, NaN
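A small sketch of how this week's topics (comparison operators, filters, type coercion, and special values) interact in base R. The vector `v` is made up for illustration.

```r
# A made-up vector containing each special value; 0 / 0 evaluates to NaN.
v <- c(1, NA, 3, Inf, 0 / 0)

na_flags     <- is.na(v)      # TRUE for both NA and NaN
nan_flags    <- is.nan(v)     # TRUE only for NaN
finite_flags <- is.finite(v)  # FALSE for NA, NaN, and Inf

# Comparisons involving NA or NaN yield NA, so a filter needs an explicit guard.
v > 2
kept <- v[is.finite(v) & v > 2]

# Type hierarchy: combining values coerces them to the most general type.
t1 <- class(c(TRUE, 1L))   # "integer"
t2 <- class(c(1L, 2.5))    # "numeric"
t3 <- class(c(2.5, "a"))   # "character"

# Missing values propagate through summaries unless they are filtered out first.
mean(v)
clean_mean <- mean(v[is.finite(v)])   # 2
```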
- September 21
data frames
- data frames
- selecting columns
- selecting rows
- types 2
- strings
- dates
- floating point numbers
- September 28
plots and reporting
- producing data analysis reports for an audience
- general viz principles
- ggplot2
- ggplot2
- October 5
designing functions
- planning and designing functions
- debugging 1
- apply family of functions
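One possible illustration of designing a small function and then using it with the apply family. The helper `cv()` (coefficient of variation) is a hypothetical example, not a course-provided function; `mtcars` is a dataset that ships with R.

```r
# Hypothetical helper: coefficient of variation (sd divided by mean).
# Validating input up front makes misuse fail early with a clear error.
cv <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cols <- mtcars[c("mpg", "hp", "wt")]

# lapply() always returns a list, one element per column.
ranges <- lapply(cols, range)

# sapply() simplifies the result to a vector when it can.
cv_s <- sapply(cols, cv)

# vapply() does the same but checks that each result has the declared shape.
cv_v <- vapply(cols, cv, numeric(1))

# tapply() applies a function within groups: mean mpg by number of cylinders.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
```

`vapply()` is often preferred inside functions because it fails loudly if a result has an unexpected type or length.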
- October 12
popular data packages
I’m exposing students to what’s out there, and they can use whatever they like best.
- data.table
- tidyverse
- dplyr
- October 19
control flow
- conditional statements
- iteration
- data science career panel
- October 26
statistical learning applications
- statistical learning
- overview
- formula interface
- decision trees
- another ML topic
- November 2
utility scripting
- scripting
- saving scripts
- comments (often the “why” is more important than the “what”)
- running entire scripts
- error handling
- directories, file manipulation, and system administration
- motivation
- tree hierarchy
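A minimal sketch of error handling in a script, using base R's `tryCatch()`. The wrapper `safe_log()` is a hypothetical name introduced only for this example.

```r
# Hypothetical wrapper: returns NA instead of stopping the whole script.
# Why: in a long batch job, one bad input should not discard all other results.
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    warning = function(w) {
      # log() of a negative number warns and produces NaN.
      message("Warning converted to NA: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Recovered from error: ", conditionMessage(e))
      NA_real_
    }
  )
}

ok   <- safe_log(10)    # ordinary result
bad  <- safe_log("a")   # error handler runs, result is NA
warn <- safe_log(-1)    # warning handler runs, result is NA
```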
- November 9
nested data
- lists
- environments, search paths
- table tools debate (base R, tidyverse, data.table)
- November 16
how functions work
- functions in detail
- lazy evaluation
- debugging 2
- Wednesday, November 11th is Veterans Day Holiday
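A short sketch of lazy evaluation, one of this week's topics: R evaluates a function argument only when its value is first needed. The function names `f`, `g`, and `make_adder` are made up for the example.

```r
# The second argument is never evaluated when x > 0, so stop() never runs.
f <- function(x, y) {
  if (x > 0) "y was never evaluated" else y
}
r1 <- f(1, stop("not reached"))

# Default arguments are also lazy and are evaluated inside the function,
# so they see any changes made before their first use.
g <- function(a, b = a * 2) {
  a <- a + 1
  b   # evaluated here, after a was reassigned
}
r2 <- g(1)   # 4, not 2

# force() evaluates an argument immediately, which is useful when a
# function returns another function that captures the argument.
make_adder <- function(n) {
  force(n)
  function(x) x + n
}
add5 <- make_adder(5)
r3 <- add5(10)   # 15
```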
- November 23
Shiny dashboards
- General dashboard principles
- Intro to Shiny
- Shiny 2
- Shiny 3
- November 30
object-oriented programming
- object-oriented programming
- object-oriented programming 2
- Friday, November 27th is Thanksgiving Holiday
- December 7
high performance
- profiling
- parallel programming?
- C interfaces?
Application TBD
Some possible options: bootstrap, more ML applications, metaprogramming, testing and software engineering, matrix methods, big data. We can go deeper into any of the topics from earlier in the semester. Students can vote and pick what they find most interesting.
Programming
- recursion
Ideas for assignments
- Basic EDA
- Plotting 1
- Plotting 2
- Simulation - robust statistics?
- Messy Data
- Categorical Data
- Slicing
- Shiny 1
- Shiny 2
- Stat learning - prediction
- Stat learning - comparing models
- Programming, Testing
- Programming, Recursion
- Programming, Debugging, error handling
- Base R, tidyverse, data.table, debate