STAT 128 - Statistical Computing

This is the course webpage for my STAT 128 course at CSU Sacramento. In this class we’ll learn about data science and statistical computing using the R programming language. The course notes are available on this page. Announcements, assignment submissions, grades, discussions, lecture recordings, and anything that may identify students are available on Canvas.

  • Communication For Current Students

    Asking questions during live class is the fastest and most efficient way to communicate. Outside of class, the best ways to communicate are Discord and Canvas. For private matters, you can come to office hours or email me at fitzgerald@csus.edu.

  • How To Record Videos In Canvas

    We will share our videos with each other through Canvas discussions. This post describes several ways to upload videos and make them available to a Canvas discussion.

  • Vim Text Editors
    • Edit text files in a terminal
    • Quit Vim: :wq to save and quit, :q! to quit without saving
  • Concluding Class
    • Wrapping up class
  • Merging Data Frames
    • Merge two data frames that share a common column
  • Object Oriented Programming S3
    • Use :: and ::: to find objects in a namespace
    • Predict which object will be found based on search paths
    • Define new methods for generic functions like plot
  • Debugging
    • Write wrapper functions
    • Debug functions that you write
  • Reading Files In Directories
    • Read, write, rename, and move files in directories
  • Probability Calculations
    • Simulate data from R’s built-in distributions
    • Calculate and graph probability density / mass functions
  • Reference Semantics
    • Identify side effects in function calls
    • Predict behavior of reference objects in simple functions
  • String Processing
    • Extract patterns from strings
  • Prediction From Multivariate Data
    • Describe the purpose of record identifiers, and why they should be excluded from models
    • Identify patterns in text
  • Rpart Greedy Algorithms
    • Describe the rpart regression model at a high level
    • Interpret a printed rpart object
  • Extending The Linear Model With Computed Columns
    • Interpret the results of predict on fitted models with old and new data
    • Extend the univariate linear model with computed columns
  • Univariate Regression
    • Explain the idea of linear regression with one variable
    • Extract the coefficients of a linear model produced by lm
  • Many Ways To Subset Data
    • Describe the use cases for different ways to select subsets of data
  • Types And Coercion
    • Describe R’s type hierarchy
    • Predict behavior from implicit coercion
  • Loops
    • Write functions with for and while loops
    • Know when to write loops, and when to use vectorized functions or lapply and related functions
  • Conditional Statements
    • Write functions containing if and else statements
  • Data Science Panel 2020

    Video Q&A panel with 5 data scientists working in industry and government.

  • Split Apply Combine
    • Split data, apply functions, and combine results
  • Designing Functions
    • Evaluate code using source
    • Plan and implement functions
    • Document functions
  • Interaction Effects
    • Create and interpret contingency tables
    • Create and interpret interaction plots
  • Special Values NA, NULL, NaN
    • Compare and contrast special values Inf, NaN, NA, NULL
    • Predict propagation of NA through vectorized operations
    • Test for presence of special values
  • Introduction To ggplot2
    • Create data visualizations using ggplot2
    • Explain the idea of declarative graphics
  • Principles Of Data Visualization
    • Identify elements of plots
    • Create clear, meaningful statistical graphics
    • Contrast EDA with presentation quality graphics
  • Boolean Logic
    • Reason about vectorized boolean computations
    • Explain the difference between TRUE and T
  • R Functions
    • Predict when variables will be available in function bodies and global environment
    • Determine whether a function is vectorized
  • Workflow Tips
    • Write reproducible data analyses
    • Maintain consistency between written code and global environment
    • Ask precise questions about programming
    • Predict when variables will be available in function bodies and global environment
  • Introduction To User Defined Functions
    • Implement a function given a description of what it should do
    • Describe R’s formula syntax, y ~ x
    • Select rows of a data frame by index
  • Selecting Subsets Of Data Frames
    • Create boolean vectors using comparison operators
    • Select rows by condition
    • Reorder rows of a data frame
  • Introduction To Data Frames
    • Describe the high-level idea of data frames
    • Load a CSV file from the local file system
    • Select columns by name
    • Interpret the interactive results of class, dim, head, tail, unique, table
  • Univariate Data
    • Manipulate named vectors
    • Summarize univariate data using statistics and graphics
  • Introduction To R Markdown Reports
    • Create standalone HTML reports from R Markdown
    • Embed graphics and code into HTML reports
    • Change chunk parameters
  • R Essential Definitions
    • Define the following terms, and identify them in an R expression:
      • assignment
      • variable name
      • function call
      • argument
    • Find and interpret built-in documentation
    • Search for error messages
  • R Overview And Context
    • Describe use cases for R at a high level: interactivity, visualization, data analysis, and statistical modeling
    • Describe the difference between R and RStudio
  • Day One Activity
    • Break the ice
    • Set community norms
  • Stat128 Lecture Schedule

    Tentative schedule for lecture topics

  • Stat128 Syllabus

    Course Description: Computer methods for accessing, transforming, summarizing, graphing, and making statistical inferences from data. Focus is on open-source, command-line software, but menu-driven statistical software may be introduced. Students will learn to apply computer methods to solve problems selected from the areas of modeling, simulation, inference and statistical learning. The intent of this course is to provide students with the software skills needed for statistical work in industry or academia. 3 units.

  • Norms For Online Class Interactions

    See Communication For Current Students.

  • Resources For Learning R

    Here is a brief list of external resources for learning R.
    Many other excellent resources are available besides what I list here, and I encourage you to search for and use them.

  • Rubric for Data Science Reports

    Data scientists must communicate their results to make an impact. One way to communicate is through written reports. The components of the rubric follow, as well as a description of how you can earn full points. The point values for each of these components are listed on the assignment page, and may differ among assignments.
