Stat128 Lecture Schedule

Tentative schedule for lecture topics

Course Description: Computer methods for accessing, transforming, summarizing, graphing, and making statistical inferences from data. Focus is on open-source, command-line software, but menu-driven statistical software may be introduced. Students will learn to apply computer methods to solve problems selected from the areas of modeling, simulation, inference and statistical learning. The intent of this course is to provide students with the software skills needed for statistical work in industry or academia. 3 units.

Idea: Learn to program through simple experiments. Lectures are going to be very interactive.

  1. August 31 introduction
    • Intro activity
    • R overview
      • Design goals
      • pros and cons
      • difference between R and RStudio
      • basic plots, for example, hist()
    • R / RStudio essentials
      • calling functions
      • evaluating expressions
      • reading error messages
      • getting help
        • local help and Stack Overflow
      • variables
      • assignment
      • workspace
      • CRAN, packages, and how to use them.
  2. September 7 R Markdown
    • Monday, September 7th is Labor Day Holiday
    • R Markdown
      • package installation
      • markdown syntax
      • running code blocks
      • HTML output
    • summarizing univariate data
      • standard statistics: range, median, mean, etc.
      • histograms and bin width
      • preview filtering by condition
  3. September 14 programming essentials
    • intro to writing functions
      • plotting mathematical functions, simple examples like polynomials
    • booleans
      • boolean logic
      • comparison operators
      • filters
    • types 1
      • statistical data types vs programming types
      • type hierarchy
      • vectors
      • Special values: NA, Inf, NaN
  4. September 21 data frames
    • data frames
      • selecting columns
      • selecting rows
    • types 2
      • strings
      • dates
      • floating point numbers
  5. September 28 plots and reporting
    • producing data analysis reports for an audience
      • general viz principles
    • ggplot2
    • ggplot2 2
  6. October 5 designing functions
    • planning and designing functions
    • debugging 1
    • apply family of functions
  7. October 12 popular data packages (the goal is to expose students to what’s out there; they can use whatever they like best)
    • data.table
    • tidyverse
    • dplyr
  8. October 19 control flow
    • conditional statements
    • iteration
    • data science career panel
  9. October 26 statistical learning applications
    • statistical learning
      • overview
      • formula interface
    • decision trees
    • another ML topic
  10. November 2 utility scripting
    • scripting
      • saving scripts
      • comments (often the “why” is more important than the “what”)
      • running entire scripts
    • error handling
    • directories, file manipulation, and system administration
      • motivation
      • tree hierarchy
  11. November 9 nested data
    • lists
    • environments, search paths
    • table tools debate
  12. November 16 how functions work
    • functions in detail
      • lazy evaluation
    • debugging 2
    • Wednesday, November 11th is Veterans Day Holiday
  13. November 23 Shiny dashboards
    • General dashboard principles
      • Intro to Shiny
    • Shiny 2
    • Shiny 3
  14. November 30 object oriented programming
    • object oriented programming
    • object oriented programming 2
    • Friday, November 27th is Thanksgiving Holiday
  15. December 7 high performance
    • profiling
    • parallel programming?
    • C interfaces?

Application TBD. Some possible options: bootstrap, more ML applications, metaprogramming, testing and software engineering, matrix methods, big data. We can go deeper into any of the topics from earlier in the semester. Students can vote and pick what they find most interesting.

Programming

- recursion

Ideas for assignments

  1. Basic EDA
  2. Plotting 1
  3. Plotting 2
  4. Simulation - robust statistics?
  5. Messy Data
  6. Categorical Data
  7. Slicing
  8. Shiny 1
  9. Shiny 2
  10. Stat learning - prediction
  11. Stat learning - comparing models
  12. Programming, Testing
  13. Programming, Recursion
  14. Programming, Debugging, error handling
  15. Base R, tidyverse, data.table debate