Many Ways To Subset Data


  • Describe the use cases for different ways to select subsets of data

live notes


The midterm has two parts. The in-class part is through Canvas quizzes on Friday, October 30th: multiple choice, true/false, and similar questions. Low stakes, low pressure. The take-home part is due Sunday, November 1st. Let me know ASAP if you have technical, internet, or scheduling issues. Watch the review videos; the questions will be on those concepts.

Plan today:

  • Review mid-quarter check-in.
  • Showcase some of the unique plots from Q3 in the Roulette homework.
  • Group activity to review all the ways to subset a data frame.

Goal today is to step back and review / compare and contrast all the ways we know to select subsets of a data frame.

Group activity

Come up with all the ways you know how to select subsets of a vector or data frame. We’ll do examples with the mtcars data set, because cars are on my mind right now. 😁 We’ll take what you come up with and try to make a somewhat comprehensive list of the ways to subset a data frame.


Here’s a vector for us to play with.

x = 1:10 / 10


x[x < 0.5]


x[c(5, 9)]

And negative index:

x[-c(5, 9)]


If the vector has names we can use them.

x = seq_along(letters)
names(x) = letters

x[c("a", "b")]
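As a recap, here is a minimal sketch that runs the four kinds of vector index side by side on one small named vector. The vector `v` and its names are made up for illustration; they are not one of the class examples.

```r
# Hypothetical example vector, chosen only for illustration.
v = c(a = 0.1, b = 0.2, c = 0.3, d = 0.4, e = 0.5)

# Logical: keep the elements where the condition is TRUE.
small = v[v < 0.3]

# Positive integers: keep the elements at these positions.
picked = v[c(1, 5)]

# Negative integers: drop the elements at these positions.
dropped = v[-c(1, 5)]

# Names: keep the elements with these names.
named = v[c("a", "b")]

print(small)
print(picked)
print(dropped)
print(named)
```

Here the logical index and the name index happen to pick the same two elements, which is a reminder that there is usually more than one way to get the same subset.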

Data Frames

Recall the general form is data[rows, columns]: a row selector before the comma and a column selector after it.



Selection through logical criteria is the most generally useful for data analysis.

For example, “cars that get more than 30 mpg”:

mtcars[30 < mtcars$mpg, ]

Some variants:

subset(mtcars, 30 < mpg)

with(mtcars, mtcars[30 < mpg, ])
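To check that these variants really agree, here is a small sketch that compares their results on the built-in mtcars data. The variable names and the extra `wt < 2` condition are my own additions for illustration.

```r
# Three spellings of "cars with more than 30 mpg".
a = mtcars[30 < mtcars$mpg, ]
b = subset(mtcars, 30 < mpg)
c3 = with(mtcars, mtcars[30 < mpg, ])

# Which cars were selected?
print(rownames(a))

# all.equal() compares the data frames; isTRUE() turns that into TRUE / FALSE.
print(isTRUE(all.equal(a, b)))
print(isTRUE(all.equal(a, c3)))

# Conditions combine with & (and) and | (or).
# Illustrative extra condition: also require weight under 2 (thousand lbs).
light = mtcars[30 < mtcars$mpg & mtcars$wt < 2, ]
print(rownames(light))
```

One difference worth knowing: subset() drops rows where the condition is NA, while [ returns a row of NAs for each of them. mtcars has no missing values, so it doesn't matter here.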


# First five rows
mtcars[1:5, ]

# First four columns
mtcars[, 1:4]

# columns and rows
mtcars[1:5, 1:4]

Negative index

mtcars[-(1:5), ]

mtcars[, -(1:5)]
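A quick way to see what each integer index does is to look at the dimensions of the result. The sketch below does that, and also shows why the parentheses in -(1:5) matter. The tryCatch wrapper is only there so the script keeps running after the error.

```r
# mtcars has 32 rows and 11 columns.
print(dim(mtcars))

print(dim(mtcars[1:5, ]))      # first five rows, all columns
print(dim(mtcars[, 1:4]))      # all rows, first four columns
print(dim(mtcars[-(1:5), ]))   # everything except the first five rows
print(dim(mtcars[, -(1:5)]))   # everything except the first five columns

# Without parentheses, -1:5 means (-1):5, that is -1, 0, 1, ..., 5.
print(-1:5)

# Mixing negative and positive positions in one index is an error.
v = 1:10
result = tryCatch(v[-1:5], error = function(e) "error")
print(result)
```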


mtcars[, "mpg"]

mtcars["Honda Civic", ]

mtcars["Honda Civic", "mpg"]
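One thing to watch with name-based selection: asking for a single column gives back a plain vector, not a data frame. Here is a short sketch of that, using the drop argument of [ to keep the data frame. The variable names are mine.

```r
# A single column selected with [ , ] comes back as a plain vector.
mpg_vec = mtcars[, "mpg"]
print(class(mpg_vec))
print(length(mpg_vec))

# drop = FALSE keeps it as a one-column data frame.
mpg_df = mtcars[, "mpg", drop = FALSE]
print(class(mpg_df))
print(dim(mpg_df))

# Several names at once work for both rows and columns.
two = mtcars[c("Honda Civic", "Fiat 128"), c("mpg", "cyl")]
print(two)
```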

Then there is the $ operator.
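As a minimal sketch of what $ does, the block below pulls out one column and compares it with the other ways of getting the same column. The [[ ]] form and the variable `col` are my additions for comparison.

```r
# $ pulls out one column by name, as a vector.
mpg = mtcars$mpg
print(head(mpg))

# It gives the same result as [[ ]] and [ , ] with the full column name.
print(identical(mpg, mtcars[["mpg"]]))
print(identical(mpg, mtcars[, "mpg"]))

# $ takes the name literally, so a name stored in a variable needs [[ ]].
col = "mpg"   # hypothetical variable holding a column name
print(identical(mtcars[[col]], mpg))
print(is.null(mtcars$col))   # looks for a column called "col", finds none
```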


Will this partial match work?


Yes. Partial matching is useful in interactive situations, say when a column name has 20 characters. Check out the difference between [ and $ if we try to use a column that doesn’t exist.


mtcars[, "stat128"]
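Here is a sketch that puts the two behaviors next to each other: partial matching with $, and what [ and $ each do with a column that doesn't exist. The prefixes `mp` and `d` are my own choices for illustration, and tryCatch is only used to capture the error message so the script keeps running.

```r
# Partial matching: "mp" is a unique prefix of "mpg", so $ finds the column.
print(identical(mtcars$mp, mtcars$mpg))

# "d" is ambiguous (disp and drat both start with it), so $ gives NULL.
print(is.null(mtcars$d))

# A column that doesn't exist: $ quietly returns NULL ...
print(is.null(mtcars$stat128))

# ... while [ stops with an error.
msg = tryCatch(mtcars[, "stat128"], error = function(e) conditionMessage(e))
print(msg)
```

The quiet NULL is convenient at the console but easy to miss in a script, which is one reason to prefer [ with full names in code you plan to keep.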
