Rpart Greedy Algorithms
- Describe the rpart regression model at a high level
- Interpret a printed rpart object
123 GO – What's your word of the day?
Announcements:
References:
- rpart vignette
- regression trees from UCR
rpart stands for recursive partitioning: the data are split into two groups, and then each group is split again, recursively.
library(rpart)

n = 100
#x = sort(3 * runif(n))
x = seq(from = 0, to = 3, length.out = n)
# Step function: 1 on [0, 1), 2 on [1, 2], 3 on (2, 3].
y = rep(2, n)
y[x < 1] = 1
y[2 < x] = 3
# Add Gaussian noise around each step.
noise = rnorm(n, sd = 0.2)
y = y + noise
d = data.frame(x, y)
plot(x, y)
The model that generated these data is piecewise constant, which is exactly the form of fit an rpart regression tree produces.
Fit with the default settings:
fit1 = rpart(y ~ x, data = d)
plot(x, y)
# Overlay the tree's piecewise constant predictions.
lines(x, predict(fit1))
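One of the goals above is interpreting a printed rpart object, so it's worth looking at the fitted tree directly. Printing fit1 lists one row per node: the node number, the split rule, the number of observations n, the deviance, and the fitted value yval, with a * marking the terminal (leaf) nodes.

# Print the tree; rows marked * are leaves.
print(fit1)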
To see the control parameters for rpart:

?rpart.control
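As a quick sketch of passing these parameters explicitly (the values shown are rpart's documented defaults, and the name fit2 is just for illustration):

# Same fit, but with the control parameters spelled out.
# minsplit is the minimum node size required to attempt a split;
# cp is the complexity parameter a split must improve on to be kept.
fit2 = rpart(y ~ x, data = d,
             control = rpart.control(minsplit = 20, cp = 0.01))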
Greedy algorithms do the best they can at every step.
This means they don’t always find the best (optimal) solution.
The following plot illustrates the greedy nature of rpart.
The optimal solution with three leaf nodes is to split around 33 and 66. rpart starts out by splitting at 50.5, because that's the best first split. The best second split divides the upper half of the data, which has 51 points, so it lands around 75. The lower half has only 50 points, which with minsplit = 51 below is too few to split again.
n = 101
x = seq(n)
y = x
# minsplit = 51 means a node needs at least 51 points to be split,
# so the 50-point lower half cannot be split a second time.
fit = rpart(y ~ x, minsplit = 51)
plot(x, y)
lines(x, predict(fit))
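Printing the fitted tree shows where the greedy splits actually landed; the first should be near 50.5 and the second near 75, matching the description above.

print(fit)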
This isn’t a flaw in the algorithm- it’s just a property to be aware of. Optimal solutions can be very elusive, and in practice trees work just fine.