Special Values Na Null Nan

2 minute read

  • Compare and contrast special values Inf, NaN, NA, NULL
  • Predict propagation of NA through vectorized operations
  • Test for presence of special values

Announcements:

  • Data science career panel coming up next Friday.

live notes

References:

Inf

What does the graph of 1/x look like?

curve(1/x)

What is the value of 1/0?

oz = 1/0
?Inf

The basic rule should be that calls and relations with ‘Inf’s really are statements with a proper mathematical limit.

Your turn, predict the following:

Inf + 1
2 * Inf
-1 * Inf
mean(c(Inf, 1, 2, 3))

NaN

How about this?

Inf - Inf

NaN stands for “Not a Number”, because it’s not well defined mathematically.

How about this 0/0?

zz = 0/0

Your turn, predict the following:

NaN + 1
NaN - 1
2 * NaN
-1 * NaN
mean(c(NaN, 1, 2, 3))

If something is not well defined, then it will propagate through further operations as an unknown.

NA

head(airquality)

What are these NA values that show up in the Ozone and Solar.R columns?

class(airquality$Ozone)

It’s something numeric, but it is not the same as NaN. Any guesses?

NA represents missing data in a vector. For me personally, NA is R’s “killer feature” for data analysis. Missing values are deeply baked into everything that R does.

NA values propagate like the others:

NA + 1
2 * NA
x = c(NA, 1, 2, 3)
mean(x)

Some functions have an argument na.rm to remove NA values.

mean(x, na.rm = TRUE)

We can always manually remove them.

x2 = x[!is.na(x)]
x2

This selects the elements of x that are not NA.

mean(x2)

Imputing missing values

We might want to impute the missing values, which means to fill them in based on some computation. The easiest and most common thing to do is to replace the missing values with the mean.

Note the values we have before imputation:

table(airquality$Ozone)
hist(airquality$Ozone)

Impute with mean

o2 = airquality$Ozone
o2[is.na(o2)] = mean(o2, na.rm = TRUE)
hist(o2)
table(o2)
as.numeric(table(o2))

What do we notice about the new values? The mean got imputed.

NULL

p = plot(1:10)

What should p be? It’s not a ggplot object, all we did was change what we see on the graphics device. There’s no object at all associated with this. Yet every function must return something. We need a placeholder representing “no object”. That’s NULL.

Why do we need such an object? One use case is so we can check when it appears using is.null, and then write our code to handle these cases.

Testing for special values

Above we used is.na(x) to test for NA elements. What is wrong with the following?

x == NA

It does elementwise comparison with a missing value. Because of the propagation rules, this is always NA. Here’s the right way:

is.na(x)

Similarly, see is.nan, is.finite, is.null, etc.

Other special values

There are a few other values you might consider “special”.

0 length vectors

numeric()

Empty strings

s = ""

Summary

Sometimes arithmetic doesn’t work out. If this happens, then we might get one of the numeric special values.

  • Inf means infinity
  • NaN means “Not a Number”

There are two other more general special values.

  • NA represents missing values in a vector
  • NULL is a placeholder for ambiguous objects

Updated: