Probability Calculations


Simulate data from R’s builtin distributions
Calculate and graph probability density / mass functions

live notes

123 GO – Did the student-made review videos improve your understanding?

Announcements:

References:

https://cran.r-project.org/web/views/Distributions.html

Stand on the shoulders of giants

If X ~ Poisson(140), then the probability mass function is:

\[P(X = k) = e^{-140} \frac{140^k}{k!}\]

We can implement the formula:

pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}

It seems to work fine, and it’s even vectorized.

x = pmf_poisson140(130:143)

plot(x)

pmf_poisson140(135:145)

Uh-oh! A PMF cannot be infinite. What happened?

140^(143:144)

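Overflow. 140^144 is larger than the biggest number a double can hold (about 1.8e308), so R returns Inf, and the Inf carries through the rest of the formula.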
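
To see why the hand-coded formula breaks, it may help to compute the same PMF on the log scale, where nothing gets large enough to overflow. The helper `log_pmf_poisson140` below is a hypothetical name introduced only for this sketch; it is not part of the original notes, and it is not a claim about how `dpois` works internally.

```r
# Hypothetical helper (an assumption for illustration): Poisson(140) PMF on the log scale.
log_pmf_poisson140 = function(k){
    # log P(X = k) = -140 + k * log(140) - log(k!), using lgamma(k + 1) = log(k!)
    -140 + k * log(140) - lgamma(k + 1)
}

k = 135:145

# TRUE while 140^k still fits in a double, FALSE once it overflows to Inf
is.finite(140^k)

# Exponentiate only at the end, so every intermediate value stays small
x3 = exp(log_pmf_poisson140(k))

# Compare with the builtin
all.equal(x3, dpois(k, lambda = 140))
```

This is only a sketch of one workaround. In practice `dpois(k, lambda = 140)` already handles these values, and `dpois(k, lambda = 140, log = TRUE)` returns the log of the PMF directly.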

R has more built-in tools for calculating probabilities across different distributions than any other software I’m aware of. Use them!

Here’s a better way:

x2 = dpois(135:145, lambda = 140)
plot(x2)

What are the reasons for preferring R’s builtin probability calculations?

clarity - most important IMHO. Others can read the code and see what you intended. If you call dpois, I know you’re trying to calculate the PMF for a Poisson distribution. If you code up some formula yourself, then I either have to work through the code or rely on comments.
robust
accurate
efficient

These functions have been refined for decades. Stand on the shoulders of giants.

R probability function naming conventions

Base R has 4 different probability functions for each of the 17 distributions listed below, and external packages on CRAN have many more. The behavior of the function comes from the prefix.

Prefixes:

d probability density / mass functions
p probability (cumulative) distribution functions
q quantile functions
r random number generation
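
As a quick illustration of the four prefixes, here they are applied to the Poisson(140) distribution from earlier. The seed and the number of draws are arbitrary choices made for this sketch.

```r
lambda = 140

# d: probability mass function, P(X = 140)
dpois(140, lambda = lambda)

# p: cumulative distribution function, P(X <= 140)
ppois(140, lambda = lambda)

# q: quantile function, the smallest k with P(X <= k) >= 0.5
qpois(0.5, lambda = lambda)

# r: random number generation, five simulated values
set.seed(1)  # arbitrary seed so the draws are reproducible
draws = rpois(5, lambda = lambda)
draws
```

The p and q functions are inverses of each other in the sense that `ppois(qpois(0.5, lambda), lambda)` is at least 0.5.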

The distribution comes from the suffix.

Suffixes:

beta beta
binom binomial
cauchy Cauchy
chisq chi-squared
exp exponential
f Fisher F
gamma gamma
geom geometric
hyper hypergeometric
logis logistic
lnorm lognormal
nbinom negative binomial
norm normal
pois Poisson
t Student’s t
unif uniform
weibull Weibull

For example, if we want to calculate P(Z < -1), where Z ~ Normal(0, 1), we use the cumulative distribution function:

pnorm(-1, mean = 0, sd = 1)

pnorm(-1)

The second call relies on the defaults, mean = 0 and sd = 1. Not all distributions have defaults. Be careful with the parameterization: it may be different from the one in your textbook.
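
To make the parameterization warning concrete, here is a small sketch using the exponential distribution. Many textbooks describe it by its mean, while `pexp` takes a `rate`, which is 1 / mean. The mean of 2 is an illustrative value chosen for this example, not something from the notes above.

```r
# Exponential distribution with mean 2 (illustrative value, an assumption)
mu = 2

# pexp() expects the rate, so a mean of 2 corresponds to rate = 1 / 2
p_builtin = pexp(1, rate = 1 / mu)

# Closed form for comparison: P(X <= 1) = 1 - exp(-1 / mu)
p_formula = 1 - exp(-1 / mu)

all.equal(p_builtin, p_formula)

# Passing the mean where the rate is expected silently gives a different answer
p_wrong = pexp(1, rate = mu)
p_wrong
```

The same care applies to the normal functions, which take the standard deviation `sd` rather than the variance. Checking the help page, for example `?pexp`, before the first use is a cheap safeguard.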
