Probability Calculations
- Simulate data from R’s builtin distributions
- Calculate and graph probability density / mass functions
123 GO – Did the student-made review videos improve your understanding?
References:
- https://cran.r-project.org/web/views/Distributions.html
Stand on the shoulders of giants
If X ~ Poisson(140), then the probability mass function is:
\[P(X = k) = e^{-140} \frac{140^k}{k!}\]
We can implement the formula:
pmf_poisson140 = function(k){
  exp(-140) * 140^k / factorial(k)
}
It seems to work fine; it’s even vectorized.
x = pmf_poisson140(130:143)
plot(x)
pmf_poisson140(135:145)
Uh-oh! A PMF cannot be infinite. What happened?
140^(143:144)
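Overflow. 140^144 is about 1.1e309, which is larger than the biggest double R can store (about 1.8e308), so R returns Inf.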
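To see the mechanism more concretely, here is a minimal sketch that evaluates the same formula on the log scale, so that no huge intermediate value is ever formed. The helper name pmf_poisson140_log is hypothetical and only for illustration; it relies on the identity log(k!) = lgamma(k + 1).

```r
# Largest finite double; any result beyond this is stored as Inf.
.Machine$double.xmax

# Hypothetical helper: the Poisson(140) PMF computed on the log scale.
# lgamma(k + 1) is log(k!), so 140^k and k! are never formed directly.
pmf_poisson140_log = function(k){
  exp(-140 + k * log(140) - lgamma(k + 1))
}

k = 135:145
naive = exp(-140) * 140^k / factorial(k)   # the direct formula
stable = pmf_poisson140_log(k)

is.finite(naive)     # FALSE wherever 140^k has overflowed
is.finite(stable)    # TRUE for every k

# Where the direct formula is still finite, the two versions agree.
ok = is.finite(naive)
all.equal(naive[ok], stable[ok])
```

This only patches one formula for one distribution, and it is still hand-written code that a reader has to check.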
R has more built-in support for calculating probabilities across distributions than any other software I’m aware of. Use it!
Here’s a better way:
x2 = dpois(135:145, lambda = 140)
plot(x2)
What are the reasons for preferring R’s builtin probability calculations?
- clarity - most important IMHO. Others can read the code and see what you intended. If you call dpois, I know you’re trying to calculate the PMF for a Poisson distribution. If you code up some formula, then I either have to read the code or rely on comments.
- robust
- accurate
- efficient
These functions have been refined for decades. Stand on the shoulders of giants.
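As a small illustration of the robustness and accuracy points, the sketch below pushes dpois far into the tail, where a hand-coded formula would run into trouble. The values of k used here are arbitrary choices for illustration.

```r
# Far in the tail the probability is too small for a double and underflows to 0,
# but the log = TRUE argument still returns a usable finite log-probability.
tail_prob = dpois(1000, lambda = 140)
tail_logprob = dpois(1000, lambda = 140, log = TRUE)
tail_prob
tail_logprob

# Sanity check: the PMF sums to 1 (up to floating point) over a wide range of k.
total = sum(dpois(0:1000, lambda = 140))
all.equal(total, 1)
```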
R probability function naming conventions
Base R provides 4 kinds of probability function for each of the 17 distributions listed below, and external packages on CRAN provide many more. The prefix of the function name determines its behavior.
Prefixes:
- d: probability density / mass functions
- p: probability (cumulative) distribution functions
- q: quantile functions
- r: random number generation
The distribution comes from the suffix.
Suffixes:
- beta: beta
- binom: binomial
- cauchy: Cauchy
- chisq: chi-squared
- exp: exponential
- f: Fisher F
- gamma: gamma
- geom: geometric
- hyper: hypergeometric
- logis: logistic
- lnorm: lognormal
- nbinom: negative binomial
- norm: normal
- pois: Poisson
- t: Student’s t
- unif: uniform
- weibull: Weibull
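Putting a prefix and a suffix together gives the function name. The sketch below applies all four prefixes to the Poisson(140) example and checks how they relate to each other. The seed and the number of draws are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the simulated values are reproducible

d_val = dpois(140, lambda = 140)   # d: P(X = 140)
p_val = ppois(140, lambda = 140)   # p: P(X <= 140)
q_val = qpois(0.5, lambda = 140)   # q: smallest k with P(X <= k) >= 0.5
sims = rpois(5, lambda = 140)      # r: five simulated draws

# For a discrete distribution, the p function is the running sum of the d function.
all.equal(cumsum(dpois(0:200, lambda = 140)), ppois(0:200, lambda = 140))

# For a continuous distribution, the q function inverts the p function.
p = pnorm(1.5)
qnorm(p)   # recovers 1.5 up to floating point
```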
For example, if we want to calculate P(Z < -1), where Z ~ Normal(0, 1), we use the cumulative distribution function:
pnorm(-1, mean = 0, sd = 1)
pnorm(-1)
The second call relies on the defaults, mean = 0 and sd = 1. Not all distributions have defaults. Be careful with the parameterization: it may differ from the one in your textbook.
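A few quick checks make the parameterization warning concrete. This is a sketch with arbitrarily chosen parameter values, not an exhaustive list of pitfalls.

```r
# The gamma functions accept either a rate or a scale, with rate = 1 / scale.
# A textbook that writes Gamma(shape, scale) needs the scale argument named.
a = pgamma(2, shape = 3, rate = 2)
b = pgamma(2, shape = 3, scale = 1 / 2)
all.equal(a, b)

# The normal functions take the standard deviation, not the variance.
# For a Normal with variance 4, pass sd = 2.
v = pnorm(1, mean = 0, sd = sqrt(4))
all.equal(v, pnorm(0.5))   # same as standardizing by hand: (1 - 0) / 2

# dpois has no default for lambda, so leaving it out is an error.
res = try(dpois(3), silent = TRUE)
inherits(res, "try-error")
```

Naming the arguments, as in mean = 0 and sd = 1, costs a few keystrokes but makes the intended parameterization visible to the reader.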