Probability Calculations
- Simulate data from R’s built-in distributions
- Calculate and graph probability density / mass functions
123 GO: Did the student-made review videos improve your understanding?
Announcements:
References:
- https://cran.r-project.org/web/views/Distributions.html
Stand on the shoulders of giants
If X ~ Poisson(140), then the probability mass function is:
\[P(X = k) = e^{-140} \frac{140^k}{k!}\]
We can implement the formula:
```r
# PMF of Poisson(140), coded directly from the formula
pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}
```
It seems to work fine, and it’s even vectorized.
```r
x = pmf_poisson140(130:143)
plot(x)
```
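A quick way to check the "vectorized" claim is to compare one call on a whole vector with one call per element via `sapply`. This sketch repeats `pmf_poisson140` so it runs on its own; the range `130:133` is an arbitrary choice.

```r
# The hand-written PMF, repeated so this block is self-contained
pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}

k = 130:133

# One call on the whole vector
vectorized = pmf_poisson140(k)

# One call per element
one_at_a_time = sapply(k, pmf_poisson140)

all.equal(vectorized, one_at_a_time)  # TRUE
```

Both `^` and `factorial` work elementwise, so the function inherits vectorization without any loop.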
Now try values a bit further out:
```r
pmf_poisson140(135:145)
```
Uh-oh! A PMF cannot be infinite. What happened?
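```r
140^(143:144)
```
Overflow. 140^144 is larger than the largest number a double can hold, so R returns `Inf`, and the PMF formula passes that `Inf` along.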
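If we insisted on keeping our own formula, one general workaround is to compute on the log scale, where the huge products and factorials become sums. This is a sketch of that technique, not a claim about how `dpois` works internally; `log_pmf_poisson140` is a hypothetical helper, and `lfactorial` is base R's log of the factorial.

```r
# Largest finite double-precision number, roughly 1.8e308
.Machine$double.xmax

# log10(140^144) is about 309, past that limit, so 140^144 overflows to Inf
144 * log10(140)

# Hypothetical helper: log P(X = k) = -140 + k * log(140) - log(k!)
log_pmf_poisson140 = function(k){
    -140 + k * log(140) - lfactorial(k)
}

# Exponentiate only at the end; every value is now finite
p = exp(log_pmf_poisson140(135:145))
p
```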
R has more capabilities for calculating probabilities for various distributions than any other software I’m aware of. Use it!
Here’s a better way:
```r
x2 = dpois(135:145, lambda = 140)
plot(x2)
```
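Where the hand-written formula does not overflow, it should agree with `dpois`, and `dpois` keeps going where the formula broke. The sketch below checks both, plus the sanity check that a PMF sums to 1. Truncating the support at 1000 is an assumption; it is harmless here because Poisson(140) puts negligible mass that far out.

```r
pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}

k = 130:143

# Where the naive formula stays finite, it matches dpois
all.equal(pmf_poisson140(k), dpois(k, lambda = 140))

# dpois is still finite where the naive formula returned Inf
dpois(144:145, lambda = 140)

# The PMF sums to (essentially) 1 over 0, 1, ..., 1000
sum(dpois(0:1000, lambda = 140))
```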
What are the reasons for preferring R’s builtin probability calculations?
- clarity: most important IMHO. Others can read the code and see what you intended. If you call `dpois`, I know you’re trying to calculate the PMF for a Poisson distribution. If you code up some formula, then I either have to read the code or rely on comments.
- robust
- accurate
- efficient
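The "robust" and "accurate" points are easiest to see in the far tail. In this sketch, k = 300 is an arbitrary value chosen so that both `140^300` and `factorial(300)` overflow; the naive formula then computes `Inf / Inf`, while `dpois` still returns a small positive probability. The `log = TRUE` argument returns the log of the PMF directly.

```r
pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}

# Naive formula: Inf / Inf gives NaN (suppressWarnings hides any overflow warning)
naive = suppressWarnings(pmf_poisson140(300))
naive

# dpois returns a tiny but positive probability
dpois(300, lambda = 140)

# The log-PMF, which stays in a comfortable numeric range
dpois(300, lambda = 140, log = TRUE)
```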
These functions have been refined for decades. Stand on the shoulders of giants.
R probability function naming conventions
Base R provides 4 different kinds of probability functions for each of its built-in distributions (the common ones are listed below), and external packages on CRAN provide many more. The behavior of the function comes from the prefix.
Prefixes:

| Prefix | Meaning |
|---|---|
| `d` | probability density / mass functions |
| `p` | probability (cumulative) distribution functions |
| `q` | quantile functions |
| `r` | random number generation |
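Here are the four prefixes applied to a single suffix, `norm`, as a small sketch. The particular arguments (0, 1.96, 0.975, 1.5) and the seed are arbitrary choices for illustration.

```r
# d: density of Normal(0, 1) at 0, which equals 1 / sqrt(2 * pi), about 0.399
dnorm(0)

# p: P(Z <= 1.96), about 0.975
pnorm(1.96)

# q: the 0.975 quantile, close to 1.96
qnorm(0.975)

# q is the inverse of p
qnorm(pnorm(1.5))

# r: five random draws; set.seed makes them reproducible
set.seed(1)
rnorm(5)
```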
The distribution comes from the suffix.
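Suffixes:

| Suffix | Distribution |
|---|---|
| `beta` | beta |
| `binom` | binomial |
| `cauchy` | Cauchy |
| `chisq` | chi-squared |
| `exp` | exponential |
| `f` | Fisher F |
| `gamma` | gamma |
| `geom` | geometric |
| `hyper` | hypergeometric |
| `logis` | logistic |
| `lnorm` | lognormal |
| `nbinom` | negative binomial |
| `norm` | normal |
| `pois` | Poisson |
| `t` | Student’s t |
| `unif` | uniform |
| `weibull` | Weibull |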
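Combining a prefix with a suffix also covers simulation. The sketch below draws from Poisson(140) with `rpois` and compares the sample against exact values from `ppois`; the seed, sample size, and cutoff 130 are arbitrary. For a Poisson distribution the mean and variance both equal lambda, so both sample statistics should land near 140.

```r
set.seed(141)

# r + pois: 10,000 simulated draws from Poisson(140)
x = rpois(10000, lambda = 140)

# Sample mean and variance, both should be near lambda = 140
mean(x)
var(x)

# p + pois: exact P(X <= 130) versus the simulated proportion
ppois(130, lambda = 140)
mean(x <= 130)

# The same pattern with another suffix: P(Y <= 3) for Y ~ Binomial(10, 0.5)
pbinom(3, size = 10, prob = 0.5)  # 176 / 1024 = 0.171875
```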
For example, to calculate P(Z < -1), where Z ~ Normal(0, 1), we use the cumulative distribution function:
```r
pnorm(-1, mean = 0, sd = 1)  # explicit parameters
pnorm(-1)                    # same value, relying on the defaults
```
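Many `p` functions also accept `lower.tail = FALSE`, which returns the upper tail directly instead of computing `1 - p` by hand. The sketch below shows that and checks `pnorm(-1)` against a simulation; the seed and number of draws are arbitrary.

```r
# P(Z < -1), about 0.159
pnorm(-1)

# The upper tail P(Z > -1), computed directly
pnorm(-1, lower.tail = FALSE)

# The two tails add up to 1
pnorm(-1) + pnorm(-1, lower.tail = FALSE)

# Simulation check: the proportion of draws below -1 should be close to pnorm(-1)
set.seed(1)
z = rnorm(1e5)
mean(z < -1)
```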
In the second call we rely on the defaults. Not all distributions have defaults. Be careful with the parameterization: it may differ from your textbook’s.
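Two common parameterization differences are sketched below. R's `pexp` uses a rate, while some textbooks describe the exponential by its mean (1 / rate), and `dgamma` accepts either `rate` or `scale = 1 / rate`. The last line shows a distribution without defaults: `dpois` requires `lambda`, so `try()` catches the error. The numbers are arbitrary.

```r
# Exponential with mean 2 means rate = 1/2 in R; P(T <= 1)
pexp(1, rate = 1/2)
1 - exp(-1 / 2)  # same value from the CDF 1 - exp(-rate * t)

# dgamma: rate = 4 and scale = 1/4 describe the same distribution
dgamma(2, shape = 3, rate = 4)
dgamma(2, shape = 3, scale = 1/4)

# No default for lambda, so this is an error
try(dpois(3))
```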