Probability Calculations


  • Simulate data from R’s builtin distributions
  • Calculate and graph probability density / mass functions

live notes

123 GO – Did the student-made review videos improve your understanding?




Stand on the shoulders of giants

If X ~ Poisson(140), then the probability mass function is:

\[P(X = k) = e^{-140} \frac{140^k}{k!}\]

We can implement the formula:

pmf_poisson140 = function(k){
    exp(-140) * 140^k / factorial(k)
}

It seems to work fine; it’s even vectorized.

x = pmf_poisson140(135:145)


Uh-oh! A PMF cannot be infinite. What happened?
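One way to investigate, as a rough sketch: evaluate the pieces of the formula separately near the point where things go wrong. This assumes ordinary double precision arithmetic, where the largest finite value is about 1.8e308.

```r
# Inspect the pieces of the hand-written formula at k = 143 and k = 144.
k = c(143, 144)

140^k                    # the second value overflows to Inf
factorial(k)             # both finite, but very large
.Machine$double.xmax     # largest finite double, about 1.8e308

# Inf times a positive number, divided by a finite number, is still Inf.
exp(-140) * 140^k / factorial(k)
```

The true probabilities are perfectly ordinary numbers. Only the intermediate value 140^k is too large to represent.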



R has more capabilities for calculating probabilities across various distributions than any other software I’m aware of. Use it!

Here’s a better way

x2 = dpois(135:145, lambda = 140)

What are the reasons for preferring R’s builtin probability calculations?

  1. clarity - most important IMHO. Others can read the code and see what you intended. If you call dpois I know you’re trying to calculate the PMF for a Poisson distribution. If you code up some formula, then I either have to read the code or rely on comments.
  2. robust
  3. accurate
  4. efficient

These functions have been refined for decades. Stand on the shoulders of giants.
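To illustrate the robustness point, here is a minimal sketch. The hand-written formula can be repaired by working on the log scale, but at that point it mostly reimplements what dpois already provides. The name pmf_poisson140_log is hypothetical, made up for this example.

```r
# Hypothetical log-scale rewrite of the hand-written PMF (not part of base R).
pmf_poisson140_log = function(k){
    # log P(X = k) = -140 + k * log(140) - log(k!), and lgamma(k + 1) = log(k!)
    exp(-140 + k * log(140) - lgamma(k + 1))
}

k = 135:145

# Agrees with the builtin up to floating point tolerance.
all.equal(pmf_poisson140_log(k), dpois(k, lambda = 140))

# dpois can also return log probabilities directly, which helps far in the tails.
dpois(1000, lambda = 140, log = TRUE)
```

The builtin handles these numerical details for you, and the reader does not have to check your algebra.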

R probability function naming conventions

Base R has 4 different probability functions for each of 17 different distributions, and external packages on CRAN have many more. The behavior of the function comes from the prefix.


  • d probability density / mass functions
  • p probability (cumulative) distribution functions
  • q quantile functions
  • r random number generation
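As a quick sketch of the four prefixes in action, here they are applied to the standard normal distribution (suffix norm). The seed value is arbitrary and only there for reproducibility.

```r
set.seed(1)          # arbitrary seed, only for reproducibility

dnorm(0)             # d: density of Normal(0, 1) evaluated at 0
pnorm(1.96)          # p: cumulative probability P(Z <= 1.96)
qnorm(0.975)         # q: the value z such that P(Z <= z) = 0.975
z = rnorm(5)         # r: five simulated draws from Normal(0, 1)

# For a continuous distribution, the p and q functions are inverses.
all.equal(pnorm(qnorm(0.975)), 0.975)
```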

The distribution comes from the suffix.


  1. beta beta
  2. binom binomial
  3. cauchy Cauchy
  4. chisq chi-squared
  5. exp exponential
  6. f Fisher F
  7. gamma gamma
  8. geom geometric
  9. hyper hypergeometric
  10. logis logistic
  11. lnorm lognormal
  12. nbinom negative binomial
  13. norm normal
  14. pois Poisson
  15. t Student’s t
  16. unif uniform
  17. weibull Weibull
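The naming convention can be checked mechanically. The sketch below pastes every prefix onto every suffix from the list above and confirms that each resulting name is exported by the stats package. The variable names are just for illustration.

```r
prefixes = c("d", "p", "q", "r")
suffixes = c("beta", "binom", "cauchy", "chisq", "exp", "f", "gamma", "geom",
             "hyper", "logis", "lnorm", "nbinom", "norm", "pois", "t", "unif",
             "weibull")

# Every prefix/suffix combination: "dbeta", "pbeta", "qbeta", "rbeta", ...
fnames = as.vector(outer(prefixes, suffixes, paste0))
length(fnames)

# TRUE if every combination is a name exported by the stats package.
all(fnames %in% getNamespaceExports("stats"))
```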

For example, if we want to calculate P(Z < -1), where Z ~ Normal(0, 1), we use the cumulative distribution function:

pnorm(-1, mean = 0, sd = 1)


In this example we are using the defaults. Not all distributions have defaults. Be careful with the parameterization: it may differ from the one in your textbook.
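A few small checks make these points concrete. This is only a sketch, and the specific numbers (variance 4, mean 5) are made up for illustration.

```r
# Defaults: pnorm assumes mean = 0 and sd = 1 when they are omitted.
pnorm(-1) == pnorm(-1, mean = 0, sd = 1)

# The normal is parameterized by the standard deviation, not the variance.
# For X ~ Normal(mean 0, variance 4), pass sd = 2.
pnorm(-1, mean = 0, sd = sqrt(4))

# The exponential is parameterized by rate = 1 / mean, not by the mean.
# For an exponential distribution with mean 5:
all.equal(pexp(5, rate = 1 / 5), 1 - exp(-1))

# Some distributions have no defaults: dpois() requires lambda.
tryCatch(dpois(3), error = function(e) conditionMessage(e))
```

When in doubt, the help page (for example ?dexp or ?dgamma) states the density in terms of R's parameters.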
