Probability distributions reveal either the probability of a random variable being a particular outcome (as with discrete probability distributions) or the probability that a random variable will fall within a particular interval of outcomes (as with continuous probability distributions). In addition, probability distributions are such that the probabilities of all possible outcomes must sum to 1, and the probability corresponding to a single outcome or interval of outcomes must be between 0 and 1.
Discrete Probability Distributions
Illustration: Rolling A Fair, Six-Sided Die
There are only 6 possible outcomes if you roll a six-sided die: 1, 2, 3, 4, 5, and 6. The probability that you roll any one of these outcomes is 1/6, and the sum of the probabilities of the six different outcomes is 1. The discrete random variable here is the value of a roll.
The following graphs show the results of rolling a six-sided die 1000 times.
The frequencies and probabilities of each outcome can be determined from these graphs (e.g., rolling a 4 had a frequency of 160, meaning that 160 out of the 1000 rolls resulted in a value of 4, so rolling a 4 had a probability of about 0.16).
We can see that the CDF graph is a step function that is defined at each individual possible outcome and increases towards 1. The probability that the die rolls a 6 or lower must be 1, because this interval contains the entire set of outcome values. The cumulative probability of each outcome can be determined from this graph (e.g., rolling a 4 had a cumulative probability of approximately 0.65, meaning that the probability of rolling a 4 or lower was about 65%).
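The experiment described above can be imitated with a short simulation. This is only a sketch: the function name `simulate_rolls` and the seed are assumptions made for illustration, and the exact counts will differ from the graphs in this article because the rolls are random.

```python
import random
from collections import Counter
from itertools import accumulate

def simulate_rolls(n_rolls=1000, seed=0):
    """Roll a fair six-sided die n_rolls times.

    Returns three dicts keyed by face value: frequencies,
    relative frequencies (estimated probabilities), and the
    cumulative relative frequencies (an empirical CDF).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    faces = range(1, 7)
    freq = {face: counts[face] for face in faces}
    prob = {face: freq[face] / n_rolls for face in faces}
    # Running totals of the integer counts, then divide once per face
    running = accumulate(freq[face] for face in faces)
    cdf = {face: total / n_rolls for face, total in zip(faces, running)}
    return freq, prob, cdf

freq, prob, cdf = simulate_rolls()
for face in range(1, 7):
    print(face, freq[face], round(prob[face], 3), round(cdf[face], 3))
```

Each estimated probability should land near 1/6, and the last cumulative value is exactly 1 because every roll is 6 or lower.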
Mathematically, a discrete probability distribution can be defined as a distribution whose set of outcomes consists of discrete values, which are usually also pre-defined and finite. If X represents the discrete random variable, while x represents a possible outcome of X and j represents the set of all outcomes of X, then a discrete probability distribution is such that:
- $0 \le p(x) \le 1$, where p(x) is the probability that X = x
- $\sum_{x \in j} p(x) = 1$, where the sum of the probabilities of all possible outcomes of X is 1
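The two conditions in the list above can be checked mechanically for any table of probabilities. The sketch below assumes the distribution is stored as a dict mapping each outcome to its probability; the helper name `is_valid_pmf` and the "loaded" example are hypothetical. Exact fractions are used so that the sum can be compared to 1 without rounding error.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every p(x) lies in [0, 1] and all p(x) sum to exactly 1."""
    each_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return each_in_range and sums_to_one

# The fair die from the illustration: six outcomes, each with probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A made-up table whose probabilities add up to more than 1
broken_pmf = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 6)}

print(is_valid_pmf(die_pmf))     # True
print(is_valid_pmf(broken_pmf))  # False
```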
Continuous Probability Distributions
Illustration: Weighing Apples
The average weight of an apple is about 150 grams. However, if you were to measure a number of apples, the outcomes that you would obtain from measuring the weight of each apple would vary like so: 150.534... grams, 149.259... grams, 154.274... grams, 152.389... grams, and so on. There are an infinite number of outcomes that can emerge from these measurements, so the probability that an apple would weigh exactly 152.234... grams is zero. The continuous random variable here is the weight of an apple.
The graphs below show the results of measuring the weights of 1000 apples.
It is unreasonable to plot the frequency of each outcome individually (each outcome is unique and would only have a frequency of 1), so the frequencies must be grouped by intervals to make a histogram that can be generalized into a function. Using the function, we can calculate the probability that the random variable will fall within an interval (e.g., [a, b]) of values.
The cumulative probability graph for this illustration ends at 1, and the cumulative probability of each outcome (e.g., b) can be determined from this graph. Also, the CDF graph is steeper in the middle, where the frequencies are greater.
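The grouping of measurements into intervals can be sketched in code. The article does not say how apple weights are distributed, so this example assumes, purely for illustration, a normal distribution with mean 150 g and standard deviation 5 g; the names `simulate_weights`, `histogram`, and `interval_probability` are likewise hypothetical.

```python
import random
from statistics import NormalDist

MEAN_G, SD_G = 150.0, 5.0  # assumed parameters, not taken from the article

def simulate_weights(n=1000, seed=1):
    """Draw n simulated apple weights in grams."""
    rng = random.Random(seed)
    return [rng.gauss(MEAN_G, SD_G) for _ in range(n)]

def histogram(weights, bin_width=2.0):
    """Count the weights that fall in each interval [left, left + bin_width)."""
    counts = {}
    for w in weights:
        left = (w // bin_width) * bin_width  # left edge of the bin holding w
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

def interval_probability(weights, a, b):
    """Estimate the probability of [a, b] as the fraction of samples inside it."""
    return sum(a <= w <= b for w in weights) / len(weights)

weights = simulate_weights()
bins = histogram(weights)
estimate = interval_probability(weights, 145.0, 155.0)

# For comparison: the same interval under the assumed normal model
model = NormalDist(MEAN_G, SD_G)
exact = model.cdf(155.0) - model.cdf(145.0)
print(bins)
print(estimate, exact)
```

With 1000 samples the estimate should be close to the model value but not identical to it; more samples would narrow the gap.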
Mathematically, a continuous probability distribution can be defined as a distribution with an uncountably infinite set of outcomes. The probability that a random outcome is equal to any particular real value is zero, because the outcomes fill a continuum of possible values. Thus, probabilities can only be calculated over intervals. If X represents the continuous random variable while x represents a possible outcome of X, then a continuous probability distribution is such that:
- $p(a < x < b) = \int_a^b f(x)\,dx$, where p(a < x < b) is the probability of the interval [a, b] and f(x) is the function describing the distribution
- $p(x = a) = \int_a^a f(x)\,dx = 0$, where the probability of any single possible outcome is zero
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$, where the probabilities over the entire infinite set of outcomes total 1
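The three properties in the list above can be checked numerically for a concrete density. The sketch below again assumes a normal distribution (mean 150, standard deviation 5) as a stand-in for f(x), and uses a simple midpoint rule in a hypothetical helper named `integrate`; neither choice comes from the article.

```python
from statistics import NormalDist

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

apple = NormalDist(mu=150.0, sigma=5.0)  # assumed stand-in for f(x)

# Probability of an interval: the area under f between a and b
p_interval = integrate(apple.pdf, 145.0, 155.0)

# Probability of a single outcome: an interval of zero width has zero area
p_point = integrate(apple.pdf, 152.0, 152.0)

# Total probability: 100..200 covers ten standard deviations on each side,
# which stands in for the range from minus infinity to infinity
p_total = integrate(apple.pdf, 100.0, 200.0)

print(p_interval, p_point, p_total)
```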
Cumulative Distribution Function
Another way to define these two types of distributions is by their relationship to the cumulative distribution function F(x). A cumulative distribution function (CDF) is used to find the probability that a random variable X is less than or equal to an outcome value a.
Mathematically, we can define the CDF to be: $F(a) = p(X \le a)$, where $F(a)$ can never be negative and must stay between 0 and 1 because the CDF represents accumulating probabilities.
To find the cumulative probability of X for an outcome value a:
- for discrete probability functions, we evaluate the expression $F(a) = \sum_{x \le a} p(x)$
Therefore, discrete probability distributions must have fragmented (stepped) CDFs that change value only at each outcome. Please refer back to the section on discrete probability distributions and the CDF graph for the given illustration.
- for continuous probability functions, we evaluate the expression $F(a) = \int_{-\infty}^{a} f(x)\,dx$
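Both ways of accumulating probability can be sketched side by side. The discrete case uses the fair die from the earlier illustration; the continuous case assumes a normal density (mean 150, standard deviation 5) for the apple weights, which is an illustrative choice rather than something the article specifies. The helper names are hypothetical, and the `lower` argument is a finite stand-in for minus infinity.

```python
from fractions import Fraction
from statistics import NormalDist

def discrete_cdf(pmf, a):
    """F(a) for a discrete distribution: add up p(x) for every outcome x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

def continuous_cdf(pdf, a, lower, steps=20_000):
    """F(a) for a continuous distribution: midpoint-rule integral of pdf
    from `lower` (a finite stand-in for minus infinity) up to a."""
    if a <= lower:
        return 0.0
    width = (a - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(discrete_cdf(die_pmf, 4))    # 2/3, close to the 0.65 read off the graph
print(discrete_cdf(die_pmf, 4.5))  # still 2/3: the CDF is flat between outcomes

apple = NormalDist(mu=150.0, sigma=5.0)  # assumed density
approx = continuous_cdf(apple.pdf, 155.0, lower=100.0)
print(approx, apple.cdf(155.0))    # numerical integral vs. library CDF
```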
Probability Density Function
The probability density function (PDF) is the derivative of the cumulative distribution function, so it must integrate to 1. Mathematically:
$f(x) = \frac{d}{dx}F(x)$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
A PDF of a random variable describes the relative likelihood (density) of each point in the set of outcomes available to the random variable; actual probabilities come from integrating the PDF over an interval. The PDF can never be negative, because the CDF never decreases.
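The derivative relationship can be checked numerically by comparing the slope of a CDF with the matching PDF. This sketch assumes normal distributions from Python's `statistics.NormalDist` as examples; the helper `numerical_derivative` and the chosen parameters are illustrative only.

```python
from statistics import NormalDist

def numerical_derivative(F, x, h=1e-4):
    """Central-difference estimate of the slope of F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)

apple = NormalDist(mu=150.0, sigma=5.0)  # assumed example distribution

for x in (140.0, 150.0, 155.0):
    slope = numerical_derivative(apple.cdf, x)
    print(x, slope, apple.pdf(x))  # the two values should nearly agree

# A density is not itself a probability: for a narrow distribution
# it can exceed 1 even though its total area is still 1.
narrow = NormalDist(mu=0.0, sigma=0.1)
print(narrow.pdf(0.0))
```

The slope is largest at the mean, which matches the earlier observation that the CDF graph is steepest where the frequencies are greatest.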
Mean, Median, and Mode
- Median: The middle value of a set of outcomes
- Mode: The most commonly appearing value in a set of outcomes
- Mean (Expected Value): The average value in a set of outcomes
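- Mathematically, the mean, E(x), can be found: $E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$ for a continuous distribution with PDF f(x), or $E(x) = \sum_{x \in j} x\,p(x)$ for a discrete distribution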
For example, let's suppose that the PDF of a situation is f(x), as seen in the graph to the left.
The mean of the probability distribution is the average value of the set of outcomes. Graphically, the mean is the value at which the graph would "balance": the probability-weighted distances of the outcomes to the left of the mean and of those to the right of it are equal.
- In this case, the mean is .
The median of the probability distribution is the middle value of the set of outcomes. Thus, it is the value on the graph where the area to the right of the median and the area to the left of the median are equal. In other words, it is where the total probability of the outcomes on either side is equal to 0.5, or 50%.
- In this case, the median is .
The mode of the probability distribution is the most frequent value in the set of outcomes. Since the function f(x) reveals the probability density (relative frequency) of each outcome, the value with the highest density, which is the maximum of the graph, is the mode.
- In this case, the mode is .
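The three measures can also be computed numerically from any PDF. The density below, f(x) = 12x²(1 − x) on [0, 1], is a hypothetical example chosen because it is skewed, so the mean, median, and mode differ; it is not the PDF from this article's graph, and the helper name `summarize` is an assumption.

```python
def f(x):
    """Hypothetical skewed density on [0, 1]; zero elsewhere."""
    return 12 * x**2 * (1 - x) if 0 <= x <= 1 else 0.0

def summarize(pdf, lo, hi, steps=100_000):
    """Approximate the mean, median, and mode of pdf on [lo, hi]
    using midpoints of equal-width slices."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]

    # Mean: integral of x * f(x)
    mean = sum(x * pdf(x) for x in xs) * dx

    # Mode: the grid point where the density is highest
    mode = max(xs, key=pdf)

    # Median: the first grid point where the accumulated area reaches 0.5
    area, median = 0.0, hi
    for x in xs:
        area += pdf(x) * dx
        if area >= 0.5:
            median = x
            break
    return mean, median, mode

mean, median, mode = summarize(f, 0.0, 1.0)
print(mean, median, mode)
```

For this density the exact mean is 3/5 and the exact mode is 2/3, with the median falling between them, so the numerical results can be compared against those values.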