Probability Distributions

From Math Images

Latest revision as of 09:29, 16 June 2011

This is a Helper Page for:
Blue Wash


Probability distributions give either the probability that a random variable takes a particular value (as with discrete probability distributions) or the probability that a random variable falls within a particular interval of values (as with continuous probability distributions). In addition, the probabilities over the entire set of outcomes must sum to 1, and the probability corresponding to a single outcome or interval of outcomes must be between 0 and 1.



Discrete Probability Distributions

 Illustration: Rolling a Fair, Six-Sided Die
 
 There are only 6 possible outcomes if you roll a six-sided die: 1, 2, 3, 4, 5, and 6. The probability of rolling any one of these
 outcomes is 1/6, and the probabilities of the six different outcomes sum to 1. The discrete random variable here is the
 value of a roll.

The following graphs show the results of rolling a six-sided die 1000 times.

FrequencyDice.png ProbabilityDice.png


The frequency and probability of each outcome can be determined from these graphs (e.g., rolling a 4 had a frequency of 160, meaning that 160 of the 1000 rolls resulted in a 4; its probability was therefore about 160/1000 = 0.16).
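A minimal Python sketch of this experiment, assuming a fair die simulated with Python's `random` module. The function names and the seed are illustrative choices, and the simulated counts will not match the 160 shown in the graphs exactly.

```python
import random
from collections import Counter


def roll_die(num_rolls, seed=None):
    """Simulate num_rolls rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(num_rolls)]


def frequencies_and_probabilities(rolls):
    """Return {outcome: (frequency, empirical probability)} for outcomes 1..6."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: (counts[face], counts[face] / total) for face in range(1, 7)}


rolls = roll_die(1000, seed=1)
table = frequencies_and_probabilities(rolls)
for face, (freq, prob) in table.items():
    print(f"Rolling a {face}: frequency {freq}, probability {prob:.3f}")
```

Each empirical probability is the frequency divided by the number of rolls, so the six probabilities always sum to 1.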

CumulativeDice.png


The cumulative distribution function (CDF) graph is a step function: it jumps at each possible outcome and rises to 1. The probability of rolling a 6 or lower must be 1, because this interval contains every possible outcome. The cumulative probability of each outcome can be read from this graph (e.g., rolling a 4 had a cumulative probability of about 0.65, meaning that about 65% of the rolls were a 4 or lower).
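For a fair die, the exact cumulative probabilities are running sums of the probability 1/6 of each face. A short Python sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

# Exact probability mass function of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The CDF at face a is the running sum of p(x) over all faces x <= a.
faces = sorted(pmf)
cdf = dict(zip(faces, accumulate(pmf[face] for face in faces)))

for face in faces:
    print(f"P[X <= {face}] = {cdf[face]} ≈ {float(cdf[face]):.3f}")
```

In theory P[X ≤ 4] = 2/3 ≈ 0.667 and P[X ≤ 6] = 1; the value of about 0.65 read from the graph is the empirical result of 1000 rolls, so a small difference is expected.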


Mathematically, a discrete probability distribution can be defined as a distribution whose outcomes are separate, countable values; the set of outcomes is usually known in advance and is often finite. If X represents the discrete random variable, x represents a possible outcome of X, and j ranges over the set of all outcomes of X, then a discrete probability distribution is such that:

  • p(x) = P[X = x], where p(x) is the probability that X = x


  • \sum_j P[X = j] = 1, where the probabilities of all possible outcomes of X sum to 1

Six-Sided Die Demonstration

This applet demonstrates rolling a six-sided die while recording the outcomes.


Continuous Probability Distributions

 Illustration: Weighing Apples
 
 The average weight of an apple is about 150 grams. However, if you were to weigh a number of apples, the measured weights
 would vary like so: 150.534... grams, 149.259... grams, 154.274... grams, 152.389... grams, and so on. Because a weight can
 be any value in a continuous range, there are infinitely many possible outcomes, so the probability that an apple weighs
 exactly 152.234... grams is zero. The continuous random variable here is the weight of an apple.

The graphs below show the results of measuring the weights of 1000 apples.

HistogramApple.png FreqApple.png


It is impractical to plot the frequency of each outcome individually (each outcome is unique and would only have a frequency of 1), so the measurements are grouped into intervals to make a histogram that can be generalized into a function. Using the function, we can calculate the probability that the random variable will fall within an interval (e.g., [a, b]) of values.
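The grouping step can be sketched in Python. The weights below are hypothetical: they are drawn from a normal distribution with mean 150 g (the average from the illustration) and a standard deviation of 10 g, an assumption made only for this example, and the 5 g interval width is likewise arbitrary.

```python
import random


def simulate_weights(n, mean=150.0, sd=10.0, seed=None):
    """Hypothetical apple weights in grams (normal shape and sd are assumptions)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def histogram(values, start, width, num_bins):
    """Count values in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < num_bins:  # values outside the binned range are ignored
            counts[k] += 1
    return counts


def interval_probability(values, a, b):
    """Fraction of values in [a, b]: an estimate of P[a <= X <= b]."""
    return sum(a <= v <= b for v in values) / len(values)


weights = simulate_weights(1000, seed=2)
for k, count in enumerate(histogram(weights, start=120.0, width=5.0, num_bins=12)):
    low = 120.0 + 5.0 * k
    print(f"[{low:.0f}, {low + 5:.0f}) g: {count}")
print("Estimated P[145 <= X <= 155]:", interval_probability(weights, 145.0, 155.0))
```

Narrower intervals give a finer histogram but fewer apples per interval; the smooth function in the graphs is the limit of this idea as more data and narrower intervals are used.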

CDFApple.png


The cumulative probability graph for this illustration rises to 1, and the cumulative probability of each outcome (e.g., b) can be read from this graph. The CDF graph is also steepest in the middle, where the frequencies are greatest.


Mathematically, a continuous probability distribution can be defined as a distribution whose set of outcomes is uncountably infinite, such as an interval of real numbers. The probability that the random variable equals any single real value is zero, because a single point is an interval of zero width and encloses no area under the density curve. Thus, nonzero probabilities can only be calculated over intervals. If X represents the continuous random variable while x represents a possible outcome of X, then a continuous probability distribution is such that:

  • P[a \leq X \leq b] = \int_a^b f(x)dx, where P[a \leq X \leq b] is the probability that X falls in the interval [a, b] and f(x) is the function describing the distribution (its probability density function)


  • P[X = x] = 0, where the probability of any single possible outcome is zero


  • \int_{-\infty}^{\infty} f(x)dx = 1, where the total probability over the entire set of outcomes is 1
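These three properties can be checked numerically. The Python sketch below assumes the density f(x) = 0.5x on 0 ≤ x ≤ 2 (0 elsewhere), the same example used in the section on the mean, median, and mode, and approximates each integral with the midpoint rule, a standard numerical technique chosen here for simplicity.

```python
def f(x):
    """Example density: f(x) = 0.5x on [0, 2], and 0 elsewhere."""
    return 0.5 * x if 0.0 <= x <= 2.0 else 0.0


def integrate(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


print(integrate(f, 0.0, 2.0))  # total probability, close to 1
print(integrate(f, 0.5, 1.5))  # P[0.5 <= X <= 1.5], close to 0.5
print(integrate(f, 1.0, 1.0))  # a single point is a zero-width interval: 0.0
```

The last line mirrors the second property: shrinking the interval [a, b] down to a single point drives its probability to zero.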

Cumulative Distribution Function F(x)

Various Cumulative Probability Functions

Another way to define these two types of distributions is by their relationship to the cumulative distribution function F(x). A cumulative distribution function (CDF) is used to find the probability that a random variable X is less than or equal to an outcome value a.

Mathematically, we can define the CDF to be F(x) = P[X \leq x]. Because the CDF accumulates probabilities, F(x) never decreases as x increases and always stays between 0 and 1.

To find the cumulative probability of X for an outcome value a:

for discrete probability distributions, we evaluate the expression P[X \leq a] = \sum_{x_i \leq a} p(x_i), summing the probabilities of all outcomes x_i that are less than or equal to a

Therefore, discrete probability distributions have step-function CDFs: F(x) jumps by p(x_i) at each possible outcome x_i and stays flat in between. The cumulative probability graph for the die-rolling illustration in the section on discrete probability distributions has this shape.

for continuous probability distributions, we evaluate the expression P[X \leq a] = \int_{-\infty}^{a} f(x)dx, integrating the probability density function f(x) described below

Therefore, continuous probability distributions have continuous CDFs with no jumps. The cumulative probability graph for the apple illustration in the section on continuous probability distributions has this shape.


Probability Density Function f(x)

Corresponding Probability Density Functions: probability densities are on the y-axis and standard deviations are on the x-axis

The probability density function (PDF) of a continuous random variable describes how probability is spread over the set of outcomes: f(x) is the probability per unit of outcome near x, not the probability of the single point x (which is zero). The PDF can also be defined as the derivative of the cumulative distribution function, because the CDF accumulates the PDF from left to right. Since the CDF never decreases and rises to 1, the PDF is never negative and integrates to 1 over the entire set of outcomes.

Mathematically:

 f(x) = \frac{d}{dx}F(x) OR  F(x) = \int_{-\infty}^{x} f(s)ds
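Both directions of this relationship can be checked numerically. A minimal Python sketch, assuming the example density f(x) = 0.5x on 0 ≤ x ≤ 2 (0 elsewhere), whose CDF on that interval is F(x) = x^2/4; the step sizes are arbitrary illustrative choices.

```python
def f(x):
    """Example density f(x) = 0.5x on [0, 2], 0 elsewhere."""
    return 0.5 * x if 0.0 <= x <= 2.0 else 0.0


def F(x):
    """Its CDF in closed form: 0 below 0, x^2/4 on [0, 2], 1 above 2."""
    if x < 0.0:
        return 0.0
    if x > 2.0:
        return 1.0
    return x * x / 4.0


def cdf_by_integration(x, n=10_000):
    """F(x) as the midpoint-rule integral of f; f is 0 below 0, so start there."""
    if x <= 0.0:
        return 0.0
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def pdf_by_derivative(x, h=1e-6):
    """f(x) as the central-difference derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (0.5, 1.0, 1.5):
    print(x, F(x), cdf_by_integration(x), f(x), pdf_by_derivative(x))
```

Integrating f reproduces F, and differentiating F reproduces f, up to small numerical error.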


Mean, Median, and Mode

Median: The middle value of a set of outcomes
Mode: The most commonly appearing value in a set of outcomes
Mean (Expected Value): The average value in a set of outcomes
Mathematically, the mean, E(X), can be found:
E(X) = \sum_{j} j\,P[X = j] for discrete probability distributions, where j ranges over the set of all outcomes; when all A outcomes are equally likely, this reduces to E(X) = \frac{1}{A}\sum_{j} j
E(X) = \int_{-\infty}^{\infty} xf(x)dx for continuous probability distributions
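The discrete formula can be sketched in Python with exact fractions. The loaded die is a hypothetical example, included to show that the probability-weighted sum and the plain average (1/A)Σj agree only when the outcomes are equally likely.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(X) = sum over outcomes j of j * P[X = j]."""
    return sum(j * p for j, p in pmf.items())


fair_die = {j: Fraction(1, 6) for j in range(1, 7)}
print(expected_value(fair_die))  # 7/2, same as (1 + 2 + ... + 6) / 6

# Hypothetical loaded die: the plain average of the faces is still 7/2,
# but the probability-weighted mean is pulled toward the likely face 6.
loaded_die = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
              4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
print(expected_value(loaded_die))  # 9/2
```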

PDFgraph.png


For example, suppose that the PDF of a situation is f(x) = 0.5x, where 0 \leq x \leq 2 (and f(x) = 0 elsewhere), as seen in the graph to the left.



PDFgraphMean.png




The mean of the probability distribution is the average value of the set of outcomes, weighted by probability. Graphically, the mean is the point at which the region under the graph would "balance": the probability-weighted distances of the outcomes to the left of the mean equal those of the outcomes to the right, that is, \int (x - E(X)) f(x)dx = 0.

In this case, the mean is \frac{4}{3} \approx 1.333.

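Proof:
E(X) = \int_{0}^{2} xf(x)dx
E(X) = \int_{0}^{2} x(0.5x)dx = \int_{0}^{2} \frac{x^2}{2}dx
E(X) = \left. \frac{x^3}{6} \right|_{0}^{2} = \frac{2^3}{6} - \frac{0^3}{6}
E(X) = \frac{8}{6} = \frac{4}{3}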
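As a numerical cross-check of the proof, a short Python sketch (an illustration, not part of the original derivation) approximates the integral of x f(x) over [0, 2] with the midpoint rule; the number of subintervals is an arbitrary choice.

```python
def f(x):
    """Example density f(x) = 0.5x, used only on [0, 2]."""
    return 0.5 * x


def mean_of_density(func, a, b, n=100_000):
    """Midpoint-rule approximation of E(X) = integral of x f(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * func(x)
    return h * total


print(mean_of_density(f, 0.0, 2.0))  # close to 4/3 ≈ 1.3333
```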



PDFgraphMedian.png


The median of the probability distribution is the middle value of the set of outcomes. Thus, it is the point on the graph where the area under the graph to the left of the median equals the area to the right of it. In other words, the outcomes on either side of the median each have a total probability of 0.5 (50%).

In this case, the median is \sqrt{2} \approx 1.414 .

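Proof (let b be the median):
\int_0^b f(x)dx = \int_b^2 f(x)dx
\int_0^b \left( \frac{1}{2}x \right) dx = \int_b^2 \left( \frac{1}{2}x \right) dx
\left. \frac{1}{4}x^2 \right|_{0}^{b} = \left. \frac{1}{4}x^2 \right|_{b}^{2}
\frac{b^2}{4} - \frac{0^2}{4} = \frac{2^2}{4} - \frac{b^2}{4}
b^2 = 4 - b^2
2b^2 = 4
b^2 = 2
b = \sqrt{2}, taking the positive root because 0 \leq b \leq 2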
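The same median can be found numerically by solving F(b) = 0.5, where F(x) = x^2/4 is the CDF of this example density on [0, 2]. The Python sketch below uses bisection, a swapped-in numerical technique rather than the algebraic method of the proof; the tolerance is an arbitrary choice.

```python
import math


def F(x):
    """CDF of the example density f(x) = 0.5x on [0, 2]: F(x) = x^2 / 4."""
    return x * x / 4.0


def median_by_bisection(cdf, lo, hi, tol=1e-12):
    """Find b in [lo, hi] with cdf(b) = 0.5, assuming cdf increases there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid  # median lies to the right of mid
        else:
            hi = mid  # median lies at or to the left of mid
    return (lo + hi) / 2


b = median_by_bisection(F, 0.0, 2.0)
print(b, math.sqrt(2))
```

Bisection only needs the CDF to be increasing on the search interval, so it also works when the equation F(b) = 0.5 has no convenient closed-form solution.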


PDFgraphMode.png




The mode of the probability distribution is the most frequent value in the set of outcomes. For a continuous distribution, the function f(x) gives the probability density (relative frequency) near each outcome, so the mode is the outcome where f(x) is largest, that is, the maximum of the graph.

In this case, the mode is 2, because f(x) = 0.5x increases across 0 \leq x \leq 2 and is largest at the endpoint x = 2.
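A crude numerical version of "find the maximum of the graph" is to evaluate the density on a grid and keep the point with the largest value. The Python sketch below assumes the example density f(x) = 0.5x on [0, 2]; the grid size is arbitrary, and the result is only as precise as the grid spacing.

```python
def f(x):
    """Example density f(x) = 0.5x, used only on [0, 2]."""
    return 0.5 * x


def mode_by_grid(func, a, b, n=2000):
    """Approximate the mode as the grid point in [a, b] where func is largest."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return max(xs, key=func)


print(mode_by_grid(f, 0.0, 2.0))  # 2.0, the right endpoint
```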
