The normal distribution is a continuous probability distribution that appears naturally in applications of statistics and probability. The shape of the function forms a “bell curve”. The center of the curve is described by the mean of the distribution, denoted with the Greek letter $\mu$ (mu). The shape of the curve is described by the standard deviation of the distribution, denoted with the Greek letter $\sigma$ (sigma). The area under the curve is equal to $1$, and the probability of an event occurring between two values is equal to the area under the curve between those two values.
The probability density function (PDF) gives the general form of the normal distribution in terms of the standard deviation and mean of the distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$
|$\sigma$||The standard deviation is a constant value, denoted with the Greek letter $\sigma$ (sigma), that measures how far the population is spread around the mean of the population.|
|$\pi$||The circle constant appears in the scaling factor that ensures the area under the distribution is equal to $1$.|
|$e$||Euler's number is the base of the exponential. It gives the family of functions defined by the general equation helpful properties and makes the values of the other variables more meaningful.|
|$\mu$||The mean of the population is a constant value, denoted with the Greek letter $\mu$ (mu), that describes the center of the distribution. The bell curve is symmetrical around the mean.|
|$x$||The input variable.|
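The density can be evaluated directly from its definition. The following is a minimal Python sketch of the usual normal PDF, $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}((x-\mu)/\sigma)^2}$; the function name `normal_pdf` and its default arguments are illustrative choices, not something defined by this article.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve at x (hypothetical helper name)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Scaling factor that makes the total area under the curve equal to 1.
    scale = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # Squared distance from the mean, measured in standard deviations.
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return scale * math.exp(exponent)


# The curve peaks at the mean and is symmetric around it.
print(normal_pdf(0.0))                     # peak of the mu=0, sigma=1 curve
print(normal_pdf(-1.0) == normal_pdf(1.0))  # symmetry check
```

Note that the returned value is a density, not a probability; probabilities come from areas under the curve.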
Note: see the standard normal distribution for a simplified form of the function where the mean is $0$ and the standard deviation is $1$. Historically, because the integral of the normal distribution is difficult to calculate, this form allowed you to look up the area under the curve for standardized values.
- The area under the curve is equal to $1$.
- The mean, denoted as $\mu$ (mu), is the center of the distribution.
- The standard deviation, denoted as $\sigma$ (sigma), describes how far values are spread from the mean.
The properties of the normal distribution are visualized by the graphs given below of normal distributions with standard deviations of 1, 2 and 3 and a mean of 0. The area under each curve is divided into sections one standard deviation wide, measured from the mean, and each section is labeled with the percentage of the total area it contains.
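The percentages attached to each band of standard deviations can be reproduced numerically. The sketch below relies on the standard identity that the area within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$, independent of the mean and standard deviation; the helper name `area_within` is made up for this example.

```python
import math


def area_within(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4%}")
```

The three printed values correspond to the familiar rule of thumb of roughly 68%, 95% and 99.7%.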
The probability of an event occurring between two values, $a$ and $b$, on a probability density function is equal to the area under the curve from $a$ to $b$. For example, the probability of an event occurring within one standard deviation of the mean of a normal distribution is approximately $0.6827$. The general integral forms for calculating probability for PDFs are given below:
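|$P(X < a) = \int_{-\infty}^{a} f(x)\,dx$||The probability of an event occurring below a threshold $a$.|
|$P(X > a) = \int_{a}^{\infty} f(x)\,dx$||The probability of an event occurring above a threshold $a$.|
|$P(a < X < b) = \int_{a}^{b} f(x)\,dx$||The probability of an event occurring between $a$ and $b$.|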
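To make the three integral forms concrete, here is a rough numerical sketch that approximates each area with the trapezoidal rule. It is an illustration only: the helper names, the step count, and the choice to replace the infinite limits with a bound ten standard deviations from the mean are all assumptions made for this example.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate_pdf(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoidal approximation of the area under the PDF from a to b."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width


# Assumption: the tails beyond 10 standard deviations are negligible,
# so a finite bound stands in for the infinite limits.
FAR = 10.0

p_below = integrate_pdf(-FAR, 1.0)    # area to the left of 1
p_above = integrate_pdf(1.0, FAR)     # area to the right of 1
p_between = integrate_pdf(-1.0, 1.0)  # area between -1 and 1

print(p_below, p_above, p_between)
```

The first two results should sum to approximately 1, since together they cover the whole curve.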
In practice, these integrals prove tricky to calculate. Instead, the normal cumulative distribution function (CDF) is usually used. The normal CDF returns the area under the curve to the left of a value, which corresponds to the first case, $P(X < a)$. This alone is enough to find the other integrals. These strategies are summarized below, before defining the normal CDF.
The probability of an event occurring below a threshold $a$ is equal to the integral from negative infinity to the threshold. This is what the normal CDF returns.
The probability of an event occurring above a threshold $a$ is equal to $1$ minus the probability of the event occurring below the threshold. This is given in the equation below:

$$P(X > a) = 1 - P(X < a)$$
The probability between two values $a$ and $b$, where $a < b$, is equal to the area below $b$ minus the area below $a$. This is given in the equation below:

$$P(a < X < b) = P(X < b) - P(X < a)$$
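The three cases can be sketched with Python's standard-library `statistics.NormalDist`, whose `cdf` method plays the role of the normal CDF. The distribution parameters and the thresholds `a` and `b` below are made-up values for illustration.

```python
from statistics import NormalDist

# Illustrative distribution and thresholds (not taken from the article).
dist = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 2.0

p_below = dist.cdf(a)                 # area to the left of a
p_above = 1.0 - dist.cdf(a)           # one minus the area to the left of a
p_between = dist.cdf(b) - dist.cdf(a)  # area below b minus area below a

print(p_below, p_above, p_between)
```

Only the left-tail area is ever computed directly; the other two cases are simple arithmetic on top of it.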
There are three strategies for calculating the normal cumulative distribution function: 1) Use the equation for the normal cumulative distribution function (CDF) defined with the error function. 2) Use a statistical function such as NORM.DIST as implemented in Excel and Google Sheets. 3) Calculate the z-score of a value and look up the probability in a table of z-scores for the standard normal distribution. All three produce the same result, up to the rounding used in the table.
The general form of the cumulative distribution function (CDF), written with the error function $\operatorname{erf}$, is:

$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$

The output of the general CDF is equal to the area under the curve to the left of a value $x$ on the normal PDF. In other words, the CDF returns the integral from negative infinity to the value $x$. For example, returns the value for a distribution with a mean of and standard deviation of .
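As a minimal sketch of the error-function form of the CDF, $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$, the Python standard library's `math.erf` can be used directly. The function name `normal_cdf` and the sample inputs are assumptions chosen for this example.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal PDF to the left of x (hypothetical helper name)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Half of the area always lies to the left of the mean.
print(normal_cdf(10.0, mu=10.0, sigma=4.0))
# One standard deviation above the mean.
print(normal_cdf(14.0, mu=10.0, sigma=4.0))
```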
Shown below are some examples of the PDF and CDF of some distributions:
The normal distribution function as implemented by Excel and Google Sheets calculates the probability of an event occurring below a value $x$. The syntax of the function is given below:
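NORM.DIST(x, mean, standard_deviation, cumulative)
|x||The threshold value.|
|mean||The mean of the distribution.|
|standard_deviation||The standard deviation of the distribution.|
|cumulative||TRUE returns the cumulative distribution function (the area to the left of x); FALSE returns the probability density function.|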
For example, to compute the probability of an event occurring below $5$ for a normal distribution with a mean of $7$ and a standard deviation of $3$, the formula would be:
= NORM.DIST(5, 7, 3, TRUE)
Visually this corresponds to the area under the curve to the left of the value 5.
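For readers working outside a spreadsheet, the same calculation can be sketched in Python with the standard-library `statistics.NormalDist`. The wrapper `norm_dist` is a hypothetical helper written here only to mirror the spreadsheet argument order; it is not part of Excel, Google Sheets, or the Python library.

```python
from statistics import NormalDist


def norm_dist(x, mean, standard_deviation, cumulative):
    """Hypothetical stand-in that mirrors the NORM.DIST argument order."""
    dist = NormalDist(mu=mean, sigma=standard_deviation)
    # cumulative=True gives the area to the left of x; False gives the density at x.
    return dist.cdf(x) if cumulative else dist.pdf(x)


# Same inputs as the spreadsheet formula NORM.DIST(5, 7, 3, TRUE).
print(norm_dist(5, 7, 3, True))  # roughly 0.2525
```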
The z-score table strategy is mostly included for historical reasons, or in case you find yourself stranded on a desert island and need to calculate probabilities by hand.
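The z-score lookup can be sketched as follows. The z-score is the distance of a value from the mean measured in standard deviations, $z = (x - \mu)/\sigma$. In this sketch the printed table is replaced by the standard normal CDF from `statistics.NormalDist`, and the rounding to two decimals is an assumption about how a typical table is indexed; the helper name and input values are illustrative.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z = z_score(5.0, 7.0, 3.0)  # about -0.667
# Assumption: the table lists z to two decimal places.
z_rounded = round(z, 2)
# The standard normal CDF stands in for the table lookup.
p = NormalDist().cdf(z_rounded)

print(z, z_rounded, p)
```

Because of the rounding step, the result differs slightly from the value produced by the exact CDF.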
Euler’s number is chosen as the base of the exponential function to ensure that the family of exponential functions described by this equation inherits the helpful properties of differentiation and integration that functions of the form $e^x$ have. More importantly, it also guarantees that the other variables
The standard normal distribution is a probability density function with some unique and meaningful properties. The "standard" form is a special case of the generic normal distribution with a mean of 0 and a standard deviation of 1.
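To see where the special case comes from, substitute $\mu = 0$ and $\sigma = 1$ into the usual normal PDF, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$. The symbol $\varphi$ and the variable name $z$ below are conventional labels assumed for this sketch:

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^{2}}$$

Any value $x$ from a general normal distribution can be mapped onto this curve with $z = \frac{x - \mu}{\sigma}$.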
The standard deviation is denoted by the Greek lowercase letter sigma ($\sigma$) in the case of a population and by the Latin letter $s$ in the case of a sample.
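The difference between the two symbols can be illustrated with Python's standard library, where `statistics.pstdev` computes the population standard deviation (dividing by $N$) and `statistics.stdev` computes the sample standard deviation (dividing by $N - 1$). The data values are made up for this example.

```python
from statistics import pstdev, stdev

# Made-up data for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sigma = pstdev(data)  # population standard deviation: divides by N
s = stdev(data)       # sample standard deviation: divides by N - 1

print(sigma, s)
```

The sample value is slightly larger because it divides the same sum of squared deviations by a smaller number.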