Degrees of freedom
These notes discuss the concept of degrees of freedom. This concept arose in a strictly mathematical context relating to the freedom of points in n-dimensional space to vary within lower dimensional subspaces of the full space. In this regard, degrees of freedom are identified with dimension.
For ordinary mortals, less terrifying expositions are required. Here, the idea is introduced in the context of estimating a population or process standard deviation. Topics discussed include:
Estimating $\sigma$ from a set of measurements
    bias due to constraint on the deviations on which the estimate is based
    reduction in bias
Degrees of freedom for $X_1, X_2, \ldots, X_n$
    without and with constraints on their freedom to vary
Degrees of freedom and number of linear parameters to be estimated before estimating $\sigma$
    simple linear regression
    multiple linear regression
Degrees of freedom and the Analysis of Variance
Minitab knows the rules!
When calculating the standard deviation of a set of n numbers, say X1, X2, . . . , Xn, the formula is
$$\sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}, \qquad \text{where } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
This is frequently referred to as the population standard deviation formula because it is the formula that would be used if the set of numbers available were measurements on a complete population of n individuals, for example, in a census, where n might be millions. In that case, $\bar{X}$ is the process or population mean that is otherwise denoted by $\mu$, and the formula above may be denoted by $\sigma$.
Bias in estimating $\sigma$
When estimating a process or population standard deviation, $\sigma$, from a sample of n measurements, say X1, X2, . . . , Xn, it is conventional to use a slight variation on the formula above in which the divisor, n, is replaced by n – 1. This is because the deviations $X_i - \bar{X}$ of individual measurements from the sample mean tend to be smaller than the deviations of the $X_i$ from $\mu$, the population or process mean, which are the deviations we would ideally use. Indeed, $\bar{X}$ is the least squares estimator of $\mu$, so it is as close to the n sample values, in the least squares sense, as any one number can be. The result is that the formula with divisor n gives a value that tends to be smaller than it should be; that is, it is biased downwards.
Dividing by n – 1 instead of n helps counteract this downward bias in the estimate of $\sigma$. The resulting estimate of the population or process standard deviation, $\sigma$, is
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}.$$
This estimate for $\sigma$ is used by popular convention.
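The downward bias of the divisor n can be checked exactly on a small case. The sketch below assumes a hypothetical population of three equally likely values (not taken from these notes) and lists every possible sample of size 2 drawn with replacement. It works with the squared estimates (variances), where the correction is exact; the standard deviation itself remains slightly biased even with divisor n – 1, which is why the text above says only that the divisor helps.

```python
from fractions import Fraction
from itertools import product


def mean_squared_deviation(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor


# Hypothetical population of three equally likely values (an assumption).
population = [2, 4, 6]
mu = Fraction(sum(population), len(population))
sigma_squared = sum((x - mu) ** 2 for x in population) / len(population)

n = 2  # sample size
# Every ordered sample of size n drawn with replacement; all equally likely.
samples = list(product(population, repeat=n))

average_with_n = sum(mean_squared_deviation(s, n) for s in samples) / len(samples)
average_with_n_minus_1 = sum(mean_squared_deviation(s, n - 1) for s in samples) / len(samples)

print(sigma_squared, average_with_n, average_with_n_minus_1)  # 8/3 4/3 8/3
```

Averaged over all nine samples, the divisor n gives 4/3, well below the population value 8/3, while the divisor n – 1 recovers 8/3 exactly.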
Degrees of freedom for estimating $\sigma$
In mathematical language, n – 1 is referred to as the number of degrees of freedom associated with the n deviations, $X_i - \bar{X}$, 1 ≤ i ≤ n, which are the basic building blocks from which the standard deviation formula is constructed. This terminology arises from the fact that the sum of the deviations is 0,
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0.$$
This equation constrains the deviations such that, if values are given for any n – 1 of the deviations, the value of the remaining deviation is automatically determined; it is minus the sum of the given n – 1 values. For example,
$$X_n - \bar{X} = -\sum_{i=1}^{n-1} (X_i - \bar{X}).$$
Because one of the n deviations is determined by the other n – 1 deviations in this way, the n deviations are said to have lost one degree of freedom, so that the n deviations have only n – 1 degrees of freedom.
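A small numerical check may make the constraint concrete. The measurements below are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
# Hypothetical measurements, chosen only for illustration.
measurements = [12, 15, 11, 19, 13]
n = len(measurements)

xbar = sum(measurements) / n                   # sample mean: 70 / 5 = 14.0
deviations = [x - xbar for x in measurements]  # [-2.0, 1.0, -3.0, 5.0, -1.0]

# The deviations always sum to 0 ...
total = sum(deviations)

# ... so the last deviation is minus the sum of the other n - 1.
last_from_others = -sum(deviations[:-1])

print(total, deviations[-1], last_from_others)  # 0.0 -1.0 -1.0
```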
Explaining degrees of freedom in special cases, n = 2, 3, n
This rather abstract account may be illustrated more concretely as follows.
Consider an arbitrary pair of variables, say
( X1 , X2 ),
where each variable is free to vary independently of the other, that is, each variable may assume any value, irrespective of the value of the other. Such a pair may be said to have two degrees of freedom. However, if the pair is required to satisfy the equation
X1 + X2 = 0,
then specifying a value for one of the variables automatically determines the value for the other variable. If X1 = 2, then X2 must be –2. In that case, the pair is said to have lost one degree of freedom and so has just one degree of freedom instead of two.
The triple of variables
( X1 , X2 , X3 ),
is thought of as having three degrees of freedom. However, if the three variables also satisfy the equation
X1 + X2 + X3 = 0,
then the triple has just two degrees of freedom, having lost one degree of freedom because of the constraint. If X1 = 2 and X2 = 4, then X3 must be – 6;
X3 = – ( X1 + X2 ).
More generally, if
$$X_1 + X_2 + \cdots + X_n = 0,$$
then
$$X_n = -(X_1 + X_2 + \cdots + X_{n-1}).$$
The n variables, constrained by having to sum to 0, thus lose one degree of freedom and so have n – 1 degrees of freedom rather than the n degrees of freedom that they would have if they were unconstrained.
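The pair, triple and general cases can be expressed with one small helper. The function name `determined_value` is hypothetical, introduced here only for illustration.

```python
def determined_value(free_values):
    """Value forced on the last variable when all the variables must sum to 0."""
    return -sum(free_values)


print(determined_value([2]))     # pair: X1 = 2 forces X2 = -2
print(determined_value([2, 4]))  # triple: X1 = 2, X2 = 4 force X3 = -6

# General case: n - 1 = 4 freely chosen values fix the fifth.
free = [3, -1, 5, 2]
full = free + [determined_value(free)]
print(full, sum(full))           # [3, -1, 5, 2, -9] 0
```

Only the values passed in are free to vary, so n constrained variables carry n – 1 degrees of freedom.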
Replacing the variables X1, X2, . . . , Xn in the last paragraph by the deviations
$X_1 - \bar{X},\ X_2 - \bar{X},\ \ldots,\ X_n - \bar{X}$
and noting that the deviations do sum to 0, we conclude that the deviations lose one degree of freedom due to the constraint of summing to 0 and so have n – 1 degrees of freedom to vary.
Degrees of freedom for estimating $\sigma$
As noted earlier, given a sample of n measurements X1, X2, . . . , Xn from a population or process, we would like to be able to estimate $\sigma$ by
$$\sqrt{\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}}.$$
However, since typically we do not know $\mu$, we estimate $\mu$ in this formula by $\bar{X}$, replace n by n – 1 to adjust for the bias introduced by using $\bar{X}$, and thus end up with the estimate
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
for $\sigma$. Using the mathematical language informally, we say that we lose one degree of freedom in estimating $\mu$ by $\bar{X}$ and have n – 1 degrees of freedom remaining on which to base an estimate of $\sigma$; that is, we have the n – 1 degrees of freedom associated with the n deviations
$X_1 - \bar{X},\ X_2 - \bar{X},\ \ldots,\ X_n - \bar{X}$
on which to base an estimate of $\sigma$.
Simple linear regression
In simple linear regression, with intercept $\alpha$ and slope $\beta$, the estimate of $\sigma$ is based on the residuals
$e_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$, 1 ≤ i ≤ n.
The actual formula used is
$$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}.$$
Note that $\bar{e}$ does not occur in the sum of squares part of the formula, as it would in a normal standard deviation formula. This is because the sum of the residuals is 0 and so $\bar{e} = 0$.
Ideally, the actual values of $\alpha$ and $\beta$ would be used in calculating the residuals, in which case the divisor used would be n. However, it is their least squares estimates, $\hat{\alpha}$ and $\hat{\beta}$, that are used. Because these are chosen so that the fitted values, $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$, are as close as possible to the observed values, $Y_i$, using them means that the estimate with divisor n is biased downwards. The divisor n – 2 is used to counteract this bias.
We may think of this as using two degrees of freedom to estimate the two parameters $\alpha$ and $\beta$ in the ideal residuals, $Y_i - \alpha - \beta X_i$, leading to the least squares residuals, $e_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$. Thus, we end up having n – 2 degrees of freedom on which to base an estimate of $\sigma$.
"Losing" degrees of freedom
More technically, the mathematical derivation leading to the formulas for the least squares estimates of the intercept and slope includes two equations involving the residuals: $\sum e_i = 0$ and $\sum X_i e_i = 0$. The residuals can, in principle, take on any values. However, the first equation says that their sum must be 0, so that, once n – 1 of them are assigned values, the last one is determined as minus their sum and thus the last one is not free to vary; one degree of freedom is lost. Similarly, because of the second equation, a second degree of freedom is lost. Thus, ultimately, the residuals have n – 2 degrees of freedom.
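The two constraints on the residuals can be verified numerically. The following is a minimal sketch using hypothetical data (not from the Jobtimes case study) and the usual least squares formulas for slope and intercept; exact fractions are used so that the two sums come out as exactly 0 rather than as rounding noise.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical data, chosen only for illustration.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

xbar = Fraction(sum(X), n)
ybar = Fraction(sum(Y), n)

# Least squares estimates of the slope and intercept.
slope = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
intercept = ybar - slope * xbar

# Least squares residuals: observed value minus fitted value.
residuals = [y - intercept - slope * x for x, y in zip(X, Y)]

# The two equations that constrain the residuals.
sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(X, residuals))

# Two constraints, so n - 2 degrees of freedom remain for estimating sigma.
residual_df = n - 2
s = sqrt(sum(e ** 2 for e in residuals) / residual_df)

print(sum_e, sum_xe, residual_df)  # 0 0 3
```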
Multiple linear regression
This account extends naturally to multiple linear regression. The mathematical derivation of the regression coefficient estimates involves the solution of algebraic equations involving the residuals, as many equations as there are regression coefficients. The fact that the residuals satisfy these equations places a corresponding number of constraints on the residuals. Mathematically, they are not free to vary; they have lost as many degrees of freedom as there are regression coefficients, one for each equation. This is reflected in the formula
$$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - p}},$$
where p is the number of regression coefficients, including the intercept.
Using least squares estimates of the p regression coefficients means that the residuals are biased downwards in magnitude and so, therefore, is the sum of their squares. Using n – p as the divisor in the formula for s adjusts for this bias.
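The same check extends to several explanatory variables. The sketch below assumes hypothetical data with an intercept and two explanatory variables (p = 3) and solves the normal equations with a small hand-written elimination routine (`solve` is an illustrative helper, not a library function). It then confirms that the residuals satisfy one equation per regression coefficient.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    size = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(size):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        # Clear this column from every other row.
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * q for v, q in zip(M[r], M[col])]
    return [row[size] for row in M]


# Hypothetical data: n = 5 cases, intercept plus two explanatory variables.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 0, 1, 0, 2]
y = [3, 4, 6, 7, 11]
design = [[1, a, c] for a, c in zip(x1, x2)]
n, p = len(design), len(design[0])

# Normal equations: (X'X) coefficients = X'y.
xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
coefficients = solve(xtx, xty)

residuals = [yi - sum(c * v for c, v in zip(coefficients, row))
             for row, yi in zip(design, y)]

# One constraint per coefficient: each design column times the residuals sums to 0.
constraints = [sum(row[j] * e for row, e in zip(design, residuals)) for j in range(p)]
residual_df = n - p

print(all(c == 0 for c in constraints), residual_df)  # True 2
```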
Degrees of freedom in the analysis of variance in regression
Standard computer output of the results of a regression analysis invariably includes an Analysis of Variance table. The main component of this table that may be useful in regression is the value of the F statistic for testing the hypothesis that all the regression coefficients, excluding the intercept, are zero. This hypothesis means that the explanatory variables, the X's, actually explain nothing regarding variation in the response variable, Y. The sampling distribution of the F statistic depends on the numbers of degrees of freedom associated with its numerator and its denominator. These numbers of degrees of freedom relate to the fitted values and the residuals, respectively.
To illustrate, consider the analysis of variance table resulting from the first regression fit in the Jobtimes case study:
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       4  756055  189014  134.69  0.000
Residual Error  15   21050    1403
Total           19  777105
Degrees of freedom and the F test statistic
The calculated value of F, 134.69, is highly statistically significant, since the corresponding p-value is 0 to 3 decimal places. Alternatively, the 5% critical value for F with 4 and 15 degrees of freedom (the degrees of freedom being given in the DF column) is $F_{4,15;\,0.05} = 3.1$, which is considerably exceeded by the calculated value of 134.69.
Degrees of freedom for Residual Error
The number of degrees of freedom for "Residual Error" is 15. The 20 residuals each incorporate estimates of the 5 regression coefficients, so 5 degrees of freedom are lost due to estimation of the regression coefficients, leaving 20 – 5 = 15 degrees of freedom in the residuals for the estimation of $\sigma$.
Note that the sum of squares of the residuals, a crude measure of the variation in the residuals, is given in the SS column as 21,050. Dividing this by 15, the corresponding number of degrees of freedom, gives the value of the "Mean Square of Residual", given in the MS column as 1,403. This is the estimate of $\sigma^2$ based on these data. Note that the square root of this is the value of s, correct to 2 decimal places, that is, $s = \sqrt{21050/15} = 37.46$.
Degrees of freedom for Regression
The number of degrees of freedom for "Regression" is more subtle. The sum of squares for regression, 756,055 as given in the SS column, is actually the sum of the squares of the deviations of the fitted values, $\hat{Y}_i$, from their mean, which equals $\bar{Y}$. Given the values of the X variables, the n = 20 fitted values are determined by the values of the estimates of the five regression coefficients. This means that the n = 20 fitted values have five degrees of freedom; they have as much freedom to vary as have the five regression coefficients. Because the sum of squares involves the deviations of the fitted values from their mean, and these deviations necessarily sum to 0, the deviations have one less degree of freedom, that is, 5 – 1 = 4 in this case.
Note that the degrees of freedom for Regression and for Residual Error sum to the Total degrees of freedom, 4 + 15 = 19. The Total number of degrees of freedom corresponds to the deviations of the observed values, $Y_i$, from their mean, and so equals 20 – 1 = 19. The corresponding sum of squares, 777,105 in the SS column of the table, is the sum of squares of these deviations. Note that this sum of squares is the sum of the other two: 756,055 + 21,050 = 777,105.
Basis for the analysis of variance
This last equation is the basis for the analysis of variance. Recall from Lecture 2.2, Slide 49:
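Regression Sum of Squares measures
explained variation
Residual Sum of Squares measures
unexplained (chance) variation
Total Variation = Explained + Unexplained
In terms of sum of squares formulas,
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2.$$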
Degrees of freedom follow a corresponding equation:
n – 1 = (p – 1) + (n – p),
where p is the number of regression coefficients (including the intercept).
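The arithmetic behind the Jobtimes table can be retraced in a few lines. The sketch below takes only n = 20, p = 5 and the SS column from the table above as inputs and recomputes the remaining entries. The variable names are illustrative.

```python
from math import sqrt

# Inputs taken from the Jobtimes Analysis of Variance table.
n, p = 20, 5
ss_regression, ss_residual, ss_total = 756055, 21050, 777105

# Degrees of freedom: n - 1 = (p - 1) + (n - p).
df_regression = p - 1
df_residual = n - p
df_total = n - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
s = sqrt(ms_residual)  # estimate of sigma

print(df_regression, df_residual, df_total)             # 4 15 19
print(round(ms_regression), round(ms_residual))         # 189014 1403
print(round(f_statistic, 2), round(s, 2))               # 134.69 37.46
```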
The software knows the rules
Fortunately, there is no need to memorise all this detail; awareness of the ideas is enough. Virtually all statistical software computes the appropriate numbers of degrees of freedom and displays them in an analysis of variance table. In many cases, the software will indicate explicitly the number of degrees of freedom associated with $s = \sqrt{\sum e_i^2 / (n - p)}$, that is, the residual degrees of freedom. Minitab does not do this, so the number of residual degrees of freedom must be read from the Analysis of Variance table.