Squared deviations from the mean (SDM) result from squaring deviations. In probability theory and statistics, the definition of variance is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for analysis of variance involve the partitioning of a sum of SDM.
Background
An understanding of the computations involved is greatly enhanced by a study of the statistical value $\operatorname{E}(X^{2})$, where $\operatorname{E}$ is the expected value operator.
For a random variable $X$ with mean $\mu$ and variance $\sigma^{2}$,

$\sigma^{2} = \operatorname{E}(X^{2}) - \mu^{2}.$ [1]
(This follows by expanding the definition $\sigma^{2} = \operatorname{E}\left[(X-\mu)^{2}\right]$.) Therefore,

$\operatorname{E}(X^{2}) = \sigma^{2} + \mu^{2}.$
From the above, the following can be derived for a sample of $n$ independent observations of $X$:

$\operatorname{E}\left(\sum X^{2}\right) = n\sigma^{2} + n\mu^{2},$

$\operatorname{E}\left(\left(\sum X\right)^{2}\right) = n\sigma^{2} + n^{2}\mu^{2}.$
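These two identities can be checked numerically. The following sketch is an illustration added here, not part of the original derivation; it assumes arbitrarily chosen values μ = 2, σ = 3, and n = 5 with normally distributed samples, and estimates both expectations by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of E(sum(X^2)) = n*sigma^2 + n*mu^2 and
# E((sum X)^2) = n*sigma^2 + n^2*mu^2 for n independent observations.
# Assumed illustrative values: mu = 2, sigma = 3, n = 5, normal samples.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 2.0, 3.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))

est_sum_sq = (samples ** 2).sum(axis=1).mean()   # estimates E(sum(X^2))
est_sq_sum = (samples.sum(axis=1) ** 2).mean()   # estimates E((sum X)^2)

print(est_sum_sq, n * sigma**2 + n * mu**2)      # ~65 vs 65
print(est_sq_sum, n * sigma**2 + n**2 * mu**2)   # ~145 vs 145
```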
Sample variance
The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by n or n − 1) is most easily calculated as
$S = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}$
From the two expectations derived above, the expected value of this sum is
$\operatorname{E}(S) = n\sigma^{2} + n\mu^{2} - \frac{n\sigma^{2} + n^{2}\mu^{2}}{n},$
which implies
$\operatorname{E}(S) = (n-1)\sigma^{2}.$
This effectively proves the use of the divisor $n-1$ in the calculation of an unbiased sample estimate of $\sigma^{2}$.
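As a small illustration (not part of the original text), the sum $S$ and the resulting unbiased variance estimate can be computed directly and compared with a standard library routine; the data values below are arbitrary.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])     # arbitrary sample data
n = len(x)

S = (x ** 2).sum() - x.sum() ** 2 / n       # sum of squared deviations
sample_var = S / (n - 1)                    # unbiased estimate of sigma^2

print(sample_var, np.var(x, ddof=1))        # both print 3.7
```

Note that this one-pass formula for $S$, while convenient algebraically, can lose precision in floating-point arithmetic when the mean is large relative to the spread of the data; numerical implementations typically subtract the mean before squaring.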
Partition — analysis of variance
In the situation where data are available for $k$ different treatment groups of sizes $n_{i}$, where $i$ runs from 1 to $k$, it is assumed that the expected mean of each group is
$\operatorname{E}(\mu_{i}) = \mu + T_{i}$
and the variance of each treatment group is unchanged from the population variance $\sigma^{2}$.
Under the null hypothesis that the treatments have no effect, each of the $T_{i}$ will be zero.
It is now possible to calculate three sums of squares:
Individual
$I = \sum x^{2}$

$\operatorname{E}(I) = n\sigma^{2} + n\mu^{2}$
Treatments
$T = \sum_{i=1}^{k} \left( \left(\sum x\right)^{2} / n_{i} \right)$ (here the inner sum $\sum x$ is taken over the observations in group $i$)

$\operatorname{E}(T) = k\sigma^{2} + \sum_{i=1}^{k} n_{i}(\mu + T_{i})^{2}$

$\operatorname{E}(T) = k\sigma^{2} + n\mu^{2} + 2\mu \sum_{i=1}^{k} n_{i} T_{i} + \sum_{i=1}^{k} n_{i} T_{i}^{2}$
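The second expression for $\operatorname{E}(T)$ follows from the first by expanding the square and using $\sum_{i=1}^{k} n_{i} = n$:

$\sum_{i=1}^{k} n_{i}(\mu + T_{i})^{2} = \sum_{i=1}^{k} n_{i}\left(\mu^{2} + 2\mu T_{i} + T_{i}^{2}\right) = n\mu^{2} + 2\mu \sum_{i=1}^{k} n_{i} T_{i} + \sum_{i=1}^{k} n_{i} T_{i}^{2}.$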
Under the null hypothesis that the treatments cause no differences and all the $T_{i}$ are zero, the expectation simplifies to
$\operatorname{E}(T) = k\sigma^{2} + n\mu^{2}.$
Combination
$C = \left(\sum x\right)^{2} / n$

$\operatorname{E}(C) = \sigma^{2} + n\mu^{2}$
Sums of squared deviations
Under the null hypothesis, the difference of any pair of $I$, $T$, and $C$ does not contain any dependency on $\mu$, only $\sigma^{2}$:
$\operatorname{E}(I - C) = (n-1)\sigma^{2}$ (total squared deviations, also known as the total sum of squares)
$\operatorname{E}(T - C) = (k-1)\sigma^{2}$ (treatment squared deviations, also known as the explained sum of squares)
$\operatorname{E}(I - T) = (n-k)\sigma^{2}$ (residual squared deviations, also known as the residual sum of squares)
The constants $(n-1)$, $(k-1)$, and $(n-k)$ are normally referred to as the number of degrees of freedom.
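These three expectations can also be checked by simulation. The sketch below is an added illustration, with assumed group sizes of 3 and 2, mean 10, and standard deviation 2; it generates grouped data under the null hypothesis and averages the three differences over many trials.

```python
import numpy as np

# Monte Carlo check of E(I-C) = (n-1)*sigma^2, E(T-C) = (k-1)*sigma^2 and
# E(I-T) = (n-k)*sigma^2 under the null hypothesis (all T_i = 0).
# Assumed illustrative values: group sizes (3, 2), mu = 10, sigma = 2.
rng = np.random.default_rng(1)
sizes = [3, 2]
mu, sigma, trials = 10.0, 2.0, 100_000
n, k = sum(sizes), len(sizes)

diffs = np.zeros((trials, 3))
for t in range(trials):
    groups = [rng.normal(mu, sigma, m) for m in sizes]
    x = np.concatenate(groups)
    I = (x ** 2).sum()
    T = sum(g.sum() ** 2 / m for g, m in zip(groups, sizes))
    C = x.sum() ** 2 / n
    diffs[t] = (I - C, T - C, I - T)

print(diffs.mean(axis=0))                           # roughly [16, 4, 12]
print([(n - 1) * sigma**2,
       (k - 1) * sigma**2,
       (n - k) * sigma**2])                         # exactly [16, 4, 12]
```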
Example
In a very simple example, five observations arise from two treatments. The first treatment gives three values, 1, 2, and 3, and the second treatment gives two values, 4 and 6.
$I = \frac{1^{2}}{1} + \frac{2^{2}}{1} + \frac{3^{2}}{1} + \frac{4^{2}}{1} + \frac{6^{2}}{1} = 66$
$T = \frac{(1+2+3)^{2}}{3} + \frac{(4+6)^{2}}{2} = 12 + 50 = 62$
$C = \frac{(1+2+3+4+6)^{2}}{5} = 256/5 = 51.2$
Giving:
Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
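The same arithmetic can be reproduced in a few lines of code; this sketch simply recomputes $I$, $T$, $C$, and the partitioned sums of squares for the five observations above.

```python
import numpy as np

groups = [np.array([1.0, 2.0, 3.0]),    # first treatment
          np.array([4.0, 6.0])]         # second treatment
x = np.concatenate(groups)
n, k = len(x), len(groups)

I = (x ** 2).sum()                               # 66.0
T = sum(g.sum() ** 2 / len(g) for g in groups)   # 62.0
C = x.sum() ** 2 / n                             # 51.2

print(I - C, n - 1)   # total:     14.8 with 4 degrees of freedom
print(T - C, k - 1)   # treatment: 10.8 with 1 degree of freedom
print(I - T, n - k)   # residual:   4.0 with 3 degrees of freedom
```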
Two-way analysis of variance
See also
References
^ Mood, A. M.; Graybill, F. A. Introduction to the Theory of Statistics. McGraw-Hill.