Module 6. Analysis of variance
Lesson 21
ONE WAY CLASSIFICATION
21.1 Introduction
The t-test enables us to test the significance of the difference between two sample means. If, however, we have a number of means and wish to test the hypothesis that they are homogeneous, i.e., that there is no difference among them, the technique known as analysis of variance, developed by Professor R. A. Fisher in the 1920s, is useful. Initially the technique was used in agricultural experiments, but nowadays it is widely used in almost all branches of agricultural and animal sciences. It is used to test whether the differences between the means of three or more populations are significant or not. By using the technique of analysis of variance we can test, for example, whether the moisture contents of paneer or khoa prepared by different methods or batches differ significantly or not. Analysis of variance thus enables us to test, on the basis of sample observations, whether the means of three or more populations are significantly different. The basic purpose of the analysis of variance is therefore to test the homogeneity of several means, and the technique consists in splitting up the total variation into components due to independent factors, each component giving an estimate of the population variance. In other words, the total sum of squares is decomposed into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error.
21.2 Analysis of Variance
The term ‘Analysis of Variance’ was introduced by Prof. R. A. Fisher in the 1920s to deal with problems in the analysis of agronomical data. Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as:
(i) Assignable causes, and (ii) Chance causes.
The variation due to assignable causes can be detected and measured, whereas the variation due to chance causes is beyond human control and cannot be accounted for separately.
21.2.1 Definition
According to Prof. R. A. Fisher, Analysis of Variance (ANOVA) is the “separation of variance ascribable to one group of causes from the variance ascribable to another group.” Thus, ANOVA consists in estimating the amount of variation due to each of the independent factors (causes) separately, the remainder being due to chance factors (causes); the latter is known as experimental error or simply error. The technique of analysis of variance consists in splitting up the total variation into components due to independent factors, each component giving an estimate of the population variance. The total sum of squares is broken up into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error. Consider, for instance, an industrial problem such as the following. A factory produces components, with many machines at work on the same operation. The process is not purely mechanical, the machine operators having an influence on the quality of the output. Moreover, it is thought that on certain days of the week (e.g. Monday) the output is of poorer quality than on other days (e.g. Friday). The quality therefore depends on at least three factors: the machine, the operator and the day of the week. There may be other factors in operation, and some of the factors mentioned may have no significant effect. The technique of analysis of variance makes it possible to determine whether any of these factors, or some combination of them, has an appreciable effect on the quality, and also to estimate the contribution made by each factor to the overall variability in the production or quality of the product. Thus the purpose of the analysis is to establish relations of cause and effect.
21.2.2 Assumptions in analysis of variance
For the validity of the F-test in ANOVA, the following assumptions are made:
(i) The samples are drawn from the population randomly and independently.
(ii) The data are quantitative in nature and the parent population from which the observations are drawn is normally distributed.
(iii) Various treatments and environmental effects are additive in nature.
(iv) The populations from which the samples have been drawn have the same variance σ². This is known as homoscedasticity and can be tested by Bartlett’s test; a small check is sketched after this list.
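As an illustration only (not part of the original lesson), a minimal sketch of how assumptions (ii) and (iv) might be checked in Python, assuming SciPy is available; the three groups below are hypothetical moisture-content readings:

```python
# Illustrative sketch: checking normality and homoscedasticity with SciPy.
from scipy import stats

# Hypothetical moisture-content readings for three classes/treatments
g1 = [50.3, 52.2, 52.5, 51.7, 52.6]
g2 = [54.1, 53.7, 55.5, 54.6]
g3 = [57.5, 56.3, 55.8, 56.9, 55.8, 59.6]

# Shapiro-Wilk test of normality, applied to each class separately
for name, g in (("g1", g1), ("g2", g2), ("g3", g3)):
    w, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk W = {w:.4f}, p = {p:.4f}")

# Bartlett's test for equality of variances (homoscedasticity)
stat, p = stats.bartlett(g1, g2, g3)
print(f"Bartlett statistic = {stat:.4f}, p = {p:.4f}")
```

Large p-values in both tests give no evidence against normality or equal variances.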
21.3 One-way Analysis of Variance
The simplest type of analysis of variance is known as one-way analysis of variance, in which only one source of variation, or factor of interest, is controlled and its effect on the elementary units is observed. It is an extension, to three or more samples, of the t-test procedure used with two independent samples; in other words, the t-test for two independent samples is a special case of one-way analysis of variance. In a typical situation, one-way classification refers to the comparison of the means of several univariate normal populations, having the same unknown variance σ², on the basis of random samples selected from each population. The population means are denoted by μ1, μ2, ..., μk if there are k populations. The one-way analysis of variance is designed to test the null hypothesis:
Ho : μ1 = μ2 = ... = μk, i.e. the arithmetic means of the populations from which the k samples have been randomly drawn are equal to one another.
Let us suppose that N observations Xij (i = 1, 2, …, k; j = 1, 2, …, ni) of a random variable X are grouped, on some basis, into k classes (T1, T2, …, Tk) of sizes n1, n2, …, nk respectively, as exhibited below:
Table 21.1
Treatment | Observations | Total | Mean
T1 | X11, X12, …, X1n1 | T1. | x̄1.
T2 | X21, X22, …, X2n2 | T2. | x̄2.
… | … | … | …
Ti | Xi1, Xi2, …, Xini | Ti. | x̄i.
… | … | … | …
Tk | Xk1, Xk2, …, Xknk | Tk. | x̄k.
Grand Total | | G |
The total variation in the observations Xij can be split into the following two components:
(i) The variation between the classes or the variation due to different bases of classification, commonly known as treatments.
(ii) The variation within the classes, i.e., the inherent variation of the random variable within the observations of a class.
The first type of variation is due to assignable causes, which can be detected and controlled, while the second type is due to chance causes, which are beyond human control.
The main object of the analysis of variance technique is to examine whether there is a significant difference between the class means in view of the inherent variability within the separate classes.
In particular, let us consider the effect of k brands of yoghurt on the price of yoghurt at N shops/retail stores (of the same type), divided into k brands/classes of sizes n1, n2, …, nk respectively.
Here the sources of variation are
(i) Effect of the brands
(ii) Error ‘e’ produced by numerous causes of such magnitude that they are not detected and identified with the knowledge we have; together they produce a variation of a random nature obeying the Gaussian (Normal) law of errors.
21.3.1 Mathematical model
The linear mathematical model will be
Xij = μ + αi + eij (i=1,2,…,k) (j=1,2,…,ni)
where Xij is the value of the variate for the jth observation (j = 1, 2, …, ni) belonging to the ith class (i = 1, 2, …, k),
μ is the general mean effect,
αi is the effect due to the ith class, where αi = μi − μ, and
eij is the random error, which is assumed to be independently and normally distributed with mean zero and variance σe².
Let the means of the k populations be μ1, μ2, …, μk; then our aim is to test the null hypothesis
Ho : μ1 = μ2 = ... = μk = μ, which reduces to Ho : α1 = α2 = … = αk = 0, against
H1 : At least one pair of μi’s is not equal.
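Before turning to the calculations, here is a small illustrative sketch (hypothetical values of μ, αi and ni, not taken from the lesson; Python with NumPy assumed) of data generated according to the model Xij = μ + αi + eij:

```python
# Illustrative sketch of data generated from the model X_ij = mu + alpha_i + e_ij.
import numpy as np

rng = np.random.default_rng(1)

mu = 54.0                           # general mean effect
alpha = [-2.0, 0.0, 3.0, -1.6]      # class effects alpha_i = mu_i - mu
n_i = [5, 4, 6, 5]                  # class sizes (here sum(n_i * alpha_i) = 0)
sigma_e = 1.0                       # standard deviation of the random error

# e_ij ~ N(0, sigma_e^2), independent across observations
data = [mu + a + rng.normal(0.0, sigma_e, size=n) for a, n in zip(alpha, n_i)]
for i, x in enumerate(data, start=1):
    print(f"Class T{i}:", np.round(x, 2))
```

Each simulated class mean fluctuates around μ + αi, which is exactly what the one-way ANOVA compares.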
21.3.2 Calculation of different sums of squares
a) Total Sum of Squares (TSS) = Σi Σj Xij² − G²/N,
where G is the grand total of all the observations and N = n1 + n2 + … + nk.
The expression Σi Σj Xij², i.e., the sum of squares of all the observations, is known as the Raw Sum of Squares (R.S.S.), and the expression G²/N is called the Correction Factor (C.F.), so that TSS = R.S.S. − C.F.
b) Sum of Squares Among Classes (SSC): To find the SSC, divide the square of the total of each class by its class size (the number of observations in the class), sum these quantities, and then subtract the correction factor from this sum, i.e.,
SSC = Σi (Ti.²/ni) − C.F.,
where Ti. is the total of the observations pertaining to the ith class.
c) Sum of Squares within Classes (SSE): It is obtained by subtracting the sum of squares among classes from the total sum of squares, i.e., SSE = TSS − SSC. This sum of squares is also called the error sum of squares.
d) Mean Sum of Squares (M.S.S.): It is obtained by dividing each sum of squares by its respective degrees of freedom: the mean sum of squares among classes is SC² = SSC/(k − 1) and the mean sum of squares within classes (error) is SE² = SSE/(N − k).
e) Analysis of Variance Table
The results of the above calculations are presented in a table called Analysis of Variance or ANOVA table as follows:
Table 21.2
Source of variation | Degrees of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares (M.S.S.) | F-Ratio
Among Classes | k − 1 | SSC | SC² = SSC/(k − 1) | SC²/SE² ~ F(k − 1, N − k)
Within Classes (Error) | N − k | SSE | SE² = SSE/(N − k) |
Total | N − 1 | TSS | |
If the calculated value of F is greater than the tabulated value Fα;(k−1, N−k), where α denotes the level of significance, the hypothesis H0 is rejected and it can be inferred that the class effects are significantly different from one another.
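As a worked sketch of the sums of squares defined in 21.3.2 and of this decision rule (Python with SciPy assumed; the four groups below are the paneer data of Example 1, so the output can be checked against Table 21.5):

```python
# Minimal sketch of the one-way ANOVA arithmetic defined above,
# applied to the paneer data of Example 1 (SciPy assumed available).
from scipy import stats

groups = [
    [50.3, 52.2, 52.5, 51.7, 52.6],            # M1
    [54.1, 53.7, 55.5, 54.6],                  # M2
    [57.5, 56.3, 55.8, 56.9, 55.8, 59.6],      # M3
    [52.3, 53.2, 53.6, 53.4, 53.8],            # M4
]

k = len(groups)                                 # number of classes
N = sum(len(g) for g in groups)                 # total number of observations
G = sum(sum(g) for g in groups)                 # grand total
CF = G ** 2 / N                                 # correction factor G^2/N
RSS = sum(x ** 2 for g in groups for x in g)    # raw sum of squares
TSS = RSS - CF                                  # total sum of squares
SSC = sum(sum(g) ** 2 / len(g) for g in groups) - CF   # among classes
SSE = TSS - SSC                                 # within classes (error)

MSC, MSE = SSC / (k - 1), SSE / (N - k)         # mean sums of squares
F = MSC / MSE                                   # F-ratio

alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, k - 1, N - k)   # tabulated F value

print(f"TSS={TSS:.4f}  SSC={SSC:.4f}  SSE={SSE:.4f}")
print(f"F={F:.4f}  F_critical={F_crit:.4f}")
print("Reject H0" if F > F_crit else "Fail to reject H0")
```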
Standard Error
a) The estimated standard error of any class/treatment mean, say the ith treatment/class mean, is given by
S.E.(x̄i.) = √(SE²/ni),
where SE² is the mean sum of squares within samples, or MSS(Error).
b) The estimated standard error of the difference between the ith and jth treatment means is
SEd = √[SE²(1/ni + 1/nj)],
where ni and nj are the numbers of observations for the ith and jth treatments/classes.
c) If ni = nj = n, then the S.E. of the difference of means is
SEd = √(2SE²/n).
d) The Critical Difference (C.D.) or Least Significant Difference (L.S.D.) can be calculated as
C.D. = SEd × tα,(N−k), where α is the level of significance and (N − k) is the d.f. for error.
The treatment means are x̄i. = Ti./ni, i = 1, 2, …, k. These can be compared with the help of the critical difference: any two treatment means are said to differ significantly if their difference is larger than the critical difference (C.D.). The procedure of one-way ANOVA is illustrated through the following example:
Example 1: The following table gives the moisture contents of paneer prepared by four methods: manual (M1), mechanical with pressure 10 pound/inch² (M2), with pressure 12 pound/inch² (M3) and with pressure 15 pound/inch² (M4).
Table 21.3
Methods
M1 | M2 | M3 | M4
50.3 | 54.1 | 57.5 | 52.3
52.2 | 53.7 | 56.3 | 53.2
52.5 | 55.5 | 55.8 | 53.6
51.7 | 54.6 | 56.9 | 53.4
52.6 | | 55.8 | 53.8
 | | 59.6 |
Analyze the data to find whether the mean moisture content of paneer differs among the methods of preparation.
Solution:
Ho : μ1 = μ2 = μ3 = μ4, i.e., the mean moisture content of paneer prepared by the different methods is the same.
H1 : The mean moisture contents of paneer prepared by at least two methods are not equal.
Prepare the following table to calculate sum of squares due to different components:
Table 21.4
Methods | M1 | M2 | M3 | M4 | Total
Total (Ti.) | 259.30 | 217.90 | 341.90 | 266.30 | G = 1085.40
No. of observations (ni) | 5 | 4 | 6 | 5 | N = 20
Mean (x̄i.) | 51.8600 | 54.4750 | 56.9833 | 53.2600 |
Correction Factor (C.F.) = G²/N = (1085.40)²/20 = 58904.6580
Total Sum of Squares (TSS) = Σi Σj Xij² − C.F. = 59000.2200 − 58904.6580 = 95.5620
Sum of Squares among Classes (SSC), or Sum of Squares between Methods,
= Σi (Ti.²/ni) − C.F. = (259.30)²/5 + (217.90)²/4 + (341.90)²/6 + (266.30)²/5 − 58904.6580 = 58983.1402 − 58904.6580 = 78.4822
Sum of Squares within Classes (SSE), or Sum of Squares due to error:
SSE = TSS − SSC = 95.5620 − 78.4822 = 17.0798
Prepare the following analysis of variance table:
Table 21.5 ANOVA Table
Source of variation | Degrees of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares (M.S.S.) | F-Ratio
Among Methods | 4 − 1 = 3 | 78.4822 | 78.4822/3 = 26.1607 | 26.1607/1.0675 = 24.5068
Within Methods (Error) | 20 − 4 = 16 | 17.0798 | 17.0798/16 = 1.0675 |
Total | 20 − 1 = 19 | 95.5620 | |
From Fisher and Yates’ tables, the F value for 3 and 16 d.f. at the 5% level of significance is 3.2389. Since the observed value of F in the analysis of variance table is greater than the 5% tabulated F value, it can be inferred that the mean moisture contents of paneer prepared by the different methods differ significantly from one another.
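The same conclusion can be cross-checked with SciPy’s built-in one-way ANOVA routine (a sketch, assuming SciPy is installed); scipy.stats.f_oneway should reproduce F ≈ 24.51 with a p-value far below 0.05:

```python
# Cross-check of Example 1 with SciPy's built-in one-way ANOVA routine.
from scipy import stats

M1 = [50.3, 52.2, 52.5, 51.7, 52.6]
M2 = [54.1, 53.7, 55.5, 54.6]
M3 = [57.5, 56.3, 55.8, 56.9, 55.8, 59.6]
M4 = [52.3, 53.2, 53.6, 53.4, 53.8]

F, p = stats.f_oneway(M1, M2, M3, M4)
print(f"F = {F:.4f}, p-value = {p:.6f}")   # expect F close to 24.51, p < 0.05
```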
Calculation of critical differences for comparison among various pairs of methods of preparing paneer
Table 21.6
Methods (in descending order of mean) | M3 | M2 | M4 | M1
Mean | 56.9833 | 54.4750 | 53.2600 | 51.8600
No. of observations | 6 | 4 | 5 | 5
C.D. (for comparing mean moisture content prepared by Method 3 and Method 2) = √[1.0675 × (1/6 + 1/4)] × t0.05,16 = 0.6669 × 2.12 = 1.4138
C.D. (for comparing mean moisture content prepared by Method 2 and Method 4) = √[1.0675 × (1/4 + 1/5)] × t0.05,16 = 0.6931 × 2.12 = 1.4693
C.D. (for comparing mean moisture content prepared by Method 4 and Method 1) = √[1.0675 × (1/5 + 1/5)] × t0.05,16 = 0.6534 × 2.12 = 1.3853
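These critical differences can be reproduced in Python (a sketch, assuming SciPy; MSE = 1.0675 and the error d.f. of 16 are taken from Table 21.5, and the two-sided 5% t value for 16 d.f. is about 2.12):

```python
# Sketch reproducing the critical differences above from MSS(Error).
from math import sqrt
from scipy import stats

MSE, df_error, alpha = 1.0675, 16, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df_error)       # two-sided 5% point, about 2.12

def critical_difference(n_i, n_j):
    se_d = sqrt(MSE * (1 / n_i + 1 / n_j))          # S.E. of the mean difference
    return se_d * t_crit

print(f"CD(M3 vs M2) = {critical_difference(6, 4):.4f}")   # about 1.4138
print(f"CD(M2 vs M4) = {critical_difference(4, 5):.4f}")   # about 1.4693
print(f"CD(M4 vs M1) = {critical_difference(5, 5):.4f}")   # about 1.3853
```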
Conclusion
It can be concluded that the moisture content of paneer prepared by the different methods differs significantly, since the observed F-ratio exceeds the tabulated value. The mean moisture content was highest for method M3 (56.9833), followed by method M2 (54.4750); the difference between these two means (2.5083) exceeds the corresponding critical difference (1.4138), so M3 and M2 differ significantly from each other. The next mean moisture contents were those of method M4 (53.2600) followed by method M1 (51.8600); their difference (1.4000) exceeds the corresponding critical difference (1.3853), so M4 and M1 also differ significantly from each other.