Module 6. Analysis of variance
Lesson 21
ONE WAY CLASSIFICATION
21.1 Introduction
The t-test enables us to test the significance of the difference between two sample means. If, however, we have a number of means and wish to test the hypothesis that they are homogeneous, i.e., that there is no difference among them, the technique known as analysis of variance, developed by Professor R. A. Fisher in the 1920s, is useful. Initially the technique was used in agricultural experiments, but nowadays it is widely used in almost all branches of agricultural and animal sciences. It is used to test whether the differences between the means of three or more populations are significant or not. By using the technique of analysis of variance we can test, for example, whether the moisture contents of paneer or khoa prepared by different methods or batches differ significantly or not. Analysis of variance thus enables us to test, on the basis of sample observations, whether the means of three or more populations are significantly different. The basic purpose of the analysis of variance is therefore to test the homogeneity of several means, and the technique consists in splitting up the total variation into components due to independent factors, each component giving an estimate of the population variance. In other words, the total sum of squares is decomposed into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error.
21.2 Analysis of Variance
The term ‘Analysis of Variance’ was introduced by Prof. R. A. Fisher in the 1920s to deal with problems in the analysis of agronomical data. Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as:
(i) Assignable causes, and (ii) Chance causes.
The variation due to assignable causes can be detected and measured, whereas the variation due to chance causes is beyond human control and cannot be accounted for separately.
21.2.1 Definition
According to Prof. R. A. Fisher, Analysis of Variance (ANOVA) is the “separation of variance ascribable to one group of causes from the variance ascribable to another group.” Thus, ANOVA consists in estimating the amount of variation due to each of the independent factors (causes) separately, the remainder being due to chance factors (causes); the latter is known as experimental error or simply error. The technique of analysis of variance consists in splitting up the total variation into components due to independent factors, each component giving an estimate of the population variance. The total sum of squares is broken up into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error. Consider, for instance, an industrial problem such as the following. A factory produces components, with many machines at work on the same operation. The process is not purely mechanical, the machine operators having an influence on the quality of the output. Moreover, it is thought that on certain days of the week (e.g. Monday) the output is of poorer quality than on other days (e.g. Friday). The quality therefore depends on at least three factors: the machine, the operator and the day of the week. There may be other factors in operation, and some of the factors mentioned may have no significant effect. The technique of analysis of variance makes it possible to determine whether any of these factors, or some combination of them, has an appreciable effect on the quality, and also to estimate the contribution made by each factor to the overall variability in the production or quality of the product. Thus the purpose of the analysis is to establish relations of cause and effect.
21.2.2 Assumptions in analysis of variance
For the validity of the F-test in ANOVA, the following assumptions are made:
(i) The samples are drawn from the population randomly and independently.
(ii) The data are quantitative in nature and the parent population from which the observations are drawn is normally distributed.
(iii) Various treatments and environmental effects are additive in nature.
(iv) The populations from which the samples have been drawn have the same variance σ². This is known as homoscedasticity and can be tested by Bartlett’s test; a small check is sketched after this list.
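As an illustration only (not part of the original lesson), a minimal sketch of how assumptions (ii) and (iv) might be checked in Python, assuming SciPy is available; the three groups below are hypothetical moisture-content readings:

```python
# Illustrative sketch: checking normality and homoscedasticity with SciPy.
from scipy import stats

# Hypothetical moisture-content readings for three classes/treatments
g1 = [50.3, 52.2, 52.5, 51.7, 52.6]
g2 = [54.1, 53.7, 55.5, 54.6]
g3 = [57.5, 56.3, 55.8, 56.9, 55.8, 59.6]

# Shapiro-Wilk test of normality, applied to each class separately
for name, g in (("g1", g1), ("g2", g2), ("g3", g3)):
    w, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk W = {w:.4f}, p = {p:.4f}")

# Bartlett's test for equality of variances (homoscedasticity)
stat, p = stats.bartlett(g1, g2, g3)
print(f"Bartlett statistic = {stat:.4f}, p = {p:.4f}")
```

Large p-values in both tests give no evidence against normality or equal variances.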
21.3 One-way Analysis of Variance
The simplest type of analysis of variance is known as one-way analysis of variance, in which only one source of variation, or factor of interest, is controlled and its effect on the elementary units is observed. It is an extension, to three or more samples, of the t-test procedure used with two independent samples; in other words, the t-test for two independent samples is a special case of one-way analysis of variance. In a typical situation, one-way classification refers to the comparison of the means of several univariate normal populations, having the same unknown variance σ², on the basis of random samples selected from each population. The population means are denoted by μ1, μ2, ..., μk if there are k populations. The one-way analysis of variance is designed to test the null hypothesis:
Ho : μ1 = μ2 = ... = μk, i.e. the arithmetic means of the populations from which the k samples have been randomly drawn are equal to one another.
Let us suppose that N observations Xij (i = 1, 2, …, k; j = 1, 2, …, ni) of a random variable X are grouped, on some basis, into k classes (T1, T2, …, Tk) of sizes n1, n2, …, nk respectively, as exhibited below:
Table 21.1
Treatment | Observations | Total | Mean
T1 | X11, X12, …, X1n1 | T1. | x̄1.
T2 | X21, X22, …, X2n2 | T2. | x̄2.
… | … | … | …
Ti | Xi1, Xi2, …, Xini | Ti. | x̄i.
… | … | … | …
Tk | Xk1, Xk2, …, Xknk | Tk. | x̄k.
Grand Total | | G |
The total variation in the observations Xij can be split into the following two components:
(i) The variation between the classes or the variation due to different bases of classification, commonly known as treatments.
(ii) The variation within the classes, i.e., the inherent variation of the random variable within the observations of a class.
The first type of variation is due to assignable causes, which can be detected and controlled, while the second type is due to chance causes, which are beyond human control.
The main object of the analysis of variance technique is to examine whether there is a significant difference between the class means in view of the inherent variability within the separate classes.
In particular, let us consider the effect of k brands of yoghurt on the price of yoghurt at N shops/retail stores (of the same type), divided into k brands/classes of sizes n1, n2, …, nk respectively.
Here the sources of variation are
(i) Effect of the brands
(ii) Error ‘e’ produced by numerous causes of such magnitude that they are not detected and identified with the knowledge we have; together they produce a variation of a random nature obeying the Gaussian (Normal) law of errors.
21.3.1 Mathematical model
The linear mathematical model will be
Xij = μ + αi + eij (i=1,2,…,k) (j=1,2,…,ni)
where Xij is the value of the variate for the jth observation (j = 1, 2, …, ni) belonging to the ith class (i = 1, 2, …, k),
μ is the general mean effect,
αi is the effect due to the ith class, where αi = μi − μ, and
eij is the random error, which is assumed to be independently and normally distributed with mean zero and variance σe².
Let the means of the k populations be μ1, μ2, …, μk; then our aim is to test the null hypothesis
Ho : μ1 = μ2 = ... = μk = μ, which reduces to Ho : α1 = α2 = … = αk = 0, against
H1 : At least one pair of μi’s is not equal.
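Before turning to the calculations, here is a small illustrative sketch (hypothetical values of μ, αi and ni, not taken from the lesson; Python with NumPy assumed) of data generated according to the model Xij = μ + αi + eij:

```python
# Illustrative sketch of data generated from the model X_ij = mu + alpha_i + e_ij.
import numpy as np

rng = np.random.default_rng(1)

mu = 54.0                           # general mean effect
alpha = [-2.0, 0.0, 3.0, -1.6]      # class effects alpha_i = mu_i - mu
n_i = [5, 4, 6, 5]                  # class sizes (here sum(n_i * alpha_i) = 0)
sigma_e = 1.0                       # standard deviation of the random error

# e_ij ~ N(0, sigma_e^2), independent across observations
data = [mu + a + rng.normal(0.0, sigma_e, size=n) for a, n in zip(alpha, n_i)]
for i, x in enumerate(data, start=1):
    print(f"Class T{i}:", np.round(x, 2))
```

Each simulated class mean fluctuates around μ + αi, which is exactly what the one-way ANOVA compares.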
21.3.2 Calculation of different sums of squares
a) Total Sum of Squares (TSS) = Σi Σj Xij² − G²/N,
where G is the grand total of all the observations and N = n1 + n2 + … + nk.
The expression Σi Σj Xij², i.e., the sum of squares of all the observations, is known as the Raw Sum of Squares (R.S.S.), and the expression G²/N is called the Correction Factor (C.F.), so that TSS = R.S.S. − C.F.
b) Sum of Squares Among Classes (SSC): To find the SSC, divide the square of the total of each class by its class size (the number of observations in the class), sum these quantities, and then subtract the correction factor from this sum, i.e.,
SSC = Σi (Ti.²/ni) − C.F.,
where Ti. is the total of the observations pertaining to the ith class.
c) Sum of Squares within Classes (SSE): It is obtained by subtracting the sum of squares among classes from the total sum of squares, i.e., SSE = TSS − SSC. This sum of squares is also called the error sum of squares.
d) Mean Sum of Squares (M.S.S.): It is obtained by dividing each sum of squares by its respective degrees of freedom: the mean sum of squares among classes is SC² = SSC/(k − 1) and the mean sum of squares within classes (error) is SE² = SSE/(N − k).
e) Analysis of Variance Table
The results of the above calculations are presented in a table called Analysis of Variance or ANOVA table as follows:
Table 21.2
Source of variation | Degrees of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares (M.S.S.) | F-Ratio
Among Classes | k − 1 | SSC | SC² = SSC/(k − 1) | SC²/SE² ~ F(k − 1, N − k)
Within Classes (Error) | N − k | SSE | SE² = SSE/(N − k) |
Total | N − 1 | TSS | |
If the calculated value of F is greater than the tabulated value Fα;(k−1, N−k), where α denotes the level of significance, the hypothesis H0 is rejected and it can be inferred that the class effects are significantly different from one another.
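As a worked sketch of the sums of squares defined in 21.3.2 and of this decision rule (Python with SciPy assumed; the four groups below are the paneer data of Example 1, so the output can be checked against Table 21.5):

```python
# Minimal sketch of the one-way ANOVA arithmetic defined above,
# applied to the paneer data of Example 1 (SciPy assumed available).
from scipy import stats

groups = [
    [50.3, 52.2, 52.5, 51.7, 52.6],            # M1
    [54.1, 53.7, 55.5, 54.6],                  # M2
    [57.5, 56.3, 55.8, 56.9, 55.8, 59.6],      # M3
    [52.3, 53.2, 53.6, 53.4, 53.8],            # M4
]

k = len(groups)                                 # number of classes
N = sum(len(g) for g in groups)                 # total number of observations
G = sum(sum(g) for g in groups)                 # grand total
CF = G ** 2 / N                                 # correction factor G^2/N
RSS = sum(x ** 2 for g in groups for x in g)    # raw sum of squares
TSS = RSS - CF                                  # total sum of squares
SSC = sum(sum(g) ** 2 / len(g) for g in groups) - CF   # among classes
SSE = TSS - SSC                                 # within classes (error)

MSC, MSE = SSC / (k - 1), SSE / (N - k)         # mean sums of squares
F = MSC / MSE                                   # F-ratio

alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, k - 1, N - k)   # tabulated F value

print(f"TSS={TSS:.4f}  SSC={SSC:.4f}  SSE={SSE:.4f}")
print(f"F={F:.4f}  F_critical={F_crit:.4f}")
print("Reject H0" if F > F_crit else "Fail to reject H0")
```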
Standard Error
a) The estimated standard error of any class/treatment mean, say the ith treatment/class mean, is given by
S.E.(x̄i.) = √(SE²/ni),
where SE² is the mean sum of squares within samples, or MSS(Error).
b) The estimated standard error of the difference between the ith and jth treatment means is
SEd = √[SE²(1/ni + 1/nj)],
where ni and nj are the numbers of observations for the ith and jth treatments/classes.
c) If ni = nj = n, then the S.E. of the difference of means is
SEd = √(2SE²/n).
d) The Critical Difference (C.D.) or Least Significant Difference (L.S.D.) can be calculated as
C.D. = SEd × tα,(N−k), where α is the level of significance and (N − k) is the d.f. for error.
The treatment means are x̄i. = Ti./ni, i = 1, 2, …, k. These can be compared with the help of the critical difference: any two treatment means are said to differ significantly if their difference is larger than the critical difference (C.D.). The procedure of one-way ANOVA is illustrated through the following example:
Example 1: The following table gives the moisture contents of paneer prepared by four methods: manual (M1), mechanical with pressure 10 pound/inch² (M2), with pressure 12 pound/inch² (M3) and with pressure 15 pound/inch² (M4).
Table 21.3
Methods
M1 | M2 | M3 | M4
50.3 | 54.1 | 57.5 | 52.3
52.2 | 53.7 | 56.3 | 53.2
52.5 | 55.5 | 55.8 | 53.6
51.7 | 54.6 | 56.9 | 53.4
52.6 | | 55.8 | 53.8
 | | 59.6 |
Analyze the data to find whether the mean moisture content of paneer differs among the methods of preparation.
Solution:
Ho : μ1 = μ2 = μ3 = μ4, i.e., the mean moisture content of paneer prepared by the different methods is the same.
H1 : The mean moisture contents of paneer prepared by at least two methods are not equal.
Prepare the following table to calculate sum of squares due to different components:
Table 21.4
Methods | M1 | M2 | M3 | M4 | Total
Total (Ti.) | 259.30 | 217.90 | 341.90 | 266.30 | G = 1085.40
No. of observations (ni) | 5 | 4 | 6 | 5 | N = 20
Mean (x̄i.) | 51.8600 | 54.4750 | 56.9833 | 53.2600 |
Correction Factor (C.F.) = G²/N = (1085.40)²/20 = 58904.6580
Total Sum of Squares (TSS) = Σi Σj Xij² − C.F. = 59000.2200 − 58904.6580 = 95.5620
Sum of Squares among Classes (SSC), or Sum of Squares between Methods,
= Σi (Ti.²/ni) − C.F. = (259.30)²/5 + (217.90)²/4 + (341.90)²/6 + (266.30)²/5 − 58904.6580 = 58983.1402 − 58904.6580 = 78.4822
Sum of Squares within Classes (SSE), or Sum of Squares due to error:
SSE = TSS − SSC = 95.5620 − 78.4822 = 17.0798
Prepare the following analysis of variance table:
Table 21.5 ANOVA Table
Source of variation | Degrees of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares (M.S.S.) | F-Ratio
Among Methods | 4 − 1 = 3 | 78.4822 | 78.4822/3 = 26.1607 | 26.1607/1.0675 = 24.5068
Within Methods (Error) | 20 − 4 = 16 | 17.0798 | 17.0798/16 = 1.0675 |
Total | 20 − 1 = 19 | 95.5620 | |
From Fisher and Yates’ tables, the F value for 3 and 16 d.f. at the 5% level of significance is 3.2389. Since the observed value of F in the analysis of variance table is greater than the 5% tabulated F value, it can be inferred that the mean moisture contents of paneer prepared by the different methods differ significantly from one another.
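The same conclusion can be cross-checked with SciPy’s built-in one-way ANOVA routine (a sketch, assuming SciPy is installed); scipy.stats.f_oneway should reproduce F ≈ 24.51 with a p-value far below 0.05:

```python
# Cross-check of Example 1 with SciPy's built-in one-way ANOVA routine.
from scipy import stats

M1 = [50.3, 52.2, 52.5, 51.7, 52.6]
M2 = [54.1, 53.7, 55.5, 54.6]
M3 = [57.5, 56.3, 55.8, 56.9, 55.8, 59.6]
M4 = [52.3, 53.2, 53.6, 53.4, 53.8]

F, p = stats.f_oneway(M1, M2, M3, M4)
print(f"F = {F:.4f}, p-value = {p:.6f}")   # expect F close to 24.51, p < 0.05
```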
Calculation of critical differences for comparison among various pairs of methods of preparing paneer
Table 21.6
Methods (in descending order of mean) | M3 | M2 | M4 | M1
Mean | 56.9833 | 54.4750 | 53.2600 | 51.8600
No. of observations | 6 | 4 | 5 | 5
C.D. (for comparing mean moisture content prepared by Method 3 and Method 2) = √[1.0675 × (1/6 + 1/4)] × t0.05,16 = 0.6669 × 2.12 = 1.4138
C.D. (for comparing mean moisture content prepared by Method 2 and Method 4) = √[1.0675 × (1/4 + 1/5)] × t0.05,16 = 0.6931 × 2.12 = 1.4693
C.D. (for comparing mean moisture content prepared by Method 4 and Method 1) = √[1.0675 × (1/5 + 1/5)] × t0.05,16 = 0.6534 × 2.12 = 1.3853
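These critical differences can be reproduced in Python (a sketch, assuming SciPy; MSE = 1.0675 and the error d.f. of 16 are taken from Table 21.5, and the two-sided 5% t value for 16 d.f. is about 2.12):

```python
# Sketch reproducing the critical differences above from MSS(Error).
from math import sqrt
from scipy import stats

MSE, df_error, alpha = 1.0675, 16, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df_error)       # two-sided 5% point, about 2.12

def critical_difference(n_i, n_j):
    se_d = sqrt(MSE * (1 / n_i + 1 / n_j))          # S.E. of the mean difference
    return se_d * t_crit

print(f"CD(M3 vs M2) = {critical_difference(6, 4):.4f}")   # about 1.4138
print(f"CD(M2 vs M4) = {critical_difference(4, 5):.4f}")   # about 1.4693
print(f"CD(M4 vs M1) = {critical_difference(5, 5):.4f}")   # about 1.3853
```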
Conclusion
It can be concluded that the moisture content of paneer prepared by the different methods differs significantly, since the observed F-ratio exceeds the tabulated value. The mean moisture content was highest for method M3 (56.9833), followed by method M2 (54.4750); the difference between these two means (2.5083) exceeds the corresponding critical difference (1.4138), so M3 and M2 differ significantly from each other. The next mean moisture contents were those of method M4 (53.2600) followed by method M1 (51.8600); their difference (1.4000) exceeds the corresponding critical difference (1.3853), so M4 and M1 also differ significantly from each other.