Module 6. Analysis of variance

Lesson 21

ONE WAY CLASSIFICATION

21.1 Introduction

The t-test enables us to test the significance of the difference between two sample means. If, however, we have several means and need to test the hypothesis that they are homogeneous, i.e., that there is no difference among them, the appropriate technique is the Analysis of Variance, developed by Professor R. A. Fisher in the 1920s. Initially the technique was used in agricultural experiments, but nowadays it is widely used in almost all branches of the agricultural and animal sciences. It is used to test whether the differences between the means of three or more populations are significant. Using the analysis of variance, we can test, for example, whether the moisture contents of paneer or khoa prepared by different methods or batches differ significantly. Analysis of variance thus enables us to test, on the basis of sample observations, whether the means of three or more populations are significantly different. The basic purpose of the analysis of variance is therefore to test the homogeneity of several means, and the technique consists in splitting up the total variation into component variations due to independent factors, where each component gives us an estimate of the population variation. In other words, the total sum of squares is decomposed into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error.

21.2  Analysis of Variance

The term ‘Analysis of Variance’ was introduced by Prof. R. A. Fisher in the 1920s to deal with problems in the analysis of agronomic data. Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as:

(i) Assignable causes, and (ii) Chance causes.

The variation due to assignable causes can be detected and measured, whereas the variation due to chance causes is beyond human control and cannot be accounted for separately.

21.2.1  Definition

According to Prof. R. A. Fisher, Analysis of Variance (ANOVA) is the “separation of variance ascribable to one group of causes from the variance ascribable to another group.” Thus, ANOVA consists in estimating the amount of variation due to each of the independent factors (causes), the remainder being due to chance factors (causes); the latter is known as experimental error, or simply error. The technique of analysis of variance consists in splitting up the total variation into component variations due to independent factors, where each component gives us an estimate of the population variance. The total sum of squares is broken up into sums of squares due to independent factors, and the remainder is attributed to random causes, commonly called error. Consider, for instance, an industrial problem such as the following. A factory produces components, with many machines at work on the same operation. The process is not purely mechanical: the machine operators have an influence on the quality of the output. Moreover, it is thought that on certain days of the week (e.g., Monday) the output is of poorer quality than on other days (e.g., Friday). The quality therefore depends on at least three factors: the machine, the operator and the day of the week. There may be other factors in operation, and some of the factors mentioned may have no significant effect. It is possible, by the technique of analysis of variance, to determine whether any of the above factors, or some combination of them, has an appreciable effect on the quality, and also to estimate the contribution made by each factor to the overall variability in the production or quality of the product. Thus the purpose of the analysis is to establish relations of cause and effect.

21.2.2  Assumptions in analysis of variance

For the validity of the F-test in ANOVA, the following assumptions are made:

     (i)      The samples are drawn from the population randomly and independently.

    (ii)      The data are quantitative in nature and are normally distributed, i.e., the parent population from which the observations are taken is normal.

    (iii)    Various treatments and environmental effects are additive in nature.

    (iv)    The populations from which the samples have been drawn should have equal variance σ2. This property is known as homoscedasticity and can be tested by Bartlett’s test.
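Assumption (iv) can be checked numerically. The following sketch implements Bartlett’s chi-square statistic directly from its standard formula using only the Python standard library; the three small samples are illustrative values chosen for the sketch, not prescribed by this lesson.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for homogeneity of variances.

    groups: list of lists of observations (each group needs >= 2 values).
    Under H0 (equal variances) the statistic is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    s2 = [variance(g) for g in groups]  # unbiased sample variances
    # pooled variance over all groups
    sp2 = sum((ni - 1) * si for ni, si in zip(n, s2)) / (N - k)
    # uncorrected log-likelihood-ratio statistic
    stat = (N - k) * math.log(sp2) - sum(
        (ni - 1) * math.log(si) for ni, si in zip(n, s2)
    )
    # Bartlett's correction factor
    C = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (N - k)) / (3 * (k - 1))
    return stat / C

# Illustrative samples with similar spread: the statistic should be small
groups = [[50.3, 52.2, 52.5, 51.7],
          [54.1, 53.7, 55.5, 54.6],
          [52.3, 53.2, 53.6, 53.4]]
chi2 = bartlett_statistic(groups)
# compare chi2 with the chi-square critical value for k - 1 = 2 d.f.
```

Since the computed statistic falls well below the 5% chi-square critical value for 2 d.f. (5.991), the equal-variance assumption would not be rejected for these samples.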

21.3  One-way Analysis of Variance

The simplest type of analysis of variance is known as one-way analysis of variance, in which only one source of variation, or factor of interest, is controlled and its effect on the elementary units is observed. It is an extension, to three or more samples, of the t-test procedure for two independent samples; in other words, the t-test for two independent samples is a special case of one-way analysis of variance. In the typical situation, one-way classification refers to the comparison of the means of several univariate normal populations, having the same unknown variance σ2, on the basis of random samples selected from each population. The population means are denoted by μ1, μ2, ..., μk if there are k populations. The one-way analysis of variance is designed to test the null hypothesis:

Ho : μ1 = μ2= ...= μk  i.e. the arithmetic means of the population from which the k samples have been randomly drawn are equal to one another.

Let us suppose that N observations Xij (i = 1, 2, …, k; j = 1, 2, …, ni) of a random variable X are grouped, on some basis, into k classes (T1, T2, …, Tk) of sizes n1, n2, …, nk respectively, as exhibited below:

Table 21.1

Treatment    Observations               Mean    Total
T1           X11, X12, …, X1n1          x̄1      T1.
T2           X21, X22, …, X2n2          x̄2      T2.
.            .                          .       .
.            .                          .       .
Ti           Xi1, Xi2, …, Xini          x̄i      Ti.
.            .                          .       .
.            .                          .       .
Tk           Xk1, Xk2, …, Xknk          x̄k      Tk.
                                Grand total      G

The total variation in the observations Xij can be split into the following two components:

(i)     The variation between the classes or the variation due to different bases of classification, commonly known as treatments.

(ii)   The variation within the classes, i.e., the inherent variation of the random variable within the observations of each class.

The first type of variation is due to assignable causes, which can be detected and controlled by human effort, and the second type is due to chance causes, which are beyond human control.

The main object of analysis of variance technique is to examine if there is significant difference between the class means in view of the inherent variability within the separate classes.

In particular, let us consider the effect of k brands of yoghurt on the price of yoghurt in N shops/retail stores (of the same type), divided into k brands/classes of sizes n1, n2, …, nk respectively.

Here the sources of variations are

(i)     Effect of the brands

(ii)   Error ‘e’ produced by numerous causes of such magnitude that they cannot be detected and identified with the knowledge we have; together they produce a variation of random nature obeying the Gaussian (normal) law of errors.

21.3.1  Mathematical model

The linear mathematical model will be

               

            Xij = μ + αi + eij (i=1,2,…,k) (j=1,2,…,ni)

where Xij is the value of the variate in the jth observation (j=1,2,…,ni) belonging to ith class (i=1,2,…,k)  

    μ   is the general mean effect

    αi  is the effect due to the ith class, where αi = μi − μ

    eij is random error which is assumed to be independently and normally distributed with mean zero and variance σe2.

Let the mean of k populations be μ1, μ2, …, μk then our aim is to test null hypothesis

Ho : μ1 = μ2= ...= μk=μ which reduces to Ho : α1 = α2 = … = αk = 0.

H1 : At least one pair of μi’s is not equal.
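As an illustration, the model above can be used to generate synthetic one-way data in a few lines of Python; the values of μ, the αi and σe below are hypothetical choices for the sketch (under H0 every αi would be zero).

```python
import random

random.seed(42)  # make the random draws reproducible

mu = 53.0                        # general mean effect (hypothetical)
alpha = [-1.0, 0.5, 2.0, -1.5]   # class effects alpha_i (hypothetical; sum to 0)
n_i = [5, 4, 6, 5]               # class sizes n_i
sigma_e = 1.0                    # error standard deviation (hypothetical)

# X_ij = mu + alpha_i + e_ij, with e_ij ~ N(0, sigma_e^2)
data = [[mu + a + random.gauss(0, sigma_e) for _ in range(n)]
        for a, n in zip(alpha, n_i)]
```

Each inner list of `data` is one class Ti, so the layout matches Table 21.1.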

21.3.2  Calculation of different sum of squares

 a)   Total Sum of Squares (TSS) = Σi Σj Xij2 − G2/N

      where G is the grand total of all the observations and N = n1 + n2 + … + nk.

      The expression Σi Σj Xij2, i.e., the sum of squares of all the observations, is known as the Raw Sum of Squares (R.S.S.) and the expression G2/N is called the Correction Factor (C.F.), so that TSS = R.S.S. − C.F.

b)   Sum of Squares Among Classes (SSC): To find the SSC, divide the square of the total of each class by its class size (the number of observations in that class), sum these quantities, and then subtract the correction factor from this sum, i.e.,

      SSC = Σi (Ti.2/ni) − G2/N

      where Ti. is the total of the observations pertaining to the ith class.

c) Sum of Squares within Classes (SSE): It is obtained by subtracting the sum of squares among the classes from the total sum of squares, i.e., SSE = TSS − SSC. This sum of squares is also called the error sum of squares.

d) Mean Sum of Squares (M.S.S.): It is obtained by dividing each sum of squares by its respective degrees of freedom: SC2 = SSC/(k − 1) and SE2 = SSE/(N − k).
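The quantities in (a)–(d) can be computed directly from their defining formulas. The following Python sketch does so for a small hypothetical data set whose sums of squares are easy to verify by hand.

```python
def one_way_anova(groups):
    """Return (TSS, SSC, SSE, MSC, MSE, F) for a one-way classification.

    groups: list of lists, one list of observations per class.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)               # grand total
    cf = G * G / N                                # correction factor G^2/N
    rss = sum(x * x for g in groups for x in g)   # raw sum of squares
    tss = rss - cf                                # total SS
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # among-classes SS
    sse = tss - ssc                               # within-classes (error) SS
    msc = ssc / (k - 1)                           # mean SS among classes
    mse = sse / (N - k)                           # mean SS within classes
    return tss, ssc, sse, msc, mse, msc / mse

# Hypothetical toy data: TSS = 5.5, SSC = 1.5, SSE = 4.0, F = 1.5
tss, ssc, sse, msc, mse, f = one_way_anova([[1, 2, 3], [2, 3, 4]])
```

For the toy data, G = 15, N = 6, C.F. = 37.5 and R.S.S. = 43, so TSS = 5.5 and SSC = (36/3 + 81/3) − 37.5 = 1.5, agreeing with the function.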

e) Analysis of Variance Table

The results of the above calculations are presented in a table called Analysis of Variance or ANOVA table as follows:

 

Table 21.2

Source of variation       Degrees of Freedom (d.f.)   Sum of Squares (S.S.)   Mean Sum of Squares (M.S.S.)   F-Ratio
Among Classes             k − 1                       SSC                     SC2 = SSC/(k − 1)              SC2/SE2 ~ F(k−1, N−k)
Within Classes (Error)    N − k                       SSE                     SE2 = SSE/(N − k)
Total                     N − 1                       TSS

If the calculated value of F is greater than the tabulated value Fα;(k−1, N−k), where α denotes the level of significance, the hypothesis H0 is rejected, and it can be inferred that the class effects are significantly different from one another.

Standard Error

a)      The estimated standard error of any class/treatment mean, say the ith treatment/class mean, is given by

              S.E.(x̄i) = √(SE2/ni)

        where SE2 is the mean sum of squares within samples, i.e., MSS(Error).

b)      The estimated standard error of the difference between the ith and jth treatment means is

              S.E.d = √[SE2(1/ni + 1/nj)]

        where ni and nj are the numbers of observations for the ith and jth treatment/class.

c)      If ni = nj = n, then the S.E. of the difference of means is

              S.E.d = √(2SE2/n)

d)      The Critical Difference (C.D.) or Least Significant Difference (L.S.D.) can be calculated as

              C.D. = S.E.d × tα,(N−k)

        where α is the level of significance and (N − k) is the d.f. for error.

The treatment means are x̄i = Ti./ni, i = 1, 2, …, k. These can be compared with the help of the critical difference: any two treatment means are said to differ significantly if their difference is larger than the critical difference (C.D.). The procedure of one-way ANOVA is illustrated through the following example:

Example 1:  The following table gives the moisture contents of paneer prepared by four methods: Manual (M1) and Mechanical with pressure 10 pound/inch2 (M2), 12 pound/inch2 (M3) and 15 pound/inch2 (M4).

Table 21.3

Methods
  M1      M2      M3      M4
 50.3    54.1    57.5    52.3
 52.2    53.7    56.3    53.2
 52.5    55.5    55.8    53.6
 51.7    54.6    56.9    53.4
 52.6            55.8    53.8
                 59.6

Analyze the data to find whether the mean moisture content in paneer is different prepared by different methods.

Solution:

Ho : μ1 = μ2 = μ3 = μ4, i.e., the mean moisture content of paneer prepared by the different methods is the same.

H1 : Mean moisture content in paneer prepared by at least two methods are not equal.

Prepare the following table to calculate sum of squares due to different components:

Table 21.4

Methods                      M1         M2         M3         M4        Total
Total (Ti.)                259.30     217.90     341.90     266.30     G = 1085.40
No. of observations (ni)      5          4          6          5         20
Mean                       51.8600    54.4750    56.9833    53.2600

 

               

 

               

             

 

Correction Factor (C.F.) = G2/N = (1085.40)2/20 = 58904.6580

Total Sum of Squares (TSS) = ΣΣXij2 − C.F. = 59000.2200 − 58904.6580 = 95.5620

 

Sum of Squares among Classes (SSC) or Sum of Squares between Methods

            SSC = Σi (Ti.2/ni) − C.F.

                = (259.30)2/5 + (217.90)2/4 + (341.90)2/6 + (266.30)2/5 − 58904.6580

                = 13447.2980 + 11870.1025 + 19482.6017 + 14183.1380 − 58904.6580 = 78.4822

 

Sum of Squares within classes (SSE) or Sum of squares due to error:

            SSE = TSS − SSC = 95.5620 − 78.4822 = 17.0798

Prepare the following analysis of variance table:

Table 21.5 ANOVA Table

Source of variation       d.f.          S.S.       M.S.S.                  F-Ratio
Among Methods             4 − 1 = 3     78.4822    78.4822/3 = 26.1607     26.1607/1.0675 = 24.5068
Within Methods (Error)    20 − 4 = 16   17.0798    17.0798/16 = 1.0675
Total                     20 − 1 = 19   95.5620

From Fisher and Yates’ tables, the F value for 3 and 16 d.f. at the 5% level of significance is 3.2389. Since the observed value of F in the analysis of variance table is greater than the 5% tabulated F value, it can be inferred that the mean moisture contents of paneer prepared by the different methods differ significantly from one another.
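As a cross-check, the sums of squares and the F-ratio of Table 21.5 can be reproduced in a few lines of Python from the data of Table 21.3, using the formulas of Section 21.3.2.

```python
# Moisture contents of paneer by method (data of Table 21.3)
methods = {
    "M1": [50.3, 52.2, 52.5, 51.7, 52.6],
    "M2": [54.1, 53.7, 55.5, 54.6],
    "M3": [57.5, 56.3, 55.8, 56.9, 55.8, 59.6],
    "M4": [52.3, 53.2, 53.6, 53.4, 53.8],
}

groups = list(methods.values())
k = len(groups)
N = sum(len(g) for g in groups)                # 20 observations in all
G = sum(sum(g) for g in groups)                # grand total, 1085.40
cf = G * G / N                                 # correction factor G^2/N
tss = sum(x * x for g in groups for x in g) - cf           # total SS
ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf       # among methods
sse = tss - ssc                                            # within (error)
msc, mse = ssc / (k - 1), sse / (N - k)
f_ratio = msc / mse
# tss ≈ 95.5620, ssc ≈ 78.4822, sse ≈ 17.0798, f_ratio ≈ 24.51
```

The computed values agree with Table 21.5 up to rounding.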

Calculation of critical differences for comparison among various pairs of methods of preparing paneer

Table 21.6 (means arranged in descending order)

Methods                 M3         M2         M4         M1
Mean                  56.9833    54.4750    53.2600    51.8600
No. of observations      6          4          5          5

C.D. (for comparing mean moisture content prepared by Method 3 and Method 2)

            = √[1.0675 × (1/6 + 1/4)] × t0.05,(16) = 0.6669 × 2.12 = 1.4138

C.D. (for comparing mean moisture content prepared by Method 2 and Method 4)

            = √[1.0675 × (1/4 + 1/5)] × t0.05,(16) = 0.6931 × 2.12 = 1.4693

C.D. (for comparing mean moisture content prepared by Method 4 and Method 1)

            = √[1.0675 × (1/5 + 1/5)] × t0.05,(16) = 0.6534 × 2.12 = 1.3853
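These three critical differences can likewise be verified numerically, taking the error mean square and the tabulated t value (2.12 at α = 0.05 with 16 d.f.) from the example above.

```python
import math

mse = 17.0798 / 16          # error mean square from Table 21.5
t_val = 2.12                # tabulated t at alpha = 0.05, 16 d.f.
n = {"M1": 5, "M2": 4, "M3": 6, "M4": 5}   # class sizes

def critical_difference(ni, nj):
    """C.D. = t * sqrt(MSE * (1/ni + 1/nj))."""
    return t_val * math.sqrt(mse * (1 / ni + 1 / nj))

cd_32 = critical_difference(n["M3"], n["M2"])   # ≈ 1.4139 (M3 vs M2)
cd_24 = critical_difference(n["M2"], n["M4"])   # ≈ 1.4693 (M2 vs M4)
cd_41 = critical_difference(n["M4"], n["M1"])   # ≈ 1.3853 (M4 vs M1)
```

Comparing each C.D. with the corresponding difference of means in Table 21.6 then gives the pairwise conclusions.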

Conclusion

It can be concluded that the mean moisture contents of paneer prepared by the different methods differ significantly. The mean moisture content was maximum for method M3 (56.9833), followed by method M2 (54.4750); their difference (2.5083) exceeds the corresponding C.D. (1.4138), so M3 and M2 differ significantly from each other. The difference between M2 (54.4750) and M4 (53.2600) is 1.2150, which is smaller than the corresponding C.D. (1.4693). The next mean moisture content was found for method M4 (53.2600), followed by method M1 (51.8600); their difference (1.4000) exceeds the corresponding C.D. (1.3853), so M4 and M1 differ significantly from each other.