Module 1. Descriptive statistics

Lesson 4

MEASURES OF DISPERSION

4.1 Introduction

In the preceding lesson, we saw different measures of central tendency and learnt how they can be calculated for various types of distributions. The measures of central tendency are just different types of averages and do not indicate the extent of variability in a distribution. They give us an idea of the concentration of the observations about the central part of the distribution. If we are given only the average of a series of observations, we cannot form a complete idea about the distribution, since there may exist a number of distributions whose averages are the same but which differ widely from each other in other ways. Let us consider two series, I and II, of 6 items each:

Series   Observations                  Total   Mean
I        20   20   25   25   30   30   150     25
II       15   20   25   25   30   35   150     25

We notice that there is no difference between the two series as far as the average is concerned. However, in the first series the observations vary from 20 to 30, and in the second from 15 to 35, i.e. the greatest deviation from the mean is 5 in the first case and 10 in the second. Clearly this indicates a difference between the two series. Such variation is called scatter or dispersion. Thus, the measures of central tendency must be supported and supplemented by measures of dispersion, which help us to study the variability of the items, i.e. the extent to which the items vary from one another and from the central value.
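The comparison of the two series can be checked numerically. The following is a minimal Python sketch (the function name `summarize` is an illustrative choice, not part of the lesson) showing that both series share the same mean while their spreads differ:

```python
from statistics import mean

# The two series from the table above.
series_1 = [20, 20, 25, 25, 30, 30]
series_2 = [15, 20, 25, 25, 30, 35]


def summarize(values):
    """Return (mean, smallest, largest, greatest absolute deviation from the mean)."""
    m = mean(values)
    return m, min(values), max(values), max(abs(x - m) for x in values)


for name, values in (("I", series_1), ("II", series_2)):
    m, smallest, largest, max_dev = summarize(values)
    print(f"Series {name}: mean={m}, min={smallest}, max={largest}, "
          f"greatest deviation from mean={max_dev}")
```

Both calls report a mean of 25, but the greatest deviation is 5 for Series I and 10 for Series II.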

4.2 Meaning of Dispersion

The term dispersion is generally used in two senses. Firstly, dispersion refers to the variation of the items among themselves. If the value of all the items of a series is the same, there will be no variation among the various items and the dispersion will be zero. On the other hand, the greater the variation among different items of a series, the more will be the extent of dispersion. Secondly, dispersion refers to the variation of the items about an average. If the difference between the value of items and the average is large, the dispersion will be high and on the other hand if the difference between the values of items and average is small, the dispersion will be low. Thus, dispersion is defined as scatteredness around central value or the spread of the individual items in a given series. According to A. L. Bowley “Dispersion is the measure of the variation of the items”. Spiegel defined dispersion as “The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data”.

4.3 Objectives of Measuring Dispersion

The measures of dispersion are helpful in statistical investigation. Some of the main objectives of dispersion are:

4.4 Characteristics for an Ideal Measure of Dispersion

The following are the essential requisites for an ideal measure of dispersion:

·       It should be rigidly defined.

·       It should be based on all observations.

·       It should be readily comprehensible.

·       It should be easily calculated.

·       It should be amenable to further mathematical treatment.

·       It should be affected as little as possible by fluctuations of sampling.

·       It should not be affected much by extreme observations.

4.5 Absolute and Relative Measures of Dispersion

The measures of dispersion which are expressed in terms of the original units of a series are termed as Absolute Measures. Such measures are not suitable for comparing the variability of the two distributions which are expressed in different units of measurement.  On the other hand, relative measures of dispersion are obtained as ratios or percentages and are thus pure numbers independent of the units of measurement. These measures are used to compare two series expressed in different units.

4.6 Measures of Dispersion

Various measures of dispersion in common use are:

4.6.1 Range

The simplest possible measure of dispersion is the range, which is the difference between the greatest and the smallest observations of the distribution. Thus, Range = Xmax - Xmin, where Xmax is the greatest and Xmin the smallest observed value of the variable. In case of a grouped frequency distribution, the range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class. In order to compare the variability of two or more distributions given in different units of measurement, a relative measure called the coefficient of range is used, defined as follows:

$$\text{Coefficient of range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$

 In other words coefficient of range is the ratio of the difference between two extreme observations of the distribution to their sum.
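As an illustration, the range and the coefficient of range can be computed as in the following Python sketch. The function names are illustrative, the two series of Section 4.1 are reused as sample data, and the sketch assumes that Xmax + Xmin is not zero.

```python
def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """(Xmax - Xmin) / (Xmax + Xmin); assumes Xmax + Xmin != 0."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


series_1 = [20, 20, 25, 25, 30, 30]
series_2 = [15, 20, 25, 25, 30, 35]

for name, values in (("I", series_1), ("II", series_2)):
    print(f"Series {name}: range={value_range(values)}, "
          f"coefficient of range={coefficient_of_range(values)}")
```

Because only the two extreme observations enter the calculation, changing any of the middle values leaves both quantities unchanged.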

4.6.1.1 Merits and demerits of range

Range is the simplest, though crude, measure of dispersion. It is rigidly defined, readily comprehensible and the easiest to compute. It has the following drawbacks:

4.6.1.2 Uses of range

In spite of the above limitations, range as a measure of dispersion has the following applications:

4.6.2 Quartile deviation or semi-inter-quartile range

The difference between the upper and lower quartiles, i.e. Q3 - Q1, is known as the inter-quartile range, and half of this difference, i.e. ½ (Q3 - Q1), is called the semi-inter-quartile range or the quartile deviation, denoted by Q.D. For comparative studies of the variability of two distributions, the relative measure known as the coefficient of quartile deviation is used, which is given by

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
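A short Python sketch of the quartile deviation, taken as half of Q3 - Q1, and of its coefficient, taken as (Q3 - Q1) / (Q3 + Q1), is given below. Conventions for locating quartiles differ between textbooks and software; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, which may not match the rule used elsewhere in the course, and the sample data are hypothetical.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Return (Q1, Q3, quartile deviation, coefficient of quartile deviation)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    q1, _, q3 = quantiles(values, n=4)
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return q1, q3, qd, coefficient


sample = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations
q1, q3, qd, coefficient = quartile_deviation(sample)
print(f"Q1={q1}, Q3={q3}, Q.D.={qd}, coefficient of Q.D.={coefficient}")
```

Only the two quartiles are used, so the lowest 25% and the highest 25% of the observations do not influence the result.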

               

 4.6.2.1  Merits of quartile deviation

4.6.2.2  Demerits of quartile deviation

4.6.3 Mean deviation or average deviation

This measure of dispersion is obtained by taking the arithmetic mean of the absolute deviations of the given values from a measure of central tendency. According to Clark and Schkade: “Average deviation is the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. The average that is taken of the scatter is an arithmetic mean, which accounts for the fact that this measure is often called the mean deviation”.

4.6.3.1 Calculation of mean deviation

If X1, X2, ..., Xn are n given observations, then the mean deviation (M.D.) about an average A is given by:

$$\text{M.D. (about an average } A) = \frac{1}{n}\sum_{i=1}^{n} |X_i - A|$$

where $|X_i - A|$, read as mod (Xi - A), is the modulus or absolute value of the deviation and A is one of the averages, viz. Mean (M), Median (Md) or Mode (Mo).

In case of a grouped frequency distribution, the mean deviation about an average A is given by:

$$\text{M.D. (about an average } A) = \frac{1}{N}\sum_{i} f_i\,|X_i - A|$$

where Xi is the mid value of the class interval, fi is the corresponding frequency and $N = \sum_i f_i$ is the total frequency.

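Mean deviation is minimum when it is calculated from the median. In other words, the mean deviation about the median is never greater than the mean deviation about the mean or the mode. The relative measure of mean deviation, called the coefficient of mean deviation, is given by

$$\text{Coefficient of M.D.} = \frac{\text{M.D. about } A}{A}$$

In particular,

$$\text{Coefficient of M.D. about mean} = \frac{\text{M.D. about mean}}{\text{Mean}}$$

$$\text{Coefficient of M.D. about median} = \frac{\text{M.D. about median}}{\text{Median}}$$

$$\text{Coefficient of M.D. about mode} = \frac{\text{M.D. about mode}}{\text{Mode}}$$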

The coefficients of mean deviations defined above are pure numbers independent of units of measurement and are useful for comparing the variability of different distributions. The calculation of various measures is illustrated in example 1.
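For ungrouped data, the definitions above can be sketched in Python as follows. The function names and the five observations are hypothetical and serve only as an illustration; the coefficient is taken as the mean deviation divided by the average about which it is measured.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations |x - about|."""
    return sum(abs(x - about) for x in values) / len(values)


def coefficient_of_mean_deviation(values, about):
    """Mean deviation about an average divided by that average."""
    return mean_deviation(values, about) / about


data = [2, 4, 6, 8, 30]  # hypothetical observations

md_mean = mean_deviation(data, mean(data))
md_median = mean_deviation(data, median(data))

print("M.D. about mean  :", md_mean)
print("M.D. about median:", md_median)
print("Coefficient about mean  :", coefficient_of_mean_deviation(data, mean(data)))
print("Coefficient about median:", coefficient_of_mean_deviation(data, median(data)))
```

For these observations the mean is 10 and the median is 6, and the mean deviation about the median comes out smaller than that about the mean, in line with the minimal property stated above.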

Example 1: Find mean deviation from mean, median and mode using the data given in example 1 of Lesson 2. Also find the coefficient of mean deviation about mean, median and mode. 
Solution: Using the values Mean (M) = 1910, Median (Md) = 1890.8696 and Mode (Mo) = 1866.3636 calculated in Lesson 3, prepare the following table:

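Class Interval   Mid-value (Xi)   Frequency (fi)   Xi-M   fi|Xi-M|   Xi-Md       fi|Xi-Md|    Xi-Mo       fi|Xi-Mo|
1630-1730        1680             17               -230   3910       -210.8696   3584.7832    -186.3636   3168.1812
1730-1830        1780             19               -130   2470       -110.8696   2106.5224    -86.3636    1640.9084
1830-1930        1880             23               -30    690        -10.8696    250.0008     13.6364     313.6372
1930-2030        1980             16               70     1120       89.1304     1426.0864    113.6364    1818.1824
2030-2130        2080             14               170    2380       189.1304    2647.8256    213.6364    2990.9096
2130-2230        2180             7                270    1890       289.1304    2023.9128    313.6364    2195.4548
2230-2330        2280             2                370    740        389.1304    778.2608     413.6364    827.2728
2330-2430        2380             2                470    940        489.1304    978.2608     513.6364    1027.2728
Total                             100                     14140                  13795.6528               13981.8192

$$\text{M.D. about mean} = \frac{1}{N}\sum_i f_i\,|X_i - M| = \frac{14140}{100} = 141.40$$

$$\text{M.D. about median} = \frac{1}{N}\sum_i f_i\,|X_i - M_d| = \frac{13795.6528}{100} = 137.9565$$

$$\text{M.D. about mode} = \frac{1}{N}\sum_i f_i\,|X_i - M_o| = \frac{13981.8192}{100} = 139.8182$$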

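From the above calculations we can verify that the mean deviation calculated about the median (137.9565) is less than the mean deviation about the mean (141.40) or the mode (139.8182).

The corresponding coefficients of mean deviation are:

$$\text{Coefficient of M.D. about mean} = \frac{141.40}{1910} = 0.0740$$

$$\text{Coefficient of M.D. about median} = \frac{137.9565}{1890.8696} = 0.0730$$

$$\text{Coefficient of M.D. about mode} = \frac{139.8182}{1866.3636} = 0.0749$$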
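The figures of Example 1 can be reproduced with a few lines of Python. This is only a sketch: the mid-values and frequencies are copied from the table of the example, the three averages are taken as given in the solution, and the function name is illustrative.

```python
# Mid-values and frequencies from the table of Example 1.
mid_values = [1680, 1780, 1880, 1980, 2080, 2180, 2280, 2380]
frequencies = [17, 19, 23, 16, 14, 7, 2, 2]

# Averages as stated in the solution of Example 1.
averages = {"mean": 1910, "median": 1890.8696, "mode": 1866.3636}


def grouped_mean_deviation(mids, freqs, about):
    """(1/N) * sum of f_i * |X_i - about| for a grouped frequency distribution."""
    total = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(mids, freqs)) / total


results = {name: grouped_mean_deviation(mid_values, frequencies, a)
           for name, a in averages.items()}
coefficients = {name: results[name] / a for name, a in averages.items()}

for name in averages:
    print(f"M.D. about {name}: {results[name]:.4f}, "
          f"coefficient: {coefficients[name]:.4f}")
```

The mean deviation about the median is the smallest of the three, and each coefficient is simply the mean deviation divided by the average about which it is measured.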

               

 

               

 

               

 4.6.3.2  Merits of mean deviation

·       It is rigidly defined, easy to understand and calculate.

·       It is based on all observations and is better than range and quartile deviation.

·      The averaging of the absolute deviations from an average irons out the irregularities in the distribution and thus provides an accurate measure of dispersion.

·       It is less affected by extreme observations.

4.6.3.3   Demerits of mean deviation

·      Ignoring the signs is not correct from mathematical point of view.

·      It is not an accurate method when it is calculated from mode.

·      It is not capable of further mathematical treatment.

·      It cannot be used if we are dealing with open end classes.

4.6.4  Standard deviation

Standard deviation, usually denoted by the Greek letter σ, was first suggested by Karl Pearson as a measure of dispersion in 1893. It is defined as the positive square root of the mean of the squares of the deviations of the given observations from their arithmetic mean. If X1, X2, ..., Xn is a set of n observations, then its standard deviation is given by:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the arithmetic mean.

In case of grouped data, the standard deviation is given by:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i} f_i\left(X_i - \bar{X}\right)^2}$$

where

 Xi is the value of the variable, or the mid value of the class in case of a grouped frequency distribution;

 fi is the corresponding frequency of the value Xi;

 $N = \sum_i f_i$ is the total frequency;

 $\bar{X} = \frac{1}{N}\sum_i f_i X_i$ is the arithmetic mean of the distribution.

The square of the standard deviation, viz. σ², is called the variance or the second moment about the mean.
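A minimal Python sketch of the standard deviation as the positive square root of the mean squared deviation from the arithmetic mean, for both ungrouped and grouped data, is given below. The function names are illustrative; the divisor is n (or N), so the result corresponds to `statistics.pstdev` rather than the sample standard deviation `statistics.stdev`. The two series of Section 4.1 serve as sample data.

```python
from math import sqrt
from statistics import pstdev


def variance(values):
    """Mean of the squared deviations from the arithmetic mean (divisor n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def grouped_variance(mids, freqs):
    """Variance of a frequency distribution with values/mid-values and frequencies."""
    total = sum(freqs)
    m = sum(f * x for x, f in zip(mids, freqs)) / total
    return sum(f * (x - m) ** 2 for x, f in zip(mids, freqs)) / total


def standard_deviation(values):
    """Positive square root of the variance."""
    return sqrt(variance(values))


series_1 = [20, 20, 25, 25, 30, 30]
series_2 = [15, 20, 25, 25, 30, 35]

print("Series I :", variance(series_1), standard_deviation(series_1))
print("Series II:", variance(series_2), standard_deviation(series_2))
# Series I written as a frequency distribution gives the same variance.
print("Series I (grouped form):", grouped_variance([20, 25, 30], [2, 2, 2]))
print("Library check:", pstdev(series_1), pstdev(series_2))
```

Although both series have the same mean, Series II has the larger standard deviation, which reflects its wider scatter.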

4.6.4.1  Computation of variance (Direct method)

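Another formula for calculating the variance is

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2$$

and in case of grouped data it is

$$\sigma^2 = \frac{1}{N}\sum_{i} f_i X_i^2 - \left(\frac{1}{N}\sum_{i} f_i X_i\right)^2$$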

4.6.4.2  Short–cut method (Change of origin)

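This method consists in taking deviations of the given observations from an arbitrary value A. Writing $d'_i = X_i - A$, the formulae for calculating the arithmetic mean and the variance are

$$\bar{X} = A + \frac{1}{N}\sum_{i} f_i d'_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i} f_i {d'_i}^2 - \left(\frac{1}{N}\sum_{i} f_i d'_i\right)^2$$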

The variance, and consequently the standard deviation, of a distribution is independent of change of origin. Thus, if we add (subtract) a constant to (from) each observation of the series, its variance remains the same.
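The invariance under change of origin can be checked numerically. The sketch below is an illustration only: Series I of Section 4.1 is used as data, the origin A = 20 is an arbitrary choice, and the function names are not part of the lesson. The short-cut computation takes deviations d = X - A, recovers the mean as A plus the mean of the deviations, and computes the variance from the deviations alone.

```python
def variance(values):
    """Mean of the squared deviations from the arithmetic mean (divisor n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def shortcut_mean_and_variance(values, origin):
    """Mean and variance computed from the deviations d = x - origin."""
    n = len(values)
    deviations = [x - origin for x in values]
    mean_d = sum(deviations) / n
    mean_d_squared = sum(d * d for d in deviations) / n
    return origin + mean_d, mean_d_squared - mean_d ** 2


data = [20, 20, 25, 25, 30, 30]  # Series I from Section 4.1
A = 20                           # arbitrary origin, chosen for illustration

print("Variance of the original data:", variance(data))
print("Variance after subtracting A :", variance([x - A for x in data]))
print("Short-cut (mean, variance)   :", shortcut_mean_and_variance(data, A))
```

Any other choice of A gives the same variance; a value near the centre of the data merely keeps the intermediate numbers small.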

4.6.4.3   Step-deviation method (Change of origin and scale)

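In case of a grouped frequency distribution with class intervals of equal magnitude, the calculations are further simplified by taking $d_i = (X_i - A)/h$, where Xi is the mid value of the class and h is the common magnitude of the class intervals. The formulae for calculating the mean and the variance are then

$$\bar{X} = A + h\cdot\frac{1}{N}\sum_{i} f_i d_i, \qquad \sigma^2 = h^2\left[\frac{1}{N}\sum_{i} f_i d_i^2 - \left(\frac{1}{N}\sum_{i} f_i d_i\right)^2\right]$$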

which shows that the variance or standard deviation is not independent of change of scale. Thus, if we multiply (divide) each observation of the series by a constant h, its variance will be multiplied (divided) by h². Hence the variance, and consequently the standard deviation, of a distribution is independent of change of origin but not of scale. The procedure is illustrated in Example 2, where the answer is seen to be the same in each of the three cases. The step-deviation method is the most convenient on account of its simplified calculations.
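The agreement of the direct, short-cut and step-deviation methods can also be checked in code. The following Python sketch uses the mid-values and frequencies of Example 2 with A = 2080 and h = 100 as in that example; the helper name `raw_variance` is illustrative.

```python
# Mid-values and frequencies of the distribution used in Example 2.
mid_values = [1680, 1780, 1880, 1980, 2080, 2180, 2280, 2380]
frequencies = [17, 19, 23, 16, 14, 7, 2, 2]


def raw_variance(values, freqs):
    """(1/N) * sum(f * v^2) - ((1/N) * sum(f * v))^2 for values v with frequencies f."""
    total = sum(freqs)
    mean_v = sum(f * v for v, f in zip(values, freqs)) / total
    mean_v_squared = sum(f * v * v for v, f in zip(values, freqs)) / total
    return mean_v_squared - mean_v ** 2


A, h = 2080, 100  # arbitrary origin and common class width

# Direct method: apply the formula to the mid-values themselves.
direct = raw_variance(mid_values, frequencies)
# Short-cut method: apply it to the deviations X - A (change of origin only).
shortcut = raw_variance([x - A for x in mid_values], frequencies)
# Step-deviation method: apply it to (X - A) / h and multiply back by h squared.
step = h ** 2 * raw_variance([(x - A) / h for x in mid_values], frequencies)

print("Direct        :", direct)
print("Short-cut     :", shortcut)
print("Step-deviation:", step)
```

All three results agree (up to floating-point rounding in the step-deviation case), while the step-deviation method works with the smallest intermediate numbers.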

Example 2: Find the variance of the data given in example 1 of Lesson 3 by the direct, short-cut and step-deviation methods.

Solution: Prepare the following table to calculate variance by different methods.

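Here A = 2080 and h = 100.

Class Interval   Xi     fi    fiXi     fiXi²       di'=Xi-A   fidi'    fidi'²     di=(Xi-A)/h   fidi    fidi²
1630-1730        1680   17    28560    47980800    -400       -6800    2720000    -4            -68     272
1730-1830        1780   19    33820    60199600    -300       -5700    1710000    -3            -57     171
1830-1930        1880   23    43240    81291200    -200       -4600    920000     -2            -46     92
1930-2030        1980   16    31680    62726400    -100       -1600    160000     -1            -16     16
2030-2130        2080   14    29120    60569600    0          0        0          0             0       0
2130-2230        2180   7     15260    33266800    100        700      70000      1             7       7
2230-2330        2280   2     4560     10396800    200        400      80000      2             4       8
2330-2430        2380   2     4760     11328800    300        600      180000     3             6       18
Total                   100   191000   367760000              -17000   5840000                  -170    584

Direct Method

$$\sigma^2 = \frac{1}{N}\sum_{i} f_i X_i^2 - \left(\frac{1}{N}\sum_{i} f_i X_i\right)^2 = \frac{367760000}{100} - \left(\frac{191000}{100}\right)^2 = 3677600 - 3648100 = 29500$$

Short-cut Method

$$\sigma^2 = \frac{1}{N}\sum_{i} f_i {d'_i}^2 - \left(\frac{1}{N}\sum_{i} f_i d'_i\right)^2 = \frac{5840000}{100} - \left(\frac{-17000}{100}\right)^2 = 58400 - 28900 = 29500$$

Step-Deviation Method

$$\sigma^2 = h^2\left[\frac{1}{N}\sum_{i} f_i d_i^2 - \left(\frac{1}{N}\sum_{i} f_i d_i\right)^2\right] = 100^2\left[\frac{584}{100} - \left(\frac{-170}{100}\right)^2\right] = 10000\,(5.84 - 2.89) = 29500$$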

 4.6.4.4   Merits of standard deviation

·       It is rigidly defined.

·       It is based on all observations and is the best measure of dispersion.

·       The squaring of the deviations from mean removes the drawback of ignoring the signs  of deviations in computing the mean deviation. This makes it suitable for further  mathematical treatment. The variance of the combined series can also be computed.

·      It is least affected by fluctuations of sampling and is therefore widely used in sampling theory and tests of significance.

4.6.4.5   Demerits of standard deviation

·       As compared to the quartile deviation and range etc., it is difficult to understand and difficult to calculate.

·       It gives more importance to extreme observations.

4.6.4.6   Variance of the combined series

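As pointed out earlier, variance is suitable for algebraic treatment, i.e. if we are given the averages, the sizes and the variances of a number of series, then we can obtain the variance of the resultant series obtained by combining them. Thus, if $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ are the variances, and $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ and $n_1, n_2, \ldots, n_k$ are the arithmetic means and sizes of k series respectively, then the variance of the combined series of size $N = \sum_{i=1}^{k} n_i$ is given by the formula

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} n_i\left[\sigma_i^2 + \left(\bar{X}_i - \bar{X}\right)^2\right]$$

where

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{k} n_i \bar{X}_i$$

is the mean of the combined series. In particular, for two series the combined variance is given by

$$\sigma^2 = \frac{n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)}{n_1 + n_2}$$

where $d_1 = \bar{X}_1 - \bar{X}$, $d_2 = \bar{X}_2 - \bar{X}$ and $\bar{X} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$.

Substituting the values of $d_1$ and $d_2$, the combined variance is

$$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2\left(\bar{X}_1 - \bar{X}_2\right)^2}{(n_1 + n_2)^2}$$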
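As a numerical check, the sketch below computes the variance of a combined series in two ways: from the sizes, means and variances of the component series, each component contributing its own variance plus the squared distance of its mean from the combined mean, and directly from the pooled observations. For two series it also evaluates the closed form (n1 σ1² + n2 σ2²)/(n1 + n2) + n1 n2 (mean1 - mean2)²/(n1 + n2)². The two groups of observations are hypothetical and the function names are illustrative.

```python
from statistics import mean, pvariance


def combined_variance(sizes, means, variances):
    """Return (combined mean, combined variance) from per-series summaries."""
    total = sum(sizes)
    combined_mean = sum(n * m for n, m in zip(sizes, means)) / total
    combined_var = sum(n * (v + (m - combined_mean) ** 2)
                       for n, m, v in zip(sizes, means, variances)) / total
    return combined_mean, combined_var


def combined_variance_two(n1, m1, v1, n2, m2, v2):
    """Closed form for two series, without computing the combined mean."""
    return ((n1 * v1 + n2 * v2) / (n1 + n2)
            + n1 * n2 * (m1 - m2) ** 2 / (n1 + n2) ** 2)


group_a = [2, 4, 6]           # hypothetical series 1
group_b = [10, 12, 14, 16]    # hypothetical series 2

sizes = [len(group_a), len(group_b)]
means = [mean(group_a), mean(group_b)]
variances = [pvariance(group_a), pvariance(group_b)]  # divisor n, as in the lesson

combined_mean, combined_var = combined_variance(sizes, means, variances)
two_series_var = combined_variance_two(sizes[0], means[0], variances[0],
                                       sizes[1], means[1], variances[1])
pooled_var = pvariance(group_a + group_b)

print("Combined mean:", combined_mean)
print("Combined variance (general formula):", combined_var)
print("Combined variance (two-series form):", two_series_var)
print("Variance of pooled observations    :", pooled_var)
```

The three variance figures coincide, so the summaries alone are enough to recover the variance of the pooled data.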

               

 4.6.5  Coefficient of variation

Standard deviation is an absolute measure of dispersion. The relative measure of dispersion based on standard deviation is called the coefficient of standard deviation and is given by

$$\text{Coefficient of standard deviation} = \frac{\sigma}{\bar{X}}$$

This is a pure number independent of the units of measurement and is thus suitable for comparing the variability, homogeneity or uniformity of two or more distributions.

100 times the coefficient of dispersion based on standard deviation is called the coefficient of variation (C.V.), expressed as a percentage. Thus,

$$\text{Coefficient of Variation (C.V.)} = \frac{\sigma}{\bar{X}} \times 100$$

This measure was suggested by Prof. Karl Pearson and according to him “Coefficient of variation is the percentage variation in mean, standard deviation being considered as the total variation in the mean”. For comparing the variability of two distributions we compute the coefficient of variation for each distribution. A distribution with relatively smaller C.V. is said to be more homogeneous or uniform or less variable or more consistent than the other and the series with relatively greater C.V. is said to be more heterogeneous or more variable or less consistent than the other.
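The comparison of consistency by means of the coefficient of variation can be sketched in Python as below. The two series of Section 4.1 are reused as sample data, the function name is illustrative, and the standard deviation is taken with divisor n (`statistics.pstdev`), as defined in this lesson; the sketch assumes a non-zero mean.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. = (standard deviation / arithmetic mean) * 100; assumes mean != 0."""
    return pstdev(values) / mean(values) * 100


series_1 = [20, 20, 25, 25, 30, 30]
series_2 = [15, 20, 25, 25, 30, 35]

cv_1 = coefficient_of_variation(series_1)
cv_2 = coefficient_of_variation(series_2)

print(f"C.V. of Series I : {cv_1:.2f}%")
print(f"C.V. of Series II: {cv_2:.2f}%")
print("More consistent series:", "I" if cv_1 < cv_2 else "II")
```

Series I has the smaller C.V. and is therefore the more consistent (more homogeneous) of the two, even though both series have the same mean.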