4.2.3 Considerations in the Construction of Frequency Distributions
It is difficult to lay down any hard and fast rules for constructing a frequency distribution, since it depends on the nature of the given data and the object of classification.
However, the following general considerations may be borne in mind for ensuring meaningful classification of data :
(1) The number of classes should preferably be between 5 and 20. However, there is no rigidity about it. There can be more than 20 classes, depending upon the total number of items in the series and the details required, but there should not be fewer than five, because in that case the classification may not reveal the essential characteristics. The choice of the number of classes basically depends upon :
(a) the number of figures to be classified,
(b) the magnitude of the figures,
(c) the details required, and
(d) ease of calculation for further statistical work.
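As a rough illustration of consideration (1), the sketch below counts how many classes a given starting point, highest value and class-interval would produce under the exclusive method, and checks the count against the 5 to 20 guideline. The function names are hypothetical, and the guideline remains a preference rather than a rule.

```python
def class_count(start, highest, width):
    """Number of exclusive classes of equal width needed to cover `highest`.

    `start` is the lower limit of the first class. Under the exclusive
    method a value equal to an upper limit falls in the next class.
    """
    return int((highest - start) // width) + 1


def within_guideline(count, minimum=5, maximum=20):
    """True when the count lies in the preferred range of 5 to 20 classes."""
    return minimum <= count <= maximum


# Hypothetical data range: first class starts at 60, highest value is 158.
count = class_count(60, 158, 10)
print(count, within_guideline(count))  # 10 True
```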
(2) As far as possible, one should avoid class-intervals such as 3, 7, 11, 26, 39, etc. Preferably, one should have class-intervals of either 5 or multiples of 5, like 10, 20, 25, 100, etc. The reason is that the human mind is more accustomed to thinking in terms of multiples of 5, 10 and the like. However, where the data necessitate a class-interval of less than 5, it can be any value between 1 and 4.
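One possible way of applying consideration (2) is sketched below: the spread of the data is divided by a desired number of classes and the result is rounded up to the next multiple of 5, or to a whole number from 1 to 4 when the spread is small. The function name and the `target_classes` parameter are assumptions made for illustration; the text itself prescribes no particular procedure.

```python
import math


def suggest_class_width(lowest, highest, target_classes=10):
    """Suggest a round class-interval for data spanning lowest..highest."""
    raw_width = (highest - lowest) / target_classes
    if raw_width < 5:
        # Small spreads: a whole-number width between 1 and 4 is acceptable.
        return max(1, math.ceil(raw_width))
    # Otherwise round up to the next multiple of 5.
    return 5 * math.ceil(raw_width / 5)


# Hypothetical data ranging from 63 to 158, aiming at about 10 classes.
print(suggest_class_width(63, 158, target_classes=10))  # 10
```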
(3) The starting point, i.e., the lower limit of the first class, may be zero, 5, or a multiple of 5. For example, if the lowest value of the data is 63 and we have taken a class-interval of 10, then the first class can be 60-70 instead of 63-73. Similarly, if the lowest value of the data is 76 and the class-interval is 5, then the first class can be 75-80 rather than 76-81.
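The two examples in consideration (3) can be reproduced with a short sketch. It assumes that the starting point is found by rounding the lowest value down to a multiple of the class-interval, which gives 60 for a lowest value of 63 and 75 for a lowest value of 76. The function names are hypothetical.

```python
def first_lower_limit(lowest, width):
    """Round the lowest observation down to a multiple of the class width."""
    return (lowest // width) * width


def exclusive_classes(lowest, highest, width):
    """Build (lower, upper) limits; the upper limit is excluded from its class."""
    classes = []
    lower = first_lower_limit(lowest, width)
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width
    return classes


print(first_lower_limit(63, 10))      # 60
print(first_lower_limit(76, 5))       # 75
print(exclusive_classes(63, 98, 10))  # [(60, 70), (70, 80), (80, 90), (90, 100)]
```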
(4) To ensure continuity, we should follow the ‘exclusive’ method of classification. However, if the ‘inclusive’ method has been adopted, it is necessary to adjust the class limits between two classes to obtain continuity. The adjustment consists of finding the difference between the lower limit of the second class and the upper limit of the first class, dividing the difference by two, subtracting the value so obtained from all lower limits and adding it to all upper limits. This can be expressed in the form of a formula as follows :
Correction factor = (Lower limit of the second class - Upper limit of the first class) / 2
How the adjustment is made when data are given by the inclusive method can be seen from the following examples :
To adjust the class limits, we take here the difference between 900 and 899, which is one. Dividing it by two, we get ½ or 0.5. This value (0.5) is called the correction factor. Deduct 0.5 from the lower limits of all classes and add 0.5 to the upper limits. The adjusted classes would then be as follows :
It should be noted that before adjustment the class-interval was 99 but after adjustment, it is 100. Observe another case :
The correction factor here is
After adjustment the classes will be :
The class-interval now is 5 and not 4.5. Taking a third example, if the class limits are
The correction factor would be
After adjustment the classes will become :
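The adjustment procedure of consideration (4) may also be sketched in code. In the illustration below only the limits 899 and 900 are taken from the first example above; the remaining class limits are hypothetical, chosen so that the class-interval is 99 before adjustment and 100 after it. The function name is likewise an assumption.

```python
def adjust_inclusive_classes(classes):
    """Convert inclusive class limits to continuous class boundaries.

    `classes` is a list of (lower, upper) pairs in ascending order.
    Returns the correction factor and the adjusted classes.
    """
    # Half the gap between the lower limit of the second class
    # and the upper limit of the first class.
    correction = (classes[1][0] - classes[0][1]) / 2
    adjusted = [(lower - correction, upper + correction) for lower, upper in classes]
    return correction, adjusted


# Hypothetical inclusive classes; only 899 and 900 come from the text.
inclusive = [(800, 899), (900, 999), (1000, 1099)]
correction, adjusted = adjust_inclusive_classes(inclusive)
print(correction)  # 0.5
print(adjusted)    # [(799.5, 899.5), (899.5, 999.5), (999.5, 1099.5)]
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so no value can fall between two classes.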
(5) Wherever possible, it is desirable to use class intervals of equal size, because comparisons of frequencies among classes are facilitated and subsequent calculations from the distribution are simplified. However, this is not always a practical procedure. For example, in the case of data on the monthly income of families, in order to show the details for the portion of the frequency distribution where the majority of incomes lie, class intervals of 100 or 200 may be used, starting say from 600-700 up to about 1200; then intervals of 400 to 500 may be used up to 2500 or so; and a final class of 2500 and above may be shown for the relatively small number of families having the highest incomes. It is obvious that if equal class intervals of, say, 500 were used, too many families would be lumped together in the first one or two classes, and the information on how these incomes were distributed would be lost. To resolve this dilemma, we at times use open-end intervals along with equal or unequal class intervals. The use of unequal class sizes and open-end intervals generally becomes necessary where most of the data are concentrated within a certain range, where gaps appear in which relatively few items are observed, and where there are very few extremely small or large values.
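A minimal sketch of the income example in consideration (5) is given below. The class limits are illustrative only: they loosely follow the pattern described above (intervals of 100 and 200 up to 1200, intervals of 400 to 500 up to 2500, and an open-end final class), and the incomes are hypothetical. The exclusive method is assumed, so that an income equal to a class limit falls in the higher class.

```python
from bisect import bisect_right

# Lower limits of the classes; the last class is open-ended ("2500 and above").
LOWER_LIMITS = [600, 700, 800, 900, 1000, 1200, 1600, 2000, 2500]


def class_label(index):
    """Text label for the class at position `index`."""
    if index == len(LOWER_LIMITS) - 1:
        return f"{LOWER_LIMITS[index]} and above"
    return f"{LOWER_LIMITS[index]}-{LOWER_LIMITS[index + 1]}"


def tabulate(incomes):
    """Count how many incomes fall in each class."""
    counts = [0] * len(LOWER_LIMITS)
    for income in incomes:
        index = bisect_right(LOWER_LIMITS, income) - 1
        if index < 0:
            raise ValueError(f"{income} is below the first class")
        counts[index] += 1
    return [(class_label(i), counts[i]) for i in range(len(LOWER_LIMITS))]


# Hypothetical monthly incomes.
for label, frequency in tabulate([650, 700, 1150, 2499, 2500, 9000]):
    print(label, frequency)
```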
(6) Open-end distributions present problems of graphing and further analysis. When the frequency distribution is being employed as the only technique of presentation, open-end classes do not seriously reduce its usefulness as long as only a few items fall in these classes. However, use of the distribution for purposes of further mathematical computation is difficult, because a mid-point value, which can be used to represent the class, cannot be determined for an open-end class.
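The difficulty described in consideration (6) can be seen in a small sketch. Here an open limit is represented by `None` (an assumption made for illustration), and the mean computed from mid-points serves as one example of a further computation that breaks down when a class is open-ended. The class limits and frequencies are hypothetical.

```python
def class_midpoint(lower, upper):
    """Mid-point of a class, or None when either limit is open."""
    if lower is None or upper is None:
        return None
    return (lower + upper) / 2


def grouped_mean(classes):
    """Mean estimated from (lower, upper, frequency) rows using mid-points."""
    total = 0.0
    count = 0
    for lower, upper, frequency in classes:
        midpoint = class_midpoint(lower, upper)
        if midpoint is None:
            raise ValueError("an open-end class has no mid-point")
        total += midpoint * frequency
        count += frequency
    return total / count


closed = [(600, 700, 4), (700, 800, 6)]
print(grouped_mean(closed))  # 710.0

open_ended = closed + [(2500, None, 1)]  # "2500 and above"
try:
    grouped_mean(open_ended)
except ValueError as error:
    print(error)  # an open-end class has no mid-point
```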
(7) In any frequency distribution, the sizes or values of the items are shown on the left-hand side, and the number of times each size or value occurs, i.e., its frequency, is shown on the right-hand side against the corresponding size or value.
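The layout described in consideration (7) is illustrated by the following sketch, which counts hypothetical observations and prints each value on the left with its frequency on the right. The function names and column headings are assumptions.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency) rows in ascending order of value."""
    counts = Counter(values)
    return [(value, counts[value]) for value in sorted(counts)]


def format_table(rows, headers=("Value", "Frequency")):
    """Lay out values in the left column and frequencies in the right column."""
    lines = [f"{headers[0]:<10}{headers[1]:>10}"]
    for value, frequency in rows:
        lines.append(f"{value:<10}{frequency:>10}")
    return "\n".join(lines)


# Hypothetical observations.
rows = frequency_table([2, 3, 2, 5, 3, 2])
print(format_table(rows))
```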