Module 7. Correlation and regression

Lesson 24

RANK CORRELATION

24.1  Introduction

In many problems in business and industry the variable under consideration cannot be measured quantitatively, or the statistical series consists of items whose exact magnitudes cannot be ascertained. Characters such as beauty, intelligence, flavour, and the body and texture of a milk product are usually expressed in comparative terms. In such cases the subjects are ranked with respect to the character instead of being measured. Sometimes the units are also ranked according to a quantitative measure.

Two situations arise in these studies: (i) the same set of units is ranked according to two characters A and B; (ii) two judges independently rank the same set of units on a single character. In both situations we get paired ranks for a set of units. For example, (i) in a sensory evaluation experiment two judges independently rank ten value-added milk products by taste, because it is difficult to give the products a numerical grade for taste; (ii) students are ranked according to their marks in Operations Research and in Statistics. When only ranks are available, the usual Pearsonian correlation coefficient cannot be computed from the original measurements. For such data the psychologist Charles Edward Spearman (1906) developed a formula for the correlation coefficient between ranks, known as the rank correlation coefficient or Spearman's correlation coefficient.

24.2  Formula of Rank Correlation Coefficient

Suppose we want to find out whether two characteristics, A (flavour) and B (consistency), are related. Neither characteristic can be measured quantitatively, but the individuals can be arranged in order of merit (ranked) with respect to each of them. Let X and Y be the random variables denoting the ranks of the individuals in characteristics A and B respectively. It is assumed that there are no ties, i.e., no two individuals get the same rank for a characteristic, so that X and Y each take the values 1, 2, ..., n. Spearman's rank correlation coefficient is then given by the formula

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

where $d_i = X_i - Y_i$ is the difference between the pair of ranks of the ith individual in the two characters, and

n = number of pairs of observations.
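The formula translates directly into code. The following is a minimal Python sketch, assuming untied ranks 1 to n; the function name `spearman_rs` and the two five-item rankings are illustrative and do not come from the lesson.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman's coefficient for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(x_ranks) != len(y_ranks):
        raise ValueError("both rankings must have the same length")
    n = len(x_ranks)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    # d_i = X_i - Y_i for each individual, squared and summed
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical rankings of five items by two judges.
same = spearman_rs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])      # identical order
reverse = spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])   # opposite order
print(same, reverse)   # 1.0 -1.0
```

Identical rankings give +1 and exactly reversed rankings give -1, the two extremes of the coefficient.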

Proof

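Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be the ranks of the n individuals in the two characteristics A and B respectively. In general, an individual will not be equally proficient in both characteristics,

i.e., $X_i$ (the rank of the ith individual in character A) will not be equal to $Y_i$ (the rank of the ith individual in character B).

Let $d_i = X_i - Y_i$ be the difference between the ranks assigned to the ith individual for the two characters A and B.

Since there are no ties, X and Y each take the values 1, 2, ..., n in some order. Hence

$$ \bar{X} = \bar{Y} = \frac{1}{n}(1 + 2 + \cdots + n) = \frac{n+1}{2} $$

$$ \sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12} $$

$$ \sigma_Y^2 = \sigma_X^2 = \frac{n^2-1}{12} $$

Because $\bar{X} = \bar{Y}$, the rank difference can be written as

$$ d_i = X_i - Y_i = (X_i - \bar{X}) - (Y_i - \bar{Y}) $$

Squaring, summing over i and dividing by n,

$$ \frac{1}{n}\sum d_i^2 = \frac{1}{n}\sum (X_i - \bar{X})^2 + \frac{1}{n}\sum (Y_i - \bar{Y})^2 - \frac{2}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y}) $$

$$ \frac{1}{n}\sum d_i^2 = \sigma_X^2 + \sigma_Y^2 - 2\,\mathrm{Cov}(X, Y) $$

Writing $r_s$ for the correlation coefficient between the ranks, $\mathrm{Cov}(X, Y) = r_s \sigma_X \sigma_Y$, and since $\sigma_X = \sigma_Y$,

$$ \frac{1}{n}\sum d_i^2 = 2\sigma_X^2 (1 - r_s) $$

$$ 1 - r_s = \frac{\sum d_i^2}{2 n \sigma_X^2} = \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$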

24.3  Features of Spearman’s Rank Correlation Coefficient

a)    Spearman's rank correlation coefficient lies between -1 and +1, i.e., $-1 \le r_s \le +1$.

b)    The sum of the differences of ranks between the two variables is always zero, i.e., $\sum d_i = 0$.

c)    Spearman's correlation coefficient is distribution free, because no assumption is made about the form of the population from which the sample observations are drawn.
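Features (a) and (b), together with the facts about untied ranks on which the formula rests (mean (n + 1)/2, variance (n^2 - 1)/12, and agreement with the ordinary correlation coefficient computed on the ranks), can be checked numerically. The sketch below is only an illustration: the helper names, the choice n = 10 and the random seed are assumptions made for the example.

```python
import random
from statistics import mean, pvariance


def spearman_formula(x_ranks, y_ranks):
    """Untied-rank formula: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)                  # arbitrary seed, for repeatability only
n = 10
x = list(range(1, n + 1))       # ranks 1..n
y = x[:]
random.shuffle(y)               # a second, untied ranking of the same items

sum_d = sum(a - b for a, b in zip(x, y))
r_formula = spearman_formula(x, y)
r_pearson = pearson(x, y)

print(sum_d)                                      # 0, feature (b)
print(mean(x), (n + 1) / 2)                       # mean of the ranks
print(pvariance(x), (n ** 2 - 1) / 12)            # variance of the ranks
print(round(r_formula, 6), round(r_pearson, 6))   # the two values agree
```

The agreement between the two coefficients holds only when there are no ties.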

24.4  Computation of Rank Correlation Coefficient

Three types of problems arise in rank correlation:

24.4.1  When ranks are given

When ranks are given the following procedure is adopted to find the rank correlation coefficient:

1)    Compute $d_i = X_i - Y_i$, the difference between the ranks of the ith individual in the two characters A and B.

2)    Compute $d_i^2$, the square of each rank difference.

3)    Obtain the sum of squares of the rank differences, $\sum d_i^2$.

4)    Use the following formula to compute the rank correlation coefficient:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

The procedure is illustrated by the following example

Example 1: In a sensory evaluation experiment, two judges accorded the following ranks to eight milk products.

Judge A    8   7   6   3   2   1   5   4
Judge B    7   5   4   1   3   2   6   8

Find Spearman's rank correlation coefficient.

Solution :

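Prepare the following table and calculate the $d_i$'s and $d_i^2$'s as given below.

Judge A ($X_i$)        8    7    6    3    2    1    5    4
Judge B ($Y_i$)        7    5    4    1    3    2    6    8
$d_i = X_i - Y_i$      1    2    2    2   -1   -1   -1   -4
$d_i^2$                1    4    4    4    1    1    1   16

Here n = 8, $\sum d_i = 0$ and $\sum d_i^2 = 32$.

Calculate the rank correlation coefficient as follows:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{8(8^2 - 1)} = 1 - \frac{192}{504} = 0.619 $$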
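As a cross-check, the arithmetic for Example 1 can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and Judge A's ranks are taken from the solution table (8, 7, 6, 3, 2, 1, 5, 4).

```python
judge_a = [8, 7, 6, 3, 2, 1, 5, 4]   # X_i, ranks given by Judge A
judge_b = [7, 5, 4, 1, 3, 2, 6, 8]   # Y_i, ranks given by Judge B

n = len(judge_a)
d = [a - b for a, b in zip(judge_a, judge_b)]    # rank differences d_i
sum_d2 = sum(v * v for v in d)                   # sum of squared differences
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(d)                 # [1, 2, 2, 2, -1, -1, -1, -4]
print(sum_d2)            # 32
print(round(r_s, 3))     # 0.619
```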

       

24.4.2  When ranks are not given

When we are given the actual data and not the ranks, the following procedure is adopted to find out rank correlation coefficient.

1)    Convert the data into ranks. The highest (or the smallest) observation is given rank 1, the next highest (or next smallest) observation is given rank 2, and so on. It is immaterial whether the ranks are assigned in descending or ascending order, provided the same order is used for both variables.

2)    Compute $d_i = X_i - Y_i$, the difference between the ranks of the ith individual in the two characters A and B.

3)    Compute $d_i^2$, the square of each rank difference.

4)    Obtain the sum of squares of the rank differences, $\sum d_i^2$.

5)    Use the following formula to compute the rank correlation coefficient:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

The procedure is illustrated by the following example

Example 2: Calculate Spearman's rank correlation coefficient for the following data on marks obtained.

X    80   73   85   36   54   93   65   96   58   88
Y    15   83   95   64   32   16   67   66   85   39

Solution :

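Prepare the following table and calculate the $d_i$'s and $d_i^2$'s as given below. The ranks are assigned in ascending order, the smallest observation getting rank 1.

  X     Y     Rank of X   Rank of Y    $d_i$   $d_i^2$
  80    15        6           1          5       25
  73    83        5           8         -3        9
  85    95        7          10         -3        9
  36    64        1           5         -4       16
  54    32        2           3         -1        1
  93    16        9           2          7       49
  65    67        4           7         -3        9
  96    66       10           6          4       16
  58    85        3           9         -6       36
  88    39        8           4          4       16
  Total                                  0      186

Here n = 10 and $\sum d_i^2 = 186$.

Calculate the rank correlation coefficient as follows:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 186}{10(10^2 - 1)} = 1 - \frac{1116}{990} = 1 - 1.127 = -0.127 $$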
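The whole procedure for Example 2, including the conversion of marks to ranks, can be sketched in Python as follows. The helper `ascending_ranks` is a hypothetical name introduced here; it assumes there are no repeated values and gives the smallest observation rank 1, as in the table.

```python
def ascending_ranks(values):
    """Rank distinct values so that the smallest observation gets rank 1."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values found; tied data need common ranks")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


x = [80, 73, 85, 36, 54, 93, 65, 96, 58, 88]
y = [15, 83, 95, 64, 32, 16, 67, 66, 85, 39]
n = len(x)

rx, ry = ascending_ranks(x), ascending_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranking both series in descending order instead only changes the sign of each d_i.
rx_desc = [n + 1 - r for r in rx]
ry_desc = [n + 1 - r for r in ry]
sum_d2_desc = sum((a - b) ** 2 for a, b in zip(rx_desc, ry_desc))

print(rx)                      # [6, 5, 7, 1, 2, 9, 4, 10, 3, 8]
print(ry)                      # [1, 8, 10, 5, 3, 2, 7, 6, 9, 4]
print(sum_d2, sum_d2_desc)     # 186 186
print(round(r_s, 3))           # -0.127
```

The last comparison illustrates why the direction of ranking is immaterial as long as it is the same for both series.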

       

24.4.3  When ranks are repeated (or Tie case)

In the case of attributes, a tie occurs when two or more individuals are placed at the same level with respect to an attribute; in the case of variable data, a tie occurs when the same value appears more than once in either or both series. Spearman's rank correlation formula then breaks down, since the variables X and Y no longer take the values 1, 2, ..., n and consequently the results on which the formula rests (such as $\sigma_X^2 = \sigma_Y^2 = (n^2 - 1)/12$) need not hold. In such cases common ranks are assigned to the repeated items. The common rank is the arithmetic mean of the ranks which the items would have received had they differed from each other, and the next item gets the rank following the last rank used in computing the common rank. If there is a large number of tied ranks, it is advisable to apply a correction factor or adjustment factor (C.F.),

$$ \text{C.F.} = \frac{m(m^2 - 1)}{12} $$

where m is the number of times an item is repeated. This correction factor is added to $\sum d_i^2$, so that

$$ r_s = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)} $$

The correction factor is to be added for each repeated value in both series.
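The two tie-handling steps, assigning common ranks and accumulating the correction factor, can be sketched in Python as below. The function names `mid_ranks` and `correction_factor` and the small data set `scores` are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def mid_ranks(values):
    """Ascending ranks; repeated values share the mean of the ranks they occupy."""
    order = sorted(values)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1        # first rank occupied by v
        m = order.count(v)                # number of times v is repeated
        ranks[v] = first + (m - 1) / 2    # mean of first, ..., first + m - 1
    return [ranks[v] for v in values]


def correction_factor(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


scores = [10, 20, 20, 30, 30, 30]      # hypothetical data with two tied groups
print(mid_ranks(scores))               # [1.0, 2.5, 2.5, 5.0, 5.0, 5.0]
print(correction_factor(scores))       # 0.5 + 2.0 = 2.5
```

Here 20 occupies ranks 2 and 3 (common rank 2.5) and 30 occupies ranks 4, 5 and 6 (common rank 5), so the correction factor for this single series is 2(3)/12 + 3(8)/12 = 2.5.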

The procedure is illustrated by the following example

Example 3: Calculate Spearman's rank correlation coefficient for the following data.

X    80   73   80   36   54   93   65   36   58   80
Y    15   83   15   64   32   16   67   64   85   64

Solution

Prepare the following table and calculate the $d_i$'s and $d_i^2$'s as given below.

  X     Y     Rank of X   Rank of Y    $d_i$    $d_i^2$
  80    15       8           1.5        6.5     42.25
  73    83       6           9         -3        9
  80    15       8           1.5        6.5     42.25
  36    64       1.5         6         -4.5     20.25
  54    32       3           4         -1        1
  93    16      10           3          7       49
  65    67       5           8         -3        9
  36    64       1.5         6         -4.5     20.25
  58    85       4          10         -6       36
  80    64       8           6          2        4
  Total                                 0      233

Here n = 10 and $\sum d_i^2 = 233$.

 

In the X series the value 36 occurs twice. The common rank assigned to each of these values is 1.5, the arithmetic mean of the ranks 1 and 2 which these observations would have taken had they been different. The next four values 54, 58, 65 and 73 are allotted ranks 3, 4, 5 and 6 respectively. The value 80 occurs thrice; the common rank assigned to it is 8, the arithmetic mean of 7, 8 and 9, and the next value, 93, gets rank 10. Similarly, in the Y series the value 15 occurs twice and the common rank assigned to each is 1.5, the arithmetic mean of 1 and 2. The next value, 16, gets the next rank, 3. The value 64 occurs thrice; the common rank assigned to it is 6, the arithmetic mean of 5, 6 and 7, and the next value, 67, gets rank 8, and so on. Thus in the X series the observation 36 occurs twice (m = 2) and 80 occurs thrice (m = 3), and in the Y series the observation 15 occurs twice (m = 2) and 64 occurs thrice (m = 3). On applying the correction factor $m(m^2 - 1)/12$ for each repeated item, we get

$$ \text{C.F.} = \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12} = 0.5 + 2 + 0.5 + 2 = 5 $$

Calculate the rank correlation coefficient as follows:

$$ r_s = 1 - \frac{6\left[\sum d_i^2 + \text{C.F.}\right]}{n(n^2 - 1)} = 1 - \frac{6(233 + 5)}{10(10^2 - 1)} = 1 - \frac{1428}{990} = 1 - 1.442 = -0.442 $$
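The calculation for Example 3 can be reproduced with the short Python sketch below, which assigns common ranks, sums the correction factors for both series and applies the corrected formula. The helper names are illustrative and not part of the lesson.

```python
from collections import Counter


def mid_ranks(values):
    """Ascending ranks with the common (mean) rank for repeated values."""
    order = sorted(values)
    counts = Counter(values)
    # first rank held by v, plus half of the extra places it occupies
    return [order.index(v) + 1 + (counts[v] - 1) / 2 for v in values]


def correction_factor(values):
    """Sum of m(m^2 - 1)/12 over the repeated values of one series."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


x = [80, 73, 80, 36, 54, 93, 65, 36, 58, 80]
y = [15, 83, 15, 64, 32, 16, 67, 64, 85, 64]
n = len(x)

rx, ry = mid_ranks(x), mid_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
cf = correction_factor(x) + correction_factor(y)
r_s = 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))

print(rx)                          # [8.0, 6.0, 8.0, 1.5, 3.0, 10.0, 5.0, 1.5, 4.0, 8.0]
print(ry)                          # [1.5, 9.0, 1.5, 6.0, 4.0, 3.0, 8.0, 6.0, 10.0, 6.0]
print(sum_d2, cf, round(r_s, 3))   # 233.0 5.0 -0.442
```

Library routines that compute Spearman's coefficient as the ordinary correlation of the common ranks may return a slightly different value when ties are present, because the corrected formula above keeps the untied denominator n(n^2 - 1).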

               

               

24.5  Test of Significance of Rank Correlation Coefficient

The significance of the rank correlation coefficient is tested by the t-test, as is done in the case of Karl Pearson's correlation coefficient. Here we test the null hypothesis $H_0: \rho_s = 0$ against $H_1: \rho_s \neq 0$. The test statistic is

$$ t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}} \sim t_{n-2} $$

i.e., under $H_0$, t follows Student's t distribution with (n - 2) d.f., where n is the number of paired observations and $r_s$ is the rank correlation coefficient.
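As an illustration, the test can be applied to Example 1, where n = 8 and the sum of squared rank differences is 32. The Python sketch below is not a complete testing routine: the function name `spearman_t` is hypothetical, and the critical value 2.447 (two-tailed, 5% level, 6 d.f.) is read from a standard t table rather than computed.

```python
import math


def spearman_t(r_s, n):
    """t statistic for H0: rho_s = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two pairs")
    if abs(r_s) >= 1:
        raise ValueError("t is undefined when |r_s| = 1")
    return r_s * math.sqrt((n - 2) / (1 - r_s ** 2))


n = 8
r_s = 1 - 6 * 32 / (n * (n ** 2 - 1))   # Example 1: sum of d_i^2 is 32
t = spearman_t(r_s, n)
t_crit = 2.447                          # tabulated two-tailed 5% value, 6 d.f.

print(round(r_s, 3), round(t, 3))       # 0.619 1.931
print(abs(t) > t_crit)                  # False: H0 is not rejected at the 5% level
```

Since the calculated |t| is smaller than the tabulated value, the rank correlation between the two judges in Example 1 is not significant at the 5% level.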