Module 5. Test of significance

Lesson 18

t-TEST AND ITS APPLICATIONS

18.1 Introduction

The tests of significance discussed in the previous lesson apply to large samples; that large sample theory rests on the ‘Normal deviate test’. If the sample size n is small (n < 30), however, the sampling distributions of the various statistics are far from normal, and the ‘Normal deviate test’ cannot be applied. To deal with small samples, new techniques and tests of significance known as ‘exact sample tests’ were developed. They were pioneered by W. S. Gosset (1908), who wrote under the pen name ‘Student’, and were later developed and extended by Professor R. A. Fisher (1926). From a practical point of view, a sample is regarded as small if its size is less than 30. In this lesson we shall discuss Student’s t-test. The basic assumption of exact sample tests is that the population(s) from which the sample(s) are drawn is (are) normally distributed and that the sample(s) is (are) random and independent of each other. Exact sample tests can be used even for large samples, but large sample theory cannot be used for small samples.

18.2 Student’s t

Definition

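Let X_i (i=1,2,…,n) be a random sample of size n drawn from a normal population with mean μ and variance σ². Student’s t is then defined by the statistic

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{x})^2$$

where S² is an unbiased estimate of the population variance σ². The statistic follows Student’s t distribution with (n-1) degrees of freedom.

If s² denotes the sample variance with divisor n,

$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{x})^2$$

then (n-1) S² = n s².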
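The relation (n-1) S² = n s² can be checked numerically. The sketch below is only an illustration: the sample values are hypothetical, S² is taken as the variance with divisor (n-1) and s² as the variance with divisor n, both computed with Python's standard `statistics` module.

```python
import statistics

# Hypothetical small sample, used only to illustrate the identity.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
n = len(sample)

S2 = statistics.variance(sample)   # unbiased estimate, divisor n - 1
s2 = statistics.pvariance(sample)  # sample variance, divisor n

lhs = (n - 1) * S2  # (n - 1) S^2
rhs = n * s2        # n s^2
print(lhs, rhs)     # both equal the sum of squared deviations from the mean
```

Both sides reduce to the sum of squared deviations from the sample mean, which is why either variance can be used to obtain the other.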

18.3 Applications of t-test

The t-test has a number of applications in statistics, which are discussed in the following sections:

· t-test for significance of a single mean, the population variance being unknown

· t-test for the significance of the difference between two means, the population variances being equal

· paired t-test for the difference between two means when the observations are paired

· t-test for significance of an observed sample correlation coefficient.

18.3.1 t-Test for single mean

Suppose we want to test

(i) If the given normal population has a specified value of the population mean μ₀.

(ii) If the sample mean differs from the specified value μ₀ of the population mean.

(iii) If a random sample of size n viz., X_i (i=1,2,…, n) has been drawn from a normal population with specified mean μ₀.

All three problems are essentially the same, with the corresponding null hypothesis H₀ stated as follows

(i) μ = μ₀, i.e., the population mean is μ₀.

(ii) There is no significant difference between the sample mean and the population mean μ₀.

(iii) The given sample has been drawn from the population with mean μ₀.

The test statistic is given by

$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{x})^2$$

which, under H₀, follows Student’s t distribution with (n-1) degrees of freedom. If the calculated |t| exceeds the tabulated value of t for (n-1) d.f. at the 5 percent level of significance, viz., t_0.05; (n-1) d.f., then H₀ is rejected at the 5 percent level of significance. This implies that there is a significant difference between the sample mean and the population mean, or that the sample has not been drawn from a population having the specified mean μ = μ₀. If the calculated |t| is less than the tabulated value, H₀ is accepted, i.e., the data give no evidence against it. This is explained with the help of the following illustrations.

Example 1: A random sample of 9 values from a normal population showed a mean of 41.5 and a sum of squares of deviations from the mean equal to 72. Test whether the assumption of mean 44.5 in the population is reasonable.

Solution: In this problem n = 9, μ₀ = 44.5, x̄ = 41.5 and Σ(X_i − x̄)² = 72.

H₀: μ=44.5 i.e., population mean is 44.5

H₁:µ ≠ 44.5

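Applying t-test

$$S^2 = \frac{1}{n-1}\sum (X_i - \bar{x})^2 = \frac{72}{8} = 9, \qquad S = 3$$

$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{41.5 - 44.5}{3/\sqrt{9}} = -3$$

The tabulated value of t at the 5% level of significance for 8 d.f. is 2.306. Since the calculated |t| = 3 is greater than the tabulated value 2.306, it is significant. We reject the null hypothesis and conclude that the population mean is not equal to 44.5.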
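The arithmetic of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual single-mean statistic t = (x̄ − μ₀)/(S/√n) with S² equal to the sum of squared deviations divided by (n-1); the function name `one_sample_t_from_summary` is illustrative only, and the critical value is the tabulated figure quoted in the example rather than one computed by the code.

```python
import math

def one_sample_t_from_summary(n, mean, sum_sq_dev, mu0):
    """Return (t, degrees of freedom) from summary figures of one sample."""
    s2 = sum_sq_dev / (n - 1)   # unbiased variance estimate S^2
    se = math.sqrt(s2 / n)      # standard error S / sqrt(n)
    return (mean - mu0) / se, n - 1

# Figures taken from Example 1.
t, df = one_sample_t_from_summary(n=9, mean=41.5, sum_sq_dev=72, mu0=44.5)

T_CRIT = 2.306  # tabulated two-tailed 5% value for 8 d.f., as quoted in the text
print(t, df, abs(t) > T_CRIT)  # prints: -3.0 8 True
```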

Example 2: An automatic machine was expected to fill 250 ml of flavored milk into each pouch. A random sample of pouches was taken and the actual content of milk in each was measured. The quantities of flavored milk (in ml) are

253, 251, 248, 251, 252, 250, 249, 254, 247, 249, 248, 255, 245, 246, 254.

Can the average quantity of flavored milk filled per pouch be regarded as equal to the adjusted value of 250 ml?

Solution: In this problem n = 15 and μ₀ = 250 ml.

H₀: μ = 250 ml, i.e., the automatic machine fills on average 250 ml of milk in each pouch

H₁:µ ≠ 250

Prepare the following table

Table 18.1

X_i	X_i − x̄	(X_i − x̄)²
253	2.8667	8.2178
251	0.8667	0.7511
248	-2.1333	4.5511
251	0.8667	0.7511
252	1.8667	3.4844
250	-0.1333	0.0178
249	-1.1333	1.2844
254	3.8667	14.9511
247	-3.1333	9.8178
249	-1.1333	1.2844
248	-2.1333	4.5511
255	4.8667	23.6844
245	-5.1333	26.3511
246	-4.1333	17.0844
254	3.8667	14.9511
ΣX_i = 3752	Σ(X_i − x̄) = 0.0000	Σ(X_i − x̄)² = 131.7333

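Applying t-test

$$\bar{x} = \frac{\sum X_i}{n} = \frac{3752}{15} = 250.1333, \qquad S^2 = \frac{1}{n-1}\sum (X_i - \bar{x})^2 = \frac{131.7333}{14} = 9.4095, \qquad S = 3.0675$$

$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{250.1333 - 250}{3.0675/\sqrt{15}} = \frac{0.1333}{0.7920} = 0.168$$

The tabulated value of t at the 5% level of significance for 14 d.f. is 2.145. Since the calculated |t| = 0.168 is less than the tabulated value, it is not significant. We accept the null hypothesis and conclude that the automatic machine fills, on average, 250 ml of flavored milk in the pouches.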
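The same test can be run directly on the raw observations of Example 2. The sketch below uses only Python's standard library and assumes the single-mean statistic t = (x̄ − μ₀)/(S/√n), with S the standard deviation computed with divisor (n-1). The critical value 2.145 is the tabulated two-tailed 5% figure for 14 d.f., entered by hand; the variable names are illustrative.

```python
import math
import statistics

# Pouch contents (ml) from Example 2.
volumes_ml = [253, 251, 248, 251, 252, 250, 249, 254,
              247, 249, 248, 255, 245, 246, 254]
mu0 = 250  # adjusted (hypothesised) mean fill

n = len(volumes_ml)
mean = statistics.mean(volumes_ml)
s = statistics.stdev(volumes_ml)        # divisor n - 1
t = (mean - mu0) / (s / math.sqrt(n))   # single-mean t statistic

T_CRIT = 2.145  # tabulated two-tailed 5% value for 14 d.f. (entered by hand)
significant = abs(t) > T_CRIT
print(round(mean, 4), round(s, 4), round(t, 3), significant)
```

Working from the raw data avoids the rounding that accumulates when the deviations are tabulated by hand.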

18.3.2 t-Test for difference of means

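Suppose we want to test whether two independent samples X_i (i=1,2,…,n₁) and Y_j (j=1,2,…,n₂) of sizes n₁ and n₂ have been drawn from two normal populations with the same mean, the population means being μ₁ and μ₂ respectively. The null hypothesis is H₀: μ₁ = μ₂, i.e., the samples have been drawn from populations having the same mean, against the alternative

H₁: μ₁ ≠ μ₂

The t-statistic is given by

$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

which follows the t distribution with (n₁ + n₂ - 2) degrees of freedom, where

$$\bar{x} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i, \qquad \bar{y} = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_j, \qquad S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1} (X_i - \bar{x})^2 + \sum_{j=1}^{n_2} (Y_j - \bar{y})^2\right]$$

and S² is an unbiased estimate of the common population variance σ² based on both the samples. By comparing the computed value of t with the tabulated value of t for (n₁ + n₂ - 2) d.f. at the desired level of significance, we reject or retain the null hypothesis H₀.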

18.3.2.1 Assumptions for difference of means test

(i) Parent populations from which the samples have been drawn are normally distributed.

(ii) The two samples are random and independent of each other.

(iii) The population variances are equal but unknown, i.e., σ₁² = σ₂² = σ².

Thus, before applying the t-test for the equality of means, it is theoretically desirable to test the equality of the population variances by applying the F-test. If the hypothesis H₀: σ₁² = σ₂² is rejected, the t-test cannot be applied, and in such situations Behrens’ d-test is applied. This procedure is explained with the help of the following illustrations.

Example 3: The prices of ghee were compared in two cities. For this purpose ten shops were selected at random in each city. The following table gives the price of ghee per kg in the two cities:

City A	361	363	356	364	359	360	362	361	358	357
City B	368	369	370	366	367	365	371	372	366	367

Test whether the average price of ghee is of the same order in two cities.

Solution :

Null hypothesis H₀: μ_A = μ_B, i.e., the average price of ghee is of the same order in cities A and B.

H₁:µ_A ≠ μ_B

Prepare the following table:

Table 18.2

City A			City B
X_i	X_i − x̄	(X_i − x̄)²	Y_j	Y_j − ȳ	(Y_j − ȳ)²
361	0.9	0.81	368	-0.1	0.01
363	2.9	8.41	369	0.9	0.81
356	-4.1	16.81	370	1.9	3.61
364	3.9	15.21	366	-2.1	4.41
359	-1.1	1.21	367	-1.1	1.21
360	-0.1	0.01	365	-3.1	9.61
362	1.9	3.61	371	2.9	8.41
361	0.9	0.81	372	3.9	15.21
358	-2.1	4.41	366	-2.1	4.41
357	-3.1	9.61	367	-1.1	1.21
ΣX_i = 3601		Σ(X_i − x̄)² = 60.9	ΣY_j = 3681		Σ(Y_j − ȳ)² = 48.9

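and calculate

$$\bar{x} = \frac{3601}{10} = 360.1, \qquad \bar{y} = \frac{3681}{10} = 368.1$$

$$S^2 = \frac{60.9 + 48.9}{10 + 10 - 2} = \frac{109.8}{18} = 6.1, \qquad S = 2.4698$$

$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{360.1 - 368.1}{2.4698\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = \frac{-8}{1.1045} = -7.243$$

The tabulated value of t at the 5% level of significance for 18 d.f. (two-tailed) is 2.10. Since the calculated |t| = 7.243 is greater than the tabulated value 2.10, it is significant. We reject the null hypothesis at the 5 percent level of significance and conclude that the average prices of ghee in the two cities are different.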
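The calculation for Example 3 can be sketched in Python as follows. The sketch assumes the pooled-variance form of the statistic, t = (x̄ − ȳ)/(S√(1/n₁ + 1/n₂)), with S² equal to the combined sum of squared deviations divided by (n₁ + n₂ − 2). The helper name `pooled_t` is illustrative, and the critical value 2.10 is the tabulated figure quoted in the example.

```python
import math
import statistics

# Prices of ghee per kg from Example 3.
city_a = [361, 363, 356, 364, 359, 360, 362, 361, 358, 357]
city_b = [368, 369, 370, 366, 367, 365, 371, 372, 366, 367]

def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    ss_x = sum((v - mean_x) ** 2 for v in x)  # sum of squared deviations, sample 1
    ss_y = sum((v - mean_y) ** 2 for v in y)  # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s2 = (ss_x + ss_y) / df                   # pooled variance estimate S^2
    t = (mean_x - mean_y) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, df, s2

t, df, s2 = pooled_t(city_a, city_b)
T_CRIT = 2.10  # tabulated two-tailed 5% value for 18 d.f., as quoted in the text
print(round(s2, 4), round(t, 3), df, abs(t) > T_CRIT)
```

The function returns the pooled variance alongside t so that the intermediate figure can be compared with the hand calculation.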

18.3.3 Paired t-test

Let us now consider the case when

(i) Sample sizes are equal i.e., n₁ = n₂ = n and

(ii) The samples are not independent; the sample observations are paired, i.e., the pair of observations (X_i, Y_i), i=1,2,…,n, corresponds to the same i-th sample unit. The problem is to test whether the sample means differ significantly.

For example, we may want to test the efficacy of a particular drug, say for inducing sleep or for controlling blood pressure or blood sugar among patients, or to test the difference between two analysts or machines with regard to the determination of mean fat percentage in milk. Let X_i and Y_i (i=1,2,…,n) be the fat percentage readings of the i-th milk sample, as determined by machines A and B respectively. Here, instead of applying the difference of means test discussed in the previous section, we apply the paired t-test.

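Here we consider the differences d_i = X_i – Y_i (i=1,2,…,n).

Under the null hypothesis H₀ that the difference in fat percent of milk recorded by the two machines is due to fluctuations of sampling, i.e., H₀: μ_d = 0

against H₁: μ_d ≠ 0

the test statistic

$$t = \frac{\bar{d}}{S_d/\sqrt{n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad S_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} d_i^2 - \frac{\left(\sum_{i=1}^{n} d_i\right)^2}{n}\right]$$

follows the t distribution with (n-1) degrees of freedom.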

Different examples of paired t test are:

1. A sample of boys was given a test in mathematics. They were given a month’s extra coaching and a second test was held at the end of it. Do the marks give evidence that the students have benefitted from the extra coaching?

2. A sample of patients was examined to know whether a drug tends to reduce the blood pressure. The data give the blood pressure readings before the drug was given and also after it was given. The question is to examine whether the drug is effective in controlling blood pressure.

3. It is desired to test the adoption of a new technology by farmers. The knowledge level score of a group of farmers is measured before the new technology is introduced and again after its introduction. Do the differences in knowledge level scores provide evidence that the farmers have benefitted from the adoption of the new technology?

This procedure is explained with the help of following illustrations.

Example 4: Ten B.Tech. (Dairy Tech.) second year students were selected for training on quality control on the basis of marks obtained in an examination conducted for this purpose. After one month of training they were given a test. Marks in both cases were recorded out of 50.

Student	A	B	C	D	E	F	G	H	I	J
Before training	25	20	35	15	42	28	26	44	35	48
After training	26	20	34	13	43	40	29	41	36	46

Test whether there is any change in performance after the training.

Solution:

In this problem, the marks obtained by the students before training (X) and after training (Y) are not independent but paired, hence we apply the paired t-test. Null hypothesis H₀: μ_X = μ_Y or H₀: μ_d = 0, i.e., the mean scores before and after training are the same. In other words, the training has no impact on the students’ performance. The alternative is H₁: μ_d ≠ 0.

Prepare the following table

Table 18.3

Before training (X_i)	After training(Y_i)	d_i = X_i – Y_i	d_i²
25	26	-1	1
20	20	0	0
35	34	1	1
15	13	2	4
42	43	-1	1
28	40	-12	144
26	29	-3	9
44	41	3	9
35	36	-1	1
48	46	2	4
Total		Σd_i = -10	Σd_i² = 174

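and calculate

$$\bar{d} = \frac{\sum d_i}{n} = \frac{-10}{10} = -1$$

$$S_d^2 = \frac{1}{n-1}\left[\sum d_i^2 - \frac{\left(\sum d_i\right)^2}{n}\right] = \frac{1}{9}\left[174 - \frac{(-10)^2}{10}\right] = \frac{164}{9} = 18.2222, \qquad S_d = 4.2687$$

$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{-1}{4.2687/\sqrt{10}} = \frac{-1}{1.3499} = -0.741$$

The tabulated value of t at the 5% level of significance for 9 d.f. (two-tailed) is 2.262. Since the calculated |t| = 0.741 is less than the tabulated value 2.262, it is not significant. We accept the null hypothesis and conclude that the training has not produced a significant change in the students’ performance.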
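Example 4 can be worked in Python along the following lines. This is a sketch under the usual paired-test assumption that the statistic is t = d̄/(S_d/√n), where d_i = X_i − Y_i and S_d is the standard deviation of the differences with divisor (n-1). The variable names are illustrative, and the critical value 2.262 is the tabulated figure quoted in the example.

```python
import math
import statistics

# Marks (out of 50) from Example 4.
before = [25, 20, 35, 15, 42, 28, 26, 44, 35, 48]
after = [26, 20, 34, 13, 43, 40, 29, 41, 36, 46]

# Paired differences d_i = X_i - Y_i (before minus after).
d = [x - y for x, y in zip(before, after)]
n = len(d)

d_bar = statistics.mean(d)         # mean difference
s_d = statistics.stdev(d)          # standard deviation of d_i, divisor n - 1
t = d_bar / (s_d / math.sqrt(n))   # paired t statistic with n - 1 d.f.

T_CRIT = 2.262  # tabulated two-tailed 5% value for 9 d.f., as quoted in the text
print(sum(d), sum(v * v for v in d), round(t, 3), abs(t) > T_CRIT)
```

Because the test works on the differences only, it is simply a single-mean t-test applied to d_i with hypothesised mean zero.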

Example 5: A certain stimulus administered to each of 12 calves resulted in the following changes in the blood sugar levels: 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6.

Can it be concluded that the stimulus will in general be accompanied by an increase in blood sugar level? Test at 5% level of significance.

Solution: In this problem we are given the increments d_i = Y_i – X_i in the blood sugar levels of 12 calves, where X_i and Y_i denote the levels of the i-th calf before and after the stimulus respectively.

Null hypothesis H₀: μ_X = μ_Y or μ_d = 0, i.e., there is no difference in the blood sugar levels of the calves before and after administering the stimulus. In other words, the stimulus has no impact on the blood sugar levels of calves.

Against H₁: μ_X < μ_Y or μ_d > 0, i.e., the stimulus results in an increase in the blood sugar level of calves.

Prepare the following table:

d_i	5	2	8	-1	3	0	-2	1	5	0	4	6	Σd_i = 31
d_i²	25	4	64	1	9	0	4	1	25	0	16	36	Σd_i² = 185

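and calculate

$$\bar{d} = \frac{\sum d_i}{n} = \frac{31}{12} = 2.5833$$

$$S_d^2 = \frac{1}{n-1}\left[\sum d_i^2 - \frac{\left(\sum d_i\right)^2}{n}\right] = \frac{1}{11}\left[185 - \frac{31^2}{12}\right] = \frac{104.9167}{11} = 9.5379, \qquad S_d = 3.0883$$

$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{2.5833}{3.0883/\sqrt{12}} = \frac{2.5833}{0.8915} = 2.898$$

Since the alternative hypothesis is right-tailed, the critical value for a test at the 5% level of significance is read from the two-tailed t table at the 10% level. For (n-1) = 11 d.f. this tabulated value is 1.80. Since the calculated t = 2.898 is greater than the tabulated value 1.80, it is significant. We reject the null hypothesis and conclude that the stimulus is effective in increasing blood sugar in calves.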
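A short Python sketch of Example 5 is given below. It assumes the paired statistic t = d̄/(S_d/√n) computed from the 12 given increments, so the statistic has n − 1 = 11 degrees of freedom, and it uses the shortcut form of S_d² based on Σd_i and Σd_i². The one-tailed critical value 1.80 is the tabulated figure quoted in the example, entered by hand; the variable names are illustrative.

```python
import math

# Changes in blood sugar level for the 12 calves (Example 5).
d = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]
n = len(d)

sum_d = sum(d)                  # sum of the increments
sum_d2 = sum(v * v for v in d)  # sum of the squared increments

d_bar = sum_d / n
s_d2 = (sum_d2 - sum_d ** 2 / n) / (n - 1)  # variance of d_i, divisor n - 1
t = d_bar / math.sqrt(s_d2 / n)             # paired t statistic, n - 1 = 11 d.f.

T_CRIT_ONE_TAILED = 1.80  # tabulated one-tailed 5% value for 11 d.f. (from the text)
reject = t > T_CRIT_ONE_TAILED              # right-tailed test: reject for large t
print(sum_d, sum_d2, round(t, 3), reject)
```

For a right-tailed alternative the comparison uses t itself rather than |t|, so a large negative t would not lead to rejection.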

18.3.4 t-Test for significance of an observed sample correlation coefficient

Let a random sample (x_i, y_i) (i=1,2,…,n) of size n be drawn from a bivariate normal distribution and let r be the observed sample correlation coefficient. We wish to test whether r is significant, i.e., whether the variables are correlated in the population. Prof. R. A. Fisher proved that under the null hypothesis H₀: ρ = 0, i.e., the population correlation coefficient is zero, the statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

follows Student’s t distribution with (n-2) d.f., n being the sample size.
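As an illustration, the test for a correlation coefficient can be sketched in Python. The sketch assumes the usual form of the statistic, t = r√(n−2)/√(1−r²). The values r = 0.6 and n = 27 are hypothetical and do not come from the lesson, and the function name `correlation_t` is illustrative. The critical value 2.060 is the standard tabulated two-tailed 5% figure for 25 d.f., entered by hand.

```python
import math

def correlation_t(r, n):
    """Return (t, degrees of freedom) for testing H0: rho = 0."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2

# Hypothetical figures chosen only for illustration.
t, df = correlation_t(r=0.6, n=27)

T_CRIT = 2.060  # tabulated two-tailed 5% value for 25 d.f. (entered by hand)
significant = abs(t) > T_CRIT
print(round(t, 2), df, significant)
```

With these hypothetical figures the calculated t is about 3.75 on 25 d.f., so the null hypothesis ρ = 0 would be rejected at the 5% level.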