Module 5. Test of significance

Lesson 17

Z-TEST AND ITS APPLICATIONS

17.1 Introduction

In the previous lesson we considered the problem of deciding whether sample observations have come from a postulated population. On the basis of the sample observations, a test is performed to decide whether the postulated hypothesis is to be rejected or not, and this decision involves a certain amount of risk. The amount of risk is termed the level of significance. When the hypothesis is rejected, we regard the result as significant; otherwise we regard it as non-significant. We have seen that for large values of n, the number of trials, almost all distributions, e.g., Binomial, Poisson etc., are very closely approximated by the Normal distribution, and in this case we apply the Normal Deviate test (Z-test). In cases where the population variance(s) is/are known, we use the Z-test. Under the null hypothesis, the statistic Z follows the standard normal distribution with mean zero and variance one. In this lesson we shall study tests of significance for large samples only. In statistics, a sample is conventionally said to be large if its size exceeds 30.

17.2 Test of Significance for Large Samples

In cases where the population variance(s) is/are known, we use the Z-test. Moreover, when the sample size is large, the sample variance approaches the population variance and is deemed to be almost equal to it. In this way, the population variance may be treated as known even if we have only sample data, and hence the normal deviate test is applicable. The distribution of Z is normal with mean zero and variance one. Thus, if X ~ N (μ, σ²), then Z = (X − E(X))/√V(X) = (X − μ)/σ ~ N (0, 1).

From normal probability tables, we have

P[−3 ≤ Z ≤ 3] = P[|Z| ≤ 3] = 0.9973 ⇒ P[|Z| > 3] = 1 − P[|Z| ≤ 3] = 0.0027. Thus, the value Z = 3 is regarded as the critical or significant value at all levels of significance: if |Z| > 3, H₀ is always rejected. If |Z| ≤ 3, we test its significance at a chosen level of significance, usually 5% and sometimes 1%. Also P[|Z| > 1.96] = 0.05 and P[|Z| > 2.58] = 0.01. Thus, the significant values of Z at 5% and 1% levels of significance (two-tailed test) are 1.96 and 2.58 respectively. If |Z| > 1.96, H₀ is rejected at 5% level of significance; if |Z| < 1.96, H₀ may be retained at 5% level of significance. Similarly, if |Z| > 2.58, H₀ is rejected at 1% level of significance, and if |Z| < 2.58, H₀ is retained at 1% level of significance. In the following sections we shall discuss the large sample (normal) tests for attributes and variables.
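The tabulated values quoted above can be checked numerically. The following is a minimal sketch using Python's standard library; the helper names two_tailed_critical_value and two_tailed_p_value are illustrative and do not come from the lesson.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def two_tailed_critical_value(alpha):
    """Return z such that P[|Z| > z] = alpha for a standard normal Z."""
    return std_normal.inv_cdf(1.0 - alpha / 2.0)

def two_tailed_p_value(z):
    """Return P[|Z| > |z|] for a standard normal Z."""
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))

print(round(two_tailed_critical_value(0.05), 2))  # 1.96
print(round(two_tailed_critical_value(0.01), 2))  # 2.58
print(round(two_tailed_p_value(3.0), 4))          # 0.0027
```

The same two helpers can be reused for any two-tailed Z-test in this lesson: compare |Z| with the critical value, or compare the p-value with the chosen level of significance.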

17.3 Applications of Z Test

17.3.1 Test for single proportion

If the observations on various items or objects are categorized into two classes c₁ and c₂ (binomial population), e.g., defective or non-defective items, we often want to test the hypothesis that the proportion of items in a particular class, say defective items, is P₀. For example, the management of a dairy plant may be interested in knowing whether the proportion of leaked pouches filled by an automatic milk filling machine is one percent. Thus, for a binomial population, the hypothesis H₀: P = P₀ against H₁: P ≠ P₀ or H₁: P > P₀ or H₁: P < P₀ can be tested by the Z-test, where P is the actual proportion of items in the population belonging to class c₁. Proportions are mostly based on large samples and hence the Z-test is applied.

If X is the number of successes in n independent trials with constant probability P of success for each trial, then E (X) = nP and V (X) = nPQ, where Q = 1 − P. It is known that for large n, the Binomial distribution tends to the Normal distribution. Hence, for large n, X ~ N (nP, nPQ). Therefore, the Z statistic for a single proportion is given by

Z = (X − E(X))/√V(X) = (X − nP)/√(nPQ) ~ N (0, 1)

and we can apply a normal deviate test.

If, in a sample of size n, X is the number of persons possessing the given attribute, then the observed proportion of successes is p = X/n, with E (p) = E (X)/n = P and V (p) = V (X)/n² = PQ/n.

Since X, and consequently X/n, is asymptotically normal for large n, the normal deviate test for the proportion of successes becomes

Z = (p − E(p))/√V(p) = (p − P)/√(PQ/n) ~ N (0, 1)

Example 1. In a large consignment of baby food packets, a random sample of 100 packets revealed that 5 packets were leaking. Test whether the sample comes from the population (large consignment) containing 3 percent leaked packets.

Solution: In this example n = 100, X = 5, P = 0.03, p = X/n = 5/100 = 0.05

H₀: P = 0.03, i.e., the proportion of leaked packets in the population is 3 per cent

H₁: P ≠ 0.03.

Here, we shall use the standard normal deviate (Z) test for a single proportion as under:

Z = (p − P)/√(PQ/n) = (0.05 − 0.03)/√(0.03 × 0.97/100) = 0.02/0.01706 = 1.17

Since the calculated value of the Z statistic is less than 1.96, H₀ is not rejected at 5% level of significance, which implies that the sample is representative of the population (large consignment) of packets containing 3% leaked packets.
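The arithmetic of Example 1 can be reproduced with a few lines of Python. This is only a sketch; the function name z_single_proportion is an illustrative choice, and the standard error is computed under H₀ from the hypothesized proportion.

```python
import math

def z_single_proportion(x, n, p0):
    """Z statistic for H0: P = p0, given x successes in n trials."""
    p_hat = x / n                          # observed proportion p = X/n
    se = math.sqrt(p0 * (1 - p0) / n)      # standard error sqrt(PQ/n) under H0
    return (p_hat - p0) / se

# Example 1: 5 leaking packets in a sample of 100, hypothesized P = 0.03.
z = z_single_proportion(5, 100, 0.03)
print(round(z, 2))        # 1.17
print(abs(z) > 1.96)      # False, so H0 is not rejected at the 5% level
```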

17.3.2 Test of Significance for difference of proportions

Suppose we have two populations and each item of a population belongs to either of the two classes C₁ and C₂. We are often interested in knowing whether the proportion of items in class C₁ is the same in both populations, that is, we want to test the hypothesis

H₀: P₁=P₂ against H₁: P₁≠P₂ or P₁>P₂ or P₁<P₂ where P₁ and P₂ are the proportions of items in the two populations belonging to class C₁.

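Let X₁, X₂ be the number of items belonging to class C₁ in random samples of sizes n₁ and n₂ from the two populations respectively. Then the sample proportions are p₁ = X₁/n₁ and p₂ = X₂/n₂.

If P₁ and P₂ are the population proportions, then E (p₁) = P₁, E (p₂) = P₂, V (p₁) = P₁Q₁/n₁ and V (p₂) = P₂Q₂/n₂, where Q₁ = 1 − P₁ and Q₂ = 1 − P₂.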

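Since, for large samples, p₁ and p₂ are asymptotically normally distributed, (p₁ − p₂) is also normally distributed. Therefore, the Z-statistic for the difference between two proportions is given by

Z = [(p₁ − p₂) − E (p₁ − p₂)]/√V (p₁ − p₂) ~ N (0, 1)

Since, under H₀: P₁ = P₂, E (p₁ − p₂) = P₁ − P₂ = 0 and, the samples being independent, V (p₁ − p₂) = V (p₁) + V (p₂) = P₁Q₁/n₁ + P₂Q₂/n₂.

Since P₁ = P₂ = P and Q₁ = Q₂ = Q under H₀, therefore

Z = (p₁ − p₂)/√[PQ (1/n₁ + 1/n₂)] ~ N (0, 1)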

If the population proportions P₁ and P₂ are given to be distinctly different, that is P₁ ≠ P₂, then

Z = [(p₁ − p₂) − (P₁ − P₂)]/√(P₁Q₁/n₁ + P₂Q₂/n₂) ~ N (0, 1)

In general P, the common population proportion (under H₀), is not known. An unbiased estimate of the population proportion P based on both the samples is then used, which is given by

P̂ = (n₁p₁ + n₂p₂)/(n₁ + n₂) = (X₁ + X₂)/(n₁ + n₂), with Q̂ = 1 − P̂

Example 2. Before an increase in excise duty on tea, 400 people out of a sample of 500 persons were found to be tea drinkers. After an increase in excise duty, 400 people were observed to be tea drinkers in a sample of 600 people. Test whether there is a significant change in the number of tea drinkers after increase in excise duty on tea.

Solution: In this example X₁ = 400, n₁ = 500, X₂ = 400, n₂ = 600

H₀: P₁ = P₂, i.e., there is no change in the number of tea drinkers after increase in excise duty on tea

H₁: P₁≠P₂

Here we shall use standard normal deviate (Z) test for difference of proportions as under:

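In our example p₁ = 400/500 = 0.8, p₂ = 400/600 = 0.6667

q₁ = 1 − p₁ = 0.2, q₂ = 1 − p₂ = 0.3333

Since the common proportion P is not known, we use its pooled estimate P̂ = (X₁ + X₂)/(n₁ + n₂) = 800/1100 = 0.7273, so that Q̂ = 1 − P̂ = 0.2727

Z = (p₁ − p₂)/√[P̂Q̂ (1/n₁ + 1/n₂)] = (0.8 − 0.6667)/√[0.7273 × 0.2727 × (1/500 + 1/600)] = 0.1333/0.0270 = 4.94

Since the calculated value of the Z statistic is greater than 3, H₀ is rejected at all levels of significance, which implies that there is a significant change in the proportion of tea drinkers after the increase in excise duty on tea. It is further observed that the proportion of tea drinkers has declined significantly after the increase in excise duty, since the value of p₂ (0.667) is lower than the value of p₁ (0.8).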
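The calculation in Example 2 can be sketched in Python as follows. The sketch assumes the pooled estimate (X₁ + X₂)/(n₁ + n₂) of the common proportion under H₀; the function name pooled_two_proportion_z is hypothetical.

```python
import math

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, Z) for H0: P1 = P2, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)       # estimate of the common P under H0
    q_pooled = 1 - p_pooled
    se = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return p_pooled, (p1 - p2) / se

# Example 2: 400 of 500 tea drinkers before, 400 of 600 after the duty increase.
p_pooled, z = pooled_two_proportion_z(400, 500, 400, 600)
print(round(p_pooled, 4))  # 0.7273
print(round(z, 2))         # 4.94
print(abs(z) > 3)          # True, so H0 is rejected
```

Using the unpooled standard error √(p₁q₁/n₁ + p₂q₂/n₂) instead gives a slightly different Z, but the conclusion for these data is the same.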

Example 3. A machine turns out 16 imperfect articles in a sample of 500. After overhauling it turns 3 imperfect articles in a batch of 100. Has the machine improved after overhauling?

Solution: We are given n₁ = 500 and n₂ = 100

p₁ = Proportion of defective items before overhauling of machine = 16/500 = 0.032

p₂ = Proportion of defective items after overhauling of machine = 3/100 = 0.03

H₀: P₁ = P₂, i.e., the machine has not improved after overhauling.

H₁: P₁>P₂

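The pooled estimate of the common proportion is P̂ = (X₁ + X₂)/(n₁ + n₂) = (16 + 3)/(500 + 100) = 0.0317, so that Q̂ = 1 − P̂ = 0.9683 and

Z = (p₁ − p₂)/√[P̂Q̂ (1/n₁ + 1/n₂)] = (0.032 − 0.030)/√[0.0317 × 0.9683 × (1/500 + 1/100)] = 0.002/0.0192 = 0.104

Since Z < 1.645 (right-tailed test), it is not significant at 5% level of significance. Hence we do not reject the null hypothesis and conclude that the data give no evidence that the machine has improved after overhauling.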
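Example 3 is a one-tailed test, so the critical value is taken from one tail only. The sketch below illustrates this with Python's standard library; the function name z_two_proportions and the variable names are illustrative, and the pooled standard error under H₀ is assumed.

```python
import math
from statistics import NormalDist

def z_two_proportions(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 with the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

alpha = 0.05
# Example 3: 16 defectives in 500 before overhauling, 3 in 100 after.
z = z_two_proportions(16, 500, 3, 100)
# Right-tailed test (H1: P1 > P2): all of alpha lies in the upper tail.
critical = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > critical
print(round(z, 3), round(critical, 3), reject_h0)  # 0.104 1.645 False
```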

17.3.3 Test for significance of single mean

We have seen that if X_i (i = 1, 2, …, n) is a random sample of size n from a normal population with mean µ and variance σ², then the sample mean x̄ is distributed normally with mean μ and variance σ²/n, i.e., x̄ ~ N (μ, σ²/n). Thus, for large samples, the standard normal variate corresponding to x̄ is

Z = (x̄ − μ)/(σ/√n) ~ N (0, 1)

In the test of significance for a single mean we deal with the following situations:

1) To test if the mean of the population has a specified value (μ₀) and null hypothesis in this case will be H₀: μ=μ₀ i.e., the population has a specified mean value.

2) To test whether the sample mean differs significantly from the hypothetical value of population mean with null hypothesis as there is no difference between sample mean and population mean (μ).

3) To test if the given random sample has been drawn from a population with specified mean μ₀ and variance σ² with null hypothesis the sample has been drawn from a normal population with specified mean μ₀ and variance σ²

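In all the above three situations the test statistic is given by

Z = (x̄ − μ₀)/(σ/√n)

If σ is not known, it is replaced, for large samples, by the sample standard deviation s.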

If |Z| <1.96, H_o is not rejected at 5% level of significance which implies that there is no significant difference between sample mean and population mean and whatever difference is there, it exists due to fluctuation of sampling.

If |Z| > 1.96, H₀ is rejected at 5% level of significance, which implies that there is a significant difference between the sample mean and the population mean. The above situations are illustrated by the following examples:

Example 4. A random sample of 100 students gave a mean weight of 64 kg with a standard deviation of 16 kg. Test the hypothesis that the mean weight in the population is 60 kg.

Solution: In this example, n = 100, µ = 60 kg, x̄ = 64 kg, σ = 16 kg

H₀: µ = 60 kg, i.e., the mean weight in the population is 60 kg.

H₁: µ ≠ 60 kg

We shall use the standard normal deviate (Z) test for a single mean as under:

Z = (x̄ − µ)/(σ/√n) = (64 − 60)/(16/√100) = 4/1.6 = 2.5

Since the calculated value of the Z statistic is more than 1.96, it is significant at 5% level of significance. Therefore, H₀ is rejected at 5% level of significance, which implies that the mean weight in the population is not 60 kg. As 2.5 is less than 2.58, the result is not significant at 1% level of significance.

Example 5. A sample of 50 cows in a herd has average lactation yield 1290 litres. Test whether the sample has been drawn from the population having herd average lactation yield of 1350 litres with a standard deviation of 65 litres.

Solution: In this example, n = 50, µ = 1350 litres, x̄ = 1290 litres, σ = 65 litres

H₀: µ=1350 litres i.e., the mean lactation milk yield of the cows in the population is 1350

H₁: µ≠1350 litres

We shall use the standard normal deviate (Z) test for a single mean as under:

Z = (x̄ − µ)/(σ/√n) = (1290 − 1350)/(65/√50) = −60/9.192 = −6.53

Since the calculated value of |Z| is more than 3, it is significant at all levels of significance. Therefore, H₀ is rejected at all levels of significance, which implies that the sample has not been drawn from the population having mean lactation milk yield of 1350 litres, or that there is a significant difference between the sample mean and the population mean.
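Examples 4 and 5 use the same statistic, so both can be checked with one small function. This is a sketch only; z_single_mean and decide are illustrative names, and the critical values 1.96 and 2.58 are the two-tailed values quoted in Section 17.2.

```python
import math

def z_single_mean(sample_mean, mu0, sigma, n):
    """Z statistic for H0: mu = mu0 with standard deviation sigma and sample size n."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def decide(z, critical):
    """Two-tailed decision rule: reject H0 when |Z| exceeds the critical value."""
    return "reject H0" if abs(z) > critical else "do not reject H0"

z4 = z_single_mean(64, 60, 16, 100)      # Example 4: student weights
z5 = z_single_mean(1290, 1350, 65, 50)   # Example 5: lactation yield

for label, z in (("Example 4", z4), ("Example 5", z5)):
    print(label, round(z, 2), "| 5%:", decide(z, 1.96), "| 1%:", decide(z, 2.58))
```

For Example 4 the statistic is 2.5, which exceeds 1.96 but not 2.58, so H₀ is rejected at the 5% level only. For Example 5 the statistic is about −6.53, so H₀ is rejected at both levels.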

17.3.4 Test of significance for difference of means

Let x̄₁ be the mean of a sample of size n₁ drawn from a population with mean μ₁ and variance σ₁², and let x̄₂ be the mean of an independent sample of size n₂ drawn from another population with mean μ₂ and variance σ₂². Since the sample sizes are large, x̄₁ ~ N (μ₁, σ₁²/n₁) and x̄₂ ~ N (μ₂, σ₂²/n₂).

Also (x̄₁ − x̄₂), being the difference of the means of two independent normal variates, is also a normal variate. The standard normal variate corresponding to (x̄₁ − x̄₂) is given by

Z = [(x̄₁ − x̄₂) − E (x̄₁ − x̄₂)]/√V (x̄₁ − x̄₂) ~ N (0, 1)

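Under the null hypothesis H₀: μ₁ = μ₂, i.e., the two population means are equal, we get

E (x̄₁ − x̄₂) = E (x̄₁) − E (x̄₂) = μ₁ − μ₂ = 0 and V (x̄₁ − x̄₂) = V (x̄₁) + V (x̄₂) = σ₁²/n₁ + σ₂²/n₂

The covariance terms vanish, since the sample means are independent.

Thus under H₀: μ₁ = μ₂, the Z statistic is given by

Z = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂) ~ N (0, 1)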

Here σ₁² and σ₂² are assumed to be known. If they are unknown, then their estimates provided by the corresponding sample variances s₁² and s₂² respectively are used, i.e., σ̂₁² = s₁² and σ̂₂² = s₂². In this case the test statistic becomes

Z = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂)

Remarks: If we want to test whether the two independent samples have come from the same population, i.e., if σ₁² = σ₂² = σ² (with common S.D. σ), then under H₀: μ₁ = μ₂

Z = (x̄₁ − x̄₂)/[σ √(1/n₁ + 1/n₂)] ~ N (0, 1)

If the common variance σ² is not known, then we use its estimate based on both the samples, which is given by

σ̂² = (n₁s₁² + n₂s₂²)/(n₁ + n₂)

Example 6. In a certain factory there are two independent processes manufacturing the same item. The average weight in a sample of 100 items produced from one process is found to be 50g with a standard deviation of 5g while the corresponding figures in a sample of 75 items from the other process are 52g and 6g respectively. Is the difference between two means significant?

Solution: In this example, n₁ = 100, x̄₁ = 50 g, s₁ = 5 g; n₂ = 75, x̄₂ = 52 g, s₂ = 6 g.

Let μ₁ and μ₂ be the population mean of the weight of items manufactured by two independent processes.

H₀: μ₁ = μ₂, i.e., mean weights of the items manufactured by two independent processes in the population is same.

H₁: μ₁ ≠ μ₂

Here, we shall use the standard normal deviate test (Z-test) for the difference between two means as under:

Z = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂) = (50 − 52)/√(25/100 + 36/75) = −2/√0.73 = −2/0.8544 = −2.34

Since the calculated value of |Z| is more than 1.96, H₀ is rejected at 5% level of significance, which implies that there is a significant difference between the mean weights of the items obtained from the two manufacturing processes.
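The computation for Example 6 can be sketched in Python as below. The function name z_two_means is illustrative; the sample standard deviations are used in place of the unknown population values, as described for large samples in Section 17.3.4.

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for H0: mu1 = mu2 from two independent large samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # sqrt(s1^2/n1 + s2^2/n2)
    return (mean1 - mean2) / se

# Example 6: process 1 (n=100, mean 50 g, s.d. 5 g), process 2 (n=75, mean 52 g, s.d. 6 g).
z = z_two_means(50, 5, 100, 52, 6, 75)
print(round(z, 2))     # -2.34
print(abs(z) > 1.96)   # True: significant at the 5% level
print(abs(z) > 2.58)   # False: not significant at the 1% level
```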