Module 3. Probability distributions

Lesson 10

BINOMIAL DISTRIBUTION

10.1  Introduction

In the first module we studied empirical (observed or experimental) frequency distributions, in which actual data were collected and tabulated in the form of a frequency distribution. In the present lesson we will study theoretical frequency distributions, which are not obtained from actual observations or experiments but follow some definite probability law that can be expressed mathematically. Such distributions, expected on the basis of previous experience or theoretical considerations, are known as theoretical distributions or probability distributions. Thus, theoretical frequency distributions are not based on actual observations but are mathematically deduced under certain assumptions. In this lesson we shall study one of the most popular discrete distributions, the origin of which lies in Bernoullian trials.

10.2  Binomial Distribution

The binomial distribution is a discrete probability distribution. It was discovered by the Swiss mathematician James Bernoulli (1654-1705). A Bernoullian trial is an experiment having only two possible outcomes, i.e. success or failure. In other words, the result of the trial is dichotomous: e.g. a tossed coin shows either head or tail, a calf is either male or female, and a manufactured milk product, piece of engineering equipment or spare part is either defective or non-defective. This distribution can be used under the following conditions:

a)  The random experiment is performed repeatedly a finite and fixed number of times i.e. n, the number of trials is finite and fixed.

b)  The outcome of a trial results in the dichotomous classification of events, i.e. each trial must result in one of two mutually exclusive outcomes: success or failure.

c)   The probability of success (or failure) remains the same in each trial, i.e. in each trial the probability of success, denoted by p, remains constant. q = 1 - p is then termed the probability of failure (non-occurrence).

d)  Trials are independent i.e. the outcome of any trial does not affect the outcomes of the subsequent trials.

10.3  Probability Mass Function of Binomial Distribution

Statement

If X denotes the number of successes in n trials satisfying the above conditions, then X is a random variable which can take the values 0, 1, 2, ..., n, i.e. no success, one success, two successes, ..., or all n successes. The general expression for the probability of r successes is given by:

            $P(r) = P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$,    r = 0, 1, 2, ..., n

Proof: By the theorem of compound probability, the probability that r trials are successes and the remaining (n-r) are failures in a sequence of n trials in a specified order, say S, F, S, F, S, ..., S, is given by

            $P(S)\,P(F)\,P(S)\,P(F)\,P(S) \cdots P(S) = p \cdot q \cdot p \cdot q \cdot p \cdots p = p^{r} q^{n-r}$

But we are interested in any r trials being successes, and the r trials can be chosen out of n trials in ${}^{n}C_{r}$ mutually exclusive ways. Therefore, by the theorem of total probability, the chance P(r) of r successes in a series of n independent trials is given by

                        $P(r) = {}^{n}C_{r}\, p^{r} q^{n-r}$,    0 ≤ r ≤ n

r can take only non-negative integer values.

Thus the chance variate, i.e. the number of successes, can take the values 0, 1, 2, ..., r, ..., n with corresponding probabilities $q^{n}, {}^{n}C_{1}\, p\, q^{n-1}, \ldots, {}^{n}C_{r}\, p^{r} q^{n-r}, \ldots, p^{n}$.

o   The probability distribution of the number of successes so obtained is called the binomial probability distribution, for the obvious reason that the probabilities are the successive terms of the binomial expansion of $(q+p)^{n}$.

o   The sum of the probabilities is $\sum_{r=0}^{n} P(r) = \sum_{r=0}^{n} {}^{n}C_{r}\, p^{r} q^{n-r} = (q+p)^{n} = 1$.

o   The expression for P(X = r) is known as the probability mass function of the binomial distribution with parameters n and p. A random variable X following this probability law is called a binomial variate with parameters n and p, denoted X ~ B(n, p). Hence the binomial distribution is completely determined once n and p are known.
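The probability mass function can be evaluated directly. The following is a minimal Python sketch; the function name binomial_pmf and the values n = 4, p = 0.3 are illustrative assumptions and do not come from the lesson. It lists P(X = r) for every r and confirms that the probabilities add up to 1.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p), i.e. nCr * p^r * q^(n-r) with q = 1 - p."""
    if not 0 <= r <= n:
        return 0.0  # r outside 0..n is impossible
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)


# Illustrative parameters (assumed for this sketch only).
n, p = 4, 0.3
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

for r, prob in enumerate(probabilities):
    print(f"P(X = {r}) = {prob:.4f}")
print("Sum of probabilities:", round(sum(probabilities), 10))
```

The same function can be reused for any n and p, since the distribution depends on these two parameters only.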

Example 1. It is known that 40 percent of cows affected by tuberculosis die every year. Six cows suffering from tuberculosis are admitted to a veterinary hospital. What is the probability that

(i)    three cows will die

(ii)    at least five cows will die

(iii)   all cows will be cured

(iv)   no cow will be saved.

Solution

In this exercise we have p = 0.4, q = 1 - 0.4 = 0.6 and n = 6.

In the binomial distribution we have $P(r) = {}^{n}C_{r}\, p^{r} q^{n-r}$

(i) Prob. [three cows will die] = P(r = 3) = P(3) = ${}^{6}C_{3} (0.4)^{3} (0.6)^{3}$

            = 20 × 0.064 × 0.216 = 0.2765

(ii) Prob. (at least five cows will die) = P(5) + P(6) = ${}^{6}C_{5} (0.4)^{5} (0.6)^{1} + {}^{6}C_{6} (0.4)^{6} (0.6)^{0}$ = $6 (0.4)^{5} (0.6) + (0.4)^{6}$ = 0.0369 + 0.0041 = 0.0410

(iii)   Prob. (all cows will be cured) = P(no cow will die) = P(0) = ${}^{6}C_{0} (0.4)^{0} (0.6)^{6}$ = $(0.6)^{6}$ = 0.0467. (The complementary value 1 - P(0) = 0.9533 is the probability that at least one cow will die.)

(iv)   Prob. (no cow will be saved) = P(all cows will die) = P(6) = ${}^{6}C_{6} (0.4)^{6} (0.6)^{0}$ = $(0.4)^{6}$ = 0.0041
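The four answers of Example 1 can be checked numerically. This is a small Python sketch with n = 6 and p = 0.4 as given in the example; the helper name binomial_pmf and the variable names are assumptions made for illustration. Part (iii) is taken here as the event that no cow dies, i.e. P(0).

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


n, p = 6, 0.4  # six cows, probability of death 0.4 (Example 1)

p_three_die = binomial_pmf(3, n, p)
p_at_least_five_die = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)
p_all_cured = binomial_pmf(0, n, p)   # assumption: "all cured" means no cow dies
p_none_saved = binomial_pmf(6, n, p)  # all six cows die

print(f"(i)   three die          : {p_three_die:.4f}")
print(f"(ii)  at least five die  : {p_at_least_five_die:.4f}")
print(f"(iii) all cured          : {p_all_cured:.4f}")
print(f"(iv)  none saved         : {p_none_saved:.4f}")
```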

Example 2. Ten consumers were asked to state their preference between two types of ice-cream, A and B. Assuming that there is no difference between the two types of ice-cream, calculate the probability that

a)      3 or fewer consumers will prefer ice-cream A.

b)      7 or more consumers will prefer ice-cream B.

Solution: In this exercise p = 0.5, q = 0.5 and n = 10

      a) Prob. [three or fewer consumers will prefer ice-cream A] = P(0) + P(1) + P(2) + P(3)

            = ${}^{10}C_{0} (0.5)^{0} (0.5)^{10} + {}^{10}C_{1} (0.5)^{1} (0.5)^{9} + {}^{10}C_{2} (0.5)^{2} (0.5)^{8} + {}^{10}C_{3} (0.5)^{3} (0.5)^{7}$ = $(0.5)^{10} ({}^{10}C_{0} + {}^{10}C_{1} + {}^{10}C_{2} + {}^{10}C_{3})$

            = (1/1024) (1 + 10 + 45 + 120) = 176/1024 = 0.1719

 b) Prob. [seven or more consumers will prefer ice-cream B] = P(7) + P(8) + P(9) + P(10), where r now counts the consumers preferring B (again with p = 0.5)

   = ${}^{10}C_{7} (0.5)^{7} (0.5)^{3} + {}^{10}C_{8} (0.5)^{8} (0.5)^{2} + {}^{10}C_{9} (0.5)^{9} (0.5)^{1} + {}^{10}C_{10} (0.5)^{10}$ = $(120 + 45 + 10 + 1)(0.5)^{10}$ = 176/1024 = 0.1719
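Both parts of Example 2 are sums of consecutive binomial probabilities, which can be sketched in Python as below. The helper names binomial_pmf and prob_between are assumptions for illustration; the calculation uses the exact value (0.5)^10 = 1/1024 rather than a rounded one.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


def prob_between(lo, hi, n, p):
    """P(lo <= X <= hi), summing the probability mass function term by term."""
    return sum(binomial_pmf(r, n, p) for r in range(lo, hi + 1))


n, p = 10, 0.5  # ten consumers, no difference between the two ice-creams

p_three_or_fewer_prefer_a = prob_between(0, 3, n, p)
p_seven_or_more_prefer_b = prob_between(7, 10, n, p)

print(f"a) P(X <= 3) = {p_three_or_fewer_prefer_a:.4f}")
print(f"b) P(X >= 7) = {p_seven_or_more_prefer_b:.4f}")
```

The two answers coincide because the distribution is symmetrical when p = q = 1/2.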

10.4  Examples of Binomial Distribution

·         The problem relating to tossing of a coin, throwing of dice or drawing cards from a pack of cards with replacement.

·         The problem relating to the distribution of preference for a dairy product among families.

·         The problem relating to the distribution of coliforms in sterilized milk.

·         The problem relating to distribution of number of stables in farm households.

·         The problem relating to distribution of number of lactations completed by the milch animals in a dairy farm.

10.5  Properties of Binomial Distribution

i) Mean of binomial distribution is np.

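Proof: The first raw moment is

              $\mu_1' = E(X) = \sum_{r=0}^{n} r\, {}^{n}C_{r}\, p^{r} q^{n-r} = np \sum_{r=1}^{n} {}^{n-1}C_{r-1}\, p^{r-1} q^{n-r}$

              $= np\,(q+p)^{n-1} = np$

since $r\, {}^{n}C_{r} = n\, {}^{n-1}C_{r-1}$ and q + p = 1.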

ii) Variance of binomial distribution is npq

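Proof: The second raw moment is

        $\mu_2' = E(X^{2}) = \sum_{r=0}^{n} [r(r-1) + r]\, {}^{n}C_{r}\, p^{r} q^{n-r}$

        $= n(n-1)p^{2} \sum_{r=2}^{n} {}^{n-2}C_{r-2}\, p^{r-2} q^{n-r} + np = n(n-1)p^{2} + np$

Variance = $\mu_2 = \mu_2' - (\mu_1')^{2} = n(n-1)p^{2} + np - n^{2}p^{2} = np(1-p) = npq$, where $\mu_1' = np$ is the mean.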

 If the mean and variance of a binomial distribution are known, n and p can be found and hence the whole distribution can be determined. Since q < 1, the variance npq is always less than the mean np.

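 iii) The third and fourth central moments $\mu_3$ and $\mu_4$ can be obtained on the same lines:

              $\mu_3 = npq(q - p)$

              $\mu_4 = npq[1 + 3(n-2)pq]$

 iv) Pearson's constants $\beta_1$ and $\beta_2$, as well as $\gamma_1$ and $\gamma_2$, are given by

              $\beta_1 = \mu_3^{2}/\mu_2^{3} = (q-p)^{2}/(npq)$,    $\beta_2 = \mu_4/\mu_2^{2} = 3 + (1 - 6pq)/(npq)$

              $\gamma_1 = \sqrt{\beta_1} = (q-p)/\sqrt{npq}$,    $\gamma_2 = \beta_2 - 3 = (1 - 6pq)/(npq)$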

γ1 shows that the binomial distribution is positively skewed if q > p (p < 1/2), negatively skewed if q < p (p > 1/2), and symmetrical if p = q = 1/2. γ2 shows that the binomial distribution is leptokurtic if pq < 1/6 and platykurtic if pq > 1/6.

v) The mode of the binomial distribution is determined by the value (n+1)p. If this value is an integer k, the distribution is bimodal, the two modal values being X = k and X = k-1. If this value is not an integer, the distribution has a unique mode at X = k1, the integral part of (n+1)p.

vi) Additive property: If X1 is B(n1, p) and X2 is B(n2, p) and they are independent, then their sum X1 + X2 is also a binomial variate, B(n1 + n2, p).
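The properties listed above can be checked by direct summation over r = 0, 1, ..., n. The Python sketch below uses exact fractions and illustrative parameters n = 8, p = 1/3 (assumed, not taken from the lesson); all function names are hypothetical. It compares the computed moments with the standard closed forms np, npq, npq(q - p) and npq[1 + 3(n-2)pq], finds the mode(s) by brute force, and verifies the additive property by convolving B(3, p) with B(5, p).

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p); exact when p is a Fraction."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


def central_moment(k, n, p):
    """k-th moment about the mean, summed directly over r = 0..n."""
    mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
    return sum((r - mean)**k * binomial_pmf(r, n, p) for r in range(n + 1))


def modes(n, p):
    """All r with the largest probability (brute force, exact comparison)."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    largest = max(probs)
    return [r for r, prob in enumerate(probs) if prob == largest]


def pmf_of_sum(n1, n2, p):
    """Distribution of X1 + X2 for independent B(n1, p) and B(n2, p)."""
    total = [0] * (n1 + n2 + 1)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            total[i + j] += binomial_pmf(i, n1, p) * binomial_pmf(j, n2, p)
    return total


# Illustrative parameters (assumed for this sketch only).
n, p = 8, Fraction(1, 3)
q = 1 - p

mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
mu2, mu3, mu4 = (central_moment(k, n, p) for k in (2, 3, 4))
beta1 = mu3**2 / mu2**3
beta2 = mu4 / mu2**2

print("mean:", mean, " np:", n * p)
print("variance:", mu2, " npq:", n * p * q)
print("mu3:", mu3, " mu4:", mu4)
print("beta1:", beta1, " beta2:", beta2)
print("(n+1)p:", (n + 1) * p, " modes:", modes(n, p))
print("additive property holds:",
      pmf_of_sum(3, 5, p) == [binomial_pmf(r, 8, p) for r in range(9)])
```

Here (n+1)p = 3 is an integer, so two modal values are expected; with n = 10 and the same p, (n+1)p is not an integer and a single mode is expected.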

Example 3. If the mean and variance of a Binomial Distribution are respectively 9 and 6, find the distribution.

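Solution: The mean of a binomial distribution is np and its variance is npq, so

            np = 9  and  npq = 6

            q = npq/np = 6/9 = 2/3

            p = 1 - q = 1/3

            n = 9/p = 9 × 3 = 27

 Hence, the binomial distribution is $(q+p)^{n} = (2/3 + 1/3)^{27}$,

i.e.  $P(r) = {}^{27}C_{r}\, (1/3)^{r} (2/3)^{27-r}$,    r = 0, 1, 2, ..., 27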
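The steps of Example 3 amount to solving np = mean and npq = variance for n and p. A minimal Python sketch follows; the function name binomial_from_mean_variance is a hypothetical helper, and the validity checks are an added assumption about what counts as an admissible answer.

```python
from fractions import Fraction


def binomial_from_mean_variance(mean, variance):
    """Solve np = mean and npq = variance for (n, p, q) using exact fractions."""
    mean, variance = Fraction(mean), Fraction(variance)
    if not 0 < variance < mean:
        raise ValueError("a binomial distribution needs 0 < variance < mean")
    q = variance / mean   # npq / np
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        raise ValueError("n = mean / p is not a whole number")
    return int(n), p, q


n, p, q = binomial_from_mean_variance(9, 6)
print(f"n = {n}, p = {p}, q = {q}")
```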

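 Example 4. An unbiased die is thrown 5 times, and the appearance of face 2 or 3 is considered a success. Find the probability of (i) exactly one success and (ii) at least 4 successes, and find the mean and variance.

Solution: Here n = 5, p = 2/6 = 1/3 and q = 1 - p = 2/3, so that

            $P(r) = {}^{5}C_{r}\, (1/3)^{r} (2/3)^{5-r}$,    r = 0, 1, ..., 5

(i)  P(exactly one success) = P(1) = ${}^{5}C_{1} (1/3) (2/3)^{4}$ = 80/243 = 0.3292

(ii) P(at least 4 successes) = P(4) + P(5) = ${}^{5}C_{4} (1/3)^{4} (2/3) + (1/3)^{5}$ = 10/243 + 1/243 = 11/243 = 0.0453

            Mean = np = 5 × 1/3 = 5/3 = 1.667

            Variance = npq = 5 × 1/3 × 2/3 = 10/9 = 1.111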

Example 5. A binomial variate X satisfies the relation 9 P(X=4)=P(X=2) when n = 6. Find the value of the parameter p.

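Solution: The binomial probability distribution with n = 6 is

            $P(X = r) = {}^{6}C_{r}\, p^{r} q^{6-r}$,    r = 0, 1, ..., 6

            $P(X = 4) = {}^{6}C_{4}\, p^{4} q^{2} = 15 p^{4} q^{2}$

            $P(X = 2) = {}^{6}C_{2}\, p^{2} q^{4} = 15 p^{2} q^{4}$

Considering the given relation,

            9 P(X = 4) = P(X = 2), we have

            $9 \times 15 p^{4} q^{2} = 15 p^{2} q^{4}$

            $9 p^{2} = q^{2}$, i.e. 3p = q = 1 - p  (p and q being positive)

            4p = 1

            p = 1/4.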
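The value of p in Example 5 can also be found numerically. The Python sketch below applies bisection to the difference 9 P(X = 4) - P(X = 2) and then checks the result exactly with fractions. The names relation_gap and p_numeric, and the starting bracket (0.01, 0.99), are assumptions made for illustration; bisection is a swapped-in numerical technique, not the algebraic method of the lesson.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p); exact when p is a Fraction."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


def relation_gap(p, n=6):
    """9 P(X = 4) - P(X = 2); zero when the given relation holds."""
    return 9 * binomial_pmf(4, n, p) - binomial_pmf(2, n, p)


# Bisection: the gap is negative for small p and positive for large p.
lo, hi = 0.01, 0.99  # assumed bracket containing the root
for _ in range(60):
    mid = (lo + hi) / 2
    if relation_gap(mid) > 0:
        hi = mid
    else:
        lo = mid
p_numeric = (lo + hi) / 2

# Exact confirmation that p = 1/4 satisfies the relation.
exact_check = relation_gap(Fraction(1, 4)) == 0

print(f"p found by bisection: {p_numeric:.6f}")
print("p = 1/4 satisfies the relation exactly:", exact_check)
```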

10.6 Fitting of Binomial Distribution

Let n independent trials constitute one experiment and let this experiment be repeated N times. Then we expect r successes to occur $N \cdot {}^{n}C_{r}\, p^{r} q^{n-r}$ times. This is called the expected frequency of r successes in N experiments, and the possible numbers of successes together with their expected frequencies constitute the binomial (expected) frequency distribution

            $N \times P(r) = N \times {}^{n}C_{r}\, p^{r} q^{n-r}$;    r = 0, 1, 2, ..., n

Putting r = 0, 1, 2, ..., n we get the expected or theoretical frequencies of the binomial distribution, which are given in the following table.

No. of successes (r)    Expected or theoretical frequency N.P(r)
0                       $N q^{n}$
1                       $N\, {}^{n}C_{1}\, p\, q^{n-1}$
2                       $N\, {}^{n}C_{2}\, p^{2} q^{n-2}$
:                       :
n                       $N p^{n}$

 

Case I: If p, the probability of success, which is constant for each trial, is known, then the expected frequencies can be obtained from the above table.

Case II: If p is not known and we want to fit a binomial distribution to a given frequency distribution, we first find the mean of the given frequency distribution by the formula $\bar{X} = \sum f_i X_i / \sum f_i$ and equate it to np, the mean of the binomial distribution. Hence p can be estimated from the relation $\bar{X} = np \Rightarrow p = \bar{X}/n$, with q = 1 - p. With these values of p and q, the expected (theoretical) binomial frequencies can be obtained from the above table. The expected frequencies can also be computed by using the following recurrence formula:

            $P(r+1) = \frac{n-r}{r+1} \cdot \frac{p}{q} \cdot P(r)$,    r = 0, 1, ..., n-1

The procedure is illustrated through the following example.

Example 6. The following table gives the number of coliforms per ml in one thousand pouches of milk:

No. of coliforms (Xi)    0    1    2     3      4      5      6      7      8     9    10
No. of pouches (fi)      2    8    46    116    211    243    208    119    40    7    0

 

Fit a binomial distribution to the above data.

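Solution: In the usual notation we have n = 10, N = ∑ fi = 1000 and ∑ fi Xi = 4971, so that

             $\bar{X} = \sum f_i X_i / N = 4971/1000 = 4.971$

             $p = \bar{X}/n = 4.971/10 = 0.4971$,    q = 1 - p = 0.5029

These values are substituted in $N \times P(r) = 1000 \times {}^{10}C_{r}\, (0.4971)^{r} (0.5029)^{10-r}$ to get the expected frequencies as given in the following table: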

 

No. of coliforms (Xi)    No. of pouches (fi)    fi Xi    Expected frequency E(r)
0                        2                      0        1.0347
1                        8                      8        10.2277
2                        46                     92       45.4939
3                        116                    348      119.9179
4                        211                    844      207.4360
5                        243                    1215     246.0524
6                        208                    1248     202.6788
7                        119                    833      114.4808
8                        40                     320      42.4352
9                        7                      63       9.3213
10                       0                      0        0.9214
Total                    1000                   4971     1000.00

 

The expected frequencies can also be computed by using the recurrence formula.
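The fitting procedure of Example 6 can be sketched in Python as follows. The observed frequencies are those of the example; the variable names are assumptions for illustration. The sketch starts from E(0) = N q^n and applies the standard binomial recurrence E(r+1) = [(n - r)/(r + 1)] (p/q) E(r), which is assumed here to be the recurrence formula the lesson refers to.

```python
# Observed data of Example 6: number of coliforms -> number of pouches.
observed = {0: 2, 1: 8, 2: 46, 3: 116, 4: 211, 5: 243,
            6: 208, 7: 119, 8: 40, 9: 7, 10: 0}

n = 10                                   # largest possible count per pouch
N = sum(observed.values())               # total number of pouches
mean = sum(x * f for x, f in observed.items()) / N
p = mean / n                             # estimate of p from mean = np
q = 1 - p

# Expected frequencies: start from E(0) = N * q^n, then apply the recurrence.
expected = [N * q**n]
for r in range(n):
    expected.append(expected[-1] * (n - r) / (r + 1) * p / q)

print(f"N = {N}, mean = {mean:.3f}, p = {p:.4f}, q = {q:.4f}")
for r, e in enumerate(expected):
    print(f"r = {r:2d}  observed = {observed[r]:4d}  expected = {e:9.4f}")
print(f"Total expected = {sum(expected):.2f}")
```

Using the recurrence avoids recomputing the binomial coefficients and the powers of p and q for every value of r.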