Module 3. Probability distributions

Lesson 10

BINOMIAL DISTRIBUTION

10.1 Introduction

In the first module we studied empirical (observed or experimental) frequency distributions, in which actual data were collected and tabulated. In this lesson we study theoretical frequency distributions, which are not obtained from actual observations or experiments but follow a definite probability law that can be expressed mathematically. Distributions that are expected on the basis of previous experience or theoretical considerations are known as theoretical distributions or probability distributions. They are deduced mathematically under certain assumptions rather than built from observed data. In this lesson we study one of the most widely used discrete distributions, the binomial distribution, whose origin lies in Bernoullian trials.

10.2 Binomial Distribution

The binomial distribution is a discrete probability distribution. It was discovered by the Swiss mathematician James Bernoulli (1654-1705). A Bernoullian trial is an experiment with only two possible outcomes, success or failure. In other words, the result of the trial is dichotomous: a tossed coin shows either head or tail, a calf is either male or female, and a manufactured milk product, piece of engineering equipment or spare part is either defective or non-defective. The distribution can be used under the following conditions:

a) The random experiment is performed repeatedly a finite and fixed number of times i.e. n, the number of trials is finite and fixed.

b) The outcome of a trial results in the dichotomous classification of events i.e. each trial must result in one of two mutually exclusive outcomes: success or failure.

c) The probability of success (or failure) remains the same in each trial i.e. in each trial the probability of success, denoted by p, remains constant. q = 1-p is then termed the probability of failure (non-occurrence).

d) Trials are independent i.e. the outcome of any trial does not affect the outcomes of the subsequent trials.

10.3 Probability Mass Function of Binomial Distribution

Statement

If X denotes the number of successes in n trials satisfying the above conditions, then X is a random variable which can take the values 0, 1, 2, ..., n i.e. no success, one success, two successes, ..., or all n successes. The general expression for the probability of r successes is given by:

P(r) = P(X = r) = ⁿC_r p^r q^(n-r) for r = 0, 1, 2, ..., n

Proof: By the theorem of compound probability, the probability that r trials are successes and the remaining (n-r) are failures in a sequence of n independent trials in a specified order, say S, F, S, F, S, ..., S, is given by

P(S) P(F) P(S) P(F) ... P(S) = p.q.p.q ... p = p^r q^(n-r)

But we are interested in any r trials being successes, and the r trials can be chosen out of n trials in ⁿC_r mutually exclusive ways, each having probability p^r q^(n-r). Therefore, by the theorem of total probability, the chance P(r) of r successes in a series of n independent trials is given by

P(r) = ⁿC_r p^r q^(n-r), 0 ≤ r ≤ n

where r can take only non-negative integer values.

Thus, the chance variate i.e. the number of successes, can take the values 0, 1, 2, ..., r, ..., n with corresponding probabilities qⁿ, ⁿC₁ p q^(n-1), ⁿC₂ p² q^(n-2), ..., ⁿC_r p^r q^(n-r), ..., pⁿ.

o The probability distribution of the number of successes so obtained is called the binomial probability distribution for the obvious reason that the probabilities are the various terms of the binomial expansion of (q+p)ⁿ.

o The sum of probabilities is ∑ P(r) = ∑ ⁿC_r p^r q^(n-r) = (q+p)ⁿ = 1, the sum being taken over r = 0, 1, 2, ..., n.

o The expression for P(X = r) is known as the probability mass function of the binomial distribution with parameters n and p. A random variable X following this probability law is called a binomial variate with parameters n and p, denoted as X ~ B(n, p). Hence the binomial distribution is completely determined if n and p are known.
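The probability mass function translates directly into a few lines of code. The following is a minimal sketch, not a library routine: the function name binomial_pmf and the parameters n = 4, p = 0.5 are assumptions chosen only for illustration. It lists the terms of (q+p)ⁿ and confirms that they sum to 1.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), i.e. nCr * p^r * q^(n-r)."""
    if not 0 <= r <= n:
        return 0.0  # r successes are impossible outside 0..n
    q = 1.0 - p
    return comb(n, r) * p**r * q ** (n - r)


# Hypothetical parameters, chosen only for illustration
n, p = 4, 0.5
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

print(probabilities)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probabilities))  # 1.0
```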

Example 1. It is known that 40 percent of cows affected by tuberculosis die every year. Six cows suffering from tuberculosis are admitted to a veterinary hospital. What is the probability that

(i) three cows will die

(ii) at least five cows will die

(iii) all cows will be cured

(iv) no cow will be saved.

Solution

Here the probability that a cow dies is p = 0.4, so q = 1 - 0.4 = 0.6, and n = 6.

In the binomial distribution we have P(r) = ⁿC_r p^r q^(n-r)

(i) Prob. (three cows will die) = P(3) = ⁶C₃ (0.4)³ (0.6)³ = 20 × 0.064 × 0.216 = 0.2765

(ii) Prob. (at least five cows will die) = P(5) + P(6) = ⁶C₅ (0.4)⁵ (0.6)¹ + ⁶C₆ (0.4)⁶ (0.6)⁰ = 6 (0.4)⁵ (0.6) + (0.4)⁶ = 0.0369 + 0.0041 = 0.0410

(iii) Prob. (all cows will be cured) = P(no cow will die) = P(0) = ⁶C₀ (0.4)⁰ (0.6)⁶ = (0.6)⁶ = 0.0467

(iv) Prob. (no cow will be saved) = P(all cows will die) = P(6) = ⁶C₆ (0.4)⁶ (0.6)⁰ = (0.4)⁶ = 0.0041
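The arithmetic in Example 1 can be checked by tabulating the whole distribution for n = 6 and p = 0.4. This is only a sketch; the variable names are assumptions. Each part of the example can then be read off the table: P(3) for three deaths, P(5) + P(6) for at least five deaths, P(0) for no deaths and P(6) for six deaths.

```python
from math import comb

n, p = 6, 0.4  # six cows, each dies with probability 0.4 (from Example 1)
q = 1 - p

# P(r) = 6Cr * (0.4)^r * (0.6)^(6-r) for r = 0, 1, ..., 6
distribution = [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]

for r, prob in enumerate(distribution):
    print(f"P({r}) = {prob:.4f}")

# "At least five" is the sum of the last two terms
p_at_least_five = distribution[5] + distribution[6]
print(f"P(5) + P(6) = {p_at_least_five:.4f}")
```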

Example 2. Ten consumers were asked to state their preference between two types of ice-cream, A and B. Assuming that there is no difference between the two types of ice-cream, calculate the probability that

a) 3 or fewer consumers will prefer ice-cream A.

b) 7 or more consumers will prefer ice-cream B.

Solution: In this exercise p = 0.5, q = 0.5 and n = 10, since with no difference between the two types each consumer is equally likely to prefer either.

a) Prob. (three or fewer consumers will prefer ice-cream A) = P(0) + P(1) + P(2) + P(3)

= ¹⁰C₀ (0.5)⁰ (0.5)¹⁰ + ¹⁰C₁ (0.5)¹ (0.5)⁹ + ¹⁰C₂ (0.5)² (0.5)⁸ + ¹⁰C₃ (0.5)³ (0.5)⁷ = (0.5)¹⁰ (¹⁰C₀ + ¹⁰C₁ + ¹⁰C₂ + ¹⁰C₃)

= (1 + 10 + 45 + 120)/1024 = 176/1024 = 0.1719

b) Prob. (seven or more consumers will prefer ice-cream B) = P(7) + P(8) + P(9) + P(10)

= ¹⁰C₇ (0.5)⁷ (0.5)³ + ¹⁰C₈ (0.5)⁸ (0.5)² + ¹⁰C₉ (0.5)⁹ (0.5)¹ + ¹⁰C₁₀ (0.5)¹⁰ = (120 + 45 + 10 + 1) (0.5)¹⁰ = 176/1024 = 0.1719
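Sums of consecutive binomial terms, as in Example 2, are easy to evaluate in code. The sketch below uses a hypothetical helper, binomial_range_probability, that adds P(r) over a range of r. Because the code keeps (0.5)¹⁰ = 1/1024 unrounded, its result can differ in the third or fourth decimal place from a hand calculation that rounds this factor first.

```python
from math import comb


def binomial_range_probability(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo <= X <= hi) for X ~ B(n, p), summed term by term."""
    q = 1 - p
    return sum(comb(n, r) * p**r * q ** (n - r) for r in range(lo, hi + 1))


n, p = 10, 0.5  # ten consumers, no difference between the two ice-creams

p_three_or_fewer = binomial_range_probability(0, 3, n, p)
p_seven_or_more = binomial_range_probability(7, 10, n, p)

print(p_three_or_fewer)  # 0.171875
print(p_seven_or_more)   # 0.171875
```

The two answers coincide because the distribution is symmetrical when p = q = 1/2.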

10.4 Examples of Binomial Distribution

· Problems relating to tossing of a coin, throwing of dice or drawing cards from a pack of cards with replacement.

· Problems relating to the distribution of preference for a dairy product among families.

· Problems relating to the distribution of coliforms in sterilized milk.

· Problems relating to the distribution of the number of stables in farm households.

· Problems relating to the distribution of the number of lactations completed by the milch animals in a dairy farm.

10.5 Properties of Binomial Distribution

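i) Mean of binomial distribution is np.

Proof: First raw moment

µ₁′ = E(X) = ∑ r ⁿC_r p^r q^(n-r) (r = 0 to n) = np ∑ ⁿ⁻¹C_(r-1) p^(r-1) q^(n-r) (r = 1 to n) = np (q+p)^(n-1) = np

using r ⁿC_r = n ⁿ⁻¹C_(r-1) and q + p = 1.

ii) Variance of binomial distribution is npq.

Proof: Second raw moment

µ₂′ = E(X²) = E[X(X-1)] + E(X) = n(n-1)p² ∑ ⁿ⁻²C_(r-2) p^(r-2) q^(n-r) (r = 2 to n) + np = n(n-1)p² (q+p)^(n-2) + np = n(n-1)p² + np

Variance = µ₂ = µ₂′ - (µ₁′)² = n(n-1)p² + np - n²p² = np - np² = np(1-p) = npq

For the binomial distribution, if the mean and variance are known, n and p can be found and hence the whole distribution. Since q < 1, the variance npq is always less than the mean np.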

iii) The third and fourth central moments µ₃ and µ₄ can be obtained on the same lines: µ₃ = npq(q-p) and µ₄ = npq[1 + 3(n-2)pq].

iv) Pearson's constants β₁ and β₂ as well as γ₁ and γ₂ are given by

β₁ = µ₃²/µ₂³ = (q-p)²/npq, β₂ = µ₄/µ₂² = 3 + (1-6pq)/npq

γ₁ = √β₁ = (q-p)/√(npq), γ₂ = β₂ - 3 = (1-6pq)/npq

γ₁ shows that the binomial distribution is positively skewed if q > p i.e. p < 1/2, negatively skewed if q < p i.e. p > 1/2, and symmetrical if p = q = 1/2. γ₂ shows that the binomial distribution is leptokurtic if pq < 1/6 and platykurtic if pq > 1/6.

v) Mode of binomial distribution is determined by the value (n+1)p. If this value is an integer equal to k, then the distribution is bi-modal, the two modal values being X = k and X = k-1. When this value is not an integer, the distribution has a unique mode at X = k₁, the integral part of (n+1)p.

vi) Additive property: If X₁ is B(n₁, p) and X₂ is B(n₂, p) and they are independent, then their sum X₁ + X₂ is also a binomial variate B(n₁ + n₂, p).
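The properties above can be verified numerically for any particular n and p. The sketch below is an illustration under assumed values (n = 12, p = 0.3, and the split n₁ = 5, n₂ = 7 are hypothetical); the helper names are also assumptions. It computes the moments directly from the probability table and compares them with np, npq, γ₁ = (q-p)/√(npq) and γ₂ = (1-6pq)/npq, then checks the mode and the additive property.

```python
from math import comb, floor, sqrt


def binomial_pmf_table(n, p):
    """Return [P(0), ..., P(n)] for X ~ B(n, p)."""
    q = 1 - p
    return [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]


def central_moment(probs, k):
    """k-th moment about the mean for a distribution on 0, 1, ..., len(probs)-1."""
    mean = sum(r * pr for r, pr in enumerate(probs))
    return sum((r - mean) ** k * pr for r, pr in enumerate(probs))


def convolve(a, b):
    """Distribution of the sum of two independent variates with tables a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


# Hypothetical parameters, chosen only for illustration
n, p = 12, 0.3
q = 1 - p
probs = binomial_pmf_table(n, p)

mean = sum(r * pr for r, pr in enumerate(probs))
variance = central_moment(probs, 2)
gamma1 = central_moment(probs, 3) / variance**1.5
gamma2 = central_moment(probs, 4) / variance**2 - 3
mode = max(range(n + 1), key=lambda r: probs[r])

print("mean    ", mean, "vs np =", n * p)
print("variance", variance, "vs npq =", n * p * q)
print("gamma1  ", gamma1, "vs", (q - p) / sqrt(n * p * q))
print("gamma2  ", gamma2, "vs", (1 - 6 * p * q) / (n * p * q))
print("mode    ", mode, "vs integral part of (n+1)p =", floor((n + 1) * p))

# Additive property: B(5, p) + B(7, p) should reproduce B(12, p)
sum_table = convolve(binomial_pmf_table(5, p), binomial_pmf_table(7, p))
max_gap = max(abs(a - b) for a, b in zip(sum_table, probs))
print("largest difference between tables:", max_gap)
```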

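Example 3. If the mean and variance of a binomial distribution are respectively 9 and 6, find the distribution.

Solution: The mean of the binomial distribution is np and the variance is npq, so np = 9 and npq = 6. Dividing, q = npq/np = 6/9 = 2/3, so p = 1 - q = 1/3 and n = 9/p = 27.

Hence, the binomial distribution is

P(r) = ²⁷C_r (1/3)^r (2/3)^(27-r); r = 0, 1, 2, ..., 27

i.e. X ~ B(27, 1/3).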
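The relations mean = np and variance = npq used in Example 3 can be solved for n and p in code. The following is a sketch only; the function name binomial_from_mean_variance is an assumption. Exact fractions are used so that n comes out as a whole number without rounding error.

```python
from fractions import Fraction


def binomial_from_mean_variance(mean, variance):
    """Return (n, p) such that n*p = mean and n*p*(1-p) = variance."""
    mean, variance = Fraction(mean), Fraction(variance)
    if not 0 < variance < mean:
        raise ValueError("a binomial distribution needs 0 < variance < mean")
    q = variance / mean  # npq / np
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        raise ValueError("mean and variance do not give a whole number of trials")
    return int(n), p


n, p = binomial_from_mean_variance(9, 6)
print(n, p)  # 27 1/3
```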

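Example 4. An unbiased die is thrown 5 times and the appearance of face 2 or 3 is considered a success. Find the probability of (i) exactly one success and (ii) at least 4 successes, and find the mean and variance.

Solution: Here n = 5, p = 2/6 = 1/3 and q = 1 - p = 2/3.

(i) P(1) = ⁵C₁ (1/3)¹ (2/3)⁴ = 5 × 16/243 = 80/243 = 0.3292

(ii) P(4) + P(5) = ⁵C₄ (1/3)⁴ (2/3)¹ + ⁵C₅ (1/3)⁵ = 10/243 + 1/243 = 11/243 = 0.0453

Mean = np = 5/3 = 1.667 and variance = npq = 10/9 = 1.111.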
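Example 4 can be worked with exact fractions, which keeps the answers in the same form as a hand calculation. This is a minimal sketch with assumed variable names; success is a throw showing face 2 or 3, so p = 2/6.

```python
from fractions import Fraction
from math import comb

n = 5
p = Fraction(2, 6)  # two favourable faces (2 or 3) out of six
q = 1 - p


def pmf(r):
    """P(X = r) for X ~ B(5, 1/3) as an exact fraction."""
    return comb(n, r) * p**r * q ** (n - r)


p_exactly_one = pmf(1)
p_at_least_four = pmf(4) + pmf(5)
mean = n * p
variance = n * p * q

print(p_exactly_one, p_at_least_four, mean, variance)  # 80/243 11/243 5/3 10/9
```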

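Example 5. A binomial variate X satisfies the relation 9 P(X = 4) = P(X = 2) when n = 6. Find the value of the parameter p.

Solution: Since the binomial probability distribution is

P(X = r) = ⁶C_r p^r q^(6-r); r = 0, 1, 2, ..., 6

Considering the given relation,

9 P(X = 4) = P(X = 2), we have

9 × ⁶C₄ p⁴ q² = ⁶C₂ p² q⁴

Since ⁶C₄ = ⁶C₂ = 15, this reduces to 9p² = q². As p and q are non-negative, 3p = q = 1 - p, which gives 4p = 1 i.e. p = 1/4.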
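The relation in Example 5 can also be solved numerically, which is a useful check on the algebra. The sketch below uses simple bisection (a technique not used in the lesson itself) on the function 9 P(X = 4) - P(X = 2); the function names and the search interval [0.01, 0.99] are assumptions.

```python
from math import comb


def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def relation_gap(p, n=6):
    """9 P(X = 4) - P(X = 2); this is zero at the required value of p."""
    return 9 * pmf(4, n, p) - pmf(2, n, p)


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f between lo and hi, where f changes sign."""
    if f(lo) * f(hi) >= 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # the sign change lies in the lower half
        else:
            lo = mid  # the sign change lies in the upper half
    return (lo + hi) / 2


# End points avoid p = 0 and p = 1, where both probabilities vanish
p = bisect(relation_gap, 0.01, 0.99)
print(round(p, 6))  # 0.25
```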

10.6 Fitting of Binomial Distribution

Let n independent trials constitute one experiment and let this experiment be repeated N times. Then we expect r successes to occur N × ⁿC_r p^r q^(n-r) times. This is called the expected frequency of r successes in N experiments, and the possible numbers of successes together with their expected frequencies constitute the binomial (expected) frequency distribution

N × P(r) = N × ⁿC_r p^r q^(n-r); r = 0, 1, 2, ..., n

Putting r = 0, 1, 2, ..., n we get the expected or theoretical frequencies of the binomial distribution, which are given in the following table.

No. of successes (r)	Expected or theoretical frequency N × P(r)
0	N qⁿ
1	N ⁿC₁ p q^(n-1)
2	N ⁿC₂ p² q^(n-2)
:	:
n	N pⁿ

Case I: If p, the probability of success, which is constant for each trial, is known, then the expected frequencies can be obtained from the above table.

Case II: If p is not known and we want to fit a binomial distribution to a given frequency distribution, first find the mean of the given frequency distribution by the formula m = ∑ f_i X_i / ∑ f_i and equate it to np, the mean of the binomial distribution. Hence p can be estimated by the relation m = np ⇒ p = m/n, and q = 1-p. With these values of p and q the expected binomial frequencies can be obtained from the above table. The expected frequencies can also be computed by using the following recurrence formula

P(r+1) = [(n-r)/(r+1)] × (p/q) × P(r); r = 0, 1, 2, ..., n-1

starting from P(0) = qⁿ and multiplying each P(r) by N.
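The recurrence relation between successive binomial terms, P(r+1) = [(n-r)/(r+1)] × (p/q) × P(r) with P(0) = qⁿ, avoids computing each ⁿC_r separately. A minimal sketch follows; the function name and the values N = 160, n = 4, p = 0.5 are hypothetical and chosen only so that the output is easy to check by hand.

```python
def expected_frequencies(N, n, p):
    """Expected binomial frequencies N*P(r) for r = 0..n.

    Uses P(r+1) = ((n - r) / (r + 1)) * (p / q) * P(r), starting from
    P(0) = q^n. Requires 0 <= p < 1 so that q is not zero.
    """
    q = 1 - p
    freq = [N * q**n]  # expected frequency of zero successes
    for r in range(n):
        freq.append(freq[-1] * (n - r) / (r + 1) * p / q)
    return freq


# Hypothetical values: 160 experiments of 4 trials each with p = 0.5
print(expected_frequencies(160, 4, 0.5))  # [10.0, 40.0, 60.0, 40.0, 10.0]
```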

The procedure is illustrated through the following example.

Example 6. The following table gives the number of coliforms per ml in one thousand pouches of milk:

No. of coliforms (X_i)	0	1	2	3	4	5	6	7	8	9	10
No. of pouches (f_i)	2	8	46	116	211	243	208	119	40	7	0

Fit a binomial distribution to the above data.

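Solution: In the usual notations we have: n = 10, N = 1000, ∑ f_iX_i = 4971, so that the mean is m = ∑ f_iX_i / N = 4971/1000 = 4.971. Equating the mean to np gives p = 4.971/10 = 0.4971 and q = 1 - p = 0.5029. These values are substituted in

E(r) = N × ⁿC_r p^r q^(n-r) = 1000 × ¹⁰C_r (0.4971)^r (0.5029)^(10-r); r = 0, 1, 2, ..., 10

to get the expected frequencies as given in the following table:

No. of coliforms (X_i)	No. of pouches (f_i)	f_i X_i	Expected frequency E(r)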
0	2	0	1.0347
1	8	8	10.2277
2	46	92	45.4939
3	116	348	119.9179
4	211	844	207.4360
5	243	1215	246.0524
6	208	1248	202.6788
7	119	833	114.4808
8	40	320	42.4352
9	7	63	9.3213
10	0	0	0.9214
Total	1000	4971	1000.00

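The expected frequencies can also be computed by using the recurrence formula E(r+1) = [(n-r)/(r+1)] × (p/q) × E(r), starting from E(0) = N qⁿ = 1000 × (0.5029)¹⁰ = 1.0347. For example, E(1) = (10/1) × (0.4971/0.5029) × 1.0347 = 10.2277.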
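The whole fitting procedure of Example 6 can be reproduced in a few lines. The sketch below takes the observed frequencies from the example, estimates p from the mean and computes E(r) = N × ⁿC_r p^r q^(n-r) directly; the variable names are assumptions, and n = 10 is taken from the solution above.

```python
from math import comb

coliforms = list(range(11))                               # X_i = 0, 1, ..., 10
pouches = [2, 8, 46, 116, 211, 243, 208, 119, 40, 7, 0]   # f_i from Example 6

n = 10                                                     # as in the solution
N = sum(pouches)                                           # total frequency
mean = sum(x * f for x, f in zip(coliforms, pouches)) / N  # sum(f_i X_i) / N
p = mean / n                                               # from mean = np
q = 1 - p

expected = [N * comb(n, r) * p**r * q ** (n - r) for r in coliforms]

print(f"N = {N}, mean = {mean}, p = {p:.4f}, q = {q:.4f}")
for r, (f, e) in enumerate(zip(pouches, expected)):
    print(f"{r:2d} {f:5d} {e:10.4f}")
print(f"Total {sum(pouches):5d} {sum(expected):10.2f}")
```

Comparing the observed and expected columns side by side gives a first impression of how well the binomial distribution describes the data.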