Introduction
When you make a dilution series, the tube you start with contains approximately 10^6
cells/mL. From the earlier procedure you know that the suspension actually holds between 1 and
2 million cells/mL, but for now, suppose you have exactly 1 million cells/mL. Since the tube
contains 2 mL, you should have 2 million cells. If you pipet out exactly 0.1 mL, how many cells
will you get? You would expect to get 1/20 of the 2 million cells in the tube, 10^5 cells, but you
probably won't, because by chance a few more or a few fewer cells may be in the 0.1 mL you pulled out. This
expected variation is called the standard deviation, and for our experiment it is approximately the
square root of the number of cells you expect to get. Thus, if you expect 10^5 cells you may only
get within about 300 cells of 10^5 (√(10^5) ≈ 316).
Missing a hundred thousand by a mere 300 is not so bad, but notice what happens with
the further dilutions. By the time you expect to get 100 cells, the standard deviation is 10, the
square root of 100. So if you expect to get 100 cells on a plate and in fact you get 90, that is not
necessarily due to poor technique. About 1/3 of the time you should expect to get a result
more than one standard deviation away from the expected value; even getting 80, therefore, may not be the result of poor
technique. Furthermore, just because you get 100 when you expect 100 does not mean you have
great technique; you were also lucky. To develop and demonstrate good technique you must
prepare many dilution series. Try to plate 100 cells 20 times. If your technique is good, you will
get an average of about 100 cells per plate, with 13-15 of the 20 plates in the 90-110 range and the rest outside it.
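The 13-15 figure can be checked with a short calculation. The following is a minimal sketch, assuming plate counts follow a Poisson distribution with mean 100 (the ideal case, in which the standard deviation is the square root of the mean). The function names poisson_pmf and prob_in_range are illustrative only.

```python
import math

def poisson_pmf(n, mean):
    """Probability of counting exactly n cells when `mean` are expected."""
    # Work with logarithms so that large factorials do not overflow.
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def prob_in_range(low, high, mean):
    """Probability that the count falls in low..high inclusive."""
    return sum(poisson_pmf(n, mean) for n in range(low, high + 1))

expected = 100                        # cells we aim to plate (from the text)
sd = math.sqrt(expected)              # ideal standard deviation: 10
p_within = prob_in_range(90, 110, expected)
plates_within = 20 * p_within         # expected number of the 20 plates in 90-110

print(f"standard deviation: {sd:.0f}")
print(f"fraction of plates in 90-110: {p_within:.2f}")
print(f"expected plates out of 20: {plates_within:.1f}")
```

Under this assumption roughly 70% of plates land in the 90-110 range, or about 14 plates out of 20, which is consistent with the 13-15 quoted above.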
Let's think about designing experiments with these rules in mind. How many cells should we aim to get on a plate? If we expect to get 100 then our result should be around 100±10, while if we expect to get 10 then we should see around 10±3. The uncertainty looks smaller in the second case. But this is an illusion. 3 is 30% of 10 while 10 is only 10% of 100, so aiming for 100 is smarter than aiming for 10; the relative uncertainty is smaller. What about aiming for 1000 cells on a plate? Then we would expect around 1000±30, a 3% relative error. This is true, the statistical error is smaller, but it is very hard to count 1000 colonies accurately. Using so many cells introduces a large systematic error into the procedure. When designing an experiment you have to steer a course between the statistical errors that come with too few cells/plate and the systematic errors that come with too many cells, or even too many plates; people get tired and count inaccurately if there is too much to do.
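The comparison of 10, 100 and 1000 cells per plate can be tabulated in a few lines. This is only a sketch of the square-root rule from the text; the helper name relative_uncertainty is made up for illustration, and it covers the statistical error only, not the systematic counting error.

```python
import math

def relative_uncertainty(expected_count):
    """Ideal statistical uncertainty as a fraction of the expected count."""
    return math.sqrt(expected_count) / expected_count

for target in (10, 100, 1000):
    sd = math.sqrt(target)            # square-root rule from the text
    rel = relative_uncertainty(target)
    print(f"{target:>5} cells/plate: +/- {sd:.0f} ({rel:.0%})")
```

The unrounded values are slightly larger than the round numbers in the text (about 32% rather than 30% for 10 cells, and about 32 rather than 30 for 1000 cells), but the trend is the same: the relative uncertainty shrinks as the square root of the count grows.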
Making a real plate is like pulling a plate out of this ghostly distribution: the plate will probably come from somewhere near the mean, because there are many more plates there, but it may come from the wings; that is just less likely.
Some language from statistics
The example of the dilution series illustrates a number of ideas from statistics, a widely
applicable piece of mathematics. When we use a definite set of procedures to make the plates we
create in our mind's eye a population of plates. The actual control plates are a sample drawn
from the population. The job of statistics is to use the sample to learn things about the
population. The population is characterized by parameters. In this example the mean number of
cells/plate is the parameter we care about. We took our control plates, the sample, and used the
mean of this sample to estimate the mean of the population. We say that the sample mean is an
estimator of the population mean. Our estimation is better with larger samples. If we had made
10 control plates containing 101, 127, 113, 120, 126, 95, 130, 120, 136, 112 cells instead of the
three above, our sample size would be 10 instead of 3. We would still estimate the mean to be
118, as we did before, but we would feel much more confident of our estimate. We will not go
into the quantitative details of sample size and confidence levels.
The most important characteristics of a sample are its size, its mean and its standard
deviation, or its variance. The size of the sample is usually obvious, and the sample mean is
familiar: it is just the average. The standard deviation and variance of a sample are most easily
explained by an example. Consider the sample of 3 plates above. The mean is 118 so the
deviation of the first plate is 120-118, the deviation of the second is 136-118, and of the third is
98-118. The variance is
$$\frac{(120-118)^2 + (136-118)^2 + (98-118)^2}{3} = \frac{728}{3} \approx 242.7,$$
that is, the variance is the mean of the squares of the deviations. The standard deviation is just
the square root of the variance, which gives it its other name: the root mean square deviation, or
rms deviation. The standard deviation of a sample is an estimator of the standard deviation of the
population, so in this example we estimate the standard deviation of the population to be √(242.7) ≈
16. This is not far from what we expect from the rule given above (standard deviation = square
root of expected number of cells): √(118) ≈ 11. Why the difference? The rule gives the ideal
standard deviation, to be expected if all procedures are perfectly carried out, which is not
possible in real life. You should be able to get close to that ideal with lots of practice but real
experiments will produce populations with bigger standard deviations than this ideal. Of course,
real samples will sometimes, by chance, have a smaller standard deviation than this ideal! The
sample standard deviation is just an estimator of the population standard deviation; it will
sometimes be too big and sometimes too small.
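The arithmetic for the three control plates (120, 136 and 98 colonies) can be reproduced with Python's standard library. This is a sketch for checking the numbers, not a prescribed analysis; the variable names are chosen here for illustration.

```python
import math
import statistics

plates = [120, 136, 98]                   # colony counts on the three control plates

mean = statistics.mean(plates)            # sample mean
variance = statistics.pvariance(plates)   # mean of the squared deviations
rms_deviation = math.sqrt(variance)       # standard deviation (rms deviation)
ideal_sd = math.sqrt(mean)                # square-root rule for perfect technique

print(mean, round(variance, 1), round(rms_deviation, 1), round(ideal_sd, 1))
```

Note that statistics.pvariance divides by the number of plates, which matches the definition used in the text; statistics.variance would divide by one less than the number of plates and give a larger value.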
Recall that the normal distribution that describes the ideal dilution series experiment done with
perfect technique has standard deviation equal to the square root of the mean, but usually there
are other sources of variation, such as variation in the amount of solution withdrawn for each
plate, so the standard deviation is usually a separate parameter unrelated to the mean. The
mathematical expression for the normal distribution is given below. We give it just for fun; you
don't need to use it now. It uses some symbols you may not yet know, but your teacher, or your
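math teacher, can help. The number of plates having x colonies we call N(x) and
$$N(x) = \frac{N_{\text{total}}}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where μ is the mean, σ is the standard deviation, and N_total is the total number of plates.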
The figure labeled (a) plots the number of plates having n mutations versus n; the figure labeled
(b) plots the probability of having n mutations on a plate versus n. The connection is simple:
there were 903 of the 1000 plates that had 0 mutants so the probability of getting 0 is 0.90, there
were 89 that had 1 mutant so the probability of 1 is 0.09, and so on. We wanted to measure p,
the probability that a single cell will mutate when exposed to this much radiation. The total
number of mutants is 111 (89 plates with one means 89 mutants, 6 plates with two adds twelve,
and so on) and the total number of cells is 1000 plates times 200 cells/plate. Thus we get
p=111/200,000. So we have found out what we wanted to, but there is a little more we can learn.
The dashed line in (b) is a Poisson distribution of the probability of getting n mutants, P(n),
versus n. We can see that this is a pretty good description of the experimental results. The
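mathematical expression of the Poisson distribution is
$$P(n) = \frac{\lambda^n e^{-\lambda}}{n!}, \qquad \lambda = Np,$$
where N is the number of cells per plate and p is the parameter we wanted. This gives us a different way to determine p. All we need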
is P(0), the probability that there are zero mutations on a plate. Because this number is
determined using lots of plates the relative error is small, so it is probably fairly accurate. Here is
how to use P(0) to find λ and then p. From the formula for the Poisson distribution we see that
P(0) equals exp(−λ), so λ = −ln P(0) = 0.10, and so p = λ/(number of cells per plate) = 0.10/200
= 0.0005, close to the 111/200,000 ≈ 0.00056 we found before.
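Both routes to p can be compared numerically. The sketch below uses only the counts quoted in the text (1000 plates, 200 cells per plate, 111 mutants in total, 903 plates with no mutants) and assumes the Poisson form P(n) = λ^n exp(−λ)/n! with λ = (cells per plate) × p. The variable and function names are illustrative.

```python
import math

plates = 1000                # plates counted
cells_per_plate = 200
total_mutants = 111
plates_with_zero = 903

# Direct estimate: mutants divided by cells exposed.
p_direct = total_mutants / (plates * cells_per_plate)

# Estimate from the empty plates: P(0) = exp(-lambda), lambda = N * p.
lam = -math.log(plates_with_zero / plates)
p_from_zero = lam / cells_per_plate

def poisson_pmf(n, lam):
    """Poisson probability of exactly n mutants on a plate."""
    return lam**n * math.exp(-lam) / math.factorial(n)

print(f"p (direct count)  = {p_direct:.6f}")
print(f"p (from P(0))     = {p_from_zero:.6f}")

# Plates predicted by the Poisson curve, next to the counts quoted in the text.
observed = {0: 903, 1: 89, 2: 6}
for n, count in observed.items():
    print(n, count, round(plates * poisson_pmf(n, lam), 1))
```

The two estimates agree to within about 10%, and the predicted plate counts for 1 and 2 mutants come out close to the observed 89 and 6.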
The last distribution we will discuss here is the binomial distribution. This gives the
probability of having n successes in N tries. A typical situation is rolling dice or flipping a coin:
the binomial distribution tells the probability of rolling 7 four times in 11 attempts, for example.
A biological example is the mutation experiment we just discussed. We could call "causing a
mutation" a success, and the number of cells on a plate would be the number of tries: the radiation
tries to mutate each cell and succeeds only with the few mutants.
The binomial distribution is characterized by one parameter, p, the probability of success
on any single try. The mathematical expression of the binomial distribution is this. Let P(n,N)
be the probability of having n successes in N tries. Then if p is the probability of success on any
single try we have
$$P(n,N) = \left[\frac{N!}{n!\,(N-n)!}\right] p^n (1-p)^{N-n}.$$
The quantity inside the square brackets is called "the binomial coefficient" or "N choose n." The latter name is used because it gives the number of ways to choose n objects from a set of N things. For example, if you roll two ordinary dice the probability of getting a 7 is 1/6 (can you figure out why this is so?) so the probability of rolling 7 four times in eleven attempts is (11 choose 4)(1/6)^4(5/6)^7 ≈ 0.07.
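The dice example can be evaluated directly. This is a minimal sketch, assuming the standard binomial form (N choose n) p^n (1−p)^(N−n); the function name binomial_pmf is illustrative, and math.comb requires Python 3.8 or later.

```python
import math

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N independent tries."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

p_seven = 1 / 6                      # probability of rolling 7 with two dice
ways = math.comb(11, 4)              # "11 choose 4"
prob = binomial_pmf(4, 11, p_seven)  # four 7s in eleven attempts

print(ways, round(prob, 3))
```

This gives 330 for the binomial coefficient and about 0.071 for the probability, in agreement with the 0.07 quoted above.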
There is a connection between the three distributions we have discussed. If N is very
large then the binomial distribution is practically indistinguishable from the normal distribution
with μ = Np and σ² = Np(1-p); if N is very large and p is very small then the binomial distribution
is practically indistinguishable from the Poisson distribution with λ = Np. This equivalence of
Poisson and binomial distributions is the reason we could use the mutation experiment as an
illustration of each of these distributions.
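These limits are easy to check numerically. The sketch below compares the binomial distribution with the Poisson distribution for the mutation example (200 cells per plate, p = 0.0005), and with the normal distribution for a hypothetical case of 1000 coin flips; the coin-flip numbers are an assumption chosen for illustration and do not come from the text.

```python
import math
from statistics import NormalDist

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N independent tries."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, lam):
    """Poisson probability of exactly n events with mean lam."""
    return lam**n * math.exp(-lam) / math.factorial(n)

# Large N, small p: binomial versus Poisson with lambda = N*p.
N, p = 200, 0.0005
for n in range(3):
    print(n, binomial_pmf(n, N, p), poisson_pmf(n, N * p))

# Large N: binomial versus normal with mu = N*p and sigma^2 = N*p*(1-p).
N2, p2 = 1000, 0.5                   # hypothetical: 1000 coin flips
normal = NormalDist(mu=N2 * p2, sigma=math.sqrt(N2 * p2 * (1 - p2)))
for n in (480, 500, 520):
    print(n, binomial_pmf(n, N2, p2), normal.pdf(n))
```

In both comparisons the paired values agree to several decimal places, which is what "practically indistinguishable" means here.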
Last updated Friday August 19 2005