I'm going to assume you already know about the following:

- Definition of probability in terms of frequency of occurrence in a large sample.
- Probability density function (p.d.f.) for a continuous variable, and its integral, the cumulative distribution function.
- Joint probabilities.
- Normalization.
- Definition of independent random variables as having a separable joint p.d.f. ($f(x,y) = f_x(x)\,f_y(y)$ iff $x$ and $y$ are independent.)
- Bayes' theorem. ($P(A|B) = P(B|A)\,P(A)/P(B)$.)
- Expectation values.

See References.

Suppose the correct model for the distribution of light bulb lifetimes $t$ [*] is

$$f(t;\tau) = \frac{1}{\tau}\,e^{-t/\tau}, \qquad t \ge 0.$$

The expectation value of $t$ is the mean,

$$E[t] = \int_0^\infty t\,f(t;\tau)\,dt = \tau.$$

The variance is the expectation value of the squared deviation from the mean,

$$V[t] = E\!\left[(t-\tau)^2\right] = E[t^2] - \tau^2 = \tau^2.$$

[*] Note: this is almost certainly not a good model for light bulb lifetimes.

- Probabilities and p.d.f.s are theoretical models.
- They are not observable with perfect precision.
- Given a number of observations $N$, the observed distribution converges to the true p.d.f. only in the limit $N \to \infty$.
- Similarly, expectation values are not statistical means, although the latter converge to the former in the limit $N \to \infty$.
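This convergence is easy to see by simulation. A minimal sketch (pure Python; the exponential lifetime model and the value $\tau = 2$ are illustrative assumptions, not from the notes):

```python
import random

random.seed(1)
tau = 2.0  # hypothetical true mean lifetime, chosen only for illustration

def sample_mean(n):
    """Statistical mean of n simulated exponential lifetimes."""
    # random.expovariate takes the rate lambda = 1/tau, so the mean is tau.
    return sum(random.expovariate(1.0 / tau) for _ in range(n)) / n

# The statistical mean is itself a random quantity, but it converges to the
# expectation value E[t] = tau as the number of observations N grows.
means = {n: sample_mean(n) for n in (10, 1000, 100000)}
```

The spread of `means[n]` around $\tau$ shrinks like $1/\sqrt{n}$.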

A statistic is a quantity that depends on random variables; it is therefore itself a random variable, with its own p.d.f.

- Examples of statistics on random variables $t_1, \dots, t_N$:

Mean:

$$\bar t = \frac{1}{N}\sum_{i=1}^{N} t_i$$

Mean of squares:

$$\overline{t^2} = \frac{1}{N}\sum_{i=1}^{N} t_i^2$$

Cumulative distribution statistic:

$$S(t) = \frac{1}{N}\,\big(\text{number of } t_i \le t\big)$$

The last is an example of a statistic that is also a function of a parameter $t$. It should approach the cumulative distribution function as $N \to \infty$.

Using the same p.d.f. as in the earlier example, and assuming independent light bulb lifetimes, the p.d.f. of the statistic $\bar t$ is

$$g(\bar t\,;\tau) = \int\!\cdots\!\int\,
\delta\!\left(\bar t - \frac{1}{N}\sum_{i=1}^{N} t_i\right)
\prod_{i=1}^{N} \frac{1}{\tau}\,e^{-t_i/\tau}\; dt_1 \cdots dt_N.$$

- This is not very easy to evaluate.
- However, the central limit theorem tells us that for $N \to \infty$, the distribution of $\bar t$ will approach a Gaussian with mean $\tau$ and variance $\tau^2/N$.
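The central-limit behavior can be checked numerically. A minimal sketch (pure Python; $\tau = 2$, $n = 50$, and the number of trials are illustrative assumptions):

```python
import random
import statistics

random.seed(2)
tau, n, trials = 2.0, 50, 5000  # illustrative values, not from the notes

# Draw many samples of n exponential lifetimes and record each sample mean.
means = [
    sum(random.expovariate(1.0 / tau) for _ in range(n)) / n
    for _ in range(trials)
]

# Central limit theorem: for large n, the sample mean is approximately
# Gaussian with mean tau and variance tau**2 / n.
mu_hat = statistics.mean(means)
var_hat = statistics.variance(means)
```

Here `mu_hat` should come out near $\tau = 2$ and `var_hat` near $\tau^2/n = 0.08$.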

- An *estimator* is a statistic that can be used as an estimate of an unknown parameter of the p.d.f., such as $\tau$ in the example.
- Statistical moments of the distributions are often useful as estimators.

- Examples:
  - Mean: $\bar t$ can be used directly as an estimator for $\tau$, since $E[\bar t\,] = \tau$.
  - Variance: $\overline{t^2} - \bar t^{\,2}$ provides an estimator for $\tau^2$, but not an unbiased one: $E\!\left[\overline{t^2} - \bar t^{\,2}\right] = \frac{N-1}{N}\,\tau^2$.
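The bias of the moment estimator of the variance is visible in a quick simulation. A minimal sketch (pure Python; the exponential model with $\tau = 1$ and the small sample size $n = 5$ are illustrative assumptions chosen to make the bias obvious):

```python
import random

random.seed(3)
tau, n, trials = 1.0, 5, 200000  # small n makes the bias easy to see

total = 0.0
for _ in range(trials):
    t = [random.expovariate(1.0 / tau) for _ in range(n)]
    mean = sum(t) / n
    mean_sq = sum(x * x for x in t) / n
    total += mean_sq - mean * mean  # the (biased) moment estimator of tau**2

avg = total / trials
# Expect roughly (n - 1) / n * tau**2 = 0.8, not the true variance tau**2 = 1.
```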

The statistical moments are not necessarily the least *biased*, most *efficient*, or most *robust* estimators.

- Consistency (desired perfect):
  - Should converge mathematically to the correct value as $N \to \infty$.
- Bias (desired low):
  - The difference between the estimator's expectation value and the true value, at any $N$.
- Efficiency (desired high):
  - The inverse of the ratio of the estimator's variance to the minimum possible variance, given by the Rao-Cramer-Frechet bound.
- Robustness (usually desired high):
  - Insensitivity to departures from assumptions in the p.d.f. (Somewhat fuzzy.)

Another statistic one can construct for a data set is the *likelihood*:

$$\mathcal{L}(\vec\theta) = \prod_{i=1}^{N} f(x_i;\vec\theta).$$

- Again, $\mathcal{L}$ is a random variable, with its own p.d.f.
- If the p.d.f.s for the $x_i$ depend on some parameters $\vec\theta$, then $\mathcal{L}$ is also a function of $\vec\theta$.
- It is numerically equal to the value of the joint p.d.f. of the $N$ independent observations of $x$.
- N.B. it is not a probability, because it doesn't have the properties of a probability. In particular, $\mathcal{L}(\vec\theta)$ is definitely *not* a p.d.f. for $\vec\theta$ or a "probability" of the theory being true.

- If $x$ is a random variable, then the value of $\vec\theta$ that maximizes $\mathcal{L}$ is also a random variable. Call it $\hat{\vec\theta}$.
- $\hat{\vec\theta}$ is a consistent estimator for $\vec\theta$.
- It is asymptotically unbiased as $N \to \infty$.
- Its variance approaches the Rao-Cramer-Frechet bound as $N \to \infty$, i.e., it is asymptotically efficient.
- $\vec\theta$ may represent any number of parameters.

The inverse of the covariance matrix of the estimators can be estimated using

$$\left(\widehat{V^{-1}}\right)_{ij} = -\left.\frac{\partial^2 \ln\mathcal{L}}{\partial\theta_i\,\partial\theta_j}\right|_{\vec\theta = \hat{\vec\theta}}.$$

For large samples or perfectly Gaussian probabilities, $\mathcal{L}$ has a "Gaussian form": $\ln\mathcal{L}$ becomes parabolic in $\vec\theta$.

- In this limiting case, the $s$-standard-deviation error contours for the parameters can be found at $\ln\mathcal{L} = \ln\mathcal{L}_{\max} - s^2/2$.
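In one dimension this amounts to scanning $\ln\mathcal{L}$ for the points where it drops by $s^2/2$ below its maximum. A minimal sketch (pure Python; the exponential model, $\tau = 2$, and $N = 500$ are illustrative assumptions):

```python
import math
import random

random.seed(4)
# Hypothetical data: 500 exponential lifetimes with true tau = 2 (illustration only).
data = [random.expovariate(0.5) for _ in range(500)]
N = len(data)

def lnL(tau):
    """ln L(tau) = sum_i ln( exp(-t_i / tau) / tau ) for the exponential model."""
    return sum(-math.log(tau) - t / tau for t in data)

tau_hat = sum(data) / N          # the analytic ML estimate for this model
target = lnL(tau_hat) - 0.5      # s = 1 contour: lnL_max - s**2 / 2

# Scan outward from the maximum for the two crossing points.
step = 1e-4
lo = tau_hat
while lnL(lo - step) > target:
    lo -= step
hi = tau_hat
while lnL(hi + step) > target:
    hi += step

half_width = (hi - lo) / 2  # should be close to tau_hat / sqrt(N)
```

For this nearly parabolic case the interval half-width agrees well with the Gaussian expectation $\hat\tau/\sqrt{N}$.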

Finding proper confidence intervals in the more general case will be discussed in a later class.

It is usually easier to maximize

$$\ln\mathcal{L} = \sum_{i=1}^{N} \ln f(x_i;\vec\theta).$$

Equivalently, one minimizes the "effective chi-squared" defined as $\chi^2_{\rm eff} \equiv -2\ln\mathcal{L}$. For Gaussian statistics, this is exactly the chi-squared (up to an additive constant), if the standard deviations are known.

It is important to include all dependence on $\vec\theta$ in $\ln\mathcal{L}$, including normalization factors.

For $t \ge 0$,

$$f(t;\tau) = \frac{1}{\tau}\,e^{-t/\tau}.$$

Find the maximum likelihood estimator for $\tau$.
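For reference, one way the maximization can go through analytically, assuming the exponential model above:

```latex
\ln\mathcal{L}(\tau) = \sum_{i=1}^{N}\ln\!\left(\frac{1}{\tau}\,e^{-t_i/\tau}\right)
                     = -N\ln\tau - \frac{1}{\tau}\sum_{i=1}^{N} t_i ,
\qquad
\frac{\partial\ln\mathcal{L}}{\partial\tau}
   = -\frac{N}{\tau} + \frac{1}{\tau^{2}}\sum_{i=1}^{N} t_i = 0
\quad\Longrightarrow\quad
\hat\tau = \bar t .
```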

For real $x$,

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}.$$

Find the maximum likelihood estimators for $\mu$ and $\sigma^2$.
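For reference, assuming a Gaussian model $f(x;\mu,\sigma^2)$, setting both partial derivatives of $\ln\mathcal{L}$ to zero gives:

```latex
\ln\mathcal{L}(\mu,\sigma^{2})
   = -\frac{N}{2}\ln\!\left(2\pi\sigma^{2}\right)
     - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(x_i-\mu\right)^{2},
\qquad
\hat\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\widehat{\sigma^{2}} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\hat\mu\right)^{2}.
```

The factor $1/N$ (rather than $1/(N-1)$) is what makes $\widehat{\sigma^2}$ biased at finite $N$.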

For integer $n \ge 0$,

$$f(n;\nu) = \frac{\nu^{n} e^{-\nu}}{n!}.$$

Find the maximum likelihood estimator for $\nu$.
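For reference, assuming a Poisson model $f(n;\nu) = \nu^n e^{-\nu}/n!$, the same procedure gives:

```latex
\ln\mathcal{L}(\nu) = \sum_{i=1}^{N}\left(n_i\ln\nu - \nu - \ln n_i!\right),
\qquad
\frac{\partial\ln\mathcal{L}}{\partial\nu}
   = \frac{1}{\nu}\sum_{i=1}^{N} n_i - N = 0
\quad\Longrightarrow\quad
\hat\nu = \bar n .
```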

For real $x$, with a p.d.f. depending on two parameters, find the maximum likelihood estimator for each parameter.

Note that the estimator $\widehat{\sigma^2}$ is asymptotically unbiased in the limit of large $N$, but not unbiased at finite $N$. The bias can be corrected (replace $1/N$ by $1/(N-1)$) without degrading the asymptotic RMS of the estimator.

Once you know how to minimize a function, and you know the p.d.f.s of the data in your model, then numerical implementation of the maximum likelihood method is easy. Just write the function to calculate

$$-2\ln\mathcal{L}(\vec\theta) = -2\sum_{i=1}^{N} \ln f(x_i;\vec\theta),$$

then minimize it.
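A minimal numerical sketch (pure Python; the exponential model and simulated data are illustrative assumptions, and a hand-rolled golden-section search stands in for whatever minimizer you prefer):

```python
import math
import random

random.seed(5)
# Hypothetical data: exponential lifetimes with true tau = 2.0 (illustration only).
data = [random.expovariate(0.5) for _ in range(1000)]

def neg2lnL(tau):
    """-2 ln L for the exponential model f(t; tau) = exp(-t/tau) / tau."""
    return -2.0 * sum(-math.log(tau) - t / tau for t in data)

def golden_min(f, a, b, tol=1e-6):
    """Golden-section search: minimize a unimodal f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

tau_fit = golden_min(neg2lnL, 0.1, 10.0)
# For this model the analytic maximum likelihood solution is the sample mean,
# so the numerical fit can be checked directly:
tau_analytic = sum(data) / len(data)
```

Only `neg2lnL` depends on the model; the same structure works for any p.d.f.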

Note: if you have too many parameters, you might need to simplify the problem, perhaps by pre-fitting some of the parameters in some faster way.

Make a maximum likelihood fit of the data in "dataset 1" provided on the course web page to the following model:

By "make a maximum likelihood fit", I mean "estimate the parameters using the maximum likelihood method":

- Check the normalization of the model p.d.f.
- Derive the likelihood function.
- Find an analytic solution.
- Write the function to minimize into your fitter program and minimize it that way. Compare with the analytic solution.
- Plot the solution vs. a histogram of the data and see if it makes sense.

Make a maximum likelihood fit of the data in "dataset 2" provided on the course web page to the following model:

By "make a maximum likelihood fit", I mean "estimate the parameters using the maximum likelihood method":

- Derive the likelihood function. (Turn this in on paper or by e-mailing a PDF or similar document to me.)
- Find an analytic solution (if you can).
- Write the function to minimize into your fitter program and minimize it that way. (Compare with the analytic solution, if you succeeded in deriving it.)
- Plot the solution vs. a histogram of the data and see if it makes sense.
- Turn in the likelihood function, analytic solution, code, results, and data-fit comparison plot.

In the following, (R) indicates a review, (I) indicates an introductory text.

- PDG-Stat:
  - (R) "Statistics", G. Cowan, in *Review of Particle Physics*, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov). See also general references cited in PDG-Stat.

- Larson:
  - (I) *Introduction to Probability Theory and Statistical Inference*, 3rd ed., H.J. Larson, Wiley (1982).

- PDG-Prob:
  - (R) "Probability", G. Cowan, in *Review of Particle Physics*, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov). See also general references cited in PDG-Prob.