Class 0x0B: Statistics

Quick non-review of probability

I'm going to assume you already know the basic probability material; see References.

Example: light bulb lifetime model

Suppose the correct model for the distribution of light bulb lifetimes [*] is

\frac{dP}{dt} = f(t) = \frac{1}{\mu}e^{-t/\mu}.

The expectation value for t is the mean,

E[t] = \int_0^\infty t f(t) dt = \mu

The variance is the expectation value of the squared deviation from the mean,

E[(t-\mu)^2] = V[t] = \int_0^\infty (t-\mu)^2 f(t) dt = \mu^2
[*] Note: this is almost certainly not a good model for light bulb lifetimes.
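
A quick numerical cross-check of these two results (a minimal sketch in Python, assuming numpy is available; the choice \mu = 2 is purely illustrative):

    import numpy as np

    rng = np.random.default_rng(42)
    mu = 2.0
    # Draw a large sample from f(t) = (1/mu) * exp(-t/mu)
    t = rng.exponential(scale=mu, size=1_000_000)

    print(t.mean())  # ~ mu      : estimates E[t]
    print(t.var())   # ~ mu**2   : estimates V[t]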

Caution on probabilities and expectation values

What is a statistic?

A statistic is a quantity that depends on random variables; it is therefore itself a random variable, with its own p.d.f.

Examples of statistics on random variables:

Mean:

\langle x \rangle = \frac{1}{N}\sum_{i=1}^N x_i

Mean of squares:

\langle x^2 \rangle = \frac{1}{N} \sum_{i=1}^N x_i^2

Cumulative distribution statistic:

S(x')= \frac{1}{N}\sum_{i \text{ for } x_i \leq x'} 1

The last is an example of a statistic that is also a function of a parameter x'. It should approach the cumulative distribution function as N\rightarrow \infty.
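
A minimal sketch of these three statistics computed on a sample (in Python, assuming numpy; the sample itself is just illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=1000)        # any set of N random variables

    mean = x.mean()                  # <x>
    mean_sq = (x**2).mean()          # <x^2>

    def S(x_prime):
        """Cumulative distribution statistic: fraction of x_i <= x'."""
        return np.mean(x <= x_prime)

    print(mean, mean_sq, S(0.0))     # S(0.0) ~ 0.5 for this symmetric sample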

Example: statistics of lightbulb lifetimes

Using the same p.d.f. as in the earlier example, and assuming independent light bulb lifetimes,

f(t_1,t_2,\ldots,t_N) = \prod_{i=1}^N f(t_i).

The p.d.f. of the mean statistic \langle t \rangle is obtained by integrating the joint p.d.f. over all combinations of lifetimes consistent with a given value \tau of the mean,

f_{\langle t \rangle}(\tau) = \int \delta\!\left(\tau - \frac{1}{N}\sum_{i=1}^N t_i\right) \prod_{i=1}^N f(t_i)\, dt_i ,

or equivalently by differentiating the cumulative distribution written as nested integrals,

F_{\langle t \rangle}(\tau) = \int_0^{N\tau} f(t_1)\, dt_1 \int_0^{N\tau - t_1} f(t_2)\, dt_2 \int_0^{N\tau - (t_1+t_2)} f(t_3)\, dt_3 \cdots .
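
Rather than evaluating these nested integrals, one can also see numerically that \langle t \rangle is itself a random variable with its own p.d.f. (a Monte Carlo sketch in Python, assuming numpy; \mu and N are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, N, trials = 2.0, 10, 100_000

    # Each row is one "experiment" of N light bulb lifetimes
    t = rng.exponential(scale=mu, size=(trials, N))
    t_mean = t.mean(axis=1)          # one value of <t> per experiment

    # <t> scatters around mu, with variance mu**2 / N
    print(t_mean.mean(), t_mean.var())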

Estimators

Examples:
Mean:
\langle x \rangle can be used directly as an estimator for E[x], since E[\langle x \rangle]=E[x].
Variance:
\langle (x-\langle x \rangle)^2 \rangle provides an estimator for E[(x-E[x])^2]=V[x], but not an unbiased one: E[\langle (x-\langle x \rangle)^2 \rangle] = V[x]\cdot (N-1)/N (see the sketch after these examples).
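
A sketch of where the (N-1)/N factor comes from, assuming independent, identically distributed x_i:

\langle (x-\langle x \rangle)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2, so

E[\langle (x-\langle x \rangle)^2 \rangle] = E[x^2] - E[\langle x \rangle^2].

Since V[\langle x \rangle] = V[x]/N for independent x_i,

E[\langle x \rangle^2] = V[x]/N + E[x]^2,

and therefore

E[\langle (x-\langle x \rangle)^2 \rangle] = \left(E[x^2] - E[x]^2\right) - V[x]/N = V[x]\,\frac{N-1}{N}.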

The statistical moments are not necessarily the least biased, most efficient, or most robust estimators.

Desired properties of estimators

Consistency (desired perfect):
The estimator should converge to the correct value in the mathematical limit N\rightarrow \infty.
Bias (desired low):
The difference between the estimator's expectation value and the true value, at any finite N.
Efficiency (desired high):
The inverse of the ratio of the estimator's variance to the minimum possible variance, given by the Rao-Cramér-Fréchet bound.
Robustness (usually desired high):
Insensitivity to departures from the assumptions in the p.d.f. (Somewhat fuzzy; see the sketch after this list.)
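
A small numerical illustration of efficiency and robustness, comparing the sample mean and the sample median as estimators of a Gaussian location parameter (a sketch in Python, assuming numpy; the sample size and contamination are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    N, trials = 100, 20_000

    x = rng.normal(loc=0.0, scale=1.0, size=(trials, N))

    # Efficiency: for a Gaussian, the sample mean has variance ~ sigma^2/N,
    # while the sample median has variance ~ (pi/2) * sigma^2/N.
    print(x.mean(axis=1).var(), np.median(x, axis=1).var())

    # Robustness: replace 5% of each sample with draws from a much wider
    # distribution; the mean degrades far more than the median.
    x_cont = x.copy()
    n_out = N // 20
    x_cont[:, :n_out] = rng.normal(loc=0.0, scale=20.0, size=(trials, n_out))
    print(x_cont.mean(axis=1).var(), np.median(x_cont, axis=1).var())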

The likelihood statistic

Another statistic one can construct for a data set is the likelihood:

L = \prod_{i=1}^N f_x(x_i)

Maximum likelihood estimators

The variance of maximum likelihood estimators

Practical maximum likelihood estimators

It is usually easier to maximize

\log L(\alpha) = \sum_{i=1}^N \log f_i(x_i; \alpha).

Equivalently, one minimizes the "effective chi-squared" defined as -2\log L(\alpha). For Gaussian statistics with known standard deviations, this is exactly the chi-squared, up to an additive constant that does not depend on the parameters.

It is important to include all dependence on \alpha in f(x;\alpha), including normalization factors. For example, in the exponential p.d.f. below, the 1/s normalization factor is what keeps the maximum of \log L from running off to s\rightarrow\infty.

Example: estimator for exponential distribution

For x>0,

\frac{dP}{dx} = \frac{1}{s} e^{-x/s}.

Find the maximum likelihood estimator for s.
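
For reference, a sketch of the solution: \log L(s) = -N\log s - \frac{1}{s}\sum_{i=1}^N x_i, and setting d\log L/ds = 0 gives

\hat{s} = \frac{1}{N}\sum_{i=1}^N x_i = \langle x \rangle.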

Example: estimators for double-exponential distribution

For real x,

\frac{dP}{dx} = \frac{1}{2s} e^{-|x-\mu|/s}.

Find the maximum likelihood estimator for \mu and s.
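
For reference, a sketch of the solution: \log L(\mu,s) = -N\log(2s) - \frac{1}{s}\sum_{i=1}^N |x_i-\mu|. The \mu that minimizes \sum_i |x_i - \mu| is the sample median, so \hat{\mu} = \text{median}(x_i), and setting \partial \log L/\partial s = 0 then gives

\hat{s} = \frac{1}{N}\sum_{i=1}^N |x_i - \hat{\mu}|.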

Example: estimator for Poisson distribution

For integer k \geq 0,

P_k = \frac{e^{-\lambda} \lambda^k}{k!}.

Find the maximum likelihood estimator for \lambda.
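
For reference, a sketch of the solution: \log L(\lambda) = \left(\sum_{i=1}^N k_i\right)\log\lambda - N\lambda - \sum_{i=1}^N \log k_i!, and setting d\log L/d\lambda = 0 gives

\hat{\lambda} = \frac{1}{N}\sum_{i=1}^N k_i = \langle k \rangle.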

Example: estimators for Gaussian distribution

For real x,

\frac{dP}{dx} = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/(2\sigma^2)}.

Find the maximum likelihood estimator for \mu and \sigma.
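
For reference, a sketch of the solution: setting \partial \log L/\partial\mu = 0 and \partial \log L/\partial\sigma = 0 gives

\hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i = \langle x \rangle, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu})^2.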

Note the estimator for \sigma is asymptotically unbiased in the limit of large N, but not unbiased at finite N. The bias can be corrected without degrading the asymptotic RMS of the estimator.

Numerical implementation

Once you know how to minimize a function, and you know the p.d.f.s of the data in your model, then numerical implementation of the maximum likelihood method is easy. Just write the function to calculate:

-\log L(\alpha) = -\sum_{i=1}^N \log f_i(x_i; \alpha).

Then minimize it.

Note: if you have too many parameters, you might need to simplify the problem, perhaps by pre-fitting some of the parameters in a faster way.
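
A minimal sketch of such a fit in Python (assuming numpy and scipy are available; the data and the single-parameter exponential model here are illustrative, not one of the course datasets):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=1000)   # toy "lifetimes" with true s = 2

    def neg_log_likelihood(params, x):
        """-log L for f(x; s) = (1/s) exp(-x/s), x > 0."""
        (s,) = params
        if s <= 0:
            return np.inf                 # keep the minimizer in the physical region
        return np.sum(np.log(s) + x / s)  # -sum_i log f(x_i; s)

    result = minimize(neg_log_likelihood, x0=[1.0], args=(data,),
                      method="Nelder-Mead")
    print("ML estimate of s:", result.x[0])   # should be close to the sample mean
    print("sample mean:     ", data.mean())

As expected from the earlier example, the minimizer should land essentially on the sample mean.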

Exercise

Make a maximum likelihood fit of the data in "dataset 1" provided on the course web page to the following model:

\frac{dP}{dt} = f(t) = \left\{ \begin{array}{ll} b & \text{if~} t<t_1 \\ b+s & \text{if~} t_1 \leq t \leq t_2 \\ b & \text{if~} t>t_2 \end{array} \right. .

By "make a maximum likelihood fit", I mean "estimate the parameters b,s,t_1,t_2 using the maximum likelihood method":

Assignment

Make a maximum likelihood fit of the data in "dataset 2" provided on the course web page to the following model:

\frac{dP}{dt} = f(t) = \left\{ \begin{array}{ll} b & \text{if~} -1<t<0 \\ b+\frac{1-b}{\mu (1-e^{-1/\mu})} e^{-t/\mu} & \text{if~} 0 \leq t < 1 \end{array} \right. .

By "make a maximum likelihood fit", I mean "estimate the parameters b,\mu using the maximum likelihood method":

References

In the following, (R) indicates a review, (I) indicates an introductory text.

Probability:

PDG-Prob:

(R) "Probability", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).

See also general references cited in PDG-Prob.

Statistics:

Larson:
(I) Introduction to Probability Theory and Statistical Inference, 3rd ed., H.J. Larson, Wiley (1982).
PDG-Stat:

(R) "Statistics", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).

See also general references cited in PDG-Stat.