We can't simultaneously minimize $\alpha$ (the probability of a type 1 error, a false positive) and $\beta$ (the probability of a type 2 error, a false negative).
We can fix $\alpha$ and find the test that minimizes $\beta$.
The Neyman-Pearson lemma says that a test that always achieves the lowest possible $\beta$ for a given $\alpha$ has a critical region $W$ of the following form:

$$ W = \left\{ \vec{x} \;:\; f(\vec{x}\,|\,H_0) \le c\, f(\vec{x}\,|\,H_1) \right\} $$

That is, the critical region is the region where the likelihood of the observation assuming hypothesis $H_0$ is not greater than $c$ times the likelihood assuming hypothesis $H_1$.
Written another way: define the test statistic

$$ \lambda(\vec{x}) = \frac{f(\vec{x}\,|\,H_0)}{f(\vec{x}\,|\,H_1)} $$

Then the critical region is defined by $\lambda \le c$. There is a 1-1 mapping between $c$ and $\alpha$. Choose the $c$ that gives the $\alpha$ you want. (Obviously, you need to know the p.d.f.s of both hypotheses to do this.)
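To spell out that mapping in the notation above (a standard relation, not a new result): for a given $c$, the size of the critical region under $H_0$ is

$$ \alpha(c) = P(\lambda \le c \,|\, H_0) = \int_{W(c)} f(\vec{x}\,|\,H_0)\, d\vec{x} $$

which is non-decreasing in $c$ (and continuous if $\lambda$ has a continuous distribution), so it can be inverted to find the $c$ that gives the desired $\alpha$.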
Suppose we have one model in which the p.d.f. for light bulb lifetime $t$ is given by

$$ f(t\,|\,\tau_0) = \frac{1}{\tau_0}\, e^{-t/\tau_0} $$

for some known $\tau_0$, and another model which is the same except that the mean is $\tau_1$, also known.
What's the Neyman-Pearson test statistic? For $n$ independent lifetime measurements $t_1, \dots, t_n$ it is

$$ \lambda = \frac{\prod_{i=1}^{n} f(t_i\,|\,\tau_0)}{\prod_{i=1}^{n} f(t_i\,|\,\tau_1)} $$

Since we're just going to compare it to a value $c$, we can just as well use its logarithm, $\ln\lambda$. In this case, this is simply

$$ \ln\lambda = n \ln\frac{\tau_1}{\tau_0} - \left(\frac{1}{\tau_0} - \frac{1}{\tau_1}\right) \sum_{i=1}^{n} t_i $$
Let's take $\tau_0 > \tau_1$. The critical region defined by $\ln\lambda \le \ln c$ can be rewritten as

$$ \bar{t} \equiv \frac{1}{n}\sum_{i=1}^{n} t_i \;\le\; t_{\rm cut}, \qquad t_{\rm cut} = \frac{\ln c - n \ln(\tau_1/\tau_0)}{n\,(1/\tau_1 - 1/\tau_0)} $$
Stated in words: "reject" the larger-lifetime hypothesis if the observed mean is smaller than some amount. Adjust that amount to get the desired $\alpha$. This can be done assuming a gaussian distribution for the mean if $n$ is large; otherwise, evaluate it analytically or using MC methods.
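As an illustration of the large-$n$ gaussian approximation (my sketch, using the notation above): under $H_0$ the sample mean $\bar{t}$ has expectation $\tau_0$ and standard deviation $\tau_0/\sqrt{n}$, so

$$ \alpha \approx \Phi\!\left(\frac{t_{\rm cut} - \tau_0}{\tau_0/\sqrt{n}}\right) \quad\Longrightarrow\quad t_{\rm cut} \approx \tau_0\left(1 + \frac{\Phi^{-1}(\alpha)}{\sqrt{n}}\right) $$

where $\Phi$ is the standard normal c.d.f. (For the exact analytic evaluation, note that $n\bar{t}/\tau_0$ follows a gamma distribution with shape parameter $n$ under $H_0$.)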
These plots were made with the code in classCh_example1.cc for particular choices of $\tau_0$, $\tau_1$, and $n$.
For any given value $t_{\rm cut}$ chosen as the decision criterion on the test statistic, the value of $\alpha$ is given by the c.d.f. of the test statistic assuming $H_0$ (red curve above), and $\beta$ is given by 1 minus the c.d.f. of the test statistic assuming $H_1$ (blue curve above).
Generally one chooses $\alpha$ first and then finds the necessary value of $t_{\rm cut}$ for the decision. The Neyman-Pearson lemma says that $\beta$ is as low as it can be for that $\alpha$. Note there is in general no particular advantage to setting $\alpha = \beta$, although there may be reason to do so in some cases.
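Below is a minimal stand-alone sketch of the MC approach, written in the same spirit as (but not copied from) classCh_example1.cc; the values of $\tau_0$, $\tau_1$, $n$, and $\alpha$ are placeholders chosen only for illustration:

    // np_lightbulb.cc -- toy MC for the two-simple-hypotheses light bulb example.
    // Finds the cut on the sample mean that gives a chosen alpha under H0
    // (mean tau0) and estimates the resulting beta under H1 (mean tau1).
    // Build: g++ -O2 -std=c++17 np_lightbulb.cc -o np_lightbulb
    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    int main() {
      const double tau0  = 2.0;   // placeholder H0 mean lifetime (the larger one)
      const double tau1  = 1.5;   // placeholder H1 mean lifetime
      const int    n     = 20;    // measurements per pseudo-experiment
      const double alpha = 0.05;  // desired type 1 error rate
      const int    nToys = 100000;

      std::mt19937 rng(12345);
      auto meanLifetime = [&](double tau) {
        std::exponential_distribution<double> expo(1.0 / tau);
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += expo(rng);
        return sum / n;
      };

      // Distribution of the sample mean under H0.
      std::vector<double> tbar0(nToys);
      for (double& t : tbar0) t = meanLifetime(tau0);
      std::sort(tbar0.begin(), tbar0.end());
      // The cut is (approximately) the alpha-quantile: reject H0 if tbar <= tcut.
      const double tcut = tbar0[static_cast<int>(alpha * nToys)];

      // beta = P(accept H0 | H1) = P(tbar > tcut | H1).
      int nAccept = 0;
      for (int i = 0; i < nToys; ++i)
        if (meanLifetime(tau1) > tcut) ++nAccept;

      std::cout << "tcut = " << tcut << "\n"
                << "beta = " << static_cast<double>(nAccept) / nToys << "\n";
      return 0;
    }

Reading $t_{\rm cut}$ off the toy distribution rather than from the gaussian approximation is what you would do when $n$ is small.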
Suppose we have one model in which the p.d.f. for light bulb lifetime $t$ is given by

$$ f(t\,|\,\tau) = \frac{1}{\tau}\, e^{-t/\tau} $$

for some unknown $\tau$, and another model whose p.d.f. involves its own unknown parameters. Construct the test statistic as before, and compare the best fit for $H_0$ to the best fit for $H_1$.
Here you might want to evaluate the significance levels for a given cut on the test statistic using an MC simulation.
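Spelled out in the notation used earlier (my own shorthand, which should match what is intended here), the statistic now compares the two maximized likelihoods,

$$ \lambda = \frac{\max_{\tau} L(\text{data}\,|\,H_0;\, \tau)}{\max_{\theta} L(\text{data}\,|\,H_1;\, \theta)} $$

where $\theta$ stands for the free parameters of the second model; the distribution of $\lambda$ under $H_0$ is then estimated from many simulated data sets rather than analytically.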
(... see discussion in hypothesis test section of [PDG-Stat] ...)
You don't have to decide in advance at what significance level you will accept or reject a hypothesis.
Depending on what you are doing, it may not even be appropriate to
do so. It might be more appropriate to report the significance of
the result: the value $p$ such that the observed data would be in the critical region for any chosen $\alpha > p$, and out of the critical region for any $\alpha < p$.
For example, you might be investigating a specific alternative to Einstein's theory of general relativity in light of some new data. Rather than report just "hypothesis accepted" or "hypothesis rejected" according to your personal, pre-chosen $\alpha$, the world would like to know what $p$ is. Then each person can decide, for her/his own personal $\alpha$, whether to accept or reject the null hypothesis.
Despite the fact that the significance is often reported as a percentage, it is a random variable. It is definitely not the probability the hypothesis is really right or wrong.
The significance level, $\alpha$, is a number you (or someone)
chooses. You adjust your test so it has that probability of giving
you a false positive (type 1 error), on average, over many data sets.
$p$ is a random variable determined by one measurement or set of measurements, numerically equal to the $\alpha$ for which the observed test statistic equals $t_\alpha$, where $t_\alpha$ is the value of the test statistic corresponding to the significance level.
What you often most want is $P(H_0\,|\,\text{data})$, the probability that the null hypothesis is true given one measurement or set of measurements. Bayes' theorem tells us

$$ P(H_0\,|\,\text{data}) = \frac{P(\text{data}\,|\,H_0)\, P(H_0)}{P(\text{data}\,|\,H_0)\, P(H_0) + P(\text{data}\,|\,\bar{H}_0)\, P(\bar{H}_0)} $$

if the truth/falseness of $H_0$ is itself a random variable.
This might be possible in the case of a medical
diagnosis, but not for a law of nature.
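To make the medical-diagnosis contrast concrete (with made-up numbers, purely for illustration): suppose a disease $D$ has prevalence $P(D) = 0.01$ in the tested population, and a diagnostic test flags 95% of true cases but also 5% of healthy people. Then

$$ P(D\,|\,\text{positive}) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16 $$

The prior $P(D)$ here is an actual frequency measured over many patients; it is that ingredient which has no objective analogue when the hypothesis is a law of nature.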
[*] See Comment on Bayesian statistics.
Again, we have a test statistic, which I'll call $\lambda$. The $p$-value is what the hypothetical model $H$ says should be the probability to find the statistic $\lambda$ in a region of equal or lesser compatibility than the observed value $\lambda_{\rm obs}$: that is,

$$ p = P(\lambda \le \lambda_{\rm obs}\,|\,H) $$

assuming $H$ is true (with the convention that smaller $\lambda$ means poorer compatibility with $H$).
[†] The proof is the reverse of the derivation of the inverse-distribution-function method of generating a random variable.
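Presumably the statement being proved is that $p$ is uniformly distributed on $[0,1]$ when $H$ is true. A one-line version of that argument, assuming $\lambda$ has a continuous c.d.f. $F$ under $H$, so that $p = F(\lambda)$:

$$ P(p \le u) = P\big(F(\lambda) \le u\big) = P\big(\lambda \le F^{-1}(u)\big) = F\big(F^{-1}(u)\big) = u, \qquad 0 \le u \le 1 $$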
How good is the fit of the exponential-plus-background model to the
data in the last assignment? Let's use the best-fit likelihood $L_{\rm max}$ as our test statistic. We'll get the p.d.f. for $L_{\rm max}$ using MC simulation.

Now just read off the $p$-value from this histogram: according to the simulation, if the hypothesis is true, what fraction of experiments would give an $L_{\rm max}$ worse than what you got for the actual data?
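Here is a minimal stand-alone sketch of that toy-MC machinery (this is not the assignment code: to keep it self-contained it uses a pure-exponential stand-in, where the maximum-likelihood fit is analytic, and the lifetime, sample size, and "observed" statistic are invented placeholders). For the real exponential-plus-background model you would replace the analytic fit with your own minimization and use the actual data:

    // gof_toys.cc -- toy-MC estimate of a goodness-of-fit p-value.
    // Stand-in model: pure exponential, so the maximum-likelihood fit is
    // analytic (tau_hat = sample mean) and the example stays self-contained.
    // Note: with this stand-in the statistic depends only on the sample mean,
    // so it illustrates the machinery, not a realistic shape test.
    // Build: g++ -O2 -std=c++17 gof_toys.cc -o gof_toys
    #include <cmath>
    #include <iostream>
    #include <random>
    #include <vector>

    // -2 ln L at the best-fit lifetime: for f(t|tau) = exp(-t/tau)/tau,
    // tau_hat = mean(t) and -2 ln L_max = 2 n (ln(mean) + 1).
    double minus2LnLmax(const std::vector<double>& t) {
      double mean = 0.0;
      for (double ti : t) mean += ti;
      mean /= t.size();
      return 2.0 * t.size() * (std::log(mean) + 1.0);
    }

    int main() {
      const double tauFit  = 1.8;    // placeholder best-fit lifetime
      const int    n       = 200;    // placeholder number of events
      const double statObs = 660.0;  // placeholder -2 ln L_max from the data
      const int    nToys   = 20000;

      std::mt19937 rng(4357);
      std::exponential_distribution<double> expo(1.0 / tauFit);

      // Generate toy data sets from the fitted model, "refit" each one,
      // and count how often the toy statistic is worse (larger -2 ln L_max)
      // than the value observed in the real data.
      int nWorse = 0;
      for (int i = 0; i < nToys; ++i) {
        std::vector<double> toy(n);
        for (double& ti : toy) ti = expo(rng);
        if (minus2LnLmax(toy) >= statObs) ++nWorse;
      }

      std::cout << "p-value estimate = "
                << static_cast<double>(nWorse) / nToys << std::endl;
      return 0;
    }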
Choose either "option A" or "option B" below -- you do not have to do both.
Are the 119 globular clusters in the Arp 1965 catalog uniformly
distributed in $\sin b$, where $b$ is galactic latitude?
Even though $P(H_0\,|\,\text{data})$ is meaningless
as a probability in the sense of "the fraction of possible
universes in which $H_0$
would turn out to be true given that
we made these observations", some people like to use Bayes' law
anyway to characterize and update what they call their
"subjective degree of belief". Rather than using objective
data for $P(H_0)$, they use that term in Bayes' law to reflect
their "prior subjective beliefs" ("priors" for short),
deliberately introducing this as something that can only be
changed by statistically significant evidence to the contrary.
This approach has caused a lot of controversy over the years.
In my opinion, there is nothing wrong with this as long as one
evaluates the resulting p.d.f.s as carefully as possible and
keeps in mind the limitations. However, it can go badly wrong
if the "prior" pre-assigns very low probability to what the
data actually ends up indicating, even if the "prior" is based
on little or no relevant data: for an extreme case, see "The
Logic of Intelligence Failure" by Bruce G. Blair [Blair2004],
actually written by a proponent of this way of thinking. I
won't talk about that further today.
In the following, (R) indicates a review, (I) indicates an introductory text, and (A) indicates an advanced text.
(R) "Probability", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).
See also general references cited in PDG-Prob.
(R) "Statistics", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).
See also general references cited in PDG-Stat.