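We can't simultaneously minimize $\alpha$ (the probability of a type 1 error) and $\beta$ (the probability of a type 2 error).
We can fix $\alpha$ and find the test that minimizes $\beta$.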
The Neyman-Pearson lemma says that a test that always achieves the lowest possible $\beta$ for a given $\alpha$ has a critical region of the following form:
$$L(x \mid H_0) \le c \, L(x \mid H_1)$$
That is, the critical region is the region where the likelihood of the observation assuming hypothesis $H_0$ is not greater than $c$ times the likelihood assuming hypothesis $H_1$.
Written another way, the test is based on the test statistic
$$\lambda(x) = \frac{L(x \mid H_0)}{L(x \mid H_1)}$$
and the critical region is defined by $\lambda(x) \le c$.
There is a 1-1 mapping between $c$ and $\alpha$. Choose the $c$ that gives the $\alpha$ you want. (Obviously, you need to know the p.d.f.s under both hypotheses to do this.)
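Suppose we have one model, $H_0$, in which the p.d.f. for light bulb lifetime is given by
$$f(t \mid \tau_0) = \frac{1}{\tau_0} e^{-t/\tau_0}$$
for some known $\tau_0$, and another model, $H_1$, which is the same except that the mean is $\tau_1$, also known.
What's the Neyman-Pearson test statistic? For $N$ independent measured lifetimes $t_1, \ldots, t_N$ it is
$$\lambda = \prod_{i=1}^{N} \frac{f(t_i \mid \tau_0)}{f(t_i \mid \tau_1)}$$
Since we're just going to compare it to a value $c$, we can just as well use
$$\ln \lambda = \sum_{i=1}^{N} \left[ \ln f(t_i \mid \tau_0) - \ln f(t_i \mid \tau_1) \right]$$
In this case, this is simply
$$\ln \lambda = N \ln\frac{\tau_1}{\tau_0} + \left( \frac{1}{\tau_1} - \frac{1}{\tau_0} \right) \sum_{i=1}^{N} t_i$$
Let's take $\tau_0 > \tau_1$. The critical region defined by $\lambda \le c$ can be rewritten as
$$\bar{t} \equiv \frac{1}{N} \sum_{i=1}^{N} t_i \le \frac{\ln c + N \ln(\tau_0/\tau_1)}{N \left( 1/\tau_1 - 1/\tau_0 \right)}$$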
Stated in words: "reject" the larger lifetime hypothesis if the observed mean is smaller than some amount. Adjust that amount to get the desired $\alpha$. This can be done assuming a Gaussian distribution for the mean if $N$ is large; otherwise, evaluate it analytically or using MC methods.
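As a rough illustration of the MC route, the following Python sketch finds the cut on the observed mean that gives a chosen false-positive rate, then estimates the resulting miss rate. It assumes exponentially distributed lifetimes. The names `tau0` (mean lifetime under the hypothesis being tested), `tau1` (mean lifetime under the alternative), the sample size `n`, and all numerical values are illustrative assumptions, not values taken from classCh_example1.cc.

```python
import random


def sample_mean(tau, n, rng):
    """Mean of n exponential lifetimes whose true mean is tau."""
    return sum(rng.expovariate(1.0 / tau) for _ in range(n)) / n


def threshold_for_alpha(tau0, n, alpha, trials, rng):
    """Cut k such that P(mean <= k | tau0) is approximately alpha."""
    means = sorted(sample_mean(tau0, n, rng) for _ in range(trials))
    return means[int(alpha * trials)]


def beta_for_threshold(tau1, n, k, trials, rng):
    """Fraction of tau1 pseudo-experiments that fail to reject (mean > k)."""
    misses = sum(sample_mean(tau1, n, rng) > k for _ in range(trials))
    return misses / trials


rng = random.Random(12345)                       # fixed seed for repeatability
tau0, tau1, n, alpha = 1000.0, 700.0, 20, 0.05   # illustrative values only
k = threshold_for_alpha(tau0, n, alpha, 20000, rng)
beta = beta_for_threshold(tau1, n, k, 20000, rng)
print(f"reject the tau0 hypothesis if mean lifetime <= {k:.1f}")
print(f"estimated miss rate (type 2 error) = {beta:.3f}")
```

With only 20 lifetimes and means this close together, the miss rate comes out large; increasing `n` in the sketch shows how quickly it falls.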
These plots were made with the code in classCh_example1.cc for specific choices of $\tau_0$, $\tau_1$, and $N$.
For any given value $c$ chosen as the decision criterion on the test statistic, the value of $\alpha$ is given by the c.d.f. of the test statistic assuming $H_0$ (red curve above), and $\beta$ is given by 1-c.d.f. of the test statistic assuming $H_1$ (blue curve above). Generally one chooses $\alpha$ first and then finds the necessary value of $c$ for the decision. The Neyman-Pearson lemma says that $\beta$ is as low as it can be for that $\alpha$. Note there is in general no particular advantage to setting $\alpha = \beta$, although there may be reason to do so in some cases.
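The trade-off between the two error rates can be sketched numerically. The example below assumes the decision is made on the mean of `n` exponential lifetimes and uses the large-`n` Gaussian approximation, in which the mean has standard deviation `tau / sqrt(n)`. The function names and numerical values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def alpha_beta(cut, tau0, tau1, n):
    """Error rates for the rule 'reject the tau0 hypothesis if mean <= cut'."""
    h0 = NormalDist(mu=tau0, sigma=tau0 / sqrt(n))
    h1 = NormalDist(mu=tau1, sigma=tau1 / sqrt(n))
    alpha = h0.cdf(cut)         # c.d.f. assuming the tau0 hypothesis
    beta = 1.0 - h1.cdf(cut)    # 1 - c.d.f. assuming the tau1 hypothesis
    return alpha, beta


def cut_for_alpha(alpha, tau0, n):
    """Invert the tau0 c.d.f. to get the cut for a chosen alpha."""
    return NormalDist(mu=tau0, sigma=tau0 / sqrt(n)).inv_cdf(alpha)


tau0, tau1, n = 1000.0, 700.0, 100   # illustrative values only
cut = cut_for_alpha(0.05, tau0, n)
alpha, beta = alpha_beta(cut, tau0, tau1, n)
print(f"cut = {cut:.1f}, alpha = {alpha:.3f}, beta = {beta:.3f}")

# Moving the cut trades one error rate against the other.
for trial_cut in (800.0, 850.0, 900.0):
    a, b = alpha_beta(trial_cut, tau0, tau1, n)
    print(f"cut = {trial_cut:.0f}: alpha = {a:.4f}, beta = {b:.4f}")
```

Raising the cut increases the false-positive rate and lowers the miss rate, which is why one of the two has to be fixed first.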
Suppose we have one model, $H_0$, in which the p.d.f. for light bulb lifetime is given by
$$f(t \mid \tau) = \frac{1}{\tau} e^{-t/\tau}$$
for some unknown $\tau$, and another model, $H_1$, with two unknown parameters. Construct the test statistic as before, and compare the best fit for $H_0$ to the best fit for $H_1$.
Here you might want to evaluate the significance levels for a given $c$ using a MC simulation.
(... see discussion in hypothesis test section of [PDG-Stat] ...)
You don't have to decide in advance at what significance level you will accept or reject a hypothesis.
Depending on what you are doing, it may not even be appropriate to do so. It might be more appropriate to report the significance of the result: the value $p$ such that the observed data would be in the critical region for $\alpha > p$ and outside the critical region for $\alpha < p$.
For example, you might be investigating a specific alternative to Einstein's theory of general relativity in light of some new data. Rather than report just "hypothesis accepted" or "hypothesis rejected" according to your personal, pre-chosen $\alpha$, the world would like to know what $p$ is. Then every person can know, for their own personal $\alpha$, whether they want to accept or reject the null hypothesis.
Despite the fact that the significance is often reported as a percentage, it is a random variable. It is definitely not the probability the hypothesis is really right or wrong.
The significance level, $\alpha$, is a number you (or someone) chooses. You adjust your test so it has that probability of giving you a false positive (type 1 error), on average, over many data sets.
$p$ is a random variable determined by one measurement or set of measurements, numerically equal to the significance level $\alpha$ whose cut falls exactly at the observed value of the test statistic.
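To make the distinction concrete, here is a minimal sketch that computes the reported significance for one data set. It assumes exponentially distributed lifetimes and the rule "reject if the observed mean is small". The sum of `n` independent exponential lifetimes follows an Erlang distribution, so the probability of a sum at least as small as the observed one can be written in closed form. The function names and the data values are hypothetical.

```python
from math import exp


def erlang_cdf(s, n, tau):
    """P(sum of n exponential lifetimes with mean tau <= s)."""
    x = s / tau
    term = 1.0       # holds x**k / k!, starting at k = 0
    total = 0.0
    for k in range(n):
        total += term
        term *= x / (k + 1)
    return 1.0 - exp(-x) * total


def p_value(lifetimes, tau0):
    """Probability, assuming mean lifetime tau0, of a mean this small or smaller."""
    return erlang_cdf(sum(lifetimes), len(lifetimes), tau0)


# Hypothetical measured lifetimes, in the same units as tau0.
data = [820.0, 310.0, 1450.0, 95.0, 640.0, 1210.0, 480.0, 730.0]
p = p_value(data, tau0=1000.0)
print(f"p = {p:.3f}")
```

A reader with a personal significance level larger than the printed `p` would reject the hypothesis; a reader with a smaller one would not. A different data set would give a different `p`, which is the sense in which it is a random variable.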
What you often most want is $P(H_0 \mid x)$, the probability that the null hypothesis is true given one measurement or set of measurements $x$. Bayes' theorem tells us
$$P(H_0 \mid x) = \frac{P(x \mid H_0) \, P(H_0)}{P(x)}$$
if the truth/falseness of $H_0$ is itself a random variable. This might be possible in the case of a medical diagnosis, but not for a law of nature.
[*] | See Comment on Bayesian statistics. |
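For the medical diagnosis case, where the prior probability can be taken from population data, Bayes' theorem can be evaluated directly. The sketch below assumes only two exhaustive hypotheses, so the denominator is the sum of the two likelihood-times-prior terms. All numbers are invented for illustration.

```python
def posterior_h0(prior_h0, like_h0, like_h1):
    """P(H0 | x) from Bayes' theorem for two exhaustive hypotheses."""
    evidence = like_h0 * prior_h0 + like_h1 * (1.0 - prior_h0)
    return like_h0 * prior_h0 / evidence


# Hypothetical numbers: H0 = "patient is healthy", x = "test came back positive".
prior_healthy = 0.99          # 1% of the population has the condition
p_pos_given_healthy = 0.05    # false-positive rate of the test
p_pos_given_sick = 0.90       # probability the test detects the condition

post = posterior_h0(prior_healthy, p_pos_given_healthy, p_pos_given_sick)
print(f"P(healthy | positive test) = {post:.3f}")
```

Note that the 5% false-positive rate plays the role of a significance level here, yet the probability that the patient is healthy after a positive test is far from 5%, because the prior enters the result.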
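Again, we have a test statistic, which I'll call $t$.
The $p$-value is what the hypothetical model $H_0$ says should be the probability to find the statistic $t$ in a region of equal or lesser compatibility than the observed $t_{\mathrm{obs}}$: that is, $p = P(t \text{ no more compatible with } H_0 \text{ than } t_{\mathrm{obs}})$, assuming $H_0$ is true. For a continuous test statistic, $p$ is then uniformly distributed between 0 and 1 if $H_0$ is true.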
[†] | The proof is the reverse of the derivation of the inverse-distribution-function method of generating a random variable. |
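The footnote relies on the standard result that applying a continuous c.d.f. to a variable drawn from that same distribution gives a uniform variable, so a $p$-value computed under a true hypothesis is uniform on [0, 1]. The following sketch checks this by simulation. It assumes a single exponential lifetime as the test statistic, with small lifetimes counted as less compatible; the names and values are hypothetical.

```python
import random
from math import exp


def p_value(t_obs, tau0):
    """P(t <= t_obs) for one exponential lifetime with mean tau0."""
    return 1.0 - exp(-t_obs / tau0)


rng = random.Random(2024)     # fixed seed for repeatability
tau0 = 1000.0                 # illustrative value only
trials = 10000
counts = [0] * 10             # histogram of p in ten equal bins
for _ in range(trials):
    t = rng.expovariate(1.0 / tau0)       # data generated with the hypothesis true
    p = p_value(t, tau0)
    counts[min(int(p * 10), 9)] += 1
print(counts)
```

Each bin should hold roughly a tenth of the trials. In particular, about 5% of data sets give a $p$-value below 0.05 even though the hypothesis is true, which is the false-positive rate of a test at that significance level.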
How good is the fit of the exponential-plus-background model to the data in the last assignment? Let's use the best-fit likelihood $L_{\max}$ as our test statistic. We'll get the p.d.f. for $L_{\max}$ using MC simulation.
Now just read off the $p$-value from this histogram: according to the simulation, if the hypothesis is true, what fraction of $L_{\max}$ values would be worse than what you got for the actual data?
Choose either "option A" or "option B" below -- you do not have to do both.
Are the 119 globular clusters in the Arp 1965 catalog uniformly distributed in , where is galactic latitude?
Even though $P(H_0 \mid x)$ is meaningless as a probability in the sense of "the fraction of possible universes in which $H_0$ would turn out to be true given that we made these observations", some people like to use Bayes' law anyway to characterize and update what they call their "subjective degree of belief". Rather than using objective data for $P(H_0)$, they use that term in Bayes' law to reflect their "prior subjective beliefs" ("priors" for short), deliberately introducing this as something that can only be changed by statistically significant evidence to the contrary. This approach has caused a lot of controversy over the years. In my opinion, there is nothing wrong with this as long as one evaluates the resulting p.d.f.s as carefully as possible and keeps in mind the limitations. However, it can go badly wrong if the "prior" pre-assigns very low probability to what the data actually end up indicating, even if the "prior" is based on little or no relevant data: for an extreme case, see "The Logic of Intelligence Failure" by Bruce G. Blair [Blair2004], actually written by a proponent of this way of thinking. I won't talk about that further today.
In the following, (R) indicates a review, (I) indicates an introductory text, and (A) indicates an advanced text.
(R) "Probability", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).
See also general references cited in PDG-Prob.
(R) "Statistics", G. Cowan, in Review of Particle Physics, C. Amsler et al., PL B667, 1 (2008) and 2009 partial update for the 2010 edition (http://pdg.lbl.gov).
See also general references cited in PDG-Stat.