ReproNim Statistics Module

P-values and their issues


Teaching: ~ 60 min
Exercises: ~ 60 min
  • What is a p-value?

  • What should I be aware of when I see a ‘significant’ p-value?

  • After this lesson, you should know what a p-value is and how to interpret it appropriately. You will also know about the caveats of p-values.

Introduction: are p-values entirely evil?

As is often the case, a headline that ends with a question mark is answered with a “no”. But p-values have been seriously misused by scientists, especially in the life sciences and medicine, so they deserve specific attention, hence this lesson.


Starting with a little challenge!

Can you answer these questions? Even if you can, you may still want to read the p-value section.

If we assume a distribution for the original data, we can derive the distribution of a statistic computed from them and use it to compute a p-value. For instance, if we assume that the sum of the \(Y\)s is normally distributed with zero mean, and we estimate its dispersion (the variance), we obtain the distribution of the sum under the hypothesis that its mean is zero. We can then compare the observed statistic with its distribution under this (null) hypothesis.

“Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.” (ASA Statement on Statistical Significance and P-Values) Computing it requires that we know the distribution of the statistic (the “statistical summary”) under a certain hypothesis.
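As a worked illustration of this definition, consider a simplified setting that is an assumption of this example and not part of the ASA text: \(n = 16\) independent values with a known standard deviation \(\sigma = 1\), an observed sample mean \(\bar{y} = 0.5\), and the null hypothesis that the true mean is zero. Under that hypothesis the standardized mean \(Z\) follows a standard normal distribution with CDF \(\Phi\), so the two-sided p-value is:

\[z = \frac{\bar{y}}{\sigma / \sqrt{n}} = \frac{0.5}{1/4} = 2, \qquad p = Pr(|Z| \geq 2) = 2\,(1 - \Phi(2)) \approx 0.046\]

When the variance is estimated from the data rather than known, the same reasoning applies with a Student t distribution in place of the normal one.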

Again, Wikipedia is a great resource. You can also read the short blog post from RealClearScience to make sure you have the main idea right.

The key concepts and limitations, as listed in the ASA statement, are:

  1. P-values can indicate how incompatible the data are with a specified statistical model.

  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

  4. Proper inference requires full reporting and transparency. P-values and related analyses should not be reported selectively.

  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
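Point 5 can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: a two-sided z-test of a zero mean with a known standard deviation, and made-up sample means and sample sizes. The function name `two_sided_z_test_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test_p(sample_mean, sigma, n):
    """Two-sided p-value for H0: true mean = 0, assuming sigma is known."""
    z = sample_mean / (sigma / sqrt(n))
    # Probability of a standard normal value at least as extreme as z.
    return 2.0 * NormalDist().cdf(-abs(z))


# Illustrative numbers (assumptions, not real data).
# A tiny observed effect measured on a very large sample.
p_tiny_effect = two_sided_z_test_p(sample_mean=0.004, sigma=1.0, n=1_000_000)
# A large observed effect measured on a very small sample.
p_large_effect = two_sided_z_test_p(sample_mean=0.8, sigma=1.0, n=4)

print(f"tiny effect, n = 1,000,000: p = {p_tiny_effect:.2g}")
print(f"large effect, n = 4:        p = {p_large_effect:.2g}")
```

The tiny effect ends up with a much smaller p-value than the large one, simply because of the sample size: the p-value alone says nothing about how big or how important an effect is.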

Another resource:

This paper by Steven Goodman (unfortunately it seems to be behind a paywall; please email us if you don’t have access) describes common misconceptions about p-values.

Exercise on p-values

For this exercise, you need to be able to generate some values from a normal distribution and perform a test on the mean of these values. You can use Python, R, or Octave/MATLAB for this. An example in Python with the solutions to the exercise is given here.
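To get started, here is a minimal sketch of the kind of code the exercise calls for. It is not the linked solution. It assumes a known standard deviation of 1 so that a z-test can be computed with the Python standard library alone; with an estimated variance you would normally use a t-test instead (for example `scipy.stats.ttest_1samp`). The function name `simulate_z_test`, the sample size and the seeds are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_z_test(true_mean, n, seed):
    """Draw n values from N(true_mean, 1) and test H0: mean = 0 (two-sided)."""
    rng = random.Random(seed)
    values = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    # Standard error of the mean is sigma / sqrt(n), with sigma = 1 assumed known.
    z = fmean(values) / (1.0 / sqrt(n))
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return z, p_value


# Data generated under the null hypothesis (true mean is 0).
z_null, p_null = simulate_z_test(true_mean=0.0, n=100, seed=0)
# Data generated under an alternative (true mean is 1).
z_alt, p_alt = simulate_z_test(true_mean=1.0, n=100, seed=1)

print(f"null:        z = {z_null:.2f}, p = {p_null:.3g}")
print(f"alternative: z = {z_alt:.2f}, p = {p_alt:.3g}")
```

Changing the seed, the sample size or the true mean and rerunning is a quick way to see how much a p-value varies from one sample to the next.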

Multiple comparison problem:

One of the best ways to understand the problem is to look at the xkcd view of it. The cartoon is great not only because it exposes the issue, but also because it exposes the consequences of the issue for peer-reviewed publications.

Please also read the Wikipedia article on multiple comparisons.
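The size of the problem can be checked numerically. The sketch below assumes 20 independent tests, all with a true null hypothesis, each thresholded at 0.05; these numbers are illustrative choices, and the function name `familywise_error_rate` is hypothetical. It compares the analytic probability of at least one false positive, \(1 - (1 - \alpha)^m\), with a simulation, and also tries the Bonferroni threshold \(\alpha / m\), a standard correction that this lesson does not otherwise cover.

```python
import random
from statistics import NormalDist

ALPHA = 0.05          # per-test significance threshold
N_TESTS = 20          # number of tests per experiment
N_EXPERIMENTS = 5000  # number of simulated experiments


def familywise_error_rate(threshold, n_tests, n_experiments, rng):
    """Fraction of experiments with at least one p-value below threshold,
    when every null hypothesis is true."""
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_experiments):
        # Under the null, each test statistic is a draw from N(0, 1).
        p_values = [
            2.0 * std_normal.cdf(-abs(rng.gauss(0.0, 1.0)))
            for _ in range(n_tests)
        ]
        if min(p_values) < threshold:
            hits += 1
    return hits / n_experiments


rng = random.Random(42)
analytic_fwer = 1.0 - (1.0 - ALPHA) ** N_TESTS
simulated_fwer = familywise_error_rate(ALPHA, N_TESTS, N_EXPERIMENTS, rng)
bonferroni_fwer = familywise_error_rate(ALPHA / N_TESTS, N_TESTS, N_EXPERIMENTS, rng)

print(f"analytic  P(at least one false positive) = {analytic_fwer:.3f}")
print(f"simulated P(at least one false positive) = {simulated_fwer:.3f}")
print(f"simulated, Bonferroni threshold          = {bonferroni_fwer:.3f}")
```

With these settings, roughly two experiments out of three report at least one ‘significant’ result even though no effect exists anywhere.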

Exercise on the multiple comparison issue:

Can you answer these questions? —>

Understanding statistical power and significance testing:

These interactive visualizations are helpful for understanding the concepts.

P-value and base rate fallacy: to be read, and read again

The base rate fallacy is one of the most common sources of error and misconception. Here is a great article about it:

Base rate fallacy

The article uses as examples the number of drugs tested on individuals and the mammography test for cancer, but the reasoning easily generalizes to the number of voxels or ROIs tested. It introduces very important concepts: read it carefully and make sure you understand what the base rate fallacy is. After reading, you should know more not only about Type I and Type II errors but, importantly, about the problem caused by a low prior probability for the alternative hypothesis.
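The effect of a low prior probability can be worked out with Bayes' rule. The sketch below is an illustration with made-up numbers: `prior` is the assumed proportion of tested hypotheses for which the alternative is true, `power` is the probability of detecting a true effect, and `alpha` is the Type I error rate. The function name and the values are assumptions of this example, not taken from the article.

```python
def prob_true_given_significant(prior, power, alpha):
    """Probability that the alternative is true given a significant test.

    prior: proportion of tested hypotheses where the alternative is true
    power: P(significant | alternative true) = 1 - Type II error rate
    alpha: P(significant | null true) = Type I error rate
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Same test (power 0.8, alpha 0.05), two different base rates.
ppv_common = prob_true_given_significant(prior=0.10, power=0.80, alpha=0.05)
ppv_rare = prob_true_given_significant(prior=0.01, power=0.80, alpha=0.05)

print(f"prior = 0.10: P(true effect | significant) = {ppv_common:.2f}")
print(f"prior = 0.01: P(true effect | significant) = {ppv_rare:.2f}")
```

With a prior of 0.10, about 64% of significant results correspond to true effects; with a prior of 0.01, only about 14% do, even though the test and its threshold are unchanged.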

Can you answer these questions? —>

Going further: what is the distribution of a p-value?

You should now have a good idea of what a distribution is, and of what the cumulative distribution function of a random variable is.

An interesting fact is that p-values are themselves random variables: they are just a function of the data, and the data are random (since you obtained these specific data by sampling, e.g., subjects).

So, say you sample from a normal N(0,1) distribution: what is the distribution of the p-value of a test statistic T (for instance, T is simply a z-score computed from a sample of N(0,1) variables)? We show that, when the null hypothesis holds, this distribution is uniform: all values between 0 and 1 are equally probable (loosely speaking).

Warning: this is more advanced material; you may want to skip it if you don’t have some mathematical background.

Let’s take T as your random variable. Note, the definition of a random variable is not straightforward, but roughly speaking it is a function that “maps from an outcome of the events (that is, from a point in a probability space) to a mathematically convenient outcome label, usually a real number.”

We write “equal by definition” with the symbol \(\equiv\). We denote random variables with capital letters and the specific values they take with lowercase letters. \(F_T\) is the cumulative distribution function (CDF) of \(T\), and \(F_P\) is the CDF of \(P\).

We define our variable \(P\) as: \(P \equiv F_T(T)\)

Recall that, by definition of the CDF, \(F_T(t) \equiv Pr(T \leq t)\). By definition of \(P\), we have:

\[Pr(P \leq p) = Pr(F_T(T) \leq p)\]

If \(F_T\) is invertible, which is the case for a continuous random variable with a strictly monotonic CDF (a CDF that is never “flat”), then its inverse \(F_T^{-1}\) exists and is increasing, so we can apply it to both sides of the inequality without changing its direction:

\[F_T(T) \leq p \iff F_T^{-1}(F_T(T)) \leq F_T^{-1}(p) \iff T \leq F_T^{-1}(p)\]


\[Pr(P \leq p) = Pr(T \leq F_T^{-1}(p)) \equiv F_T(F_T^{-1}(p)) = p\]


\[Pr(P \leq p) \equiv F_P(p) = p\]

Therefore, the CDF of \(P\) is the identity function on the interval from 0 to 1: \(F_P(p) = p\). As the probability density function (PDF) is simply the derivative of the CDF (when this derivative exists), the PDF of \(P\) is constant and equal to 1 for \(p\) between 0 and 1. This is a uniform random variable: loosely speaking, each value of p is as likely as any other.
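This result can be checked by simulation. The sketch below follows the derivation under stated assumptions: it draws \(T\) from N(0,1), computes \(P = F_T(T)\) with the standard normal CDF, and compares the empirical CDF of \(P\) with the identity function at a few points. The sample size, the seed and the helper name `empirical_cdf` are arbitrary choices made for this illustration.

```python
import random
from statistics import NormalDist

rng = random.Random(0)
std_normal = NormalDist()  # F_T is the CDF of N(0, 1)
N_SAMPLES = 20000

# P = F_T(T), with T drawn from N(0, 1) (the null hypothesis holds).
p_values = [std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(N_SAMPLES)]


def empirical_cdf(values, p):
    """Fraction of the values that are less than or equal to p."""
    return sum(v <= p for v in values) / len(values)


# If P is uniform on [0, 1], then Pr(P <= p) should be close to p.
checks = {p: empirical_cdf(p_values, p) for p in (0.1, 0.25, 0.5, 0.75, 0.9)}
for p, fraction in checks.items():
    print(f"Pr(P <= {p:.2f}) is approximately {fraction:.3f}")
```

Replacing the N(0,1) draw with a distribution that has a non-zero mean, while still using the N(0,1) CDF, shows how the distribution of \(P\) stops being uniform when the null hypothesis is false.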

This fact is used later in this course, for instance to demonstrate the presence of p-hacking in the literature. See the lesson on what p-hacking is.

Key Points