On P-Values and Intuition
Let's start with the most obvious question... what is the technical definition of a p-value? The p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means there is stronger evidence against the null hypothesis.
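To put that in symbols (our own notation, not part of the definition above): if T is the test statistic and t_obs is the value you actually observed, the one-sided p-value is p = P(T ≥ t_obs | H0), the probability, computed assuming the null hypothesis H0 is true, of seeing a result at least as extreme as the one you saw.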
Sound confusing? That's because it is, and that confusion is often the root cause of misinterpreting what a statistically significant test actually implies (something I'm sure we've all done in a meeting or two).
In fact, even scientists have a hard time fully explaining what a p-value is. Christie Aschwanden of FiveThirtyEight met with some of the world's leading experts on meta-science at the METRICS conference at Stanford, and not a single one could translate the technical definition into anything resembling plain English.
Even Steven Goodman, co-director of METRICS, said, "I cannot tell you what it means, and almost nobody can. Scientists regularly get it wrong, and so do most textbooks..."
But don't worry, we did a little digging. The most straightforward example we could find comes from Stuart Buck, who said, "Imagine that you have a coin that you suspect is weighted toward heads. (Your null hypothesis is then that the coin is fair.) You flip it 100 times and get more heads than tails. The p-value won’t tell you whether the coin is fair, but it will tell you the probability that you’d get at least as many heads as you did if the coin was fair."
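To make Buck's coin example concrete, here is a minimal Python sketch that computes that probability exactly from the binomial distribution. The figure of 60 heads is our own hypothetical, chosen only to illustrate; swap in whatever you observed.

```python
from math import comb

def p_value_at_least(n_flips: int, n_heads: int) -> float:
    """One-sided p-value: the probability of getting at least
    n_heads heads in n_flips flips of a fair coin (the null)."""
    tail = sum(comb(n_flips, k) for k in range(n_heads, n_flips + 1))
    return tail / 2 ** n_flips

# Hypothetical observation (our number, not Buck's): 60 heads in 100 flips.
print(f"p = {p_value_at_least(100, 60):.4f}")  # about 0.0284
```

A p-value of roughly 0.028 here says: if the coin were fair, you'd see 60 or more heads in 100 flips only about 3% of the time. Note what it does not say: nothing about the probability that the coin actually is fair.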
What This Means For You
If scientists at a conference called METRICS have a hard time explaining a p-value, most of the advertising industry probably does too. Don't fret! Lacking an intuitive grasp of the p-value doesn't invalidate any past or future tests.
BUT, it does mean you should review any expensive upcoming studies to confirm that whoever designed the testing agenda understands what the metric at the end actually means. A better understanding of the statistical framework behind your testing agendas will lead to more accurate and more frequent wins.
--
This write-up is part 1 in a series we call Adventures in Significance. After running 257 lift studies across 30+ different clients, we thought it would be helpful to write a series on common mistakes and how to avoid them. It's a subject many dread, but it is core to the direct response advertising world. We thought we could do our part with a round-up of our favorite opinions on the subject and how they apply to your business.
--
Thanks to Stephen Dawson for sharing their work on Unsplash