Stat Sig Ain't The Whole Story
We have all heard of A/B tests and likely have attempted to run one ourselves. For those unfamiliar, an A/B test is one in which a variable is changed and cell A is compared to cell B via some type of statistical test. But have you heard of A/A tests? It's exactly what it sounds like: you test the exact same thing against itself. Why would you ever do such a thing? Because many advertisers are ill-equipped to run a proper A/B test, and the A/A test is a concept that will help verify the validity of your testing procedure.
The author of the article, A/A Testing: How To Increase Conversions 300% by Doing Absolutely Nothing, showcases a fantastic use case of A/A testing with his email distribution list. Even with an email list of 30,000, the author finds several metrics in his A/A test that show a statistically significant result... even though it was the exact same thing tested against itself.
"To many 'wantrepreneurs' (my former self included), this looks like 'oh wow, you increased opens by 10%!' They may even punch it into Visual Website Optimizer’s significance calculator and see that p=.048. 'It’s statistically significant!' they (or I) might exclaim.
…to a trained statistician, there is nothing remarkable about these 'results'. Given the baseline conversion rate on opens, the sample size simply isn’t large enough to get a reliable result. What’s happening here is just the silly tricks our feeble human minds play on us when we try to measure things."
While this may sound like it doesn't apply to your tests, this is exactly what happens in many A/B tests. Results are interpreted as gospel because they indicated something incremental, and a new tactic is chosen on that basis. But what was your confidence interval? What was the sample size or power of the test (a concept we will deep dive into in a future newsletter)?
"Even if you do have a large enough sample size, you’re bound to get the occasional 'false positive' or 'false negative.' Meaning that you could make the completely wrong decision based upon false information."
We're not saying A/B testing is bad or wrong; quite the opposite. It's an incredibly powerful tool that can provide directional information, but as with many tools, it can become dangerous when used incorrectly or when managed by people who don't adequately understand the variables at play.
Before your next A/B test, ask yourself a handful of questions to decide whether you should be running one at all:
- Do I have an important question? Will answering this question make an impact worth the effort of running a test?
- What else could I be doing with my energy and money? By the nature of testing, it's an overall less efficient use of funds than scaling a "winning" strategy.
- Can I run a large enough test? Use this sample size calculator to see if you're actually able to reach statistical significance (a rough code version of the same math is sketched after this list).
- Do I really understand testing? Try running an A/A test to get a feel for how misleading “results” can be. Be honest with yourself about your ability to interpret results, because the final decision of A versus B can swing potentially thousands or millions in ad budget (and thus revenue for the business).
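If you don't have a calculator bookmarked, here is a rough sketch of the standard two-proportion approximation most of them use (the 2% baseline conversion rate, 10% relative lift, 5% significance level, and 80% power below are illustrative assumptions, not recommendations):

```python
import math
from scipy import stats

def required_sample_size(baseline_rate: float,
                         relative_lift: float,
                         alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Approximate per-cell sample size for a two-sided, two-proportion test.

    `relative_lift` is the minimum detectable effect as a relative change
    (0.10 = a 10% relative lift over the baseline).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)

    z_alpha = stats.norm.ppf(1 - alpha / 2)   # critical value for significance
    z_power = stats.norm.ppf(power)           # critical value for power

    pooled = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# A 2% conversion rate and a 10% relative lift already demand
# roughly 80,000 users per cell before the test is worth running.
print(required_sample_size(0.02, 0.10))
```

If the number that comes out is bigger than the traffic or budget you can realistically commit, that is your answer to the question above.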
If you want to double-click into this topic, we recommend reading "Why Data, Statistics and Numbers Can Make You do the Wrong Thing". It's lengthy but extremely informative on the pitfalls of A/B testing.
--
This write-up is part 4 in the series we call Adventures in Significance. After running 257 lift studies across 30+ different clients, we thought it would be helpful to write a series on common mistakes and how to avoid them.
This is a subject dreaded by many, but it is core to the direct response advertising world. We thought we could do our part with a round-up of our favorite opinions on the subject and how it applies to your business.
--
Thanks to Science in HD for sharing their work on Unsplash