The 9 A/B testing mistakes that invalidate your tests
Sample ratio mismatch, peeking, novelty effects — what to actually watch for in your test program.

Most A/B test programs are running tests that wouldn’t pass a statistician’s smell test. Here are the 9 mistakes we audit out of every program we inherit.
The big nine
- Sample ratio mismatch — traffic should split in the ratio you designed (usually 50/50), and a persistent deviation signals broken bucketing or logging, not bad luck. Half the programs we audit fail this check (chi-square tripwire sketched after this list).
- Peeking at p-values — checking significance repeatedly and stopping at the first p < 0.05 inflates the false-positive rate well above the nominal 5% (see the A/A simulation after this list).
- Novelty effects — first-week wins regress to the mean once the newness wears off.
- Underpowered tests — testing with traffic that has no realistic chance of detecting the lift you care about; size the test before launch (power calculation after this list).
- Overlapping tests — running 3 tests on the same surface contaminates all three, because users land in multiple variants at once.
- Wrong primary metric — optimizing conversion rate (CR) when lifetime value (LTV) is the goal.
- No segmentation — a 0% overall lift can hide +10% on mobile and -10% on desktop (numeric breakdown after this list).
- Pre-test bias — the control group is already exposed to the variant before the test starts, e.g. through caching, cross-device sessions, or a leaky feature flag.
- Stopping after winning — declaring victory at significance without verifying the lift holds in production for 2+ weeks.
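
Several of these are cheap to check or demonstrate in code. First, SRM: a minimal tripwire, assuming you log assignment counts per arm, is a chi-square goodness-of-fit test against the designed split. The function name and the user counts below are illustrative, not from a real audit.

```python
from scipy.stats import chisquare

def srm_check(control_n: int, variant_n: int, control_share: float = 0.5,
              alpha: float = 0.001) -> bool:
    """Return True if the observed split deviates from the designed split.

    alpha is deliberately strict (0.001): with large assignment counts this
    is a tripwire for broken bucketing/logging, not an inference.
    """
    total = control_n + variant_n
    expected = [total * control_share, total * (1 - control_share)]
    _, p = chisquare([control_n, variant_n], f_exp=expected)
    return p < alpha

# A 50/50 test that delivered 50,210 vs 49,790 users is fine;
# 51,500 vs 48,500 on the same design trips the check.
print(srm_check(50_210, 49_790))  # False
print(srm_check(51_500, 48_500))  # True
```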
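
Peeking's damage is easiest to see in an A/A simulation: no true effect exists, yet an analyst who tests daily and ships at the first p < 0.05 "wins" far more often than 5% of the time. The sample sizes and test length here are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_sims=2_000, users_per_day=500,
                                days=14, alpha=0.05):
    """A/A simulation: both arms draw from the same distribution, so any
    'significant' result is a false positive. The analyst peeks daily and
    stops at the first p < alpha."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(size=users_per_day * days)
        b = rng.normal(size=users_per_day * days)
        for day in range(1, days + 1):
            n = day * users_per_day
            if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
                hits += 1
                break
    return hits / n_sims

# Nominal alpha is 0.05; with 14 daily peeks the realized rate lands
# around 0.2, roughly four times what the analyst thinks they bought.
print(peeking_false_positive_rate())
```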
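
Underpowering is preventable with one calculation before launch. This sketch uses statsmodels; the 4.0% baseline conversion rate and the 10% relative lift target are assumptions, so substitute your own. Note the quadratic cost: halving the detectable lift roughly quadruples the required sample.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Users per arm needed to detect 4.0% -> 4.4% conversion (a 10% relative
# lift) at alpha = 0.05 with 80% power, two-sided. Baseline and lift
# target are assumed values for illustration.
effect = proportion_effectsize(0.044, 0.040)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"{n_per_arm:,.0f} users per arm")  # roughly 20,000
```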
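
And the segmentation trap is just arithmetic. With invented counts, opposite segment effects cancel into a near-flat overall number:

```python
# (users, conversions) per arm; counts invented for illustration
segments = {
    "mobile":  {"control": (10_000, 400), "variant": (10_000, 440)},
    "desktop": {"control": (10_000, 500), "variant": (10_000, 450)},
}

def rate(pair):
    users, conversions = pair
    return conversions / users

totals = {"control": [0, 0], "variant": [0, 0]}
for name, arms in segments.items():
    for arm, (users, conv) in arms.items():
        totals[arm][0] += users
        totals[arm][1] += conv
    lift = rate(arms["variant"]) / rate(arms["control"]) - 1
    print(f"{name:8s} {lift:+.1%}")   # mobile +10.0%, desktop -10.0%

overall = rate(totals["variant"]) / rate(totals["control"]) - 1
print(f"{'overall':8s} {overall:+.1%}")  # about -1%, hiding both effects
```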
90% of declared CRO wins don’t replicate at 6 months. Build for the replication, not the announcement.


