The 9 A/B testing mistakes that invalidate your tests

Sample ratio mismatch, peeking, novelty effects — what to actually watch for in your test program.

Most A/B test programs are running tests that wouldn’t pass a statistician’s smell test. Here are the 9 mistakes we audit out of every program we inherit.

The big nine

  • Sample ratio mismatch — control and variant traffic should split 50/50; half the programs we audit don't, which signals a broken assignment or logging pipeline.
  • Peeking at p-values — checking significance repeatedly and stopping at the first "winner" inflates the false-positive rate well above the nominal 5%.
  • Novelty effects — first-week wins regress to the mean once users acclimate.
  • Underpowered tests — running with traffic that can't detect a realistic lift, so "no significant difference" tells you nothing.
  • Overlapping tests — running three tests on the same surface contaminates all of them.
  • Wrong primary metric — optimizing conversion rate when lifetime value is the goal.
  • No segmentation — a 0% overall lift can hide +10% on mobile and -10% on desktop.
  • Pre-test bias — the control group has already been exposed to the variant before the test starts.
  • Stopping after winning — failing to verify the lift in production for 2+ weeks after launch.
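
The first mistake on the list, sample ratio mismatch, is cheap to detect automatically on every running test. Here is a minimal sketch using only the standard library; the function name, the strict alpha of 0.001 (a common choice for SRM alerts, since the check runs continuously), and the example counts are our own illustrative assumptions:

```python
import math

def srm_check(n_control: int, n_variant: int, alpha: float = 0.001) -> bool:
    """Two-sided z-test that the observed split matches 50/50.

    Returns True when a sample ratio mismatch is detected, i.e. the
    split deviates from 50/50 more than chance plausibly allows.
    """
    n = n_control + n_variant
    # Under a fair 50/50 split, n_control ~ Binomial(n, 0.5):
    # mean n/2, standard deviation sqrt(n)/2.
    z = (n_control - n / 2) / (math.sqrt(n) / 2)
    # Two-sided p-value from the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha

# A 50.4/49.6 split looks harmless but is wildly improbable
# at one million users:
print(srm_check(504_000, 496_000))  # -> True (SRM flagged)
```

Note the deliberately strict threshold: an SRM alert should mean "stop and debug the assignment pipeline", not "maybe look into it".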

90% of declared conversion-rate-optimization (CRO) wins don't replicate at six months. Build for the replication, not the announcement.
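
The underpowered-tests mistake above can be made concrete with Lehr's rule of thumb: for a two-sided test at alpha = 0.05 with roughly 80% power, each arm needs about 16·p(1−p)/δ² users, where p is the baseline conversion rate and δ is the absolute lift you want to detect. A sketch, with illustrative rates of our own choosing:

```python
import math

def min_sample_per_arm(baseline_rate: float, absolute_lift: float) -> int:
    """Lehr's rule of thumb: per-arm sample size for ~80% power
    at a two-sided alpha of 0.05."""
    variance = baseline_rate * (1 - baseline_rate)
    return math.ceil(16 * variance / absolute_lift ** 2)

# Detecting a half-point lift on a 5% baseline takes more traffic
# per arm than many teams see in a month:
print(min_sample_per_arm(0.05, 0.005))  # -> 30400
```

Run the arithmetic before launching: if your traffic can't reach that number in a few weeks, the test can't detect the lift you care about, and any "winner" it declares is noise.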