Design a rigorous A/B testing program from scratch
Your team runs occasional experiments but has no systematic approach: tests overlap, sample sizes are guessed, and results are cherry-picked. This prompt sets up a structured experimentation program with hypothesis templates, statistical rigor, and a clear decision framework.
Why Most A/B Tests Are a Waste of Time — And How to Fix It
Running A/B tests is easy. Running them correctly is surprisingly rare. Ronny Kohavi, who led experimentation at Microsoft and later served as a VP at Airbnb, reports that only about one-third of ideas tested at large tech companies actually improve the metrics they were designed to move; even among apparent wins, a substantial fraction are false positives caused by poor experimental design, peeking at results, or underpowered samples.
The Three Sins of Amateur Experimentation
Sin 1: No hypothesis. Teams launch tests because someone had an idea, not because they identified a specific behavior they expect to change. Without a clear hypothesis, you can't distinguish a successful experiment from a lucky one.
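For illustration, a minimal hypothesis template might look like the sketch below. The field names are hypothetical (the prompt defines its own template); the point is that every test states the expected behavior change and the mechanism before launch:

```python
# A minimal hypothesis template, sketched as a Python dataclass.
# Field names are illustrative, not prescribed by the prompt.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str           # what we will modify
    expected_effect: str  # the specific behavior we expect to shift
    mechanism: str        # why we believe the change causes the effect
    primary_metric: str   # the single metric that decides the test
    minimum_effect: str   # the smallest lift worth shipping

h = Hypothesis(
    change="Reduce signup form from 7 fields to 3",
    expected_effect="Signup completion rate increases",
    mechanism="Fewer fields lower perceived effort at the decision point",
    primary_metric="signup_completion_rate",
    minimum_effect="+2 percentage points",
)
```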
Sin 2: Premature peeking. Checking results daily and stopping the test when the graph looks good virtually guarantees false positives. Proper stopping rules exist for a reason — they protect you from your own impatience.
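A quick simulation makes the cost of peeking concrete. This sketch assumes an A/A test (both arms identical, so any "win" is a false positive) checked daily with a two-proportion z-test; the traffic numbers are illustrative:

```python
# Simulate daily peeking on an A/A test: stop at the first p < 0.05.
# With no real effect, the false positive rate should be 5% for a single
# fixed-horizon test, but peeking inflates it several-fold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
BASE_RATE = 0.10   # identical conversion rate in both arms
DAILY_N = 500      # visitors per arm per day (illustrative)
DAYS = 20          # planned test duration if never stopped early
SIMS = 2000

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * stats.norm.sf(abs(z))

false_positives = 0
for _ in range(SIMS):
    conv_a = conv_b = 0
    for day in range(1, DAYS + 1):
        conv_a += rng.binomial(DAILY_N, BASE_RATE)
        conv_b += rng.binomial(DAILY_N, BASE_RATE)
        n = day * DAILY_N
        if z_test_p(conv_a, n, conv_b, n) < 0.05:
            false_positives += 1   # stopped early on a spurious "win"
            break

print(f"False positive rate with daily peeking: {false_positives / SIMS:.1%}")
# Typically lands around 20-25% here, versus the nominal 5%.
```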
Sin 3: Ignoring guardrail metrics. A test that increases signups by 5% but decreases 30-day retention by 8% is a net loss. Yet teams that don't define guardrail metrics before running the test often celebrate the surface-level win and ship the change.
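A guardrail check can be as simple as a pre-registered table of tolerated regressions. The metric names and thresholds below are hypothetical; what matters is that the ship decision becomes mechanical once guardrails are declared before launch:

```python
# Pre-registered guardrails: the maximum tolerated relative change per metric
# (e.g. -0.02 means a drop of more than 2% blocks the launch).
GUARDRAILS = {"retention_30d": -0.02, "support_csat": -0.03}

def ship_decision(primary_lift, guardrail_deltas):
    """Return a verdict from relative deltas (e.g. +0.05 means +5%)."""
    breaches = [m for m, delta in guardrail_deltas.items()
                if m in GUARDRAILS and delta < GUARDRAILS[m]]
    if breaches:
        return "do not ship: guardrail breach in " + ", ".join(breaches)
    return "ship" if primary_lift > 0 else "no ship: no primary win"

# The scenario above: +5% signups but -8% thirty-day retention.
print(ship_decision(0.05, {"retention_30d": -0.08, "support_csat": 0.00}))
# -> do not ship: guardrail breach in retention_30d
```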
How the A/B Testing Program Prompt Works
This prompt builds a complete experimentation infrastructure in five steps. It starts with a hypothesis framework that forces teams to articulate what they expect to happen and why. Then it establishes statistical foundations — sample sizes, power calculations, and stopping rules. An experimentation roadmap prioritizes test ideas by impact, effort, and learning value. A results framework standardizes how outcomes are evaluated and documented. Finally, a culture section addresses the organizational habits that sustain rigorous experimentation.
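As a concrete example of the Step 2 statistics, here is the standard closed-form sample size calculation for a two-proportion z-test. The baseline rate, detectable lift, alpha, and power are illustrative defaults, not values prescribed by the prompt:

```python
# Sample size per arm for detecting an absolute lift in a conversion rate
# with a two-sided two-proportion z-test.
from scipy import stats

def sample_size_per_arm(p_base, lift_abs, alpha=0.05, power=0.80):
    """Visitors needed per arm to detect p_base -> p_base + lift_abs."""
    p_var = p_base + lift_abs
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    p_bar = (p_base + p_var) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p_base * (1 - p_base)
                             + p_var * (1 - p_var)) ** 0.5) ** 2
    return int(numerator / lift_abs ** 2) + 1

# Example: 10% baseline conversion, detect a 1-point absolute lift.
print(sample_size_per_arm(0.10, 0.01))   # about 14,751 visitors per arm
```

This is why guessed sample sizes fail: even a seemingly generous 1-point lift on a 10% baseline needs roughly 30,000 total visitors before the test is properly powered.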
The ICE (Impact, Confidence, Ease) scoring in Step 3 is particularly valuable for teams with more ideas than traffic. When you can only run 5-8 tests per quarter, choosing the right ones matters more than optimizing any individual test.
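A minimal sketch of ICE prioritization follows. The ideas and 1-10 scores are hypothetical, and some teams average the three scores rather than multiply them:

```python
# Rank test ideas by ICE score: Impact x Confidence x Ease, each scored 1-10.
test_ideas = [
    {"name": "Shorter signup form",     "impact": 7, "confidence": 8, "ease": 9},
    {"name": "New pricing page layout", "impact": 9, "confidence": 5, "ease": 4},
    {"name": "Onboarding checklist",    "impact": 6, "confidence": 7, "ease": 6},
]

for idea in test_ideas:
    idea["ice"] = idea["impact"] * idea["confidence"] * idea["ease"]

# Highest ICE score first: with 5-8 slots per quarter, run from the top.
for idea in sorted(test_ideas, key=lambda i: i["ice"], reverse=True):
    print(f'{idea["ice"]:4d}  {idea["name"]}')
```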
When to Use It
- You're transitioning from gut-driven decisions to data-driven product development
- Your team runs experiments but has no shared process, templates, or decision criteria
- You've been burned by a test result that didn't hold up in production
- Your product leadership wants to increase experimentation velocity without sacrificing rigor
- You need to justify experimentation investment to executive stakeholders
Common Pitfalls
Testing too many things simultaneously. Multivariate tests require exponentially more traffic. For most products, simple A/B tests with one variable produce clearer insights faster.
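The arithmetic behind the traffic claim, reusing the roughly 15,000-per-cell figure from the power calculation above as an illustrative assumption:

```python
# A full factorial test with k binary variables needs 2**k cells,
# each powered on its own. PER_CELL is an illustrative assumption.
PER_CELL = 15_000

for k in range(1, 5):
    cells = 2 ** k
    print(f"{k} variable(s): {cells} cells -> {cells * PER_CELL:,} visitors")
# 1 variable needs 30,000 visitors; 4 variables need 240,000.
```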
Optimizing for local maxima. A/B tests are excellent for incremental optimization but terrible for evaluating bold new directions. Don't A/B test your way to a product strategy — use experiments for execution, not vision.
Not documenting negative results. Failed experiments are knowledge assets. Teams that don't record why something didn't work will inevitably re-run the same losing test six months later.
Sources
- Trustworthy Online Controlled Experiments — Ron Kohavi, Diane Tang, and Ya Xu (Cambridge University Press); a comprehensive guide to experimentation at scale
- Building a Culture of Experimentation — Harvard Business Review, on how Booking.com runs thousands of experiments
- Sample Size Calculator — Evan Miller's practical tool for experiment planning
Ready to try the prompt?
Open the live prompt detail page for the full workflow.