Design an autonomous experiment loop for product optimization
You know something in your product could be better — onboarding copy, pricing page layout, notification timing — but running A/B tests manually is slow and you never get through enough variants. Apply Karpathy's autoresearch pattern (https://github.com/karpathy/autoresearch) to set up a structured experiment loop where each iteration builds on the last.
The Experiment Loop Is the Product Manager's Scientific Method
Product management borrows heavily from science without always admitting it. We form hypotheses about customer behavior, design interventions to test them, and use data to update our beliefs. But unlike scientists, most product teams lack a rigorous, repeatable process for running experiments.
The Problem
The typical product experiment goes like this: someone has an idea, the team debates it, they build it, they check the metrics a week later, and they either celebrate or move on. There is no formal hypothesis. There are no pre-registered success criteria. There is no systematic learning captured for future experiments.
This ad hoc approach produces two failure modes. First, teams run experiments that cannot actually disprove their hypothesis, because they never specified what "failure" looks like. Second, teams fail to compound their learnings, running each experiment in isolation rather than building on what previous experiments revealed.
A 2023 Eppo analysis of 10,000 product experiments found that teams with documented hypotheses and pre-registered success criteria were 3.2 times more likely to make correct ship/no-ship decisions than teams that evaluated results post hoc. The discipline of writing down what you expect to happen before you see the data is the single most impactful practice in experimentation.
How This Prompt Works
This prompt creates a self-reinforcing experiment loop that mirrors the scientific method. For each experiment, it generates:
- Hypothesis: A specific, falsifiable prediction about what will happen
- Method: The experiment design, including control conditions, audience segmentation, and duration
- Success criteria: Pre-registered thresholds that define whether the hypothesis is supported or refuted
- Learning capture: A structured template for documenting what was learned, regardless of outcome
- Next experiment: Based on the results, what the next logical experiment should test
The "autonomous" aspect means the loop is self-perpetuating. Each experiment's results inform the next experiment's hypothesis, creating a compounding knowledge base rather than a scattered collection of one-off tests.
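The loop's five components and its compounding behavior can be sketched in a few lines of Python. This is an illustrative data model, not part of the prompt itself; the field names and the placeholder driver are assumptions for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    """One iteration of the loop: hypothesis through learning."""
    hypothesis: str            # specific, falsifiable prediction
    method: str                # design: control, segmentation, duration
    success_criteria: str      # pre-registered threshold, written before data
    learning: str = ""         # captured after the experiment, win or lose
    next_hypothesis: str = ""  # seeds the following iteration

def run_loop(first_hypothesis: str, iterations: int) -> list:
    """Drive the loop: each experiment's learning seeds the next hypothesis."""
    log, hypothesis = [], first_hypothesis
    for i in range(iterations):
        exp = Experiment(
            hypothesis=hypothesis,
            method=f"A/B test, 50/50 split, 2 weeks (iteration {i + 1})",
            success_criteria="pre-registered before looking at the data",
        )
        # ... run the experiment here, then capture what was learned ...
        exp.learning = f"learning from iteration {i + 1}"
        exp.next_hypothesis = f"refined hypothesis from iteration {i + 1}"
        log.append(exp)
        hypothesis = exp.next_hypothesis  # compounding: output feeds input
    return log
```

The key design point is the last line of the loop body: the next iteration's input is the previous iteration's output, which is what turns isolated tests into a compounding knowledge base.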
According to Microsoft's ExP Platform team, which runs over 20,000 controlled experiments per year, the most valuable output of experimentation is not any individual result but the organizational capability to learn quickly. Teams that run more experiments per unit of time make better product decisions, even when most individual experiments fail.
When to Use It
- When launching a new feature to set up a learning system rather than a one-time measurement
- During growth optimization where small improvements compound over time
- When entering a new market to rapidly test assumptions about unfamiliar customer behavior
- As a team practice to build experimentation muscle across the organization
Common Pitfalls
- Not running experiments long enough. According to a 2023 Statsig report, 44% of product experiments are concluded before reaching statistical significance, leading to a false positive rate of up to 30%. Patience is a methodology, not a personality trait.
- Ignoring guardrail metrics. An experiment that improves conversion by 5% but increases support tickets by 50% is not a win. Define guardrail metrics that must not degrade before you launch.
- Optimizing for local maxima. Small iterative experiments can trap you on a local peak. Occasionally run big, bold experiments that explore fundamentally different approaches.
- Experimentation theater. Running experiments without the willingness to act on surprising results is worse than not running them. If the team will ship the feature regardless of results, do not waste time pretending to experiment.
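The first two pitfalls can be guarded against mechanically by encoding the decision rule before the data arrives. The sketch below uses a standard two-proportion z-test (computed with the standard library's `math.erfc`); the minimum sample size, alpha, and guardrail threshold are illustrative values you would pre-register for your own experiment:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

def decide(conv_a, n_a, conv_b, n_b, ticket_rate_a, ticket_rate_b,
           alpha=0.05, min_n=1000, max_guardrail_degradation=0.10):
    """Apply the rule exactly as pre-registered -- no post hoc adjustment."""
    if min(n_a, n_b) < min_n:
        return "keep running"  # pitfall 1: don't conclude early
    if ticket_rate_b > ticket_rate_a * (1 + max_guardrail_degradation):
        return "no ship"       # pitfall 2: guardrail metric degraded
    p = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    if p < alpha and conv_b / n_b > conv_a / n_a:
        return "ship"
    return "no ship"
```

Because the thresholds are arguments fixed up front, checking significance repeatedly and stopping at the first "winner" (the peeking problem behind the 30% false positive rate) is structurally discouraged: the rule returns "keep running" until the pre-registered sample size is reached.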
Further Reading
- Kohavi, R., Tang, D., & Xu, Y. (2020). *Trustworthy Online Controlled Experiments*. Cambridge University Press. https://www.cambridge.org/highereducation/books/trustworthy-online-controlled-experiments/D97B26382EB0EB2DC2019A7A7B6B0B43
- Eppo. (2023). State of Experimentation Report. https://www.geteppo.com/blog
- Statsig. (2023). Experimentation Best Practices. https://statsig.com/blog