Design a statistically rigorous A/B experiment with sample size calculations, guardrail metrics, validity checks, and a full results interpretation and analysis plan.
I want to run an experiment on . Here's the context:
- Hypothesis:
- Primary metric:
- Current baseline:
- Minimum detectable effect I care about:
- Weekly traffic/users exposed:
Please:
1. Calculate the required sample size and estimated runtime
2. Identify 2-3 secondary metrics and guardrails to track
3. Flag potential confounders or validity threats (novelty effect, seasonality, etc.)
4. Draft the test plan including variant descriptions and rollout %
5. Write the analysis plan: how I'll interpret results, including edge cases like inconclusive outcomes
Assume I'm using a standard frequentist framework: a two-sided test at a 5% significance level (95% confidence) with 80% power.
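For reference, here is a minimal sketch of the calculation I have in mind for step 1. It assumes the primary metric is a conversion rate (a proportion), that the minimum detectable effect is expressed relative to the baseline, and that traffic is split evenly across two arms. The function names, parameters, and example numbers are hypothetical illustrations, not values from my experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    Assumption: `baseline` is a conversion rate in (0, 1) and `relative_mde`
    is a relative lift (0.10 means a 10% relative improvement).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def runtime_weeks(n_per_arm, weekly_traffic, arms=2, exposure=1.0):
    """Whole weeks needed to fill every arm.

    Assumption: `exposure` is the fraction of weekly traffic enrolled in the test.
    """
    return math.ceil(n_per_arm * arms / (weekly_traffic * exposure))


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline, 10% relative MDE, 20,000 users per week.
    n = sample_size_per_arm(baseline=0.10, relative_mde=0.10)
    print(n, "users per arm;", runtime_weeks(n, weekly_traffic=20_000), "weeks")
```

Rounding the runtime up to whole weeks is deliberate, so that each arm covers complete weekly cycles. For a continuous metric (such as revenue per user), the variance terms would need to come from the observed standard deviation instead.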
Before you begin, ask me any clarifying questions that would help you produce a more accurate or useful output.
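For steps 3 and 5, a rough sketch of the readout I expect the analysis plan to formalize. It again assumes a conversion-rate primary metric and a planned 50/50 split. The function names, the verdict labels, and the example counts are hypothetical, and the 0.001 threshold for the sample ratio mismatch check is a common convention I am assuming, not a requirement.

```python
import math
from statistics import NormalDist


def srm_p_value(n_a, n_b):
    """Sample ratio mismatch check against a planned 50/50 split.

    Chi-square goodness-of-fit with 1 degree of freedom, evaluated through
    the normal distribution (a chi-square(1) variable is a squared z-score).
    """
    expected = (n_a + n_b) / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


def analyze(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test plus a confidence interval for the lift."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the hypothesis test (null: no difference).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval on the absolute difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    if p_value >= alpha:
        verdict = "inconclusive"
    elif diff > 0:
        verdict = "significant_positive"
    else:
        verdict = "significant_negative"
    return {"diff": diff, "p_value": p_value, "ci": ci, "verdict": verdict}


if __name__ == "__main__":
    # Hypothetical counts: control 1,500/15,000 vs. variant 1,680/15,000.
    if srm_p_value(15_000, 15_000) < 0.001:
        print("Sample ratio mismatch: investigate assignment before reading results")
    print(analyze(1_500, 15_000, 1_680, 15_000))
```

An "inconclusive" verdict here only means the test did not reject the null. Comparing the confidence interval against the minimum detectable effect shows whether a meaningful lift is still plausible or has been effectively ruled out.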