Structured
Regular cadence, partial discipline: the tier where ambitious in-house CRO programmes plateau, with lift in the 8-15% band.
Structured. Tier 3 of the OperatorAI Maturity Model
What this tier means
You have a regular testing cadence: 10-19 tests per quarter, scheduled rather than reactive. There's a hypothesis prioritisation framework, whether PIE, ICE, or an internal rubric. Sample size is calculated before launch (most of the time). The team tracks losing tests in a document somewhere.
This is the tier where most ambitious in-house CRO programmes plateau. The cadence is real. The discipline is partial. The lift is in the 8-15% band, depending on which disciplines have been formalised.
What it looks like in practice
- 10-19 tests per quarter (mostly hero, CTA, copy variations; some pricing-page tests)
- Hypotheses ranked via PIE, ICE, or an internal scoring rubric
- Sample size calculated against 95% confidence with a documented MDE (a sketch of the calculation follows this list)
- 95% threshold gates the winner call (tool default; no override)
- Some peeking happens, but the team feels guilty about it (the A/A simulation after this list shows why)
- Losing tests logged in a Notion / Airtable tracker, not always reviewed for patterns
- Self-serve AI tools used routinely
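For concreteness, here is a minimal sketch of the pre-launch sample-size calculation this tier already runs most of the time: a two-sided two-proportion z-test at 95% confidence and 80% power. The baseline rate and relative MDE in the usage line are hypothetical example values, not figures from any client programme.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative lift of mde_relative
    over baseline with a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)             # smallest lift worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical example: 3.4% baseline, 20% relative MDE -> ~12,200 per variant
print(sample_size_per_variant(0.034, 0.20))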
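And a toy A/A simulation of why the peeking matters: both arms share the same true conversion rate, so every "significant" result is a false positive. Checking once at the end holds the rate near the nominal 5%; checking at five interim looks and stopping on the first p < 0.05 inflates it well past that. All parameters here are illustrative.

```python
import math
import random
from statistics import NormalDist

def two_prop_p(ca: int, na: int, cb: int, nb: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (ca + cb) / (na + nb)
    se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
    if se == 0:
        return 1.0
    z = (ca / na - cb / nb) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(looks: int, n_per_look: int = 1_000,
                           rate: float = 0.034, trials: int = 500) -> float:
    """Share of A/A tests called 'significant' at any of `looks` peeks."""
    hits = 0
    for _ in range(trials):
        ca = cb = na = nb = 0
        for _ in range(looks):
            na += n_per_look
            nb += n_per_look
            ca += sum(random.random() < rate for _ in range(n_per_look))
            cb += sum(random.random() < rate for _ in range(n_per_look))
            if two_prop_p(ca, na, cb, nb) < 0.05:  # winner called at this peek
                hits += 1
                break
    return hits / trials

print("one look:  ", aa_false_positive_rate(looks=1))   # ~0.05
print("five looks:", aa_false_positive_rate(looks=5))   # noticeably above 0.05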
Why this matters
Structured programmes have built most of the testing infrastructure. What's missing is the protocol that turns infrastructure into compounding wins:
- The 99 Rule. Moving from 95% to 99% significance cuts the false-positive rate five-fold (α = 0.05 to α = 0.01). On a 120-test programme, that 4-point gap is worth ~5 false positives a year; the arithmetic is spelled out after this list.
- Failure-as-information. The tracker exists, but losing tests aren't being mined for failure-mode patterns.
- Operator-set hypothesis quality. PIE/ICE rubrics are better than nothing, but they don't substitute for 13 years of pattern recognition.
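The false-positive arithmetic behind the 99 Rule, under the worst-case assumption that every tested change is a no-op (all null hypotheses true). The 120-test programme is the example figure from the bullet above, not this tier's cadence.

```python
tests_per_year = 120                  # the programme size used in the bullet above
fp_at_95 = 0.05 * tests_per_year      # expected false positives at 95%: 6.0
fp_at_99 = 0.01 * tests_per_year      # expected false positives at 99%: 1.2
print(f"95%: {fp_at_95:.1f}, 99%: {fp_at_99:.1f}, gap: {fp_at_95 - fp_at_99:.1f}")
# gap = 4.8, i.e. the "~5 false positives per year" claimed above; 0.05/0.01 = 5x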
Recommended next move
Pricing Experimentation Audit — £2,500
Five testable pricing hypotheses plus a 12-week implementation roadmap, delivered in 21 days end-to-end. Pricing is the most undertested surface in your funnel, and the one where conversion lift maps most directly to revenue. Built on the same testing discipline that took Enzymedica from 3.4% to 16.9% conversion on Black Friday 2021.