OperatorAI Frameworks
Three frameworks. One methodology.
The OperatorAI methodology comprises three named, published frameworks: the 4-to-34 Gap (operator-augmented AI delivers 28–34% lift vs 4–7% from self-serve AI), the 99 Rule (99% statistical confidence on every test, vs the industry-standard 95%), and the Evidence Stack (four-layer hypothesis prioritisation that prevents false-positive winners from reversing at 90 days).
Most CRO agencies have four-letter acronyms. We have three named frameworks. Each one describes something we actually do, and each one connects to documented evidence.
The three named frameworks
The 4-to-34 Gap
The documented lift differential of roughly 5x between DIY AI CRO tools (4-7% lift) and operator-led AI CRO (28-34% lift). Built on Build Grow Scale's 2026 research across 347 e-commerce stores. The same software, the same data. The operator is the variable.
Read more about The 4-to-34 Gap →
The Evidence Stack
The four-layer testing discipline that protects every engagement from drift, false positives, and noise-deployed-as-truth. Operator-set hypotheses, sample-size discipline, The 99 Rule, and failure-as-information. The structural reason expert-guided AI delivers 28-34% lift.
Read more about The Evidence Stack →
The 99 Rule
The statistical-significance discipline. Most CRO agencies stop tests at 95% confidence. I require 99%. The cost is longer test cycles. The benefit is that a winner actually wins when shipped to 100% of traffic.
Read more about The 99 Rule →
Why operator-led frameworks beat off-the-shelf CRO tools
Most CRO programmes fail because they substitute an off-the-shelf tool for an operator's judgement. Hotjar gives you heatmaps. Optimizely gives you A/B testing. VWO gives you both plus a programme manager. None of them tell you which hypothesis to test, how long to run it, or when to roll back a false-positive winner. That is the operator-led work.
The three frameworks (The 4-to-34 Gap, The 99 Rule, and The Evidence Stack) are the operator-led judgement rules we apply across every GoGoChimp engagement. Each came out of a specific failure mode we saw repeatedly in client work and was then validated against the Build Grow Scale 347-store research baseline. They work because they replace tool choice with decision rules.
How the three frameworks interact
The frameworks are not independent. They sit in a sequence, with each one feeding into the next. Skipping a framework breaks the sequence and produces predictable failure modes.
Step 1: Identify the gap (The 4-to-34 Gap)
The 4-to-34 Gap is a diagnostic, not a tactic. It tells you how much lift you should expect from operator-led AI work versus self-serve AI tools. Self-serve AI inside a SaaS tool delivers 4-7% conversion lifts in our client testing. Operator-led AI work, where an operator with 13 years of experience applies AI to specific hypotheses, delivers 28-34%. The differential comes from the operator's pattern recognition, not from the AI.
Most CRO buyers assume the tool delivers the lift. The 4-to-34 Gap says the operator delivers the lift, and the tool is the substrate.
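The size of the gap follows directly from the two quoted ranges. The sketch below is only arithmetic on the 4-7% and 28-34% figures from the text; the function name and the choice to compare range endpoints and midpoints are illustrative assumptions, not part of the published framework.

```python
# Lift ranges quoted in the text, in percent.
SELF_SERVE = (4.0, 7.0)
OPERATOR_LED = (28.0, 34.0)


def lift_ratio_bounds(low_range, high_range):
    """Return (smallest, midpoint, largest) ratio of high_range to low_range.

    Hypothetical helper: compares the endpoints and midpoints of two ranges.
    """
    smallest = high_range[0] / low_range[1]   # worst case for the higher range
    largest = high_range[1] / low_range[0]    # best case for the higher range
    midpoint = (sum(high_range) / 2) / (sum(low_range) / 2)
    return smallest, midpoint, largest


smallest, midpoint, largest = lift_ratio_bounds(SELF_SERVE, OPERATOR_LED)
print(f"{smallest:.1f}x to {largest:.1f}x, midpoint {midpoint:.1f}x")
# prints: 4.0x to 8.5x, midpoint 5.6x
```

On these figures the differential ranges from 4x to 8.5x depending on which ends of the ranges are compared, with the midpoints giving about 5.6x.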
Step 2: Test with rigour (The 99 Rule)
The 99 Rule prevents false-positive wins from contaminating the programme. The industry standard — 95% statistical significance, 7-day run length, 200 conversions per variant — produces winners that don't hold up at 90 days. We've measured this on rolled-back tests across 40+ engagements.
The 99 Rule requires 99% statistical significance, a minimum 14-day run length, and a minimum of 1,000 conversions per variant. Tests that pass the 99 Rule hold up at 90 days, 180 days, and 365 days. Tests that pass at 95% significance with shorter run lengths reverse 40-60% of the time.
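The three thresholds can be expressed as a single gate. The sketch below is a minimal illustration, not the agency's actual tooling: it assumes a pooled two-proportion z-test with a two-sided p-value (the text does not say which test is used), reads "99% significance" as p < 0.01, and uses hypothetical names (`Variant`, `passes_99_rule`).

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    visitors: int
    conversions: int


def two_sided_p_value(control: Variant, variant: Variant) -> float:
    """Pooled two-proportion z-test (assumed test; not specified in the text)."""
    p_control = control.conversions / control.visitors
    p_variant = variant.conversions / variant.visitors
    pooled = (control.conversions + variant.conversions) / (
        control.visitors + variant.visitors
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control.visitors + 1 / variant.visitors)
    )
    if std_err == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (p_variant - p_control) / std_err
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def passes_99_rule(
    control: Variant,
    variant: Variant,
    days_run: int,
    alpha: float = 0.01,          # 99% significance
    min_days: int = 14,           # minimum run length
    min_conversions: int = 1000,  # minimum conversions per variant
) -> bool:
    """Return True only when all three thresholds from the text are met."""
    enough_days = days_run >= min_days
    enough_conversions = (
        min(control.conversions, variant.conversions) >= min_conversions
    )
    significant = two_sided_p_value(control, variant) < alpha
    return enough_days and enough_conversions and significant


# Hypothetical numbers, for illustration only.
control = Variant(visitors=50_000, conversions=1_000)
challenger = Variant(visitors=50_000, conversions=1_200)
print(passes_99_rule(control, challenger, days_run=21))  # True
print(passes_99_rule(control, challenger, days_run=7))   # False: run too short
```

Setting `alpha=0.05`, `min_days=7`, and `min_conversions=200` would reproduce the industry-standard gate described above, which makes the two policies easy to compare on the same data.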
Step 3: Stack the evidence (The Evidence Stack)
The Evidence Stack is the documentation layer. It records the hypothesis, the test design, the result, the statistical confidence, the rollback condition, and the post-test observation. Every test produces an Evidence Stack entry, and every entry is queryable later.
Without the Evidence Stack, programmes lose institutional knowledge when an operator leaves, when a client changes agencies, or when a test result needs to be re-examined 18 months later. The stack is the audit trail.
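One way to picture a queryable entry is as a small record with the six fields listed above. This is a sketch under stated assumptions: the class name, the `recorded_on` date field, the keyword search, and the sample entries are all hypothetical and do not describe GoGoChimp's actual storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EvidenceEntry:
    # The six fields named in the text.
    hypothesis: str
    test_design: str
    result: str
    confidence: float            # e.g. 0.99 for 99% statistical confidence
    rollback_condition: str
    post_test_observation: str
    # Assumption: a date is added so entries can be re-examined later.
    recorded_on: date


def find_entries(stack, keyword, min_confidence=0.0):
    """Return entries whose hypothesis mentions keyword at or above a confidence."""
    keyword = keyword.lower()
    return [
        entry
        for entry in stack
        if keyword in entry.hypothesis.lower()
        and entry.confidence >= min_confidence
    ]


# Hypothetical entries, for illustration only.
stack = [
    EvidenceEntry(
        hypothesis="Shorter checkout form raises completion",
        test_design="A/B, 50/50 split, 21 days",
        result="Variant won",
        confidence=0.99,
        rollback_condition="Completion rate falls below control for 14 days",
        post_test_observation="Lift held at 90 days",
        recorded_on=date(2024, 3, 1),
    ),
    EvidenceEntry(
        hypothesis="Free-shipping banner on checkout page",
        test_design="A/B, 50/50 split, 14 days",
        result="Inconclusive",
        confidence=0.90,
        rollback_condition="Not shipped",
        post_test_observation="None",
        recorded_on=date(2024, 5, 12),
    ),
]

for entry in find_entries(stack, "checkout", min_confidence=0.99):
    print(entry.recorded_on, entry.hypothesis, entry.result)
```

Keeping the entries immutable (`frozen=True`) fits the audit-trail role: a record is written once and read many times.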
What you get when you apply all three
Programmes that apply all three frameworks consistently produce 28-34% conversion lifts in 12-week engagements, with the lifts holding up at 90, 180, and 365 days. Programmes that apply one or two of the three deliver 8-15% lifts that reverse at 90 days. The frameworks compound — each one closes a different failure mode that the others don't catch.
The 4-to-34 Gap closes the wrong-tool failure mode. The 99 Rule closes the false-positive failure mode. The Evidence Stack closes the institutional-amnesia failure mode. Together they produce reliable, durable lifts.
How GoGoChimp applies these in client work
Every engagement starts with a diagnostic that scores the client's existing programme against the OperatorAI Maturity Model. Tier 1 (Ad-Hoc) and Tier 2 (Reactive) programmes typically need all three frameworks installed at once. Tier 3 (Structured) programmes usually have a working test cadence and need the 99 Rule and Evidence Stack but not the 4-to-34 Gap diagnostic. Tier 4 (Strategic) and Tier 5 (Operator-Led) programmes are already running framework-equivalent rules and need optimisation rather than installation.
The maturity scoring is the gating decision. Without it, we'd recommend the same three-framework install for every client, which would be wrong for the 20% of clients already at Tier 4 or above.
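The gating decision described above amounts to a lookup from tier to recommended install. The sketch below encodes only the mapping stated in the text; how the diagnostic arrives at a tier is not described in this document, so the scoring itself is left out, and the names `Tier` and `recommended_install` are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    AD_HOC = 1
    REACTIVE = 2
    STRUCTURED = 3
    STRATEGIC = 4
    OPERATOR_LED = 5


ALL_FRAMEWORKS = ("The 4-to-34 Gap", "The 99 Rule", "The Evidence Stack")


def recommended_install(tier):
    """Map a maturity tier to the frameworks to install, per the text.

    An empty tuple means the programme needs optimisation, not installation.
    """
    tier = Tier(tier)  # raises ValueError for tiers outside 1-5
    if tier <= Tier.REACTIVE:
        return ALL_FRAMEWORKS
    if tier == Tier.STRUCTURED:
        return ("The 99 Rule", "The Evidence Stack")
    return ()


for tier in Tier:
    install = recommended_install(tier)
    print(tier.name, "->", ", ".join(install) or "optimise existing rules")
```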
How the three fit together
The 4-to-34 Gap is the outcome. The Evidence Stack is the engine that produces it. The 99 Rule is one layer inside the Evidence Stack. All three live inside the OperatorAI methodology, the master-brand framework I have been documenting since 2013.
Read the full OperatorAI methodology →

