The ICE framework is broken. Here’s what to use instead for A/B test prioritisation
Every CRO team eventually encounters ICE scoring. Impact, Confidence, Ease — score each out of 10, multiply, prioritise descending. The appeal is obvious: a clean number you can defend in meetings.
The problem is that all three inputs are subjective, and in practice, whoever presents the hypothesis also scores it. I’ve sat through dozens of ICE-scored test backlogs where every advocate scored their own hypothesis 9/10/9 and nothing was ever deprioritised.
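For reference, the ICE arithmetic is just a product of three subjective ratings. A minimal sketch in Python (hypothetical function name, illustrative scores) shows why a single enthusiastic 9/10/9 rating floats to the top of any backlog:

```python
# Illustrative ICE scoring: each input is a subjective rating out of 10.
def ice_score(impact: int, confidence: int, ease: int) -> int:
    return impact * confidence * ease

# An advocate scoring their own hypothesis 9/10/9 gets 810,
# which outranks almost anything scored more honestly.
print(ice_score(9, 10, 9))  # 810
print(ice_score(6, 4, 7))   # 168
```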
The three-part framework that replaces ICE
1. Evidence weight (0–5)
What do we actually know? Session recordings showing 30% of users abandoning on this step = 4. A past test on a similar page that won at a 9% lift = 5. Someone’s hunch = 0. No hypothesis should enter the backlog without at least 2 points of evidence weight.
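As a sketch of how that gate could be enforced in a backlog script, here is one way to encode the examples above. The rubric keys, function name, and threshold constant are illustrative, not a standard:

```python
# Illustrative rubric based on the examples above (scores run 0-5).
EVIDENCE_RUBRIC = {
    "someone's hunch": 0,
    "session recordings showing ~30% abandon on this step": 4,
    "past test on a similar page that won at ~9% lift": 5,
}

MIN_EVIDENCE = 2  # hypotheses below this never enter the backlog

def admit(evidence_weight: int) -> bool:
    """Gate: only evidence-backed hypotheses make it onto the list."""
    return evidence_weight >= MIN_EVIDENCE

print(admit(EVIDENCE_RUBRIC["someone's hunch"]))  # False
print(admit(4))                                   # True
```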
2. Ceiling estimate (percentage)
If this variant wins, what’s the realistic upper bound on the lift? Base the estimate on comparable tests, industry benchmarks, or Bayesian priors built from your own test history. A 2% ceiling on a high-traffic page can be worth more, in absolute conversions, than a 20% ceiling on a low-traffic page.
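A rough way to compare those two cases is expected incremental conversions at the ceiling. The traffic volumes and baseline conversion rate below are invented for illustration:

```python
def incremental_conversions(monthly_visitors: int, baseline_cr: float,
                            ceiling_lift: float) -> float:
    """Upper-bound extra conversions per month if the variant hits its ceiling."""
    return monthly_visitors * baseline_cr * ceiling_lift

# Hypothetical numbers: a 2% ceiling on a busy page vs a 20% ceiling on a quiet one.
high_traffic = incremental_conversions(500_000, 0.03, 0.02)  # 300 extra conversions
low_traffic = incremental_conversions(10_000, 0.03, 0.20)    # 60 extra conversions
print(high_traffic, low_traffic)
```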
3. Traffic-adjusted runtime (days)
Run the sample-size maths before prioritising. If your traffic reaches the required sample size in 9 days, run the test. If it needs 47 days, either send it more traffic, accept the opportunity cost, or deprioritise it.
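Here is a sketch of that sample-size maths using the standard normal-approximation formula for comparing two proportions at 95% confidence and 80% power. It assumes scipy is available, and the visitor count, baseline rate, and lift in the usage example are illustrative:

```python
import math
from scipy.stats import norm

def days_to_significance(daily_visitors: int, baseline_cr: float,
                         relative_lift: float, alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Days for a 50/50 two-variant test to collect the required sample,
    using the standard two-proportion normal approximation."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n_per_variant = ((z_alpha + z_beta) ** 2 *
                     (p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1) ** 2
    return math.ceil(2 * n_per_variant / daily_visitors)

# Illustrative inputs: 12,000 daily visitors, 3% baseline CR, 10% relative lift.
print(days_to_significance(12_000, 0.03, 0.10))  # roughly 9 days
```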
How this changes the backlog
Tests with no evidence weight go to the bottom, regardless of how confident the advocate feels. Tests with 30+ day runtimes get staged later than faster tests with a similar ceiling. Advocacy stops winning. Evidence wins.
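There is no single prescribed formula for combining the three inputs, but a backlog sort consistent with those rules might look like the sketch below. The field names, the evidence gate at 2, and the ceiling-per-day-of-runtime ordering are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Test:
    name: str
    evidence_weight: int  # 0-5, from the rubric
    ceiling_pct: float    # realistic upper bound on lift, e.g. 0.02
    runtime_days: int     # from the sample-size maths

def prioritise(backlog: list[Test]) -> list[Test]:
    """Evidence gates entry; among admitted tests, favour more ceiling per day of runtime."""
    admitted = [t for t in backlog if t.evidence_weight >= 2]
    return sorted(admitted, key=lambda t: t.ceiling_pct / t.runtime_days, reverse=True)

backlog = [
    Test("checkout trust badges", evidence_weight=4, ceiling_pct=0.02, runtime_days=9),
    Test("homepage hero rewrite", evidence_weight=5, ceiling_pct=0.04, runtime_days=47),
    Test("founder's favourite idea", evidence_weight=0, ceiling_pct=0.20, runtime_days=14),
]
for t in prioritise(backlog):
    print(t.name)  # the zero-evidence idea never appears
```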
Want us to do this for your site?
Book a free AI audit. 15 minutes. We’ll show you three things your site is missing and what we’d test first.
Book my free AI audit →


