CRO Glossary: 50+ Conversion Rate Optimisation Terms (Defined)

If you can articulate statistical significance, sequential testing, and the gap between self-serve and expert-guided AI CRO in one sentence each without checking, close this tab. The rest is for the rest of us.
A working glossary of 50+ conversion rate optimisation terms across testing, AI CRO, page speed, and behavioural psychology, written by a 13-year CRO operator for people who actually run tests.
I've watched this industry mangle its own vocabulary for 13 years. Founders use "statistical significance" to mean "the dashboard turned green." Agencies sell "AI CRO" without saying whether they mean self-serve software or operator-led testing. The 5× delta between those two answers is real money. Build Grow Scale's research across 347 stores found self-serve AI tools deliver 4–7% lift while operator-led AI delivers 28–34% (Stafford, 2026).
So this is the working definition list I wish someone had handed me in 2013. 50+ terms. Defined for practitioners. No padding, no circular textbook definitions, every term linked to where it does real work on the GoGoChimp pillar pages.
Conversion rate optimisation is a vocabulary problem before it's a tooling problem. If your team can't agree on what "significance" means, no testing platform will save you.
Quick navigation
- Tier A: Core CRO terms
- Tier B: AI CRO terms
- Tier C: Page speed and Core Web Vitals
- Tier D: Behavioural and psychology terms
- FAQ
Tier A: Core CRO terms
The definitional spine of the discipline. If you can't articulate these in one sentence each, you don't yet have a testing programme. Linked to the A/B Testing pillar where they do the most work.
A/B test
A controlled experiment that splits live traffic between a control (the existing experience) and one variant (a single change), measuring which version produces a higher conversion rate at statistical significance. A/B tests isolate one variable so the resulting lift is attributable. Run on the wrong page, with the wrong sample size, or stopped early, an A/B test produces a number that looks like a finding but isn't. See the A/B Testing pillar for the discipline behind running them properly.
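As a concrete illustration of the "split live traffic" mechanic, here's a minimal sketch of deterministic bucketing, the standard way platforms keep a visitor in the same arm across sessions. The function and test names are illustrative, not any particular platform's API.

```python
import hashlib

def assign_bucket(user_id: str, test_name: str, split: float = 0.5) -> str:
    """Deterministically assign a visitor to control or variant.

    Hashing user_id together with the test name gives each visitor a
    stable bucket for the life of the test, across sessions and devices.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    position = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1]
    return "variant" if position < split else "control"

print(assign_bucket("visitor-1138", "pdp-cta-copy"))  # same answer every call
```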
Bayesian vs frequentist testing
Two competing statistical frameworks for interpreting test results. Frequentist testing asks "how likely is this result if there's no real difference?" and uses p-values plus a fixed sample size. Bayesian testing asks "given what I've seen, what's the probability the variant beats the control?" and updates as data arrives. Most modern testing platforms (VWO, Optimizely) ship Bayesian by default because it's more forgiving when users peek at results early. GoGoChimp tests at 99% statistical significance regardless of framework.
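To make the Bayesian question concrete, here's a minimal Monte Carlo sketch: model each arm's conversion rate with a Beta posterior and estimate the probability the variant beats the control. The counts are invented for illustration; platforms layer priors and decision rules on top of this core.

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data (illustrative numbers, not from any real test)
control_conv, control_n = 310, 10_000
variant_conv, variant_n = 362, 10_000

# Beta(1, 1) prior; posterior is Beta(conversions + 1, non-conversions + 1)
control_post = rng.beta(control_conv + 1, control_n - control_conv + 1, 100_000)
variant_post = rng.beta(variant_conv + 1, variant_n - variant_conv + 1, 100_000)

prob_variant_wins = (variant_post > control_post).mean()
print(f"P(variant beats control) = {prob_variant_wins:.3f}")
```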
Confidence interval
The range a test result is likely to fall within if you re-ran the test under identical conditions. A reported 12% lift with a 95% confidence interval of 8–16% means the true effect almost certainly sits between 8% and 16%. A reported 12% lift with a confidence interval of -2% to +26% means you've measured nothing useful. Confidence intervals separate "the variant won" from "the variant might have won, or might have done nothing."
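A worked sketch of the "measured nothing useful" case, using a normal-approximation (Wald) interval on the difference in proportions, re-expressed as relative lift. Numbers are illustrative; real platforms use more careful interval constructions.

```python
import math

def lift_ci_95(c_conv, c_n, v_conv, v_n, z=1.96):
    """Wald 95% CI on the difference in conversion rates, expressed as
    relative lift over the control. Normal approximation: reasonable at
    ecommerce volumes, unreliable below ~30 conversions per arm."""
    p_c, p_v = c_conv / c_n, v_conv / v_n
    se = math.sqrt(p_c * (1 - p_c) / c_n + p_v * (1 - p_v) / v_n)
    diff = p_v - p_c
    return ((diff - z * se) / p_c * 100, (diff + z * se) / p_c * 100)

# A "12% lift" whose interval spans roughly -4% to +28%:
# nothing useful has been measured yet.
print(lift_ci_95(300, 10_000, 336, 10_000))
```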
Control / variant
The two sides of an A/B test. The control is the existing experience left untouched. The variant (or "treatment") is the version with the proposed change. The point of running both simultaneously, with traffic split randomly, is to isolate the change as the only difference between groups. If you run them sequentially or split unevenly, seasonality and traffic-mix differences pollute the result. A/B/n tests run several variants against one control (not to be confused with multivariate tests, which vary several elements in combination).
Conversion rate
The percentage of visitors who complete the goal action on a page or funnel: purchase, signup, lead-form submission, donation. Calculated as conversions divided by sessions (or unique visitors), times 100. Enzymedica moved from 3.4% to 16.9%. Donate For Charity added 494% more donations in 30 days. The conversion rate is the metric that does or doesn't move; everything else is diagnostic. See What Is Conversion Rate Optimisation? for the working definition.
Conversion rate optimisation (CRO)
The discipline of systematically increasing the percentage of visitors who complete a goal action, using hypothesis-driven testing, behavioural research, technical performance fixes, and AI-driven experimentation. CRO is not a tactic. It's a measurement discipline applied to the entire funnel: traffic quality assumed, goal action defined, every change validated against revenue rather than dashboard noise. The GoGoChimp homepage lays out our specific operator-led approach.
False positive / false negative
Two categories of testing mistake. A false positive ("Type I error") is calling a winner when the variant didn't actually beat the control, typically caused by stopping a test early because the dashboard "looked significant." A false negative ("Type II error") is missing a real winner because the test was under-powered or run for too short a time. In practice, ecommerce A/B tests tend to fail on false positives; low-traffic SaaS tests tend to fail on false negatives. Both cost money.
Hypothesis
A specific, falsifiable statement of the form: "If we change X, then metric Y will move by Z%, because of mechanism M." A real hypothesis names the change, the metric, the expected magnitude, and the reasoning. "Test a green button" is not a hypothesis. "Replace the generic 'Add to Cart' with 'Get yours before midnight' because scarcity copy on session-limited inventory increases urgency-driven conversions on mobile by 8–12%" is one. The OperatorAI methodology (GoGoChimp's CRO methodology, distinct from OpenAI's Operator agent product) starts with hypotheses, never with tests.
Lift
The percentage increase in the conversion rate (or revenue per visitor) of the variant over the control. A 12% lift on a 3% control becomes a 3.36% conversion rate. Lift is the headline number, but it's only meaningful at statistical significance and inside a confidence interval. The 4–7% vs 28–34% gap from The 347 Method is a lift comparison, not a tooling comparison.
Multivariate test
An experiment that varies several elements simultaneously (headline, image, and CTA, for example), measuring every combination against a control to identify which combination wins and which individual elements drive the lift. Multivariate tests need much larger sample sizes than A/B tests because traffic splits across many cells. Run them when you've already shipped winners from prior A/B tests and want to interrogate combinations, not when you're testing for the first time on a low-traffic page.
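To see why the sample-size requirement explodes, count the cells. A quick sketch with invented element names:

```python
from itertools import product

headlines = ["control", "benefit-led", "urgency-led"]
images = ["lifestyle", "product-only"]
ctas = ["Add to Cart", "Get Yours Today"]

cells = list(product(headlines, images, ctas))  # every combination is a cell
daily_sessions = 12_000
print(f"{len(cells)} cells -> ~{daily_sessions // len(cells)} sessions/cell/day")
# 12 cells -> ~1,000 sessions/cell/day; each cell needs a full sample of its own.
```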
Power analysis
A pre-test calculation that tells you how much traffic each variant needs to detect a given lift at a given significance threshold. Power analysis answers "if there is a real 5% lift, how many sessions before I'm 80% likely to detect it?" Skip it and you'll run tests that can't possibly reach significance, then conclude (incorrectly) that nothing works. Statistical power below 80% means real winners look like ties. The A/B Testing pillar covers calculation in detail.
Sample size
The number of sessions or visitors required per variant for a test to reach statistical significance at the chosen lift detection threshold. Sample size depends on baseline conversion rate, minimum detectable effect, significance level, and statistical power. Tests that ship without a pre-calculated sample size are gambling. The most common ecommerce CRO failure is launching a test, watching the dashboard, and stopping when "it looks significant" at half the required sample.
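The calculation behind this term and the power analysis above is small enough to show. A sketch using the textbook normal-approximation formula for a two-proportion test; dedicated calculators differ slightly at the margins.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline, mde_rel, alpha=0.01, power=0.80):
    """Sessions needed per variant for a two-proportion test.
    baseline: control conversion rate, e.g. 0.03
    mde_rel:  minimum detectable effect, relative, e.g. 0.10 for a 10% lift
    alpha=0.01 corresponds to a two-sided 99% significance threshold."""
    p1, p2 = baseline, baseline * (1 + mde_rel)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~2.58 at 99%
    z_b = NormalDist().inv_cdf(power)           # ~0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 10% relative lift on a 3% baseline at 99%/80%:
print(sample_size_per_arm(0.03, 0.10))  # roughly 79,000 sessions per arm
```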
Sequential testing
A class of statistical methods that allows valid early-stopping decisions during a live test, rather than waiting until the pre-calculated sample size completes. Sequential testing trades a small reduction in detection power for the ability to call winners or losers early without inflating false-positive rates. Methods such as group sequential designs (Pocock, O'Brien-Fleming) and always-valid p-values (Johari et al.) are the foundations modern testing platforms build on. Used well, they shorten test cycles. Used as cover for impatience, they produce the same false positives as ordinary peeking.
Statistical significance
The evidence threshold for deciding that a measured lift is real rather than random noise. Strictly, a result significant at 95% means a lift this large would appear by chance less than 1 time in 20 if the variant did nothing. Most agencies test at that 95% level. GoGoChimp tests at 99%, a 1-in-100 chance of calling a fluke a winner. The stricter threshold means we kill borderline winners, but the winners we ship hold up at scale. Significance is the single most-misused word in CRO. A "significant" result that ignores sample size, peeking, or confidence intervals is worse than no result.
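For reference, the arithmetic behind a frequentist significance call is small. A sketch of the pooled two-proportion z-test most calculators implement, with illustrative counts:

```python
import math
from statistics import NormalDist

def z_test_p_value(c_conv, c_n, v_conv, v_n):
    """Two-sided, pooled two-proportion z-test. The p-value is the
    probability of a difference at least this large if there were no
    real effect -- not the probability the lift is real."""
    pool = (c_conv + v_conv) / (c_n + v_n)
    se = math.sqrt(pool * (1 - pool) * (1 / c_n + 1 / v_n))
    z = (v_conv / v_n - c_conv / c_n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = z_test_p_value(300, 10_000, 380, 10_000)
print(f"p = {p:.4f}  (99% threshold: call it only if p < 0.01)")
```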
Stopping rule
The pre-declared condition under which a test ends. A proper stopping rule names the sample size, the significance threshold, the maximum runtime, and the action on inconclusive results. Without one, tests get stopped at the moment the dashboard flatters the operator's hypothesis, which is the textbook recipe for false positives. The OperatorAI methodology fixes the stopping rule before launch and holds it even when the dashboard wants to celebrate.
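You can watch peeking manufacture false positives in an A/A simulation, where both arms are identical, so every declared winner is by definition false. A back-of-envelope sketch (trial counts and conversion rate are illustrative):

```python
import math
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
norm = NormalDist()

def p_value(c, cn, v, vn):
    pool = (c + v) / (cn + vn)
    se = math.sqrt(pool * (1 - pool) * (1 / cn + 1 / vn)) or 1e-12
    return 2 * (1 - norm.cdf(abs((v / vn - c / cn) / se)))

# A/A test: both arms convert at 3%, so any declared winner is false.
trials, peeks, chunk, false_calls = 2_000, 10, 1_000, 0
for _ in range(trials):
    c = cn = v = vn = 0
    for _ in range(peeks):
        c += rng.binomial(chunk, 0.03); cn += chunk
        v += rng.binomial(chunk, 0.03); vn += chunk
        if p_value(c, cn, v, vn) < 0.05:  # peek, and stop on "significance"
            false_calls += 1
            break

print(f"false-positive rate with 10 peeks: {false_calls / trials:.1%}")
# Nominal rate is 5%; peeking typically pushes it to roughly 15-20%.
```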
Tier B: AI CRO terms
This is where most of the industry vocabulary breaks down. "AI CRO" sounds like one thing and is actually two: a class of self-serve software, and an operator-led delivery model that uses the same software very differently. The distinction is worth roughly 5× lift. Linked to the AI CRO pillar.
AI CRO
Conversion rate optimisation that uses artificial intelligence (large language models, predictive analytics, autonomous testing agents) to generate hypotheses, draft variants, run experiments, and personalise experiences. AI CRO is a category, not a methodology. The AI CRO pillar covers the full stack we use. Inside the category, the 4–7% (self-serve) vs 28–34% (operator-led) gap from Build Grow Scale's 347-store research is the most important number to understand before buying anything.
AI-generated copy variant
A headline, button label, product description, or page section drafted by a large language model, then deployed as a variant against a human-written control. AI-generated copy is fast and cheap to produce. The GoGoChimp Sprint engagement ships 10 such tests in two weeks. The bottleneck isn't generation; it's selecting which variants are worth testing. The operator's job is to filter LLM output down to the 1–2 variants tied to a real revenue hypothesis, not to ship every plausible draft.
Autonomous testing agent
An AI system that designs, launches, monitors, and concludes A/B tests with limited human input. Autonomous agents handle traffic allocation, sample-size watching, winner declaration, and the queueing of follow-up tests based on prior results. They sit at the productive end of the AI CRO category when paired with operator-set hypotheses. They sit at the wasteful end when given a goal like "increase conversions" and turned loose. Autonomous testing is included on the Scale tier.
Expert-guided AI CRO
The operator-led approach: a senior CRO practitioner sets the hypotheses, prioritises tests by expected revenue impact, and interprets results. AI generates variants, runs experiments at scale, and surfaces patterns. The combination delivers an average 28–34% conversion lift in Build Grow Scale's 347-store research. The expertise is the differentiator; the AI is the throughput multiplier. This is what GoGoChimp ships as OperatorAI.
LLM-driven hypothesis generation
Using a large language model (Claude, GPT, Gemini) to propose testable hypotheses based on heatmap data, session recordings, conversion analytics, and customer reviews. Done well, it surfaces hypotheses an individual operator might miss because the model has read more behavioural research than any one human. Done badly, it drowns the test backlog in plausible-sounding but revenue-irrelevant ideas. The operator's filter is what separates 4–7% lift from 28–34%.
OperatorAI methodology
GoGoChimp's productised implementation of operator-led AI CRO, the proprietary delivery system Chris McCarron has refined across 13 years of CRO engagements. Distinct from OpenAI's Operator agent product released January 2025: OperatorAI is a CRO methodology (a system of work for an experienced practitioner directing AI testing tools); OpenAI Operator is an autonomous web agent. The two share a name, nothing more. An operator sets the hypotheses, AI runs 30+ experiments per quarter, the operator calls winners at 99% statistical significance. OperatorAI is built on top of The 347 Method industry research. It's how we delivered Enzymedica's 3.4% to 16.9% conversion shift. Full breakdown on the OperatorAI methodology page.
Predictive heatmap
A heatmap generated by AI models trained on eye-tracking and click-data corpora, predicting where users will look and click on a page before any real visitor arrives. Predictive heatmaps cut the diagnostic loop. Instead of waiting six weeks for traffic to populate a Hotjar heatmap, you get a defensible attention map in minutes. They're a hypothesis-generation tool, not a winner-declaration tool. Included on the Growth tier.
Self-serve AI CRO
The DIY end of AI CRO: software you sign up for, plug in, and let run without an experienced operator setting the testing agenda. Self-serve tools (Mutiny, Intellimize, Fibr, and the AI features inside VWO and Optimizely) deliver an average 4–7% conversion lift in Build Grow Scale's research. The software works. It runs the tests it's asked to run. The problem is that the tests are usually the wrong ones. The 4-to-34% gap article covers why.
The 347 Method
Industry research conducted by Build Grow Scale across 347 e-commerce stores doing $300K–$8M/month, published in their 2026 CRO Trends Recap. The headline finding: skilled CRO specialists using AI as a force multiplier saw 28–34% conversion improvements, while self-serve AI tools delivered 4–7%. The same software, applied differently, produces ~5× the result. "The 347 Method" is GoGoChimp's branded reference to the underlying Build Grow Scale research; Build Grow Scale's own post does not use that phrase.
Tier C: Page speed and Core Web Vitals
Speed is conversion. BeeFriendly Skincare's revenue went from $48,000/year to $1,447,225/year off a 2.24-second page-speed reduction: same product, same traffic, ~30× revenue multiplier. Linked to the Page Speed pillar. For a free starting diagnosis, run our Free Page Speed Audit Tool.
Above-the-fold
The portion of a webpage visible without scrolling on the visitor's device. The content above the fold loads first, gets seen first, and carries disproportionate conversion weight. On mobile, the fold is roughly 600–700 vertical pixels on a typical 6-inch phone in portrait orientation, which is half a typical hero section. If your value proposition, primary CTA, and trust signal don't all fit above the fold, you're losing conversions to people who never scrolled.
CLS (Cumulative Layout Shift)
A Core Web Vitals metric that measures unexpected layout movement during page load: images popping in, ads injecting, fonts swapping and resizing text. CLS is scored 0 to 1+; Google's "good" threshold is below 0.1. High CLS makes users misclick, abandon, and distrust the page. Common causes: unspecified image dimensions, web fonts loaded after first paint, and ad slots that appear after content renders. Fixed in code, not in the dashboard.
Core Web Vitals
Google's three-metric standard for real-world page experience: LCP (loading), INP (interactivity), and CLS (visual stability). Core Web Vitals are a ranking signal in Google search and a strong correlate with conversion rate. Sites whose 75th-percentile real-visit experience passes all three thresholds earn a "good" rating; everyone else is fixing something. The Page Speed pillar breaks down the per-metric fixes. Field data (real visits) beats lab data (Lighthouse) for Core Web Vitals because Google scores you on what users actually experience.
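That field data is queryable directly: Google's Chrome UX Report (CrUX) API returns the 75th-percentile values Google actually scores you on. A minimal sketch; CRUX_API_KEY is a placeholder (get one from the Google Cloud console), and the response shape is as documented at the time of writing.

```python
import requests

resp = requests.post(
    "https://chromeuxreport.googleapis.com/v1/records:queryRecord",
    params={"key": "CRUX_API_KEY"},  # placeholder API key
    json={"url": "https://www.example.com/", "formFactor": "PHONE"},
    timeout=10,
)
resp.raise_for_status()

metrics = resp.json()["record"]["metrics"]
for name in ("largest_contentful_paint",
             "interaction_to_next_paint",
             "cumulative_layout_shift"):
    print(name, metrics[name]["percentiles"]["p75"])
```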
Critical render path
The sequence of resources a browser must fetch, parse, and execute before it can paint the first usable view of a page: HTML, render-blocking CSS, render-blocking JavaScript, then layout, paint, composite. Optimising the critical render path means inlining critical CSS, deferring non-critical JavaScript, and removing render-blocking third-party scripts. Every redundant resource on the path delays the user's first interaction by tens to hundreds of milliseconds, which, at scale, is conversion lost.
INP (Interaction to Next Paint)
The Core Web Vitals metric that replaced FID (First Input Delay) in March 2024. INP measures the latency between a user's interaction (tap, click, key press) and the next visual response from the page. Google's "good" threshold is below 200 milliseconds. High INP usually means heavy main-thread JavaScript is blocking the response: too many third-party tags firing on interaction, or React/Vue components doing too much work synchronously.
LCP (Largest Contentful Paint)
The Core Web Vitals metric measuring how long it takes the largest visible content block (usually the hero image or hero text) to render. Google's "good" threshold is under 2.5 seconds. LCP is the single highest-leverage Core Web Vital for conversion rate because it correlates directly with perceived page-load speed. The fashion-brand LCP case study shows how an LCP fix recovered revenue without touching anything else.
The 7% rule
The widely cited finding that every additional second of page load time reduces conversions by approximately 7%. The per-second figure is industry shorthand commonly attributed to Akamai's 2017 State of Online Retail Performance report (whose headline stat was a 7% conversion hit per 100-millisecond delay on retail sites), and it holds up directionally across most Shopify and WooCommerce engagements I've seen. It's the reason page speed sits inside the CRO discipline rather than next to it. BeeFriendly Skincare's $48K-to-$1.44M jump came from the inverse mechanism: a 2.24-second reduction unlocking 30× revenue.
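The triage arithmetic is deliberately crude. A sketch that treats the rule as linear per second, which is an approximation for prioritisation, not a forecast (revenue figure is illustrative):

```python
def revenue_at_stake(annual_revenue, seconds_saved, rule=0.07):
    """7%-rule triage estimate, treated as linear per second.
    The underlying research wasn't modelled this way; use this for
    prioritising fixes, not for forecasting."""
    return annual_revenue * rule * seconds_saved

print(f"£{revenue_at_stake(500_000, 2.24):,.0f} per year")  # £78,400
```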
TTFB (Time to First Byte)
The time between a user's request and the browser receiving the first byte of the server's response. TTFB is the upstream foundation of LCP, because every millisecond of slow TTFB pushes LCP later. Google's "good" threshold is under 800 milliseconds field-measured. Slow TTFB usually means slow server, missing CDN, missing edge caching, or a heavy origin platform (Magento, unoptimised WordPress). Fixed at the infrastructure layer, not in the front end.
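For a rough single-request check from your own machine (real field TTFB varies by visitor geography and cache state), Python's requests library gets close: its elapsed timer stops once response headers finish parsing, which approximates time to first byte. A sketch:

```python
import requests

# stream=True keeps the body download out of the `elapsed` measurement.
resp = requests.get("https://www.example.com/", stream=True, timeout=10)
print(f"approx TTFB: {resp.elapsed.total_seconds() * 1000:.0f} ms")
resp.close()
```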
WebP
A modern image format from Google that delivers, on average, ~26% smaller file sizes than JPEG at equivalent visual quality, with native support across all modern browsers. The 26% figure comes from Google's own WebP study comparing lossy WebP to JPEG at equivalent SSIM quality. Converting JPEGs and PNGs to WebP typically cuts image weight by a quarter without any visual trade-off, making it one of the highest-leverage page-speed wins available. The BeeFriendly Skincare engagement used WebP conversion as part of the speed package that drove their 30× revenue multiplier.
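The conversion itself is a few lines. A sketch using Pillow's WebP encoder; quality=80 is a common starting point, but verify output visually before shipping, and the directory name is illustrative.

```python
from pathlib import Path
from PIL import Image  # pip install Pillow

# Batch-convert JPEG/PNG imagery to lossy WebP and report the savings.
for pattern in ("*.jpg", "*.jpeg", "*.png"):
    for src in Path("images").glob(pattern):
        img = Image.open(src)
        if img.mode not in ("RGB", "RGBA"):
            img = img.convert("RGBA")  # WebP encoder wants RGB(A)
        out = src.with_suffix(".webp")
        img.save(out, "WEBP", quality=80)
        print(f"{src.name}: {src.stat().st_size:,} -> {out.stat().st_size:,} bytes")
```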
Tier D: Behavioural and psychology terms
Most "CRO best practice" is downstream of behavioural science most operators haven't read. The terms below are the ones that genuinely change how I write copy and lay out pages. Linked to Ecommerce Psychology and the Copywriting Frameworks post.
Anchoring effect
A cognitive bias where the first number a person sees disproportionately influences their judgement of every subsequent number. Show a £499 product first and a £199 product looks affordable; show the £199 first and it looks expensive. Anchoring is why pricing pages list the highest tier first, why "compare at" prices exist, and why crossed-out RRP works on landing pages. Originally documented by Tversky and Kahneman in the 1970s (Tversky & Kahneman, 1974).
Cognitive dissonance / cognitive battle of consumption
The mental discomfort of holding two contradictory beliefs at once, for example "I should save money" and "I want this £200 jacket." Behavioural copywriting either heightens dissonance (urgency, scarcity, "what is your time really worth?") or resolves it (testimonials, guarantees, free returns). The "cognitive battle of consumption" framing treats every purchase decision as an internal argument the visitor is having with themselves; the copy's job is to win the argument on the side you want. The upstream theory comes from Festinger's 1957 A Theory of Cognitive Dissonance, which defined the original mental-discomfort mechanism the consumption framing builds on.
Decoy pricing
A pricing structure that introduces a deliberately unattractive third option to make one of the other two look obviously correct. The Economist's classic example: web-only at $59, print-only at $125, web-and-print at $125. Almost no one buys print-only; its job is to make the bundled option look like a steal. Decoy pricing works because most pricing decisions are relative, not absolute. Used carefully on SaaS pricing pages, it shifts the centre of gravity to the tier you actually want sold (Ariely, 2008).
F-shaped reading pattern
Eye-tracking research from the Nielsen Norman Group showing that on text-heavy webpages, users typically scan in an F-shaped path: read the top horizontally, drop down, read part of the next line, then scan the left edge vertically. The implication for copy: the first two words of every line, every heading, and every bullet do disproportionate work. Bury the value proposition mid-sentence and most visitors never see it. The pattern weakens on well-designed pages with clear hierarchy and strong subheadings.
Loss aversion
The behavioural finding that the pain of losing something is psychologically about twice as strong as the pleasure of gaining the equivalent thing. "Don't miss out on £40 off" outperforms "Save £40" because the framing hits the loss-aversion circuit. Loss aversion is why 30-day refund guarantees increase conversions (they remove the loss), why exit-intent popups work (they invoke the loss of the discount), and why cancel-flow save-offers convert. From Kahneman and Tversky's prospect theory (Kahneman & Tversky, 1979).
Review tense (present vs past)
A copy detail with measurable conversion impact: reviews written in the present tense ("I use this every morning, my skin is smoother") outperform reviews written in the past tense ("I used this for two weeks, my skin got smoother"). Present-tense reviews suggest ongoing usage and durable benefit; past-tense reviews implicitly close the story and signal the reviewer may have moved on. The mechanism sits in the temporal-construal literature pioneered by Trope and Liberman: present-tense framing reduces psychological distance, which is the lever the conversion lift rides on. When sourcing reviews for product pages, prompt for present-tense framing where authentic.
Scarcity (real vs fake)
A persuasion principle: humans value things more when they appear limited in availability. Real scarcity ("only 3 left in stock," verified against inventory) increases urgency and conversion without harming trust. Fake scarcity ("only 3 left!" displayed to every visitor, every time) increases short-term conversion but craters retention, review scores, and brand trust the moment users notice. The principle traces to Cialdini's Influence (Cialdini, 1984). Use real scarcity. The fake kind is a tax you pay later.
Texture gradient psychology
The visual perception principle that surfaces with finer texture appear closer and more tangible than smoother surfaces, exploited in product photography and packaging design to make products feel touchable on screen. Texture-rich product imagery (visible weave on fabric, grain on leather, condensation on a drink bottle) increases perceived quality and add-to-cart rates compared to flat, over-retouched alternatives. The principle traces to J.J. Gibson's ecological psychology of visual perception, which established texture gradient as a primary depth cue the visual system uses to infer surface tangibility.
Visual depiction effect
The finding that products shown in their expected use position (a coffee mug with the handle facing the dominant hand of the viewer, a soup bowl with the spoon on the right) increase purchase intent compared to neutrally posed product shots. The effect is strongest when the imagery primes mental simulation of the product in the viewer's hand. The original academic source is Elder and Krishna's 2012 paper in the Journal of Consumer Research, which established the mental-simulation mechanism behind the lift. On Shopify product pages, swapping flat-lay product shots for use-position shots is a high-leverage test.
How GoGoChimp uses this glossary in client work
Every term here connects to a place in the testing programme. When we audit a Shopify store, we're checking LCP, CLS, and INP against the Core Web Vitals thresholds. When we draft AI copy variants, we're filtering by hypothesis, not by which one sounds clever. When we call a winner, it's at 99% statistical significance with a pre-declared stopping rule, not at the moment the dashboard turned green.
The 347 Method proved the approach. OperatorAI is how we deliver it.
The vocabulary matters because the work is precise. A self-serve AI tool used without operator-led hypothesis filtering caps at 4–7% lift. The same tool, used with the discipline this glossary describes, delivers the 28–34% range. The difference is the operator, and the operator's edge is partly that they know what these words actually mean.
FAQ
What's the difference between an A/B test and a multivariate test?
An A/B test isolates one variable (control versus a single variant) to attribute lift cleanly. A multivariate test varies several elements at once and measures every combination, surfacing both the winning combination and the contribution of each individual element. Multivariate tests need much larger sample sizes because traffic splits across many cells. Run A/B first; multivariate only when you've already shipped winners and want to interrogate combinations.
Why does GoGoChimp test at 99% statistical significance instead of 95%?
Most agencies test at 95% confidence, accepting a 1-in-20 chance the result is a fluke. GoGoChimp tests at 99%, a 1-in-100 chance. The stricter threshold means we kill borderline winners that wouldn't hold up at scale, and the winners we ship reliably move revenue. It's slower in the short term, but the false-positive rate falls, and the cumulative compounding lift over a 12-month engagement is materially higher.
What's the actual difference between self-serve AI CRO and expert-guided AI CRO?
Self-serve AI CRO is software you sign up for, plug in, and let run without operator oversight. It delivers 4–7% average conversion lift in Build Grow Scale's 347-store research. Expert-guided AI CRO is the same software with an experienced operator setting the hypotheses, prioritising tests by revenue impact, and interpreting results. It delivers 28–34%. The software is identical. The 5× delta is the operator.
Which Core Web Vital matters most for conversion rate?
LCP (Largest Contentful Paint) carries the most direct conversion weight because it correlates with perceived page-load speed, which the 7% rule (see the glossary entry) prices at roughly 7% conversion loss per extra second. INP and CLS matter (slow interactions and visual jank both cost conversions), but if you're triaging a single fix, LCP is where to start. Google's "good" threshold is under 2.5 seconds field-measured.
What's the OperatorAI methodology in one sentence?
OperatorAI is GoGoChimp's productised implementation of operator-led AI CRO: an experienced operator sets the hypotheses, AI runs 30+ experiments per quarter, the operator calls winners at 99% statistical significance, together delivering the 28–34% lift range Build Grow Scale's research established as the upper bound of the AI CRO category.
What's the cheapest CRO win most ecommerce stores miss?
Image weight. Converting a Shopify store's hero and product imagery to WebP at correct render sizes typically cuts page weight 60–80% and pulls LCP under 2.5 seconds, which under the 7% rule recovers conversion at scale. The Affordable Golf engagement moved homepage LCP from 21.3s to 6.1s through this single category of fix. It's a code change, not a redesign, and it usually ships in under a week.
How long does an A/B test need to run?
Long enough to reach the pre-calculated sample size at the chosen significance threshold, with a minimum of two full business cycles (two weeks for most ecommerce, sometimes a full month for B2B SaaS) to absorb day-of-week and weekly seasonality. Tests that ship in three days are gambling. Tests that run for three months without a stopping rule have usually been peeked at and stopped already, just unofficially.
Next step: If you've read this far and you're spending £10K+/month on traffic with a conversion rate that hasn't moved in 12 months, run our free 15-minute AI audit. You'll get your page speed revenue impact, a predictive heatmap of your homepage, and three AI-generated headline alternatives. We'll tell you whether OperatorAI fits before you pay anything.
The 347 Method proved the approach. OperatorAI is how GoGoChimp delivers it.
Want us to do this for your site?
Book a free AI audit. 15 minutes. We’ll show you three things your site is missing and what we’d test first.
Book my free AI audit →



