The Metrics You're Not Watching Are the Ones That Will Hurt You

Jun 4, 2026

A test "wins." The primary metric is up, the confidence interval clears zero, and the team ships it. Three weeks later, support tickets are climbing and nobody connects the two events. The renovation made the living room beautiful and left the kitchen smelling of gas.

This is the most common failure mode in CRO programs, and it has nothing to do with statistics. It's a framing problem. When a single number decides whether an experiment succeeded, that number becomes the only thing anyone looks at, and the rest of the system gets to degrade quietly.

A primary metric should never travel alone

Every experiment needs a metric hierarchy, not a metric. Three layers do most of the work.

The primary metric answers the question the test was designed to answer: did users advance to the next step, complete the purchase, or submit the form? It decides whether you keep the change.

Secondary metrics explain the behavior around that change. Clicks on adjacent options, interaction with a module you didn't touch, the path users took to get to the conversion. These don't decide the test, but they tell you why the primary moved the way it did.

Guardrail metrics are the health indicators that aren't allowed to get worse. Cart abandonment, post-purchase cancellations, average order value, page load time. They're the handbrake. A change can lift the primary by ten points and still be killed by a guardrail that went red, because the guardrail is measuring whether the lift was real or borrowed from somewhere else in the funnel.
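A minimal sketch of how the hierarchy can turn into a ship decision. The names (`MetricResult`, `decide`) and the per-guardrail tolerances are hypothetical, and the guardrail check is a non-inferiority style rule chosen for illustration: a guardrail passes only if its confidence interval rules out a degradation larger than its tolerance. Secondary metrics are left out on purpose, because they explain the result rather than decide it.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Relative change of one metric, variant vs. control, with its confidence interval."""
    name: str
    relative_change: float  # 0.05 means +5% vs. control
    ci_low: float           # lower bound of the CI on relative_change
    ci_high: float          # upper bound of the CI on relative_change
    higher_is_better: bool = True


def guardrail_breached(result: MetricResult, tolerance: float) -> bool:
    """Hypothetical rule: breached unless the CI rules out degradation beyond `tolerance`."""
    if result.higher_is_better:
        # Worst plausible outcome is the lower bound of the interval.
        return result.ci_low < -tolerance
    # For cart abandonment, cancellations, or load time, an increase is the bad direction.
    return result.ci_high > tolerance


def decide(primary: MetricResult, guardrails: list[tuple[MetricResult, float]]) -> str:
    """Guardrails are checked first: a red guardrail vetoes any primary lift."""
    breached = sorted(res.name for res, tol in guardrails if guardrail_breached(res, tol))
    if breached:
        return "do not ship: guardrail breached (" + ", ".join(breached) + ")"
    if primary.ci_low > 0:
        return "ship"
    return "inconclusive"


primary = MetricResult("next_step_rate", 0.10, 0.04, 0.16)
abandonment = MetricResult("cart_abandonment", 0.08, 0.03, 0.13, higher_is_better=False)
print(decide(primary, [(abandonment, 0.02)]))
# prints: do not ship: guardrail breached (cart_abandonment)
```

Checking the guardrails before the primary mirrors the order argued for here: what you refuse to break is settled before the lift gets a vote.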

Most teams define the primary in the test plan and treat guardrails as an afterthought, if they define them at all. The order should be reversed. Decide what you refuse to break before you decide what you're trying to improve.

When a win is a loss

Three patterns show up again and again, and all three look like success on the dashboard you were watching.

The aggressive CTA. You rewrite a button to be louder and more insistent, and clicks jump. The click metric the test was judged on looks great. Then the next-step completion rate drops, because the louder button pulled in users who didn't understand what they were clicking. You moved volume from informed clicks to confused ones and called it a lift.
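The arithmetic behind this pattern is easy to check once clicks and completions sit side by side. The counts below are invented for illustration, and `funnel_outcomes` is a hypothetical helper, not part of any testing tool.

```python
def funnel_outcomes(visitors: int, click_rate: float, completion_rate: float) -> tuple[int, int]:
    """Return (clicks, next-step completions) for one arm, rounded to whole users."""
    clicks = round(visitors * click_rate)
    completions = round(clicks * completion_rate)
    return clicks, completions


# Hypothetical numbers: the louder CTA lifts clicks but lowers the share of clickers who finish.
control = funnel_outcomes(10_000, click_rate=0.08, completion_rate=0.60)
variant = funnel_outcomes(10_000, click_rate=0.11, completion_rate=0.40)

click_lift = variant[0] / control[0] - 1
completion_change = variant[1] / control[1] - 1
print(f"clicks: {control[0]} -> {variant[0]} ({click_lift:+.1%})")
print(f"next-step completions: {control[1]} -> {variant[1]} ({completion_change:+.1%})")
```

With these made-up inputs, clicks rise from 800 to 1,100 while next-step completions fall from 480 to 440: a 37.5% "win" on the number being watched and a loss on the one that matters.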

The forced upsell. You push ancillary products harder during checkout, and the ancillary attach rate climbs. Revenue per visitor, though, falls, because the added friction nudged a chunk of carts into abandonment. You optimized one line item and taxed the whole transaction.
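A simplified revenue-per-visitor model makes the trade visible. It assumes every order has the same base value and that the ancillary product has a single price; both assumptions, the function name, and all the numbers are hypothetical.

```python
def revenue_per_visitor(conversion_rate: float, base_order_value: float,
                        attach_rate: float, ancillary_price: float) -> float:
    """Expected revenue per visitor under a deliberately simple model."""
    # Only the orders that attach the ancillary product pay its price.
    expected_order_value = base_order_value + attach_rate * ancillary_price
    return conversion_rate * expected_order_value


# Hypothetical: the harder upsell raises attach rate from 10% to 25%,
# but extra checkout friction drops conversion from 5.0% to 4.5%.
control_rpv = revenue_per_visitor(0.050, 80.0, attach_rate=0.10, ancillary_price=20.0)
variant_rpv = revenue_per_visitor(0.045, 80.0, attach_rate=0.25, ancillary_price=20.0)
print(f"RPV change: {variant_rpv / control_rpv - 1:+.1%}")
```

Under these assumptions the attach rate more than doubles while revenue per visitor falls by roughly 7%, which is the guardrail that should decide the test.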

The removal of useful friction. A confirmation step, a clarifying screen, an extra field gets cut to "streamline" the flow. CTR through the funnel improves immediately. What the test window is too short to show: lead quality dropped, or cancellations rose, because some of that friction was doing the job of making sure users knew what they were signing up for. Not all friction is waste. Some of it is the product explaining itself.

In each case the primary metric did exactly what you hoped. The damage was sitting in a number nobody promoted to a decision criterion.

Squeeze versus optimize

Under deadline pressure, it's easy to mistake movement for progress. Manufactured urgency, dark-pattern defaults, a countdown timer that resets when you reload the page — these reliably move a primary KPI. They are squeezing the funnel, not optimizing it.

The difference shows up in the guardrails, on a delay. Churn, refund requests, support complaints, and unsubscribe rates are where borrowed conversions get paid back. A squeeze tactic produces a clean uplift in week one and an erosion of trust that doesn't fully register until the experiment is long closed and the result is already in the deck as a win.
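One way to keep the delayed guardrail attached to the original result is to re-score the test once a follow-up window has closed, counting refunds and cancellations against the conversions that produced them. The sketch below is a hypothetical illustration with invented counts; the names are made up and the length of the follow-up window is left to the team.

```python
from dataclasses import dataclass


@dataclass
class ArmOutcome:
    conversions: int         # counted during the test window
    refunds_or_cancels: int  # counted later, over the follow-up window

    @property
    def net_conversions(self) -> int:
        return self.conversions - self.refunds_or_cancels


def lift(control: float, variant: float) -> float:
    """Relative change of the variant over control."""
    return variant / control - 1


# Hypothetical: a countdown timer wins the week-one readout...
control = ArmOutcome(conversions=500, refunds_or_cancels=20)
variant = ArmOutcome(conversions=560, refunds_or_cancels=100)

week_one = lift(control.conversions, variant.conversions)
after_follow_up = lift(control.net_conversions, variant.net_conversions)
print(f"week-one lift {week_one:+.1%}, net lift after refunds {after_follow_up:+.1%}")
```

With these numbers the same test reads as a 12% win in week one and a loss of roughly 4% once the refunds land.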

A working rule: if a change can't survive being explained to the user in plain language, it isn't an optimization. It's a tax you're collecting before they notice.

Guardrails aren't only business metrics

Technical health belongs in the same category, and it's the part most experiment plans skip.

Page performance is a guardrail. A variant that adds 400ms to load time can post a higher conversion rate in the test and still cost you money at scale, because the latency hits every user while the conversion lift hits a subset.
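A back-of-the-envelope model shows how that can happen. It assumes the variant adds latency to every session, that the measured lift applies only to the share of sessions that reach the tested step, and that each 100 ms of latency costs a fixed relative slice of conversion. That sensitivity value is a placeholder to replace with your own measurements, not a published benchmark, and the function name is hypothetical.

```python
def net_conversion_change(sessions: int, baseline_rate: float, exposed_share: float,
                          lift_on_exposed: float, added_latency_ms: float,
                          loss_per_100ms: float) -> float:
    """Conversions gained by the lift minus conversions lost to latency (simplified model).

    exposed_share:   share of sessions that reach the tested step, where the lift applies
    lift_on_exposed: relative lift measured for those sessions, e.g. 0.05
    loss_per_100ms:  ASSUMED relative conversion loss per 100 ms, applied to every session
    """
    baseline_conversions = sessions * baseline_rate
    gain = baseline_conversions * exposed_share * lift_on_exposed
    loss = baseline_conversions * loss_per_100ms * (added_latency_ms / 100)
    return gain - loss


# Placeholder inputs: a 5% lift on 20% of traffic vs. 400 ms added to everyone.
net = net_conversion_change(sessions=1_000_000, baseline_rate=0.03, exposed_share=0.2,
                            lift_on_exposed=0.05, added_latency_ms=400, loss_per_100ms=0.01)
print(f"net change in conversions: {net:+.0f}")
```

With these placeholder inputs the lift gains about 300 conversions while the latency costs about 1,200, so a variant that wins its own readout still loses money overall.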

JavaScript errors are a guardrail. If your variant raises the console error rate, that's red, full stop, even if the funnel looks healthy. You're shipping a result built on top of users who happened not to hit the broken path.
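To tell a real rise in errors from noise, one common option (a standard statistical technique, not something the post prescribes) is a one-sided two-proportion z-test on the share of sessions that logged at least one JavaScript error. The session-level error flag, the function name, the 0.05 threshold, and the counts are all assumptions for illustration.

```python
import math


def error_rate_guardrail(control_sessions: int, control_error_sessions: int,
                         variant_sessions: int, variant_error_sessions: int,
                         alpha: float = 0.05) -> tuple[float, float, bool]:
    """One-sided two-proportion z-test: is the variant's error rate higher than control's?

    Returns (z, one-sided p-value, is_red).
    """
    p_control = control_error_sessions / control_sessions
    p_variant = variant_error_sessions / variant_sessions
    # Pooled rate under the null hypothesis that both arms share one error rate.
    pooled = (control_error_sessions + variant_error_sessions) / (control_sessions + variant_sessions)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_sessions + 1 / variant_sessions))
    if se == 0:
        # No errors anywhere (or errors everywhere): nothing to compare.
        return 0.0, 1.0, False
    z = (p_variant - p_control) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value, p_value < alpha


# Hypothetical counts: 1.0% of control sessions vs. 1.5% of variant sessions hit an error.
z, p, is_red = error_rate_guardrail(20_000, 200, 20_000, 300)
print(f"z = {z:.2f}, one-sided p = {p:.1e}, red = {is_red}")
```

With these counts the increase sits far outside what noise would explain (z is about 4.5), so the guardrail goes red regardless of what the funnel says.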

Aggregation hides the rest. A guardrail can be perfectly healthy on desktop and on fire on mobile, and the aggregate will average the fire down to something that looks survivable. Check the segments before you trust the total. The traffic that converts and the traffic that's most fragile rarely arrive on the same devices.
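A per-segment breakdown next to the total makes the averaging visible. The sketch below uses invented counts for a higher-is-better guardrail such as checkout completion; the segment names, the 5% tolerance, and the helper names are hypothetical.

```python
def relative_change(control_hits: int, control_n: int, variant_hits: int, variant_n: int) -> float:
    """Relative change in rate, variant vs. control."""
    return (variant_hits / variant_n) / (control_hits / control_n) - 1


def segment_report(segments: dict[str, tuple[int, int, int, int]], tolerance: float):
    """Relative change per segment plus the pooled total, and the names that fall past tolerance."""
    report = {name: relative_change(*counts) for name, counts in segments.items()}
    # Pool raw counts across segments (not rates) to get the aggregate a dashboard would show.
    totals = [sum(counts[i] for counts in segments.values()) for i in range(4)]
    report["all traffic"] = relative_change(*totals)
    flagged = sorted(name for name, change in report.items() if change < -tolerance)
    return report, flagged


# Hypothetical counts: (control completions, control sessions, variant completions, variant sessions)
segments = {
    "desktop": (800, 10_000, 830, 10_000),
    "mobile": (150, 3_000, 105, 3_000),
}
report, flagged = segment_report(segments, tolerance=0.05)
for name, change in report.items():
    print(f"{name:12} {change:+.1%}")
print("flagged:", flagged)
```

With these numbers mobile is down 30% and gets flagged, while the total moves by under 2% and would pass the same tolerance.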

The point was never to win tests

The goal of an A/B testing program is not a high win rate. It's making better decisions with less smoke. A team that ships every "winning" test without checking what those wins cost isn't running experiments; it's running a justification engine.

Calling a test inconclusive because it hurt a guardrail is a sign of professional maturity, not failure. It means the measurement system is doing its job — catching the change that would have looked good in the readout and bad in the quarterly numbers. The people who report those results honestly are the ones whose programs compound, because the wins they do ship are real.

Don't optimize for the click. Optimize for the decision.