When Should You A/B Test?

And even more crucially, when should you refrain?

Jul 19, 2023

Stanford professor Robert A. Burgelman’s principle, dubbed Burgelman's Law by Jim Collins, posits that the most substantial danger is not failure, but rather achieving success without understanding its cause. Many product engineering teams, particularly in larger and more sophisticated companies, might interpret this as a call to A/B test everything. I contest this interpretation. It's not that I don't value statistically significant proof of a successful change, but I believe excessive A/B testing can hamper creativity.

Before delving into the potential pitfalls of a culture that champions incessant A/B testing, let's examine why A/B tests can be remarkably beneficial. In my experience, it's not unusual for about 85% of new features to prove either neutral, making no difference to the customer experience, or outright negative. A 2017 Harvard Business Review article states, "At Google and Bing, only 10% to 20% of experiments yield positive results. At Microsoft, a third are effective, a third neutral, and a third negative." This data strengthens the argument that almost any modification to our site or app should undergo testing to determine its effect on the customer experience.

However, constantly conducting A/B tests can lead to some problematic scenarios. Teams might execute numerous minor tests to boost the metric of experiment velocity, which measures the number of experiments a team completes within a given timeframe. In other situations, teams might be incentivized through ratings or bonuses based on A/B testing success, which leads them to choose only the most secure, guaranteed routes, even when riskier ones could be more rewarding in the long run. In yet another scenario, A/B tests are assessed quarterly or even bi-weekly. This frequency encourages short-term thinking: teams measured that often might never contemplate initiatives that take three quarters to complete. These are just a few examples of how overemphasis on A/B testing can lead to Goodhart's Law in action: when a measure becomes a target, it ceases to be a good measure. This ultimately promotes risk aversion and discourages long-term thinking among teams.

Tal Raviv, a Product Manager at Riverside, an online recording studio company, wrote in a Medium article that he "actively discourages teammates from conducting experiments." He even offers a decision tree for determining when to conduct an experiment. While I disagree with the decision tree, mainly because he questions the necessity of a "well-formed hypothesis," which I believe is crucial before any development, I do concur with his main point: not everything needs an A/B test. In my interpretation of Jim Collins' reference to Burgelman's Law, one should have a well-formed hypothesis before any action. This approach is practical and can be easily implemented by teams. Changes shouldn't be made haphazardly, as they can often do more harm than good. However, not all changes need to be validated through A/B testing. As discussed earlier, validating everything that way can become a slippery slope towards risk aversion and short-term thinking.

So when should you consider A/B testing, and when should you avoid it? Applying Kent Beck's 3X framework (Explore, Expand, Extract), the higher we ascend on the S-curve, the more we should employ A/B testing. If you're optimizing functionality and extracting value, testing should be part of the process. However, if you're seeking product-market fit or launching significant new functionality, you are likely to see an initial drop in established business metrics: customers unfamiliar with new navigation, for example, might not immediately recognize the benefits of the service. Rather than testing how this change impacts the conversion rate (an impact that is almost certainly negative at first), look for other indicators of long-term success, like engagement with the new feature.

Suppose you're working on an existing e-commerce site. If your team is refining the current checkout flow by adjusting button placements, A/B testing these changes to verify their effectiveness is a wise move. If you're introducing a new service like a gift registry, A/B testing is less applicable. Instead, monitor user engagement with the new service and, if you take a short-term hit on conversion rate due to changed navigation, work on recouping that after the successful launch of the new service.

This balanced approach, applying A/B testing when beneficial and avoiding it when unnecessary, allows teams to be more adventurous and think long-term.
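For the checkout-flow case, where testing does make sense, here is a rough sketch of what "verifying effectiveness" can look like in practice: a two-proportion z-test comparing the conversion rates of the control and the variant. The function name, the visitor and conversion counts, and the 0.05 threshold are all illustrative assumptions rather than figures from a real experiment, and a real experimentation platform would typically handle details this sketch ignores (sample-size planning, peeking, multiple metrics).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the variant group.
    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: old button placement vs. new button placement.
    z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
    # 0.05 is a common convention, not a rule; pick the threshold up front.
    print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

With these made-up numbers, a lift from 5.0% to 5.6% on 10,000 visitors per arm lands just short of the conventional 0.05 threshold, which is a useful reminder that small button tweaks often need more traffic than teams expect before the result means anything.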

I personally advocate for A/B testing and believe it delivers immense value. Most early-stage product teams could benefit from more A/B testing. Conversely, more mature, larger teams can sometimes overdo it. There's a happy medium to find. A/B testing is a tool, not a principle. Principles, like "starting with customer outcomes that are grounded in insights and state hypotheses about those outcomes," guide your teams in their everyday projects. Tools, including surveys, interviews, and mockups, help your teams achieve their objectives. A/B testing is just another tool, and it should be used appropriately. Give your product engineering teams a plethora of tools, lest you end up like the person who only has a hammer and thus perceives every problem as a nail.

Mike Grabowski

Sep 5, 2023

I had first-hand experience with this when I was a PM and improving experimentation velocity was a metric set within an objective: "Teams might execute numerous minor tests to boost the metric of experiment velocity, which measures the number of experiments a team completes within a given timeframe." It was the opposite of being outcome-focused, which is critical. This was a great read. Thanks.


Insightful. "A/B testing is a tool, not a principle. Principles, like 'starting with customer outcomes that are grounded in insights and state hypotheses about those outcomes,' guide your teams in their everyday projects." Love this point, and overall super insightful. Thanks for sharing, Mike.

Fish Food for Thought
