[Metrics] A/B Testing, Feature Flipping and going too far

March 17, 2013 Graham Siener

A/B testing is probably not worth your time. When you start hooking metrics up to your product, the feedback is addictive. All of a sudden you’ve got lots of actionable data and you’re tacking validation goals onto feature stories. This is great, but I implore you to not take it too far.

You’ve probably read stories proclaiming how effective A/B testing is for Twitter, 37signals and even the Obama campaign. There’s no shortage of third-party services that boast one-click setup via a JavaScript snippet and claim to deliver a double-digit boost to your bottom line.

I talked in my last post about the concept of opportunity cost, and it’s through this lens that I view excessive testing and experimentation. If you are still in growth mode, you’re still figuring out what A is. There’s too much at stake (and too few developer cycles) to distract yourself with subtle experiments that are ripe for invalidation by small sample sizes, statistical insignificance, and indecision.
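To make the sample-size point concrete, here is a rough sketch of the standard two-proportion calculation, using only the Python standard library. The function name, the 5% significance level, the 80% power and the conversion rates in the usage lines are all illustrative assumptions, not numbers from any product mentioned here.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant to detect the given lift.

    Uses the usual normal-approximation formula for comparing two
    proportions with a two-sided test. All defaults are assumptions.
    """
    if baseline_rate == expected_rate:
        raise ValueError("the two rates must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value for the desired power

    pooled = (baseline_rate + expected_rate) / 2
    spread_null = math.sqrt(2 * pooled * (1 - pooled))
    spread_alt = math.sqrt(
        baseline_rate * (1 - baseline_rate)
        + expected_rate * (1 - expected_rate)
    )

    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return math.ceil(numerator / (expected_rate - baseline_rate) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: a 5% signup rate that you hope to lift to 6%.
    print(visitors_per_variant(0.05, 0.06))
    # A subtler change (5% to 5.5%) needs far more traffic.
    print(visitors_per_variant(0.05, 0.055))
```

Under these assumptions, even a one-point lift calls for several thousand visitors per variant, and the requirement grows quickly as the expected difference shrinks. That is the traffic a young product usually doesn’t have to spare.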

“One consequence of this data-driven revolution is that the whole attitude toward writing software, or even imagining it, becomes subtly constrained. A number of developers told me that A/B has probably reduced the number of big, dramatic changes to their products. They now think of wholesale revisions as simply too risky—instead, they want to break every idea up into smaller pieces, with each piece tested and then gradually, tentatively phased into the traffic.” — The A/B Test (Wired)

Sounds like the agile software development process, right? The difference is that agile gains efficiency and transparency by splitting feature work into atomic units of customer value. You risk building broken software when you split features into chores that aren’t customer-focused; similarly, you risk building a broken product when you try to decompose the UX into lots of trivial tests.

I say all this because I’ve employed A/B testing in a couple of startups and we never got our bang for the buck. At one, we used Optimizely but found the integration points to be lacking[1] when we wanted to focus on anything embedded into our app experience. Landing pages were easy enough to test but acquisition is only one of your challenges.

We then moved to A/Bingo, a framework written by the amazing Patrick McKenzie[2]. This felt like a framework we could grow into, but we were also moving from server- to client-side functionality and we had to shoehorn the testing payloads into a homegrown API. The result was way too much time invested in infrastructure and not enough time delivering more customer value. It still kills me to think about the time we devoted to just getting a great new feature to the starting line.

I then joined a startup that had rolled their own A/B testing for life-cycle and transactional emails. I didn’t even realize this was going on until we started adding KISSmetrics tracking to the emails. What was the result of all of this wonderful testing? It turned out that we weren’t storing any of the results, and had been sending only one variant for the last year. Whoops!

We did have some success with a feature flipper powered by Rollout. A feature flipper lets you enable functionality for specific customers or a controlled subset of your audience. We weren’t using it ambitiously, but it was helpful to have the plumbing in place to deliver new features. I was eager to give it a try, but any changes we wanted to deploy were largely tested and validated before we started building. Perhaps we should’ve tried turning off features that we suspected weren’t valuable, but we never got around to it (limited cycles and all).
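For readers who haven’t seen one, here is a minimal sketch of the idea behind a feature flipper. It is written in Python for illustration and is not Rollout’s API; the class, its method names and the feature name in the usage lines are all hypothetical.

```python
import hashlib


class FeatureFlipper:
    """Toy in-memory feature flipper (hypothetical, not the Rollout gem)."""

    def __init__(self):
        self._allowed_ids = {}   # feature name -> set of customer ids
        self._percentages = {}   # feature name -> integer from 0 to 100

    def enable_for(self, feature, customer_id):
        """Turn a feature on for one specific customer."""
        self._allowed_ids.setdefault(feature, set()).add(customer_id)

    def enable_percentage(self, feature, percentage):
        """Turn a feature on for roughly this share of all customers."""
        if not 0 <= percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        self._percentages[feature] = percentage

    def is_enabled(self, feature, customer_id):
        if customer_id in self._allowed_ids.get(feature, set()):
            return True
        return self._bucket(feature, customer_id) < self._percentages.get(feature, 0)

    @staticmethod
    def _bucket(feature, customer_id):
        # Hash feature and customer together so each customer lands in a
        # stable bucket (0 to 99) that differs from feature to feature.
        key = f"{feature}:{customer_id}".encode("utf-8")
        return int(hashlib.sha256(key).hexdigest(), 16) % 100


if __name__ == "__main__":
    flipper = FeatureFlipper()
    flipper.enable_for("new_dashboard", customer_id=42)   # a specific customer
    flipper.enable_percentage("new_dashboard", 10)        # plus about 10% of everyone
    print(flipper.is_enabled("new_dashboard", 42))
```

Hashing rather than picking at random means a given customer keeps seeing the same behavior across requests, and raising the percentage only ever adds customers. A real deployment would keep this state in a shared store instead of process memory.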

I look forward to the day I can enthusiastically get behind A/B testing, but until then I will encourage anyone who asks to consider what else they could do with their time.

Is A/B testing worth it for you? Do you have any horror stories to share?

[1] Caveat emptor: I haven’t used Optimizely in a few years, so I’m not an expert on their current functionality.

[2] A/B testing is definitely worth the time to Patrick (because he has found his product/market fit). I encourage you to read through everything he’s written if you’re building a SaaS app.
