Why AB and Multivariate Testing Sucks
I’ve written quite a few articles on Medium and dabbled in topics around conversion rate optimization, the area I specialize in. Today we are going to dive deeper: I’ll explain the ins and outs of testing so you can use it for your business!
Let me give you a bit of a primer if you’re new to CRO.
Conversion rate optimization is the practice of increasing the number of visitors who “convert,” or take a desired action, on a webpage or application.
To find out whether the hypothesis behind increasing conversions is valid, you can test it by conducting either an A/B or an MVT test.
An A/B test, known in the digital marketing industry as split or bucket testing, is a method of comparing two versions of a webpage or app to see which performs better.
An MVT is a technique for testing multiple variables within a single test. For example, 2 different headlines combined with 2 different images yield a total of 4 variations.
Let’s get into the meat of it.
AB and MVT testing isn’t for everyone
After conducting 4000+ AB and MVT tests with our clients, I am confident in saying:
“AB and MVT testing isn’t for everyone.”
Unfortunately, the tide of “Always be Testing” and following best practices has muddled the true process and purity of testing.
An A/B or MVT test takes more than creating a variation that somehow resembles the original with a few additional changes. It’s about identifying a problem and providing a data-backed hypothesis to build a better experience than the original offers.
This article offers a peek at some of the dark areas that your AB or MVT testing program can fall into, whilst offering advice on how to avoid some of the common pitfalls that stand in the way of optimizing for conversions.
If you can’t macro, micro
The harsh truth is that, if you want to see meaningful A/B tests, your site needs at least 200 to 300 conversions per month. Not all companies have that volume. Does this mean smaller companies with fewer than 200 conversions per month should not pursue testing?
The answer is to look at conversion goals differently. Every test has a conversion goal. Most often the goal is the final action you want the user to take, such as making a purchase or subscribing to your site. That’s what is called a macro goal.
For a lower-traffic, low-conversion website, consider making the conversion goal either an event or a next-page visit (a micro goal).
Why does low traffic pose a challenge?
If your website receives only a few visitors per month, it will be difficult to reach the statistical significance needed to determine which variation is the winner. Statistical significance is tied to reaching a certain sample size.
To illustrate this better, imagine you launched a test where you changed the text of the call-to-action button from “sign up now” to “join us now.” After a certain period of time, you noticed a 10% improvement for the variation.
Before naming this variation the “winner,” you need to be certain that you have the required sample size, so that the uplift in conversions is not due to chance and you have indeed reached statistical significance of 95%+. Here is an example:
· Original: 2,900 visitors
· Variation: 2,900 visitors
Total visits needed to conduct the test: 5,800
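To make the sample-size requirement concrete, here is a minimal Python sketch using the standard pooled-variance approximation for comparing two proportions at 95% confidence and 80% power. The baseline rate and relative lift below are hypothetical; your own figures will differ, which is why the 2,900-per-variation number above applies only to its specific scenario.

```python
from math import ceil

def sample_size_per_variation(baseline_rate, relative_lift,
                              z_alpha=1.96, z_beta=0.84):
    """Visitors needed per variation for a two-sided test at
    95% confidence (z_alpha = 1.96) and 80% power (z_beta = 0.84),
    using the pooled-variance approximation for two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2  # pooled conversion rate
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p2 - p1) ** 2
    return ceil(n)

# Hypothetical example: 10% baseline conversion rate, 10% relative lift
n = sample_size_per_variation(0.10, 0.10)
print(n, 2 * n)  # visitors per variation, and total across both
```

With these inputs the formula demands roughly 14,700 visitors per variation, which illustrates how quickly a small effect on a low-traffic site exhausts the available sample. Detecting larger lifts requires far fewer visitors.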
Basing your next changes on faulty statistics can impact you negatively and increase the chance of launching a false positive as a winning variation.
Sometimes, setting the goal to a next event or page on the site can help you reach the correct sample size for a statistically sound test. Or, if you decide to run a specific campaign, you may get an influx of traffic that helps you reach that sample size.
In some cases you can run your test for 2–3 months to reach statistical significance, but that raises the chance of data pollution, which is explained next.
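As a back-of-the-envelope check on duration, divide the total sample the test needs by the traffic entering the experiment each week. The 5,800 figure comes from the example above; the 500 weekly visitors is a hypothetical low-traffic site:

```python
from math import ceil

def weeks_to_run(total_visitors_needed, weekly_test_visitors):
    """Rough test duration: total visitors required across all
    variations divided by weekly traffic entering the test."""
    return ceil(total_visitors_needed / weekly_test_visitors)

print(weeks_to_run(5800, 500))  # prints 12, i.e. roughly 3 months
```

Twelve weeks lands squarely in the 2–3 month danger zone, which is exactly where the risk of data pollution starts to climb.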
“An Inconvenient” Problem
The truth is, there are many factors that can pollute your data, making your A/B test results less reliable or invalid (and increasing the chance of launching a false positive). Many marketers are not even aware of them. To avoid falling into the same trap, it’s important to understand the different types of data pollution, such as biased samples, length pollution, pollution due to external factors, and 10 other types.
Dismissing and not accounting for these factors (as well as not following a process) is the reason why, as Google reports, only 13% of tests provide positive uplifts.
Rapid testing is fine, so long as there is a defined process and a data-backed hypothesis for every test. Considering the above will also ensure that your experiment gives sound, conclusive, statistically significant results.
Visual editors are a lie
Just because visual editors make it easy for you to test, it doesn’t mean you have to jump into testing variables that add no value to your brand and make no difference.
Visual editors are only good when you need to test something quick and easy without professional help. In any other case, you’ll be wasting time and money by relying on the simplicity of the tests a visual editor can offer instead of on your development team.
Out of all the tests we have implemented so far, only 5 to 10% were implemented using only a visual editor. The point is, if you just need to rearrange elements, add or remove images, modify text, or change colors and fonts, the editor can be enough. This is true only when your test hypothesis is sound. If it isn’t, it doesn’t matter how well placed those elements are.
If your test requires a complicated layout or modifications to dynamic elements such as sliders, carousels, or popups, frontend development skills are required to implement it. You can’t rely on the editor alone.
A/B testing is only good when you test what matters and what reflects the company’s OKRs. Your test variations should meaningfully change the experience and substantially impact business goals and outcomes. But how would you know what is best to test?
Clearly AB testing doesn’t happen in a vacuum and requires intense qualitative and quantitative research to uncover problems and opportunities.
Forget about imitating your competitors; it will get you nowhere. And steer clear of blindly implementing best practices! It’s not the solution a best practice offers that matters, but the problem it was solving may be worth considering.
There’s no golden rule when it comes to optimization
Every website is unique and requires something different. Choosing your tests based on what you or the HiPPO (highest paid person’s opinion) at your company thinks isn’t going to work either.
Following the Conversion Framework will help you discover and spot what really matters to your users.
As you walk through the pages of your site and compare them against these principles, you’ll be able to determine quick fixes, opportunities for growth, and areas that need further research to determine the viability behind testing them.
Other methods of collecting data about the health of your site and areas that need to be fixed or tested include:
- Conducting Heuristic Evaluation
- Conducting Qualitative and Quantitative Analysis
- Conducting Competitive analysis
- Creating a Conversion Roadmap
It’s not all fun and games
Failing is not always negative, provided you are able to learn from it and move forward. But that sounds pretty broad and general.
Well, in A/B testing, failing is inevitable: the success rate is very low, estimated at only around 13% of the tests you conduct. This means the majority of tests are a big letdown, and you have to work twice as hard.
But even worse, sometimes you reach the sample size and statistical significance, yet the test is still a false positive and drives down your numbers after you launch the new design.
If you follow the above advice, you will not see such low returns and success rates; that average reflects poor practices within the A/B and MVT testing industry.
Remember, any testing program requires meticulous attention to detail and follow-up. You can’t just launch a test and forget about it. You need to be constantly reviewing and considering how to modify and improve your site.
Did you find this article interesting? Please like and share.
You can also read more great content on the Invesp blog. Or, if you’re the visual type, find great content and information on our SlideShare account and YouTube channel. Connect with us on Google+ and Twitter. You can also sign up for our amazing, all-in-one growth-hacking CRO tool absolutely FREE: visit FigPii for access to our new testing engine, heatmaps, exit intent pop-ups, unlimited resources, and other great features.