Does The AI Test Replace The AB Test?

Posted in Blog on August 31, 2017 by Marc

Artificial Intelligence (AI) is hot. Not a day goes by without some article about a ground-breaking new AI application. For laypeople – like myself – it often sounds like abstract hocus pocus. And frankly, sometimes it’s flat-out scary shit. Not surprising, then, that big names like Stephen Hawking, Elon Musk and Steve Wozniak are warning about the dangers of AI.

AI IN CONVERSION OPTIMISATION

AI is also making its entry into conversion optimisation, specifically in testing. Relax, it’s a harmless application, not something Hawking and friends need to worry about. It is an AI test tool.

The tool was developed a while ago by Sentient Ascend and appears to be on its way to duking it out with the AB test tools. But can this new technology really compete with the AB test tools on the market? We spoke with Sentient Ascend, and this is our own humble opinion.

AB VS MVT

To clarify how the AI tool works, we first need to take a detour through the classic AB test and the multivariate test (or MVT)…

You can compare the AI test tool with an ordinary AB test or MVT tool: You add a code snippet to your site and then create versions of certain pages in the tool. You let the tool run and analyse the results at the end of the testing period to determine which version is the winner.

An AB test tool needs sufficient time to determine which version is the winner. Depending on which tool you use, the result is calculated with either 'frequentist statistics' or 'Bayesian statistics'. So, statistics. With Bayesian, you typically get results slightly faster, but depending on the amount of traffic and conversions you have, you will still have to let an AB test run for at least 2 weeks, and in many cases even 3 to 4 weeks.
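
To give a feel for why an AB test needs weeks, here is a rough, purely illustrative sketch of the classic frequentist sample-size calculation for comparing two conversion rates. The function name, the conversion rate, the lift and the traffic figure below are assumptions of ours, not numbers from the article or from any specific tool:

```python
from statistics import NormalDist

def ab_sample_size(base_rate, lift, alpha=0.05, power=0.80):
    """Rough per-variant sample size for a two-proportion z-test (illustrative only).

    base_rate : current conversion rate (e.g. 0.03 for 3%)
    lift      : relative uplift you want to detect (e.g. 0.10 for +10%)
    """
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int(round((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2))

# Assumed example: 3% conversion rate, hoping to detect a +10% relative lift.
print(f"~{ab_sample_size(0.03, 0.10):,} visitors per variant")
# At roughly 100,000 visitors per month split over two variants,
# that alone already takes about a month of testing.
```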

With most AB test tools, you can also set up multivariate tests (MVT). You then create, for example, 2 headlines and 2 different images. The test tool makes all possible combinations: headline 1 with image 1, headline 2 with image 2, headline 1 with image 2, and headline 2 with image 1. The outcome of such a test is – just like the AB test – calculated on the basis of 'Bayesian statistics' or 'frequentist statistics'. And because you can make many different combinations, you need a whole lot of traffic to complete such an MVT within an acceptable period. Each combination is in fact a new version.

Multivariate testing

To put it simply, in this example you do not create a normal AB test, but an A/B/C/D test. This example with 2 elements (headline and image) is still a fairly limited MVT. Often, many more elements are tested, and then the number of possible combinations rises very quickly.

And because each version needs sufficient traffic and conversions, an MVT has to run for a very long time before you can draw any useful conclusions.
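
To make that combinatorial explosion concrete, here is a small sketch. The elements and variant counts are hypothetical, just to show how quickly the number of versions grows:

```python
from itertools import product

# Hypothetical elements and variants.
elements = {
    "headline": ["headline 1", "headline 2"],
    "image": ["image 1", "image 2"],
}

versions = list(product(*elements.values()))
print(len(versions), "versions")   # 4 versions: the A/B/C/D test from the example

# Add one more element with 3 variants...
elements["cta"] = ["cta 1", "cta 2", "cta 3"]
print(len(list(product(*elements.values()))), "versions")   # 2 * 2 * 3 = 12

# Each of those 12 versions needs enough traffic and conversions of its own,
# which is why an MVT quickly demands hundreds of thousands of visitors.
```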

WHEN AB AND WHEN MVT?

To understand when AI tests are useful, it is important to first understand when an MVT is useful.

An important criterion: You need a lot of traffic for MVTs. If you only have just enough traffic and conversions for AB testing, then the decision is simple: MVT is not for you. If you do have enough traffic (and then we are talking about hundreds of thousands of visitors or more per month), MVT might be interesting. In certain cases.

An AB test is often much more interesting than an MVT, because you can test much more drastic changes with an AB test (e.g. testing 2 completely different pages against each other). And if you test bigger changes, they usually also deliver bigger results. In addition, an AB test is faster than an MVT and – last but not least – an AB test gives you much more insight into your visitors. That is often a very good basis for coming up with even better hypotheses for further tests and achieving even more results.

An example: Suppose your shopping cart page is problematic. The visual hierarchy is not correct, so the flow from the shopping cart page is not good, as in this example. (Disclaimer: This is not a customer of ours, so we cannot see the data; we are only using this as an illustration.):

Now you could make a number of possible combinations with an MVT by combining various elements on the page differently.

However, because there may be several issues on this page, it may be more useful to set up a much more drastic test: test the whole shopping cart page against a completely different page in which you change the visual hierarchy entirely. Because your B version looks dramatically different from your A version, there is a real chance that you will also see a bigger difference in conversion rate. And if your B version wins, you know where the problem lies, namely in the visual hierarchy. In other words, you learned something from your test, and you can use that insight in subsequent tests. To start, you can set up a follow-up test on the same page, in which you try to improve the visual hierarchy even further. But you can also ask whether the visual hierarchy is a problem on other pages, and set up tests there as well. We learned from the test that the visual hierarchy causes problems for our visitors, and that may not only apply to the shopping cart page.

With an MVT, the result would have been: 'This combination of elements works best.' But why that is the case is anyone's guess. As a result, you cannot build on the insights from your test, and you cannot come up with new winners based on those insights.

An MVT is therefore primarily interesting as a follow-up test for fine-tuning specific things. You test with an AB test (or a series of AB tests) until you have found the optimal layout, and then fine-tune further with an MVT.

AI: MVT ‘ON STEROIDS’

As with an MVT, with the AI test from Sentient Ascend you can combine a great many different elements. The tool then looks at how well or how poorly the different combinations of those elements convert.

But the big difference with Sentient Ascend lies in how the winners are calculated. They do not use a classic statistical calculation (Bayesian or frequentist statistics) for this, but determine the winners, uh, in a different way… Frankly, for non-AI specialists, how the technology really works is quite fuzzy. Or better said: exactly how a winner and a loser are determined.

This is how you work with Sentient Ascend: First, you set up everything you want to test in the tool. Suppose, for example, you want to test 3 elements: headline, CTA and image, and you have 4 versions of the headline, 5 versions of the CTA and 3 versions of the image. The tool then tests each of the elements separately. If you want to test 4 headlines, Sentient first looks at which of the 4 converts best, and the same applies to the CTAs and the images. After the first round, the tool knows which version of each element has 'good genes' (as they call it themselves) – in other words, which version of each element converts best. The tool then combines the winning elements, e.g. a winning headline with a winning image, a winning image with a winning CTA, etc. The winners are then combined with other winners, and the site thus evolves towards the strongest possible version of itself. That is why Sentient Ascend compares it with natural evolution.
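
Sentient Ascend does not publish the exact algorithm, but the 'good genes' idea resembles an evolutionary search: keep the best-converting variant of each element and recombine those winners into new candidate pages. Purely as an illustration of that concept – the element names, conversion rates and helper functions below are our own invented example, not Sentient's actual method:

```python
import random

# Hypothetical element variants with made-up conversion rates from a first test round.
results = {
    "headline": {"H1": 0.021, "H2": 0.034, "H3": 0.025, "H4": 0.030},
    "cta":      {"C1": 0.028, "C2": 0.022, "C3": 0.035, "C4": 0.027, "C5": 0.031},
    "image":    {"I1": 0.026, "I2": 0.033, "I3": 0.029},
}

def best_genes(results, keep=2):
    """Keep the top-converting variants ('good genes') per element."""
    return {
        element: sorted(rates, key=rates.get, reverse=True)[:keep]
        for element, rates in results.items()
    }

def recombine(genes, n_candidates=4):
    """Randomly recombine winning variants into new candidate pages."""
    return [
        {element: random.choice(variants) for element, variants in genes.items()}
        for _ in range(n_candidates)
    ]

winners = best_genes(results)
for page in recombine(winners):
    print(page)   # e.g. {'headline': 'H2', 'cta': 'C3', 'image': 'I2'}

# In the next 'generation' these candidates would be served to visitors again,
# and the cycle repeats until the page converges on a strong combination.
```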

It may not be entirely transparent how the winners are calculated, but one thing is certain: the AI test tool determines much faster than an AB or MVT tool which version performs better. That not only means that you get results faster and can test more combinations, but also that you need less traffic than with a classic MVT to reach a conclusion. In that sense, you might see Sentient Ascend as 'MVT on steroids'.

DISADVANTAGES

If you ever ask Sentient Ascend for a demo, your initial enthusiasm may be great, because it seems like a really slick tool. But if you think about it for a while, you notice some clear disadvantages:

  • You do not learn anything: just like with an MVT, the result is a version of the page that performs best. But you really do not know why. That means you cannot use these insights for subsequent tests and thus for the next winners.
  • You have to set up many different test ideas in the tool at the same time. That not only means that there is a great deal of set-up work in the beginning (and so it may take a while before the test can run live), it also means that you have to have a lot of test ideas. There may be a temptation then to just throw random things into the tool – just because you can. But every test is only as good as its hypothesis. That applies not only for AB or MVT, but also for AI. If you toss in trash, you cannot expect miracles.
  • QA: Previous research has shown that 40% of all AB tests that are set up are actually broken. For example, the version does not work in one particular browser, or it looks OK but the button suddenly stops working. That's why we devote a great deal of attention at dexter.agency to QA (Quality Assurance): we make sure each test works properly and therefore spend a lot of time testing each test in each browser. If your test does not work properly, it distorts the data from your test and you may reach the wrong conclusions, which can cost you a lot of money. To take full advantage of the Sentient Ascend tool, you have to test many elements simultaneously. As a result, the number of possible combinations increases exponentially and QA becomes practically impossible. An improperly functioning version may therefore distort the data without you being aware of it.
  • Prices start at 3000 EUR per month. And then you only have the tool. You still have to invest in the ‘brains’ that can get the most out of the tool: CRO specialists who come up with good hypotheses and developers who can set up all the test ideas in the tool. At this high price, this tool is not for everyone…
  • In the end, you still need an AB test: Sentient Ascend recommends validating the outcome of the AI test with an AB test in which the final winner from the AI test is tested against the original Control.

CONCLUSION

AI testing seems like a promising technology, but there are clearly still some downsides – not least the very high price. And so, at the moment, it seems like something for the (very) big players. We think that AI testing will not yet replace AB testing, but it can be an interesting replacement for MVT.

In any case, we are keeping a close eye on this technology and its evolution, because it does hold a lot of promise. If the entry price becomes more accessible than it is now, it could become a little more mainstream in the future. But for the moment, we do not see this AI test tool as a competitor to the classic AB test tools on the market.

