Multivariate Testing 101

by | Apr 4, 2018

(no — MVT is not just advanced A/B/n testing)

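There’s a lot to learn about split testing (A/B testing) and multivariate testing (MVT), and there’s a lot of industry debate about whether and when to use each. But don’t fret! This post seeks to clarify issues surrounding A/B testing and MVT by providing information about the following: what each method is, how to choose between them, the pros and cons of each, common use cases, which to run first, how to interpret results, and how to manage stakeholder expectations.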

Understanding A/B and MVT

Split Testing (aka A/B or A/B/n) tradi­tion­ally focuses on eval­u­at­ing the impact of chang­ing one element at a time. Split testing splits traffic across two choices (or more, if A/B/n testing is applied to several choices), and is a tool for decid­ing A or B: You want to know if one element outper­forms another to improve a page’s conver­sion rate? This is the tool to use.

Multivariate testing (MVT) is a way of testing that allows you to evaluate multiple element options simultaneously and measure their interactions with each other, while also identifying which elements have the greatest impact on your desired KPI. MVT is a tool for determining profitable associations (AB or AC or BC). You want to know which combination of elements works best, given your hypothesis? This is the tool to use.

There are two kinds of MVTs: In a Full Factorial MVT, all possible variants are created and tested over equal parts of traffic. In a Partial or Fractional Factorial MVT, only a selection of the possible variants is created and tested. The results of a partial test are less precise, but it requires less traffic.

Each plat­form and each analyst uses differ­ent nomen­cla­ture, but all MVTs have these core elements:

  • Element — element on page you want to change (head­line, hero image, CTA, offer, price, etc)
  • Version — each element will have 2 or more versions you want to test
  • Variant — a unique combi­na­tion of element-versions to make a recipe

An MVT Example for our Palates instead of our Screens:
Say we wanted to make toast. Because toast is delicious. Why couldn’t we perform an MVT toast-test at a party, provided we had enough friends? Say we have enough friends (traffic). Say we made sourdough toast with butter and honey for a previous party (the control), and it was a big hit, but we wanted to see if we could do better, so we’ll introduce three new versions, one for each of our elements. The elements of our MVT toast-test would be bread, something creamy, and something sweet. The versions in our toast-making would be sourdough or grainy wheat bread, butter or a soft, melty cheese, and honey or cherry compote.

But we can’t just test wheat, cheese and cherry against our control! If our second recipe won or lost against the control, we wouldn’t know why (maybe the cherry compote was too sour? Maybe the grainy wheat was too dense?). Plus, if we only compared the two toast recipes, we wouldn’t know if a previously unconsidered combination would actually win (sourdough, cheese and cherries??).

So, if we design a full-factorial MVT toast-test, we’ll need to supply every combination of versions. The total number of variants = number of versions for element 1 x number of versions for element 2 x number of versions for element 3, etc., so in this case we need 2 x 2 x 2 = 8 variants: the control + 7 variations.
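
To make the counting concrete, here is a minimal Python sketch that lists every variant of the toast test. The dictionary layout and the `full_factorial` helper are my own illustration, not the output of any testing platform; the only assumption is that the first version listed for each element is the one used in the control.

```python
from itertools import product

# Elements and their versions, taken from the toast example.
# Assumption: the first version listed for each element is the control's version.
elements = {
    "bread": ["sourdough", "grainy wheat"],
    "creamy": ["butter", "soft cheese"],
    "sweet": ["honey", "cherry compote"],
}

def full_factorial(elements):
    """Return every variant (one version per element) as a dict."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]

variants = full_factorial(elements)
control = {name: versions[0] for name, versions in elements.items()}

# The control comes out first, followed by the 7 variations.
for i, variant in enumerate(variants, start=1):
    label = "control" if variant == control else f"variation {i - 1}"
    print(label, variant)
```

Adding a third bread would take the count to 3 x 2 x 2 = 12 variants, which is why the number of friends (traffic) you need grows so quickly.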

While it is certainly possible to use split testing methodologies when making multiple changes on a page, you sacrifice the insights captured and the ability to scale the learning. If you change the heading and the hero image and the Call to Action (CTA) and outperform the control, you will not know if it was the heading, the hero, or the CTA that drove that improvement, nor will you know if the improvement was made larger due to an interaction of any 2 or all 3 of the elements (or if the improvement would have been largest if only 1 or 2 of the 3 were included).

In contrast, MVT allows you to learn all of that and to scale that learning for future testing. MVT is specifically designed to measure the impact of changes to each element as well as the interaction of each element with the others. In that way, MVT allows you to test multiple elements at the same time without sacrificing insight or scalability.

How to Choose

Choosing whether to use split testing or MVT is about as mysterious as choosing whether to use a hammer or a screwdriver: They’re each good tools for different tasks. While each tool has its pros and cons, the determining factor for which tool to use should always be the task at hand.

Each tool requires a testable hypoth­e­sis for it to be effec­tive. To their peril, analysts some­times skip this step when they’re plan­ning MVT exper­i­ments. Further, there’s a miscon­cep­tion that an MVT plan will allow analysts to do work that split testing keeps them from doing. Keep in mind that MVT will allow for incred­i­ble insights, but only if an analyst has done upfront analy­sis and hypoth­e­sis forma­tion to ensure the test runs on the most advan­ta­geous elements and with the most applic­a­ble versions of those elements.

In theory, nearly every split test could be run as an MVT. But in practice, it makes much more sense for about 1 in 10 tests to be MVT, for two reasons. First, MVT requires more resources to set up and analyze. Second, MVT results are more difficult to interpret and communicate, and because of that, it’s harder to use them to drive decision making, which should be the primary goal of your testing program.

Below, I’ve high­lighted the pros and cons of split testing and MVT and given frequent use scenar­ios for each.

Pros/Cons of each

Split Testing Pros:

  • Easy to design, inter­pret, and commu­ni­cate
  • Faster results for deci­sion-making (less traffic required)
  • Lower design/development cost

Split Testing Cons:

  • Frequently misunderstood/poorly designed tests with limited learn­ing obtained
  • Require­ment to test only one element per test limits the number of answers you can get at any one time

Multi­vari­ate Pros:

  • Helps you iden­tify which elements are most impact­ful to your Key Perfor­mance Indi­ca­tor (KPI)
  • Measures the inter­ac­tion of each element on the others
  • Ability to combine multi­ple tests into single MVT to eval­u­ate inter­ac­tions

Multi­vari­ate Cons:

  • Signif­i­cant increase in design, devel­op­ment, QA, and analy­sis time required
  • Results can be difficult to interpret and even more troublesome to communicate effectively, reducing the ability to derive insights and drive decision-making
  • Consid­er­able increase in traffic / test runtime required for readout
  • Frequently used as a “test every­thing at the same time” approach without solid evidence support­ing ratio­nale for all element versions
  • Propen­sity for “nonsense” vari­ants to be created
  • Frequent need to follow an MVT with a split test to confirm the winning “variant”, because MVTs have an increased likelihood of false positives
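
To get a feel for the traffic items in the two lists above, here is a rough Python sketch using the standard two-proportion sample size approximation. The 5% baseline conversion rate, the 20% relative lift, and the `visitors_per_variant` helper are assumptions chosen for illustration. The sketch also assumes every variant is read out on its own against the control, which is the pessimistic case; an analysis that pools traffic to measure each element’s overall effect can need less.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variant to detect the lift
    with a two-sided, two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Assumed inputs: 5% baseline conversion, hoping to detect a 20% relative lift.
per_variant = visitors_per_variant(baseline=0.05, relative_lift=0.20)
for name, n_variants in [("A/B", 2), ("2x2x2 full-factorial MVT", 8)]:
    total = per_variant * n_variants
    print(f"{name}: {n_variants} variants, ~{total:,} visitors in total")
```

Under these assumptions the per-variant requirement is the same for both tests, so the 8-variant MVT needs about four times the total traffic of the A/B test.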

Use cases (when to use / what prob­lems)

  • Newly designed layout or page and want to figure out where to focus opti­miza­tion efforts — MVT
  • Spend­ing a lot of time and resources design­ing multi­ple versions of hero images and want evidence to support the impor­tance of hero images — MVT
  • Limited avail­able data to support hypoth­e­sis for why page is under­per­form­ing due to lack of tags or age of page — MVT
  • Looking for insights that you can scale through­out the site and across other digital prop­er­ties — A/B
  • Already know which element is the most impor­tant on the page and want to find the right version — A/B
  • Clear hypoth­e­sis supported by data recom­mend­ing specific changes to single element — A/B

Which Comes First?

It’s important to remember that some jobs may require both split testing and MVT, just as some jobs may require a hammer and a screwdriver. Based on test outcomes and the goals of your tests, you might need to perform split testing after MVT to validate MVT results or an MVT variant winner. We do this for two really good reasons. First, MVTs have a higher false positive rate than A/B tests, due to the simple fact that the chance of a false positive increases with the number of variants. Second, Partial/Fractional MVTs can result in a winning variant that the tool identified from interaction measurements but that was never itself tested on live traffic.
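
The false positive point is easy to check with a little arithmetic. This Python sketch computes the chance of at least one false positive when several challengers are each compared to the control at a 5% significance level. It assumes the comparisons are independent, which is a simplification (in a real test they all share the same control), and the function name is my own.

```python
def family_wise_error_rate(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

# 1 challenger = a plain A/B test; 7 challengers = the 8-variant toast test.
for challengers in (1, 3, 7):
    rate = family_wise_error_rate(0.05, challengers)
    print(f"{challengers} challenger(s) vs. control: {rate:.1%} chance of a false positive")
```

With one challenger the chance is 5%; with three it is roughly 14%, and with seven roughly 30%, which is why a confirming split test on the winner is so often worth running.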

You might also need to perform MVT after a split test to incrementally improve elements of the winning split test design: a split test may select a new page or layout, and MVT can then refine that page or layout by testing multiple variants of its elements. MVT can also be used to help identify which elements are most important to page performance.

Results Interpretation

One of the most impor­tant (and most frequently disre­garded) elements of any opti­miza­tion test is the ability to inter­pret the results.

Split test results are fairly easy to interpret. However, even they run the risk of misstating the impact if the analyst does not understand or cannot adequately explain confidence intervals.

MVT results can be very chal­leng­ing to inter­pret, as they give more than just recipe B lift over recipe A with X confi­dence. MVTs also include inter­ac­tion effects and element contri­bu­tion. Inter­ac­tion effects are the measure of how each element inter­acts with other elements on the page to increase (or decrease) the KPI impact. Element contri­bu­tion speaks to the impor­tance of a given element on the KPI outcomes.
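
As a rough illustration of those two ideas, here is a Python sketch for the simplest case: a full-factorial test of two elements (say, headline and hero image) with two versions each. The conversion rates are made-up numbers, and the helper names are hypothetical. The interaction is computed as a plain difference of differences; some tools halve this value, so treat the scale as a convention rather than a standard.

```python
# Hypothetical conversion rates for a 2x2 full-factorial test (made-up numbers).
# Keys are (headline version, hero version); 0 = control version, 1 = new version.
rates = {
    (0, 0): 0.040,
    (1, 0): 0.046,
    (0, 1): 0.043,
    (1, 1): 0.055,
}

def main_effect(rates, element):
    """Average change in rate when `element` goes from version 0 to version 1."""
    new = [r for key, r in rates.items() if key[element] == 1]
    old = [r for key, r in rates.items() if key[element] == 0]
    return sum(new) / len(new) - sum(old) / len(old)

def interaction(rates):
    """How much the headline's effect changes when the hero changes too.
    Zero would mean the two elements act independently of each other."""
    effect_with_new_hero = rates[(1, 1)] - rates[(0, 1)]
    effect_with_old_hero = rates[(1, 0)] - rates[(0, 0)]
    return effect_with_new_hero - effect_with_old_hero

headline = main_effect(rates, 0)
hero = main_effect(rates, 1)
print(f"headline main effect: {headline:+.4f}")
print(f"hero main effect:     {hero:+.4f}")
print(f"interaction:          {interaction(rates):+.4f}")
```

In this made-up data the headline matters more than the hero on average, and the positive interaction says the new headline helps more when it is paired with the new hero image.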

Confidence intervals and margin of error: It’s important to represent “lifts” measured in both split and multivariate testing in a way that includes the ranges inherent in confidence intervals and margin of error. In practice, that means reporting a lift as a range rather than as a single number.
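
Here is a minimal Python sketch of reporting a lift together with its range, using a normal-approximation (Wald) interval for the difference between two conversion rates. The visitor and conversion counts are hypothetical, and `lift_interval` is a name I made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def lift_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Absolute lift of B over A with a normal-approximation (Wald) interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a
    std_err = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return lift, margin, (lift - margin, lift + margin)

# Hypothetical counts: 500 of 10,000 visitors convert on A, 560 of 10,000 on B.
lift, margin, (low, high) = lift_interval(500, 10_000, 560, 10_000)
print(f"lift: {lift:+.2%} +/- {margin:.2%} (95% CI {low:+.2%} to {high:+.2%})")
```

With these counts the measured lift is 0.6 percentage points, but the interval still dips just below zero, which is exactly the kind of nuance that gets lost when only the headline number is shared.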

Managing Stakeholder Expectations

Being able to commu­ni­cate results is an impor­tant part of A/B and MVT testing. However, it is equally impor­tant to commu­ni­cate your expec­ta­tions for the trajec­tory of the entire testing project up front. You’ll need to commu­ni­cate your plan and expec­ta­tions, not just your results.

The most impor­tant part of an effec­tive opti­miza­tion program is that program’s ability to drive change and help the orga­ni­za­tion make better deci­sions. To do this, the opti­miza­tion program owners must be able to effec­tively commu­ni­cate both the results and their recom­mended actions.

MVT results can be more diffi­cult to compre­hend and explain, so it’s impor­tant not to share too much data. Stick to a few core insights, for example these:

  • Most impactful element: the element that seems to be most important to visitors for moving them to where you want them to be.
  • Least impactful element: an element where you can feel free to do lower confidence testing/targeting going forward without risking conversion.
  • Element interactions: where any two elements seem to impact each other, that relationship must be explained and recommendations provided for how to consider future tests or site updates.
  • Most successful variant: a visual should be created that clearly shows the most successful combination of element versions. If numbers exist (the most successful variant is not always tested when using partial / fractional factorial methods), the lift and confidence over the control should be shared. If the most successful variant was not tested, a follow-up A/B test should be recommended with existing control vs. winning variant.

MVT recom­men­da­tions might include:

  • Most impor­tant element to consider when creat­ing new page layouts / designs for tests or live site
  • Element rela­tion­ships to consider when creat­ing new page layouts / designs for tests or live site
  • Recom­mended roll-out of winning variant OR follow-up A/B with winning variant

TL;DR (Too Long; Didn’t Read for those of us older than millennials…)

Hopefully this article has debunked the faulty ideas that MVT is somehow superior to split testing because it is more sophisticated, or that MVT is an “advanced version” of split testing that should be used for everything. Neither of these ideas is true: Each method is simply another tool in the toolbox. When deciding whether to use either of these methods, don’t start with the tool, start with the problem. If the problem is a screw, you need a screwdriver. If the problem is a nail, you need a hammer. Communication is the clearest way to discern the problems, and the solutions, at hand.