Let the Bandit Decide: Optimization with Machine Learning

Mar 14, 2019

We’ve all heard of the one-armed bandit: the slot machine. The consummate victor engineered to best all players. But cue the lone tumbleweed and standoff music, because it’s high noon, and a stranger’s come to town.

Enter machine learning’s sinister-sounding solution: the multi-armed bandit. He’s not going to play the slots like any old two-armed, one-lever-pullin’ Joe—ooooh, no. The multi-armed bandit is going to pull all the levers—that’s all the levers at once—and he’s going to learn quickly which machines pay fastest and when they’re paying fastest, and he’s going to play those particular machines and win(!), until he’s won everything (everything!), until the barkeep goes home crying and the casino is a penniless wasteland.

This aspect of the bandit problem has important implications. It eliminates our paradigm of measuring against a control, because a multi-armed bandit doesn’t test against an existing control; it tests all experiences against one another at once.

Typically, in standard A/B/n or MVT experimentation, we measure the lift of a winner against a control. With the bandit, you cannot say, “an x% lift translates to y dollars annualized;” all you can say is, “this is your best performer.” Multi-armed-bandit experimentation might help us shift the conversation from “how much can we expect?” to “what works best?” In this way, we’ll be able to transfer the precious skull sweat we currently spend calculating upside to actually creating optimal user experiences.

Further, with the multi-armed bandit approach, you cannot say with statistical confidence what your worst performer is in a given experiment. Why? Because the bandit has greedily shifted traffic away from the non-winning experiences in order to exploit the gains from the higher performers. That means the lower-performing recipes will never accumulate the samples (statistical power) necessary to determine which is worst.
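You can see this traffic-starving behavior in a simulation. Below is a minimal epsilon-greedy sketch—one of several bandit algorithms, and not necessarily the one any particular vendor uses—with made-up conversion rates. Most pulls concentrate on whichever arm looks best, so the losing arms end up with too few samples to rank against each other.

```python
import random

def run_bandit(true_rates, trials=10_000, epsilon=0.1, seed=42):
    """Epsilon-greedy bandit: explore a random arm with probability
    epsilon, otherwise exploit the arm with the best observed rate."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(trials):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore
        else:
            # Exploit; unpulled arms get an optimistic 1.0 so each
            # arm is tried at least once.
            arm = max(range(len(true_rates)),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 1.0)
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]  # simulated conversion
    return pulls

# Hypothetical experiences A, B, C with 4%, 5%, and 6% conversion rates.
pulls = run_bandit([0.04, 0.05, 0.06])
print(pulls)  # traffic is lopsided; the apparent winner hoards the samples
```

The exact split depends on the random seed, but the pattern is the point: the arms the bandit abandons never get enough traffic to compare with any statistical rigor.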

Why should you care which is worst? Well, maybe you don’t. Normally you wouldn’t. But you certainly couldn’t calculate the opportunity cost of choosing the losing variant, as many programs try to do in an effort to measure and report program ROI and the “savings” from not rolling out a subpar experience. Again, this shifts the way we think and communicate about our program. I would say that this shift is a good move—away from calculating ROI and toward focusing on the very thing our programs are typically named after: optimization.

The Minimum Detectable Effect Conundrum (aka, My Formerly FAQ)

One of the most common questions I get from my clients is about runtime calculations: “But how do I know what my minimum detectable effect (MDE, minimum lift, etc.) should be?” I have the same answer every time: “How big a lift would you need to see to go with the new experience?” I normally get blank stares, which lead to me (unhelpfully) lecturing about the cost of testing (it ain’t free, y’all!) and how they should, at a minimum, want to see a lift that would outweigh the costs of the test itself.

Do you know how many have been able to answer that question? Zero. That’s right: zero. Everyone can come up with a scenario where they would need that information—perhaps a new third-party recommendations engine, or a test to remove a revenue-generating ad from a key funnel page. In each scenario, we know the challenger must provide gains that outweigh or erase the costs of the decision being made. Easy!

But what if you’re deciding between banner A or B? Page template X or Y? The assets have already been created. There’s no additional cost to push them live. So…what does your lift need to be? Does it even matter, so long as it’s measurably better? But we can’t plug a minimum lift like, say, 1% into a runtime calculator (unless you’re one of those programs gifted with an abundance of traffic and wonderful conversion rates) without seeing runtime estimates that leave us gasping for air.
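To see why a 1% minimum lift leaves us gasping, here’s a back-of-envelope sample-size sketch for a two-proportion test (real runtime calculators vary in their details, and the baseline conversion rate here is an assumption for illustration):

```python
from statistics import NormalDist

def sample_size_per_variant(base_rate, mde_rel, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative
    lift of mde_rel over base_rate (two-sided test)."""
    p1 = base_rate
    p2 = base_rate * (1 + mde_rel)  # challenger rate at the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power threshold
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# Detecting a 1% relative lift on a 3% conversion rate takes millions
# of visitors per variant...
print(sample_size_per_variant(0.03, 0.01))  # on the order of five million
# ...while a 10% relative lift needs only tens of thousands.
print(sample_size_per_variant(0.03, 0.10))
```

The hundredfold gap between those two numbers is the conundrum in miniature: the smaller the lift you’d accept, the longer the test you have to run to prove it.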

What’s the solution? We tried to back into lift estimates by using maximum runtime in this version of our calculator, and that helps to some extent. It helps you judge whether those lifts are likely, at least, and knowing that can help with prioritization. But it still doesn’t answer that niggling question: how big does a lift need to be for the business to make a decision?

Is the answer truly “any lift”? Because if it is—and I suspect it often is—then why do we spend so much time arguing over lift percentages and annualized impacts? Why can’t we stand up and say, “B is better than A. Go with B”?

The New Frontier

This is essentially what the multi-armed bandit is doing. It doesn’t care HOW much better B is compared to A, or that C is worse than A. It only cares that B is better. And with the push from executive leadership toward machine learning and automation, maybe we can use this shift in methodologies to also shift the way we think about—and communicate—success. In that sense, maybe we’re all multi-armed bandits! Watch out, casinos, here we come!

If you have any questions or would like to chat about how SDI could help build a new program or take an existing program to the next level, reach out!
