Let the Bandit Decide: Optimization with Machine Learning
We’ve all heard of the one-armed bandit: the slot machine. The consummate victor engineered to best all players. But cue the lone tumbleweed and standoff music, because it’s high noon, and a stranger’s come to town.
Enter Machine Learning’s sinister-sounding solution: the multi-armed bandit. He’s not going to play the slots like any old two-armed, one-lever-at-a-time human. He pulls every arm, watches the payouts, and steadily shifts his bets toward the arms that pay and away from the ones that don’t.
This aspect of the bandit problem has important implications. It eliminates our paradigm of measuring against a control.

Typically, in standard A/B/n or MVT experimentation, we measure the lift of a winner against a control: the default experience every challenger must beat at a stated confidence level.
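For contrast, here’s a minimal sketch of that standard approach (hypothetical function name and numbers): compute the challenger’s lift over the control, and a two-sided p-value from a pooled two-proportion z-test.

```python
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Lift of challenger B over control A, plus a two-sided p-value
    from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided
    lift = (p_b - p_a) / p_a                           # relative lift over control
    return lift, p_value

# Hypothetical results: A converts 500/10,000 (5%), B converts 600/10,000 (6%).
lift, p_value = two_proportion_ztest(500, 10000, 600, 10000)
```

With those made-up numbers, B shows a 20% relative lift at well below the usual 0.05 significance threshold; the whole exercise is about quantifying how much better B is, which is exactly what the bandit declines to do.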
Further, with the multi-armed bandit approach, you cannot say with statistical confidence which experience in a given experiment performed worst. Why? Because the bandit has greedily shifted traffic away from the non-winning experiences in an effort to exploit the gains from the higher performers. That means the lower-performing recipes never accumulate the samples, and therefore the statistical power, needed to rank them reliably.
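To see why the losers get starved of samples, here’s a minimal sketch of one common bandit strategy, epsilon-greedy, with hypothetical conversion rates: it exploits the best-looking arm most of the time and explores a random arm otherwise, so weak arms quickly stop receiving traffic.

```python
import random

def epsilon_greedy(true_rates, pulls=10000, epsilon=0.1, seed=7):
    """Simulate an epsilon-greedy bandit: exploit the best-observed arm
    with probability 1 - epsilon, explore a random arm otherwise."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)   # traffic served to each arm
    wins = [0] * len(true_rates)     # conversions observed per arm
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))        # explore
        else:
            observed = [wins[i] / counts[i] if counts[i] else 0.0
                        for i in range(len(true_rates))]
            arm = observed.index(max(observed))         # exploit
        counts[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]     # simulate one visitor
    return counts

# Hypothetical rates: arm 1 converts at 10%, the others at 2-3%.
traffic = epsilon_greedy([0.02, 0.10, 0.03])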
Why should you care which is worst? Well, maybe you don’t. Normally you wouldn’t. But you certainly couldn’t calculate it even if you wanted to; the samples simply aren’t there.
The Minimum Detectable Effect Conundrum (aka My Former FAQ)
One of the most common questions I get from my clients is around runtime calculations: “But how do I know what my minimum detectable effect (MDE, minimum lift, etc.) should be?” I have the same answer every time: “How big a lift would you need to see to go with the new experience?” I normally get blank stares, which leads to me (unhelpfully) lecturing about the cost of testing (it ain’t free, folks) and the opportunity costs that pile up while a test runs.
Do you know how many have been able to answer that question? Zero. That’s right: zero. Everyone can come up with a scenario where they would need that information—perhaps a new third-party recommendation engine, or a test to remove a revenue-generating ad from a key funnel page. In each scenario, we know the challenger must provide gains that outweigh or erase the costs of the decision being made. Easy!
But what if you’re deciding between banner A or B? Page template X or Y? The assets have been created. There’s no additional cost to push one live over the other. How big does the lift need to be then? Any lift at all, really.
What’s the solution? We tried to back into lift estimates by using maximum runtime in this version of our calculator, and that helps to some extent. It helps you make decisions about whether those lifts are likely, at least. And knowing that can help you with prioritization. But it still doesn’t answer that niggling question: How big does a lift need to be in order for the business to make a decision?
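One way to sketch that “back into lift estimates from maximum runtime” idea (a hypothetical helper, not SDI’s actual calculator): fix alpha, power, traffic, and runtime, then solve the standard two-proportion sample-size formula for the effect size instead of the sample size.

```python
from statistics import NormalDist

def mde_from_runtime(daily_visitors, baseline_rate, days, arms=2,
                     alpha=0.05, power=0.8):
    """Smallest absolute lift a test can resolve given its maximum runtime,
    from the standard two-proportion sample-size formula solved for the
    effect size."""
    n_per_arm = daily_visitors * days / arms
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    p = baseline_rate
    return (z_alpha + z_beta) * (2 * p * (1 - p) / n_per_arm) ** 0.5

# Hypothetical traffic: 10,000 visitors/day, 5% baseline, 28-day cap.
mde = mde_from_runtime(10000, 0.05, 28)
```

With those made-up inputs, the smallest detectable absolute lift comes out around 0.23 percentage points (roughly a 4.6% relative lift). That tells you whether a lift of that size is *plausible* for the change being tested, which helps with prioritization, but it still doesn’t tell you what lift the business would actually need.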
Is that answer truly “any lift”? Because if it is (and I suspect it often really is), then why do we spend so much time arguing over lift percentages and annualized impacts? Why can’t we stand up and say B is better than A, go with B, and move on?
The New Frontier
This is essentially what the multi-armed bandit is doing. It doesn’t care HOW much better B is compared to A, or that C is worse than A. It only cares that B is better. And with the push from executive leadership toward machine learning and automation, maybe we can use this shift in methodologies to also shift the way we think about—and communicate—success. In that sense, maybe we’re all multi-armed bandits! Watch out casinos, here we come!
If you have any questions or would like to chat about how SDI could help build a new program or take an existing program to the next level, reach out!