Calculating ‘Truth’ (while avoiding existential crises)

Mar 16, 2020

It’s an election year! There’s a pandemic! And you’re a crackerjack optimization analyst! Every day you’re called on to calculate the “Truth,” at least for your experiments. Lucky you!

So how can you calculate something unknowable like “Truth”? Statistics has a way to estimate that!

What you’ll need:

Power: typically defaults to 80%. Check your testing technology’s documentation if you don’t know, or just use 80% as a default.

Confidence (or “statistical significance”): the statistical threshold you used for the selection of tests to be evaluated (typically 90% or 95%).

Win Rate: the proportion of completed tests that were considered statistically significant.

What you can do with that:
From these three numbers, we can calculate a “True Discovery Rate.” And 100% minus the True Discovery Rate would, of course, be our “False Discovery Rate.” Here’s a handy formula for that:

True Discovery Rate = [Power × (Win Rate + Confidence − 1)] / [Win Rate × (Power + Confidence − 1)]
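If you’d rather not push those numbers around by hand, the formula translates to a few lines of Python. This is just a sketch of the formula above; the function names are mine, and all inputs are proportions between 0 and 1:

```python
def true_discovery_rate(power, confidence, win_rate):
    """Estimate the share of statistically significant "wins" that are real.

    Implements the post's formula:
    power * (win_rate + confidence - 1)
    divided by win_rate * (power + confidence - 1).
    """
    numerator = power * (win_rate + confidence - 1)
    denominator = win_rate * (power + confidence - 1)
    return numerator / denominator


def false_discovery_rate(power, confidence, win_rate):
    """The complement: the share of reported wins that are illusory."""
    return 1 - true_discovery_rate(power, confidence, win_rate)


if __name__ == "__main__":
    # The post's baseline scenario: 80% power, 95% confidence, 10% win rate.
    tdr = true_discovery_rate(power=0.80, confidence=0.95, win_rate=0.10)
    print(f"True Discovery Rate:  {tdr:.0%}")      # ~53%
    print(f"False Discovery Rate: {1 - tdr:.0%}")  # ~47%
```

Plugging in any of the scenarios discussed below reproduces the same percentages.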

For an example using numbers, take a likely scenario: 80% power, 95% confidence, and a 10% win rate:

True Discovery Rate = [80% × (10% + 95% − 1)] / [10% × (80% + 95% − 1)] = 0.04 / 0.075 ≈ 53%

False Discovery Rate then equals 100% − 53%, or 47%.

That means that nearly half of all “wins” reported by organizations (using 95% confidence, 80% power, and achieving a 10% win rate) are actually illusory!

So how can you increase that True Discovery Rate?

If you play around with the various elements in that formula, one of the first things you learn is that 80% power gives you nearly the same True Discovery Rate as 90%, 95%, and even 99% power. (Which is why 80% power is the industry standard for most technologies, I’m guessing!) In fact, there is a very slight negative correlation between power and True Discovery Rate. In other words, increasing power increases your likelihood of finding a significant difference, but, due to the law of large numbers, some of those “significant” differences might not be important. It’s easy to find trends and correlations when you have a lot of data. So it’s important to adequately power your tests, but don’t over-power them either. Though if you have to make one error, err on the side of over-powering rather than under-powering. If you under-power a test, why are you even testing? You won’t be able to get a statistically valid read, and you’re just wasting time and resources.

However, lowering your confidence level from 95% to 90% (while maintaining the 10% win rate and 80% power) reduces the True Discovery Rate to 0%!

Of course, reducing your confidence level from 95% to 90% should increase your win rate simply by lowering your standards (and required runtime). So if you lower confidence to 90% while increasing the win rate to 20%, you find nearly the same False Discovery Rate (43%). Essentially, if you want to lower your confidence level to 90%, you’ll want a win rate of 20% or higher to ensure that more than half of your “wins” are actually real. Increase your confidence level to 99%, and you can afford a win rate as low as 10%.
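The confidence and win-rate tradeoffs just described can be checked with a quick sweep over the same formula (a sketch; the helper name is my own):

```python
def true_discovery_rate(power, confidence, win_rate):
    # Same formula as in the post; all inputs are proportions in (0, 1).
    return (power * (win_rate + confidence - 1)
            / (win_rate * (power + confidence - 1)))

# Hold power at 80% and compare confidence / win-rate combinations.
scenarios = [
    (0.95, 0.10),  # the baseline example: ~53% TDR
    (0.90, 0.10),  # lower confidence, same win rate: TDR collapses to 0%
    (0.90, 0.20),  # lower confidence, doubled win rate: ~57% TDR (43% FDR)
    (0.99, 0.10),  # stricter confidence, same win rate: ~91% TDR
]

for confidence, win_rate in scenarios:
    tdr = true_discovery_rate(0.80, confidence, win_rate)
    print(f"confidence {confidence:.0%}, win rate {win_rate:.0%}: "
          f"TDR {tdr:.0%}, FDR {1 - tdr:.0%}")
```
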

Why it Matters

So why do we care? This is hard! Should we all just throw up our hands in frustration and find new jobs?

No! Of course not. As G.I. Joe reminds us, “Knowing is half the battle!” In our case, knowledge is power! And while power corrupts, and absolute power corrupts absolutely, power used responsibly makes the world a better place.

So, how should we use our newfound knowledge and power? We can apply this better understanding of error to our impact estimates. We can ensure our ROI math is adjusted by these potential error rates. We can calculate how many more tests we should run, and how we might want to adjust our standards to better mitigate (or not) each type of error. In short, we can apply our thinking brains to problems that historically we left to gut and heart, and we make smarter decisions and better recommendations on how to safely and responsibly use this data.

We are all asked or required at some point in our careers to calculate the “impact” or “program ROI” of our experimentation efforts. Some companies even calculate an annualized revenue impact from every test they complete. And then they bake those numbers into the P&L. Those efforts typically last about one year and lead to multiple (challenging) conversations with finance and leadership teams, where you (the analyst) stumble about, trying to explain all the potential reasons why this or that win didn’t materialize after permanent implementation.

We point to seasonality, changes in the market, the lack of a controlled environment. Sometimes we even point out that we cannot know that the permanent implementation is NOT providing a lift that has raised what would actually be a dip to level performance!

Here’s the deal, though, and I hope this has been made clear throughout this post, so it doesn’t come as too much of a shock (but maybe take a seat, just in case): responsibly, we cannot say any of those things.

Based on everything you’ve read above, hopefully you understand why that might be. Even a result with 99% confidence means there is a 1% chance of a Type I error and a 2–9% chance of a False Discovery (depending on your win rate). False Discovery Rate goes down as Win Rate goes up.

(Chart credit: Ton Wesseling)

But are you satisfied with only half of your wins being “real”?

Would you like to feel confident your wins are real more than half of the time? Personally, I would aim for a 70%+ True Discovery Rate; that’s about where the green appears in Ton’s chart above. If your confidence level is set to 90%, you’ll want a win rate of at least 30%, which will still get you a True Discovery Rate of about 76%. If your confidence level is set to 95%, you can get away with a win rate of 20%, and your True Discovery Rate is 80%! Manage a win rate of 30% with 95% confidence, and you’ll win the lottery with a True Discovery Rate of close to 90%! But note: even with those much higher standards and results, 10% of your wins are still not wins!*
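If you want to know the minimum win rate needed to hit a target True Discovery Rate, the formula can be rearranged to solve for win rate. This is my own algebraic rearrangement of the post’s formula, not an established function, so treat it as a sketch:

```python
def min_win_rate(power, confidence, target_tdr):
    """Smallest win rate that achieves a target True Discovery Rate.

    Derived by rearranging the post's TDR formula to solve for win rate;
    valid only while the denominator stays positive (target is reachable).
    """
    denominator = power - target_tdr * (power + confidence - 1)
    if denominator <= 0:
        raise ValueError("target TDR not reachable at this power/confidence")
    return power * (1 - confidence) / denominator


# Aiming for a 70%+ True Discovery Rate at 80% power:
print(f"90% confidence: win rate >= {min_win_rate(0.80, 0.90, 0.70):.1%}")  # ~25.8%
print(f"95% confidence: win rate >= {min_win_rate(0.80, 0.95, 0.70):.1%}")  # ~14.5%
```

Those thresholds are consistent with the rules of thumb above: a 30% win rate at 90% confidence and a 20% win rate at 95% confidence both clear the 70% bar comfortably.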

Fine. But what if you want to figure out which of your wins are the “real” wins?
Um. Yeah. I can’t do that. And neither can you. So you’d best make peace with that.

But go to this post for some inner peace about it all: Don’t Run with Scissors! How to Safely Calculate Program ROI

*Find this really interesting but hate doing math by hand? Thanks to Ton Wesseling, there’s a calculator for that!

Resources

Still scratching your head and want some help? Reach out. We can help.
