Search Discovery is excited to announce a new resource for our Experimentation clients and friends: the Sequential Test Calculator. Watch this video to see how to use the tool, and read on to learn how it works. Stay tuned for an ongoing series of use cases and examples.
What in the world is a sequential test calculator?
Designed and developed by our own Merritt Aho, the Sequential Test Calculator gives experimenters a responsible way to “peek” at test results and to optimize test run times. The calculator allows you to:
- Plan and analyze sequential A/B tests
- Analyze data while the test is running in order to determine if an early decision can be made
- Control your false positive and false negative error rates
- Run tests efficiently
How does it do all this, you ask?
Sequential testing assesses data as it is collected, as opposed to fixed-horizon testing, which assesses the data only once the full, pre-planned sample has been collected. This R-based calculator provides a methodology for ending tests earlier than a fixed-horizon design would allow when results are extreme.
Extreme results are hard to predict, and when they do show up, you’ll want to act on them sooner; this tool gives you a way to act on those results effectively. The catch is that you need to commit to this method before the test starts, rather than switching to it once the results look extreme (changing the rules of the game at halftime is just as inadvisable in statistics as it is in sports!).
The sequential test calculator is designed to give you a structured way to manage your alpha (false positive) and beta (false negative) error rates to allow for responsible peeking.
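The post doesn’t say which boundary family the calculator uses internally, so treat the following as an illustration only. One widely used way to “budget” alpha across interim looks is the Lan-DeMets O’Brien-Fleming-type spending function, which releases very little alpha early and saves most of it for the end. This Python sketch (standard library only) prints the cumulative and per-look alpha for a hypothetical plan of four equally spaced looks at a two-sided 5% level. It shows only the spending schedule; converting that schedule into actual z-score boundaries requires recursive numerical integration, which dedicated tools handle and this sketch does not.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent by information fraction t.

    Lan-DeMets O'Brien-Fleming-type spending function:
        alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t)))
    where t is the fraction of the planned maximum sample observed so far.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STANDARD_NORMAL.cdf(z / t ** 0.5))


# Hypothetical plan: four equally spaced looks at the data.
previous = 0.0
for t in (0.25, 0.50, 0.75, 1.00):
    cumulative = obf_alpha_spent(t)
    print(f"look at t={t:.2f}: cumulative alpha {cumulative:.5f}, "
          f"new alpha at this look {cumulative - previous:.5f}")
    previous = cumulative
```

Because so little alpha is spent at the early looks, only very extreme early results can end the test, while the final analysis ends up only slightly stricter than a fixed-horizon test at the same level.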
How it helps
We’ve all been tempted to “peek” for one reason or another, but most of the time this is hazardous: deciding to end a test early because of an unplanned peek compromises the statistical validity of your test. If you can peek responsibly, however, you gain a clear advantage: you can make decisions more quickly when strong signals exist. Herein lies the primary benefit of sequential testing.
One required parameter for fixed-horizon tests is the Minimum Effect of Interest (a.k.a. the Minimum Detectable Effect [MDE]): the smallest effect the test is designed to detect. The MDE also represents a threshold of practical business impact. The fixed-horizon tests we’re all most familiar with use this threshold to determine sample size (and therefore run time). Sequential testing reduces the penalty for a poor guess about the true effect. For example, when the actual impact of your test is far greater than the MDE, a fixed-horizon test still requires you to wait until you’ve reached the full sample size before ending the test and making a decision; stopping sooner would invalidate the statistical controls used to calculate that sample size.
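To make the link between the MDE and run time concrete, here is a minimal Python sketch of the textbook normal-approximation sample size for a two-sided, two-proportion z-test. It is not the calculator’s code; the function name, the 5% baseline conversion rate, and the relative MDEs are hypothetical values chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def fixed_horizon_n_per_arm(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per arm for a fixed-horizon A/B test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2,
    the usual normal approximation for comparing two proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate if the MDE is exactly true
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical example: 5% baseline conversion, 10% vs. 20% relative MDE.
print(fixed_horizon_n_per_arm(0.05, 0.10))
print(fixed_horizon_n_per_arm(0.05, 0.20))
```

A smaller MDE demands a much larger sample. Once that number is fixed, a fixed-horizon test has to collect all of it even if the true effect turns out to be several times larger than the MDE.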
Here’s another illustration. Say you want to use a 95% confidence level so that your risk of a false positive is held to 5%. Ending a fixed-horizon test early invalidates that control, so you can no longer be sure what your actual false positive rate is. In contrast, sequential testing methodologies are designed to allow “peeking” and mid-test decisions: when the results cross the efficacy (better than expected) or futility (worse than expected) boundaries, you can stop the test without reducing or invalidating those statistical controls.
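A quick simulation shows both halves of that argument. The Python sketch below runs many A/A tests (no true difference) and models the test’s z-statistic as a running sum of independent standard normal increments, a standard large-sample approximation, rather than simulating individual visitors. Checking 10 times against the usual 1.96 cutoff and stopping at the first “significant” look pushes the false positive rate well above 5%. Raising the cutoff to Pocock’s constant boundary (about 2.555 for 10 equally spaced looks at two-sided 5%, taken from published tables) brings it back to roughly 5%. Pocock’s boundary is used here only because it is simple; it is not necessarily what the calculator uses.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks: int, z_crit: float,
                        n_sims: int = 20_000, seed: int = 1) -> float:
    """Share of simulated A/A tests declared significant at any look.

    Under the null, each batch of data adds an independent N(0, 1) increment
    to a running sum; the z-statistic after k equal batches is sum / sqrt(k).
    The test stops at the first look where |z| exceeds z_crit.
    """
    rng = random.Random(seed)  # seeded so the results are reproducible
    false_positives = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / look ** 0.5
            if abs(z) > z_crit:
                false_positives += 1
                break
    return false_positives / n_sims


naive_cutoff = NormalDist().inv_cdf(0.975)  # about 1.96, the fixed-horizon cutoff
print("1 look, 1.96 cutoff:     ", false_positive_rate(1, naive_cutoff))
print("10 looks, 1.96 cutoff:   ", false_positive_rate(10, naive_cutoff))
print("10 looks, Pocock cutoff: ", false_positive_rate(10, 2.555))
```

The single-look run reproduces the nominal 5% within simulation noise. The second run is what unplanned peeking does. The third shows how a pre-committed sequential boundary restores the error rate you planned for.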
Try it out!
If you’ve been looking for ways to build more efficiency into your experimentation program, take our new Sequential Test Calculator for a test drive (pun intended) and add a new tool to your optimization toolbox.