Learn from Lukas Vermeer’s conversation with the Test and Learn Community about Sample Ratio Mismatch (SRM), or what he calls “one neat trick to run better experiments.” Checking for SRM is a simple way to test for data quality issues.
Learn more about and join the Test & Learn Community!
The audience was polled, “Have you heard of SRM?” and then, “Have you seen it in practice?” The answers were a mixed bag. But by the end of the session, every practitioner had learned how valuable checking for SRM can be.
In this session Lukas answered five main questions about SRM:
Why should we care about SRM? [6:21]
How do you know if you have SRM? [12:10]
Who suffers from SRM? [17:22]
What can cause SRM? [20:19]
What can you do about SRM? [27:06]
To hear his answers to each question check out the recording here.
SRM is defined as the mismatch between the expected sample ratio and the observed sample ratio. For example, in an A/B test with a 50/50 split, we expect the ratio between the Control and Challenger sample groups to be 1. If we end up observing a 40/60 split, we are far from the expected ratio of 1 and we have SRM. Today I will be highlighting 3 tips Vermeer delivered for thinking about SRM.
From Lukas’ presentation
#1: SRM affects anyone who is testing; it’s important to take it seriously
Unfortunately, just because you have not been looking for SRM in the past doesn’t mean it hasn’t been there. If you test for it, you will find it. SRM is a widespread issue in experimentation, and checking for it should be a part of everyone’s program. It turns out that even some of the most experienced experimentation programs find SRM in 6-10% of their tests! And this is after years of working to prevent it!
The majority of experimentation programs have to take checking for SRM into their own hands. As of now, only 2 commercial platforms actively check and flag SRM for users. The good news, though, is that it’s easy to determine if you have it. You simply need the expected sample size for each group, the observed sample size for each group, and a Chi-Squared Goodness of Fit calculator. There are multiple online calculators that can be utilized for this, or even a Chrome browser tool Lukas has developed.
Chi-Squared Goodness of Fit calculator showing results of a test where SRM exists
If the Chi-Squared test comes back statistically significant, the difference between your observed and expected sample sizes is very unlikely to be due to chance alone, which points to an underlying cause. This is your signal that you have an SRM issue that must be addressed for the test and the program.
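To make the check concrete, here is a minimal sketch of the Chi-Squared Goodness of Fit test for a two-group test, written with only the Python standard library. The function name `srm_check`, its parameters, and the 0.01 significance threshold are illustrative assumptions, not something taken from Lukas’ talk or tool, and the p-value shortcut used here only holds for exactly two groups (1 degree of freedom).

```python
import math

def srm_check(control_n, challenger_n, expected_control_share=0.5, alpha=0.01):
    """Chi-squared goodness-of-fit check for a two-group test.

    alpha is an assumed threshold; pick the one your program has agreed on.
    Returns (chi2 statistic, p-value, srm_flag).
    """
    total = control_n + challenger_n
    observed = [control_n, challenger_n]
    expected = [total * expected_control_share,
                total * (1 - expected_control_share)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With two groups there is 1 degree of freedom, and the chi-squared
    # upper-tail probability reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# The 40/60 example from earlier, with made-up totals of 10,000 visitors.
print(srm_check(4000, 6000))   # flag is True: SRM detected
# A small imbalance that chance alone can easily produce.
print(srm_check(5000, 5100))   # flag is False: no SRM detected
```

For tests with more than two groups, the statistic is computed the same way, but the p-value needs a general chi-squared distribution function, for example from a statistics library or one of the online calculators mentioned above.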
Screenshot from Lukas’ presentation
#2: Even Bayesians need to pay attention to SRM
When you see a statistically significant difference between the observed and expected sample ratios, it indicates there is a fundamental issue in your data (and even a Bayesian analysis doesn’t correct for that). This bias in the data violates the assumptions of the statistical test. No matter the type of statistical test being run, if its assumptions are not met then the results are invalid. With invalid results, you cannot know whether your test’s conclusion applies to the reality of your situation. And worst of all, if you are using these results as proof for making a business decision, you could unknowingly be causing harm to your business.
When SRM is detected for any type of test, the most common fix is to stop the test, identify the cause, correct the issue, and relaunch the test. If your program is running multiple tests at once, all of them may have to be stopped, since the same issue causing the detected SRM may be affecting them as well.
From Lukas’ presentation
#3: Conduct a Root Cause Analysis for SRM
Though SRM itself is easy to check for and find, it is much harder to figure out why it is occurring. SRM is only a symptom of many possible “diseases.” It could be happening in the testing tool deploying the test, in the tag management system (TMS) in charge of tracking the experiences, or even in the analytics tool itself. There are many possible reasons why you might see SRM, so, to help, Lukas has outlined 10 rules of thumb for doing a root cause analysis:
- Examine scorecards
- Examine user segments
- Examine time segments
- Analyze performance metrics
- Analyze engagement metrics
- Count frequency of SRMs
- Examine A/A experiment
- Examine severity
- Examine downstream
- Examine across pipelines
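As one illustration of the “examine user segments” rule of thumb, the sketch below reruns the two-group SRM check separately for each segment to see where the imbalance is concentrated. The helper names, the browser segments, the visitor counts, and the 0.01 threshold are all made up for illustration; they are not from Lukas’ talk.

```python
import math

def srm_p_value(control_n, challenger_n, expected_control_share=0.5):
    """P-value of a two-group chi-squared goodness-of-fit test (1 degree of freedom)."""
    total = control_n + challenger_n
    exp_control = total * expected_control_share
    exp_challenger = total - exp_control
    chi2 = ((control_n - exp_control) ** 2 / exp_control
            + (challenger_n - exp_challenger) ** 2 / exp_challenger)
    return math.erfc(math.sqrt(chi2 / 2))

def flag_segments(counts_by_segment, alpha=0.01):
    """Return the segments whose Control/Challenger split shows SRM.

    alpha is an assumed threshold, applied to each segment separately.
    """
    return [segment
            for segment, (control_n, challenger_n) in counts_by_segment.items()
            if srm_p_value(control_n, challenger_n) < alpha]

# Hypothetical (Control, Challenger) visitor counts per browser.
browser_counts = {
    "chrome": (5020, 4980),
    "safari": (2400, 3000),
    "firefox": (1010, 990),
}
print(flag_segments(browser_counts))  # only "safari" is flagged
```

If only one segment is flagged, that narrows down where to look, for example a browser-specific bug in how the Challenger experience loads. Keep in mind that the more segments you check, the more likely one is to be flagged by chance alone, so treat a single borderline segment with some caution.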
He also includes links to other helpful resources [32:04]. One of the links is to a research paper he and other industry leaders wrote on SRM, and the taxonomy developed around it. In the paper, they discuss how to diagnose, fix, and prevent different types of issues causing SRM.
Checking for SRM may seem tedious, complicated, or unnecessary, but it is, quite simply, vital to ensure you are making decisions based on valid results. To learn more about what we consider pertinent information for live tests, check out the blog How (and Why!) to Monitor Your Tests. SRM and Test Monitoring are crucial pieces of the test process because they help to build trust in your results, which leads to program buy-in from the stakeholders who drive decision-making in your organization.
If you feel this part of your process could use some help, or you have questions not covered here, please feel free to reach out! If other parts of your process, from planning to results analysis and sharing, are giving you trouble, we can help! Each part of the testing process is pertinent to the overall program’s success, and deserves some TLC!