Pragmatic Attribution: Build a Model of Models
Attribution is Complicated
Let’s acknowledge that marketing attribution is extremely complicated, and a true “360-degree view” of the customer is simply not possible. As an anecdotal case in point, consider a recent change in my personal media consumption: I signed up for a digital subscription to The Washington Post. I know why I signed up for a digital subscription to a newspaper: my local newspaper, which had been delivered daily for over a decade, has not been able to figure out a workable business model in a shifting news media landscape, and it has chosen to cut content while dramatically (and sneakily, if ham-handedly) increasing subscription rates.
But, why did I choose The Washington Post for my shift to a digital daily newspaper? I can’t answer that question. I can only point to various factors that could all reasonably deserve some attribution for the choice:
- The Post movie — any brand that can have itself favorably represented in a movie by Meryl Streep and Tom Hanks should, clearly, jump at the opportunity; I watched the movie on a plane somewhere, and I left with a very favorable impression of the paper (even if it was an impression circa 1971)
- The Super Bowl Ad — I did not see it live, but I saw it covered in the bevy of “Super Bowl ad winners and losers” posts in the week following the game
- My grandparents — while I grew up far, far away from Washington, D.C., my grandparents, whom I delighted in visiting for a couple of weeks every summer of my youth, were loyal consumers of the paper (and avid cruciverbalists). Childhood associations are powerful and, often, permanent.
- Countless stories published over the past two years — the paper has broken numerous stories that have been picked up by other media outlets in the…ahem…dynamic and shifting environment of the seat of the U.S. federal government.
It’s important that we understand and accept that any attribution management solution is inherently noisy and imperfect. But, as George Box is famously paraphrased: “All models are wrong, but some are useful.”
As such, it’s useful to break down the different dimensions of attribution, as we did in this post from last year. The remainder of this post outlines a technique for identifying a “better model than last click” without going overboard with opinion and conjecture. We can do this by taking advantage of the various heuristic models now available in both Google Analytics and Adobe Analytics and applying a little bit of objective statistics to that data. Essentially, we are focusing on one specific dimension of attribution management and illustrating a practical way to move from “basic” toward “advanced.”
A Primer on Heuristic Models
Let’s start with a definition of a “heuristic model.” Essentially, this is a fancy way of saying “pick your own model” (but if we come right out and say “pick your own model” at a cocktail party, we’re not going to sound nearly as sophisticated and advanced as if we say we “use a heuristic model”). There are countless heuristic models available. The two most dominant web analytics platforms, Google Analytics and Adobe Analytics, both now offer a range of such models (limited to marketing touches that result in a clickthrough to a digital property), and both include convenient visualizations that provide some intuition about the logic behind each model.
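To make the differences between these models concrete, here is a minimal Python sketch of a few common heuristic models applied to a single conversion path. The function names, the path format (a list of `(channel, days_before_conversion)` tuples, oldest touch first), the 40/20/40 position-based split, and the 7-day default half-life are illustrative assumptions, not the exact implementations in Google Analytics or Adobe Analytics.

```python
from collections import defaultdict

# Each model maps a conversion path to one weight per touch; weights sum to 1.
# A path is a list of (channel, days_before_conversion) tuples, oldest touch first.

def last_touch(path):
    """All credit to the final touch."""
    return [0.0] * (len(path) - 1) + [1.0]

def first_touch(path):
    """All credit to the first touch."""
    return [1.0] + [0.0] * (len(path) - 1)

def linear(path):
    """Equal credit to every touch."""
    return [1.0 / len(path)] * len(path)

def time_decay(path, half_life_days=7.0):
    """A touch's raw credit halves for every half_life_days before conversion."""
    raw = [0.5 ** (days / half_life_days) for _, days in path]
    total = sum(raw)
    return [w / total for w in raw]

def position_based(path, end_share=0.4):
    """First and last touch each get end_share; middle touches split the rest.

    Two-touch paths are split evenly (an assumption for this sketch).
    """
    n = len(path)
    if n == 1:
        return [1.0]
    if n == 2:
        return [0.5, 0.5]
    middle = (1.0 - 2 * end_share) / (n - 2)
    return [end_share] + [middle] * (n - 2) + [end_share]

def attribute(path, model, value=1.0):
    """Distribute one conversion's value across channels using the given model."""
    credit = defaultdict(float)
    for (channel, _), weight in zip(path, model(path)):
        credit[channel] += weight * value
    return dict(credit)

path = [("Podcast", 20.0), ("Organic Search", 6.0), ("Paid Search", 0.0)]
for model in (last_touch, first_touch, linear, time_decay, position_based):
    print(model.__name__, attribute(path, model, value=100.0))
```

With 100 in revenue on this path, last touch gives everything to Paid Search, first touch gives everything to Podcast, and position-based splits it roughly 40/20/40. A 3-week half-life can be expressed as `functools.partial(time_decay, half_life_days=21.0)`.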
The problem with a last touch approach, long the default in web analytics reporting, is that it undervalues all of the touchpoints — upper funnel, mid-funnel, even lower-funnel-that-didn’t-quite-lead-to-a-conversion-at-the-time — that occurred before that last interaction:
“We’ve been doing all that great podcast advertising to get awareness of our brand out into the market. And we think that’s been working well. When the customer takes out her earbuds and then searches for us and clicks through on a search ad, it’s unfair that Google AdWords gets all the credit for her purchase!”
For years, Adobe Analytics also offered “first touch” attribution, which is similarly problematic (and introduces measurement latency issues that we won’t go into here…but which you will quickly come across as you dive into this world):
“That’s all well and good that you’ve been doing podcast advertising, but it was our killer paid search activities that actually offered the customer a way to click through and buy when she was ready to purchase. It’s unfair for you to give the podcast ads all of the credit for that order!!!”
What is the “right” model, then? In a heuristic attribution world, that gets left up to the business to decide: how does the organization think it makes the most sense to assign value? The number of heuristic models is endless. Most platforms allow the marketer to come up with their own model in addition to the pre-defined ones that are available out of the box, so, for the marketer who has been aggressively overthinking this topic, there could be an “Inverse U model” or a “W model” or a “Dr. J.” model or anything else!
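Because a heuristic model is ultimately just a rule for weighting touches, a custom model can be sketched as a function that produces a raw weight for each position in the path, which is then normalized so the weights sum to 1. The `custom_positional_model` helper and the “Inverse U” shape below are hypothetical illustrations, not a feature of either platform.

```python
def custom_positional_model(shape):
    """Build a model from shape(i, n) -> raw weight for touch i (0-based) of n touches.

    Raw weights are normalized so each path's weights sum to 1.
    """
    def model(path):
        n = len(path)
        raw = [shape(i, n) for i in range(n)]
        total = sum(raw)
        return [w / total for w in raw]
    return model

def inverse_u(i, n):
    """Hypothetical "Inverse U" shape: weight rises toward the middle of the path."""
    center = (n - 1) / 2
    return 1.0 + center - abs(i - center)

inverse_u_model = custom_positional_model(inverse_u)
path = [("Display", 14.0), ("Email", 5.0), ("Paid Search", 0.0)]
print(inverse_u_model(path))  # → [0.25, 0.5, 0.25]
```

Any shape function that returns positive weights plugs in the same way, so a “W model” or a “Dr. J.” model is just a different `shape`.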
This is how heuristic models fundamentally differ from algorithmic (a.k.a. “data-driven”) models. Algorithmic models still require some human judgment (choosing which algorithm to use), but they are geared toward letting the data itself drive the attribution. The downside of algorithmic models is that they are more complex to implement, and they will feel like a black box to most of the marketers who use them. (Both Google and Adobe are starting to offer algorithmic options in the hidden corners of their platforms, but they’re not exactly shouting from the rooftops that they have robust solutions for this just yet; it’s complicated!)
What is described in the remainder of this post is a two-step approach for using multiple heuristic models to find an objective way to identify a “good” (better than last touch, better than first touch, still not perfect) heuristic model for your organization to use. Along the way, this approach often yields some useful understanding about how at least some of your paid channels actually work in your marketing ecosystem!
The Wisdom of the Heuristic Crowd
The key idea in this approach is that, rather than picking a single heuristic model that just “feels right,” we can objectively compare multiple models, find the central result across them (we use the median), and then choose the single model that most closely approximates that central result.
The process looks like this:
Step 1: Look at All the Models
The first step is to pull the revenue (or orders, or leads, or whatever your primary outcome metric is) for each model and then combine them into a single data set. A boxplot is a useful visualization for this: for each channel, it shows the median across all the models, as well as how much overall variation there is across them. The box in the center spans the interquartile range (the 25th to the 75th percentile), so half of the models returned a value within that box; the vertical line inside the box is the median.
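Step 1 can be sketched in Python with only the standard library, assuming the per-model results have already been exported into a nested `{model: {channel: revenue}}` mapping. The model names and revenue figures are made up for illustration. The quartiles use `statistics.quantiles(..., method="inclusive")`, which should match NumPy's default linear percentile interpolation; other quartile conventions would shift the box edges slightly.

```python
import statistics

# Hypothetical export: revenue credited to each channel under each heuristic model.
revenue_by_model = {
    "last_touch":        {"Paid Search": 500.0, "Direct": 280.0, "Email": 220.0},
    "first_touch":       {"Paid Search": 350.0, "Direct": 460.0, "Email": 190.0},
    "linear":            {"Paid Search": 410.0, "Direct": 380.0, "Email": 210.0},
    "time_decay_21d":    {"Paid Search": 440.0, "Direct": 360.0, "Email": 200.0},
    "position_40_20_40": {"Paid Search": 425.0, "Direct": 345.0, "Email": 230.0},
}

def summarize_by_channel(revenue_by_model):
    """Per channel: the five boxplot statistics across all models' results."""
    channels = sorted({ch for results in revenue_by_model.values() for ch in results})
    summary = {}
    for ch in channels:
        # A channel missing from a model's export is treated as zero revenue.
        values = [results.get(ch, 0.0) for results in revenue_by_model.values()]
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[ch] = {"min": min(values), "q1": q1, "median": median,
                       "q3": q3, "max": max(values)}
    return summary

for channel, stats in summarize_by_channel(revenue_by_model).items():
    print(channel, stats)
```

In this invented data, Direct has the widest overall spread (280 to 460) but a much narrower box (345 to 380). The same per-channel value lists can be handed to a plotting library such as matplotlib to draw the boxplot itself.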
In this example, the channel with the highest variability across models is Direct, and that is traffic that is very difficult to influence quickly. And, even with that variability, remember what we said about the boxes? Half of the models return results in a very narrow range, even for the channel with the highest overall spread!
This may or may not be the case for your brand, but pausing here for a bit of a reality check is an important step: every model besides last touch introduces complexity into the process, so it’s worth checking whether an advanced attribution model is actually going to provide results that substantially differ from a last touch model.
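One way to run that reality check is to measure how far last touch lands from the cross-model median for each channel. The sketch assumes the same kind of nested `{model: {channel: revenue}}` mapping with made-up figures; the 10% tolerance is an arbitrary, hypothetical threshold that each organization would set for itself.

```python
import statistics

# Hypothetical export: revenue credited to each channel under each heuristic model.
revenue_by_model = {
    "last_touch":        {"Paid Search": 500.0, "Direct": 280.0, "Email": 220.0},
    "first_touch":       {"Paid Search": 350.0, "Direct": 460.0, "Email": 190.0},
    "linear":            {"Paid Search": 410.0, "Direct": 380.0, "Email": 210.0},
    "time_decay_21d":    {"Paid Search": 440.0, "Direct": 360.0, "Email": 200.0},
    "position_40_20_40": {"Paid Search": 425.0, "Direct": 345.0, "Email": 230.0},
}

def gap_vs_median(revenue_by_model, baseline="last_touch"):
    """Relative difference between the baseline model and the cross-model median, per channel."""
    channels = sorted({ch for results in revenue_by_model.values() for ch in results})
    gaps = {}
    for ch in channels:
        median = statistics.median(r.get(ch, 0.0) for r in revenue_by_model.values())
        base = revenue_by_model[baseline].get(ch, 0.0)
        # NaN when the median is zero (avoids dividing by zero); NaN is never flagged below.
        gaps[ch] = (base - median) / median if median else float("nan")
    return gaps

def channels_to_review(gaps, tolerance=0.10):
    """Channels where the baseline differs from the median by more than the tolerance."""
    return [ch for ch, gap in gaps.items() if abs(gap) > tolerance]

gaps = gap_vs_median(revenue_by_model)
print({ch: round(g, 3) for ch, g in gaps.items()})
print(channels_to_review(gaps))
```

If no channel is flagged, last touch may be close enough for your purposes; in this invented data, Direct (about 22% below the median) and Paid Search (about 18% above) would both be worth a closer look.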
We’ll put that aside for now and go ahead and complete the second step.
Step 2: Identify Which Model Fits “Best”
It would not be practical to have an attribution model that requires running multiple models and then finding the median each time. To avoid that, our second step is to determine which single model best approximates the median result across all models.
There are various approaches for doing this, but a simple one is, for each model, to calculate the absolute difference from the median for each channel and add those differences together. Then, find the model with the lowest total. In the example above, this turned out to be a Time Decay model with a 3‑week half-life, which is shown as the orange diamond in the updated plot below.
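That selection rule (sum each model's absolute differences from the per-channel medians, then keep the model with the smallest total) can be sketched as follows. The data and model names are made up; the 21-day time decay model happens to score best here only because of these invented numbers, not as a general result.

```python
import statistics

# Hypothetical export: revenue credited to each channel under each heuristic model.
revenue_by_model = {
    "last_touch":        {"Paid Search": 500.0, "Direct": 280.0, "Email": 220.0},
    "first_touch":       {"Paid Search": 350.0, "Direct": 460.0, "Email": 190.0},
    "linear":            {"Paid Search": 410.0, "Direct": 380.0, "Email": 210.0},
    "time_decay_21d":    {"Paid Search": 440.0, "Direct": 360.0, "Email": 200.0},
    "position_40_20_40": {"Paid Search": 425.0, "Direct": 345.0, "Email": 230.0},
}

def channel_medians(revenue_by_model):
    """Median revenue per channel across all models."""
    channels = sorted({ch for results in revenue_by_model.values() for ch in results})
    return {
        ch: statistics.median(r.get(ch, 0.0) for r in revenue_by_model.values())
        for ch in channels
    }

def deviation_from_median(revenue_by_model):
    """Total absolute difference from the per-channel medians, for each model."""
    medians = channel_medians(revenue_by_model)
    return {
        model: sum(abs(results.get(ch, 0.0) - med) for ch, med in medians.items())
        for model, results in revenue_by_model.items()
    }

def best_model(revenue_by_model):
    """The model whose results sit closest to the cross-model medians."""
    scores = deviation_from_median(revenue_by_model)
    return min(scores, key=scores.get)

print(deviation_from_median(revenue_by_model))
print(best_model(revenue_by_model))  # → time_decay_21d
```

Summing raw differences treats a dollar of disagreement the same in every channel, so large channels dominate the score; dividing each difference by that channel's median first is one possible alternative if that better suits your data.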
Typically, I like to evaluate multiple timeframes (peak season, slow season, normal-run season) this way, creating the boxplot and finding the best approximating model for each one. If each of the timeframes returns the same result, that’s great! If not, then I look at the top 3 models for each timeframe and identify which model is most consistently near the top of those lists. Generally, one or two models quickly emerge as being good at approximating the median of all models, regardless of the timeframe, so that’s what I go with!
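The multi-timeframe check can be sketched the same way: score every model within each timeframe, count how often each one lands in the top three, and break ties by average rank. The timeframe labels, model names, and two-channel revenue figures are all made up, and using mean rank as the tie-breaker is an assumption about how to operationalize “most consistently near the top.”

```python
import statistics
from collections import Counter

MODELS = ["last_touch", "first_touch", "linear", "time_decay_21d", "position_40_20_40"]

def by_model(paid_search, direct):
    """Build {model: {channel: revenue}} from per-model values listed in MODELS order."""
    return {m: {"Paid Search": ps, "Direct": d}
            for m, ps, d in zip(MODELS, paid_search, direct)}

# Hypothetical revenue for each timeframe (values in MODELS order).
revenue_by_timeframe = {
    "normal": by_model([500, 350, 410, 440, 425], [280, 460, 380, 360, 345]),
    "peak":   by_model([600, 400, 480, 500, 470], [200, 380, 300, 290, 320]),
    "slow":   by_model([300, 200, 290, 245, 250], [100, 190, 230, 140, 150]),
}

def deviation_scores(revenue_by_model):
    """Total absolute difference from the per-channel medians, for each model."""
    channels = sorted({ch for r in revenue_by_model.values() for ch in r})
    medians = {ch: statistics.median(r.get(ch, 0) for r in revenue_by_model.values())
               for ch in channels}
    return {m: sum(abs(r.get(ch, 0) - med) for ch, med in medians.items())
            for m, r in revenue_by_model.items()}

def most_consistent_model(revenue_by_timeframe, top_k=3):
    """Model most often in the top_k across timeframes; ties go to the lowest mean rank."""
    top_counts = Counter()
    ranks = {}
    for revenue_by_model in revenue_by_timeframe.values():
        scores = deviation_scores(revenue_by_model)
        ordered = sorted(scores, key=scores.get)  # best (lowest score) first
        for rank, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(rank)
        top_counts.update(ordered[:top_k])
    return max(top_counts, key=lambda m: (top_counts[m], -statistics.mean(ranks[m])))

print(most_consistent_model(revenue_by_timeframe))  # → time_decay_21d
```

In this invented data, the 21-day time decay and position-based models each appear in the top three for all three timeframes, and time decay wins the tie on average rank (about 1.67 versus 2.0).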
The approach described here is not the be-all/end-all for attribution modeling. It’s still, at its core, based on heuristic models; it doesn’t take into account impression/viewthrough data (although, if that data is available, it certainly can — but that’s not typically available in the digital analytics platform); and it does not factor in the customer lifetime value (although, again, it could… if that information were available). It also requires a bit of data crunching (some tedious exporting of data from the digital analytics platform or, as we prefer to do it, the querying of that data programmatically using Python or R).
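For the export route, a small loader can turn a flat file into the nested `{model: {channel: revenue}}` mapping used for the comparison. This is a hypothetical sketch: the column names (`model`, `channel`, `revenue`) and the long, one-row-per-model-and-channel layout are assumptions, and real exports from Google Analytics or Adobe Analytics will need their own column mapping.

```python
import csv
import io
from collections import defaultdict

def load_model_export(file_obj):
    """Read rows of model, channel, revenue into {model: {channel: revenue}}.

    Repeated model/channel rows are summed; thousands separators are stripped.
    """
    revenue_by_model = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(file_obj):
        model = row["model"].strip()
        channel = row["channel"].strip()
        revenue_by_model[model][channel] += float(row["revenue"].replace(",", ""))
    return {model: dict(channels) for model, channels in revenue_by_model.items()}

# Hypothetical export contents; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "model,channel,revenue\n"
    "last_touch,Paid Search,500\n"
    'last_touch,Direct,"1,280"\n'
    "linear,Paid Search,410\n"
)
print(load_model_export(sample))
```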
But, it’s a nice step forward from simple last click, and it may even reveal that last click returns results close enough to the other models that it makes sense to keep using it!
If you’re interested in learning more about this approach, feel free to contact us!