Pragmatic Attribution: Build a Model of Models

Feb 8, 2019

“Attribution” remains a hot topic for marketers. It’s been more than a century since John Wanamaker was credited with the statement: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half!” But, it’s only been in the last decade that the migration of consumers to digital channels has combined with increasingly sophisticated tools for recording their behavior, storing that data at scale (hello, cloud!), and querying and crunching that data in ways that bring to mind imagery of Bob Dylan conversing with Big Blue. That convergence of circumstances has led to a belief that, at last (AT! LAST!!!), the ghost of Mr. Wanamaker will finally be able to stop haunting the halls of marketing departments the world over.
This post, alas, will not banish that ghost. But, it will outline one tactical approach for moving beyond “last touch attribution” in a way that is objective and practical and minimizes the need for someone’s opinion to be part of that process.

Attribution is Complicated

Let’s acknowledge that marketing attribution is extremely complicated, and a true “360-degree view” of the customer is simply not possible. As an anecdotal case in point, consider a recent change in my personal media consumption: I signed up for a digital subscription to The Washington Post. I know why I signed up for a digital subscription to a newspaper: my local newspaper, which had been delivered daily for over a decade, simply has not been able to figure out a workable business model in a shifting news media landscape, and chose a path of reducing content while dramatically (and sneakily — if ham-handedly) increasing subscription rates.

But, why did I choose The Washington Post for my shift to a digital daily newspaper? I can’t answer that question. I can only point to various factors that could all reasonably deserve some attribution for the choice:

  • The Post movie — any brand that can have itself favorably represented in a movie by Meryl Streep and Tom Hanks should, clearly, jump at the opportunity; I watched the movie on a plane somewhere, and I left with a very favorable impression of the paper (even if it was an impression circa 1971)
  • The Super Bowl ad — I did not see it live, but I saw it covered in the bevy of “Super Bowl ad winners and losers” posts in the week following the game
  • My grandparents — while I grew up far, far away from Washington, D.C., my grandparents, who I delighted in visiting for a couple of weeks every summer of my youth, were loyal consumers of the paper (and avid cruciverbalists). Childhood associations are powerful and, often, permanent.
  • Countless stories published over the past two years — the paper has broken numerous stories that have been picked up by other media outlets in the…ahem…dynamic and shifting environment of the seat of the U.S. federal government.
Ultimately, I showed up as direct traffic to the site and immediately registered. From my tablet. I could not tell the paper how much each of the above factors should be credited for my subscription if I wanted to. I don’t know myself!

It’s important that we understand and accept that any attribution management solution is inherently noisy and imperfect. But, as George Box is famously paraphrased: “All models are wrong, but some are useful.”

As such, it’s useful to break down the different dimensions of attribution, as we did in this post from last year. What is outlined in the remainder of this post is a technique for identifying a “better model than last click” without going overboard with opinion and conjecture. We can do this by taking advantage of the various heuristic models now available in both Google Analytics and Adobe Analytics and applying a little bit of objective statistics to that data. Essentially, this means we are focusing on one specific dimension of attribution management and illustrating a technique for moving from “basic” towards “advanced” in a practical way:

A Primer on Heuristic Models

Let’s start with a definition of a “heuristic model.” Essentially, this is a fancy way of saying “pick your own model” (but if we come right out and say “pick your own model” at a cocktail party, we’re not going to sound nearly as sophisticated and advanced as if we say we “use a heuristic model”). There are countless heuristic models available. The two most dominant web analytics platforms — Google Analytics and Adobe Analytics — both now offer a range of such models (limited to marketing touches that result in a clickthrough to a digital property), and both include convenient visualizations to provide some intuition about the logic behind each model:

Last Interaction or Last Touch is the default/standard/typical model that both Google Analytics and Adobe Analytics — as well as other platforms — used for years. It’s simple, intuitive, and even rational, in that it assigns full credit for the conversion (purchase, booking, lead) to whatever the last identified marketing channel before the conversion was.

The problem with a last touch approach is that it undervalues all of the touchpoints — upper funnel, mid-funnel, even lower-funnel-that-didn’t-quite-lead-to-a-conversion-at-the-time — that occurred before that last interaction:

“We’ve been doing all that great podcast advertising to get awareness of our brand out into the market. And we think that’s been working well. When the customer takes out her earbuds and then searches for us and clicks through on a search ad, it’s unfair that Google AdWords gets all the credit for her purchase!”

For years, Adobe Analytics also offered “first touch” attribution, which is similarly problematic (and introduces measurement latency issues that we won’t go into here…but which you will quickly come across as you dive into this world):

“That’s all well and good that you’ve been doing podcast advertising, but it was our killer paid search activities that actually offered the customer a way to click through and buy when she was ready to purchase. It’s unfair for you to give the podcast ads all of the credit for that order!!!”

What is the “right” model, then? In a heuristic attribution world, that gets left up to the business to decide: how does the organization think it makes the most sense to assign value? The number of heuristic models is endless. Most platforms allow the marketer to come up with their own model in addition to the pre-defined ones that are available out of the box, so, for the marketer who has been aggressively overthinking this topic, there could be an “Inverse U model” or a “W model” or a “Dr. J.” model or anything else!
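To make the idea concrete, here is a minimal sketch of how a few common heuristic models spread credit across a conversion path. The function name and the exact weightings (e.g., a 40/20/40 split for the “U”-shaped model) are illustrative assumptions, not how either platform implements them internally:

```python
# Illustrative credit weights for a few common heuristic models.
# A "path" is the ordered list of marketing channels that touched the
# customer, from first interaction to last.

def assign_credit(path, model="last_touch"):
    """Return {channel: fraction_of_credit} for one conversion path."""
    n = len(path)
    if n == 0:
        return {}
    if model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]       # all credit to the final touch
    elif model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)       # all credit to the first touch
    elif model == "linear":
        weights = [1.0 / n] * n                 # equal credit to every touch
    elif model == "position_based":             # hypothetical 40/20/40 "U" shape
        if n == 1:
            weights = [1.0]
        elif n == 2:
            weights = [0.5, 0.5]
        else:
            middle = 0.2 / (n - 2)
            weights = [0.4] + [middle] * (n - 2) + [0.4]
    else:
        raise ValueError(f"unknown model: {model}")
    credit = {}
    for channel, w in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + w
    return credit

path = ["Podcast", "Organic Search", "Paid Search"]
print(assign_credit(path, "last_touch"))   # Paid Search gets 100% of the credit
print(assign_credit(path, "linear"))       # each touch gets one third
```

Every model here is just a different rule for splitting the same 100% of conversion credit, which is exactly why reasonable marketers can argue endlessly about which rule is “right.”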

This is how heuristic models fundamentally differ from algorithmic (aka, “data-driven”) models. Algorithmic models still require some human judgment — choosing which algorithm to use — but they are geared towards letting the data itself drive the attribution. The downside of algorithmic models is that they’re more complex to implement, and they will feel like a black box to most of the marketers who use them. (Both Google and Adobe are starting to offer algorithmic options off in the hidden corners of their platforms, but they’re not exactly shouting from the rooftops that they have robust solutions for this just yet; it’s complicated!)

What is described in the remainder of this post is a two-step approach for using multiple heuristic models to find an objective way to identify a “good” (better than last touch, better than first touch, still not perfect) heuristic model for your organization to use. Along the way, this approach often yields some useful understanding about how at least some of your paid channels actually work in your marketing ecosystem!

The Wisdom of the Heuristic Crowd

The key idea in this approach is that, rather than picking a single heuristic model that just “feels right,” we can actually do an objective comparison of multiple models, average out the results those models return, and then choose the single model that most closely approximates that average.

The process looks like this:

This approach gives equal weight to each of the models, with the assumption that the amalgamation of all of them is a pretty good result.

Step 1: Look at All the Models

The first step is to pull the revenue (or orders or leads or whatever your primary outcome metric is) for each model and then combine them into a single data set. A boxplot is a useful visualization for this, as, for each channel, it will illustrate the median for all the models, as well as show how much overall variation there is across all of the models (the box in the center represents the interquartile range — so 50% of the models returned a value within that box; the median is the vertical line inside that box).
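The pull-and-plot step can be sketched in a few lines of pandas and matplotlib. The revenue figures below are entirely made up for illustration; in practice each column would be an export from your analytics platform’s model comparison reporting:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical attributed revenue (channels x models) exported from the
# analytics platform; channel and model names here are illustrative.
revenue = pd.DataFrame(
    {
        "last_touch":     [100, 50, 30],
        "first_touch":    [60, 90, 40],
        "linear":         [85, 66, 35],
        "position_based": [70, 75, 38],
        "time_decay":     [82, 72, 39],
    },
    index=["Paid Search", "Display", "Direct"],
)

# One horizontal box per channel: the box spans the middle 50% of the
# model results, and the line inside it is the median across all models.
revenue.T.plot(kind="box", vert=False)
plt.xlabel("Attributed Revenue")
plt.tight_layout()
plt.savefig("model_spread_by_channel.png")
```

A wide box for a channel means the choice of model matters a lot for that channel; a narrow box means the models largely agree.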

This chart on its own is worth pausing for a minute to inspect. Your results will vary, of course, but notice here how, even with multiple different models being tried, most channels don’t vary that much when it comes to their attributed revenue!

The channel that has the highest variability between the different models is Direct, and that is traffic that is very difficult to quickly impact. And, even with that variability, remember what we said about the boxes? Half of the models return results in a very narrow range, even for the channel that has the highest spread overall!

This may or may not be the case for your brand, but pausing here for a bit of a reality check is an important step: every model besides last touch introduces complexity into the process, so it’s worth checking whether an advanced attribution model is actually going to provide results that substantially differ from a last touch model.

We’ll put that aside for now and go ahead and complete the second step.

Step 2: Identify Which Model Fits “Best”

It would not be practical to have an attribution model that requires running multiple models and then finding the median each time. To avoid that, our second step is to determine which single model best approximates the median result across all models.

There are various approaches for doing this, but a simple one is, for each model, to calculate the absolute difference from the median for each channel and add those differences together. Then, simply find the model that has the lowest total value for that metric. In the example above, this turned out to be a Time Decay model with a 3-week half-life, which is shown added as the orange diamond in the updated plot below.
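The sum-of-absolute-differences calculation is a one-liner in pandas. The revenue numbers are again hypothetical placeholders for your own platform export:

```python
import pandas as pd

# Hypothetical attributed revenue by channel (rows) and model (columns).
revenue = pd.DataFrame(
    {
        "last_touch":     [100, 50, 30],
        "first_touch":    [60, 90, 40],
        "linear":         [85, 66, 35],
        "position_based": [70, 75, 38],
        "time_decay":     [82, 72, 39],
    },
    index=["Paid Search", "Display", "Direct"],
)

# Median attributed revenue per channel, taken across all models.
median_by_channel = revenue.median(axis=1)

# For each model: sum of absolute differences from the channel medians.
total_deviation = revenue.sub(median_by_channel, axis=0).abs().sum()

best_model = total_deviation.idxmin()
print(total_deviation.sort_values())
print(f"Closest to the all-model median: {best_model}")
```

With these made-up numbers, the time decay model lands closest to the median for every channel, so it wins; your data will, of course, pick its own winner.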

If we determine that we do, indeed, want to move beyond last touch, then this Time Decay heuristic model looks like it would be a pretty good way to go!

Typically, I like to evaluate multiple timeframes (peak season, slow season, normal-run season) this way — creating the boxplot and finding the best approximation model for each one. If each of the timeframes returns the same result, that’s great! If not, then I look at the top 3 models for each timeframe and identify which model is most consistently near the top of that list. Generally, one or two models quickly emerge as good approximations of the median of all models, regardless of the timeframe, so that’s what I go with!
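That cross-timeframe check can also be done objectively: rank the models within each timeframe by their total deviation from the median, then average the ranks. The deviation scores below are invented for illustration (they would come from running Step 2 once per timeframe):

```python
import pandas as pd

# Hypothetical total deviation-from-median scores per model, computed
# separately for three timeframes (lower = closer to the all-model median).
scores = pd.DataFrame(
    {
        "peak":   {"last_touch": 48, "first_touch": 42, "linear": 12, "time_decay": 1},
        "slow":   {"last_touch": 40, "first_touch": 45, "linear": 8,  "time_decay": 11},
        "normal": {"last_touch": 44, "first_touch": 39, "linear": 10, "time_decay": 3},
    }
)

# Rank models within each timeframe (1 = best), then average the ranks:
# the model with the lowest mean rank is the most consistent performer.
ranks = scores.rank(axis=0)
mean_rank = ranks.mean(axis=1).sort_values()
print(mean_rank)
print(f"Most consistent model: {mean_rank.idxmin()}")
```

Averaging ranks rather than raw deviations keeps a single unusual timeframe from dominating the decision.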


The approach described here is not the be-all/end-all for attribution modeling. It’s still, at its core, based on heuristic models; it doesn’t take into account impression/viewthrough data (although, if that data is available, it certainly can — but that’s not typically available in the digital analytics platform); and it does not factor in customer lifetime value (although, again, it could… if that information were available). It also requires a bit of data crunching (some tedious exporting of data from the digital analytics platform or, as we prefer to do it, querying that data programmatically using Python or R).

But, it’s a nice step forward from simple last click, and it may even turn out that last click returns results close enough to the other models that it makes sense to continue using that approach!

If you’re interested in learning more about this approach, feel free to contact us!
