How to Reveal a Bot Problem in Social and Display Campaign Tracking

This post is the first in our series on the invisible arms race against bots. Follow along at #invisiblearmsrace #botsareathing.
Got tracking discrepancies? Here’s how we got to the bottom of a 98% tracking and reporting discrepancy for a global leader in logistics and shipping.

We’ve been accepting bots as a normal cost of doing business for a long time. We don’t change our marketing spend because of bots; we expect they’ll be there, and we just kind of pretend everything’s fine.

Everything is not fine! It is extremely convenient for us to all go about our digital business and ignore the bots. However, it’s an inconvenient truth that we’re all likely buying bots (a LOT of them), which I’ve written about on my blog at jimalytics.com.

Lukas Oldenburg writes about how none of our standard “solutions” for dealing with bots are good enough. Detection comes too late: by the time a bot is spotted, your data is already in the tool or system of your choice, so you’ve paid for it and can no longer filter it out. Detection also produces too many false positives and negatives: if you relied on its judgment alone, you’d lose human visitors and still keep just as many bots. Worst of all, the old “ignore-them-and-they-won’t-matter” gambit is starting to show severe diminishing returns.

We’re now at an inflection point: The bots are maturing. It’s time to admit that bots matter, that “managing” bots isn’t effective, and that aggressively filtering out bots or filtering IN humans may become a standard practice.

When a giant data discrepancy showed up for one of our clients between their paid clicks and site traffic, we (Charlie Tysse, Sam Burge, and I at Search Discovery) dug in to try to understand why. Spoiler alert: the problem was bots. But to reveal this, we had to audit each step in the data flow to test for faults.

Data Flow Audit

Like a rocket launch, many things need to go right for tracking to work properly, and only one thing needs to go wrong to lose a payload. In order to ensure there were no technical issues with the implementation as it relates to social traffic, we needed to validate the data flow from end to end.


Testing Methodology

Before jumping to any conclusion of non-human interference, we needed to make sure all the tech that tracks users was working as intended. We painstakingly reviewed and documented each minor and major milestone that occurs between an ad click and a data point populating in the analytics tools. Errors happen, and unintended consequences of a complex implementation are common. Maybe the query string was stripped. The consent manager might be misbehaving. Browsers might be blocking the server calls. There’s always some small level of data distortion from things like ad blockers; however, that would account for a 10-20% discrepancy, not a 99% discrepancy. With the technical implementation validated, the next logical step was to go straight to the source: server logs.
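As a concrete example of one check in that methodology, here’s a minimal sketch, in Python, of verifying that campaign query-string parameters survive the redirect chain to the landing page. The `cid` parameter name and URLs are hypothetical; substitute your own tracking parameters.

```python
from urllib.parse import urlparse, parse_qs

def campaign_params_survived(landing_url, expected_params=("cid",)):
    """Check that expected campaign query parameters are still present
    on the final landing-page URL (i.e., weren't stripped en route)."""
    qs = parse_qs(urlparse(landing_url).query)
    return all(p in qs for p in expected_params)

# A URL that kept its campaign ID passes; a stripped one fails.
print(campaign_params_survived("https://example.com/landing?cid=abc123"))  # True
print(campaign_params_survived("https://example.com/landing"))             # False
```

Running a check like this against the final URLs captured in test clicks quickly rules query-string stripping in or out as a fault point.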

Audit Findings

We approached the Akamai phase of the data flow with the hypothesis that Akamai was preventing consent from being served to social traffic. In fact, we found that Akamai Bot Manager blocked a significant amount of social-channel traffic, so that traffic was never served consent. But we needed to understand what traffic was being blocked and whether any of the traffic that wasn’t blocked should have been.
In other words, while Akamai Bot Manager was effective at filtering out bot traffic, the bot traffic volume signaled symptoms of a broader issue with non-human traffic. Here are our findings on social Spotlight landing pages, where Akamai was registering and blocking a large amount of bot traffic.


For the top 43 campaign CIDs, 84% of traffic picked up by Bot Manager was blocked and 16% was flagged for further monitoring. This total view included Spotlight landing page requests for Facebook, LinkedIn, and display ads that showed up in Akamai Bot Manager reporting.


For the top nine CIDs, an estimated 27% of clicks were picked up by Bot Manager for review; of that traffic, 67% was being blocked and 33% was tagged for further monitoring. This total view also included Spotlight landing page requests for Facebook, LinkedIn, and display ads that showed up in Akamai Bot Manager reporting.
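The blocked/monitored splits above are simple shares of the traffic Bot Manager picked up. A quick sketch of the arithmetic, with illustrative counts rather than the client’s actual request numbers:

```python
def bot_manager_shares(blocked, monitored):
    """Split traffic picked up by Bot Manager into blocked vs. monitored
    percentage shares (of everything Bot Manager flagged)."""
    picked_up = blocked + monitored
    return (round(100 * blocked / picked_up), round(100 * monitored / picked_up))

# Top-43-CID view: 84% blocked, 16% monitored (counts are illustrative).
print(bot_manager_shares(8400, 1600))  # (84, 16)
# Top-nine-CID view: 67% blocked, 33% monitored.
print(bot_manager_shares(6700, 3300))  # (67, 33)
```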


The consent phase required thought but seemed okay: Further along in the consent phase of the data flow, we checked to ensure that OneTrust was implemented correctly. The OneTrust implementation doesn’t use the official Launch extension and is fairly bespoke, so we developed an outline to show our client the general mechanics of their OneTrust implementation. This way, the broader team could understand the flow and setup as we explored potential fault points.

We found that consent categories mapped correctly, the analytics beacon fired after the user gave consent, and marketing pixels fired after the user gave consent. After confirming traffic was getting to OneTrust, we needed to confirm that the traffic Akamai passed through actually saw the OneTrust prompt. Without alignment between the actions users took on the OneTrust prompt and the data collected in Akamai, we had to assume that a majority of traffic hitting the OneTrust prompts was bots.

Here, it should be noted that some site visitors never make a selection, which results in tags not firing. If they do make a selection on any page past entry, the campaign data is lost and the visit is considered “direct” from the analytics system’s point of view.

Launch was a go: When we examined the client’s Launch implementation, we found that while there are a few conditions that exclude traffic from pageviews, there was nothing that would disqualify a high volume of traffic. A look at the Launch library file hosted by Adobe revealed that the CMP library loaded via custom code at the top of the page, consent policy groups updated in accordance with user preferences, data elements mapped correctly, and cookies were set in a first-party context. (How these are set determines the likelihood of a delta between entry and the ultimate desired conversion event, but that’s a subject for another post!)

Browser phase behaved as expected: In the browser phase of the data flow, we examined beacons and pixels to see if perhaps the analytics server call and/or the social media marketing pixels weren’t making it through. Our client wasn’t using a CNAME record for its analytics tracking server, which may have a marginal effect on actual server calls; however, we did find that cookies are being set in a first-party context once consent is given. The analytics beacon fired after users gave consent and contained all variables as set in the Launch rules. Further, our client’s pixels were configured correctly and fired after users gave consent.
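To illustrate the kind of beacon check described above, here’s a rough sketch that inspects an analytics image request for expected variables. The parameter names (`v0` for campaign, `pageName`) are typical Adobe Analytics query keys, and the beacon URL is hypothetical; verify the names against your own implementation.

```python
from urllib.parse import urlparse, parse_qs

def beacon_has_vars(beacon_url, required=("v0", "pageName")):
    """Confirm an analytics beacon carries the variables set in Launch rules.
    Returns (ok, list_of_missing_params)."""
    qs = parse_qs(urlparse(beacon_url).query)
    missing = [k for k in required if k not in qs]
    return (len(missing) == 0, missing)

# Hypothetical beacon captured from the browser's network panel.
ok, missing = beacon_has_vars(
    "https://metrics.example.com/b/ss/suite/1/JS-2.22.0/s123?v0=cid123&pageName=Landing")
print(ok, missing)  # True []
```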

Reporting checked out: Finally, we checked the Analytics report suite configuration to see if it might be preventing data from being collected. It all checked out. The reports we analyzed were generated in Workspace and are subject to some limitations of Adobe Analytics (e.g., Low Traffic bucketing), but otherwise, there didn’t appear to be any issues with the configuration. While the reports might have trimmed some traffic, they didn’t trim enough to account for the 90%+ differential.

It should be noted that both Google Analytics and Adobe Analytics have controls available that filter well-known bots based on a list from the IAB (and other bodies); in Adobe Analytics, these live in the report suite’s Bot Rules settings. Depending on the configuration, this can account for some of the delta between platforms if bots are a consideration.
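To illustrate the general idea behind this list-based filtering (not Adobe’s or Google’s actual implementation), here’s a crude sketch that matches user agents against a handful of substrings. The substrings are hand-picked for illustration; the real IAB/ABC International Spiders & Bots list is far larger and distributed under license.

```python
# Illustrative substrings only; a production list would come from the IAB.
BOT_UA_SUBSTRINGS = ["googlebot", "bingbot", "headlesschrome", "python-requests"]

def is_known_bot(user_agent):
    """Crude list-based check: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(s in ua for s in BOT_UA_SUBSTRINGS)

print(is_known_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # True
print(is_known_bot("Mozilla/5.0 (Linux; Android 9) Chrome/74"))  # False
```

The limitation is exactly the one this post is about: malware that spoofs a mainstream browser UA sails straight past a list like this.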

Synthesizing the findings

Akamai Bot Manager detected significant non-human traffic to social and display campaign landing pages. Most of that traffic was filtered out and blocked before reaching the landing pages; even so, non-human traffic played a significant role in the marketing traffic being reported. We recommended that detection and filtration be prioritized, since non-human traffic was the underlying cause of the nearly 98% discrepancy between social and display campaign reporting and Adobe reporting.

Considerations

Before we could make recommendations for our client’s actionable next steps, it was important to recognize a couple of challenges to our findings: The impact of the consent opt-in rate and some misaligned Akamai data.

First, our client’s consent management platform, OneTrust, only provides summary-level consent opt-in rates, so granular reporting (e.g., at the page level) that might reveal more definitive explanations for tracking discrepancies isn’t possible in that phase of the data flow.

Second, full Akamai request data is only available for 48 hours, which limited our ability to access the total number of landing page requests. We retrieved Akamai Bot Manager reporting for a two-week window. In that data set, campaign data for CIDs captured in Bot Manager reporting wasn’t always 1:1: timeframes and breakouts couldn’t always be aligned, leaving ambiguity in the Akamai phase of the data flow, too.

Digging Deeper

We had a pretty good notion of where in the data flow the discrepancy was introduced, so we pulled a Data Warehouse request to get data on user agents and IP addresses. Here’s what we found:


As suspected, most requests were from an Android 9 standalone Chrome (not WebView) user agent. The referring URL hostname was api.outbrain.net, there were some subnet patterns in the IP addresses but nothing that suggested bots, and the location appeared to be the Detroit metro area.
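Here’s a sketch of the kind of aggregation we ran on the Data Warehouse export to surface dominant user agents and IP subnet clusters. The rows are hypothetical; a real export would have one row per hit with many more columns.

```python
import ipaddress
from collections import Counter

# Hypothetical Data Warehouse rows (user agent + IP address per hit).
rows = [
    {"user_agent": "Mozilla/5.0 (Linux; Android 9) Chrome/74.0", "ip": "68.40.12.7"},
    {"user_agent": "Mozilla/5.0 (Linux; Android 9) Chrome/74.0", "ip": "68.40.12.99"},
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/103.0", "ip": "10.1.2.3"},
]

# Which user agents dominate the traffic?
ua_counts = Counter(r["user_agent"] for r in rows)

# Do hits cluster in a few /24 subnets?
subnet_counts = Counter(
    str(ipaddress.ip_network(f'{r["ip"]}/24', strict=False)) for r in rows)

print(ua_counts.most_common(1))      # the Android 9 Chrome UA leads with 2 hits
print(subnet_counts.most_common(1))  # [('68.40.12.0/24', 2)]
```

A handful of user agents or subnets accounting for an outsized share of hits is exactly the shape of anomaly that prompted the malware hypothesis below.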

These findings didn’t suggest obvious bot traffic, but an outdated version of Android (released Aug. 6, 2018) was disproportionately represented, and this version is known to be relatively easy to exploit. This could indicate malware, installed on vulnerable devices through an app listed in an app store, that is designed to look like a human user and generate clicks on ads (click spam and click injection). Hits from the malware appear to come from a human user, except for the absurdly high traffic volume.

Why does this matter? While we haven’t found evidence that points to wide-scale ad fraud, these findings do suggest the following:

  • Non-human traffic can make it to our client’s landing pages despite current bot detection and prevention measures. This means that millions of dollars spent on advertising over the past year are paying for bot clicks.
  • Display campaigns may be more susceptible to these techniques, since generating impressions and clicks for them is easier. For example, fewer users were blocked from unbranded search terms than from social platform profile attributes, because social platforms have an element of programmatic marketing that can be exploited.
  • It’s possible that social traffic is not deflated so much as other paid media traffic is inflated. 
  • For one campaign, we even saw a 1:1 ratio between clicks and “monitored” traffic (suspected bot traffic). That monitored traffic would get to the consent manager, and that’s as far as it could be measured.
  • We saw this result with nearly all paid campaigns: close to 99% non-human traffic. This is the outcome of millions of advertising dollars spent over the past year, and it’s a huge problem for the company.

If this is true, incalculable digital ad dollars are being spent on non-human traffic. In the EU, consent managers seem to act as a tracking dead end for these bots; in the US, the problem is likely flying under the radar.

Recommendations

Additional data points are required to validate our hypothesis that bots are causing the data discrepancies. During internal peer reviews, we discovered that this phenomenon isn’t unique to our client but is instead impacting the broader market. In fact, we performed this same analysis with two other large companies and saw strikingly similar results.

Sharing information between teams will accelerate fact-finding and add urgency for media platforms that want to continue to sell high-quality traffic. Here’s where further data could be collected and shared to validate the impact of bots on tracking:

  1. User Clicks an Ad
    Ask: Can we follow social campaign traffic end-to-end from click to consent?
    To understand the full impact of bot traffic and be able to tap into all Akamai reporting (bot traffic and human traffic alike), run test social campaigns in specific geo-locations to track traffic from click to Akamai to yourwebsite.com. Coordinate and develop test plans with Adrivo, Marketing, IT, and analytics.
  2. Akamai Bot Detection
    Ask: Could Akamai be filtering too many people?
    This is possible but unlikely. Akamai typically takes a conservative approach to filtering bots, lest it accidentally exclude real people. It is more likely that Akamai isn’t filtering enough. Review Akamai Bot Manager rules to make sure real people aren’t being excluded.
  3. OneTrust Consent
    Ask: Could people be opting out of tracking at a much higher rate when they reach a landing page?
    While possible, this would imply the opt-in rate for the campaign pages is less than 5%, while the overall opt-in rate for our client’s site is 60%-80%. Deploy a test to determine whether users are even selecting an option in OneTrust.
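The arithmetic behind that “less than 5%” implication in point 3 is simple: if every reported click were human, the analytics shortfall would have to come entirely from consent opt-outs. A sketch with illustrative numbers:

```python
def implied_opt_in_rate(reported_clicks, tracked_visits):
    """If every click were human, the analytics shortfall would have to be
    explained entirely by consent opt-outs; return that implied opt-in rate."""
    return tracked_visits / reported_clicks

# With a 98% discrepancy (illustrative: 100,000 clicks, 2,000 tracked visits),
# the implied opt-in rate is 2%, far below the site-wide 60%-80%.
print(f"{implied_opt_in_rate(100_000, 2_000):.0%}")  # 2%
```

An implied rate that far below the observed site-wide opt-in rate is strong circumstantial evidence that the missing traffic was never human to begin with.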

Bottom line

Reporting discrepancies demonstrate that something is amiss in tracking, not just for our client but perhaps widely throughout the industry. We recommend that our clients and readers conduct further data collection and testing throughout the data flow and continue to share stories about the potential impact of bots on tracking data. For analysts, this is a discussion we can’t afford not to have. At Search Discovery, we’ll be having it, and we’ll be trying to get to the bottom of some things. We hope you stay tuned.

Suspect bots? Contact us to get to the bottom of things and mature your analytics ahead of the bots.
