
GA4 Guide Chapter 3: How Data is Collected on a Website or Native Application


Sep 13, 2022

A quick overview of how Analytics tools work (JavaScript or mobile SDK). How to use dev tools to see what data an analytics tool is collecting.

THE BASICS: HOW DO ANALYTICS TOOLS COLLECT DATA?

Google Analytics is outrageously popular. As I’m writing this at the close of 2022, W3Techs is reporting that Google Analytics is installed on “86% of all websites whose traffic analysis tool we know,” and the platform has been the Beyoncé of digital marketing ever since Google acquired Urchin and relaunched it as Google Analytics in 2005.

This broad adoption has allowed Google Analytics to set the standard for how marketers and web developers measure success. For example, if Google Analytics had not introduced us all to the concept of a session, we would likely still be measuring traffic volume in hits or server logs.

To understand how Google Analytics got here, let’s take a quick look at where it came from.

WHERE DID GOOGLE ANALYTICS COME FROM?

We’ve talked a lot about Google Analytics. But let’s return to the basics: How do analytics tools work to collect data? In this chapter, we’ll review how data is collected on a website or native application, including how to use dev tools to see what data is being collected.

THE BEHAVIORAL TRAFFIC BACKSTORY

You might be wondering how data gets from point A to point B. How does website traffic, mobile app traffic, or other high-level data get collected? To understand how this is done today, it is helpful to understand how behavioral tracking has evolved over the past 30 years or so.

In the 1990s, website developers commonly deployed page script hit counters. These simple calculators would call a server every time the page loaded, and this allowed the server to tally the number of visits to the page. This gave you a sense of how often a page was being viewed, but it was difficult to garner any nuanced reporting from the script counter.

The more sophisticated organizations and enterprises of the 1990s used server log analysis to collect certain metadata along with the server call, and enable analysts to do slightly more advanced analysis. Server log analysis was also limited in its reporting, and because of the extra expense, it was not a feasible option for most companies.

The introduction of cookies changed the way data could be collected and reported. Cookies are small pieces of text sent to your browser by the website you’re visiting. Cookies are stored on your computer and help websites remember information about your visit as you navigate from page to page, or between sessions. Cookies make it possible to recognize a new vs. returning user, and much more.
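As a rough illustration of the new vs. returning distinction, here is a minimal Python sketch of what a server might do with the cookie it receives. The cookie name `visitor_id` and the 30-day lifetime are assumptions made up for this example, not values used by any particular analytics tool.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"          # hypothetical cookie name
COOKIE_MAX_AGE = 60 * 60 * 24 * 30  # 30 days in seconds, an arbitrary choice


def identify_visitor(cookie_header: str) -> Tuple[str, bool, Optional[str]]:
    """Return (visitor_id, is_returning, set_cookie_value_or_None)."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse the Cookie header sent by the browser

    if COOKIE_NAME in jar:
        # The browser already holds our cookie: a returning visitor.
        return jar[COOKIE_NAME].value, True, None

    # No cookie yet: a new visitor. Create an ID and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    response = SimpleCookie()
    response[COOKIE_NAME] = visitor_id
    response[COOKIE_NAME]["path"] = "/"
    response[COOKIE_NAME]["max-age"] = COOKIE_MAX_AGE
    return visitor_id, False, response[COOKIE_NAME].OutputString()


# First visit: no Cookie header, so a Set-Cookie value is produced.
print(identify_visitor(""))
# Later visit: the browser sends the cookie back and the visitor is recognized.
print(identify_visitor("visitor_id=abc123; theme=dark"))
```

The same ID arriving on a later request is what lets a report count one returning user instead of two new ones.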

The next major advancement came with the introduction of tracking pixels. Tracking pixels can capture more detailed user behavior because their code executes in the browser rather than on a server. They can read and write cookies and use JavaScript to detect events. When a call is made to transfer this data to the server, the browser requests a tiny transparent image (the pixel), the data travels along with that request, and the server returns the image in response. By the early 2000s, the use of cookies and tracking pixels by innovative startups like Urchin and Omniture had dramatically improved the way we collect data and generate server calls.
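To make the pixel idea concrete, here is a small Python sketch of how the data might be packed into the image request. The endpoint `analytics.example.com/pixel.gif` and the parameter names `event`, `page`, and `vid` are hypothetical; real tools define their own.

```python
from urllib.parse import urlencode

# Hypothetical collection endpoint that answers with a 1x1 transparent image.
PIXEL_ENDPOINT = "https://analytics.example.com/pixel.gif"


def build_pixel_url(event: str, page: str, visitor_id: str) -> str:
    """Encode the event data as query parameters on the pixel request."""
    params = {"event": event, "page": page, "vid": visitor_id}
    return f"{PIXEL_ENDPOINT}?{urlencode(params)}"


print(build_pixel_url("page_view", "/products/42", "abc123"))
# https://analytics.example.com/pixel.gif?event=page_view&page=%2Fproducts%2F42&vid=abc123
```

The image itself carries no information; everything of interest is in the query string of the request.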

OVERVIEW OF THE SERVER CALL

Server calls package and send data from a website to a server. Once a server call is received by the server, the data is parsed and sent to a database. From there, the collected data is used to populate reports.
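The receiving side of that flow can be sketched in a few lines of Python. This is only an illustration under assumptions: the URLs and parameter names are invented, and a `Counter` stands in for the database and the report.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def parse_server_call(url: str) -> dict:
    """Parse the query string of a received server call into a flat record."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in query.items()}


def build_report(urls) -> Counter:
    """Count events per (page, event) pair, standing in for a stored report."""
    report = Counter()
    for url in urls:
        row = parse_server_call(url)
        report[(row.get("page", "(unknown)"), row.get("event", "(unknown)"))] += 1
    return report


received = [
    "https://analytics.example.com/pixel.gif?event=page_view&page=%2Fproducts%2F42&vid=abc123",
    "https://analytics.example.com/pixel.gif?event=page_view&page=%2Fproducts%2F42&vid=def456",
    "https://analytics.example.com/pixel.gif?event=add_to_cart&page=%2Fproducts%2F42&vid=abc123",
]
for (page, event), count in build_report(received).items():
    print(page, event, count)
```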

Think of it this way: you’re shopping on your favorite online store and click to see a product. The next page shows a detailed product description, an image, and options to buy. The website knew what to send you—which image, which description, which options—because of server calls. And your interaction with those items provided valuable data for the website to store and report.

But what does a server call look like?


Figure: A traditional client-side tagging implementation. Data layer events send data from your website to the tag manager library loaded on your site. Then, when events happen, the tag manager can send data out to a predefined set of endpoints.


Your browser's developer tools (dev tools) expose the code running on the website, including the server calls it makes. Server calls are generated via code. On websites, server calls are written in JavaScript. Other platforms, including mobile apps, use their own SDKs to make server calls.

Server calls are sent as GET or POST requests, and the request method determines how the data is submitted to the server. GET requests carry all of their data in the URL, as query parameters, while POST requests carry their data in the request body, which leaves room for larger payloads. A URL loaded with a GET request can remain in browser history along with its parameters, while the body of a POST request is not saved in browser history.
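Here is a minimal Python sketch of the same payload prepared both ways. Nothing is actually sent; the requests are only constructed so you can see where the data ends up. The endpoint and field names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://analytics.example.com/collect"  # hypothetical endpoint
payload = {"event": "add_to_cart", "page": "/products/42"}

# GET: the data is appended to the URL as a query string; there is no body.
get_request = Request(f"{ENDPOINT}?{urlencode(payload)}", method="GET")

# POST: the URL stays clean and the data travels in the request body.
post_request = Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url, get_request.data)
print(post_request.get_method(), post_request.full_url, post_request.data)
```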

Have you ever landed on a page and gotten a “404 Page Not Found” error? That’s an example of a status code sent back in response to a server call. In this case, the server call failed. But successful server calls also receive a status code, even if it isn’t as visible to the user.

Whenever a server responds to the server call request, it sends back a status code to the browser. Status codes are organized into five categories:

Informational (1xx): These codes are issued when a request has been received and understood so far. They tell the client to wait for a final response.
Successful (2xx): These codes indicate that the request was received, understood, and accepted. Examples include 200 OK and 204 No Content.
Redirection (3xx): This type of code indicates that the client needs to take additional steps to complete the request, such as following a URL redirection.
Client Error (4xx): These codes appear when the error lies with the request itself, so the server cannot or will not fulfill it. Common codes include 404 Not Found and 408 Request Timeout.
Server Error (5xx): When the server fails to complete an apparently valid request, a server error code occurs.
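The five categories map directly onto the first digit of the status code. A short Python sketch, using the standard library's `http.HTTPStatus` for the reason phrases (the helper name `describe_status` is made up for this example):

```python
from http import HTTPStatus

# The first digit of a status code identifies its category.
CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its category."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a code the standard library knows about.
        phrase = "Unrecognized"
    return f"{code} {phrase} ({category})"


for code in (100, 200, 301, 404, 408, 500):
    print(describe_status(code))
```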

Page URLs are also connected to server calls. URLs are composed of three main segments:

Protocol: This is how the browser gets information about the page, either http or https.
Domain name: This is the address where the site is located. The subdomain comes before the main domain and may help organize website content. This might be www or some other configuration of letters.
Path: This is the exact page location.

Take this page URL for example: https://www.searchdiscovery.com/the-complete-guide-to-ga4/

The protocol is “https”, which means the browser retrieves the page over a secure, encrypted connection. The domain is “searchdiscovery.com” and the subdomain is “www”. The path is the remaining string, “/the-complete-guide-to-ga4/”, which brings the user directly to the specific page on the domain.
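The same breakdown can be done in code. This Python sketch uses the standard library's `urlsplit`; the subdomain split is deliberately naive and assumes a single-label subdomain such as “www”, which holds for this URL but not for every domain.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Break a page URL into protocol, subdomain, domain, and path."""
    parts = urlsplit(url)
    # Naive assumption: everything before the first dot is the subdomain.
    subdomain, _, domain = parts.hostname.partition(".")
    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "path": parts.path,
    }


print(split_url("https://www.searchdiscovery.com/the-complete-guide-to-ga4/"))
```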

How does this relate to server calls? The URL determines which host server receives the request and which resource a GET request asks for. Information collected from the website can be organized by the path in the URL. If the path is mistyped, the server cannot find the page and returns an error status code such as “404 Page Not Found”.

OVERVIEW OF SERVER-SIDE TAG MANAGERS

So far, we’ve discussed how server calls are used to send data to a data collection server like those managed by Google Analytics. With the development of server-side tag managers, the server call can first be sent to a middleman, which then passes it on to your endpoint for data collection. Using a server-side tag manager (like server-side Google Tag Manager) has a few key benefits:

It allows developers to remove advertising code from the website and have it run on the server instead, which can simplify the processing done by your browser or mobile device and potentially improve page load speed.
Website owners can centrally validate the data that is collected across all environments (web, iOS and Android) before passing it along to the data collection server.
Developers can enrich the data before passing it along to the data collection server. There are many use cases for this, but the most common is to transfer personally identifiable information about a user between endpoints without exposing that data to the client (if the user has provided the proper consent, of course).
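The validation and enrichment steps can be pictured with a small Python sketch. This is not how server-side Google Tag Manager is configured; it only illustrates the idea. The field names, the allowed platforms, and the `CUSTOMER_TIERS` lookup table are all assumptions invented for the example.

```python
from typing import Optional

REQUIRED_FIELDS = ("event", "platform", "client_id")  # hypothetical schema
ALLOWED_PLATFORMS = {"web", "ios", "android"}

# Hypothetical server-side lookup standing in for a CRM or customer database.
CUSTOMER_TIERS = {"client-123": "gold"}


def validate(hit: dict) -> list:
    """Return a list of problems; an empty list means the hit is valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not hit.get(name)]
    if hit.get("platform") and hit["platform"] not in ALLOWED_PLATFORMS:
        problems.append(f"unknown platform: {hit['platform']}")
    return problems


def enrich(hit: dict) -> dict:
    """Add server-side data, but only when the user has given consent."""
    enriched = dict(hit)
    if hit.get("consent") is True and hit.get("client_id") in CUSTOMER_TIERS:
        enriched["customer_tier"] = CUSTOMER_TIERS[hit["client_id"]]
    return enriched


def process(hit: dict) -> Optional[dict]:
    """Return the hit to forward to the collection server, or None if invalid."""
    if validate(hit):
        return None
    return enrich(hit)


print(process({"event": "purchase", "platform": "ios", "client_id": "client-123", "consent": True}))
print(process({"event": "purchase", "platform": "tv", "client_id": "client-123"}))
```

Because the lookup happens on the server, the enriched value never has to be exposed to the browser or app.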

Server-side tag managers can also extend cookie life. Cookies written by client-side JavaScript are limited by some browsers to 1–7 days of validity, while cookies set from the server can be good for up to 30 days, giving greater data visibility and reporting.

ENTER THE ANALYTICS TOOL

Your data has been collected via server calls and server-side tag managers. Now what? Analytics tools retrieve the data from one or more systems and collate it in a repository, such as a data warehouse. The data is then ready to be reviewed and analyzed.

Analytics tools provide insight into your data through segmentation and other kinds of analysis. They also allow you to evaluate the health of your website and its performance. Insights gained from analytics tools inform teams on how best to optimize their websites to encourage the visitor actions they want to see.

When choosing an analytics tool for your organization, you’ll want to consider how user-friendly it is and how easy it is to learn. You’ll also want to look at its reporting capabilities and what options you have for segmenting, filtering, and visualizing your data.

Analytics tools are incredibly important to businesses, but it all starts with server calls. Server calls lay the foundation for websites to collect the data they need.
