Considering your marketing strategies and tactics through the lens of an equation can provide powerful clarity, both in how you execute and in how and where you use data and analytics.
What's Your Dependent Variable?
As marketers, we take actions and hope those actions deliver results. Along the way, we expect “data,” in its broadest sense, to help us take the most effective actions at the optimal time in the most efficient manner.
That seems so simple, right? Conceptually, it is. In practice, we know the world is a lot messier. It turns out, having data is not enough. We also have to know which data we care about.
In this post, we will explore a basic feature of the analytics approach—the capability to determine how an action led to results. We will show the impact of that approach on two fronts:
- Forcing clarity as to what meaningful business outcome the activity is expected to deliver
- Laying the groundwork for effectively putting data to actionable use
In particular, we will use a statistical lens to clarify the point. (The actual math in this approach is extremely light!) We will take a brief trip back to grade school and tie what we learned then into the modern world of marketing strategy, marketing analytics, and even data science!
Let’s start with a brief refresher on the formula of a line:
y = mx + b
The dependent variable is y. It’s typically on the left side of the equal sign, and everything to the right of the equal sign is used to calculate the value of y. In this basic example, x is an independent variable, m is the slope (aka, the “coefficient for x”), and b is the intercept, a constant base value. An equation of this form is simply a representation of a line.
That’s just math. To extend this basic idea to the world of statistics and prediction, the formula changes, but, really, not that much. The generalized formula for a linear model is:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$
This is actually the same basic concept (and formula) as our formula for a line, but with some nods to the complexities of the real world:
- With statistics, we’re trying to make a prediction for the value of y (which has become yi in the new equation). This is still our dependent variable and the main subject of this post.
- That prediction will never be exact, so we added an “error term,” εi (which we’re not going to worry about in this post!).
- b became β0 because Greek letters and subscripts are fancier (there’s a real reason, but we won’t pursue that, either, for now).
- The mx was transformed into multiple βnxni terms. In the real world, predicting some yi (our dependent variable) is usually best done with multiple factors (think: the temperature is driven by both the time of day and the day of the year). This means there are multiple independent variables (multiple x’s).
Congratulations! We have now linked “the formula for a line” to “the generalized formula for a linear model.” Many other forms of this basic equation exist, but linear models are powerful and are often the starting point for building a predictive model.
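To make the linear model concrete, here is a minimal sketch in Python that fits one with ordinary least squares. Everything in it is an assumption for illustration: the variable names (daily_revenue, paid_search_spend, is_weekend), the simulated data, and the made-up coefficients used to generate that data. The only point is to show one dependent variable on the left and several independent variables on the right.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_days = 200

# Hypothetical independent variables (the x's).
paid_search_spend = rng.uniform(500, 2000, size=n_days)   # x1: dollars per day
is_weekend = (np.arange(n_days) % 7 >= 5).astype(float)   # x2: 1.0 on 2 of every 7 days

# Made-up coefficients, used only to simulate a dependent variable.
true_beta = np.array([1000.0, 2.5, 300.0])   # beta0 (intercept), beta1, beta2
noise = rng.normal(0, 50, size=n_days)       # the error term

# Design matrix: a column of ones for the intercept, then one column per x.
X = np.column_stack([np.ones(n_days), paid_search_spend, is_weekend])

# The single dependent variable (y): simulated daily revenue.
daily_revenue = X @ true_beta + noise

# Ordinary least squares estimate of beta0, beta1, beta2.
beta_hat, *_ = np.linalg.lstsq(X, daily_revenue, rcond=None)

for name, estimate in zip(["intercept", "paid_search_spend", "is_weekend"], beta_hat):
    print(f"{name}: {estimate:.2f}")
```

Because the data are simulated, the estimates should land close to the made-up coefficients. With real data there is no "true" answer to compare against, which is exactly why choosing the dependent variable carefully matters.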
Let’s Talk About the Dependent Variable
There are two subtle—but powerful—characteristics of the dependent variable (yi):
- There is only one! The right side of the equation—the independent variables and error term—can have many different terms! There can be countless independent variables involved! But, on the left side of the equation, there can be only one dependent variable.
- It’s what matters! The independent variables are a combination of drivers of the dependent variable, some of which we can control (our spend on paid search, the price point for our products) and some of which we cannot (the day of the week, the time of year, our competitors’ product release cycles). But the dependent variable is the outcome we actually care about: the one variable on which we stay laser-focused. This turns out to be pretty profound, even outside of the realm of prediction!
Considering any potential area of investment through the lens of the dependent variable brings clarity to the initiative, which in turn drives efficiency in the execution of the work.
Consider the Dependent Variable
Now, think back to our discussion of how and where a dependent variable works in a predictive model. If we think about any initiative we undertake as, “We’re doing some thing or some collection of things in the hopes that we will get a positive result,” then we’re actually already thinking in terms of an equation:
- The thing or things we are doing are our independent variables
- The result is…our dependent variable!
Framing work this way is valuable because it forces clarity of thought:
- What is it we care most about figuring out or affecting with this thing that we’re doing (what is our dependent variable)?
- What do we think does or should affect that thing (our independent variables)?
- Of those actions or decisions that we think should affect that thing, which are ones that we can influence going forward (actionability)?
- Do we already have data to determine if there is an apparent relationship between our potential actions (based on past actions) and the thing we care about? This is typically a correlation-rather-than-causation scenario unless you are very lucky, but there is power in correlation!
- How much value is there in taking action in a way that will enable us to identify a causal relationship between our actions and the thing we care about? For instance, depending on the dependent variable, it may or may not make sense to roll out a change to the website as an A/B test. (But there are myriad other examples!)
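The last two questions can be sketched in a few lines of Python. The first half checks for an apparent relationship in past data using a correlation coefficient; the second half evaluates a hypothetical A/B test with a standard two-proportion z-test. All of the numbers, the variable names (email_sends, orders), and the helper function two_proportion_z are assumptions made up for this illustration.

```python
import math
import numpy as np

# Hypothetical weekly history: an action we took and the outcome we care about.
email_sends = np.array([10, 12, 8, 15, 20, 18, 25, 22], dtype=float)       # thousands sent
orders = np.array([110, 130, 95, 160, 205, 190, 240, 230], dtype=float)    # dependent variable

# Pearson correlation: an apparent relationship, not proof of causation.
r = np.corrcoef(email_sends, orders)[0, 1]
print(f"correlation between sends and orders: {r:.3f}")


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B test: 10,000 visitors per variation.
z, p_value = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The correlation uses data we already have and costs almost nothing. The A/B test requires randomly splitting traffic, but it is what lets us talk about cause rather than coincidence. Whether that extra effort is worthwhile depends on the dependent variable.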
Now, before you decide this entire concept is just a pile of theoretical hoo-haw, let’s explore some of the ways this can actually be applied.
It’s Simple…Unless It’s Not!
Consider an example of an online retailer that is planning a multi-channel campaign. What is the dependent variable?
Really. What comes to mind for you as the dependent variable? Do any of these seem like they might be your dependent variable (“the thing that you most care about impacting”):
- Conversion Rate
If you’re like most marketers, this seems easy: revenue! Or, maybe you think it’s a trick question.
It sort of is.
Of the four options listed, revenue is likely the closest to being your dependent variable. But, it would be pretty easy to make a case that either profit or customer lifetime value would be more impactful.
So, if those metrics are not readily available, is it fine to simply settle for revenue? Arguably, the correct answer is “No,” for two reasons:
- Just because the metric is not readily available currently doesn’t mean it’s not worth finding out what would be required to make it readily available.
- Even if a proxy (e.g., revenue) has to be used, keeping the “true” dependent variable in mind can still provide useful focus and clarity for the campaign or initiative!
Select Dependent Variables that Are Closest to Your Goal, Not Closest to What’s Available
To be clear, the dependent variable does not always need to be a financial metric. After all, plenty of activities occur well upstream of the financial results they are ultimately expected to drive, and those results may be too far removed from the activities themselves (the independent variables!) to serve as a useful dependent variable.
Consider a life insurance company example where consumers can apply for a policy online, and the marketing team is evaluating a new advertising channel. What is the appropriate dependent variable? Ultimately, the goal is to deliver high-value customers: policies that are approved and written that then generate a profitable recurring stream of revenue.
But, customer lifetime value might not be the most useful dependent variable for this initiative. The process for getting to that lifetime value may look something like:
- The consumer responds to the advertising (either as a direct click-through or as a view-through—that should be well-measured, too!).
- The consumer then submits an application online.
- The consumer may be required to get a medical exam.
- The insurance company’s underwriting team then has to process the application and approve it.
- The consumer begins paying for the policy.
- That payment continues for the life of the policy.
- The consumer may or may not ultimately collect on the policy.
What is the most appropriate dependent variable in this case? There is no “right” answer. Since this is an entirely new advertising channel, it may make sense to use “applications submitted” as the dependent variable:
- This is a dependent variable for which Marketing has the most control over the independent variables. For instance, if the underwriters are consistently returning pricing that is much higher than the market for similar coverage, then there may be a falloff in actual paid policies that is completely out of Marketing’s control.
- If the new advertising channel can’t at least generate a reasonable number of applications, then it can’t possibly deliver the desired downstream financial impact.
This doesn’t mean that “applications submitted” is the best dependent variable, though. Rather, it is an illustration that, by having the discipline to identify a singular dependent variable, an organization forces itself to have a meaningful discussion about what result it is focusing on for any given investment.
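One way to picture the life insurance example is as a funnel with a count at each stage. The sketch below is purely illustrative: the ChannelFunnel class, its field names, and every number in it are hypothetical. It singles out “applications submitted” as the one dependent variable for the new channel, while still tracking the downstream rates that Marketing does not fully control.

```python
from dataclasses import dataclass


@dataclass
class ChannelFunnel:
    """Hypothetical counts for one advertising channel over one period."""
    channel: str
    ad_responses: int
    applications_submitted: int
    policies_approved: int
    policies_paid: int


def downstream_rates(funnel: ChannelFunnel) -> dict[str, float]:
    """Rates after the application stage, largely outside Marketing's control.

    Assumes the denominators are nonzero.
    """
    return {
        "approval_rate": funnel.policies_approved / funnel.applications_submitted,
        "paid_rate": funnel.policies_paid / funnel.policies_approved,
    }


# Made-up numbers for the new advertising channel.
new_channel = ChannelFunnel(
    channel="new_channel",
    ad_responses=4000,
    applications_submitted=320,
    policies_approved=240,
    policies_paid=180,
)

# The single dependent variable chosen for this initiative.
dependent_variable = new_channel.applications_submitted
rates = downstream_rates(new_channel)

print(f"dependent variable (applications submitted): {dependent_variable}")
print(f"downstream context: {rates}")
```

Keeping the downstream rates visible, without promoting them to dependent variables, preserves the focus on one outcome while still flagging when a falloff happens somewhere Marketing cannot reach.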
The Bonus: Thinking This Way Opens the Door to Data Science!
There is a nontrivial side benefit of thinking in terms of independent variables and a dependent variable: it’s the language of machine learning and data science!
Guaranteed, when you sit down with a data scientist to talk about the advanced analytics you are looking to bring to your initiative, you will be asked a couple of questions right off the bat:
- “What’s our dependent variable for this?”
- “What is our unit of analysis?”
We’re going to leave the nuts and bolts of that second question for another day, but having a solid answer for the first question will catapult the discussion forward!
The Dependent Variable for This Post
It only seems fair, since I put a reasonable level of effort into penning this post, that I share what my dependent variable is for publishing it. Why did I write it? What outcome am I hoping to achieve? I’m hoping to simply arm analysts and marketers with a way of thinking and a technique that they can readily understand and apply on their own. If you’re so inclined, you can click this link (edit the tweet as you see fit) and let us know if I impacted my dependent variable!