**Considering your marketing strategies and tactics through the lens of an equation can provide powerful clarity, both in how you execute and in how and where you use data and analytics.**

#### What's Your Dependent Variable?

As marketers, we take actions and hope those actions deliver results. Along the way, we expect “data,” in its broadest sense, to help us take the most effective actions at the optimal time in the most efficient manner.

That seems so simple, right? Conceptually, it is. In practice, we know the world is a lot messier. It turns out, having data is not enough. We also have to know which data we care about.

In this post, we will explore a basic feature of the analytics approach—the capability to determine how an action led to results. We will show the impact of that approach on two fronts:

- Forcing clarity as to what *meaningful business outcome* the activity is expected to deliver
- Laying the groundwork for effectively putting data to *actionable use*

In particular, we will use a statistical lens to clarify the point. (The actual math in this approach is extremely light!) We will take a brief trip back to grade school and tie what we learned then into the modern world of marketing strategy, marketing analytics, and even data science!

**It All Goes Back to the Formula for a Line**

Let’s start with a brief refresher on the formula of a line:

*y=mx+b*

The dependent variable is *y*. It’s typically on the left side of the equal sign, and everything to the right of the equal sign is used to calculate the value of *y*. In this basic example, *x* is an *independent variable*, *m* is the slope (aka, the “coefficient for *x*”), and *b* is the intercept, a constant base value. An equation of this form is simply a representation of a line.

That’s just math. To extend this basic idea to the world of statistics and prediction, the formula changes, but, really, not that much. The generalized formula for a linear model is:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$$

This is actually the same basic concept (and formula) as our formula for a line, but with some nods to the complexities of the real world:

- With statistics, we’re trying to *make a prediction* for the value of *y* (which has become $y_i$ in the new equation). This is still our *dependent variable* and the main subject of this post.
- That prediction will never be exact, so we added an “error term,” $\varepsilon_i$ (which we’re not going to worry about in this post!).
- *b* became $\beta_0$ because Greek letters and subscripts are fancier (there’s a real reason, but we won’t pursue that, either, for now).
- The *mx* was transformed into *multiple* $\beta_n x_{ni}$ terms. In the real world, predicting some $y_i$ (our dependent variable) usually is best done by using multiple factors (think: predicting the temperature may be driven by both the time of day and the day of the year). This means there are multiple *independent variables* (multiple x’s).
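To make the generalized formula a little more concrete, here is a minimal sketch of fitting a linear model with two independent variables. It is an illustration rather than a recommended toolchain: the weekly numbers are made up, the names (`fit_linear_model`, `weekly_inputs`, `weekly_orders`) are hypothetical, and the fit uses plain ordinary least squares written in pure Python so that nothing needs to be installed.

```python
# Illustrative sketch only: ordinary least squares in pure Python.
# All data and names below are hypothetical, not from a real campaign.

def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_model(xs, ys):
    """Return [beta_0, beta_1, beta_2, ...] for y = beta_0 + beta_1*x1 + beta_2*x2 + ..."""
    rows = [[1.0] + list(x) for x in xs]  # the leading 1.0 carries the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(betas, x):
    """Evaluate beta_0 + beta_1*x1 + beta_2*x2 + ... for one observation."""
    return betas[0] + sum(b * v for b, v in zip(betas[1:], x))


# Independent variables per week: (paid search spend in $k, product price in $)
weekly_inputs = [(10, 20), (12, 18), (15, 22), (18, 19), (20, 25), (25, 21)]
# Dependent variable: orders per week (exactly one value per row)
weekly_orders = [11, 19, 14, 30, 14, 37]

betas = fit_linear_model(weekly_inputs, weekly_orders)
# The leftover gap between actual and predicted values plays the role of the error term.
residuals = [y - predict(betas, x) for x, y in zip(weekly_inputs, weekly_orders)]

print("intercept and coefficients:", [round(b, 2) for b in betas])
print("error terms:", [round(e, 2) for e in residuals])
```

Notice the shape of the inputs: each week can carry as many independent variables as we like, but there is exactly one dependent value per week.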

Congratulations! We have now linked “the formula for a line” to “the generalized formula for a *linear model*.” Many other forms of this basic equation exist, but linear models are powerful and are often the starting point for building a predictive model.

**Let’s Talk About the Dependent Variable**

There are two subtle but powerful characteristics of the dependent variable ($y_i$):

**There is only one!** The *right* side of the equation (the independent variables and error term) can have many different terms! There can be *countless* independent variables involved! But, on the left side of the equation, *there **can be only one** dependent variable*.

**It’s what matters!** The independent variables are a combination of drivers of the dependent variable, some of which we can control (our spend on paid search, the price point for our products) and some that we cannot (the day of the week, the time of year, our competitors’ product release cycles). But, the dependent variable is *what we care about* because it is the substantively important variable on which we are laser-focused. This turns out to be pretty profound, even outside of the realm of prediction!

Considering any potential area of investment through the lens of the dependent variable brings clarity to any initiative which, in turn, drives efficiency in the execution of the work.

**Consider the Dependent Variable**

Now, think back to our discussion of how and where a dependent variable works in a predictive model. If we think about *any* initiative we undertake as, “We’re doing some *thing* or some collection of *things* in the hopes that we will get a positive result,” then we’re actually already thinking in terms of an equation:

- The thing or things we are doing are our *independent variables*
- The result is…our *dependent variable*!

There is real value in framing work this way because it forces some clarity of thought:

- What is it we care most about figuring out or affecting with this thing that we’re doing (what is our dependent variable)?
- What do we *think* affects, or should affect, that thing (our independent variables)?
- Of those actions or decisions that we think should affect that thing, which are ones that we can influence going forward (actionability)?
- Do we already have data to determine if there is an apparent relationship between our potential actions (based on past actions) and the thing we care about? This is typically a correlation-rather-than-causation scenario unless you are very lucky, but there is power in correlation!
- How much value is there in taking action in a way that will enable us to identify a *causal* relationship between our actions and the thing we care about? For instance, depending on the dependent variable, it may or may not make sense to roll out a change to the website as an A/B test. (But there are myriad other examples!)
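As a rough illustration of the "do we already have data" question, the sketch below checks whether past spend and past orders move together by computing a Pearson correlation. The weekly figures and the names `weekly_spend` and `weekly_orders` are hypothetical, and a strong correlation here would only suggest a relationship worth testing, not prove that spend causes orders.

```python
from math import sqrt

# Illustrative sketch only: the weekly figures below are made up.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Independent variable we can influence: weekly paid search spend (hypothetical)
weekly_spend = [100, 150, 200, 250, 300, 350]
# Dependent variable: weekly orders (hypothetical)
weekly_orders = [12, 15, 14, 20, 22, 25]

r = pearson_r(weekly_spend, weekly_orders)
print(f"correlation between spend and orders: {r:.2f}")
```

A value near 1 or -1 says the two series have moved together in the past. Establishing that one drives the other is where a controlled approach such as an A/B test comes in.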

Now, before you decide this entire concept is just a pile of theoretical hoo-haw, let’s explore some of the ways this can actually be applied.

**It’s Simple…Unless It’s Not!**

Consider an example of an online retailer that is planning a multi-channel campaign. What is the dependent variable?

Really. What comes to mind for you as the dependent variable? Do any of these seem like they might be your dependent variable (“the thing that you most care about impacting”):

- Visits
- Conversion Rate
- Orders
- Revenue

If you’re like most marketers, this seems easy: revenue! Or, maybe you think it’s a trick question.

It sort of is.

Of the four options listed, revenue is likely the closest to being your dependent variable. But, it would be pretty easy to make a case that either *profit* or *customer lifetime value* would be more impactful.

Why don't we think of one of those two immediately? Often, it's because neither is a metric that is readily available in the platform(s) we regularly use. Is that, on its own, a good reason to forget what the true dependent variable should be?

Arguably, the correct answer is “No,” for two reasons:

- Just because the metric is not readily available currently doesn’t mean it’s not worth finding out what would be required to make it readily available.
- Even if a proxy (e.g., revenue) has to be used, keeping the “true” dependent variable in mind can still provide useful focus and clarity for the campaign or initiative!
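A small worked example shows why the choice matters. In the sketch below, two campaigns are ranked first by revenue and then by profit. Every number and name is invented for illustration, and profit is simplified to revenue minus cost of goods minus media spend, which is an assumption rather than a full accounting definition.

```python
# Illustrative sketch only: all figures are hypothetical.
campaigns = {
    "paid_search": {"revenue": 50_000, "cost_of_goods": 30_000, "media_spend": 15_000},
    "email": {"revenue": 40_000, "cost_of_goods": 22_000, "media_spend": 2_000},
}


def profit(campaign):
    """Simplified profit: revenue minus cost of goods minus media spend."""
    return campaign["revenue"] - campaign["cost_of_goods"] - campaign["media_spend"]


# The "winner" depends entirely on which dependent variable we pick.
best_by_revenue = max(campaigns, key=lambda name: campaigns[name]["revenue"])
best_by_profit = max(campaigns, key=lambda name: profit(campaigns[name]))

print("best by revenue:", best_by_revenue)
print("best by profit:", best_by_profit)
```

With these made-up numbers, the proxy and the "true" dependent variable point to different campaigns, which is exactly the situation worth knowing about before budgets are set.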

**Select Dependent Variables that Are Closest to Your Goal, Not Closest to What’s Available**

To be clear, the dependent variable does not always need to be a financial metric. After all, plenty of activities occur well upstream of financial results. We expect those activities to ultimately drive financial impact, but the financial results themselves are, perhaps, too far removed from the actual activities (the independent variables!) to serve as a useful dependent variable.

Consider a life insurance company example where consumers can apply for a policy online, and the marketing team is evaluating a new advertising channel. What is the appropriate dependent variable? *Ultimately*, the goal is to deliver high-value customers: policies that are approved and written that then generate a profitable recurring stream of revenue.

But, customer lifetime value might not be the most useful dependent variable for this initiative. The process for getting to that lifetime value may look something like:

- The consumer responds to the advertising (either as a direct click-through or as a view-through—that should be well-measured, too!).
- The consumer then submits an application online.
- The consumer may be required to get a medical exam.
- The insurance company’s underwriting team then has to process the application and approve it.
- The consumer begins paying for the policy.
- That payment continues for the life of the policy.
- The consumer may or may not ultimately collect on the policy.

What is the most appropriate dependent variable in this case? There is no “right” answer. Since this is an entirely new advertising channel, it may make sense to use “applications submitted” as the dependent variable:

- This is the dependent variable for which Marketing has the most control over the independent variables. Further downstream, that control fades. For instance, if the underwriters are consistently returning pricing that is much higher than the market for similar coverage, then there may be a falloff in actual paid policies that is completely out of Marketing’s control.
- If the new advertising channel can’t *at least* generate a reasonable number of applications, then it can’t possibly deliver the desired downstream financial impact.
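To picture where "applications submitted" sits relative to the downstream steps, here is a small sketch that walks a simplified version of the funnel above and computes the conversion rate at each step. The stage names and counts are hypothetical, chosen only to illustrate the idea.

```python
# Illustrative sketch only: stage names and counts are hypothetical.
funnel = [
    ("ad_responses", 20_000),
    ("applications_submitted", 800),
    ("policies_approved", 480),
    ("policies_paid", 360),
]

# Conversion rate of each stage relative to the stage immediately before it.
step_rates = {}
for (_, previous_count), (stage, count) in zip(funnel, funnel[1:]):
    step_rates[stage] = count / previous_count

# The single dependent variable chosen for evaluating the new channel.
dependent_variable = "applications_submitted"

for stage, rate in step_rates.items():
    marker = "  <- dependent variable" if stage == dependent_variable else ""
    print(f"{stage}: {rate:.1%}{marker}")
```

Everything after the marked step depends heavily on underwriting and pricing, which is why it can make sense to judge a brand-new channel on the step Marketing most directly influences.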

This doesn’t mean that “applications submitted” is the best dependent variable, though. Rather, it is an illustration that, by having the discipline to *identify a singular dependent variable*, an organization forces itself to have a meaningful discussion about what result it is focusing on for any given investment.

**The Bonus: Thinking This Way Opens the Door to Data Science!**

There is a nontrivial side benefit of thinking in terms of independent variables and a dependent variable: it’s the language of machine learning and data science!

Guaranteed, when you sit down with a data scientist to talk about the advanced analytics you are looking to bring to your initiative, you will be asked a couple of questions right off the bat:

- “What’s our dependent variable for this?”
- “What is our unit of analysis?”

We’re going to leave the nuts and bolts of that second question for another day, but having a solid answer for the first question will catapult the discussion forward!

**The Dependent Variable for This Post**

It only seems fair, since I put a reasonable level of effort into penning this post, that I share what my dependent variable is for publishing it. Why did I write it? What outcome am I hoping to achieve? I’m hoping to simply arm analysts and marketers with a way of thinking and a technique that they can readily understand and apply on their own. If you’re so inclined, you can click this link (edit the tweet as you see fit) and let us know if I impacted my dependent variable!