In this era of “big data,” business decisions are increasingly being made on the basis of data-centered evidence. This is especially true in the online arena, where a tradition of A/B testing and data-focused decision making has dominated. But even the best data does not necessarily facilitate accurate interpretation of customer actions. Any web analytics package will provide multiple attribution methods (methods for apportioning credit for the sale across various customer touchpoints), and none of those attribution methods is clearly superior to the others. Thus, now more than ever, analysts need to understand the assumptions underlying any data-based decision that gets made.
This case examines an even more difficult attribution problem—how to apportion credit for sales across all advertising channels, both online and offline.
CMO: How is our ad budget allocation changing this year compared to last?
VP of Advertising: We’re doubling our allotment to digital channels like social media ads and online search ads and paring back our spending on traditional channels like television and magazine ads. Overall in the advertising industry, traditional ad channels are declining and digital channels are growing, and we’re leading the way.
CMO: Great! So are we seeing better results now?
VP of Advertising: We are now showing advertisements in over ten different channels, so consumers are being exposed to our advertisements on more media than ever.
CMO: Okay, good, but are we getting more bang for the buck?
VP of Advertising: We feel that being exposed to more ads in more locations can only help sell customers on our brand.
CMO: That’s probably true, but is there any evidence that moving budget away from television to digital channels is bringing in more sales?
VP of Advertising: It’s impossible to know for sure, but we think keeping ahead of recent trends is a good idea.
In the internet era, many customer actions can be measured. As a result, advertisers are under increasing pressure to use this customer data to show that their ads are increasing sales. But even with careful tracking of all possible customer data, problems with attribution can cause faulty conclusions about the effectiveness of various online ads. For example, last-click attribution typically exaggerates the effect of search marketing efforts, and first-click attribution can give wildly different results with small changes to an arbitrary time-window assumption. No ready solution to the attribution problem has yet been developed, so marketing analysts must simply keep in mind that their data is not 100% reliable.
Even more difficult than the attribution problem within digital marketing is the attribution problem across an entire advertising budget, including both online and offline ad spending. Even if a marketing analyst could be confident in attributing sales to marketing efforts in email, online search, social media, and display advertising, how could she determine the relative effectiveness of marketing efforts in television, billboards, magazines, and catalogs? An analysis that would accurately determine the relative effectiveness of the myriad advertising channels would be extremely valuable to any business, but an analysis of this kind is extremely difficult. The largest marketing research company in the world, The Nielsen Company, along with another marketing research heavyweight, Arbitron, undertook a joint project in 2005 to make such an analysis possible. The project was terminated three years later and deemed an expensive failure (https://magnostic.wordpress.com/2008/02/25/marketing-measurement-misplay-project-apollo-is-dead/).
In 2013, Peter Danaher and Tracey Dagger, marketing scholars from Monash University, in Melbourne, Australia, published the results of a research project for a large Australian retailer in which they were able to measure the relative effectiveness of advertising expenditures across ten different advertising channels spanning both online and offline advertising activities. In other words, Danaher and Dagger were able to solve the attribution problem, not just for digital marketing, but for all marketing channels. This case describes the methods they used to collect the data and run this analysis.
When a marketing analyst is trying to determine the effect of advertising expenditures on sales, what she is trying to determine is whether seeing an advertisement caused an individual (or several individuals) to make a purchase. Advertising is only effective if it changes individuals’ behavior. As a result, the only way to reliably determine the effectiveness of advertising is to measure both advertising exposure and purchasing at the individual level. That is, a company would need a list of its customers along with data on their purchasing and amount of exposure to all forms of advertising done by the company. The company could then analyze this data and determine whether customers who saw more television ads subsequently spent more than customers who saw fewer television ads for the company.
Collecting such data is challenging. Many marketing research companies collect portions of this data, but none of them collect all of this data at the individual level. To collect this data, Danaher and Dagger used the loyalty program members of the Australian retailer. (The retailer wishes to remain anonymous, but it is an upscale department store analogous to Macy’s in the United States.) Specifically, they sent an invitation to an online survey to 20,000 randomly selected members of the loyalty program (hereafter LP) who fit the target market (women between the ages of 25 and 54) on the day after the conclusion of a major four-week-long sale and accompanying advertising campaign. The survey measured LP members’ exposure to the retailer’s ads across all 10 advertising channels used by the retailer during the ad campaign for the sale. The LP maintained a database of each member’s purchase history, so sales of each LP member could be retrieved from this database and matched to the data on her advertising exposure.
The sale began on Wednesday, September 22, 2010 and concluded on Sunday, October 17, 2010. This sale was accompanied by a four-week-long advertising blitz across ten advertising channels, including mass media channels (television, newspapers, radio, and magazines), electronic media outlets (online display ads, Google search ads, and social media ads), and direct media (catalogs, postal mail, and e-mail). Across all media, ads were consistent in their appearance and messaging, announcing “massive discounts” on a wide range of products or on specific featured items. Table 1 shows the relative spending on these ten advertising channels and various measures of the resulting reach.
|Online display||34||180||61||16 million|
|Search (Google)||3||39||21||15,200 paid clicks|
1 GRP stands for gross rating points, which is a standard way to measure advertising exposures. GRP is calculated as Reach (%) × Average frequency (#). A GRP of 100 indicates enough ad exposures to cover the entire population, though this score could come from a reach of 50% and average frequency of 2 or a reach of 100% and frequency of 1. Television’s GRP of 1,048 indicates that people on average saw the advertisement over 10 times.
Measuring an individual’s exposure to multiple advertising channels is a difficult task. Market research companies have developed sophisticated measurement techniques for measuring exposure to a single medium, such as Nielsen’s People Meter panel for television and Arbitron’s panel for radio. These companies typically require participants to keep a diary of every exposure to the medium in question. For example, participants in Arbitron’s radio panel will record every instance of radio listening for a week, including the radio station listened to and the length of time spent listening. Keeping such diaries is labor-intensive for one medium and thus would be impossible for ten media.
As a result, media exposure was measured through the survey sent after the sale and ad campaign concluded. Because the retailer had a known media plan, the survey could be limited to asking about the media on which the retailer had advertised. For example, instead of asking an LP member for every instance of TV viewing during the four-week advertising campaign, the survey asked, “In the past four weeks, how many episodes of Desperate Housewives have you seen?” For newspapers, LP members were asked, “On which days did you read or look into these newspapers in a typical week?” To measure exposure to online display ads and social media ads, participants were asked their frequency of visiting the sites on which the retailer had placed banner ads. To measure exposure to Google search ads, the survey asked, “About how many times did you do a Google search for [retailer] in the past 4 weeks?” To measure exposure to radio ads, the survey asked respondents about their typical weekly radio-listening habits.
Because the purpose of the study was to determine how ad exposure influences purchasing, measurements of media exposure must be converted to measurements of ad exposure. Ad exposure was measured using the traditional GRP, with a major difference being that GRP in this case indicates an individual’s exposure to ads in that channel rather than the population-level exposure. Individual-level GRP was calculated from the individual’s exposure to the medium in question combined with the number of times an ad was shown on that medium. For example, if an individual watched 3 of 4 episodes of Desperate Housewives and the retailer advertised on this show twice, the individual’s GRP would be 150 for this show (100 × 3⁄4 × 2). The same calculation would be carried out for all television shows on which the retailer advertised, and the individual’s television GRP would be a summation of the GRP numbers for all television shows on which the retailer advertised.
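The individual-level GRP arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation as the case describes it; the function names and the second show are hypothetical and do not come from the study.

```python
def individual_grp(episodes_watched, episodes_aired, ads_placed):
    """Individual-level GRP for one show: 100 x share of episodes seen x ads placed."""
    if episodes_aired <= 0:
        raise ValueError("episodes_aired must be positive")
    return 100 * (episodes_watched / episodes_aired) * ads_placed


def channel_grp(shows):
    """Sum the per-show GRPs to get the individual's GRP for a whole channel."""
    return sum(individual_grp(*show) for show in shows)


# Worked example from the text: 3 of 4 episodes watched, 2 ad placements.
desperate_housewives = (3, 4, 2)
print(individual_grp(*desperate_housewives))  # 150.0

# Hypothetical second show, included only to illustrate the summation step.
other_show = (1, 4, 4)
print(channel_grp([desperate_housewives, other_show]))  # 250.0
```

Summing per-show values in this way gives one GRP figure per person per channel, which is the form in which ad exposure enters the statistical model.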
Fitting the Model
This case is not meant to provide an in-depth study on statistical modeling, so it will skirt many of the details of the model, but some of the basic aspects of the model must be discussed if the reader is to develop an understanding of this research project and have any hope of replicating it. Table 2 shows a small portion of the data as they were formatted to enable fitting of the statistical model.
The desired end result of the statistical model is measurement of the effectiveness of each advertising channel. That is, we wish to know whether and by how much advertising expenditures in a given channel increased sales. In order for advertising to influence sales, it has to influence an individual to either (1) make a purchase when she otherwise would not have purchased or (2) spend more money than she otherwise would have. To determine whether advertising influenced the first behavior, or purchase incidence, we could run a logistic regression or probit regression model with the Purchase variable as the dependent variable and the GRP data as independent variables. This model would indicate which advertisement channels influenced LP members to shop when they otherwise might not have shopped. But we would not be able to determine whether advertising influenced the amount of money they spent. To determine whether advertising influenced the second behavior, or purchase amount, we could fit a linear regression model with the Spending variable as the dependent variable and the same GRP data as independent variables. But the Spending variable has several 0s in it. Roughly 45% of LP members in the sample made no purchases at all during the sale period. Running a regression on data with these 0s violates the assumptions of linear regression, so our results would be biased.
The model used by Danaher and Dagger is called a Type II Tobit model. The model first fits a probit model to the Purchase variable to determine whether advertising influenced purchase incidence. It then fits a linear regression model to the Spending variable, ignoring the 0 data, to determine whether advertising influenced purchase amount.²
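To make the two-part structure concrete, the sketch below shows only how the data are split: a 0/1 Purchase indicator for every member, and a Spending regression restricted to members who bought something. It is not the Type II Tobit model itself, which uses a probit for the first part and involves more technical detail than this case covers. The six rows are made up for illustration, and `ols_slope` is a hypothetical helper for a one-predictor least-squares fit.

```python
def ols_slope(xs, ys):
    """Slope of a one-predictor least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up (tv_grp, spending) rows for six LP members; not data from the study.
rows = [(0, 0), (100, 0), (200, 0), (100, 100), (200, 150), (300, 200)]

# Part 1 input: a 0/1 Purchase indicator for every member.
purchase = [1 if spend > 0 else 0 for _, spend in rows]

# Part 2 input: Spending for purchasers only, so the zeros are left out.
buyers = [(grp, spend) for grp, spend in rows if spend > 0]

# A regression on all rows mixes "did she buy?" with "how much did she spend?".
slope_all = ols_slope([g for g, _ in rows], [s for _, s in rows])
slope_buyers = ols_slope([g for g, _ in buyers], [s for _, s in buyers])

print(purchase)
print(slope_all, slope_buyers)
```

The two slopes differ because the all-rows fit treats non-purchasers' zeros as spending amounts, which is the problem the two-part structure is meant to avoid.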
A number of important statistical issues arise in the fitting of this model. This case will briefly discuss three of these issues. They are:
- customer heterogeneity
- purchase-viewing bias
- endogeneity
Other issues besides these three arise, but we select these three issues for discussion because they illustrate important points about analyzing market data that every marketer should understand.
Customer heterogeneity refers to the fact that customers differ from one another. One important way in which customers differ from one another for our model is a difference in underlying purchase propensity. If one customer spends $500 and another spends $100 during the sale period, the model will attribute the first customer’s higher spending to the media outlets to which this customer received more exposure. But it could be that this $500 expenditure was a drop from her usual $1,000-per-month spending at this store while the $100 expenditure made by the second customer was an increase from a typical expenditure of $0. The model will be biased if it does not correct for the customers’ baseline level of spending. To correct for this difference in baseline spending, the model also included a variable expressing the amount spent by each customer in the nine-month period before the start of the sale.
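The two customers in this example can be compared in a few lines of Python. This is only an illustration of why baseline spending matters. The study handled it by adding prior spending as a variable in the model, not by taking a simple difference, and treating the four-week sale as roughly one month is an assumption made here for the arithmetic.

```python
# Sale-period spending and usual monthly spending for the two customers in the text.
customers = {
    "first customer": {"sale_spend": 500, "usual_monthly_spend": 1000},
    "second customer": {"sale_spend": 100, "usual_monthly_spend": 0},
}

# Change relative to baseline, treating the four-week sale as roughly one month.
change = {
    name: c["sale_spend"] - c["usual_monthly_spend"]
    for name, c in customers.items()
}
print(change)  # {'first customer': -500, 'second customer': 100}
```

Raw spending ranks the first customer higher, while the change from baseline ranks the second customer higher, which is why a model that ignores baseline spending can credit advertising incorrectly.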
The purchase-viewing bias refers to the fact that someone who is a frequent shopper at the retailer might also be a heavy media viewer. If so, the model would incorrectly infer that it was the exposure to the many ads that led to her large purchase level. But this correlation could be spurious. To correct for this, the model included a variable measuring each LP member’s general level of media consumption.
Endogeneity, the third issue, is a very technical problem that arises frequently in marketing data. Though it is a problem for statistical models, it is often a sign of good strategic marketing decisions. In the case of the current data, endogeneity problems arise because the marketing managers for this retailer were strategic in directing their advertising to the customers who were most influenced by the advertising. The retailer obviously wants to encourage this kind of optimal advertising allocation, but it makes modeling more difficult because it biases the model results. The example data shown in Table 3 illustrate why.
Consider a very simple movie store that sells DVDs. Every week, the store advertises the latest new DVD for sale. Table 3 shows the advertising and sales of DVDs in three successive weeks. In week 1, a small independent film is the only new title. Knowing the movie to be of limited appeal, the store invests only $100 in advertising the new film and is able to sell $1,000 worth of DVDs. The next week, an action movie comes out. Because this movie has a larger market, the store invests $500 into advertising and achieves $10,000 in sales. Finally, in week 3, a major blockbuster movie with huge market appeal is released. The store puts $2,000 into advertising this movie and achieves $25,000 in sales.
2 This description is not 100% accurate, but an accurate description would require more technical detail than this case is intended to give. The gist of this description is accurate even if a few technical details are omitted.
|Week||Movie Type||Advertising||Sales||Market Share|
What if we were to analyze the effect of advertising on sales? From this table, it appears that advertising has a dramatic influence on sales. When advertising increased, sales also increased. But the market share data reveals that the effect of advertising was not so dramatic. If anything, the advertising merely preserved the store’s share of the market. What explains this strong relationship between advertising and sales if advertising is not causing larger sales? A third variable, audience size, is influencing both advertising and sales. When the audience size is high, the store advertises more and sales are higher. A portion of the higher sales level is due to the higher advertising, but that portion is small. If the store had spent $2,000 to advertise the independent movie from week 1, we would not have observed sales anywhere near $25,000.
The term endogeneity refers to the fact that advertising levels were not set randomly but were set strategically to maximize sales, the dependent variable. A third variable, audience size, is causing an exaggerated relationship between advertising and sales. If we were to run a regression model predicting sales from advertising, the regression would tell us that advertising had a much larger effect on sales than it was having in reality. This is a major concern for the analysis done by Danaher and Dagger. If the retailer was at all strategic in setting advertising levels, the analysis would be misleading, and since most competent marketers are strategic in their decisions, the analysis here would be biased if the endogeneity issue were not addressed. As a result, Danaher and Dagger had to use a statistical technique known as instrumental variables to account for the fact that advertising levels are endogenous.
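As a rough illustration of the exaggeration described above, the following Python sketch fits a naive least-squares line to the three weeks of advertising and sales figures from the DVD example. It deliberately ignores audience size, so the slope it reports is the misleading estimate, not a causal effect. It does not implement the instrumental variables correction used by Danaher and Dagger.

```python
# Advertising and sales figures for the three weeks described in the DVD example.
advertising = [100, 500, 2000]
sales = [1000, 10000, 25000]

n = len(advertising)
mean_ad = sum(advertising) / n
mean_sales = sum(sales) / n

# Least-squares slope of sales on advertising, ignoring audience size entirely.
sxy = sum((a - mean_ad) * (s - mean_sales) for a, s in zip(advertising, sales))
sxx = sum((a - mean_ad) ** 2 for a in advertising)
naive_slope = sxy / sxx

print(f"Naive estimate: ${naive_slope:.2f} in sales per advertising dollar")
```

The naive fit credits each advertising dollar with roughly $11.91 in sales, even though most of the week-to-week difference in sales comes from the size of each movie's audience.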
Recall that the Type II Tobit model used to analyze the relationship between advertising and sales is really two different models: a model of purchase incidence and a model of purchase amount. Table 4 shows the coefficients of both parts of the model depicting the effect of advertising exposure on sales. The stars next to the coefficients report whether those coefficients are statistically significant. The results indicate that exposure to television ads, radio ads, Google search ads, the sale catalog, and postal mail ads significantly increased purchase incidence. Being exposed to those ads significantly increased the likelihood that a shopper would visit the store and make a purchase. The analysis indicates that newspaper ads, magazine ads, online display ads, social media ads, and email blasts had no significant effect on the likelihood of LP members’ making a purchase during the sale. The model of purchase amount shows similar results, though with some slight differences. The model indicates that exposure to advertising on television, newspaper, and radio influenced the amount customers spent, as did exposure to the sale catalog, but exposure to advertising on other channels had no reliable effect.
|Purchase Incidence||Purchase Amount|
Surprisingly, the results of this analysis indicate that advertising in the more traditional media outlets of television, newspaper, and radio is effective, as are catalog distribution and postal mail ads. On the other hand, of the newer digital media outlets, only search ads were found to have a reliable effect on sales. Online display ads, emails, and social media ads were all found to have no reliable effect on either purchase incidence or purchase amount. Given the large growth in digital advertising, it is disappointing that the data from this study indicate that most digital advertising techniques provide no significant increase in sales. What accounts for this surprising result?
The most likely explanation for the ineffectiveness of most of the digital advertising is the nature of this retailer’s website. This retailer’s website is almost purely informational. Very few products are available for purchase on the retailer’s website, and none of the items being discounted during the sale and accompanying ad blitz were available on the website. Countless other investigations have found that digital advertising positively affects online sales, so the inability of customers to immediately purchase the advertised item on the website was likely a huge missed opportunity for additional sales. Indeed, follow-up analyses show that all forms of digital advertising had a positive effect on visits to the retailer’s website. If the website had made sale products available, digital advertising would likely have had a significant effect on purchase incidence and purchase amounts.
An additional possible reason for the ineffectiveness of online display ads was a poor choice of websites used to show these ads. Most of the online display ads were shown on the web version of the same newspapers selected to show print ads. The online display ads may have been more effective had they been shown on different websites and used ad copy that was more suited to online display rather than the same ads being shown in the physical newspapers.
Replicating the Research
Other companies should be able to apply the methods described in this case to determine the relative effectiveness of the various available advertising channels for their company, but these methods are not cheap or easy to apply. Anyone attempting to replicate this research should be aware of potential pitfalls.
First, measurement of ad exposure will always be difficult, especially if the company is running advertisements on multiple channels. Measurement of ad exposure by Danaher and Dagger was feasible in some instances because the retailer was focused in some of its ad buys. For example, the retailer displayed online ads only on a few select online newspapers. Had the retailer instead purchased online display ads through an ad network, these ads would likely have displayed on thousands of websites, making it impossible to measure ad exposure through a survey. Instead, data on ad exposure would have required retrieval from the ad network. But the ad network keeps the identities of shoppers anonymous, so matching online ad exposure data to online and offline sales data would have proved difficult.
This matching of offline and online purchasing will be another difficulty faced by anyone trying to apply these methods. In the study described here, virtually all purchasing occurred offline. But many companies offer both online and offline purchasing ability, so utilizing these methods will require matching purchase data from both channels. This is not always feasible. And when it is feasible, it is not always ethical, because customers’ privacy may be violated.
Privacy concerns are likely to arise any time a company tries to match online activity with a person’s identity. Most data collected online is anonymous in that anyone using the data does not know the name or address of an individual. But running a study of advertising effectiveness would require matching online activity with a person’s identity. Anyone running such an analysis should consult the latest privacy guidelines before proceeding.
Case Study Questions
- Which of the ten media channels do you think was least reliably measured? Do you think the measurement of ad exposure in this channel was biased (systematically higher or lower than the real number) or just highly variable (sometimes higher and sometimes lower than the real number)?
- Think up another situation in which you would expect to find endogeneity.
- In general, do you think online media is better for inducing purchase incidence or purchase amount?
- Could the methodology described in the case be applied to determine advertising effectiveness of a consumer packaged good like Tide? Why/why not?