New paper on using A-B tests for academic research

Posted on May 17, 2024 by Michael Braun

Online A-B tests using targeted ad platforms are not randomized experiments. That can put the internal validity of your study in doubt. This is the subject of my new paper, published in the Journal of Consumer Research.

In “Leveraging Digital Advertising Platforms for Consumer Research,” Bart De Langhe (Vlerick), Stefano Puntoni (Wharton), Eric Schwartz (Michigan), and I explain why experiments using digital advertising platforms are likely not answering the question you are asking.

Many online advertising platforms provide tools to help researchers conduct experiments on the relative effectiveness of ads. But platforms target different ads to different users, based on the content of those ads. This creates a confound that makes it impossible to separate the effect of an ad from the effect of the platform's targeting algorithm deciding which users see that ad. As a result, neither which of the platform's users are included in the experiment nor how experimental subjects come to be exposed to each treatment is randomized.
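
To make the confound concrete, here is a minimal, hypothetical simulation (not from the paper, and with purely illustrative names and numbers) sketching how divergent delivery alone can manufacture an apparent difference between two equally effective ads.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Heterogeneous audience: each user has a general propensity to click any ad.
click_propensity = rng.beta(2, 20, size=n)

# Assume both ads are EQUALLY effective: neither changes a user's click
# propensity, so a properly randomized test should show roughly zero lift.

# (1) Randomized assignment: a fair coin decides which ad each user sees.
coin = rng.random(n) < 0.5
clicks_a_rand = rng.random(coin.sum()) < click_propensity[coin]
clicks_b_rand = rng.random((~coin).sum()) < click_propensity[~coin]

# (2) Divergent delivery: the platform predicts who is likely to click and,
# because of ad A's creative features, routes A toward high-propensity users.
score = click_propensity + rng.normal(0, 0.02, n)   # platform's noisy prediction
shown_a = score > np.median(score)                   # A delivered to the "better" half
clicks_a_targ = rng.random(shown_a.sum()) < click_propensity[shown_a]
clicks_b_targ = rng.random((~shown_a).sum()) < click_propensity[~shown_a]

print(f"Randomized:         A-B lift = {clicks_a_rand.mean() - clicks_b_rand.mean():+.4f}")
print(f"Divergent delivery: A-B lift = {clicks_a_targ.mean() - clicks_b_targ.mean():+.4f}")
# The second comparison shows a large spurious "lift" for ad A, driven entirely
# by which users the algorithm chose to show each ad, not by the ads themselves.
```

In this sketch the ads are identical by construction, yet the targeted comparison reports ad A as the clear winner; that apparent effect is the targeting algorithm's audience selection, not the creative.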

Our paper lays out this problem from the perspective of academic researchers who may be considering conducting field experiments using these tools. We want these researchers to understand the trade-offs between the convenience of online A-B testing and the threats to internal validity that result from ad targeting. We also provide a set of guidelines for researchers (and editors) to follow when determining whether certain online experimental designs are appropriate, and how much weight should be placed on the resulting inferences.

Eric and I also have a complementary paper that goes into quantitative detail about how divergent delivery of different ads to different users generates bias in A-B test results. That paper takes the perspective of the practicing marketer who uses A-B test results to make strategic decisions about which creative elements of ads are most effective.

Here’s the full abstract of the JCR paper:

Abstract

Digital advertising platforms have emerged as a widely utilized data source in consumer research; yet, the interpretation of such data remains a source of confusion for many researchers. This article aims to address this issue by offering a comprehensive and accessible review of four prominent data collection methods proposed in the marketing literature: informal studies, multiple-ad studies without holdout, single-ad studies with holdout, and multiple-ad studies with holdout. By outlining the strengths and limitations of each method, we aim to enhance understanding regarding the inferences that can and cannot be drawn from the collected data. Furthermore, we present seven recommendations to effectively leverage these tools for programmatic consumer research. These recommendations provide guidance on how to use these tools to obtain causal and non-causal evidence for the effects of marketing interventions, and the associated psychological processes, in a digital environment regulated by targeting algorithms. We also give recommendations for how to describe the testing tools and the data they generate and urge platforms to be more transparent on how these tools work.