Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results



Published Jun 21, 2023
Adam C. Sales, Ethan B. Prihar, Johann A. Gagnon-Bartsch, Neil T. Heffernan


Randomized A/B tests within online learning platforms represent an exciting direction in learning sciences.
With minimal assumptions, they allow causal effect estimation without confounding bias and
exact statistical inference even in small samples. However, experimental samples and/or treatment
effects are often small, leaving A/B tests underpowered and effect estimates imprecise. Recent
methodological advances have shown that power and statistical precision can be substantially boosted
by coupling design-based causal estimation to machine-learning models of rich log data from historical
users who were not in the experiment. Estimates using these techniques remain unbiased and inference
remains exact without any additional assumptions. This paper reviews those methods and applies them
to a new dataset including over 250 randomized A/B comparisons conducted within ASSISTments, an
online learning platform. We compare results across experiments using four novel deep-learning models
of auxiliary data and show that incorporating auxiliary data into causal estimates is roughly equivalent to
increasing the sample size by 20% on average, or as much as 50-80% in some cases, relative to t-tests,
and by about 10% on average, or as much as 30-50%, compared to cutting-edge unbiased machine-learning
estimators that use only data from the experiments. We show that the gains can be even larger for
estimating subgroup effects, hold even when the remnant is unrepresentative of the A/B test sample, and
extend to post-stratification population effects estimators.
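
To make the core idea concrete, the following is a minimal sketch and not the estimator used in the paper. It assumes that a model trained only on historical (non-experimental) users supplies a prediction `y_hat` for each experimental participant, and that the treatment effect is then estimated as a difference in means of the residuals `y - y_hat`. Because the predictions do not depend on treatment assignment, subtracting them leaves the difference-in-means estimator unbiased under randomization while potentially reducing its variance. All function names, the simulated data, and the parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def diff_in_means(y, z):
    """Difference in mean outcomes between treated (z=1) and control (z=0)."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return statistics.mean(treated) - statistics.mean(control)


def residualized_diff_in_means(y, z, y_hat):
    """Difference in means of residuals y - y_hat.

    y_hat is assumed to come from a model fit only on users outside
    the experiment, so it is unaffected by treatment assignment.
    """
    residuals = [yi - hi for yi, hi in zip(y, y_hat)]
    return diff_in_means(residuals, z)


def simulate(n=100, effect=0.2, reps=2000, seed=0):
    """Repeat a synthetic A/B test and collect both estimates each time."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(reps):
        # Stand-in for predictions from a model of auxiliary data (synthetic).
        y_hat = [rng.gauss(0, 1) for _ in range(n)]
        # Completely randomized assignment: half treated, half control.
        z = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(z)
        # Outcome = prediction + noise + constant treatment effect.
        y = [h + rng.gauss(0, 0.5) + effect * zi for h, zi in zip(y_hat, z)]
        plain.append(diff_in_means(y, z))
        adjusted.append(residualized_diff_in_means(y, z, y_hat))
    return plain, adjusted


plain, adjusted = simulate()
# Ratio of sampling variances: values above 1 mean the adjusted estimator
# is more precise in this synthetic setup.
variance_ratio = statistics.variance(plain) / statistics.variance(adjusted)
print(statistics.mean(plain), statistics.mean(adjusted), variance_ratio)
```

Since the sampling variance of a difference in means shrinks in proportion to 1/n, a variance ratio of r can be read as roughly the precision gained by multiplying the sample size by r, which is the sense in which precision gains are expressed as sample-size increases. The size of the ratio in this sketch is an artifact of the simulated data and says nothing about the results reported for ASSISTments.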

How to Cite

Sales, A. C., Prihar, E. B., Gagnon-Bartsch, J. A., & Heffernan, N. T. (2023). Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results. Journal of Educational Data Mining, 15(2), 53–85.

Keywords

A/B tests, deep learning, evaluation

