Randomized A/B tests within online learning platforms represent an exciting direction in learning sciences.
With minimal assumptions, they allow causal effect estimation without confounding bias and
exact statistical inference even in small samples. However, experimental samples and/or treatment
effects are often small, leaving A/B tests underpowered and effect estimates imprecise. Recent
methodological advances have shown that power and statistical precision can be substantially boosted
by coupling design-based causal estimation to machine-learning models of rich log data from historical
users who were not in the experiment. Estimates using these techniques remain unbiased and inference
remains exact without any additional assumptions. This paper reviews those methods and applies them
to a new dataset including over 250 randomized A/B comparisons conducted within ASSISTments, an
online learning platform. We compare results across experiments using four novel deep-learning models
of auxiliary data and show that incorporating auxiliary data into causal estimates is roughly equivalent to
increasing the sample size by 20% on average, and by as much as 50-80% in some cases, relative to t-tests,
and by about 10% on average, and by as much as 30-50% in some cases, relative to cutting-edge unbiased
machine-learning estimators that use only data from the experiments. We show that the gains can be even
larger for estimating subgroup effects, hold even when the remnant (the historical users who were not in
the experiment) is unrepresentative of the A/B test sample, and extend to post-stratification estimators
of population effects.
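As a rough illustration of the idea summarized above, the following Python sketch fits an outcome model on simulated remnant data only, uses its predictions to residualize outcomes in a simulated experiment, and expresses the precision gain as a sample-size equivalent (the ratio of estimated variances minus one). Everything here is an assumption made for illustration: the data are synthetic, a least-squares line stands in for the deep-learning models, the helper `diff_in_means` is hypothetical, and simple residualizing is a simplified variant that is not necessarily the estimator reviewed in the paper.

```python
import numpy as np


def diff_in_means(y, z):
    """Difference in means with the conservative Neyman variance estimate."""
    y1, y0 = y[z == 1], y[z == 0]
    est = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    return est, var


rng = np.random.default_rng(0)

# Hypothetical remnant: historical users who were never randomized.
n_rem = 5000
x_rem = rng.normal(size=n_rem)
y_rem = 2.0 * x_rem + rng.normal(size=n_rem)

# Fit the outcome model on the remnant only (a stand-in for a deep model).
slope, intercept = np.polyfit(x_rem, y_rem, 1)

# Hypothetical A/B test with complete randomization, 100 users per arm.
n = 200
x = rng.normal(size=n)
z = rng.permutation(np.repeat([0, 1], n // 2))
y = 2.0 * x + 0.3 * z + rng.normal(size=n)

# Predictions depend only on pre-treatment covariates and on a model trained
# outside the experiment, so they are fixed with respect to the randomization.
y_hat = intercept + slope * x

est_raw, var_raw = diff_in_means(y, z)          # unadjusted comparison
est_adj, var_adj = diff_in_means(y - y_hat, z)  # residualized comparison

# Sample-size equivalent: how much larger the unadjusted experiment would
# need to be to match the adjusted estimator's estimated variance.
gain = var_raw / var_adj - 1
print(f"raw: {est_raw:.3f}, adjusted: {est_adj:.3f}, gain: {gain:.0%}")
```

Because the predictions do not depend on treatment assignment, subtracting them leaves the difference in means unbiased whatever the quality of the model; a poor model only costs precision. The gain in this toy simulation is far larger than the 20% average reported above because the simulated covariate is, by construction, highly predictive of the outcome.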
Keywords: A/B tests, deep learning, evaluation
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.