Randomized experiments can provide key insights for improving educational technologies, but many students in these experiments may experience conditions associated with inferior learning outcomes. Multi-armed bandit (MAB) algorithms can address this issue by accumulating evidence from the experiment as it runs and modifying the experimental design to assign more helpful conditions to a greater proportion of future students. Using simulations, we explore the statistical impact of using MAB algorithms for experiment design, focusing on the tradeoff between acquiring statistically reliable information from the experiment and benefits to students. We consider how temporal biases in patterns of student behavior may impact the results of MAB experiments, and we model data from ten previous educational experiments to demonstrate potential impacts of MAB assignment. Results suggest that MAB experiments can lead to much higher average benefits to students than traditional experimental designs, although at least twice as many participants are needed for acceptable statistical power. Using an optimistic prior distribution for the MAB algorithm mitigates the loss in power to some extent, without significantly reducing benefits to students. Additionally, longer experiments with MAB assignment still assign fewer students to a less effective condition than the typical practice of running a shorter experiment and then choosing one condition for all future students. Yet MAB assignment does increase false positive rates, especially if there are temporal biases in when students enter the experiment. Caution is therefore needed when interpreting results from MAB assignment in cases where students can choose when to participate in the experiment. Overall, in scenarios where student characteristics do not vary over time, MAB experimental designs can be beneficial for students and, given large sample sizes, effective for reliably determining which of two differing conditions is better.
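To make the benefit-versus-power tradeoff concrete, the following is a minimal simulation sketch, not the authors' code. It assumes two conditions, binary student outcomes, and Beta-Bernoulli Thompson sampling as the MAB algorithm; the abstract does not fix these details, so they are illustrative choices. A pooled two-proportion z-test stands in for whatever significance test is actually used, and the function names (`thompson_assign`, `run_experiment`, `summarize`) are hypothetical. Shifting the Beta prior toward high success rates (for example `prior=(5.0, 1.0)`, an arbitrary value) is one way to express an optimistic prior.

```python
import math
import random

# Assumption: binary outcomes, two conditions, Beta-Bernoulli Thompson sampling.
# This is an illustrative setup, not necessarily the one used in the paper.


def thompson_assign(successes, failures, prior_a=1.0, prior_b=1.0, rng=random):
    """Pick a condition by drawing once from each condition's Beta posterior."""
    draws = [
        rng.betavariate(prior_a + s, prior_b + f)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(draws)), key=lambda i: draws[i])


def two_proportion_z_pvalue(s1, n1, s2, n2):
    """Two-sided pooled z-test for a difference in success rates (stand-in test)."""
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (s1 / n1 - s2 / n2) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def run_experiment(p_true, n_students, policy, prior=(1.0, 1.0), rng=random):
    """Simulate one experiment; return (p-value, mean outcome over all students)."""
    successes, failures = [0, 0], [0, 0]
    for _ in range(n_students):
        if policy == "uniform":
            arm = rng.randrange(2)  # traditional 50/50 random assignment
        else:
            arm = thompson_assign(successes, failures, *prior, rng=rng)
        if rng.random() < p_true[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    n = [successes[i] + failures[i] for i in range(2)]
    p_value = two_proportion_z_pvalue(successes[0], n[0], successes[1], n[1])
    return p_value, sum(successes) / n_students


def summarize(p_true, n_students, policy, n_sims=500, alpha=0.05,
              prior=(1.0, 1.0), seed=0):
    """Estimate the rejection rate and the average student outcome.

    The rejection rate is power when the conditions differ and the
    false positive rate when both success rates are equal.
    """
    rng = random.Random(seed)
    rejections, total_outcome = 0, 0.0
    for _ in range(n_sims):
        p_value, outcome = run_experiment(p_true, n_students, policy, prior, rng)
        rejections += p_value < alpha
        total_outcome += outcome
    return rejections / n_sims, total_outcome / n_sims


if __name__ == "__main__":
    for policy in ("uniform", "thompson"):
        rate, mean_outcome = summarize([0.5, 0.6], 200, policy)
        print(f"{policy}: rejection rate={rate:.3f}, mean outcome={mean_outcome:.3f}")
```

Running both policies side by side makes the two quantities in the tradeoff directly measurable, and setting both success rates equal turns the reported rejection rate into a false positive rate. Temporal biases in when students arrive are not modeled here; one could add them by letting `p_true` drift with the student index.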
Keywords: experimental design, educational experiment, simulation, statistical hypothesis testing, adaptive experimentation, multi-armed bandits
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.