Using a Randomized Experiment to Compare Mastery Learning Thresholds

Published Jun 19, 2025
Jeffrey Matayoshi, Eric Cosyn, Hasan Uzun, Eyad Kurd-Misto

Abstract

Many modern adaptive learning and intelligent tutoring systems implement the principles of mastery learning, in which a student must demonstrate mastery of core prerequisite material before working on subsequent content within the system. Typically, a set of rules or algorithms is used to determine whether a student has sufficiently mastered the concepts in a topic. In previous work, we used a quasi-experimental design to investigate the relationship between two different mastery learning thresholds and the forgetting of the learned material. As a follow-up to that initial study, in the present work we analyze the results from a randomized experiment (an A/B test) directly comparing these two mastery learning thresholds. These latest results seemingly agree with those from our initial study, providing evidence for the validity of the conclusions from our original quasi-experiment. In particular, we find that although students who learn with the higher mastery threshold are less likely to forget the learned knowledge, this difference decreases over time. Additionally, we build on these analyses by examining how the relationship between the two mastery thresholds changes with other factors, such as the amount of struggle students experience while learning or the subject matter being covered.

How to Cite

Matayoshi, J., Cosyn, E., Uzun, H., & Kurd-Misto, E. (2025). Using a Randomized Experiment to Compare Mastery Learning Thresholds. Journal of Educational Data Mining, 17(1), 308–336. https://doi.org/10.5281/zenodo.15698758

Keywords

mastery learning, forgetting, intelligent tutoring system, randomized experiment

Section
EDM 2025 Journal Track