Multi-Armed Bandits for Intelligent Tutoring Systems



Published Jun 18, 2015
Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, Manuel Lopes


We present an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activities to maximize the skills acquired by students, taking into account limited time and motivational resources. At a given point in time, the system proposes to the students the activity that makes them progress fastest. We introduce two algorithms that rely on the empirical estimation of learning progress: RiARiT, which uses information about the difficulty of each exercise, and ZPDES, which uses much less knowledge about the problem. The system is based on the combination of three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on empirical estimation of the learning progress provided by specific activities to particular students. Second, it uses state-of-the-art Multi-Armed Bandit (MAB) techniques to efficiently manage the exploration/exploitation challenge of this optimization process. Third, it leverages expert knowledge to constrain and bootstrap the initial exploration of the MAB, while requiring only coarse guidance from the expert and allowing the system to deal with didactic gaps in its knowledge. The system is evaluated in a scenario where 7- to 8-year-old schoolchildren learn how to decompose numbers while manipulating money. Systematic experiments are presented with simulated students, followed by results of a user study across a population of 400 schoolchildren.
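
The selection principle described in the abstract (propose activities according to empirically estimated learning progress, while keeping some exploration) can be illustrated with a minimal Python sketch. This is not the RiARiT or ZPDES algorithm from the paper. The class name `LearningProgressBandit`, the sliding-window estimate of progress, the `window` and `explore` parameters, and the uniform exploration mix are all assumptions made for illustration.

```python
import random
from collections import deque


class LearningProgressBandit:
    """Hypothetical sketch: pick activities in proportion to recent learning progress."""

    def __init__(self, activities, window=4, explore=0.2, seed=None):
        self.activities = list(activities)
        self.explore = explore  # assumed share of probability spread uniformly
        # Keep only the last `window` success/failure results per activity.
        self.history = {a: deque(maxlen=window) for a in self.activities}
        self.rng = random.Random(seed)

    def progress(self, activity):
        """Improvement in success rate between the older and newer results."""
        results = list(self.history[activity])
        half = len(results) // 2
        if half == 0:
            return 0.0  # not enough data to estimate progress
        older, newer = results[:half], results[-half:]
        # Assumption: only positive change counts as learning progress.
        return max(0.0, sum(newer) / half - sum(older) / half)

    def probabilities(self):
        """Mix progress-proportional weights with a uniform exploration term."""
        scores = [self.progress(a) for a in self.activities]
        total = sum(scores)
        uniform = 1.0 / len(self.activities)
        if total == 0:
            return [uniform] * len(self.activities)
        return [(1 - self.explore) * s / total + self.explore * uniform
                for s in scores]

    def choose(self):
        """Sample the next activity to propose."""
        return self.rng.choices(self.activities,
                                weights=self.probabilities(), k=1)[0]

    def update(self, activity, success):
        """Record whether the student succeeded on the proposed activity."""
        self.history[activity].append(1.0 if success else 0.0)


bandit = LearningProgressBandit(["easy", "medium", "hard"], seed=0)
for outcome in (True, True, True, True):
    bandit.update("easy", outcome)      # already mastered: no progress
for outcome in (False, False, True, True):
    bandit.update("medium", outcome)    # improving: high progress
for outcome in (False, False, False, False):
    bandit.update("hard", outcome)      # too difficult: no progress
probs = bandit.probabilities()
next_activity = bandit.choose()
```

In this toy run the "medium" activity receives most of the probability mass because it is the only one where the simulated success rate is improving, while the uniform term keeps the other two activities occasionally sampled. The expert knowledge that the paper uses to constrain and bootstrap exploration is not modelled here.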

How to Cite

Clement, B., Roy, D., Oudeyer, P.-Y., & Lopes, M. (2015). Multi-Armed Bandits for Intelligent Tutoring Systems. JEDM | Journal of Educational Data Mining, 7(2), 20-48.



Keywords

intelligent tutoring systems, multi-armed bandits, personalization, intrinsic motivation, active teaching, active learning

EDM 2015 Journal Track