Deep Learning vs. Bayesian Knowledge Tracing: Student Models for Interventions



Published Oct 25, 2018
Ye Mao, Chen Lin, Min Chi


Bayesian Knowledge Tracing (BKT) is a commonly used approach for student modeling, and Long Short-Term Memory (LSTM) is a versatile model that can be applied to a wide range of tasks, such as language translation. In this work, we directly compared three models, BKT, its variant Intervention-BKT (IBKT), and LSTM, on two types of student modeling tasks: post-test score prediction and learning gain prediction. Additionally, while previous work on student learning has often used skills/knowledge components identified by domain experts, we incorporated an automatic skill discovery method (SK), which includes a nonparametric prior over the exercise-skill assignments, into all three models. Thus, we explored a total of six models: BKT, BKT+SK, IBKT, IBKT+SK, LSTM, and LSTM+SK. Two training datasets were employed: one was collected from a natural language physics intelligent tutoring system named Cordillera, and the other from a standard probability intelligent tutoring system named Pyrenees. Overall, our results showed that BKT and BKT+SK outperformed the others on predicting post-test scores, whereas LSTM and LSTM+SK achieved the highest accuracy, F1-measure, and area under the ROC curve (AUC) on predicting learning gains. Furthermore, we demonstrated that by combining SK with the BKT model, BKT+SK could reliably predict post-test scores using only the earliest 50% of the entire training sequences. For early prediction of learning gains, LSTM using only the earliest 70% of the sequences delivered predictions comparable to those obtained from the entire training sequences. These findings point toward learning environments that can predict students' performance and learning gains early and adapt their pedagogical strategy accordingly.
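For readers unfamiliar with BKT, the following is a minimal sketch of the standard knowledge tracing update, together with a helper that traces only the earliest portion of a response sequence, in the spirit of the early-prediction setting described in the abstract. It is not the authors' implementation: the function names, the parameter values, and the rounding rule for the cutoff are all assumptions made for illustration, and the sketch omits the IBKT and SK extensions entirely.

```python
def bkt_update(p_know, correct, p_learn, p_guess, p_slip):
    """One standard BKT step: condition on the response, then apply learning.

    All probabilities are illustrative inputs, not values from the paper.
    """
    if correct:
        # A correct answer comes from knowing (and not slipping) or from guessing.
        evidence = p_know * (1.0 - p_slip)
        total = evidence + (1.0 - p_know) * p_guess
    else:
        # An incorrect answer comes from slipping or from not knowing (and not guessing).
        evidence = p_know * p_slip
        total = evidence + (1.0 - p_know) * (1.0 - p_guess)
    posterior = evidence / total
    # Learning transition: an unmastered skill becomes mastered with p_learn.
    return posterior + (1.0 - posterior) * p_learn


def trace_prefix(responses, fraction, p_init, p_learn, p_guess, p_slip):
    """Trace mastery over only the earliest `fraction` of a response sequence.

    The cutoff rule (round, at least one response) is an assumption.
    """
    cutoff = max(1, round(len(responses) * fraction))
    p_know = p_init
    for correct in responses[:cutoff]:
        p_know = bkt_update(p_know, correct, p_learn, p_guess, p_slip)
    return p_know
```

Under these assumptions, `trace_prefix([True, False, True, True], 0.5, 0.5, 0.1, 0.2, 0.1)` returns the mastery estimate after the first two responses only. How such an estimate is mapped to a post-test score or a learning gain label is a separate modeling step that this sketch does not cover.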

How to Cite

Mao, Y., Lin, C., & Chi, M. (2018). Deep Learning vs. Bayesian Knowledge Tracing: Student Models for Interventions. JEDM | Journal of Educational Data Mining, 10(2), 28-54.



student modeling, learning gain, interventions, LSTM, BKT

EDM 2018 Journal Track