Principled Transformers for Predictive Performance in Knowledge Tracing

Published February 7, 2026
Kai Neubauer, Yannick Rudolph, Ulf Brefeld

Abstract

Knowledge tracing aims to model students' knowledge and abilities over time, which is crucial for intelligent tutoring systems. In this paper, we propose a straightforward model class, knowledge tracing set transformers (KTSTs), specifically addressing predictive performance in knowledge tracing tasks. KTSTs closely follow prominent transformer architectures and use an intuitive set-based representation for student interactions. We introduce learnable ALiBi, which simplifies and improves upon a prevalent attention mechanism in knowledge tracing, and MHSA aggregation, which readily allows incorporating an arbitrary number of additional, potentially more complex features per student interaction. We highlight and discuss flaws present in related approaches, which are overly complex and, in part, based on suboptimal design choices. We validate our design choices for KTSTs in experiments with real-world data and simulated learning sequences. Overall, we address lessons learned and propose a straightforward model that relies on best practices and establishes a new state-of-the-art on standardized benchmark datasets. Ultimately, KTSTs may serve as a simple but effective base model class for future research in knowledge tracing and intelligent tutoring systems. Code is available at https://github.com/kainbr/kt_set_transformers.
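To make the idea of a learnable linear attention bias concrete, the following is a minimal sketch of standard causal ALiBi (attention with linear biases) in which the slope is treated as a trainable parameter rather than a fixed constant. It is an illustration under stated assumptions, not the KTST implementation: the abstract does not specify the exact formulation, so the function names (`alibi_bias`, `attention_with_alibi`), the single-head setup, and the softplus reparameterization used to keep the slope positive are all hypothetical choices made here.

```python
import numpy as np

def softplus(x):
    """Smooth map from the reals to positive values (assumed reparameterization)."""
    return np.log1p(np.exp(x))

def alibi_bias(seq_len, slope):
    """Causal linear bias: -slope * (i - j) for keys j <= i, -inf for future keys."""
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]      # query index minus key index
    bias = -slope * dist.astype(float)      # older interactions get a larger penalty
    bias[dist < 0] = -np.inf                # a query may not attend to the future
    return bias

def attention_with_alibi(q, k, v, raw_slope):
    """Single-head scaled dot-product attention with an added ALiBi bias.

    raw_slope is the unconstrained parameter that a training loop would update;
    here it is just a float, since this sketch contains no optimizer.
    """
    slope = softplus(raw_slope)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + alibi_bias(q.shape[0], slope)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: with identical queries and keys, only the bias shapes the attention,
# so more recent interactions receive more weight.
q = k = np.zeros((4, 8))
v = np.eye(4)
out, weights = attention_with_alibi(q, k, v, raw_slope=0.0)
```

In this sketch, a larger slope concentrates attention on recent interactions, while a slope near zero recovers plain causal attention; making the slope learnable lets the data decide how quickly past interactions are discounted.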

How to Cite

Neubauer, K., Rudolph, Y., & Brefeld, U. (2026). Principled Transformers for Predictive Performance in Knowledge Tracing. Journal of Educational Data Mining, 18(1), 89-112. https://doi.org/10.5281/zenodo.18518653

Keywords

Intelligent tutoring systems, Knowledge tracing, Deep learning, Transformer
