Principled Transformers for Predictive Performance in Knowledge Tracing

Kai Neubauer; Yannick Rudolph; Ulf Brefeld

doi:10.5281/zenodo.18518653

Published February 7, 2026

Kai Neubauer, Leuphana University (ORCID: https://orcid.org/0009-0003-1246-7011)
Yannick Rudolph, Leuphana University (ORCID: https://orcid.org/0009-0000-5677-3318)
Ulf Brefeld, Leuphana University

Abstract

Knowledge tracing aims to model students' knowledge and abilities over time, which is crucial for intelligent tutoring systems. In this paper, we propose a straightforward model class, knowledge tracing set transformers (KTSTs), that specifically addresses predictive performance in knowledge tracing tasks. KTSTs closely follow prominent transformer architectures and use an intuitive set-based representation of student interactions. We introduce learnable ALiBi (attention with linear biases), which simplifies and improves upon a prevalent attention mechanism in knowledge tracing, and MHSA (multi-head self-attention) aggregation, which readily allows incorporating an arbitrary number of additional, potentially more complex features per student interaction. We highlight and discuss flaws in related approaches, which are overly complex and in part rely on suboptimal design choices. We validate our design choices for KTSTs in experiments with real-world data and simulated learning sequences. Overall, we summarize lessons learned and propose a straightforward model that relies on best practices and establishes a new state of the art on standardized benchmark datasets. Ultimately, KTSTs may serve as a simple but effective base model class for future research in knowledge tracing and intelligent tutoring systems. Code is available at https://github.com/kainbr/kt_set_transformers.
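The abstract names learnable ALiBi as a key ingredient. As a rough illustration only, the sketch below shows ALiBi-style causal attention for a single head in pure Python. Each attention score is penalized linearly by the distance between query and key. The slope of that penalty is an ordinary parameter that a training loop could update, which is what would make it "learnable". The function names `alibi_weights` and `alibi_attention`, the list-based vectors, and the plain causal mask are hypothetical choices for this example. The KTST implementation in the linked repository is authoritative and may differ, for example in how slopes are shared across heads or constrained.

```python
import math
from typing import List

Vector = List[float]


def alibi_weights(q: List[Vector], k: List[Vector], slope: float) -> List[Vector]:
    """Causal attention weights with an ALiBi-style linear distance penalty.

    Score for query i and key j (j <= i): q_i . k_j / sqrt(d) - slope * (i - j).
    `slope` is a plain float here; in a trained model it would be a learnable
    per-head parameter (assumption for illustration, not the KTST code).
    """
    d = len(q[0])
    rows = []
    for i in range(len(q)):
        # Causal mask: query i only sees keys 0..i.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) - slope * (i - j)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def alibi_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], slope: float
) -> List[Vector]:
    """Weighted sum of the value vectors v, using the weights from alibi_weights."""
    weights = alibi_weights(q, k, slope)
    dim = len(v[0])
    return [
        [sum(w * v[j][c] for j, w in enumerate(row)) for c in range(dim)]
        for row in weights
    ]


if __name__ == "__main__":
    # Zero queries isolate the distance penalty: with slope = ln 2, each
    # step further back receives half the unnormalized weight.
    zeros = [[0.0, 0.0]] * 3
    for row in alibi_weights(zeros, zeros, math.log(2)):
        print([round(w, 3) for w in row])
```

Running the script prints `[1.0]`, `[0.333, 0.667]`, and `[0.143, 0.286, 0.571]`. Two further design points are assumptions rather than details taken from the paper. First, in knowledge tracing the prediction for step i usually must not see the response at step i, so a real model would shift or further mask the keys. Second, a learned slope is often kept non-negative, for instance by passing an unconstrained parameter through softplus.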

How to Cite

Neubauer, K., Rudolph, Y., and Brefeld, U. 2026. Principled Transformers for Predictive Performance in Knowledge Tracing. Journal of Educational Data Mining 18, 1, 89–112. https://doi.org/10.5281/zenodo.18518653

Keywords

Intelligent tutoring systems, Knowledge tracing, Deep learning, Transformer

Issue: Vol 18 No 1 (2026)

Section: Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Authors who publish with this journal agree to the following terms:

The Author retains copyright in the Work, where the term “Work” shall include all digital objects that may result in subsequent electronic publication or distribution.
Upon acceptance of the Work, the author shall grant to the Publisher the right of first publication of the Work.
The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons 4.0 License (Attribution-Noncommercial-No Derivatives 4.0 International), or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:

Attribution—other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
Noncommercial—other users (including Publisher) may not use this Work for commercial purposes;
No Derivative Works—other users (including Publisher) may not alter, transform, or build upon this Work, with the understanding that any of the above conditions can be waived with permission from the Author and that where the Work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.

The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post online a pre-publication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access). Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
The Author represents and warrants that:

the Work is the Author’s original work;
the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
the Work is not pending review or under consideration by another publisher;
the Work has not previously been published;
the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
the Work contains no libel, invasion of privacy, or other unlawful matter.

The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties contained in Paragraph 6 above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.
