Predicting Students’ Future Success: Harnessing Clickstream Data with Wide & Deep Item Response Theory


Published Oct 17, 2024
Shi Pu, Yu Yan, Brandon Zhang

Abstract

We propose a novel model, Wide & Deep Item Response Theory (Wide & Deep IRT), to predict the correctness of students’ responses to questions using historical clickstream data. This model combines the strengths of conventional Item Response Theory (IRT) models and Wide & Deep Learning for Recommender Systems. By leveraging clickstream data, Wide & Deep IRT provides precise predictions of answer correctness while enabling the exploration of behavioral patterns among different ability groups.
Our experimental results based on a real-world dataset (EDM Cup 2023) demonstrate that Wide & Deep IRT outperforms conventional IRT models and state-of-the-art knowledge tracing models while maintaining the ease of interpretation associated with IRT models. Our model performed very well in the EDM Cup 2023 competition, placing second on the public leaderboard and third on the private leaderboard. Additionally, Wide & Deep IRT identifies distinct behavioral patterns across ability groups. In the EDM Cup 2023 dataset, low-ability students were more likely to directly request an answer to a question before attempting to respond, which can negatively impact their learning outcomes and potentially indicates attempts to game the system. Lastly, the Wide & Deep IRT model has significantly fewer parameters than traditional IRT models and deep knowledge tracing models, making it easier to deploy in practice. The source code is available via the Open Science Framework: https://osf.io/8vcfd/.
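The abstract's core idea — an IRT response logit augmented by a linear "wide" term over engineered clickstream features and the scalar output of a "deep" network — can be sketched as follows. This is an illustrative reconstruction, not the paper's actual architecture: the Rasch-style parameterization, the feature names, and all numeric values are assumptions for the sake of the example.

```python
import math

def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def wide_deep_irt_logit(ability, difficulty, wide_w, wide_x, deep_out):
    """Sketch of a Wide & Deep IRT-style logit (illustrative, not the
    paper's exact formulation).

    ability, difficulty : Rasch-style IRT person and item parameters.
    wide_w, wide_x      : weights and values for engineered ("wide")
                          clickstream features, combined linearly.
    deep_out            : scalar output of a hypothetical deep network
                          over raw clickstream embeddings.
    """
    wide_term = sum(w * x for w, x in zip(wide_w, wide_x))
    return ability - difficulty + wide_term + deep_out

# Hypothetical numbers: a student with ability 0.5 on an item of
# difficulty -0.2, two binary clickstream indicators (e.g. hint used,
# answer requested before any attempt), and a deep-network output of 0.15.
logit = wide_deep_irt_logit(0.5, -0.2, [0.3, -1.1], [1.0, 1.0], 0.15)
p_correct = sigmoid(logit)  # probability the response is correct
```

In a real implementation the wide weights and deep-network parameters would be learned jointly from historical response data; the point of the sketch is only that clickstream evidence shifts the classic ability-minus-difficulty logit up or down.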

How to Cite

Pu, S., Yan, Y., & Zhang, B. (2024). Predicting Students’ Future Success: Harnessing Clickstream Data with Wide & Deep Item Response Theory. Journal of Educational Data Mining, 16(2), 1–31. https://doi.org/10.5281/zenodo.13627151


Keywords

wide & deep learning, item response theory, knowledge tracing, explainable student modeling

References
ABADI, M., BARHAM, P., CHEN, J., CHEN, Z., DAVIS, A., DEAN, J., DEVIN, M., GHEMAWAT, S., IRVING, G., ISARD, M., ET AL. 2016. Tensorflow: a system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16). USENIX Association, 265–283.

AGUDO-PEREGRINA, Á. F., IGLESIAS-PRADAS, S., CONDE-GONZÁLEZ, M. Á., AND HERNÁNDEZ-GARCÍA, Á. 2014. Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior 31, 542–550.

BAKER, F. B. 2001. The basics of item response theory, 2nd ed. ERIC Clearinghouse on Assessment and Evaluation. Retrieved from https://eric.ed.gov/?id=ED458219.

BAKER, F. B. AND KIM, S.-H. 2004. Item response theory: Parameter estimation techniques, 2nd ed. Marcel Dekker, New York.

BAKER, R., WALONOSKI, J., HEFFERNAN, N., ROLL, I., CORBETT, A., AND KOEDINGER, K. 2008. Why students engage in “gaming the system” behavior in interactive learning environments. Journal of Interactive Learning Research 19, 2, 185–224.

BAKER, R., XU, D., PARK, J., YU, R., LI, Q., CUNG, B., FISCHER, C., RODRIGUEZ, F., WARSCHAUER, M., AND SMYTH, P. 2020. The benefits and caveats of using clickstream data to understand student self-regulatory behaviors: opening the black box of learning processes. International Journal of Educational Technology in Higher Education 17, 1, 1–24.

BENEDETTO, L., ARADELLI, G., CREMONESI, P., CAPPELLI, A., GIUSSANI, A., AND TURRIN, R. 2021. On the application of transformers for estimating the difficulty of multiple-choice questions from text. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, J. Burstein, A. Horbach, E. Kochmar, R. Laarmann-Quante, C. Leacock, N. Madnani, I. Pilán, H. Yannakoudakis, and T. Zesch, Eds. Association for Computational Linguistics, Online, 147–157.

BESEISO, M. AND ALZAHRANI, S. 2020. An empirical analysis of bert embedding for automated essay scoring. International Journal of Advanced Computer Science and Applications 11, 10, 204–210.

BIRNBAUM, A. 1968. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores, F. M. Lord and M. R. Novick, Eds. Addison-Wesley, Reading, MA, 397–472.

CELLA, D., RILEY, W., STONE, A., ROTHROCK, N., REEVE, B., YOUNT, S., AMTMANN, D., BODE, R., BUYSSE, D., CHOI, S., ET AL. 2010. The patient-reported outcomes measurement information system (promis) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of clinical epidemiology 63, 11, 1179–1194.

CHENG, H.-T., KOC, L., HARMSEN, J., SHAKED, T., CHANDRA, T., ARADHYE, H., ANDERSON, G., CORRADO, G., CHAI, W., ISPIR, M., ANIL, R., HAQUE, Z., HONG, L., JAIN, V., LIU, X., AND SHAH, H. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. DLRS 2016. Association for Computing Machinery, New York, NY, USA, 7–10.

CHENG, S., LIU, Q., CHEN, E., HUANG, Z., HUANG, Z., CHEN, Y., MA, H., AND HU, G. 2019. Dirt: Deep learning enhanced item response theory for cognitive diagnosis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. CIKM ’19. Association for Computing Machinery, New York, NY, USA, 2397–2400.

CHOI, Y., LEE, Y., CHO, J., BAEK, J., KIM, B., CHA, Y., SHIN, D., BAE, C., AND HEO, J. 2020. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the Seventh ACM Conference on Learning@Scale. L@S ’20. Association for Computing Machinery, New York, NY, USA, 341–344.

COHEN, A. 2017. Analysis of student activity in web-supported courses as a tool for predicting dropout. Educational Technology Research and Development 65, 5, 1285–1304.

COOPER, H., ROBINSON, J. C., AND PATALL, E. A. 2006. Does homework improve academic achievement? a synthesis of research, 1987–2003. Review of educational research 76, 1, 1–62.

CORBETT, A. T. AND ANDERSON, J. R. 1994. Knowledge tracing: Modeling the acquisition of procedural knowledge. User modeling and user-adapted interaction 4, 4, 253–278.

CROSSLEY, S., PAQUETTE, L., DASCALU, M., MCNAMARA, D. S., AND BAKER, R. S. 2016. Combining click-stream data with nlp tools to better understand mooc completion. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge. LAK ’16. Association for Computing Machinery, New York, NY, USA, 6–14.

DEVLIN, J., CHANG, M.-W., LEE, K., AND TOUTANOVA, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds. Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.

DONG, L., WEI, F., TAN, C., TANG, D., ZHOU, M., AND XU, K. 2014. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), K. Toutanova and H. Wu, Eds. Association for Computational Linguistics, Baltimore, Maryland, 49–54.

DWARAMPUDI, M. AND REDDY, N. 2019. Effects of padding on lstms and cnns. arXiv preprint arXiv:1903.07288.

EMBRETSON, S. E. AND REISE, S. P. 2013. Item response theory. Psychology Press, New York.

FAN, H., XU, J., CAI, Z., HE, J., AND FAN, X. 2017. Homework and students’ achievement in math and science: A 30-year meta-analysis, 1986–2015. Educational Research Review 20, 35–54.

GHOSH, A., HEFFERNAN, N., AND LAN, A. S. 2020. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’20. Association for Computing Machinery, New York, NY, USA, 2330–2339.

GONZÁLEZ-BRENES, J., HUANG, Y., AND BRUSILOVSKY, P. 2014. General features in knowledge tracing to model multiple subskills, temporal item response theory, and expert knowledge. In The 7th international conference on educational data mining, J. Stamper, Z. Pardos, M. Mavrikis, and B. M. McLaren, Eds. 84–91.

GRAVES, A., WAYNE, G., REYNOLDS, M., HARLEY, T., DANIHELKA, I., GRABSKA-BARWIŃSKA, A., COLMENAREJO, S. G., GREFENSTETTE, E., RAMALHO, T., AGAPIOU, J., ET AL. 2016. Hybrid computing using a neural network with dynamic external memory. Nature 538, 7626, 471–476.

HAMBLETON, R. K., SWAMINATHAN, H., AND ROGERS, H. J. 1991. Fundamentals of item response theory. Vol. 2. SAGE.

HINTON, G. E. AND SALAKHUTDINOV, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786, 504–507.

HOANG, M., BIHORAC, O. A., AND ROUCES, J. 2019. Aspect-based sentiment analysis using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, M. Hartmann and B. Plank, Eds. Linköping University Electronic Press, Turku, Finland, 187–196.

HOCHREITER, S. AND SCHMIDHUBER, J. 1997. Long short-term memory. Neural computation 9, 8, 1735–1780.

JEFFORD, M., WARD, A. C., LISY, K., LACEY, K., EMERY, J. D., GLASER, A. W., CROSS, H., KRISHNASAMY, M., MCLACHLAN, S.-A., AND BISHOP, J. 2017. Patient-reported outcomes in cancer survivors: a population-wide cross-sectional study. Supportive care in cancer : official journal of the Multinational Association of Supportive Care in Cancer 25, 10, 3171–3179.

JUHAŇÁK, L., ZOUNEK, J., AND ROHLÍKOVÁ, L. 2019. Using process mining to analyze students’ quiz-taking behavior patterns in a learning management system. Computers in Human Behavior 92, 496–506.

KEITH, T. Z., DIAMOND-HALLAM, C., AND FINE, J. G. 2004. Longitudinal effects of in-school and out-of-school homework on high school grades. School Psychology Quarterly 19, 3, 187.

KHAJAH, M., LINDSEY, R. V., AND MOZER, M. C. 2016. How deep is knowledge tracing? In Proceedings of the 9th International Conference on Educational Data Mining, T. Barnes, M. Chi, and M. Feng, Eds. International Educational Data Mining Society, 94–101.

KINGMA, D. P. AND BA, J. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR). Ithaca, NY: ArXiv, 1–13.

KWAK, S. G. AND KIM, J. H. 2017. Central limit theorem: the cornerstone of modern statistics. Korean journal of anesthesiology 70, 2, 144–156.

LIM, J. M. 2016. Predicting successful completion using student delay indicators in undergraduate self-paced online courses. Distance Education 37, 3, 317–332.

VAN DER LINDEN, W. J. AND GLAS, C. A. 2000. Computerized adaptive testing: Theory and practice. Springer.

LINDSEY, R. V., SHROYER, J. D., PASHLER, H., AND MOZER, M. C. 2014. Improving students’ long-term knowledge retention through personalized review. Psychological science 25, 3, 639–647.

LIU, Q., HUANG, Z., YIN, Y., CHEN, E., XIONG, H., SU, Y., AND HU, G. 2021. Ekt: Exercise-aware knowledge tracing for student performance prediction. IEEE Transactions on Knowledge and Data Engineering 33, 1, 100–115.

LIU, Y., YANG, Y., CHEN, X., SHEN, J., ZHANG, H., AND YU, Y. 2020. Improving knowledge tracing via pre-training question embeddings. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, C. Bessiere, Ed. International Joint Conferences on Artificial Intelligence Organization, 1577–1583. Main track.

LOH, H., SHIN, D., LEE, S., BAEK, J., HWANG, C., LEE, Y., CHA, Y., KWON, S., PARK, J., AND CHOI, Y. 2021. Recommendation for effective standardized exam preparation. In LAK21: 11th International Learning Analytics and Knowledge Conference. LAK21. Association for Computing Machinery, New York, NY, USA, 397–404.

LORD, F. 1952. A theory of test scores. Psychometric Society.

LORD, F. 1980. Applications of Item Response Theory To Practical Testing Problems, 1 ed. Lawrence Erlbaum Associates, New York.

MACFADYEN, L. P. AND DAWSON, S. 2010. Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers & Education 54, 2, 588–599.

MAGALHÃES, P., FERREIRA, D., CUNHA, J., AND ROSÁRIO, P. 2020. Online vs traditional homework: A systematic review on the benefits to students’ performance. Computers & Education 152, 103869.

MAYFIELD, E. AND BLACK, A. W. 2020. Should you fine-tune BERT for automated essay scoring? In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, J. Burstein, E. Kochmar, C. Leacock, N. Madnani, I. Pilán, H. Yannakoudakis, and T. Zesch, Eds. Association for Computational Linguistics, Seattle, WA, USA → Online, 151–162.

NAIR, V. AND HINTON, G. E. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10. Omnipress, Madison, WI, USA, 807–814.

NAKAGAWA, H., IWASAWA, Y., AND MATSUO, Y. 2019. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE Computer Society, Los Alamitos, CA, USA, 156–163.

PANDEY, S. AND KARYPIS, G. 2019. A self attentive model for knowledge tracing. In Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019), C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. 384–389.

PANDEY, S. AND SRIVASTAVA, J. 2020. RKT: Relation-aware self-attention for knowledge tracing. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM ’20. Association for Computing Machinery, New York, NY, USA, 1205–1214.

PARDOS, Z. A. AND HEFFERNAN, N. T. 2011. KT-IDEM: Introducing item difficulty to the knowledge tracing model. In User Modeling, Adaption and Personalization:19th International Conference, UMAP 2011, Girona, Spain, July 11-15, 2011. Proceedings 19, J. A. Konstan, R. Conejo, J. L. Marzo, and N. Oliver, Eds. Springer, Berlin, Heidelberg, 243–254.

PARK, J., YU, R., RODRIGUEZ, F., BAKER, R., SMYTH, P., AND WARSCHAUER, M. 2018. Understanding student procrastination via mixture models. In Proceedings of the 11th International Conference on Educational Data Mining, K. E. Boyer and M. Yudelson, Eds. International Educational Data Mining Society, 187–197.

PIECH, C., BASSEN, J., HUANG, J., GANGULI, S., SAHAMI, M., GUIBAS, L., AND SOHL-DICKSTEIN, J. 2015. Deep knowledge tracing. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. NIPS’15. MIT Press, Cambridge, MA, USA, 505–513.

PISTILLI, M. D. AND ARNOLD, K. E. 2010. Purdue signals: Mining real-time academic data to enhance student success. About campus 15, 3, 22–24.

PU, S., CONVERSE, G., AND HUANG, Y. 2021. Deep performance factors analysis for knowledge tracing. In Artificial Intelligence in Education, I. Roll, D. McNamara, S. Sosnovsky, R. Luckin, and V. Dimitrova, Eds. Springer International Publishing, Cham, 331–341.

PU, S., YUDELSON, M., OU, L., AND HUANG, Y. 2020. Deep knowledge tracing with transformers. In Artificial Intelligence in Education, I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, and E. Millán, Eds. Springer International Publishing, Cham, 252–256.

QU, C., YANG, L., QIU, M., CROFT, W. B., ZHANG, Y., AND IYYER, M. 2019. BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR’19. Association for Computing Machinery, New York, NY, USA, 1133–1136.

RASCH, G. 1961. On general laws and the meaning of measurement. In Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, Ed. Vol. 4. University of California Press, Oakland, CA, USA, 321–333.

RITTER, S., YUDELSON, M., FANCSALI, S. E., AND BERMAN, S. R. 2016. How mastery learning works at scale. In Proceedings of the Third (2016) ACM Conference on Learning @ Scale. L@S ’16. Association for Computing Machinery, New York, NY, USA, 71–79.

SAK, H., SENIOR, A. W., AND BEAUFAYS, F. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In 15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014, Singapore, September 14-18, 2014, H. Li, H. M. Meng, B. Ma, E. Chng, and L. Xie, Eds. ISCA, 338–342.

SCRUGGS, R., BAKER, R., AND MCLAREN, B. 2020. Extending deep knowledge tracing: Inferring interpretable knowledge and predicting post-system performance. In Proceedings of the 28th International Conference on Computers in Education, H. J. S. et al., Ed. Australia: Asia-Pacific Society for Computers in Education.

SCRUGGS, R., BAKER, R. S., PAVLIK, P. I., MCLAREN, B. M., AND LIU, Z. 2023. How well do contemporary knowledge tracing algorithms predict the knowledge carried out of a digital learning game? Educational technology research and development 71, 3, 901–918.

SETTLES, B. AND MEEDER, B. 2016. A trainable spaced repetition model for language learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), K. Erk and N. A. Smith, Eds. Association for Computational Linguistics, Berlin, Germany, 1848–1858.

SHIN, D., SHIM, Y., YU, H., LEE, S., KIM, B., AND CHOI, Y. 2021. Saint+: Integrating temporal features for ednet correctness prediction. In LAK21: 11th International Learning Analytics and Knowledge Conference. LAK21. Association for Computing Machinery, New York, NY, USA, 490–496.

SONG, X., LI, J., LEI, Q., ZHAO, W., CHEN, Y., AND MIAN, A. 2022. Bi-clkt: Bi-graph contrastive learning based knowledge tracing. Knowledge-Based Systems 241, 108274.

SRIVASTAVA, N., HINTON, G., KRIZHEVSKY, A., SUTSKEVER, I., AND SALAKHUTDINOV, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1, 1929–1958.

SUN, C., QIU, X., XU, Y., AND HUANG, X. 2019. How to fine-tune bert for text classification? In Chinese Computational Linguistics, M. Sun, X. Huang, H. Ji, Z. Liu, and Y. Liu, Eds. Springer International Publishing, Cham, 194–206.

SUTSKEVER, I., VINYALS, O., AND LE, Q. V. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS’14. MIT Press, Cambridge, MA, USA, 3104–3112.

TONG, H., ZHOU, Y., AND WANG, Z. 2020. Exercise hierarchical feature enhanced knowledge tracing. In Artificial Intelligence in Education, I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, and E. Millán, Eds. Springer International Publishing, Cham, 324–328.

TSUTSUMI, E., KINOSHITA, R., AND UENO, M. 2021a. Deep-IRT with independent student and item networks. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. 510–517.

TSUTSUMI, E., KINOSHITA, R., AND UENO, M. 2021b. Deep item response theory as a novel test theory based on deep learning. Electronics 10, 9, 1020.

VASWANI, A., SHAZEER, N., PARMAR, N., USZKOREIT, J., JONES, L., GOMEZ, A. N., KAISER, L., AND POLOSUKHIN, I. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Curran Associates Inc., Red Hook, NY, USA, 6000–6010.

WISE, S. L. 2017. Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice 36, 4, 52–61.

YEUNG, C.-K. 2019. Deep-irt: Make deep learning based knowledge tracing explainable using item response theory. In Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019), C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. 683–686.

YEUNG, C.-K. AND YEUNG, D.-Y. 2018. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the Fifth Annual ACM Conference on Learning at Scale. L@S ’18. Association for Computing Machinery, New York, NY, USA, 1–10.

YOU, J. W. 2016. Identifying significant indicators using lms data to predict course achievement in online learning. The Internet and Higher Education 29, 23–30.

YUDELSON, M. V., KOEDINGER, K. R., AND GORDON, G. J. 2013. Individualized bayesian knowledge tracing models. In Artificial Intelligence in Education, H. C. Lane, K. Yacef, J. Mostow, and P. Pavlik, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 171–180.

ZHANG, J., SHI, X., KING, I., AND YEUNG, D.-Y. 2017. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web. WWW ’17. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 765–774.

ZHANG, L., XIONG, X., ZHAO, S., BOTELHO, A., AND HEFFERNAN, N. T. 2017. Incorporating rich features into deep knowledge tracing. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale. L@S ’17. Association for Computing Machinery, New York, NY, USA, 169–172.

ZHANG, N., DU, Y., DENG, K., LI, L., SHEN, J., AND SUN, G. 2020. Attention-based knowledge tracing with heterogeneous information network embedding. In Knowledge Science, Engineering and Management, G. Li, H. T. Shen, Y. Yuan, X. Wang, H. Liu, and X. Zhao, Eds. Springer International Publishing, Cham, 95–103.
Section
Special Section EDM Cup 2023