Using Problem Similarity- and Order-based Weighting to Model Learner Performance in Introductory Computer Science Problems


Published Mar 15, 2023
Yingbin Zhang, Juan D. Pinto, Aysa Xuemo Fan, Luc Paquette

Abstract

The second CSEDM data challenge sought innovative methods for using students’ programming traces to model their learning. The main difficulty in this task is deciding which past problems are relevant for predicting performance on a future problem. This paper proposes a set of weighting schemes to address this difficulty. Specifically, students’ behaviors and performance on past problems were weighted when predicting performance on future problems. The weight of a past problem was proportional to its similarity to the future problem. Problem similarity was quantified in terms of source code, problem prompts, and struggling patterns. In addition, we considered another weighting scheme in which past problems were weighted by the order in which students attempted them. Prior studies have used problem similarity and order information in learner modeling, but the proposed weighting schemes are more flexible: they can capture problem similarity on various problem properties and can weight various kinds of behavior and performance information on past problems. We systematically investigated the utility of the weighting schemes for performance prediction through two analyses. The first analysis found that the weighting schemes based on source code similarity, struggling pattern similarity, and problem order improved prediction performance, but the weighting scheme based on problem prompts did not. The second analysis found that the weighting schemes allowed a simple and interpretable model, such as logistic regression, to perform comparably to state-of-the-art deep-learning models. We discuss the implications of the weighting schemes for learner modeling and suggest directions for further improvement.
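
The core idea of the abstract, weighting information from past problems by similarity or by attempt order, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, similarities are normalized to sum to one, and the order-based scheme uses an exponential decay with an arbitrary `decay` parameter.

```python
def similarity_weights(similarities):
    """Turn similarities to the target problem into weights that sum to 1.

    Assumption: negative similarities are clipped to 0 and the rest are
    normalized, so each weight is proportional to its similarity.
    """
    if not similarities:
        raise ValueError("need at least one past problem")
    clipped = [max(0.0, float(s)) for s in similarities]
    total = sum(clipped)
    if total == 0.0:
        # No past problem is similar: fall back to equal weights.
        return [1.0 / len(clipped)] * len(clipped)
    return [s / total for s in clipped]


def order_weights(n_past, decay=0.8):
    """Weights by attempt order (hypothetical exponential decay).

    Index 0 is the earliest attempted problem; the most recent problem
    receives the largest weight.
    """
    if n_past < 1:
        raise ValueError("need at least one past problem")
    raw = [decay ** (n_past - 1 - i) for i in range(n_past)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_feature(values, weights):
    """Weighted sum of one behavior or performance measure over past problems."""
    return sum(v * w for v, w in zip(values, weights))


# Illustrative data only: correctness on three past problems and their
# (made-up) similarity to the problem being predicted.
past_correct = [1, 0, 1]
sim_to_target = [0.2, 0.2, 0.6]

sim_feature = weighted_feature(past_correct, similarity_weights(sim_to_target))
order_feature = weighted_feature(past_correct, order_weights(len(past_correct)))
```

Features built this way could then serve as inputs to a simple classifier such as logistic regression, which is the kind of interpretable model the abstract refers to.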

How to Cite

Zhang, Y., Pinto, J. D., Fan, A. X., & Paquette, L. (2023). Using Problem Similarity- and Order-based Weighting to Model Learner Performance in Introductory Computer Science Problems. Journal of Educational Data Mining, 15(1), 63–99. https://doi.org/10.5281/zenodo.7646789


Keywords

learner modeling, programming trace, problem similarity, knowledge tracing, performance prediction

Section
Special Issue on CSEDM: Educational Data Mining for Computing Education