Using Problem Similarity- and Order-based Weighting to Model Learner Performance in Introductory Computer Science Problems

Yingbin Zhang; Juan D. Pinto; Aysa Xuemo Fan; Luc Paquette

doi:10.5281/zenodo.7646789

Published Mar 15, 2023

Yingbin Zhang

South China Normal University

https://orcid.org/0000-0002-2664-3093

Juan D. Pinto

University of Illinois at Urbana-Champaign

https://orcid.org/0000-0002-2972-485X

Aysa Xuemo Fan

University of Illinois at Urbana-Champaign

Luc Paquette

University of Illinois at Urbana-Champaign

https://orcid.org/0000-0002-2738-3190

Abstract

The second CSEDM data challenge sought innovative methods for using students’ programming traces to model their learning. The main difficulty in this task is deciding which past problems are relevant for predicting performance on a future problem. This paper proposes a set of weighting schemes to address this difficulty. Specifically, students’ behaviors and performance on past problems were weighted when predicting performance on future problems. The weight of a past problem was proportional to its similarity to the future problem. Problem similarity was quantified in terms of source code, problem prompts, and struggling patterns. In addition, we considered another weighting scheme in which past problems were weighted by the order in which students attempted them. Prior studies have used problem similarity and order information in learner modeling, but the proposed weighting schemes are more flexible: they capture problem similarity on various problem properties and can weight various kinds of behavior and performance information on past problems. We systematically investigated the utility of the weighting schemes for performance prediction through two analyses. The first analysis found that the weighting schemes based on source code similarity, struggling pattern similarity, and problem order improved prediction performance, but the weighting scheme based on problem prompts did not. The second analysis found that the weighting schemes allow a simple and interpretable model, such as logistic regression, to perform comparably to state-of-the-art deep-learning models. We discuss the implications of the weighting schemes for learner modeling and suggest directions for further improvement.
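The general idea of weighting past problems can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how similarity is computed or how weights are normalized, so the use of cosine similarity, the clipping of negative similarities, the normalization of weights to sum to one, the geometric decay for order-based weights, and all function names below are assumptions made for illustration only.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_weights(past_vectors, target_vector):
    """Weights proportional to each past problem's similarity to the target.

    Assumption: each problem is represented by a numeric vector (for example,
    an embedding of its source code or prompt). Negative similarities are
    clipped to zero and the weights are normalized to sum to one.
    """
    sims = [max(cosine_similarity(v, target_vector), 0.0) for v in past_vectors]
    if not sims:
        return []
    total = sum(sims)
    if total == 0:
        # No past problem resembles the target: fall back to equal weights.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


def order_weights(n_past, decay=0.5):
    """Weights based on attempt order; the most recent problem weighs most.

    Assumption: geometric decay with a hypothetical `decay` parameter.
    Index 0 is the earliest attempted problem.
    """
    raw = [decay ** (n_past - 1 - i) for i in range(n_past)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_feature(values, weights):
    """Weighted sum of one per-problem feature (e.g., correctness)."""
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical data: three past problems, one future (target) problem.
past_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_vector = [1.0, 0.0]
past_correct = [1, 0, 0]  # 1 = solved, 0 = not solved

sim_w = similarity_weights(past_vectors, target_vector)
ord_w = order_weights(len(past_vectors))
sim_feature = weighted_feature(past_correct, sim_w)
ord_feature = weighted_feature(past_correct, ord_w)
```

The resulting weighted features could then be passed to a simple classifier such as logistic regression, which is the kind of interpretable model the abstract refers to.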

How to Cite

Zhang, Y., Pinto, J. D., Fan, A. X., & Paquette, L. (2023). Using Problem Similarity- and Order-based Weighting to Model Learner Performance in Introductory Computer Science Problems. Journal of Educational Data Mining, 15(1), 63–99. https://doi.org/10.5281/zenodo.7646789

Keywords

learner modeling, programming trace, problem similarity, knowledge tracing, performance prediction

Issue

Vol. 15 No. 1 (2023): JEDM Special Issue on Computer Science Education and Educational Data Mining (CSEDM)

Section

Special Issue on CSEDM: Educational Data Mining for Computing Education

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

