The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data


Published Jun 27, 2024
Yang Shi, Robin Schmucker, Keith Tran, John Bacher, Kenneth Koedinger, Thomas Price, Min Chi, Tiffany Barnes

Abstract

Understanding students’ learning of knowledge components (KCs) is an important educational data mining
task and enables many educational applications. However, in the domain of computing education,
where program exercises require students to practice many KCs simultaneously, it is challenging to attribute
students’ errors to specific KCs and, therefore, to model student knowledge of these KCs. In this paper,
we define this task as the KC attribution problem. We first demonstrate a novel approach to addressing this
task using deep neural networks and explore its performance in identifying expert-defined KCs (RQ1).
Because labeling requires costly expert effort, we further evaluate the effectiveness of transfer
learning for KC attribution using more easily acquired labels, such as problem correctness (RQ2).
Finally, because prior research indicates that incorporating educational theory into deep learning models
could enhance model performance, we investigate how to incorporate learning curves into the
model design and evaluate the resulting models (RQ3). Our results show that in a supervised learning scenario,
a deep learning model, code2vec, can attribute KCs with relatively high performance
(AUC > 75% in two of the three examined KCs). Using transfer learning, we further achieve reasonable
performance on the task without any costly expert labeling. However, the incorporation of learning curves
shows limited effectiveness in this task. Our research lays important groundwork for personalized feedback
for students based on which KCs they applied correctly, as well as more interpretable and accurate
student models.
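The abstract refers to learning curves without defining them. In the law-of-practice tradition (e.g., Newell and Rosenbloom, listed in the references), performance on a skill is commonly modeled as improving as a power function of practice, so a KC's error rate at its t-th practice opportunity can be written E(t) = a · t^(-b). The sketch below is a minimal illustration under that assumption and is not the authors' model: it builds empirical per-KC learning curves from a hypothetical `(student_id, kc, correct)` attempt log and fits the power law by least squares in log-log space. The log format and the function names `empirical_learning_curve`, `fit_power_law`, and `predict_error_rate` are hypothetical.

```python
import math
from collections import defaultdict


def empirical_learning_curve(attempts):
    """Build per-KC empirical learning curves (hypothetical log format).

    attempts: iterable of (student_id, kc, correct) tuples in chronological
    order. Returns {kc: [error rate at opportunity 1, 2, ...]}, pooling all
    students who reached each opportunity.
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    errors = defaultdict(lambda: defaultdict(int))  # kc -> t -> error count
    totals = defaultdict(lambda: defaultdict(int))  # kc -> t -> attempt count
    for student, kc, correct in attempts:
        seen[(student, kc)] += 1
        t = seen[(student, kc)]
        totals[kc][t] += 1
        if not correct:
            errors[kc][t] += 1
    # Every t in 1..max(t) is present, because opportunities count up by one.
    return {
        kc: [errors[kc][t] / by_t[t] for t in range(1, max(by_t) + 1)]
        for kc, by_t in totals.items()
    }


def fit_power_law(opportunities, error_rates):
    """Fit error_rate = a * t**(-b) via ordinary least squares on
    log(error_rate) = log(a) - b * log(t). Returns (a, b).

    Assumes positive opportunities and error rates (zeros cannot be
    log-transformed) and at least two distinct opportunities.
    """
    if any(t <= 0 for t in opportunities) or any(e <= 0 for e in error_rates):
        raise ValueError("log-log fit needs positive opportunities and error rates")
    xs = [math.log(t) for t in opportunities]
    ys = [math.log(e) for e in error_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct opportunities")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # a = e^intercept, b = -slope


def predict_error_rate(a, b, t):
    """Predicted error rate at practice opportunity t under the fitted curve."""
    return a * t ** (-b)
```

For example, `fit_power_law([1, 2, 3, 4], [0.6 * t ** -0.5 for t in [1, 2, 3, 4]])` recovers a ≈ 0.6 and b ≈ 0.5. Fitting in log-log space is the simplest choice; a nonlinear least-squares fit on the original scale (for example, a Levenberg-Marquardt solver) weights the points differently and can yield different estimates. Opportunities with a pooled error rate of zero must be dropped or smoothed before the log-log fit.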

How to Cite

Shi, Y., Schmucker, R., Tran, K., Bacher, J., Koedinger, K., Price, T., Chi, M., & Barnes, T. (2024). The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data. Journal of Educational Data Mining, 16(1), 1–33. https://doi.org/10.5281/zenodo.10844782

Keywords

knowledge component, KC, Deep Learning, learning curve, code2vec, student modeling

References
AI, F., CHEN, Y., GUO, Y., ZHAO, Y., WANG, Z., FU, G., AND WANG, G. 2019. Concept-aware deep knowledge tracing and exercise recommendation in an online learning system. In Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. International Educational Data Mining Society, 240–245.

ALEVEN, V. AND KOEDINGER, K. R. 2013. Knowledge component (KC) approaches to learner modeling. In Design Recommendations for Intelligent Tutoring Systems, R. A. Sottilare, A. Graesser, X. Hu, and H. Holden, Eds. Vol. 1. US Army Research Laboratory, Chapter 15, 165–182.

ALLAMANIS, M., PENG, H., AND SUTTON, C. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger, Eds. PMLR, 2091–2100.

ALON, U., BRODY, S., LEVY, O., AND YAHAV, E. 2019. code2seq: Generating sequences from structured representations of code. In International Conference on Learning Representations.

ALON, U., ZILBERSTEIN, M., LEVY, O., AND YAHAV, E. 2018. A general path-based representation for predicting program properties. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, D. Grossman, Ed. Association for Computing Machinery, 404–419.

ALON, U., ZILBERSTEIN, M., LEVY, O., AND YAHAV, E. 2019. code2vec: Learning distributed representations of code. In Proceedings of the ACM on Programming Languages, S. Weirich, Ed. Association for Computing Machinery, 40–69.

ANDERSON, J. R. AND REISER, B. J. 1985. The LISP tutor: It approaches the effectiveness of a human tutor. BYTE 10, 4, 159–175.

BARNES, T. 2005. The q-matrix method: Mining student response data for knowledge. In AAAI 2005 Educational Data Mining Workshop, J. Beck, Ed. Association for the Advancement of Artificial Intelligence (AAAI), 39–46.

BARRIA-PINEDA, J., GUERRA-HOLLSTEIN, J., AND BRUSILOVSKY, P. 2018. A fine-grained open learner model for an introductory programming course. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, D. Chin and L. Chen, Eds. Association for Computing Machinery, 53–61.

BLIKSTEIN, P. 2011. Using learning analytics to assess students’ behavior in open-ended programming tasks. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge, G. Conole and D. Gašević, Eds. Association for Computing Machinery, 110–116.

CARUANA, R. 1997. Multitask learning. Machine learning 28, 41–75.

CEN, H., KOEDINGER, K., AND JUNKER, B. 2006. Learning factors analysis – a general method for cognitive model evaluation and improvement. In Intelligent Tutoring Systems, M. Ikeda, K. D. Ashley, and T.-W. Chan, Eds. Springer, 164–175.

CHI, M., KOEDINGER, K., GORDON, G., AND JORDAN, P. 2011. Instructional factors analysis: A cognitive model for multiple instructional interventions. In EDM 2011 4th International Conference on Educational Data Mining, M. Pechenizkiy, T. Calders, C. Conati, S. Ventura, C. Romero, and J. Stamper, Eds. International Educational Data Mining Society, 61–70.

CLARK, R. E., FELDON, D. F., VAN MERRIËNBOER, J. J., YATES, K. A., AND EARLY, S. 2008. Cognitive task analysis. In Handbook of research on educational communications and technology, D. Jonassen, M. J. Spector, M. Driscoll, M. D. Merrill, J. van Merrienboer, and M. P. Driscoll, Eds. Routledge, New York, NY, 577–593.

CLEUZIOU, G. AND FLOUVAT, F. 2021. Learning student program embeddings using abstract execution traces. In 14th International Conference on Educational Data Mining, S. Hsiao and S. Sahebi, Eds. International Educational Data Mining Society, 252–262.

COOKE, N. J. 1994. Varieties of knowledge elicitation techniques. International journal of human computer studies 41, 6, 801–849.

CORBETT, A. T. AND ANDERSON, J. R. 1994. Knowledge tracing: Modeling the acquisition of procedural knowledge. User modeling and user-adapted interaction 4, 253–278.

DE BOER, P.-T., KROESE, D. P., MANNOR, S., AND RUBINSTEIN, R. Y. 2005. A tutorial on the cross-entropy method. Annals of operations research 134, 19–67.

EDWARDS, S. H. AND MURALI, K. P. 2017. CodeWorkout: Short programming exercises with built-in data collection. In Proceedings of the 2017 ACM conference on innovation and technology in computer science education, G. Rößling and I. Polycarpou, Eds. Association for Computing Machinery, 188–193.

EMERSON, A., SMITH, A., RODRIGUEZ, F. J., WIEBE, E. N., MOTT, B. W., BOYER, K. E., AND LESTER, J. C. 2020. Cluster-based analysis of novice coding misconceptions in block-based programming. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, S. Heckman, P. Cutter, and A. Monge, Eds. Association for Computing Machinery, 825–831.

FEIN, B., GRASSL, I., BECK, F., AND FRASER, G. 2022. An evaluation of code2vec embeddings for Scratch. In Proceedings of the 15th International Conference on Educational Data Mining (EDM) 2022, T. Mitrovic and N. Bosch, Eds. International Educational Data Mining Society, 368–375.

FEIN, B., OBERMÜLLER, F., AND FRASER, G. 2022. Catnip: An automated hint generation tool for Scratch. In Proceedings of the 27th ACM Conference on Innovation and Technology in Computer Science Education, E. Barendsen and Simon, Eds. Association for Computing Machinery, 124–130.

FENG, Z., GUO, D., TANG, D., DUAN, N., FENG, X., GONG, M., SHOU, L., QIN, B., LIU, T., JIANG, D., AND ZHOU, M. 2020. CodeBERT: A pre-trained model for programming and natural languages. In The 2020 Conference on Empirical Methods in Natural Language Processing, T. Cohn, Y. He, and Y. Liu, Eds. Association for Computational Linguistics, 1536–1547.

FLEISS, J. L., LEVIN, B., AND PAIK, M. C. 2013. Statistical methods for rates and proportions. John Wiley & Sons.

GERVET, T., KOEDINGER, K., SCHNEIDER, J., MITCHELL, T., ET AL. 2020. When is deep learning the best approach to knowledge tracing? Journal of Educational Data Mining 12, 3, 31–54.

GONÇALVES, J. A. AND SANTOS, A. L. 2023. Jinter: A hint generation system for Java exercises. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education, Simon and J. Sheard, Eds. Association for Computing Machinery, 375–381.

GRAESSER, A. C., HU, X., AND SOTTILARE, R. 2018. Intelligent tutoring systems. In International handbook of the learning sciences, F. Fischer, C. E. Hmelo-Silver, S. R. Goldman, and P. Reimann, Eds. Routledge, England, UK, 246–255.

HELLAS, A., IHANTOLA, P., PETERSEN, A., AJANOVSKI, V. V., GUTICA, M., HYNNINEN, T., KNUTAS, A., LEINONEN, J., MESSOM, C., AND LIAO, S. N. 2018. Predicting academic performance: A systematic literature review. In Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education. Association for Computing Machinery, 175–199.

HOQ, M., BRUSILOVSKY, P., AND AKRAM, B. 2023. Analysis of an explainable student performance prediction model in an introductory programming course. In Proceedings of the 16th International Conference on Educational Data Mining (EDM) 2023, M. Feng, T. Käser, and P. Talukdar, Eds. International Educational Data Mining Society, 79–90.

HOSSEINI, R. AND BRUSILOVSKY, P. 2013. JavaParser: A fine-grain concept indexing tool for Java problems. In The First Workshop on AI-supported Education for Computer Science. CEUR Workshops, 60–63.

HUANG, S., LIU, Z., ZHAO, X., LUO, W., AND WENG, J. 2023. Towards robust knowledge tracing models via k-sparse attention. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, M. P. Kato, J. Mothe, and B. Poblete, Eds. Association for Computing Machinery, 2441–2445.

JIANG, B., ZHAO, W., ZHANG, N., AND QIU, F. 2022. Programming trajectories analytics in block-based programming language learning. Interactive Learning Environments 30, 1, 113–126.

JIN, W., BARNES, T., STAMPER, J., EAGLE, M. J., JOHNSON, M. W., AND LEHMANN, L. 2012. Program representation for automatic hint generation for a data-driven novice programming tutor. In Proceedings of the 11th International Conference on Intelligent Tutoring Systems, S. A. Cerri, W. J. Clancey, G. Papadourakis, and K. Panourgia, Eds. Springer, 304–309.

KAMBEROVIC, M., KRIVIC, S., DELIC, A., SZEDMAK, S., AND LJUBOVIC, V. 2023. Personalized learning systems for computer science students: Analyzing and predicting learning behaviors using programming error data. In Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, S. Pera and J. Neidhardt, Eds. Association for Computing Machinery, 86–91.

KINNEBREW, J. S., SEGEDY, J. R., AND BISWAS, G. 2014. Analyzing the temporal evolution of students’ behaviors in open-ended learning environments. Metacognition and learning 9, 187–215.

KOEDINGER, K. R., CORBETT, A. T., AND PERFETTI, C. 2012. The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning. Cognitive science 36, 5, 757–798.

LABRA, C. AND SANTOS, O. C. 2023. Exploring cognitive models to augment explainability in deep knowledge tracing. In Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization. Association for Computing Machinery, 220–223.

MACLELLAN, C. J., HARPSTEAD, E., ALEVEN, V., AND KOEDINGER, K. R. 2015. Trestle: Incremental learning in structured domains using partial matching and categorization. In Proceedings of the 3rd Annual Conference on Advances in Cognitive Systems, A. Goel and M. Riedl, Eds. Cognitive Systems Foundation, 192–210.

MANIKTALA, M., CODY, C., ISVIK, A., LYTLE, N., CHI, M., AND BARNES, T. 2020. Extending the hint factory for the assistance dilemma: A novel, data-driven help-need predictor for proactive problem-solving help. Journal of Educational Data Mining 12, 4 (Dec), 24–65.

MAO, Y., SHI, Y., MARWAN, S., PRICE, T. W., BARNES, T., AND CHI, M. 2021. Knowing “when” and “where”: Temporal-ASTNN for student learning progression in novice programming tasks. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), S. Hsiao and S. Sahebi, Eds. International Educational Data Mining Society, 172–182.

MARWAN, S., SHI, Y., MENEZES, I., CHI, M., BARNES, T., AND PRICE, T. W. 2021. Just a few expert constraints can help: Humanizing data-driven subgoal detection for novice programming. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), S. Hsiao and S. Sahebi, Eds. International Educational Data Mining Society, 68–80.

McNAMARA, D. S., CROSSLEY, S. A., AND ROSCOE, R. 2013. Natural language processing in an intelligent writing strategy tutoring system. Behavior research methods 45, 499–515.

MORÉ, J. J., GARBOW, B. S., AND HILLSTROM, K. E. 1980. User guide for MINPACK-1. Tech. rep.

MORSHED FAHID, F., TIAN, X., EMERSON, A., WIGGINS, J. B., BOUNAJIM, D., SMITH, A., WIEBE, E., MOTT, B., BOYER, K. E., AND LESTER, J. 2021. Progression trajectory-based student modeling for novice block-based programming. In Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, N. Tintarev and M. Tkalcic, Eds. Association for Computing Machinery, 189–200.

MULDNER, K., WIXON, M., RAI, D., BURLESON, W., WOOLF, B., AND ARROYO, I. 2015. Exploring the impact of a learning dashboard on student affect. In Artificial Intelligence in Education: 17th International Conference, C. Conati, N. Heffernan, A. Mitrovic, and M. F. Verdejo, Eds. Springer, 307–317.

NEWELL, A. AND ROSENBLOOM, P. S. 2013. Mechanisms of skill acquisition and the law of practice. In Cognitive skills and their acquisition, J. R. Anderson, Ed. Psychology Press, 1–55.

NGUYEN, H., WANG, Y., STAMPER, J., AND MCLAREN, B. M. 2019. Using knowledge component modeling to increase domain understanding in a digital learning game. In Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019), C. Lynch and A. Merceron, Eds. International Educational Data Mining Society, 139–148.

PAASSEN, B., HAMMER, B., PRICE, T. W., BARNES, T., GROSS, S., PINKWART, N., ET AL. 2018. The continuous hint factory – providing hints in vast and sparsely populated edit distance spaces. Journal of Educational Data Mining 10, 1, 1–35.

PANDEY, S. AND SRIVASTAVA, J. 2020. RKT: Relation-aware self-attention for knowledge tracing. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, P. Cui, E. Rundensteiner, D. Carmel, Q. He, and J. X. Yu, Eds. Association for Computing Machinery, 1205–1214.

PAVLIK, P. I., CEN, H., AND KOEDINGER, K. R. 2009. Performance factors analysis – a new alternative to knowledge tracing. In Proceedings of the 2009 Conference on Artificial Intelligence in Education, V. Dimitrova, R. Mizoguchi, B. du Boulay, and A. C. Graesser, Eds. IOS Press, 531–538.

PERKINS, D. N. AND MARTIN, F. 1986. Fragile knowledge and neglected strategies in novice programmers. In The First Workshop on Empirical Studies of Programmers on Empirical Studies of Programmers, E. Soloway, Ed. Association for Computing Machinery, 213–229.

PIECH, C., BASSEN, J., HUANG, J., GANGULI, S., SAHAMI, M., GUIBAS, L. J., AND SOHL-DICKSTEIN, J. 2015. Deep knowledge tracing. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds. Neural Information Processing Systems Foundation, 505–513.

PRICE, T. W., HOVEMEYER, D., RIVERS, K., GAO, G., BART, A. C., KAZEROUNI, A. M., BECKER, B. A., PETERSEN, A., GUSUKUMA, L., EDWARDS, S. H., AND BABCOCK, D. 2020. ProgSnap2: A flexible format for programming process data. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education, A. Luxton-Reilly and M. Divitini, Eds. Association for Computing Machinery, 356–362.

PRICE, T. W., ZHI, R., AND BARNES, T. 2017. Hint generation under uncertainty: The effect of hint quality on help-seeking behavior. In International Conference on Artificial Intelligence in Education, E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay, Eds. Springer, 311–322.

RIVERS, K., HARPSTEAD, E., AND KOEDINGER, K. 2016. Learning curve analysis for programming: Which concepts do students struggle with? In Proceedings of the 2016 ACM Conference on International Computing Education Research, B. Dorn, J. Sheard, J. Tenenberg, and D. Chinn, Eds. Association for Computing Machinery, 143–151.

RIVERS, K. AND KOEDINGER, K. R. 2017. Data-driven hint generation in vast solution spaces: A self-improving Python programming tutor. International Journal of Artificial Intelligence in Education 27, 37–64.

SALDEN, R. J., ALEVEN, V. A., RENKL, A., AND SCHWONKE, R. 2009. Worked examples and tutored problem solving: Redundant or synergistic forms of support? Topics in Cognitive Science 1, 1, 203–213.

SCHMUCKER, R., WANG, J., HU, S., AND MITCHELL, T. 2022. Assessing the knowledge state of online students – new data, new approaches, improved accuracy. Journal of Educational Data Mining 14, 1, 1–45.

SELENT, D., PATIKORN, T., AND HEFFERNAN, N. 2016. ASSISTments dataset from multiple randomized controlled experiments. In Proceedings of the Third (2016) ACM Conference on Learning @ Scale, V. Aleven, J. Kay, and I. Roll, Eds. Association for Computing Machinery, 181–184.

SHEN, S., LIU, Q., CHEN, E., HUANG, Z., HUANG, W., YIN, Y., SU, Y., AND WANG, S. 2021. Learning process-consistent knowledge tracing. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, H. Wang, I. Skrypnyk, W. Hsu, and S. Chawla, Eds. Association for Computing Machinery, 1452–1460.

SHI, Y. 2023. Interpretable code-informed learning analytics for CS education. In Companion Proceedings of the 13th International Learning Analytics and Knowledge Conference. Society for Learning Analytics Research, 180–187.

SHI, Y., CHI, M., BARNES, T., AND PRICE, T. 2022. Code-DKT: A code-based knowledge tracing model for programming tasks. In Proceedings of the 15th International Conference on Educational Data Mining (EDM) 2022, T. Mitrovic and N. Bosch, Eds. International Educational Data Mining Society, 50–61.

SHI, Y., MAO, Y., BARNES, T., CHI, M., AND PRICE, T. W. 2021. More with less: Exploring how to use deep learning effectively through semi-supervised learning for automatic bug detection in student code. In International Conference on Educational Data Mining, S. Hsiao and S. Sahebi, Eds. International Educational Data Mining Society, 446–453.

SHI, Y. AND PRICE, T. 2022. An overview of code2vec in student modeling for programming education. MMTC Communications-Frontiers 17, 3, 17–24.

SHI, Y., SCHMUCKER, R., CHI, M., BARNES, T., AND PRICE, T. 2023. KC-Finder: Automated knowledge component discovery for programming problems. In Proceedings of the 16th International Conference on Educational Data Mining (EDM) 2023, M. Feng, T. Käser, and P. Talukdar, Eds. International Educational Data Mining Society, 28–39.

SHI, Y., SHAH, K., WANG, W., MARWAN, S., PENMETSA, P., AND PRICE, T. 2021. Toward semi-automatic misconception discovery using code embeddings. In LAK21: 11th International Learning Analytics and Knowledge Conference, N. Dowell, S. Joksimovic, M. Scheffel, and G. Siemens, Eds. Association for Computing Machinery, 606–612.

SHUTE, V. J. 2008. Focus on formative feedback. Review of educational research 78, 1, 153–189.

SNODDY, G. S. 1926. Learning and stability: a psychophysiological analysis of a case of motor learning with clinical applications. Journal of Applied Psychology 10, 1, 1.

TOBIAS, S., FLETCHER, J. D., AND WIND, A. P. 2014. Game-based learning. Handbook of research on educational communications and technology 1, 485–503.

TORREY, L. AND SHAVLIK, J. 2010. Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques, E. S. Olivas, J. D. M. Guerrero, M. M. Sober, J. R. M. Benedito, and A. J. S. Lopez, Eds. IGI Global, 242–264.

WANG, L., SY, A., LIU, L., AND PIECH, C. 2017. Learning to represent student knowledge on programming exercises using deep learning. In Proceedings of the 10th International Conference on Educational Data Mining (EDM 2017), A. Hershkovitz and L. Paquette, Eds. International Educational Data Mining Society, 324–329.

WICK, M., STEVENSON, D., AND WAGNER, P. 2005. Using testing and JUnit across the curriculum. ACM SIGCSE Bulletin 37, 1, 236–240.

WIGGINS, J. B., FAHID, F. M., EMERSON, A., HINCKLE, M., SMITH, A., BOYER, K. E., MOTT, B., WIEBE, E., AND LESTER, J. 2021. Exploring novice programmers’ hint requests in an intelligent block-based coding environment. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, P. Cutter, A. Monge, and J. Sheard, Eds. Association for Computing Machinery, 52–58.

XU, K., BA, J., KIROS, R., CHO, K., COURVILLE, A., SALAKHUTDINOV, R., ZEMEL, R., AND BENGIO, Y. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, F. Bach and D. Blei, Eds. PMLR, 2048–2057.

YANG, S., ZHU, M., HOU, J., AND LU, X. 2020. Deep knowledge tracing with convolutions.

YUDELSON, M., HOSSEINI, R., VIHAVAINEN, A., AND BRUSILOVSKY, P. 2014. Investigating automated student modeling in a Java MOOC. In Proceedings of the 7th International Conference on Educational Data Mining (EDM) 2014, J. Stamper, Z. Pardos, M. Mavrikis, and B. M. McLaren, Eds. International Educational Data Mining Society, 261–264.

ZHANG, J., SHI, X., KING, I., AND YEUNG, D.-Y. 2017. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, E. Agichtein and E. Gabrilovich, Eds. Association for Computing Machinery, 765–774.
Section
EDM 2024 Journal Track
