Investigating Concept Definition and Skill Modeling for Cognitive Diagnosis in Language Learning

Published Jun 27, 2024
Boxuan Ma, Sora Fukui, Yuji Ando, Shinichi Konomi

Abstract

Language proficiency diagnosis extracts fine-grained information about test takers' linguistic knowledge
states and skill mastery levels from their performance on language tests. Unlike comprehensive
standardized tests, many language learning apps revolve around word-level questions, so knowledge
concepts and linguistic skills are hard to define and the diagnosis must be designed carefully.
Traditional approaches are widely applied to modeling knowledge in science or mathematics,
where skills or knowledge concepts are easy to associate with each item. However, only a
few works focus on defining knowledge concepts and skills using linguistic characteristics for language
proficiency diagnosis. To address this gap, we propose a neural-network-based framework for language
proficiency diagnosis. Specifically, we propose a series of methods, built on this framework,
that use different linguistic features to define skills and knowledge concepts in the context of the
language learning task. Experimental results on a real-world second-language learning dataset demonstrate
the effectiveness and interpretability of our framework. We also provide empirical evidence, through
comprehensive experiments and analysis, that our knowledge concept and skill definitions are
reasonable and critical to the performance of our model.

How to Cite

Ma, B., Fukui, S., Ando, Y., & Konomi, S. (2024). Investigating Concept Definition and Skill Modeling for Cognitive Diagnosis in Language Learning. Journal of Educational Data Mining, 16(1), 303–329. https://doi.org/10.5281/zenodo.10948071

Keywords

cognitive diagnosis, language proficiency, linguistic skill, concept definition, skill modeling

Section
Extended Articles from the EDM 2023 Conference