Latent Skill Mining and Labeling from Courseware Content

Noboru Matsuda; Jesse Wood; Raj Shrivastava; Machi Shimmei; Norman Bier

doi:10.5281/zenodo.7086211

Published Oct 1, 2022

DOI https://doi.org/10.5281/zenodo.7086211

Noboru Matsuda

North Carolina State University

https://orcid.org/0000-0003-2344-1485

Jesse Wood

North Carolina State University

Raj Shrivastava

North Carolina State University

Machi Shimmei

North Carolina State University

Norman Bier

Carnegie Mellon University

Abstract

A model that maps the requisite skills, or knowledge components, to the contents of an online course is necessary to implement many adaptive learning technologies. However, developing a skill model and tagging courseware contents with individual skills can be expensive and error prone. We propose a technology called Smart (Skill Model mining with Automated detection of Resemblance among Texts) that automatically identifies latent skills from instructional text in existing online courseware. Smart is capable of mining, labeling, and mapping skills without using an existing skill model or student learning (aka response) data. The goal of our proposed approach is to mine latent skills from assessment items included in existing courseware, give the discovered skills human-friendly labels, and map didactic paragraph texts to skills. In this way, a mapping between assessment items and paragraph texts is formed. Automated skill models produced by Smart will thereby reduce the workload of courseware developers while enabling adaptive online content at the launch of the course. In our evaluation study, we applied Smart to two existing authentic online courses. We then compared machine-generated skill models and human-crafted skill models in terms of the accuracy of predicting students’ learning. We also evaluated the similarity between machine-generated and human-crafted skill models. The results show that student models based on Smart-generated skill models were as predictive of students’ learning as those based on human-crafted skill models, as validated on two OLI (Open Learning Initiative) courses. Also, Smart can generate skill models that are highly similar to human-crafted models, as evidenced by the normalized mutual information (NMI) values.
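As a rough illustration of the similarity measure named in the abstract, the sketch below computes NMI between two skill assignments over the same assessment items. It is not the authors' implementation: the function names, the toy labels, and the choice of normalizer (the arithmetic mean of the two entropies) are all assumptions made for this example, and the paper may normalize differently.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of skill labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        # Both models put every item under one skill: treat as identical.
        return 1.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Hypothetical skill labels for four assessment items.
machine_model = ["skill_1", "skill_1", "skill_2", "skill_2"]
human_model = ["fractions", "fractions", "decimals", "decimals"]
unrelated_model = ["a", "b", "a", "b"]

same_grouping = nmi(machine_model, human_model)
no_overlap = nmi(machine_model, unrelated_model)
```

NMI depends only on how items are grouped, not on what the groups are called, so a machine-generated model can be compared with a human-crafted one even when their skill labels differ.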

How to Cite

Matsuda, N., Wood, J., Shrivastava, R., Shimmei, M., & Bier, N. (2022). Latent Skill Mining and Labeling from Courseware Content. Journal of Educational Data Mining, 14(2). https://doi.org/10.5281/zenodo.7086211

Keywords

skill model discovery, learning engineering, massive open online course, text mining, natural language processing

Issue

Vol. 14 No. 2 (2022): EDM Journal Track Special Issue

Section

EDM 2022 Journal Track

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
