Towards Design-Loop Adaptivity: Identifying Items for Revision



Published Dec 18, 2022
Radek Pelánek, Tomáš Effenberger, Adam Kukučka


We study the automatic identification of educational items worthy of content authors’ attention. Based on
the results of such analysis, content authors can revise and improve the content of learning environments.
We provide an overview of item properties relevant to this task, including difficulty and complexity
measures, item discrimination, and various forms of content representation. We analyze the potential
usefulness of these properties using both simulation and analysis of real data from a large-scale learning
environment. We also describe two case studies in which we apply the identification of attention-worthy
items in practice. Based on the analysis and case studies, we offer recommendations for practice and
suggest directions for further research.

How to Cite

Pelánek, R., Effenberger, T., & Kukučka, A. (2022). Towards Design-Loop Adaptivity: Identifying Items for Revision. Journal of Educational Data Mining, 14(3), 1–25.



learning environment, outliers, anomaly detection, interpretability, reliability, difficulty, content analysis, attention-worthiness

