Intrinsic and Contextual Factors Impacting Student Ratings of Automatically Generated Questions: A Large-Scale Data Analysis

Published Apr 8, 2025
Benny G. Johnson, Jeffrey S. Dittel, Rachel Van Campenhout

Abstract

Combining formative practice with the primary expository content in a learning-by-doing method is a proven approach to increasing student learning. Artificial intelligence has enabled automatic question generation (AQG) systems that can produce volumes of formative practice that would be prohibitive to create by human effort. One such AQG system was developed to use textbooks as its generation corpus, with the sole purpose of generating formative practice to be placed alongside the textbook content for students to use as a study tool. In this work, we analyzed a data set comprising over 5.2 million student-question interaction sessions, in which over 400,000 students answered more than 800,000 unique questions across more than 9,000 textbooks. As part of the user experience, students could rate a question after answering it with a social media-style thumbs up or thumbs down. In this investigation, we used these student ratings to gain new insights into the automatically generated questions: are there question features that influence student ratings? An explanatory model was developed to analyze ten key features that may influence student ratings. Results and implications for improving automatic question generation are discussed. The code and data for this paper are available at https://github.com/vitalsource/data.
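Because each rating is binary (thumbs up or thumbs down), a natural first look at whether a feature is associated with ratings is to compare the thumbs-up proportion across the feature's levels. The sketch below does this for a single binary feature and reports the log odds ratio. For a logistic regression with only that one binary predictor, the log odds ratio equals the fitted coefficient. This is an illustrative sketch only, not the authors' explanatory model, whose exact form is not given here. The record layout, the feature, and all data values are hypothetical.

```python
from collections import defaultdict
from math import log


def thumbs_up_rates(records):
    """Tally ratings per feature level.

    `records` is an iterable of (feature_value, rating) pairs, where rating is
    "up" or "down" (a hypothetical layout). Returns
    {feature_value: (n_up, n_down, proportion_up)}.
    """
    counts = defaultdict(lambda: [0, 0])  # feature_value -> [n_up, n_down]
    for value, rating in records:
        if rating == "up":
            counts[value][0] += 1
        elif rating == "down":
            counts[value][1] += 1
        else:
            raise ValueError(f"unexpected rating: {rating!r}")
    return {v: (up, down, up / (up + down)) for v, (up, down) in counts.items()}


def log_odds_ratio(rates, level, reference):
    """Log odds ratio of a thumbs up for `level` relative to `reference`.

    Equals the coefficient of a single-binary-predictor logistic regression
    (assumes no zero counts in the 2x2 table).
    """
    up1, down1, _ = rates[level]
    up0, down0, _ = rates[reference]
    return log((up1 / down1) / (up0 / down0))


# Hypothetical example: feature = whether the student answered correctly.
records = ([(True, "up")] * 8 + [(True, "down")] * 2
           + [(False, "up")] * 5 + [(False, "down")] * 5)
rates = thumbs_up_rates(records)
lor = log_odds_ratio(rates, True, False)
```

With these made-up counts, 80% of ratings are thumbs up when the answer was correct, versus 50% otherwise. The odds ratio is 4, so the log odds ratio is about 1.39. A full explanatory analysis would consider several features jointly and account for repeated ratings from the same students and questions. This sketch does neither.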

How to Cite

Johnson, B. G., Dittel, J. S., & Van Campenhout, R. (2025). Intrinsic and Contextual Factors Impacting Student Ratings of Automatically Generated Questions: A Large-Scale Data Analysis. Journal of Educational Data Mining, 17(1), 217–247. https://doi.org/10.5281/zenodo.15174917

Keywords

automatic question generation, student ratings, explanatory model, question features

Section
Extended Articles from the EDM 2024 Conference