Intrinsic and Contextual Factors Impacting Student Ratings of Automatically Generated Questions: A Large-Scale Data Analysis

Benny G. Johnson; Jeffrey S. Dittel; Rachel Van Campenhout

doi:10.5281/zenodo.15174917

Published Apr 8, 2025

Benny G. Johnson

VitalSource

https://orcid.org/0000-0003-4267-9608

Jeffrey S. Dittel

VitalSource

Rachel Van Campenhout

VitalSource

https://orcid.org/0000-0001-8404-6513

Abstract

Combining formative practice with the primary expository content in a learning-by-doing method is a proven approach to increasing student learning. Artificial intelligence has enabled automatic question generation (AQG) systems that can produce volumes of formative practice that would be prohibitive to create by human effort. One such AQG system was developed to use textbooks as the corpus for generation, with the sole purpose of producing formative practice to place alongside the textbook content for students to use as a study tool. In this work, we analyzed a data set comprising over 5.2 million student-question interaction sessions. More than 800,000 unique questions were answered across more than 9,000 textbooks, with over 400,000 students using them. As part of the user experience, students could rate questions after answering with a social media-style thumbs up or thumbs down. In this investigation, we used this student feedback data to gain new insights into the automatically generated questions: are there features of questions that influence student ratings? An explanatory model was developed to analyze ten key features that may influence student ratings. Results and implications for improving automatic question generation are discussed. The code and data for this paper are available at https://github.com/vitalsource/data.

How to Cite

Johnson, B. G., Dittel, J. S., & Van Campenhout, R. (2025). Intrinsic and Contextual Factors Impacting Student Ratings of Automatically Generated Questions: A Large-Scale Data Analysis. Journal of Educational Data Mining, 17(1), 217–247. https://doi.org/10.5281/zenodo.15174917

Keywords

automatic question generation, student ratings, explanatory model, question features

Issue

Vol. 17 No. 1 (2025)

Section

Extended Articles from the EDM 2024 Conference

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
