Automated Feedback Generation for Student Project Reports: A Data-Driven Approach


Published Dec 18, 2022
Qinjin Jia, Mitchell Young, Yunkai Xiao, Jialin Cui, Chengyuan Liu, Parvez Rashid, Edward Gehringer

Abstract

Instant feedback plays a vital role in promoting academic achievement and student success. In practice, however, delivering timely feedback to students can be challenging for instructors for a variety of reasons (e.g., limited teaching resources). In many cases, feedback arrives too late for learners to act on the advice and reinforce their understanding. To this end, researchers have designed various automated feedback systems in different domains, including novice programming, short-essay writing, and open-ended questions. To the best of our knowledge, no previous study has investigated automated feedback generation for a more complex form of student work: student project reports. In this work, we present a novel data-driven system, dubbed Insta-Reviewer, for automatically generating instant feedback on student project reports. In addition to automatic metrics such as ROUGE scores and BERTScore, we propose a five-dimensional framework for manually evaluating system-generated feedback. Experimental results show that feedback generated by Insta-Reviewer on real students' project reports can achieve near-human quality. Our work demonstrates the feasibility of automatic feedback generation for students' project reports while highlighting several prominent challenges for future research.
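
As an illustration of the automatic metrics named above, the following is a minimal sketch, not the paper's actual evaluation pipeline: it scores one hypothetical system-generated feedback comment against a hypothetical reference comment with ROUGE and BERTScore. The choice of the open-source rouge-score and bert-score Python packages, and the example strings, are assumptions made purely for illustration.

```python
# Minimal sketch (assumptions, not the authors' pipeline): scoring one
# system-generated feedback comment against a reference comment with the
# automatic metrics mentioned in the abstract, using the open-source
# `rouge-score` and `bert-score` packages. The example strings are invented.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = ("Please add a diagram explaining how the plugin "
             "interacts with the database.")
generated = ("Consider including a figure that shows how your plugin "
             "communicates with the database layer.")

# ROUGE-1/2/L: n-gram and longest-common-subsequence overlap with the reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, s in scorer.score(reference, generated).items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F={s.fmeasure:.3f}")

# BERTScore: semantic similarity computed from contextual BERT token embeddings.
P, R, F1 = bert_score([generated], [reference], lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```

ROUGE rewards surface n-gram overlap while BERTScore is more tolerant of paraphrase, which is presumably part of why the authors supplement both with a manual, five-dimensional evaluation framework.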

How to Cite

Jia, Q., Young, M., Xiao, Y., Cui, J., Liu, C., Rashid, P., & Gehringer, E. (2022). Automated Feedback Generation for Student Project Reports: A Data-Driven Approach. Journal of Educational Data Mining, 14(3), 132–161. https://doi.org/10.5281/zenodo.7304954


Keywords

automated feedback generation, automated review generation, instant feedback, learning at scale, mining educational data

Section
Extended Articles from the EDM 2022 Conference
