Instant feedback plays a vital role in promoting academic achievement and student success. In practice,
however, delivering timely feedback to students can be challenging for instructors for a variety of reasons
(e.g., limited teaching resources). In many cases, feedback arrives too late for learners to act on
the advice and reinforce their understanding. To address this problem, researchers have designed various automated feedback systems in different domains, including novice programming, short-essay writing, and open-ended questions. To the best of our knowledge, no previous study has investigated automated feedback generation for a more complex form of student work: student project reports. In this work, we present
a novel data-driven system, dubbed Insta-Reviewer, for automatically generating instant feedback on
student project reports. In addition to automatic metrics such as ROUGE scores and BERTScore, we
propose a five-dimensional framework for manually evaluating system-generated feedback. Experimental
results show that feedback generated by Insta-Reviewer on real students’ project reports can achieve
near-human quality. Our work demonstrates the feasibility of automatic feedback generation for students’
project reports while highlighting several prominent challenges for future research.
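ROUGE scores, mentioned above as one of the automatic metrics, essentially measure n-gram overlap between generated text and a human-written reference. As an illustration only (not the scorer used in the paper), a minimal ROUGE-1 F1 can be sketched in Python:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate text and a reference text."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: system-generated feedback vs. an
# instructor-written reference comment.
generated = "the report needs a clearer problem statement"
reference = "the report should state the problem more clearly"
score = rouge1_f1(generated, reference)
```

In practice, evaluations like the one described here typically rely on established implementations (e.g., the `rouge_score` package), which also handle stemming and ROUGE-2/ROUGE-L variants; the sketch above only conveys the core idea of overlap-based scoring.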
Keywords: automated feedback generation, automated review generation, instant feedback, learning at scale, mining educational data
ARIELY, M., NAZARETSKY, T., AND ALEXANDRON, G. 2020. First steps towards NLP-based formative feedback to improve scientific writing in Hebrew. In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, Eds. International Educational Data Mining Society, Montreal, Canada, 565–568.
BASU, S., JACOBS, C., AND VANDERWENDE, L. 2013. Powergrading: a clustering approach to amplify human effort for short answer grading. Transactions of the Association for Computational Linguistics 1, 391–402.
BENDER, E. M., GEBRU, T., MCMILLAN-MAJOR, A., AND SHMITCHELL, S. 2021. On the dangers of stochastic parrots: Can language models be too big? In FAccT 2021 - Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Vol. 1. Association for Computing Machinery, New York, NY, 610–623.
BOROWCZAK, M. ET AL. 2015. Communication in stem education: A non-intrusive method for assessment & k20 educator feedback. Problems of Education in the 21st Century 65, 1, 18–27.
BOTARLEANU, R.-M., DASCALU, M., SIRBU, M.-D., CROSSLEY, S. A., AND TRAUSAN-MATU, S. 2018. Readme–generating personalized feedback for essay writing using the readerbench framework. In Conference on Smart Learning Ecosystems and Regional Development, H. Knoche, E. Popescu, and A. Cartelli, Eds. SLERD ’18. Springer, 133–145.
CALIGIURI, P. AND THOMAS, D. C. 2013. From the editors: How to write a high-quality review. Journal of International Business Studies 44, 6, 547–553.
CALLISON-BURCH, C., OSBORNE, M., AND KOEHN, P. 2006. Re-evaluating the role of bleu in machine translation research. In 11th conference of the european chapter of the association for computational linguistics, D. McCarthy and S. Wintner, Eds. Association for Computational Linguistics, Trento, Italy, 249–256.
CARLESS, D., SALTER, D., YANG, M., AND LAM, J. 2011. Developing sustainable feedback practices. Studies in higher education 36, 4, 395–407.
CELIKYILMAZ, A., CLARK, E., AND GAO, J. 2020. Evaluation of text generation: A survey. arXiv preprint arXiv:2006.14799.
DEEVA, G., BOGDANOVA, D., SERRAL, E., SNOECK, M., AND DE WEERDT, J. 2021. A review of automated feedback systems for learners: Classification framework, challenges and opportunities. Computers & Education 162, 104094.
DEVLIN, J., CHANG, M.-W., LEE, K., AND TOUTANOVA, K. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds. Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.
DUŠEK, O. AND KASNER, Z. 2020. Evaluating semantic accuracy of data-to-text generation with natural language inference. In Proceedings of the 13th International Conference on Natural Language Generation, B. Davis, Y. Graham, J. Kelleher, and Y. Sripada, Eds. Association for Computational Linguistics, Dublin, Ireland, 131–137.
ERHAN, D., COURVILLE, A., BENGIO, Y., AND VINCENT, P. 2010. Why does unsupervised pretraining help deep learning? In Proceedings of the thirteenth international conference on artificial intelligence and statistics, Y. W. Teh and D. M. Titterington, Eds. JMLR Workshop and Conference Proceedings, Sardinia, Italy, 201–208.
EVANS, D. J., ZEUN, P., AND STANIER, R. A. 2014. Motivating student learning using a formative assessment journey. Journal of anatomy 224, 3, 296–303.
FAN, A., JERNITE, Y., PEREZ, E., GRANGIER, D., WESTON, J., AND AULI, M. 2019. Eli5: Long form question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, A. Korhonen, D. Traum, and L. Màrquez, Eds. Association for Computational Linguistics, Florence, Italy, 3558–3567.
FEIGENBLAT, G., ROITMAN, H., BONI, O., AND KONOPNICKI, D. 2017. Unsupervised query-focused multi-document summarization using the cross entropy method. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, N. Kando, T. Sakai, and H. Joho, Eds. SIGIR ’17. Association for Computing Machinery, New York, NY, USA, 961–964.
GANESAN, K. 2018. Rouge 2.0: Updated and improved measures for evaluation of summarization tasks. arXiv preprint arXiv:1803.01937.
GEHMAN, S., GURURANGAN, S., SAP, M., CHOI, Y., AND SMITH, N. A. 2020. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Association for Computational Linguistics, online, 3356–3369.
GEHRINGER, E. F. 2020. A course as ecosystem: melding teaching, research, and practice. In 2020 ASEE Virtual Annual Conference Content Access. online.
GEHRMANN, S., DENG, Y., AND RUSH, A. M. 2018. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, Eds. Association for Computational Linguistics, Brussels, Belgium, 4098–4109.
HATTIE, J. AND TIMPERLEY, H. 2007. The power of feedback. Review of educational research 77, 1, 81–112.
ILLIA, L., COLLEONI, E., AND ZYGLIDOPOULOS, S. 2022. Ethical implications of text generation in the age of artificial intelligence. Business Ethics, the Environment & Responsibility 32, 201–210.
JIA, Q., CUI, J., XIAO, Y., LIU, C., RASHID, P., AND GEHRINGER, E. 2021. All-in-one: Multitask learning bert models for evaluating peer assessments. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 474–479.
JIA, Q., YOUNG, M., XIAO, Y., CUI, J., LIU, C., RASHID, P., AND GEHRINGER, E. 2022. Insta-Reviewer: A data-driven approach for generating instant feedback on students' project reports. In Proceedings of the 15th International Conference on Educational Data Mining, A. Mitrovic, N. Bosch, A. I. Cristea, and C. Brown, Eds. EDM '22. International Educational Data Mining Society, 5–16.
KOEDINGER, K. R., CORBETT, A., ET AL. 2006. Cognitive tutors: Technology bringing learning sciences to the classroom. Cambridge University Press.
KULIK, J. A. AND KULIK, C.-L. C. 1988. Timing of feedback and verbal learning. Review of educational research 58, 1, 79–97.
KUSAIRI, S. 2020. A web-based formative feedback system development by utilizing isomorphic multiple choice items to support physics teaching and learning. Journal of Technology and Science Education 10, 1, 117–126.
LEWIS, M., LIU, Y., GOYAL, N., GHAZVININEJAD, M., MOHAMED, A., LEVY, O., STOYANOV, V., AND ZETTLEMOYER, L. 2020. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds. Association for Computational Linguistics, Seattle, Washington, 7871–7880.
LI, J., TANG, T., ZHAO, W. X., NIE, J.-Y., AND WEN, J.-R. 2022. A survey of pretrained language models based text generation. arXiv preprint arXiv:2201.05273.
LIN, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, M.-F. Moens and S. Szpakowicz, Eds. Association for Computational Linguistics, Barcelona, Spain, 74–81.
LOSHCHILOV, I. AND HUTTER, F. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. arXiv preprint arXiv:1711.05101.
LU, C. AND CUTUMISU, M. 2021. Integrating deep learning into an automated feedback generation system for automated essay scoring. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 573–579.
MALIK, A., WU, M., VASAVADA, V., SONG, J., COOTS, M., MITCHELL, J., GOODMAN, N., AND PIECH, C. 2021. Generative grading: Near human-level accuracy for automated feedback on richly structured problems. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 275–286.
MALLAVARAPU, A. 2020. Exploration maps, beyond top scores: Designing formative feedback for openended problems. In Proceedings of the 13th International Conference on Educational Data Mining. International Educational Data Mining Society, online, 790–795.
MALMI, E., DONG, Y., MALLINSON, J., CHUKLIN, A., ADAMEK, J., MIRYLENKA, D., STAHLBERG, F., KRAUSE, S., KUMAR, S., AND SEVERYN, A. 2022. Text generation with text-editing models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts. Association for Computational Linguistics, Seattle, United States, 1–7.
MANNIX, E. AND NEALE, M. A. 2005. What differences make a difference? the promise and reality of diverse teams in organizations. Psychological science in the public interest 6, 2, 31–55.
MARWAN, S., SHI, Y., MENEZES, I., CHI, M., BARNES, T., AND PRICE, T. W. 2021. Just a few expert constraints can help: Humanizing data-driven subgoal detection for novice programming. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 68–80.
MENSINK, P. J. AND KING, K. 2020. Student access of online feedback is modified by the availability of assessment marks, gender and academic performance. British Journal of Educational Technology 51, 1, 10–22.
MITROVIC, A. 2003. An intelligent sql tutor on the web. International Journal of Artificial Intelligence in Education 13, 2-4, 173–197.
MITROVIC, A. 2010. Modeling domains and students with constraint-based modeling. In Advances in Intelligent Tutoring Systems, R. Nkambou, J. Bourdeau, and R. Mizoguchi, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 63–80.
MITROVIC, A. 2012. Fifteen years of constraint-based tutors: what we have achieved and where we are going. User modeling and user-adapted interaction 22, 1, 39–72.
NAGATA, R., VILENIUS, M., AND WHITTAKER, E. 2014. Correcting preposition errors in learner english using error case frames and feedback messages. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), K. Toutanova and H. Wu, Eds. Association for Computational Linguistics, Baltimore, Maryland, 754–764.
OLNEY, A. M. 2021. Generating response-specific elaborated feedback using long-form neural question answering. In Proceedings of the Eighth ACM Conference on Learning@Scale, M. Pérez-Sanagustín, A. Ogan, and M. Specht, Eds. Association for Computing Machinery, New York, NY, 27–36.
ORR, J. W. AND RUSSELL, N. 2021. Automatic assessment of the design quality of python programs with personalized feedback. In Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021), I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 495–501.
PAPINENI, K., ROUKOS, S., WARD, T., AND ZHU, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, P. Isabelle, Ed. The association for computational linguistics, Philadelphia, Pennsylvania, USA, 311–318.
PILAULT, J., LI, R., SUBRAMANIAN, S., AND PAL, C. 2020. On extractive and abstractive neural document summarization with transformer language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu, Eds. Association for Computational Linguistics, Online, 9308–9319.
POULOS, A. AND MAHONY, M. J. 2008. Effectiveness of feedback: The students’ perspective. Assessment & Evaluation in Higher Education 33, 2, 143–154.
RADFORD, A., WU, J., CHILD, R., LUAN, D., AMODEI, D., SUTSKEVER, I., ET AL. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8, 9.
ROSCHELLE, J., LESTER, J., AND FUSCO, J. 2020. Ai and the future of learning: Expert panel report. https://circls.org/reports/ai-report.
RUBINSTEIN, R. Y. AND KROESE, D. P. 2004. The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-Carlo Simulation (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.
SELLAM, T., DAS, D., AND PARIKH, A. 2020. Bleurt: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7881–7892.
TANG, G., MÜLLER, M., GONZALES, A. R., AND SENNRICH, R. 2018. Why self-attention? a targeted evaluation of neural machine translation architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, Eds. Association for Computational Linguistics, Brussels, Belgium, 4263–4272.
TYTLER, R. 2020. Stem education for the twenty-first century. In Integrated Approaches to STEM Education: An International Perspective, J. Anderson and Y. Li, Eds. Springer International Publishing, Cham, 21–43.
VASWANI, A., SHAZEER, N., PARMAR, N., USZKOREIT, J., JONES, L., GOMEZ, A. N., KAISER, Ł., AND POLOSUKHIN, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Vol. 30. Long Beach, CA.
VIJAYAKUMAR, A. K., COGSWELL, M., SELVARAJU, R. R., SUN, Q., LEE, S., CRANDALL, D., AND BATRA, D. 2017. Diverse beam search: Decoding diverse solutions from neural sequence models. In International Conference on Learning Representations. arXiv preprint arXiv:1610.02424.
WANG, W., FRASER, G., BARNES, T., MARTENS, C., AND PRICE, T. 2021. Automated classification of visual, interactive programs using execution traces. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 677–681.
WEITEKAMP, D., HARPSTEAD, E., AND KOEDINGER, K. R. 2020. An interaction design for machine teaching to develop ai tutors. In Proceedings of the 2020 CHI conference on human factors in computing systems, R. Bernhaupt, F. Mueller, D. Verweij, and J. Andres, Eds. Association for Computing Machinery, Honolulu, HI, 1–11.
WINSTONE, N. E. AND BOUD, D. 2022. The need to disentangle assessment and feedback in higher education. Studies in Higher Education 47, 3, 656–667.
WOODS, B., ADAMSON, D., MIEL, S., AND MAYFIELD, E. 2017. Formative essay feedback using predictive scoring models. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, S. Matwin, S. Yu, and F. Farooq, Eds. Association for Computing Machinery, Halifax, NS, Canada, 2071–2080.
XIAO, Y., ZINGLE, G., JIA, Q., SHAH, H. R., ZHANG, Y., LI, T., KAROVALIYA, M., ZHAO, W., SONG, Y., JI, J., ET AL. 2020. Detecting problem statements in peer assessments. In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, Eds. International Educational Data Mining Society, Montreal, Canada, 704–709.
YOUNG, S. 2006. Student views of effective online teaching in higher education. The American Journal of Distance Education 20, 2, 65–77.
YUAN, W., LIU, P., AND NEUBIG, G. 2022. Can we automate scientific reviewing? Journal of Artificial Intelligence Research 75, 171–212.
YUAN, W., NEUBIG, G., AND LIU, P. 2021. Bartscore: Evaluating generated text as text generation. In Advances in Neural Information Processing Systems, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds. Vol. 34. Morgan Kaufmann Publishers Inc., online.
ZAHEER, M., GURUGANESH, G., DUBEY, A., AINSLIE, J., ALBERTI, C., ONTANON, S., PHAM, P., RAVULA, A., WANG, Q., YANG, L., AND AHMED, A. 2020. Big bird: Transformers for longer sequences. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds. Curran Associates Inc., Vancouver, Canada.
ZHANG, T., KISHORE, V., WU, F., WEINBERGER, K. Q., AND ARTZI, Y. 2020. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations. Addis Ababa, Ethiopia.
ZHI, R., MARWAN, S., DONG, Y., LYTLE, N., PRICE, T. W., AND BARNES, T. 2019. Toward data-driven example feedback for novice programming. In Proceedings of the 12th International Conference on Educational Data Mining, C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. International Educational Data Mining Society, 218–227.
ZINGLE, G., RADHAKRISHNAN, B., XIAO, Y., GEHRINGER, E., XIAO, Z., PRAMUDIANTO, F., KHURANA, G., AND ARNAV, A. 2019. Detecting suggestions in peer assessments. In Proceedings of the 12th International Conference on Educational Data Mining, C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. International Educational Data Mining Society, 474–479.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.