Less But Enough: Evaluation of peer reviews through pseudo-labeling with less annotated data


Published Jun 21, 2023
Chengyuan Liu, Divyang Doshi, Ruixuan Shang, Jialin Cui, Qinjin Jia, Edward Gehringer

Abstract

A peer-assessment system provides a structured learning process for students by allowing them to write
textual feedback on each other’s assignments and projects. It also helps instructors and teaching assistants
perform a more comprehensive evaluation of students’ work. However, the contribution of peer assessment
to students’ learning depends heavily on the quality of the reviews. Therefore, a thorough evaluation
of the quality of peer assessment is essential to ensuring that the process benefits students’ learning.
Previous studies have applied machine learning to evaluate peer assessment by identifying
characteristics of reviews (e.g., do they mention a problem, make a suggestion, or tell the students where
to make a change?). Unfortunately, collecting ground-truth labels for these characteristics is a
subjective and labor-intensive task. Moreover, in most cases those labels are assigned by students, not all
of whom are reliable labelers. In this study, we propose a semi-supervised pseudo-labeling
approach that builds a robust peer-assessment evaluation system from a large unlabeled dataset
combined with only a small amount of labeled data. We evaluate peer assessment from two angles: problem
detection (does the reviewer mention a problem with the work?) and suggestion detection (does
the reviewer give a suggestion to the author?).
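To make the approach concrete, the pseudo-labeling loop the abstract describes can be sketched as follows. This is an illustrative toy, not the paper's implementation: a nearest-centroid classifier stands in for the BERT-based model, and the confidence threshold and round count are assumed values.

```python
# Toy sketch of semi-supervised pseudo-labeling: train on a few labeled
# examples, assign labels to high-confidence unlabeled examples, and retrain
# on the enlarged set. All names and thresholds here are illustrative.
import numpy as np

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_with_confidence(centroids, X):
    """Predict labels and a crude confidence score for each example."""
    classes = sorted(centroids)
    # Distance of every example to every class centroid.
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    pred = np.array(classes)[d.argmin(axis=0)]
    # Softmax over negative distances as a confidence proxy.
    p = np.exp(-d) / np.exp(-d).sum(axis=0)
    return pred, p.max(axis=0)

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.6, rounds=3):
    """Iteratively absorb confident pseudo-labeled examples into training."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        model = fit_centroids(X_train, y_train)
        pred, conf = predict_with_confidence(model, remaining)
        keep = conf >= threshold          # only trust confident predictions
        if not keep.any():
            break
        X_train = np.vstack([X_train, remaining[keep]])
        y_train = np.concatenate([y_train, pred[keep]])
        remaining = remaining[~keep]      # defer the uncertain examples
    return fit_centroids(X_train, y_train)
```

In the paper's setting, the labeled set would be the small student-annotated subset of reviews, the unlabeled set the large pool of unannotated reviews, and the classifier a fine-tuned transformer rather than this centroid model.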

How to Cite

Liu, C., Doshi, D., Shang, R., Cui, J., Jia, Q., & Gehringer, E. (2023). Less But Enough: Evaluation of peer reviews through pseudo-labeling with less annotated data. Journal of Educational Data Mining, 15(2), 123–140. https://doi.org/10.5281/zenodo.7304981


Keywords

peer assessment evaluation, semi-supervised learning, pseudo labeling, problem statement detection, suggestion detection

References
ARAZO, E., ORTEGO, D., ALBERT, P., O’CONNOR, N. E., AND MCGUINNESS, K. 2020. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.

CALIGIURI, P. AND THOMAS, D. C. 2013. From the Editors: How to write a high-quality review. Journal of International Business Studies 44, 6, 547–553.

CASCANTE-BONILLA, P., TAN, F., QI, Y., AND ORDONEZ, V. 2021. Curriculum labeling: Revisiting pseudo-labeling for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. 6912–6920.

CHENG, W. AND WARREN, M. 2000. Making a difference: Using peers to assess individual students’ contributions to a group project. Teaching in Higher Education 5, 2, 243–255.

CHO, K. 2008. Machine classification of peer comments in physics. In Proceedings of the 1st International Conference on Educational Data Mining (EDM 2008), R. S. J. Baker, T. Tiffany Barnes, and J. Beck, Eds. International Educational Data Mining Society, 192–196.

DEMIRASLAN ÇEVIK, Y., HAŞLAMAN, T., AND ÇELIK, S. 2015. The effect of peer assessment on problem solving skills of prospective teachers supported by online learning activities. Studies in Educational Evaluation 44, March, 23–35.

DEVLIN, J., CHANG, M.-W., LEE, K., AND TOUTANOVA, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds. Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.

DOUBLE, K. S., MCGRANE, J. A., AND HOPFENBECK, T. N. 2020. The Impact of Peer Assessment on Academic Performance: A Meta-analysis of Control Group Studies. Educational Psychology Review 32, 2, 481–509.

DU, J., GRAVE, E., GUNEL, B., CHAUDHARY, V., CELEBI, O., AULI, M., STOYANOV, V., AND CONNEAU, A. 2021. Self-training improves pre-training for natural language understanding. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, and Y. Zhou, Eds. Association for Computational Linguistics, online, 5408–5418.

FROMM, M., FAERMAN, E., BERRENDORF, M., BHARGAVA, S., QI, R., ZHANG, Y., DENNERT, L., SELLE, S., MAO, Y., AND SEIDL, T. Argument Mining Driven Analysis of Peer-Reviews. Tech. rep.

GARCIA, R. M. C. 2010. Exploring document clustering techniques for personalized peer assessment in exploratory courses. In Proceedings of the Workshop Computer-Supported Peer Review in Education (CSPRED-2010) held in conjunction with the Tenth International Conference on Intelligent Tutoring Systems (ITS 2010), I. Goldin, P. Brusilovsky, C. Schunn, K. Ashley, and I.-H. Hsiao, Eds.

GEHRINGER, E., EHRESMAN, L., CONGER, S. G., AND WAGLE, P. 2007. Reusable learning objects through peer review: The Expertiza approach. Innovate: Journal of Online Education 3, 5.

ZHU, X. AND GOLDBERG, A. B. 2009. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool.

GRANDVALET, Y. AND BENGIO, Y. 2004. Semi-supervised learning by entropy minimization. In Advances in Neural Information Processing Systems, L. Saul, Y. Weiss, and L. Bottou, Eds. Vol. 17. MIT Press, 529–536.

HALLGREN, K. A. 2012. Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology 8, 1, 23.

JIA, Q., CUI, J., XIAO, Y., LIU, C., RASHID, P., AND GEHRINGER, E. F. 2021. All-in-one: Multi-task learning BERT models for evaluating peer assessments. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), I.-H. Hsiao, S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 525–532.

KANG, P., KIM, D., AND CHO, S. 2016. Semi-supervised support vector regression based on self-training with label uncertainty: An application to virtual metrology in semiconductor manufacturing. Expert Systems with Applications 51, 85–106.

LEE, D.-H. ET AL. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning (WREPL) held in conjunction with the International Conference on Machine Learning (ICML). Vol. 3. 1–6.

LI, H., XIONG, Y., HUNTER, C. V., GUO, X., AND TYWONIW, R. 2020. Does peer assessment promote student learning? A meta-analysis. Assessment and Evaluation in Higher Education 45, 2, 193–211.

LIU, C., CUI, J., SHANG, R., XIAO, Y., JIA, Q., AND GEHRINGER, E. 2022. Improving problem detection in peer assessment through pseudo-labeling using semi-supervised learning. In Proceedings of the 15th International Conference on Educational Data Mining (EDM 2022), A. Mitrovic and N. Bosch, Eds. International Educational Data Mining Society, 391–397.

LIU, X. AND LI, L. 2014. Assessment training effects on student assessment skills and task performance in a technology-facilitated peer assessment. Assessment and Evaluation in Higher Education 39, 3, 275–292.

LUNDSTROM, K. AND BAKER, W. 2009. To give is better than to receive: The benefits of peer review to the reviewer’s own writing. Journal of Second Language Writing 18, 1, 30–43.

MUGNAI, D., PERNICI, F., TURCHINI, F., AND DEL BIMBO, A. 2021. Soft pseudo-labeling semi-supervised learning applied to fine-grained visual classification. In Pattern Recognition. ICPR International Workshops and Challenges, A. Del Bimbo, R. Cucchiara, S. Sclaroff, G. M. Farinella, T. Mei, M. Bertini, H. J. Escalante, and R. Vezzani, Eds. Springer International Publishing, Cham, 102–110.

NELSON, M. M. AND SCHUNN, C. D. 2009. The nature of feedback: How different types of peer feedback affect writing performance. Instructional Science 37, 4, 375–401.

NYE, B. D., CORE, M. G., JAISWAL, S., GHOSAL, A., AND AUERBACH, D. 2021. Acting engaged: Leveraging play persona archetypes for semi-supervised classification of engagement. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), I.-H. Hsiao, S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 240–251.

RAMACHANDRAN, L., GEHRINGER, E. F., AND YADAV, R. K. 2017. Automated Assessment of the Quality of Peer Reviews using Natural Language Processing Techniques. International Journal of Artificial Intelligence in Education 27, 3, 534–581.

SÁNDOR, A. AND VORNDRAN, A. 2009. Detecting key sentences for automatic assistance in peer reviewing research articles in educational sciences. In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries. NLPIR4DL ’09. Association for Computational Linguistics, USA, 36–44.

SHI, W., GONG, Y., DING, C., MA, Z., TAO, X., AND ZHENG, N. 2018. Transductive semi-supervised deep learning using min-max features. In Computer Vision – ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds. Springer International Publishing, Cham, 311–327.

SONG, Y., HU, Z., AND GEHRINGER, E. F. 2015. Closing the circle: Use of students’ responses for peer-assessment rubric improvement. In Advances in Web-Based Learning – ICWL 2015, F. W. Li, R. Klamma, M. Laanpere, J. Zhang, B. F. Manjón, and R. W. Lau, Eds. Springer International Publishing, Cham, 27–36.

SUEN, H. K. 2014. Peer assessment for massive open online courses (MOOCs). International Review of Research in Open and Distance Learning 15, 3, 312–327.

TARVAINEN, A. AND VALPOLA, H. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st International Conference on Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. NIPS’17. Curran Associates Inc., Red Hook, NY, USA, 1195–1204.

TOPPING, K. 1998. Peer assessment between students in colleges and universities. Review of Educational Research 68, 3, 249–276.

TOPPING, K. J. 2009. Peer assessment. Theory into Practice 48, 1, 20–27.

VAN ZUNDERT, M., SLUIJSMANS, D., AND VAN MERRIËNBOER, J. 2010. Effective peer assessment processes: Research findings and future directions. Learning and Instruction 20, 4, 270–279.

VASWANI, A., SHAZEER, N., PARMAR, N., USZKOREIT, J., JONES, L., GOMEZ, A. N., KAISER, L. U., AND POLOSUKHIN, I. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. NIPS’17, vol. 30. Curran Associates, Inc., Red Hook, NY, USA, 6000–6010.

XIAO, Y., ZINGLE, G., JIA, Q., AKBAR, S., SONG, Y., DONG, M., QI, L., AND GEHRINGER, E. 2020. Problem detection in peer assessments between subjects by effective transfer learning and active learning. In Proceedings of the 13th International Conference on Educational Data Mining (EDM 2020), A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, Eds. International Educational Data Mining Society, 516–523.

XIAO, Y., ZINGLE, G., JIA, Q., SHAH, H. R., ZHANG, Y., LI, T., KAROVALIYA, M., ZHAO, W., SONG, Y., JI, J., ET AL. 2020. Detecting problem statements in peer assessments. In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, Eds. International Educational Data Mining Society, 704–709.

XIE, Q., LUONG, M. T., HOVY, E., AND LE, Q. V. 2020. Self-training with noisy student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695.

XIONG, W. AND LITMAN, D. 2010. Identifying problem localization in peer-review feedback. In Intelligent Tutoring Systems, V. Aleven, J. Kay, and J. Mostow, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 429–431.

XIONG, W. AND LITMAN, D. 2011. Automatically predicting peer-review helpfulness. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, D. Lin, Y. Matsumoto, and R. Mihalcea, Eds. Association for Computational Linguistics, Portland, Oregon, USA, 502–507.

XIONG, W., LITMAN, D., AND SCHUNN, C. 2010. Assessing reviewers’ performance based on mining problem localization in peer-review data. In Proceedings of the 3rd International Conference on Educational Data Mining (EDM 2010), R. S. Baker, A. Merceron, and P. I. Pavlik, Eds. International Educational Data Mining Society, 211–220.

ZHOU, Z. H. 2018. A brief introduction to weakly supervised learning. National Science Review 5, 1, 44–53.

ZHOU, Z.-H. AND LI, M. 2005. Semi-supervised regression with co-training. In Proceedings of the 19th International Joint Conference on Artificial Intelligence. IJCAI’05. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 908–913.

ZINGLE, G., RADHAKRISHNAN, B., XIAO, Y., GEHRINGER, E., XIAO, Z., PRAMUDIANTO, F., KHURANA, G., AND ARNAV, A. 2019. Detecting suggestions in peer assessments. In Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019), C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. International Educational Data Mining Society, 474–479.
Section
Extended Articles from the EDM 2022 Conference
