Modelling Argument Quality in Technology-Mediated Peer Instruction


Published Dec 26, 2023
Sameer Bhatnagar, Michel Desmarais, Amal Zouaq

Abstract

Learnersourcing is the process by which students submit content that enriches the bank of learning materials
available to their peers, all as an authentic part of their learning experience. One example of learnersourcing
is Technology-Mediated Peer Instruction (TMPI), whereby students are prompted to submit
explanations to justify their choice in a multiple-choice question (MCQ), and are subsequently presented
with explanations written by their peers, after which they can reconsider their own answer. TMPI allows
students to contrast their reasoning with a variety of peer-submitted explanations. It is intended to foster
reflection, ultimately leading to better learning. However, not all content submitted by students is
adequate, and curating it can require significant effort from the teacher. The curation process ought
to be automated for learnersourcing in TMPI to scale up to large classes, such as MOOCs. Even in
smaller settings, automation is critical for the timely curation of student-submitted content, such as
within a single assignment or over a semester.
We adapt methods from argument mining and natural language processing to address the curation
challenge and assess the quality of student answers submitted in TMPI, as judged by their peers. The
curation task is confined to the prediction of argument convincingness: an explanation submitted by a
learner is considered of good quality if it is convincing to their peers. We define a methodology to
measure convincingness scores using three methods: Bradley-Terry, Crowd-Bradley-Terry, and WinRate.
We assess the performance of feature-rich supervised learning algorithms as well as a transformer-based
neural approach to predict convincingness using these scores. Experiments are conducted over different
domains, from ethics to STEM. While the neural approach shows the strongest correlation between its
predictions and the different convincingness measures, results show that success on this task is highly
dependent on the domain and the type of question.
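The pairwise scoring methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes only a list of (winner, loser) outcomes from peers comparing two explanations, computes WinRate as the fraction of comparisons each explanation wins, and fits Bradley-Terry strengths with the classic minorization-maximization update.

```python
from collections import defaultdict

def win_rate(comparisons):
    """WinRate: fraction of pairwise comparisons each item wins."""
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {item: wins[item] / total[item] for item in total}

def bradley_terry(comparisons, iterations=100):
    """Bradley-Terry strengths p_i, fitted with the classic MM update:
    p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ], renormalized each step."""
    items = {x for pair in comparisons for x in pair}
    wins = defaultdict(int)          # W_i: total wins of item i
    pair_counts = defaultdict(int)   # n_ij: comparisons between i and j
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    p = {i: 1.0 for i in items}      # start from uniform strengths
    for _ in range(iterations):
        new_p = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            new_p[i] = wins[i] / denom if denom > 0 else p[i]
        norm = sum(new_p.values())   # renormalize so strengths sum to |items|
        p = {i: v * len(items) / norm for i, v in new_p.items()}
    return p
```

Both functions return a score per explanation, so either can serve as the target variable for the supervised prediction task; Bradley-Terry additionally accounts for the strength of the opponents each explanation was compared against, which raw WinRate ignores.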

How to Cite

Bhatnagar, S., Desmarais, M., & Zouaq, A. (2023). Modelling Argument Quality in Technology-Mediated Peer Instruction. Journal of Educational Data Mining, 15(3), 26–57. https://doi.org/10.5281/zenodo.10391483


Keywords

learnersourcing, comparative peer evaluation, text mining, convincingness, Technology-Mediated Peer Instruction (TMPI)

References
ALDOUS, D. 2017. Elo ratings and the sports model: A neglected topic in applied probability? Statistical science 32, 4, 616–629.

BENTAHAR, J., MOULIN, B., AND BÉLANGER, M. 2010. A taxonomy of argumentation models used for knowledge representation. Artificial Intelligence Review 33, 3, 211–259.

BHATNAGAR, S., DESMARAIS, M., WHITTAKER, C., LASRY, N., DUGDALE, M., AND CHARLES, E. S. 2015. An analysis of peer-submitted and peer-reviewed answer rationales, in an asynchronous Peer Instruction based learning environment. In Proceedings of the 8th International Conference on Educational Data Mining, O. C. Santos, J. G. Boticario, C. Romero, M. Pechenizkiy, A. Merceron, P. Mitros, J. M. Luna, C. Mihaescu, P. Moreno, A. Hershkovitz, S. Ventura, and M. Desmarais, Eds. International Educational Data Mining Society, Madrid, Spain, 456–459.

BHATNAGAR, S., ZOUAQ, A., DESMARAIS, M., AND CHARLES, E. 2020a. A Dataset of Learnersourced Explanations from an Online Peer Instruction Environment. In International Conference on Educational Data Mining (EDM) (13th, Online, Jul 10-13, 2020), A. N. Rafferty, J. Whitehill, C. Romero, and V. Cavalli-Sforza, Eds. International Educational Data Mining Society, 350–355.

BHATNAGAR, S., ZOUAQ, A., DESMARAIS, M. C., AND CHARLES, E. 2020b. Learnersourcing Quality Assessment of Explanations for Peer Instruction. In Addressing Global Challenges and Quality Education, C. Alario-Hoyos, M. J. Rodríguez-Triana, M. Scheffel, I. Arnedillo-Sánchez, and S. M. Dennerlein, Eds. Springer International Publishing, Cham, 144–157.

BLOOM, B. S., ENGELHART, M. D., FURST, E., HILL, W. H., AND KRATHWOHL, D. R. 1956. Handbook I: Cognitive Domain. David McKay, New York.

BRADLEY, R. A. AND TERRY, M. E. 1952. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39, 3/4, 324–345.

BREIMAN, L. 2001. Random forests. Machine learning 45, 1, 5–32.

BURROWS, S., GUREVYCH, I., AND STEIN, B. 2015. The eras and trends of automatic short answer grading. International Journal of Artificial Intelligence in Education 25, 1, 60–117.

CAMBRE, J., KLEMMER, S., AND KULKARNI, C. 2018. Juxtapeer: Comparative Peer Review Yields Higher Quality Feedback and Promotes Deeper Reflection. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI ’18. ACM Press, Montreal QC, Canada, 1–13.

CHARLES, E. S., LASRY, N., BHATNAGAR, S., ADAMS, R., LENTON, K., BROUILLETTE, Y., DUGDALE, M., WHITTAKER, C., AND JACKSON, P. 2019. Harnessing peer instruction in- and out- of class with myDALITE. In Fifteenth Conference on Education and Training in Optics and Photonics: ETOP 2019. Optical Society of America, Quebec City, Canada, 11143 89.

CHARLES, E. S., LASRY, N., WHITTAKER, C., DUGDALE, M., LENTON, K., BHATNAGAR, S., AND GUILLEMETTE, J. 2015. Beyond and Within Classroom Walls: Designing Principled Pedagogical Tools for Student and Faculty Uptake. In Exploring the Material Conditions of Learning: The Computer Supported Collaborative Learning (CSCL) Conference 2015, Volume 1, O. Lindwall, P. Hakkinen, T. Koschman, P. Tchounikine, and S. Ludvigsen, Eds. International Society of the Learning Sciences, Inc. [ISLS], 292–299.

CHEN, X., BENNETT, P. N., COLLINS-THOMPSON, K., AND HORVITZ, E. 2013. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the sixth ACM international conference on Web search and data mining. Association for Computing Machinery, New York, New York, 193–202.

CHI, M. T., LEEUW, N., CHIU, M.-H., AND LAVANCHER, C. 1994. Eliciting self-explanations improves understanding. Cognitive science 18, 3, 439–477.

CLINE, K., HUCKABY, D. A., AND ZULLO, H. 2021. Identifying Clicker Questions that Provoke Rich Discussions in Introductory Statistics. PRIMUS 0, ja, 1–32.

CROUCH, C. H. AND MAZUR, E. 2001. Peer instruction: Ten years of experience and results. American Journal of Physics 69, 9, 970–977.

DEERWESTER, S. C., DUMAIS, S. T., LANDAUER, T. K., FURNAS, G. W., AND HARSHMAN, R. A. 1990. Indexing by latent semantic analysis. JASIS 41, 6, 391–407.

DENNY, P., HAMER, J., LUXTON-REILLY, A., AND PURCHASE, H. 2008. PeerWise: Students Sharing Their Multiple Choice Questions. In Proceedings of the Fourth International Workshop on Computing Education Research. ICER ’08. Association for Computing Machinery, Sydney, Australia, 51–58.

DEVLIN, J., CHANG, M.-W., LEE, K., AND TOUTANOVA, K. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

ELO, A. E. 1978. The rating of chessplayers, past and present. Arco Pub.

FALK, N. AND LAPESA, G. 2023. Bridging argument quality and deliberative quality annotations with adapters. In Findings of the Association for Computational Linguistics: EACL 2023, A. Vlachos and I. Augenstein, Eds. Association for Computational Linguistics, Dubrovnik, Croatia, 2424–2443.

FISHER, A., RUDIN, C., AND DOMINICI, F. 2019. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of Machine Learning Research 20, 177, 1–81.

FROMM, M., BERRENDORF, M., FAERMAN, E., AND SEIDL, T. 2023. Cross-domain argument quality estimation. In Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Association for Computational Linguistics, Toronto, Canada, 13435–13448.

GAGNON, V., LABRIE, A., DESMARAIS, M., AND BHATNAGAR, S. 2019. Filtering non-relevant short answers in peer learning applications. In 11th Conference on Educational Data Mining (EDM2019), C. F. Lynch, A. Merceron, M. Desmarais, and R. Nkambou, Eds. International Educational Data Mining Society, Montreal, Canada, 245–252.

GARCIA-MILA, M., GILABERT, S., ERDURAN, S., AND FELTON, M. 2013. The effect of argumentative task goal on the quality of argumentative discourse. Science Education 97, 4, 497–523.

GHOSH, D., KHANAM, A., HAN, Y., AND MURESAN, S. 2016. Coarse-grained argumentation features for scoring persuasive essays. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), K. Erk and N. A. Smith, Eds. Association for Computational Linguistics, Berlin, Germany, 549–554.

GLEIZE, M., SHNARCH, E., CHOSHEN, L., DANKIN, L., MOSHKOWICH, G., AHARONOV, R., AND SLONIM, N. 2019. Are you convinced? choosing the more convincing evidence with a Siamese network. arXiv preprint arXiv:1907.08971.

GOMIS, R. AND SILVESTRE, F. 2021. Plateforme elaastic. https://sia.univ-toulouse.fr/initiatives-pedagogiques/plateforme-elaastic.

GRAESSER, A. C., MCNAMARA, D. S., LOUWERSE, M. M., AND CAI, Z. 2004. Coh-Metrix: Analysis of text on cohesion and language. Behavior research methods, instruments, & computers 36, 2, 193–202.

GRETZ, S., FRIEDMAN, R., COHEN-KARLIK, E., TOLEDO, A., LAHAV, D., AHARONOV, R., AND SLONIM, N. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. AAAI Press, New York, New York, 7805–7813.

HABERNAL, I. AND GUREVYCH, I. 2016. Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), K. Erk and N. A. Smith, Eds. Association for Computational Linguistics, Berlin, Germany, 1589–1599.

JONES, I. AND WHEADON, C. 2015. Peer assessment using comparative and absolute judgement. Studies in Educational Evaluation 47, 93–101.

KHOSRAVI, H., KITTO, K., AND WILLIAMS, J. J. 2019. Ripple: A crowdsourced adaptive platform for recommendation of learning activities. arXiv preprint arXiv:1910.05522.

KLEBANOV, B. B., STAB, C., BURSTEIN, J., SONG, Y., GYAWALI, B., AND GUREVYCH, I. 2016. Argumentation: Content, structure, and relationship with essay quality. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), C. Reed, Ed. Association for Computational Linguistics, Berlin, Germany, 70–75.

KOLHE, P., LITTMAN, M. L., AND ISBELL, C. L. 2016. Peer Reviewing Short Answers using Comparative Judgement. In Proceedings of the Third (2016) ACM Conference on Learning@ Scale. Association for Computing Machinery, Edinburgh, Scotland, 241–244.

LE, Q. AND MIKOLOV, T. 2014. Distributed representations of sentences and documents. In International conference on machine learning, E. P. Xing and T. Jebara, Eds. Proceedings of Machine Learning Research, PMLR, Beijing, China, 1188–1196.

MAGOODA, A. E., ZAHRAN, M., RASHWAN, M., RAAFAT, H., AND FAYEK, M. 2016. Vector based techniques for short answer grading. In Proceedings of the twenty-ninth international FLAIRS conference, Z. Markov and I. Russell, Eds. AAAI Press, Key Largo, Florida, 238–243.

MELNIKOV, V., GUPTA, P., FRICK, B., KAIMANN, D., AND HÜLLERMEIER, E. 2016. Pairwise versus pointwise ranking: A case study. Schedae Informaticae 25, 73–83.

MERCIER, H. AND SPERBER, D. 2011. Why do humans reason? arguments for an argumentative theory. Behavioral and brain sciences 34, 2, 57–74.

MOHLER, M., BUNESCU, R., AND MIHALCEA, R. 2011. Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, D. Lin, Y. Matsumoto, and R. Mihalcea, Eds. Association for Computational Linguistics, Portland, Oregon, 752–762.

MOHLER, M. AND MIHALCEA, R. 2009. Text-to-text Semantic Similarity for Automatic Short Answer Grading. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, A. Lascarides, C. Gardent, and J. Nivre, Eds. EACL ’09. Association for Computational Linguistics, Stroudsburg, PA, USA, 567–575.

NATHAN, M. J., KOEDINGER, K. R., ALIBALI, M. W., AND OTHERS. 2001. Expert blind spot: When content knowledge eclipses pedagogical content knowledge. In Proceedings of the third international conference on cognitive science, L. Chen, K. Cheng, C.-Y. Chiu, S.-W. Cho, S. He, Y. Jang, J. Katanuma, C. Lee, G. Legendre, C. Ling, M. Lungarella, M. Nathan, R. Pfeifer, J. Zhang, J. Zhang, S. Zhang, and Y. Zhuo, Eds. Vol. 644648. USTC Press, Beijing, China.

NGUYEN, H. V. AND LITMAN, D. J. 2018. Argument Mining for Improving the Automated Scoring of Persuasive Essays. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, New Orleans, Louisiana, 5892–5899.

PELÁNEK, R. 2016. Applications of the Elo rating system in adaptive educational systems. Computers & Education 98, 169–179.

PENNINGTON, J., SOCHER, R., AND MANNING, C. D. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Vol. 14. Association for Computational Linguistics, Doha, Qatar, 1532–1543.

PERSING, I. AND NG, V. 2015. Modeling argument strength in student essays. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), C. Zong and M. Strube, Eds. Association for Computational Linguistics, Beijing, China, 543–552.

PERSING, I. AND NG, V. 2016. End-to-End Argumentation Mining in Student Essays. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Knight, A. Nenkova, and O. Rambow, Eds. Association for Computational Linguistics, San Diego, California, 1384–1394.

POTASH, P., FERGUSON, A., AND HAZEN, T. J. 2019. Ranking passages for argument convincingness. In Proceedings of the 6th Workshop on Argument Mining, B. Stein and H. Wachsmuth, Eds. Association for Computational Linguistics, Florence, Italy, 146–155.

POTTER, T., ENGLUND, L., CHARBONNEAU, J., MACLEAN, M. T., NEWELL, J., ROLL, I., AND OTHERS. 2017. ComPAIR: A new online tool using adaptive comparative judgement to support learning with peer feedback. Teaching & Learning Inquiry 5, 2, 89–113.

RAMAN, K. AND JOACHIMS, T. 2014. Methods for ordinal peer grading. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. Association for Computing Machinery, New York, New York, 1037–1046.

RIORDAN, B., HORBACH, A., CAHILL, A., ZESCH, T., AND LEE, C. 2017. Investigating neural architectures for short answer scoring. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, J. Tetreault, J. Burstein, C. Leacock, and H. Yannakoudakis, Eds. Association for Computational Linguistics, Copenhagen, Denmark, 159–168.

SILVESTRE, F., VIDAL, P., AND BROISIN, J. 2015. Reflexive learning, socio-cognitive conflict and peer-assessment to improve the quality of feedbacks in online tests. In Design for Teaching and Learning in a Networked World: 10th European Conference on Technology Enhanced Learning, EC-TEL 2015, Proceedings 10. Springer, Toledo, Spain, 339–351.

SINGH, A., BROOKS, C., AND DOROUDI, S. 2022. Learnersourcing in theory and practice: synthesizing the literature and charting the future. In Proceedings of the Ninth ACM Conference On Learning@ Scale. Association for Computing Machinery, New York, New York, 234–245.

STEGMANN, K., WECKER, C., WEINBERGER, A., AND FISCHER, F. 2012. Collaborative argumentation and cognitive elaboration in a computer-supported collaborative learning environment. Instructional Science 40, 2 (Mar.), 297–323.

SULTAN, M. A., SALAZAR, C., AND SUMNER, T. 2016. Fast and easy short answer grading with high accuracy. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Knight, A. Nenkova, and O. Rambow, Eds. Association for Computational Linguistics, San Diego, California, 1070–1075.

THORBURN, L. AND KRUGER, A. 2022. Optimizing language models for argumentative reasoning. In Proceedings of the 1st Workshop on Argumentation & Machine Learning co-located with 9th International Conference on Computational Models of Argument (COMMA 2022), F. Toni, R. Booth, and S. Polberg, Eds. IOS Press, Cardiff, Wales, 27–44.

TOLEDO, A., GRETZ, S., COHEN-KARLIK, E., FRIEDMAN, R., VENEZIAN, E., LAHAV, D., JACOVI, M., AHARONOV, R., AND SLONIM, N. 2019. Automatic Argument Quality Assessment-New Datasets and Methods. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan, Eds. Association for Computational Linguistics, Hong Kong, China, 5629–5639.

UNIVERSITY OF BRITISH COLUMBIA, TEACHING AND LEARNING TECHNOLOGIES. 2019. ubc/ubcpi. GitHub repository, original-date: 2015-02-17T21:37:02Z.

VENVILLE, G. J. AND DAWSON, V. M. 2010. The impact of a classroom intervention on grade 10 students’ argumentation skills, informal reasoning, and conceptual understanding of science. Journal of Research in Science Teaching 47, 8, 952–977.

WACHSMUTH, H., NADERI, N., HOU, Y., BILU, Y., PRABHAKARAN, V., THIJM, T. A., HIRST, G., AND STEIN, B. 2017. Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, M. Lapata, P. Blunsom, and A. Koller, Eds. Association for Computational Linguistics, Valencia, Spain, 176–187.

WEIR, S., KIM, J., GAJOS, K. Z., AND MILLER, R. C. 2015. Learnersourcing subgoal labels for how-to videos. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. Association for Computing Machinery, New York, New York, 405–416.

WILLIAMS, J. J., KIM, J., RAFFERTY, A., MALDONADO, S., GAJOS, K. Z., LASECKI, W. S., AND HEFFERNAN, N. 2016. AXIS: Generating Explanations at Scale with Learnersourcing and Machine Learning. In Proceedings of the Third (2016) ACM Conference on Learning @ Scale - L@S ’16. ACM Press, Edinburgh, Scotland, UK, 379–388.