Modelling Argument Quality in Technology-Mediated Peer Instruction
Abstract
Learnersourcing is the process by which students submit content that enriches the bank of learning materials available to their peers, as an authentic part of their own learning experience. One example of learnersourcing is Technology-Mediated Peer Instruction (TMPI), in which students are prompted to submit explanations justifying their answer choice on a multiple-choice question (MCQ), are then presented with explanations written by their peers, and finally have the opportunity to reconsider their own answer. TMPI allows students to contrast their reasoning with a variety of peer-submitted explanations; it is intended to foster reflection and, ultimately, better learning. However, not all content submitted by students is adequate, and it must be curated, a process that can require significant effort from the teacher. For learnersourcing in TMPI to scale up to large classes, such as MOOCs, the curation process must be automated. Even in smaller settings, automation is critical for the timely curation of student-submitted content, for example within a single assignment or over the course of a semester.
We adapt methods from argument mining and natural language processing to address this curation challenge and to assess the quality of student answers submitted in TMPI, as judged by their peers. The curation task is framed as the prediction of argument convincingness: an explanation submitted by a learner is considered of good quality if it is convincing to their peers. We define a methodology to derive convincingness scores using three methods: Bradley-Terry, Crowd-Bradley-Terry, and WinRate. We assess the performance of feature-rich supervised learning algorithms, as well as a transformer-based neural approach, at predicting convincingness as measured by these scores. Experiments are conducted over different domains, from ethics to STEM. While the neural approach shows the highest correlation between its predictions and the different convincingness measures, results show that success on this task is highly dependent on the domain and the type of question.
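To make the scoring methods named above concrete, the following Python sketch (not the authors' implementation; the explanation identifiers, pairwise judgments, and iteration budget are illustrative assumptions) computes WinRate and standard Bradley-Terry convincingness scores from hypothetical pairwise "which explanation is more convincing?" outcomes. Crowd-Bradley-Terry, which additionally models the reliability of each voter, is omitted for brevity.

```python
# Minimal sketch with assumed toy data: convincingness scores from pairwise
# "which explanation is more convincing?" judgments.
from collections import defaultdict
from itertools import chain

# Each pair is (winner_id, loser_id) for one peer judgment (hypothetical data).
pairs = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("a", "c"), ("b", "c")]

# --- WinRate: fraction of comparisons each explanation wins ---
wins, games = defaultdict(int), defaultdict(int)
for winner, loser in pairs:
    wins[winner] += 1
    games[winner] += 1
    games[loser] += 1
win_rate = {e: wins[e] / games[e] for e in games}

# --- Bradley-Terry: latent strengths pi with P(i beats j) = pi_i / (pi_i + pi_j),
#     fitted here with the classic minorization-maximization (MM) iteration ---
items = sorted(set(chain.from_iterable(pairs)))
meetings = defaultdict(int)          # total number of comparisons between i and j
for winner, loser in pairs:
    meetings[(winner, loser)] += 1
    meetings[(loser, winner)] += 1

pi = {e: 1.0 for e in items}
for _ in range(200):                 # fixed iteration budget for the sketch
    new_pi = {}
    for i in items:
        denom = sum(meetings[(i, j)] / (pi[i] + pi[j]) for j in items if j != i)
        new_pi[i] = wins[i] / denom if denom > 0 else pi[i]
    total = sum(new_pi.values())
    pi = {e: v / total for e, v in new_pi.items()}   # normalize so scores sum to 1

print("WinRate:      ", dict(sorted(win_rate.items())))
print("Bradley-Terry:", {e: round(v, 3) for e, v in sorted(pi.items())})
```

Under these toy judgments, both measures rank explanation "a" above "b" above "c"; the two methods can diverge when some explanations are compared far more often than others.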
Keywords: learnersourcing, comparative peer evaluation, text mining, convincingness, Technology-Mediated Peer Instruction (TMPI)