Propositional Extraction from Collaborative Naturalistic Dialogues
Abstract
In collaborative learning, extracting the beliefs shared within a group is a critical capability for navigating complex tasks. Inherent in this problem is the fact that in naturalistic collaborative discourse, the same propositional content may be expressed in radically different ways. This difficulty is exacerbated when speech overlaps and other communicative modalities are used, as in a co-situated collaborative task. In this paper, we conduct a comparative methodological analysis of techniques for extracting task-relevant propositions from natural speech dialogues in a challenging shared task setting where participants collaboratively determine the weights of five blocks using only a balance scale. We encode utterances and candidate propositions with language models and compare a cross-encoder method, adapted from coreference research, to a vector similarity baseline. Our cross-encoder approach outperforms both a cosine similarity baseline and zero-shot inference by the GPT-4 and LLaMA 2 language models, and we establish a novel baseline on two collaborative task datasets, the Weights Task and DeliData, demonstrating the generalizability of our approach. Furthermore, we explore the use of state-of-the-art large language models for data augmentation to enhance performance, extend our examination to transcripts generated by Google's Automatic Speech Recognition system to assess the potential for automating propositional extraction in real time, and introduce a framework for live propositional extraction from natural speech and multimodal signals. This study not only demonstrates the feasibility of detecting collaboration-relevant content in unstructured interactions but also lays the groundwork for employing AI to enhance collaborative problem-solving in classrooms and other collaborative settings, such as the workforce.
Our code may be found at https://github.com/csu-signal/PropositionExtraction.
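To make the comparison concrete, the vector similarity baseline described above pairs each utterance with the candidate proposition whose embedding is nearest under cosine similarity. The sketch below is illustrative only, not the paper's implementation: in practice the vectors would come from a sentence encoder such as a BERT-family model, whereas here we use hand-picked toy vectors, and the function names are our own.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-d vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_propositions(utterance_vecs, proposition_vecs):
    """For each utterance embedding, return the index of the
    candidate proposition with the highest cosine similarity."""
    matches = []
    for u in utterance_vecs:
        scores = [cosine_sim(u, p) for p in proposition_vecs]
        matches.append(int(np.argmax(scores)))
    return matches

# Toy 3-d "embeddings": utterance 0 is closest to proposition 1,
# utterance 1 to proposition 0.
utterances = [np.array([0.1, 0.9, 0.0]), np.array([1.0, 0.1, 0.0])]
propositions = [np.array([0.9, 0.0, 0.1]), np.array([0.0, 1.0, 0.1])]
print(match_propositions(utterances, propositions))  # → [1, 0]
```

The cross-encoder alternative differs in that the utterance and proposition are scored jointly by a single model rather than embedded independently, which lets the scorer attend across the pair at the cost of re-running the model for every candidate.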
Keywords: collaborative problem solving, natural speech, propositional extraction, natural language processing, dialogue analysis
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.