Using LLMs to Identify Indicators of Persistence from Students’ Dialogues with a Pedagogical Agent

Published March 3, 2026
Teresa Ober, Shan Zhang, Diego Zapata-Rivera, Noah Schroeder, Anthony Botelho

Abstract

Conversational learning systems offer new opportunities to examine learning processes through chat log data. Constructs such as persistence, self-efficacy, interest, perceived challenge, and prior knowledge are known predictors of student performance but are challenging to detect at scale using traditional methods. This study explores the use of large language models (LLMs) to automatically code indicators of these constructs from student chat logs collected through a conversation-based assessment (CBA) for middle school mathematics. Indicators included observable behaviors, such as students’ expressions of challenge, help-seeking, goal-setting, and self-regulatory strategies, evident in their conversational interactions within the CBA. We evaluated multiple configurations of GPT-4o, varying the model type (mini vs. regular) and the temperature setting (0, .3, .7, 1), against human expert coders. The dataset comprised over 10,000 student turns collected from 107 middle school students classified as English learners as they interacted with the CBA. Reliability was assessed within and between LLM configurations, and between LLM configurations and human coders. Results reveal systematic patterns: constructs with moderate theoretical coherence benefited from higher temperatures, while well-defined constructs required deterministic settings. Self-efficacy showed the highest human-LLM alignment. These findings illustrate the challenges of measuring complex psychological constructs and highlight the promise of human-LLM collaboration to enhance qualitative coding efficiency and validity in educational research. Supplemental materials are available online at https://doi.org/10.17605/osf.io/s85ck.
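The evaluation design summarized above crosses two model variants with four temperature settings and compares each configuration's codes with those of human coders. The Python sketch below illustrates one way such agreement could be computed; it is not the authors' analysis code. It assumes each student turn receives a binary code per construct (1 = indicator present, 0 = absent), uses Cohen's (1960) kappa as the agreement statistic, and treats within-configuration reliability as mean pairwise kappa across repeated runs of the same configuration. The model identifier strings and all example codes are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical configuration grid: two model variants crossed with the four
# temperature settings named in the abstract. The identifier strings are
# illustrative assumptions, not taken from the study's materials.
MODELS = ("gpt-4o-mini", "gpt-4o")
TEMPERATURES = (0.0, 0.3, 0.7, 1.0)
CONFIGS = list(product(MODELS, TEMPERATURES))  # 2 x 4 = 8 configurations


def cohen_kappa(codes_a, codes_b):
    """Cohen's (1960) kappa for two raters assigning nominal codes to the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("raters must code the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: proportion of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1.0:
        # Both raters used one identical code for every item: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(runs):
    """Average kappa over all pairs of repeated runs of one configuration.

    One possible operationalization of within-configuration reliability,
    assumed here for illustration only.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical binary codes (1 = indicator present) for eight student turns.
    human = [1, 0, 1, 1, 0, 0, 1, 0]
    llm = [1, 0, 1, 0, 0, 0, 1, 1]
    print(len(CONFIGS), cohen_kappa(human, llm))  # 8 0.5

    # A rare indicator: 90% raw agreement, but much of it is expected by chance.
    human_rare = [0] * 18 + [1, 1]
    llm_rare = [0] * 17 + [1, 0, 1]
    print(round(cohen_kappa(human_rare, llm_rare), 2))  # 0.44

    # Three repeated runs of one configuration that agree perfectly.
    print(mean_pairwise_kappa([[1, 0, 1, 0]] * 3))  # 1.0
```

In the second hypothetical example the two raters agree on 90% of turns, yet kappa is only about 0.44, because chance agreement is high when an indicator is rare. This is the "high agreement but low kappa" paradox discussed by Feinstein and Cicchetti (1990); Gwet's (2008) AC1 is one coefficient proposed to be less affected by it.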

How to Cite

Ober, T., Zhang, S., Zapata-Rivera, D., Schroeder, N., and Botelho, A. 2026. Using LLMs to Identify Indicators of Persistence from Students’ Dialogues with a Pedagogical Agent. Journal of Educational Data Mining, 18(1), 208–243. https://doi.org/10.5281/zenodo.18852441

Keywords

construct extraction, persistence, model configuration, human-LLM collaboration, qualitative analysis, large language models (LLMs), conversation-based assessment (CBA), educational data mining, construct validity, temperature settings

References
Ainley, M., and Hidi, S. 2014. Interest and enjoyment. In International handbook of emotions in education, R. Pekrun and L. Linnenbrink-Garcia, Eds., Routledge, 205–227. https://doi.org/10.4324/9780203148211

Ainley, M., Hidi, S., and Berndorff, D. 2002. Interest, learning, and the psychological processes that mediate their relationship. Journal of Educational Psychology, 94(3), 545–561. https://doi.org/10.1037/0022-0663.94.3.545

Alexander, P. A. 2003. The development of expertise: The journey from acclimation to proficiency. Educational Researcher, 32(8), 10–14. https://doi.org/10.3102/0013189X032008010

Anagnostidis, S., and Bulian, J. 2024. How susceptible are LLMs to influence in prompts? arXiv preprint arXiv:2408.11865. https://doi.org/10.48550/arXiv.2408.11865

Bandura, A. 1977. Self-efficacy: toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215. https://psycnet.apa.org/doi/10.1037/0033-295X.84.2.191

Bandura, A. 2006. Toward a psychology of human agency. Perspectives on Psychological Science, 1(2), 164–180. https://doi.org/10.1111/j.1745-6916.2006.00011.x

Barany, A., Nasiar, N., Porter, C., Zambrano, A. F., Andres, A. L., Bright, D., Shah, M., Liu, X., Gao, S., Zhang, J., Mehta, S., Choi, J., Giordano, C., and Baker, R. S. 2024. ChatGPT for Education Research: Exploring the Potential of Large Language Models for Qualitative Codebook Development. In Artificial Intelligence in Education. AIED 2024, A. M. Olney, I. A. Chounta, Z. Liu, O. C. Santos, and I. I. Bittencourt, Eds., Lecture Notes in Computer Science, vol. 14830, Springer, Cham, 134–149. https://doi.org/10.1007/978-3-031-64299-9_10

Battle, E. S. 1965. Motivational determinants of academic task persistence. Journal of Personality and Social Psychology, 2(2), 209–218. https://doi.org/10.1037/h0022442

Bauer, M. I., and Zapata-Rivera, D. 2020. Cognitive foundations of automated scoring. In Handbook of automated scoring: Theory into Practice, D. Yan, A. A. Rupp, and P. W. Foltz, Eds., CRC Press, 13–28.

Bernacki, M. L. 2018. Examining the cyclical, loosely sequenced, and contingent features of self-regulated learning: Trace data and their analysis. In Handbook of self-regulation of learning and performance (2nd ed.), D. H. Schunk and J. A. Greene, Eds., Routledge/Taylor & Francis Group, 370–387. https://psycnet.apa.org/doi/10.4324/9781315697048-24

Bernacki, M. L., Nokes-Malach, T. J., and Aleven, V. 2015. Examining self-efficacy during learning: Variability and relations to behavior, performance, and learning. Metacognition and Learning, 10, 99–117. https://doi.org/10.1007/s11409-014-9127-x

Borsboom, D., Mellenbergh, G. J., and Van Heerden, J. 2004. The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061

Botelho, A., Baral, S., Erickson, J. A., Benachamardi, P., and Heffernan, N. T. 2023. Leveraging natural language processing to support automated assessment and feedback for student open responses in mathematics. Journal of Computer Assisted Learning, 39(3), 823–840. https://doi.org/10.1111/jcal.12793

Charmaz, K. 2006. Constructing grounded theory: A practical guide through qualitative analysis. Sage.

Chew, R., Bollenbacher, J., Wenger, M., Speer, J., and Kim, A. 2023. LLM-assisted content analysis: Using large language models to support deductive coding. arXiv preprint arXiv:2306.14924. https://doi.org/10.48550/arXiv.2306.14924

Clark, R. E., and Saxberg, B. 2018. Engineering motivation using the belief-expectancy-control framework. Interdisciplinary Education and Psychology, 2(1), 1–26. https://riverapublications.com/assets/files/pdf_files/engineering-motivation-using-the-belief-expectancy-control-framework.pdf

Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104

Cronbach, L. J., and Meehl, P. E. 1955. Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957

Crossley, S., McNamara, D., Baker, R. S., Wang, Y., Paquette, L., Barnes, T., and Bergner, Y. 2015. Language to Completion: Success in an Educational Data Mining Massive Open Online Course. In Proceedings of the 8th International Conference on Educational Data Mining, O. C. Santos, J. B. Boticario, C. Romero, M. Pechenizkiy, A. Merceron, P. Mitros, J. M. Luna, C. Mihaescu, P. Moreno, A. Hershkovitz, S. Ventura, and M. Desmarais, Eds., 388–391. https://www.educationaldatamining.org/EDM2015/proceedings/short388-391.pdf

D’Mello, S., and Graesser, A. 2012. Dynamics of affective states during complex learning. Learning and Instruction, 22(2), 145–157. https://doi.org/10.1016/j.learninstruc.2011.10.001

DiCerbo, K. E. 2014. Game-based assessment of persistence. Journal of Educational Technology & Society, 17(1), 17–28. https://www.jstor.org/stable/jeductechsoci.17.1.17

Dochy, F. J., and Alexander, P. A. 1995. Mapping prior knowledge: A framework for discussion among researchers. European Journal of Psychology of Education, 10(3), 225–242. https://doi.org/10.1007/BF03172918

Dowell, N., and Kovanović, V. 2022. Modeling educational discourse with natural language processing. In The handbook of learning analytics (2nd ed.), C. Lang, G. Siemens, A. F. Wise, D. Gašević, and A. Merceron, Eds., Society for Learning Analytics Research (SoLAR), 105–119. https://www.solaresearch.org/publications/hla-22/hla22-chapter11/

Du, J., Hew, K. F., and Liu, L. 2023. What can online traces tell us about students’ self-regulated learning? A systematic review of online trace data analysis. Computers & Education, 201, 104828. https://doi.org/10.1016/j.compedu.2023.104828

Dunivin, Z. O. 2025. Scaling hermeneutics: A guide to qualitative coding with LLMs for reflexive content analysis. EPJ Data Science, 14, 28. https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-025-00548-8

Eccles, J. S., and Wigfield, A. 2020. From expectancy-value theory to situated expectancy-value theory: A developmental, social cognitive, and sociocultural perspective on motivation. Contemporary Educational Psychology, 61, 101859. https://doi.org/10.1016/j.cedpsych.2020.101859

Efklides, A. 2011. Interactions of metacognition with motivation and affect in self-regulated learning: The MASRL model. Educational Psychologist, 46(1), 6–25. https://doi.org/10.1080/00461520.2011.538645

Feinstein, A. R., and Cicchetti, D. V. 1990. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543–549. https://doi.org/10.1016/0895-4356(90)90158-L

Gignac, G. E. 2021. People who consider themselves smart do not consider themselves interpersonally challenged: Convergent validity evidence for subjectively measured IQ and EI. Personality and Individual Differences, 174, 110664. https://doi.org/10.1016/j.paid.2021.110664

Graesser, A., and McNamara, D. 2010. Self-regulated learning in learning environments with pedagogical agents that interact in natural language. Educational Psychologist, 45(4), 234–244. https://doi.org/10.1080/00461520.2010.515933

Gwet, K. L. 2008. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(2), 297–308. https://doi.org/10.1348/000711006X126600

Harackiewicz, J. M., Barron, K. E., Tauer, J. M., Carter, S. M., and Elliot, A. J. 2000. Short-term and long-term consequences of achievement goals: predicting interest and performance over time. Journal of Educational Psychology, 92(2), 316–330. https://doi.org/10.1037/0022-0663.92.2.316

Heseltine, M., and Von Hohenberg, B. C. 2024. Large language models as a substitute for human experts in annotating political text. Research & Politics, 11(1), 20531680241236239. https://doi.org/10.1177/20531680241236239

Hulleman, C. S., Schrager, S. M., Bodmann, S. M., and Harackiewicz, J. M. 2010. A meta-analytic review of achievement goal measures: Different labels for the same constructs or different constructs with similar labels? Psychological Bulletin, 136(3), 422–449. https://psycnet.apa.org/buy/2010-07936-008

Jozsa, K., Wang, J., Barrett, K. C., and Morgan, G. A. 2014. Age and Cultural Differences in Self-Perceptions of Mastery Motivation and Competence in American, Chinese, and Hungarian School Age Children. Child Development Research, 2014(1), 803061. https://doi.org/10.1155/2014/803061

Kai, S., Almeda, M. V., Baker, R. S., Heffernan, C., and Heffernan, N. 2018. Decision tree modeling of wheel-spinning and productive persistence in skill builders. Journal of Educational Data Mining, 10(1), 36–71. https://doi.org/10.5281/zenodo.3344810

Kane, M. T. 2013. Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000

Karabenick, S. A., and Gonida, E. N. 2017. Academic help seeking as a self-regulated learning strategy: Current issues, future directions. In Handbook of self-regulation of learning and performance (2nd ed.), D. H. Schunk and J. A. Greene, Eds., Routledge, 421–433.

Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., … Kasneci, G. 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274

Klassen, R. M., and Usher, E. L. 2010. Self-efficacy in educational settings: Recent research and emerging directions. In The decade ahead: Theoretical perspectives on motivation and achievement, vol. 16, T. Urdan and S. A. Karabenick, Eds., Emerald, 1–33.

Krapp, A. 2002. Structural and dynamic aspects of interest development: Theoretical considerations from an ontogenetic perspective. Learning and Instruction, 12(4), 383–409. https://doi.org/10.1016/S0959-4752(01)00011-1

Krippendorff, K. 2004. Measuring the reliability of qualitative text analysis data. Quality and Quantity, 38, 787–800. https://doi.org/10.1007/s11135-004-8107-7

Krippendorff, K. 2011. Agreement and information in the reliability of coding. Communication Methods and Measures, 5(2), 93–112. https://doi.org/10.1080/19312458.2011.568376

Kuzman, T., and Ljubešić, N. 2025. LLM teacher-student framework for text classification with no manually annotated data: a case study in IPTC news topic classification. IEEE Access, 13. https://ieeexplore.ieee.org/abstract/document/10900365

Landis, J. R., and Koch, G. G. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310

Lechner, C. M., Danner, D., and Rammstedt, B. 2019. Grit (effortful persistence) can be measured with a short scale, shows little variation across socio-demographic subgroups, and is associated with career success and career engagement. PLoS One, 14(11), e0224814. https://doi.org/10.1371/journal.pone.0224814

Lent, R. W., Brown, S. D., and Larkin, K. C. 1984. Relation of self-efficacy expectations to academic achievement and persistence. Journal of Counseling Psychology, 31(3), 356–362. https://doi.org/10.1037/0022-0167.31.3.356

Li, J., Zhu, Y., Li, Y., Li, G., and Jin, Z. 2024. Showing LLM-Generated Code Selectively Based on Confidence of LLMs. arXiv preprint arXiv:2410.03234. https://arxiv.org/abs/2410.03234

Lin, J., Diesendruck, M., Du, L., and Abraham, R. 2023. BatchPrompt: Accomplish more with less. arXiv preprint arXiv:2309.00384. https://arxiv.org/abs/2309.00384

Liu, X., Zambrano, A. F., Baker, R. S., Barany, A., Ocumpaugh, J., Zhang, J., Pankiewicz, M., Nasiar, N., and Wei, Z. 2025. Qualitative coding with GPT-4: Where it works better. Journal of Learning Analytics, 12(1), 169–185. https://doi.org/10.18608/jla.2025.8575

Loevinger, J. 1957. Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694. https://doi.org/10.2466/pr0.1957.3.3.635

Lopez, A. A., Guzman-Orth, D., Zapata-Rivera, D., Forsyth, C. M., and Luce, C. 2021. Examining the accuracy of a conversation-based assessment in interpreting English learners’ written responses. (Research Report No. RR-21-03). Educational Testing Service. https://doi.org/10.1002/ets2.12315

Martin, A., Ryan, R. M., and Brooks-Gunn, J. 2013. Longitudinal associations among interest, persistence, supportive parenting, and achievement in early childhood. Early Childhood Research Quarterly, 28(4), 658–667. https://doi.org/10.1016/j.ecresq.2013.05.003

McCaffrey, D. F., Casabianca, J. M., Ricker-Pedley, K. L., Lawless, R. R., and Wendler, C. 2021. Best practices for constructed-response scoring. ETS Research Report Series, 2021(1), 1–58. https://www.ets.org/pdfs/about/cr_best_practices.pdf

McClure, C., Smyslova, O., Hall, A., and Jiang, Y. 2024. Deductive coding’s role in AI vs. human performance. In Proceedings of the 17th International Conference on Educational Data Mining (EDM 2024), C. Demmans Epp, B. Paaßen, and D. Joyner, Eds. https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.91/

Meindl, P., Iyer, R., and Graham, J. 2019. Distributive justice beliefs are guided by whether people think the ultimate goal of society is well-being or power. Basic and Applied Social Psychology, 41(6), 359–385. https://doi.org/10.1080/01973533.2019.1663524

Mellon, J., Bailey, J., Scott, R., Breckwoldt, J., Miori, M., and Schmedeman, P. 2024. Do AIs know what the most important issue is? Using language models to code open-text social survey responses at scale. Research & Politics, 11(1), 20531680241231468. https://doi.org/10.1177/20531680241231468

Meredith, W. 1993. Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825

Messick, S. 1995. Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741

Miles, M. B., Huberman, A. M., and Saldaña, J. 2014. Qualitative data analysis: A methods sourcebook (3rd ed.). SAGE Publications.

Minbashian, A., Wood, R. E., and Beckmann, N. 2010. Task-contingent conscientiousness as a unit of personality at work. Journal of Applied Psychology, 95(5), 793–806. https://doi.org/10.1037/a0020016

Multon, K. D., Brown, S. D., and Lent, R. W. 1991. Relation of self-efficacy beliefs to academic outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38(1), 30–38. https://psycnet.apa.org/buy/1991-16867-001

Newton, P. E., and Shaw, S. D. 2014. Validity in educational and psychological assessment. SAGE. http://digital.casalini.it/9781473904064

Nuutila, K., Tapola, A., Tuominen, H., Molnár, G., and Niemivirta, M. 2021. Mutual relationships between the levels of and changes in interest, self-efficacy, and perceived difficulty during task engagement. Learning and Individual Differences, 92, 102090. https://doi.org/10.1016/j.lindif.2021.102090

Ober, T. M., Courey, K. A., and Flor, M. 2026. Integrating topic modeling and LLM prompt engineering into a human-driven approach to analyze interview transcripts. Journal of Educational Data Mining, 18(1).

O’Reilly, T., Wang, Z., and Sabatini, J. 2019. How much knowledge is too little? When a lack of knowledge becomes a barrier to comprehension. Psychological Science, 30(9), 1344–1351. https://doi.org/10.1177/0956797619862276

Ouyang, S., Zhang, J. M., Harman, M., and Wang, M. 2024. An empirical study of the non-determinism of ChatGPT in code generation. ACM Transactions on Software Engineering and Methodology, 34(2), 1–28. https://doi.org/10.1145/3697010

Pajares, F. 1996. Self-efficacy beliefs in academic settings. Review of Educational Research, 66(4), 543–578. https://doi.org/10.3102/00346543066004543

Peeperkorn, M., Kouwenhoven, T., Brown, D., and Jordanous, A. 2024. Is temperature the creativity parameter of large language models? arXiv preprint arXiv:2405.00492. https://arxiv.org/abs/2405.00492

Pintrich, P. R., and De Groot, E. V. 1990. Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82(1), 33–40. https://psycnet.apa.org/buy/1990-21075-001

Porter, T., Molina, D. C., Blackwell, L., Roberts, S., Quirk, A., Duckworth, A. L., and Trzesniewski, K. 2020. Measuring mastery behaviours at scale: The Persistence, Effort, Resilience, and Challenge-Seeking (PERC) Task. Journal of Learning Analytics, 7(1), 5–18. https://doi.org/10.18608/jla.2020.71.2

Qiao, S., Fang, X., Garrett, C., Zhang, R., Li, X., and Kang, Y. 2024. Generative AI for qualitative analysis in a maternal health study: Coding in-depth interviews using large language models (LLMs). medRxiv, 2024-09. https://doi.org/10.1101/2024.09.16.24313707

Rasheed, Z., Waseem, M., Ahmad, A., Kemell, K. K., Xiaofeng, W., Duc, A. N., and Abrahamsson, P. 2024. Can large language models serve as data analysts? A multi-agent assisted approach for qualitative data analysis. arXiv preprint arXiv:2402.01386. https://doi.org/10.48550/arXiv.2402.01386

Razavi, A., Soltangheis, M., Arabzadeh, N., Salamat, S., Zihayat, M., and Bagheri, E. 2025. Benchmarking prompt sensitivity in large language models. In European Conference on Information Retrieval, 303–313. Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-88714-7_29

Renninger, K. A., and Hidi, S. 2011. Revisiting the conceptualization, measurement, and generation of interest. Educational Psychologist, 46(3), 168–184. https://doi.org/10.1080/00461520.2011.587723

Sansone, C., and Thoman, D. B. 2005. Interest as the missing motivator in self-regulation. European Psychologist, 10(3), 175–186. https://doi.org/10.1027/1016-9040.10.3.175

Schunk, D. H. 1985. Self-efficacy and classroom learning. Psychology in the Schools, 22(2), 208–223. https://doi.org/10.1002/1520-6807(198504)22:2%3C208::AID-PITS2310220215%3E3.0.CO;2-7

Schunk, D. H., and DiBenedetto, M. K. 2016. Self-efficacy theory in education. In Handbook of motivation at school, K. R. Wentzel, Ed., Routledge, 34–54.

Shah, S. T. U., Hussein, M., Barcomb, A., and Moshirpour, M. 2025. From inductive to deductive: LLMs-based qualitative data analysis in requirements engineering. arXiv preprint arXiv:2504.19384. https://doi.org/10.48550/arXiv.2504.19384

Shapiro, A. M. 2004. How including prior knowledge as a subject variable may change outcomes of learning research. American Educational Research Journal, 41(1), 159–189. https://doi.org/10.3102/00028312041001159

Shepard, L. A. 2016. Evaluating test validity: Reprise and progress. Assessment in Education: Principles, Policy & Practice, 23(2), 268–280. https://doi.org/10.1080/0969594X.2016.1141168

Silvia, P. J. 2005. What is interesting? Exploring the appraisal structure of interest. Emotion, 5(1), 89–102. https://psycnet.apa.org/buy/2005-02259-008

Simonsmeier, B. A., Flaig, M., Deiglmayr, A., Schalk, L., and Schneider, M. 2022. Domain-specific prior knowledge and learning: A meta-analysis. Educational Psychologist, 57(1), 31–54. https://doi.org/10.1080/00461520.2021.1939700

Skinner, E. A., and Pitzer, J. R. 2012. Developmental dynamics of student engagement, coping, and everyday resilience. In Handbook of research on student engagement, S. Christenson, A. Reschly, and C. Wylie, Eds., Springer US, 21–44. https://doi.org/10.1007/978-1-4614-2018-7_2

Skinner, E. A., Graham, J. P., Brule, H., Rickert, N., and Kindermann, T. A. 2020. “I get knocked down but I get up again”: Integrative frameworks for studying the development of motivational resilience in school. International Journal of Behavioral Development, 44(4), 290–300. https://doi.org/10.1177/0165025420924122

Sparks, J. R., Lehman, B., Gladstone, J., Zhang, S., Schroeder, N., and Israel, M. 2025. Measuring persistence and academic resilience of K-12 students: Systematic review and operational definitions. Frontiers in Education, 10, 1673500. https://doi.org/10.3389/feduc.2025.1673500

Stewart, S., Lim, D. H., and Kim, J. 2015. Factors influencing college persistence for first-time students. Journal of Developmental Education, 38(3), 12–20. https://www.jstor.org/stable/24614019

Tinto, V. 2017. Reflections on student persistence. Student Success, 8(2), 1–8. https://search.informit.org/doi/abs/10.3316/INFORMIT.593199291602507

Tobias, S. 1994. Interest, prior knowledge, and learning. Review of Educational Research, 64(1), 37–54. https://doi.org/10.3102/00346543064001037

Törnberg, P. 2025. Large language models outperform expert coders and supervised classifiers at annotating political social media messages. Social Science Computer Review, 43(6), 1181–1195. https://doi.org/10.1177/08944393241286471

Tulis, M., and Fulmer, S. M. 2013. Students’ motivational and emotional experiences and their relationship to persistence during academic challenge in mathematics and reading. Learning and Individual Differences, 27, 35–46. https://doi.org/10.1016/j.lindif.2013.06.003

Turpin, M., Michael, J., Perez, E., and Bowman, S. 2023. Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems, 36, 74952–74965. https://proceedings.neurips.cc/paper_files/paper/2023/hash/ed3fea9033a80fea1376299fa7863f4a-Abstract-Conference.html

Wigfield, A., and Eccles, J. S. 2000. Expectancy–value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68–81. https://doi.org/10.1006/ceps.1999.1015

Wigfield, A., Muenks, K., and Eccles, J. S. 2021. Achievement motivation: What we know and where we are going. Annual Review of Developmental Psychology, 3(1), 87–111. https://doi.org/10.1146/annurev-devpsych-050720-103500

Yoshida, L. 2025. Do we need a detailed rubric for automated essay scoring using large language models? In Artificial Intelligence in Education. AIED 2025. Lecture Notes in Computer Science, vol. 15882, A. I. Cristea, E. Walker, Y. Lu, O. C. Santos, and S. Isotani, Eds., Cham: Springer Nature Switzerland, 60–67.

Zapata-Rivera, D., and Forsyth, C. M. 2022. Learner modeling in conversation-based assessment. In International Conference on Human-Computer Interaction. Cham: Springer International Publishing, 73–83. https://doi.org/10.1007/978-3-031-05887-5_6

Zapata-Rivera, D., Jackson, T., and Katz, I. R. 2015. Authoring conversation-based assessment scenarios. In Design Recommendations for Intelligent Tutoring Systems Volume 3: Authoring Tools and Expert Modeling Techniques, R. A. Sottilare, A. C. Graesser, X. Hu, and K. Brawner, Eds., U.S. Army Research Laboratory, 169–178.

Zapata-Rivera, D., Sparks, J. R., Forsyth, C. M., and Lehman, B. 2023. Conversation-based assessment: current findings and future work. In International Encyclopedia of Education (Fourth Edition), R. J. Tierney, F. Rizvi, and K. Ercikan, Eds., Elsevier, 504–518. https://doi.org/10.1016/B978-0-12-818630-5.10063-6

Zhang, S., Meshram, P. S., Ganapathy Prasad, P., Israel, M., and Bhat, S. 2025. An LLM-based framework for simulating, classifying, and correcting students’ programming knowledge with the SOLO taxonomy. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 2, J. A. Stone, T. Yuen, L. Shoop, S. A. Rebelsky, and J. Prather, Eds., 1681–1682. https://doi.org/10.1145/3641555.3705125

Zhou, M., and Kam, C. C. S. 2017. Trait procrastination, self-efficacy and achievement goals: the mediation role of boredom coping strategies. Educational Psychology, 37(7), 854–872. https://doi.org/10.1080/01443410.2017.1293801

Ziems, C., Chen, J., Zhang, A., and Yang, D. 2023. Can large language models transform computational social science? arXiv preprint arXiv:2305.03514. https://doi.org/10.48550/arXiv.2305.03514

Zimmerman, B. J. 2002. Becoming a self-regulated learner: An overview. Theory into Practice, 41(2), 64–70. https://doi.org/10.1207/s15430421tip4102_2

Zimmerman, B. J., and Moylan, A. R. 2009. Self-regulation: Where metacognition and motivation intersect. In Handbook of metacognition in education, D. J. Hacker, J. Dunlosky, and A. C. Graesser, Eds., Routledge, 299–315.

Zumbo, B. D. 2009. Validity as contextualized and pragmatic explanation, and its implications for validation practice. In The concept of validity: Revisions, new directions, and applications, R. W. Lissitz, Ed., Information Age Publishing, 65–82. https://psycnet.apa.org/record/2009-23060-004
Section
Special Section: Human-AI Partnership for Qualitative Analysis
