Using LLMs to Identify Indicators of Persistence from Students’ Dialogues with a Pedagogical Agent
Abstract
Conversational learning systems offer new opportunities to examine learning processes through chat log data. Constructs such as persistence, self-efficacy, interest, perceived challenge, and prior knowledge are known predictors of student performance but are challenging to detect at scale using traditional methods. This study explores the use of Large Language Models (LLMs) to automatically code indicators of these constructs from student chat logs collected through a conversation-based assessment (CBA) for middle school mathematics. Indicators included observable behaviors such as students’ expressions of challenge, help-seeking, goal-setting, and self-regulatory strategies evident in their conversational interactions within the CBA. We evaluated multiple configurations of ChatGPT4o, varying temperature settings (0, 0.3, 0.7, 1) and model types (mini vs. regular), against human expert coders. The dataset comprised over 10,000 student turns collected from 107 middle school students classified as English learners as they interacted with the CBA. Reliability was assessed within and between LLM configurations and humans. Results reveal systematic patterns: constructs with moderate theoretical coherence benefited from higher temperatures, while well-defined constructs required deterministic settings. Self-efficacy showed the highest human-LLM alignment. These findings illustrate the challenges of measuring complex psychological constructs and highlight the promise of human-LLM collaboration to enhance qualitative coding efficiency and validity in educational research. Supplemental materials are available online here: https://doi.org/10.17605/osf.io/s85ck.
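As a rough illustration of how agreement between human coders and each LLM configuration might be tabulated, the sketch below computes Cohen's kappa for every (model type, temperature) pair. The abstract does not state which agreement statistic was used, so the choice of kappa here is an assumption, as are the function names `cohen_kappa` and `agreement_by_config` and the binary toy codes, which are invented for the example and are not study data.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length sequences of nominal codes."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(codes_a)
    # Proportion of turns on which the two coders assigned the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


def agreement_by_config(human_codes, llm_codes_by_config):
    """Map each (model_type, temperature) key to its kappa against the human codes."""
    return {
        config: cohen_kappa(human_codes, codes)
        for config, codes in llm_codes_by_config.items()
    }


# Toy data (hypothetical): 1 = indicator present in the student turn, 0 = absent.
human = [1, 0, 0, 1, 0, 1, 0, 0]
llm = {
    ("mini", 0): [1, 0, 0, 1, 0, 0, 0, 0],
    ("regular", 0.7): [1, 0, 0, 1, 0, 1, 0, 0],
}

kappas = agreement_by_config(human, llm)
for (model_type, temperature), kappa in sorted(kappas.items()):
    print(f"{model_type:8s} T={temperature}: kappa={kappa:.3f}")
```

The same `cohen_kappa` function could be applied to two runs of one configuration to gauge within-configuration consistency. When a code is rare, kappa can be low despite high raw agreement, so a complementary statistic may be worth reporting alongside it.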
Keywords: construct extraction, persistence, model configuration, Human-LLM collaboration, qualitative analysis, large language models (LLMs), conversation-based assessment (CBA), educational data mining, construct validity, temperature settings

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://orcid.org/0009-0003-3532-0661