Data Plus Theory Equals Codebook: Leveraging LLMs for Human-AI Codebook Development
Abstract
Recent research has explored the use of Large Language Models (LLMs) to develop qualitative codebooks, mainly for inductive work with large datasets, where manual review is impractical. Although these efforts show promise, they often neglect the theoretical grounding essential to many types of qualitative analysis. This paper investigates the potential of GPT-4o to support theory-informed codebook development across two educational contexts. In the first study, we employ a three-step approach—drawing on Winne & Hadwin’s and Zimmerman’s Self-Regulated Learning (SRL) theories, think-aloud data, and human refinement—to evaluate GPT-4o’s ability to generate high-quality, theory-aligned codebooks. Results indicate that GPT-4o can effectively leverage its knowledge base to identify SRL constructs reflected in student problem-solving behavior. In the second study, we extend this approach to a STEM game-based learning context guided by Hidi & Renninger’s four-phase model of Interest Development. We compare four prompting strategies: no theories provided, theories named, full references given, and full-text theory papers supplied. Human evaluations show that naming the theory without including full references produced the most practical and usable codebook, while supplying full papers in the prompt enhanced theoretical alignment but reduced applicability. These findings suggest that GPT-4o can be a valuable partner in theory-driven qualitative research when grounded in well-established frameworks, but that careful prompt design is required. Our results show that widely available foundation models—trained on large-scale open web and licensed datasets—can effectively distill established educational theories to support qualitative research and codebook development. The code for our codebook development process, all prompts employed, and the codebooks produced by GPT-4o are available for replication at: https://osf.io/g3z4x
Keywords: Large Language Models, Qualitative Codebooks, Interest Development, Self-Regulated Learning, Thematic Analysis, Codebook Development
References
Baker, R. S., Hutt, S., Bosch, N., Ocumpaugh, J., Biswas, G., Paquette, L., Andres, J. M. A., Nasiar, N., and Munshi, A. 2024. Detector-driven classroom interviewing: Focusing qualitative researcher time by selecting cases in situ. Educational Technology Research and Development, 72, 2841–2863.
Bannert, M., Reimann, P., and Sonnenberg, C. 2014. Process mining techniques for analysing patterns and strategies in students’ self-regulated learning. Metacognition and Learning, 9, 161–185.
Barany, A., Nasiar, N., Porter, C., Zambrano, A. F., Andres, A., Bright, D., Choi, J., Gao, S., Giordano, C., Liu, X., Mehta, S., Shah, M., Zhang, J., and Baker, R. S. 2024. ChatGPT for education research: Exploring the potential of large language models for qualitative codebook development. In Proceedings of the International Conference on Artificial Intelligence in Education (Vol. 14830, pp. 134–149). Springer.
Bialik, M., Zhan, K., and Reich, J. 2025. Who coded it better? Exploring AI-assisted qualitative analysis through researcher reactions. In A. Barany, R. S. Baker, A. Katz, & J. Lin (Eds.), From data to discovery: LLMs for qualitative analysis in education (LAK ’25 Workshop). Dublin, Ireland.
Bingham, A. J., and Witkowsky, P. 2021. Deductive and inductive approaches to qualitative data analysis. In Analyzing and interpreting qualitative data: After the interview (pp. 133–146).
Blumer, H. 1954. The nature of race prejudice.
Borchers, C., Zhang, J., Baker, R. S., and Aleven, V. 2024. Using think-aloud data to understand relations between self-regulation cycle characteristics and student performance in intelligent tutoring systems. In Proceedings of the 14th Learning Analytics and Knowledge Conference (LAK ’24) (pp. 529–539). ACM.
Borchers, C., Shahrokhian, B., Balzan, F., Tajik, E., Sankaranarayanan, S., and Simon, S. 2025. Temperature and persona shape LLM agent consensus with minimal accuracy gains in qualitative coding.
Braun, V., and Clarke, V. 2012. Thematic analysis. American Psychological Association.
Charmaz, K. 1983. Loss of self: A fundamental form of suffering in the chronically ill. Sociology of Health & Illness, 5, 168–195.
Charmaz, K. 2006. Constructing grounded theory: A practical guide through qualitative analysis. Sage.
Chen, J., Lotsos, A., Wang, G., Zhao, L., Sherin, B., Wilensky, U., and Horn, M. 2025. Processes matter: How ML/GAI approaches could support open qualitative coding of online discourse datasets. In Proceedings of the 18th International Conference on Computer-Supported Collaborative Learning (pp. 415–419). ISLS.
Chew, R., Bollenbacher, J., Wenger, M., Speer, J., and Kim, A. 2023. LLM-assisted content analysis: Using large language models to support deductive coding.
Corbin, J. M., and Strauss, A. 1990. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology, 13, 3–21.
De Paoli, S. 2024. Performing an inductive thematic analysis of semi-structured interviews with a large language model. Social Science Computer Review, 42, 997–1019.
Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., Yu, T., Zhang, R., and Ahmed, N. K. 2024. Bias and fairness in large language models: A survey. Computational Linguistics, 50, 1097–1179.
Gao, J., Guo, Y., Lim, G., Zhang, T., Zhang, Z., Li, T. J.-J., and Perrault, S. T. 2024. CollabCoder: A lower-barrier, rigorous workflow for inductive collaborative qualitative analysis with large language models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Article 11, pp. 1–29). ACM. https://doi.org/10.1145/3613904.3642002
Gibbs, G. R. 2018. The nature of qualitative analysis. In Analyzing qualitative data (2nd ed., pp. 1–16).
Giray, L. 2023. Prompt engineering with ChatGPT: A guide for academic writers. Annals of Biomedical Engineering, 51, 2629–2633.
Greene, J. A., Bernacki, M. L., and Hadwin, A. F. 2023. Self-regulation. In Handbook of educational psychology (pp. 314–334).
Guest, G., Bunce, A., and Johnson, L. 2006. How many interviews are enough? Field Methods, 18, 59–82.
Hidi, S., and Renninger, K. A. 2006. The four-phase model of interest development. Educational Psychologist, 41, 111–127.
Hutt, S., Ocumpaugh, J., Ma, J., Andres, A. L., Bosch, N., Paquette, L., Biswas, G., and Baker, R. S. 2021. Investigating SMART models of self-regulation and their impact on learning. In Proceedings of the 14th International Conference on Educational Data Mining (pp. 580–587).
Hutt, S., Baker, R. S., Ocumpaugh, J., Munshi, A., Andres, J., Karumbaiah, S., Slater, S., Biswas, G., Paquette, L., Bosch, N., and van Velsen, M. 2022. Quick red fox: An app supporting a new paradigm in qualitative research on AIED for STEM. In Artificial intelligence in STEM education (pp. 319–332).
Irgens, G. A., Adisa, I. O., Sistla, D., Famaye, T., Bailey, C., Behboudi, A., and Adefisayo, A. O. 2024. Supporting theory building in design-based research through large-scale data-based models. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 296–303).
Jiang, Y., Wang, W., and Xu, Y. 2025. Collaborative coding and debriefing with GPT-4o: Enhancing analytic rigor through dialogue. In A. Barany, R. S. Baker, A. Katz, & J. Lin (Eds.), From data to discovery: LLMs for qualitative analysis in education (LAK ’25 Workshop). Dublin, Ireland.
Katz, A., Gerhardt, M., and Soledad, M. 2024. Using generative text models to create qualitative codebooks for student evaluations of teaching. International Journal of Qualitative Methods, 23. https://doi.org/10.1177/16094069241293283
King, E. C., Benson, M., Raysor, S., Holme, T. A., Sewall, J., Koedinger, K. R., Aleven, V., and Yaron, D. J. 2022. The open-response chemistry cognitive assistance tutor system. Journal of Chemical Education, 99, 546–552.
Kirsten, E., Buckmann, A., Mhaidli, A., and Becker, S. 2024. Decoding complexity: Exploring human–AI concordance in qualitative coding. arXiv:2403.06607.
Koopman, B., and Zuccon, G. 2023. Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness. In Proceedings of EMNLP 2023 (pp. 15012–15022). ACL.
Lane, H. C., Gadbury, M., Ginger, J., Yi, S., Comins, N., Henhapl, J., and Rivera-Rogers, A. 2022. Triggering STEM interest with Minecraft in a hybrid summer camp. Technology, Mind, and Behavior, 3(4), 580–597.
Linnenbrink-Garcia, L., Durik, A. M., Conley, A. M. M., Barron, K. E., Tauer, J. M., Karabenick, S. A., and Harackiewicz, J. M. 2010. Measuring situational interest in academic domains. Educational and Psychological Measurement, 70, 647–671.
Liu, X., Zhang, J., Barany, A., Pankiewicz, M., and Baker, R. S. 2024. Assessing the potential and limits of large language models in qualitative coding. In Advances in Quantitative Ethnography (pp. 89–103). Springer.
Liu, X., Zambrano, A. F., Baker, R. S., Barany, A., Ocumpaugh, J., Zhang, J., Pankiewicz, M., Nasiar, N., and Wei, Z. 2025. Qualitative coding with GPT-4: Where it works better. Journal of Learning Analytics, 12, 169–185.
López-Fierro, S., Shehzad, U., Zandi, A. S., Clarke-Midura, J., and Recker, M. 2025. Streamlining field note analysis: Leveraging GPT for further insights. Paper presented at the American Educational Research Association (AERA) Annual Meeting, Denver, CO.
McLaren, B. M., Lim, S.-J., Gagnon, F., Yaron, D., and Koedinger, K. R. 2006. Studying the effects of personalized language and worked examples. In Intelligent Tutoring Systems (pp. 318–328). Springer.
Modi, A., Veerubhotla, A. S., Rysbek, A., Huber, A., Wiltshire, B., Veprek, B., Gillick, D., Kasenberg, D., Ahmed, D., Jurenka, I., Cohan, J., She, J., Wilkowski, J., Alarakeyia, K., McKee, K. R., Wang, L., Kunesch, M., Schaeckermann, M., Pîslar, M., … and Assael, Y. 2024. LearnLM: Improving Gemini for learning. CoRR, abs/2412.16429.
Morgan, D. L. 2023. Exploring the use of artificial intelligence for qualitative data analysis. International Journal of Qualitative Methods, 22. https://doi.org/10.1177/16094069231211248
Mu, Y., Wu, B. P., Thorne, W., Robinson, A., Aletras, N., Scarton, C., Bontcheva, K., and Song, X. 2024. Navigating prompt complexity for zero-shot classification. In Proceedings of LREC–COLING 2024 (pp. 12074–12086).
Nguyen, H., Nguyen, V., Ludovise, S., and Santagata, R. 2025. Misrepresentation or inclusion: Promises of generative AI in climate change education. Learning, Media and Technology, 50, 393–409.
Ohmoto, Y., Shimojo, S., Morita, J., and Hayashi, Y. 2024. Estimation of ICAP states based on interaction data. Journal of Educational Data Mining, 16, 149–176.
Panadero, E. 2017. A review of self-regulated learning. Frontiers in Psychology, 8.
Peters, U., and Chin-Yee, B. 2025. Generalization bias in large language model summarization. Royal Society Open Science, 12, 241776.
Ramanathan, S., Lim, L.-A., Mottaghi, N. R., and Buckingham Shum, S. 2025. When the prompt becomes the codebook. In Proceedings of the 15th Learning Analytics and Knowledge Conference (LAK ’25) (pp. 713–725). ACM.
Rebedea, T., Dinu, R., Sreedhar, M. N., Parisien, C., and Cohen, J. 2023. NeMo Guardrails. In Proceedings of EMNLP 2023: System Demonstrations (pp. 431–445). ACL.
Renninger, K. A. 2009. Interest and identity development. Educational Psychologist, 44, 105–118.
Renninger, K. A., and Hidi, S. E. 2020. To level the playing field, develop interest. Policy Insights from the Behavioral and Brain Sciences, 7, 10–18.
Ruijten-Dodoiu, P. 2025. Collaborating with ChatGPT: Iterative thematic analysis. In A. Barany et al. (Eds.), From data to discovery: LLMs for qualitative analysis in education (LAK ’25 Workshop). Dublin, Ireland.
Rupp, A. A., Levy, R., DiCerbo, K. E., Sweet, S. J., Crawford, A. V., Caliço, T., Benson, M., Fay, D., Kunze, K. L., Mislevy, R. J., and Behrens, J. T. 2012. Putting ECD into practice. Journal of Educational Data Mining, 4, 49–110.
Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., and Chadha, A. 2024. A systematic survey of prompt engineering. CoRR, abs/2402.07927.
Saldaña, J. 2021. The coding manual for qualitative researchers. Sage.
Schäfer, K., Murray, J., and Tonya, B. 2025. Glows and grows. In A. Barany et al. (Eds.), From data to discovery (LAK ’25 Workshop). Dublin, Ireland.
Shaffer, D. W., and Ruis, A. R. 2021. How we code. In Advances in quantitative ethnography (pp. 62–77). Springer.
Shaffer, D. W., Collier, W., and Ruis, A. R. 2016. A tutorial on epistemic network analysis. Journal of Learning Analytics, 3, 9–45.
Shi, F., Chen, X., Misra, K., Scales, N., Dohan, D., Chi, E. H., Schärli, N., and Zhou, D. 2023. Large language models can be easily distracted. In Proceedings of ICML 2023 (pp. 31210–31227).
Simon, S., Sankaranarayanan, S., Tajik, E., Borchers, C., Shahrokhian, B., Balzan, F., Strauß, S., Viswanathan, S. A., Ataş, A. H., Čarapina, M., Liang, L., and Celik, B. 2025. Comparing a human’s and a multi-agent system’s thematic analysis. In Artificial Intelligence in Education (pp. 60–73). Springer.
Strauss, A. L. 1987. Qualitative analysis for social scientists. Cambridge University Press.
Tai, Y. C., Patni, K. N., Hemauer, N., Desmarais, B., and Lin, Y.-R. 2025. GenAI vs. human fact-checkers. In Proceedings of the 17th ACM Web Science Conference (WebSci ’25) (pp. 516–521). ACM.
Wang, Y., Song, W., Tao, W., Liotta, A., Yang, D., Li, X., Gao, S., Sun, Y., Ge, W., and Zhang, W. 2022. A systematic review on affective computing. Information Fusion, 83, 19–52.
Wei, Z., Nasiar, N., Zambrano, A. F., Liu, X., Ocumpaugh, J., Barany, A., Baker, R. S., and Giordano, C. 2025. Exploring students’ interest-driven patterns. In Proceedings of the 19th International Conference of the Learning Sciences (ICLS ’25) (pp. 386–394).
Weston, C., Gandell, T., Beauchamp, J., McAlpine, L., Wiseman, C., and Beauchamp, C. 2001. Analyzing interview data. Qualitative Sociology, 24, 381–400.
Winne, P. H., and Hadwin, A. F. 1998. Studying as self-regulated learning. In Metacognition in educational theory and practice (pp. 277–304).
Zambrano, A. F., Liu, X., Barany, A., Baker, R. S., Kim, J., and Nasiar, N. 2023. From nCoder to ChatGPT. In International Conference on Quantitative Ethnography (pp. 470–485). Springer.
Zhang, J., Borchers, C., and Barany, A. 2024a. Studying the interplay of self-regulated learning cycles. In Advances in Quantitative Ethnography (pp. 231–246). Springer.
Zhang, J., Borchers, C., Aleven, V., and Baker, R. S. 2024b. Using large language models to detect self-regulated learning. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 157–168).
Zhang, J., Andres, J. M. A. L., Hutt, S., Baker, R. S., Ocumpaugh, J., Mills, C., Brooks, J., Sethuraman, S., and Young, T. 2022. Detecting SMART model cognitive operations. In Proceedings of the International Conference on Educational Data Mining (pp. 75–85).
Zhou, Y., and Paquette, L. 2024. Investigating student interest in a Minecraft environment. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 396–404).
Zimmerman, B. J. 2000. Attaining self-regulation. In Handbook of self-regulation (pp. 13–39). Academic Press.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- The Author retains copyright in the Work, where the term “Work” shall include all digital objects that may result in subsequent electronic publication or distribution.
- Upon acceptance of the Work, the author shall grant to the Publisher the right of first publication of the Work.
- The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons 4.0 License (Attribution-Noncommercial-No Derivatives 4.0 International), or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:
- Attribution—other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
- Noncommercial—other users (including Publisher) may not use this Work for commercial purposes;
- No Derivative Works—other users (including Publisher) may not alter, transform, or build upon this Work, with the understanding that any of the above conditions can be waived with permission from the Author and that where the Work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
- The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post online a pre-publication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access). Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
- Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
- The Author represents and warrants that:
- the Work is the Author’s original work;
- the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
- the Work is not pending review or under consideration by another publisher;
- the Work has not previously been published;
- the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
- the Work contains no libel, invasion of privacy, or other unlawful matter.
- The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties contained in Paragraph 6 above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.