Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI‑Generated and Human Data

Published January 14, 2026
Jihong Zhang, Xinya Liang, Anqi Deng, Nicole Bonge, Lin Tan, Ling Zhang, Nicole Zarrett

Abstract

Mixed methods research integrates quantitative and qualitative data but faces challenges in aligning their distinct structures, particularly when examining measurement characteristics and individual response patterns. Advances in large language models (LLMs) offer a promising solution: generating synthetic survey responses informed by qualitative data. This study investigates whether LLMs, guided by personal interviews, can reliably predict human survey responses, using the Behavioral Regulations in Exercise Questionnaire (BREQ) and interviews with after-school program staff as a case study. Results indicate that LLMs capture overall response patterns but exhibit lower variability than humans. Incorporating interview data improves response diversity for some models (e.g., Claude, GPT), while well-crafted prompts and low-temperature settings enhance alignment between LLM and human responses. Demographic information has less impact on alignment accuracy than interview content. Item-level analysis reveals larger discrepancies for negatively worded items, suggesting that LLMs struggle with emotional nuance. Person-level differences indicate that model performance varies across respondents, highlighting the role of interview relevance over interview length. Although the LLMs replicate individual item trends, they falter in reconstructing the instrument's psychometric structure. These findings underscore the potential of interview-informed LLMs to bridge qualitative and quantitative methodologies while revealing limitations in response variability, emotional interpretation, and psychometric fidelity. Future research should refine prompt design, explore bias mitigation, and optimize model settings to enhance the validity of LLM-generated survey data in social science research. The R code and supplementary materials are available on the OSF platform (DOI: 10.17605/OSF.IO/AFQG3).
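To make the pipeline concrete, the sketch below illustrates the kind of interview-informed simulation the abstract describes: one low-temperature LLM call per respondent, prompted with that respondent's interview transcript and demographics, returning Likert answers whose spread can then be compared with the human answers. This is a minimal illustrative Python sketch, not the authors' released R code (available on OSF); the openai client, the model name, the prompt wording, and the sample items are all assumptions.

# Minimal illustrative sketch (Python), not the authors' released R code.
# Assumed: the `openai` client library, an OPENAI_API_KEY in the environment,
# and placeholder model name, prompt wording, and survey items.
import re
import statistics

from openai import OpenAI

client = OpenAI()

ITEMS = [  # placeholder items written in the style of BREQ statements
    "I exercise because other people say I should.",  # negatively worded
    "I value the benefits of physical activity.",
    "I feel guilty when I do not exercise.",
]

def simulate_respondent(interview: str, demographics: str) -> list[int]:
    """One low-temperature LLM call that answers every item in persona."""
    prompt = (
        "Role-play the survey respondent described below, answering only "
        "from what the interview supports.\n\n"
        f"Demographics: {demographics}\n\n"
        f"Interview transcript:\n{interview}\n\n"
        "For each item, reply with one integer per line, from 0 (not true "
        "for me) to 4 (very true for me), and nothing else:\n"
        + "\n".join(f"- {item}" for item in ITEMS)
    )
    reply = client.chat.completions.create(
        model="gpt-4o",    # illustrative; the study compared several models
        temperature=0.2,   # low temperature improved human-LLM alignment
        messages=[{"role": "user", "content": prompt}],
    )
    text = reply.choices[0].message.content
    # Simple (and deliberately strict) parsing: one 0-4 digit per line.
    return [int(d) for d in re.findall(r"^\s*([0-4])\s*$", text, flags=re.M)]

if __name__ == "__main__":
    # Compare per-respondent spread: the study found LLM answers vary less
    # than human answers. File name and human scores are placeholders.
    llm = simulate_respondent(open("interview_01.txt").read(), "program staff, age 24")
    human = [0, 4, 3]
    print("LLM SD:", statistics.stdev(llm), "| human SD:", statistics.stdev(human))

In a design like this, the temperature setting and the inclusion or exclusion of the interview text and demographics become experimental conditions, mirroring the comparisons summarized in the abstract.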

How to Cite

Zhang, J., Liang, X., Deng, A., Bonge, N., Tan, L., Zhang, L., & Zarrett, N. (2026). Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data. Journal of Educational Data Mining, 18(1), 1–24. https://doi.org/10.5281/zenodo.18733538

Details

Keywords

quantitative data, qualitative data, LLM-driven interview, survey, behavioral regulations in exercise

Section
Special Section: Human-AI Partnership for Qualitative Analysis