Predicting Perceived Text Complexity: The Role of Person-Related Features in Profile-Based Models


Published Jun 2, 2025
Boris Thome, Friederike Hertweck, Stefan Conrad

Abstract

Text complexity is inherently subjective, as it is not solely determined by linguistic properties but also shaped by the reader’s perception. Factors such as prior knowledge, language proficiency, and cognitive abilities influence how individuals assess the difficulty of a text. Existing methods for measuring text complexity commonly rely on quantitative linguistic features and ignore differences in the readers' backgrounds. In this paper, we evaluate several machine learning models that determine the complexity of texts as perceived by teenagers in high school prior to deciding on their post-secondary pathways. We collected and publicly released a dataset from German schools, where 193 students with diverse demographic backgrounds, school grades, and language abilities annotated a total of 3,954 German sentences. The text corpus is based on official study guides authored by German governmental authorities. In contrast to existing methods of determining text complexity, we build a model that is specialized to behave like the target audience, thereby accounting for the diverse backgrounds of the readers. The annotations indicate that students generally perceived the texts as significantly simpler than suggested by the Flesch-Reading-Ease score. We show that K-Nearest-Neighbors, Multilayer Perceptron, and ensemble models perform well in predicting the subjectively perceived text complexity. Furthermore, SHapley Additive exPlanation (SHAP) values reveal that these perceptions depend not only on the text's linguistic features but also on the students' mother tongue, gender, and self-estimation of German language skills. We also implement role-play prompting with ChatGPT and Claude and show that state-of-the-art large language models have difficulty accurately assessing perceived text complexity from a student's perspective.
This work thereby contributes to the growing field of adjusting text complexity to the needs of the target audience by going beyond quantitative linguistic features. We have made the collected dataset publicly available at https://github.com/boshl/studentannotations.
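The abstract compares student annotations against the Flesch-Reading-Ease score. For German text, the standard variant is Amstad's (1978) adaptation, FRE = 180 − ASL − 58.5 × ASW, where ASL is the average sentence length in words and ASW the average number of syllables per word. A minimal sketch of this computation follows; the vowel-group syllable counter is a common approximation, not the paper's own implementation.

```python
import re

def flesch_reading_ease_de(text: str) -> float:
    """German Flesch-Reading-Ease after Amstad (1978):
    FRE = 180 - ASL - 58.5 * ASW."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)

    def count_syllables(word: str) -> int:
        # Approximate syllables by counting groups of (German) vowels.
        return max(1, len(re.findall(r"[aeiouyäöü]+", word.lower())))

    asl = len(words) / len(sentences)                      # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return 180.0 - asl - 58.5 * asw
```

Higher scores indicate easier text; the paper's finding is that students rated the sentences as simpler than this formula suggests.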
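The role-play prompting setup described in the abstract asks an LLM to adopt a student's profile before rating a sentence. The prompt wording, profile fields, and rating scale below are purely illustrative assumptions; the paper's actual prompts may differ.

```python
def role_play_prompt(sentence: str, profile: dict) -> str:
    """Build a hypothetical role-play prompt that asks an LLM to rate
    perceived complexity from a specific student's perspective.
    All field names and the 1-7 scale are illustrative."""
    return (
        f"You are a {profile['age']}-year-old German high-school student. "
        f"Your mother tongue is {profile['mother_tongue']} and you rate "
        f"your German skills as {profile['german_skills']}.\n"
        f"On a scale from 1 (very easy) to 7 (very hard), how complex is "
        f"the following sentence for you?\n"
        f"Sentence: {sentence}\n"
        f"Answer with a single number."
    )
```

Conditioning the model on person-related features in this way mirrors the paper's profile-based approach, although the authors report that current LLMs still struggle to match the students' actual ratings.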

How to Cite

Thome, B., Hertweck, F., & Conrad, S. (2025). Predicting Perceived Text Complexity: The Role of Person-Related Features in Profile-Based Models. Journal of Educational Data Mining, 17(1), 276–307. https://doi.org/10.5281/zenodo.15575437


Keywords

text complexity, prompt engineering, profile-based modeling, education, dataset, readability

References
Al-Thanyyan, S. S. and Azmi, A. M. 2021. Automated text simplification: A survey. ACM Computing Surveys 54, 2, 1–36.

Amstad, T. 1978. Wie verständlich sind unsere Zeitungen? Studenten-Schreib-Service.

Anderson, J. 1983. Lix and Rix: Variations on a little-known readability index. Journal of Reading 26, 6, 490–496.

Arps, D., Kels, J., Krämer, F., Renz, Y., Stodden, R., and Petersen, W. 2022. HHUplexity at text complexity DE challenge 2022. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, S. Möller, S. Mohtaj, and B. Naderi, Eds. Association for Computational Linguistics, Potsdam, Germany, 27–32.

Bar-Haim, R., Eden, L., Friedman, R., Kantor, Y., Lahav, D., and Slonim, N. 2020. From arguments to key points: Towards automatic argument summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds. Association for Computational Linguistics, Online, 4029–4039.

Bast, H. and Korzen, C. 2017. A benchmark and evaluation for text extraction from PDF. In 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, 1–10.

Benedetto, L., Aradelli, G., Donvito, A., Lucchetti, A., Cappelli, A., and Buttery, P. 2024. Using LLMs to simulate students’ responses to exam questions. In Findings of the Association for Computational Linguistics: EMNLP 2024, Y. Al-Onaizan, M. Bansal, and Y.-N. Chen, Eds. Association for Computational Linguistics, Miami, Florida, USA, 11351–11368.

Bock, K. H. 1974. Studien- und Berufswahl - Entscheidungshilfen für Abiturienten und Absolventen der Fachoberschulen. Number 1. Verlag Karl Heinrich Bock.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901.

Chen, T. and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16. Association for Computing Machinery, New York, NY, USA, 785–794.

Cooper, K. M., Krieg, A., and Brownell, S. E. 2018. Who perceives they are smarter? Exploring the influence of student characteristics on student academic self-concept in physiology. Advances in Physiology Education 42, 2, 200–208.

Cortes, C. and Vapnik, V. 1995. Support-vector networks. Machine Learning 20, 273–297.

Cover, T. and Hart, P. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 1, 21–27.

Dahl, A. C., Carlson, S. E., Renken, M., McCarthy, K. S., and Reynolds, E. 2021. Materials matter: An exploration of text complexity and its effects on middle school readers’ comprehension processing. Language, Speech, and Hearing Services in Schools 52, 2, 702–716.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.

Dietterich, T. G. 2000. Ensemble methods in machine learning. In Multiple Classifier Systems, J. Kittler and F. Roli, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–15.

Dunlosky, J. and Metcalfe, J. 2008. Metacognition. Sage Publications.

Espinosa-Zaragoza, I., Abreu-Salas, J., Lloret, E., Moreda, P., and Palomar, M. 2023. A review of research-based automatic text simplification tools. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, R. Mitkov and G. Angelova, Eds. INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 321–330.

Flesch, R. 1948. A new readability yardstick. Journal of Applied Psychology 32, 3, 221.

Fulmer, S. M., D’Mello, S. K., Strain, A., and Graesser, A. C. 2015. Interest-based text preference moderates the effect of text difficulty on engagement and learning. Contemporary Educational Psychology 41, 98–110.

Galton, F. 1886. Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland 15, 246–263.

Gilardi, F., Alizadeh, M., and Kubli, M. 2023. ChatGPT outperforms crowd-workers for text-annotation tasks. Proceedings of the National Academy of Sciences 120, 30 (July), e2305016120.

Gooding, S. and Tragut, M. 2022. One size does not fit all: The case for personalised word complexity models. In Findings of the Association for Computational Linguistics: NAACL 2022, M. Carpuat, M.-C. de Marneffe, and I. V. Meza Ruiz, Eds. Association for Computational Linguistics, Seattle, United States, 353–365.

Graesser, A. C., McNamara, D. S., Louwerse, M. M., and Cai, Z. 2004. Coh-metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers 36, 2, 193–202.

Gunning, R. 1952. The Technique of Clear Writing. McGraw-Hill.

Hertweck, F., Jonas, L., Thome, B., and Yasar, S. 2024. RWI-UNI-SUBJECTS: Complete records of all subjects across German HEIs (1971 - 1996). Tech. rep., RWI – Leibniz Institute for Economic Research.

Hu, B., Zhu, J., Pei, Y., and Gu, X. 2025. Exploring the potential of LLM to enhance teaching plans through teaching simulation. npj Science of Learning 10, 1, 7.

Jindal, P. and MacDermid, J. C. 2017. Assessing reading levels of health information: Uses and limitations of the Flesch formula. Education for Health 30, 1, 84–88.

Kahneman, D. 1973. Attention and effort. Prentice-Hall, Englewood Cliffs.

Kleinnijenhuis, J. 1991. Newspaper complexity and the knowledge gap. European Journal of Communication 6, 4, 499–522.

Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun, R., Zhou, X., Wang, E., and Dong, X. 2024. Better zero-shot reasoning with role-play prompting. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gomez, and S. Bethard, Eds. Association for Computational Linguistics, Mexico City, Mexico, 4099–4113.

Lee, B. W. and Lee, J. 2023. Prompt-based learning for text readability assessment. In Findings of the Association for Computational Linguistics: EACL 2023, A. Vlachos and I. Augenstein, Eds. Association for Computational Linguistics, Dubrovnik, Croatia, 1819–1824.

Lee, M., Gero, K. I., Chung, J. J. Y., Shum, S. B., Raheja, V., Shen, H., Venugopalan, S., Wambsganss, T., Zhou, D., Alghamdi, E. A., et al. 2024. A design space for intelligent and interactive writing assistants. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI ’24. Association for Computing Machinery, New York, NY, USA, 1–35.

Leroy, G., Helmreich, S., and Cowie, J. R. 2010. The influence of text characteristics on perceived and actual difficulty of health information. International Journal of Medical Informatics 79, 6, 438–449.

Lundberg, S. M. and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Vol. 30. Curran Associates, Inc., Red Hook, NY, USA, 4768–4777.

Marginson, S. 2016. The worldwide trend to high participation higher education: Dynamics of social stratification in inclusive systems. Higher Education 72, 413–434.

Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., and Zettlemoyer, L. 2022. Rethinking the role of demonstrations: What makes in-context learning work? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Y. Goldberg, Z. Kozareva, and Y. Zhang, Eds. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 11048–11064.

Mohtaj, S., Naderi, B., and Möller, S. 2022. Overview of the GermEval 2022 shared task on text complexity assessment of German text. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, S. Möller, S. Mohtaj, and B. Naderi, Eds. Association for Computational Linguistics, Potsdam, Germany, 1–9.

Mosquera, A. 2022. Tackling data drift with adversarial validation: An application for German text complexity estimation. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, S. Möller, S. Mohtaj, and B. Naderi, Eds. Association for Computational Linguistics, Potsdam, Germany, 39–44.

Naderi, B., Mohtaj, S., Ensikat, K., and Möller, S. 2019. Subjective assessment of text complexity: A dataset for German language. arXiv preprint arXiv:1904.07733.

Napolitano, D., Sheehan, K., and Mundkowsky, R. 2015. Online readability and text complexity analysis with TextEvaluator. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, M. Gerber, C. Havasi, and F. Lacatusu, Eds. Association for Computational Linguistics, Denver, Colorado, 96–100.

Paetzold, G. and Specia, L. 2016. SemEval 2016 task 11: Complex word identification. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), S. Bethard, M. Carpuat, D. Cer, D. Jurgens, P. Nakov, and T. Zesch, Eds. Association for Computational Linguistics, San Diego, California, 560–569.

Romstadt, J., Strombach, T., and Berg, K. 2024. GraphVar – Ein Korpus für graphematische Variation (und mehr). De Gruyter, Berlin, Boston, 425–436.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323, 6088, 533–536.

Sanh, V., Debut, L., Chaumond, J., and Wolf, T. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108.

Santucci, V., Santarelli, F., Forti, L., and Spina, S. 2020. Automatic classification of text complexity. Applied Sciences 10, 20, 7285.

Seiffe, L., Kallel, F., Möller, S., Naderi, B., and Roller, R. 2022. Subjective text complexity assessment for German. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, and S. Piperidis, Eds. European Language Resources Association, Marseille, France, 707–714.

Shakil, H., Farooq, A., and Kalita, J. 2024. Abstractive text summarization: State of the art, challenges, and improvements. Neurocomputing 603, 128255.

Spencer, M., Gilmour, A. F., Miller, A. C., Emerson, A. M., Saha, N. M., and Cutting, L. E. 2019. Understanding the influence of text complexity and question type on reading outcomes. Reading and Writing 32, 603–637.

Thome, B., Hertweck, F., and Conrad, S. 2024. Determining perceived text complexity: An evaluation of German sentences through student assessments. In Proceedings of the 17th International Conference on Educational Data Mining. International Educational Data Mining Society, Atlanta, Georgia, USA, 714–721.

Tolochko, P., Song, H., and Boomgaarden, H. 2019. “That looks hard!”: Effects of objective and perceived textual complexity on factual and structural political knowledge. Political Communication 36, 4, 609–628.

Tversky, A. and Kahneman, D. 1974. Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. Science 185, 4157, 1124–1131.

Yang, Y.-H., Chu, H.-C., and Tseng, W.-T. 2021. Text difficulty in extensive reading: Reading comprehension and reading motivation. Reading in a Foreign Language 33, 1, 78–102.

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. 2023. Judging LLM-as-a-judge with MT-bench and chatbot arena. Advances in Neural Information Processing Systems 36, 46595–46623.
Section
EDM 2025 Journal Track