The NAEP EDM Competition required participants to predict efficient test-taking behavior from log data. This paper describes our top-down approach of engineering features through psychometric modeling as input to machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Response Time Model to estimate latent person speed and the Generalized Partial Credit Model to estimate latent person ability. Additionally, we adopted an n-gram feature approach for event sequences. Furthermore, instead of using the provided binary target label, we distinguished inefficient test takers who were going too fast from those who were going too slow, training a multi-label classifier. Our best-performing ensemble classifier comprised three sets of low-dimensional classifiers, dominated by test-taker speed. While our classifier reached only moderate performance relative to the competition leaderboard, our approach makes two important contributions. First, we show how classifiers built on features engineered from literature-derived domain knowledge can provide meaningful predictions when results are contextualized for test administrators who wish to intervene or take action. Second, our re-engineering of test scores enabled us to incorporate person ability into the models. However, ability was hardly predictive of efficient behavior, leading us to question the validity of the target label. Beyond competition-related findings, we also report a state sequence analysis demonstrating the viability of the employed tools. The latter yielded four test-taking types that captured distinctive differences between test takers, with relevant implications for assessment practice.
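To illustrate the n-gram feature approach mentioned above, consecutive log events can be counted as unigrams and bigrams per test taker. This is a minimal sketch, not the authors' implementation (which was done in R); the function name and event labels are hypothetical:

```python
from collections import Counter

def ngram_features(events, n_values=(1, 2)):
    """Count n-grams of consecutive events in one test taker's log sequence."""
    counts = Counter()
    for n in n_values:
        for i in range(len(events) - n + 1):
            # Join n consecutive event labels into one feature name
            counts["_".join(events[i:i + n])] += 1
    return counts

# Hypothetical event sequence for one test taker
seq = ["EnterItem", "ClickChoice", "ClickChoice", "ExitItem"]
features = ngram_features(seq)
# e.g., features["ClickChoice"] counts single events,
# features["ClickChoice_ClickChoice"] counts the repeated-click bigram
```

The resulting sparse counts can then be used as columns in the classifier's feature matrix, one row per test taker.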
Keywords: log files, psychometric models, domain knowledge–based feature engineering, process data, state sequence analysis, clustering, latent state, ensemble
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.