Identifying Key Features of Student Performance in Educational Video Games and Simulations through Cluster Analysis



Published Oct 1, 2012
Deirdre Kerr, Gregory K.W.K. Chung


The assessment cycle of evidence-centered design (ECD) provides a framework for treating an educational video game or simulation as an assessment. One of the main steps in the assessment cycle of ECD is the identification of the key features of student performance. While this process is relatively simple for multiple-choice tests, it becomes one of the most serious bottlenecks facing researchers interested in implementing ECD when it is applied to log data from educational video games or simulations. In this paper we examine the utility of cluster analysis as a method of identifying key features of student performance in such log data. In our study, cluster analysis consistently identified key features of student performance across levels, in the form of solution strategies and error patterns. These features contained few extraneous actions and explained a sufficient amount of the data.

How to Cite

Kerr, D., & Chung, G. K. (2012). Identifying Key Features of Student Performance in Educational Video Games and Simulations through Cluster Analysis. Journal of Educational Data Mining, 4(1), 144–182.



evidence-centered design, cluster analysis, fuzzy cluster analysis, feature cluster analysis, log data, educational video games, key features of student performance, student strategies
