The Big Data for Education Spoke of the NSF Northeast Big Data Innovation Hub and ETS co-sponsored an educational data mining competition in which contestants were asked to predict efficient time use on the NAEP 8th grade mathematics computer-based assessment from the log file of a student’s actions on a prior portion of the assessment. In this work, process mining and expert feature engineering were combined to build a large set of features, on which an Extreme Gradient Boosting model was trained to classify students by whether they would use their time efficiently. During the competition, predictions were evaluated on one half of a hidden data set; final results were based on the other half. The approach used here earned the top score in the competition. This paper elaborates on the combined technique for analyzing computer-based assessment log-file data, in the hope that the approach will offer valuable insights for future predictive model building in educational data mining.
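To make the idea of combining expert features with process-mining features concrete, the following is a minimal sketch, not the competition-winning implementation. It assumes a hypothetical log schema of (timestamp in seconds, item id, action) tuples; the function name `extract_features` and all feature names are invented for illustration. It derives simple timing totals together with directly-follows counts between consecutive actions, a basic process-mining abstraction.

```python
from collections import Counter


def extract_features(events):
    """Turn one student's event log into a flat feature dict.

    `events` is a time-sorted list of (timestamp_seconds, item_id, action)
    tuples. This schema is hypothetical, not the NAEP log format.
    """
    features = Counter()

    # Expert-style features: overall activity and elapsed time.
    features["n_events"] = len(events)
    if events:
        features["total_time"] = events[-1][0] - events[0][0]

    # Time attributed to an item: gap between an event and the next event.
    for (t0, item, _), (t1, _, _) in zip(events, events[1:]):
        features[f"time_on_{item}"] += t1 - t0

    # Process-mining-style features: directly-follows counts between actions.
    for (_, _, a0), (_, _, a1) in zip(events, events[1:]):
        features[f"follows_{a0}>{a1}"] += 1

    return dict(features)


# Toy log for a single student (invented data).
log = [
    (0, "Q1", "enter"),
    (30, "Q1", "answer"),
    (40, "Q2", "enter"),
    (100, "Q2", "answer"),
    (105, "Q1", "enter"),
]
feats = extract_features(log)
print(feats)
```

One such feature dictionary per student could then be assembled into a table and passed to a gradient boosting classifier; that step is omitted here to keep the sketch free of third-party dependencies.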
Keywords: process mining, educational data mining, computer-based assessment, extreme gradient boosting
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.