Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets


Published Mar 15, 2023
John Edwards, Kaden Hart, Raj Shrestha

Abstract

Analysis of programming process data has become popular in computing education research and educational
data mining in the last decade. This type of data is quantitative, often of high temporal resolution,
and it can be collected non-intrusively while the student is in a natural setting. Many levels of granularity
can be obtained, such as submission, compilation, edit, and keystroke events, with keystroke-level logs
being the most fine-grained of commonly used dataset types. However, the lack of open datasets, especially
at the keystroke level, is notable. There are several reasons for this gap, with the most prominent
being the challenges of deidentification that are peculiar to keystroke log data. In this paper, we present
the public release of two fully deidentified keystroke datasets that are the first of their kind in terms of
both event and metadata richness. We describe our collection technique and properties of the data along
with deidentification techniques that, while not fully relieving researchers of significant effort, at least
reduce and streamline manual work in hopes that researchers will release similar datasets in the future.

How to Cite

Edwards, J., Hart, K., & Shrestha, R. (2023). Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets. Journal of Educational Data Mining, 15(1), 1–31. https://doi.org/10.5281/zenodo.7646659