Assessing the Performance of Online Students - New Data, New Approaches, Improved Accuracy
We consider the problem of assessing the changing performance levels of individual students as they go
through online courses. This student performance modeling problem is a critical step toward building adaptive
online teaching systems. Specifically, we study how to use large amounts of log data of various types
from earlier students to train accurate machine learning models that predict the performance of
future students. This study is the first to use four very large, recently released student datasets
from four distinct intelligent tutoring systems.
Our results include a new machine learning approach that defines a new state of the art for
logistic-regression-based student performance modeling, improving over earlier methods in three ways. First, we
improve the accuracy of student modeling by introducing new features that can be easily computed
from conventional question-response logs (e.g., the pattern in the student's most recent answers).
Second, we take advantage of features of the student history that go beyond question-response
pairs (e.g., which video segments the student watched or skipped), as well as background
information about the prerequisite structure of the curriculum. Third, we train multiple specialized student
performance models for different aspects of the curriculum (e.g., specializing in early versus later segments
of the student history), then combine these specialized models to create a group prediction of student
performance. Taken together, these innovations yield an average AUC score of 0.808 across the four datasets,
compared to 0.767 for the previous best logistic regression approach, and they also outperform
state-of-the-art deep neural network approaches. Importantly, we observe consistent improvements from each of
our three methodological innovations in each of the diverse datasets, suggesting that our methods are of general
utility and likely to produce improvements for other online tutoring systems as well.
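As a rough illustration of the first and third ideas, the Python sketch below encodes the pattern of a student's most recent answers as a one-hot feature vector, scores it with a logistic regression model, and combines several specialized models into one group prediction. Everything here is an assumption made for illustration, not the method described above: the function names, the window size k = 3, the padding symbol for short histories, the hand-set weights (in practice they would be learned from earlier students' logs), and the rule of averaging the specialized models' probabilities.

```python
import math
from typing import Sequence


def recent_pattern_features(answers: Sequence[int], k: int = 3) -> list[float]:
    """One-hot encode the pattern of the student's k most recent answers.

    answers: past responses in chronological order (1 = correct, 0 = incorrect).
    Histories shorter than k are left-padded with the symbol 2 ("no answer"),
    so each padded pattern is a length-k string over {0, 1, 2} (an assumption).
    """
    recent = list(answers[-k:])
    padded = [2] * (k - len(recent)) + recent
    # Read the padded pattern as a base-3 number to pick one of 3**k slots.
    index = 0
    for a in padded:
        index = index * 3 + a
    features = [0.0] * (3 ** k)
    features[index] = 1.0
    return features


def logistic_predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of a correct next answer under a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def combined_prediction(models: Sequence[tuple[Sequence[float], float]], x: Sequence[float]) -> float:
    """Group prediction: average the probabilities of several specialized models.

    Averaging is a hypothetical combination rule chosen for this sketch.
    """
    probs = [logistic_predict(w, b, x) for w, b in models]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    x = recent_pattern_features([0, 0, 1, 1, 1])  # last three answers: 1, 1, 1
    early_model = ([0.0] * 27, math.log(3))       # hypothetical "early history" model
    late_model = ([0.0] * 27, -math.log(3))       # hypothetical "later history" model
    print(x.index(1.0), combined_prediction([early_model, late_model], x))
```

Because the pattern feature is a one-hot vector, the logistic model can learn a separate weight for every recent-answer pattern (for example, "wrong, wrong, right" versus "right, right, wrong"), which a single running accuracy count cannot distinguish.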
Keywords: performance modeling, knowledge tracing, logistic regression, deep learning, features
