Towards Interpretable Automated Machine Learning for STEM Career Prediction



Published Aug 22, 2020
Ruitao Liu, Aixin Tan


In this paper, we describe our solution for predicting student STEM career choices in the 2017 ASSISTments Datamining Competition. We built a machine learning system that automatically reformats the data set, generates new features, prunes redundant ones, and performs model and feature selection. Although the system is designed to search automatically for the model that optimizes prediction performance, the final model is a simple logistic regression, which allows researchers to identify important features and study their effects on STEM career choices. A comparison with other methods revealed that the key to good prediction is proper feature enrichment at the beginning of the data analysis, while feature selection at a later stage allows a simpler final model.
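
The central claim of the abstract, that feature enrichment drives prediction quality while later selection keeps the final model simple, can be illustrated with a minimal sketch. This is not the authors' system: it replaces their forward-backward search with plain proximal gradient descent on an L1-penalized logistic regression, uses pairwise products as a stand-in for feature generation, and runs on made-up toy data. All function names (`enrich`, `fit_l1_logistic`, `selected`, `accuracy`) and the values of `lam`, `lr`, and `n_iter` are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def enrich(rows):
    """Append all pairwise products of the raw features to each row."""
    return [
        list(r) + [r[i] * r[j] for i, j in combinations(range(len(r)), 2)]
        for r in rows
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    The intercept is not penalized. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        errs = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        grad_b = sum(errs) / n
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def selected(w):
    """Indices of features whose weight survived the penalty."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


def accuracy(X, y, w, b):
    """Fraction of rows classified correctly at a 0.5 probability cutoff."""
    hits = sum(
        (sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) > 0.5) == (yi == 1)
        for row, yi in zip(X, y)
    )
    return hits / len(y)


# Toy data (made up): the label depends only on the interaction a * b.
levels = [-2.0, -1.0, 1.0, 2.0]
raw = [[a, b] for a in levels for b in levels]
labels = [1 if a * b > 0 else 0 for a, b in raw]

w_raw, b_raw = fit_l1_logistic(raw, labels, lam=0.05)
X_rich = enrich(raw)  # columns: a, b, a*b
w_rich, b_rich = fit_l1_logistic(X_rich, labels, lam=0.05)

print("raw features kept:     ", selected(w_raw))
print("enriched features kept:", selected(w_rich))
print("enriched accuracy:     ", accuracy(X_rich, labels, w_rich, b_rich))
```

On this toy data the raw features alone carry no usable signal, so the penalty leaves nothing selected. After enrichment only the product column survives, which leaves a one-term logistic model whose coefficient can be read directly.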

Liu, R., & Tan, A. (2020). Towards Interpretable Automated Machine Learning for STEM Career Prediction. Journal of Educational Data Mining, 12(2), 19–32.
Keywords: STEM careers, automated prediction, penalized logistic regression, forward-backward search algorithm, interpretable machine learning

Special Issue on ASSISTments Longitudinal Data