AutoML Feature Engineering for Student Modeling Yields High Accuracy, but Limited Interpretability

Published Aug 26, 2021
Nigel Bosch

Abstract

Automatic machine learning (AutoML) methods automate the time-consuming process of feature engineering so that researchers can produce accurate student models more quickly and easily. In this paper, we compare two AutoML feature engineering methods in the context of the National Assessment of Educational Progress (NAEP) data mining competition. The methods we compare, Featuretools and TSFRESH (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests), have rarely been applied to student interaction log data. We therefore address research questions regarding the accuracy of models built with AutoML features, how the AutoML feature types compare to each other and to expert-engineered features, and how interpretable the resulting features are. Additionally, we developed a novel feature selection method that addresses the problems of applying AutoML feature engineering in this context, where there were many heterogeneous features (over 4,000) and relatively few students. Our entry to the NAEP competition placed 3rd overall on the final held-out dataset and 1st on the public leaderboard, with a final Cohen's kappa = .212 and area under the receiver operating characteristic curve (AUC) = .665 when predicting whether students would manage their time effectively on a math assessment. We found that TSFRESH features were significantly more effective than either Featuretools features or expert-engineered features in this context; however, they were also among the most difficult features to interpret, based on a survey of six experts' judgments. Finally, we discuss the tradeoffs between effort and interpretability that arise in AutoML-based student modeling.
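
The full feature engineering pipeline is described in the paper itself; as a rough illustration of the kind of workflow the abstract summarizes, the sketch below extracts TSFRESH features from hypothetical per-student interaction logs and scores a classifier with the two competition metrics (Cohen's kappa and AUC). The file names, column names, classifier choice, and the use of tsfresh's built-in hypothesis-test filter (rather than the paper's novel feature selection method) are all assumptions made for illustration, not the NAEP data schema or the authors' exact setup.

```python
# Minimal sketch (assumed schema, not the NAEP dataset or the paper's pipeline):
# TSFRESH feature extraction on per-student event logs, tsfresh's built-in
# feature filter, and the two competition metrics.
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# Hypothetical long-format log: one row per logged event per student.
logs = pd.read_csv("student_logs.csv", parse_dates=["timestamp"])
labels = pd.read_csv("labels.csv", index_col="student_id")["efficient_time_use"]

# One row of candidate features per student, computed over each student's
# response_time series (this easily yields thousands of features).
X = extract_features(
    logs[["student_id", "timestamp", "response_time"]],
    column_id="student_id",
    column_sort="timestamp",
)
impute(X)  # replace NaN/inf values produced by some extractors

# tsfresh's built-in selection (per-feature hypothesis tests with
# Benjamini-Hochberg FDR control); the paper's own feature selection
# method is different and is not reproduced here.
y = labels.loc[X.index]
X = select_features(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = ExtraTreesClassifier(n_estimators=500, random_state=0)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("Cohen's kappa:", cohen_kappa_score(y_test, (probs > 0.5).astype(int)))
print("AUC:", roc_auc_score(y_test, probs))
```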

How to Cite

Bosch, N. (2021). AutoML Feature Engineering for Student Modeling Yields High Accuracy, but Limited Interpretability. Journal of Educational Data Mining, 13(2), 55–79. https://doi.org/10.5281/zenodo.5275314

Keywords

AutoML, feature engineering, feature selection, student modeling

References
ABADI, M., BARHAM, P., CHEN, J., ET AL. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.

ALYUZ, N., OKUR, E., GENC, U., ASLAN, S., TANRIOVER, C., AND ESME, A.A. 2017. An unobtrusive and multimodal approach for behavioral engagement detection of students. In Proceedings of the 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education. Association for Computing Machinery, New York, NY, USA, 26–32.

BAKER, B., GUPTA, O., NAIK, N., AND RASKAR, R. 2017. Designing neural network architectures using reinforcement learning. arXiv:1611.02167 [cs].

BENJAMINI, Y. AND HOCHBERG, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 1, 289–300.

BREIMAN, L. 2001. Random forests. Machine Learning 45, 1, 5–32.

BREIMAN, L., FRIEDMAN, J., STONE, C.J., AND OLSHEN, R.A. 1984. Classification and regression trees. CRC Press.

CHEN, F. AND CUI, Y. 2020. Utilizing student time series behaviour in learning management systems for early prediction of course performance. Journal of Learning Analytics 7, 2, 1–17.

CHEN, T. AND GUESTRIN, C. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, 785–794.

CHRIST, M., BRAUN, N., NEUFFER, J., AND KEMPA-LIEHR, A.W. 2018. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package). Neurocomputing 307, 72–77.

COHEN, J. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum, Hillsdale, NJ.

DANG, S.C. AND KOEDINGER, K.R. 2020. Opportunities for human-AI collaborative tools to advance development of motivation analytics. In Companion Proceedings of the 10th International Conference on Learning Analytics & Knowledge (LAK20). SoLAR, 322–329.

EYBEN, F., WÖLLMER, M., AND SCHULLER, B. 2010. openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1459–1462.

FEI, M. AND YEUNG, D.-Y. 2015. Temporal models for predicting student dropout in massive open online courses. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 256–263.

FEURER, M., EGGENSPERGER, K., FALKNER, S., LINDAUER, M., AND HUTTER, F. 2020. Auto-sklearn 2.0: The next generation. arXiv:2007.04074 [cs, stat].

FISCHER, C., PARDOS, Z.A., BAKER, R.S., ET AL. 2020. Mining big data in education: Affordances and challenges. Review of Research in Education 44, 1, 130–160.

FULCHER, B.D. AND JONES, N.S. 2017. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5, 5, 527-531.e3.

GERVET, T., KOEDINGER, K., SCHNEIDER, J., AND MITCHELL, T. 2020. When is deep learning the best approach to knowledge tracing? Journal of Educational Data Mining 12, 3, 31–54.

GEURTS, P., ERNST, D., AND WEHENKEL, L. 2006. Extremely randomized trees. Machine Learning 63, 1, 3–42.

GOSWAMI, M., MANUJA, M., AND LEEKHA, M. 2020. Towards social & engaging peer learning: Predicting backchanneling and disengagement in children. arXiv:2007.11346 [cs].

HEAD, T., MECHCODER, LOUPPE, G., ET AL. 2018. scikit-optimize/scikit-optimize: v0.5.2.

HOLLANDS, F. AND BAKIR, I. 2015. Efficiency of automated detectors of learner engagement and affect compared with traditional observation methods. New York, NY: Center for Benefit-Cost Studies of Education, Teachers College, Columbia University.

HORN, F., PACK, R., AND RIEGER, M. 2020. The autofeat Python library for automated feature engineering and selection. In Machine Learning and Knowledge Discovery in Databases, P. Cellier and K. Driessens, Eds. Springer International Publishing, Cham, CH, 111–120.

HUR, P., BOSCH, N., PAQUETTE, L., AND MERCIER, E. 2020. Harbingers of collaboration? The role of early-class behaviors in predicting collaborative problem solving. In Proceedings of the 13th International Conference on Educational Data Mining (EDM 2020). International Educational Data Mining Society, 104–114.

HUTTER, F., KOTTHOFF, L., AND VANSCHOREN, J. 2019. Automated Machine Learning: Methods, Systems, Challenges. Springer Nature, Cham, CH.

JIANG, Y., BOSCH, N., BAKER, R.S., ET AL. 2018. Expert feature-engineering vs. deep neural networks: Which is better for sensor-free affect detection? In Proceedings of the 19th International Conference on Artificial Intelligence in Education (AIED 2018), C.P. Rosé, R. Martínez-Maldonado, H.U. Hoppe, et al., Eds. Springer, Cham, CH, 198–211.

KANTER, J.M. AND VEERAMACHANENI, K. 2015. Deep feature synthesis: Towards automating data science endeavors. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 1–10.

KARUMBAIAH, S., OCUMPAUGH, J., LABRUM, M., AND BAKER, R.S. 2019. Temporally rich features capture variable performance associated with elementary students’ lower math self-concept. In Companion Proceedings of the 9th International Learning Analytics and Knowledge Conference (LAK’19). Society for Learning Analytics Research (SoLAR), Tempe, AZ, USA, 384–388.

KAY, J. 2000. Stereotypes, student models and scrutability. In Proceedings of the 5th International Conference on Intelligent Tutoring Systems, G. Gauthier, C. Frasson and K. VanLehn, Eds. Springer, Berlin, Heidelberg, 19–30.

KHAJAH, M., LINDSEY, R.V., AND MOZER, M.C. 2016. How deep is knowledge tracing? In Proceedings of the 9th International Conference on Educational Data Mining (EDM 2016), T. Barnes, M. Chi and M. Feng, Eds. International Educational Data Mining Society, 94–101.

KONONENKO, I. 1994. Estimating attributes: Analysis and extensions of RELIEF. In European Conference on Machine Learning (ECML 94), F. Bergadano and L.D. Raedt, Eds. Berlin Heidelberg: Springer, 171–182.

KUHN, M. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28, 5, 1–26.

LANG, M., BINDER, M., RICHTER, J., ET AL. 2019. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software 4, 44, 1903.

LE, T.T., FU, W., AND MOORE, J.H. 2020. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36, 1, 250–256.

LECUN, Y., BENGIO, Y., AND HINTON, G. 2015. Deep learning. Nature 521, 7553, 436–444.

LUNDBERG, S.M. AND LEE, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30, I. Guyon, U.V. Luxburg, S. Bengio, et al., Eds. Curran Associates, Inc., 4765–4774.

MOHAMAD, N., AHMAD, N.B., JAWAWI, D.N.A., AND HASHIM, S.Z.M. 2020. Feature engineering for predicting MOOC performance. IOP Conference Series: Materials Science and Engineering 884, 012070.

OLSON, R.S., URBANOWICZ, R.J., ANDREWS, P.C., LAVENDER, N.A., KIDD, L.C., AND MOORE, J.H. 2016. Automating biomedical data science through tree-based pipeline optimization. In Applications of Evolutionary Computation, G. Squillero and P. Burelli, Eds. Springer International Publishing, Cham, CH, 123–137.

PAQUETTE, L., BAKER, R.S., DECARVALHO, A., AND OCUMPAUGH, J. 2015. Cross-system transfer of machine learned and knowledge engineered models of gaming the system. In Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (UMAP 2015), F. Ricci, K. Bontcheva, O. Conlan and S. Lawless, Eds. Springer International Publishing, Cham, CH, 183–194.

PAQUETTE, L., DECARVALHO, A.M.J.A., BAKER, R.S., AND OCUMPAUGH, J. 2014. Reengineering the feature distillation process: A case study in detection of gaming the system. In Proceedings of the 7th International Conference on Educational Data Mining (EDM 2014). International Educational Data Mining Society, 284–287.

PARDOS, Z.A., FAN, Z., AND JIANG, W. 2019. Connectionist recommendation in the wild: On the utility and scrutability of neural networks for personalized course guidance. User Modeling and User-Adapted Interaction 29, 2, 487–525.

PARDOS, Z.A., TANG, S., DAVIS, D., AND LE, C.V. 2017. Enabling real-time adaptivity in MOOCs with a personalized next-step recommendation framework. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale. Association for Computing Machinery, New York, NY, 23–32.

PEDREGOSA, F., VAROQUAUX, G., GRAMFORT, A., ET AL. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

PIECH, C., BASSEN, J., HUANG, J., ET AL. 2015. Deep knowledge tracing. In Advances in Neural Information Processing Systems 28 (NIPS 2015), C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett, Eds. Curran Associates, Inc., 505–513.

RICKER, N. 1953. The form and laws of propagation of seismic wavelets. Geophysics 18, 1, 10–40.

ROSÉ, C.P., MCLAUGHLIN, E.A., LIU, R., AND KOEDINGER, K.R. 2019. Explanatory learner models: Why machine learning (alone) is not the answer. British Journal of Educational Technology 50, 6, 2943–2958.

SANYAL, D., BOSCH, N., AND PAQUETTE, L. 2020. Feature selection metrics: Similarities, differences, and characteristics of the selected models. In Proceedings of the 13th International Conference on Educational Data Mining (EDM 2020). International Educational Data Mining Society, 212–223.

SEGEDY, J.R., KINNEBREW, J.S., AND BISWAS, G. 2015. Using coherence analysis to characterize self-regulated learning behaviours in open-ended learning environments. Journal of Learning Analytics 2, 1, 13–48.

SEN, A., PATEL, P., RAU, M.A., ET AL. 2018. Machine beats human at sequencing visuals for perceptual-fluency practice. In Proceedings of the 11th International Conference on Educational Data Mining (EDM 2018), K.E. Boyer and M. Yudelson, Eds. International Educational Data Mining Society.

SHAHROKHIAN GHAHFAROKHI, B., SIVARAMAN, A., AND VANLEHN, K. 2020. Toward an automatic speech classifier for the teacher. In Proceedings of the 21st International Conference on Artificial Intelligence in Education (AIED 2020), I.I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin and E. Millán, Eds. Springer International Publishing, Cham, CH, 279–284.

SIMARD, P.Y., AMERSHI, S., CHICKERING, D.M., ET AL. 2017. Machine teaching: A new paradigm for building machine learning systems. arXiv:1707.06742 [cs, stat].

STANDEN, P.J., BROWN, D.J., TAHERI, M., ET AL. 2020. An evaluation of an adaptive learning system based on multimodal affect recognition for learners with intellectual disabilities. British Journal of Educational Technology 51, 5, 1748–1765.

THORNTON, C., HUTTER, F., HOOS, H.H., AND LEYTON-BROWN, K. 2013. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge discovery and data mining. Association for Computing Machinery, New York, NY, USA, 847–855.

TSIAKMAKI, M., KOSTOPOULOS, G., KOTSIANTIS, S., AND RAGOS, O. 2020. Implementing AutoML in educational data mining for prediction tasks. Applied Sciences 10, 1, 90.

VISWANATHAN, S.A. AND VANLEHN, K. 2019. Collaboration detection that preserves privacy of students’ speech. In Proceedings of the 20th International Conference on Artificial Intelligence in Education (AIED 2019), S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren and R. Luckin, Eds. Springer International Publishing, Cham, CH, 507–517.

XIONG, X., ZHAO, S., VAN INWEGEN, E.G., AND BECK, J.E. 2016. Going deeper with deep knowledge tracing. In Proceedings of the 9th International Conference on Educational Data Mining (EDM 2016). International Educational Data Mining Society, 545–550.

ZEHNER, F., HARRISON, S., EICHMANN, B., ET AL. 2020. The NAEP EDM competition: On the value of theory-driven psychometrics and machine learning for predictions based on log data. In Proceedings of the 13th International Conference on Educational Data Mining (EDM 2020), A.N. Rafferty, J. Whitehill, V. Cavalli-Sforza and C. Romero, Eds. International Educational Data Mining Society, 302–312.

ZOPH, B. AND LE, Q.V. 2017. Neural architecture search with reinforcement learning. arXiv:1611.01578 [cs].
Section
Scientific Findings from the NAEP 2019 Data Mining Competition