Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods



Published Dec 23, 2019
Johannes Berens, Kerstin Schneider, Simon Gortz, Simon Oster, Julian Burghoff


To reduce student attrition, it is imperative to understand the underlying determinants of attrition and to identify which students are at risk of dropping out. We develop an early detection system (EDS) that uses administrative student data from a state university and a private university of applied sciences to predict student dropout as a basis for targeted intervention. To create an EDS that can be used at any German university, we use the AdaBoost algorithm to combine regression analysis, neural networks, and decision trees rather than relying on a single method. Prediction accuracy at the end of the first semester is 79% for the state university and 85% for the private university of applied sciences. After the fourth semester, accuracy improves to 90% for the state university and 95% for the private university of applied sciences.
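The idea of letting AdaBoost combine several different predictors can be illustrated with a minimal sketch. This is not the authors' implementation: the three threshold rules below are hypothetical stand-ins for the regression, neural network, and decision tree models, and the feature names, thresholds, and eight toy records are invented purely for illustration.

```python
import math

def adaboost(X, y, learners, rounds):
    """Discrete AdaBoost over a fixed pool of candidate classifiers.

    X: list of feature tuples; y: labels in {-1, +1} (+1 = dropout);
    learners: functions mapping a feature tuple to -1 or +1.
    Returns a list of (alpha, learner) pairs.
    """
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        # Pick the candidate with the lowest weighted error.
        best, best_err = None, float("inf")
        for h in learners:
            err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
            if err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:
            break  # no candidate beats chance on the reweighted data
        eps = max(best_err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # a perfect candidate needs no further rounds
        # Increase the weight of misclassified records, then renormalise.
        w = [wi * math.exp(-alpha * yi * best(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the selected classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical stand-ins for the three model families (assumed thresholds).
def few_credits(x):   # x[0]: credit points earned
    return 1 if x[0] < 15 else -1

def poor_grade(x):    # x[1]: average grade, higher is worse
    return 1 if x[1] > 3.0 else -1

def many_fails(x):    # x[2]: number of failed exams
    return 1 if x[2] >= 2 else -1

learners = [few_credits, poor_grade, many_fails]

# Invented toy records: (credits, average grade, failed exams).
X = [(30, 2.0, 0), (10, 3.5, 3), (12, 2.5, 0), (25, 3.7, 0),
     (28, 2.2, 2), (8, 4.0, 1), (20, 3.3, 2), (14, 2.8, 3)]
y = [-1, 1, -1, -1, -1, 1, 1, 1]

ensemble = adaboost(X, y, learners, rounds=3)
accuracy = sum(predict(ensemble, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

In this toy setup each rule misclassifies two of the eight records, but they err on different records, so the weighted vote classifies all eight correctly. Training accuracy on invented data says nothing about the accuracies reported in the abstract.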

How to Cite

Berens, J., Schneider, K., Gortz, S., Oster, S., & Burghoff, J. (2019). Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods. JEDM | Journal of Educational Data Mining, 11(3), 1-41. https://doi.org/10.5281/zenodo.3594771



Keywords: student dropout, early detection, administrative data, higher education, AdaBoost
