Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods
Abstract
To reduce student attrition, universities need to understand the underlying determinants of attrition and identify which students are at risk of dropping out. We develop an early detection system (EDS) that uses administrative student data from a state university and a private university of applied sciences to predict student dropout as a basis for targeted intervention. So that the EDS can be used at any German university, we do not rely on a single method; instead, we use the AdaBoost algorithm to combine regression analysis, neural networks, and decision trees. Prediction accuracy at the end of the first semester is 79% for the state university and 85% for the private university of applied sciences. After the fourth semester, accuracy improves to 90% for the state university and 95% for the private university of applied sciences.
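The boosting step described in the abstract can be illustrated with a minimal sketch of discrete AdaBoost over a fixed pool of classifiers. This is not the authors' implementation: the three threshold rules stand in for the trained regression, neural network, and decision tree models, and the feature names, thresholds, and the eight synthetic student records are assumptions made purely for illustration.

```python
import math

# Hypothetical weak classifiers: each maps a student record
# (credits_earned, average_grade, exams_failed) to +1 (dropout) or -1 (stays).
# In the paper these roles are played by trained models; simple threshold
# rules are used here only as stand-ins.
def few_credits(s):
    return 1 if s[0] < 15 else -1

def poor_grade(s):
    return 1 if s[1] > 3.0 else -1

def many_failed(s):
    return 1 if s[2] >= 2 else -1

POOL = [few_credits, poor_grade, many_failed]

# Synthetic toy data (not from the study): a student is labelled a dropout
# when at least two of the three risk flags are raised.
STUDENTS = [
    (10, 3.5, 3), (10, 3.5, 0), (10, 2.0, 3), (25, 3.5, 3),
    (10, 2.0, 0), (25, 3.5, 0), (25, 2.0, 3), (25, 2.0, 0),
]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

def adaboost(samples, labels, pool, rounds):
    """Discrete AdaBoost: returns a list of (alpha, classifier) pairs."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Select the pool member with the lowest weighted error.
        best, best_err = None, None
        for clf in pool:
            err = sum(w for w, x, y in zip(weights, samples, labels)
                      if clf(x) != y)
            if best_err is None or err < best_err:
                best, best_err = clf, err
        if best_err >= 0.5:
            break  # nothing in the pool beats chance on the reweighted data
        eps = max(best_err, 1e-10)  # guard against log of infinity
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # a perfect classifier needs no further rounds
        # Increase the weight of misclassified students, decrease the rest.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the selected classifiers."""
    score = sum(alpha * clf(x) for alpha, clf in ensemble)
    return 1 if score >= 0 else -1

def accuracy(classify, samples, labels):
    hits = sum(1 for x, y in zip(samples, labels) if classify(x) == y)
    return hits / len(samples)

ensemble = adaboost(STUDENTS, LABELS, POOL, rounds=3)
for rule in POOL:
    print(rule.__name__, accuracy(rule, STUDENTS, LABELS))
print("boosted", accuracy(lambda x: predict(ensemble, x), STUDENTS, LABELS))
```

On this toy data each rule alone classifies six of the eight students correctly, while the weighted vote after three rounds classifies all eight correctly. The point of the sketch is only the mechanism: each round reweights the students that the previously selected classifier got wrong, so that the next selection concentrates on them.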
Keywords: student dropout, early detection, administrative data, higher education, AdaBoost
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.