Examining Algorithmic Fairness for First-Term College Grade Prediction Models Relying on Pre-matriculation Data


Published Dec 26, 2023
Takeshi Yanagiura, Shiho Yano, Masateru Kihira, Yukihiko Okada

Abstract

Many colleges use AI-powered early warning systems (EWS) to provide support to students as soon as they start their first semester. However, concerns arise about the fairness of an EWS algorithm deployed so early in a student's college journey, especially at institutions with limited data collection capacity. To address this concern empirically, we developed a first-semester GPA prediction algorithm for an urban Japanese private university, relying exclusively on demographic and pre-college academic data commonly collected by many colleges at matriculation. We then assessed the fairness of this prediction model between at-risk and lower-risk student groups. We also examined whether adding 33 novel non-academic skill data points, collected within the first three weeks of matriculation, improves the model. Our analysis found that the model is less predictive for the at-risk group than for their lower-risk counterparts, and that the non-academic skill data slightly improved the model's predictive performance but did not make it fairer. Our research underscores that early adoption of an EWS relying on pre-matriculation data alone may disadvantage at-risk students by overlooking those who genuinely require assistance.
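To make the fairness audit described above concrete, the sketch below shows one way to compare a GPA regressor's predictive performance across at-risk and lower-risk groups. It is a minimal illustration, not the authors' implementation: the scikit-learn workflow, the synthetic data, and the feature names (hs_gpa, entrance_score, at_risk) are all assumptions made for demonstration.

```python
# Minimal sketch (hypothetical data, not the authors' code): train a GPA
# regressor on pre-matriculation features, then compare error metrics by group.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pre-matriculation records: demographic/pre-college
# features, a first-term GPA outcome, and a binary at-risk flag used only
# for the group-wise audit (not as a model input).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "hs_gpa": rng.normal(3.0, 0.5, n),
    "entrance_score": rng.normal(60, 10, n),
    "age": rng.integers(18, 25, n),
    "at_risk": rng.integers(0, 2, n),
})
df["first_term_gpa"] = (
    0.6 * df["hs_gpa"] + 0.01 * df["entrance_score"]
    - 0.3 * df["at_risk"] + rng.normal(0, 0.4, n)
).clip(0, 4)

features = ["hs_gpa", "entrance_score", "age"]  # pre-matriculation data only
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    df[features], df["first_term_gpa"], df["at_risk"],
    test_size=0.3, random_state=0,
)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Group-wise audit: a larger error or weaker fit for the at-risk group is the
# kind of disparity the abstract reports.
for label, mask in [("at-risk", g_te == 1), ("lower-risk", g_te == 0)]:
    print(f"{label:>10}: MAE={mean_absolute_error(y_te[mask], pred[mask]):.3f}, "
          f"R2={r2_score(y_te[mask], pred[mask]):.3f}")
```

The same group-wise comparison can be repeated after appending additional features (such as non-academic skill measures) to test whether they narrow the gap as well as improve overall accuracy.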

How to Cite

Yanagiura, T., Yano, S., Kihira, M., & Okada, Y. (2023). Examining Algorithmic Fairness for First-Term College Grade Prediction Models Relying on Pre-matriculation Data. Journal of Educational Data Mining, 15(3), 1–25. https://doi.org/10.5281/zenodo.10117682


Keywords

algorithmic fairness, early warning system, predictive analytics, higher education, calibration
