Analysis of Click-Stream Data to Predict STEM Careers from Student Usage of an Intelligent Tutoring System
Abstract
In recent years, job openings in Science, Technology, Engineering and Math (STEM)-related fields have increased continuously and rapidly. Unfortunately, these positions are not met with an equal number of workers ready to fill them. Efforts are being made to find durable solutions to this problem, and they start by encouraging young students to enroll in STEM college majors. However, enrolling in a STEM major requires specific skills in math and science that are learned in school. Fortunately, institutions are adopting educational software that collects data on students' usage. This data can be analyzed to detect students' behaviors and to predict their performance and their eventual college enrollment. In this paper, we use data collected from students' usage of an Intelligent Tutoring System to predict whether they will pursue a career in a STEM-related field. We conducted two types of analysis, which we call the "problem-based approach" and the "skill-based approach". The problem-based approach evaluates students' actions based on the problems they solved. Likewise, the skill-based approach evaluates their usage based on the skills they practiced. Furthermore, we investigated whether comparing students' features with those of their peer schoolmates can improve the prediction models in both the skill-based and the problem-based approaches. The experimental results showed that the skill-based approach with school aggregation achieved the best results with regard to a combination of two metrics: the Area Under the Receiver Operating Characteristic Curve (AUC) and the Root Mean Squared Error (RMSE).
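As a rough illustration of the two ideas named in the abstract, the sketch below computes a school-relative feature and then the two evaluation metrics. It is a minimal sketch under stated assumptions, not the authors' pipeline: the field names `school` and `correct_rate`, the toy data, and the choice to express "school aggregation" as the difference from the school mean are all hypothetical. The abstract does not say how AUC and RMSE are combined, so the two metrics are simply reported side by side.

```python
from collections import defaultdict
from math import sqrt


def school_relative(rows, feature):
    """Return each student's feature value minus the mean of that feature in their school.

    Assumption: this difference-from-school-mean form is only one possible way
    to compare a student with peer schoolmates.
    """
    by_school = defaultdict(list)
    for row in rows:
        by_school[row["school"]].append(row[feature])
    means = {school: sum(vals) / len(vals) for school, vals in by_school.items()}
    return [row[feature] - means[row["school"]] for row in rows]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(labels, scores):
    """Root mean squared error between binary labels and predicted probabilities."""
    return sqrt(sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels))


# Hypothetical toy data: two schools, two students each.
rows = [
    {"school": "A", "correct_rate": 0.8},
    {"school": "A", "correct_rate": 0.6},
    {"school": "B", "correct_rate": 0.5},
    {"school": "B", "correct_rate": 0.9},
]
relative = school_relative(rows, "correct_rate")

# Hypothetical labels (1 = STEM career) and model scores for the same four students.
labels = [1, 0, 0, 1]
scores = [0.7, 0.4, 0.3, 0.6]
print(relative, auc(labels, scores), rmse(labels, scores))
```

A higher AUC and a lower RMSE are both better, so any combined ranking of models has to account for the opposite directions of the two metrics.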
STEM career, predictive analytics, educational data mining, intelligent tutoring system
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.