A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics that Impact Online Learning

L. Dee Miller; Leen-Kiat Soh; Ashok Samal; Kevin Kupzyk; Gwen Nugent

doi:10.5281/zenodo.3554731

Published October 23, 2015

L. Dee Miller

University of Nebraska

Leen-Kiat Soh

University of Nebraska

Ashok Samal

University of Nebraska

Kevin Kupzyk

University of Nebraska

Gwen Nugent

University of Nebraska

Abstract

Learning objects (LOs) are important online resources for both learners and instructors, and their usage is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and demographic characteristics (e.g., gender). The challenge is to identify which of the many variables derived from tracked data are useful for predicting student learning. This challenge has prompted considerable research in educational data mining and learning analytics. This work advances such research in four ways. First, we bring together two approaches for finding salient variables from separate research areas: hierarchical linear modeling (HLM) from education and Lasso feature selection from computer science. Second, we show that these two approaches have complementary and synergistic results, with some variables considered salient by both and others salient by only one. Third, and most importantly, we demonstrate the benefits of a combined approach that considers a variable salient when either HLM or Lasso considers that variable salient. This combined approach both improves model predictive accuracy and finds additional variables considered salient in previous datasets on student learning. Lastly, we use the results to provide insights into the variables that are salient to learning outcomes in undergraduate CS education. Overall, this work suggests a combined approach that improves the identification of salient variables in big data and also improves the design of LO tracking systems for learning management systems.
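The combined rule described in the abstract (a variable is salient when either HLM or Lasso considers it salient) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the variable names, the numeric values, the p < 0.05 threshold used to read salience from an HLM fit, and the nonzero-coefficient rule used to read salience from a Lasso fit.

```python
from typing import Dict, Set


def hlm_salient(p_values: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Assumed rule: an HLM fixed effect is salient when its p-value is below alpha."""
    return {name for name, p in p_values.items() if p < alpha}


def lasso_salient(coefs: Dict[str, float], tol: float = 1e-8) -> Set[str]:
    """Assumed rule: a variable is salient when Lasso leaves its coefficient nonzero."""
    return {name for name, c in coefs.items() if abs(c) > tol}


def combine(hlm: Set[str], lasso: Set[str]) -> Dict[str, Set[str]]:
    """Split variables into agreement and disagreement groups, plus the union."""
    return {
        "both": hlm & lasso,        # salient under both approaches
        "hlm_only": hlm - lasso,    # salient under HLM only
        "lasso_only": lasso - hlm,  # salient under Lasso only
        "combined": hlm | lasso,    # salient under either approach
    }


# Hypothetical outputs of two already-fitted models (illustrative values only).
hlm_p = {"time_on_lo": 0.01, "gender": 0.40, "self_efficacy": 0.03, "num_clicks": 0.20}
lasso_b = {"time_on_lo": 0.42, "gender": 0.0, "self_efficacy": 0.0, "num_clicks": -0.15}

result = combine(hlm_salient(hlm_p), lasso_salient(lasso_b))
for group in ("both", "hlm_only", "lasso_only", "combined"):
    print(group, sorted(result[group]))
```

Keeping the per-method groups next to the union makes it easy to see where the two approaches agree and where only one of them flags a variable, which is the comparison the abstract describes.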

How to Cite

A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics that Impact Online Learning. (2015). Journal of Educational Data Mining, 7(3), 117-150. https://doi.org/10.5281/zenodo.3554731

Keywords

learning object tracking, predicting student learning, hierarchical linear modeling (HLM), lasso feature selection

Issue

Vol 7 No 3 (2015)

Section

Articles
