Massive open online courses (MOOCs) provide educators with an abundance of data describing how students interact with the platform, but this data is highly underutilized today. This is due in part to the lack of sophisticated tools that provide interpretable and actionable summaries of the huge amounts of MOOC activity recorded in log data. To address this problem, we propose a representation of student behavior, together with a method for automatically discovering student behavior patterns from the click log data that can be obtained from the MOOC platform itself. Specifically, we propose the use of a two-layer hidden Markov model (2L-HMM) to extract our desired behavior representation, and show that the patterns extracted by such a 2L-HMM are interpretable and meaningful. We also demonstrate that the proposed 2L-HMM can be used to extract latent features from student behavioral data that correlate with educational outcomes.
Keywords: MOOC, clickstream data, hidden Markov model, learning outcomes
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.