Analyzing Transitions in Sequential Data with Marginal Models

Jeffrey Matayoshi; Shamya Karumbaiah

doi:10.5281/zenodo.12179681

Published June 27, 2024

Jeffrey Matayoshi

McGraw Hill ALEKS

https://orcid.org/0000-0003-1321-8159

Shamya Karumbaiah

University of Wisconsin–Madison

Abstract

Various areas of educational research are interested in the transitions between different states—or events—
in sequential data, with the goal of understanding the significance of these transitions; one notable example
is affect dynamics, which aims to identify important transitions between affective states. Unfortunately,
several works have uncovered issues with the metrics and procedures commonly used to analyze
these transitions. As such, our goal in this work is to address these issues by outlining an alternative
procedure that is based on the use of marginal models. We begin by looking at the specific mechanisms
responsible for a recently discovered statistical bias in several metrics used in sequential data analysis.
After giving a theoretical explanation for the issue, we show that the marginal model procedure appears
to adjust for this bias. Next, the common practice of removing transitions to repeated states has been
shown to have unintended side effects; to account for this, we extend the marginal model procedure to
this specific type of analysis. Finally, in a recent study evaluating multiple comparisons in sequential
data analysis, the Benjamini-Hochberg (BH) procedure, a commonly used approach to control for false
discoveries, did not perform
as expected. By applying a technique from the biostatistics and epidemiology literature, we show that the
performance of the BH procedure, when used with the marginal model method, can be brought back to its
expected level. In all of our analyses, we evaluate the proposed method by both running simulations and
using actual student data. The results indicate that the marginal model procedure seemingly compensates
for the problems observed with other transition metrics, thus resulting in more accurate estimates of the
importance of transitions between states.
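One widely used transition metric in affect dynamics is the L statistic (the reference list includes work on adjusting it). As a minimal, illustrative sketch rather than the authors' code, the function below estimates L for one transition A -> B within a single sequence, using L = (P(next = B | prev = A) - P(next = B)) / (1 - P(next = B)) with both probabilities estimated from consecutive pairs of states. Taking the base rate P(next = B) over all next states is an assumption here; the reference list includes work showing that subtle differences in this calculation matter. The function name `l_statistic` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def l_statistic(seq: Sequence[Hashable], a: Hashable, b: Hashable) -> float:
    """Estimate L for the transition a -> b within one sequence (illustrative sketch).

    L = (P(next = b | prev = a) - P(next = b)) / (1 - P(next = b)),
    with both probabilities estimated from the consecutive pairs of `seq`.
    """
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        raise ValueError("need at least two observed states")
    # Assumption: the base rate P(next = b) is taken over all next states.
    next_counts = Counter(nxt for _, nxt in pairs)
    base_rate = next_counts[b] / len(pairs)
    # Next states observed immediately after a visit to state a.
    after_a = [nxt for prev, nxt in pairs if prev == a]
    if not after_a or base_rate == 1.0:
        raise ValueError("L is undefined for this sequence")
    cond = sum(nxt == b for nxt in after_a) / len(after_a)
    return (cond - base_rate) / (1 - base_rate)
```

For example, in the sequence A, B, A, B, C, A, every transition out of A goes to B while B accounts for 2 of the 5 next states, so the sketch returns L = (1 - 0.4) / (1 - 0.4) = 1. Per-sequence values like this are what typically get averaged across students, which is where the biases studied in the paper can arise.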
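The abstract proposes analyzing transitions with marginal models instead of the transition metrics it critiques. As a minimal sketch under stated assumptions (not the authors' exact specification), one way to pose a transition A -> B as a marginal model is a logistic regression over every observed transition: the outcome indicates whether the next state is B, a single binary predictor indicates whether the current state is A, and repeated transitions from the same student are handled with a student-clustered sandwich (robust) standard error. With one binary predictor, this is what a GEE logistic model with an independence working correlation reduces to, so it can be written in plain Python. The helper names `transition_rows` and `marginal_logit` are hypothetical; a real analysis would more likely use a GEE implementation such as the one in statsmodels, and the reference list also covers small-sample corrections to the sandwich estimator that this sketch omits.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, int, int]  # (student id, x = [prev is a], y = [next is b])


def transition_rows(sequences: Dict[Hashable, Sequence[Hashable]],
                    a: Hashable, b: Hashable) -> List[Row]:
    """Build one row per observed transition, keeping the student id as the cluster."""
    rows: List[Row] = []
    for student, seq in sequences.items():
        for prev, nxt in zip(seq, seq[1:]):
            rows.append((student, int(prev == a), int(nxt == b)))
    return rows


def marginal_logit(rows: List[Row]) -> Tuple[float, float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x; return (b1, cluster-robust SE, two-sided p).

    With a single binary predictor the model is saturated, so the estimates are
    the logits of the observed proportions of y = 1 in the x = 0 and x = 1 groups.
    """
    n, hits = [0, 0], [0, 0]
    for _, x, y in rows:
        n[x] += 1
        hits[x] += y
    if min(n) == 0:
        raise ValueError("need transitions both from a and from other states")
    p = [hits[0] / n[0], hits[1] / n[1]]
    if any(q in (0.0, 1.0) for q in p):
        raise ValueError("a group proportion is 0 or 1, so the estimate is infinite")
    b1 = math.log(p[1] / (1 - p[1])) - math.log(p[0] / (1 - p[0]))

    # "Bread": X'WX with W = diag(mu * (1 - mu)), written out for the 2x2 case.
    v = [p[0] * (1 - p[0]), p[1] * (1 - p[1])]
    a11, a12 = n[0] * v[0] + n[1] * v[1], n[1] * v[1]
    a22 = a12
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]

    # "Meat": sum over students of the outer product of each student's score vector.
    scores: Dict[Hashable, List[float]] = {}
    for student, x, y in rows:
        resid = y - p[x]
        s = scores.setdefault(student, [0.0, 0.0])
        s[0] += resid
        s[1] += resid * x
    meat = [[sum(s[j] * s[k] for s in scores.values()) for k in range(2)]
            for j in range(2)]

    # Sandwich variance of b1: the (1, 1) entry of inv @ meat @ inv.
    var_b1 = sum(inv[1][j] * meat[j][k] * inv[k][1]
                 for j in range(2) for k in range(2))
    se = math.sqrt(var_b1)
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2))  # two-sided normal p-value
    return b1, se, p_value
```

A positive `b1` suggests that B is more likely to follow A than to follow other states. When every row comes from a different student, the sandwich variance reduces to the familiar 1/a + 1/b + 1/c + 1/d variance of a log odds ratio, which makes a convenient sanity check; with several transitions per student, the clustered meat term is what accounts for within-student dependence.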
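The abstract also refers to the Benjamini-Hochberg (BH) procedure for controlling the false discovery rate when many transitions are tested at once. For reference, a minimal sketch of the standard BH step-up rule: sort the m p-values, find the largest rank k with p_(k) <= (k / m) * alpha, and reject the k hypotheses with the smallest p-values. This sketch does not include the additional technique from the biostatistics and epidemiology literature that the abstract mentions, and the function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up procedure; returns a reject flag per input p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the LARGEST rank that passes; earlier failures do not stop the scan.
        if pvalues[i] <= rank / m * alpha:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected
```

Because it is a step-up rule, all four p-values 0.02, 0.03, 0.04, and 0.05 are rejected at alpha = 0.05 (the largest meets its threshold of 4/4 * 0.05), even though the smallest misses its own threshold of 0.0125.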

How to Cite

Matayoshi, J., & Karumbaiah, S. (2024). Analyzing Transitions in Sequential Data with Marginal Models. Journal of Educational Data Mining, 16(1), 197-232. https://doi.org/10.5281/zenodo.12179681

Keywords

sequential data, transition metrics, marginal models, affect dynamics

References

AKAIKE, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 6, 716–723.

ANDRADE, A., DANISH, J., AND MALTESE, A. 2017. A measurement model of gestures in an embodied learning environment: Accounting for temporal dependencies. Journal of Learning Analytics 4, 18–46.

ANDRES, J. M. L., RODRIGO, M. M. T., SUGAY, J. O., BANAWAN, M. P., PAREDES, Y. V. M., CRUZ, J. S. D., AND PALAOAG, T. D. 2015. More fun in the Philippines? Factors affecting transfer of western field methods to one developing world context. In Proceedings of the Sixth International Workshop on Culturally-Aware Tutoring Systems at the 17th International Conference on Artificial Intelligence in Education, J. Boticario and K. Muldner, Eds. CEUR Workshop Proceedings, vol. 1432. 31–40.

ANGRIST, J. D. AND PISCHKE, J.-S. 2008. Mostly Harmless Econometrics. Princeton University Press.

BAKER, R. S., D’MELLO, S. K., RODRIGO, M. T., AND GRAESSER, A. C. 2010. Better to be frustrated than bored: The incidence, persistence, and impact of learners’ cognitive-affective states during interactions with three different computer-based learning environments. International Journal of Human-Computer Studies 68, 4, 223–241.

BENJAMINI, Y. 2010. Discovering the false discovery rate. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 4, 405–416.

BENJAMINI, Y. AND HOCHBERG, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 57, 1, 289–300.

BENJAMINI, Y. AND YEKUTIELI, D. 2001. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 4, 1165–1188.

BISWAS, G., JEONG, H., KINNEBREW, J., SULCER, B., AND ROSCOE, R. D. 2010. Measuring self-regulated learning skills through social interactions in a teachable agent environment. Research and Practice in Technology Enhanced Learning 5, 2, 123–152.

BOSCH, N. AND D’MELLO, S. 2017. The affective experience of novice computer programmers. International Journal of Artificial Intelligence in Education 27, 1, 181–206.

BOSCH, N. AND PAQUETTE, L. 2021. What’s next? Sequence length and impossible loops in state transition measurement. Journal of Educational Data Mining 13, 1, 1–23.

BOTELHO, A. F., BAKER, R. S., AND HEFFERNAN, N. T. 2017. Improving sensor-free affect detection using deep learning. In Proceedings of the 18th International Conference on Artificial Intelligence in Education, E. André, R. Baker, X. Hu, M. M. T. Rodrigo, and B. du Boulay, Eds. Springer International Publishing, Cham, 40–51.

D’MELLO, S. AND GRAESSER, A. 2012. Dynamics of affective states during complex learning. Learning and Instruction 22, 2, 145–157.

D’MELLO, S., TAYLOR, R. S., AND GRAESSER, A. 2007. Monitoring affective trajectories during complex learning. In Proceedings of the 29th Annual Cognitive Science Society, D. S. McNamara and J. G. Trafton, Eds. Cognitive Science Society, Austin, TX, 203–208.

FARCOMENI, A. 2006. More powerful control of the false discovery rate under dependence. Statistical Methods and Applications 15, 1, 43–73.

GILOVICH, T., VALLONE, R., AND TVERSKY, A. 1985. The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology 17, 3, 295–314.

GOEMAN, J. J. AND SOLARI, A. 2014. Multiple hypothesis testing in genomics. Statistics in Medicine 33, 11, 1946–1978.

HARDIN, J. W. AND HILBE, J. M. 2012. Generalized Estimating Equations. Chapman and Hall/CRC.

HEAGERTY, P. J. AND ZEGER, S. L. 2000. Marginalized multilevel models and likelihood inference (with comments and a rejoinder by the authors). Statistical Science 15, 1, 1–26.

HEFFERNAN, N. T. AND HEFFERNAN, C. L. 2014. The ASSISTments ecosystem: Building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching. International Journal of Artificial Intelligence in Education 24, 4, 470–497.

JAMES, G., WITTEN, D., HASTIE, T., AND TIBSHIRANI, R. 2021. An Introduction to Statistical Learning, Second ed. Springer.

KARUMBAIAH, S., ANDRES, J. M. A. L., BOTELHO, A. F., BAKER, R., AND OCUMPAUGH, J. L. 2018. The implications of a subtle difference in the calculation of affect dynamics. In Proceedings of the 26th International Conference on Computers in Education, J. C. Yang, M. Chang, L.-H. Wong, and M. M. T. Rodrigo, Eds. Asia-Pacific Society for Computers in Education, Philippines, 29–38.

KARUMBAIAH, S., BAKER, R., OCUMPAUGH, J., AND ANDRES, A. 2021. A re-analysis and synthesis of data on affect dynamics in learning. IEEE Transactions on Affective Computing 14, 2, 1696–1710.

KARUMBAIAH, S., BAKER, R. S., AND OCUMPAUGH, J. 2019. The case of self-transitions in affective dynamics. In Proceedings of the 20th International Conference on Artificial Intelligence in Education, S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren, and R. Luckin, Eds. Springer International Publishing, Cham, 172–181.

KIM, K. I. AND VAN DE WIEL, M. A. 2008. Effects of dependence in high-dimensional multiple testing problems. BMC Bioinformatics 9, 1, 1–12.

KNIGHT, S., WISE, A. F., AND CHEN, B. 2017. Time for change: Why learning analytics needs temporal analysis. Journal of Learning Analytics 4, 3, 7–17.

LI, P. AND REDDEN, D. T. 2015. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Statistics in Medicine 34, 2, 281–296.

LIANG, K.-Y. AND ZEGER, S. L. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 1, 13–22.

MACKINNON, J. G. AND WHITE, H. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29, 3, 305–325.

MAHZOON, M. J., MAHER, M. L., ELTAYEBY, O., DOU, W., AND GRACE, K. 2018. A sequence data model for analyzing temporal patterns of student data. Journal of Learning Analytics 5, 1, 55–74.

MANCL, L. A. AND DEROUEN, T. A. 2001. A covariance estimator for GEE with improved small-sample properties. Biometrics 57, 1, 126–134.

MATAYOSHI, J. 2024. jmatayoshi/consolidated-transition-analysis: Release 1.0.0. https://doi.org/10.5281/zenodo.12049382.

MATAYOSHI, J. AND KARUMBAIAH, S. 2020. Adjusting the L statistic when self-transitions are excluded in affect dynamics. Journal of Educational Data Mining 12, 4 (Dec.), 1–23.

MATAYOSHI, J. AND KARUMBAIAH, S. 2021a. Investigating the validity of methods used to adjust for multiple comparisons in educational data mining. In Proceedings of the 14th International Conference on Educational Data Mining, I.-H. S. Hsiao, S. S. Sahebi, F. Bouchet, and J.-J. Vie, Eds. International Educational Data Mining Society, 33–45.

MATAYOSHI, J. AND KARUMBAIAH, S. 2021b. Using marginal models to adjust for statistical bias in the analysis of state transitions. In LAK21: 11th International Learning Analytics and Knowledge Conference. Association for Computing Machinery, 449–455.

MCDONALD, J. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House Publishing.

MILLER, J. B. AND SANJURJO, A. 2018. Surprised by the gambler’s and hot hand fallacies? A truth in the law of small numbers. Econometrica 86, 6, 2019–2047.

OCUMPAUGH, J., BAKER, R. S., AND RODRIGO, M. M. T. 2015. Baker Rodrigo Ocumpaugh Monitoring Protocol (BROMP) 2.0 technical and training manual. New York, NY and Manila, Philippines: Teachers College, Columbia University and Ateneo Laboratory for the Learning Sciences 60.

PAN, W. 2001. Akaike’s information criterion in generalized estimating equations. Biometrics 57, 1, 120–125.

REINER-BENAIM, A. 2007. FDR control by the BH procedure for two-sided correlated tests with implications to gene expression data analysis. Biometrical Journal 49, 1, 107–126.

REINER-BENAIM, A., YEKUTIELI, D., LETWIN, N. E., ELMER, G. I., LEE, N. H., KAFKAFI, N., AND BENJAMINI, Y. 2007. Associating quantitative behavioral traits with gene expression in the brain: Searching for diamonds in the hay. Bioinformatics 23, 17, 2239–2246.

SACKETT, G. P. 1979. The lag sequential analysis of contingency and cyclicity in behavioral interaction research. Handbook of Infant Development 1, 623–649.

SEABOLD, S. AND PERKTOLD, J. 2010. Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman, Eds. 92–96.

SHUTE, V. J. AND VENTURA, M. 2013. Stealth assessment: Measuring and supporting learning in video games. MIT Press.

SNIJDERS, T. A. AND BOSKER, R. J. 2012. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage.

STOREY, J. D. AND TIBSHIRANI, R. 2003. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences 100, 16, 9440–9445.

SZMARAGD, C., CLARKE, P., AND STEELE, F. 2013. Subject specific and population average models for binary longitudinal data: a tutorial. Longitudinal and Life Course Studies 4, 2, 147–165.

WILLIAMS, V. S., JONES, L. V., AND TUKEY, J. W. 1999. Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. Journal of Educational and Behavioral Statistics 24, 1, 42–69.

YEKUTIELI, D. 2008. False discovery rate control for non-positively regression dependent test statistics. Journal of Statistical Planning and Inference 138, 2, 405–415.

Issue

Vol 16 No 1 (2024)

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
