Toward a Framework for Learner Segmentation



Published Nov 19, 2013
Bahareh Azarnoush, Jennifer M. Bekki, George C. Runger, Bianca L. Bernstein, Robert K. Atkinson


Grouping learners effectively in an online environment is highly useful. However, the datasets used for this task often have large numbers of attributes of disparate types and scales, which traditional clustering approaches cannot handle effectively. This paper presents a dissimilarity measure based on the random forest, which overcomes these limitations, for grouping learners. It also proposes a rule-based method for interpreting the resulting learner segmentations. The approach was applied to a real dataset of users of CareerWISE, an online educational environment designed to provide resilience training for women STEM doctoral students, and was shown to find stable and meaningful groups of users.
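For readers unfamiliar with random-forest dissimilarities, the general idea can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes the unsupervised random forest construction of Shi and Horvath (2006), uses scikit-learn, and assumes the attributes have already been encoded numerically. The function name `rf_dissimilarity` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_dissimilarity(X, n_trees=200, seed=0):
    """Return an (n, n) random-forest dissimilarity matrix for the rows of X.

    Hypothetical helper: assumes X is numeric (categorical attributes encoded).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Synthetic contrast data: permute each column independently, which keeps
    # the marginal distributions but breaks dependence between attributes.
    synthetic = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )

    # Train a forest to tell real observations (1) from synthetic ones (0).
    data = np.vstack([X, synthetic])
    labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(data, labels)

    # leaves[i, t] is the leaf that real observation i reaches in tree t.
    leaves = forest.apply(X)

    # Proximity: fraction of trees in which two observations share a leaf.
    proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

    # Convert proximity to a dissimilarity in [0, 1].
    return np.sqrt(1.0 - proximity)
```

The resulting matrix can be passed to any clustering method that accepts precomputed dissimilarities, such as partitioning around medoids or hierarchical clustering. Because tree splits depend only on the ordering of values within each attribute, the measure is insensitive to attribute scale, which is the property the abstract highlights.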

How to Cite

Azarnoush, B., Bekki, J. M., Runger, G. C., Bernstein, B. L., & Atkinson, R. K. (2013). Toward a Framework for Learner Segmentation. Journal of Educational Data Mining, 5(2), 102–126.



grouping learners, rule-based method, random forest
