Understanding Teacher Users of a Digital Library Service: A Clustering Approach



Published Dec 1, 2011
Beijie Xu, Mimi Recker


This article describes the Knowledge Discovery and Data Mining (KDD) process and its application to educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported here investigated one type of data mining problem, clustering, and used a statistical model, latent class analysis (LCA), to group IA teacher users according to their diverse online behaviors. LCA helped us identify different types of users, ranging from window shoppers and lukewarm users to the most dedicated users, and distinguish isolated users from the key brokers of this online community. The article concludes with a discussion of the implications of the discovered usage patterns for system design and for EDM in general.

How to Cite

Xu, B., & Recker, M. (2011). Understanding Teacher Users of a Digital Library Service: A Clustering Approach. JEDM | Journal of Educational Data Mining, 3(1), 1-28. Retrieved from https://jedm.educationaldatamining.org/index.php/JEDM/article/view/19



educational data mining, educational web mining, clustering, latent class analysis, digital libraries, teacher users
