This article describes the Knowledge Discovery and Data Mining (KDD) process and its application to educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported here investigated one type of data mining problem, clustering, and used a statistical model, latent class analysis (LCA), to group IA teacher users according to their diverse online behaviors. LCA helped us identify different types of users, ranging from window shoppers and lukewarm users to the most dedicated users, and distinguish isolated users from the key brokers of this online community. The article concludes with a discussion of the implications of the discovered usage patterns for system design and for EDM in general.
Keywords: educational data mining, educational web mining, clustering, latent class analysis, digital libraries, teacher users
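To make the clustering step named in the abstract concrete, the following is a minimal sketch of latent class analysis fitted with the EM algorithm. It is not the authors' model or software: it assumes every usage indicator is binary (0/1), the three indicator names and the toy data are hypothetical, and it uses a single fixed starting point where dedicated LCA software would typically try many random starts and compare models of different sizes.

```python
import math


def fit_lca(data, n_classes, n_iter=200):
    """Fit a latent class model to binary indicators with EM.

    data: list of equal-length lists of 0/1 values, one row per user.
    Returns (weights, item_probs, resp), where resp[i][k] is the posterior
    probability that user i belongs to class k.
    """
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Assumed starting point: classes ordered from low to high activity.
    item_probs = [[(k + 1) / (n_classes + 1)] * n_items
                  for k in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each user (log space).
        resp = []
        for row in data:
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for x, p in zip(row, item_probs[k]):
                    lp += math.log(p if x else 1.0 - p)
                log_post.append(lp)
            top = max(log_post)
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update class sizes and per-class item probabilities.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / len(data), 1e-9)
            for j in range(n_items):
                hits = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Clamp so the logarithms in the next E-step stay finite.
                p = hits / max(mass, 1e-9)
                item_probs[k][j] = min(max(p, 1e-6), 1.0 - 1e-6)
    return weights, item_probs, resp


def assign(resp):
    """Assign each user to the class with the highest posterior probability."""
    return [max(range(len(r)), key=lambda k: r[k]) for r in resp]


# Hypothetical indicators per teacher: browsed, created a project, shared it.
toy_users = ([[0, 0, 0]] * 30 + [[1, 1, 1]] * 20
             + [[1, 1, 0]] * 3 + [[0, 0, 1]] * 2)
weights, item_probs, resp = fit_lca(toy_users, n_classes=2)
labels = assign(resp)
```

In this toy setup the inactive teachers and the fully active teachers end up in different classes, which loosely mirrors the contrast between window shoppers and dedicated users. Choosing the number of classes, for example by comparing an information criterion across fits, is left out of the sketch.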