Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining

Published Oct 1, 2012
Robert J. Mislevy, John T. Behrens, Kristen E. Dicerbo, Roy Levy

Abstract

Evidence-centered design (ECD) is a comprehensive framework for describing the conceptual, computational, and inferential elements of educational assessment. It emphasizes the importance of articulating the inferences one wants to make and the evidence needed to support those inferences. At first blush, ECD and educational data mining (EDM) might seem in conflict: structuring situations to evoke particular kinds of evidence, versus discovering meaningful patterns in available data. However, a dialectic between the two stances increases understanding and improves practice. We first introduce ECD and relate its elements to the broad range of digital inputs relevant to modern assessment. We then discuss the relation between EDM and psychometric activities in educational assessment. We illustrate these points with examples from the Cisco Networking Academy, a global program in which information technology is taught through a blended program of face-to-face classroom instruction, an online curriculum, and online assessments.
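ECD's chain from a student's work to an inference about that student can be made concrete with a toy example. The sketch below is hypothetical and is not the authors' implementation. It assumes a single binary proficiency (mastery vs. non-mastery) with a 0.5 prior, one scored observable, invented conditional probabilities (0.8 and 0.3), and invented work-product fields (`ip_assigned`, `gateway_set`). `identify` stands in for evidence identification (reducing a raw work product to an observable), and `accumulate` stands in for evidence accumulation (a Bayes' rule update of the student model).

```python
from dataclasses import dataclass


@dataclass
class StudentModel:
    """Belief about one hypothetical binary proficiency (mastery vs. non-mastery)."""
    p_mastery: float = 0.5  # assumed prior probability of mastery


@dataclass
class EvidenceModel:
    """Hypothetical probabilities linking the proficiency to one scored observable."""
    p_correct_if_mastery: float = 0.8
    p_correct_if_nonmastery: float = 0.3

    def identify(self, work_product: dict) -> int:
        # Evidence identification: reduce a raw work product to a 0/1 observable.
        # The rule "all required settings are present" is an illustrative assumption.
        required = ("ip_assigned", "gateway_set")
        return int(all(work_product.get(key, False) for key in required))

    def accumulate(self, student: StudentModel, observable: int) -> StudentModel:
        # Evidence accumulation: update P(mastery) with Bayes' rule.
        if observable:
            like_m, like_n = self.p_correct_if_mastery, self.p_correct_if_nonmastery
        else:
            like_m, like_n = 1 - self.p_correct_if_mastery, 1 - self.p_correct_if_nonmastery
        numerator = like_m * student.p_mastery
        denominator = numerator + like_n * (1 - student.p_mastery)
        return StudentModel(p_mastery=numerator / denominator)


if __name__ == "__main__":
    student = StudentModel()
    evidence = EvidenceModel()
    # Two hypothetical work products: one complete configuration, one incomplete.
    for work_product in (
        {"ip_assigned": True, "gateway_set": True},
        {"ip_assigned": True, "gateway_set": False},
    ):
        observable = evidence.identify(work_product)
        student = evidence.accumulate(student, observable)
        print(observable, round(student.p_mastery, 3))
```

Running it prints `1 0.727` and then `0 0.432`: the complete configuration raises the belief in mastery from 0.5 to 8/11, and the incomplete one lowers it to 16/37. An operational ECD system would typically use a richer measurement model, such as a Bayesian network or an item response model, in place of this single-variable update.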

How to Cite

Mislevy, R. J., Behrens, J. T., Dicerbo, K. E., & Levy, R. (2012). Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining. Journal of Educational Data Mining, 4(1), 11–48. https://doi.org/10.5281/zenodo.3554641

Keywords

evidence-centered design, educational data mining, psychometrics, games and simulations, Cisco Networking Academy