Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned



Published Dec 26, 2018
Shitian Shen, Behrooz Mostafavi, Tiffany Barnes, Min Chi


An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is a system that adapts to students' behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address it, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments: 1) the reward function, examining the impact of immediate versus delayed rewards on the effectiveness of the induced policies; 2) state representation, exploring ECR-based (Expected Cumulative Reward), correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic versus deterministic policy execution on learning. The most important result of this work is an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students than on the population as a whole. We refer to the students who are sensitive to the policies as the Responsive group, and all of the following results are based on this group. First, an immediate reward can yield a more effective induced policy than a delayed reward. Second, the MDP policies induced with the low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, stochastic policy execution produced no significant improvement, owing to a ceiling effect.
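The abstract contrasts deterministic and stochastic execution of an induced MDP policy. As a minimal sketch, not the authors' implementation, the Python below induces Q-values for a hypothetical two-state, two-action MDP with tabular value iteration, then executes the policy either deterministically (always the highest-valued action) or stochastically. The states `novice`/`proficient`, the actions `WE`/`PS` (worked example vs. problem solving), the transition probabilities, rewards, discount factor, and the softmax (Boltzmann) sampling rule are all illustrative assumptions; the paper's actual state features, reward definitions, and stochastic execution scheme may differ.

```python
import math
import random

# Toy MDP. Every state, action, probability, and reward here is a
# hypothetical illustration, not taken from the paper.
STATES = ["novice", "proficient"]
ACTIONS = ["WE", "PS"]  # assumed: worked example vs. problem solving

# TRANSITIONS[s][a] -> list of (probability, next_state)
TRANSITIONS = {
    "novice": {"WE": [(0.6, "proficient"), (0.4, "novice")],
               "PS": [(0.3, "proficient"), (0.7, "novice")]},
    "proficient": {"WE": [(1.0, "proficient")],
                   "PS": [(1.0, "proficient")]},
}
# REWARDS[s][a] -> immediate reward for taking action a in state s
REWARDS = {
    "novice": {"WE": 0.0, "PS": 0.0},
    "proficient": {"WE": 0.2, "PS": 1.0},
}


def value_iteration(gamma=0.9, tol=1e-8):
    """Tabular value iteration; returns Q-values as q[state][action]."""
    v = {s: 0.0 for s in STATES}
    while True:
        # Bellman backup: immediate reward plus discounted expected next value.
        q = {
            s: {
                a: REWARDS[s][a]
                + gamma * sum(p * v[s2] for p, s2 in TRANSITIONS[s][a])
                for a in ACTIONS
            }
            for s in STATES
        }
        new_v = {s: max(q[s].values()) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return q
        v = new_v


def deterministic_action(q_s):
    """Deterministic execution: always take the highest-valued action."""
    return max(q_s, key=q_s.get)


def softmax_probs(q_s, temperature=1.0):
    """Boltzmann distribution over actions (an assumed stochastic scheme)."""
    m = max(q_s.values())  # subtract the max for numerical stability
    exps = {a: math.exp((val - m) / temperature) for a, val in q_s.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def stochastic_action(q_s, temperature=1.0, rng=random):
    """Stochastic execution: sample an action from the softmax distribution."""
    probs = softmax_probs(q_s, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


q = value_iteration()
for s in STATES:
    print(s, deterministic_action(q[s]), softmax_probs(q[s]))
```

With these assumed numbers, the deterministic policy always gives `WE` in `novice` and `PS` in `proficient`, whereas softmax execution still chooses the lower-valued `PS` in `novice` roughly 40% of the time at temperature 1.0. Lowering the temperature pushes stochastic execution toward the deterministic policy.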

How to Cite

Shen, S., Mostafavi, B., Barnes, T., & Chi, M. (2018). Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned. JEDM | Journal of Educational Data Mining, 10(3), 27-68. Retrieved from https://jedm.educationaldatamining.org/index.php/JEDM/article/view/319



Keywords: reinforcement learning, intelligent tutoring systems, problem solving, worked example, pedagogical strategy

