An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students’ behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, examining the impact of immediate and delayed rewards on the effectiveness of the induced policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy execution on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students rather than on the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All of the following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, stochastic policy execution produced no significant improvement, owing to a ceiling effect.
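To make the distinction between deterministic and stochastic policy execution concrete, the following is a minimal sketch on a toy MDP. Everything in it is an assumption made for illustration: the two states, the transition probabilities, the rewards, the discount factor, the use of value iteration to induce the policy, and the softmax rule used for stochastic execution. None of these details is taken from the experiments described above; only the action labels echo the worked example and problem solving keywords.

```python
import math
import random

# Hypothetical toy MDP. States, transitions and rewards are invented for
# illustration and are not taken from the study's data.
STATES = ["low", "high"]
ACTIONS = ["worked_example", "problem_solving"]

# P[state][action] is a list of (next_state, probability) pairs.
P = {
    "low": {
        "worked_example": [("high", 0.6), ("low", 0.4)],
        "problem_solving": [("high", 0.3), ("low", 0.7)],
    },
    "high": {
        "worked_example": [("high", 0.7), ("low", 0.3)],
        "problem_solving": [("high", 0.9), ("low", 0.1)],
    },
}

# Assumed immediate reward, received on arriving in the next state.
R = {"low": 0.0, "high": 1.0}
GAMMA = 0.9  # assumed discount factor


def q_values(gamma=GAMMA, sweeps=200):
    """Run value iteration and return the resulting Q(state, action) table."""
    v = {s: 0.0 for s in STATES}
    q = {}
    for _ in range(sweeps):
        q = {
            s: {
                a: sum(p * (R[s2] + gamma * v[s2]) for s2, p in P[s][a])
                for a in ACTIONS
            }
            for s in STATES
        }
        v = {s: max(q[s].values()) for s in STATES}
    return q


def deterministic_action(q, state):
    """Deterministic execution: always take the highest-valued action."""
    return max(ACTIONS, key=lambda a: q[state][a])


def softmax_probs(q, state, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature)."""
    best = max(q[state].values())  # subtract the max for numerical stability
    weights = {a: math.exp((q[state][a] - best) / temperature) for a in ACTIONS}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def stochastic_action(q, state, rng, temperature=1.0):
    """Stochastic execution: sample an action from the softmax distribution."""
    probs = softmax_probs(q, state, temperature)
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]


if __name__ == "__main__":
    q = q_values()
    rng = random.Random(0)
    for s in STATES:
        print(s, deterministic_action(q, s), stochastic_action(q, s, rng))
```

Under deterministic execution the same state always yields the same action, whereas the sampled variant occasionally chooses a lower-valued action. The temperature parameter (hypothetical here) controls how often that happens: as it approaches zero, the stochastic behavior approaches the deterministic one.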
Keywords: reinforcement learning, intelligent tutoring systems, problem solving, worked example, pedagogical strategy