PharmaSimText: A Text-Based Educational Playground filled with RL-LLM Agents That Work Together Even in Disagreement
Abstract
There has been growing interest in developing simulated learners to enhance learning and teaching experiences in educational environments. However, existing work has primarily focused on structured environments that rely on meticulously crafted task representations, limiting learners' ability to generalize skills across tasks. In this paper, we aim to enhance simulated learners' generalization capabilities in less-structured text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that use natural language for state and action representations, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid RL-LLM agents that combine these two strategies to improve performance and generalizability. To support the development of these agents, we introduce PharmaSimText, a novel benchmark built from expert-evaluated GPT-4 generations derived from a virtual pharmacy environment designed for practicing diagnostic conversations. After experimenting with RL-based and LLM-based agents using GPT-4 and open-source LLMs, along with a wide range of strategies for combining them, we find that RL-based agents complete tasks reliably but ask low-quality diagnostic questions, whereas LLM-based agents ask better diagnostic questions but fail to complete tasks reliably. Specific variants of hybrid RL-LLM agents overcome both limitations. Our findings highlight the potential of combining RL- and LLM-based methods to create generalizable agents whose solutions are close to human ones, thanks to the LLM component, while remaining faithful to the controlled environment, thanks to the RL component. The source code and benchmark are available on GitHub (https://github.com/epfl-ml4ed/PharmaSimText).
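The hybrid strategy described above can be illustrated with a minimal sketch: an RL policy scores candidate natural-language actions, an LLM proposes diagnostically useful ones, and a simple rule arbitrates between them. This is a toy illustration under our own assumptions; `TextEnv`, `rl_q_values`, and `llm_suggest` are hypothetical placeholders, not the paper's actual API.

```python
# Toy sketch of a hybrid RL-LLM agent loop for a text-based diagnostic
# environment. All names (TextEnv, rl_q_values, llm_suggest) are
# illustrative placeholders, not the PharmaSimText API.

class TextEnv:
    """The agent may ask diagnostic questions, then must end the episode
    by proposing a diagnosis; full reward requires asking everything."""

    def __init__(self):
        self.questions = ["ask about symptoms", "ask about medication history"]
        self.asked = []

    def valid_actions(self):
        return [q for q in self.questions if q not in self.asked] + ["propose diagnosis"]

    def step(self, action):
        done = action == "propose diagnosis"
        reward = 1.0 if done and len(self.asked) == len(self.questions) else 0.0
        if not done:
            self.asked.append(action)
        return reward, done

def rl_q_values(actions):
    # Stand-in for a trained RL policy over natural-language actions:
    # good at ending the task, less interested in asking questions.
    return {a: (1.0 if a == "propose diagnosis" else 0.5) for a in actions}

def llm_suggest(actions):
    # Stand-in for an LLM prompted with the dialogue history: it prefers
    # asking the remaining diagnostic questions before concluding.
    questions = [a for a in actions if a != "propose diagnosis"]
    return questions[0] if questions else "propose diagnosis"

def hybrid_action(actions, margin=0.6):
    # One possible combination rule: accept the LLM's suggestion whenever
    # its Q-value is within `margin` of the RL optimum, so the agent asks
    # good questions while the RL component keeps it on task.
    q = rl_q_values(actions)
    suggestion = llm_suggest(actions)
    if q[suggestion] >= max(q.values()) - margin:
        return suggestion
    return max(q, key=q.get)

env = TextEnv()
done, total = False, 0.0
while not done:
    reward, done = env.step(hybrid_action(env.valid_actions()))
    total += reward
print(f"episode return: {total}")  # prints "episode return: 1.0"
```

In the paper's terms, the RL stand-in alone would end the episode immediately (completing the task without quality questions), while the LLM stand-in supplies the questions and the arbitration rule keeps the episode on task; the real agents, of course, learn and prompt rather than use these hard-coded stand-ins.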
Keywords: reinforcement learning, large language models, text-based educational environments, simulated learners
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.