PharmaSimText: A Text-Based Educational Playground filled with RL-LLM Agents That Work Together Even in Disagreement


Published Jan 15, 2025
Bahar Radmehr, Adish Singla, Tanja Käser

Abstract

There has been growing interest in developing simulated learners to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments that rely on meticulously crafted task representations, thereby limiting the learner's ability to generalize skills across tasks. In this paper, we aim to enhance simulated learners' generalization capabilities in less-structured, text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that use natural language for state and action representations, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid RL-LLM agents that combine these two strategies to improve performance and generalizability. To support the development of these agents, we introduce PharmaSimText, a novel benchmark built from expert-evaluated GPT-4 generations in a virtual pharmacy environment designed for practicing diagnostic conversations. Experimenting with RL-based and LLM-based agents, using GPT-4 as well as open-source LLMs, together with a wide range of strategies for combining them, we find that RL-based agents reliably complete tasks but ask lower-quality diagnostic questions, whereas LLM-based agents ask better diagnostic questions but often fail to complete tasks. Specific variations of hybrid RL-LLM agents overcome both limitations. Our findings highlight the potential of combining RL- and LLM-based methods to create generalizable agents whose solutions stay close to human ones through the LLM component while remaining faithful to the controlled environment through the RL component. The source code and benchmark are available on GitHub (https://github.com/epfl-ml4ed/PharmaSimText).
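To make the hybrid design concrete, the sketch below shows one plausible way an RL policy and an LLM could act together even when they disagree: the LLM's proposal overrides the RL choice only when the RL policy's Q-value margin is small. This is a minimal sketch under stated assumptions, not the paper's implementation; the names (hybrid_act, rl_policy, llm_propose, confidence_margin) and the resolution rule itself are illustrative.

    # Minimal sketch, assuming a trained RL policy that scores text actions
    # with Q-values and a prompted LLM that proposes an action string.
    # The resolution rule (LLM overrides RL only when the RL Q-value margin
    # is small) and all names here are hypothetical, not the paper's method.
    from typing import Callable, Dict, List


    def hybrid_act(
        state: str,
        valid_actions: List[str],
        rl_policy: Callable[[str, List[str]], Dict[str, float]],
        llm_propose: Callable[[str, List[str]], str],
        confidence_margin: float = 0.5,  # hypothetical threshold
    ) -> str:
        """Choose an action; on disagreement, defer to the LLM only if the
        RL policy is not strongly committed to its own choice."""
        q = rl_policy(state, valid_actions)
        rl_action = max(valid_actions, key=lambda a: q[a])
        llm_action = llm_propose(state, valid_actions)

        # Agreement, or an LLM proposal outside the valid action set:
        # keep the RL choice, which stays faithful to the environment.
        if llm_action == rl_action or llm_action not in valid_actions:
            return rl_action

        # Disagreement: override only when the RL preference is weak.
        margin = q[rl_action] - q[llm_action]
        return llm_action if margin < confidence_margin else rl_action


    # Toy usage with stub components standing in for a DQN and an LLM.
    if __name__ == "__main__":
        actions = ["ask about symptoms", "ask about medication", "give diagnosis"]
        rl_stub = lambda s, acts: {a: float(i) for i, a in enumerate(acts)}
        llm_stub = lambda s, acts: "ask about symptoms"
        print(hybrid_act("Patient: my child keeps coughing at night.",
                         actions, rl_stub, llm_stub))

One design choice worth noting in such a gating rule: because the RL component retains veto power whenever its Q-value margin is large, the combined agent can accept human-like question phrasing from the LLM without drifting away from actions the environment actually supports.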

How to Cite

Radmehr, B., Singla, A., & Käser, T. (2025). PharmaSimText: A Text-Based Educational Playground filled with RL-LLM Agents That Work Together Even in Disagreement. Journal of Educational Data Mining, 17(1), 1–40. https://doi.org/10.5281/zenodo.14681290

Keywords

reinforcement learning, large language models, text-based educational environments, simulated learners

Section
Extended Articles from the EDM 2024 Conference