TGRL: Teacher Guided Reinforcement Learning Algorithm for POMDPs







In many real-world problems, an agent must operate in an uncertain and partially observable environment. Because its information is incomplete, a policy trained directly on these restricted observations tends to perform poorly. In some scenarios, more information about the environment is available during training, and it can be used to find a superior policy. Because this privileged information is unavailable at deployment, however, such a policy cannot be deployed. The teacher-student paradigm overcomes this challenge by using the actions of the privileged (or teacher) policy as supervised-learning targets for the deployable (or student) policy, which operates from the restricted observation space. However, due to information asymmetry, it is not always feasible for the student to perfectly mimic the teacher. We provide a principled solution to this problem, wherein the student policy dynamically balances between following the teacher’s guidance and using reinforcement learning to solve the partially observed task directly. The proposed algorithm is evaluated on diverse domains and fares favorably against strong baselines.
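To make the idea of balancing teacher guidance against reinforcement learning concrete, the sketch below combines an RL loss with an imitation (cross-entropy) term weighted by a balance coefficient. It is only an illustration under stated assumptions and not the authors' implementation: the function names, the additive form of the loss, and the rule for updating the coefficient (comparing the return of a teacher-guided student with that of an RL-only student) are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(student_logits, teacher_action):
    """Cross-entropy between the student's action distribution and the teacher's action."""
    probs = softmax(student_logits)
    return -math.log(probs[teacher_action])


def combined_loss(rl_loss, student_logits, teacher_action, balance):
    """Assumed objective: RL loss plus a weighted imitation term.

    balance = 0 ignores the teacher; larger values follow it more closely.
    """
    return rl_loss + balance * imitation_loss(student_logits, teacher_action)


def update_balance(balance, guided_return, rl_only_return, step_size=0.1):
    """Hypothetical update rule (an assumption, not taken from the paper).

    Raise the teacher weight when the teacher-guided student earns more
    return than an RL-only student, lower it otherwise, and never let it
    go below zero.
    """
    return max(0.0, balance + step_size * (guided_return - rl_only_return))
```

With this rule, a teacher whose actions the student cannot usefully imitate from restricted observations would see its weight shrink toward zero, leaving the student to rely on the reinforcement learning term alone.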

This work was presented at the Reincarnating Reinforcement Learning Workshop at ICLR 2023.

Please cite our work using the BibTeX below.

title={{TGRL}: Teacher Guided Reinforcement Learning Algorithm for {POMDP}s},
author={Idan Shenfeld and Zhang-Wei Hong and Aviv Tamar and Pulkit Agrawal},
booktitle={Workshop on Reincarnating Reinforcement Learning at ICLR 2023},