TGRL: Teacher Guided Reinforcement Learning Algorithm for POMDPs

ICLR

Authors

Idan Shenfeld
Zhang-Wei Hong
Pulkit Agrawal
Aviv Tamar

Published on

05/05/2023

Categories

ICLR

In many real-world problems, an agent must operate in an uncertain and partially observable environment. Because its information is incomplete, a policy trained directly on these restricted observations tends to perform poorly. In some scenarios, more information about the environment is available during training, and it can be used to find a superior policy. Because this privileged information is unavailable at deployment, however, such a policy cannot be deployed. The teacher-student paradigm overcomes this challenge: the actions of the privileged (or teacher) policy serve as supervised-learning targets for training the deployable (or student) policy, which operates from the restricted observation space. However, due to information asymmetry, it is not always feasible for the student to perfectly mimic the teacher. We provide a principled solution to this problem, wherein the student policy dynamically balances between following the teacher’s guidance and using reinforcement learning to solve the partially observed task directly. The proposed algorithm is evaluated on diverse domains and fares favorably against strong baselines.
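The balancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the linear update rule, and the step size are all assumptions chosen for illustration. The sketch combines a reinforcement learning loss with an imitation loss on the teacher's action, and adjusts the weight on the imitation term according to whether teacher guidance appears to help the student's return.

```python
import math


def cross_entropy(student_probs, teacher_action):
    """Imitation loss: negative log-probability the student gives the teacher's action."""
    return -math.log(student_probs[teacher_action])


def combined_loss(rl_loss, student_probs, teacher_action, coef):
    """Weighted sum of an RL loss and an imitation loss.

    coef >= 0 weights the teacher term; coef = 0 recovers plain RL.
    """
    return rl_loss + coef * cross_entropy(student_probs, teacher_action)


def update_coef(coef, guided_return, rl_only_return, step_size=0.1):
    """Hypothetical balancing rule (an assumption, not taken from the paper).

    Raise coef when the teacher-guided student earns a higher return than a
    student trained with RL alone, lower it otherwise, and clip at zero.
    """
    return max(0.0, coef + step_size * (guided_return - rl_only_return))


if __name__ == "__main__":
    coef = 1.0
    # Teacher guidance helped in this (made-up) evaluation, so its weight grows.
    coef = update_coef(coef, guided_return=5.0, rl_only_return=3.0)
    loss = combined_loss(rl_loss=1.0, student_probs=[0.5, 0.5],
                         teacher_action=0, coef=coef)
    print(coef, loss)
```

With a rule of this kind, the weight on the teacher term shrinks toward zero when mimicking the teacher hurts the partially observed task, so the student falls back on reinforcement learning alone.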

This work was presented at the Reincarnating Reinforcement Learning Workshop at ICLR 2023.

Please cite our work using the BibTeX below.

@inproceedings{shenfeld2023tgrl,
  title={{TGRL}: Teacher Guided Reinforcement Learning Algorithm for {POMDP}s},
  author={Idan Shenfeld and Zhang-Wei Hong and Aviv Tamar and Pulkit Agrawal},
  booktitle={Workshop on Reincarnating Reinforcement Learning at ICLR 2023},
  year={2023},
  url={https://openreview.net/forum?id=kTqjkIvjj7}
}