Prompting Decision Transformer for Few-shot Policy Generalization
Authors
- Chuang Gan
- Joshua Tenenbaum
- Mengdi Xu
- Yikang Shen
- Shun Zhang
- Yuchen Lu
- Ding Zhao
Published on
07/23/2022
Abstract
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations and encodes task-specific information to guide policy generation. Our experiments on five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. Project page: https://mxu34.github.io/PromptDT/.
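The core idea is to prepend a short trajectory prompt, taken from a few-shot demonstration of the target task, to the usual Decision Transformer context so the model can infer the task without any gradient update. Below is a minimal sketch of how such an input could be assembled; it assumes each trajectory is stored as per-timestep arrays of returns-to-go, states, and actions, and the function name `build_prompt_dt_input` and its arguments are hypothetical, not the authors' released API.

```python
import numpy as np

def build_prompt_dt_input(demo, recent, prompt_len=5, context_len=20):
    """Concatenate a short trajectory prompt with the recent history.

    Each trajectory is a dict with keys 'returns_to_go' (T, 1),
    'states' (T, state_dim), and 'actions' (T, act_dim).
    The prompt is a segment of a few-shot demonstration for the target
    task; the model conditions on it at inference time (no finetuning).
    """
    def take(traj, length):
        return {k: np.asarray(v)[:length] for k, v in traj.items()}

    prompt = take(demo, prompt_len)      # task-specific trajectory prompt
    history = take(recent, context_len)  # most recent trajectory context

    # Prompt tokens are prepended so the causal Transformer can attend to
    # them when predicting the next action for the current task.
    return {
        k: np.concatenate([prompt[k], history[k]], axis=0)
        for k in ("returns_to_go", "states", "actions")
    }
```

The concatenated sequence is then tokenized per modality and fed to a causal Transformer that predicts actions, in the same way as a standard Decision Transformer.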
Please cite our work using the BibTeX below.
@inproceedings{DBLP:conf/icml/XuSZLZTG22,
  author    = {Mengdi Xu and Yikang Shen and Shun Zhang and Yuchen Lu and Ding Zhao and Joshua B. Tenenbaum and Chuang Gan},
  title     = {Prompting Decision Transformer for Few-Shot Policy Generalization},
  booktitle = {ICML},
  year      = {2022},
  pages     = {24631-24645},
  url       = {https://proceedings.mlr.press/v162/xu22g.html}
}