Imitation learning from observations

Authors

  • Chuang Gan
  • Chao Yang
  • Xiaojian Ma
  • Wenbing Huang
  • Fuchun Sun
  • Huaping Liu
  • Junzhou Huang

Published on

12/10/2019

Categories

Deep Learning

In this post, we share a brief Q&A with the authors of the paper "Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement," presented at NeurIPS 2019.

What is your paper about?

In this paper, we study Learning from Observations (LfO) for imitation learning, where the imitator has access to state-only demonstrations, in contrast to Learning from Demonstration (LfD), which involves both action and state supervision. We investigate LfO and its differences from LfD from both theoretical and practical perspectives. The gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert, and this disagreement can be minimized in a model-free way.
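As a rough sketch of where this gap comes from (the notation here is ours, for illustration, using the occupancy-measure view rather than the paper's exact GAIL-style objectives): writing \(\rho_\pi\) and \(\rho_E\) for the imitator's and the expert's occupancy measures, the chain rule of the KL divergence splits state-action-transition matching (the LfD-style objective) into state-transition matching (the LfO-style objective) plus a conditional term over actions, which is exactly the inverse dynamics disagreement:

\[
D_{\mathrm{KL}}\big(\rho_\pi(s,a,s') \,\|\, \rho_E(s,a,s')\big)
= D_{\mathrm{KL}}\big(\rho_\pi(s,s') \,\|\, \rho_E(s,s')\big)
+ \mathbb{E}_{\rho_\pi(s,s')}\Big[ D_{\mathrm{KL}}\big(\rho_\pi(a \mid s,s') \,\|\, \rho_E(a \mid s,s')\big) \Big]
\]

Driving the last term to zero makes the LfO objective coincide with the LfD objective.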

What is new and significant about your paper?

We highlight the novel role of inverse dynamics disagreement (IDD) in imitation learning. We prove that IDD accounts for the optimization gap between GAIL and GAIfO, propose a model-free solution for mitigating IDD, and verify the effectiveness of our approach against other Learning-from-Observations methods on a variety of tasks.
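One way to see why a model-free handle on IDD is plausible (our illustration, using the standard entropy decomposition of the KL divergence, not a derivation from the paper): the IDD term splits into a negative causal entropy of the imitator's own inverse dynamics, which can be estimated from the imitator's rollouts without learning a dynamics model, plus a cross-entropy term against the expert:

\[
D_{\mathrm{KL}}\big(\rho_\pi(a \mid s,s') \,\|\, \rho_E(a \mid s,s')\big)
= -\,\mathcal{H}\big(\rho_\pi(a \mid s,s')\big)
+ \mathcal{H}\big(\rho_\pi(a \mid s,s'),\, \rho_E(a \mid s,s')\big)
\]

As the abstract below states, the paper reveals the upper bound of this gap through a negative causal entropy, which can be minimized in a model-free way.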

What will the impact be on the real world?

Viewed from the divergence-minimization perspective on imitation learning, inverse dynamics disagreement, the f-divergence between the inverse dynamics models of the imitator and the expert, plays a notable role in Learning from Observations. In real-world settings, for example a robot learning to cook with internet videos as the expert demonstrations, the inverse dynamics disagreement between the robot and the human is a key component for further research.

What would be the next steps?

Further exploration of combining our work with representation learning to enable imitation across different domains could be a promising direction for future work.

Paper Abstract

This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical in leveraging previously inapplicable resources (e.g., videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its differences from LfD from both theoretical and practical perspectives. We first prove that the gap between LfD and LfO actually lies in the disagreement of inverse dynamics models between the imitator and the expert, if following the modeling approach of GAIL. More importantly, the upper bound of this gap is revealed by a negative causal entropy which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM), which enhances the conventional LfO method by further bridging the gap to LfD. Considerable empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
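For readers who want a concrete picture of the state-only adversarial setup that IDDM builds on, below is a minimal sketch in PyTorch of a GAIfO-style transition discriminator, which scores (s, s') pairs so that no expert actions are needed. This is not the authors' released code; all names, the architecture, and the entropy-bonus coefficient are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TransitionDiscriminator(nn.Module):
        """Scores state transitions (s, s') as expert-like vs. imitator-like."""
        def __init__(self, state_dim: int, hidden_dim: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * state_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, 1),  # logit; higher means more expert-like
            )

        def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([s, s_next], dim=-1))

    def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
        """GAN-style classification loss on transitions only (no actions)."""
        bce = nn.BCEWithLogitsLoss()
        e_logits = disc(expert_s, expert_s_next)
        a_logits = disc(agent_s, agent_s_next)
        return (bce(e_logits, torch.ones_like(e_logits))
                + bce(a_logits, torch.zeros_like(a_logits)))

    def imitation_reward(disc, s, s_next, action_entropy, beta=1e-2):
        """Policy reward: fool the discriminator, plus a causal-entropy bonus
        in the spirit of IDDM's model-free bound (beta is a hypothetical
        coefficient, not taken from the paper)."""
        with torch.no_grad():
            d = torch.sigmoid(disc(s, s_next)).squeeze(-1)
            reward = -torch.log(1.0 - d + 1e-8)  # high when transitions look expert-like
        return reward + beta * action_entropy

The policy itself would be trained with any standard on-policy RL algorithm (e.g., TRPO or PPO) on this reward; the causal-entropy bonus is where an IDDM-style method departs from plain GAIfO.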

Please cite our work using the BibTeX below.

@misc{yang2019imitation,
    title={Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement},
    author={Chao Yang and Xiaojian Ma and Wenbing Huang and Fuchun Sun and Huaping Liu and Junzhou Huang and Chuang Gan},
    year={2019},
    eprint={1910.04417},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}