Research

Extending the WILDS Benchmark for Unsupervised Adaptation

Authors

Shiori Sagawa
Pang Wei Koh
Tony Lee
Irena Gao
Sang Michael Xie
Kendrick Shen
Ananya Kumar
Weihua Hu
Michihiro Yasunaga
Henrik Marklund
Sara Beery
Etienne David
Ian Stavness
Wei Guo
Jure Leskovec
Kate Saenko
Tatsunori Hashimoto
Sergey Levine
Chelsea Finn
Percy Liang

Published on

04/29/2022

Categories

ICLR

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as identical evaluation metrics. We systematically benchmark state-of-the-art methods that use unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development, we provide an open-source package that automates data loading and contains the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
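As a rough illustration of one family of methods named above, the following toy sketch shows the general idea behind self-training: fit a model on labeled source data, pseudo-label the unlabeled target points it is confident about, and refit. It is not the implementation benchmarked in the paper or shipped in the WILDS package. The 1-D nearest-centroid model, the margin threshold, the data values, and all function names are assumptions made for illustration.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Fit a 1-D nearest-centroid model: the mean feature value per class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in sorted(set(ys))}

def predict(centroids, x):
    """Return (label, margin): the nearest class and its gap to the runner-up."""
    dists = sorted((abs(x - mu), c) for c, mu in centroids.items())
    (d_best, label), (d_next, _) = dists[0], dists[1]
    return label, d_next - d_best

def self_train(xs, ys, unlabeled, margin_threshold=1.0, rounds=3):
    """Hypothetical self-training loop using confidence-thresholded pseudo-labels."""
    centroids = fit_centroids(xs, ys)
    for _ in range(rounds):
        # Pseudo-labels are recomputed from scratch each round with the current model.
        scored = [(x, *predict(centroids, x)) for x in unlabeled]
        confident = [(x, label) for x, label, margin in scored if margin >= margin_threshold]
        # Refit on the labeled source data plus the confident pseudo-labeled target data.
        centroids = fit_centroids(
            xs + [x for x, _ in confident],
            ys + [label for _, label in confident],
        )
    return centroids

# Toy source data (labeled) and target data (unlabeled, shifted by +1.8).
source_x = [-0.5, 0.0, 0.5, 3.5, 4.0, 4.5]
source_y = [0, 0, 0, 1, 1, 1]
target_unlabeled = [1.3, 1.8, 2.3, 5.3, 5.8, 6.3]

source_only = fit_centroids(source_x, source_y)
adapted = self_train(source_x, source_y, target_unlabeled)
```

In this toy setup the target point 2.3 (a shifted class-0 example) lies on the wrong side of the source-only decision boundary, and the adapted model classifies it as class 0. This says nothing about how such methods behave on real shifts, where the paper reports limited success.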

Please cite our work using the BibTeX below.

@inproceedings{
sagawa2022extending,
title={Extending the {WILDS} Benchmark for Unsupervised Adaptation},
author={Shiori Sagawa and Pang Wei Koh and Tony Lee and Irena Gao and Sang Michael Xie and Kendrick Shen and Ananya Kumar and Weihua Hu and Michihiro Yasunaga and Henrik Marklund and Sara Beery and Etienne David and Ian Stavness and Wei Guo and Jure Leskovec and Kate Saenko and Tatsunori Hashimoto and Sergey Levine and Chelsea Finn and Percy Liang},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=z7p2V6KROOV}
}