Temperature Schedules for self-supervised contrastive methods on long-tail data



  • Anna Kukleva
  • Moritz Böhle
  • Bernt Schiele
  • Hilde Kuehne
  • Christian Rupprecht

Most approaches for self-supervised learning (SSL) are optimised on curated balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibits long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter τ in the contrastive loss, by analysing the loss through the lens of average distance maximisation, and find that a large τ emphasises group-wise discrimination, whereas a small τ leads to a higher degree of instance discrimination. While τ has thus far been treated exclusively as a constant hyperparameter, in this work, we propose to employ a dynamic τ and show that a simple cosine schedule can yield significant improvements in the learnt representations. Such a schedule results in a constant ‘task switching’ between an emphasis on instance discrimination and group-wise discrimination and thereby ensures that the model learns both group-wise features, as well as instance-specific details. Since frequent classes benefit from the former, while infrequent classes require the latter, we find this method to consistently improve separation between the classes in long-tail data without any additional computational cost.
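The cosine schedule described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact implementation: the values of `tau_min`, `tau_max`, and the period are placeholder hyperparameters, and the schedule simply oscillates the temperature between instance discrimination (small τ) and group-wise discrimination (large τ).

```python
import math

def cosine_tau(step, period, tau_min=0.1, tau_max=1.0):
    """Cosine temperature schedule oscillating between tau_min and tau_max.

    NOTE: tau_min, tau_max and period are illustrative placeholder values,
    not the hyperparameters used in the paper.
    """
    # The cosine term sweeps from 1 to -1 and back over one period, so tau
    # sweeps tau_max -> tau_min -> tau_max, alternating the emphasis between
    # group-wise discrimination (large tau) and instance discrimination
    # (small tau).
    return tau_min + 0.5 * (tau_max - tau_min) * (1 + math.cos(2 * math.pi * step / period))

print(cosine_tau(0, 100))   # start of period: tau_max (group-wise emphasis)
print(cosine_tau(50, 100))  # mid-period: tau_min (instance-level emphasis)
```

The resulting τ would then be plugged into the contrastive (e.g. InfoNCE) loss at each training step, so the 'task switching' between the two objectives comes for free, with no extra computation.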

Please cite our work using the BibTeX below.

@inproceedings{
kukleva2023temperature,
title={Temperature Schedules for self-supervised contrastive methods on long-tail data},
author={Anna Kukleva and Moritz B{\"o}hle and Bernt Schiele and Hilde Kuehne and Christian Rupprecht},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023}
}