Differentiable Top-k Classification Learning
Authors
- Felix Petersen
- Hilde Kuehne
- Christian Borgelt
- Oliver Deussen
Published on
02/09/2022
The top-k classification accuracy is one of the core metrics in machine learning. Here, k is conventionally a positive integer, such as 1 or 5, leading to top-1 or top-5 training objectives. In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a family of differentiable top-k cross-entropy classification losses. This allows training while not only considering the top-1 prediction, but also, e.g., the top-2 and top-5 predictions. We evaluate the proposed losses for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models.
Please cite our work using the BibTeX below.
@InProceedings{pmlr-v162-petersen22a,
title = {Differentiable Top-k Classification Learning},
author = {Petersen, Felix and Kuehne, Hilde and Borgelt, Christian and Deussen, Oliver},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {17656--17668},
year = {2022},
editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v162/petersen22a/petersen22a.pdf},
url = {https://proceedings.mlr.press/v162/petersen22a.html},
abstract = {The top-k classification accuracy is one of the core metrics in machine learning. Here, k is conventionally a positive integer, such as 1 or 5, leading to top-1 or top-5 training objectives. In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a family of differentiable top-k cross-entropy classification losses. This allows training while not only considering the top-1 prediction, but also, e.g., the top-2 and top-5 predictions. We evaluate the proposed losses for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models.}
}