Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning



  • Ligeng Zhu
  • Hongzhou Lin
  • Yao Lu
  • Yujun Lin
  • Song Han


Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data. Since the data is distributed across many edge devices through wireless / long-distance connections, federated learning suffers from inevitably high communication latency. However, the latency issue is largely overlooked in the current literature [15], and existing approaches such as FedAvg [27] become less efficient as the latency increases. To overcome this problem, we propose Delayed Gradient Averaging (DGA), which delays the averaging step to improve efficiency and allows local computation to proceed in parallel with communication. We theoretically prove that DGA attains a similar convergence rate to FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. Specifically, we benchmark the training speed on various vision (CIFAR, ImageNet) and language tasks (Shakespeare), with both IID and non-IID partitions, and show that DGA brings a 2.55× to 4.07× speedup. Moreover, we build a 16-node Raspberry Pi cluster and show that DGA consistently speeds up real-world federated learning applications.
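To illustrate the core idea of overlapping communication with computation, here is a toy scalar simulation of delayed gradient averaging. It is a hypothetical sketch, not the paper's exact algorithm (which involves local steps on real models): the function `dga_toy`, the quadratic per-worker objectives, and all parameter names are assumptions made for illustration. Each simulated worker applies its local gradient immediately and launches an all-reduce; the resulting global average only arrives `delay` steps later, at which point the worker swaps out its own stale gradient for the stale global average, so the network round-trip overlaps with `delay` steps of local computation.

```python
import numpy as np

def dga_toy(targets, steps=300, lr=0.1, delay=4):
    """Toy simulation of the delayed-averaging idea (hypothetical sketch).

    Worker i minimizes f_i(w) = 0.5 * (w - targets[i])**2 on its own scalar
    model. At step t each worker takes a local step right away; the global
    average of the step-(t - delay) gradients is applied as a correction
    once it "arrives", instead of blocking on the network every step.
    """
    targets = np.asarray(targets, dtype=float)
    w = np.zeros_like(targets)   # one scalar model per worker
    history = []                 # local gradients still "in flight"
    for t in range(steps):
        g = w - targets          # local gradients at step t
        w = w - lr * g           # immediate local update, no blocking
        history.append(g)
        if t >= delay:           # average from step t - delay has arrived
            g_stale = history[t - delay]
            # swap each worker's stale local gradient for the global average
            w = w - lr * (g_stale.mean() - g_stale)
    return w
```

In this sketch the correction term averages to zero across workers, so the mean model follows plain gradient descent on the average objective and converges to the mean of the targets, while per-worker deviations stay bounded by the warm-up drift. The real algorithm's convergence guarantee is, of course, the one proved in the paper.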

Please cite our work using the BibTeX below.

@inproceedings{zhu2021delayed,
  title={Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning},
  author={Ligeng Zhu and Hongzhou Lin and Yao Lu and Yujun Lin and Song Han},
  booktitle={Advances in Neural Information Processing Systems},
  editor={A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan},
  year={2021}
}