Deep Audio Priors Emerge From Harmonic Convolutional Networks

Audio Processing



Convolutional neural networks (CNNs) excel in image recognition and generation. Among many efforts to explain their effectiveness, experiments show that CNNs carry strong inductive biases that capture natural image priors. Do deep networks also have inductive biases for audio signals? In this paper, we empirically show that current network architectures for audio processing show little evidence of capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks distill priors in audio signals by explicitly exploiting their harmonic structure. This is done by engineering the kernel to be supported on sets of harmonic series, rather than on local neighborhoods as in standard convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks. With Harmonic Convolution, networks also generalize better in sound source separation.
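To make the idea of a kernel "supported by sets of harmonic series" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a magnitude spectrogram of shape (frequency bins, time frames) with linearly spaced bins, and it simplifies the operation to a single time frame of support with one weight per harmonic. The function name `harmonic_conv` and the `weights` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def harmonic_conv(spec, weights):
    """Simplified harmonic aggregation over a spectrogram (illustrative only).

    spec:    array of shape (n_freq, n_time), linearly spaced frequency bins.
    weights: array of shape (n_harmonics,), one weight per harmonic k = 1, 2, ...

    The output at bin f is the weighted sum of the input at bins f, 2f, 3f, ...,
    so the kernel's support is a harmonic series rather than a local neighborhood.
    """
    n_freq, _ = spec.shape
    out = np.zeros_like(spec, dtype=float)
    f = np.arange(n_freq)
    for k, w in enumerate(weights, start=1):
        src = k * f                # bin index of the k-th harmonic of each bin f
        valid = src < n_freq       # drop harmonics that fall above the top bin
        out[f[valid]] += w * spec[src[valid]]
    return out

# Toy input: a "note" whose fundamental sits at bin 10, with harmonics at 20 and 30.
spec = np.zeros((64, 4))
spec[[10, 20, 30], :] = 1.0
out = harmonic_conv(spec, np.ones(3))
```

In this toy case the response peaks at the fundamental bin, because that is the only bin whose harmonic series lines up with all three active bins. A learned version would treat `weights` as trainable parameters and would also extend the kernel along the time axis.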

Please cite our work using the BibTeX below.

title={Deep Audio Priors Emerge From Harmonic Convolutional Networks},
author={Zhoutong Zhang and Yunyun Wang and Chuang Gan and Jiajun Wu and Joshua B. Tenenbaum and Antonio Torralba and William T. Freeman},
booktitle={International Conference on Learning Representations},