Deep Audio Priors Emerge From Harmonic Convolutional Networks

Authors

Zhoutong Zhang
Yunyun Wang
Chuang Gan
Jiajun Wu
Joshua Tenenbaum
Antonio Torralba
William T. Freeman

Published on

09/25/2019

Categories

Audio Processing, Deep Learning, ICLR

Convolutional neural networks (CNNs) excel at image recognition and generation. Among the many efforts to explain their effectiveness, experiments show that CNNs carry strong inductive biases that capture natural image priors. Do deep networks also have inductive biases for audio signals? In this paper, we show empirically that current network architectures for audio processing show no strong evidence of capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks distill priors in audio signals by explicitly exploiting their harmonic structure. This is done by engineering the kernel to be supported on sets of harmonic series rather than on the local neighborhoods used by standard convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks. Networks with Harmonic Convolution also generalize better in sound source separation.
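To make the idea of a kernel "supported on a harmonic series" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `harmonic_conv_1d`, the one-weight-per-harmonic kernel, and the zero treatment of out-of-range bins are all assumptions made for illustration, and the time axis is left unconvolved. The sketch only shows how the output at frequency bin f gathers the input at bins f, 2f, 3f, ... instead of at neighboring bins.

```python
import numpy as np


def harmonic_conv_1d(spec, weights):
    """Illustrative harmonic aggregation along the frequency axis.

    spec:    array of shape (F, T), e.g. a magnitude spectrogram.
    weights: array of shape (K,), one weight per harmonic k = 1..K
             (hypothetical kernel layout, chosen for this sketch).

    Output bin f is sum_k weights[k-1] * spec[k * f], where bins
    beyond the top of the spectrogram contribute zero.
    """
    spec = np.asarray(spec, dtype=float)
    weights = np.asarray(weights, dtype=float)
    num_bins = spec.shape[0]
    out = np.zeros_like(spec)
    for f in range(num_bins):
        for k, w in enumerate(weights, start=1):
            src = k * f  # k-th harmonic of bin f
            if src < num_bins:
                out[f] += w * spec[src]
    return out


# Toy input: a single frame with energy at bins 2, 4 and 6,
# i.e. a fundamental at bin 2 plus two overtones.
spec = np.zeros((8, 1))
spec[[2, 4, 6], 0] = 1.0
out = harmonic_conv_1d(spec, [1.0, 1.0, 1.0])
```

In this toy case the response is largest at bin 2, because that is the only bin whose first three harmonics all carry energy. A standard local kernel of the same size would instead sum bins f-1, f and f+1, which never sees more than one of the three partials at once.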

Please cite our work using the BibTeX below.

@inproceedings{Zhang2020Deep,
title={Deep Audio Priors Emerge From Harmonic Convolutional Networks},
author={Zhoutong Zhang and Yunyun Wang and Chuang Gan and Jiajun Wu and Joshua B. Tenenbaum and Antonio Torralba and William T. Freeman},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=rygjHxrYDB}
}