AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks

Authors

Chun-Chen Tu
Paishun Ting
Pin-Yu Chen
Sijia Liu
Huan Zhang
Jinfeng Yi
Cho-Jui Hsieh
Shin-Ming Cheng

Published on

05/30/2018

Categories

Optimization, Robustness

Recent studies have shown that adversarial examples for state-of-the-art image classifiers based on deep neural networks (DNNs) can be easily generated when the target model is transparent to the attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only observe the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks for efficient black-box attacks: (i) an adaptive random gradient estimation strategy that balances query counts and distortion, and (ii) an attack-acceleration component that is either an autoencoder trained offline with unlabeled data or a bilinear resizing operation. Experimental results suggest that applying AutoZOOM to a state-of-the-art black-box attack (ZOO) achieves a significant reduction in model queries without sacrificing the attack success rate or the visual quality of the resulting adversarial examples. In particular, compared to the standard ZOO method, AutoZOOM consistently reduces the mean query count for finding successful adversarial examples (or for reaching the same distortion level) by at least 93% on the MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
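
The two building blocks named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a generic averaged random-direction finite-difference estimator, a toy quadratic loss standing in for the black-box model, and nearest-neighbour upsampling (`np.kron`) standing in for the bilinear resize or autoencoder decoder. The names `upsample`, `estimate_gradient`, `toy_attack`, and the parameters `q`, `beta`, `scale`, `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np

def upsample(z, factor):
    """Stand-in decoder (assumption): nearest-neighbour upsampling of a
    low-dimensional perturbation z to full image resolution."""
    return np.kron(z, np.ones((factor, factor)))

def estimate_gradient(loss, z, q, beta, rng, scale=None):
    """Average q random-direction finite differences of `loss` at z.

    Uses q + 1 loss evaluations: one at z and one per random direction.
    `scale` defaults to the dimension of z (an assumption of this sketch).
    """
    scale = z.size if scale is None else scale
    base = loss(z)
    grad = np.zeros_like(z)
    for _ in range(q):
        u = rng.standard_normal(z.shape)
        u /= np.linalg.norm(u)  # unit-norm random direction
        grad += (loss(z + beta * u) - base) / beta * u
    return scale * grad / q

def toy_attack(x0, target, factor=2, q=8, beta=1e-3, lr=2.0, steps=100, seed=0):
    """Minimise a toy black-box loss over the reduced space, counting queries."""
    rng = np.random.default_rng(seed)
    queries = 0

    def loss(z):
        # Toy "black box": only loss values are visible to the optimizer.
        nonlocal queries
        queries += 1
        return float(np.mean((x0 + upsample(z, factor) - target) ** 2))

    z = np.zeros((x0.shape[0] // factor, x0.shape[1] // factor))
    start = loss(z)
    for _ in range(steps):
        z -= lr * estimate_gradient(loss, z, q, beta, rng)
    return start, loss(z), queries

if __name__ == "__main__":
    x0 = np.zeros((8, 8))
    target = x0 + upsample(np.full((4, 4), 0.1), 2)
    start, final, queries = toy_attack(x0, target)
    print(f"loss {start:.4g} -> {final:.4g} using {queries} queries")
```

In this sketch the search runs over a 4x4 perturbation instead of the full 8x8 image, so each gradient estimate has fewer coordinates to recover from the same number of queries. The parameter `q` controls the trade-off between queries per step and estimate quality; an adaptive strategy would vary `q` during the attack, which this sketch does not attempt.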

Please cite our work using the BibTeX below.

@article{DBLP:journals/corr/abs-1805-11770,
  author    = {Chun{-}Chen Tu and
               Pai{-}Shun Ting and
               Pin{-}Yu Chen and
               Sijia Liu and
               Huan Zhang and
               Jinfeng Yi and
               Cho{-}Jui Hsieh and
               Shin{-}Ming Cheng},
  title     = {AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking
               Black-box Neural Networks},
  journal   = {CoRR},
  volume    = {abs/1805.11770},
  year      = {2018},
  url       = {http://arxiv.org/abs/1805.11770},
  archivePrefix = {arXiv},
  eprint    = {1805.11770},
  timestamp = {Sat, 31 Aug 2019 16:23:04 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1805-11770.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}