Authors

Published on

05/01/2020

We present a novel black-box adversarial attack algorithm with state-of-the-art model evasion rates for query efficiency under ℓ∞ and ℓ2 metrics. It exploits a sign-based, rather than magnitude-based, gradient estimation approach that shifts the gradient estimation from continuous to binary black-box optimization. It constructs queries adaptively to estimate the gradient, each query relying on the previous one, rather than re-estimating the gradient at each step with randomly constructed queries. Its reliance on sign bits yields a smaller memory footprint, and it requires neither hyperparameter tuning nor dimensionality reduction. Further, its theoretical performance is guaranteed, and it can characterize adversarial subspaces better than white-box gradient-aligned subspaces. On two public black-box attack challenges and a model robustly trained against transfer attacks, the algorithm's evasion rates surpass all submitted attacks. For a suite of published models, the algorithm is 3.8× less failure-prone while spending 2.5× fewer queries than the best combination of state-of-the-art algorithms. For example, it evades a standard MNIST model using just 12 queries on average. Similar performance is observed on a standard ImageNet model, with an average of 579 queries.
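To make the idea of sign-based, adaptive gradient estimation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `estimate_gradient_sign`, the parameters `delta` and `max_queries`, the chunk-halving flip schedule, and the toy linear loss are all hypothetical choices made for this example. The sketch keeps a vector of sign bits, flips a block of them, and keeps the flip only if a single loss query improves, so each query builds on the result of the previous one.

```python
import numpy as np

def estimate_gradient_sign(loss, x, delta=1e-3, max_queries=64):
    """Estimate sign(grad loss(x)) using loss queries only (illustrative sketch).

    `loss` is a black-box callable; `delta` and `max_queries` are assumed
    parameters for this example, not values taken from the paper.
    """
    n = x.size
    s = np.ones(n)                    # initial guess: every sign bit is +1
    best = loss(x + delta * s)        # loss along the current sign vector
    queries = 1
    chunk = n                         # start by flipping all bits at once
    while queries < max_queries and chunk >= 1:
        for start in range(0, n, chunk):
            if queries >= max_queries:
                break
            s[start:start + chunk] *= -1      # tentatively flip one block
            val = loss(x + delta * s)
            queries += 1
            if val > best:
                best = val                    # flip helped: keep it
            else:
                s[start:start + chunk] *= -1  # flip did not help: revert
        chunk //= 2                   # refine with smaller blocks
    return s, queries

# Toy black box: a linear loss whose true gradient is g (assumed for the demo).
g = np.array([3.0, -1.0, 2.0, -5.0, 0.5, -0.25, 4.0, -2.0])
signs, used = estimate_gradient_sign(lambda z: float(g @ z), np.zeros(8))
print(signs, used)
```

On this toy linear loss the recovered sign vector matches the sign of the true gradient. In an ℓ∞ attack, such a sign vector could then be used for a step of the form `x + eps * signs`, with `eps` the perturbation budget (again an assumed name).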

Please cite our work using the BibTeX below.

@inproceedings{Al-Dujaili2020Sign,
  title={Sign Bits Are All You Need for Black-Box Attacks},
  author={Abdullah Al-Dujaili and Una-May O'Reilly},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=SygW0TEFwH}
}