A content-aware attack generator for AI cybersecurity

Adversarial Machine Learning


  • Jie Chen
  • Huy Phan
  • Yi Xie
  • Siyu Liao
  • Bo Yuan

Despite their tremendous success in many fields, deep neural networks (DNNs) remain extremely vulnerable to adversarial attacks: methods that cause an intended misclassification by adding imperceptible perturbations to legitimate inputs. Researchers have developed numerous adversarial attack methods, but from the perspective of practical deployment these methods suffer from several drawbacks, such as long attack-generation time, high memory cost, insufficient robustness, and low transferability. In our paper, CAG: A Real-time Low-cost Enhanced-robustness High-transferability Content-aware Adversarial Attack Generator, published at AAAI 2020, we propose a method that achieves real-time, low-cost, enhanced-robustness, and high-transferability adversarial attacks.

Stress-testing AI systems

Why are we publishing new methods for attacking AI systems? Because the only way to fortify AI systems is to discover their vulnerabilities. Such stress testing is the core of the field of adversarial robustness. Here we define adversarial attacks as actions, such as perturbations of an image imperceptible to the human eye, that cause an intentional misclassification by an AI model. For example, in adversarial work at UC Berkeley, by adding just a small amount of noise (in the form of spray paint or stickers) to a stop sign, researchers steered the output of a DNN to a desired misclassification. Such vulnerabilities speak to the long road ahead for self-driving systems and other systems where model failure has serious consequences. For a fuller primer, see Explaining and Harnessing Adversarial Examples (Goodfellow, Shlens, Szegedy).

Taking the perspective of the attacker, the practical deployment of adversarial attacks suffers from several drawbacks: long adversarial-example generation time, high memory cost for launching the attack, insufficient robustness against defense methods, and low transferability in black-box attack scenarios. However, these drawbacks can be overcome, as we demonstrate with the Content-aware Adversarial Attack Generator (CAG).

Content-aware Adversarial Attack Generator (CAG)

The Content-aware Adversarial Attack Generator (CAG) is a deep neural network (DNN) that generates adversarial examples to attack other DNNs. Compared to previous methods, CAG has several advantages:

  1. CAG reduces generation time by at least 500x, allowing it to attack other DNNs in real time instead of taking hours to generate adversarial examples.
  2. CAG reduces training cost and storage cost by 1000x: the number of generative models needed drops from n to 1.
  3. CAG increases the attack strength of its adversarial examples, so it can attack a wide range of DNNs with a higher attack success rate. The novel aspect is that CAG uses an embedding layer and localization information to inform itself about the content of the original image: it is aware of the correct class of the image, the desired target class, and the position of the object in the image. Hence, “content-aware.”

How it works

To create a DNN that can synthesize adversarial images, we first need an existing DNN architecture that maps images to images. We chose a convolutional neural network called U-Net, used together with an embedding layer; both are initialized randomly. CAG takes in a clean image, along with the correct label and the target label, and outputs an adversarial image that is then ready to fool many classifiers.
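The interface just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-conv `TinyGenerator` is a stand-in for the real U-Net, and the perturbation scale is an arbitrary choice of ours.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Single-conv stand-in for the U-Net generator (illustrative only)."""
    def __init__(self, in_channels, out_channels=3):
        super().__init__()
        self.net = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.tanh(self.net(x))  # noise bounded in [-1, 1]

gen = TinyGenerator(in_channels=3)
clean = torch.rand(2, 3, 32, 32)              # batch of clean images in [0, 1]
noise = gen(clean)
adv = (clean + 0.05 * noise).clamp(0.0, 1.0)  # small perturbation, valid pixel range
```

The key property is that the generator produces an output of the same spatial shape as its input, so the adversarial image can be formed by simple addition and clamping.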

Next, we extract the corresponding slices (in the form of tensors) of the original label and the target label from the embedding layer. You can think of this embedding layer as a dictionary, where the label is the key and the tensor is the value.
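The dictionary analogy maps directly onto PyTorch's `nn.Embedding`: indexing it with a class label returns the learned vector for that class. The class count and embedding size below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

# The embedding layer as a learned dictionary: class index -> learned tensor.
num_classes, emb_dim = 10, 8
embedding = nn.Embedding(num_classes, emb_dim)

original_label = torch.tensor([3])       # hypothetical correct class
target_label = torch.tensor([7])         # hypothetical target class
orig_slice = embedding(original_label)   # looks up row 3 of the table
tgt_slice = embedding(target_label)      # looks up row 7 of the table
```

Because the table is a trainable parameter, these "dictionary values" are updated by backpropagation during training, which is what lets similar classes drift toward each other.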

Next, we use the Class Activation Mapping (CAM) technique to generate localization information for the object in the image. We then build the input tensor by concatenating the original image, the label embeddings, and the localization information along the channel dimension. The output of CAG is adversarial noise. To increase the robustness of this noise, we drop out parts of it during the training phase. The noise is then added to the original image to create an adversarial image. We want the adversarial image to be classified as the target class while keeping its CAM the same as the original image's, so we formulate a loss function that satisfies both constraints.
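A simplified sketch of this training step, under our own assumptions: the classifier is a placeholder, the dropout rate is arbitrary, and we omit the CAM-consistency term, keeping only the targeted-classification part of the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attack_loss(noise, image, target_label, classifier, drop_p=0.3):
    noise = F.dropout(noise, p=drop_p)            # randomly zero parts of the noise
    adv = (image + noise).clamp(0.0, 1.0)         # keep pixels in the valid range
    logits = classifier(adv)
    return F.cross_entropy(logits, target_label)  # push toward the target class

# Placeholder victim classifier for illustration.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(4, 3, 32, 32)
noise = 0.05 * torch.randn_like(image)
loss = attack_loss(noise, image, torch.tensor([1, 2, 3, 4]), classifier)
```

Minimizing this loss with respect to the generator's parameters trains CAG to emit noise that survives partial dropout, which is what discourages overfitting to any single classifier.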

The most interesting thing we found is that the embedding layers actually carry useful information. At epoch 0, class embeddings are initialized and distributed randomly. By epoch 500, however, embeddings of similar classes lie close to each other, such as car-truck, horse-deer, and dog-cat. The small distances between similar classes suggest that our approach learns a useful set of embeddings.

Choosing a framework

We think it is important to choose deep learning libraries and tools that require less boilerplate code, so we can focus on the research itself. We use PyTorch Lightning, a framework that takes care of the boilerplate and provides a highly reproducible standard for ML research pipelines. We carefully examined many related papers to determine their strengths and weaknesses, listed the drawbacks of previous architectures, got to the heart of the problems, and thought about how to improve on them. For example, adversarial examples generated by generative models have low transferability: they overfit to the particular classifier used during training. We therefore tried applying dropout to different parts of the CAG architecture and, after many experiments, concluded that applying dropout directly to the noise produces the best result. The biggest challenge we encountered is that training DNNs takes a lot of time, so we could not iterate on new architectures as fast as we wanted to.

Experimental Results

We used two popular datasets, CIFAR-10 and ImageNet, for all experiments. We compared our attack against other state-of-the-art attacks on these criteria: attack success rate, classifier accuracy, L2 norm, time, storage cost, and transferability. The most noteworthy results are that, compared with iterative attacks, our attack achieves a 1000x speedup; and compared with other generative attacks, our attack requires only 1 model instead of 1000 to comprehensively attack all classes.


The decision to publish on adversarial robustness is always a difficult one because, in the short term, methods like CAG can be used by adversarial actors. However, by publicizing the vulnerabilities of AI systems while the field is still in its infancy, adversarial robustness research plays an important role in strengthening these systems over the long term. By demonstrating how to execute real-time, low-cost, and high-transferability adversarial attacks on DNNs, CAG stands as both a warning and a call to action for researchers and practitioners to guard against such attacks.

  • Deep Neural Network: an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the mathematical transformation, linear or non-linear, that turns the input into the output.
  • Misclassification: changing the result from the correct class to an incorrect class; for example, classifying a dog image as containing a cat.
  • Black-box attack: an attack scenario where the attacker does not have access to the underlying architecture of the system.
  • Real-time: fast generation time.
  • Low-cost: less time to train the model and less space to store it.
  • Enhanced robustness: able to bypass several DNN systems that have defense mechanisms against adversarial attacks.
  • High transferability: adversarial examples generated using one DNN model can be used to attack a completely different DNN model.

Please cite our work using the BibTeX below.

@article{phan2019cag,
  title={CAG: A Real-time Low-cost Enhanced-robustness High-transferability Content-aware Adversarial Attack Generator},
  author={Phan, Huy and Xie, Yi and Liao, Siyu and Chen, Jie and Yuan, Bo},
  journal={arXiv preprint arXiv:1912.07742},
  year={2019}
}