Certifying robustness of an AI model

Luca Daniel (MIT), Pin-Yu Chen (IBM Research)


AI is transforming myriad aspects of our lives, from our shopping habits to our healthcare systems. But these models are not infallible: they can be vulnerable to adversarial attacks, raising security concerns and potentially eroding people's confidence in them. Earlier this year, MIT-IBM researchers described and then extended the first comprehensive measure of the robustness of neural networks against adversarial attacks. They have since advanced this work and created a general, efficient framework for certifying the robustness of neural networks.

The framework, called CROWN, is highly scalable and can be applied to a variety of activation functions (the functions that map each layer's input to its response). It enables developers and researchers to report with confidence how much perturbation to its input a model can withstand without being fooled. This capability is particularly important for applications with high security requirements, such as autonomous vehicles. More broadly, robustness is an essential component of trusted AI systems, and the ability to certify a model's robustness, and to communicate that certificate to practitioners and users, builds confidence in AI.
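To make the idea of a robustness certificate concrete, the sketch below checks whether a tiny ReLU network keeps its prediction for every input inside an L-infinity ball of radius eps. It uses simple interval bound propagation rather than CROWN's tighter linear relaxation bounds; the network weights and the `is_certified` helper are illustrative assumptions, not part of the CROWN implementation.

```python
import numpy as np

def interval_bounds(W_list, b_list, x, eps):
    """Propagate the L-infinity ball [x - eps, x + eps] through a ReLU
    network, returning elementwise lower/upper bounds on the output logits.
    (Interval bound propagation -- a coarser relaxation than CROWN's
    linear bounds, used here only to illustrate certification.)"""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(zip(W_list, b_list)):
        center = (lo + hi) / 2.0
        radius = (hi - lo) / 2.0
        c = W @ center + b          # image of the interval's center
        r = np.abs(W) @ radius      # worst-case spread under the linear map
        lo, hi = c - r, c + r
        if i < len(W_list) - 1:     # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def is_certified(W_list, b_list, x, eps, true_label):
    """Certified robust at radius eps if the true class's lower bound
    exceeds every other class's upper bound."""
    lo, hi = interval_bounds(W_list, b_list, x, eps)
    others = np.delete(hi, true_label)
    return bool(lo[true_label] > others.max())
```

If `is_certified` returns True, no perturbation of size up to eps can change the prediction; if it returns False, the certificate is inconclusive (the bounds are conservative, so the model may still be robust).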