Research

Class-wise rationalization: teaching AI to weigh pros and cons

Natural Language Processing

Cite Paper Code

Authors

Edited by

Mark Weber

Published on

12/09/2019

A primer on deep rationalization

This paper is the latest in an important line of work we’re pursuing at the MIT-IBM Watson AI Lab. We call this work Deep Rationalization. You can think of it like this.

A child asks “Why can’t I eat this jelly bean?” A parent replies, “Because I said so.” The child learns she can’t eat the jelly bean (at least not while Dad is watching), but she doesn’t learn *why*. So she repeats many such questions before learning what flies. Dad might save himself a headache if he explained, “Jelly beans are pure sugar and such foods give you cavities, which really hurt. That’s why we don’t eat pure sugar foods.” In contrast, today’s AI models are like the lazy Dad who actually costs himself more work in the long run.

For all the massive amounts of training data fed to AI models, the general absence of basic rationalizations in the labels means the models are hard to train and hard to explain. This may have been acceptable early on, but, AI is now reaching adolescence and “Because I said so” is no longer cutting it.

Our two-pronged goal is 1) to radically reduce the number of labelled examples a model needs to learn, and 2) to radically improve explainability.

Selective Rationalization

Selective rationalization refers to the process of finding rationales. A rationale is a hard selection of input features that are sufficient to explain the output prediction. In natural language processing (NLP), a rationale is a selection of text-spans that are short and coherent, and sufficient for the correct prediction.

As an example, below is a beer review, and the output prediction is the rating of the beer appearance. The rationales are short sentences that explain why the appearance achieves 5 stars. A possible set of rationales are highlighted in green.

Beer review annotation

The class problem with today’s rationalization techniques

Existing rationalization algorithms have one limitation. They only look for rationales that support the labeled class. For example, given a negative hotel review like the one below, today’s models can only find the explanation for the negative sentiment. What if we want to also find rationales supporting positive sentiment i.e. the counterfactual rationales? This would allow a more structural interpretation of deep learning models, much like the weighing of pros and cons, especially when the input evidence is complicated and mixed.

Hotel review classes

Introducing Class-wise Adversarial Rationalization (CAR)

Drawing on insights from game theory, we present a new method called Class-wise Adversarial Rationalization, or CAR, which can find rationales explaining any given class. Consider a binary sentiment classification task, where there are two classes, the negative class (or class 0) and the positive class (or class 1). Under this setting, the CAR structure consists of two groups, the class-0 group and the class-1 group. The class-0 group aims to find class-0 rationales, or more concretely the cons, from input text. The class-1 group aims to find the pros.

Let’s focus on the class-0 group. Below is the structure diagram of this group.