Research

Causal inference is expensive. Here’s an algorithm for fixing that.

Causal Inference

Cite

Authors

Edited by

Mark Weber

Published on

12/08/2019

Active learning and causal discovery

An active learning algorithm is one that actively engages some subject or information source. It is the computer science equivalent of statistical experiment design, a real-world example of which might be a Randomized Control Trial (RCT) to study whether or not chocolate really does improve cognition.

Active learning is popular for problems in which data is abundant but labels are scarce or expensive because the algorithm can interact with the data to generate labels.

In the case of a causal model, which can be described as a graph of nodes and edges mapping cause and effect relationships, we want to see if we can use active learning to learn the directed edges of a causal graph. Ideally, this is done by iteratively using current knowledge to design targeted experiments (interventions) in the system that result in causal structure being learned as quickly as possible.

It turns out that in this regime the common active learning approach of greedily choosing interventions that maximize information gain
can be shown to be exponentially suboptimal in terms of the number of interventions required to learn causal relationships (see figure below).

Information greedy approach to causal inference

A greedy active learning algorithm, shown for K = 3. The optimal algorithm can find the root in [log2 K] + 1 interventions by a top-down approach, while the information greedy algorithm intervenes on the lk (k = 1, …, K) nodes, taking at least K / 2 steps in expectation.

Here, the graph structure is causing us problems, but what if we could turn that around? What if we could exploit the graph structure itself for a better approach? It turns out we can using a centrality measure to choose nodes to intervene on.

Central Node Algorithm

We propose an intervention design method called the Central Node Algorithm that overcomes this exponential suboptimality.

In fact what’s exciting about this algorithm is that it is provably near optimal for iterative experiment design when orienting the edges in a causal graph whose un-oriented components form a forest. This means we can now give active learning agents the ability to efficiently explore an environment and discover its causal structure.

How it works

Consider the problem of experimental design for learning causal graphs that have a tree structure. Using an adaptive framework we can determine the next intervention based on a Bayesian prior updated with the outcomes of previous experiments, focusing on the setting where observational data is cheap (assumed infinite) and interventional data is expensive.

In this setting, we show that the algorithm requires a number of interventions less than or equal to a factor of 2 times the minimum achievable number. Additionally, several extensions are presented, to the case where a specified set of nodes cannot be intervened on, to the case where K interventions are scheduled at once, and to the fully adaptive case where each experiment yields only one sample. In the case of finite interventional data, through simulated experiments we show that our algorithms outperform different adaptive baseline algorithms.

Step 1: Observed data with undefined causal relationships

Central Node Algorithm

Step 2: Intervene on a central node to reveal its causal edges

Central Node Algorithm

Step 3: Repeat until all edges are mapped

Central Node Algorithm

Reflection

One of the surprises in our work was the fact that we can prove that the performance of our approach is close to that of the (unknown) optimal algorithm.

Moreover, the exploration problem is particularly intriguing since the algorithm that greedily maximizes information gain (a popular approach in active experiment design) can take exponentially longer to orient the graph than our approach does.

As this research develops, we could see similar concepts being used to design AI agents that when asked to operate in an environment. In such a setting, an agent first adaptively explores the environment with targeted experimentation to build a causal model, then uses this causal model to reason about how to best go about completing tasks it is asked to do. Then the agent may also make sense of the behavior of other (possibly non-artificial) agents and predict the future state of the system. Such capabilities are fundamental to all forms of intelligence, and we are beginning to take small steps toward this with causal inference algorithms.

Causal inference is an important line of work at the MIT-IBM Watson AI Lab. Also see our NeurIPS 2019 paper, Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions.

Please cite our work using the BibTeX below.

   @incollection{NIPS2019_9575,
    title = {Sample Efficient Active Learning of Causal Trees},
    author = {Greenewald, Kristjan and Katz, Dmitriy and Shanmugam, Karthikeyan and Magliacane, Sara and Kocaoglu, Murat and Boix Adsera, Enric and Bresler, Guy},
    booktitle = {Advances in Neural Information Processing Systems 32},
    editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
    pages = {14279--14289},
    year = {2019},
    publisher = {Curran Associates, Inc.},
    url = {http://papers.nips.cc/paper/9575-sample-efficient-active-learning-of-causal-trees.pdf}
    }