Teaching AI where to focus

Regina Barzilay (MIT), Tommi Jaakkola (MIT), Shiyu Chang (IBM), and Mo Yu (IBM) 



When developing neural models for natural language processing, scientists often use the attention mechanism to improve performance. Attention-based models can also provide human-interpretable rationales for their predictions by communicating what they are focusing on. The success of these models, however, depends on access to large amounts of training data. Recently, MIT-IBM researchers found an ingenious way to extend these benefits to new tasks where adequate training data is unavailable. They developed a “meta-mapping” that models the relationship between human rationales and machine attention across different tasks. The meta-mapping can then guide models on tasks that have little high-quality training data. The researchers demonstrated that this approach performs significantly better than state-of-the-art techniques for domain transfer, reducing error by more than 15 percent. The method could bring the power of attention-based neural networks to applications where a lack of training data previously precluded their use.
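
To make the idea of "communicating what they are focusing on" concrete, here is a minimal sketch of generic softmax attention over the words of a sentence. It is not the researchers' meta-mapping, which this summary does not describe in detail. The function names, the example sentence, and the relevance scores are all hypothetical and chosen only for illustration.

```python
import math

def attention_weights(scores):
    """Softmax: turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_rationale(tokens, weights, k=2):
    """Return the k most-attended tokens, kept in sentence order."""
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(ranked)]

# Hypothetical input: in a real model the scores are learned, not hand-written.
tokens = ["the", "beer", "smells", "wonderful"]
scores = [0.1, 1.5, 2.0, 3.0]

weights = attention_weights(scores)
rationale = top_rationale(tokens, weights, k=2)
```

Reading off the highest-weighted words gives a rough, human-readable account of what the model attended to. Learning scores like these well is the part that normally requires large amounts of training data.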