Topics are more meaningful than words. AI for comparative literature.

Natural Language Processing


In this post, we share a brief Q&A with the authors of the paper, Hierarchical Optimal Transport for Document Representation, presented at NeurIPS 2019.

Hierarchical Optimal Transport for Document Representation

Abstract: The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. Past distances between documents suffer from either an inability to incorporate semantic similarities between words or from scalability issues. As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which themselves are modeled as distributions over words. We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover’s distance. We evaluate our technique for k-NN classification and show better interpretability and scalability with comparable performance to current methods at a fraction of the cost.

What is your paper about?

Optimal transport provides a way to account for the geometry of data in machine learning and statistics applications, while Bayesian modeling allows us to learn probabilistic latent structures from data. In this paper, we use topic models to represent documents as distributions over topics, which are themselves distributions over words, with words represented by word embeddings. To compute meaningful distances between documents represented this way, we use hierarchical optimal transport: the distance between two documents is the Wasserstein distance between their distributions over topics, where the ground metric is itself a Wasserstein distance between topics, viewed as distributions over word embeddings.
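The two-level construction above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: it uses random embeddings and Dirichlet-sampled topics, and approximates each Wasserstein distance with entropic-regularized (Sinkhorn) optimal transport rather than an exact solver.

```python
import numpy as np

def sinkhorn(a, b, C, reg=None, n_iter=300):
    """Entropic-regularized OT cost between histograms a, b under cost matrix C.

    An approximation to the Wasserstein distance; reg defaults to a fraction
    of the largest cost to keep the kernel numerically stable.
    """
    if reg is None:
        reg = 0.05 * C.max()
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):          # alternating Sinkhorn projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
V, T = 20, 4                                  # vocabulary size, number of topics
E = rng.normal(size=(V, 50))                  # toy word embeddings
topics = rng.dirichlet(np.ones(V), size=T)    # topics: distributions over words

# Ground metric on topics: OT distance in word-embedding space,
# with Euclidean distances between embeddings as the word-level cost.
word_cost = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
topic_cost = np.array([[sinkhorn(topics[i], topics[j], word_cost)
                        for j in range(T)] for i in range(T)])

# Document distance: OT between topic distributions, using topic_cost
# as the ground metric -- the "hierarchical" step.
d1 = rng.dirichlet(np.ones(T))
d2 = rng.dirichlet(np.ones(T))
hott = sinkhorn(d1, d2, topic_cost)
```

Because the outer transport problem lives on the small topic space (here 4 topics rather than 20 words), the document-level distance is much cheaper than transporting word distributions directly, which is the source of the method's scalability.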

What is new and significant about your paper?

We demonstrate that hierarchical optimal transport provides a natural distance function that is sensitive to high-level similarities between documents. Our Hierarchical Optimal Topic Transport (HOTT) document distance is interpretable and computationally efficient. In several k-NN text classification experiments, we show better interpretability and scalability with performance comparable to current methods at a fraction of the cost.

What will the impact be on the real world?

Our method is interpretable, fast and easy to implement, making it a promising tool for data scientists working in natural language processing.

What would be the next steps?

An intriguing future work direction is incorporating hierarchical optimal transport into the learning pipeline of data representations. In terms of applications, it would be interesting to consider document ranking and retrieval, as well as applications to data beyond text.

What was the most interesting thing you learned?

It was interesting to see that seemingly different perspectives on text data, topic models and word embeddings, naturally complement each other through the lens of (hierarchical) optimal transport.

How would you describe your paper in less than 5 words?

Topics more meaningful than words.

What made you most excited about this paper?

This paper provides an off-the-shelf tool that is ready to be adopted by ML practitioners.

How do you see this research evolving in the next 5 years?

We expect more papers combining Bayesian modeling and optimal transport in the coming years. The former is about learning distributions to represent data, while the latter provides an appropriate metric for these distributions.

If you could invite someone to dinner and tell them about your paper, who would it be? Why?

Someone with a doctorate in linguistics or comparative literature.

Please cite our work using the BibTeX below.

@article{yurochkin2019hierarchical,
  author        = {Mikhail Yurochkin and
                   Sebastian Claici and
                   Edward Chien and
                   Farzaneh Mirzazadeh and
                   Justin Solomon},
  title         = {Hierarchical Optimal Transport for Document Representation},
  journal       = {CoRR},
  volume        = {abs/1906.10827},
  year          = {2019},
  archivePrefix = {arXiv},
  eprint        = {1906.10827},
  timestamp     = {Thu, 27 Jun 2019 18:54:51 +0200},
  bibsource     = {dblp computer science bibliography}
}