Unsupervised Clinical Language Translation



Published on


As patients’ access to their doctors’ clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication. Such translation yields better clinical outcomes by enhancing patients’ understanding of their own health conditions, and thus improving patients’ involvement in their own care. Existing research has used dictionary-based word replacement or definition insertion to approach the need. However, these methods are limited by expert curation, which is hard to scale and has trouble generalizing to unseen datasets that do not share an overlapping vocabulary. In contrast, we approach the clinical word and sentence translation problem in a completely unsupervised manner. We show that a framework using representation learning, bilingual dictionary induction and statistical machine translation yields the best precision at 10 of 0.827 on professional-to-consumer word translation, and mean opinion scores of 4.10 and 4.28 out of 5 for clinical correctness and layperson readability, respectively, on sentence translation. Our fully-unsupervised strategy overcomes the curation problem, and the clinically meaningful evaluation reduces biases from inappropriate evaluators, which are critical in clinical machine learning.

Please cite our work using the BibTeX below.

author = {Weng, Wei-Hung and Chung, Yu-An and Szolovits, Peter},
title = {Unsupervised Clinical Language Translation},
year = {2019},
isbn = {9781450362016},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/3292500.3330710},
booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
pages = {3121–3131},
numpages = {11},
keywords = {machine translation, unsupervised learning, consumer health, representation learning},
location = {Anchorage, AK, USA},
series = {KDD '19}
Close Modal