Multilingual BERT Post-Pretraining Alignment



  • Lin Pan
  • Chung-Wei Hang
  • Haode Qi
  • Abhishek Shah
  • Saloni Potdar
  • Mo Yu

We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved zero-shot cross-lingual transferability of the pretrained models. Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective, as well as on the sentence level via contrastive learning and random input shuffling. We also perform sentence-level code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves results comparable to XLM for translate-train, while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.
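The sentence-level alignment described above can be illustrated with a minimal sketch. This is not the paper's exact objective, just an InfoNCE-style contrastive loss over a batch of parallel sentence embeddings, assuming cosine similarity and in-batch negatives; all names (`contrastive_alignment_loss`, `temperature`) are illustrative, not from the paper.

```python
import numpy as np

def contrastive_alignment_loss(src, tgt, temperature=0.1):
    """InfoNCE-style loss: each source-language sentence embedding
    should be closest to its own translation among all target
    embeddings in the batch (in-batch negatives)."""
    # L2-normalize so dot products become cosine similarities
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature  # (batch, batch) similarity matrix
    # Row-wise log-softmax; the diagonal holds the true translation pairs
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))
# Perfectly aligned pairs (identical embeddings) yield a low loss
aligned = contrastive_alignment_loss(batch, batch)
# Unrelated pairs yield a higher loss
shuffled = contrastive_alignment_loss(batch, rng.normal(size=(4, 8)))
print(aligned, shuffled)
```

Minimizing such a loss pulls each sentence embedding toward its translation and away from other sentences in the batch, which is the intuition behind sentence-level alignment with parallel data.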

This paper has been published at NAACL 2021.

Please cite our work using the BibTeX below.

@inproceedings{pan-etal-2021-multilingual,
      title={Multilingual BERT Post-Pretraining Alignment},
      author={Lin Pan and Chung-Wei Hang and Haode Qi and Abhishek Shah and Saloni Potdar and Mo Yu},
      booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
      year={2021}
}