James Glass
Senior Research Scientist, Computer Science and Artificial Intelligence Laboratory

James Glass is a senior research scientist and heads the Spoken Language Systems Group in MIT’s Computer Science and Artificial Intelligence Laboratory. He is also a faculty member of the Harvard-MIT Health Sciences and Technology program. His research focuses on automatic speech recognition, unsupervised speech processing, and spoken language understanding. His group seeks answers to three questions: who is talking, what is said, and what is meant. The first area covers paralinguistic issues such as speaker verification, language and dialect identification, and speaker diarization (who spoke when), as well as health markers embedded in speech. The second area addresses speech recognition capabilities and challenges related to noise robustness, limited linguistic resources, and unsupervised language acquisition. The third area sits at the boundary between speech and natural language processing and includes speech-understanding topics such as sentiment analysis and dialogue; some of this research also examines user-generated text in social forums.
Glass is a fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association, and is currently an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He earned an MS and PhD in electrical engineering and computer science at MIT.
Selected Publications
- Rouditchenko, A., et al. (2023). C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, pp. 1-5. doi: 10.1109/ICASSP49357.2023.10094821.
- Rouditchenko, A., Liu, A., Harwath, D., Karlinsky, L., Kuehne, H., Glass, J., Gong, Y. (2023). Contrastive Audio-Visual Masked Autoencoder. International Conference on Learning Representations (ICLR).
- Shvetsova, N., et al. (2022). Everything at Once – Multi-modal Fusion Transformer for Video Retrieval. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 19988-19997. doi: 10.1109/CVPR52688.2022.01939.
- Lai, C-I. J., Zhang, Y., Liu, A. H., Chang, S., Liao, Y-L., Chuang, Y-S., Qian, K., Khurana, S., Cox, D., Glass, J. (2021). PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition. Conference on Neural Information Processing Systems (NeurIPS).
Media
- November 4, 2021: MIT News, Toward speech recognition for uncommon spoken languages.
- February 21, 2019: MIT News, Exploring the nature of intelligence.
- October 4, 2018: MIT News, Detecting fake news at its source.
- September 18, 2018: MIT News, Machine-learning system tackles speech and object recognition, all at once.
- August 29, 2018: MIT News, Model can more naturally detect depression in conversations.