James Glass
Senior Research Scientist, Computer Science and Artificial Intelligence Laboratory
James Glass is a senior research scientist and heads the Spoken Language Systems Group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). He is also a faculty member of the Harvard-MIT Division of Health Sciences and Technology program. His research focuses on automatic speech recognition, unsupervised speech processing, and spoken language understanding. His group seeks answers to three questions: who is talking, what is said, and what is meant. The first area addresses paralinguistic problems such as speaker verification, language and dialect identification, and speaker diarization (determining who spoke when); the group also analyzes health markers embedded in speech. The second area addresses speech recognition capabilities and challenges, including noise robustness, limited linguistic resources, and unsupervised language acquisition. The third area sits at the boundary between speech and natural language processing and covers speech understanding topics such as sentiment analysis and dialogue; some of this research also examines user-generated text in social forums.
Glass is a fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association, and is currently an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He earned an SM and PhD in electrical engineering and computer science at MIT.
Selected Publications
- Schroeder, P., Morgan, N., Luo, H., & Glass, J. (2025). THREAD: Thinking deeper with recursive spawning. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies (Volume 1: Long Papers) (pp. 8418–8442). Association for Computational Linguistics.
- Doveh, S., Shabtay, N., Lin, W., Schwartz, E., Kuehne, H., Giryes, R., Feris, R., Karlinsky, L., Glass, J., Arbelle, A., Ullman, S., & Mirza, M. J. (2025). Teaching VLMs to localize specific objects from in‑context examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- Araujo, E., Rouditchenko, A., Gong, Y., Bhati, S., Thomas, S., Kingsbury, B., Karlinsky, L., Feris, R., Glass, J. R., & Kuehne, H. (2025). CAV-MAE Sync: Improving contrastive audio-visual mask autoencoders via fine-grained alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025) (pp. 18794–18803). IEEE.
Media
- May 22, 2025: MIT News, AI learns how vision and sound are connected, without human intervention
- April 10, 2024: MIT News, A faster, better way to prevent an AI chatbot from giving toxic responses