James Glass

Senior Research Scientist, Computer Science and Artificial Intelligence Laboratory

James Glass is a senior research scientist and heads the Spoken Language Systems Group in MIT’s Computer Science and Artificial Intelligence Laboratory. He is also a faculty member of the Harvard-MIT Division of Health Sciences and Technology. His research focuses on automatic speech recognition, unsupervised speech processing, and spoken language understanding. His group seeks to answer three questions: who is talking, what is said, and what is meant. The first area covers paralinguistic issues such as speaker verification, language and dialect identification, and speaker diarization (who spoke when); here the group is also analyzing health markers embedded in speech. The second area addresses speech recognition capabilities and challenges related to noise robustness, limited linguistic resources, and unsupervised language acquisition. The third and final area lies at the boundary between speech and natural language processing and includes speech understanding topics such as sentiment analysis and dialogue. Some research also focuses on user-generated text in social forums.

Glass is a fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association, and is currently an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He earned an MS and PhD in electrical engineering and computer science at MIT.

Selected Publications

  • Gong, Y., Liu, A. H., Luo, H., Karlinsky, L., & Glass, J. (2023). Joint Audio and Speech Understanding. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/asru57964.2023.10389742
  • Lai, C., Shi, F., Peng, P., Kim, Y., Gimpel, K., Chang, S., Chuang, Y., Bhati, S., Cox, D. D., Harwath, D., Zhang, Y., Livescu, K., & Glass, J. (2023). Audio-Visual Neural Syntax Acquisition. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/asru57964.2023.10389619
  • Rouditchenko, A., Chuang, Y., Shvetsova, N., Thomas, S., Feris, R., Kingsbury, B., Karlinsky, L., Harwath, D., Kuehne, H., & Glass, J. (2023). C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. International Conference on Acoustics, Speech, and Signal Processing (ICASSP). https://doi.org/10.1109/icassp49357.2023.10094821