James Glass
Senior Research Scientist, Computer Science and Artificial Intelligence Laboratory

James Glass is a senior research scientist and heads the Spoken Language Systems Group in MIT’s Computer Science and Artificial Intelligence Laboratory. He is also a faculty member of the Harvard-MIT Health Sciences and Technology program. His research focuses on automatic speech recognition, unsupervised speech processing, and spoken language understanding. His group seeks answers to three questions: who is talking, what is said, and what is meant. The first area covers paralinguistic issues such as speaker verification, language and dialect identification, and speaker diarization (who spoke when), as well as health markers embedded in speech. The second area addresses speech recognition capabilities and challenges related to noise robustness, limited linguistic resources, and unsupervised language acquisition. The third area sits at the boundary between speech and natural language processing and includes speech-understanding topics such as sentiment analysis and dialogue; some of this research also examines user-generated text in social forums.
Glass is a fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association, and is currently an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He earned an MS and PhD in electrical engineering and computer science at MIT.
Selected Publications
- Rouditchenko, A., et al. (2023). C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, pp. 1-5. doi: 10.1109/ICASSP49357.2023.10094821.
- Rouditchenko, A., Liu, A., Harwath, D., Karlinsky, L., Kuehne, H., Glass, J., Gong, Y. (2023). Contrastive Audio-Visual Masked Autoencoder. International Conference on Learning Representations (ICLR).
- Shvetsova, N., et al. (2022). Everything at Once – Multi-modal Fusion Transformer for Video Retrieval. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 19988-19997. doi: 10.1109/CVPR52688.2022.01939.
- Lai, C-I. J., Zhang, Y., Liu, A. H., Chang, S., Liao, Y-L., Chuang, Y-S., Qian, K., Khurana, S., Cox, D., Glass, J. (2021). PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition. Conference on Neural Information Processing Systems (NeurIPS).
Media
- November 4, 2021: MIT News, Toward speech recognition for uncommon spoken languages.
- February 21, 2019: MIT News, Exploring the nature of intelligence.
- October 4, 2018: MIT News, Detecting fake news at its source.
- September 18, 2018: MIT News, Machine-learning system tackles speech and object recognition, all at once.
- August 29, 2018: MIT News, Model can more naturally detect depression in conversations.