Creating space for the evolution of generative and trustworthy AI

From left: Aude Oliva, David Cox, and Dan Huttenlocher sit on tall chairs in front of the Lab's Industry Showcase logo on a screen

Credit: Christopher Harting

At this year’s Industry Showcase, the MIT-IBM Watson AI Lab and its member companies delved into the history and future of explainable AI and foundation models, and their intersection with different parts of society and business.

Over the years, AI has experienced periods of freeze, thaw, and growth. As recently as 2015, AI was gaining traction in the public sphere, and it was hypothesized that “deep learning [and] machine intelligence would eat the world.” While that may have been an exaggeration, “this has really come to pass. AI is taking over all different areas of software,” said David Cox, MIT-IBM Watson AI Lab co-director and IBM VP, at the Lab’s recent Industry Showcase.

Here, member companies of the Lab came to discuss current projects in the morning, learn about the Lab’s latest research, and hear about the recent explosion of generative models and how the Lab’s leadership plans to navigate this new era of technology.

Eating the world

“The guiding principle of the Lab has always been that we’re focused on next-generation AI. What’s coming next?” said Cox, kicking off the afternoon’s directors’ briefing and panel discussion. “And how do we empower IBM clients, and particularly the member companies who decided to join us on this journey, with the latest technologies to address the problems you’re facing in your businesses? And more broadly, how do we address the problems in society?” During his career, Cox has seen excitement rise around everything from computer vision and image classifiers to machines beating humans at games. Over time, interest in prediction and forecasting of data has given way to image re-rendering and text-to-image generation. While this technological progress is astounding and moving quickly, it somehow feels normal, said Cox. Further, this research extends beyond art and fun, making a real impact in business and exceeding human capabilities on longstanding problems in science and engineering. As examples, Cox pointed to a joint human-AI design for a new kind of power converter circuit out of IBM and the linkage problem that the Lab is tackling, in addition to large language models and generative text.

“Everyone now can sort of see and touch some of these generative AI technologies, and that’s caused a lot of excitement,” said Cox. “That, in turn, is causing sort of a flywheel effect where the field is moving faster and faster every day.”

Unsurprisingly, businesses are also looking to leverage the latest in foundation and generative models, Cox said, but “the problem with automation is that it’s too labor intensive,” referring to data gathering and cleaning, human and compute resources, model design, and more. To assist, the Lab and IBM are taking a holistic viewpoint and a domain-agnostic approach to data, compute, architecture, algorithms, and innovation. This includes the Blue Pile and curated datasets, Vela cloud-based computing, eco-friendly innovation, new architectures, and the release of families of models in different sizes and specializations for different domains, to maximize “efficiency and enterprise value.”

“Now, when we say generative AI and foundation models, these are actually the techniques that are going to impact almost everything we do in computing and AI,” said Aude Oliva, transitioning into a high-level rundown of the Lab’s research portfolio. Highlights included prompt-tuning for better performance, learning to grow large foundation models from smaller ones, trustworthy AI, inverse design and synthetic data for science applications, visual hallucination to improve text translations, and generating machine-memorable prompts. “So, there are quite a few good ideas from neuroscience and cognitive science that are making their way into the modern way of doing research in the Lab,” said Oliva.
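
Prompt-tuning, one of those highlights, adapts a frozen foundation model by learning only a handful of “soft prompt” vectors prepended to the input; the backbone itself is never updated. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the toy model, sizes, and training loop are stand-ins, not the Lab’s actual implementation.

```python
# Minimal sketch of soft prompt tuning (hypothetical toy example, not the
# Lab's code): freeze a small "foundation model" and train only a handful
# of prompt embeddings prepended to the input sequence.
import torch
import torch.nn as nn

vocab_size, d_model, n_prompt = 1000, 64, 8

# Frozen backbone: embedding + transformer encoder + classification head.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d_model, 2)
for module in (embed, encoder, head):
    for p in module.parameters():
        p.requires_grad = False

# The only trainable parameters: n_prompt soft-prompt vectors.
prompt = nn.Parameter(0.02 * torch.randn(n_prompt, d_model))
optimizer = torch.optim.Adam([prompt], lr=1e-3)

tokens = torch.randint(0, vocab_size, (4, 16))  # toy batch of token ids
labels = torch.randint(0, 2, (4,))              # toy binary labels

for _ in range(10):                             # toy training loop
    x = embed(tokens)                                        # (B, T, d)
    x = torch.cat([prompt.expand(x.size(0), -1, -1), x], 1)  # prepend prompts
    logits = head(encoder(x).mean(dim=1))                    # pool and classify
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because the backbone stays frozen, only a few hundred parameters are updated here, which is what makes the technique attractive for adapting very large models cheaply.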

This work is bolstered by the MIT Stephen A. Schwarzman College of Computing’s mission of driving advancement in AI research and education, the infusion of AI into disciplines across MIT, and attention to the social and ethical dimensions of AI, shared Dean Dan Huttenlocher, who is tasked with realizing this three-fold mission, as well as attracting and developing talent. He pointed out that an education or research career in “computing” used to mean hardware, then it meant hardware and software, and now it’s hardware, software, algorithms, and AI. Further, “the demand for a computer science (CS) bachelor’s now is about four times what it was 25 years ago; it’s about 40,000 a year,” said Huttenlocher. That still doesn’t stack up to the demand for business undergraduate degrees, “but in technical fields, it’s become an overwhelming need.” More importantly, academia and industry are seeing that recent research results in CS and AI are increasingly impacting other disciplines and fields, so there’s a strong need to infuse and blend computing into both education and research in almost every area.

During a panel discussion, Oliva, Cox, and Huttenlocher tackled questions around large models and compute access, academic-industry partnerships, and sustainability. At this point, larger models and more compute are driving innovation, particularly in generative AI, but there are plenty of examples where smaller models with better data and training outperform them, noted Cox. When it comes to the interaction between academia and industry, Huttenlocher said that competition “doesn’t make sense”; rather, academia can help drive model development in scientific discovery, as well as in core areas such as smaller models and more efficient large ones. Oliva pointed out that graduate students working with the Lab have the advantage of access to their own cloud cluster and to larger models, but even if they didn’t, their creativity would still propel them forward, so she isn’t concerned. “They can draft code to reduce the size of the model and they’ll come up with a solution that’s publishable,” said Oliva. Further, academia sits in a sweet spot where students can reconstruct and innovate on industry models, a non-commercial advantage not afforded to industry competitors, said Cox, and just knowing that it’s possible to build or engineer a particular model is a huge part of the battle.

Huttenlocher shared that it can be useful to view large language models as lossy compression algorithms trained on massive amounts of text; that lack of precision is both important to their ability to generalize and summarize and a source of inaccuracies. Cox noted that the world’s energy budget won’t allow generative models, and the field, to keep growing at this rate, so there’s something to be said for integrating domain expertise, training on clean datasets, and downsizing models while increasing fidelity. Further, humans learn language in a multimodal, interactive way, with three orders of magnitude less compute, and we run on sandwiches, Cox quipped. By threading the academia-industry needle, the Lab can leverage the best of both worlds for technological advancement.

“The world is catching up with us [the Lab],” said Huttenlocher. “Seriously, I mean, I think we’re going to need a lot more partnerships like this; the world is going to need them.”

Attendees briefly recessed for a tour of demos around the Lab.

Generating trustworthiness and explainability without data

Researchers in the Lab understand that for businesses, like the Lab’s member companies, to deploy machine learning, they need to know what’s happening under the hood of these models. During the final session, Aleksandra Mojsilovic, head of foundations of trustworthy AI at IBM Research, and Antonio Torralba, Lab researcher and Delta Electronics Professor of Electrical Engineering and Computer Science at MIT, provided a peek into how the Lab is addressing and designing these aspects of models.

While there’s no shortage of headlines laying out instances of models exhibiting bad and malicious behaviors, hallucinating false information, or even gaslighting users, IBM has a roadmap and strategy to address these problems. After cataloging possible model shortcomings in fairness, accountability, transparency, and ethics, Mojsilovic presented a multipronged approach built on risk identification, assessment, intervention, and review. It includes new architectures; prefiltering of data, with the Lab’s equi-tuning work, to reduce output issues around race and gender; insights from neuroscience, like spiking neurons that operate more like the human brain; attribution of data points’ influence on the output; and prompt tuning. Together, Mojsilovic said, “that’s basically like a toolbox of various mitigators, and defenders and safeguards that the developers can then use to make better applications.”
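
The session didn’t walk through implementations, but the group-averaging idea behind equivariant approaches like equi-tuning can be sketched simply: wrap a model so its predictions are averaged over a small transformation group, making the output invariant to those transformations. Everything below, including the toy backbone and the flip group, is a hypothetical illustration, not IBM’s code.

```python
# Hypothetical sketch of the group-averaging idea behind equivariant
# fine-tuning: average a backbone's logits over a small transformation
# group so predictions become invariant to it. Toy backbone and group.
import torch
import torch.nn as nn

class GroupAveraged(nn.Module):
    def __init__(self, backbone, transforms):
        super().__init__()
        self.backbone = backbone
        self.transforms = transforms  # list of functions x -> g(x)

    def forward(self, x):
        # Run the backbone on every transformed copy and average the logits.
        outs = [self.backbone(g(x)) for g in self.transforms]
        return torch.stack(outs).mean(dim=0)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
flip_group = [lambda x: x, lambda x: torch.flip(x, dims=[-1])]  # id + h-flip
model = GroupAveraged(backbone, flip_group)

x = torch.randn(4, 3, 32, 32)
# The wrapped model gives the same prediction for an image and its mirror.
assert torch.allclose(model(x), model(torch.flip(x, dims=[-1])), atol=1e-5)
```

The wrapper can then be fine-tuned as usual; invariance to the chosen group is guaranteed by construction rather than learned from data.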

Torralba suggested that the problem may lie with the data itself that’s used to train massive models. “We are just trying to see how far we can go with more data every single time,” said Torralba. “This is just because we are addicted to data. And one of the goals that we had was, let’s remove the data.” His computer vision research team tackled the problem by first trying to understand how models work and how the units within them are stimulated. With some minimal annotation, Torralba was able to uncover model units that were excited by lit lamps and faces. Since annotation is labor intensive, Torralba incorporated a language model to provide image descriptions. “We know that these units, when they are trained to generate images, they seem to have some interpretable structure inside that is not perfect,” but they do have some representation of what they’re generating, said Torralba. With a generative adversarial network (GAN), Torralba found that, with minimal annotation, the model could generate a synthetic image, of a car for instance, with appropriate labels, even for car parts. However, these steps still required some labels and annotation.
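
The article doesn’t detail the probing procedure, but a common dissection-style way to see what a generator unit contributes is to silence it and compare outputs before and after. The toy generator, layer, and channel index below are illustrative stand-ins.

```python
# Hypothetical dissection-style probe: silence one channel ("unit") in a toy
# generator via a forward hook and see where the output image changes most.
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(16, 32, 4, stride=2, padding=1),  # layer to probe
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
    nn.Tanh(),
)

def ablate(module, inputs, output, channel=5):
    out = output.clone()
    out[:, channel] = 0.0  # silence one unit
    return out             # returning a value replaces the layer's output

z = torch.randn(1, 16, 8, 8)  # toy latent input
with torch.no_grad():
    baseline = generator(z)
    handle = generator[0].register_forward_hook(
        lambda m, i, o: ablate(m, i, o, channel=5))
    ablated = generator(z)
    handle.remove()

# Large values mark pixels the silenced unit was responsible for "drawing".
importance = (baseline - ablated).abs().mean(dim=1)  # (1, H, W)
print(importance.shape, importance.max().item())
```

If a unit’s importance map consistently lights up over lamps or faces across many latents, that unit plausibly encodes the concept, which is the kind of interpretable structure Torralba describes.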

To remove the need for data in training, Torralba turned to noise that mimicked structures found in real, natural images. Using unsupervised contrastive learning, he layered image-noise transformations on top of random model weights to improve training and output performance. These included dead leaves models, the magnitude of the image Fourier transform, histograms of gradients, and shaders (small programs that generate synthetic images and video). These innovations brought the team close to the accuracies achieved on real data, demonstrating what creative thinking and leveraging properties inherent to natural images can do. Torralba sees medical imaging as a prime target for this kind of work, but thinks it’s not limited to that: “In the evolution of datasets, we’re going from small datasets to very, very large datasets, and now probably, hopefully, at some point, we’ll go back to datasets that are very, very small again,” said Torralba, “but, most likely, the truth lives in between these two things.”
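
As a rough sketch of this “learning without real data” idea, the code below procedurally generates dead-leaves-style images (random overlapping colored disks) and trains a small encoder with a simple contrastive objective on two augmented views; no real images or labels are involved. The generator, augmentations, encoder, and loss are all toy stand-ins for the methods described, not the team’s actual pipeline.

```python
# Toy sketch of training on procedural noise: generate dead-leaves images
# and train an encoder with a simple contrastive (SimCLR-style) objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dead_leaves(batch, size=32, n_disks=20):
    """Stack random colored disks to mimic the occlusion statistics of images."""
    imgs = torch.zeros(batch, 3, size, size)
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    for b in range(batch):
        for _ in range(n_disks):
            cy, cx = torch.randint(0, size, (2,)).tolist()
            r = int(torch.randint(2, size // 4, (1,)))
            mask = (ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2
            imgs[b][:, mask] = torch.rand(3, 1)  # one random color per disk
    return imgs

def augment(x):
    # Toy augmentation: random horizontal flip plus pixel noise.
    if torch.rand(1) < 0.5:
        x = torch.flip(x, dims=[-1])
    return x + 0.05 * torch.randn_like(x)

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for _ in range(5):                      # toy training loop, no real data
    x = dead_leaves(8)
    z1 = F.normalize(encoder(augment(x)), dim=1)
    z2 = F.normalize(encoder(augment(x)), dim=1)
    logits = z1 @ z2.t() / 0.1          # cosine similarity / temperature
    labels = torch.arange(len(x))       # matching views are positives
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the actual line of work, richer noise processes such as Fourier-magnitude statistics and shaders, paired with stronger encoders, are what close most of the gap to training on real images.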