Start talking to Ellie Pavlick about her work, which seeks to find evidence of understanding within large language models (LLMs), and it can sound as though she’s poking fun at it herself. The term “hand-wavy” is a favorite, and if she mentions “meaning” or “reasoning,” it often comes with conspicuous air quotes. This is just Pavlick’s way of keeping herself honest. As a computer scientist who studies language models at Brown University and Google DeepMind, she knows that embracing the natural mushiness of language is the only way to take it seriously. “This is a scientific discipline — and it’s a little squishy,” she said.
Since she was a child, Pavlick has enjoyed math and science but has always been viewed as more of a creative type. Rigor and nuance have coexisted in her world ever since. She studied economics and clarinet performance as an undergraduate before going on to pursue a degree in computer science, a field in which she still feels like an outsider. “There are a lot of people who think intelligent systems will look a lot like computer code: neat, and conveniently similar to the kinds of systems we’re good at understanding,” she said. “I really believe the answers are complicated. If I have an answer that’s simple, I’m pretty sure it’s wrong. And I don’t want to be wrong.”
Pavlick began her graduate work investigating how computers could represent semantics, or meaning, in language, after a chance meeting with a computer science professor who happened to work in natural language processing. “I think it scratched a particular itch,” she said. “It dips into philosophy, and that fits with a lot of the things I’m working on now.” Today, one of Pavlick’s key areas of research is “grounding” — the question of whether the meaning of words depends on things that exist independently of language itself, such as visual perceptions, social interactions, or even other thoughts. Because language models are trained entirely on text, they offer a useful test case for a question that has preoccupied linguists and other thinkers for decades: whether meaning can be grounded in language alone.
“These are not only ‘technical’ problems,” Pavlick said. “Language is so huge that, to me, it feels like it encompasses everything.”
Quanta spoke with Pavlick about making science out of philosophy, what “meaning” means, and the importance of unsexy results. The interview has been condensed and edited for clarity.
What does “understanding” or “meaning” mean, empirically? What, specifically, do you look for?
When I started my research program at Brown, we decided that meaning involves concepts in some way. I know that seems obvious, but it’s a theoretical commitment that not everyone makes. If you use the word “apple” to mean apple, you need the concept of an apple. That has to be a thing, whether or not you use the word to refer to it. That’s what it means to “have meaning”: there needs to be the concept, something you’re verbalizing.
I want to find concepts inside the model. I want evidence that there is something internal to the neural network that corresponds to “apple,” and that the word consistently refers to it. Because there does seem to be internal structure that isn’t random and arbitrary: there are these little nuggets of well-defined functionality that you can reliably find, and they do something.
I’ve been focusing on characterizing this internal structure. What form does it have? It can be a subset of the weights within the neural network, a linear algebraic operation over those weights, or a geometric abstraction. But it has to play a causal role in the model’s behavior: It’s connected to these inputs but not those, and these outputs and not those.
That’s something you could start calling “meaning.” It’s about figuring out how to find this structure and establish the relationships between its pieces, so that once we have it all in place, we can use it to answer questions like “Does it know what the word ‘apple’ means?”
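For readers who want a concrete picture of what “finding internal structure” can mean in practice, here is a minimal sketch of one common approach: training a linear probe on a model’s hidden states. Everything specific in it (the GPT-2 model, the layer index, the toy fruit-versus-non-fruit labels) is an illustrative assumption, not a detail of Pavlick’s experiments, and a probe like this only shows correlation; the causal tests she describes would also require intervening on the structure it finds.

```python
# Minimal sketch: probe GPT-2 hidden states for a toy concept direction.
# Model, layer, and the fruit/non-fruit labels are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

words = ["apple", "banana", "cherry", "grape",   # concept: fruit
         "table", "river", "engine", "cloud"]    # not fruit
labels = [1, 1, 1, 1, 0, 0, 0, 0]
LAYER = 6  # an arbitrary middle layer

features = []
with torch.no_grad():
    for w in words:
        ids = tok(" " + w, return_tensors="pt")
        out = model(**ids)
        # hidden state of the word's final token at the chosen layer
        features.append(out.hidden_states[LAYER][0, -1].numpy())

# If a linear probe separates the classes, there is a direction in
# activation space that tracks the concept. That is evidence of structure,
# but not yet evidence of a causal role in the model's behavior.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy on its own toy data:", probe.score(features, labels))
```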
Have you found any examples of this structure?
Yes. One result involves how a language model retrieves a piece of information. If you ask the model “What is the capital of France,” it should say “Paris,” and “What is the capital of Poland” should return “Warsaw.” It could very readily just memorize all these answers, and they could be scattered all around within the model — there’s no real reason it needs to have a connection between those things.
Instead, we discovered a small area in the model where it basically boils that connection down to a single, small vector. If you add it to “What is the capital of France,” it will retrieve “Paris,” and that same vector, if you ask “What is the capital of Poland,” will retrieve “Warsaw.” It’s like this systematic “retrieve-capital-city” vector.
That’s a really exciting finding, because it suggests the model is distilling these simple concepts and then applying general algorithms to them. And even though we’re looking at really simple questions, it’s about finding evidence of the raw ingredients the model is using. In this setting, the model could easily get away with memorizing; in many ways, that’s what these networks are built to do. Instead, it breaks information down into pieces and “reasons” about it. And we hope that as we develop better experimental designs, we’ll find something similar for more challenging kinds of concepts.
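The mechanics behind a result like this can be sketched in a few lines. The snippet below only illustrates the general “add a vector to the hidden state and see what changes” idea: the model (GPT-2), the layer, the prompts, and the crude difference-of-hidden-states recipe for building the vector are all assumptions for illustration, not the published method.

```python
# Illustrative sketch of a "task vector": pull a direction out of prompts
# that demonstrate the country-to-capital task, then inject it while the
# model reads an unrelated prompt. Layer, prompts, and recipe are guesses.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()
LAYER = 8  # arbitrary block index

def block_output(prompt):
    """Hidden state of the last token at the output of block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER + 1][0, -1]

# Crude task vector: capital-retrieval demonstrations minus a neutral prompt.
demos = block_output("France: Paris. Japan: Tokyo. Italy:")
neutral = block_output("France: baguette. Japan: sushi. Italy:")
task_vec = demos - neutral

def inject(module, inputs, output):
    # Add the task vector to the last position of this block's output.
    hidden = output[0]
    hidden[:, -1, :] += task_vec
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(inject)
ids = tok("Poland:", return_tensors="pt")
with torch.no_grad():
    next_logits = model(**ids).logits[0, -1]
handle.remove()

# With luck, the steered model now favors a capital-like continuation.
print(tok.decode(next_logits.argmax().item()))
```

Whether this particular toy setup actually produces “Warsaw” is an empirical question; the point is only to show the shape of the intervention.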
How does grounding relate to these representations?
The way humans learn language is grounded in a ton of nonlinguistic input: your bodily sensations, your emotions, whether you’re hungry, whatever. That’s considered to be really important to meaning.
But there are other notions of grounding that have more to do with internal representations. There are words that aren’t obviously connected to the physical world, yet they still have meaning. A word like “democracy” is a favorite example. I can think about democracy in my head without talking about it. So the grounding could be from language to that thing, that internal representation.
But you argue that even things that are more external, like color, might still be anchored to internal “conceptual” representations, without relying on perceptions. How would that work?
Well, a language model doesn’t have eyes, right? It doesn’t “know” anything about colors in that sense. So maybe it captures something more general, like the relationships between them. I know that when I combine blue and red, I get purple; relations like that could define this internal grounding structure.
Using RGB codes [strings of numbers that represent colors], we can give an LLM examples of colors. If you say “OK, here’s red,” and give it the RGB code for red, and “Here’s blue,” with the RGB code for blue, and then say “Tell me what purple is,” it should generate the RGB code for purple. If it can do that mapping, it’s a good indication that the model’s internal structure is sound: it’s missing the percepts for color, but the conceptual structure is there.
What’s tricky is that the model could just memorize RGB codes, which are all over its training data. So we “rotated” all the colors away from their real RGB values: We’d tell the LLM that the word “yellow” was associated with the RGB code for green, and so on. The model did well; when we asked it for a color, it gave the rotated version of its RGB code. That suggests its internal color representations have some degree of consistency. It’s applying knowledge of the relations between colors, not just memorizing.
That’s the whole point of grounding: the mapping from a name onto a color is arbitrary. What matters is the relationships among the colors, not the names themselves. So that was exciting.
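A back-of-the-envelope version of the rotation test looks roughly like this. The color list, the fixed hue shift used to “rotate” the codes, and the prompt format are all illustrative choices, and the call to an actual language model is left as a placeholder.

```python
# Sketch of the rotated-colors test: map color names to hue-shifted RGB
# codes so that memorized real-world values can't help; only relational
# knowledge of the colors can. All specifics here are illustrative.
import colorsys

TRUE_RGB = {
    "red":    (255, 0, 0),
    "green":  (0, 255, 0),
    "blue":   (0, 0, 255),
    "yellow": (255, 255, 0),
    "purple": (128, 0, 128),
}

def rotate(rgb, shift=0.33):
    """Shift a color's hue by a fixed fraction of the color wheel."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    r, g, b = colorsys.hsv_to_rgb((h + shift) % 1.0, s, v)
    return tuple(round(c * 255) for c in (r, g, b))

rotated = {name: rotate(rgb) for name, rgb in TRUE_RGB.items()}

held_out = "purple"
examples = [f"{name}: RGB{code}" for name, code in rotated.items()
            if name != held_out]
prompt = "\n".join(examples) + f"\n{held_out}: RGB"

print(prompt)
print("a consistent answer would be near:", rotated[held_out])
# Feed `prompt` to whatever LLM you have access to. A model that has only
# memorized real RGB codes will answer with the familiar (128, 0, 128);
# a model tracking the relations between colors should answer with the
# rotated value instead.
```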
How can these philosophical-sounding questions be scientific?
I recently learned about a thought experiment: What if the ocean swept up onto the sand and, when it pulled back, the patterns it left behind spelled out a poem? Does the poem have meaning? That seems super abstract, and you can have a long philosophical debate about it.
The nice thing about language models is we don’t need a thought experiment. It’s not like, “In theory, would such and such a thing be intelligent”? It’s just: Is this thing intelligent? It becomes scientific and empirical.
Sometimes people are dismissive; there’s the “stochastic parrots” take. I think it stems from a concern that people will over-attribute intelligence to these things, which we do indeed do. And to correct for that, people are like, “No, it’s all a sham. This is smoke and mirrors.”
That’s a bit of a disservice. We’ve discovered something genuinely intriguing and relatively new, and understanding it in detail is a huge opportunity that shouldn’t get glossed over because we’re worried about over-interpreting the models.
Of course, you’ve also produced research debunking exactly that kind of over-interpretation.
That line of work, where people were finding all the “shallow heuristics” models were exploiting to mimic understanding, was very foundational to my coming-of-age as a scientist. But it’s complicated. It’s like, don’t declare victory too soon. There’s a bit of skepticism, or paranoia, in me about whether an evaluation was done right, even one I know I designed very carefully!
So that’s part of it: not over-claiming. Another part is that, if you deal with these language model systems, you know that they’re not human-level — the way that they’re solving things is not as intelligent as it seems.
How do you even measure success when so many of the fundamental methods and terms are up for debate in this field?
What I think we’re looking for, as scientists, is a precise, human-understandable description of what we care about — intelligence, in this case. And then we bring in words to help us get there. We need some kind of working vocabulary.
But that’s hard, because you can get into these battles over semantics. When people ask, “Does it have meaning: yes or no?” I don’t know. We’re talking about the wrong thing.
What I’m trying to offer is a precise account of the behavior we care about explaining. At that point, whether you want to call it “meaning” or “representation” or any of these loaded words is kind of moot. The point is, there’s a theory or a proposed model on the table; let’s evaluate that.
How can language model research move toward answering those deeper questions more directly?
What are the building blocks of intelligence? What does intelligence look like in a person? What does it look like in a model? Those are exactly the kinds of deep questions I would really like to be able to answer, and they’re really important. But I believe the things that need to happen in the next ten years are not very sexy.
If we want to deal with these internal representations, we need methods for finding them, and methods that are scientifically sound. This low-level, super in-the-weeds methodological work isn’t going to make headlines, even when it’s done right. But it’s the crucial piece that will let us answer these hard questions correctly.
Meanwhile, the models are going to keep changing. So there’s going to be a lot of work that people keep publishing as though it’s “the breakthrough,” but it probably isn’t. In my mind, it feels too soon for big breakthroughs.
People are studying these really simple tasks, like asking a language model to complete “John gave a drink to ___” and seeing whether it says “John” or “Mary.” That doesn’t feel like a result that explains intelligence. But I do believe that the methods we’re developing to describe this toy problem are necessary for answering the deep questions about intelligence.
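To make that toy task concrete, here is roughly what “seeing whether it says ‘John’ or ‘Mary’” amounts to in code. The exact prompt (extended here so that both names appear in context) and the choice of GPT-2 are stand-ins for illustration.

```python
# Compare how strongly a small model prefers " Mary" vs. " John" as the next
# token of an indirect-object-style prompt. Prompt and model are stand-ins.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "When Mary and John went to the bar, John gave a drink to"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits[0, -1]

for name in [" Mary", " John"]:
    token_id = tok.encode(name)[0]
    print(f"{name!r}: logit {logits[token_id].item():.2f}")
# The higher logit is the model's answer; explaining why it prefers that
# name, in terms of internal structure, is where the real science happens.
```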