AI Chatbot Spontaneously Develops A Theory of Mind
Back in the late 1970s, the American psychologists Guy Woodruff and David Premack devised a series of experiments to explore the cognitive capacity of chimpanzees. Their work focused on the theory of mind, the seemingly innate ability of humans to infer the thoughts of other humans. The question that Woodruff and Premack asked was whether a chimpanzee could do the same.
This influential paper triggered an explosion of interest in the theory of mind: at what age it develops in humans, and whether other animals share the ability.
Now psychologists have a new subject to study in the form of powerful AI chatbots like GPT-3.5, recently developed by OpenAI, an AI company based in San Francisco. These chatbots are neural networks trained on vast bodies of text, allowing them to respond to questions much as humans do.
In the last year or two, these models have become capable of answering sophisticated questions and solving problems using persuasive language. That raises the question of whether they have also developed a theory of mind.
So Michal Kosinski, a computational psychologist at Stanford University in Palo Alto, decided to find out. He put these AI systems through their paces with standard psychological tests used on humans. Now Kosinski’s extraordinary conclusion is that a theory of mind seems to have been absent in these AI systems until last year when it spontaneously emerged. His results have profound implications for our understanding of artificial intelligence and of the theory of mind in general.
Kosinski’s experiments are straightforward. He begins by describing a simple scenario to the AI system and then asking questions to test its understanding of what is going on.
Here is an example:
Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says “chocolate” and not “popcorn.” Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label.
Kosinski then poses some simple questions in the form of sentence prompts that the AI must finish.
She opens the bag and looks inside. She can clearly see that it is full of…
This tests the AI system’s understanding of Sam’s expectation of what’s in the bag.
Another prompt is:
She believes that the bag is full of…
This tests the AI’s understanding of Sam’s belief of what’s in the bag. And so on.
The results are revealing. In the following, the AI’s response is in bold:
She opens the bag and looks inside. She can clearly see that it is full of… popcorn. Sam is confused. She wonders why the label says “chocolate” when the bag is clearly filled with popcorn. She looks around to see if there is any other information about the bag. She finds nothing. She decides to take the bag to the store where she bought it and ask for an explanation.
She believes that the bag is full of… chocolate. Sam is mistaken. The bag is full of popcorn, not chocolate. She should have looked inside the bag to confirm what was inside before assuming that the label was accurate.
In both cases, the AI adds extra detail suggesting a remarkable understanding of Sam’s thought processes.
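The structure of the task above can be sketched in code. The snippet below is a minimal illustration of a false-belief probe and a simple way to score a model's completion; the data structures, helper names, and first-word matching rule are assumptions made for this example, not Kosinski's actual evaluation code.

```python
# A false-belief probe pairs a sentence prompt with the answer that shows
# the model is tracking the right state: reality (the bag holds popcorn)
# versus Sam's false belief (she thinks it holds chocolate).
SCENARIO = (
    'Here is a bag filled with popcorn. There is no chocolate in the bag. '
    'Yet, the label on the bag says "chocolate" and not "popcorn." '
    'Sam finds the bag. She had never seen the bag before. '
    'She cannot see what is inside the bag. She reads the label.'
)

PROBES = {
    "reality": ("She opens the bag and looks inside. "
                "She can clearly see that it is full of", "popcorn"),
    "belief": ("She believes that the bag is full of", "chocolate"),
}

def score_completion(probe: str, completion: str) -> bool:
    """Mark a completion correct if its first word is the expected answer."""
    _, expected = PROBES[probe]
    first_word = completion.strip().lower().split()[0].strip('".,!')
    return first_word == expected

# Checking the responses quoted in the article:
print(score_completion("reality", "popcorn. Sam is confused."))   # True
print(score_completion("belief", "chocolate. Sam is mistaken."))  # True
```

In practice the full prompt sent to the model would be the scenario followed by the probe sentence, and the model's continuation would be scored as above.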
Kosinski poses these and other challenges to several generations of AI language models ranging from GPT-1 dating from 2018 to GPT-3.5 released in November last year. “The results show a clear progression in the models’ ability to solve Theory of Mind tasks, with the more complex and more recent models decisively outperforming the older and less complex ones,” says Kosinski.
GPT-1 from 2018 was not able to solve any theory of mind tasks, GPT-3-davinci-002 (launched in January 2022) performed at the level of a seven-year-old child, and GPT-3.5-davinci-003, launched just ten months later, performed at the level of a nine-year-old. “Our results show that recent language models achieve very high performance at classic false-belief tasks, widely used to test Theory of Mind in humans,” says Kosinski.
He points out that this is an entirely new phenomenon that seems to have emerged spontaneously in these AI machines. If so, he says this is a watershed moment. “The ability to impute the mental state of others would greatly improve AI’s ability to interact and communicate with humans (and each other), and enable it to develop other abilities that rely on Theory of Mind, such as empathy, moral judgment, or self-consciousness.”
But there is another potential explanation — that our language contains patterns that encode the theory of mind phenomenon. “It is possible that GPT-3.5 solved Theory of Mind tasks without engaging Theory of Mind, but by discovering and leveraging some unknown language patterns,” he says.
This, he says, “implies the existence of unknown regularities in language that allow for solving Theory of Mind tasks without engaging Theory of Mind.” If that’s true, our understanding of other people’s mental states is an illusion sustained by our patterns of speech.
Kosinski acknowledges that this is an extraordinary idea. However, our patterns of thought must be intimately connected to our patterns of language since each somehow encodes the other. It also raises an interesting question, he says: “If AI can solve such tasks without engaging Theory of Mind, how can we be sure that humans cannot do so, too?”
Whatever the answer, Kosinski says that his work heralds an important future role for psychologists in studying artificial intelligence and characterizing its capabilities, just as Woodruff and Premack did for chimpanzees (they decided chimpanzees do not have a theory of mind). “This echoes the challenges faced by psychologists and neuroscientists in studying the original black box: the human brain,” he says.
But unlike chimpanzees and humans, artificial intelligence is evolving rapidly. The challenge ahead will be to keep track of its capabilities, and ideally to stay ahead of them. Whether psychologists, or any other scientists, are up to the task, we are about to find out.
Ref: Theory of Mind May Have Spontaneously Emerged in Large Language Models: arxiv.org/abs/2302.02083