Researchers at the University of California, San Francisco and the University of California, Berkeley have developed a brain-computer interface (BCI) that has enabled a woman with severe brainstem paralysis to speak through a digital avatar. This is the first time speech or facial expression has been synthesized from brain signals. The system can also decode these signals into text at nearly 80 words per minute, a vast improvement over commercially available technology.
Edward Chang, MD, chair of neurological surgery at UCSF, has been working on the technology, known as a brain-computer interface, or BCI, for more than a decade. He hopes this latest research breakthrough, published Aug. 23, 2023, in Nature, will lead to an FDA-approved system that enables speech from brain signals in the near future.
“Our goal is to restore a full, embodied way of communicating, which is really the most natural way for us to talk to others,” said Chang, who is a member of the UCSF Weill Institute for Neurosciences. “These advances bring us much closer to making this a real solution for patients.”
Chang’s team previously demonstrated that it was possible to decode brain signals into text in a man who had also experienced a brainstem stroke many years earlier. The current research demonstrates something more ambitious: decoding brain signals into the richness of speech, along with the movements that animate a person’s face during conversation.
Chang implanted a paper-thin rectangle of 253 electrodes on the surface of the woman’s brain over areas his team found to be critical for speech. The electrodes picked up brain signals that, had it not been for the stroke, would have gone to muscles in her tongue, jaw and larynx, as well as her face. A cable plugged into a port fixed to her head connected the electrodes to a cluster of computers.
For weeks, the participant worked with the team to train the system’s artificial intelligence algorithms to recognize her unique brain signals for speech. This involved repeating different phrases from a 1,024-word conversational vocabulary over and over until the computer recognized the patterns of brain activity associated with the sounds.
Instead of training the AI to recognize whole words, the researchers created a system that decodes words from phonemes. These are the subunits of speech that form spoken words in the same way that letters form written words. “Hello,” for example, contains four phonemes: “HH,” “AH,” “L,” and “OW.”
Using this approach, the computer only needed to learn 39 phonemes to decipher each English word. This both improved the system’s accuracy and made it three times faster.
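The phoneme approach can be illustrated with a toy sketch (this is not the study’s actual decoder, and the mini-lexicon below is a made-up example): once a word’s phoneme sequence is known, turning decoded phonemes into text is a lookup, so the model only ever has to distinguish the small fixed phoneme set rather than every word in the vocabulary.

```python
# Toy illustration of phoneme-to-word decoding (hypothetical lexicon,
# not the system described in the article). Each word is spelled from
# a small shared inventory of ARPAbet-style phonemes.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",   # "hello" = HH AH L OW, as in the text
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
}

def decode(phonemes):
    """Greedy longest-match: consume the longest phoneme run that spells a word."""
    words, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):  # try the longest span first
            word = LEXICON.get(tuple(phonemes[i:j]))
            if word:
                words.append(word)
                i = j
                break
        else:
            i += 1  # skip a phoneme no word match starts with
    return " ".join(words)

print(decode(["HH", "AH", "L", "OW", "HH", "AW", "AA", "R", "Y", "UW"]))
# hello how are you
```

The payoff mirrors the article’s point: the decoder’s classification problem stays fixed at 39 phoneme classes no matter how large the word list grows.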
“Accuracy, speed and vocabulary are critical,” said Sean Metzger, who developed the text decoder with Alex Silva, both graduate students in the joint bioengineering program at UC Berkeley and UCSF. “This is what gives the user the potential over time to communicate almost as quickly as we do and have much more naturalistic and normal conversations.”
To create the voice, the team developed a speech synthesis algorithm that they customized to sound like her pre-injury voice, using a recording of her speaking at her wedding.
The team animated the avatar with software that simulates and animates facial muscle movements, developed by Speech Graphics, a company that makes AI-driven facial animation. The researchers created customized machine learning processes that allowed the company’s software to mesh with the signals from the woman’s brain as she tried to speak and translate them into the avatar’s facial movements: the jaw opening and closing, the lips protruding and pursing, and the tongue moving up and down, as well as expressions of happiness, sadness and surprise.
“We’re making up for the connections between the brain and the vocal tract that were disrupted by the stroke,” said Kaylo Littlejohn, a graduate student working with Chang and Gopala Anumanchipalli, PhD, a professor of electrical engineering and computer science at UC Berkeley. “When the subject first used this system to speak and move the avatar’s face in tandem, I knew it was going to be something that would have a real impact.”
An important next step for the team is to create a wireless version that does not require the user to be physically connected to the BCI.
“Enabling people to freely control their own computers and phones with this technology would have a profound effect on their independence and social interactions,” said co-author David Moses, MD, associate professor of neurological surgery.
Source: University of California, San Francisco
Metzger, S. L., et al. (2023). A high-performance neuroprosthesis for speech decoding and avatar control. Nature. https://doi.org/10.1038/s41586-023-06443-4