Mark Cartwright Uses AI to Give Non-Verbal Performers a Voice in New Opera
Art is rooted in thought and emotion. The greatest music and drama allow viewers and performers alike to feel joy, anger or sadness as a shared communicative experience. But how does someone with no voice “sing” in an opera?
Assistant Professor Mark Cartwright, who leads the Sound Interaction and Computing (SinC) Lab in the Ying Wu College of Computing, and a collaborative team of researchers have achieved a breakthrough in text-to-speech technology that not only gives voice to a non-verbal character and the performer playing him, but also allows him to sound fully and expressively human.
An expert at the intersection of human-computer interaction and applied machine learning for audio and music, Cartwright is part of the creative team bringing Sensorium Ex to the stage, a trailblazing opera that alternately questions and pioneers the use of AI in telling the story of a mother and son in a dystopian world where corporate greed and a lack of empathy loom large.
Cartwright and his collaborator, Luke DuBois, co-chair of NYU Tandon’s Department of Technology, Culture and Society, had previously worked together in NYU’s Music and Audio Research Lab and Tandon’s Center for Urban Science & Progress. DuBois is also on the faculty of the NYU Ability Project, which focuses on assistive technology and accessible design. The two share a vision for using audio and technology to improve the lives of people with disabilities.
“I enjoy working on problems industry neglects, especially when they can impact accessibility,” said Cartwright.
Sensorium Ex was created over a seven-year period by composer Paola Prestini and librettist Brenda Shaughnessy, a Rutgers-Newark professor whose son is non-verbal, together with directors Jerron Herman and Jay Scheib. Technology was contributed by Cartwright, DuBois and Eric Singer, who engineered the Ther’minator, a device that lets a performer control sound without physical contact.
The opera premiered in Omaha, Nebraska, in May 2025 and “fuses 3D sensing, artificial intelligence, disability solutions, operatic talent and artistic creativity to deliver a groundbreaking performance,” according to Forbes.
Pre-existing text-to-speech technology, most widely recognized through its use by Stephen Hawking, has traditionally sounded robotic. Energized by the creative side of AI, Cartwright set out to make the mechanical sound human by using the performers themselves as source material.
Cartwright and his team, which included Ph.D. candidate Danzel Serrano M.S. ‘22 and Michael Clemens Ph.D. ‘25, began by recording the natural vocalizations of the two actors who would play the role of the son: Kader Zioueche, who has cerebral palsy, and Jakob Jordon, who has autism and apraxia, a condition that severely impacts speech.
The team then combined a text-to-speech synthesizer with a specialized neural vocoder (a type of neural network used in speech synthesis to convert audio into low-dimensional acoustic features and back again) and used AI to transform the libretto into speech in the vocal style of each actor, with real-time control over the prosodic features of the speech to express a range of emotions and reflect the personality of the character and performer.
Cartwright described the process as “disentangling” the speech into its phonetic content, pitch, timbre and loudness.
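For readers curious what those prosodic features look like in practice, the short Python sketch below extracts a pitch contour and a loudness contour from a recorded vocalization using the open-source librosa library. It is purely illustrative and is not the team’s actual pipeline; the file name, sample rate and pitch range are placeholders.

import librosa
import numpy as np

# Load one of the recorded vocalizations (placeholder file name).
y, sr = librosa.load("vocalization.wav", sr=22050)

# Pitch (fundamental frequency) contour via probabilistic YIN.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Loudness proxy: frame-level RMS energy converted to decibels.
rms = librosa.feature.rms(y=y)[0]
loudness_db = librosa.amplitude_to_db(rms, ref=np.max)

# A controllable voice model would condition on contours like these, together
# with phonetic content from the libretto, when re-synthesizing speech.
print(f"frames: {len(rms)}, voiced frames: {int(voiced_flag.sum())}")
print(f"median pitch of voiced frames (Hz): {np.nanmedian(f0):.1f}")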
Researchers from the University of Illinois Urbana-Champaign and Northwestern University also collaborated as part of the AI team’s massive undertaking of bringing the entire opera fully to life.
The final step, giving the performers a way to activate and speak through the technology, was accomplished by Singer’s Ther’minator, which transforms Light Detection and Ranging (LiDAR) sensing signals into distance data and transmits it over WiFi. The device enabled the actors to control nuances of their speech with simple hand motions. The sensors are low-cost and similar to those used in self-driving cars to sense distance and obstacles.
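As a rough illustration of how a ranging sensor might drive speech controls over WiFi, the hypothetical sketch below maps a hand-distance reading to a normalized control value and sends it to a speech engine as a small UDP message. It does not describe the Ther’minator’s actual hardware or protocol; the sensor stub, distance range, host and port are invented for the example.

import json
import random
import socket
import time

CONTROL_HOST, CONTROL_PORT = "192.168.1.50", 9000  # hypothetical receiver on the local network
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def read_distance_mm() -> float:
    """Stand-in for reading a time-of-flight ranging sensor; returns millimeters."""
    return random.uniform(50, 800)  # simulated hand position

def distance_to_control(d_mm: float, near: float = 50.0, far: float = 800.0) -> float:
    """Map hand distance to a 0..1 control value (a closer hand means a higher value)."""
    d = min(max(d_mm, near), far)
    return 1.0 - (d - near) / (far - near)

for _ in range(200):  # stream roughly ten seconds of control updates
    control = distance_to_control(read_distance_mm())
    # The receiving speech engine could scale pitch or loudness by this value.
    message = json.dumps({"pitch_control": round(control, 3)}).encode()
    sock.sendto(message, (CONTROL_HOST, CONTROL_PORT))
    time.sleep(0.05)  # about 20 updates per second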
The technology team envisions extending their work beyond the stage to enhance the experiences of people with developmental, intellectual and physical disabilities and allow them to play leading real-life roles among their peers throughout society.
Through an $800,000, three-year National Science Foundation grant, Cartwright is currently developing an AI system that can automatically identify and caption meaningful non-speech sounds in online videos, in partnership with Sooyeon Lee, assistant professor in NJIT’s Department of Informatics, and researchers at NYU.