Researchers at USF Health and Weill Cornell Medicine, working on an expansive, multi-institutional project investigating voice as a biomarker for disease, have reached a significant milestone: publishing the first version of their clinically validated voice dataset to an online artificial intelligence platform, where it will be an invaluable resource for researchers across the globe.
The National Institutes of Health-funded project, Voice as a Biomarker of Health, seeks to build an ethically sourced, AI-enabled database of 10,000 human voices from patients with different illnesses to help doctors diagnose and treat diseases, such as cancer and depression, based on the sound of a patient’s voice.
The initial data release includes more than 12,500 separate recordings from 306 participants across the United States and Canada. The dataset will be published on multiple platforms and is available to the community of health researchers studying voice. The release comes at the end of the second year of the four-year, $14 million project, with several additional releases scheduled for the next two years. The repository is already among the largest collections of human voices, and by the end of the study it will become the world’s flagship database for AI voice and health.

Dr. Yaël Bensoussan, USF Health Morsani College of Medicine
“There is so much information in these first recordings and we are excited to receive feedback on it because what we are developing will be an unequalled resource for the scientific community,” said Yaël Bensoussan, MD, director of the USF Health Voice Center and co-lead of the project. “It is really important for us to understand what people can do with this initial data and what kinds of clinical questions they can answer.”
As one of four precision health data projects funded by the National Institutes of Health, Voice as a Biomarker of Health aims to introduce a transformative new method of diagnosing and treating diseases by training AI models to identify illnesses through changes in the human voice, with vast implications for the clinical setting.
The University of South Florida is the lead institution for the project, in collaboration with Weill Cornell Medicine and 10 other institutions across the United States and Canada. Dr. Bensoussan of the USF Health Morsani College of Medicine and Olivier Elemento, PhD, director of the Englander Institute for Precision Medicine at Weill Cornell Medicine, are the project’s co-principal investigators.
While previous research using voice and AI to detect disease is encouraging, it has been limited by the small size of datasets, as well as by concerns over data security, ownership and bias. Voice as a Biomarker of Health is addressing those shortcomings by bringing together medical voice, AI engineering and ethics experts to generate a landmark voice database using privacy-preserving AI.

Dr. Olivier Elemento, Weill Cornell Medicine
"Artificial intelligence is revolutionizing our ability to detect and understand disease, and this groundbreaking voice dataset is a monumental step forward in that journey," said Dr. Elemento, who is also a professor of physiology and biophysics at Weill C