How it works



To build custom-crafted voices, we extract properties from a target talker’s disordered speech (whatever sounds the target talker can produce) and apply them to a synthetic voice built from a surrogate voice donor who resembles the target talker in age, size, sex, and so on. The result is a synthetic voice that retains as much of the target talker’s vocal identity as possible while keeping the speech clarity of the surrogate voice donor.

The sound files below guide you through the basics of speech production and our VocaliD process.

1. VOCAL SOURCE: The larynx (a.k.a. the voice box) creates a buzzing sound that we call the vocal source. Every individual has a unique source that reflects their anatomy and physiology. Here is an example:



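The idea of a periodic "buzz" can be sketched in code. This is a minimal sketch, not VocaliD's own method: it assumes the Rosenberg glottal-pulse shape, a textbook approximation of airflow through the vocal folds, and the function name `rosenberg_pulse_train` and its parameters (`open_frac`, `close_frac`) are hypothetical choices for illustration.

```python
import math


def rosenberg_pulse_train(f0_hz, duration_s, fs=16000, open_frac=0.4, close_frac=0.16):
    """Return a periodic glottal-flow waveform built from Rosenberg pulses.

    Hypothetical helper for illustration. f0_hz sets the buzz rate (pitch);
    open_frac and close_frac are the fractions of each period spent opening
    and closing the glottis. For the rest of the cycle the glottis is
    closed and the flow is zero.
    """
    period = int(round(fs / f0_hz))          # samples per glottal cycle
    tp = max(1, int(open_frac * period))     # opening phase, in samples
    tn = max(1, int(close_frac * period))    # closing phase, in samples

    cycle = []
    for n in range(period):
        if n < tp:                           # flow rises smoothly to its peak
            cycle.append(0.5 * (1.0 - math.cos(math.pi * n / tp)))
        elif n < tp + tn:                    # flow falls back toward zero
            cycle.append(math.cos(math.pi * (n - tp) / (2 * tn)))
        else:                                # closed phase: no airflow
            cycle.append(0.0)

    total = int(duration_s * fs)
    return [cycle[i % period] for i in range(total)]


# Half a second of a 120 Hz buzz sampled at 16 kHz.
source = rosenberg_pulse_train(120, 0.5)
```

Raising `f0_hz` raises the perceived pitch; changing the opening and closing fractions alters the pulse shape, which is one crude knob for voice quality.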
2. VOCAL FILTER: The source then passes through the rest of the vocal tract (the chambers of the head and neck, also called the filter), which changes shape to form consonants and vowels.

Here is an example of how the same source sound is transformed into a vowel for a given speaker:



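The filtering step can be sketched as a cascade of resonators, one per formant (vocal-tract resonance). This is a standard formant-synthesis approximation, not a description of how VocaliD models the vocal tract; the helper names and the formant and bandwidth values are assumptions chosen for illustration. Changing the formants (the "shape" of the tract) changes which vowel is heard, while the source stays the same.

```python
import math


def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole digital resonator (Klatt-style) with unity gain at 0 Hz.

    Boosts energy near freq_hz with a bandwidth of roughly bw_hz, which is
    a common way to model a single formant.
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0                      # previous two outputs
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract_filter(source, formants, fs=16000):
    """Shape a source signal by passing it through a cascade of formant resonators."""
    y = list(source)
    for freq_hz, bw_hz in formants:
        y = resonator(y, freq_hz, bw_hz, fs)
    return y


# Assumed, illustrative (frequency Hz, bandwidth Hz) pairs for an /a/-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]

# A crude 120 Hz source: one impulse every 133 samples at 16 kHz.
buzz = [1.0 if n % 133 == 0 else 0.0 for n in range(8000)]
vowel = vocal_tract_filter(buzz, AH_FORMANTS)
```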
3. MORPHING A VOICE: Some people are unable to control their vocal filter because of a neuromotor speech impairment. Many of them, however, can still vocalize and retain some control of their vocal source. For these individuals, the VocaliD process takes natural speech from a surrogate donor (someone anatomically similar to the target talker) and combines it with the target talker’s vocal source to create a voice that is both understandable and unique.

Here is a sentence produced by a surrogate donor (a neurologically intact talker):



Here is the same sentence, now with the target talker’s source features:



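One textbook way to realize this kind of source-filter swap is linear-predictive (LPC) cross-synthesis: estimate each talker's vocal-tract envelope with LPC, inverse-filter the target talker's speech to approximate their source, then pass that residual through the surrogate's envelope. The page does not say VocaliD uses LPC, so treat this as a minimal single-frame sketch under that assumption; the function names, the default order of 16, and the requirement that frames be time-aligned are all assumptions. A practical system would process overlapping frames and transfer specific source features (pitch, loudness, voice quality) more selectively.

```python
import math


def lpc(frame, order):
    """Linear-prediction coefficients a[1..order] via autocorrelation + Levinson-Durbin.

    The all-pole filter 1 / (1 - sum_k a[k] z^-k) approximates the spectral
    envelope of the frame, i.e. an estimate of the vocal-tract filter.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    r = [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(order + 1)]
    if r[0] == 0.0:                       # silent frame: no envelope to estimate
        return [0.0] * order
    r[0] *= 1.0 + 1e-9                    # tiny regularization for numerical safety

    a = [0.0] * (order + 1)               # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(x, a):
    """Remove an envelope: e[n] = x[n] - sum_k a[k] x[n-k] (residual, i.e. source estimate)."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - k - 1] for k in range(p) if n - k - 1 >= 0)
            for n in range(len(x))]


def synthesis_filter(e, a):
    """Apply an envelope: y[n] = e[n] + sum_k a[k] y[n-k]."""
    y = []
    for n, sample in enumerate(e):
        acc = sample
        for k, coef in enumerate(a):
            if n - k - 1 >= 0:
                acc += coef * y[n - k - 1]
        y.append(acc)
    return y


def cross_synthesize(target_frame, surrogate_frame, order=16):
    """Drive the surrogate's vocal-tract envelope with the target's source residual.

    Assumes both frames are time-aligned and the same length (e.g. 20-30 ms).
    """
    target_residual = inverse_filter(target_frame, lpc(target_frame, order))
    return synthesis_filter(target_residual, lpc(surrogate_frame, order))
```

Inverse filtering and then re-filtering with the same coefficients reconstructs the original frame, so the only thing that changes in `cross_synthesize` is whose envelope is applied.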
4. CREATING A PERSONALIZED SYNTHETIC VOICE: To build a customized synthetic voice, we record several thousand sentences from a surrogate talker. We then morph the surrogate’s natural speech to infuse it with the target talker’s source characteristics, just as in the example above. Finally, we use these morphed recordings to build a synthetic voice with unit selection, a concatenative synthesis technique. The result is a voice that is as clear as the surrogate’s but similar in identity to the target talker. This customized voice lets the target talker say any sentence (even novel sentences the surrogate never recorded) in his/her own voice!

Here is an example of a personalized voice for a young woman with severe speech impairment:



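Unit selection is commonly framed as a dynamic-programming search: each slot in the sentence to be spoken has several candidate recorded units, a target cost scores how well a unit matches what is wanted, and a join cost scores how smoothly two units concatenate. The sketch below is a generic Viterbi search under those assumptions, not VocaliD's implementation; the tuple layout of the units and both cost functions in the toy example are hypothetical.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search for the unit sequence with the lowest total cost.

    candidates[i] lists the recorded units that could realize slot i.
    target_cost(i, unit) scores a unit against slot i; join_cost(prev, unit)
    scores concatenating two units. Returns one chosen unit per slot.
    """
    # best[i][j] = (lowest cost of any path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)

    # Trace back from the cheapest unit in the last slot.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Toy, hypothetical data: a unit is (phone, pitch_hz, recording_id, index_in_recording).
targets = [("h", 120), ("ai", 125)]
candidates = [
    [("h", 100, "s1", 0), ("h", 121, "s2", 4)],
    [("ai", 126, "s1", 1), ("ai", 124, "s3", 7)],
]


def target_cost(i, unit):
    phone, pitch = targets[i]
    return (0 if unit[0] == phone else 100) + abs(unit[1] - pitch)


def join_cost(prev, unit):
    # Units that were adjacent in the same recording join seamlessly.
    if prev[2] == unit[2] and prev[3] + 1 == unit[3]:
        return 0
    return 5 + abs(prev[1] - unit[1])


print(select_units(candidates, target_cost, join_cost))
```

Because contiguous units from the same recording cost nothing to join, a search like this tends to reuse longer stretches of the recorded (here, morphed) speech, which helps the output sound natural.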
More technical details…

This work leverages the source-filter theory of speech production (Chiba & Kajiyama, 1941; Fant, 1960), which assumes that the vocal “source” (generated by the vocal folds during voiced segments) and the “filter” (the resonant properties of the tube-like vocal tract) contribute independently to the acoustic output. Empirical studies have shown that both source characteristics (Carrell, 1984; Prasanna, Gupta, & Yegnanarayana, 2006) and filter characteristics (Itoh, 1992; Remez et al., 1997; Lavner, Gath, & Rosenhouse, 2000) contribute to talker identity. Although impaired motor control of the filter leaves their speech unintelligible, many “non-speaking” individuals can still vocalize and control source characteristics such as pitch, loudness, and aspects of voice quality. Our VocaliD approach therefore captures as much of the target talker’s residual source features as possible and combines them with the filter characteristics of an anatomically similar surrogate talker, generating a clear, understandable voice that also reflects the target talker’s vocal identity.
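In its simplest linear form, the source-filter assumption can be written in the z-domain, which makes the combination of one talker's source with another talker's filter explicit. This is an idealized sketch: lip radiation is folded into the filter, and the all-pole form of H(z) is the common linear-prediction assumption rather than something stated above.

```latex
\begin{aligned}
S(z) &= E(z)\,H(z), \qquad H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \\
\hat{S}(z) &= E_{\text{target}}(z)\,H_{\text{surrogate}}(z)
\end{aligned}
```

Here E(z) is the glottal excitation (source), H(z) is the vocal-tract filter with gain G and p predictor coefficients a_k, and \hat{S}(z) is the resulting speech that pairs the target talker's source with the surrogate talker's filter. The independence assumption is what makes this substitution meaningful.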