AI Generated Voices: Towards Emotive Speech Synthesis – Vibhor Saran – ADCx India 2024

Wednesday May 1, 2024

By digitalmedium1

Join Us For ADC24 - Bristol - 11-13 November 2024
More Info: https://audio.dev/
@audiodevcon

Traditionally, machine-generated voices were synthesised by joining together the phonemes of a language, which made them sound robotic. With the availability of more data and the advent of deep learning, AI voices have become more human-like and engaging. The next step is to make these AI-generated voices more emotive, so that they can laugh, sound sad, or even cry, much as expressive human speech does. In this talk, we touch on deep learning approaches to making synthetic voices more emotive. Specifically, we focus on how to manipulate the Mel spectrogram of the speech to make it more engaging, removing the dependency on large quantities of data.
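
For readers unfamiliar with the idea of editing a Mel spectrogram directly, here is a rough toy sketch in Python. It is not the method presented in the talk. It assumes a log-Mel spectrogram stored as a list of frames, each a list of dB values per Mel bin, and shows two hand-written manipulations: shifting energy along the Mel axis and applying a gain. The function names (`hz_to_mel`, `mel_to_hz`, `shift_mel_bins`, `add_gain_db`) are hypothetical and chosen for illustration only; the Hz-to-Mel formula is the common HTK-style one.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the Mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def shift_mel_bins(mel_spec, shift):
    """Shift each frame by `shift` bins along the Mel axis.

    A positive shift moves energy towards higher Mel bins. Bins with no
    source value are filled with the frame's minimum (a crude noise floor).
    This is an illustrative assumption, not a production pitch shifter.
    """
    shifted = []
    for frame in mel_spec:
        n = len(frame)
        floor = min(frame)
        new_frame = []
        for i in range(n):
            src = i - shift
            new_frame.append(frame[src] if 0 <= src < n else floor)
        shifted.append(new_frame)
    return shifted


def add_gain_db(mel_spec, gain_db):
    """Add a constant gain (in dB) to every bin of a log-Mel spectrogram."""
    return [[value + gain_db for value in frame] for frame in mel_spec]


# Hypothetical two-frame, four-bin log-Mel spectrogram in dB.
example = [[0.0, -10.0, -20.0, -30.0],
           [-5.0, -15.0, -25.0, -35.0]]
brighter = add_gain_db(shift_mel_bins(example, 1), 3.0)
```

In a real system the modified spectrogram would then be passed to a vocoder to produce audio, and the edits would typically be learned by a model rather than written by hand.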

Link to Slides: https://data.audio.dev/talks/ADCxIndia/2024/towards-emotive-speech-synthesis.pdf

Edited by Digital Medium Ltd - online.digital-medium.co.uk

Organized and produced by JUCE: https://juce.com/

Special thanks to the ADC24 Team:

Sophie Carus
Derek Heimlich
Andrew Kirk
Bobby Lombardi
Tom Poole
Ralph Richbourg
Prashant Mishra

#adc #ai #dsp #audio #speechsynthesis
