-- @audiodevcon

Singing Synthesis Beyond Human-Level Naturalness: Not What You Think - Kanru Hua - ADC 2023

Achieving human-level naturalness is often viewed as the pinnacle of vocal synthesis research. While recent advances in Text-to-Speech (TTS) using deep generative models have reported subjective ratings comparable to human speech, singing synthesis has not yet reached this milestone. In this presentation, we showcase a singing synthesis system that, intriguingly, outscores raw recordings in comparative mean opinion score (CMOS) tests, with statistical significance. However, as we delve deeper, we highlight the subtle but crucial differences between true human parity and competitive ratings in subjective tests, differences that challenge our understanding of “naturalness” in this domain. We will also unpack the complexities of subjective quality evaluation and the unique challenges that singing poses compared with speech, and shed light on the implications these findings hold for the future design of singing synthesis systems.

Link to Slides:

Kanru Hua

Kanru Hua founded Dreamtonics (developer of Synthesizer V) in 2019, after dropping out of the University of Illinois. A self-taught programmer and researcher, Kanru has focused on bridging speech signal processing algorithms with the latest advances in generative models, and on addressing the production challenges of deploying neural networks for audio processing. He was nominated for Forbes JAPAN 30 UNDER 30 in 2022.

Streamed & Edited by Digital Medium Ltd:

Organized and produced by JUCE:

Special thanks to the ADC23 Team:

Sophie Carus
Derek Heimlich
Andrew Kirk
Bobby Lombardi
Tom Poole
Ralph Richbourg
Jim Roper
Jonathan Roper
Prashant Mishra

#adc #audiodev #dsp #audio #audiotech