
Real-Time Inference of Neural Networks

A Practical Approach for DSP Engineers – Part II

14:00 - 14:50 Wednesday 13th November 2024 GMT Bristol 3
Intermediate

Continuing our exploration of implementing neural network inference for real-time audio applications, we have expanded our initial plugin example into a comprehensive library that simplifies the deployment and integration of neural networks in audio software.

In this talk, we discuss various aspects of our implementation. Since it is crucial to know whether inference engines violate real-time constraints, we first quantify real-time violations within inference executions. Subsequently, we explore the integration of these engines in real-time audio environments, specifically addressing the challenges of running multiple instances simultaneously. To accomplish this, we use a static thread pool and, when available, host-provided threads.

We also focus on strategies for achieving the lowest possible latency, presenting the techniques we have implemented and opening a dialogue on a controversial approach to reducing latency even further. Finally, we share our findings on how various factors affect inference runtimes: we have extensively benchmarked different neural network architectures across different inference engines, and can show how input buffer size, model size, and previously executed inferences affect overall performance.
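To make the notion of a real-time violation concrete: an inference call violates real time when it takes longer than the time budget of one audio buffer, i.e. bufferSize / sampleRate seconds. The following C++ sketch shows how such violations could be counted; it is an illustration of the idea, not the instrumentation used for the results in the talk, and the ViolationCounter name is hypothetical.

```cpp
// Minimal sketch: counting real-time violations of an inference call.
// "ViolationCounter" is a hypothetical name, not the talk's instrumentation.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// An inference run violates real time when it exceeds the time budget of
// one audio buffer: bufferSize / sampleRate seconds.
struct ViolationCounter {
    std::chrono::nanoseconds budget;
    std::atomic<long> violations{0};
    std::atomic<long> total{0};

    ViolationCounter(int bufferSize, double sampleRate)
        : budget(static_cast<long long>(1e9 * bufferSize / sampleRate)) {}

    template <typename Fn>
    void timedRun(Fn&& inference) {
        auto start = std::chrono::steady_clock::now();
        inference();  // one inference execution, e.g. an engine's run call
        auto elapsed = std::chrono::steady_clock::now() - start;
        total.fetch_add(1, std::memory_order_relaxed);
        if (elapsed > budget)
            violations.fetch_add(1, std::memory_order_relaxed);
    }
};

int main() {
    ViolationCounter counter(512, 48000.0);  // 512 samples @ 48 kHz ~= 10.7 ms
    for (int i = 0; i < 100; ++i) {
        // Stand-in for a model run: every second iteration overruns on purpose.
        auto work = (i % 2 == 0) ? std::chrono::milliseconds(5)
                                 : std::chrono::milliseconds(15);
        counter.timedRun([work] { std::this_thread::sleep_for(work); });
    }
    std::printf("%ld of %ld runs exceeded the buffer deadline\n",
                counter.violations.load(), counter.total.load());
}
```

At 48 kHz with 512-sample buffers the budget is roughly 10.7 ms, so any inference that exceeds it would audibly glitch if executed directly on the audio thread.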
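The static thread pool mentioned above can be sketched similarly: a single pool is shared by every plugin instance in the process, and the audio thread hands inference jobs to it without ever blocking. This is a simplified illustration under our own assumptions (a mutex-guarded queue with try_lock on the audio side); it is not the implementation of the library discussed in the talk.

```cpp
// Minimal sketch: one process-wide (static) thread pool shared by all plugin
// instances. Names and structure are illustrative, not the talk's library.
#include <algorithm>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class StaticThreadPool {
public:
    static StaticThreadPool& instance() {
        static StaticThreadPool pool;  // one pool per process
        return pool;
    }

    // Called from the audio thread: must never block. If the lock is
    // contended, the job is dropped and the caller keeps its last output.
    // (A production version would use a lock-free FIFO with pre-allocated
    // job slots instead of std::function and try_lock.)
    bool tryEnqueue(std::function<void()> job) {
        {
            std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
            if (!lock.owns_lock())
                return false;
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
        return true;
    }

private:
    StaticThreadPool() {
        unsigned n = std::max(1u, std::thread::hardware_concurrency() / 2);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~StaticThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_)
            w.join();
    }

    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty())
                    return;
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run one inference off the audio thread
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};
```

A plugin's process callback would then hand off work with StaticThreadPool::instance().tryEnqueue(...) and read back the finished result on a later buffer, trading one or more buffers of latency for a callback that never waits on the model.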

Fares Schulz

Researcher

Technische Universität Berlin

I am a researcher in the Computer Music and Sound Synthesis Team, part of the Audio Communication Group at the Technische Universität Berlin. At present, my particular interest lies in the exploration of novel applications of neural networks for creative audio effects and synthesis, especially in the real-time and mixed-signal domains. Considering neural networks as a tool rather than a one-size-fits-all solution, I am researching how to make them available alongside long-established methods such as DSP algorithms and analog circuitry. I am also working on spatial audio (multi) systems, which require clustered audio servers due to their high computational cost.

I am currently in the final stages of my Master's degree in Audio Communication and Technology, and my educational background includes two Bachelor's degrees in Physics and Audio Engineering. Throughout this time, my passion for electronic music production has taken me from theoretical mathematical equations and abstract artistic concepts to their realization as algorithms and analog circuits. I am always looking for new ways to combine my interests in music, technology, and science, and love to chat with others who share these passions.

Valentin Ackva

Audio Software Developer

INSONE

I am an audio programmer and electronic musician based in Berlin. With a background in computer science, I'm currently working towards my Master's degree in Audio Communication and Technology at the Technische Universität Berlin. My passion lies at the intersection of music, programming, and technology, especially where artistry meets innovation.

For the last four years, I have been working as an audio software developer at a speech processing startup in Leipzig. In this position, I am responsible for developing audio effects for speech enhancement, a role that includes research into the real-time implementation of state-of-the-art neural networks for tasks such as denoising, audio super-resolution, and dereverberation.

Last year, I co-founded a collective that combines the fields of DSP and AI, bringing together a group of audio programmers, machine learning engineers, and artists based in Berlin. In March, we released our first software, "Scyclone", an audio plugin that utilizes neural timbre transfer technology and introduces a new approach to automatic layering. Scyclone's innovative design and its interplay of DSP and AI led to it winning the Audio Plugin Competition organised by The Audio Programmer.