
Helicopter View of Audio ML

00:00 - 00:00 | Thursday 30th October 2025
Beginner

Audio machine learning can seem overwhelming: so many model types, representations, and tasks - and no clear map. This session provides a structured overview to help you make sense of it all and build intuition for how different parts fit together.

We begin by looking at what models are actually used for. Tasks such as classification, transcription, generation, and transformation shape not just the training targets, but the flow of data between modalities - audio-to-text, text-to-audio, audio-to-audio, and so on. These task-modality pairs define the shape of the problem and, by extension, influence what types of models are suitable.
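To make the framing concrete, here is a toy Python sketch; the task names and modality pairs are illustrative examples chosen for this writeup, not a taxonomy from the session:

    # A toy map from example tasks to their (input, output) modalities.
    # The groupings are illustrative, not a fixed taxonomy.
    TASK_MODALITIES = {
        "classification":    ("audio", "label"),
        "transcription":     ("audio", "text"),
        "text_to_speech":    ("text",  "audio"),
        "source_separation": ("audio", "audio"),
    }

    # The (input, output) pair is the "shape of the problem" the session
    # refers to: it narrows which model families are suitable.
    for task, (src, dst) in TASK_MODALITIES.items():
        print(f"{task}: {src} -> {dst}")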

This framing also introduces one of the central trade-offs in audio ML: how much context does a task need, and how is that context managed? Some tasks rely only on short-term input; others require memory, recurrence, or attention mechanisms to track long-term structure.
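As a minimal numpy sketch of that trade-off (purely illustrative, not session material): a fixed local window caps how far back a frame can see, while self-attention lets every frame attend to the whole sequence:

    import numpy as np

    # Toy sequence of 8 frames with 4 features each; values are arbitrary.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))

    # Short-term context: each output frame only sees a fixed local window.
    local = np.stack([x[max(0, t - 2):t + 1].mean(axis=0) for t in range(len(x))])

    # Long-term context: single-head self-attention lets every frame weigh
    # every other frame, so the context length is no longer hard-coded.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    attended = weights @ x

    print(local.shape, attended.shape)  # both (8, 4)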

Once the task and modality framing is in place, we examine how audio is represented inside models - waveforms, spectrograms, tokens - and what these formats enable or constrain.
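A rough, self-contained numpy illustration of those three formats; the test tone, FFT sizes, and crude per-frame "tokens" are assumptions made for demonstration only:

    import numpy as np

    # Waveform: one second of a 440 Hz tone as a stand-in signal.
    sr = 16000
    t = np.arange(sr) / sr
    wave = np.sin(2 * np.pi * 440 * t)

    # Spectrogram: magnitude STFT built from windowed, overlapping frames.
    n_fft, hop = 512, 256
    frames = np.stack([wave[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(wave) - n_fft, hop)])
    spec = np.abs(np.fft.rfft(frames, axis=1))

    # "Tokens": a crude discretization, here just each frame's loudest bin.
    tokens = spec.argmax(axis=1)

    print(wave.shape, spec.shape, tokens.shape)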

Only then do we turn to the model architectures themselves: CNNs, RNNs, Transformers, diffusion models, and hybrids. Each comes with its own strengths, structure, and computational properties.
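For a flavour of one such family in code, here is a minimal PyTorch convolutional classifier over raw waveforms; the layer sizes and the ten-class output are arbitrary choices for this sketch, not a model from the session:

    import torch
    import torch.nn as nn

    # A minimal CNN over raw audio: stacked strided convolutions extract
    # local features, then pooling collapses the time axis for a classifier.
    model = nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=9, stride=4),
        nn.ReLU(),
        nn.Conv1d(16, 32, kernel_size=9, stride=4),
        nn.ReLU(),
        nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        nn.Flatten(),
        nn.Linear(32, 10),         # ten hypothetical classes
    )

    x = torch.randn(1, 1, 16000)   # one second of fake audio at 16 kHz
    print(model(x).shape)          # torch.Size([1, 10])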

Finally, we return to the system-level view: how models can be composed into larger chains or graphs. Some systems pass data between models at runtime; others embed models inside larger models and train them jointly. These structures open up powerful design options - modularity, reuse, and flexible transfer across tasks and domains.
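A toy sketch of the runtime-chaining case; the stage functions below are stand-ins invented for illustration, not real components:

    # Each "model" is just a function here, standing in for a real component.
    def denoise(audio):          # audio-to-audio stage
        return [s * 0.9 for s in audio]

    def transcribe(audio):       # audio-to-text stage
        return f"<transcript of {len(audio)} samples>"

    def pipeline(data, stages):
        # Data flows model-to-model at runtime; each stage is swappable,
        # which is what makes chained systems modular and reusable.
        for stage in stages:
            data = stage(data)
        return data

    print(pipeline([0.1, -0.2, 0.3], [denoise, transcribe]))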

The focus is conceptual: a clean overview that clarifies the terrain rather than diving into implementation. A starting point for navigating the audio ML space with purpose.

Martin Swanholm

CTO

Hindenburg Systems

Martin is a software developer and DSP engineer with over 30 years of experience, currently focusing on practical, real-world applications of machine learning in audio. His work emphasizes getting the most out of available hardware and compute resources, ensuring solutions are efficient and accessible to a wide range of users. He is developing tools for audio restoration, such as phase-coherent frequency-domain models and multi-task learning models that improve speech offline or interactively in real time.

Martin’s journey in digital audio began in the 1990s, and over the years he’s worked on everything from basic signal processing to full multimedia systems. His approach is rooted in pragmatism: using techniques that work, whether simple or advanced, to solve real problems.

Martin excels at breaking down complex concepts into clear, actionable steps, which makes his presentations especially valuable for beginners looking to understand audio processing with machine learning. He’s committed to showing how practical, tried-and-true methods can yield strong results without cutting-edge hardware or specialist expertise, keeping his sessions approachable for all skill levels.
