Year: 2025

ADCx Gather Welcome Address

The ADCx Gather Welcome Address opens the conference with an overview of the events and sessions planned throughout the day. It outlines the schedule, key activities, and important moments to look forward to, so you are well prepared for the day's proceedings. It also briefly explains the online systems used during the ADCx Gather event, so you are familiar with the platforms and tools that support your participation.

Filed under: Uncategorized

Virtual Venue Opens/Networking In Gather

Our virtual venue, hosted on Gather Town, opens! Connect ahead of time to test things out and get familiar with the online conference systems before the welcome address, and to chat, socialize, and interact with other attendees through a dynamic video chat system. Explore the venue, interact, and have fun!

Python Templates for Neural Image Classification and Spectral Audio Processing

This presentation introduces two open-source research frameworks for neural image classification and spectral audio processing: (1) the Lightning Hydra Template Extended (LHTE) and (2) the Neural Spectral Modeling Template (NSMT). The LHTE extends the widely used PyTorch Lightning + Hydra template with state-of-the-art architectures (CNNs, ConvNeXt, EfficientNet, Vision Transformers) and expanded dataset support, adding CIFAR-10, CIFAR-100, and a new generalized Variable Image Multi-Head (VIMH) format. VIMH accommodates extremely large image/channel dimensions and multi-head tasks, and supports both classification and regression from a single shared backbone. The LHTE also provides reproducible benchmark experiments and systematic workflows for rapid model comparison.

Built upon the LHTE, the NSMT specializes in spectral audio modeling, where stacked spectrograms and other 2D audio representations serve as image-like inputs. By leveraging the perceptual inductive priors of human hearing, the NSMT avoids the computational expense of end-to-end waveform modeling while maintaining high accuracy. Applications include synthesizer parameter estimation (tested on sawtooth oscillators, and Moog VCFs with ADSR envelopes), instrument recognition, and real-time effect control. NSMT emphasizes small, efficient architectures, extended spectral representations, auxiliary conditioning inputs, and enhanced VIMH support for audio-specific datasets.
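To make the idea of spectrograms as image-like inputs concrete, here is a minimal sketch in plain Python. It is not NSMT code: the function names, the frame size, the hop size, and the choice of a linear and a log-scaled magnitude channel are all assumptions made for illustration, and a naive DFT stands in for the FFT a real pipeline would use.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_size=64, hop=32):
    """Return DFT magnitudes as a 2D list indexed [frame][bin].

    Uses a Hann window and a naive O(N^2) DFT, which is fine for a demo
    but far too slow for real audio workloads.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        rows.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(n_bins)
        ])
    return rows


def spectral_image(signal):
    """Stack two views of the same signal as channels: [channel][frame][bin].

    Channel 0 is linear magnitude, channel 1 is log-scaled magnitude.
    The channel choice is hypothetical, picked only to show stacking.
    """
    linear = magnitude_spectrogram(signal)
    log_scaled = [[math.log1p(m) for m in row] for row in linear]
    return [linear, log_scaled]
```

The result has the same channels-by-height-by-width layout as an image, which is what lets an image classification backbone consume audio without modeling the raw waveform.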

Together, the LHTE and NSMT form robust, reproducible platforms for advancing machine learning research at the intersection of vision and audio. Code, datasets, and other resources are available online for immediate adoption.

Helicopter View of Audio ML

Audio machine learning can seem overwhelming: so many model types, representations, and tasks - and no clear map. This session provides a structured overview to help you make sense of it all and build intuition for how different parts fit together.

We begin by looking at what models are actually used for. Tasks such as classification, transcription, generation, and transformation shape not just the training targets, but the flow of data between modalities - audio-to-text, text-to-audio, audio-to-audio, and so on. These task-modality pairs define the shape of the problem and, by extension, influence what types of models are suitable.

Once the task and modality framing is in place, we examine how audio is represented inside models - waveforms, spectrograms, tokens - and what these formats enable or constrain.
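As one small illustration of how a representation constrains a model, the sketch below turns waveform samples into discrete tokens using 8-bit mu-law companding. This is only one simple tokenization scheme, chosen here as an assumption for illustration; the session may well discuss other token formats, such as learned codec tokens.

```python
import math

MU = 255  # 8-bit mu-law: 256 possible token values


def encode(sample):
    """Map a sample in [-1.0, 1.0] to an integer token in [0, 255]."""
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range input
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample)
    return int((compressed + 1) / 2 * MU + 0.5)  # round to nearest token


def decode(token):
    """Map a token in [0, 255] back to an approximate sample value."""
    y = 2 * token / MU - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)
```

Tokens make audio look like text to a sequence model, at the cost of quantization error that grows with sample amplitude.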

Then we turn to model architectures themselves: CNNs, RNNs, Autoencoders, U-Nets, Transformers and hybrids, and a quick look at Diffusion techniques. Each comes with its own structure, computational properties and tradeoffs. Examples provide concrete illustrations of the concepts.

The focus is conceptual: a clean overview that clarifies the terrain rather than diving into implementation. It is a starting point for navigating the audio ML space with purpose.

Designing an Audio Live Coding Environment

Live coding is the practice of using code as a medium for visual or (in this case) musical expression and performance in real time. Code represents a uniquely flexible way of expressing musical structures and ideas, unconstrained by the fixed architectures of GUI-based tools such as DAWs.

Over the past year, I’ve been developing a new live coding environment called ohm. In doing so I’ve encountered a number of interesting design questions that I’ll discuss in this talk, such as:
- Which live coding languages and audio programming DSLs exist, and what their respective strengths and limitations are, including the trade-offs between visual and textual programming
- How to design a domain-specific language (DSL) that describes audio graphs and musical structures in an expressive yet readable way
- How different programming paradigms (imperative, functional, declarative, and dataflow) and syntaxes map onto musical systems and structures
- The technical challenges of building an audio engine that accepts arbitrary edits to its graph while maintaining real-time, interruption-free playback
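The last two questions can be illustrated with a toy sketch. It is not ohm's design: every class and method name below is hypothetical. It assumes one common approach, in which an edit builds a complete replacement graph away from the audio callback, and the callback crossfades from the old graph to the new one over a single block so that playback never stops or clicks.

```python
import math


class Node:
    """Base class for graph nodes; `node * 0.5` builds a Gain node."""
    def __mul__(self, amount):
        return Gain(self, amount)


class Sine(Node):
    """Sine oscillator that keeps its own phase between blocks."""
    def __init__(self, freq, sample_rate=48000):
        self.step = 2 * math.pi * freq / sample_rate
        self.phase = 0.0

    def render(self, n):
        out = []
        for _ in range(n):
            out.append(math.sin(self.phase))
            self.phase += self.step
        return out


class Gain(Node):
    """Scales the output of another node."""
    def __init__(self, source, amount):
        self.source, self.amount = source, amount

    def render(self, n):
        return [s * self.amount for s in self.source.render(n)]


class Engine:
    def __init__(self, graph):
        self.active = graph
        self.pending = None  # set by the editor, consumed by the audio callback

    def submit(self, graph):
        """Editor side: publish a fully built replacement graph."""
        self.pending = graph

    def process(self, n):
        """Audio side: render one block, crossfading if an edit arrived.

        A real engine would use a lock-free exchange here and avoid
        allocating memory in the callback; this sketch does neither.
        """
        new, self.pending = self.pending, None
        if new is None:
            return self.active.render(n)
        old_block = self.active.render(n)
        new_block = new.render(n)
        self.active = new
        # Linear crossfade: starts fully on the old graph, ends almost
        # fully on the new one; the next block is the new graph alone.
        return [o * (1 - i / n) + w * (i / n)
                for i, (o, w) in enumerate(zip(old_block, new_block))]
```

With this in place, `engine.submit(Sine(880) * 0.25)` can be called at any time, and the swap takes effect at the next block boundary.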

Free-Range Users Make for More Profitable DAWs

Interchange formats for DAWs have a long history but an uneven track record. The CMX EDL format emerged in the early 1970s, AES31 has been around since 2001, and AAF and OMF provide ways to transfer project data between DAWs and NLEs. More recently, Bitwig’s DAWproject format shows that this area is still evolving. Yet despite the available standards, implementing interchange often remains a lower priority for many DAW developers compared to other features.

In this talk, we’ll explore why robust interchange support should be seen as a strategic feature rather than an afterthought. We’ll examine the technical and business benefits of enabling users to move sessions between tools, including increased user retention, support for niche workflows, and added value even in the early stages of a DAW’s development.

Finally, drawing on real-world examples (including curious bugs encountered while working with AES31 exports) we’ll cover why a poorly implemented interchange format can do more harm than good, and share practical tips for avoiding common pitfalls. Attendees will gain a clearer understanding of both the technical challenges and the potential rewards of opening their DAWs to “free-range” users.

The Immersive Score

Much of the attraction of Dolby Atmos lies in the scalability of the technology and the compatibility of ADM masters with a variety of playback scenarios. This is especially true for sound-to-picture, where a single delivery format can address a range of experiences, from cinema to games consoles to mobile phones. In particular, mixers for streaming services have benefited from expanded creative freedom and simplified deliverables. However, there are key creative and technical factors to consider in each scenario regarding the behaviour of Beds and Objects and their associated metadata. This talk draws on the practice of film and game score mixing and production to illustrate the benefits of using both Beds and Objects for different purposes within the same mix, as well as creative expression through binaural metadata. It includes links to prepared media examples.

Sound Over Boilerplate

Complex programming languages, build tools, and code signing processes create barriers that prevent musicians and sound artists from developing audio plugins.

Phausto addresses this challenge as a free, open-source DSL built on Pharo Smalltalk's accessible syntax and integrated IDE. It integrates the Faust compiler for direct access to professional-grade oscillators, filters, and effects, while seamlessly exporting to Cmajor patches for rapid plugin deployment.

This session introduces Phausto fundamentals and demonstrates workflows for shipping Cmajor-based plugins using free tools. We'll explore distribution models that enable sound artists to independently share and monetize their creations.

Audio Codec Switching in the Linux Kernel for Automotive Edge Devices

To maximize battery life and guarantee safety-critical audio delivery, power-constrained automotive edge devices need to switch seamlessly between audio codecs (for example, from Opus for high-quality infotainment to G.729 for low-power emergency calls). Because the Linux kernel's ALSA sound core does not support runtime codec switching, a switch disrupts real-time audio with >100 ms dropouts and 10-15% CPU spikes. This talk presents a kernel-level framework that achieves <2 mW power overhead and <15 ms latency, figures that some existing automotive solutions cannot reach. We address the challenges of low-power automotive audio by introducing a new ALSA control interface, a power-aware codec scheduler, and triple-buffering in SoC drivers, all of which are further improved by device tree profiles.
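The triple-buffering idea can be sketched outside the kernel. The following Python model is an illustration only, not the talk's driver code, which would be kernel C; the class and method names are hypothetical. It shows the core property: with three buffers, the producer always has a free buffer to write into and the consumer always has a complete buffer to read, so neither has to wait for the other while a codec switch is in progress.

```python
class TripleBuffer:
    """Single-producer, single-consumer triple buffer (conceptual model).

    A real driver would swap the indices with atomic operations; plain
    assignments are used here to keep the logic readable.
    """

    def __init__(self, size):
        self.buffers = [[0.0] * size for _ in range(3)]
        self.write_idx = 0   # owned by the producer
        self.ready_idx = 1   # most recently completed buffer
        self.read_idx = 2    # owned by the consumer
        self.fresh = False   # True when ready_idx holds unread data

    def publish(self, samples):
        """Producer: fill the write buffer, then swap it with 'ready'."""
        self.buffers[self.write_idx][:] = samples
        self.write_idx, self.ready_idx = self.ready_idx, self.write_idx
        self.fresh = True

    def acquire(self):
        """Consumer: take the newest complete buffer, or re-read the last one."""
        if self.fresh:
            self.read_idx, self.ready_idx = self.ready_idx, self.read_idx
            self.fresh = False
        return self.buffers[self.read_idx]
```

If the producer publishes twice before the consumer reads, the consumer simply sees the newer block; if the producer stalls, the consumer repeats the last complete block instead of underrunning.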
