

Scalable, Efficient Processing and Analysis of Large Audio Datasets

18:20 - 18:40 UTC | Friday 1st November 2024 | ADCx Gather
Online Only

The exponential growth of audio data demands robust, scalable solutions for processing and analysis. This presentation introduces a novel approach to handling a colossal audio dataset (e.g., 40 TB or more) using a range of methods and the Ray framework for distributed computing.
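To put "40 TB or more" in perspective, here is a back-of-envelope calculation. It assumes uncompressed 16 kHz, 16-bit mono PCM, which is common for speech corpora, and ignores file headers. That format is only an assumption: compressed formats hold far more audio per byte. The 1 GB shard size is also a hypothetical choice, picked to illustrate how many independent work units such a dataset splits into.

```python
import math

TB = 10**12  # decimal terabyte
GB = 10**9   # decimal gigabyte

def audio_hours(total_bytes: int, sample_rate: int = 16_000,
                bit_depth: int = 16, channels: int = 1) -> float:
    """Hours of uncompressed PCM audio in total_bytes (headers ignored)."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return total_bytes / bytes_per_second / 3600

def shard_count(total_bytes: int, shard_bytes: int) -> int:
    """Number of fixed-size shards needed (the last one may be partial)."""
    return math.ceil(total_bytes / shard_bytes)

if __name__ == "__main__":
    print(f"{audio_hours(40 * TB):,.0f} hours")  # 347,222 hours
    print(shard_count(40 * TB, 1 * GB))           # 40000
```

Under these assumptions, 40 TB is roughly 347,000 hours (about 40 years) of audio. Split into 40,000 shards, it becomes a workload that is natural to fan out across a cluster rather than loop over in a single Python process.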
With terabytes or petabytes of data, it is difficult to process everything in Python and finish before Christmas. Distributed computation is now easy to perform thanks to the Ray framework (ray.io). I will show you how to use distributed methods in practice, based on my experience analyzing and training audio, speech, and language ML models. Across a wide range of applications, we always face the elementary problem of data preparation, and a dynamically created Ray cluster running computation and optimization pipelines speeds it up many times over. The talk will cover the basics of the environment, how to navigate it, and how to prepare production-ready applications. It also offers practical tips on managing data to build a scalable, robust, and reliable software system. We will delve into specific use cases, including feature extraction such as Mel-frequency cepstral coefficients (MFCCs) and spectrogram analysis, showcasing how Ray's flexibility and scalability can transform conventional audio processing workflows.
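The core pattern behind distributed feature extraction is a per-clip function that a scheduler fans out over many workers. Below is a minimal, dependency-free sketch of that pattern for magnitude spectrograms. It is a sketch under stated assumptions, not the speaker's implementation. The standard library's `concurrent.futures` stands in for Ray so the code runs without a cluster. With Ray, `process_clip` would instead be turned into a task with the `@ray.remote` decorator, launched with `process_clip.remote(...)`, and its results collected with `ray.get`. The `(clip_id, samples)` record format and all function names are hypothetical. The tiny default frame sizes only keep the example readable; at 16 kHz a typical choice is 400-sample (25 ms) frames with a 160-sample (10 ms) hop. MFCCs would add a mel filterbank, a log, and a DCT on top of these spectra.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Clip = Tuple[str, Sequence[float]]  # hypothetical record: (clip_id, mono samples)

def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def magnitude_spectrogram(samples: Sequence[float], frame_len: int = 8,
                          hop: int = 4) -> List[List[float]]:
    """Split samples into overlapping frames, apply a Hann window, and return
    the DFT magnitude of bins 0..frame_len//2 for each frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        # Naive O(N^2) DFT keeps the sketch dependency-free; use an FFT in practice.
        bins = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames

def process_clip(clip: Clip) -> Tuple[str, List[List[float]]]:
    """Per-clip unit of work: the piece a distributed scheduler fans out."""
    clip_id, samples = clip
    return clip_id, magnitude_spectrogram(samples)

def process_dataset(clips: List[Clip], max_workers: int = 4) -> Dict[str, List[List[float]]]:
    """Map process_clip over all clips concurrently and gather results by id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_clip, clips))

if __name__ == "__main__":
    # A cosine at exactly DFT bin 2 of an 8-sample frame.
    tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
    spec = process_dataset([("clip-0", tone)])["clip-0"]
    print(len(spec), [round(m, 3) for m in spec[0]])  # 3 [0.0, 1.0, 2.0, 1.0, 0.0]
```

Threads keep the sketch portable. In CPython, however, CPU-bound work like this only runs truly in parallel across processes or separate machines, which is exactly what Ray workers provide.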
The presentation will conclude with a discussion on aggregating results and deriving meaningful insights from large-scale audio data, providing attendees with actionable strategies to manage and analyze vast audio datasets effectively.
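When aggregating results, one common approach is to have each worker compute partial statistics for its shard and merge them on the driver. For example, this works for per-coefficient MFCC means and variances used for feature normalization, and no worker has to see all the data. The sketch below shows this for a single scalar feature. It uses Welford's update within a shard and Chan et al.'s pairwise formula to combine shards. This is an illustrative technique chosen here, not necessarily the one presented in the talk, and the class and function names are hypothetical. In a Ray pipeline, each `shard_stats` result could be returned from a remote task and the list passed to `aggregate`.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable

@dataclass(frozen=True)
class RunningStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> "RunningStats":
        """Welford's single-value update."""
        count = self.count + 1
        delta = x - self.mean
        mean = self.mean + delta / count
        return RunningStats(count, mean, self.m2 + delta * (x - mean))

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Chan et al. pairwise combination of two partial results."""
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        count = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / count
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / count
        return RunningStats(count, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance (0.0 for an empty accumulator)."""
        return self.m2 / self.count if self.count else 0.0

def shard_stats(values: Iterable[float]) -> RunningStats:
    """Statistics for one shard, as a single worker would compute them."""
    return reduce(RunningStats.add, values, RunningStats())

def aggregate(partials: Iterable[RunningStats]) -> RunningStats:
    """Merge per-shard partial results into dataset-wide statistics."""
    return reduce(RunningStats.merge, partials, RunningStats())

if __name__ == "__main__":
    shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    total = aggregate(shard_stats(s) for s in shards)
    print(total.count, total.mean, round(total.variance, 4))  # 9 5.0 6.6667
```

Because merging is associative, partial results can be combined in any tree shape. This lets aggregation itself scale with the number of workers.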

Join Paweł as he shares invaluable insights and practical tips for mastering distributed, parallel processing of massive audio datasets.

Pawel Cyrta

AI/ML Research Scientist Engineer

Metamedia Technologies

Paweł Cyrta is an Applied Research Scientist and ML Engineer with over 20 years of expertise in audio technology and machine learning.
His innovative work spans the realms of speech recognition, speech synthesis, natural language processing, and generative audio AI.
Currently, Paweł consults on emerging audio technology projects, delivering bespoke on-premise state-of-the-art ML solutions for complex speech and audio tasks, bridging the gap between cutting-edge technology and practical business solutions.

His diverse career spans multiple industries, including work with prominent organizations such as NowThisMedia, Rev.ai and Roche, where he implemented cutting-edge audio ML solutions.
At Samsung, he played a key role in developing speech recognition and synthesis for S-Voice in 24 European languages, a technology now available in Samsung TVs.

Paweł's academic background combines Computer Science and Electroacoustics from the Warsaw University of Technology with Computational Engineering from the HPC center at the University of Warsaw.
He completed a research internship at IRCAM in Paris focused on integrating natural emotions into speech and singing synthesis, bridging the gap between technology and expressive audio content.

He also shares his expertise as a lecturer in the Deep Learning postgraduate program at the Warsaw University of Technology, and previously taught "Interactive Systems" and "Interactive Sound II" at the Fryderyk Chopin University of Music.

As a composer and researcher in music technology, Paweł brings a unique perspective to audio ML, specializing in generative music, interactive systems, and algorithmic composition.
His multifaceted approach combines technical prowess with creative insight, driving innovation in sound analysis and processing, and he has served as a technical curator and artist at many digital art festivals in Poland.

In his talk, Paweł will share insights from his latest research on large scale, distributed processing of audio data, demonstrating how advanced ML techniques are reshaping the landscape of audio technology and opening new possibilities for intelligent sound processing.

#AppliedResearch on #speech_recognition, #speech_synthesis, #speech_enhancement, #sound_generation, #audio_analysis, #music_information_retrieval, #music_generation, #NLP, #LLM