Pawel Cyrta

Sessions

  • Scalable, Efficient Processing and Analysis of Large Audio Datasets

    18:20 - 18:40 UTC | Friday 1st November 2024 | ADCx Gather
    Online Only

    The exponential growth of audio data calls for robust, scalable solutions for processing and analysis. This presentation introduces a novel approach to handling a very large audio dataset (30 TB or more) using a range of methods and the Ray framework for distributed computing. When you have terabytes or petabytes of data, it is difficult to process it all in Python and finish before Christmas. The Ray framework (ray.io) now makes distributed computation easy to perform. I will show how to use distributed methods in practice, based on my experience in analyzing data and training ML models for audio, speech and language. […]