An Efficient, Open-Source C++ Loop Classifier and Tempo Estimator
The Algorithm Behind Audacity’s Brand New Tempo Detection Feature
An efficient, offline C++ algorithm for loop classification and tempo estimation is presented, alongside its evaluation framework. The framework computes the area under the ROC curve (AUC) of the classifier, facilitating regression-free development and tuning of the algorithm. The AUC is now 0.93 when evaluated against the set of files (publicly available on freesound.org under a Creative Commons license) listed in the framework's source code. By also measuring computation time, the framework has been useful for optimizing the algorithm, which now typically runs over 2500 times faster than real time (measured on a Windows laptop with a 12th Gen Intel Core i7-12800HX processor and 32 GB of RAM). Furthermore, the framework can be used to set the target false positive rate according to the requirements of the application at hand. Both the algorithm and the evaluation framework are open source, and care has been taken to keep the algorithm easily reusable.
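As a rough illustration of that last point, the sketch below shows one way a classifier threshold could be chosen from measured ROC points so that the false positive rate stays at or below a target. It is a hypothetical example only; the RocPoint type and pickThresholdForTargetFpr function are illustrative names and are not taken from the actual framework.

#include <vector>

// Hypothetical ROC point as an evaluation framework might report it: a
// classifier threshold together with the false and true positive rates
// measured with that threshold on the labeled test set.
struct RocPoint {
    double threshold;
    double falsePositiveRate;
    double truePositiveRate;
};

// Choose the most permissive threshold whose measured false positive rate
// stays at or below the target. Assumes `rocPoints` is non-empty and sorted
// by increasing false positive rate.
double pickThresholdForTargetFpr(const std::vector<RocPoint>& rocPoints,
                                 double targetFpr)
{
    double chosen = rocPoints.front().threshold;
    for (const auto& point : rocPoints) {
        if (point.falsePositiveRate > targetFpr)
            break;
        chosen = point.threshold;
    }
    return chosen;
}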
The algorithm can be seen as a "classical" algorithm and reuses ideas described elsewhere in the literature. The idea behind the classifier, however, is original. First, a set of loosely plausible tatum (or tick) counts that fit in the duration of the provided audio file is enumerated. The likelihood of each tatum-count hypothesis is then evaluated by measuring the distance of each onset to its closest tatum and forming a weighted average of these distances, with the onsets' strengths as weights. This average is compared to a threshold and, if it falls below, a disambiguation step is carried out in which the winning tatum count is used to determine the most likely tempo (BPM) and time signature.
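To make the classification step concrete, here is a minimal sketch of the idea, assuming an onset-detection front end (not shown) that provides onset times and strengths. The names Onset, scoreTatumHypothesis and classifyLoop are hypothetical, and the actual Audacity implementation may differ in its details.

#include <cmath>
#include <limits>
#include <vector>

// Hypothetical onset representation: time in seconds and a strength value
// produced by some onset-detection front end (not shown here).
struct Onset {
    double time;     // seconds from the start of the file
    double strength; // relative onset strength, used as a weight
};

// Score one tatum-count hypothesis: lay `numTatums` evenly over the file
// duration and compute the onset-strength-weighted average distance of each
// onset to its nearest tatum, normalized by the tatum period. Lower is better.
double scoreTatumHypothesis(const std::vector<Onset>& onsets,
                            double fileDuration, int numTatums)
{
    const double tatumPeriod = fileDuration / numTatums;
    double weightedDistanceSum = 0.0;
    double weightSum = 0.0;
    for (const auto& onset : onsets) {
        const double phase = onset.time / tatumPeriod;
        const double distance = std::abs(phase - std::round(phase));
        weightedDistanceSum += onset.strength * distance;
        weightSum += onset.strength;
    }
    return weightSum > 0.0 ? weightedDistanceSum / weightSum : 1.0;
}

struct Classification {
    bool isLoop = false;
    int bestTatumCount = 0;
};

// Pick the best hypothesis among a set of plausible tatum counts and classify
// the file as a loop if its score falls below `threshold`.
Classification classifyLoop(const std::vector<Onset>& onsets,
                            double fileDuration,
                            const std::vector<int>& tatumCounts,
                            double threshold)
{
    Classification result;
    double bestScore = std::numeric_limits<double>::max();
    for (int count : tatumCounts) {
        const double score = scoreTatumHypothesis(onsets, fileDuration, count);
        if (score < bestScore) {
            bestScore = score;
            result.bestTatumCount = count;
        }
    }
    result.isLoop = bestScore < threshold;
    // A further disambiguation step (not shown) would map the winning tatum
    // count to the most likely tempo (BPM) and time signature.
    return result;
}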
As implied above, the input audio must be a loop for its tempo to be detected. This limitation was not deemed critical for the application the algorithm was intended for. On the other hand, it opened up ways to improve the discriminative power of the classifier, allowing a higher success rate while keeping the false positive rate low. This choice may explain the originality of the approach despite its simplicity.
Matthieu Hodgkinson
Senior Software Developer
Muse Group
Passionate about music, I pursued a Bachelor's degree in Musicology at the Université Jean Monnet in Saint-Etienne, France. I completed the final year of this degree at the National University of Ireland, Maynooth, as an Erasmus student, where I delved into computer music. I continued my studies there, earning an MA in Music Technology, and later completed a PhD in Computer Science, focusing on Digital Signal Processing (DSP).
My PhD thesis developed mathematical models for the vibration of plucked and struck strings. These models could be used to separate transients from more stable frequency components (predominantly harmonics) in recorded samples. The extracted transients could serve as realistic inputs to synthesis systems, such as digital waveguides. Additionally, the frequency and decay parameters derived from fitting the model to the sample could be used to calibrate these systems, enabling them to replicate the sound of classical or acoustic guitars, plucked violins, and similar instruments. I gave talks about the findings of my doctoral research at several international conferences, primarily DAFx.
Following my PhD, I spent over 10 years working in the video conferencing industry, contributing to the development of GoToMeeting at Citrix, LogMeIn, and later GoTo. Working on both the endpoint and the server, the backend and the frontend, in C++ and JavaScript, I focused primarily on the audio processing chain and the network transport of audio packets. I developed a patented algorithm for jitter buffer control that uses a perceptual model, taking late packet loss and conversational interactivity into account to determine the optimal buffer size on the receiver's endpoint.
Since April 2023, I have been working for the Muse Group as part of the Audacity team. As the DSP specialist, I have the privilege of tackling some of the most engaging tasks – at least from my point of view… These often offer opportunities for creativity, and it was while working on the problem of tempo detection that I discovered the approach I will present at ADC 2024.