Lightning Hydra Template Extended
Image Classification Architectures and Datasets Useful for Spectral Audio Processing
The Lightning Hydra Template Extended (LHTE) aggregates several state-of-the-art image-classification methods aimed at high-speed spectral audio processing.
The LHTE is a fork of the Lightning Hydra Template from GitHub: https://github.com/ashleve/lightning-hydra-template
PyTorch Lightning streamlines deep-learning model development by organizing the training loop, logging, checkpointing, and multi-device execution behind a small, consistent API.
Hydra is a configuration-management framework that composes hierarchical YAML configurations and supports command-line overrides, and it is often paired with Lightning in projects such as this one.
The Lightning Hydra Template provides a super clean starting point, and out of the box it can train a simple MLP on MNIST (hand-written digits).
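The pattern underlying the template can be sketched in a few lines: Hydra composes a configuration, and hydra.utils.instantiate() constructs whatever each _target_ entry names. The sketch below is illustrative only, with a toy Lightning MLP standing in for the template's MNIST module; the class paths and hyperparameters are assumptions, not the template's actual code.

    # Minimal sketch of the Lightning + Hydra pattern (illustrative; not the
    # template's actual classes or configs).
    import torch
    from torch import nn
    import lightning.pytorch as pl          # older installs: import pytorch_lightning as pl
    from omegaconf import OmegaConf
    from hydra.utils import instantiate

    class LitMLP(pl.LightningModule):
        """A small MLP classifier standing in for the template's MNIST module."""
        def __init__(self, in_features: int = 784, hidden: int = 64, n_classes: int = 10):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_features, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_classes),
            )
            self.loss_fn = nn.CrossEntropyLoss()

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = self.loss_fn(self.net(x), y)
            self.log("train/loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # Hydra-style config: each _target_ names the class to instantiate.
    cfg = OmegaConf.create({
        "model":   {"_target_": "__main__.LitMLP", "hidden": 64},
        "trainer": {"_target_": "lightning.pytorch.Trainer",
                    "max_epochs": 1, "accelerator": "cpu"},
    })

    model = instantiate(cfg.model)
    trainer = instantiate(cfg.trainer)
    # trainer.fit(model, datamodule=...)  # an MNIST DataModule would supply the batches

In the template itself, these objects are described in YAML files under configs/ rather than inline, and the datamodule, callbacks, and loggers are instantiated in the same way.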
Beyond MLPs, the LHTE adds support for the convolutional architectures ConvNeXt and EfficientNet, as well as the Vision Transformer (ViT).
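Under Hydra, switching among these backbones reduces to changing a single _target_ entry in a model config group. The following sketch uses torchvision's stock implementations as stand-ins; the template's actual config groups and class paths may differ.

    # Illustrative backbone swap via Hydra configs (torchvision classes used as
    # stand-ins; not necessarily the template's actual model wrappers).
    from hydra.utils import instantiate
    from omegaconf import OmegaConf

    backbones = {
        "convnext":     {"_target_": "torchvision.models.convnext_tiny",   "num_classes": 10},
        "efficientnet": {"_target_": "torchvision.models.efficientnet_b0", "num_classes": 10},
        "vit":          {"_target_": "torchvision.models.vit_b_16",        "num_classes": 10},
    }
    net = instantiate(OmegaConf.create(backbones["convnext"]))  # swap the key to change models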
Beyond MNIST, the LHTE adds support for CIFAR-10 and CIFAR-100 and introduces a new dataset format, the Variable Image Multi-Head (VIMH) format. VIMH can be regarded as a generalized CIFAR-100 format supporting images up to 65,535 x 65,535 pixels with up to 65,535 channels (i.e., 16-bit height, width, and channel specifications). The multi-head support allows a separate classification head to be assigned to each image/spectrogram category. Several data loaders are provided for these four dataset formats and the associated model architectures.
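To make the format concrete, here is a purely hypothetical reader for one VIMH-style record. It assumes a CIFAR-like layout in which 16-bit height, width, and channel fields precede one 16-bit label per classification head and then the raw image bytes; the actual on-disk layout is defined by the LHTE's own loaders and may differ.

    # Hypothetical VIMH-style record reader (the layout is a guess for illustration).
    import numpy as np

    def read_vimh_record(buf: bytes, n_heads: int):
        """Assumed layout: uint16 height, width, channels; n_heads uint16 labels;
        then height*width*channels uint8 image bytes."""
        header = np.frombuffer(buf, dtype="<u2", count=3 + n_heads)
        h, w, c = (int(v) for v in header[:3])           # 16-bit dims => up to 65,535 each
        labels = header[3:3 + n_heads].astype(np.int64)  # one label per classification head
        image = np.frombuffer(buf, dtype=np.uint8,
                              offset=header.nbytes, count=h * w * c)
        return image.reshape(h, w, c), labels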
The directory configs/experiment/ contains various benchmark replications on the MNIST, CIFAR, and VIMH datasets, and a top-level Makefile contains on the order of one hundred make targets spanning testing, training, and running experiments.
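For orientation, the upstream template selects an experiment by overriding the experiment config group. Assuming the fork keeps the upstream configs/train.yaml entry point and ships an experiment config named example (both names are assumptions here), the same composition can be reproduced programmatically with Hydra's compose API:

    # Compose an experiment configuration the way the training entry point would
    # (config names are assumptions; adjust to the configs actually present).
    from hydra import compose, initialize

    with initialize(version_base="1.3", config_path="configs"):
        cfg = compose(config_name="train", overrides=["experiment=example"])
    print(cfg.keys())  # e.g., model, data, trainer, ... depending on the config tree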

Julius Smith
Professor Emeritus
Stanford University
Julius O. Smith is a research engineer, educator, and musician devoted primarily to developing new technologies for music and audio signal processing. He received the B.S.E.E. degree from Rice University in 1975 (Control, Circuits, and Communication), and the M.S. and Ph.D. degrees in E.E. from Stanford University, in 1978 and 1983, respectively. For his M.S., he focused largely on statistical signal processing. His Ph.D. research was devoted to improved methods for digital filter design and system identification applied to music and audio systems, particularly the violin. From 1975 to 1977 he worked in the Signal Processing Department at ESL, Sunnyvale, CA, on systems for digital communications. From 1982 to 1986 he was with the Adaptive Systems Department at Systems Control Technology, Palo Alto, CA, where he worked in the areas of adaptive filtering and spectral estimation. From 1986 to 1991 he was employed at NeXT Computer, Inc., responsible for sound, music, and signal processing software for the NeXT computer workstation. After NeXT, he became a Professor at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford, with a courtesy appointment in EE, teaching courses and pursuing/supervising research related to signal processing techniques applied to music and audio systems. At varying part-time levels, he was a founding consultant for Staccato Systems, Shazam Inc., and moForte Inc. He is presently a Professor Emeritus of Music and, by courtesy, Electrical Engineering at Stanford, and a perennial consultant for moForte Inc. and a few others. For more information, see https://ccrma.stanford.edu/~jos/.