This page documents how to install the audio backend used by SpeechBrain and provides troubleshooting steps for audio loading problems.
SpeechBrain now uses soundfile as the sole supported audio I/O backend through the :mod:`speechbrain.dataio.audio_io` module.
The soundfile backend supports most common audio formats, including
WAV, FLAC, and MP3. For other formats or for loading issues,
refer to the sections below.
Note
Legacy torchaudio backends: SpeechBrain previously used torchaudio for
audio I/O, which supported three backends: ffmpeg, sox and soundfile.
However, torchaudio 2.9 deprecated all audio I/O support, so SpeechBrain
now relies on soundfile directly.
The pip package soundfile is a dependency of SpeechBrain and should be automatically
installed when you install SpeechBrain.
Starting with SoundFile 0.12.0, the pip package bundles a prebuilt libsndfile
for most platforms (Windows, macOS, Linux), so it typically works out of the box
when installed via pip.
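To see which soundfile release and which libsndfile build are actually in use, a small report along the following lines may help. This is a minimal sketch, not part of SpeechBrain: the helper name `describe_soundfile` is hypothetical, and the `__libsndfile_version__` attribute is read defensively in case the installed soundfile release does not expose it.

```python
import importlib.util


def describe_soundfile():
    """Return a small report on the installed soundfile package, if any."""
    # Check for the package without importing it, so a missing install is
    # reported cleanly instead of raising.
    if importlib.util.find_spec("soundfile") is None:
        return {"installed": False, "error": "soundfile is not installed"}
    try:
        import soundfile as sf
    except (ImportError, OSError) as exc:
        # OSError is typically raised when the libsndfile shared library
        # cannot be located or loaded.
        return {"installed": True, "error": f"soundfile failed to import: {exc}"}
    return {
        "installed": True,
        "error": None,
        "soundfile_version": sf.__version__,
        # May be None if this soundfile release lacks the attribute.
        "libsndfile_version": getattr(sf, "__libsndfile_version__", None),
    }


print(describe_soundfile())
```

If `error` is not `None`, the message usually indicates whether the Python package or the underlying libsndfile library is the missing piece.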
If you encounter issues with audio loading:
- Update soundfile: Try running pip install --upgrade soundfile to get the latest version with updated libsndfile binaries.
- On Linux with superuser rights: Install libsndfile through your distribution's package manager (e.g., sudo apt install libsndfile1 on Ubuntu/Debian).
- For advanced codec support: If you need to work with formats not supported by soundfile (e.g., AAC/M4A), you may need to convert your audio files to a supported format like WAV or FLAC using external tools such as ffmpeg.
- Check installation: You can verify soundfile is working by running:

  import soundfile as sf
  print(sf.__version__)
  print(sf.available_formats())
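For the codec case mentioned above, the conversion to WAV or FLAC can be scripted around the ffmpeg command-line tool. The following is a minimal sketch under stated assumptions: the helper names `build_ffmpeg_command` and `convert_audio` are hypothetical, the file names are placeholders, and the 16 kHz mono defaults are only an illustrative choice, not a SpeechBrain requirement.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command converting src to dst (format taken from dst's extension)."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, e.g. AAC/M4A
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed default)
        "-ac", str(channels),     # number of output channels (assumed default)
        str(dst),
    ]


def convert_audio(src, dst, **kwargs):
    """Run ffmpeg to convert src into dst; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(src, dst, **kwargs), check=True)
    return Path(dst)


# Placeholder file names; prints the command without running ffmpeg.
print(build_ffmpeg_command("speech.m4a", "speech.wav"))
```

Calling `convert_audio("speech.m4a", "speech.wav")` would then produce a file that soundfile can read, provided ffmpeg is installed and supports the input codec.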
SpeechBrain provides its own audio I/O interface through the :mod:`speechbrain.dataio.audio_io` module. Usage example:
from speechbrain.dataio import audio_io
# Load audio file
audio, sample_rate = audio_io.load("path/to/audio.wav")
# Get audio metadata
info = audio_io.info("path/to/audio.wav")
print(info.sample_rate, info.duration, info.channels)
# Save audio file
audio_io.save("output.wav", audio, sample_rate)

This API is compatible with the previous torchaudio-based interface, making migration straightforward.