Skip to content

Latest commit

 

History

History
75 lines (53 loc) · 2.62 KB

File metadata and controls

75 lines (53 loc) · 2.62 KB

Audio loading troubleshooting

This page is intended to document how to install audio backends and provides troubleshooting steps for your audio loading troubles.

Introduction

SpeechBrain now uses soundfile as the sole supported audio I/O backend through the :mod:`speechbrain.dataio.audio_io` module.

The soundfile backend supports most common audio formats including: wav, flac, and mp3. For advanced format support or issues, please refer to the sections below.

Note

Legacy torchaudio backends: SpeechBrain previously used torchaudio for audio I/O, which supported three backends: ffmpeg, sox and soundfile. However, torchaudio 2.9 deprecated all audio I/O support so SpeechBrain now relies on soundfile directly for audio I/O.

Recommended install steps

The pip package soundfile is a dependency of SpeechBrain and should be automatically installed when you install SpeechBrain. Starting with SoundFile 0.12.0, the pip package bundles a prebuilt libsndfile for most platforms (Windows, macOS, Linux), so it typically works out of the box when installed via pip.

If you encounter issues with audio loading:

  • Update soundfile: Try running pip install --upgrade soundfile to get the latest version with updated libsndfile binaries.

  • On Linux with superuser rights: Install libsndfile through your distribution's package manager (e.g., sudo apt install libsndfile1 on Ubuntu/Debian).

  • For advanced codec support: If you need to work with formats not supported by soundfile (e.g., AAC/M4A), you may need to convert your audio files to a supported format like WAV or FLAC using external tools such as ffmpeg.

  • Check installation: You can verify soundfile is working by running:

    import soundfile as sf
    print(sf.__version__)
    print(sf.available_formats())

SpeechBrain Audio I/O API

SpeechBrain provides its own audio I/O interface through the :mod:`speechbrain.dataio.audio_io` module. Usage example:

from speechbrain.dataio import audio_io

# Load audio file
audio, sample_rate = audio_io.load("path/to/audio.wav")

# Get audio metadata
info = audio_io.info("path/to/audio.wav")
print(info.sample_rate, info.duration, info.channels)

# Save audio file
audio_io.save("output.wav", audio, sample_rate)

This API is compatible with the previous torchaudio-based interface, making migration straightforward.