Audio Models

This page describes the audio models available in the Prediction Guard API. These models work with audio files and are served through the /audio endpoint.

Models

| Model Name | Type | Use Case | More Info |
| --- | --- | --- | --- |
| whisper-base | Transcription | Used for transcribing audio | See whisper-base below |

Model Descriptions

whisper-base

Whisper is a pre-trained model designed for automatic speech recognition (ASR) and speech translation.
Trained on 680k hours of labeled data, Whisper demonstrates a strong ability to generalize across
multiple datasets and domains without requiring fine-tuning.

Type: Transcription
Use Case: Used for transcribing audio (multilingual support)

https://huggingface.co/openai/whisper-base

Whisper is a Transformer-based encoder-decoder model, also known as a sequence-to-sequence model.
It was trained with large-scale weak supervision, and checkpoints are available in both English-only and multilingual variants.
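
To make the transcription use case concrete, here is a minimal sketch of how an upload request to the /audio endpoint might be assembled with the Python standard library. This page does not specify the request format, so the host (`api.example.com`), the bare `/audio` path, the multipart field names (`model`, `file`), and the Bearer-token header are all assumptions for illustration. Check the API reference for the real values. The request is only built, never sent.

```python
import io
import uuid
import wave
import urllib.request

# ASSUMPTIONS (not confirmed by this page): host, exact path, field names, auth scheme.
API_BASE = "https://api.example.com"  # hypothetical host
AUDIO_PATH = "/audio"                 # endpoint named on this page; any sub-path is unknown


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Return a short mono 16-bit WAV of silence, used here as stand-in audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_transcription_request(
    audio: bytes,
    filename: str,
    api_key: str,
    model: str = "whisper-base",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the audio."""
    boundary = uuid.uuid4().hex
    # Text field carrying the model name (hypothetical field name "model").
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field carrying the raw audio bytes (hypothetical field name "file").
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + b"\r\n" + closing
    return urllib.request.Request(
        API_BASE + AUDIO_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    req = build_transcription_request(make_silent_wav(), "sample.wav", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url, len(req.data), "bytes")
```

With real values substituted, the request could be sent using `urllib.request.urlopen(req)`. Building the request separately from sending it keeps the sketch inspectable without network access or credentials.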