Audio Models | Prediction Guard

This page describes the audio models available in the Prediction Guard API. These models work with audio files and are used with the /audio endpoint.

Models

Model Name	Type	Use Case	More Info
base	Transcription	Used for transcribing audio	link
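A minimal sketch of a transcription request follows. It assumes the base URL https://api.predictionguard.com, the path /audio/transcriptions under the /audio endpoint, Bearer-token authentication, and multipart form fields named model and file. This page does not state any of these, so check the Prediction Guard API reference before relying on them. The function only builds the request with Python's standard library and does not send it.

```python
import urllib.request
import uuid

# ASSUMPTIONS (not confirmed by this page): base URL, endpoint path,
# auth scheme, and form field names. Verify against the API reference.
API_BASE = "https://api.predictionguard.com"
ENDPOINT = "/audio/transcriptions"


def build_transcription_request(api_key: str, audio_bytes: bytes,
                                filename: str = "sample.wav",
                                model: str = "base") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for transcription."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field carrying the model name from the table ("base").
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n".encode(),
        # File field carrying the raw audio bytes.
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio_bytes + b"\r\n",
        # Closing boundary marks the end of the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        API_BASE + ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to urllib.request.urlopen and read the response body; the response format is also not documented on this page.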

Model Descriptions

base

The base model is OpenAI's Whisper (base size), a pre-trained model for automatic speech recognition (ASR) and speech translation.
Trained on 680k hours of labeled data, Whisper generalizes well across
many datasets and domains without requiring fine-tuning.

Type: Transcription
Use Case: Used for transcribing audio (multilingual support)

More Info: https://huggingface.co/openai/whisper-base

Whisper is a Transformer-based encoder-decoder model, also known as a sequence-to-sequence model.
It was trained with large-scale weak supervision and is available in both English-only and multilingual variants; the base model listed here is the multilingual variant.
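To illustrate what "encoder-decoder" and "sequence-to-sequence" mean in practice, here is a minimal greedy decoding loop. The encoder and decoder are hypothetical stand-in callables, not Whisper's actual networks or any library API. The encoder runs once over the input features, and the decoder then predicts one output token at a time from the encoder output and the tokens generated so far.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for the real networks (NOT Whisper's API):
# the encoder maps input features to a hidden representation, and the
# decoder returns next-token scores (logits) over the vocabulary.
Encoder = Callable[[Sequence[float]], List[float]]
Decoder = Callable[[List[float], List[int]], List[float]]


def greedy_decode(encode: Encoder, decode: Decoder, features: Sequence[float],
                  start_token: int, end_token: int, max_len: int = 50) -> List[int]:
    """Sequence-to-sequence greedy decoding: encode once, then emit tokens one by one."""
    memory = encode(features)  # the encoder runs a single time over the whole input
    tokens = [start_token]
    for _ in range(max_len):
        # The decoder conditions on the encoder output and the tokens so far.
        logits = decode(memory, tokens)
        # Greedy choice: take the highest-scoring token.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens
```

Whisper's real decoding also conditions on special task and language tokens and can use beam search instead of a greedy choice; this loop only shows the overall encoder-decoder shape.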