Model Management

Deploy and configure models in your Prediction Guard platform.

Model Deployment

Deploy New Model Page

Deploy from Hugging Face

Browse the model library in the admin panel
Search for models by name or type
Click Deploy and configure settings
Monitor deployment progress
Test the model via API

Upload Custom Models

Upload model files to your platform
Configure model settings (name, description, capabilities)
Set resource requirements (CPU, GPU, memory)
Deploy and test the model
Make available to API users

Model Configuration

General Settings

Select from Catalog: Choose from available model catalog
Model Name: Display name for the model
Description: Model description and capabilities
Container Image URL: Custom container image (optional)
Replicas: Number of model instances to run
Enable Model: Toggle model availability
Runtime Class Name: Kubernetes runtime class

Model Parameters

Model CPU (millicores): CPU allocation for the model server
Model Memory (GB): RAM allocation for the model
Accelerator Cards: Number of GPUs to allocate
Card Type: GPU type (NVIDIA, etc.)
Hugepages (GB): Memory optimization settings
Max Input Tokens: Maximum input context length
Max Total Tokens: Maximum total tokens per request
Min Input Tokens: Minimum input tokens required
Max Client Batches: Maximum concurrent client batches
Aliases: Model aliases (one per line)

Model Capabilities

Streaming: Enable streaming responses
Tool Use: Enable tool/function calling capabilities
Image Input: Enable image processing capabilities
Image Formats: Supported image formats (PNG, JPEG, etc.)
Capabilities: Model capabilities (embedding, chat, etc.)

Advanced Configuration

Container Execution: Define container command and arguments
Environment Variables: Set environment variables (YAML)
K8s Resource Limits: Configure Kubernetes resource limits (YAML)
K8s Scheduling: Control pod placement with selectors and affinity
K8s Storage: Configure and mount storage volumes
K8s Health Probes: Configure liveness and readiness probes