
What is LiteLLM, and How to Deploy It in an Enterprise Data Stack?

Last updated on
May 12, 2026

What is LiteLLM?

LiteLLM streamlines interaction with diverse AI language models. It translates inputs across completion, embedding, and image generation endpoints from providers such as Bedrock, Hugging Face, and Azure, returns text output in a consistent format, and applies retry logic across deployments. It also tracks spend per project, which helps when managing model budgets.


LiteLLM Knowledge Base

LiteLLM Overview

LiteLLM is an open-source LLM gateway that gives your applications a single, unified API to call any large language model — whether it runs on OpenAI, Anthropic, Azure, Google Vertex AI, or a self-hosted model via Ollama. Instead of writing separate integrations for each provider, you point everything at LiteLLM and it handles the routing, key management, retries, and cost tracking for you.

In Shakudo environments, LiteLLM is the central model-access layer. Applications like Dify, AgentFlow, and custom AI tools send requests to LiteLLM; LiteLLM routes them to the appropriate provider based on configuration. This means you can swap providers, add fallbacks, or change models without touching application code.

What Problem Does LiteLLM Solve?

  • Every LLM provider has a slightly different API, authentication method, and response format
  • Managing API keys scattered across multiple apps is a security risk
  • There is no built-in way to track which models are being called, at what cost, and with what latency
  • If one provider goes down or changes pricing, every application needs to be updated

LiteLLM addresses all of these by acting as a single control point: one endpoint, one API key for your apps, and full control over routing, budgets, and observability behind it.

How LiteLLM Fits in the Shakudo Stack

LiteLLM sits at the LLM gateway layer between your applications and the model providers:

  • Apps (Dify, AgentFlow, custom services) send OpenAI-compatible requests to LiteLLM
  • LiteLLM routes requests to the configured provider (Vertex AI, OpenAI, Anthropic, Ollama, etc.)
  • LiteLLM optionally logs calls to Langfuse for tracing and cost tracking
  • LiteLLM connects to Redis or Valkey for rate-limiting, caching, and session state
  • LiteLLM exposes a Prometheus metrics endpoint for monitoring dashboards

Key Concepts

  • Model alias: a name you define (e.g. gpt-4o) that maps to a real provider model (e.g. openai/gpt-4o). Apps use the alias; you control what it points to.
  • Virtual keys: API keys you issue to your teams or apps. LiteLLM validates them and tracks usage per key without exposing the real provider keys.
  • Router: the component that decides which provider/model to send a request to. Supports round-robin, least-busy, and priority-based routing.
  • Fallback: if the primary model fails or is rate-limited, LiteLLM automatically retries on a configured fallback model.
  • Budget: per-user or per-key spend limits enforced by LiteLLM before requests reach the provider.
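To make the alias and fallback concepts concrete, here is a toy Python sketch. It is not LiteLLM's router: the dictionaries, the `resolve` and `call_with_fallback` helpers, and the fake provider function are hypothetical names used only for illustration.

```python
# Toy illustration only: hypothetical names, not LiteLLM internals.
MODEL_ALIASES = {
    "gpt-4o": "openai/gpt-4o",                     # alias -> provider model
    "gemini-flash": "vertex_ai/gemini-2.0-flash",
}
FALLBACKS = {"gpt-4o": ["gemini-flash"]}           # alias -> fallback aliases


def resolve(alias: str) -> str:
    """Map an app-facing alias to the provider model it points to."""
    return MODEL_ALIASES[alias]


def call_with_fallback(alias, send):
    """Try the primary alias, then each configured fallback in order."""
    last_error = None
    for candidate in [alias, *FALLBACKS.get(alias, [])]:
        try:
            return send(resolve(candidate))
        except RuntimeError as exc:  # stand-in for a provider failure
            last_error = exc
    raise last_error


def fake_send(provider_model: str) -> str:
    """Pretend the OpenAI provider is down so the fallback path runs."""
    if provider_model.startswith("openai/"):
        raise RuntimeError("provider unavailable")
    return f"answered by {provider_model}"


print(call_with_fallback("gpt-4o", fake_send))
```

The application only ever names `gpt-4o`; which provider answers is decided by the mapping and fallback tables, which is the property the alias concept is meant to give you.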

What LiteLLM Is Not

  • Not a model itself. It is a proxy — it forwards requests to real models.
  • Not a vector database or RAG system. Use Qdrant, Neo4j, or similar tools for that.
  • Not an observability platform. Use Langfuse alongside LiteLLM for full trace-level visibility.

Getting Started & Usage

Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.

How to Call LiteLLM from Any Application

LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes — just update the base URL and API key.
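As a rough sketch of what "update the base URL and API key" means in practice, the following Python builds an OpenAI-style chat completion request with only the standard library. The base URL reuses the cluster-internal DNS name from the troubleshooting section, the key is a placeholder, and `build_chat_request` is a hypothetical helper. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumed values: replace with your gateway URL and a real virtual key.
LITELLM_BASE_URL = "http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1"
VIRTUAL_KEY = "<YOUR_VIRTUAL_KEY>"


def build_chat_request(model_alias: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model_alias,  # alias defined in the LiteLLM config
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{LITELLM_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("gpt-4o", "Summarize this document.")
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req), then parse the JSON body.
```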

Creating Virtual Keys

Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.
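A minimal sketch of the request body for LiteLLM's `/key/generate` endpoint, built in Python. The field names (`models`, `max_budget`, `budget_duration`, `metadata`) follow the LiteLLM key API; the helper name, team name, and budget values are illustrative assumptions. The body would be sent as a POST with the master key in the `Authorization` header.

```python
import json


def key_generate_payload(team, models, max_budget, budget_duration="30d"):
    """Build the JSON body for a POST to /key/generate."""
    return {
        "models": list(models),              # aliases this key may call
        "max_budget": max_budget,            # spend cap for the key
        "budget_duration": budget_duration,  # reset window, e.g. "30d"
        "metadata": {"team": team},          # free-form tags for reporting
    }


payload = key_generate_payload("risk-team", ["gpt-4o", "gemini-flash"], 100)
print(json.dumps(payload, indent=2))
```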

The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.

Viewing Available Models

curl http://localhost:4000/v1/models \
 -H "Authorization: Bearer <YOUR_KEY>"

Checking Usage and Spend

# Per-key usage (the key is passed as a query parameter)
curl "http://localhost:4000/key/info?key=sk-..." \
 -H "Authorization: Bearer <LITELLM_MASTER_KEY>"

# All keys overview
curl http://localhost:4000/key/list \
 -H "Authorization: Bearer <LITELLM_MASTER_KEY>"

Testing a Specific Model

curl -X POST http://localhost:4000/v1/chat/completions \
 -H "Authorization: Bearer <YOUR_KEY>" \
 -H "Content-Type: application/json" \
 -d '{
   "model": "claude-3-5-sonnet",
   "messages": [{"role": "user", "content": "What is 2+2?"}]
 }'

Streaming Responses

curl -X POST http://localhost:4000/v1/chat/completions \
 -H "Authorization: Bearer <YOUR_KEY>" \
 -H "Content-Type: application/json" \
 -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Tell me a story"}],"stream":true}'

Connecting Dify to LiteLLM

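In the Dify settings, configure a new OpenAI-compatible model provider with the LiteLLM base URL (for example http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1), a LiteLLM virtual key as the API key, and a model alias as the model name.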

Once configured, all Dify model calls route through LiteLLM.

Practical Tips

  • Use model aliases (not provider model IDs) in your app code — this lets you swap the underlying model without code changes
  • Issue separate virtual keys per application or team for usage isolation
  • Set budget limits on virtual keys to prevent runaway costs
  • Use the /health endpoint in your application startup health checks

Shakudo SaaS-first quick start

This section is for customers using LiteLLM as a managed component inside Shakudo. Start from the Shakudo platform instead of installing or exposing LiteLLM manually.

1. Access the component in Shakudo

  • Sign in to your Shakudo workspace with your organization-approved account.
  • Open the workspace or environment where this component is enabled.
  • Go to the Applications or component catalog area and select LiteLLM.
  • If you cannot see the component, ask your workspace administrator to confirm that it is enabled for your role and environment.

2. Open the component UI

  • Use the Shakudo-provided Open, Launch, or Access action for LiteLLM.
  • Let Shakudo handle authentication, networking, and workspace routing. Avoid using internal service URLs unless your administrator explicitly provides them.
  • Confirm that the component opens in the expected workspace before creating or changing resources.

3. Complete a first safe use case

Open the LiteLLM dashboard or API endpoint, select an approved model, and send a small test completion through the gateway to confirm routing and credentials are working.

  • Use a small non-production example first, especially when testing credentials, scans, model calls, or data connections.
  • Name the test clearly so other workspace users can recognize it as a first-run validation.

4. Monitor and validate the result

  • Check the LiteLLM UI for the status of your test call, its request logs, and the spend recorded against the key you used.
  • Return to Shakudo if you need platform-level status, access control changes, or administrator support.
  • Record any errors, missing permissions, or unexpected results before retrying with production workloads.

5. Next steps

  • Review the use cases, administration, and troubleshooting pages in this knowledge base for deeper examples.
  • For production usage, follow your team’s Shakudo workspace policies for credentials, data access, resource limits, and approvals.

Previous getting-started content snapshot

The page content below was present before this SaaS-first section was added. It is retained here as an inline snapshot so existing guidance is not lost.

Getting Started & Usage

Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.

How to Call LiteLLM from Any Application

LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes: just update the base URL and API key.

# Python: using the openai SDK
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_VIRTUAL_KEY>",
    base_url="http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1",
)
response = client.chat.completions.create(
    model="gpt-4o",  # use the alias defined in litellmConfig
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)

The model name is the alias from litellmConfig.model_list (e.g. gpt-4o, claude-3-5-sonnet, gemini-flash). LiteLLM routes it to the correct provider.

Creating Virtual Keys

Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.

# Create a virtual key via the LiteLLM API
curl -X POST http://localhost:4000/key/generate \
 -H "Authorization: Bearer <LITELLM_MASTER_KEY>" \
 -H "Content-Type: application/json" \
 -d '{"models": ["gpt-4o","gemini-flash"],"max_budget": 100,"budget_duration": "30d","metadata": {"team": "risk-team", "environment": "prod"}}'

The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.

Viewing Available Models

curl http://localhost:4000/v1/models \
 -H "Authorization: Bearer <YOUR_KEY>"

Deployment Runbook

📌 Commands-first runbook based on the Shakudo Helm deployment pattern used in production environments including Loblaw and Hitachi. Each section is independently usable during a live deployment call.

Scope

Helm-based deployment of LiteLLM on a Shakudo Kubernetes cluster. Uses the Shakudo monorepo chart, Redis/Valkey for caching, and optionally Langfuse for call logging.

Required Inputs

Confirm before starting:

  • Local kubeconfig with namespace admin access to the LiteLLM namespace
  • Provider API keys (e.g. OPENAI_API_KEY, VERTEXAI_PROJECT, ANTHROPIC_API_KEY)
  • LiteLLM master key (LITELLM_MASTER_KEY) — a strong secret used to generate virtual keys
  • Redis/Valkey endpoint and credentials (or confirmation that the bundled Redis subchart is used)
  • Optional: Langfuse host, public key, and secret key if call logging is needed

Step 1 — Clone the Helm Chart

git clone --depth=1 --branch <release-branch> \
 https://<PAT>@github.com/devsentient/monorepo.git

cd monorepo/stack-components/litellm/helm

Step 2 — Pull Chart Dependencies

helm dependency update .
ls charts/   # expect: redis-*.tgz or valkey-*.tgz

Step 3 — Configure values.yaml
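As a rough sketch of the kind of content this step produces, the Python below assembles a `litellmConfig` structure and checks it for one common mistake. The key names `model_list`, `litellm_params`, `router_settings.fallbacks`, and the `os.environ/...` reference syntax are taken from elsewhere in this knowledge base; the exact layout your chart's values.yaml expects is an assumption, and `missing_fallback_aliases` is a hypothetical helper. The result is printed as JSON, which YAML parsers generally accept.

```python
import json

# Assumed layout: confirm the key names against your chart's values.yaml.
values = {
    "litellmConfig": {
        "model_list": [
            {
                "model_name": "gpt-4o",  # alias used by applications
                "litellm_params": {
                    "model": "openai/gpt-4o",
                    "api_key": "os.environ/OPENAI_API_KEY",  # read from the secret
                },
            },
            {
                "model_name": "gemini-flash",
                "litellm_params": {
                    "model": "vertex_ai/gemini-2.0-flash",
                    "vertex_project": "os.environ/VERTEXAI_PROJECT",
                    "vertex_location": "us-central1",
                },
            },
        ],
        "router_settings": {"fallbacks": [{"gpt-4o": ["gemini-flash"]}]},
    }
}


def missing_fallback_aliases(config):
    """Return aliases used in fallbacks that are not defined in model_list."""
    defined = {m["model_name"] for m in config["model_list"]}
    missing = set()
    for rule in config.get("router_settings", {}).get("fallbacks", []):
        for primary, backups in rule.items():
            missing.update(a for a in [primary, *backups] if a not in defined)
    return sorted(missing)


print(json.dumps(values, indent=2))
print("undefined aliases:", missing_fallback_aliases(values["litellmConfig"]))
```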

Step 4 — Add Provider API Keys as Secrets

Never hardcode API keys in values.yaml. Create a Kubernetes secret and reference it:
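One way to picture the secret, sketched in Python: the function below builds a Kubernetes Secret manifest using `stringData`, so values do not need to be base64-encoded by hand. The secret name `litellm-secrets` and the helper name are assumptions, and the values shown are placeholders. How the chart references the secret (for example through `envFrom`) depends on your values.yaml.

```python
import json


def build_secret_manifest(name, namespace, env):
    """Return a Secret manifest whose keys become environment variable names."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": dict(env),  # plain strings; the API server encodes them
    }


manifest = build_secret_manifest(
    "litellm-secrets",            # hypothetical secret name
    "hyperplane-litellm",
    {
        "OPENAI_API_KEY": "<OPENAI_API_KEY>",
        "ANTHROPIC_API_KEY": "<ANTHROPIC_API_KEY>",
        "LITELLM_MASTER_KEY": "<LITELLM_MASTER_KEY>",
    },
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. With real values, avoid writing the output to a file or a shared log.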

Step 5 — Deploy
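A minimal sketch of a deploy command, assembled in Python so each argument is explicit. The release name `litellm` and namespace `hyperplane-litellm` match the rollback commands in this runbook; the chart directory, values file name, and timeout are assumptions to adjust for your environment. The command is printed, not executed.

```python
import shlex


def helm_upgrade_command(release="litellm", namespace="hyperplane-litellm",
                         chart_dir=".", values_file="values.yaml", timeout="10m"):
    """Build a `helm upgrade --install` command as an argument list."""
    return [
        "helm", "upgrade", "--install", release, chart_dir,
        "--namespace", namespace,
        "-f", values_file,
        "--wait",                # block until resources report ready
        "--timeout", timeout,    # raise this if image pulls are slow
    ]


cmd = helm_upgrade_command()
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True) from the chart directory.
```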

Step 6 — Verify Deployment Health
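One way to script the health portion of this step is sketched below. It interprets a `/health` response body in either of two shapes: the simple `{"status": "healthy"}` form mentioned in the checklist, or a per-model summary with `healthy_count` and `unhealthy_count` fields. The second shape and the `is_healthy` helper are assumptions; check the actual response from your deployment before relying on them.

```python
import json


def is_healthy(body: str) -> bool:
    """Interpret a /health response body; unknown shapes count as unhealthy."""
    data = json.loads(body)
    if "status" in data:
        return data["status"] == "healthy"
    if "unhealthy_count" in data:
        # Per-model form: require at least one healthy model and no failures.
        return data["unhealthy_count"] == 0 and data.get("healthy_count", 0) > 0
    return False


print(is_healthy('{"status": "healthy"}'))
print(is_healthy('{"healthy_count": 2, "unhealthy_count": 1}'))
```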

Step 7 — Optional: Enable Langfuse Logging

Add to litellmConfig in values.yaml:

litellmConfig:
 litellm_settings:
   success_callback: ["langfuse"]
   failure_callback: ["langfuse"]
   langfuse_host: https://langfuse.hyperplane-langfuse.svc.cluster.local
   langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
   langfuse_secret_key: os.environ/LANGFUSE_SECRET_KEY

Then add the Langfuse keys to the Kubernetes secret and redeploy.

Step 8 — Enable Prometheus Metrics

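Add to litellmConfig. If Langfuse logging from Step 7 is also enabled, list both callbacks (for example ["langfuse", "prometheus"]) so that one setting does not replace the other: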

litellmConfig:
 litellm_settings:
   success_callback: ["prometheus"]
   failure_callback: ["prometheus"]

Metrics are exposed at /metrics on port 4000. Confirm:

curl http://localhost:4000/metrics | grep litellm

Step 9 — OIDC / Keyless Auth (Optional)

For environments like Loblaw that use OIDC-based keyless auth (no hardcoded API keys):

litellmConfig:
 model_list:
   - model_name: gemini-flash
     litellm_params:
       model: vertex_ai/gemini-2.0-flash
       vertex_project: os.environ/VERTEXAI_PROJECT
       vertex_location: us-central1
       # No api_key needed — uses Workload Identity / OIDC from the cluster

  • The LiteLLM pod must have a Kubernetes service account with Workload Identity binding to the GCP service account
  • Annotate the service account with: iam.gke.io/gcp-service-account: <gcp-sa>@<project>.iam.gserviceaccount.com
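The annotation above can be sketched as a ServiceAccount manifest built in Python. The annotation key comes from the bullet above; the service account name `litellm` and the helper name are assumptions, and the GCP service account email is left as a placeholder.

```python
import json


def workload_identity_service_account(name, namespace, gcp_sa_email):
    """Return a ServiceAccount manifest annotated for GKE Workload Identity."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gcp_sa_email},
        },
    }


sa = workload_identity_service_account(
    "litellm",                                      # hypothetical name
    "hyperplane-litellm",
    "<gcp-sa>@<project>.iam.gserviceaccount.com",   # placeholder from above
)
print(json.dumps(sa, indent=2))
```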

Safe Rollback

helm rollback litellm -n hyperplane-litellm

# Or to a specific revision
helm history litellm -n hyperplane-litellm
helm rollback litellm <REVISION> -n hyperplane-litellm

Post-Deployment Checklist

  • All pods Running — no CrashLoopBackOff
  • GET /health returns {"status":"healthy"} or per-model status
  • Test model call returns a valid completion (Step 6)
  • Virtual key created and tested (see Getting Started)
  • Provider keys confirmed in Kubernetes secret, not hardcoded in values.yaml
  • Redis connectivity confirmed (check logs for cache errors if not)
  • Langfuse and Prometheus callbacks enabled if requested

Administration & Best Practices

This page covers how to keep LiteLLM stable, secure, and cost-efficient in a production Shakudo environment.

Model Configuration Management

All model routing is defined in litellmConfig in values.yaml. Treat this as code:

  • Version-control values.yaml in Git alongside the Helm chart
  • Use model aliases not raw provider model IDs in app code — aliases are forward-compatible
  • Test changes in staging before promoting to production

To update a model without a full redeploy:

kubectl get cm litellm-config -n hyperplane-litellm -o yaml > backup-$(date +%Y%m%d).yaml
kubectl edit cm litellm-config -n hyperplane-litellm
kubectl rollout restart deployment/litellm -n hyperplane-litellm

Virtual Key Best Practices

  • Issue one virtual key per application or team — never share the master key with apps
  • Set max_budget and budget_duration on every key to enforce spend caps
  • Set models to restrict which models a key can access
  • Rotate virtual keys periodically: issue a replacement key, move the application to it, then revoke the old key with the /key/delete endpoint

# List all keys
curl http://localhost:4000/key/list -H "Authorization: Bearer <MASTER_KEY>"

# Delete a key (LiteLLM expects a POST on /key/delete)
curl -X POST http://localhost:4000/key/delete \
 -H "Authorization: Bearer <MASTER_KEY>" \
 -H "Content-Type: application/json" \
 -d '{"keys": ["sk-..."]}'

Redis / Valkey Cache

LiteLLM uses Redis/Valkey for rate limiting, response caching, and load-balancing state. Cache tuning:

litellm_settings:
 cache: true
 cache_params:
   type: redis
   ttl: 600
   max_in_memory_cache_size: 200
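To illustrate what the `ttl` setting controls, here is a toy in-memory cache in Python that treats `ttl` as seconds. It is a stand-in, not LiteLLM's Redis cache: the `TTLCache` class, the `cache_key` hashing scheme, and the injectable clock are hypothetical and exist only so the example runs without Redis.

```python
import hashlib
import json
import time


def cache_key(model: str, messages: list) -> str:
    """Hash the request so identical calls map to the same entry."""
    canonical = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a cache with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


now = [0.0]  # fake clock so the demo runs instantly
cache = TTLCache(ttl_seconds=600, clock=lambda: now[0])
key = cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
cache.set(key, "4")
print(cache.get(key))  # hit while the entry is fresh
now[0] = 601.0
print(cache.get(key))  # miss once the TTL has passed
```

A longer TTL saves more provider calls for repeated prompts but serves older answers; a shorter one does the reverse.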

Redis to Valkey Migration

Valkey is Redis-compatible. LiteLLM supports it with no code changes — just update the endpoint:

cache:
 type: redis
 host: valkey-master.hyperplane-valkey.svc.cluster.local
 port: 6379

Security Basics

Master key

  • Store in a Kubernetes secret — never in values.yaml plain text
  • Rotate by updating the secret and restarting the deployment

Provider API keys

  • All provider keys must be in the Kubernetes secret, not the ConfigMap or values.yaml
  • Use OIDC/Workload Identity wherever possible to eliminate static key management

Network access

  • Do not expose LiteLLM publicly — use cluster-internal DNS only
  • If external access is needed, add authentication middleware (Istio AuthorizationPolicy) in front

Scaling

  • Scale horizontally — LiteLLM is stateless when backed by Redis
  • Use HPA on CPU/memory for traffic spikes

kubectl scale deployment/litellm -n hyperplane-litellm --replicas=3

Backup Strategy

  • Back up the LiteLLM PostgreSQL database if STORE_MODEL_IN_DB is enabled
  • Keep a Git-versioned copy of values.yaml and litellmConfig as source of truth
  • Back up provider-key Kubernetes secrets to a secrets manager

Troubleshooting & FAQ

Use this page during live debugging. Format: Problem -> What to check -> Fix.

Deployment Issues

Pod stuck in CrashLoopBackOff

  • Check: kubectl logs deployment/litellm -n hyperplane-litellm
  • Common causes: DATABASE_URL incorrect, Redis not reachable, missing env vars
  • Fix: verify all required env vars are in the Kubernetes secret and the secret is referenced in envFrom

Provider API key error on startup

  • Check: logs show AuthenticationError or No API key
  • Fix: confirm the key is in the secret and the env var name matches what litellmConfig references

helm upgrade fails or times out

  • Check: kubectl get pods -n hyperplane-litellm during upgrade
  • Fix: run from local kubeconfig with full namespace access. Increase --timeout if image pull is slow

Connection Issues

App cannot reach LiteLLM

  • Check: use cluster-internal DNS: litellm.hyperplane-litellm.svc.cluster.local:4000
  • Fix: confirm service is up (kubectl get svc -n hyperplane-litellm) and port is 4000

LiteLLM cannot reach the provider

Redis connection errors in logs

  • Check: redis-master or valkey-master pod is running in the expected namespace
  • Fix: verify cache.host and cache.port in values.yaml match the actual service DNS

Model and Routing Issues

Model not found error

  • Check: model name in the request must match a model_name in litellmConfig.model_list exactly
  • Fix: call GET /v1/models to see available aliases. Update the app to use the correct alias

Fallback not triggering

  • Check: router_settings.fallbacks must include the failing model as a key
  • Fix: update fallbacks config and rollout restart. Confirm fallback model alias exists in model_list

Rate limit errors (429) even with quota remaining

  • Check: the virtual key may have a max_budget or a rate limit (rpm_limit or tpm_limit) set
  • Fix: inspect with GET /key/info. Increase the budget or rate limit if appropriate

Observability Issues

Langfuse traces not appearing

  • Check: litellm_settings.success_callback includes "langfuse" in litellmConfig
  • Check: Langfuse endpoint and keys are correct and in the Kubernetes secret
  • Fix: check logs for Langfuse callback errors. Confirm pod can reach the Langfuse service

Prometheus metrics endpoint returns nothing

  • Check: both success_callback and failure_callback must include "prometheus"
  • Fix: add both callbacks and rollout restart. Then: curl http://localhost:4000/metrics | grep litellm

Frequently Asked Questions

Q: How do I add a new model without a full redeploy?

Update the litellm-config ConfigMap directly and run kubectl rollout restart deployment/litellm. If STORE_MODEL_IN_DB is enabled, you can add models via the API with no restart.

Q: Can I use LiteLLM with a local Ollama instance?

Yes. Add an entry to model_list with model: openai/<model-name>, api_base pointing to the Ollama cluster service, and api_key: "none". LiteLLM treats it as an OpenAI-compatible endpoint.
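A minimal sketch of such an entry, built in Python. The `openai/` prefix and `api_key: "none"` come from the answer above; the alias `local-llama`, the model name `llama3`, and the service URL are assumptions. The URL assumes Ollama's default port 11434 and its OpenAI-compatible `/v1` path.

```python
import json


def ollama_model_entry(alias, ollama_model, api_base):
    """Build a model_list entry that treats Ollama as an OpenAI-compatible endpoint."""
    return {
        "model_name": alias,
        "litellm_params": {
            "model": f"openai/{ollama_model}",
            "api_base": api_base,
            "api_key": "none",  # Ollama does not check the key
        },
    }


entry = ollama_model_entry(
    "local-llama",                                       # hypothetical alias
    "llama3",                                            # hypothetical model
    "http://ollama.example.svc.cluster.local:11434/v1",  # hypothetical service
)
print(json.dumps(entry, indent=2))
```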

Q: How do I track which team is spending the most?

Issue a separate virtual key per team with metadata tagging. Call GET /spend/keys to see spend per key. With Langfuse enabled, filter traces by metadata fields like team.
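As a sketch of the per-team rollup, the Python below sums spend by the `team` metadata tag. The record shape (a `spend` number and a `metadata` object per key) is an assumption about what the spend or key listing returns, and the sample figures are made up for illustration.

```python
from collections import defaultdict


def spend_by_team(keys):
    """Sum spend per metadata.team, highest spender first."""
    totals = defaultdict(float)
    for record in keys:
        team = (record.get("metadata") or {}).get("team", "untagged")
        totals[team] += float(record.get("spend", 0.0))
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up sample records standing in for an API response.
sample_keys = [
    {"spend": 12.5, "metadata": {"team": "risk-team"}},
    {"spend": 3.25, "metadata": {"team": "data-team"}},
    {"spend": 7.5, "metadata": {"team": "risk-team"}},
    {"spend": 1.0, "metadata": None},
]
print(spend_by_team(sample_keys))
```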

Q: What happens if my primary provider goes down?

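If router_settings.fallbacks is configured, LiteLLM automatically retries on the fallback model. num_retries sets how many times a failed request is retried, and retry_after sets the minimum wait before each retry. Once retries and fallbacks are exhausted, the error is returned to the caller.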
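The retry behavior can be pictured with a small Python loop. This is a toy model, not LiteLLM's implementation: the parameter names mirror `num_retries` and `retry_after`, and the flaky provider function and injectable `sleep` are hypothetical so the example runs instantly.

```python
import time


def call_with_retries(send, num_retries=2, retry_after=0.0, sleep=time.sleep):
    """Call send(); on failure wait retry_after seconds and retry, up to num_retries times."""
    retries_used = 0
    while True:
        try:
            return send()
        except RuntimeError:  # stand-in for a provider error
            if retries_used >= num_retries:
                raise  # retries exhausted: surface the error
            retries_used += 1
            sleep(retry_after)


calls = {"count": 0}


def flaky_send():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("temporary provider error")
    return "ok"


waits = []
result = call_with_retries(flaky_send, num_retries=2, retry_after=1.0, sleep=waits.append)
print(result, calls["count"], waits)
```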

Q: How do I upgrade LiteLLM to a newer version?

Update image.tag in values.yaml, re-run helm dependency update if needed, and redeploy with helm upgrade. Test in staging first and review the LiteLLM changelog for breaking config key name changes.


Why is LiteLLM better on Shakudo?

Core Shakudo Features

Own Your AI

Keep data sovereign, protect IP, and avoid vendor lock-in with infra-agnostic deployments.

Faster Time-to-Value

Pre-built templates and automated DevOps accelerate time-to-value.

Flexible with Experts

Operating system and dedicated support ensure seamless adoption of the latest and greatest tools.