LiteLLM Overview
LiteLLM is an open-source LLM gateway that gives your applications a single, unified API to call any large language model — whether it runs on OpenAI, Anthropic, Azure, Google Vertex AI, or a self-hosted model via Ollama. Instead of writing separate integrations for each provider, you point everything at LiteLLM and it handles the routing, key management, retries, and cost tracking for you.
In Shakudo environments, LiteLLM is the central model-access layer. Applications like Dify, AgentFlow, and custom AI tools send requests to LiteLLM; LiteLLM routes them to the appropriate provider based on configuration. This means you can swap providers, add fallbacks, or change models without touching application code.
What Problem Does LiteLLM Solve?
- Every LLM provider has a slightly different API, authentication method, and response format
- Managing API keys scattered across multiple apps is a security risk
- There is no built-in way to track which models are being called, at what cost, and with what latency
- If one provider goes down or changes pricing, every application needs to be updated
LiteLLM addresses all of these by acting as a single control point: one endpoint, one virtual key per application, and full control over routing, budgets, and observability behind it.
How LiteLLM Fits in the Shakudo Stack
LiteLLM sits at the LLM gateway layer between your applications and the model providers:
- Apps (Dify, AgentFlow, custom services) send OpenAI-compatible requests to LiteLLM
- LiteLLM routes requests to the configured provider (Vertex AI, OpenAI, Anthropic, Ollama, etc.)
- LiteLLM optionally logs calls to Langfuse for tracing and cost tracking
- LiteLLM connects to Redis or Valkey for rate-limiting, caching, and session state
- LiteLLM exposes a Prometheus metrics endpoint for monitoring dashboards
Key Concepts
- Model alias: a name you define (e.g. gpt-4o) that maps to a real provider model (e.g. openai/gpt-4o). Apps use the alias; you control what it points to.
- Virtual keys: API keys you issue to your teams or apps. LiteLLM validates them and tracks usage per key without exposing the real provider keys.
- Router: the component that decides which provider/model to send a request to. Upstream LiteLLM's routing strategies include simple-shuffle (weighted random, the default), least-busy, usage-based, and latency-based routing.
- Fallback: if the primary model fails or is rate-limited, LiteLLM automatically retries on a configured fallback model.
- Budget: per-user or per-key spend limits enforced by LiteLLM before requests reach the provider.
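The alias, router, and fallback concepts above can be pictured with a toy sketch. This is not LiteLLM's implementation: the alias table, the fallback map, and the `call_provider` callable are hypothetical stand-ins chosen only to show the order in which candidates are tried.

```python
from typing import Callable, Dict, List

# Hypothetical alias table: the name apps use -> the provider model behind it.
ALIASES: Dict[str, str] = {
    "gpt-4o": "openai/gpt-4o",
    "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet",
}
# Hypothetical fallback map: alias -> aliases to try if the primary keeps failing.
FALLBACKS: Dict[str, List[str]] = {"gpt-4o": ["claude-3-5-sonnet"]}


def complete(alias: str, prompt: str,
             call_provider: Callable[[str, str], str],
             num_retries: int = 2) -> str:
    """Try the primary alias with retries, then each fallback alias in order."""
    last_error: Exception = KeyError(f"unknown alias: {alias}")
    for candidate in [alias] + FALLBACKS.get(alias, []):
        if candidate not in ALIASES:
            continue  # skip names that have no provider mapping
        provider_model = ALIASES[candidate]
        for _ in range(1 + num_retries):
            try:
                return call_provider(provider_model, prompt)
            except RuntimeError as exc:  # stand-in for a provider failure
                last_error = exc
    raise last_error
```

Because applications only ever pass the alias, changing `ALIASES` or `FALLBACKS` changes behaviour without any application change, which is the point of the gateway.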
What LiteLLM Is Not
- Not a model itself. It is a proxy — it forwards requests to real models.
- Not a vector database or RAG system. Use Qdrant, Neo4j, or similar tools for that.
- Not an observability platform. Use Langfuse alongside LiteLLM for full trace-level visibility.
Getting Started & Usage
Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.
How to Call LiteLLM from Any Application
LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes — just update the base URL and API key.
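As a minimal sketch of what "update the base URL and API key" means in practice, the following uses only the Python standard library. The function names `build_chat_request` and `send` are illustrative, the base URL is the cluster-internal address used elsewhere on this page, and the response is assumed to follow the OpenAI chat completion shape.

```python
import json
import urllib.request

# Cluster-internal base URL used elsewhere on this page; adjust for your environment.
BASE_URL = "http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1"


def build_chat_request(api_key: str, model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # LiteLLM virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumes the OpenAI chat completion response shape.
    return payload["choices"][0]["message"]["content"]
```

A call would look like `send(build_chat_request("<YOUR_VIRTUAL_KEY>", "gpt-4o", "Summarize this document."))`, where the model name is the alias configured in LiteLLM rather than a provider model ID.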
Creating Virtual Keys
Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.
The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.
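A hedged sketch of issuing a key programmatically, using only the standard library. The request fields (`models`, `max_budget`, `budget_duration`, `metadata`) follow the upstream `/key/generate` endpoint; the helper names, the `localhost:4000` address (which assumes a port-forward), and the example values are illustrative assumptions.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_key_request(master_key: str, models: List[str],
                      max_budget: Optional[float] = None,
                      budget_duration: Optional[str] = None,
                      metadata: Optional[Dict] = None,
                      base_url: str = "http://localhost:4000") -> urllib.request.Request:
    """Build (but do not send) a POST /key/generate request."""
    body: Dict = {"models": models}
    # Only include optional fields that were actually provided.
    if max_budget is not None:
        body["max_budget"] = max_budget
    if budget_duration is not None:
        body["budget_duration"] = budget_duration  # e.g. "30d" (assumed format)
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {master_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_key(response_body: bytes) -> str:
    """Pull the issued virtual key (sk-...) out of the JSON response."""
    return json.loads(response_body)["key"]
```

Sending the request with `urllib.request.urlopen` and passing the response bytes to `extract_key` yields the value to hand to the application; the master key itself never leaves the operator's environment.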
Viewing Available Models
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer <YOUR_KEY>"
Checking Usage and Spend
# Per-key usage (GET, with the key passed as a query parameter)
curl "http://localhost:4000/key/info?key=sk-..." \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>"
# All keys overview
curl http://localhost:4000/key/list \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>"
Testing a Specific Model
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
Streaming Responses
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Tell me a story"}],"stream":true}'
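With `"stream": true`, the response arrives as OpenAI-style server-sent events: one `data: {...}` line per chunk, ending with `data: [DONE]`. The parser below is a minimal sketch of how a client might reassemble the text; the function name `iter_stream_text` is illustrative, and it assumes each chunk carries its text under `choices[].delta.content`.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

Feeding it the decoded lines of the HTTP response body and joining the yielded fragments gives the full completion text.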
Connecting Dify to LiteLLM
In the Dify settings, configure a new model provider with:
- Provider type: OpenAI-compatible API
- API base URL: http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1
- API key: your LiteLLM virtual key
- Model name: the alias from LiteLLM (e.g. gpt-4o)
Once configured, all Dify model calls route through LiteLLM.
Practical Tips
- Use model aliases (not provider model IDs) in your app code — this lets you swap the underlying model without code changes
- Issue separate virtual keys per application or team for usage isolation
- Set budget limits on virtual keys to prevent runaway costs
- Use the /health endpoint in your application startup health checks
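The last tip can be sketched as a small startup gate that polls a health URL before the application starts serving. The helper names are illustrative, and whether `/health` requires an Authorization header depends on your deployment (an assumption to verify). Upstream LiteLLM also documents lighter `/health/liveliness` and `/health/readiness` probes; check which ones your version exposes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay_seconds: float = 3.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no sleep after the final attempt
    return False
```

An application could call `wait_until_healthy(lambda: probe("http://litellm.hyperplane-litellm.svc.cluster.local:4000/health"))` during startup and refuse to start if it returns `False`.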
Shakudo SaaS-first quick start
This section is for customers using LiteLLM as a managed component inside Shakudo. Start from the Shakudo platform instead of installing or exposing LiteLLM manually.
1. Access the component in Shakudo
- Sign in to your Shakudo workspace with your organization-approved account.
- Open the workspace or environment where this component is enabled.
- Go to the Applications or component catalog area and select LiteLLM.
- If you cannot see the component, ask your workspace administrator to confirm that it is enabled for your role and environment.
2. Open the component UI
- Use the Shakudo-provided Open, Launch, or Access action for LiteLLM.
- Let Shakudo handle authentication, networking, and workspace routing. Avoid using internal service URLs unless your administrator explicitly provides them.
- Confirm that the component opens in the expected workspace before creating or changing resources.
3. Complete a first safe use case
Open the LiteLLM dashboard or API endpoint, select an approved model, and send a small test completion through the gateway to confirm routing and credentials are working.
- Use a small non-production example first, especially when testing credentials or model calls.
- Name the test clearly so other workspace users can recognize it as a first-run validation.
4. Monitor and validate the result
- Check the LiteLLM UI (and Langfuse, if enabled) for the test request's status, logs, and recorded spend.
- Return to Shakudo if you need platform-level status, access control changes, or administrator support.
- Record any errors, missing permissions, or unexpected results before retrying with production workloads.
5. Next steps
- Review the use cases, administration, and troubleshooting pages in this knowledge base for deeper examples.
- For production usage, follow your team’s Shakudo workspace policies for credentials, data access, resource limits, and approvals.
Previous getting-started content snapshot
The page content below was present before this SaaS-first section was added. It is retained here as an inline snapshot so existing guidance is not lost.
Getting Started & Usage
Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.
How to Call LiteLLM from Any Application
LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes: just update the base URL and API key.
# Python: using the openai SDK
from openai import OpenAI
client = OpenAI(
    api_key="<YOUR_VIRTUAL_KEY>",
    base_url="http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1",
)
response = client.chat.completions.create(
    model="gpt-4o",  # use the alias defined in litellmConfig
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)
The model name is the alias from litellmConfig.model_list (e.g. gpt-4o, claude-3-5-sonnet, gemini-flash). LiteLLM routes it to the correct provider.
Creating Virtual Keys
Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.
# Create a virtual key via the LiteLLM API
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o","gemini-flash"],"max_budget": 100,"budget_duration": "30d","metadata": {"team": "risk-team", "environment": "prod"}}'
The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.
Viewing Available Models
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer <YOUR_KEY>"
Deployment Runbook
📌 Commands-first runbook based on the Shakudo Helm deployment pattern used in production environments including Loblaw and Hitachi. Each section is independently usable during a live deployment call.
Scope
Helm-based deployment of LiteLLM on a Shakudo Kubernetes cluster. Uses the Shakudo monorepo chart, Redis/Valkey for caching, and optionally Langfuse for call logging.
Required Inputs
Confirm before starting:
- Local kubeconfig with namespace admin access to the LiteLLM namespace
- Provider API keys (e.g. OPENAI_API_KEY, VERTEXAI_PROJECT, ANTHROPIC_API_KEY)
- LiteLLM master key (LITELLM_MASTER_KEY) — a strong secret used to generate virtual keys
- Redis/Valkey endpoint and credentials (or confirmation that the bundled Redis subchart is used)
- Optional: Langfuse host, public key, and secret key if call logging is needed
Step 1 — Clone the Helm Chart
git clone --depth=1 --branch <release-branch> \
  https://<PAT>@github.com/devsentient/monorepo.git
cd monorepo/stack-components/litellm/helm
- Use a GitHub PAT (github.com/settings/tokens) — password auth is not supported
Step 2 — Pull Chart Dependencies
helm dependency update .
ls charts/ # expect: redis-*.tgz or valkey-*.tgz
Step 3 — Configure values.yaml
Step 4 — Add Provider API Keys as Secrets
Never hardcode API keys in values.yaml. Create a Kubernetes secret and reference it:
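One way to do this without ever writing key values to disk is to generate the Secret manifest from environment variables and pipe it to `kubectl apply -f -` (kubectl accepts JSON as well as YAML). This is a sketch under assumptions: the secret name `litellm-provider-keys` is hypothetical, and how the chart references the secret (for example through `envFrom`) depends on the chart's values.

```python
import json
import os
from typing import Dict, List


def build_secret_manifest(name: str, namespace: str,
                          env_names: List[str]) -> Dict:
    """Build a Kubernetes Secret manifest from environment variables."""
    missing = [n for n in env_names if not os.environ.get(n)]
    if missing:
        raise ValueError(f"missing environment variables: {missing}")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData takes plain strings; the API server stores them base64-encoded.
        "stringData": {n: os.environ[n] for n in env_names},
    }


def render(manifest: Dict) -> str:
    """Serialize the manifest as JSON for `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

For example, a script that prints `render(build_secret_manifest("litellm-provider-keys", "hyperplane-litellm", ["OPENAI_API_KEY", "LITELLM_MASTER_KEY"]))` can be piped straight into `kubectl apply -f -`, so values.yaml only ever contains the secret's name.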
Step 5 — Deploy
Step 6 — Verify Deployment Health
Step 7 — Optional: Enable Langfuse Logging
Add to litellmConfig in values.yaml:
litellmConfig:
  litellm_settings:
    success_callback: ["langfuse"]
    failure_callback: ["langfuse"]
    langfuse_host: https://langfuse.hyperplane-langfuse.svc.cluster.local
    langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
    langfuse_secret_key: os.environ/LANGFUSE_SECRET_KEY
Then add the Langfuse keys to the Kubernetes secret and redeploy.
Step 8 — Enable Prometheus Metrics
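Add to litellmConfig. If Langfuse logging from Step 7 is also enabled, list both callbacks (for example ["langfuse", "prometheus"]) rather than replacing the existing entry:
litellmConfig:
  litellm_settings:
    success_callback: ["prometheus"]
    failure_callback: ["prometheus"]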
Metrics are exposed at /metrics on port 4000. Confirm:
curl http://localhost:4000/metrics | grep litellm
Step 9 — OIDC / Keyless Auth (Optional)
For environments like Loblaw that use OIDC-based keyless auth (no hardcoded API keys):
litellmConfig:
  model_list:
    - model_name: gemini-flash
      litellm_params:
        model: vertex_ai/gemini-2.0-flash
        vertex_project: os.environ/VERTEXAI_PROJECT
        vertex_location: us-central1
        # No api_key needed: uses Workload Identity / OIDC from the cluster
- The LiteLLM pod must have a Kubernetes service account with Workload Identity binding to the GCP service account
- Annotate the service account with: iam.gke.io/gcp-service-account: <gcp-sa>@<project>.iam.gserviceaccount.com
Safe Rollback
helm rollback litellm -n hyperplane-litellm
# Or to a specific revision
helm history litellm -n hyperplane-litellm
helm rollback litellm <REVISION> -n hyperplane-litellm
Post-Deployment Checklist
- All pods Running — no CrashLoopBackOff
- GET /health returns {"status":"healthy"} or per-model status
- Test model call returns a valid completion (Step 6)
- Virtual key created and tested (see Getting Started)
- Provider keys confirmed in Kubernetes secret, not hardcoded in values.yaml
- Redis connectivity confirmed (check logs for cache errors if not)
- Langfuse and Prometheus callbacks enabled if requested
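The health item in the checklist can be automated with a small interpreter for the `/health` response. This is a sketch under assumptions: it handles the simple `{"status": "healthy"}` shape mentioned above and a per-model shape with `healthy_endpoints` and `unhealthy_endpoints` lists, which is how upstream LiteLLM reports per-model status; verify the field names against your version.

```python
from typing import Dict, List, Tuple


def summarize_health(payload: Dict) -> Tuple[bool, List[str]]:
    """Return (is_healthy, names of unhealthy models) for a /health response."""
    if "status" in payload:
        # Simple shape: {"status": "healthy"}
        return payload["status"] == "healthy", []
    # Per-model shape (assumed field names): lists of endpoint descriptions.
    unhealthy = [
        str(item.get("model", "<unknown>"))
        for item in payload.get("unhealthy_endpoints", [])
    ]
    has_healthy = bool(payload.get("healthy_endpoints"))
    return has_healthy and not unhealthy, unhealthy
```

A deployment script can fail fast when the first element is `False` and print the second element to show which models need attention.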
Administration & Best Practices
This page covers how to keep LiteLLM stable, secure, and cost-efficient in a production Shakudo environment.
Model Configuration Management
All model routing is defined in litellmConfig in values.yaml. Treat this as code:
- Version-control values.yaml in Git alongside the Helm chart
- Use model aliases, not raw provider model IDs, in app code so the model behind an alias can change without touching the apps
- Test changes in staging before promoting to production
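To update a model without a full redeploy, edit the ConfigMap directly and restart the deployment. Mirror the change in values.yaml afterwards, because the next helm upgrade re-renders the ConfigMap from the chart values: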
kubectl get cm litellm-config -n hyperplane-litellm -o yaml > backup-$(date +%Y%m%d).yaml
kubectl edit cm litellm-config -n hyperplane-litellm
kubectl rollout restart deployment/litellm -n hyperplane-litellm
Virtual Key Best Practices
- Issue one virtual key per application or team — never share the master key with apps
- Set max_budget and budget_duration on every key to enforce spend caps
- Set models to restrict which models a key can access
- Rotate virtual keys periodically: generate a replacement with /key/generate, switch the application over, then remove the old key with /key/delete
# List all keys
curl http://localhost:4000/key/list -H "Authorization: Bearer <MASTER_KEY>"
# Delete a key (the endpoint expects POST)
curl -X POST http://localhost:4000/key/delete \
  -H "Authorization: Bearer <MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"keys": ["sk-..."]}'
Redis / Valkey Cache
LiteLLM uses Redis/Valkey for rate limiting, response caching, and load-balancing state. Cache tuning:
litellm_settings:
  cache: true
  cache_params:
    type: redis
    ttl: 600
    max_in_memory_cache_size: 200
Redis to Valkey Migration
Valkey is Redis-compatible. LiteLLM supports it with no code changes — just update the endpoint:
cache:
  type: redis
  host: valkey-master.hyperplane-valkey.svc.cluster.local
  port: 6379
Security Basics
Master key
- Store in a Kubernetes secret — never in values.yaml plain text
- Rotate by updating the secret and restarting the deployment
Provider API keys
- All provider keys must be in the Kubernetes secret, not the ConfigMap or values.yaml
- Use OIDC/Workload Identity wherever possible to eliminate static key management
Network access
- Do not expose LiteLLM publicly — use cluster-internal DNS only
- If external access is needed, add authentication middleware (Istio AuthorizationPolicy) in front
Scaling
- Scale horizontally — LiteLLM is stateless when backed by Redis
- Use HPA on CPU/memory for traffic spikes
kubectl scale deployment/litellm -n hyperplane-litellm --replicas=3
Backup Strategy
- Back up the LiteLLM PostgreSQL database if STORE_MODEL_IN_DB is enabled
- Keep a Git-versioned copy of values.yaml and litellmConfig as source of truth
- Back up provider-key Kubernetes secrets to a secrets manager
Troubleshooting & FAQ
Use this page during live debugging. Format: Problem -> What to check -> Fix.
Deployment Issues
Pod stuck in CrashLoopBackOff
- Check: kubectl logs deployment/litellm -n hyperplane-litellm
- Common causes: DATABASE_URL incorrect, Redis not reachable, missing env vars
- Fix: verify all required env vars are in the Kubernetes secret and the secret is referenced in envFrom
Provider API key error on startup
- Check: logs show AuthenticationError or No API key
- Fix: confirm the key is in the secret and the env var name matches what litellmConfig references
helm upgrade fails or times out
- Check: kubectl get pods -n hyperplane-litellm during upgrade
- Fix: run from local kubeconfig with full namespace access. Increase --timeout if image pull is slow
Connection Issues
App cannot reach LiteLLM
- Check: use cluster-internal DNS: litellm.hyperplane-litellm.svc.cluster.local:4000
- Fix: confirm service is up (kubectl get svc -n hyperplane-litellm) and port is 4000
LiteLLM cannot reach the provider
- Check: kubectl exec into pod and test: curl https://api.openai.com/v1/models
- Fix: check egress network policy. Cloud providers need outbound HTTPS from the pod
Redis connection errors in logs
- Check: redis-master or valkey-master pod is running in the expected namespace
- Fix: verify cache.host and cache.port in values.yaml match the actual service DNS
Model and Routing Issues
Model not found error
- Check: model name in the request must match a model_name in litellmConfig.model_list exactly
- Fix: call GET /v1/models to see available aliases. Update the app to use the correct alias
Fallback not triggering
- Check: router_settings.fallbacks must include the failing model as a key
- Fix: update fallbacks config and rollout restart. Confirm fallback model alias exists in model_list
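A quick way to catch both failure modes before restarting is to lint the configuration. The sketch below assumes litellmConfig has been loaded into a Python dict and that `router_settings.fallbacks` uses the upstream list-of-mappings form, for example `[{"gpt-4o": ["claude-3-5-sonnet"]}]`; the function names are illustrative.

```python
from typing import Dict, List


def fallback_targets(config: Dict, alias: str) -> List[str]:
    """Return the fallback aliases configured for `alias` (empty if none)."""
    for entry in config.get("router_settings", {}).get("fallbacks", []):
        if alias in entry:
            return list(entry[alias])
    return []


def check_fallbacks(config: Dict) -> List[str]:
    """Return a list of problems found in router_settings.fallbacks."""
    aliases = {m["model_name"] for m in config.get("model_list", [])}
    problems: List[str] = []
    for entry in config.get("router_settings", {}).get("fallbacks", []):
        for primary, targets in entry.items():
            if primary not in aliases:
                problems.append(f"fallback key '{primary}' is not in model_list")
            for target in targets:
                if target not in aliases:
                    problems.append(
                        f"fallback target '{target}' for '{primary}' is not in model_list")
    return problems
```

An empty result from `fallback_targets(config, "<failing-alias>")` means the failing model is not a key in the fallbacks list, which is the most common reason a fallback never fires.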
Rate limit errors (429) even with quota remaining
- Check: the virtual key may have a max_budget, rpm_limit, or tpm_limit set
- Fix: inspect with GET /key/info?key=<key>. Increase the budget or rate limit if appropriate
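When reading the `/key/info` output, the fields worth comparing are the recorded spend against `max_budget` and any per-key request or token limits. The helper below is a sketch under assumptions: it expects the response to nest those fields under an `info` object and to use the names `spend`, `max_budget`, `rpm_limit`, and `tpm_limit`; confirm these against your LiteLLM version.

```python
from typing import Dict, List


def explain_key_limits(key_info: Dict) -> List[str]:
    """List the per-key limits that could explain rejected requests."""
    # Assumed shape: {"key": "...", "info": {...}}; fall back to a flat dict.
    info = key_info.get("info", key_info)
    findings: List[str] = []
    spend = info.get("spend") or 0.0
    max_budget = info.get("max_budget")
    if max_budget is not None and spend >= max_budget:
        findings.append(f"budget exhausted: spend {spend} >= max_budget {max_budget}")
    for field in ("rpm_limit", "tpm_limit"):
        if info.get(field) is not None:
            findings.append(f"{field} is set to {info[field]}")
    return findings
```

An empty list suggests the rejection is coming from the provider or from a team-level or global limit rather than from the key itself.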
Observability Issues
Langfuse traces not appearing
- Check: litellm_settings.success_callback includes "langfuse" in litellmConfig
- Check: Langfuse endpoint and keys are correct and in the Kubernetes secret
- Fix: check logs for Langfuse callback errors. Confirm pod can reach the Langfuse service
Prometheus metrics endpoint returns nothing
- Check: both success_callback and failure_callback must include "prometheus"
- Fix: add both callbacks and rollout restart. Then: curl http://localhost:4000/metrics | grep litellm
Frequently Asked Questions
Q: How do I add a new model without a full redeploy?
Update the litellm-config ConfigMap directly and run kubectl rollout restart deployment/litellm. If STORE_MODEL_IN_DB is enabled, you can add models via the API with no restart.
Q: Can I use LiteLLM with a local Ollama instance?
Yes. Add an entry to model_list with model: openai/<model-name>, api_base pointing to the Ollama cluster service's OpenAI-compatible /v1 path, and api_key: "none". LiteLLM treats it as an OpenAI-compatible endpoint.
Q: How do I track which team is spending the most?
Issue a separate virtual key per team with metadata tagging. Call GET /spend/keys to see spend per key. With Langfuse enabled, filter traces by metadata fields like team.
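If keys carry a `team` entry in their metadata, per-team totals are a simple aggregation over the key records. The record shape below (a `spend` number and a `metadata` dict per key) is an assumption about what the spend or key-listing endpoints return; adapt the field access to the actual response.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spend_by_team(keys: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Sum spend per metadata.team and sort from highest to lowest."""
    totals: Dict[str, float] = defaultdict(float)
    for record in keys:
        # Keys without team metadata are grouped under "untagged".
        team = (record.get("metadata") or {}).get("team", "untagged")
        totals[team] += float(record.get("spend") or 0.0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A non-trivial "untagged" total is a useful signal that some keys were issued without team metadata and should be reissued or updated.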
Q: What happens if my primary provider goes down?
If router_settings.fallbacks is configured, LiteLLM automatically retries on the fallback model. num_retries sets how many times a failed call is retried, and retry_after sets the minimum wait between retries, before an error is returned.
Q: How do I upgrade LiteLLM to a newer version?
Update image.tag in values.yaml, re-run helm dependency update if needed, and redeploy with helm upgrade. Test in staging first and review the LiteLLM changelog for breaking config key name changes.

