LiteLLM Integration | Deploy on Shakudo

LiteLLM Knowledge Base

LiteLLM Overview

LiteLLM is an open-source LLM gateway that gives your applications a single, unified API to call any large language model — whether it runs on OpenAI, Anthropic, Azure, Google Vertex AI, or a self-hosted model via Ollama. Instead of writing separate integrations for each provider, you point everything at LiteLLM and it handles the routing, key management, retries, and cost tracking for you.

In Shakudo environments, LiteLLM is the central model-access layer. Applications like Dify, AgentFlow, and custom AI tools send requests to LiteLLM; LiteLLM routes them to the appropriate provider based on configuration. This means you can swap providers, add fallbacks, or change models without touching application code.

What Problem Does LiteLLM Solve?

Every LLM provider has a slightly different API, authentication method, and response format
Managing API keys scattered across multiple apps is a security risk
There is no built-in way to track which models are being called, at what cost, and with what latency
If one provider goes down or changes pricing, every application needs to be updated

LiteLLM addresses all of these by acting as a single control point: one endpoint, one API key for your apps, and full control over routing, budgets, and observability behind it.

How LiteLLM Fits in the Shakudo Stack

LiteLLM sits at the LLM gateway layer between your applications and the model providers:

Apps (Dify, AgentFlow, custom services) send OpenAI-compatible requests to LiteLLM
LiteLLM routes requests to the configured provider (Vertex AI, OpenAI, Anthropic, Ollama, etc.)
LiteLLM optionally logs calls to Langfuse for tracing and cost tracking
LiteLLM connects to Redis or Valkey for rate-limiting, caching, and session state
LiteLLM exposes a Prometheus metrics endpoint for monitoring dashboards

Key Concepts

Model alias: a name you define (e.g. gpt-4o) that maps to a real provider model (e.g. openai/gpt-4o). Apps use the alias; you control what it points to.
Virtual keys: API keys you issue to your teams or apps. LiteLLM validates them and tracks usage per key without exposing the real provider keys.
Router: the component that decides which provider/model to send a request to. Supports round-robin, least-busy, and priority-based routing.
Fallback: if the primary model fails or is rate-limited, LiteLLM automatically retries on a configured fallback model.
Budget: per-user or per-key spend limits enforced by LiteLLM before requests reach the provider.
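
The alias, router, and fallback concepts above can be pictured with a small toy model. This is only a sketch of the idea, not LiteLLM's implementation: the tables MODEL_ALIASES and FALLBACKS, the function route, and the fake provider are all hypothetical names invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical alias table: alias -> provider model ID (the "model alias" concept).
MODEL_ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet",
}

# Hypothetical fallback table: alias -> ordered list of fallback aliases.
FALLBACKS = {"gpt-4o": ["claude-3-5-sonnet"]}


def route(alias: str, call_provider: Callable[[str], str]) -> str:
    """Try the primary alias first, then each configured fallback in order."""
    candidates = [alias] + FALLBACKS.get(alias, [])
    last_error: Optional[Exception] = None
    for candidate in candidates:
        provider_model = MODEL_ALIASES[candidate]  # KeyError if the alias is undefined
        try:
            return call_provider(provider_model)
        except RuntimeError as exc:  # stand-in for provider outages or rate limits
            last_error = exc
    raise RuntimeError(f"all candidates failed for {alias!r}") from last_error


def flaky_provider(provider_model: str) -> str:
    """Fake provider for illustration: OpenAI is 'down', everything else answers."""
    if provider_model.startswith("openai/"):
        raise RuntimeError("provider unavailable")
    return f"answered by {provider_model}"


result = route("gpt-4o", flaky_provider)
print(result)
```

The application only ever names the alias gpt-4o; which provider actually answers is decided by the tables, which is the property that lets you swap models without touching application code.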

What LiteLLM Is Not

Not a model itself. It is a proxy — it forwards requests to real models.
Not a vector database or RAG system. Use Qdrant, Neo4j, or similar tools for that.
Not an observability platform. Use Langfuse alongside LiteLLM for full trace-level visibility.

Getting Started & Usage

Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.

How to Call LiteLLM from Any Application

LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes — just update the base URL and API key.
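
As a minimal sketch of what "update the base URL and API key" means in practice, the following builds an OpenAI-style chat completion request with only the Python standard library. The base URL is the in-cluster address used elsewhere in this knowledge base; the key is a placeholder, and the helper name build_chat_request is hypothetical. The request is built but not sent, so the snippet runs without cluster access.

```python
import json
import urllib.request

# Assumed values: in-cluster LiteLLM address and a placeholder virtual key.
BASE_URL = "http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1"
API_KEY = "<YOUR_VIRTUAL_KEY>"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("gpt-4o", "Summarize this document.")
print(request.get_method(), request.full_url)

# To send it from inside the cluster:
#   with urllib.request.urlopen(request, timeout=30) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Nothing in the request is LiteLLM-specific; pointing BASE_URL at a different OpenAI-compatible endpoint would require no other change.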

Creating Virtual Keys

Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.

The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.
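
A hedged sketch of assembling the request body for the key-creation call: the field names (models, max_budget, budget_duration, metadata) are the ones used in this knowledge base's /key/generate example, while the helper build_key_payload and its validation rules are hypothetical. The duration string "30d" is an assumption about the accepted format; the retained snapshot on this page uses "monthly", so confirm which form your LiteLLM version accepts.

```python
import json
from typing import Optional


def build_key_payload(models: list[str], max_budget: float,
                      budget_duration: str,
                      metadata: Optional[dict] = None) -> dict:
    """Assemble the JSON body for POST /key/generate."""
    if not models:
        # This sketch insists on an explicit allow-list of model aliases.
        raise ValueError("restrict the key to at least one model alias")
    if max_budget <= 0:
        raise ValueError("max_budget must be positive")
    return {
        "models": models,
        "max_budget": max_budget,
        "budget_duration": budget_duration,
        "metadata": metadata or {},
    }


payload = build_key_payload(
    models=["gpt-4o", "gemini-flash"],
    max_budget=100,
    budget_duration="30d",  # assumed duration format; confirm for your version
    metadata={"team": "risk-team", "environment": "prod"},
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent as the body of the key-creation request with the master key in the Authorization header.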

Viewing Available Models

curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer <YOUR_KEY>"

Checking Usage and Spend

# Per-key usage (GET request; the key is passed as a query parameter)
curl "http://localhost:4000/key/info?key=sk-..." \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>"

# All keys overview
curl http://localhost:4000/key/list \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>"

Testing a Specific Model

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

Streaming Responses

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Tell me a story"}],"stream":true}'
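
With "stream": true, an OpenAI-compatible endpoint returns server-sent events: lines of the form data: {json}, each carrying a text fragment in choices[].delta.content, ending with data: [DONE]. The sketch below shows one way a client might reassemble the text. It assumes that standard OpenAI streaming shape; the function name iter_stream_text and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample of what a streamed response might look like.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once upon"}}]}',
    'data: {"choices":[{"delta":{"content":" a time"}}]}',
    "data: [DONE]",
]
story = "".join(iter_stream_text(sample))
print(story)
```

In a real client the lines would come from the HTTP response body read incrementally rather than from a list.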

Connecting Dify to LiteLLM

In the Dify settings, configure a new model provider with:

Provider type: OpenAI-compatible API
API base URL: http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1
API key: your LiteLLM virtual key
Model name: the alias from LiteLLM (e.g. gpt-4o)

Once configured, all Dify model calls route through LiteLLM.

Practical Tips

Use model aliases (not provider model IDs) in your app code — this lets you swap the underlying model without code changes
Issue separate virtual keys per application or team for usage isolation
Set budget limits on virtual keys to prevent runaway costs
Use the /health endpoint in your application startup health checks

Shakudo SaaS-first quick start

This section is for customers using LiteLLM as a managed component inside Shakudo. Start from the Shakudo platform instead of installing or exposing LiteLLM manually.

1. Access the component in Shakudo

Sign in to your Shakudo workspace with your organization-approved account.
Open the workspace or environment where this component is enabled.
Go to the Applications or component catalog area and select LiteLLM.
If you cannot see the component, ask your workspace administrator to confirm that it is enabled for your role and environment.

2. Open the component UI

Use the Shakudo-provided Open, Launch, or Access action for LiteLLM.
Let Shakudo handle authentication, networking, and workspace routing. Avoid using internal service URLs unless your administrator explicitly provides them.
Confirm that the component opens in the expected workspace before creating or changing resources.

3. Complete a first safe use case

Open the LiteLLM dashboard or API endpoint, select an approved model, and send a small test completion through the gateway to confirm routing and credentials are working.

Use a small non-production example first, especially when testing credentials or model calls.
Name the test clearly so other workspace users can recognize it as a first-run validation.

4. Monitor and validate the result

Check the LiteLLM UI for request status, logs, and spend for the test call.
Return to Shakudo if you need platform-level status, access control changes, or administrator support.
Record any errors, missing permissions, or unexpected results before retrying with production workloads.

5. Next steps

Review the use cases, administration, and troubleshooting pages in this knowledge base for deeper examples.
For production usage, follow your team’s Shakudo workspace policies for credentials, data access, resource limits, and approvals.

Previous getting-started content snapshot

The page content below was present before this SaaS-first section was added. It is retained here as an inline snapshot so existing guidance is not lost.

Getting Started & Usage

Once LiteLLM is deployed and the health check passes, connecting your applications and issuing your first call takes a few minutes.

How to Call LiteLLM from Any Application

LiteLLM is OpenAI-compatible. Any application that can call the OpenAI API can call LiteLLM with no code changes: just update the base URL and API key.

# Python, using the openai SDK
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_VIRTUAL_KEY>",
    base_url="http://litellm.hyperplane-litellm.svc.cluster.local:4000/v1",
)
response = client.chat.completions.create(
    model="gpt-4o",  # use the alias defined in litellmConfig
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)

The model name is the alias from litellmConfig.model_list (e.g. gpt-4o, claude-3-5-sonnet, gemini-flash). LiteLLM routes it to the correct provider.

Creating Virtual Keys

Virtual keys let you issue scoped API credentials to teams, services, or apps without exposing your provider keys.

# Create a virtual key via the LiteLLM API
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer <LITELLM_MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o","gemini-flash"],"max_budget": 100,"budget_duration": "monthly","metadata": {"team": "risk-team", "environment": "prod"}}'

The returned key (sk-...) is what the application uses. LiteLLM tracks spend per key.

Viewing Available Models

curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer <YOUR_KEY>"

Deployment Runbook

📌 Commands-first runbook based on the Shakudo Helm deployment pattern used in production environments including Loblaw and Hitachi. Each section is independently usable during a live deployment call.

Scope

Helm-based deployment of LiteLLM on a Shakudo Kubernetes cluster. Uses the Shakudo monorepo chart, Redis/Valkey for caching, and optionally Langfuse for call logging.

Required Inputs

Confirm before starting:

Local kubeconfig with namespace admin access to the LiteLLM namespace
Provider API keys (e.g. OPENAI_API_KEY, VERTEXAI_PROJECT, ANTHROPIC_API_KEY)
LiteLLM master key (LITELLM_MASTER_KEY) — a strong secret used to generate virtual keys
Redis/Valkey endpoint and credentials (or confirmation that the bundled Redis subchart is used)
Optional: Langfuse host, public key, and secret key if call logging is needed

Step 1 — Clone the Helm Chart

git clone --depth=1 --branch <release-branch> \
  https://<PAT>@github.com/devsentient/monorepo.git
cd monorepo/stack-components/litellm/helm

Use a GitHub PAT (github.com/settings/tokens) — password auth is not supported

Step 2 — Pull Chart Dependencies

helm dependency update .
ls charts/   # expect: redis-*.tgz or valkey-*.tgz

Step 3 — Configure values.yaml

Step 4 — Add Provider API Keys as Secrets

Never hardcode API keys in values.yaml. Create a Kubernetes secret and reference it:
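
One possible way to produce such a secret is to render the manifest programmatically and pipe it to kubectl. The sketch below is an illustration under assumptions: the secret name litellm-provider-keys is hypothetical (use whatever name the chart's values.yaml expects), the namespace is the one used elsewhere in this runbook, and the key values are placeholders.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Return a Kubernetes Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


manifest = build_secret_manifest(
    name="litellm-provider-keys",    # hypothetical secret name
    namespace="hyperplane-litellm",  # namespace used elsewhere in this runbook
    values={
        "OPENAI_API_KEY": "<OPENAI_API_KEY>",          # placeholder
        "LITELLM_MASTER_KEY": "<LITELLM_MASTER_KEY>",  # placeholder
    },
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output can be piped to kubectl apply -f -. Read real key values from a secrets manager or environment variables rather than writing them into the script.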

Step 5 — Deploy

Step 6 — Verify Deployment Health
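
The post-deployment checklist in this runbook says GET /health returns either {"status":"healthy"} or per-model status. The sketch below shows how a verification script might interpret both shapes. The per-model field names healthy_endpoints and unhealthy_endpoints are an assumption about the response format, and is_gateway_healthy is a hypothetical helper; check the actual response from your deployment before relying on it.

```python
def is_gateway_healthy(payload: dict) -> bool:
    """Interpret a /health response in either of the two shapes described here."""
    if "status" in payload:  # simple shape: {"status": "healthy"}
        return payload["status"] == "healthy"
    # Assumed per-model shape: lists of healthy and unhealthy endpoints.
    healthy = payload.get("healthy_endpoints", [])
    unhealthy = payload.get("unhealthy_endpoints", [])
    return len(healthy) > 0 and len(unhealthy) == 0


# Hand-written sample responses for illustration.
simple = {"status": "healthy"}
per_model = {
    "healthy_endpoints": [{"model": "openai/gpt-4o"}],
    "unhealthy_endpoints": [{"model": "vertex_ai/gemini-2.0-flash"}],
}
print(is_gateway_healthy(simple), is_gateway_healthy(per_model))
```

Treating any unhealthy endpoint as a failure is a deliberately strict choice for a first deployment; a looser check might only require that the models you depend on are healthy.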

Step 7 — Optional: Enable Langfuse Logging

Add to litellmConfig in values.yaml:

litellmConfig:
  litellm_settings:
    success_callback: ["langfuse"]
    failure_callback: ["langfuse"]
    langfuse_host: https://langfuse.hyperplane-langfuse.svc.cluster.local
    langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
    langfuse_secret_key: os.environ/LANGFUSE_SECRET_KEY

Then add the Langfuse keys to the Kubernetes secret and redeploy.

Step 8 — Enable Prometheus Metrics

Add to litellmConfig:

litellmConfig:
  litellm_settings:
    success_callback: ["prometheus"]
    failure_callback: ["prometheus"]

Metrics are exposed at /metrics on port 4000. Confirm:

curl http://localhost:4000/metrics | grep litellm

Step 9 — OIDC / Keyless Auth (Optional)

For environments like Loblaw that use OIDC-based keyless auth (no hardcoded API keys):

litellmConfig:
  model_list:
    - model_name: gemini-flash
      litellm_params:
        model: vertex_ai/gemini-2.0-flash
        vertex_project: os.environ/VERTEXAI_PROJECT
        vertex_location: us-central1
        # No api_key needed: uses Workload Identity / OIDC from the cluster

The LiteLLM pod must have a Kubernetes service account with Workload Identity binding to the GCP service account
Annotate the service account with: iam.gke.io/gcp-service-account: <gcp-sa>@<project>.iam.gserviceaccount.com

Safe Rollback

helm rollback litellm -n hyperplane-litellm

# Or to a specific revision
helm history litellm -n hyperplane-litellm
helm rollback litellm <REVISION> -n hyperplane-litellm

Post-Deployment Checklist

All pods Running — no CrashLoopBackOff
GET /health returns {"status":"healthy"} or per-model status
Test model call returns a valid completion (Step 6)
Virtual key created and tested (see Getting Started)
Provider keys confirmed in Kubernetes secret, not hardcoded in values.yaml
Redis connectivity confirmed (check logs for cache errors if not)
Langfuse and Prometheus callbacks enabled if requested

Administration & Best Practices

This page covers how to keep LiteLLM stable, secure, and cost-efficient in a production Shakudo environment.

Model Configuration Management

All model routing is defined in litellmConfig in values.yaml. Treat this as code:

Version-control values.yaml in Git alongside the Helm chart
Use model aliases not raw provider model IDs in app code — aliases are forward-compatible
Test changes in staging before promoting to production

To update a model without a full redeploy:

kubectl get cm litellm-config -n hyperplane-litellm -o yaml > backup-$(date +%Y%m%d).yaml
kubectl edit cm litellm-config -n hyperplane-litellm
kubectl rollout restart deployment/litellm -n hyperplane-litellm

Virtual Key Best Practices

Issue one virtual key per application or team — never share the master key with apps
Set max_budget and budget_duration on every key to enforce spend caps
Set models to restrict which models a key can access
Rotate virtual keys periodically: issue a replacement with /key/generate, switch the application over, then revoke the old key with /key/delete

# List all keys
curl http://localhost:4000/key/list \
  -H "Authorization: Bearer <MASTER_KEY>"

# Delete a key (POST request with a JSON body)
curl -X POST http://localhost:4000/key/delete \
  -H "Authorization: Bearer <MASTER_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"keys": ["sk-..."]}'

Redis / Valkey Cache

LiteLLM uses Redis/Valkey for rate limiting, response caching, and load-balancing state. Cache tuning:

litellm_settings:
  cache: true
  cache_params:
    type: redis
    ttl: 600
    max_in_memory_cache_size: 200

Redis to Valkey Migration

Valkey is Redis-compatible. LiteLLM supports it with no code changes — just update the endpoint:

cache:
  type: redis
  host: valkey-master.hyperplane-valkey.svc.cluster.local
  port: 6379

Security Basics

Master key

Store in a Kubernetes secret — never in values.yaml plain text
Rotate by updating the secret and restarting the deployment

Provider API keys

All provider keys must be in the Kubernetes secret, not the ConfigMap or values.yaml
Use OIDC/Workload Identity wherever possible to eliminate static key management

Network access

Do not expose LiteLLM publicly — use cluster-internal DNS only
If external access is needed, add authentication middleware (Istio AuthorizationPolicy) in front

Scaling

Scale horizontally — LiteLLM is stateless when backed by Redis
Use HPA on CPU/memory for traffic spikes

kubectl scale deployment/litellm -n hyperplane-litellm --replicas=3
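
For the HPA option, a manifest along the following lines could be applied instead of scaling by hand. This is a sketch under assumptions: it targets the deployment/litellm and namespace named in this knowledge base, uses the standard autoscaling/v2 API, and the replica bounds and 70% CPU target are illustrative values, not recommendations from the chart. If the Helm chart already ships its own autoscaling settings, prefer those.

```python
import json


def build_hpa_manifest(name: str, namespace: str, min_replicas: int,
                       max_replicas: int, cpu_percent: int) -> dict:
    """Return an autoscaling/v2 HorizontalPodAutoscaler for a Deployment."""
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }


# Illustrative numbers only; tune them to your traffic.
hpa = build_hpa_manifest("litellm", "hyperplane-litellm",
                         min_replicas=2, max_replicas=6, cpu_percent=70)
print(json.dumps(hpa, indent=2))
```

CPU-based utilization targets only work when the pods declare CPU requests, so confirm the deployment sets them before relying on this.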

Backup Strategy

Back up the LiteLLM PostgreSQL database if STORE_MODEL_IN_DB is enabled
Keep a Git-versioned copy of values.yaml and litellmConfig as source of truth
Back up provider-key Kubernetes secrets to a secrets manager

Troubleshooting & FAQ

Use this page during live debugging. Format: Problem -> What to check -> Fix.

Deployment Issues

Pod stuck in CrashLoopBackOff

Check: kubectl logs deployment/litellm -n hyperplane-litellm
Common causes: DATABASE_URL incorrect, Redis not reachable, missing env vars
Fix: verify all required env vars are in the Kubernetes secret and the secret is referenced in envFrom

Provider API key error on startup

Check: logs show AuthenticationError or No API key
Fix: confirm the key is in the secret and the env var name matches what litellmConfig references

helm upgrade fails or times out

Check: kubectl get pods -n hyperplane-litellm during upgrade
Fix: run from local kubeconfig with full namespace access. Increase --timeout if image pull is slow

Connection Issues

App cannot reach LiteLLM

Check: use cluster-internal DNS: litellm.hyperplane-litellm.svc.cluster.local:4000
Fix: confirm service is up (kubectl get svc -n hyperplane-litellm) and port is 4000

LiteLLM cannot reach the provider

Check: kubectl exec into pod and test: curl https://api.openai.com/v1/models
Fix: check egress network policy. Cloud providers need outbound HTTPS from the pod

Redis connection errors in logs

Check: redis-master or valkey-master pod is running in the expected namespace
Fix: verify cache.host and cache.port in values.yaml match the actual service DNS

Model and Routing Issues

Model not found error

Check: model name in the request must match a model_name in litellmConfig.model_list exactly
Fix: call GET /v1/models to see available aliases. Update the app to use the correct alias

Fallback not triggering

Check: router_settings.fallbacks must include the failing model as a key
Fix: update fallbacks config and rollout restart. Confirm fallback model alias exists in model_list
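
The two checks above (the failing model must be a key in fallbacks, and every alias must exist in model_list) can be automated before a rollout. The sketch below assumes fallbacks is a list of single-key mappings from an alias to a list of fallback aliases; that shape, the function name check_fallbacks, and the sample config are assumptions for illustration, so adapt them to your actual litellmConfig.

```python
def check_fallbacks(model_list: list[dict], fallbacks: list[dict],
                    failing_model: str) -> list[str]:
    """Return human-readable problems with the fallback config (empty if none)."""
    aliases = {entry["model_name"] for entry in model_list}
    configured: dict[str, list[str]] = {}
    for mapping in fallbacks:  # merge the list of mappings into one dict
        configured.update(mapping)

    problems = []
    if failing_model not in configured:
        problems.append(f"{failing_model!r} is not a key in fallbacks")
    for source, targets in configured.items():
        for alias in [source, *targets]:
            if alias not in aliases:
                problems.append(f"{alias!r} is not defined in model_list")
    return problems


# Hypothetical config with two mistakes: the failing model has no fallback
# entry, and the fallback target alias is missing from model_list.
model_list = [{"model_name": "gpt-4o"}, {"model_name": "claude-3-5-sonnet"}]
fallbacks = [{"gpt-4o": ["gemini-flash"]}]
problems = check_fallbacks(model_list, fallbacks, "claude-3-5-sonnet")
for problem in problems:
    print(problem)
```

An empty result means the configuration passes both checks; it does not prove the fallback provider itself is reachable.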

Rate limit errors (429) even with quota remaining

Check: the virtual key may have a max_budget or a rate limit (rpm_limit, tpm_limit) set
Fix: inspect with GET /key/info. Increase the budget or rate limit if appropriate

Observability Issues

Langfuse traces not appearing

Check: litellm_settings.success_callback includes "langfuse" in litellmConfig
Check: Langfuse endpoint and keys are correct and in the Kubernetes secret
Fix: check logs for Langfuse callback errors. Confirm pod can reach the Langfuse service

Prometheus metrics endpoint returns nothing

Check: both success_callback and failure_callback must include "prometheus"
Fix: add both callbacks and rollout restart. Then: curl http://localhost:4000/metrics | grep litellm

Frequently Asked Questions

Q: How do I add a new model without a full redeploy?

Update the litellm-config ConfigMap directly and run kubectl rollout restart deployment/litellm. If STORE_MODEL_IN_DB is enabled, you can add models via the API with no restart.

Q: Can I use LiteLLM with a local Ollama instance?

Yes. Add an entry to model_list with model: openai/<model-name>, api_base pointing to the Ollama cluster service, and api_key: "none". LiteLLM treats it as an OpenAI-compatible endpoint.

Q: How do I track which team is spending the most?

Issue a separate virtual key per team with metadata tagging. Call GET /spend/keys to see spend per key. With Langfuse enabled, filter traces by metadata fields like team.
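
Once per-key spend is available, rolling it up by team is a small aggregation. The sketch below assumes each key record carries a spend number and a metadata.team tag; that record shape, the function name spend_by_team, and the sample figures are assumptions for illustration, so check the actual response of your spend endpoint.

```python
from collections import defaultdict


def spend_by_team(keys: list[dict]) -> dict[str, float]:
    """Sum spend per metadata.team, highest spender first."""
    totals: dict[str, float] = defaultdict(float)
    for record in keys:
        team = (record.get("metadata") or {}).get("team", "untagged")
        totals[team] += float(record.get("spend", 0.0))
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Made-up key records for illustration.
sample_keys = [
    {"spend": 12.5, "metadata": {"team": "risk-team"}},
    {"spend": 3.25, "metadata": {"team": "search-team"}},
    {"spend": 7.5, "metadata": {"team": "risk-team"}},
    {"spend": 1.0, "metadata": None},
]
report = spend_by_team(sample_keys)
for team, total in report.items():
    print(f"{team}: {total:.2f}")
```

Keys without a team tag are grouped as "untagged", which also makes missing metadata easy to spot.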

Q: What happens if my primary provider goes down?

If router_settings.fallbacks is configured, LiteLLM automatically retries on the fallback model. num_retries sets how many attempts are made, and retry_after sets the minimum wait between attempts, before an error is returned.

Q: How do I upgrade LiteLLM to a newer version?

Update image.tag in values.yaml, re-run helm dependency update if needed, and redeploy with helm upgrade. Test in staging first and review the LiteLLM changelog for breaking config key name changes.
