What is Langfuse, and How to Deploy It in an Enterprise Data Stack?

Last updated on May 12, 2026

What is Langfuse?

Langfuse is an open-source LLM engineering platform that's gaining traction among tech-savvy teams for its robust observability and tracing capabilities. Its core differentiator lies in its production-ready, asynchronous architecture that doesn't compromise application performance. Organizations appreciate Langfuse for its ability to streamline debugging, analysis, and iteration of LLM applications, offering features like model-based evaluations, user feedback collection, and manual annotations. Teams using Langfuse report significant improvements in application latency and quality, with one European fintech halving their application's response time through Langfuse's tracing insights. The platform's open-core model, extensive integrations, and focus on data ownership also make it an attractive choice for enterprises looking to maintain control over their LLM infrastructure while benefiting from advanced observability tools.

Why is Langfuse better on Shakudo?

While Langfuse is powerful on its own, deploying it on Shakudo's data and AI operating system takes it to the next level.

By running Langfuse on Shakudo, you get the best of both worlds: Langfuse's cutting-edge LLM tools and Shakudo's seamless deployment and integration capabilities. Instead of wrestling with complex setups or worrying about security, you can have Langfuse up and running in minutes with just a few clicks. Shakudo's automated DevOps ensures your Langfuse instance is always optimized and secure, while its deep integration ecosystem allows you to effortlessly connect Langfuse with your existing AI stack. This combination not only saves you time and resources but also provides a level of operational efficiency and scalability that's hard to achieve with other solutions or self-deployment.

Langfuse Knowledge Base

Langfuse Overview

Langfuse is an open-source observability and analytics platform purpose-built for LLM applications. It gives your team a single place to see every prompt, response, trace, latency number, token count, and cost — across all your AI-powered tools and services, from production runs to debugging sessions.

In Shakudo environments, Langfuse sits in the observability layer, downstream of LiteLLM and alongside applications like Dify and AgentFlow. Every model call that flows through LiteLLM can be logged to Langfuse automatically, giving you a live trace of exactly what your AI stack is doing.

What Problem Does Langfuse Solve?

LLM applications are hard to debug and expensive to operate without visibility. When something goes wrong — a bad response, a spike in latency, unexpected token costs — you need to know which prompt caused it, which model handled it, and how long it took. Without Langfuse, that information lives in scattered logs or does not exist at all.

Captures full traces: prompt, model, response, latency, token count, and cost in one view
Lets you compare prompt versions and evaluate quality over time
Shows which models and agents are most expensive or slow
Enables your team to replay and debug individual LLM calls
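
As a rough illustration of the cost view described above, the sketch below totals cost per model from a handful of call records. The record layout (`model`, `cost_usd`) and the model names are hypothetical and chosen only for the example; Langfuse surfaces this kind of breakdown itself, so this is a way to reason about the numbers rather than something you need to build.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cost_by_model(calls: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Sum cost per model and return the models sorted from most to least expensive."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call["model"]] += call["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical call records, shaped the way a trace export might look.
calls = [
    {"model": "model-a", "cost_usd": 0.5},
    {"model": "model-b", "cost_usd": 0.25},
    {"model": "model-a", "cost_usd": 0.125},
]
print(cost_by_model(calls))
```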

How Langfuse Fits in the Shakudo Stack

Langfuse is the observability layer for all AI activity in the environment:

LiteLLM logs every model call to Langfuse via a success/failure callback — no app code changes needed
Dify can send traces to Langfuse directly via its built-in Langfuse integration
Custom Python or JavaScript apps use the Langfuse SDK to instrument their own LLM calls
LangChain and LlamaIndex apps work with Langfuse via native callback handlers
Langfuse stores trace data in PostgreSQL and exports artifacts to MinIO

Key Concepts

Trace: a complete record of one logical operation — e.g. a user query from start to finish, including all LLM calls, tool uses, and sub-steps inside it.
Span: a single step within a trace (one LLM call, one function, one retrieval). Spans can nest.
Observation: the generic term for any step recorded inside a trace, such as a span, an LLM generation, or an event. Each observation carries its own input, output, latency, token count, and cost.
Session: a group of traces belonging to the same user conversation or workflow run.
Score: a human or automated evaluation attached to a trace or span (e.g. pass/fail, 1-5 rating).
Prompt Management: versioned prompts stored in Langfuse and linked to the traces that use them.
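
To make the relationships between these concepts concrete, here is a minimal sketch in plain Python. It is not the Langfuse SDK or its storage schema: the class names, fields, and helper methods are hypothetical and exist only to show how spans nest inside a trace and how scores and a session ID attach to it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Score:
    name: str
    value: float  # e.g. a 1-5 rating, or 0/1 for pass/fail


@dataclass
class Span:
    name: str
    latency_ms: float
    total_tokens: int = 0
    cost_usd: float = 0.0
    children: List["Span"] = field(default_factory=list)  # spans can nest

    def walk(self) -> Iterator["Span"]:
        """Yield this span and every nested span, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Trace:
    name: str
    session_id: Optional[str] = None  # traces sharing this ID form a session
    spans: List[Span] = field(default_factory=list)
    scores: List[Score] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(s.total_tokens for top in self.spans for s in top.walk())

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for top in self.spans for s in top.walk())


# One user query: a pipeline span wrapping a retrieval step and an LLM call.
retrieval = Span("retrieve-docs", latency_ms=40.0)
llm_call = Span("generate-answer", latency_ms=900.0, total_tokens=350, cost_usd=0.25)
pipeline = Span("rag-pipeline", latency_ms=950.0, children=[retrieval, llm_call])

trace = Trace("user-query", session_id="session-abc-123", spans=[pipeline])
trace.scores.append(Score("helpfulness", 4))
print(trace.total_tokens(), trace.total_cost_usd())
```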

What Langfuse Is Not

Not an LLM gateway. It does not route model calls — use LiteLLM for that.
Not a log aggregation platform. It is purpose-built for LLM traces, not general application logs.
Not a feature store or model registry. It tracks prompt/response quality, not model weights.

Administration & Best Practices

This page covers how to keep Langfuse stable, organised, and cost-efficient in a production Shakudo environment.

Project and API Key Organisation

Separate observability data by creating one Langfuse project per environment:

production: all live traffic — strict access
staging: pre-production validation
dev: developer experiments and testing

Create one API key pair per application or team so usage is attributable and keys can be rotated independently. Revoke old keys via Settings > API Keys.

Tagging Traces for Observability

Always include user_id, session_id, and metadata on traces to enable filtering, cost attribution, and debugging:

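# Assumes the Langfuse Python SDK v2, where the client reads LANGFUSE_PUBLIC_KEY,
# LANGFUSE_SECRET_KEY, and LANGFUSE_HOST from the environment.
from langfuse import Langfuse

client = Langfuse()

trace = client.trace(
    name="workflow-name",
    user_id="[email protected]",
    session_id="session-abc-123",
    metadata={"team": "risk", "env": "prod", "version": "v2.1"},
)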

Without these fields, you cannot filter traces by who or what triggered them.
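
One way to enforce this convention is a small helper that refuses to build trace arguments when a required field is missing. The sketch below is an assumption about how a team might do it, not part of the Langfuse SDK: the function name and the list of required metadata keys (taken from the example above) are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical team convention, mirroring the metadata keys in the example above.
REQUIRED_METADATA_KEYS = ("team", "env", "version")


def build_trace_kwargs(
    name: str, user_id: str, session_id: str, metadata: Mapping[str, Any]
) -> Dict[str, Any]:
    """Return keyword arguments for a trace call, or raise if a field is missing."""
    if not user_id or not session_id:
        raise ValueError("user_id and session_id are required")
    missing = [key for key in REQUIRED_METADATA_KEYS if not metadata.get(key)]
    if missing:
        raise ValueError(f"metadata is missing keys: {', '.join(missing)}")
    return {
        "name": name,
        "user_id": user_id,
        "session_id": session_id,
        "metadata": dict(metadata),
    }


kwargs = build_trace_kwargs(
    "workflow-name",
    "user-123",  # placeholder user identifier
    "session-abc-123",
    {"team": "risk", "env": "prod", "version": "v2.1"},
)
print(kwargs)
```

The returned dictionary can then be unpacked into the trace call, for example `client.trace(**kwargs)`.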

Data Retention and Storage Management

Langfuse stores trace data in PostgreSQL and media files in MinIO. Both grow over time:

Set a retention policy on the PostgreSQL langfuse database to delete old traces
Configure TTL on the MinIO bucket to auto-expire old files
Langfuse v3 supports configurable data retention — check Settings > Data Retention in the UI

# Monitor PostgreSQL DB size
kubectl exec -it langfuse-postgresql-0 -n hyperplane-langfuse -- \
  psql -U langfuse -c "SELECT pg_size_pretty(pg_database_size('langfuse'));"

# Monitor MinIO bucket size
mc du shakudo-minio/langfuse-<env>
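
The logic behind a retention policy is simple: pick a cutoff date and treat anything older as expired. The sketch below shows that calculation in isolation, assuming you already have trace IDs and creation timestamps in hand. The function names and the 90-day window are illustrative only, and the code does not touch PostgreSQL or MinIO.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the timestamp before which traces count as expired."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return now - timedelta(days=retention_days)


def expired_trace_ids(
    traces: Iterable[Tuple[str, datetime]], now: datetime, retention_days: int
) -> List[str]:
    """Return the IDs of traces created before the retention cutoff."""
    cutoff = retention_cutoff(now, retention_days)
    return [trace_id for trace_id, created_at in traces if created_at < cutoff]


now = datetime(2026, 5, 12, tzinfo=timezone.utc)
traces = [
    ("trace-old", datetime(2026, 1, 1, tzinfo=timezone.utc)),
    ("trace-new", datetime(2026, 5, 1, tzinfo=timezone.utc)),
]
print(expired_trace_ids(traces, now, retention_days=90))  # prints ['trace-old']
```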

Keep-Alive Timeout (GCP/GKE Production Fix)

On GCP with Cloud Load Balancer, the default Node.js keep-alive timeout (5s) is shorter than the load balancer idle timeout (600s). The load balancer can then reuse a connection that Node.js has already closed, which surfaces as intermittent 502 errors.

Fix (already included in the Deployment Runbook):

LANGFUSE_HTTP_KEEPALIVE_TIMEOUT_MS: "620000"
LANGFUSE_HTTP_HEADERS_TIMEOUT_MS: "621000"

Always verify these are set after upgrades — they can be reset if values.yaml is regenerated from defaults.
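
A post-upgrade check for these settings can be automated. The sketch below is one possible way to do it, assuming the two variables have been read into a dictionary (for example from the rendered values or the pod environment). The function name is hypothetical, and the rule that the headers timeout should exceed the keep-alive timeout is inferred from the values shown above.

```python
from typing import Dict, List, Mapping

LB_IDLE_TIMEOUT_MS = 600_000  # load balancer idle timeout quoted above (600s)
KEEPALIVE_VAR = "LANGFUSE_HTTP_KEEPALIVE_TIMEOUT_MS"
HEADERS_VAR = "LANGFUSE_HTTP_HEADERS_TIMEOUT_MS"


def keepalive_problems(
    env: Mapping[str, str], lb_idle_timeout_ms: int = LB_IDLE_TIMEOUT_MS
) -> List[str]:
    """Return human-readable problems; an empty list means the settings look fine."""
    problems: List[str] = []
    values: Dict[str, int] = {}
    for name in (KEEPALIVE_VAR, HEADERS_VAR):
        raw = env.get(name)
        if raw is None:
            problems.append(f"{name} is not set")
            continue
        try:
            values[name] = int(raw)
        except ValueError:
            problems.append(f"{name} is not an integer: {raw!r}")

    keepalive = values.get(KEEPALIVE_VAR)
    headers = values.get(HEADERS_VAR)
    if keepalive is not None and keepalive <= lb_idle_timeout_ms:
        problems.append("keep-alive timeout must exceed the load balancer idle timeout")
    if keepalive is not None and headers is not None and headers <= keepalive:
        problems.append("headers timeout must exceed the keep-alive timeout")
    return problems


print(keepalive_problems({KEEPALIVE_VAR: "620000", HEADERS_VAR: "621000"}))  # prints []
```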

Upgrades

Update image.tag in values.yaml to the new Langfuse version and redeploy:

helm upgrade langfuse . \
  --namespace hyperplane-langfuse \
  --values values.yaml \
  --timeout 10m \
  --wait

Langfuse v3 runs database migrations automatically on startup. Always back up the PostgreSQL database before upgrading.

Security Basics

Secrets

Store NEXTAUTH_SECRET, SALT, database password, and MinIO credentials in a Kubernetes secret
Never put secrets in values.yaml or ConfigMaps in plain text

Access control

Langfuse v3 has built-in RBAC at the organisation and project level
Use project-level roles (Owner, Admin, Member, Viewer) to limit who can see traces
For SSO/OIDC integration, configure AUTH_CUSTOM_CLIENT_ID and related env vars

Network

Expose Langfuse only on cluster-internal DNS unless external access is explicitly needed
If external access is required, use an Istio VirtualService or ingress with authentication

Backup Strategy

PostgreSQL: schedule regular pg_dump and upload to MinIO or off-cluster storage
MinIO: include the langfuse-<env> bucket in the cluster backup policy
API keys: if the database is lost, all API keys are lost — keep a secure record of public keys

Troubleshooting & FAQ

Use this page during live debugging. Format: Problem -> What to check -> Fix.

Deployment Issues

Pod stuck in CrashLoopBackOff

Check: kubectl logs deployment/langfuse -n hyperplane-langfuse
Common causes: DATABASE_URL incorrect, missing NEXTAUTH_SECRET or SALT, PostgreSQL not ready
Fix: confirm all required env vars are in the Kubernetes secret and referenced in envFrom. Wait for postgresql pod to be Running before the main pod starts.
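
The missing-variable case is easy to rule out before digging into logs. The sketch below checks a mapping of environment variables for the three names listed above. The function name is hypothetical, and the list covers only the variables this page calls out, not every setting Langfuse may need.

```python
import os
from typing import Iterable, List, Mapping

# Variables named in the common causes above; not an exhaustive list.
REQUIRED_ENV_VARS = ("DATABASE_URL", "NEXTAUTH_SECRET", "SALT")


def missing_env_vars(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_ENV_VARS
) -> List[str]:
    """Return the required variable names that are absent or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# Run inside the container (or against an exported copy of its environment).
print(missing_env_vars(os.environ))
```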

UI loads but shows database error

Check: because the UI loads, the web service itself is reachable, so the failure is in a database migration or query
Fix: verify DATABASE_URL and DIRECT_URL both point to the correct PostgreSQL host and database. Check pod logs for Prisma migration errors.

502 or timeout errors on requests

Check: on GCP/GKE, whether the Node.js keep-alive timeout is shorter than the load balancer idle timeout
Fix: set LANGFUSE_HTTP_KEEPALIVE_TIMEOUT_MS=620000 and LANGFUSE_HTTP_HEADERS_TIMEOUT_MS=621000 in values.yaml and redeploy

Trace Ingestion Issues

Traces not appearing in the UI

Check: POST to /api/public/ingestion returns errors — look in the response body for specific failures
Check: verify the Authorization header uses the correct public/secret key pair for the project
Fix: re-run the Step 8 validation curl command. Confirm the keys match the project in the UI.
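
If it helps to see what such a request looks like, the sketch below builds (but does not send) a POST to /api/public/ingestion using only the Python standard library. It is an approximation rather than a copy of the runbook's curl command: it assumes HTTP Basic authentication with the public key as username and the secret key as password, and the batch payload shape reflects my understanding of the ingestion API, so check both against the Langfuse API reference. The host name and keys are placeholders.

```python
import base64
import json
import uuid
from datetime import datetime, timezone
from urllib.request import Request


def build_ingestion_request(
    host: str, public_key: str, secret_key: str, trace_name: str
) -> Request:
    """Build a POST request that would create one test trace."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    now = datetime.now(timezone.utc).isoformat()
    # Assumed event shape: one "trace-create" event inside a batch.
    payload = {
        "batch": [
            {
                "id": str(uuid.uuid4()),
                "type": "trace-create",
                "timestamp": now,
                "body": {"id": str(uuid.uuid4()), "name": trace_name, "timestamp": now},
            }
        ]
    }
    return Request(
        url=f"{host.rstrip('/')}/api/public/ingestion",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and keys; replace with the values for your project.
request = build_ingestion_request(
    "http://langfuse.example.internal:3000", "pk-test", "sk-test", "smoke-test"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. If the keys belong to a different project than the one open in the UI, the call can succeed while the trace appears somewhere you are not looking.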

LiteLLM traces not appearing in Langfuse

Check: LiteLLM litellmConfig includes success_callback and failure_callback with "langfuse"
Check: langfuse_host, langfuse_public_key, and langfuse_secret_key are set and correct
Fix: check LiteLLM pod logs for Langfuse callback errors. Confirm LiteLLM pod can reach the Langfuse service on port 3000.
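
The two checks above can be expressed as a small validation function. The sketch below assumes the relevant LiteLLM settings have already been loaded into a flat dictionary; the real nesting inside litellmConfig may differ, and the function name is hypothetical.

```python
from typing import Any, List, Mapping

CALLBACK_KEYS = ("success_callback", "failure_callback")
CREDENTIAL_KEYS = ("langfuse_host", "langfuse_public_key", "langfuse_secret_key")


def langfuse_config_problems(settings: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems: List[str] = []
    for key in CALLBACK_KEYS:
        if "langfuse" not in (settings.get(key) or []):
            problems.append(f"{key} does not include 'langfuse'")
    for key in CREDENTIAL_KEYS:
        if not settings.get(key):
            problems.append(f"{key} is not set")
    return problems


# Placeholder settings for illustration.
settings = {
    "success_callback": ["langfuse"],
    "failure_callback": [],
    "langfuse_host": "http://langfuse.example.internal:3000",
    "langfuse_public_key": "pk-test",
    "langfuse_secret_key": "sk-test",
}
print(langfuse_config_problems(settings))
```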

MinIO export error when downloading traces

Check: LANGFUSE_S3_ENDPOINT must use the cluster-internal MinIO DNS, not an external URL
Check: LANGFUSE_S3_FORCE_PATH_STYLE must be "true" for MinIO compatibility
Fix: update the MinIO env vars and rollout restart. Run the Step 3 MinIO health check to confirm connectivity.

Performance Issues

Langfuse UI is slow to load traces

Check: PostgreSQL pod CPU/memory — it is the primary data store
Check: number of traces in the database — very large tables slow down queries
Fix: add database indexes on frequently queried fields. Set a data retention policy to purge old traces.

Trace ingestion throughput is low

Check: Langfuse default deployment is single-replica — high-volume environments may need more replicas
Fix: scale the deployment: kubectl scale deployment/langfuse -n hyperplane-langfuse --replicas=2

Frequently Asked Questions

Q: How do I reset the admin password?

Langfuse uses email and password sign-in by default, and the self-service password reset depends on outbound email being configured. If email is not configured, reset the password directly in PostgreSQL by updating the users table. Contact the Shakudo team for a guided reset.

Q: Can I use Langfuse without LiteLLM?

Yes. Any application can send traces to Langfuse via the SDK, REST API, or framework callbacks (LangChain, LlamaIndex, Dify). LiteLLM integration is the most automatic path but is not required.

Q: How do I delete traces to manage storage?

Use the Langfuse UI: filter traces by date, project, or other criteria and delete them in bulk. Or use the API: DELETE /api/public/traces with filters. Set a retention policy in Settings > Data Retention to automate this.

Q: Will traces still appear if Langfuse is temporarily down?

If Langfuse is unavailable, LiteLLM and SDK clients will log errors but continue serving model requests. Traces generated during the outage are lost — there is no built-in queue or replay. For high-availability trace requirements, run Langfuse with multiple replicas.

Q: How do I upgrade Langfuse to a new version?

Update image.tag in values.yaml to the new version and run helm upgrade. Langfuse v3 handles database migrations automatically on startup. Always back up the PostgreSQL database first and check the Langfuse changelog for breaking changes.
