What is Kestra, and How to Deploy It in an Enterprise Data Stack?

Last updated on May 12, 2026

What is Kestra?

Kestra is an open-source orchestrator for both scheduled and event-driven workflows. It applies Infrastructure as Code practices to data, process, and microservice orchestration, so a dependable workflow can be defined in a few lines of YAML. The declarative YAML interface and the UI make it accessible to both developers and business users, which supports collaborative workflow authoring. Workflows can be changed through the UI or the API, and the platform ships with a set of developer tools for building and managing complex workflows.

Kestra Knowledge Base

Kestra Overview

Kestra is an open-source workflow orchestration platform that lets teams automate data pipelines, ETL jobs, AI workflows, and infrastructure tasks using simple YAML-based definitions. Think of it as the glue layer that coordinates everything: it runs your scripts, calls your APIs, moves your data, and responds to events — all without custom glue code.

In Shakudo environments, Kestra is deployed as a managed stack component. It sits between your data sources, compute services, and downstream systems, giving your team a single place to author, schedule, and monitor all automated workflows.

What Problem Does Kestra Solve?

Without an orchestrator, teams write ad-hoc cron jobs, one-off scripts, and manual pipelines that are hard to monitor, retry, or hand off. Kestra replaces that fragmentation with a single platform where every workflow is versioned, observable, and recoverable.

  • Replaces cron jobs and manual scripts with tracked, retryable workflows
  • Gives a UI to monitor all executions, logs, and errors in one place
  • Supports both scheduled (time-based) and event-driven (trigger-based) execution
  • Integrates with 1,000+ tools via plugins — databases, cloud services, APIs, ML frameworks

How Kestra Fits in the Shakudo Stack

Kestra typically sits in the orchestration layer alongside or as an alternative to Airflow. It connects to:

  • MinIO: reads and writes files to object storage as part of pipeline steps
  • PostgreSQL / Supabase: stores workflow state and runs SQL tasks
  • External APIs and services: any HTTP endpoint or plugin-supported tool
  • Containers and scripts: runs Python, Bash, or containerized tasks in isolated environments
  • LLM services (via LiteLLM or Ollama): orchestrates AI-enrichment steps in data pipelines

Key Concepts

  • Flow: the core unit in Kestra. A flow is a YAML file that defines tasks, triggers, and execution order.
  • Task: a single unit of work inside a flow (run a script, call an API, query a database).
  • Trigger: what starts a flow. Triggers can be time-based (schedule) or event-based (webhook, file arrival).
  • Namespace: a folder-like grouping for flows. Used to separate environments (dev, prod) or teams.
  • Execution: a single run of a flow. Executions have logs, inputs, outputs, and status.
  • Plugin: an extension that adds new task types. Plugins exist for AWS, GCP, Postgres, HTTP, Python, and hundreds more.
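
To make these terms concrete, here is a minimal sketch of a flow that combines them. The flow id, namespace, cron expression, and webhook key are hypothetical placeholders; the task and trigger types are from Kestra's core plugin set.

```yaml
# Hypothetical flow: id, namespace, cron expression, and webhook key are placeholders.
id: hello-kestra
namespace: dev.examples

tasks:
  # Task: a single unit of work, here a log message
  - id: say-hello
    type: io.kestra.plugin.core.log.Log
    message: "Hello from execution {{ execution.id }}"

triggers:
  # Time-based trigger: starts an execution every day at 06:00
  - id: every-morning
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * *"
  # Event-based trigger: starts an execution when the webhook URL for this key is called
  - id: on-webhook
    type: io.kestra.plugin.core.trigger.Webhook
    key: replace-with-a-random-key
```

Each firing of either trigger, or a manual run from the UI, produces one execution with its own logs and status.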

What Kestra Is Not

  • Not a data transformation engine (use dbt or Pandas inside Kestra tasks for that).
  • Not a feature store or model registry (use MLflow alongside Kestra for ML lifecycle management).
  • Not a streaming platform (use Kafka or Flink for real-time streams; Kestra handles batch and near-real-time events).

Deployment Runbook

Helm-based deployment of Kestra v1.3.2 on a Shakudo-managed Kubernetes cluster. Uses the shared cluster MinIO for storage and a bundled PostgreSQL instance for state.

What Has Worked in Practice

  • Deploy from a local kubeconfig with full namespace admin access — CI pipeline service accounts fail due to cross-namespace secret restrictions
  • Use the Shakudo monorepo Helm chart (branch kestra_upgrade_v1.3.2) not the upstream open-source chart
  • Use shared cluster MinIO (hyperplane-minio namespace) — do not deploy a per-Kestra MinIO instance
  • Set basicAuth username to email format (e.g. admin@example.com); plain usernames are rejected in v1.3.x
  • Include both datasources.default and datasources.postgres in values.yaml — Micronaut 4 requires both or silently fails

Required Inputs

Confirm before starting:

  • Local kubeconfig with full namespace admin access to hyperplane-kestra
  • GitHub PAT for cloning the Shakudo monorepo (password auth no longer works)
  • MinIO root credentials (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD)
  • Chosen kestra-<env> bucket name and service account key
  • Database password for the bundled PostgreSQL
  • basicAuth password: must have uppercase + lowercase + digit + 8+ chars
  • Customer domain for the basicAuth username (e.g. admin@example.com, with the customer's domain in place of example.com)

Step 1 — Clone the Helm Chart

Step 2 — Pull Chart Dependencies

helm dependency update .
ls charts/   # expect: postgresql-*.tgz, minio-*.tgz

Step 3 — Create MinIO Bucket

Step 4 — Configure values.yaml

Critical sections — do not omit any of these:
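
As a rough sketch of what these sections can look like in the rendered Kestra configuration, assuming upstream Kestra and Micronaut key names: the PostgreSQL service name, database name, database user, and credentials below are placeholders, and the Shakudo chart's values.yaml may nest these keys differently.

```yaml
# Sketch only: host names, database name, user, and credentials are placeholders.
# Prefer injecting passwords from a Kubernetes secret over writing them here.
datasources:
  default:
    url: jdbc:postgresql://kestra-postgres:5432/kestra   # assumed service and database name
    driverClassName: org.postgresql.Driver
    username: kestra
    password: "<db-password>"
  postgres:
    url: jdbc:postgresql://kestra-postgres:5432/kestra   # same database as above
    driverClassName: org.postgresql.Driver
    username: kestra
    password: "<db-password>"

kestra:
  repository:
    type: postgres
  queue:
    type: postgres
  storage:
    type: minio
    minio:
      endpoint: minio.hyperplane-minio.svc.cluster.local
      port: 9000
      bucket: "kestra-<env>"
      accessKey: "<service-account-access-key>"
      secretKey: "<service-account-secret-key>"
  server:
    basicAuth:
      username: admin@example.com
      password: "<basic-auth-password>"
```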

Step 5 — Deploy

helm upgrade kestra . \
  --install \
  --namespace hyperplane-kestra \
  --values values.yaml \
  --timeout 10m \
  --wait

Step 6 — Verify Deployment Health

Expected pod state:

  • kestra-postgres-0 — 2/2 Running
  • kestra-standalone-* — 2/2 Running
  • kestra-post-install-* — 1/2 Completed (normal — Istio sidecar stays up after init job finishes)

Step 7 — ConfigMap Patch (config changes without full redeploy)

Use this pattern for credential updates, AI Copilot addition, or endpoint changes:


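📌 Always quote the heredoc delimiter (<<'YAML', not <<YAML) when writing YAML heredocs in a terminal. With a quoted delimiter the shell passes the body through literally; with the unquoted form it still expands $, backticks, and backslashes inside the YAML, and the unquoted form has caused a dquote> hang when the YAML contains double-quoted strings.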

Safe Rollback

# View release history first to confirm which revision you are returning to
helm history kestra -n hyperplane-kestra

# Roll back to the previous Helm release
helm rollback kestra -n hyperplane-kestra

Post-Deployment Checklist

  • All pods Running or Completed (no CrashLoopBackOff)
  • Pod readiness 2/2 (Istio sidecar + app)
  • Zero Warning events in namespace
  • MinIO health returns 200 OK
  • Kestra API responds on /api/v1/flows
  • Login works with configured credentials
  • Test flow created and executed end-to-end
  • ConfigMap has all three required sections: kestra.*, datasources.default, datasources.postgres

Administration & Best Practices

This page covers how to keep a Kestra deployment stable, organized, and secure in a production Shakudo environment.

Workflow Organisation (Namespaces)

Namespaces in Kestra are like folders. Use them to separate environments, teams, or workflow domains:

  • production: live, scheduled workflows
  • staging: test new flows before promoting to production
  • dev: individual developer workflows
  • data-team, ops, ai: domain-based grouping within production

Use lowercase with hyphens (e.g. production.data-ingestion). Avoid deep nesting.

Version Control (Git Integration)

Store flow YAML files in a Git repository and sync them to Kestra for change history, peer review, and rollback.

  • Keep all flow YAML files under a flows/ directory in the monorepo or a dedicated flows repo
  • Use a CI pipeline to push flows to Kestra via the API on merge to main
  • Tag each flow with a version comment in the YAML for audit purposes

# Push a flow via API (add -u '<username>:<password>' when basicAuth is enabled)
curl -X POST http://localhost:8080/api/v1/flows \
  -H "Content-Type: application/x-yaml" \
  --data-binary @my-flow.yaml
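
The CI step described above could be sketched as a GitHub Actions workflow. This is an illustration under several assumptions: the repository is hosted on GitHub, the runner can reach the Kestra API, and the secret names (KESTRA_URL, KESTRA_USER, KESTRA_PASSWORD) are hypothetical. It reuses the same POST call, so it only covers first-time creation of a flow; updating a flow that already exists is assumed to need a different API call.

```yaml
# Hypothetical workflow file: .github/workflows/deploy-flows.yaml
name: deploy-flows

on:
  push:
    branches: [main]
    paths: ["flows/**"]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Push each flow file to Kestra
        env:
          # Secret names are placeholders; define them in the repository settings
          KESTRA_URL: ${{ secrets.KESTRA_URL }}
          KESTRA_USER: ${{ secrets.KESTRA_USER }}
          KESTRA_PASSWORD: ${{ secrets.KESTRA_PASSWORD }}
        run: |
          # --fail makes curl exit non-zero on HTTP errors so the job fails visibly
          for f in flows/*.yaml; do
            curl --fail -u "$KESTRA_USER:$KESTRA_PASSWORD" \
              -X POST "$KESTRA_URL/api/v1/flows" \
              -H "Content-Type: application/x-yaml" \
              --data-binary "@$f"
          done
```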

Retry Policies and Error Handling

Always define retry behaviour on tasks that call external services (APIs, databases, SFTP):

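tasks:
 - id: call-api
   type: io.kestra.plugin.core.http.Request
   uri: https://api.example.com/data
   retry:
     type: exponential
     interval: PT5S       # delay before the first retry (illustrative value)
     maxInterval: PT2M    # upper bound on the delay between retries (illustrative value)
     delayFactor: 2.0     # each delay is multiplied by this factor
     maxAttempts: 3
     maxDuration: PT10M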

For flow-level error handling, use an errors block to run cleanup or notification tasks:

errors:
 - id: notify-failure
   type: io.kestra.plugin.core.log.Log
   message: "Flow {{flow.id}} failed on execution {{execution.id}}"

Scaling Workers

  • Increase worker thread count in values.yaml for higher throughput
  • For high volume, move to a distributed deployment (separate webserver + workers) — contact the Shakudo team
  • Stagger cron expressions to avoid hundreds of flows starting at the same second
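
As an illustration of staggering, the sketch below shows two hourly flows whose schedules are offset by a few minutes instead of both firing at minute 0. The flow ids, namespace, and log messages are hypothetical.

```yaml
# Flow A (hypothetical): fires at minute 0 of every hour
id: ingest-orders
namespace: production.data-ingestion
tasks:
  - id: run
    type: io.kestra.plugin.core.log.Log
    message: "ingesting orders"
triggers:
  - id: hourly
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 * * * *"
---
# Flow B (hypothetical): same hourly cadence, shifted to minute 7
id: ingest-customers
namespace: production.data-ingestion
tasks:
  - id: run
    type: io.kestra.plugin.core.log.Log
    message: "ingesting customers"
triggers:
  - id: hourly
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "7 * * * *"
```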

Security Basics

BasicAuth credentials

  • Username must be email format, e.g. admin@example.com
  • Password must meet complexity: uppercase + lowercase + digit + 8+ chars
  • Store credentials in a Kubernetes secret — do not hardcode plain text in values.yaml
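
A Kubernetes secret for these credentials might look like the sketch below. The secret name and key names are hypothetical, and how the Shakudo chart references the secret is not covered here; check the chart before relying on this layout.

```yaml
# Hypothetical secret: name and key names are placeholders
apiVersion: v1
kind: Secret
metadata:
  name: kestra-basic-auth
  namespace: hyperplane-kestra
type: Opaque
stringData:
  # stringData accepts plain text; Kubernetes stores it base64-encoded
  username: admin@example.com
  password: "<password with uppercase, lowercase, digit, 8+ chars>"
```

Keep the manifest itself out of Git, or create the secret directly with kubectl, so the password is not committed in plain text.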

RBAC

The open-source version uses basicAuth for a single admin account. Multi-user RBAC requires Kestra Enterprise. Confirm with the Shakudo team if this is needed.

Plugin blacklist

The ICI deployment blacklists the Docker plugin to prevent arbitrary container execution:

kestra:
 plugins:
   blacklist: ["io.kestra.plugin.docker.*"]

ConfigMap Backup Before Changes

kubectl get cm kestra-config -n hyperplane-kestra -o yaml > backup-$(date +%Y%m%d).yaml

Backup Strategy

  • PostgreSQL: schedule a pg_dump and upload to MinIO or off-cluster storage
  • MinIO: include the kestra-<env> bucket in the backup policy
  • Flows: export all flows periodically via the API if not already in Git

Troubleshooting Guide

Pod stuck in CrashLoopBackOff after install

  • Check: kubectl logs -n hyperplane-kestra deployment/kestra-standalone -c kestra-standalone
  • Look for DataSource or Micronaut errors -- usually missing datasources.default block
  • Fix: add all three datasource sections in values.yaml: kestra.datasources.postgres, datasources.default, datasources.postgres

Login fails or password rejected

  • Check: username must be email format (e.g. admin@example.com)
  • Check: password must have uppercase + lowercase + digit + 8+ chars
  • Fix: update credentials in ConfigMap using Step 7 and rollout restart

helm upgrade hangs or fails with RBAC error

  • Check: CI service accounts lack cross-namespace secret access
  • Fix: run helm upgrade from a local kubeconfig with full namespace admin access

Post-install job shows 1/2 Completed

  • This is normal with Istio enabled -- the sidecar stays running after the init job completes
  • No action needed unless the init container shows Error or CrashLoopBackOff

Connection Issues

Kestra cannot connect to MinIO

  • Check: endpoint must use cluster-internal DNS (minio.hyperplane-minio.svc.cluster.local:9000)
  • Check: run the Step 6 MinIO health check to verify reachability
  • Fix: re-run Step 3 if the bucket or service account is missing

Kestra cannot connect to PostgreSQL

  • Check: kestra-postgres-0 must be Running
  • Check: passwords in ConfigMap datasources.default and datasources.postgres must be correct
  • Fix: update ConfigMap with correct credentials and rollout restart

Workflow Issues

Flow not triggering on schedule

  • Check: validate cron expression at crontab.guru
  • Check: flow must be enabled in the UI
  • Check: Kestra must have been running at the scheduled time -- missed runs are not replayed
  • Fix: trigger manually via UI or API to confirm the flow itself works, then re-check trigger

Task fails with plugin not found

  • Check: copy the plugin type string exactly from the Kestra plugin docs
  • Check: plugin must not be on the blacklist in values.yaml
  • Fix: remove from blacklist or use an alternative plugin type

Execution stuck in Running state

  • Check: all pods in hyperplane-kestra namespace must be healthy
  • Check: view logs for the stuck execution in the Kestra UI
  • Fix: check the external resource the task is waiting on; rollout restart if pods are unhealthy

dquote> prompt when applying ConfigMap YAML

  • Cause: using <<EOF (unquoted) with YAML containing double-quoted strings -- shell misinterprets them
  • Fix: use single-quoted heredoc <<'YAML' (see Step 7 in the Deployment Runbook)

Performance Issues

Executions are slow or queuing

  • Check: default worker thread count is low for high-volume environments
  • Check: PostgreSQL pod CPU/memory -- it is the shared queue backend
  • Fix: increase workerThread in ConfigMap. For heavy workloads, consider distributed mode

Frequently Asked Questions

Q: How do I add a new plugin?

Kestra ships with many plugins bundled. To add a new one, raise a request with the Shakudo team to include it in the next custom Kestra image build.

Q: How do I promote a flow from staging to production?

Change the namespace in the flow YAML from staging to production and re-create the flow in Kestra. The original staging flow remains until manually deleted.

Q: Can I run Python scripts inside Kestra tasks?

Yes. Use io.kestra.plugin.scripts.python.Script for inline scripts. Ensure required Python libraries are available in the Kestra image.
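
A minimal sketch of such a task follows. The flow id and namespace are hypothetical, and the example assumes the Process task runner so that the script runs with the Python interpreter and libraries already present in the Kestra image; it uses only the standard library.

```yaml
# Hypothetical flow: id and namespace are placeholders
id: python-example
namespace: dev.examples

tasks:
  - id: summarize
    type: io.kestra.plugin.scripts.python.Script
    # Assumption: run as a local process inside the Kestra container rather than in a separate container
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    script: |
      import statistics

      values = [3, 5, 8]
      print(f"mean={statistics.mean(values)}")
```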

Q: What happens if a scheduled flow misses its window because Kestra was down?

Kestra does not replay missed schedules by default. That execution is skipped. Trigger a backfill manually via UI or API if needed. Implement external monitoring if missed executions are critical.

Q: How do I update Kestra to a newer version?

Switch to the new branch in the Shakudo monorepo, update the image tag in values.yaml, re-run helm dependency update, and redeploy with helm upgrade. Always back up the ConfigMap, test in staging first, and review the Kestra changelog for breaking changes.

Why is Kestra better on Shakudo?

Core Shakudo Features

Own Your AI

Keep data sovereign, protect IP, and avoid vendor lock-in with infra-agnostic deployments.

Faster Time-to-Value

Pre-built templates and automated DevOps accelerate time-to-value.

Flexible with Experts

Operating system and dedicated support ensure seamless adoption of the latest and greatest tools.