Kestra Overview
Kestra is an open-source workflow orchestration platform that lets teams automate data pipelines, ETL jobs, AI workflows, and infrastructure tasks using simple YAML-based definitions. Think of it as the glue layer that coordinates everything: it runs your scripts, calls your APIs, moves your data, and responds to events, without you writing custom integration code for each connection.
In Shakudo environments, Kestra is deployed as a managed stack component. It sits between your data sources, compute services, and downstream systems, giving your team a single place to author, schedule, and monitor all automated workflows.
What Problem Does Kestra Solve?
Without an orchestrator, teams write ad-hoc cron jobs, one-off scripts, and manual pipelines that are hard to monitor, retry, or hand off. Kestra replaces that fragmentation with a single platform where every workflow is versioned, observable, and recoverable.
- Replaces cron jobs and manual scripts with tracked, retryable workflows
- Gives a UI to monitor all executions, logs, and errors in one place
- Supports both scheduled (time-based) and event-driven (trigger-based) execution
- Integrates with 1,000+ tools via plugins — databases, cloud services, APIs, ML frameworks
How Kestra Fits in the Shakudo Stack
Kestra typically sits in the orchestration layer alongside or as an alternative to Airflow. It connects to:
- MinIO: reads and writes files to object storage as part of pipeline steps
- PostgreSQL / Supabase: stores workflow state and runs SQL tasks
- External APIs and services: any HTTP endpoint or plugin-supported tool
- Containers and scripts: runs Python, Bash, or containerized tasks in isolated environments
- LLM services (via LiteLLM or Ollama): orchestrates AI-enrichment steps in data pipelines
Key Concepts
- Flow: the core unit in Kestra. A flow is a YAML file that defines tasks, triggers, and execution order.
- Task: a single unit of work inside a flow (run a script, call an API, query a database).
- Trigger: what starts a flow. Triggers can be time-based (schedule) or event-based (webhook, file arrival).
- Namespace: a folder-like grouping for flows. Used to separate environments (dev, prod) or teams.
- Execution: a single run of a flow. Executions have logs, inputs, outputs, and status.
- Plugin: an extension that adds new task types. Plugins exist for AWS, GCP, Postgres, HTTP, Python, and hundreds more.
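To make these concepts concrete, the hypothetical Python helper below renders a minimal flow. `id` and `namespace` identify the flow, `tasks` holds one task built on the core `Log` plugin, and `triggers` holds one `Schedule` trigger. Each run the trigger starts is an execution. The plugin type names follow current Kestra conventions as we understand them, so check them against the plugin docs for your version.

```python
def minimal_flow(flow_id: str, namespace: str, cron: str) -> str:
    """Render a minimal Kestra flow YAML: one Log task started by one Schedule trigger."""
    # {{{{ ... }}}} in this f-string emits literal {{ ... }} Pebble expressions.
    return f"""\
id: {flow_id}
namespace: {namespace}

tasks:
  - id: say-hello
    type: io.kestra.plugin.core.log.Log
    message: "Hello from {{{{ flow.namespace }}}}.{{{{ flow.id }}}}"

triggers:
  - id: daily
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "{cron}"
"""
```

For example, `minimal_flow("hello-world", "dev", "0 9 * * *")` produces a flow in the `dev` namespace that logs a greeting every day at 09:00.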
What Kestra Is Not
- Not a data transformation engine (use dbt or Pandas inside Kestra tasks for that).
- Not a feature store or model registry (use MLflow alongside Kestra for ML lifecycle management).
- Not a streaming platform (use Kafka or Flink for real-time streams; Kestra handles batch and near-real-time events).
Deployment Runbook
Helm-based deployment of Kestra v1.3.2 on a Shakudo-managed Kubernetes cluster. Uses the shared cluster MinIO for storage and a bundled PostgreSQL instance for state.
What Has Worked in Practice
- Deploy from a local kubeconfig with full namespace admin access — CI pipeline service accounts fail due to cross-namespace secret restrictions
- Use the Shakudo monorepo Helm chart (branch kestra_upgrade_v1.3.2) not the upstream open-source chart
- Use shared cluster MinIO (hyperplane-minio namespace) — do not deploy a per-Kestra MinIO instance
- Set the basicAuth username in email format (e.g. <user>@<customer-domain>) — plain usernames are rejected in v1.3.x
- Include both datasources.default and datasources.postgres in values.yaml — Micronaut 4 requires both or silently fails
Required Inputs
Confirm before starting:
- Local kubeconfig with full namespace admin access to hyperplane-kestra
- GitHub PAT for cloning the Shakudo monorepo (password auth no longer works)
- MinIO root credentials (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD)
- Chosen kestra-<env> bucket name and service account key
- Database password for the bundled PostgreSQL
- basicAuth password: must have uppercase + lowercase + digit + 8+ chars
- Customer domain for the basicAuth username (e.g. <user>@<customer-domain>)
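Several of these inputs have format rules that are easy to get wrong. The Python sketch below checks them before you start. `preflight_errors` is a hypothetical helper, not part of Kestra or Shakudo tooling. The email regex is deliberately loose, and the bucket rule follows the common S3 naming constraints (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit), which MinIO broadly follows as well.

```python
import re

# Loose email shape check: v1.3.x rejects plain (non-email) usernames.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Common S3-style bucket rule: 3-63 chars of [a-z0-9.-], alphanumeric at both ends.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def preflight_errors(username: str, password: str, bucket: str) -> list[str]:
    """Return problems with the planned inputs; an empty list means they look OK."""
    errors = []
    if not EMAIL_RE.fullmatch(username):
        errors.append("basicAuth username must be an email address")
    # Password complexity: uppercase + lowercase + digit + at least 8 characters.
    if len(password) < 8:
        errors.append("password must be at least 8 characters")
    if not re.search(r"[A-Z]", password):
        errors.append("password needs an uppercase letter")
    if not re.search(r"[a-z]", password):
        errors.append("password needs a lowercase letter")
    if not re.search(r"[0-9]", password):
        errors.append("password needs a digit")
    # Bucket: valid S3/MinIO name first, then the kestra-<env> naming convention.
    if not BUCKET_RE.fullmatch(bucket) or ".." in bucket:
        errors.append("bucket name is not a valid S3/MinIO bucket name")
    elif not bucket.startswith("kestra-"):
        errors.append("bucket name should follow the kestra-<env> convention")
    return errors
```

An empty result, for example from `preflight_errors("admin@example.com", "Secret123", "kestra-prod")`, means the inputs pass these checks. It does not prove that the credentials actually work.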
Step 1 — Clone the Helm Chart
Step 2 — Pull Chart Dependencies
helm dependency update .
ls charts/ # expect: postgresql-*.tgz, minio-*.tgz
Step 3 — Create MinIO Bucket
Step 4 — Configure values.yaml
Critical sections — do not omit any of these:
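The sections that must be present (kestra.*, datasources.default and datasources.postgres, as listed in the post-deployment checklist) can be checked mechanically. The Python sketch below contains a hypothetical skeleton of the Kestra application config and a check for those three sections. The key names follow Kestra's documented config layout as we understand it. The Postgres hostname, database name, user and all `<placeholder>` values are assumptions. Where this block sits inside the Shakudo chart's values.yaml is not shown here.

```python
# Hypothetical skeleton of the Kestra application config (the part that lands in the ConfigMap).
# Hostname "kestra-postgres", database "kestra" and <placeholders> are illustrative assumptions.
POSTGRES_DS = {
    "url": "jdbc:postgresql://kestra-postgres:5432/kestra",
    "driverClassName": "org.postgresql.Driver",
    "username": "kestra",
    "password": "<db-password>",
}

EXAMPLE_CONFIG = {
    "kestra": {
        "repository": {"type": "postgres"},
        "queue": {"type": "postgres"},
        "storage": {
            "type": "minio",
            "minio": {
                "endpoint": "minio.hyperplane-minio.svc.cluster.local",  # shared cluster MinIO
                "port": 9000,
                "bucket": "kestra-<env>",
                "accessKey": "<access-key>",
                "secretKey": "<secret-key>",
            },
        },
    },
    # Micronaut 4 needs both entries; here both point at the same database (assumption).
    "datasources": {"default": dict(POSTGRES_DS), "postgres": dict(POSTGRES_DS)},
}

REQUIRED_PATHS = (("kestra",), ("datasources", "default"), ("datasources", "postgres"))


def missing_sections(config: dict) -> list[str]:
    """List the required sections that are absent from a parsed config dict."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

To check a live deployment, parse the YAML stored in the ConfigMap (for example with PyYAML's `yaml.safe_load`) and pass the resulting dict to `missing_sections`.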
Step 5 — Deploy
helm upgrade kestra . \
  --install \
  --namespace hyperplane-kestra \
  --values values.yaml \
  --timeout 10m \
  --wait
Step 6 — Verify Deployment Health
Expected pod state:
- kestra-postgres-0 — 2/2 Running
- kestra-standalone-* — 2/2 Running
- kestra-post-install-* — 1/2 Completed (normal — Istio sidecar stays up after init job finishes)
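The expected states above can be checked from `kubectl get pods -n hyperplane-kestra -o json` output. In the Python sketch below, `pod_problems` is a hypothetical helper. It treats a `kestra-post-install-*` pod as healthy once one of its containers has exited with code 0, because the Istio sidecar may keep running. Every container in the other pods must be ready, and any container in CrashLoopBackOff is reported. The pod-name prefixes come from the list above.

```python
def pod_problems(pods: dict) -> list[str]:
    """Return human-readable problems found in `kubectl get pods -o json` output."""
    problems = []
    for pod in pods.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # A container in CrashLoopBackOff is a failure regardless of pod type.
        for cs in containers:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                problems.append(f"{name}: {cs['name']} in CrashLoopBackOff")
        if name.startswith("kestra-post-install-"):
            # Init job: healthy once its job container exited 0; the sidecar may stay up.
            finished = any(
                (cs.get("state", {}).get("terminated") or {}).get("exitCode") == 0
                for cs in containers
            )
            if not finished:
                problems.append(f"{name}: post-install job has not completed successfully")
        else:
            # Long-running pods: phase Running and every container (app + sidecar) ready.
            ready = sum(1 for cs in containers if cs.get("ready"))
            phase = status.get("phase")
            if phase != "Running" or not containers or ready < len(containers):
                problems.append(f"{name}: {ready}/{len(containers)} containers ready, phase {phase}")
    return problems
```

Save the output with `kubectl get pods -n hyperplane-kestra -o json > pods.json`, then call `pod_problems(json.load(open("pods.json")))`. An empty list matches the expected state.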
Step 7 — ConfigMap Patch (config changes without full redeploy)
Use this pattern for credential updates, AI Copilot addition, or endpoint changes:
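📌 Always use a single-quoted heredoc delimiter, <<'YAML' (not <<YAML), when writing YAML heredocs in the terminal. The unquoted form can leave the shell hanging at a dquote> prompt when the YAML contains double-quoted strings, and quoting the delimiter also stops the shell from expanding $variables inside the YAML.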
Safe Rollback
# View history first
helm history kestra -n hyperplane-kestra
# Roll back to the previous Helm release
helm rollback kestra -n hyperplane-kestra
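Without a revision argument, `helm rollback` returns to the immediately preceding revision, and that revision can itself be a failed upgrade. The Python sketch below reads the JSON form of the history (`helm history kestra -n hyperplane-kestra -o json`) and picks the newest `superseded` revision that is older than the one currently `deployed`. `rollback_target` is a hypothetical helper, and treating "superseded" as "last known good" is an assumption you may want to refine.

```python
from typing import Optional


def rollback_target(history: list[dict]) -> Optional[int]:
    """Pick the newest 'superseded' revision older than the current 'deployed' one."""
    deployed = [h["revision"] for h in history if h["status"] == "deployed"]
    # If no revision is marked deployed, consider every superseded revision.
    current = max(deployed) if deployed else float("inf")
    candidates = [
        h["revision"] for h in history
        if h["status"] == "superseded" and h["revision"] < current
    ]
    return max(candidates) if candidates else None
```

Pass it the result of `json.loads` on the history output, then roll back explicitly with `helm rollback kestra <REVISION> -n hyperplane-kestra`.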
Post-Deployment Checklist
- All pods Running or Completed (no CrashLoopBackOff)
- Pod readiness 2/2 (Istio sidecar + app)
- Zero Warning events in namespace
- MinIO health returns 200 OK
- Kestra API responds on /api/v1/flows
- Login works with configured credentials
- Test flow created and executed end-to-end
- ConfigMap has all three required sections: kestra.*, datasources.default, datasources.postgres
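The two HTTP items on this checklist (MinIO health and the Kestra API) can be scripted. The Python sketch below uses only the standard library. `http_status` and `smoke_check` are hypothetical helpers. `/minio/health/live` is MinIO's liveness endpoint, and `/api/v1/flows` is the path named in the checklist. The base URLs in the usage note are assumptions: the in-cluster MinIO address over plain HTTP, and Kestra port-forwarded to localhost:8080.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, user: Optional[str] = None, password: Optional[str] = None,
                timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET on url, or None if nothing answered."""
    request = urllib.request.Request(url)
    if user is not None:
        # Same basicAuth credentials as the UI login.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except OSError:
        return None  # connection refused, DNS failure, timeout, ...


def smoke_check(kestra_base: str, minio_base: str, user: str, password: str,
                timeout: float = 5.0) -> dict[str, Optional[int]]:
    """Collect the status codes for the MinIO liveness probe and the Kestra API."""
    return {
        "minio_live": http_status(f"{minio_base}/minio/health/live", timeout=timeout),
        "kestra_api": http_status(f"{kestra_base}/api/v1/flows", user, password, timeout),
    }
```

For example: `smoke_check("http://localhost:8080", "http://minio.hyperplane-minio.svc.cluster.local:9000", "<user>@<customer-domain>", "<password>")`. A `minio_live` of 200 matches the checklist. A `None` for either key means the service did not answer at all, and a 401 from Kestra points at the credentials.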
Administration & Best Practices
This page covers how to keep a Kestra deployment stable, organized, and secure in a production Shakudo environment.
Workflow Organisation (Namespaces)
Namespaces in Kestra are like folders. Use them to separate environments, teams, or workflow domains:
- production: live, scheduled workflows
- staging: test new flows before promoting to production
- dev: individual developer workflows
- data-team, ops, ai: domain-based grouping within production
Use lowercase names, with hyphens inside a segment and dots between levels (e.g. production.data-ingestion). Avoid deep nesting.
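A CI step can enforce this naming convention before flows are pushed. The Python sketch below checks that each dot-separated segment consists of lowercase words joined by hyphens. `namespace_problems` is hypothetical, and the depth limit of 3 is an assumed team rule, not a Kestra limit.

```python
import re

# One namespace segment: lowercase alphanumeric words joined by single hyphens.
SEGMENT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def namespace_problems(namespace: str, max_depth: int = 3) -> list[str]:
    """Return convention violations for a namespace (empty list = conforms)."""
    segments = namespace.split(".")
    problems = [
        f"segment {seg!r} should be lowercase words joined by hyphens"
        for seg in segments
        if not SEGMENT.fullmatch(seg)
    ]
    if len(segments) > max_depth:
        problems.append(f"{len(segments)} levels deep; keep to {max_depth} or fewer")
    return problems
```

`namespace_problems("production.data-ingestion")` returns an empty list. `namespace_problems("Production.data_ingestion")` flags both segments.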
Version Control (Git Integration)
Store flow YAML files in a Git repository and sync them to Kestra for change history, peer review, and rollback.
- Keep all flow YAML files under a flows/ directory in the monorepo or a dedicated flows repo
- Use a CI pipeline to push flows to Kestra via the API on merge to main
- Tag each flow with a version comment in the YAML for audit purposes
# Push a flow via the API (basicAuth is enabled, so pass the login credentials)
curl -X POST http://localhost:8080/api/v1/flows \
  -u "<user>@<customer-domain>:<password>" \
  -H "Content-Type: application/x-yaml" \
  --data-binary @my-flow.yaml
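For the CI step, the same request can be scripted over every file in flows/. The Python sketch below is standard-library only, and `build_flow_request` and `push_flows` are hypothetical helpers. It sends `Content-Type: application/x-yaml`, the YAML media type that Micronaut-based servers such as Kestra conventionally accept. This is an assumption, so change `content_type` if your instance expects a different value. Creating a flow whose id already exists is expected to fail with an error status. Updating existing flows goes through a different endpoint, so check the API reference for your Kestra version before relying on this for updates.

```python
import base64
import urllib.error
import urllib.request
from pathlib import Path


def build_flow_request(flow_file: Path, base_url: str, user: str, password: str,
                       content_type: str = "application/x-yaml") -> urllib.request.Request:
    """Build (but do not send) the POST that creates one flow from a YAML file."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/flows",
        data=flow_file.read_bytes(),
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
    )


def push_flows(flows_dir: str, base_url: str, user: str, password: str) -> dict[str, int]:
    """POST every *.yaml / *.yml file under flows_dir; return the HTTP status per file."""
    results = {}
    files = sorted(p for p in Path(flows_dir).rglob("*") if p.suffix in {".yaml", ".yml"})
    for path in files:
        request = build_flow_request(path, base_url, user, password)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                results[str(path)] = response.status
        except urllib.error.HTTPError as err:
            # e.g. the flow already exists or fails validation
            results[str(path)] = err.code
    return results
```

A CI job would call `push_flows("flows", ...)` after merge to main and fail the build if any status is 400 or above.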
Retry Policies and Error Handling
Always define retry behaviour on tasks that call external services (APIs, databases, SFTP):
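tasks:
  - id: call-api
    type: io.kestra.plugin.core.http.Request
    uri: https://api.example.com/data
    retry:
      type: exponential
      interval: PT5S        # first delay (example value)
      maxInterval: PT1M     # cap on any single delay (example value)
      delayFactor: 2.0
      maxAttempts: 3
      maxDuration: PT10M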
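When sizing `maxDuration`, it helps to see the waits an exponential policy produces. The Python sketch below models each wait as initial × factor^k, capped at a maximum interval. It stops once the cumulative wait would exceed a total budget. This is a simplified planning model, not Kestra's scheduling code, and details may differ (for example, whether the first attempt counts toward `maxAttempts`). For that reason, `retries` here means the number of waits between attempts. All names are hypothetical.

```python
from typing import Optional


def backoff_delays(initial: float, factor: float, cap: float, retries: int,
                   max_duration: Optional[float] = None) -> list[float]:
    """Model exponential backoff waits (seconds): initial * factor**k, capped at `cap`."""
    delays = []
    total = 0.0
    delay = initial
    for _ in range(retries):
        step = min(delay, cap)
        # Stop once the cumulative wait would exceed the overall budget.
        if max_duration is not None and total + step > max_duration:
            break
        delays.append(step)
        total += step
        delay *= factor
    return delays
```

For example, `backoff_delays(1.0, 2.0, 60.0, 4)` gives `[1.0, 2.0, 4.0, 8.0]`, and with a cap of 30 seconds the waits flatten out at 30.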
For flow-level error handling, use an errors block to run cleanup or notification tasks:
errors:
  - id: notify-failure
    type: io.kestra.plugin.core.log.Log
    message: "Flow {{flow.id}} failed on execution {{execution.id}}"
Scaling Workers
- Increase worker thread count in values.yaml for higher throughput
- For high volume, move to a distributed deployment (separate webserver + workers) — contact the Shakudo team
- Stagger cron expressions to avoid hundreds of flows starting at the same second
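One way to stagger schedules is to derive a deterministic minute from each flow's id. Flows that run in the same hour then spread across it instead of all firing at :00. This is an assumed convention, not a Kestra feature, and `staggered_cron` is hypothetical. The sketch hashes with SHA-256 so the minute stays the same on every run, unlike Python's built-in `hash()`, which is salted per process.

```python
import hashlib


def staggered_cron(flow_id: str, hour: str = "*") -> str:
    """Cron line 'M H * * *' whose minute M is derived from the flow id (stable across runs)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:2], "big") % 60
    return f"{minute} {hour} * * *"
```

For instance, `staggered_cron("daily-ingest", hour="2")` yields a fixed minute within 02:00 to 02:59 for that flow. With the default `hour="*"`, the flow runs hourly at its derived minute.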
Security Basics
BasicAuth credentials
- Username must be in email format (e.g. <user>@<customer-domain>)
- Password must meet complexity: uppercase + lowercase + digit + 8+ chars
- Store credentials in a Kubernetes secret — do not hardcode plain text in values.yaml
RBAC
The open-source version uses basicAuth for a single admin account. Multi-user RBAC requires Kestra Enterprise. Confirm with the Shakudo team if this is needed.
Plugin blacklist
The ICI deployment blacklists the Docker plugin to prevent arbitrary container execution:
kestra:
  plugins:
    blacklist: ["io.kestra.plugin.docker.*"]
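When a task fails because its plugin is unavailable, it helps to know which blacklist entry matched. The Python sketch below assumes the entries behave like shell-style globs, where `*` matches any characters including dots. This matching rule is our assumption, not confirmed Kestra behaviour, and `blocked_by` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def blocked_by(plugin_type: str, blacklist: list[str]) -> list[str]:
    """Return the blacklist patterns that match a plugin type (empty list = allowed)."""
    # fnmatchcase: case-sensitive glob matching, '*' also spans dots.
    return [pattern for pattern in blacklist if fnmatchcase(plugin_type, pattern)]
```

With the blacklist above, `blocked_by("io.kestra.plugin.docker.Run", ["io.kestra.plugin.docker.*"])` returns the Docker pattern, while the Python script plugin type is not matched.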
ConfigMap Backup Before Changes
kubectl get cm kestra-config -n hyperplane-kestra -o yaml > backup-$(date +%Y%m%d).yaml
Backup Strategy
- PostgreSQL: schedule a pg_dump and upload to MinIO or off-cluster storage
- MinIO: include the kestra-<env> bucket in the backup policy
- Flows: export all flows periodically via the API if not already in Git
Troubleshooting Guide
Pod stuck in CrashLoopBackOff after install
- Check: kubectl logs -n hyperplane-kestra deployment/kestra-standalone -c kestra-standalone
- Look for DataSource or Micronaut errors -- usually missing datasources.default block
- Fix: add all three datasource sections in values.yaml: kestra.datasources.postgres, datasources.default, datasources.postgres
Login fails or password rejected
- Check: username must be in email format (e.g. <user>@<customer-domain>)
- Check: password must have uppercase + lowercase + digit + 8+ chars
- Fix: update credentials in ConfigMap using Step 7 and rollout restart
helm upgrade hangs or fails with RBAC error
- Check: CI service accounts lack cross-namespace secret access
- Fix: run helm upgrade from a local kubeconfig with full namespace admin access
Post-install job shows 1/2 Completed
- This is normal with Istio enabled -- the sidecar stays running after the init job completes
- No action needed unless the init container shows Error or CrashLoopBackOff
Connection Issues
Kestra cannot connect to MinIO
- Check: endpoint must use cluster-internal DNS (minio.hyperplane-minio.svc.cluster.local:9000)
- Check: run the Step 6 MinIO health check to verify reachability
- Fix: re-run Step 3 if the bucket or service account is missing
Kestra cannot connect to PostgreSQL
- Check: kestra-postgres-0 must be Running
- Check: passwords in ConfigMap datasources.default and datasources.postgres must be correct
- Fix: update ConfigMap with correct credentials and rollout restart
Workflow Issues
Flow not triggering on schedule
- Check: validate cron expression at crontab.guru
- Check: flow must be enabled in the UI
- Check: Kestra must have been running at the scheduled time -- missed runs are not replayed
- Fix: trigger manually via UI or API to confirm the flow itself works, then re-check trigger
Task fails with plugin not found
- Check: copy the plugin type string exactly from the Kestra plugin docs
- Check: plugin must not be on the blacklist in values.yaml
- Fix: remove from blacklist or use an alternative plugin type
Execution stuck in Running state
- Check: all pods in hyperplane-kestra namespace must be healthy
- Check: view logs for the stuck execution in the Kestra UI
- Fix: check the external resource the task is waiting on; rollout restart if pods are unhealthy
dquote> prompt when applying ConfigMap YAML
- Cause: using <<EOF (unquoted) with YAML containing double-quoted strings -- shell misinterprets them
- Fix: use single-quoted heredoc <<'YAML' (see Step 7 in the Deployment Runbook)
Performance Issues
Executions are slow or queuing
- Check: default worker thread count is low for high-volume environments
- Check: PostgreSQL pod CPU/memory -- it is the shared queue backend
- Fix: increase workerThread in ConfigMap. For heavy workloads, consider distributed mode
Frequently Asked Questions
Q: How do I add a new plugin?
Kestra ships with many plugins bundled. To add a new one, raise a request with the Shakudo team to include it in the next custom Kestra image build.
Q: How do I promote a flow from staging to production?
Change the namespace in the flow YAML from staging to production and re-create the flow in Kestra. The original staging flow remains until manually deleted.
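The promotion step amounts to rewriting the flow's top-level `namespace:` key. The Python sketch below (the `promote` helper is hypothetical) only touches an unindented `namespace:` line. An indented `namespace:` property inside a task, such as on a subflow call, is left alone. It does a text rewrite rather than a YAML parse, so comments and formatting in the flow file survive.

```python
import re


def promote(flow_yaml: str, source: str = "staging", target: str = "production") -> str:
    """Swap the leading segment of the top-level namespace from source to target."""
    # ^ with MULTILINE anchors at column 0, so indented task properties are skipped;
    # the lookahead accepts 'staging' alone or 'staging.<sub-namespace>'.
    pattern = re.compile(rf"^(namespace:\s*){re.escape(source)}(?=\.|\s*$)", re.MULTILINE)
    promoted, count = pattern.subn(rf"\g<1>{target}", flow_yaml, count=1)
    if count == 0:
        raise ValueError(f"no top-level namespace starting with {source!r}")
    return promoted
```

For example, a flow declared with `namespace: staging.data-ingestion` comes back with `namespace: production.data-ingestion`, ready to be created in Kestra.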
Q: Can I run Python scripts inside Kestra tasks?
Yes. Use io.kestra.plugin.scripts.python.Script for inline scripts. Ensure required Python libraries are available in the Kestra image.
Q: What happens if a scheduled flow misses its window because Kestra was down?
Kestra does not replay missed schedules by default. That execution is skipped. Trigger a backfill manually via UI or API if needed. Implement external monitoring if missed executions are critical.
Q: How do I update Kestra to a newer version?
Switch to the new branch in the Shakudo monorepo, update the image tag in values.yaml, re-run helm dependency update, and redeploy with helm upgrade. Always back up the ConfigMap, test in staging first, and review the Kestra changelog for breaking changes.

