What is DataHub, and How to Deploy It in an Enterprise Data Stack?

Last updated on May 12, 2026

What is DataHub?

DataHub is an open-source metadata platform that gives teams one place to find, understand, and share data. It addresses two common problems: data silos and the manual processes used to track what data exists. Through its web UI and extensible ingestion framework, teams can search for datasets, document them, and collaborate across the organization. Authentication and access policies control who can view and change metadata.

DataHub Knowledge Base

Overview

DataHub is an open-source metadata platform used to catalog data assets, document ownership, track lineage, and make datasets easier to discover and govern.

In a Shakudo environment, DataHub sits at the data discovery and governance layer. It connects to warehouses, BI tools, orchestration tools, and databases so teams can understand what data exists, who owns it, and how it is used.

This page is written for onboarding and deployment calls. It focuses on what customers need to understand, provide, validate, and troubleshoot in a real environment.

Where it fits in the stack

  • Primary role: DataHub is a shared platform service for data discovery and governance, reused across teams, rather than a one-off application.
  • Typical deployment model: Kubernetes + Helm, with customer-specific values and secrets.
  • Typical access model: private internal endpoint or customer-approved external route.
  • Typical support model: validate deployment health first, then validate user workflow and integrations.

Getting Started

Start with one safe workflow in DataHub before enabling production usage. The goal is to prove connectivity, permissions, and operational ownership.

What the customer needs to provide

  • metadata sources such as warehouses, databases, Airbyte, dbt, Superset, or Kafka
  • ingestion credentials with read-only metadata access
  • search/index backend such as Elasticsearch or OpenSearch
  • Kafka and SQL metadata store configuration, either bundled or external
  • initial admin users and ownership model
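
The list above can be turned into a simple pre-flight checklist before the deployment call. This is a minimal sketch only: the `DataHubInputs` class, its field names, and `missing_inputs` are hypothetical and are not part of DataHub or Shakudo.

```python
from dataclasses import dataclass, field


@dataclass
class DataHubInputs:
    """Hypothetical record of what the customer has provided so far."""
    metadata_sources: list[str] = field(default_factory=list)
    read_only_credentials: bool = False
    search_backend: str = ""            # e.g. "elasticsearch" or "opensearch"
    kafka_configured: bool = False      # bundled or external
    sql_store_configured: bool = False  # bundled or external
    admin_users: list[str] = field(default_factory=list)


def missing_inputs(inputs: DataHubInputs) -> list[str]:
    """Return the checklist items that are still missing."""
    missing = []
    if not inputs.metadata_sources:
        missing.append("at least one metadata source")
    if not inputs.read_only_credentials:
        missing.append("read-only ingestion credentials")
    if not inputs.search_backend:
        missing.append("search/index backend")
    if not inputs.kafka_configured:
        missing.append("Kafka configuration")
    if not inputs.sql_store_configured:
        missing.append("SQL metadata store configuration")
    if not inputs.admin_users:
        missing.append("initial admin users")
    return missing


# Example: only the sources and the search backend have been agreed so far.
partial = DataHubInputs(metadata_sources=["staging-postgres"],
                        search_backend="opensearch")
for item in missing_inputs(partial):
    print("still needed:", item)
```

An empty list from `missing_inputs` means every item on the checklist has been supplied; it says nothing about whether the values are correct.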

First workflow

  • Open the DataHub UI
  • Create or import the first ingestion source
  • Run ingestion against one safe source first, such as a staging database
  • Review datasets, schemas, ownership, and glossary terms
  • Add owners, tags, and documentation for high-value assets
  • Schedule ingestion after the initial result is validated
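
For the first ingestion run, a recipe for one staging database might look like the sketch below. It builds the recipe as a Python dict with a `postgres` source and a `datahub-rest` sink. The host, database, user, environment variable name, and GMS URL are all placeholder assumptions; substitute the customer's values.

```python
import os


def build_staging_recipe(gms_url: str = "http://localhost:8080") -> dict:
    """Build an ingestion recipe for a single staging Postgres database."""
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": "staging-db.internal:5432",  # placeholder host
                "database": "staging",                     # placeholder database
                "username": "datahub_reader",              # placeholder read-only user
                # Read the password from the environment (variable name is an
                # assumption) instead of hard-coding it in the recipe.
                "password": os.environ.get("STAGING_DB_PASSWORD", ""),
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": gms_url},  # DataHub GMS endpoint
        },
    }


recipe = build_staging_recipe()
print(recipe["source"]["type"], "->", recipe["sink"]["type"])

# With the acryl-datahub package installed, the same dict can be run directly:
#   from datahub.ingestion.run.pipeline import Pipeline
#   pipeline = Pipeline.create(recipe)
#   pipeline.run()
#   pipeline.raise_from_status()
```

Running against staging first keeps the blast radius small: if the credentials are too broad or the connector is misconfigured, no production system is touched.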

Administration and Best Practices

Use these practices to keep DataHub reliable after the initial deployment.

  • Start with a small number of high-value sources before cataloging everything
  • Use read-only ingestion credentials
  • Define owner and domain conventions before asking teams to contribute
  • Schedule metadata ingestion during low-traffic windows
  • Monitor Elasticsearch/OpenSearch storage because metadata indexes grow over time
  • Back up the DataHub metadata store before upgrades
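
Index growth can be tracked with a small check like the sketch below. It assumes the rows come from the Elasticsearch/OpenSearch `GET _cat/indices?format=json&bytes=b` endpoint, where `store.size` is reported in bytes. The function name, the threshold, and the sample rows are illustrative.

```python
def large_indexes(cat_indices: list[dict], limit_bytes: int) -> list[tuple[str, int]]:
    """Return (index, size_in_bytes) pairs above limit_bytes, largest first."""
    flagged = []
    for row in cat_indices:
        # store.size can be null for closed indexes; treat that as zero.
        size = int(row.get("store.size") or 0)
        if size > limit_bytes:
            flagged.append((row["index"], size))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Illustrative rows, not real measurements.
sample = [
    {"index": "datasetindex_v2", "store.size": "7500000000"},
    {"index": "graph_service_v1", "store.size": "1200000000"},
    {"index": "closed_index", "store.size": None},
]
for name, size in large_indexes(sample, limit_bytes=5_000_000_000):
    print(f"{name}: {size / 1e9:.1f} GB")
```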

Troubleshooting & FAQ

Use this section during customer debugging calls. Format: Problem → What to check → Fix.

Ingestion job fails

  • What to check: connector credentials, network access to the source, and the ingestion pod logs
  • Fix: correct the source config and rerun the ingestion recipe manually
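
Network access can be ruled in or out quickly with a plain TCP check, sketched below. It must be run from the same network as the ingestion pod to be meaningful, and the host and port in the usage comment are placeholders. A successful connection shows only that the port is reachable, not that the credentials work.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Usage (placeholder host and port):
#   can_reach("staging-db.internal", 5432)
```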

Assets do not appear in search

  • What to check: GMS health, search backend health, and whether ingestion completed
  • Fix: confirm the Elasticsearch/OpenSearch indexes are healthy, then rerun ingestion
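
Search backend health can be read from the `GET _cluster/health` response, whose `status` field is `green`, `yellow`, or `red`. The sketch below interprets that response. The helper names and verdict strings are illustrative, and the URL passed to `fetch_json` depends on the deployment.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a JSON document, e.g. http://<search-host>:9200/_cluster/health."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def search_backend_verdict(cluster_health: dict) -> str:
    """Translate a _cluster/health response into a short verdict."""
    status = cluster_health.get("status")
    if status == "green":
        return "healthy"
    if status == "yellow":
        # All primary shards are assigned; some replicas are not.
        return "degraded: replicas unassigned, search should still work"
    if status == "red":
        return "unhealthy: at least one primary shard is unassigned"
    return "unknown: unexpected response"


print(search_backend_verdict({"status": "yellow"}))
```

A `yellow` status is common on single-node clusters and does not by itself explain missing assets; a `red` status does need fixing before rerunning ingestion.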

Lineage is missing

  • What to check: whether the source supports lineage and whether dbt/BI metadata was ingested
  • Fix: add the relevant source connector or dbt manifest ingestion
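
A dbt manifest ingestion can be sketched as a recipe with a `dbt` source, which reads the `manifest.json` and `catalog.json` artifacts that dbt generates. The file paths, target platform, and GMS URL below are placeholder assumptions.

```python
def build_dbt_recipe(manifest_path: str, catalog_path: str,
                     target_platform: str,
                     gms_url: str = "http://localhost:8080") -> dict:
    """Build an ingestion recipe that loads dbt models and their lineage."""
    return {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": manifest_path,      # dbt manifest.json
                "catalog_path": catalog_path,        # dbt catalog.json
                "target_platform": target_platform,  # warehouse dbt runs against
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


# Placeholder paths and platform.
dbt_recipe = build_dbt_recipe("target/manifest.json", "target/catalog.json",
                              target_platform="postgres")
print(dbt_recipe["source"]["config"]["target_platform"])
```

The `target_platform` value should match the platform of the warehouse source that was already ingested, so that dbt models link to the same datasets.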

UI loads but metadata pages error

  • What to check: DataHub GMS logs and metadata store connectivity
  • Fix: confirm the database and Kafka are healthy, then restart GMS
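
The order matters here: restarting GMS while its dependencies are down only hides the real fault. A minimal sketch of checking dependencies in order and stopping at the first failure is shown below. The check functions are stand-ins; real ones would probe the database and Kafka.

```python
from typing import Callable, Optional


def first_failing_dependency(
    checks: list[tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run checks in order and return the name of the first that fails."""
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            # A check that raises is treated as a failed dependency.
            ok = False
        if not ok:
            return name
    return None


# Stand-in checks for illustration only.
checks = [
    ("metadata database", lambda: True),
    ("kafka", lambda: False),
]
failed = first_failing_dependency(checks)
if failed is None:
    print("dependencies healthy: safe to restart GMS")
else:
    print(f"fix {failed} before restarting GMS")
```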

Why is DataHub better on Shakudo?

Core Shakudo Features

Own Your AI

Keep data sovereign, protect IP, and avoid vendor lock-in with infra-agnostic deployments.

Faster Time-to-Value

Pre-built templates and automated DevOps accelerate time-to-value.

Flexible with Experts

Operating system and dedicated support ensure seamless adoption of the latest and greatest tools.