What is DataHub, and How to Deploy It in an Enterprise Data Stack?

Last updated on May 12, 2026

What is DataHub?

DataHub is an open-source metadata platform that helps teams find, understand, and govern their data. When data lives in silos and is documented by hand, projects slow down. DataHub addresses this by cataloging assets in one searchable place, with shared ownership and documentation. Its extensible architecture connects to many source systems, and its role- and policy-based access controls limit who can view or change metadata.

DataHub Knowledge Base

Overview

DataHub is an open-source metadata platform used to catalog data assets, document ownership, track lineage, and make datasets easier to discover and govern.

In a Shakudo environment, DataHub sits at the data discovery and governance layer. It connects to warehouses, BI tools, orchestration tools, and databases so teams can understand what data exists, who owns it, and how it is used.

This page is written for onboarding and deployment calls. It focuses on what customers need to understand, provide, validate, and troubleshoot in a real environment.

Where it fits in the stack

Primary role: a shared metadata catalog that many teams and tools reuse, rather than a one-off application.
Typical deployment model: Kubernetes + Helm, with customer-specific values and secrets.
Typical access model: private internal endpoint or customer-approved external route.
Typical support model: validate deployment health first, then validate user workflow and integrations.
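
As a rough illustration of "validate deployment health first", the sketch below polls the DataHub backend (GMS) and the frontend over HTTP. The hostnames and ports are placeholders, and the /health and /admin paths are assumptions based on common DataHub deployments, so confirm them against the actual Helm release before relying on the result.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Placeholder service names and ports; replace with the values from
# the customer's Helm release. The paths are assumed, not guaranteed.
DEFAULT_ENDPOINTS = {
    "gms": "http://datahub-gms:8080/health",
    "frontend": "http://datahub-frontend:9002/admin",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return 0


def check_deployment(
    endpoints: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each component name to True when its endpoint returns 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, healthy in check_deployment(DEFAULT_ENDPOINTS).items():
        print(f"{name}: {'ok' if healthy else 'NOT healthy'}")
```

Only once every component reports healthy is it worth moving on to user workflow and integration checks.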

Getting Started

Start with one safe workflow in DataHub before enabling production usage. The goal is to prove connectivity, permissions, and operational ownership.

What the customer needs to provide

metadata sources such as warehouses, databases, Airbyte, dbt, Superset, or Kafka
ingestion credentials with read-only metadata access
search/index backend such as Elasticsearch or OpenSearch
Kafka and SQL metadata store configuration, either bundled or external
initial admin users and ownership model

First workflow

Open the DataHub UI
Create or import the first ingestion source
Run ingestion against one safe source first, such as a staging database
Review datasets, schemas, ownership, and glossary terms
Add owners, tags, and documentation for high-value assets
Schedule ingestion after the initial result is validated
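
The first ingestion run can also be scripted rather than created in the UI. The following is a minimal sketch, assuming a PostgreSQL staging database, the acryl-datahub Python package installed with its postgres plugin, and a GMS endpoint reachable from where the script runs. The host names, database name, and environment variable names are hypothetical.

```python
import os
from typing import Any, Dict


def build_recipe(host_port: str, database: str, gms_url: str) -> Dict[str, Any]:
    """Build an ingestion recipe for one staging PostgreSQL database."""
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": host_port,
                "database": database,
                # Read-only credentials, taken from hypothetical env vars
                # so that secrets stay out of the recipe itself.
                "username": os.environ.get("STAGING_DB_USER", "datahub_reader"),
                "password": os.environ.get("STAGING_DB_PASSWORD", ""),
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe once; raises if the run reported failures."""
    # Imported lazily so the recipe can be built without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    # Placeholder addresses for a staging source and the GMS service.
    run_recipe(build_recipe("staging-db:5432", "analytics", "http://datahub-gms:8080"))
```

Running this once by hand against a staging source mirrors the workflow above: validate the result in the UI first, and only then put the same recipe on a schedule.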

Administration and Best Practices

Use these practices to keep DataHub reliable after the initial deployment.

Start with a small number of high-value sources before cataloging everything
Use read-only ingestion credentials
Define owner and domain conventions before asking teams to contribute
Schedule metadata ingestion during low-traffic windows
Monitor Elasticsearch/OpenSearch storage because metadata indexes grow over time
Back up the DataHub metadata store before upgrades
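
To make the storage-monitoring practice concrete, here is a small sketch that reads index sizes from the search backend's _cat/indices API and flags indexes above a threshold. It assumes the cluster is reachable without authentication at a placeholder URL. Real deployments usually need credentials and TLS, which are omitted here.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def fetch_indices(base_url: str, timeout: float = 10.0) -> List[Dict[str, Optional[str]]]:
    """Fetch one row per index, with sizes reported in bytes."""
    url = f"{base_url}/_cat/indices?format=json&bytes=b"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def large_indices(rows: List[Dict[str, Optional[str]]], limit_bytes: int) -> Dict[str, int]:
    """Return {index name: size in bytes} for indexes above limit_bytes."""
    flagged = {}
    for row in rows:
        # Closed indexes can report no size; treat those as zero.
        size = int(row.get("store.size") or 0)
        if size > limit_bytes:
            flagged[row["index"]] = size
    return flagged


if __name__ == "__main__":
    # Placeholder URL and an arbitrary 5 GiB threshold.
    rows = fetch_indices("http://elasticsearch:9200")
    for name, size in sorted(large_indices(rows, 5 * 1024**3).items()):
        print(f"{name}: {size / 1024**3:.1f} GiB")
```

The threshold is arbitrary. A reasonable starting point is whatever leaves comfortable headroom on the volume backing the search cluster.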

Troubleshooting & FAQ

Use this section during customer debugging calls. Format: Problem → What to check → Fix.

Ingestion job fails

What to check: connector credentials, network access to the source, and the ingestion pod logs
Fix: correct the source config and rerun the ingestion recipe manually
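
Network access is often the quickest item to rule out. The sketch below attempts a plain TCP connection to the source. It only proves that the port is reachable from wherever the script runs (ideally the same namespace as the ingestion pod) and says nothing about credentials. The host and port in the usage line are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder source address; use the host and port from the recipe.
    print("reachable" if can_connect("staging-db", 5432) else "NOT reachable")
```

If the port is reachable and the job still fails, credentials and the ingestion pod logs are the next place to look.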

Assets do not appear in search

What to check: GMS health, search backend health, and whether ingestion completed
Fix: rerun ingestion and confirm the Elasticsearch/OpenSearch indexes are healthy
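
For the search backend part of this check, one option is to read the cluster health API, which reports a green, yellow, or red status. The sketch below assumes an unauthenticated cluster at a placeholder URL. Treating yellow as acceptable is a judgment call that suits single-node clusters, where replica shards cannot be assigned.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_cluster_health(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch the cluster health document from Elasticsearch/OpenSearch."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def search_backend_ok(health: Dict[str, Any], allow_yellow: bool = True) -> bool:
    """Return True when the reported cluster status is acceptable."""
    status = health.get("status")
    if status == "green":
        return True
    # Yellow means primaries are assigned but some replicas are not.
    return allow_yellow and status == "yellow"


if __name__ == "__main__":
    # Placeholder URL; add authentication and TLS as the cluster requires.
    health = fetch_cluster_health("http://elasticsearch:9200")
    print(health.get("status"), "ok" if search_backend_ok(health) else "NOT ok")
```

A red status means at least one primary shard is unavailable, so the cluster needs attention before any rerun of ingestion will help.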

Lineage is missing

What to check: whether the source supports lineage and whether dbt/BI metadata was ingested
Fix: add the relevant source connector or dbt manifest ingestion
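
As an illustration of dbt manifest ingestion, the sketch below builds a recipe for DataHub's dbt source, which reads the manifest.json and catalog.json files that dbt produces. The file paths, target platform, and GMS URL are hypothetical, and the recipe shows only the basic fields, so check the connector documentation for the options the deployed version supports.

```python
from typing import Any, Dict


def build_dbt_recipe(
    manifest_path: str,
    catalog_path: str,
    target_platform: str,
    gms_url: str,
) -> Dict[str, Any]:
    """Build a recipe that ingests dbt models and their lineage."""
    return {
        "source": {
            "type": "dbt",
            "config": {
                # Artifacts written by `dbt docs generate`.
                "manifest_path": manifest_path,
                "catalog_path": catalog_path,
                # The warehouse dbt runs against, e.g. "postgres" or "snowflake".
                "target_platform": target_platform,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


if __name__ == "__main__":
    # Placeholder paths and addresses for illustration only.
    recipe = build_dbt_recipe(
        "target/manifest.json", "target/catalog.json", "postgres", "http://datahub-gms:8080"
    )
    print(recipe)
```

A dict in this shape can be passed to the DataHub Python `Pipeline.create` call, or written out as YAML and run with the `datahub ingest` CLI. Lineage between dbt models and warehouse tables generally appears only when the warehouse itself has also been ingested under the same platform name.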

UI loads but metadata pages error

What to check: DataHub GMS logs and metadata store connectivity
Fix: confirm the database and Kafka are healthy, then restart GMS
