Build a well-functioning data mesh architecture on Google Cloud.

Dataplex

Break free from data silos with Dataplex’s intelligent data fabric. Dataplex lets organizations centrally discover, manage, monitor, and govern data across data lakes, data warehouses, and data marts with consistent controls, providing access to trusted data and powering analytics at scale.

  • Single pane of glass for data management across data silos

  • Centralized security and governance enabling distributed ownership of data with global control

  • Unified search and data discovery, based on business context, across distributed data

  • Built-in data intelligence to enable trust in data and accelerate time to insights

  • An open platform with support for open source tools and a robust partner ecosystem

Benefits

Freedom of choice

Get the freedom to store data where you want for the best price and performance while choosing the best analytics tools, open source or cloud native, to accelerate the entire analytics lifecycle.

Intelligent automation

Built-in data intelligence uses Google’s best-in-class AI/ML capabilities to automate data discovery, metadata harvesting, data lifecycle management, data quality, and lineage, reducing management costs.

Unified governance

Enable standardization and unification of metadata, security policies, governance, and data classification for consistency across distributed data.

Key features

Simplified data discovery

Automate data discovery, classification, and metadata enrichment of structured, semi-structured, and unstructured data, stored in Google Cloud and beyond, with built-in data intelligence. Manage technical, operational, and business metadata for all your data in a unified, flexible, and powerful Data Catalog. Easily search, find, and understand your data with a built-in faceted search interface that uses the same search technology as Gmail.

Data organization and lifecycle management

Logically organize your data that spans multiple storage services into business-specific domains using Dataplex lakes and data zones. Manage, curate, tier, and archive your data easily with one click.
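
The lake and zone grouping described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the `Lake`, `Zone`, and `Asset` classes, the zone names "raw" and "curated", and the bucket and dataset identifiers are hypothetical illustrations, not the Dataplex API or its client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Asset:
    """A storage resource attached to a zone (hypothetical model, not the Dataplex API)."""
    name: str
    storage_service: str  # assumed labels, e.g. "cloud_storage" or "bigquery"
    resource: str         # hypothetical bucket or dataset identifier


@dataclass
class Zone:
    """A named subdivision of a lake that holds assets."""
    name: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Lake:
    """A business domain that groups zones, regardless of where the data is stored."""
    name: str
    zones: Dict[str, Zone] = field(default_factory=dict)

    def add_asset(self, zone_name: str, asset: Asset) -> None:
        # Create the zone on first use, then attach the asset to it.
        zone = self.zones.setdefault(zone_name, Zone(zone_name))
        zone.assets.append(asset)

    def assets_by_service(self) -> Dict[str, List[str]]:
        # Group "zone/asset" paths by the storage service that backs them.
        grouped: Dict[str, List[str]] = {}
        for zone in self.zones.values():
            for asset in zone.assets:
                grouped.setdefault(asset.storage_service, []).append(
                    f"{zone.name}/{asset.name}"
                )
        return grouped


# One "sales" domain whose data spans two storage services.
sales = Lake("sales")
sales.add_asset("raw", Asset("clickstream", "cloud_storage", "example-raw-bucket"))
sales.add_asset("curated", Asset("orders", "bigquery", "example_orders_dataset"))
sales.add_asset("curated", Asset("exports", "cloud_storage", "example-curated-bucket"))
print(sales.assets_by_service())
```

The point of the sketch is that the domain ("sales") is the organizing unit, while the storage service is just an attribute of each asset.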

Centralized security and governance

Enable central policy management, monitoring, and auditing for data authorization and classification across data silos. Facilitate distributed data ownership based on business domains with global monitoring and governance.

Built-in data quality and lineage

Automate data quality across distributed data and enable access to data you can trust. Use automatically captured data lineage to better understand your data, trace dependencies, and effectively troubleshoot data issues. 

Serverless data exploration

Interactively query fully governed, high-quality data using a serverless data exploration workbench with one-click access to Spark SQL scripts and Jupyter notebooks. Easily collaborate across teams with built-in publishing, sharing, and search features. Operationalize your work with one-click scheduling from the workbench. 

Customers

Learn from customers using Dataplex

Snap

"We have PBs of data stored in Google Cloud, accessed by 1,000s of internal users daily. Dataplex enables us to deliver a business domain-specific, self-service data platform across distributed data, with decentralized data ownership but centralized governance and visibility. We are very excited to adopt Dataplex as a central component for building a unified data mesh across our analytics data."

Saral Jain, Director of Engineering, Snap Inc

Documentation

Google Cloud Basics
How Dataplex works

As you identify new data sources, Dataplex harvests the metadata for both structured and unstructured data, using built-in data quality checks to enhance integrity.

Google Cloud Basics
Overview of Data Catalog

Find out how Data Catalog powers the efficient use of your data.

Quickstart
How to get started with Dataplex

Logically organize your stored data into lakes and zones, and automate data management and governance across that data to power analytics at scale.

Tutorial
How to search with Data Catalog

Use Data Catalog to perform a search of data assets, such as datasets, tables, views, and Pub/Sub topics in your Google Cloud projects.

Best Practice
Dataplex best practices

Follow these best practices to optimize your Dataplex experience.

APIs & Libraries
Dataplex API

Use Dataplex APIs to centrally manage and govern distributed data.

APIs & Libraries
Data Catalog API

Use Data Catalog APIs to centrally manage and enrich metadata for your distributed data.

Use cases

Use case
Build a data mesh architecture

Build a business domain-specific data mesh architecture across data in Cloud Storage and BigQuery using Dataplex. Enable decentralized ownership of data while still centrally managing, monitoring, and governing data across your enterprise and making this data securely accessible to a variety of analytics and data science tools. 

Use case
Democratize data insights with Data Catalog

Easily search and discover your data assets across data silos using a fully managed, serverless Data Catalog within Dataplex. Data Catalog provides built-in capabilities to automatically ingest technical metadata, enrich metadata with relevant business context, and empower every user in your organization to easily find and understand their data using a powerful faceted search interface.

Pricing

Dataplex pricing is based on pay-as-you-go usage, including:

- Dataplex processing, which covers the data discovery feature in Dataplex

- Data Catalog metadata storage

- Data Catalog read, write, and search API calls
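
As a rough illustration of how the three pay-as-you-go components listed above add up, the sketch below sums them into a monthly estimate. Every number and unit in it is an assumption: the `Rates` fields, the billing units (processing hours, GiB of metadata per month, blocks of 100,000 API calls), and the example prices are placeholders chosen for easy arithmetic, not published Dataplex rates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices. These are NOT real Dataplex rates."""
    per_processing_hour: float      # assumed unit for Dataplex processing
    per_gib_metadata_month: float   # assumed unit for Data Catalog metadata storage
    per_100k_api_calls: float       # assumed unit for Data Catalog API calls


def estimate_monthly_cost(
    processing_hours: float,
    metadata_gib: float,
    api_calls: int,
    rates: Rates,
) -> Dict[str, float]:
    """Return each usage component's cost and their sum."""
    processing = processing_hours * rates.per_processing_hour
    storage = metadata_gib * rates.per_gib_metadata_month
    api = (api_calls / 100_000) * rates.per_100k_api_calls
    return {
        "processing": processing,
        "metadata_storage": storage,
        "api_calls": api,
        "total": processing + storage + api,
    }


# Hypothetical rates and usage, for illustration only.
example_rates = Rates(
    per_processing_hour=0.5,
    per_gib_metadata_month=0.25,
    per_100k_api_calls=1.0,
)
estimate = estimate_monthly_cost(
    processing_hours=10,
    metadata_gib=100,
    api_calls=2_000_000,
    rates=example_rates,
)
print(estimate)
```

To use the sketch for a real estimate, substitute the current rates and units from the official pricing page.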

Partners

Partnering with industry leaders

We are working with industry-leading data analytics providers so Dataplex can quickly integrate with your existing data analytics investments.