
Apache Hudi!

Hudi brings transactions, record-level updates/deletes and change streams to data lakes!

Hudi Data Lakes

Hudi is a rich platform for building streaming data lakes with incremental data pipelines
on a self-managing database layer, optimized for both lake engines and regular batch processing.

Hudi Features

Upserts, Deletes with fast, pluggable indexing
Incremental queries, Record-level change streams
Transactions, Rollbacks, Concurrency Control
SQL Reads/Writes from Spark, Presto, Trino, Hive & more
Automatic file sizing, data clustering, compactions, cleaning
Streaming ingestion, Built-in CDC sources & tools
Built-in metadata tracking for scalable storage access
Backwards-compatible schema evolution and enforcement
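
To make the upsert and incremental-query features above more concrete, here is a minimal sketch of the option sets that Hudi's Spark datasource is commonly configured with. The table name, field names, and instant time are hypothetical placeholders, and the option keys should be checked against the Hudi release you run, since names can change between versions.

```python
# Sketch only: builds option dictionaries for Hudi's Spark datasource.
# "trips", "uuid", "ts", "city" and the instant time are hypothetical values.

def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Return writer options requesting an upsert keyed on record_key."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Field that uniquely identifies a record within the table.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def hudi_incremental_options(begin_instant):
    """Return reader options for an incremental query after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


upsert_opts = hudi_upsert_options("trips", "uuid", "ts", "city")
incr_opts = hudi_incremental_options("20240101000000")
```

Assuming a SparkSession with the Hudi bundle available and a DataFrame `df`, these dictionaries would typically be passed as `df.write.format("hudi").options(**upsert_opts).mode("append").save(base_path)` and `spark.read.format("hudi").options(**incr_opts).load(base_path)`, where `base_path` is the table location.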