

Apache Hudi

Hudi brings transactions, record-level updates/deletes and change streams to data lakes!

Hudi Data Lakes

Hudi is a rich platform to build streaming data lakes with incremental data pipelines
on a self-managing database layer, while being optimized for lake engines and regular batch processing.

Hudi Features

Upserts, Deletes with fast, pluggable indexing.
Incremental queries, Record level change streams.
Transactions, Rollbacks, Concurrency Control.
SQL Read/Writes from Spark, Presto, Trino, Hive & more.
Automatic file sizing, data clustering, compactions, cleaning.
Streaming ingestion, Built-in CDC sources & tools.
Built-in metadata tracking for scalable storage access.
Backwards compatible schema evolution and enforcement.