Apache Hudi
Hudi brings transactions, record-level updates/deletes and change streams to data lakes!
Hudi Data Lakes
Hudi is a rich platform to build streaming data lakes with incremental data pipelines
on a self-managing database layer, while being optimized for lake engines and regular batch processing.
Hudi Features
Upserts and deletes with fast, pluggable indexing. | Incremental queries and record-level change streams. |
Transactions, rollbacks, and concurrency control. | SQL reads/writes from Spark, Presto, Trino, Hive & more. |
Automatic file sizing, data clustering, compaction, and cleaning. | Streaming ingestion with built-in CDC sources & tools. |
Built-in metadata tracking for scalable storage access. | Backwards-compatible schema evolution and enforcement. |
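To make the upsert and incremental-query features above more concrete, here is a minimal sketch of the options a Spark job might pass to the Hudi datasource. The helper function names, the table name, and the column names are hypothetical. The option keys are the ones commonly documented for the Hudi Spark datasource, but they can differ between releases, so check them against the configuration reference for your version.

```python
# Minimal sketch: build option dictionaries for a Hudi upsert and an
# incremental read. Helper names and example values are hypothetical;
# option keys are assumed to match your Hudi release.

def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Options for writing a DataFrame to a Hudi table as an upsert."""
    return {
        "hoodie.table.name": table_name,
        # "upsert" updates rows whose record key already exists, inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to pick the latest version when keys collide in a batch.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column used to lay out partitions on storage.
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def hudi_incremental_read_options(begin_instant):
    """Options for reading only records changed after a given commit instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


if __name__ == "__main__":
    write_opts = hudi_upsert_options("trips", "uuid", "ts", "city")
    read_opts = hudi_incremental_read_options("20240101000000")
    for key, value in {**write_opts, **read_opts}.items():
        print(f"{key} = {value}")
    # With a SparkSession and the Hudi bundle on the classpath, these would
    # typically be used as (base_path is a hypothetical storage location):
    #   df.write.format("hudi").options(**write_opts).mode("append").save(base_path)
    #   spark.read.format("hudi").options(**read_opts).load(base_path)
```

Keeping the options in plain dictionaries lets the same settings be reused across batch and streaming writers and inspected without starting a Spark session.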