# Apache Iceberg vs. Delta Lake vs. Apache Hudi
Apache Iceberg, Delta Lake, and Apache Hudi are the three formats that
have become the standard choices for mutable analytical tables in object
storage. Each solves the same core problem (consistent, updatable tables
on cheap storage) but with different design priorities that make them
better fits for different workloads.
## Origins
| Format | Created by | Year donated to foundation | Governance | Primary design goal |
| --- | --- | --- | --- | --- |
| Apache Iceberg | Netflix | 2018 | Apache Software Foundation | Multi-engine interoperability and open standards |
| Delta Lake | Databricks | 2019 | Linux Foundation | Reliable data lake on top of Spark |
| Apache Hudi | Uber | 2019 | Apache Software Foundation | High-frequency upserts and incremental processing |
## How Each Format Tracks Table State
```mermaid
graph TD
    subgraph ICE["Apache Iceberg"]
        I1["metadata.json (current snapshot pointer)"]
        I2["Manifest List (snapshot state)"]
        I3["Manifest Files (per-file stats)"]
        I4["Parquet Data Files"]
        I1 --> I2 --> I3 --> I4
    end
    subgraph DL["Delta Lake"]
        D1["_delta_log/ (JSON commit files + checkpoints)"]
        D3["Parquet Data Files"]
        D1 --> D3
    end
    subgraph HUDI["Apache Hudi"]
        H1[".hoodie/ timeline (commit files, compaction)"]
        H2["Base Files (Parquet)"]
        H3["Delta Log Files (MoR only)"]
        H1 --> H2
        H1 --> H3
    end
```
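Of the three, Delta Lake's model is the simplest to illustrate in code: the current table state is derived by replaying the JSON actions in `_delta_log/` in commit order. Here is a minimal sketch with hypothetical commit payloads, simplified to `add` and `remove` actions only (real commits also carry `metaData`, `protocol`, and transaction actions):

```python
import json

def replay_delta_log(commits):
    """Replay ordered Delta commit payloads (newline-delimited JSON)
    and return the set of data files forming the current table.
    Simplified sketch; not the full Delta protocol."""
    active_files = set()
    for commit in commits:
        for line in commit.splitlines():  # one JSON action per line
            action = json.loads(line)
            if "add" in action:
                active_files.add(action["add"]["path"])
            elif "remove" in action:
                active_files.discard(action["remove"]["path"])
    return active_files

# Two hypothetical commits: an initial insert, then a rewrite of one file.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "part-000.parquet"}}),
    json.dumps({"add": {"path": "part-001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-000.parquet"}}),
    json.dumps({"add": {"path": "part-002.parquet"}}),
])

print(sorted(replay_delta_log([commit_0, commit_1])))
# ['part-001.parquet', 'part-002.parquet']
```

Iceberg reaches the same end state differently: instead of replaying a log, each snapshot's manifest list already describes the complete file set, which is what makes O(1) snapshot reads and branching cheap.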
## Feature Comparison
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
| --- | --- | --- | --- |
| Time travel | Yes (snapshot ID or timestamp) | Yes (version or timestamp) | Yes (timeline-based) |
| Schema evolution | Full (column IDs, no rewrites) | Full | Full |
| Partition evolution | Yes (no rewrites) | Partial | Limited |
| Hidden partitioning | Yes | No | No |
| Row-level deletes | Yes (CoW + MoR) | Yes (deletion vectors in 2.0+) | Yes (native, multiple strategies) |
| Branching and tagging | Yes | No (Unity Catalog only) | No |
| Open catalog spec | Yes (REST Catalog) | No (Unity proprietary) | No |
| Credential vending | Yes (via Polaris, Nessie, Glue) | Via Unity Catalog (Databricks) | No standard mechanism |
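Timestamp-based time travel works the same way conceptually in all three formats: the reader resolves the latest snapshot (Iceberg), version (Delta), or timeline instant (Hudi) committed at or before the requested timestamp. A small sketch with made-up snapshot history:

```python
def snapshot_as_of(snapshots, ts):
    """Given (snapshot_id, commit_timestamp) pairs, return the id of
    the latest snapshot committed at or before ts, or None if the
    table did not yet exist. Illustrative only; each format's reader
    implements this resolution against its own metadata."""
    eligible = [(t, sid) for sid, t in snapshots if t <= ts]
    return max(eligible)[1] if eligible else None

history = [("snap-a", 100), ("snap-b", 250), ("snap-c", 400)]
print(snapshot_as_of(history, 300))  # snap-b
print(snapshot_as_of(history, 50))   # None
```

The practical difference between the formats is not this lookup but retention: a snapshot is only reachable until expiration/vacuum removes its files, so time-travel windows are a retention-policy decision in all three.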
## Multi-Engine Support
| Engine | Iceberg | Delta Lake | Hudi |
| --- | --- | --- | --- |
| Apache Spark | Full | Full (best-in-class) | Full |
| Apache Flink | Full | Read + limited write | Full |
| Trino | Full | Read + write (connector) | Read |
| Dremio | Full (native) | Read (external table) | Limited |
| AWS Athena | Full | Full | Read |
| Google BigQuery | Full (BigLake) | No | No |
| Snowflake | Full (Iceberg + Open Catalog) | No | No |
| DuckDB | Read + partial write | No | No |
## Decision Framework
```mermaid
flowchart TD
    A["What is your primary requirement?"]
    A -->|"Multi-engine reads and writes OR open catalog governance OR cloud-native"| B["Apache Iceberg"]
    A -->|"All-in on Databricks + Spark with Unity Catalog for governance"| C["Delta Lake"]
    A -->|"High-frequency key-based upserts in Spark-primary streaming pipelines"| D["Apache Hudi"]
```
| Your situation | Best format |
| --- | --- |
| New project, no existing vendor commitment | Apache Iceberg |
| All-in Databricks, using Unity Catalog | Delta Lake |
| Spark-based CDC with frequent key-based updates | Apache Hudi |
| AI agent analytics on enterprise data | Apache Iceberg (Dremio + Polaris) |
| Multi-cloud or multi-engine architecture | Apache Iceberg |
| AWS S3-native managed table service | Apache Iceberg (S3 Tables) |
| Google Cloud-native managed table service | Apache Iceberg (BigLake) |
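The flowchart and table above reduce to a short rule chain. As a toy encoding (the flag names are invented for illustration; a real evaluation weighs many more factors than three booleans):

```python
def recommend_format(*, multi_engine=False, databricks_unity=False,
                     spark_cdc_upserts=False):
    """Encode the decision framework above as a rule chain.
    Illustrative only: flags and ordering mirror this article's
    table, not an authoritative selection procedure."""
    if databricks_unity:
        return "Delta Lake"            # all-in on Databricks + Unity Catalog
    if spark_cdc_upserts and not multi_engine:
        return "Apache Hudi"           # Spark-primary, key-based upserts
    return "Apache Iceberg"            # default: new or multi-engine projects

print(recommend_format(multi_engine=True))       # Apache Iceberg
print(recommend_format(databricks_unity=True))   # Delta Lake
print(recommend_format(spark_cdc_upserts=True))  # Apache Hudi
```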
## Go Deeper
### Go Deeper on Apache Iceberg
Alex Merced has authored three hands-on books covering Apache Iceberg, the
Agentic Lakehouse, and modern data architecture. Pick up a copy to master
the full ecosystem.
### Stay Current on Data & AI
Subscribe to Alex Merced's weekly newsletter, *Data, Lakehouse & AI*, for deep dives, tutorials, and industry insights delivered straight to your inbox.

Subscribe for Free on Substack →