# Apache Iceberg vs. Delta Lake vs. Apache Hudi
Apache Iceberg, Delta Lake, and Apache Hudi are the three formats that
have become the standard choices for mutable analytical tables in object
storage. Each solves the same core problem (consistent, updatable tables
on cheap storage) but with different design priorities that make them
better fits for different workloads.
## Origins
| Format | Created by | Year donated to foundation | Governance | Primary design goal |
| --- | --- | --- | --- | --- |
| Apache Iceberg | Netflix | 2018 | Apache Software Foundation | Multi-engine interoperability and open standards |
| Delta Lake | Databricks | 2019 | Linux Foundation | Reliable data lake on top of Spark |
| Apache Hudi | Uber | 2019 | Apache Software Foundation | High-frequency upserts and incremental processing |
## How Each Format Tracks Table State
```mermaid
graph TD
    subgraph ICE["Apache Iceberg"]
        I1["metadata.json (current snapshot pointer)"]
        I2["Manifest List (snapshot state)"]
        I3["Manifest Files (per-file stats)"]
        I4["Parquet Data Files"]
        I1 --> I2 --> I3 --> I4
    end
    subgraph DL["Delta Lake"]
        D1["_delta_log/ (JSON commit files + checkpoints)"]
        D3["Parquet Data Files"]
        D1 --> D3
    end
    subgraph HUDI["Apache Hudi"]
        H1[".hoodie/ timeline (commit files, compaction)"]
        H2["Base Files (Parquet)"]
        H3["Delta Log Files (MoR only)"]
        H1 --> H2
        H1 --> H3
    end
```
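Of the three, Delta Lake's model is the simplest to illustrate in code: the current table state is derived by replaying the JSON actions in `_delta_log/` in commit order. Here is a minimal sketch with hypothetical commit payloads, simplified to `add` and `remove` actions only (real commits also carry `metaData`, `protocol`, and transaction actions):

```python
import json

def replay_delta_log(commits):
    """Replay ordered Delta commit payloads (newline-delimited JSON)
    and return the set of data files forming the current table.
    Simplified sketch; not the full Delta protocol."""
    active_files = set()
    for commit in commits:
        for line in commit.splitlines():  # one JSON action per line
            action = json.loads(line)
            if "add" in action:
                active_files.add(action["add"]["path"])
            elif "remove" in action:
                active_files.discard(action["remove"]["path"])
    return active_files

# Two hypothetical commits: an initial insert, then a rewrite of one file.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "part-000.parquet"}}),
    json.dumps({"add": {"path": "part-001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-000.parquet"}}),
    json.dumps({"add": {"path": "part-002.parquet"}}),
])

print(sorted(replay_delta_log([commit_0, commit_1])))
# ['part-001.parquet', 'part-002.parquet']
```

Iceberg reaches the same end state differently: instead of replaying a log, each snapshot's manifest list already describes the complete file set, which is what makes O(1) snapshot reads and branching cheap.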
## Feature Comparison
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
| --- | --- | --- | --- |
| Time travel | Yes (snapshot ID or timestamp) | Yes (version or timestamp) | Yes (timeline-based) |
| Schema evolution | Full (column IDs, no rewrites) | Full | Full |
| Partition evolution | Yes (no rewrites) | Partial | Limited |
| Hidden partitioning | Yes | No | No |
| Row-level deletes | Yes (CoW + MoR) | Yes (deletion vectors in 2.0+) | Yes (native, multiple strategies) |
| Branching and tagging | Yes | No (Unity Catalog only) | No |
| Open catalog spec | Yes (REST Catalog) | No (Unity proprietary) | No |
| Credential vending | Yes (via Polaris, Nessie, Glue) | Via Unity Catalog (Databricks) | No standard mechanism |
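Timestamp-based time travel works the same way conceptually in all three formats: the reader resolves the latest snapshot (Iceberg), version (Delta), or timeline instant (Hudi) committed at or before the requested timestamp. A small sketch with made-up snapshot history:

```python
def snapshot_as_of(snapshots, ts):
    """Given (snapshot_id, commit_timestamp) pairs, return the id of
    the latest snapshot committed at or before ts, or None if the
    table did not yet exist. Illustrative only; each format's reader
    implements this resolution against its own metadata."""
    eligible = [(t, sid) for sid, t in snapshots if t <= ts]
    return max(eligible)[1] if eligible else None

history = [("snap-a", 100), ("snap-b", 250), ("snap-c", 400)]
print(snapshot_as_of(history, 300))  # snap-b
print(snapshot_as_of(history, 50))   # None
```

The practical difference between the formats is not this lookup but retention: a snapshot is only reachable until expiration/vacuum removes its files, so time-travel windows are a retention-policy decision in all three.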
## Multi-Engine Support
| Engine | Iceberg | Delta Lake | Hudi |
| --- | --- | --- | --- |
| Apache Spark | Full | Full (best-in-class) | Full |
| Apache Flink | Full | Read + limited write | Full |
| Trino | Full | Read + write (connector) | Read |
| Dremio | Full (native) | Read (external table) | Limited |
| AWS Athena | Full | Full | Read |
| Google BigQuery | Full (BigLake) | No | No |
| Snowflake | Full (Iceberg + Open Catalog) | No | No |
| DuckDB | Read + partial write | No | No |
## Decision Framework
```mermaid
flowchart TD
    A["What is your primary requirement?"]
    A -->|"Multi-engine reads and writes OR open catalog governance OR cloud-native"| B["Apache Iceberg"]
    A -->|"All-in on Databricks + Spark with Unity Catalog for governance"| C["Delta Lake"]
    A -->|"High-frequency key-based upserts in Spark-primary streaming pipelines"| D["Apache Hudi"]
```
| Your situation | Best format |
| --- | --- |
| New project, no existing vendor commitment | Apache Iceberg |
| All-in Databricks, using Unity Catalog | Delta Lake |
| Spark-based CDC with frequent key-based updates | Apache Hudi |
| AI agent analytics on enterprise data | Apache Iceberg (Dremio + Polaris) |
| Multi-cloud or multi-engine architecture | Apache Iceberg |
| AWS S3-native managed table service | Apache Iceberg (S3 Tables) |
| Google Cloud-native managed table service | Apache Iceberg (BigLake) |
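The flowchart and table above reduce to a short rule chain. As a toy encoding (the flag names are invented for illustration; a real evaluation weighs many more factors than three booleans):

```python
def recommend_format(*, multi_engine=False, databricks_unity=False,
                     spark_cdc_upserts=False):
    """Encode the decision framework above as a rule chain.
    Illustrative only: flags and ordering mirror this article's
    table, not an authoritative selection procedure."""
    if databricks_unity:
        return "Delta Lake"            # all-in on Databricks + Unity Catalog
    if spark_cdc_upserts and not multi_engine:
        return "Apache Hudi"           # Spark-primary, key-based upserts
    return "Apache Iceberg"            # default: new or multi-engine projects

print(recommend_format(multi_engine=True))       # Apache Iceberg
print(recommend_format(databricks_unity=True))   # Delta Lake
print(recommend_format(spark_cdc_upserts=True))  # Apache Hudi
```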
## Go Deeper
### Go Deeper on Apache Iceberg
Alex Merced has authored three hands-on books covering Apache Iceberg, the
Agentic Lakehouse, and modern data architecture. Pick up a copy to master
the full ecosystem.
### Stay Current on Data & AI
Subscribe to Alex Merced's weekly newsletter, *Data, Lakehouse & AI*, for deep dives, tutorials, and industry insights delivered straight to your inbox.

Subscribe for Free on Substack →