Welcome!
This blog contains articles about the Iceberg Data Lakehouse (using your data lake as your data warehouse with Apache Iceberg) and the Agentic Lakehouse (lakehouses optimized for working with AI agents). What is Apache Iceberg? What is a data lakehouse? Where can I find more resources? These are the kinds of questions the posts here answer.
This blog is not affiliated with the Apache Software Foundation or the Apache Iceberg project, whose official page is iceberg.apache.org.
Join the Data Lakehouse Hub Slack Community: Join Now!
Subscribe to our calendar of Data Lakehouse events: Subscribe!
Recent Posts
Open Table Format Benchmarks: Why They Require Critical Evaluation
Published: at 12:00 PM
An in-depth analysis of open table format benchmarks comparing Apache Iceberg, Delta Lake, and Apache Hudi, detailing the pitfalls of standard benchmarks and how to choose a format.
Apache Iceberg SCD Type 2 and CDC Patterns: Building Historical Lakehouse Tables
Published: at 11:00 AM
A deep dive into implementing Slowly Changing Dimension Type 2 (SCD Type 2) patterns and Change Data Capture (CDC) pipelines on Apache Iceberg, using PySpark and Dremio.
Setting Up an AWS-Native Open Lakehouse: Querying Apache Iceberg with AWS Athena and AWS Glue Catalog
Published: at 10:30 AM
A comprehensive guide to building an open, high-performance lakehouse on AWS using Apache Iceberg, AWS Glue Catalog, Amazon S3, and S3 Tables, with query acceleration via the Dremio engine.
Apache Iceberg Catalogs Explained: REST, Glue, Hive Metastore, Polaris, Nessie, and Snowflake
Published: at 10:00 AM
A deep dive into Apache Iceberg catalog architecture, comparing REST catalogs, AWS Glue, Project Nessie, Polaris, and Snowflake. Learn about the catalog's role, credential vending, and cross-engine configurations.
Must Reads on Iceberg, Agentic AI and Lakehouse from Around the Web
- The Definitive Guide to the Semantic Layer
  Understand what a semantic layer is, why it matters for modern data architectures, and how it creates a consistent, governed layer between raw data and business consumers.
- Apache Polaris: The Catalog Standard for Lakehouses and AI
  A deep dive into Apache Polaris, the open-source catalog that is emerging as the standard for managing Iceberg tables across multi-engine Lakehouses and AI workloads.
- What Are Table Formats and Why Were They Needed?
  Explore the history and motivations behind open table formats like Apache Iceberg, Delta Lake, and Apache Hudi, and why they solved critical problems in big data engineering.
- What is Dremio?
  A comprehensive overview of Dremio's Lakehouse platform: how it unifies data access, accelerates queries, and powers self-service analytics across cloud and on-premise sources.
- What Apache Iceberg Native Actually Means
  Not all Iceberg integrations are equal. This article breaks down what it truly means for a platform to be 'Apache Iceberg native' and why the distinction matters for your architecture.
- Open Source and the Data Lakehouse
  A survey of the open source ecosystem powering modern Data Lakehouses (from Apache Iceberg and Nessie to Apache Arrow and Spark) and how they work together.
- What is Agentic Analytics?
  Discover how AI agents are transforming analytics pipelines (autonomously querying data, generating insights, and taking actions) and what it means for the future of the Lakehouse.