
Iceberg REST Catalog and Apache Polaris

The Iceberg REST Catalog specification defines a standard HTTP API that any catalog can implement and any engine can call. Before this spec existed, each engine needed its own connector for each catalog. The REST spec replaces that matrix with a single interface: implement the API once and every compliant engine can connect.

The Multi-Engine Problem the REST Spec Solves

graph TD
  subgraph ENGINES["Query Engines"]
    SP["Apache Spark"]
    FL["Apache Flink"]
    TR["Trino"]
    DR["Dremio"]
    AT["Athena / BigQuery / Snowflake"]
  end
  subgraph API["Iceberg REST Catalog API"]
    EP1["GET /v1/config"]
    EP2["POST /v1/oauth/tokens"]
    EP3["GET /v1/namespaces/ns/tables/table"]
    EP4["POST /v1/namespaces/ns/tables/table (commit)"]
  end
  subgraph CATALOGS["Catalog Implementations"]
    POL["Apache Polaris"]
    NES["Project Nessie"]
    GLUE["AWS Glue"]
    SOC["Snowflake Open Catalog"]
  end
  ENGINES --> API
  API --> CATALOGS
| Endpoint | What it does |
| --- | --- |
| GET /v1/config | Returns catalog-level config: warehouse path, defaults, credential scope |
| POST /v1/oauth/tokens | Exchanges client credentials for a bearer token |
| GET /v1/namespaces | Lists all namespaces |
| GET /v1/namespaces/{ns}/tables/{table} | Loads table metadata and returns vended storage credentials |
| POST /v1/namespaces/{ns}/tables/{table} | Commits a new snapshot |
| POST /v1/namespaces/{ns}/views | Creates a view in the namespace |
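
To make these endpoints concrete, here is a minimal sketch of the same flow called directly over HTTP with Python's requests library. The host, client_id, client_secret, role, and the analytics.orders table are placeholders, and the exact URL prefix can vary by deployment (Polaris serves the API under /api/catalog).

# Minimal REST catalog walkthrough with the requests library.
# Host, credentials, and table names below are placeholders.
import requests

BASE = "https://your-polaris-host/api/catalog"

# 1. Exchange client credentials for a bearer token (POST /v1/oauth/tokens)
token_resp = requests.post(
    f"{BASE}/v1/oauth/tokens",
    data={
        "grant_type": "client_credentials",
        "client_id": "client_id",
        "client_secret": "client_secret",
        "scope": "PRINCIPAL_ROLE:my_role",
    },
)
token = token_resp.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# 2. Fetch catalog-level defaults and overrides (GET /v1/config)
config = requests.get(f"{BASE}/v1/config", headers=headers).json()
print(config.get("defaults"), config.get("overrides"))

# 3. Load a table; the access-delegation header asks for vended credentials
table = requests.get(
    f"{BASE}/v1/namespaces/analytics/tables/orders",
    headers={**headers, "X-Iceberg-Access-Delegation": "vended-credentials"},
).json()
print(table["metadata-location"])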

Credential Vending

When an engine loads a table, the catalog response includes temporary, scoped storage credentials alongside the table metadata. The engine uses these to read and write data files in object storage without holding long-lived storage keys. The catalog enforces access policies and issues short-lived tokens that expire after the session.

sequenceDiagram
  participant E as Engine (Spark / Trino / Dremio)
  participant C as REST Catalog (Apache Polaris)
  participant S3 as Object Storage
  E->>C: GET /v1/namespaces/analytics/tables/orders
  Note over C: Check RBAC for requesting principal
  C-->>E: metadata.json location + vended S3 credentials (15-min TTL)
  E->>S3: Read Parquet files with vended credentials
  S3-->>E: Data
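
In practice the vended credentials arrive in the config map of the load-table response. The exact keys depend on the storage backend and catalog; the ones below are the common Iceberg S3 FileIO property names, and the bucket and object key are placeholders. A sketch, continuing from the requests example above:

# Use vended S3 credentials from the load-table response ("table" is the
# parsed JSON from the earlier GET .../tables/orders call).
import boto3

creds = table["config"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["s3.access-key-id"],
    aws_secret_access_key=creds["s3.secret-access-key"],
    aws_session_token=creds["s3.session-token"],
)

# Read one data file with the short-lived credentials (placeholder path);
# once the token expires, the engine reloads the table to get fresh ones.
obj = s3.get_object(Bucket="my-warehouse-bucket",
                    Key="analytics/orders/data/part-00000.parquet")
print(len(obj["Body"].read()), "bytes read with vended credentials")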

Apache Polaris

Apache Polaris is the reference implementation of the Iceberg REST Catalog specification. It was co-created by Dremio and Snowflake and donated to the Apache Software Foundation in 2024. Polaris is open source and production-ready.

Connecting Engines

# Apache Spark (PySpark); set these before the catalog is first used
spark.conf.set("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.polaris.type", "rest")
spark.conf.set("spark.sql.catalog.polaris.uri", "https://your-polaris-host/api/catalog")
spark.conf.set("spark.sql.catalog.polaris.credential", "client_id:client_secret")
spark.conf.set("spark.sql.catalog.polaris.scope", "PRINCIPAL_ROLE:my_role")
spark.conf.set("spark.sql.catalog.polaris.warehouse", "my_catalog")  # name of the catalog registered in Polaris

# catalog/polaris.properties (Trino)
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://your-polaris-host/api/catalog
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=client_id:client_secret
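
With either configuration in place, the catalog behaves like any other. A small PySpark usage sketch against the analytics.orders table from the diagram above, assuming the session already has the Iceberg Spark runtime on its classpath and the config shown earlier; the namespace, table, and values are placeholders.

# Work with Polaris-managed tables through the "polaris" catalog.
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.analytics")
spark.sql("""
    CREATE TABLE IF NOT EXISTS polaris.analytics.orders (
        order_id BIGINT,
        amount   DECIMAL(10, 2),
        order_ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("INSERT INTO polaris.analytics.orders VALUES (1, 19.99, current_timestamp())")
spark.sql("SELECT * FROM polaris.analytics.orders").show()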

Go Deeper

📚 Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.