Agentic Analytics and the Semantic Layer
Agentic analytics is what happens when the bridge between business questions and data answers is an AI agent rather than a human analyst. A user asks "which product categories grew fastest this quarter?" and an agent generates SQL, runs it against Iceberg tables, and returns a business-language answer. No analyst writes the query, no BI dashboard has to be pre-built.
Making this reliable requires more than pointing a language model at a database. The semantic layer is the difference between an agentic analytics system that is useful and one that gives confidently wrong answers.
The NL2SQL Pipeline
Why Raw Schemas Are Not Enough
Consider a table with columns: ord_id, cust,
tot, dt, st, rgn. An
LLM seeing this schema does not know that tot excludes tax, that
st is an order status where CANCELLED orders must be excluded
from revenue, or that dt is stored in UTC. Without that context,
the LLM writes syntactically valid SQL that returns the wrong answer.
What the Semantic Layer Provides
| Element | Example | Why agents need it |
|---|---|---|
| Table description | "Canonical order table. Every customer order since 2020-01-01. Excludes test orders." | Agent knows when to use this table vs others |
| Column descriptions | "tot: net order revenue in USD after discounts, before tax and shipping" | Agent uses the right column for revenue |
| Metric definitions | "revenue = SUM(tot) WHERE st IN ('SHIPPED','DELIVERED')" | Agent applies correct filters without guessing |
| Relationship declarations | "cust references customers.customer_id" | Agent writes correct JOIN conditions |
| Business rules | "Never include CANCELLED orders in any revenue metric" | Agent applies domain logic, not just SQL logic |
| Timezone / unit notes | "dt is stored in UTC" | Agent converts correctly for local-time questions |
Dremio as a Platform-Managed Semantic Layer
Dremio Virtual Datasets wrap raw Iceberg tables with a clean schema, column descriptions in plain language, and enforced business logic. Dremio's AI Semantic Layer makes these descriptions available to the AI agent at query time, so the LLM generating SQL sees business context rather than raw column names. The metadata lives in Dremio, versioned and governed alongside the data.
Common Failure Modes
| Failure | Cause | Fix |
|---|---|---|
| Wrong revenue number | LLM included CANCELLED orders | Business rule in semantic layer |
| Join returns duplicates | LLM guessed wrong join key | Relationship declaration in semantic layer |
| Wrong date range | LLM used local timezone, data is UTC | Timezone note in column description |
| Hallucinated numbers | LLM answered from training data, not live data | Force agent to always execute SQL, never answer from memory |
| Unauthorized table access | No access control on query engine | Catalog RBAC (Polaris) + MCP server authorization |