Tallie AI
Architecture
9 min read

Letting an LLM Write SQL Against Your Warehouse — Safely

If your agent can read the warehouse, the right question is not 'can it answer the question?' but 'what is the worst query it could run, and what stops it?'

Archie Norman

Most "AI for finance" pitches contain, somewhere in the middle, a slide about the agent reading from your warehouse. The slide is usually a green checkmark next to the word "Snowflake" or "BigQuery." The CFO nods. The CISO does not. (For the broader argument that finance teams should keep the data, the model, and the deployment posture in their hands, see Customer-Controlled AI.)

The CISO is right to hesitate. Letting a language model emit SQL against the system of record for your business is the kind of capability that earns a vendor the demo and loses them the procurement cycle — unless the vendor has thought hard about what sits between the model and the query planner. Most have not.

This post is a sketch of how we think about it. No vendor pitch — just the model.

The wrong question

The wrong question is "can the LLM write good SQL?" Frontier models can write good SQL. They can also write a DELETE with no WHERE clause, a sixteen-way join that locks a production table, an exfiltration query that selects every row of a payroll fact table, or a perfectly valid statement against a schema they were never supposed to see.

Capability is not the safety story. The safety story is what happens when the model gets it wrong, gets it right against the wrong table, or gets it right but is being manipulated by the user in front of it.

The right question is "what is the worst query the agent could run, and what would stop it?" That is the question this post is about.

The threat model in plain English

Three threats matter, in order of how easy they are to ignore:

  1. Mutation by accident. The agent generates a DELETE, UPDATE, TRUNCATE, DROP, MERGE, GRANT, or CREATE against your warehouse — because the user asked it to "clean up duplicate rows," because a model regression made it less cautious, or because the prompt contained an instruction it should have ignored.
  2. Mutation by injection. A user pastes content from an email, a CRM note, or an uploaded PDF that contains text designed to look like an instruction to the agent. The agent treats it as one and emits a query the user never asked for.
  3. Over-reading. The agent issues a syntactically valid, read-only query that returns far more than it should — a SELECT * against a table containing PII, a join that pulls customer data the requesting user is not entitled to see, or a query that fans out into millions of rows and lands in a model context window.

Threats 1 and 2 are about what kind of statement the agent can issue. Threat 3 is about what data a permitted statement can return. The defenses are different.

Defense layer 1: the connection itself is incapable

The first and most important rule: the agent's database connection should be structurally incapable of mutation. Not "the prompt tells it not to mutate." Not "the tool description says read-only." Incapable.

Concretely, that means:

  • A dedicated database role used only by the agent, with SELECT privileges on the schemas it is allowed to read and no other privileges anywhere. No INSERT, no UPDATE, no DELETE, no CREATE, no GRANT, no schema-modification rights, no EXECUTE on functions that mutate.
  • Sessions opened in read-only transaction mode, so even a privilege escalation bug or a misconfiguration cannot be used to write.
  • No access to system catalogs that expose credentials, no access to extensions that reach the network, no access to file-system functions.

This is a database administration job, not an AI engineering job. It is also the layer most vendors hand-wave past. If the vendor cannot describe their agent's database role in one paragraph, they have not done it.

The instinct to build mutation safety into the prompt is exactly the wrong instinct. Prompts are advisory. Database privileges are not.

Defense layer 2: a SQL gate in front of the planner

A read-only role stops mutation. It does not stop the other ways an LLM-generated query can go wrong: queries that are syntactically read-only but operationally dangerous, queries against schemas the agent should not see, multi-statement payloads designed to slip past simple checks.

So the second layer is a SQL gate — a small, deterministic piece of code that sits between the model's output and the database, and refuses anything it does not like. The gate is not a model. It is parser-driven and explicit.

A useful gate enforces, at minimum:

  • Single-statement only. A query must parse to exactly one statement. No semicolon-separated payloads, no piggy-backed DROP after a SELECT. The classic injection pattern dies at the door.
  • Read-only statement types only. SELECT and WITH ... SELECT are allowed. Everything else is rejected by AST type, not by string matching. ("Don't use regex to detect SQL keywords" is one of those rules that everyone agrees with and half of vendors break.)
  • An allow-list of schemas and tables. The gate knows which schemas the agent is permitted to read from, and rejects any query that references anything else. Adding a new table to the allow-list is a deliberate, reviewable change — not something the model can do at runtime.
  • A row-count ceiling. Every query is rewritten with a LIMIT (and, for warehouses that support it, a query timeout and a max-bytes-scanned cap). The agent does not get to decide how much data to return; the gate does.
  • No DDL, no DCL, no procedural extensions. No CREATE, no GRANT, no CALL, no warehouse-specific procedural blocks that can hide mutation behind a function call.
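As a sketch of the shape — and only the shape — here is a toy gate in stdlib Python. The statement and table checks below are deliberately simplified string work so the example stays self-contained; a production gate walks a real SQL AST (sqlglot is one option), exactly as the rules above demand, and resolves CTE and alias names properly. Table names are invented for illustration.

```python
import re

# The allow-list is a deliberate, reviewable artifact — not model-editable.
ALLOWED_TABLES = {"finance.gl_entries", "finance.dim_accounts"}
ROW_CEILING = 1000


class GateRejection(Exception):
    pass


def gate(sql: str) -> str:
    """Reject anything that is not a single, read-only, allow-listed query.

    Toy sketch: a production gate checks these properties against a parsed
    AST, not against tokens, and resolves CTE names before the allow-list check.
    """
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S).strip().rstrip(";")
    # 1. Single statement only: no piggy-backed payload after a semicolon.
    if ";" in stripped:
        raise GateRejection("multiple statements")
    # 2. Read-only statement types only.
    first = stripped.split(None, 1)[0].upper() if stripped else ""
    if first not in {"SELECT", "WITH"}:
        raise GateRejection(f"statement type {first!r} not allowed")
    # 3. Allow-list of tables (toy extraction: identifiers after FROM/JOIN).
    referenced = set(re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", stripped, flags=re.I))
    unknown = referenced - ALLOWED_TABLES
    if unknown:
        raise GateRejection(f"tables not on allow-list: {sorted(unknown)}")
    # 4. Row-count ceiling: the gate, not the model, decides the LIMIT.
    return f"SELECT * FROM ({stripped}) AS gated LIMIT {ROW_CEILING}"


def rejects(sql: str) -> bool:
    """True if the gate refuses the statement."""
    try:
        gate(sql)
        return False
    except GateRejection:
        return True
```

Even at toy scale the failure mode is the right one: a bad statement produces a `GateRejection`, never a query.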

Together, the connection layer and the gate layer mean that even if the model produces something unhinged, the worst outcome is a rejected query. Not a missing table.

Defense layer 3: schema discovery before ad-hoc queries

The third layer is about discipline, not enforcement. Even with a perfect gate, an agent that is guessing at table names will produce noisy, expensive queries — and a lot of "is this column called customer_id or cust_id or account_no?" thrash.

The fix is to make schema discovery a separate, deterministic step. Before the agent issues any ad-hoc SQL, it has access to a curated description of the allow-listed schemas: tables, columns, types, descriptions, and example values. That description is authored — not auto-generated from the warehouse — so it reflects what the team actually wants the agent to use, in the language they actually use to describe it.

This pattern has two effects. First, the agent stops guessing, which kills a long tail of failed queries and accidental joins. Second, the team has a reviewable artifact — "here is what the agent knows about the warehouse" — that can be edited, versioned, and audited. The line between "data the agent can use" and "data that exists" becomes a deliberate decision rather than an emergent one. The same versioning pattern shows up in Skills, Not Prompts — both are runtime artifacts the customer owns and can review.
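Concretely, a curated description might look like the sketch below — the tables, columns, and rendering function are invented for illustration, but the shape is the point: a hand-authored, versionable artifact that is rendered into the agent's context before any ad-hoc SQL.

```python
# A hand-authored schema card: the only view of the warehouse the agent sees.
# Table and column names here are illustrative, not a real schema.
SCHEMA_CARD = {
    "finance.gl_entries": {
        "description": "Posted general-ledger journal lines, one row per line.",
        "columns": {
            "entry_id": "bigint — unique journal line id",
            "account": "text — GL account code, joins to finance.dim_accounts.code",
            "amount": "numeric — signed amount in ledger currency",
            "posted_on": "date — posting date, e.g. 2024-03-31",
        },
    },
    "finance.dim_accounts": {
        "description": "Chart of accounts; one row per account code.",
        "columns": {
            "code": "text — GL account code",
            "name": "text — human-readable account name",
        },
    },
}


def render_schema_card(card: dict) -> str:
    """Render the card into the plain-text block placed in the agent's context."""
    lines = []
    for table, spec in card.items():
        lines.append(f"table {table}: {spec['description']}")
        for col, desc in spec["columns"].items():
            lines.append(f"  - {col}: {desc}")
    return "\n".join(lines)
```

Because the card is data, not prompt prose, it can live in version control and go through the same review as any other change to what the agent is allowed to know.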

Defense layer 4: a query log every CFO can read

Every query the agent runs must be logged in a form a human can read: who asked, what skill ran, what statement was emitted, which tables were touched, how many rows came back, and which model produced the SQL. Stored alongside the agent's overall run log, it becomes the artifact you hand an auditor when they ask "what did your AI do, on what data, last quarter?"
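A minimal sketch of such a record — field names, the skill name, and the model identifier below are invented for illustration:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class QueryLogEntry:
    """One record per statement, with the context that produced it."""
    user: str            # who asked
    skill: str           # what skill ran
    statement: str       # the exact SQL emitted
    tables: list         # which tables were touched
    rows_returned: int   # how many rows came back
    model: str           # which model produced the SQL
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self))


entry = QueryLogEntry(
    user="a.norman",
    skill="monthly-flash-report",
    statement="SELECT account, SUM(amount) FROM finance.gl_entries "
              "GROUP BY account LIMIT 1000",
    tables=["finance.gl_entries"],
    rows_returned=42,
    model="example-model-v1",
)
```

One JSON line per statement, every time — which is what makes "show me the query log for the last week" an answerable request.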

The log is not a safety mechanism on its own. It is what makes the other three layers trustable. If a query slips through that should not have, the log is how you find it. If a regulator asks how you are governing model access to the warehouse, the log is the evidence.

We have seen a lot of AI systems where the auditing story is "we sample some queries to a separate logging system." That is not enough. Every statement, every time, with the context that produced it.

What this rules out — and why that is the point

Stack the four layers and there is a class of agent behaviour you can no longer ship:

  • An agent that "explores" the warehouse on its own, finding tables nobody added to the allow-list.
  • An agent that runs an UPDATE because a user asked it to "fix a wrong status."
  • An agent that returns a million rows because the model's LIMIT clause was ignored.
  • An agent whose actions on the warehouse cannot be reconstructed after the fact.

That is the point. The capabilities removed by the four layers are exactly the capabilities a finance function does not want an LLM to have.

What remains — a model that can compose well-formed, read-only, bounded queries against a curated set of tables, and whose every statement is logged — is genuinely useful. It is also the only shape of warehouse access that survives a real CISO review.

What to ask a vendor

If you are evaluating any AI tool that claims to read from your warehouse, four questions cut through the demo:

  1. What database role does the agent use, and what privileges does it hold? "Read-only" is a posture, not a configuration. You want the role definition.
  2. What sits between the model's SQL output and the database — and what would it reject? If the answer is "the prompt tells it not to do bad things," the answer is "nothing."
  3. Where is the list of tables the agent is allowed to read, and who controls it? If the answer is "the model figures it out from the warehouse," that is a no.
  4. Show me the query log for the last week. Per statement, with the context that produced it. If the vendor cannot show one, they do not have one.

These are not gotcha questions. They are the minimum a finance function should ask before pointing any LLM at the system of record. The vendors that have done the work will answer them in a paragraph each. The ones that haven't will start talking about SOC 2.

The bar for letting an LLM write SQL against a finance warehouse is high — and it should be. The good news is that it is a solvable engineering problem. The bad news is that "we trained the model to be careful" is not the solution.

Going beyond reads? The same defence-in-depth model extends to mutation APIs — the place agents start posting journals or updating analysis codes. We wrote up the SunSystems version of this in Agentic Finance Workflows on SunSystems.

Frequently asked

Is it safe to let an LLM write SQL against a production warehouse?
Yes — if the agent is treated as an untrusted user of the warehouse and the safety posture is defence-in-depth: a read-only role, a deterministic SQL gate in front of the planner, statement-level timeout and byte-scan caps, and an explicit schema allow-list. The model is the policy author, not the policy enforcer.
What's the worst-case query an agent could run?
Without limits: a cross-join across two large fact tables that returns gigabytes, scans for hours, and either bills you for it (BigQuery) or pins a warehouse (Snowflake). With statement-level byte caps and timeouts in place, that query simply doesn't execute — the agent gets an error and tries again. Worst-case becomes 'wasted a few seconds of compute.'
Should the agent be able to write to the warehouse at all?
Default no. Read-only is the right baseline. Write capabilities should be per-skill, behind an explicit approval step, and ideally targeted at a sandbox schema rather than the system of record. The same defence-in-depth model applies — least-privilege role, statement guards, audit log.
#warehouse #sql #safety #llm #data-access #finance-ai
Talk to us

Your data. Your model. Your infrastructure.

Bring AI productivity to your finance, operations, and sales teams without handing over your data estate, your deployment posture, your model strategy, or the cost of your stack. Your processes encoded as skills, authored with you by our engineers — on-prem or VPC, LLM-agnostic, governed by default.