On-Prem AI for Finance: A Practical Path for Regulated Teams
VPC and on-prem are not exotic any more. Here is what a phased deployment actually looks like for a finance function with real residency, regulator, and audit constraints.
For a long time, "AI in finance" implicitly meant "data goes to a SaaS vendor, gets routed to a model provider, and we hope the contract holds." For a lot of teams — banks, insurers, healthcare finance, parts of the public sector — that has been a non-starter from the beginning.
It does not need to be. On-prem and VPC deployments of finance AI are practical today, and for regulated environments they are increasingly the only credible path. This post is a sketch of what that deployment actually looks like, and how to phase it.
What changes when the worker runs in your environment
A managed-cloud AI deployment usually has the platform sitting in the vendor's tenancy, calling out to model providers, with the customer's data flowing in and out of the vendor's perimeter. Even with strong contracts, your data crosses three boundaries: yours, the vendor's, and the model provider's.
A customer-controlled deployment changes this. The agent worker — the component that orchestrates skills, calls connectors, and routes to models — runs inside your VPC or your data centre. Concretely:
- Connectors to your warehouse, lake, and accounting systems are on your network. Source data does not leave the perimeter.
- Model calls to self-hosted weights stay inside the perimeter entirely.
- Model calls to external providers (OpenAI, Anthropic, etc.) leave the perimeter only when you have approved that workload to do so, with the residency, retention, and processing terms negotiated upstream.
- Logs, audit trail, and skill definitions live on your storage. Your SIEM, your retention policy, your eDiscovery posture.
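The perimeter rules above reduce to one check: default-deny egress, with an explicit allowlist and a log line for every refusal. A minimal sketch, assuming nothing about the product's actual configuration — the hostnames, ports, and policy shape here are purely illustrative:

```python
# Hypothetical default-deny egress policy for the agent worker.
# Endpoint names are illustrative, not a real deployment's allowlist.

ALLOWED_EGRESS = {
    # (host, port) pairs the security team has explicitly approved.
    ("models.internal.example.net", 443),  # self-hosted model endpoint, in-perimeter
    ("api.anthropic.com", 443),            # external provider, approved workloads only
}

def egress_allowed(host: str, port: int) -> bool:
    """Default-deny: anything not on the allowlist is refused and logged."""
    allowed = (host, port) in ALLOWED_EGRESS
    if not allowed:
        # In practice this denial record would flow to the customer's SIEM.
        print(f"DENY egress {host}:{port}")
    return allowed
```

The point of the shape is that approval is an explicit, reviewable entry, not a runtime judgment call.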
This is not a bolt-on. It is the deployment shape we built the platform around, because it is the only one that works for the customers we built it for.
The control surface, in plain terms
A regulated-environment deployment of Tallie has four control surfaces that a managed deployment keeps in the vendor's hands:
- Network policy. What egress is the worker allowed to make, to which endpoints, on which ports? Default-deny, allowlisted, logged.
- Model policy. Which model providers are eligible for which workloads? For sensitive workloads, often only self-hosted weights are eligible. The router enforces this; it is not an organisational guideline. The architectural argument for treating this per task is in LLM-Agnostic by Design.
- Data policy. Which connectors are read-only, which can write, and which capabilities are simply unavailable in this environment? Capability-scoped from day one — see Letting an LLM Write SQL Against Your Warehouse — Safely for how that's enforced at the connector layer.
- Audit and observability. Every run, prompt, tool call, model invocation, and output is recorded into your logging stack — not the vendor's. Your audit team owns the trail.
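The model-policy point is worth making concrete: "the router enforces this" means an ineligible model is a hard error, not a logged warning. A hypothetical sketch — the workload labels and model names are invented for illustration and are not Tallie's API:

```python
# Hypothetical router-enforced model policy: sensitive workloads are only
# ever eligible for self-hosted weights. Names are illustrative.

MODEL_POLICY = {
    "sensitive": {"self-hosted-weights"},                      # perimeter-only
    "general":   {"self-hosted-weights", "external-frontier"}, # egress approved
}

def route_model(workload: str, requested_model: str) -> str:
    """Enforced in code; not an organisational guideline."""
    eligible = MODEL_POLICY[workload]
    if requested_model not in eligible:
        raise PermissionError(
            f"model {requested_model!r} is not eligible for {workload!r} workloads"
        )
    return requested_model
```

A skill tagged `sensitive` that requests an external model fails at routing time, before any data moves — which is the property a head of risk actually wants to see demonstrated.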
These are the things a CISO and a head of risk will ask about. Having clear answers, with defaults that are conservative and explicit, is what makes the deployment shippable.
A phased plan
Big-bang AI deployments do not work in regulated finance functions. The compliance review alone tends to outlast the enthusiasm. We recommend a four-phase plan, and we typically run it with the customer over four to eight weeks.
Phase 1 — Scope. A forward-deployed engineer embeds with the finance and IT teams. We map data sources, the recurring processes worth encoding as skills, and — critically — the capabilities the agent should never have. The output of this phase is a written deployment shape that security, finance, and IT all sign.
Phase 2 — Deploy. The agent worker is deployed to the agreed environment (managed cloud, VPC, or on-prem). Read-only connectors are wired in. Network and model policies are applied. No skills run yet. The deployment is reviewed end-to-end against the security checklist.
Phase 3 — Author skills. Read-only finance skills land first — analysis, recurring reporting, variance commentary. Skills are authored from our templates and adapted with the controller. Each skill is reviewed before it is allowed to run; outputs are reviewed in the first cycle before going to the broader team.
Phase 4 — Go live. Staged rollout to the finance team. Read-only analysis and recurring reporting first. Broader capabilities — write actions, external integrations, anything with operational impact — are added on the customer's terms, one capability at a time, each with its own review.
The point of the phasing is not to slow things down. It is to make every step approvable, so the deployment does not stall halfway through compliance review.
What "regulated" actually constrains
A few patterns we see specifically in regulated environments:
- Data residency. Some workloads must run on infrastructure in a specific jurisdiction. Self-hosted weights and a regional VPC handle this cleanly; cross-border SaaS does not. The release of an open-weights model that competes with the closed frontier is what makes this practical now — see the Kimi K2.6 writeup.
- Auditor reproducibility. "Show me how this number was produced" needs a deterministic-enough answer. Skills, with versioned definitions and per-run logs, provide it. Ad-hoc prompting does not.
- Capability allowlists. Regulators increasingly want to see that AI cannot take certain actions, full stop — not "is unlikely to," but "is structurally unable to." Capability-scoped connectors are how you make that statement true.
- Vendor portability. Some teams need a written exit plan: if the vendor goes away, can the customer continue running the agent on their infrastructure with their model providers? Customer-controlled execution is what makes that answer "yes."
These constraints are not blockers. They are design inputs. Building to them from the start is dramatically cheaper than retrofitting them onto a SaaS platform that was designed without them in mind.
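"Structurally unable" has a simple software meaning: the capability is absent from the interface, not guarded by a flag. A minimal, hypothetical sketch of a capability-scoped connector — class and method names are illustrative, and a dict stands in for a real warehouse client:

```python
# Hypothetical capability-scoped connector: the read-only variant simply has
# no write surface, so "structurally unable to write" is literally true.

class ReadOnlyWarehouseConnector:
    """Exposes queries only; there is no write/execute method to misuse."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables  # stand-in for a real warehouse client

    def query(self, table: str) -> list[dict]:
        # Returns a copy; the connector never mutates source data.
        return list(self._tables[table])

connector = ReadOnlyWarehouseConnector(
    {"ledger": [{"account": "4000", "amount": 120}]}
)
```

An auditor can verify the claim by inspecting the class: there is nothing to allowlist or deny, because the write path does not exist in this environment.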
The takeaway
On-prem and VPC AI for finance is not the bleeding edge any more. It is the credible default for any team with regulator-, residency-, or audit-driven constraints. The work is in the deployment shape and the phasing — not in waiting for the technology to catch up. It already has.
If your finance team has been told that "real" AI tooling means SaaS-only and a single model provider, that advice is out of date. The shape of the deployment is now a choice. Make it on your terms.
Frequently asked
- Is on-prem AI actually viable for finance teams in 2026?
- Yes — and for regulated teams (banks, insurers, healthcare finance, public sector) it is increasingly the only credible posture. The combination of open-weights frontier models, MicroVM agent sandboxes, and VPC-native control planes has closed the gaps that used to force a managed-cloud trade-off.
- What's the difference between VPC and on-prem deployment?
- VPC means the AI runtime sits inside the customer's cloud account (AWS, Azure, GCP) — same trust boundary as the rest of their data. On-prem means it sits inside the customer's physical or private-cloud network with no public-cloud dependency. Both collapse the three-trust-boundary problem of managed SaaS AI down to one.
- Where do you start — read-only or full automation?
- Always read-only first. The rollout starts with read-only connectors against existing systems (warehouse, ledger, CRM); governed write capabilities follow, added per skill with audit and approval; self-hosted open-weights models come in for the workloads that must stay fully inside the perimeter. Each step ships independently and clears its own compliance review.