Tallie AI
Architecture

LLM-Agnostic by Design: Why Finance AI Shouldn't Be Locked to One Vendor

Routing per task — not per platform — is how you keep the cost curve, the capability curve, and the procurement story under your control.

Archie Norman

The model market is moving faster than any procurement cycle. Last quarter's frontier model is this quarter's commodity, and next quarter's regulated workload will probably want something else entirely. Picking a single LLM provider for your finance function and wiring everything to it is a structural mistake.

Tallie is LLM-agnostic by design — not as a marketing line, but as a deployment property. Here is what that means in practice and why it matters more for finance than for almost any other function.

"LLM-agnostic" is a property of the system, not a slide

A lot of products claim to be model-agnostic. What they usually mean is: "we currently call OpenAI, and one day we might call Anthropic." That is not agnosticism. It is a single integration with optionality on the roadmap.

A genuinely LLM-agnostic system has three properties:

  1. Per-task routing. A long-context summarisation task can go to one provider; a structured-output reconciliation can go to another; a sensitive board-pack draft can stay on a self-hosted open-weights model that never leaves your VPC. The router is part of the platform, not part of a single workflow.
  2. Replaceable provider, stable contract. When you swap provider A for provider B, the skills, audit trail, connectors, and access controls do not change. The model is a runtime detail, not the architecture.
  3. Cost and capability transparency. You can see, per workload, which model handled the call, what it cost, and what the latency looked like. You can reroute based on those numbers, not on a vendor's promise.

Without all three, you do not have agnosticism — you have a thin abstraction over a single bet.
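The first two properties can be sketched in a few lines of Python: providers share one call contract, and a dispatch table, not the caller, decides which model handles each task type. All names here (classes, task types, model labels) are illustrative, not Tallie's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


class Provider(Protocol):
    """Stable contract: every provider exposes the same call surface."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class HostedModel:
    name: str

    def complete(self, prompt: str) -> str:
        # Placeholder for a real API call to a hosted provider.
        return f"[{self.name}] {prompt[:40]}"


@dataclass
class SelfHostedModel:
    name: str

    def complete(self, prompt: str) -> str:
        # Placeholder for inference against open weights inside the VPC.
        return f"[{self.name}] {prompt[:40]}"


# Per-task routing: the dispatch table, not the caller, picks the model.
ROUTES: dict[str, Provider] = {
    "long_context_summary": HostedModel("frontier-long-context"),
    "structured_extraction": HostedModel("cheap-extraction"),
    "board_pack_draft": SelfHostedModel("open-weights-in-vpc"),
}


def run(task_type: str, prompt: str) -> str:
    return ROUTES[task_type].complete(prompt)
```

Swapping a provider means editing one entry in the table; the callers, and every skill layered on top of them, are untouched.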

Why finance, specifically

Finance workloads are unusually heterogeneous. A single month-end might involve:

  • Long-context analysis of board materials, contracts, and policy documents.
  • Structured extraction from invoices, statements, and trial balances.
  • Numerical reasoning over reconciliation deltas and variance analysis.
  • Drafting in a tightly controlled tone — board commentary, audit responses, lender updates.
  • Sensitive workloads — payroll commentary, restructuring analysis, M&A — where the data should not leave the perimeter at all.

No single model is best at all of these. The frontier model that writes the cleanest commentary may be slow and expensive for batch reconciliations. The cheapest extraction model may not be defensible for audit-touching outputs. The right answer is to route — and to keep routing as a first-class capability of the platform.
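One way to read "routing as a first-class capability": the policy is data, versioned with the application, and the perimeter rule is enforced in code rather than by convention. A minimal sketch, with made-up task types and model names, might look like:

```python
# Hypothetical month-end routing policy: task type -> model, plus a flag
# for whether that model sits outside the customer's perimeter.
POLICY = {
    "board_commentary":     {"model": "frontier-writer",  "external": True},
    "invoice_extraction":   {"model": "cheap-extractor",  "external": True},
    "recon_delta_analysis": {"model": "numeric-reasoner", "external": True},
    "payroll_commentary":   {"model": "self-hosted-70b",  "external": False},
    "ma_analysis":          {"model": "self-hosted-70b",  "external": False},
}


def select_model(task_type: str, data_is_sensitive: bool) -> str:
    """Pick a model for a task, refusing to send sensitive data externally."""
    rule = POLICY[task_type]
    if data_is_sensitive and rule["external"]:
        raise ValueError(f"{task_type}: sensitive data cannot route externally")
    return rule["model"]
```

Because the policy is plain data, a reroute (say, moving extraction to a newer, cheaper model) is a reviewable one-line diff, not a rebuild.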

The procurement and risk angle

There is a second reason LLM-agnosticism matters that has nothing to do with capability.

If your AI strategy is built on top of a single vendor, you have inherited that vendor's:

  • Pricing curve. Token costs change. Sometimes a lot.
  • Roadmap risk. Models get deprecated. Behaviour changes between versions in ways that break tightly coupled workflows.
  • Compliance posture. A change in a vendor's data handling, training policy, or regional availability can disqualify them from your stack overnight.
  • Geopolitical exposure. Some customers cannot route certain workloads through certain jurisdictions, full stop.

A finance function that has hardwired itself to one provider is one provider decision away from a forced re-platforming. Agnosticism is, more than anything, an option-value play.

What to demand from a vendor

If you are evaluating an AI platform for finance, the bar should be:

  1. Show me how a workload is routed today, and how I would route it differently tomorrow.
  2. Show me what happens to a skill when I swap the underlying model — does it break, does it behave the same, what changes?
  3. Show me self-hosted as a first-class option, not a roadmap item.
  4. Show me the per-call audit log: which model, which prompt, which tool calls, which output.

If the answer to any of those is hand-wavy, the platform is more locked-in than it sounds.
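Point 4 is concrete enough to sketch. A minimal per-call audit record, with illustrative field names rather than any vendor's actual schema, only needs to answer four questions after the fact: which model, which prompt, which tool calls, which output.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256(text: str) -> str:
    """Content hash, so the log can prove what was sent without storing it."""
    return hashlib.sha256(text.encode()).hexdigest()


@dataclass
class ModelCallRecord:
    """One audit-log row per model call."""
    task_type: str
    provider: str
    model: str
    prompt_sha256: str        # hash rather than raw text if prompts are sensitive
    tool_calls: list[str]
    output_sha256: str
    cost_usd: float
    latency_ms: int
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


record = ModelCallRecord(
    task_type="recon_delta_analysis",
    provider="self-hosted",
    model="open-weights-70b",
    prompt_sha256=sha256("Summarise reconciliation deltas for May"),
    tool_calls=["ledger.read", "recon.diff"],
    output_sha256=sha256("draft summary text"),
    cost_usd=0.004,
    latency_ms=1820,
)
```

With cost and latency on every row, the rerouting decisions in point 1 can be made from your own numbers rather than a vendor's benchmark claims.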

The Tallie default

For Tallie deployments, the default is a small, governed router that maps task type to model provider, with the customer choosing the providers and the policy. Read-only finance answers, by default, route to a provider the customer has already approved on their data residency and processing terms. Sensitive workloads route to self-hosted weights inside the customer's environment.

This is not exotic. It is what an LLM-agnostic system looks like when you build it for a finance team rather than for a general developer audience. And it is what keeps the model market a tailwind for you, instead of a source of lock-in.


See also: the architectural argument here was reinforced almost immediately by two consecutive open-source releases — Kimi K2.6, an open-weights model competitive with the closed frontier on agentic coding, and CubeSandbox, a Tencent-released open-source MicroVM sandbox compatible with the E2B SDK. Together they make the "open-weights model + open execution sandbox + your-environment deployment" stack buildable end-to-end without a closed vendor in the data path.

Frequently asked

What does 'LLM-agnostic' actually mean?
It means the system routes per task across multiple model providers — hosted (OpenAI, Anthropic, Google), open-weights (Kimi, Llama, Qwen), and self-hosted — with the routing rules versioned alongside the rest of the application. Most products that claim agnosticism have a single integration with optionality on the roadmap; true agnosticism is a deployment property you can verify.
Doesn't model routing add latency and complexity?
Routing decisions are made per task type, not per request, so the runtime overhead is the cost of selecting from a small dispatch table. The real complexity lives in the eval harness and routing rules — both of which a customer wants to own anyway, because they encode which model is trusted for which workload.
What does this mean for cost?
Per-task routing lets a customer compose cheap models for high-volume work and frontier models only where capability genuinely matters. The marginal cost of a workflow stays in the customer's hands rather than tracking a single vendor's per-seat or per-token curve.
#llm-routing #llm-agnostic #model-selection #architecture
Talk to us

Your data. Your model. Your infrastructure.

Bring AI productivity to your finance, operations, and sales teams without handing over your data estate, your deployment posture, your model strategy, or the cost of your stack. Your processes encoded as skills, authored with you by our engineers — on-prem or VPC, LLM-agnostic, governed by default.