How to Scale AI Without Surprising Your CFO

Based on the webinar: How to Scale AI Without Surprising Your CFO

Panelists:
Eric Lam — Head of Cloud FinOps, Google · Chase Platon — Senior Staff Technical Program Manager, Shopify · Roi Ravhon — Co-Founder & CEO, Finout

AI has crossed a line. It's no longer a research budget or a "let's see what happens" experiment. Multiple agents, copilots, and inference workflows are now doing real work in production. And the moment that happens, the conversation changes.

The CFO stops asking, "Is the model working?" and starts asking, "What are we actually getting from this?"

That question is harder than it sounds, because in most enterprises AI cost shows up as one line item per provider per month, with no obvious answer to who used it, which product it powered, or whether it generated any business value at all.

This was the focus of a recent Finout panel on scaling AI responsibly. The discussion brought together FinOps practitioners, Google Cloud, and engineering leaders who have lived through this exact shift. The question on the table: how do you give engineering the room to move fast on AI, while giving finance the answers it actually needs?

Adoption Is Not Impact

Most AI programs start with the wrong scoreboard.

Leadership announces an AI mandate. Teams stand up pilots. Adoption metrics go up and to the right. Everyone celebrates. Then a quarter later, the CFO asks what changed in the P&L, and the room goes quiet.

As one panelist put it: "Adoption does not equal business impact. There are multiple steps along the way to get there."

That gap between "people are using it" and "the business is benefiting from it" is where most AI ROI conversations break down. Pilots, usage volume, and tool deployment are leading indicators at best. They tell you something is happening. They don't tell you whether it's worth the spend.

The shift now underway in mature organizations is from adoption tracking to a staged value model:

Adoption (are people using it?)
Trust and acceptance (are they relying on it for real work?)
Business outcomes (cost reduction, revenue, productivity, time to market, failure recovery)

Each stage has different metrics. Each stage requires different evidence. And none of them can be answered without visibility into what's actually being consumed and by whom.

The Attribution Gap Is the New Cloud Bill Problem

The early cloud era had a familiar problem. A single bill arrived. Nobody could explain it.

AI is recreating that exact dynamic, just faster.

Inference calls span multiple providers. Models change weekly. Teams pipe usage through proxies, gateways, agents, and downstream applications. By the time the bill arrives, the chain of "who called what for which feature" is already cold.

The result is the same surprise that defined the early cloud-cost era, compressed into a faster cycle. And the response has to be the same: visibility first, governance second.

As one panelist framed it, visibility cannot be "a monthly email and one line item." It has to go down to the team, the product, and ideally the individual workload. That's the only level at which an engineer or a finance partner can actually do something with the data.

Shopify's AI FinOps Journey: A Concrete Playbook

The clearest example shared on the panel was Shopify's path to AI cost attribution.

They started where most enterprises start. A centralized AI proxy. All LLM calls routed through one gateway. Clean from a security standpoint, useful from a rate-limiting standpoint, but missing the one thing finance kept asking for: context.

The gateway knew tokens were flowing. It didn't know which team, which product, or which feature was driving the usage.

So they did three things:

First, they instrumented usage tagging by passing response headers on every LLM call. That made every token traceable back to its origin.

Second, they mapped tokens to internal "cost containers" tied to vault teams, so usage rolled up to the same ownership model the rest of the business already understood.

Third, they unified the data. Usage and cost flowed into a single warehouse, and from there into near real-time dashboards. The team called it the "pit wall," a deliberate replacement for the older governance pattern of "wait for a Slack alert when something goes wrong."
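The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Shopify's implementation: the header names, the model names, the per-token prices, and the "team/product" container key are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical header names; the panel did not say which headers were used.
TEAM_HEADER = "x-ai-team"
PRODUCT_HEADER = "x-ai-product"

# Hypothetical prices in USD per 1,000 tokens, for illustration only.
PRICE_PER_1K_TOKENS = {"small-model": 0.5, "large-model": 5.0}


@dataclass
class LlmCall:
    """One LLM call as logged at the gateway."""
    headers: dict[str, str]
    model: str
    total_tokens: int


def cost_container(call: LlmCall) -> str:
    """Step 2: map a call to an ownership bucket based on its tags."""
    team = call.headers.get(TEAM_HEADER, "unattributed")
    product = call.headers.get(PRODUCT_HEADER, "unattributed")
    return f"{team}/{product}"


def call_cost(call: LlmCall) -> float:
    """Convert token usage into dollars using the assumed price table."""
    return call.total_tokens / 1000 * PRICE_PER_1K_TOKENS[call.model]


def roll_up(calls: list[LlmCall]) -> dict[str, float]:
    """Step 3: unify usage and cost into one total per cost container."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[cost_container(call)] += call_cost(call)
    return dict(totals)
```

Calls that arrive without tags land in an explicit "unattributed" bucket rather than being dropped, so the size of the attribution gap is itself visible on the dashboard.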

The takeaway is not that every company should rebuild Shopify's stack from scratch. It's that AI cost attribution is a solved problem when you treat it as one. Visibility, ownership, and unit economics are not new ideas. They just have to be applied to a faster-moving, more fragmented surface.

This is exactly where legacy FinOps tools start to break. Tools designed for static infrastructure cannot keep pace with workloads that span cloud, Kubernetes, and AI/LLMs at the same time, with model choices that change quarter over quarter. The job now is to be the system of record across all of it, with allocation that adapts as fast as the org does.

FinOps Principles, Applied to AI

The mechanics that mature FinOps teams use for cloud and Kubernetes apply almost cleanly to AI. The terminology is the same. The discipline is the same.

Right-size the workload. Most teams default to the largest, most capable model and never revisit the choice. A smaller model, a fine-tuned model, or a cached response often delivers the same outcome at a fraction of the cost. Right-sizing for AI means model selection, prompt efficiency, and caching strategy.

Measure unit economics. Cost per request. Cost per user. Cost per resolved support ticket. Cost per generated draft. Without unit economics, you cannot answer the CFO's question, and you cannot defend the investment.

Maintain cost discipline through guardrails, not blockers. Budgets, alerts, anomaly detection, and decision forums let engineering keep moving while finance keeps spend within agreed bounds. The goal is not to slow innovation. The goal is to make sure innovation has a price tag attached to it.

That last point matters. Governance that blocks experimentation will be routed around. Governance that informs experimentation gets adopted.
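The unit-economics principle above reduces to simple division once spend is attributed to a workload. The sketch below assumes a hypothetical support-assistant workload; the field names and the choice of units are illustrative, and the right denominators depend on what the workload actually produces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkloadPeriod:
    """Attributed spend and business activity for one workload in one period."""
    ai_spend_usd: float
    requests: int
    active_users: int
    resolved_tickets: int


def per_unit(spend: float, units: int) -> Optional[float]:
    """Return cost per unit, or None when there were no units to divide by."""
    return spend / units if units else None


def unit_economics(period: WorkloadPeriod) -> dict[str, Optional[float]]:
    """Express the same spend against each unit the business cares about."""
    return {
        "cost_per_request": per_unit(period.ai_spend_usd, period.requests),
        "cost_per_user": per_unit(period.ai_spend_usd, period.active_users),
        "cost_per_resolved_ticket": per_unit(
            period.ai_spend_usd, period.resolved_tickets
        ),
    }
```

Returning None for a zero denominator is a deliberate choice here: a workload that spent money but resolved no tickets should surface as a gap to investigate, not as a zero or a crash.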

The Multi-Provider Question

One of the sharper audience questions on the panel was: "Can you actually tag current AI providers clearly?"

The honest answer is that it depends on how the workload is deployed.

Managed platforms (Vertex AI, Bedrock, Azure OpenAI) make telemetry and logging far easier. Self-hosted models and directly called provider APIs require more instrumentation, usually at the gateway or proxy layer, which is exactly the pattern Shopify used.

The principle is consistent across both. Every call should produce a trace that connects usage back to a team, a product, and a business purpose. If a workload cannot answer those three questions, it cannot be governed, and it cannot be defended in a CFO conversation.
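One way to enforce that principle is a check at the gateway or in the warehouse that flags any trace missing one of the three answers. The sketch below assumes each trace is a plain dictionary; the tag names "team", "product", and "business_purpose" are hypothetical and would follow whatever tagging scheme the organization already uses.

```python
# Hypothetical tag names for the three questions every call should answer.
REQUIRED_TAGS = ("team", "product", "business_purpose")


def missing_tags(trace: dict) -> list[str]:
    """Return the required tags that are absent, non-string, or blank."""
    missing = []
    for tag in REQUIRED_TAGS:
        value = trace.get(tag)
        if not isinstance(value, str) or not value.strip():
            missing.append(tag)
    return missing


def is_governable(trace: dict) -> bool:
    """A trace is governable only if all three questions are answered."""
    return not missing_tags(trace)


def ungoverned_share(traces: list[dict]) -> float:
    """Fraction of traces that cannot be tied back to an owner and purpose."""
    if not traces:
        return 0.0
    ungoverned = sum(1 for trace in traces if not is_governable(trace))
    return ungoverned / len(traces)
```

Tracking the ungoverned share over time gives a single number for how much of the AI bill could not be defended if finance asked about it today.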

How to Calculate the Value of AI

This is the question every panel on AI eventually arrives at, and it does not have a one-line answer.

What works is a staged measurement approach:

Adoption metrics confirm the tool is reaching users.
Trust and acceptance metrics (acceptance rate of suggestions, override rate, repeat usage) confirm the tool is good enough to rely on.
Business outcome metrics (cost reduction, revenue lift, hours saved, faster cycle times, fewer incidents) confirm the investment is paying back.

Most organizations stop at adoption because adoption is the easiest to measure. The teams that earn CFO trust on AI are the ones that push all the way through to outcome metrics, even when the data is messy.
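The middle stage is the one teams most often leave unmeasured, so a small sketch may help. It assumes each AI suggestion is logged as a (user_id, outcome) pair with outcomes "accepted", "overridden", or "ignored". Both that event shape and the metric definitions below are assumptions, not a standard.

```python
from collections import Counter
from typing import Optional


def trust_metrics(events: list[tuple[str, str]]) -> dict[str, Optional[float]]:
    """Compute trust-stage metrics from (user_id, outcome) suggestion events."""
    total = len(events)
    if total == 0:
        return {
            "acceptance_rate": None,
            "override_rate": None,
            "repeat_usage_rate": None,
        }

    outcomes = Counter(outcome for _, outcome in events)
    events_per_user = Counter(user for user, _ in events)
    # Assumption: a "repeat" user is one with more than one logged event.
    repeat_users = sum(1 for count in events_per_user.values() if count > 1)

    return {
        "acceptance_rate": outcomes["accepted"] / total,
        "override_rate": outcomes["overridden"] / total,
        "repeat_usage_rate": repeat_users / len(events_per_user),
    }
```

These numbers still say nothing about business outcomes. They only indicate whether the tool has earned enough trust for outcome measurement to be worth doing.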

What "Surprising the CFO" Actually Looks Like

The phrase is a useful test. If your finance partner is finding out about a 3x jump in inference spend from the monthly invoice, you've already lost the conversation.

The pattern that prevents that surprise is not complicated:

Visibility down to the team and product level, not just the provider level
Attribution that ties tokens and inference cost to the workloads driving them
Budgets and alerts that catch anomalies in near real time, not at month end
A regular forum where engineering and finance review AI spend together, with the same data in front of them

When those four things are in place, the CFO conversation changes. It stops being a defensive review of last month's bill. It becomes a forward-looking discussion about which AI investments are working and which need to be reshaped.
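The third item, catching anomalies before month end, can start as something very simple. The sketch below assumes a list of daily spend figures for the current month, oldest first. The 7-day baseline, the 3x spike threshold, and the 30-day month are illustrative defaults, not recommendations from the panel.

```python
from statistics import mean


def spend_alerts(
    daily_spend: list[float],
    monthly_budget: float,
    days_in_month: int = 30,
    spike_factor: float = 3.0,
    baseline_days: int = 7,
) -> list[str]:
    """Return alert names for the latest day of spend.

    "spike": latest day is at least spike_factor times the trailing average.
    "budget": the month is on pace to exceed monthly_budget.
    """
    alerts: list[str] = []
    if not daily_spend:
        return alerts

    # Spike check needs a full baseline window before the latest day.
    if len(daily_spend) > baseline_days:
        baseline = mean(daily_spend[-(baseline_days + 1):-1])
        if baseline > 0 and daily_spend[-1] >= spike_factor * baseline:
            alerts.append("spike")

    # Budget check: project the month from the average daily run rate so far.
    projected = sum(daily_spend) / len(daily_spend) * days_in_month
    if projected > monthly_budget:
        alerts.append("budget")

    return alerts
```

Run daily per team or per workload, a check like this surfaces a jump the day it happens, which is the difference between a heads-up to finance and an invoice surprise.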

The Bigger Shift

Scaling AI safely is not a technology problem. It's a financial and operational discipline problem dressed up in new vocabulary.

The organizations getting it right are the ones treating AI exactly like they treat any other production cost center. Attribution down to the workload. Unit economics that connect spend to outcome. Governance that preserves engineering autonomy while keeping finance informed. Decisions made with the same data on both sides of the table.

That's how AI scales without surprising the CFO. Not by slowing engineering down. By making sure every team can answer the question the CFO is going to ask, before they ask it.

This is the work mature FinOps teams have been doing for cloud and Kubernetes for years. Wiz, Lyft, and The New York Times built their FinOps practices on exactly this foundation. AI is just the next surface where the same discipline has to be applied, faster than before.

What to Do This Quarter

If you're at the start of this journey, three concrete steps will move you forward:

Make AI spend visible at the team and product level. A single invoice from a model provider is not visibility.
Pick one AI workload and build full attribution for it end to end. Use it as the template for the rest.
Stand up a recurring review with finance. Same data, same cadence, same decisions made together.

Visibility first. Governance second. Value measurement third. In that order.

That sequence is what turns AI from a finance risk into a finance advantage. And it's what keeps the CFO conversation about ROI, not surprises.