The engineering team tweaks your amazing support chatbot - just a small change to include full customer history in every prompt. Overnight, support costs spike 10x.
If you’ve ever tracked cloud costs, you already know the drill: something is always quietly draining your budget while you’re busy doing literally anything else. Now we’ve added LLMs into the mix - and surprise - they bring brand-new (and sometimes beautifully hidden) ways to overspend.
The good news? Your FinOps instincts still apply. The bad news? OpenAI’s billing model doesn’t exactly speak fluent “AWS Cost Explorer” yet. (AWS is just our example here - swap in your CSP of choice.)
LLMs are no longer experiments tucked away in R&D - they’re powering customer support, search, product features, and even internal tooling. But as adoption explodes, the old ways of tracking cloud spend start to break down. Unlike EC2 or storage, LLMs don’t come with resource IDs, flexible tagging, or predictable scaling curves. That’s why applying FinOps principles to LLM usage isn’t a ‘nice-to-have’ anymore - it’s becoming a survival skill for any team watching their AI bill grow faster than expected.
Here’s the kicker - OpenAI doesn’t give you the rich, flexible tagging and metadata capabilities you’ve come to depend on in AWS, GCP, or Azure. Yes, they have projects, but these aren’t the same as true key/value tags you can use to slice, dice, and allocate spend however you want.
So your naming convention is your cost allocation strategy. If your projects are called test1 or foo, your attribution data will be completely useless.
In cloud FinOps, tagging lets you pivot costs instantly by environment, feature, team, or customer. In OpenAI, you’re working with fewer dials to turn - which means structure and discipline in naming are not optional.
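To make that concrete, here’s a minimal sketch - assuming a hypothetical team-env-feature convention for project names - of how a disciplined name parses straight back into the attribution dimensions you’d normally get from tags:

```python
# Hypothetical convention: <team>-<env>-<feature>, e.g. "support-prod-instant-reply".
# Parsing it back out gives you the pivot dimensions that tags would normally provide.
from dataclasses import dataclass

@dataclass
class ProjectAttribution:
    team: str
    env: str
    feature: str

def parse_project_name(name: str) -> ProjectAttribution:
    """Split a project name into attribution dimensions; fail loudly on sloppy names."""
    parts = name.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"Project '{name}' doesn't follow the team-env-feature convention")
    return ProjectAttribution(*parts)

print(parse_project_name("support-prod-instant-reply"))
# ProjectAttribution(team='support', env='prod', feature='instant-reply')
```

A project called test1 raises an error here - which is exactly the point: if the name can’t be parsed, the spend can’t be allocated.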
At its core, OpenAI pricing is just another flavor of usage-based billing - but with its own quirks:
If AWS bills you for “4 hours of m5.large,” OpenAI bills you for “X tokens of GPT-4.” Same concept - just swap the jargon.
Example: Your product team ships an “instant support reply” feature using GPT-4. The prompt includes the customer’s entire history for context, and the completion is several paragraphs long. Everyone’s thrilled with the quality… until Finance notices this feature costs 10x more than it would with a smaller model or a tighter prompt. If you’d tracked model + operation type + token count from the start, you could have optimized early.
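To see why the feature got 10x more expensive, a back-of-envelope calculation is enough. The sketch below uses placeholder prices and model names - always plug in the current price sheet - but the shape of the math is the point: input and output tokens are billed separately, and a bloated prompt multiplies the biggest term.

```python
# Back-of-envelope cost per request. Prices are illustrative placeholders only -
# use the vendor's current price sheet for real numbers.
PRICE_PER_1K = {
    "big-model":   {"input": 0.03,   "output": 0.06},    # a GPT-4-class model (assumed rates)
    "small-model": {"input": 0.0005, "output": 0.0015},  # a small, cheaper model (assumed rates)
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: input and output tokens are priced separately."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# "Entire customer history" prompt vs. a tighter prompt, same multi-paragraph reply:
print(request_cost("big-model", input_tokens=12_000, output_tokens=800))    # ~ $0.41 per reply
print(request_cost("small-model", input_tokens=1_500, output_tokens=800))   # ~ $0.002 per reply
```

Multiply the gap by thousands of support tickets a day and the 10x surprise on the invoice stops being surprising.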
The reality: OpenAI only gives you two attribution fields - user and project. That’s it. No arbitrary tags, no environment=prod, no feature=checkout.
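Here’s roughly what that looks like in practice, assuming the official openai Python SDK: the project is implied by the (project-scoped) API key you call with, and user is just a free-form string you set per request. Everything else - environment, feature, team - has nowhere to live.

```python
# Minimal sketch: the only attribution you get natively is (a) the project the
# API key belongs to and (b) the free-form `user` string on each request.
from openai import OpenAI

# Assumption: OPENAI_API_KEY in the environment is a project-scoped key,
# so this call is already attributed to that project in usage reporting.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize my last support ticket."}],
    user="customer-8421",  # end-user identifier: your only per-request attribution hook
)
print(response.usage)  # prompt/completion token counts worth logging on every call
```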
That’s why the LLM Proxy pattern is emerging as a best practice. Think of it as a traffic cop that sits between your app and OpenAI’s API.
It’s a thin middleware layer that intercepts every API call (to OpenAI or any other LLM vendor), attaches the metadata OpenAI can’t store - environment, feature, team, customer - and records token usage before forwarding the request.
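A minimal sketch of the idea (not a production proxy - the log_usage sink and the tag names are hypothetical): every request funnels through one function that attaches your own metadata, records token usage, and then forwards the call to the vendor SDK.

```python
# Minimal sketch of the LLM Proxy pattern: one choke point that every request
# goes through, so you can attach the tags OpenAI doesn't support natively.
import time
from typing import Any
from openai import OpenAI

client = OpenAI()

def log_usage(record: dict[str, Any]) -> None:
    """Hypothetical sink - in practice, write to your warehouse or metrics stack."""
    print(record)

def tagged_completion(*, env: str, feature: str, team: str, **kwargs: Any) -> Any:
    """Forward a chat completion to OpenAI and record usage with your own tags."""
    start = time.time()
    response = client.chat.completions.create(**kwargs)
    log_usage({
        "env": env,
        "feature": feature,
        "team": team,
        "model": kwargs.get("model"),
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_s": round(time.time() - start, 3),
    })
    return response

# Usage:
# tagged_completion(env="prod", feature="instant-reply", team="support",
#                   model="gpt-4o-mini", messages=[{"role": "user", "content": "Hi"}])
```

The payoff: environment=prod and feature=checkout style attribution becomes possible again - it just lives in your own logs instead of on the vendor’s bill.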
OpenAI’s Usage API is pure gold for FinOps - but by itself, it’s not the whole picture. It tells you what was used, but not the associated dollar value.
The real insight comes when you join usage data with cost data, and the bridge between those datasets is the project ID. That’s why consistent, meaningful project names are so critical.
When you align usage and cost data, you can break spend down by project, model, and operation type, spot runaway features before they snowball into a surprise bill, and forecast next month’s spend from real token counts.
This is exactly the level of visibility you expect from AWS or GCP - it just takes a little extra work to get there with OpenAI.
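As an illustration, here’s what that join looks like with a couple of made-up rows - in practice the usage side comes from the Usage API and the dollar side from OpenAI’s cost reporting, and the project ID is the key that ties them together:

```python
# Sketch of the join: usage rows tell you tokens per model, cost rows tell you
# dollars - the project ID is the bridge between the two datasets.
# The rows below are illustrative; in practice they come from OpenAI's usage and cost reporting.
usage_rows = [
    {"project_id": "proj_support_prod", "model": "gpt-4",       "input_tokens": 9_200_000, "output_tokens": 610_000},
    {"project_id": "proj_search_prod",  "model": "gpt-4o-mini", "input_tokens": 3_100_000, "output_tokens": 450_000},
]
cost_rows = [
    {"project_id": "proj_support_prod", "amount_usd": 312.40},
    {"project_id": "proj_search_prod",  "amount_usd": 4.85},
]

cost_by_project = {row["project_id"]: row["amount_usd"] for row in cost_rows}

for row in usage_rows:
    spend = cost_by_project.get(row["project_id"], 0.0)
    tokens = row["input_tokens"] + row["output_tokens"]
    print(f'{row["project_id"]}: {tokens:,} tokens on {row["model"]} -> ${spend:,.2f}')
```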
Tracking OpenAI costs isn’t magic - it’s just… different. Forget EC2 instance IDs; think models, tokens, and context. Get those right, and you’ll have true FinOps visibility for your LLM workloads.
Ignore them, and you’ll be back here in six months, staring at a bill you wish you’d understood sooner.