The engineering team tweaks your amazing support chatbot - just a small change to include full customer history in every prompt. Overnight, support costs spike 10x.
If you’ve ever tracked cloud costs, you already know the drill: something is always quietly draining your budget while you’re busy doing literally anything else. Now we’ve added LLMs into the mix - and surprise - they bring brand-new (and sometimes beautifully hidden) ways to overspend.
The good news? Your FinOps instincts still apply. The bad news? OpenAI’s billing model doesn’t exactly speak fluent “AWS Cost Explorer” yet. (AWS is just our example here - swap in your CSP of choice.)
LLMs are no longer experiments tucked away in R&D - they’re powering customer support, search, product features, and even internal tooling. But as adoption explodes, the old ways of tracking cloud spend start to break down. Unlike EC2 or storage, LLMs don’t come with resource IDs, flexible tagging, or predictable scaling curves. That’s why applying FinOps principles to LLM usage isn’t a ‘nice-to-have’ anymore - it’s becoming a survival skill for any team watching their AI bill grow faster than expected.
Here’s the kicker - OpenAI doesn’t give you the rich, flexible tagging and metadata capabilities you’ve come to depend on in AWS, GCP, or Azure. Yes, they have projects, but these aren’t the same as true key/value tags you can use to slice, dice, and allocate spend however you want.
So your naming convention is your cost allocation strategy. If your projects are called test1 or foo, your attribution data will be completely useless.
In cloud FinOps, tagging lets you pivot costs instantly by environment, feature, team, or customer. In OpenAI, you’re working with fewer dials to turn - which means structure and discipline in naming are not optional.
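To make that concrete, here’s a minimal sketch - assuming a hypothetical team-env-feature convention for project names - of how a disciplined name parses straight back into the attribution dimensions you’d normally get from tags:

```python
# Hypothetical convention: <team>-<env>-<feature>, e.g. "support-prod-instant-reply".
# Parsing it back out gives you the pivot dimensions that tags would normally provide.
from dataclasses import dataclass

@dataclass
class ProjectAttribution:
    team: str
    env: str
    feature: str

def parse_project_name(name: str) -> ProjectAttribution:
    """Split a project name into attribution dimensions; fail loudly on sloppy names."""
    parts = name.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"Project '{name}' doesn't follow the team-env-feature convention")
    return ProjectAttribution(*parts)

print(parse_project_name("support-prod-instant-reply"))
# ProjectAttribution(team='support', env='prod', feature='instant-reply')
```

A project called test1 raises an error here - which is exactly the point: if the name can’t be parsed, the spend can’t be allocated.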
At its core, OpenAI pricing is just another flavor of usage-based billing - but with its own quirks:
If AWS bills you for “4 hours of m5.large,” OpenAI bills you for “X tokens of GPT-4.” Same concept - just swap the jargon.
Example: Your product team ships an “instant support reply” feature using GPT-4. The prompt includes the customer’s entire history for context, and the completion is several paragraphs long. Everyone’s thrilled with the quality… until Finance notices this feature costs 10x more than it would with a smaller model or a tighter prompt. If you’d tracked model + operation type + token count from the start, you could have optimized early.
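To see why the feature got 10x more expensive, a back-of-envelope calculation is enough. The sketch below uses placeholder prices and model names - always plug in the current price sheet - but the shape of the math is the point: input and output tokens are billed separately, and a bloated prompt multiplies the biggest term.

```python
# Back-of-envelope cost per request. Prices are illustrative placeholders only -
# use the vendor's current price sheet for real numbers.
PRICE_PER_1K = {
    "big-model":   {"input": 0.03,   "output": 0.06},    # a GPT-4-class model (assumed rates)
    "small-model": {"input": 0.0005, "output": 0.0015},  # a small, cheaper model (assumed rates)
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: input and output tokens are priced separately."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# "Entire customer history" prompt vs. a tighter prompt, same multi-paragraph reply:
print(request_cost("big-model", input_tokens=12_000, output_tokens=800))    # ~ $0.41 per reply
print(request_cost("small-model", input_tokens=1_500, output_tokens=800))   # ~ $0.002 per reply
```

Multiply the gap by thousands of support tickets a day and the 10x surprise on the invoice stops being surprising.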
The reality: OpenAI only gives you two attribution fields - user and project. That’s it. No arbitrary tags, no environment=prod, no feature=checkout.
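Here’s roughly what that looks like in practice, assuming the official openai Python SDK: the project is implied by the (project-scoped) API key you call with, and user is just a free-form string you set per request. Everything else - environment, feature, team - has nowhere to live.

```python
# Minimal sketch: the only attribution you get natively is (a) the project the
# API key belongs to and (b) the free-form `user` string on each request.
from openai import OpenAI

# Assumption: OPENAI_API_KEY in the environment is a project-scoped key,
# so this call is already attributed to that project in usage reporting.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize my last support ticket."}],
    user="customer-8421",  # end-user identifier: your only per-request attribution hook
)
print(response.usage)  # prompt/completion token counts worth logging on every call
```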
That’s why the LLM Proxy pattern is emerging as a best practice. Think of it as a traffic cop that sits between your app and OpenAI’s API.
It’s a thin middleware layer that intercepts every API call (to OpenAI or any other LLM vendor), attaches the metadata OpenAI can’t store - environment, feature, team, customer - and records token usage before forwarding the request.
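A minimal sketch of the idea (not a production proxy - the log_usage sink and the tag names are hypothetical): every request funnels through one function that attaches your own metadata, records token usage, and then forwards the call to the vendor SDK.

```python
# Minimal sketch of the LLM Proxy pattern: one choke point that every request
# goes through, so you can attach the tags OpenAI doesn't support natively.
import time
from typing import Any
from openai import OpenAI

client = OpenAI()

def log_usage(record: dict[str, Any]) -> None:
    """Hypothetical sink - in practice, write to your warehouse or metrics stack."""
    print(record)

def tagged_completion(*, env: str, feature: str, team: str, **kwargs: Any) -> Any:
    """Forward a chat completion to OpenAI and record usage with your own tags."""
    start = time.time()
    response = client.chat.completions.create(**kwargs)
    log_usage({
        "env": env,
        "feature": feature,
        "team": team,
        "model": kwargs.get("model"),
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_s": round(time.time() - start, 3),
    })
    return response

# Usage:
# tagged_completion(env="prod", feature="instant-reply", team="support",
#                   model="gpt-4o-mini", messages=[{"role": "user", "content": "Hi"}])
```

The payoff: environment=prod and feature=checkout style attribution becomes possible again - it just lives in your own logs instead of on the vendor’s bill.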
OpenAI’s Usage API is pure gold for FinOps - but by itself, it’s not the whole picture. It tells you what was used, but not the associated dollar value.
The real insight comes when you join usage data with cost data, and the bridge between those datasets is the project ID. That’s why consistent, meaningful project names are so critical.
When you align usage and cost data, you can break spend down by project, model, and operation type, spot runaway features before they snowball into a surprise bill, and forecast next month’s spend from real token counts.
This is exactly the level of visibility you expect from AWS or GCP - it just takes a little extra work to get there with OpenAI.
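As an illustration, here’s what that join looks like with a couple of made-up rows - in practice the usage side comes from the Usage API and the dollar side from OpenAI’s cost reporting, and the project ID is the key that ties them together:

```python
# Sketch of the join: usage rows tell you tokens per model, cost rows tell you
# dollars - the project ID is the bridge between the two datasets.
# The rows below are illustrative; in practice they come from OpenAI's usage and cost reporting.
usage_rows = [
    {"project_id": "proj_support_prod", "model": "gpt-4",       "input_tokens": 9_200_000, "output_tokens": 610_000},
    {"project_id": "proj_search_prod",  "model": "gpt-4o-mini", "input_tokens": 3_100_000, "output_tokens": 450_000},
]
cost_rows = [
    {"project_id": "proj_support_prod", "amount_usd": 312.40},
    {"project_id": "proj_search_prod",  "amount_usd": 4.85},
]

cost_by_project = {row["project_id"]: row["amount_usd"] for row in cost_rows}

for row in usage_rows:
    spend = cost_by_project.get(row["project_id"], 0.0)
    tokens = row["input_tokens"] + row["output_tokens"]
    print(f'{row["project_id"]}: {tokens:,} tokens on {row["model"]} -> ${spend:,.2f}')
```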
Tracking OpenAI costs isn’t magic - it’s just… different. Forget EC2 instance IDs; think models, tokens, and context. Get those right, and you’ll have true FinOps visibility for your LLM workloads.
Ignore them, and you’ll be back here in six months, staring at a bill you wish you’d understood sooner.