Written By

Asaf Liveanu
Co-Founder & CPO
Asaf is the CPO and co-founder of Finout. He has more than 12 years of experience in software engineering, QA and product management at companies like Taboola and Intel. In his last position at Logz.io, he met Roi, and together they decided to embark on the Finout journey.

For months, teams have been quietly duct-taping the same thing together: a way to put Claude Code behind a single choke point so they could see who was using it, what it cost, and keep it inside some kind of policy.

Custom binaries. Homegrown proxies. Usage metrics piped into whatever dashboard was already lying around. It worked, sort of. It was also a hack.

This week Anthropic shipped the real thing: the Claude apps gateway. A self-hosted control plane for Claude Code with corporate login, central policy, role-based access, per-user cost attribution, and spend caps. One container between your developers and the models.

Want to read the full announcement? It's here: https://claude.com/blog/introducing-the-claude-apps-gateway

So, I have thoughts. Some of them are "this is fantastic." Some of them are "don't roll this out to 5,000 engineers on Monday." Let's do both.

First, credit where it's due: this is fast

Six months ago, cost governance for AI coding tools was nobody's job. You had an API key per developer, a monthly bill you couldn't decompose, and a vague sense that spend was going up and to the right.

Now the vendor itself ships identity, policy, attribution, and spend limits in the box. That's a remarkable pace. Anthropic looked at what its own power users were cobbling together and turned it into a product before most of the market even noticed there was a problem.

That matters beyond the feature list. When a model provider builds cost controls into its own tooling, it's making a statement: AI usage is now infrastructure, and infrastructure gets governed. The experiment phase is ending.

What's actually great about it

A few things genuinely stand out.

It's a choke point, not a surveillance agent. The gateway sits at the one place all the traffic already flows through. You don't need software installed on every developer's laptop watching what they type. You govern at the gateway, not the endpoint. That's the right architecture, and it's the one that scales.

It speaks open standards. Usage streams out over OTLP to a collector you run: not a proprietary black box, not "call our API to get your own data back." If you already have an OpenTelemetry pipeline, you're plugging into something you understand. Anthropic even published the protocol, which is a real signal this is meant to be an ecosystem, not a walled garden.

Identity resolution is the sleeper feature. Tying spend to an actual person and team, not just an aggregated model bill, is the thing every FinOps team has been begging for. "We spent $40K on Opus" is useless. "This team spent $18K, here's the split by engineer and model" is a budget conversation you can actually have.
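To make that concrete, here is a minimal sketch of the rollup that identity-tagged usage makes possible. The record fields (`team`, `engineer`, `model`, `cost_usd`) and the sample values are hypothetical, chosen to mirror the $40K / $18K example above; they are not the gateway's actual schema.

```python
from collections import defaultdict

# Hypothetical usage records. Field names and values are illustrative only,
# not the gateway's real telemetry schema.
records = [
    {"team": "payments", "engineer": "ana", "model": "opus", "cost_usd": 9000.0},
    {"team": "payments", "engineer": "ben", "model": "opus", "cost_usd": 6000.0},
    {"team": "payments", "engineer": "ben", "model": "sonnet", "cost_usd": 3000.0},
    {"team": "search", "engineer": "cy", "model": "opus", "cost_usd": 22000.0},
]


def rollup(records, *keys):
    """Sum cost_usd grouped by the given record fields."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group] += record["cost_usd"]
    return dict(totals)


# The "budget conversation" views: per team, then per engineer and model.
by_team = rollup(records, "team")
by_engineer_model = rollup(records, "team", "engineer", "model")

for group, cost in sorted(by_engineer_model.items()):
    print(group, f"${cost:,.0f}")
```

The same function answers both questions; only the grouping keys change. That is the practical difference between an aggregated bill and attributed usage.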

The data is richer than a generic proxy. Because the client and the gateway are built together, you get per-developer, per-model signals that a plain proxy sitting in front of an API simply can't see. That granularity is where the interesting optimization lives.

An honest starter guide: how to try it

If you want to kick the tires (and you should), here's a sane way in.

Start small and self-hosted. It's one stateless container backed by Postgres. Stand it up, point it at your identity provider, route a single team through it. Don't boil the ocean.

Wire the telemetry into what you already have. The usage stream lands wherever your OpenTelemetry collector points: a metrics backend, a log store, or your existing observability stack. Resist the urge to build a bespoke dashboard on day one. Get the data flowing first, decide what's worth visualizing second.

Validate the money before you trust it. Set a small spend cap and confirm the gateway's cost numbers line up with your actual provider invoice. Metered estimates and billed reality are not the same thing until you've proven they are.
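One way to frame that check, as a rough sketch: compare the gateway's metered total for a period against the invoiced total and decide up front how much drift you will tolerate. The function name, the 2% tolerance, and the dollar figures below are all assumptions for illustration.

```python
def within_tolerance(metered_usd, invoiced_usd, tolerance=0.02):
    """Return (ok, drift), where drift is the metered total's relative gap
    to the invoice. The 2% default tolerance is an arbitrary starting point."""
    if invoiced_usd <= 0:
        raise ValueError("invoiced_usd must be positive")
    drift = (metered_usd - invoiced_usd) / invoiced_usd
    return abs(drift) <= tolerance, drift


# Hypothetical pilot month: gateway estimate vs. what the provider billed.
ok, drift = within_tolerance(metered_usd=990.0, invoiced_usd=1000.0)
print(f"within tolerance: {ok}, drift: {drift:+.1%}")
```

If the check fails during the pilot, that is the cheap moment to find out why, before the numbers feed a budget.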

Then widen the rollout. Add teams once you trust the identity mapping, the numbers, and the failover behavior, and not before.

Now the part nobody puts in the launch post: enterprise scale

Here's where I'd pump the brakes.

It's day one. This is a brand-new, self-hosted control plane. "Available now" is not the same as "battle-tested across 10,000 seats." Anything that sits in the critical path of every developer's request has to earn that trust, and trust takes uptime you haven't accumulated yet.

You're now running critical infrastructure. That single container is a choke point in the literal sense: if it's down, work stops. That means high availability, failover, backups for that database, upgrade paths, and an on-call rotation. This is a control plane you operate, with everything that implies.

It holds your upstream credential. Centralizing the key is great for control, and it concentrates the blast radius at the same time. One component now has the keys to your model spend. Secure it like it does.

Identity through Bedrock or Vertex is an open question. Most enterprises don't hit the Anthropic API directly; they route through Bedrock or Vertex for procurement, security, and reliability reasons. Whether clean per-engineer identity resolution holds up through those cloud upstreams is exactly the thing to test, not assume.

It governs one tool. This is a gateway for Claude Code. It does not see Cursor, Copilot, your production inference, or your raw Bedrock bill. Native, per-vendor governance is a feature; it is not a cost strategy. If you run five AI tools, you now have five places to set a limit and still no single number finance can trust.

Metered usage still needs reconciliation. A gateway that estimates cost is not the invoice you actually pay. Aligning the two, across accounts, models, and providers, is the unglamorous work that decides whether any of this is real.
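A sketch of what that alignment looks like once you break it down by dimension: key both sides by (provider, model), then surface every key where the metered figure and the invoice disagree beyond a threshold, including spend that appears on only one side. The keys, amounts, and 2% threshold are hypothetical.

```python
def reconcile(metered, invoiced, tolerance=0.02):
    """Return {key: metered - invoiced} for keys whose relative gap
    exceeds tolerance. Keys missing on one side count as zero there."""
    mismatches = {}
    for key in sorted(set(metered) | set(invoiced)):
        m = metered.get(key, 0.0)
        i = invoiced.get(key, 0.0)
        base = max(abs(m), abs(i))
        if base and abs(m - i) / base > tolerance:
            mismatches[key] = round(m - i, 2)
    return mismatches


# Hypothetical totals keyed by (provider, model); values are illustrative.
metered = {("direct", "opus"): 1200.0, ("bedrock", "sonnet"): 300.0}
invoiced = {
    ("direct", "opus"): 1205.0,
    ("bedrock", "sonnet"): 360.0,
    ("vertex", "opus"): 50.0,  # billed, but never seen by the gateway
}

for key, gap in reconcile(metered, invoiced).items():
    print(key, f"{gap:+.2f} USD")
```

The last entry is the interesting one: spend the gateway never saw is precisely what a per-vendor view cannot tell you about.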

And the strategic question underneath all of it: do you invest in standing this up now, or wait for it to mature? There's no universally right answer. But ask it on purpose, not by default.

Who should be an early adopter

This isn't for everyone yet.

It's a great fit if you:

  • Run Claude Code at real volume and feel the attribution pain today.
  • Are Kubernetes-native with an existing OpenTelemetry / observability practice; this will feel like home.
  • Have a platform or FinOps team that can own a self-hosted service.
  • Want per-engineer visibility badly enough to run a pilot and help shape the roadmap.

Wait, or pilot narrowly, if you:

  • Need turnkey, enterprise-SLA reliability from day one.
  • Route everything through Bedrock or Vertex and depend on clean identity resolution there.
  • Are trying to govern many AI tools at once and need one unified view more than deep control of one.
  • Have no appetite to operate another piece of critical infrastructure.

Where this leaves us

I love this release. Not because it's perfect (it isn't), but because it's directionally exactly right. Governance belongs at the choke point, on open standards, tied to identity. Anthropic built that, fast, and put the protocol in the open.

The honest read is that the pattern is production-ready before the product is. That's fine. That's what early adoption is. The teams that pilot this now will understand AI cost governance in a way their competitors won't for another year.

Just be clear-eyed about what a per-vendor gateway is and isn't. It's a fantastic way to govern Claude Code. It is not a way to govern AI. The layer that sits above all these gateways, the one that turns every tool's usage into unified allocation, showback, and anomaly detection without an agent on a single laptop, is still the job. This release makes that job easier. It doesn't replace it.

The gateways are coming. All of them. The question is the same as it's always been: will you have one place to see across them?
