Cloud Vendors Finally Agreed on Commitments. AI Vendors Didn't.

Written by Asaf Liveanu | May 27, 2026 8:44:17 AM

A 2026 follow-up to my RIP RIs piece — because the story didn't end where I thought it would.

Last year I argued AWS was quietly killing Reserved Instances. The post made the rounds. I stand by it.

But it turns out that was only half the story.

In the year since, AWS, GCP, and Azure have all moved in the same direction — toward spend-based, flexible commitments. They've arrived at something close to a shared model. And then the AI vendors showed up. Each one with a different commitment vehicle. None of them coordinated.

So here's the punchline: The cloud commitment market just got coherent. The AI commitment market is still in 2015.

If your FinOps practice handles cloud commitments today, congratulations — that knowledge mostly transfers across AWS, GCP, and Azure now. But it doesn't transfer to AI. Not one of the five AI commitment products on the market looks like the others. And the gap between them is bigger than the gap between an EC2 RI and a Compute Savings Plan ever was.

Let me walk you through where the three clouds actually landed — and what AI did instead.

1. AWS, GCP, Azure: The Slow Convergence

It took eight years.

AWS went first. Reserved Instances were the original commitment vehicle: instance-tied, region-locked, OS-specific. They worked for 2013-style workloads. They didn't survive 2018-style elasticity. Compute Savings Plans replaced them as the default commitment in 2019. By 2024, AWS stopped extending RIs to new instance families. None of trn2, p5e, or i8g ever got RIs. The message was clear.

GCP went the same way. Resource-based CUDs were locked to machine type and region. Then came Flex CUDs — spend-based, cross-family, cross-region. As of January 2026, GCP migrated all spend-based CUDs to a "multiprice" model where discounts apply directly to SKU prices instead of credit offsets. Flex CUDs now cover N1, N2, N4, C3, C4, E2 — most modern VM families. The legacy CUD lives on, but it's not where new commitment volume goes.

Azure was last. Reserved VM Instances were the workhorse for years. Azure Savings Plans for Compute launched in 2022. The two ran in parallel — until May 6, 2026, when Microsoft announced that legacy-VM Reserved Instances retire on July 1, 2026. Av2. Bv1. The entire V1–V3 generation. The official transition guide explicitly tells customers to "trade in current RIs to Azure savings plan for compute." The push is no longer subtle.

Three different vendors. Three different starting points. One convergent answer: commit to spend, not to a resource.

The cloud commitment market is the most coherent it has ever been. A FinOps practitioner who understands AWS Savings Plans can read a GCP Flex CUD page or an Azure Savings Plan page and recognize the structure immediately. The unit is dollars per hour. The flexibility is across families and regions. The term is one or three years. The trade-off is the same.
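To make that shared structure concrete, here is a minimal sketch of how a dollars-per-hour commitment gets applied to one hour of usage. It assumes a single flat discount rate across all usage, which is a simplification (real discounts vary by family, region, and term), and the names `HourBill` and `bill_hour` are mine, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class HourBill:
    commitment_charge: float  # paid every hour, used or not
    on_demand_charge: float   # usage the commitment could not absorb
    unused_commitment: float  # committed dollars that bought nothing


def bill_hour(on_demand_usage: float, commit_per_hour: float, discount: float) -> HourBill:
    """Apply a spend-based commitment to one hour of usage.

    on_demand_usage: the hour's usage priced at on-demand rates.
    commit_per_hour: committed spend per hour, in discounted dollars.
    discount: hypothetical flat discount the commitment buys, e.g. 0.5.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # Re-price the hour's usage at the discounted rate.
    discounted_usage = on_demand_usage * (1 - discount)
    # The commitment absorbs discounted usage up to its hourly size.
    absorbed = min(discounted_usage, commit_per_hour)
    # Anything left over is billed back at on-demand rates.
    overflow = (discounted_usage - absorbed) / (1 - discount)
    return HourBill(commit_per_hour, overflow, commit_per_hour - absorbed)


# A busy hour: the commitment is fully used and the rest spills to on-demand.
busy = bill_hour(on_demand_usage=30.0, commit_per_hour=10.0, discount=0.5)
# A quiet hour: half the commitment is paid for and goes unused.
quiet = bill_hour(on_demand_usage=10.0, commit_per_hour=10.0, discount=0.5)
```

The same three numbers (what you committed, what spilled over, what you wasted) describe an hour under any of the three clouds' spend-based plans, which is the point of the convergence.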

This wasn't planned. It's the result of eight years of customer pressure and provider iteration. But it landed. And it's actually a good place to be.

2. Then AI Showed Up

Now look at the AI side of the bill.

AWS Bedrock Provisioned Throughput — 1-month or 6-month commitments. Locked to a specific model. Billed hourly per Model Unit. You pay whether you use the capacity or not. The AWS docs effectively describe it as the RI model applied to inference. They aren't wrong — it's the same playbook AWS already replaced on the compute side.

Azure OpenAI Provisioned Throughput Units (PTUs) — Hourly, monthly, or annual. Locked to a specific model deployment. About $2,400 per month minimum. Up to 70% savings if you can sustain the usage. The committed-purchase model isn't even available to new customers or some newer models — Microsoft is still iterating on the structure.

Google Vertex AI Provisioned Throughput — 1 week, 1 month, 3 months, or 1 year. Locked to a model (Gemini 3 Pro and the supported list). Priced in Generative AI Scale Units (GSUs). Capacity reservation built in.

Anthropic Priority Tier — Sales-assisted. Committed spend. Negotiated. 99.5% uptime SLA. Tied to Claude models. Public pricing? Not really.

OpenAI Guaranteed Capacity — Launched May 19, 2026. 1, 2, or 3 years. Commitment in dollars per year. Capacity expressed in tokens per minute — up to 1B/min. Works across model families. Works across supported cloud providers. Use-it-or-lose-it.

Five vendors. Five different commitment vehicles. Different terms. Different units. Different lock-in profiles. Different procurement paths.

And if you noticed a pattern: four of the five are RI-flavored. They lock you to a model, they bill you for capacity whether you use it or not, and they're tied to one vendor.

OpenAI is the outlier. Its commitment unit is dollars per year, not throughput per model. It works across cloud providers. It mirrors the structure that cloud providers spent eight years arriving at. It's a Savings Plan for AI.

That's worth noticing. The newest AI commitment vehicle on the market is the one that looks most like the current state of cloud commitments. The earlier AI commitment vehicles look like 2015 cloud commitments. The market is repeating its own evolution in fast-forward — and OpenAI just leapfrogged to the end of the cycle.
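The pattern is easy to check mechanically. The sketch below encodes the five AI products as described above and applies one rule: if the commitment is locked to a model, call it RI-flavored. The `Commitment` type and the `style` rule are my own illustration, and I have simplified Anthropic to "not cross-cloud" even though the honest answer is closer to "sort of".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    vendor: str
    product: str
    unit: str
    model_locked: bool
    cross_cloud: bool


# Attributes taken from the product descriptions in this post.
AI_COMMITMENTS = [
    Commitment("AWS Bedrock", "Provisioned Throughput", "Model Units", True, False),
    Commitment("Azure", "OpenAI PTUs", "PTUs", True, False),
    Commitment("GCP", "Vertex AI Provisioned Throughput", "GSUs", True, False),
    Commitment("Anthropic", "Priority Tier", "Committed spend", True, False),  # simplified
    Commitment("OpenAI", "Guaranteed Capacity", "$ / year", False, True),
]


def style(c: Commitment) -> str:
    """Illustrative rule: model-locked commitments behave like RIs."""
    return "RI-flavored" if c.model_locked else "Savings Plan-style"


ri_flavored = [c.vendor for c in AI_COMMITMENTS if style(c) == "RI-flavored"]
outliers = [c.vendor for c in AI_COMMITMENTS if style(c) == "Savings Plan-style"]
```

Here `ri_flavored` holds four vendors and `outliers` holds only OpenAI, which matches the four-of-five split described above.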

3. The Comparison Nobody Has Made Yet

Put them side by side and the gap is hard to miss.

Vendor      | Product                          | Term          | Commitment unit    | Locked to a specific model? | Cross-cloud? | Capacity reservation?
------------+----------------------------------+---------------+--------------------+-----------------------------+--------------+----------------------
AWS         | Compute Savings Plan             | 1 or 3 yr     | $ / hour           | n/a                         | No           | No
AWS         | EC2 Instance Savings Plan        | 1 or 3 yr     | $ / hour           | n/a                         | No           | No
AWS         | Reserved Instance                | 1 or 3 yr     | Specific instance  | n/a                         | No           | Yes (zonal)
GCP         | Flex CUD                         | 1 or 3 yr     | $ / hour           | n/a                         | No           | No
GCP         | Resource-based CUD               | 1 or 3 yr     | Machine type       | n/a                         | No           | Yes
Azure       | Savings Plan for Compute         | 1 or 3 yr     | $ / hour           | n/a                         | No           | No
Azure       | Reserved VM Instance             | 1 or 3 yr     | Specific VM family | n/a                         | No           | Yes
AWS Bedrock | Provisioned Throughput           | 1 or 6 mo     | Model Units        | Yes                         | No           | Yes
Azure       | OpenAI PTUs                      | 1 mo or 1 yr  | PTUs               | Yes                         | No           | Yes
GCP         | Vertex AI Provisioned Throughput | 1 wk – 1 yr   | GSUs               | Yes                         | No           | Yes
Anthropic   | Priority Tier                    | Negotiated    | Committed spend    | Yes (Claude only)           | Sort of      | Sort of
OpenAI      | Guaranteed Capacity              | 1, 2, or 3 yr | $ / year           | No                          | Yes          | Yes

The top half — the cloud half — reads like rows of the same table. The same units. The same terms. The same trade-offs.

The bottom half — the AI half — reads like a different document for every vendor.

Different terms. Different units. Different lock-in. Different procurement.

And the table doesn't even capture the bigger problem: none of these AI commitments show up on your FinOps dashboard the way cloud commitments do. You can pull Bedrock Provisioned Throughput utilization from CloudWatch. PTU utilization from Azure Monitor. GSU utilization from Vertex AI metrics. Priority Tier and Guaranteed Capacity? You're working from a CSV.

4. Your FinOps Stack Wasn't Built for This

Here's the part no vendor will advertise.

Your FinOps tooling assumes one of two worlds. Either you're optimizing instance-tied commitments (the old RI / Azure RI / GCP CUD model), or you're optimizing spend-based commitments (the Savings Plan model). It probably handles both. Badly or well, it handles both.

But it almost certainly doesn't handle:

  • A model-tied AI commitment with a 6-month term whose break-even is 150 million tokens per month
  • A throughput reservation in Generative AI Scale Units whose utilization isn't visible in any cost dashboard you own
  • A negotiated Anthropic Priority Tier commitment that doesn't appear in any vendor's pricing page
  • A multi-year OpenAI Guaranteed Capacity commitment that applies discounts across both Azure and AWS line items

That's four different optimization problems. Each one requires its own utilization signal. Each one has its own break-even point. Each one breaks differently when usage shifts.
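The break-even point is the simplest of these to write down. A minimal sketch, assuming a flat monthly commitment cost and a flat on-demand price per million tokens: the $3,000 and $20 figures below are hypothetical, chosen only so the arithmetic lands on the 150 million tokens per month used as the example above. Real pricing differs by model and by input versus output tokens, and a real commitment also has a throughput ceiling that this ignores.

```python
def break_even_tokens_per_month(monthly_commit_cost: float,
                                on_demand_price_per_million: float) -> float:
    """Monthly token volume at which the commitment costs the same as on-demand."""
    if on_demand_price_per_million <= 0:
        raise ValueError("on-demand price must be positive")
    return monthly_commit_cost / on_demand_price_per_million * 1_000_000


def monthly_savings(actual_tokens: float,
                    monthly_commit_cost: float,
                    on_demand_price_per_million: float) -> float:
    """Positive when the commitment beat on-demand, negative when it lost."""
    on_demand_cost = actual_tokens / 1_000_000 * on_demand_price_per_million
    return on_demand_cost - monthly_commit_cost


# Hypothetical numbers: $3,000/month commitment, $20 per million tokens on-demand.
threshold = break_even_tokens_per_month(3_000.0, 20.0)
above = monthly_savings(200_000_000, 3_000.0, 20.0)  # usage above break-even
below = monthly_savings(100_000_000, 3_000.0, 20.0)  # usage below break-even
```

With these inputs the threshold is 150 million tokens, a 200-million-token month saves $1,000, and a 100-million-token month loses $1,000. The asymmetry is what matters: the loss is locked in for the rest of the term.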

If your cost tool only sees the line item after it lands on the bill, you're already three months behind. By the time you notice you're underutilizing a Bedrock Model Unit, you've burned half the commitment.

The real question isn't which commitment to buy. It's whether your stack can tell you, in real time, what you're getting for each one.
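As a rough sketch of what that signal could look like, assume you already have hourly utilization samples as fractions between 0 and 1, pulled from whatever source each vendor exposes. The function names, the 0.7 threshold, and the 24-hour window are illustrative assumptions, not a recommendation.

```python
def wasted_spend(hourly_utilization: list[float], hourly_cost: float) -> float:
    """Dollars paid for reserved capacity that sat idle."""
    total = 0.0
    for u in hourly_utilization:
        clamped = min(max(u, 0.0), 1.0)  # guard against noisy samples
        total += (1.0 - clamped) * hourly_cost
    return total


def underutilized(hourly_utilization: list[float],
                  threshold: float = 0.7,
                  window: int = 24) -> bool:
    """True if mean utilization over the most recent `window` hours is below `threshold`."""
    recent = hourly_utilization[-window:]
    if not recent:
        return False  # no data yet, nothing to flag
    return sum(recent) / len(recent) < threshold


# Three hours at 100%, 50%, and 0% utilization on a $10/hour reservation.
idle_dollars = wasted_spend([1.0, 0.5, 0.0], hourly_cost=10.0)
# A full day at 50% utilization trips the illustrative 70% threshold.
flagged = underutilized([0.5] * 24)
```

Run daily, a check like this surfaces underutilization in the first week of a term instead of on the third invoice.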

5. What This Means For The Next 12 Months

The cloud commitment market is done converging. AWS, GCP, and Azure have arrived at compatible answers. The few remaining holdouts (Azure legacy RIs, GCP resource-based CUDs) are being phased out. If you have a strong cloud-commitment playbook in 2026, that playbook will keep working in 2027.

The AI commitment market is in the opposite place. Every vendor is iterating in public. OpenAI just released the first AI Savings Plan-style product. The others will either copy that structure, differentiate harder around model-locked capacity reservations, or fall back on negotiated enterprise pricing. We will see significant changes in this space every quarter for the next two years.

For FinOps leaders, this means three things over the next four quarters:

The cloud commitment skill set you have today will stay relevant. Don't over-invest in chasing more cloud-commitment tooling — that market is mature.

The AI commitment skill set you need next is structurally different. Tokens-per-minute, Model Units, GSUs, and committed-spend tiers don't map onto vCPU-hours. The optimization muscle is new.

The cost-visibility gap between "AI committed" and "AI on-demand" is the next FinOps cliff. Every vendor wants you to commit. None of them want to make underutilization visible to you in real time.
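One way to start building that muscle is to convert every throughput-style commitment into an effective price per million tokens at the utilization you actually achieve. A minimal sketch, assuming a commitment expressed as dollars per year for a tokens-per-minute ceiling; the function name and the input figures are hypothetical, picked only to keep the arithmetic easy to follow.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def effective_price_per_million(annual_commit: float,
                                tokens_per_minute: float,
                                utilization: float) -> float:
    """Effective $ per million tokens for a throughput commitment.

    utilization is the fraction of reserved throughput actually consumed,
    averaged over the year.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_used = tokens_per_minute * MINUTES_PER_YEAR * utilization
    return annual_commit / (tokens_used / 1_000_000)


# Hypothetical: $525,600/year for 1M tokens per minute.
full = effective_price_per_million(525_600.0, 1_000_000, 1.0)
half = effective_price_per_million(525_600.0, 1_000_000, 0.5)
```

At full utilization this hypothetical commitment works out to $1 per million tokens; at 50% utilization it doubles to $2. That number is directly comparable to an on-demand price list, which a Model Unit or a GSU is not.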

At Finout, we've spent the last year extending the same multi-layer view that handled the cloud commitment sprawl to handle the AI commitment sprawl. The data layer matters more than ever because the unit of commitment keeps changing.

The cloud part of your bill just got easier to manage.

The AI part of your bill is about to get harder.

Are you watching the right one?