Everyone is shipping LLM features. Then month-end hits and someone asks the only question that matters: can we explain where the money went? Here’s a straight-up comparison of AWS Bedrock, Google Vertex AI, and Azure Cognitive/Azure OpenAI from a FinOps perspective. No hype, no pep talk -- just how each cloud stacks up on attribution, governance, and optimization.
| Dimension | AWS Bedrock | GCP Vertex AI | Azure Cognitive / Azure OpenAI |
| --- | --- | --- | --- |
| Billing model | Pay-as-you-go per model family via Bedrock; some models billed as Marketplace line items | Pay-as-you-go per SKU; training/hosting on managed or GCE | Pay-as-you-go per model; option for Provisioned Throughput Units (PTUs) for steady usage |
| Commitment levers | Commit on underlying infra if self-hosting (SageMaker, EC2, EKS); Bedrock itself largely pay-go | CUDs/SUDs apply to infra workloads; serverless tokens pay-go | PTU reservations for Azure OpenAI; standard VM commitments for self-hosted stacks |
| Attribution handles | Application Inference Profiles (AIPs) surface profile names in cost data if used; standard tags on self-host | Projects/accounts + labels; Pipelines metadata helps grouping; labels flow to BigQuery export | Subscriptions/resource groups + tags with inheritance; clean grouping by app or BU |
| Telemetry richness | CloudWatch metrics for invocations, tokens, latency; useful for cost-per-X when joined to CUR (sketch below) | Cloud Monitoring + Logging capture request metrics; join with BigQuery billing for unit economics | Azure Monitor and service logs expose calls, token usage, latency; easy to alert and dashboard |
| Cost data freshness | Cost updates multiple times per day; telemetry near real time | Billing export typically lags 24–48h; telemetry is near real time | Cost updates roughly daily; telemetry near real time |
| Quota mechanics | Per-model quotas; increases by request; multi-account fan-out is common at scale | Per-endpoint and regional limits; increases by request | Per-deployment limits; PTU provides dedicated capacity where approved |
| Shared resource allocation | AIPs make team or feature splits cleaner; otherwise allocate by usage metrics | Shared endpoints common; allocate by prediction counts or custom logs | Resource-per-app pattern + tags simplifies showback; or allocate from usage logs |
| Data residency & regions | Model availability varies by region; residency subject to model provider terms | Broad regional footprint; some model support varies | Azure OpenAI regions subject to approval and availability; PTU tied to region |
| Governance hooks | Budgets, Anomaly Detection, SCPs/IAM; programmatic guardrails via Lambda | Budgets + Pub/Sub + Functions for automated reactions; Org Policy for guardrails | Budgets, Cost Alerts, Policy, Automation Accounts or Functions for enforcement |
| Maturity for enterprise showback | Strong with AIPs if enforced; otherwise stitching required | Strong with disciplined projects + labels; stitching required for pure API usage | Strong out of the box with scopes and tags; PTU adds predictability for stable loads |
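One row worth making concrete before the bottom lines is telemetry on Bedrock: invocation and token counts land in CloudWatch per model, and pulling them on a schedule gives you the usage side of the cost-per-X join against the CUR. A minimal sketch, assuming the documented AWS/Bedrock namespace and ModelId dimension; the region and model ID are illustrative placeholders:

```python
# Minimal sketch: pull a week of daily Bedrock input-token counts from CloudWatch
# so they can be joined to CUR line items for cost-per-feature math.
# The region and model ID below are illustrative placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
    StartTime=start,
    EndTime=end,
    Period=86400,            # one datapoint per day
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]), "input tokens")
```

The same call with OutputTokenCount and Invocations fills out the usage picture; the join to the bill still happens in your own warehouse.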
Bottom line on attribution: all three can explain totals. Bedrock’s AIPs and Azure’s tag inheritance reduce friction; Vertex is excellent when project hygiene is strong.
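The AIP pattern is worth seeing end to end: you create one profile per team or feature, tag it, and have the application call the profile instead of the raw model. A minimal sketch with boto3; the profile name, tags, and foundation-model ARN are illustrative assumptions:

```python
# Minimal sketch: create an application inference profile per team/feature so that
# Bedrock usage shows up in cost data under a taggable identity.
# Profile name, tags, and model ARN are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_inference_profile(
    inferenceProfileName="search-summarizer-prod",
    description="Search team summarization feature",
    modelSource={
        # Copy from the underlying foundation model (or a system inference profile)
        "copyFrom": (
            "arn:aws:bedrock:us-east-1::foundation-model/"
            "anthropic.claude-3-haiku-20240307-v1:0"
        )
    },
    tags=[
        {"key": "team", "value": "search"},
        {"key": "cost-center", "value": "1234"},
    ],
)

print(response["inferenceProfileArn"])
```

Applications then pass the returned profile ARN as the modelId when they invoke, so spend lands under the profile’s name and cost allocation tags rather than under an anonymous model line.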
Bottom line on unit economics: billing plus telemetry is required everywhere; no provider’s bill alone is enough.
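On the Vertex side, the practical version of "billing + telemetry" is a scheduled join in BigQuery between the billing export and whatever request log your gateway writes. A minimal sketch, assuming a standard billing export dataset and a hypothetical my_app.llm_requests table that records per-request token counts and a feature label; the label key and table names are assumptions:

```python
# Minimal sketch: join the BigQuery billing export with an application-side token
# log to get cost per feature per day. The billing table ID and the
# my_app.llm_requests telemetry table are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

query = """
WITH daily_cost AS (
  SELECT
    DATE(usage_start_time) AS day,
    (SELECT value FROM UNNEST(labels) WHERE key = 'feature') AS feature,
    SUM(cost) AS cost_usd
  FROM `finops.billing_export.gcp_billing_export_v1_XXXXXX`
  WHERE service.description = 'Vertex AI'
  GROUP BY day, feature
),
daily_usage AS (
  SELECT DATE(request_time) AS day, feature, SUM(total_tokens) AS tokens
  FROM `my_app.llm_requests`  -- hypothetical telemetry table
  GROUP BY day, feature
)
SELECT
  c.day,
  c.feature,
  c.cost_usd,
  u.tokens,
  SAFE_DIVIDE(c.cost_usd, u.tokens) * 1000 AS cost_per_1k_tokens
FROM daily_cost AS c
JOIN daily_usage AS u USING (day, feature)
ORDER BY c.day
"""

for row in client.query(query).result():
    print(row.day, row.feature, row.cost_usd, row.cost_per_1k_tokens)
```

The same shape works against the CUR on AWS or cost exports on Azure; only the export tables change.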
Bottom line on governance: all three support alert-and-act loops; the work is in connecting usage signals to automatic brakes before finance notices.
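The GCP governance row names the pieces of one such loop: a budget publishes to Pub/Sub and a Cloud Function decides whether to act. A minimal sketch of that brake; the project, service-account name, and the choice of disabling the gateway's credentials as the action are all illustrative assumptions, not the only option:

```python
# Minimal sketch: a Cloud Function (2nd gen) subscribed to a budget's Pub/Sub topic.
# When spend crosses the budget, it applies a brake. Disabling the LLM gateway's
# service account is an illustrative choice; project and account are placeholders.
import base64
import json

import functions_framework
from googleapiclient import discovery


@functions_framework.cloud_event
def on_budget_alert(cloud_event):
    # Budget notifications arrive as base64-encoded JSON in the Pub/Sub message.
    msg = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))
    cost = msg["costAmount"]
    budget = msg["budgetAmount"]

    if cost < budget:
        print(f"{msg['budgetDisplayName']}: {cost}/{budget} spent, no action")
        return

    # Illustrative brake: disable the service account the LLM gateway uses, which
    # stops new Vertex AI calls until someone re-enables it deliberately.
    iam = discovery.build("iam", "v1")
    iam.projects().serviceAccounts().disable(
        name=(
            "projects/my-project/serviceAccounts/"
            "llm-gateway@my-project.iam.gserviceaccount.com"
        )
    ).execute()
    print(f"{msg['budgetDisplayName']}: over budget, gateway credentials disabled")
```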
All three providers can support rigorous FinOps for AI with the same equation: billing for cost, telemetry for truth, standards for attribution.
Different roads, same destination -- explainable, governable, and optimizable AI spend.