Everyone is shipping LLM features. Then month-end hits and someone asks the only question that matters: can we explain where the money went? Here’s a straight-up comparison of AWS Bedrock, Google Vertex AI, and Azure Cognitive/Azure OpenAI from a FinOps perspective. No hype, no pep talk -- just how each cloud stacks up on attribution, governance, and optimization.
| Dimension | AWS Bedrock | GCP Vertex AI | Azure Cognitive / Azure OpenAI |
| --- | --- | --- | --- |
| Billing model | Pay-as-you-go per model family via Bedrock; some models billed as Marketplace line items | Pay-as-you-go per SKU; training/hosting on managed or GCE | Pay-as-you-go per model; option for Provisioned Throughput Units (PTUs) for steady usage |
| Commitment levers | Commit on underlying infra if self-hosting (SageMaker, EC2, EKS); Bedrock itself largely pay-go | CUDs/SUDs apply to infra workloads; serverless tokens pay-go | PTU reservations for Azure OpenAI; standard VM commitments for self-hosted stacks |
| Attribution handles | Application Inference Profiles (AIPs) surface profile names in cost data if used; standard tags on self-host | Projects/accounts + labels; Pipelines metadata helps grouping; labels flow to BigQuery export | Subscriptions/resource groups + tags with inheritance; clean grouping by app or BU |
| Telemetry richness | CloudWatch metrics for invocations, tokens, latency; useful for cost-per-X when joined to CUR (sketch below) | Cloud Monitoring + Logging capture request metrics; join with BigQuery billing for unit economics | Azure Monitor and service logs expose calls, token usage, latency; easy to alert and dashboard |
| Cost data freshness | Cost updates multiple times per day; telemetry near real time | Billing export typically lags 24–48h; telemetry is near real time | Cost updates roughly daily; telemetry near real time |
| Quota mechanics | Per-model quotas; increases by request; multi-account fan-out is common at scale | Per-endpoint and regional limits; increases by request | Per-deployment limits; PTU provides dedicated capacity where approved |
| Shared resource allocation | AIPs make team or feature splits cleaner; otherwise allocate by usage metrics | Shared endpoints common; allocate by prediction counts or custom logs | Resource-per-app pattern + tags simplifies showback; or allocate from usage logs |
| Data residency & regions | Model availability varies by region; residency subject to model provider terms | Broad regional footprint; some model support varies | Azure OpenAI regions subject to approval and availability; PTU tied to region |
| Governance hooks | Budgets, Anomaly Detection, SCPs/IAM; programmatic guardrails via Lambda | Budgets + Pub/Sub + Functions for automated reactions; Org Policy for guardrails | Budgets, Cost Alerts, Policy, Automation Accounts or Functions for enforcement |
| Maturity for enterprise showback | Strong with AIPs if enforced; otherwise stitching required | Strong with disciplined projects + labels; stitching required for pure API usage | Strong out of the box with scopes and tags; PTU adds predictability for stable loads |
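One row worth making concrete before the bottom lines is telemetry on Bedrock: invocation and token counts land in CloudWatch per model, and pulling them on a schedule gives you the usage side of the cost-per-X join against the CUR. A minimal sketch, assuming the documented AWS/Bedrock namespace and ModelId dimension; the region and model ID are illustrative placeholders:

```python
# Minimal sketch: pull a week of daily Bedrock input-token counts from CloudWatch
# so they can be joined to CUR line items for cost-per-feature math.
# The region and model ID below are illustrative placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
    StartTime=start,
    EndTime=end,
    Period=86400,            # one datapoint per day
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]), "input tokens")
```

The same call with OutputTokenCount and Invocations fills out the usage picture; the join to the bill still happens in your own warehouse.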
Bottom line on attribution: all three can explain totals. Bedrock’s AIPs and Azure’s tag inheritance reduce friction; Vertex is excellent when project hygiene is strong.
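The AIP pattern is worth seeing end to end: you create one profile per team or feature, tag it, and have the application call the profile instead of the raw model. A minimal sketch with boto3; the profile name, tags, and foundation-model ARN are illustrative assumptions:

```python
# Minimal sketch: create an application inference profile per team/feature so that
# Bedrock usage shows up in cost data under a taggable identity.
# Profile name, tags, and model ARN are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_inference_profile(
    inferenceProfileName="search-summarizer-prod",
    description="Search team summarization feature",
    modelSource={
        # Copy from the underlying foundation model (or a system inference profile)
        "copyFrom": (
            "arn:aws:bedrock:us-east-1::foundation-model/"
            "anthropic.claude-3-haiku-20240307-v1:0"
        )
    },
    tags=[
        {"key": "team", "value": "search"},
        {"key": "cost-center", "value": "1234"},
    ],
)

print(response["inferenceProfileArn"])
```

Applications then pass the returned profile ARN as the modelId when they invoke, so spend lands under the profile’s name and cost allocation tags rather than under an anonymous model line.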
Bottom line on unit economics: billing plus telemetry is required everywhere; no provider’s bill alone is enough.
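On the Vertex side, the practical version of "billing + telemetry" is a scheduled join in BigQuery between the billing export and whatever request log your gateway writes. A minimal sketch, assuming a standard billing export dataset and a hypothetical my_app.llm_requests table that records per-request token counts and a feature label; the label key and table names are assumptions:

```python
# Minimal sketch: join the BigQuery billing export with an application-side token
# log to get cost per feature per day. The billing table ID and the
# my_app.llm_requests telemetry table are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

query = """
WITH daily_cost AS (
  SELECT
    DATE(usage_start_time) AS day,
    (SELECT value FROM UNNEST(labels) WHERE key = 'feature') AS feature,
    SUM(cost) AS cost_usd
  FROM `finops.billing_export.gcp_billing_export_v1_XXXXXX`
  WHERE service.description = 'Vertex AI'
  GROUP BY day, feature
),
daily_usage AS (
  SELECT DATE(request_time) AS day, feature, SUM(total_tokens) AS tokens
  FROM `my_app.llm_requests`  -- hypothetical telemetry table
  GROUP BY day, feature
)
SELECT
  c.day,
  c.feature,
  c.cost_usd,
  u.tokens,
  SAFE_DIVIDE(c.cost_usd, u.tokens) * 1000 AS cost_per_1k_tokens
FROM daily_cost AS c
JOIN daily_usage AS u USING (day, feature)
ORDER BY c.day
"""

for row in client.query(query).result():
    print(row.day, row.feature, row.cost_usd, row.cost_per_1k_tokens)
```

The same shape works against the CUR on AWS or cost exports on Azure; only the export tables change.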
Bottom line on governance: all three support alert-and-act loops; the work is in connecting usage signals to automatic brakes before finance notices.
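The GCP governance row names the pieces of one such loop: a budget publishes to Pub/Sub and a Cloud Function decides whether to act. A minimal sketch of that brake; the project, service-account name, and the choice of disabling the gateway's credentials as the action are all illustrative assumptions, not the only option:

```python
# Minimal sketch: a Cloud Function (2nd gen) subscribed to a budget's Pub/Sub topic.
# When spend crosses the budget, it applies a brake. Disabling the LLM gateway's
# service account is an illustrative choice; project and account are placeholders.
import base64
import json

import functions_framework
from googleapiclient import discovery


@functions_framework.cloud_event
def on_budget_alert(cloud_event):
    # Budget notifications arrive as base64-encoded JSON in the Pub/Sub message.
    msg = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))
    cost = msg["costAmount"]
    budget = msg["budgetAmount"]

    if cost < budget:
        print(f"{msg['budgetDisplayName']}: {cost}/{budget} spent, no action")
        return

    # Illustrative brake: disable the service account the LLM gateway uses, which
    # stops new Vertex AI calls until someone re-enables it deliberately.
    iam = discovery.build("iam", "v1")
    iam.projects().serviceAccounts().disable(
        name=(
            "projects/my-project/serviceAccounts/"
            "llm-gateway@my-project.iam.gserviceaccount.com"
        )
    ).execute()
    print(f"{msg['budgetDisplayName']}: over budget, gateway credentials disabled")
```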
All three providers can support rigorous FinOps for AI with the same equation: billing for cost, telemetry for truth, standards for attribution.
Different roads, same destination -- explainable, governable, and optimizable AI spend.