You’ve signed the Bedrock contracts. The models are running. And now you’re staring at your AWS CUR wondering:
Why does my generative AI cost show up as one line item? Where’s the detail?
Welcome to the new frontier of FinOps for AI.
While most people focus on GPU hours or token pricing, the smartest teams have already realized that effective AWS Bedrock cost management hinges on one feature:
Application Inference Profiles.
Inference Profiles let you logically group and label Bedrock usage—think Kubernetes namespaces or EC2 tags, but built for AI inference.
Each Bedrock invocation can carry a profile to segment workloads by:
Team — growth_team_profile
Feature — summarization_v3_profile
Business Unit — customer_support_genAI
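Profiles like these are created through the Bedrock control-plane API (`CreateInferenceProfile`). The sketch below builds the request parameters as a plain dict so the shape is easy to inspect; the profile name, model ARN, and tag keys are hypothetical examples, and the field names follow the AWS API as documented, so verify them against your boto3 version:

```python
def inference_profile_request(name: str, model_arn: str, tags: dict) -> dict:
    """Build parameters for Bedrock's CreateInferenceProfile call.

    Field names follow the AWS API docs (inferenceProfileName,
    modelSource.copyFrom, lowercase tag key/value); verify against
    your boto3/botocore version before relying on them.
    """
    return {
        "inferenceProfileName": name,
        "modelSource": {"copyFrom": model_arn},
        "tags": [{"key": k, "value": v} for k, v in sorted(tags.items())],
    }

# Hypothetical example: a profile for the growth team's summarization feature.
params = inference_profile_request(
    "growth_team_profile",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
    {"team": "growth", "feature": "summarization_v3"},
)
# To actually create it (requires AWS credentials):
#   import boto3
#   boto3.client("bedrock").create_inference_profile(**params)
```

Invocations then pass the returned profile ARN as the model identifier, which is what surfaces in the CUR.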
Without profiles, every inference rolls into a single, untraceable bucket.
With them, you can:
Allocate spend by team or feature
Chargeback accurately
Detect anomalies by workload
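Mechanically, the allocation step is a group-by over CUR line items on the profile identifier. A minimal sketch with hypothetical, simplified rows (a real CUR has many more columns, and the profile ARN appears in the line item's resource fields):

```python
from collections import defaultdict

# Hypothetical, simplified CUR line items.
cur_rows = [
    {"profile": "growth_team_profile", "unblended_cost": 12.40},
    {"profile": "customer_support_genAI", "unblended_cost": 31.75},
    {"profile": "growth_team_profile", "unblended_cost": 7.10},
    {"profile": None, "unblended_cost": 5.00},  # invocation that bypassed profiles
]

def spend_by_profile(rows):
    """Sum unblended cost per profile; untagged rows land in 'unattributed'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["profile"] or "unattributed"] += row["unblended_cost"]
    return dict(totals)

print(spend_by_profile(cur_rows))
```

Note that anything invoked without a profile collapses into the `unattributed` bucket, which is exactly the blind spot discussed below.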
Profile names appear directly in the AWS Cost and Usage Report (CUR). But while that’s powerful, it’s only half the visibility story.
The CUR includes profile names, regions, SKUs, and total usage metrics.
Useful, but it omits the real drivers of AI cost.
| Metric | In CUR? | Why It Matters |
|---|---|---|
| Input/Output Token Counts | ❌ | Measure unit economics per request |
| Invocation & Model Latency | ❌ | Correlate cost and performance |
| Per-Request Invocations | ❌ | Analyze usage efficiency |
| User or Session Mapping | ❌ | Tie spend to business context |
That missing data lives in CloudWatch, not in the CUR.
So unless you merge both datasets, you’re leaving optimization insights on the table.
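The merge itself is a join on the profile key: spend from the CUR, token volume from CloudWatch (Bedrock publishes metrics such as `InputTokenCount` and `OutputTokenCount`; confirm the metric names in your account). A minimal sketch with hypothetical numbers:

```python
# Hypothetical per-profile data: spend aggregated from the CUR, token
# counts pulled from CloudWatch's Bedrock metrics (metric names per AWS
# docs -- verify in your account).
cur_spend = {"growth_team_profile": 19.50, "customer_support_genAI": 31.75}
cw_tokens = {"growth_team_profile": 2_600_000, "customer_support_genAI": 9_100_000}

def cost_per_1k_tokens(spend, tokens):
    """Join spend with token volume to get a unit rate per profile."""
    return {
        profile: round(spend[profile] / (tokens[profile] / 1000), 4)
        for profile in spend
        if tokens.get(profile)  # skip profiles with no telemetry
    }

print(cost_per_1k_tokens(cur_spend, cw_tokens))
# growth_team_profile: 0.0075, customer_support_genAI: 0.0035
```

A unit rate like this is what lets you spot a profile whose cost per token drifts, even when its total spend looks normal.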
Inference Profiles are essential—but they come with structural limits:
Profile Caps
AWS enforces roughly 1,000 profiles per account per region. For enterprise-scale Bedrock workloads, you’ll hit that ceiling fast.
Granularity Trade-offs
Profiles act as buckets, not scalpels. When two workloads share one profile, their costs blur into a single, unattributable number.
Untracked Usage
Direct API calls or vendor integrations often bypass profiles. AWS doesn’t retroactively tag them—missed attribution is permanent.
Opaque Billing Units
Bedrock bills in “units,” not tokens. Two identical units can represent wildly different workloads. Without telemetry, cost optimization is blindfolded.
Operational Overhead
Profiles lack a UI. Managing them requires CLI or SDK scripting—an unnecessary tax on FinOps teams.
Attribution assigns cost where it occurs.
But AI workloads are inherently shared—multi-tenant APIs, cross-functional models, and common infrastructure make it impossible to cleanly map spend to one owner.
That’s where reallocation comes in: the FinOps practice of distributing shared costs based on actual consumption signals.
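Mechanically, reallocation is a proportional split: take a shared cost pool and divide it by each consumer's share of a usage signal (tokens, invocations, requests). A minimal sketch with hypothetical numbers:

```python
def reallocate(shared_cost: float, usage: dict) -> dict:
    """Split a shared cost pool proportionally to a consumption signal."""
    total = sum(usage.values())
    if total == 0:
        raise ValueError("no usage to allocate against")
    return {owner: round(shared_cost * amt / total, 2) for owner, amt in usage.items()}

# Hypothetical: $500 of shared Bedrock spend split by token consumption.
split = reallocate(
    500.0,
    {"team_growth": 2_000_000, "team_support": 6_000_000, "team_data": 2_000_000},
)
print(split)
# team_growth: 100.0, team_support: 300.0, team_data: 100.0
```

A production allocator also has to handle rounding residue so the split sums back to the pool, and pick sensible fallbacks when a consumer has no usage signal at all.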
AWS doesn’t offer that natively.
Finout does.
Finout connects your CUR, CloudWatch, and business context to reveal the real economics of AI workloads.
We:
Parse Application Inference Profiles automatically from your CUR
Enrich them with CloudWatch metrics like token counts, invocation latency, and throughput
Map those to business dimensions like teams, features, or customers
Apply Virtual Tags—our dynamic reallocation engine—to fill attribution gaps
With this, FinOps teams can model:
Cost per report generated
Cost per customer question answered
Cost per token or per output
You get actionable intelligence—not just billing data.
Want to see if your customer support GPT cost 12x more last week?
Or if latency spikes doubled your unit cost?
Finout gives you that visibility.
Inference Profiles are the best native feature AWS offers for Bedrock cost attribution.
But they’re capped, coarse, and incomplete.
To get full showback and chargeback for AI, you need reallocation—a flexible layer that bridges AWS’s gaps with business context.
With Finout, you can:
Retroactively tag and reallocate unattributed spend
Stitch cost with telemetry for complete visibility
Maintain accurate showback across teams, tenants, and products
If you’re using Bedrock without Inference Profiles, you’re flying blind.
If you’re relying only on the CUR, you’re seeing half the picture.
And if you’re not reallocating shared costs, you’re not doing FinOps for AI—you’re just paying bills.
Inference Profiles give you visibility.
Reallocation gives you truth.
Together, they give you control.
That’s what we built Finout for.