The Hidden Superpower of Bedrock Cost Allocation: Application Inference Profiles

You’ve bought the hype. You’ve signed the Bedrock contracts. And now you’re staring at your AWS Cost and Usage Report (CUR), wondering: Why is my generative AI cost a single line item? Where’s the insight? Where’s the granularity?
Welcome to the new frontier of FinOps for AI.
While most people are still obsessing over GPU hours and token pricing, the smartest teams I talk to have already realized that making Bedrock costs actionable boils down to one AWS feature:
Application Inference Profiles.
What Are Application Inference Profiles?
At a high level, they’re a way to logically group and label your Bedrock usage. Think of them as a “Kubernetes namespace” or an “EC2 tag” — but made specifically for AI inference.
With each Bedrock invocation, you can attach a profile to segment workloads, whether by:
- Team (growth_team_profile)
- Feature (summarization_v3_profile)
- Business unit (customer_support_genAI)
…the point is simple: if you don’t use profiles, you can’t allocate cost beyond a generic bucket.
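To make that concrete, here’s a minimal sketch of the flow with boto3: create an application inference profile, then invoke the model through the profile’s ARN so usage gets attributed to it. The profile name, tags, region, and model ARN below are illustrative stand-ins for your own.

```python
import boto3

# Control plane: create an application inference profile that tracks a
# foundation model and carries cost allocation tags.
bedrock = boto3.client("bedrock", region_name="us-east-1")

profile = bedrock.create_inference_profile(
    inferenceProfileName="summarization_v3_profile",
    description="Summarization feature, owned by the growth team",
    modelSource={
        # ARN of the foundation model (or system-defined inference
        # profile) this application profile points at -- example value.
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
    },
    tags=[
        {"key": "team", "value": "growth"},
        {"key": "feature", "value": "summarization_v3"},
    ],
)
profile_arn = profile["inferenceProfileArn"]

# Data plane: invoke through the profile ARN instead of the raw model ID,
# so every call is attributed to the profile.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.converse(
    modelId=profile_arn,
    messages=[{"role": "user", "content": [{"text": "Summarize: ..."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```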
And yes — profile names are exposed in the CUR under usage records. That’s huge. It means:
- You can reallocate by team
- You can chargeback by feature
- You can filter anomalies by AI workload
But only if you actually use the feature and enforce naming standards. Garbage in, garbage out.
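Enforcement can be as simple as a periodic audit. Here’s a sketch, assuming a hypothetical `<team>_<feature>_profile` naming convention, that lists your application inference profiles and flags names that don’t conform:

```python
import re
import boto3

# Hypothetical convention: <team>_<feature>_profile, lowercase snake_case.
NAMING = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*_profile$")

bedrock = boto3.client("bedrock", region_name="us-east-1")

kwargs = {"typeEquals": "APPLICATION"}
while True:
    page = bedrock.list_inference_profiles(**kwargs)
    for p in page["inferenceProfileSummaries"]:
        name = p["inferenceProfileName"]
        if not NAMING.match(name):
            print(f"Non-conforming profile: {name} ({p['inferenceProfileArn']})")
    if "nextToken" not in page:
        break
    kwargs["nextToken"] = page["nextToken"]
```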
What the CUR Gives You — and What It Doesn’t
Let’s get something straight: the CUR is only the beginning of the story. It gives you:
- Profile name (if used)
- Region, service, SKU
- Aggregate usage and pricing by usage type (e.g., tokens generated, invocation units)
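If your CUR lands in Parquet (e.g., via Data Exports), a first allocation cut can look something like the sketch below. The column names follow the CUR 2.0 convention, and the key assumption is that resource IDs are enabled, so Bedrock lines invoked through an application inference profile carry the profile’s ARN:

```python
import pandas as pd

# Hypothetical CUR 2.0 extract; column names may differ in your setup.
cur = pd.read_parquet("cur_extract.parquet")

# Filter to Bedrock line items.
bedrock = cur[cur["line_item_product_code"] == "AmazonBedrock"]

# With resource IDs enabled, usage invoked through an application
# inference profile should carry the profile ARN as the resource ID.
by_profile = (
    bedrock.groupby("line_item_resource_id")["line_item_unblended_cost"]
    .sum()
    .sort_values(ascending=False)
)
print(by_profile.head(10))
```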
That’s useful — but here’s what you’re not getting:
| Metric | In CUR? | Why it matters |
| --- | --- | --- |
| InputTokenCount, OutputTokenCount per call | ❌ | Unit economics per user or request |
| InvocationLatency, ModelLatency | ❌ | Performance vs. cost trade-offs |
| NumberOfInvocations (granular) | ❌ | Request-level analysis |
| Mapping to user/org/session context | ❌ | True business-level attribution |
All of those live in CloudWatch, not in the CUR. Which means you’re either:
- Pulling and stitching both sources yourself (good luck),
- Or ignoring half the data that could drive smarter decisions.
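The stitching is doable, though. Here’s a minimal sketch of the CloudWatch side, pulling daily input-token sums from the standard AWS/Bedrock runtime metrics; the ModelId dimension value is a placeholder you’d swap for your own model or profile identifier:

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=7)

# Daily input-token totals for one model over the past week.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="InputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
    StartTime=start,
    EndTime=end,
    Period=86400,  # one-day buckets
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), int(point["Sum"]), "input tokens")
```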
Finout’s Value: Turning AI Billing into Intelligence
This is where Finout kicks in.
We:
- Ingest your CUR with Application Inference Profiles pre-parsed and allocated.
- Augment with CloudWatch metrics to enrich cost records with token counts, latencies, and invocation detail.
- Allow you to define business logic mappings between profiles and cost centers (teams, features, customers).
- Help you model unit costs per output, cost per generated report, or cost per customer question answered.
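As a toy illustration of that last point, unit economics reduce to a join between the two sources: cost per profile from the CUR, divided by invocation counts from CloudWatch. The numbers here are made up:

```python
# Toy unit-economics join (illustrative figures only).
monthly_cost_usd = {"customer_support_genAI": 18_400.0}    # from CUR, by profile
monthly_invocations = {"customer_support_genAI": 412_000}  # from CloudWatch

for profile, cost in monthly_cost_usd.items():
    calls = monthly_invocations[profile]
    print(f"{profile}: ${cost / calls:.4f} per answered question")
```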
Want to see which LLMs are running hot? Want to know if your customer support GPT cost 12x more last week? Want to stop burning $80K/month on a model no one’s using?
You need profiles, telemetry, and a brain on top of them. That’s what we built Finout for.
Final Take
If you’re using Bedrock without Inference Profiles, you’re flying blind.
If you’re relying only on CUR, you’re seeing half the picture.
If you’re not connecting the dots between cost, usage, and value — well, you’re not doing FinOps for AI. You’re just paying bills.
With Finout, you don’t just track cost. You understand it.
And in the age of $1M+ AI cloud bills, that understanding is the difference between innovation and incineration.