FAQs

01 How does AWS Bedrock pricing work?

AWS Bedrock pricing is based on model inference charges calculated per token processed. You pay for input tokens (prompt) and output tokens (response) separately, with different rates for each. Pricing varies by model provider (Anthropic, Cohere, Meta, etc.) and model size, with larger models typically costing more per token.

02 What are input and output tokens, and why do they have different prices?

Input tokens represent your prompt or question sent to the model, while output tokens are the model's response. Output tokens typically cost 3-4x more than input tokens because generation is inherently more expensive: the model processes the whole prompt in a single parallel pass, but must produce the response one token at a time, running a full forward pass for each. This pricing structure encourages efficient prompt engineering.

03 What's the difference between on-demand and provisioned throughput pricing?

On-demand pricing charges per token with no upfront commitment, ideal for variable or unpredictable workloads. Provisioned throughput requires purchasing dedicated capacity (measured in model units) with hourly charges, offering cost savings of 20-50% for consistent, high-volume usage patterns.
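As a rough way to reason about the break-even point between the two models, you can compare projected monthly costs. The sketch below uses hypothetical rates and volumes (the per-token rate, hourly model-unit rate, and token volume are placeholders, not actual AWS Bedrock prices):

```python
# Sketch: compare monthly cost of on-demand vs provisioned throughput.
# All rates and volumes below are hypothetical placeholders --
# check the AWS Bedrock pricing page for real figures.

def on_demand_monthly(tokens_per_month: int, rate_per_token: float) -> float:
    """On-demand: pay per token processed, no commitment."""
    return tokens_per_month * rate_per_token

def provisioned_monthly(model_units: int, hourly_rate: float,
                        hours: int = 730) -> float:
    """Provisioned throughput: pay per model unit per hour,
    regardless of how many tokens you actually process."""
    return model_units * hourly_rate * hours

# Hypothetical: 2B tokens/month at a blended $0.00001/token
demand = on_demand_monthly(2_000_000_000, 0.00001)      # $20,000
reserved = provisioned_monthly(1, 20.0)                 # $14,600
savings = (demand - reserved) / demand
print(f"on-demand ${demand:,.0f} vs provisioned ${reserved:,.0f} "
      f"({savings:.0%} savings)")
```

With these placeholder numbers, provisioned capacity saves about 27%, which falls inside the 20-50% range mentioned above; at lower volumes the comparison flips in favor of on-demand.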

04 How do I calculate the cost for a specific request?

Multiply your input tokens by the input token rate, output tokens by the output token rate, then sum both. For example, at hypothetical per-token rates: 1,000 input tokens × $0.0003 + 500 output tokens × $0.0015 = $0.30 + $0.75 = $1.05 total. Use AWS Bedrock's token counting API or model-specific tokenizers for accurate estimates.
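The calculation above can be sketched as a small helper. The rates are the hypothetical figures from the example, not actual AWS Bedrock prices:

```python
# Sketch: per-request cost from token counts and per-token rates.
# Rates are the hypothetical example figures, not real Bedrock prices.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Input and output tokens are billed separately; sum both charges."""
    return input_tokens * input_rate + output_tokens * output_rate

cost = request_cost(1_000, 500, input_rate=0.0003, output_rate=0.0015)
print(f"${cost:.2f}")  # $0.30 input + $0.75 output = $1.05
```

In practice you would look up the current per-token (usually quoted per 1,000 tokens) rates for your chosen model and region before applying this formula.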