FAQs

01 How does Snowflake Cortex pricing work?

Snowflake Cortex uses a consumption-based model. Most LLM functions (like COMPLETE or SUMMARIZE) are billed per million tokens processed. However, specialized services like Cortex Search also incur a "serving cost" based on the size of your index (GB/month), and Cortex Analyst may involve warehouse compute for executing the SQL it generates.

02 What is the difference between input and output tokens?

Input tokens represent the data you send to the model (your prompt or document), while output tokens are the model’s generated response. Output tokens are often more expensive or contribute more to total cost because generating text requires more compute than reading it. Snowflake converts this total token usage into Snowflake Credits.
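As a sketch, this conversion from token counts to credits can be expressed in a few lines of Python. The per-million-token rates below are hypothetical placeholders, not Snowflake's actual prices; look up the real per-model rates in the Snowflake Service Consumption Table:

```python
# Hypothetical rates for illustration only -- real per-model rates
# come from the Snowflake Service Consumption Table.
CREDITS_PER_M_INPUT_TOKENS = 0.19   # assumed input-token rate
CREDITS_PER_M_OUTPUT_TOKENS = 0.25  # assumed (higher) output-token rate

def call_credits(input_tokens: int, output_tokens: int) -> float:
    """Convert one LLM call's token usage into Snowflake Credits."""
    return (input_tokens / 1_000_000) * CREDITS_PER_M_INPUT_TOKENS + \
           (output_tokens / 1_000_000) * CREDITS_PER_M_OUTPUT_TOKENS
```

Separating the two rates mirrors the point above: a response-heavy workload (long generations from short prompts) costs more per token than a summarization workload that mostly reads input.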

03 Does Cortex use my Virtual Warehouse credits?

It depends on the function. LLM functions are serverless and bill token usage directly as credits. However, Cortex Search uses your virtual warehouse to refresh and build the index, while serving (keeping the index online) is billed as a separate serverless charge. Always check whether your specific AI task requires an active warehouse for orchestration.

04 How do I calculate the cost for a large-scale batch job?

To estimate costs for millions of rows, multiply the average tokens per row (Input + Output) by your total record count, then apply the credit rate for your chosen model (e.g., Llama 3.1 vs. Claude 3.5 Sonnet). Remember that 1,000 tokens are roughly equivalent to 750 words.
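The estimate above can be written out directly. The credit rate in the example run is a hypothetical placeholder, not a published Snowflake price:

```python
def batch_cost_credits(rows: int, avg_input_tokens: float,
                       avg_output_tokens: float,
                       credits_per_m_tokens: float) -> float:
    """Estimate total credits for a batch LLM job.

    credits_per_m_tokens is the chosen model's rate from Snowflake's
    consumption table (rates differ by model, e.g. Llama 3.1 vs.
    Claude 3.5 Sonnet).
    """
    total_tokens = rows * (avg_input_tokens + avg_output_tokens)
    return (total_tokens / 1_000_000) * credits_per_m_tokens

# Example: 2M rows at ~400 input + 100 output tokens each, with a
# hypothetical rate of 0.20 credits per million tokens:
# 2,000,000 * 500 = 1 billion tokens -> about 200 credits.
```

Running the estimate at a few candidate rates before launching the job makes the cost difference between models explicit.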

05 Are there "idle costs" for Snowflake AI services?

Yes, specifically for Cortex Search. Once a search service is created and resumed, it incurs a continuous serving fee based on the gigabytes of indexed data, even if no queries are being run. LLM functions, conversely, only charge you when you execute a call.
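A minimal sketch of that serving fee, assuming simple proration by the hours the service is resumed (an illustration of the GB-based charge, not Snowflake's exact billing formula; the per-GB-month rate must come from the consumption table):

```python
def serving_credits(index_gb: float, credits_per_gb_month: float,
                    hours_resumed: float,
                    hours_in_month: float = 730) -> float:
    """Estimate the Cortex Search serving fee for one billing month.

    The fee depends only on indexed gigabytes and time resumed,
    not on query volume -- the "idle cost" described above.
    """
    return index_gb * credits_per_gb_month * (hours_resumed / hours_in_month)
```

Suspending a search service you are not querying drives `hours_resumed` (and the fee) toward zero, which is the main lever for controlling this cost.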