How Does Google Price Vertex AI?
Google Vertex AI is a managed machine learning (ML) platform covering the end-to-end development lifecycle of AI and ML projects on Google Cloud. It provides tools for data preparation, model training, hyperparameter tuning, deployment, and monitoring within a unified interface.
Vertex AI pricing varies by service, with core costs driven by model training, prediction (online and batch), and storage. For example, custom training is typically priced per node-hour, predictions are priced by node-hours or per 1,000 predictions, and generative AI models are priced per 1 million tokens. Other items, such as pipeline runs ($0.03 per run), data preprocessing, and specialized services like Vertex AI Search or Vision AI, have their own pricing structures.
This is part of a series of articles about AI costs.
In this article:
- Google Vertex AI Pricing: Generative and Agentic AI
- Google Vertex AI Pricing: Model Training
- Google Vertex AI Pricing: Forecasting and Prediction
- Google Vertex AI Pricing: Supporting AI Services
- Cost Optimization Strategies for Vertex AI
Google Vertex AI Pricing: Generative and Agentic AI
1. Generative AI
Google Vertex AI offers a flexible pricing model for generative AI services, allowing developers and enterprises to choose from several Gemini model variants based on performance and cost requirements.
The platform supports a variety of input and output types including text, images, audio, and video, with pricing determined by model type, token volume, and context length. In addition, optional features like grounding with Google Search or Maps come with separate usage-based charges.
Below is a breakdown of generative AI pricing on Vertex AI. Unless noted otherwise, prices are in USD per 1 million tokens.
Gemini 3 (Text/Image/Video/Audio)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $2.00 | $4.00 | $0.20 | $0.40 | $1.00 | $2.00 |
| Text Output | $12.00 | $18.00 | N/A | N/A | $6.00 | $9.00 |
| Image Output | $120.00 | N/A | N/A | N/A | $60.00 | N/A |
Gemini 2.5 Pro (Text, Image, Video, Audio Inputs)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $1.25 | $2.50 | $0.125 | $0.250 | $0.625 | $1.25 |
| Text Output | $10.00 | $15.00 | N/A | N/A | $5.00 | $7.50 |
Gemini 2.5 Flash
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Audio Input | $1.00 | $1.00 | $0.100 | $0.100 | $0.50 | $0.50 |
| Text Output | $2.50 | $2.50 | N/A | N/A | $1.25 | $1.25 |
| Image Output | $30.00 | $30.00 | N/A | N/A | $15.00 | $15.00 |
| Tuning (1M Training Tokens) | $5.00 | N/A | N/A | N/A | N/A | N/A |
Gemini 2.5 Flash Live API
| Input/Output Type | Price (per 1M Tokens) |
|---|---|
| Text Input | $0.50 |
| Audio Input | $3.00 |
| Video/Image Input | $3.00 |
| Text Output | $2.00 |
| Audio Output | $12.00 |
Gemini 2.5 Flash Lite
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.10 | $0.10 | $0.010 | $0.010 | $0.05 | $0.05 |
| Audio Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Text Output | $0.40 | $0.40 | N/A | N/A | $0.20 | $0.20 |
Grounding Features (Add-On Costs)
| Feature | Free Usage Tier | Price Beyond Free Tier |
|---|---|---|
| Google Search Grounding | 1,500 prompts/day (Flash, Flash Lite); 10,000 prompts/day (Pro) | $35 per 1,000 grounded prompts |
| Web Grounding for Enterprise | None | $45 per 1,000 grounded prompts |
| Google Maps Grounding | None | $25 per 1,000 grounded prompts |
| Grounding with Your Data | None | $2.50 per 1,000 requests |
Notes:
- These rates are applicable only for successful requests (HTTP 200).
- Token usage includes both new and previous turns in live sessions, and long context inputs (>200K tokens) incur higher charges.
- For enterprise-scale needs, such as over 1 million grounded prompts daily, contact Google for a custom quote.
Given the complexity of tiered pricing and token counting, calculating these costs manually can be difficult. A Vertex AI cost calculator can help you estimate spend instantly based on your specific model choice and volume.
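As a rough sketch of how the tiered rates combine, the function below estimates the cost of a single Gemini 2.5 Pro request using the per-1M-token rates from the table above. The request-level helper and the example token counts are illustrative; verify rates against Google's current price list before budgeting.

```python
# Illustrative per-request cost estimator for Gemini 2.5 Pro on Vertex AI.
# Rates are USD per 1M tokens, taken from the pricing table above; the
# higher tier applies when the prompt context exceeds 200K tokens.

RATES = {  # (<=200K context, >200K context)
    "input":  (1.25, 2.50),
    "output": (10.00, 15.00),
}

def gemini_pro_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request; long-context requests bill at the higher tier."""
    tier = 1 if input_tokens > 200_000 else 0
    cost = input_tokens / 1e6 * RATES["input"][tier]
    cost += output_tokens / 1e6 * RATES["output"][tier]
    return cost

# A 10K-token prompt with a 2K-token answer:
print(round(gemini_pro_request_cost(10_000, 2_000), 5))  # 0.0325
```

Multiplying this per-request figure by expected monthly request volume gives a first-order budget before cached-token or Batch API discounts.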
2. Vertex AI Agent Engine
Vertex AI Agent Engine enables developers to run production-scale AI agents with compute and memory usage–based billing. Charges apply only to active runtime usage; idle agents are not billed. Pricing is calculated per second and includes a generous free tier.
| Resource | Free Tier (Per Month) | Paid Rate (After Free Tier) |
|---|---|---|
| vCPU | First 180,000 vCPU-seconds | $0.0994 per 3,600 vCPU-seconds (1 hour) |
| RAM | First 360,000 GiB-seconds | $0.0105 per 3,600 GiB-seconds (1 hour) |
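The free-tier arithmetic can be sketched as follows, using the rates and allowances from the table above (the 100-hour agent scenario is hypothetical):

```python
# Monthly Agent Engine bill from total active vCPU-seconds and GiB-seconds,
# deducting the published free tier before applying the hourly rates above.

VCPU_FREE_S, VCPU_RATE_HR = 180_000, 0.0994      # free vCPU-seconds; USD per vCPU-hour
RAM_FREE_GIB_S, RAM_RATE_GIB_HR = 360_000, 0.0105  # free GiB-seconds; USD per GiB-hour

def agent_engine_monthly_cost(vcpu_seconds: float, gib_seconds: float) -> float:
    billable_vcpu = max(0.0, vcpu_seconds - VCPU_FREE_S)
    billable_ram = max(0.0, gib_seconds - RAM_FREE_GIB_S)
    return (billable_vcpu / 3600 * VCPU_RATE_HR
            + billable_ram / 3600 * RAM_RATE_GIB_HR)

# A 1 vCPU / 2 GiB agent active 100 hours in a month:
# vCPU: 360,000 s (180,000 billable); RAM: 720,000 GiB-s (360,000 billable)
print(round(agent_engine_monthly_cost(360_000, 720_000), 2))  # 6.02
```

Because only active runtime is billed, agents that sit idle most of the month can stay entirely within the free tier.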
Related content: Read our guide to Gemini pricing (coming soon)
Google Vertex AI Pricing: Model Training
3. AutoML Models
Vertex AI AutoML allows users to train models on image and tabular data without writing code. The pricing is divided into three main activities: training the model, deploying it to an endpoint, and making predictions. Charges are based on the time and resources used, and costs accrue even if no predictions are made—unless the model is undeployed.
| Operation | Image Classification | Image Object Detection | Tabular (Classification/Regression) |
|---|---|---|---|
| Training (per hour) | $3.465 | $3.465 | $21.252 |
| Training (Edge model, per hour) | $18.00 | $18.00 | N/A |
| Deployment & Online Prediction (per hour) | $1.375 | $2.002 | Same as custom-trained model |
| Batch Prediction (per hour) | $2.222 | $2.222 | See Vertex AI Forecast section |
| Explainable AI (for predictions) | Included in prediction price | Included | Included (may use more nodes) |
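As a sketch of how these rates combine (figures from the image-classification column above; the endpoint-hours scenario is hypothetical):

```python
# Rough AutoML image-classification budget. The key cost trap: deployment
# is billed for every hour the model stays on an endpoint, whether or not
# it serves predictions.

TRAIN_RATE_HR = 3.465   # training, USD per node-hour
DEPLOY_RATE_HR = 1.375  # deployment & online prediction, USD per hour

def automl_image_cost(train_hours: float, deployed_hours: float) -> float:
    return train_hours * TRAIN_RATE_HR + deployed_hours * DEPLOY_RATE_HR

# 8 node-hours of training plus an endpoint left deployed for a 30-day month:
print(round(automl_image_cost(8, 720), 2))  # 1017.72
```

Note that in this example the idle endpoint dwarfs the training cost, which is why undeploying unused models matters.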
4. Ray on Vertex AI
Ray on Vertex AI allows users to run distributed training workloads using Ray, an open-source framework for scalable AI and Python applications. Users can define custom clusters using various machine types, including CPU and GPU configurations, and pricing is based on the underlying compute and accelerator resources consumed during training.
Ray Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-4 | $0.228 |
| n1-highmem-16 | $1.136 |
| n2-standard-32 | $1.865 |
| a2-highgpu-8g* | $35.264 |
| a3-highgpu-8g* | $105.399 |
| e2-standard-16 | $0.643 |
| c2-standard-16 | $1.002 |
| m1-ultramem-160 | $30.207 |
(*) GPU cost included in total instance price.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $3.52 |
| NVIDIA H100 80GB | $11.76 |
| TPU v3 Pod (32 cores) | $38.40 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000066 |
| pd-ssd | $0.000279 |
5. Vertex AI Neural Architecture Search
Vertex AI Neural Architecture Search (NAS) lets users automatically discover optimized model architectures. Pricing is based on compute resources used during search and training, and includes both CPU and GPU options. You can choose from predefined scale tiers or fully customize the setup.
NAS Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-16 | $1.14 |
| n1-highmem-32 | $2.84 |
| n2-standard-64 | $4.66 |
| e2-standard-32 | $1.61 |
| c2-standard-16 | $1.25 |
| a2-highgpu-8g* | $45.13 |
*Includes GPU cost
Accelerator Pricing
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $4.40 |
| NVIDIA T4 | $0.53 |
| NVIDIA V100 | $3.72 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000082 |
| pd-ssd | $0.000349 |
6. Custom-Trained Models
Vertex AI allows full flexibility by letting users train models with their choice of frameworks and infrastructure. Pricing is determined by the compute, accelerator, and disk resources used. Users can also utilize spot VMs or reserved instances for cost efficiency.
Common Machine Type Pricing (us-central1)
| Machine Type | Hourly Price (USD) |
|---|---|
| n1-standard-4 | $0.2185 |
| n1-highmem-8 | $0.5442 |
| e2-standard-16 | $0.6165 |
| g4-standard-96 | $10.35 |
| a2-highgpu-8g* | $35.40 |
| a3-ultragpu-8g* | $99.77 |
| m1-ultramem-160 | $28.95 |
(*) GPU included in machine type.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $2.93 |
| NVIDIA H100 80GB | $9.80 |
| NVIDIA L4 | $0.64 |
| TPU v2 Single (8 cores) | $5.175 |
| TPU v3 Pod (32 cores) | $36.80 |
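A custom training job bills the host machine and any separately attached accelerators per hour. A minimal estimator, using the us-central1 rates quoted above (the specific machine and GPU choices are illustrative):

```python
# Illustrative custom-training job cost: host machine plus attached GPUs,
# each billed per hour at the us-central1 rates from the tables above.

MACHINE_HOURLY = {"n1-standard-4": 0.2185, "n1-highmem-8": 0.5442}
GPU_HOURLY = {"NVIDIA_L4": 0.64, "NVIDIA_A100": 2.93}

def training_job_cost(machine: str, gpu: str, gpu_count: int, hours: float) -> float:
    return (MACHINE_HOURLY[machine] + gpu_count * GPU_HOURLY[gpu]) * hours

# n1-highmem-8 with two L4 GPUs for a 12-hour training run:
print(round(training_job_cost("n1-highmem-8", "NVIDIA_L4", 2, 12), 2))  # 21.89
```

Spot VMs or reserved instances, mentioned above, would discount the same formula's machine and GPU rates.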
Google Vertex AI Pricing: Forecasting and Prediction
7. Vertex AI Forecast
Vertex AI Forecast provides time-series forecasting for tabular data using AutoML or ARIMA+ models. Pricing is based on training time and the number of forecasted data points. Costs are tiered depending on usage volume and whether you're using AutoML or ARIMA+ models.
AutoML Forecasting Pricing
| Stage | Usage Volume | Price |
|---|---|---|
| Prediction | 0 to 1M points/month | $0.20 per 1,000 predictions |
| | 1M to 50M points/month | $0.10 per 1,000 predictions |
| | >50M points/month | $0.02 per 1,000 predictions |
| Training | N/A | $21.252 per hour |
ARIMA+ Forecasting Pricing
| Stage | Pricing |
|---|---|
| Prediction | $5.00 per 1,000 points |
| Training | $250 per TB × Candidate Models × Backtesting Windows |
| Explainability | Included (via time series decomposition) |
Note: Each data point is a single time step in the forecast horizon. Up to five quantiles are included at no extra cost.
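Assuming the AutoML prediction tiers are graduated (each band of monthly volume bills at its own rate, which is how such tiers typically work but should be confirmed), the monthly prediction charge can be sketched as:

```python
# Graduated-tier arithmetic for AutoML forecasting predictions, using the
# per-1,000-prediction rates from the table above. Assumption: each band
# of monthly volume bills at that band's rate.

TIERS = [(1_000_000, 0.20), (50_000_000, 0.10), (float("inf"), 0.02)]

def forecast_prediction_cost(points: int) -> float:
    cost, prev_cap = 0.0, 0
    for cap, rate_per_1k in TIERS:
        in_tier = min(points, cap) - prev_cap
        if in_tier <= 0:
            break
        cost += in_tier / 1000 * rate_per_1k
        prev_cap = cap
    return cost

# 60M forecasted points in a month:
print(round(forecast_prediction_cost(60_000_000), 2))  # 5300.0
```

The per-point unit cost thus falls sharply at scale: the first million points cost $200, while the ten million beyond the 50M threshold cost only $200 more.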
8. Running Prediction and Explanation Jobs
Vertex AI charges for predictions based on the machine type and configuration used by the prediction nodes. These nodes are billed per vCPU, RAM, and optional GPU usage, whether serving online requests, running batch jobs, or remaining idle in a ready state. Pricing also applies to explanations generated by the model.
Online and Batch Prediction Pricing
| Item | Price (USD) |
|---|---|
| vCPU | $0.025–$0.039 per hour (varies by machine type) |
| RAM | $0.0025–$0.0053 per GiB-hour |
| GPU (e.g., A100 40GB) | $3.37 per hour |
| Example-Based Indexing | $3.00 per GB of index (data size × float size) |
Example Calculation (Batch Job)
- Job: 3961 inputs, 0.72 sec each
- Time: 47.5 minutes (approx.)
- Cost (n1-highcpu-32 @ $0.09099/hr): ~$0.063
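The arithmetic above can be reproduced with a short estimator. Treat it as a naive lower bound: real batch billing adds node startup time and minimum billing increments, and may distribute work across several nodes, which is why the naive single-node figure does not exactly match the quoted total.

```python
# Naive batch-prediction estimate: total compute seconds (inputs × per-input
# latency) billed at the node's hourly rate from the job description above.

def batch_job_cost(inputs: int, sec_per_input: float, hourly_rate: float) -> float:
    return inputs * sec_per_input / 3600 * hourly_rate

minutes = 3961 * 0.72 / 60
print(round(minutes, 1))  # 47.5
print(round(batch_job_cost(3961, 0.72, 0.09099), 3))
```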
Google Vertex AI Pricing: Supporting AI Services
In addition to model training, prediction, and generative AI tools, Vertex AI offers a range of supporting services for building production-ready ML systems. These include tools for pipelines, metadata tracking, experiment monitoring, model registry, optimization, and vector-based search. Below is a breakdown of their pricing.
9. Vertex AI Pipelines
Vertex AI Pipelines help automate ML workflows with reproducible runs.
| Item | Price (USD) |
|---|---|
| Pipeline Run Fee | $0.03 per run |
| Compute Usage | Charged at standard training VM rates |
| External Services | Billed separately (e.g., Dataflow) |
10. Vertex AI Feature Store
Used to manage and serve features for training and inference.
Online Operations Pricing
| Operation Type | Price (USD) |
|---|---|
| Data Processing Node | $0.08 per hour |
| Optimized Serving Node | $0.30 per hour (includes 200 GB) |
| Bigtable Serving Node | $0.94 per hour |
| Bigtable Storage | $0.000342 per GiB-hour |
Notes:
- Offline operations have the same pricing as BigQuery for ingestion, querying, and storage.
- Pricing above is for the new version of the feature store. Google offers separate pricing for the legacy AI Feature Store.
11. Vertex ML Metadata
Stores metadata for tracking datasets, models, and runs.
| Metric | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
12. Vertex AI TensorBoard
Used for logging, visualization, and tracking model experiments.
| Resource | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
13. Vertex AI Vizier
Optimizes hyperparameters through black-box search.
| Usage Tier | Price (USD) |
|---|---|
| First 100 trials/month | Free |
| Additional trials | $1 per trial (excl. random/grid) |
| Random/Grid Search | Always free |
14. Vertex AI Vector Search
Used for semantic search over embeddings with Approximate Nearest Neighbor (ANN) indexing.
| Component | Price (USD) |
|---|---|
| Index Serving (e2-standard-2) | $0.0938 per node-hour |
| Index Build/Update (Batch) | $3.00 per GiB processed |
| Streaming Update (Insert) | $0.45 per GiB |
Storage-Optimized Tier
| Unit Type | Price (USD) |
|---|---|
| Capacity Unit (CU) | $2.30 per hour |
| Write Unit | $0.45 per GiB |
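To see how serving and build charges combine, here is a hypothetical monthly estimate: two e2-standard-2 serving nodes running around the clock (roughly 730 hours) plus four batch rebuilds of a 20 GiB index, at the rates above.

```python
# Monthly Vector Search estimate: continuous index serving (per node-hour)
# plus periodic batch index rebuilds (per GiB processed). Rates from the
# tables above; the workload shape is hypothetical.

NODE_HR = 0.0938       # e2-standard-2 serving, USD per node-hour
BUILD_PER_GIB = 3.00   # batch index build/update, USD per GiB

def vector_search_monthly(nodes: int, hours: float,
                          rebuilds: int, index_gib: float) -> float:
    return nodes * hours * NODE_HR + rebuilds * index_gib * BUILD_PER_GIB

print(round(vector_search_monthly(2, 730, 4, 20), 2))  # 376.95
```

In this scenario the always-on serving nodes and the rebuilds contribute comparable amounts, so both rebuild frequency and node count are worth tuning.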
15. Vertex AI Model Monitoring
Monitors deployed model performance and data drift.
| Resource Type | Price (USD) |
|---|---|
| Data Analyzed | $3.50 per GB |
| Additional Services | Billed separately (e.g., BigQuery, Explain) |
16. Vertex AI Workbench
Development environment for managing notebooks and experiments.
Compute & Memory Pricing
| Resource | Price (USD) per Hour |
|---|---|
| vCPU (E2) | $0.0261 |
| vCPU (N1, N2, A2) | $0.0379 |
| Memory (E2) | $0.0035 per GiB-hour |
| Memory (Others) | $0.0051 per GiB-hour |
Accelerator Pricing
| GPU Type | Price (USD) per Hour |
|---|---|
| A100 | $4.40 |
| T4 | $0.525 |
| V100 | $3.72 |
| P100 | $2.19 |
Disk Pricing
| Disk Type | Price (USD) per GiB-hour |
|---|---|
| Standard | $0.0000658 |
| SSD | $0.0002795 |
| Hyperdisk / Extreme / Balanced | $0.000164 – $0.000205 |
Management Fees
| Workbench Type | Resource | Price (USD) |
|---|---|---|
| Managed Notebooks | vCPU | $0.05 per vCPU |
| | GPU (Standard) | $0.35 per GPU |
| | GPU (Premium) | $2.48 per GPU |
| User-Managed | vCPU | $0.005 per vCPU |
| | GPU (Standard) | $0.035 per GPU |
| | GPU (Premium) | $0.25 per GPU |
Cost Optimization Strategies for Vertex AI
1. Reduce Token Usage Via Prompt Engineering and Truncation
A significant portion of generative model costs in Vertex AI is determined by the number of tokens processed per request. By engineering prompts to be concise and relevant, users can reduce the overall token count, directly lowering expenses. Truncating unnecessary prompt details and focusing only on the core task helps avoid extra computational effort, especially for large language models where verbose prompts can double or triple the token bill.
It’s important to measure and monitor how prompt changes affect cost and model output quality. Automated tools and regression testing can ensure that prompt reductions do not negatively impact performance. Teams should establish guidelines for maximum prompt length and consistently audit deployed prompts to maintain efficiency as new use cases or models are introduced.
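One way to enforce such a guideline is a hard token budget applied before each model call. The sketch below uses a whitespace word count as a crude stand-in for a real tokenizer and keeps the newest context chunks first; both the budget value and the heuristic are illustrative.

```python
# Prompt-size guardrail: drop the oldest context chunks until the prompt
# fits a team-defined token budget. A production version should count
# tokens with the model's real tokenizer, not a whitespace split.

MAX_PROMPT_TOKENS = 1_000  # illustrative team budget

def truncate_context(chunks: list[str], budget: int = MAX_PROMPT_TOKENS) -> list[str]:
    """Keep the most recent chunks that fit within the token budget."""
    kept, used = [], 0
    for chunk in reversed(chunks):       # newest chunk first
        tokens = len(chunk.split())      # stand-in for a real token count
        if used + tokens > budget:
            break
        kept.append(chunk)
        used += tokens
    return list(reversed(kept))          # restore chronological order
```

Logging `used` alongside model-quality metrics makes it easy to audit whether a tighter budget is degrading answers.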
2. Optimize Embeddings and Retrieval Workflows
Vertex AI users often leverage large pre-trained models to generate embeddings for similarity search or retrieval augmented generation. Each embedding generated or retrieved incurs a fee, and unnecessary or redundant generation can inflate costs. Optimizing data pipelines to eliminate duplicate computations, such as by caching previous embeddings or only updating representations upon data changes, offers immediate savings.
Batch processing and judicious selection of the data to be embedded can also control charges. For instance, embedding only the most relevant or frequently accessed data rather than the entire corpus can significantly decrease volume-based expenses. Regular audits of retrieval workflow logs can identify unused or rarely accessed embeddings, guiding pruning strategies to further manage spend.
3. Select Cost-Efficient Model Families
Selecting the right model family has a direct impact on cost efficiency. Vertex AI offers a variety of models with different capabilities and associated pricing, ranging from lightweight models suitable for high-throughput scenarios to premium models for complex tasks requiring higher quality. By matching use cases to the most cost-effective model, organizations can prevent overpaying for unnecessary capability.
It’s crucial to benchmark model performance against project requirements before scaling usage. Prototyping with small workloads can help determine if smaller or older-generation models suffice with minimal impact on accuracy or latency. Regular reviews should reassess model choices as newer, more optimized offerings become available, potentially unlocking further savings without sacrificing output quality.
4. Use Caching and Batching To Reduce Inference Calls
Frequent, individual inference requests made to Vertex AI endpoints can accumulate high costs. Implementing caching for repeated queries and batching inference jobs allows organizations to reduce the number of API calls and fully utilize endpoint throughput. Batching enables multiple requests to be processed simultaneously, maximizing the efficiency of computational resources and driving down per-request expenses.
Automated cache management and configurable batch intervals help maintain response times while moderating expenditures. Organizations should monitor endpoint logs for frequent or redundant requests and test different batching configurations to find the optimal balance between user experience and operational cost. Adding metrics dashboards to track these changes facilitates ongoing optimization as traffic patterns evolve.
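Both techniques can be sketched in a few lines: an LRU cache short-circuits repeated identical queries, while a micro-batcher groups pending inputs into one endpoint call. `predict_batch` is a stand-in for a real Vertex AI endpoint client, and the batch size is a tunable assumption.

```python
# Caching + batching sketch: cached_predict avoids repeat calls for
# identical prompts; micro_batch amortizes one endpoint call over many
# inputs. predict_batch stands in for a real endpoint client.

from functools import lru_cache

def predict_batch(prompts: tuple[str, ...]) -> list[str]:
    # Placeholder for one endpoint call handling many inputs at once.
    return [p.upper() for p in prompts]

@lru_cache(maxsize=4096)
def cached_predict(prompt: str) -> str:
    return predict_batch((prompt,))[0]

def micro_batch(prompts: list[str], batch_size: int = 8) -> list[str]:
    out: list[str] = []
    for i in range(0, len(prompts), batch_size):
        out.extend(predict_batch(tuple(prompts[i:i + batch_size])))
    return out

print(micro_batch(["a", "b", "c"], batch_size=2))  # ['A', 'B', 'C']
```

`cached_predict.cache_info()` exposes hit rates, which is exactly the kind of metric worth putting on the dashboards mentioned above.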
5. Control Regional Spending and Deployment Footprint
Vertex AI pricing can vary by region due to differences in available hardware, network egress, and integration dependencies. Deploying endpoints or workloads only in essential regions helps avoid duplicative spending. Geo-fencing workloads to where data and users reside reduces latency and unnecessary inter-region data transfer, contributing to predictable and localized cost management.
Monitoring deployment footprints across projects is also crucial for preventing resource sprawl. Automated tooling can identify idle or underutilized endpoints, prompting decommissioning or regional consolidation. Organizations should periodically audit regional resource allocation and align deployment policies with usage patterns to ensure they are not incurring overlapping charges for the same workload in multiple geographies.
Conclusion
Vertex AI offers powerful tools for developing and deploying machine learning systems, but costs can escalate quickly without careful management. Understanding the pricing model for each service, from generative AI and custom training to prediction, pipelines, and supporting infrastructure, is essential for making informed architectural and operational decisions. By aligning model choices, workload strategies, and optimization practices, teams can take full advantage of Vertex AI’s capabilities while keeping costs under control.

