Google Vertex AI is a managed machine learning (ML) platform covering the end-to-end development lifecycle of AI and ML projects on Google Cloud. It provides tools for data preparation, model training, hyperparameter tuning, deployment, and monitoring within a unified interface.
Vertex AI pricing varies by service, with core costs determined by model training, prediction (online and batch), and storage. For example, custom training is typically priced per node-hour, predictions can be billed per node-hour or per 1,000 predictions, and generative AI models are priced per 1 million tokens. Other factors like pipeline runs ($0.03 per run), data preprocessing, and specialized services like Vertex AI Search or Vision AI have their own pricing structures.
This is part of a series of articles about AI costs.
Google Vertex AI offers a flexible pricing model for generative AI services, allowing developers and enterprises to choose from several Gemini model variants based on performance and cost requirements.
The platform supports a variety of input and output types including text, images, audio, and video, with pricing determined by model type, token volume, and context length. In addition, optional features like grounding with Google Search or Maps come with separate usage-based charges.
Below is a breakdown of generative AI pricing on Vertex AI. Unless noted otherwise, prices are in USD per 1 million tokens.
Gemini 3 (Text/Image/Video/Audio)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $2.00 | $4.00 | $0.20 | $0.40 | $1.00 | $2.00 |
| Text Output | $12.00 | $18.00 | N/A | N/A | $6.00 | $9.00 |
| Image Output | $120.00 | N/A | N/A | N/A | $60.00 | N/A |
Gemini 2.5 Pro (Text, Image, Video, Audio Inputs)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $1.25 | $2.50 | $0.125 | $0.250 | $0.625 | $1.25 |
| Text Output | $10.00 | $15.00 | N/A | N/A | $5.00 | $7.50 |
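To make the tiered structure concrete, here is a minimal Python sketch that estimates per-request cost from the Gemini 2.5 Pro rates above. It assumes the higher rate applies to the entire request once the prompt exceeds 200K tokens; verify the exact tier semantics against the official pricing docs before relying on it.

```python
# Rough cost estimator for Gemini 2.5 Pro on Vertex AI (USD per 1M tokens).
# Assumption: once the prompt exceeds 200K tokens, the >200K rate applies
# to the whole request -- check the official docs for exact tier behavior.

PRO_RATES = {
    "input":  {"le_200k": 1.25,  "gt_200k": 2.50},
    "output": {"le_200k": 10.00, "gt_200k": 15.00},
}

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    tier = "gt_200k" if input_tokens > 200_000 else "le_200k"
    return (
        input_tokens / 1_000_000 * PRO_RATES["input"][tier]
        + output_tokens / 1_000_000 * PRO_RATES["output"][tier]
    )

# Example: a 150K-token prompt with a 2K-token response.
print(f"${estimate_request_cost(150_000, 2_000):.4f}")  # ~$0.2075
```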
Gemini 2.5 Flash
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Audio Input | $1.00 | $1.00 | $0.100 | $0.100 | $0.50 | $0.50 |
| Text Output | $2.50 | $2.50 | N/A | N/A | $1.25 | $1.25 |
| Image Output | $30.00 | $30.00 | N/A | N/A | $15.00 | $15.00 |
| Tuning (per 1M training tokens) | $5.00 | N/A | N/A | N/A | N/A | N/A |
Gemini 2.5 Flash Live API
| Input/Output Type | Price (per 1M Tokens) |
|---|---|
| Text Input | $0.50 |
| Audio Input | $3.00 |
| Video/Image Input | $3.00 |
| Text Output | $2.00 |
| Audio Output | $12.00 |
Gemini 2.5 Flash Lite
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.10 | $0.10 | $0.010 | $0.010 | $0.05 | $0.05 |
| Audio Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Text Output | $0.40 | $0.40 | N/A | N/A | $0.20 | $0.20 |
Grounding Features (Add-On Costs)
| Feature | Free Usage Tier | Price Beyond Free Tier |
|---|---|---|
| Google Search Grounding | 1,500 prompts/day (Flash, Flash Lite); 10,000 prompts/day (Pro) | $35 per 1,000 grounded prompts |
| Web Grounding for Enterprise | None | $45 per 1,000 grounded prompts |
| Google Maps Grounding | None | $25 per 1,000 grounded prompts |
| Grounding with Your Data | None | $2.50 per 1,000 requests |
Note: Given the complexity of tiered pricing and token counting, calculating these costs manually can be difficult. You can use this Vertex AI Cost Calculator to instantly estimate your spend based on your specific model choice and volume.
Vertex AI Agent Engine enables developers to run production-scale AI agents with compute and memory usage–based billing. Charges apply only to active runtime usage; idle agents are not billed. Pricing is calculated per second and includes a generous free tier.
| Resource | Free Tier (Per Month) | Paid Rate (After Free Tier) |
|---|---|---|
| vCPU | First 180,000 vCPU-seconds | $0.0994 per 3,600 vCPU-seconds (1 hour) |
| RAM | First 360,000 GiB-seconds | $0.0105 per 3,600 GiB-seconds (1 hour) |
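As a hedged illustration of how the free tier offsets the per-second rates, consider a hypothetical agent with 1 vCPU and 2 GiB of RAM that is actively running for 100 hours in a month:

```python
# Hypothetical Agent Engine bill: 1 vCPU, 2 GiB RAM, 100 active hours/month.
active_seconds = 100 * 3600                      # 360,000 seconds

vcpu_seconds  = 1 * active_seconds               # 360,000 vCPU-seconds
billable_vcpu = max(0, vcpu_seconds - 180_000)   # free tier: 180,000/month
vcpu_cost     = billable_vcpu / 3600 * 0.0994    # $0.0994 per vCPU-hour

gib_seconds  = 2 * active_seconds                # 720,000 GiB-seconds
billable_gib = max(0, gib_seconds - 360_000)     # free tier: 360,000/month
ram_cost     = billable_gib / 3600 * 0.0105      # $0.0105 per GiB-hour

print(f"vCPU: ${vcpu_cost:.2f}, RAM: ${ram_cost:.2f}, "
      f"total: ${vcpu_cost + ram_cost:.2f}")
# vCPU: $4.97, RAM: $1.05, total: $6.02
```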
Related content: Read our guide to Gemini pricing (coming soon)
Vertex AI AutoML allows users to train models on image and tabular data without writing code. The pricing is divided into three main activities: training the model, deploying it to an endpoint, and making predictions. Charges are based on the time and resources used, and costs accrue even if no predictions are made—unless the model is undeployed.
| Operation | Image Classification | Image Object Detection | Tabular (Classification/Regression) |
|---|---|---|---|
| Training (per hour) | $3.465 | $3.465 | $21.252 |
| Training (Edge model, per hour) | $18.00 | $18.00 | N/A |
| Deployment & Online Prediction (per hour) | $1.375 | $2.002 | Same as custom-trained model |
| Batch Prediction (per hour) | $2.222 | $2.222 | See Vertex AI Forecast section |
| Explainable AI (for predictions) | Included in prediction price | Included | Included (may use more nodes) |
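For a rough sense of scale, here is a sketch using the image classification rates above; the hour counts are hypothetical:

```python
# Illustrative AutoML image classification costs (rates from the table above;
# hours are hypothetical).
training   = 4 * 3.465    # 4 node-hours of training       -> $13.86
deployment = 24 * 1.375   # endpoint deployed for 24 hours -> $33.00
print(f"${training + deployment:.2f}")  # $46.86
# Note: the endpoint bills for every hour it stays deployed, even with
# zero prediction traffic -- undeploy idle models to stop the charges.
```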
Ray on Vertex AI allows users to run distributed training workloads using Ray, an open-source framework for scalable AI and Python applications. Users can define custom clusters using various machine types, including CPU and GPU configurations, and pricing is based on the underlying compute and accelerator resources consumed during training.
Ray Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-4 | $0.228 |
| n1-highmem-16 | $1.136 |
| n2-standard-32 | $1.865 |
| a2-highgpu-8g* | $35.264 |
| a3-highgpu-8g* | $105.399 |
| e2-standard-16 | $0.643 |
| c2-standard-16 | $1.002 |
| m1-ultramem-160 | $30.207 |
(*) GPU cost included in total instance price.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $3.52 |
| NVIDIA H100 80GB | $11.76 |
| TPU v3 Pod (32 cores) | $38.40 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000066 |
| pd-ssd | $0.000279 |
Vertex AI Neural Architecture Search (NAS) lets users automatically discover optimized model architectures. Pricing is based on compute resources used during search and training, and includes both CPU and GPU options. You can choose from predefined scale tiers or fully customize the setup.
NAS Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-16 | $1.14 |
| n1-highmem-32 | $2.84 |
| n2-standard-64 | $4.66 |
| e2-standard-32 | $1.61 |
| c2-standard-16 | $1.25 |
| a2-highgpu-8g* | $45.13 |
(*) GPU cost included in total instance price.
Accelerator Pricing
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $4.40 |
| NVIDIA T4 | $0.53 |
| NVIDIA V100 | $3.72 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000082 |
| pd-ssd | $0.000349 |
Vertex AI custom training offers full flexibility, letting users train models with their choice of frameworks and infrastructure. Pricing is determined by the compute, accelerator, and disk resources consumed, and spot VMs or reserved instances can be used for cost efficiency.
Common Machine Type Pricing (us-central1)
| Machine Type | Hourly Price (USD) |
|---|---|
| n1-standard-4 | $0.2185 |
| n1-highmem-8 | $0.5442 |
| e2-standard-16 | $0.6165 |
| g4-standard-96 | $10.35 |
| a2-highgpu-8g* | $35.40 |
| a3-ultragpu-8g* | $99.77 |
| m1-ultramem-160 | $28.95 |
(*) GPU included in machine type.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $2.93 |
| NVIDIA H100 80GB | $9.80 |
| NVIDIA L4 | $0.64 |
| TPU v2 Single (8 cores) | $5.175 |
| TPU v3 Pod (32 cores) | $36.80 |
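A hedged example of how these rates combine, using the us-central1 figures above with a hypothetical job duration:

```python
# Illustrative custom training run (us-central1 rates from the tables above).
# a2-highgpu-8g bundles its eight A100 GPUs into the machine price, so no
# separate accelerator line item applies.
hours = 10
print(f"${35.40 * hours:.2f}")  # $354.00 for a 10-hour job
# For hosts without bundled GPUs, add the accelerator's hourly rate on top
# of the machine rate; spot VMs or reservations can lower both components.
```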
Vertex AI Forecast provides time-series forecasting for tabular data using AutoML or ARIMA+ models. Pricing is based on training time and the number of forecasted data points, with tiers that depend on usage volume and on which model family you use.
AutoML Forecasting Pricing
| Stage | Usage Volume | Price |
|---|---|---|
| Prediction | 0 to 1M points/month | $0.20 per 1,000 predictions |
| | 1M to 50M points/month | $0.10 per 1,000 predictions |
| | >50M points/month | $0.02 per 1,000 predictions |
| Training | N/A | $21.252 per hour |
ARIMA+ Forecasting Pricing
| Stage | Pricing |
|---|---|
| Prediction | $5.00 per 1,000 points |
| Training | $250 per TB × Candidate Models × Backtesting Windows |
| Explainability | Included (via time series decomposition) |
Note: Each data point is a single time step in the forecast horizon. Up to five quantiles are included at no extra cost.
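Because the ARIMA+ training formula multiplies data size by the size of the search space, costs can grow quickly. A hedged worked example, with illustrative candidate and window counts:

```python
# ARIMA+ training: $250 per TB x candidate models x backtesting windows.
# Candidate and window counts below are illustrative, not defaults.
data_tb, candidates, windows = 0.05, 5, 4   # 50 GB of training data
print(f"${250 * data_tb * candidates * windows:.2f}")  # $250.00
```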
Vertex AI charges for predictions based on the machine type and configuration used by the prediction nodes. These nodes are billed per vCPU, RAM, and optional GPU usage, whether serving online requests, running batch jobs, or remaining idle in a ready state. Pricing also applies to explanations generated by the model.
Online and Batch Prediction Pricing
| Item | Price (USD) |
|---|---|
| vCPU | $0.025–$0.039 per hour (varies by machine type) |
| RAM | $0.0025–$0.0053 per GiB-hour |
| GPU (e.g., A100 40GB) | $3.37 per hour |
| Example-Based Indexing | $3.00 per GB of index (data size × float size) |
Example Calculation (Batch Job)
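A minimal sketch, assuming a batch job on a single node with 8 vCPUs, 30 GiB of RAM, and one A100 40GB GPU running for 2 hours; the vCPU and RAM rates are illustrative mid-range values from the table above, and actual rates depend on the machine type:

```python
# Hypothetical batch prediction job: one node with 8 vCPUs, 30 GiB RAM,
# and 1 A100 40GB GPU, running for 2 hours. vCPU/RAM rates are illustrative
# mid-range values from the table above.
hours = 2
vcpu = 8 * 0.032 * hours      # ~$0.51
ram  = 30 * 0.0039 * hours    # ~$0.23
gpu  = 1 * 3.37 * hours       # $6.74
print(f"${vcpu + ram + gpu:.2f}")  # ~$7.49
```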
In addition to model training, prediction, and generative AI tools, Vertex AI offers a range of supporting services for building production-ready ML systems. These include tools for pipelines, metadata tracking, experiment monitoring, model registry, optimization, and vector-based search. Below is a breakdown of their pricing.
Vertex AI Pipelines help automate ML workflows with reproducible runs.
| Item | Price (USD) |
|---|---|
| Pipeline Run Fee | $0.03 per run |
| Compute Usage | Charged at training VM rates |
| External Services | Billed separately (e.g., Dataflow) |
Vertex AI Feature Store is used to manage and serve features for training and inference.
|
Operation Type |
Price (USD) |
|
Data Processing Node |
$0.08 per hour |
|
Optimized Serving Node |
$0.30 per hour (includes 200 GB) |
|
Bigtable Serving Node |
$0.94 per hour |
|
Bigtable Storage |
$0.000342 per GiB-hour |
Vertex ML Metadata stores metadata for tracking datasets, models, and runs.
| Resource | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
Vertex AI TensorBoard is used for logging, visualizing, and tracking model experiments.
| Resource | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
Vertex AI Vizier optimizes hyperparameters through black-box search.
| Usage Tier | Price (USD) |
|---|---|
| First 100 trials/month | Free |
| Additional trials | $1 per trial (excl. random/grid search) |
| Random/Grid Search | Always free |
Vertex AI Vector Search is used for semantic search over embeddings with Approximate Nearest Neighbor (ANN) indexing.
| Component | Price (USD) |
|---|---|
| Index Serving (e2-standard-2) | $0.0938 per node-hour |
| Index Build/Update (Batch) | $3.00 per GiB processed |
| Streaming Update (Insert) | $0.45 per GiB |

| Unit Type | Price (USD) |
|---|---|
| Capacity Unit (CU) | $2.30 per hour |
| Write Unit | $0.45 per GiB |
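A hedged sizing example using the node-based rates in the first table above: one full batch build of a 10 GiB index, plus two e2-standard-2 serving nodes running around the clock for a 30-day month.

```python
# Illustrative Vector Search month (rates from the first table above).
build   = 10 * 3.00                # one batch build of a 10 GiB index
serving = 2 * 0.0938 * 24 * 30     # two e2-standard-2 nodes, 30 days
print(f"${build:.2f} build, ${serving:.2f} serving")  # $30.00, ~$135.07
```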
Vertex AI Model Monitoring tracks deployed model performance and data drift.
| Resource Type | Price (USD) |
|---|---|
| Data Analyzed | $3.50 per GB |
| Additional Services | Billed separately (e.g., BigQuery, Explain) |
Vertex AI Workbench is a development environment for managing notebooks and experiments.
Compute & Memory Pricing
| Resource | Price (USD) |
|---|---|
| vCPU (E2) | $0.0261 per vCPU-hour |
| vCPU (N1, N2, A2) | $0.0379 per vCPU-hour |
| Memory (E2) | $0.0035 per GiB-hour |
| Memory (Others) | $0.0051 per GiB-hour |
Accelerator Pricing
| GPU Type | Price (USD) per Hour |
|---|---|
| A100 | $4.40 |
| T4 | $0.525 |
| V100 | $3.72 |
| P100 | $2.19 |
Disk Pricing
| Disk Type | Price (USD) per GiB-Hour |
|---|---|
| Standard | $0.0000658 |
| SSD | $0.0002795 |
| Hyperdisk / Extreme / Balanced | $0.000164–$0.000205 |
Management Fees
| Workbench Type | Resource | Price (USD) |
|---|---|---|
| Managed Notebooks | vCPU | $0.05 per vCPU |
| | GPU (Standard) | $0.35 per GPU |
| | GPU (Premium) | $2.48 per GPU |
| User-Managed | vCPU | $0.005 per vCPU |
| | GPU (Standard) | $0.035 per GPU |
| | GPU (Premium) | $0.25 per GPU |
A significant portion of generative model costs in Vertex AI is determined by the number of tokens processed per request. By engineering prompts to be concise and relevant, users can reduce the overall token count, directly lowering expenses. Truncating unnecessary prompt details and focusing only on the core task helps avoid extra computational effort, especially for large language models where verbose prompts can double or triple the token bill.
It’s important to measure and monitor how prompt changes affect cost and model output quality. Automated tools and regression testing can ensure that prompt reductions do not negatively impact performance. Teams should establish guidelines for maximum prompt length and consistently audit deployed prompts to maintain efficiency as new use cases or models are introduced.
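One lightweight way to enforce a maximum-prompt-length guideline is a pre-flight check that estimates token count and cost before a request is sent. The sketch below uses a rough characters-per-token heuristic and the Gemini 2.5 Flash input rate from the tables above; in production you would use the SDK's token-counting endpoint and your actual model's pricing.

```python
# Pre-flight prompt budget check (sketch). The ~4 chars/token heuristic is a
# crude approximation -- use the SDK's token counter for real numbers.
MAX_PROMPT_TOKENS = 2_000           # illustrative team guideline
FLASH_INPUT_PER_1M = 0.30           # Gemini 2.5 Flash text input, USD

def check_prompt(prompt: str) -> float:
    est_tokens = len(prompt) / 4    # rough heuristic, not exact
    if est_tokens > MAX_PROMPT_TOKENS:
        raise ValueError(
            f"Prompt ~{est_tokens:.0f} tokens exceeds budget of {MAX_PROMPT_TOKENS}"
        )
    return est_tokens / 1_000_000 * FLASH_INPUT_PER_1M

cost = check_prompt("Summarize the attached incident report in three bullets.")
print(f"Estimated input cost: ${cost:.8f}")
```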
Vertex AI users often leverage large pre-trained models to generate embeddings for similarity search or retrieval augmented generation. Each embedding generated or retrieved incurs a fee, and unnecessary or redundant generation can inflate costs. Optimizing data pipelines to eliminate duplicate computations, such as by caching previous embeddings or only updating representations upon data changes, offers immediate savings.
Batch processing and judicious selection of the data to be embedded can also control charges. For instance, embedding only the most relevant or frequently accessed data rather than the entire corpus can significantly decrease volume-based expenses. Regular audits of retrieval workflow logs can identify unused or rarely accessed embeddings, guiding pruning strategies to further manage spend.
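A content-hash cache is one simple way to implement the deduplication described above: embeddings are recomputed only when the underlying text actually changes. In the sketch below, embed_fn is a stand-in for whatever embedding call you use (e.g., a Vertex AI text-embedding model), and the in-memory dict stands in for a persistent store such as Redis or a database table.

```python
import hashlib

# Cache embeddings by content hash so unchanged documents are never
# re-embedded. `embed_fn` is a placeholder for your actual embedding call;
# the dict stands in for a persistent cache.
_cache: dict[str, list[float]] = {}

def get_embedding(text: str, embed_fn) -> list[float]:
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:            # only pay for new or changed content
        _cache[key] = embed_fn(text)
    return _cache[key]
```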
Selecting the right model family has a direct impact on cost efficiency. Vertex AI offers a variety of models with different capabilities and associated pricing, ranging from lightweight models suitable for high-throughput scenarios to premium models for complex tasks requiring higher quality. By matching use cases to the most cost-effective model, organizations can prevent overpaying for unnecessary capability.
It’s crucial to benchmark model performance against project requirements before scaling usage. Prototyping with small workloads can help determine if smaller or older-generation models suffice with minimal impact on accuracy or latency. Regular reviews should reassess model choices as newer, more optimized offerings become available, potentially unlocking further savings without sacrificing output quality.
Frequent, individual inference requests made to Vertex AI endpoints can accumulate high costs. Implementing caching for repeated queries and batching inference jobs allows organizations to reduce the number of API calls and fully utilize endpoint throughput. Batching enables multiple requests to be processed simultaneously, maximizing the efficiency of computational resources and driving down per-request expenses.
Automated cache management and configurable batch intervals help maintain response times while moderating expenditures. Organizations should monitor endpoint logs for frequent or redundant requests and test different batching configurations to find the optimal balance between user experience and operational cost. Adding metrics dashboards to track these changes facilitates ongoing optimization as traffic patterns evolve.
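A minimal sketch of the batching side, assuming a hypothetical predict_batch function that accepts a list of inputs: requests are buffered and flushed either when the batch fills or when a time window elapses, trading a small amount of latency for fewer, fuller calls.

```python
import time

# Micro-batching sketch: buffer requests and flush when the batch is full or
# the wait window expires. `predict_batch` is a hypothetical stand-in for a
# batched call to your deployed endpoint.
class MicroBatcher:
    def __init__(self, predict_batch, max_size=32, max_wait_s=0.05):
        self.predict_batch = predict_batch
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def submit(self, item):
        self.buffer.append(item)
        if (len(self.buffer) >= self.max_size
                or time.monotonic() - self.last_flush >= self.max_wait_s):
            return self.flush()
        return None                  # caller waits for a later flush

    def flush(self):
        batch, self.buffer = self.buffer, []
        self.last_flush = time.monotonic()
        return self.predict_batch(batch) if batch else []
```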
Vertex AI pricing can vary by region due to differences in available hardware, network egress, and integration dependencies. Deploying endpoints or workloads only in essential regions helps avoid duplicative spending. Geo-fencing workloads to where data and users reside reduces latency and unnecessary inter-region data transfer, contributing to predictable and localized cost management.
Monitoring deployment footprints across projects is also crucial for preventing resource sprawl. Automated tooling can identify idle or underutilized endpoints, prompting decommissioning or regional consolidation. Organizations should periodically audit regional resource allocation and align deployment policies with usage patterns to ensure they are not incurring overlapping charges for the same workload in multiple geographies.
Vertex AI offers powerful tools for developing and deploying machine learning systems, but costs can escalate quickly without careful management. Understanding the pricing model for each service, from generative AI and custom training to prediction, pipelines, and supporting infrastructure, is essential for making informed architectural and operational decisions. By aligning model choices, workload strategies, and optimization practices, teams can take full advantage of Vertex AI’s capabilities while keeping costs under control.