How Does Google Price Vertex AI?
Google Vertex AI is a managed machine learning (ML) platform covering the end-to-end development lifecycle of AI and ML projects on Google Cloud. It provides tools for data preparation, model training, hyperparameter tuning, deployment, and monitoring within a unified interface.
Vertex AI pricing varies by service, with core costs driven by model training, prediction (online and batch), and storage. For example, custom training is typically priced per node-hour, predictions are priced by node-hours or per 1,000 predictions, and generative AI models are priced per 1 million tokens. Other items, such as pipeline runs ($0.03 per run), data preprocessing, and specialized services like Vertex AI Search or Vision AI, have their own pricing structures.
This is part of a series of articles about AI costs.
In this article:
- Google Vertex AI Pricing: Generative and Agentic AI
- Google Vertex AI Pricing: Model Training
- Google Vertex AI Pricing: Forecasting and Prediction
- Google Vertex AI Pricing: Supporting AI Services
- Cost Optimization Strategies for Vertex AI
Google Vertex AI Pricing: Generative and Agentic AI
1. Generative AI
Google Vertex AI offers a flexible pricing model for generative AI services, allowing developers and enterprises to choose from several Gemini model variants based on performance and cost requirements.
The platform supports a variety of input and output types including text, images, audio, and video, with pricing determined by model type, token volume, and context length. In addition, optional features like grounding with Google Search or Maps come with separate usage-based charges.
Below is a breakdown of generative AI pricing on Vertex AI. Unless noted otherwise, prices are in USD per 1 million tokens.
Gemini 3 (Text/Image/Video/Audio)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $2.00 | $4.00 | $0.20 | $0.40 | $1.00 | $2.00 |
| Text Output | $12.00 | $18.00 | N/A | N/A | $6.00 | $9.00 |
| Image Output | $120.00 | N/A | N/A | N/A | $60.00 | N/A |
Gemini 2.5 Pro (Text, Image, Video, Audio Inputs)
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $1.25 | $2.50 | $0.125 | $0.250 | $0.625 | $1.25 |
| Text Output | $10.00 | $15.00 | N/A | N/A | $5.00 | $7.50 |
Gemini 2.5 Flash
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Audio Input | $1.00 | $1.00 | $0.100 | $0.100 | $0.50 | $0.50 |
| Text Output | $2.50 | $2.50 | N/A | N/A | $1.25 | $1.25 |
| Image Output | $30.00 | $30.00 | N/A | N/A | $15.00 | $15.00 |
| Tuning (1M Training Tokens) | $5.00 | N/A | N/A | N/A | N/A | N/A |
Gemini 2.5 Flash Live API
| Input/Output Type | Price (per 1M Tokens) |
|---|---|
| Text Input | $0.50 |
| Audio Input | $3.00 |
| Video/Image Input | $3.00 |
| Text Output | $2.00 |
| Audio Output | $12.00 |
Gemini 2.5 Flash Lite
| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.10 | $0.10 | $0.010 | $0.010 | $0.05 | $0.05 |
| Audio Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Text Output | $0.40 | $0.40 | N/A | N/A | $0.20 | $0.20 |
Grounding Features (Add-On Costs)
| Feature | Free Usage Tier | Price Beyond Free Tier |
|---|---|---|
| Google Search Grounding | 1,500 prompts/day (Flash, Flash Lite); 10,000 prompts/day (Pro) | $35 per 1,000 grounded prompts |
| Web Grounding for Enterprise | None | $45 per 1,000 grounded prompts |
| Google Maps Grounding | None | $25 per 1,000 grounded prompts |
| Grounding with Your Data | None | $2.50 per 1,000 requests |
Notes:
- These rates are applicable only for successful requests (HTTP 200).
- Token usage includes both new and previous turns in live sessions, and long context inputs (>200K tokens) incur higher charges.
- For enterprise-scale needs, such as over 1 million grounded prompts daily, contact Google for a custom quote.
Given the complexity of tiered pricing and token counting, calculating these costs manually can be difficult. A Vertex AI cost calculator can help you estimate spend instantly based on your specific model choice and volume.
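As a rough sketch of how the tiered rates combine, the function below estimates the cost of a single Gemini 2.5 Pro request using the per-1M-token rates from the table above. The request-level helper and the example token counts are illustrative; verify rates against Google's current price list before budgeting.

```python
# Illustrative per-request cost estimator for Gemini 2.5 Pro on Vertex AI.
# Rates are USD per 1M tokens, taken from the pricing table above; the
# higher tier applies when the prompt context exceeds 200K tokens.

RATES = {  # (<=200K context, >200K context)
    "input":  (1.25, 2.50),
    "output": (10.00, 15.00),
}

def gemini_pro_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request; long-context requests bill at the higher tier."""
    tier = 1 if input_tokens > 200_000 else 0
    cost = input_tokens / 1e6 * RATES["input"][tier]
    cost += output_tokens / 1e6 * RATES["output"][tier]
    return cost

# A 10K-token prompt with a 2K-token answer:
print(round(gemini_pro_request_cost(10_000, 2_000), 5))  # 0.0325
```

Multiplying this per-request figure by expected monthly request volume gives a first-order budget before cached-token or Batch API discounts.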
2. Vertex AI Agent Engine
Vertex AI Agent Engine enables developers to run production-scale AI agents with compute and memory usage–based billing. Charges apply only to active runtime usage; idle agents are not billed. Pricing is calculated per second and includes a generous free tier.
| Resource | Free Tier (Per Month) | Paid Rate (After Free Tier) |
|---|---|---|
| vCPU | First 180,000 vCPU-seconds | $0.0994 per 3,600 vCPU-seconds (1 hour) |
| RAM | First 360,000 GiB-seconds | $0.0105 per 3,600 GiB-seconds (1 hour) |
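The free-tier arithmetic can be sketched as follows, using the rates and allowances from the table above (the 100-hour agent scenario is hypothetical):

```python
# Monthly Agent Engine bill from total active vCPU-seconds and GiB-seconds,
# deducting the published free tier before applying the hourly rates above.

VCPU_FREE_S, VCPU_RATE_HR = 180_000, 0.0994      # free vCPU-seconds; USD per vCPU-hour
RAM_FREE_GIB_S, RAM_RATE_GIB_HR = 360_000, 0.0105  # free GiB-seconds; USD per GiB-hour

def agent_engine_monthly_cost(vcpu_seconds: float, gib_seconds: float) -> float:
    billable_vcpu = max(0.0, vcpu_seconds - VCPU_FREE_S)
    billable_ram = max(0.0, gib_seconds - RAM_FREE_GIB_S)
    return (billable_vcpu / 3600 * VCPU_RATE_HR
            + billable_ram / 3600 * RAM_RATE_GIB_HR)

# A 1 vCPU / 2 GiB agent active 100 hours in a month:
# vCPU: 360,000 s (180,000 billable); RAM: 720,000 GiB-s (360,000 billable)
print(round(agent_engine_monthly_cost(360_000, 720_000), 2))  # 6.02
```

Because only active runtime is billed, agents that sit idle most of the month can stay entirely within the free tier.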
Related content: Read our guide to Gemini pricing (coming soon)
Google Vertex AI Pricing: Model Training
3. AutoML Models
Vertex AI AutoML allows users to train models on image and tabular data without writing code. The pricing is divided into three main activities: training the model, deploying it to an endpoint, and making predictions. Charges are based on the time and resources used, and costs accrue even if no predictions are made—unless the model is undeployed.
| Operation | Image Classification | Image Object Detection | Tabular (Classification/Regression) |
|---|---|---|---|
| Training (per hour) | $3.465 | $3.465 | $21.252 |
| Training (Edge model, per hour) | $18.00 | $18.00 | N/A |
| Deployment & Online Prediction (per hour) | $1.375 | $2.002 | Same as custom-trained model |
| Batch Prediction (per hour) | $2.222 | $2.222 | See Vertex AI Forecast section |
| Explainable AI (for predictions) | Included in prediction price | Included | Included (may use more nodes) |
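As a sketch of how these rates combine (figures from the image-classification column above; the endpoint-hours scenario is hypothetical):

```python
# Rough AutoML image-classification budget. The key cost trap: deployment
# is billed for every hour the model stays on an endpoint, whether or not
# it serves predictions.

TRAIN_RATE_HR = 3.465   # training, USD per node-hour
DEPLOY_RATE_HR = 1.375  # deployment & online prediction, USD per hour

def automl_image_cost(train_hours: float, deployed_hours: float) -> float:
    return train_hours * TRAIN_RATE_HR + deployed_hours * DEPLOY_RATE_HR

# 8 node-hours of training plus an endpoint left deployed for a 30-day month:
print(round(automl_image_cost(8, 720), 2))  # 1017.72
```

Note that in this example the idle endpoint dwarfs the training cost, which is why undeploying unused models matters.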
4. Ray on Vertex AI
Ray on Vertex AI allows users to run distributed training workloads using Ray, an open-source framework for scalable AI and Python applications. Users can define custom clusters using various machine types, including CPU and GPU configurations, and pricing is based on the underlying compute and accelerator resources consumed during training.
Ray Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-4 | $0.228 |
| n1-highmem-16 | $1.136 |
| n2-standard-32 | $1.865 |
| a2-highgpu-8g* | $35.264 |
| a3-highgpu-8g* | $105.399 |
| e2-standard-16 | $0.643 |
| c2-standard-16 | $1.002 |
| m1-ultramem-160 | $30.207 |
(*) GPU cost included in total instance price.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $3.52 |
| NVIDIA H100 80GB | $11.76 |
| TPU v3 Pod (32 cores) | $38.40 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000066 |
| pd-ssd | $0.000279 |
5. Vertex AI Neural Architecture Search
Vertex AI Neural Architecture Search (NAS) lets users automatically discover optimized model architectures. Pricing is based on compute resources used during search and training, and includes both CPU and GPU options. You can choose from predefined scale tiers or fully customize the setup.
NAS Training Pricing (us-central1)
| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-16 | $1.14 |
| n1-highmem-32 | $2.84 |
| n2-standard-64 | $4.66 |
| e2-standard-32 | $1.61 |
| c2-standard-16 | $1.25 |
| a2-highgpu-8g* | $45.13 |
*Includes GPU cost
Accelerator Pricing
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $4.40 |
| NVIDIA T4 | $0.53 |
| NVIDIA V100 | $3.72 |
Disk Pricing
| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000082 |
| pd-ssd | $0.000349 |
6. Custom-Trained Models
Vertex AI allows full flexibility by letting users train models with their choice of frameworks and infrastructure. Pricing is determined by the compute, accelerator, and disk resources used. Users can also utilize spot VMs or reserved instances for cost efficiency.
Common Machine Type Pricing (us-central1)
| Machine Type | Hourly Price (USD) |
|---|---|
| n1-standard-4 | $0.2185 |
| n1-highmem-8 | $0.5442 |
| e2-standard-16 | $0.6165 |
| g4-standard-96 | $10.35 |
| a2-highgpu-8g* | $35.40 |
| a3-ultragpu-8g* | $99.77 |
| m1-ultramem-160 | $28.95 |
(*) GPU included in machine type.
Common Accelerator Pricing (us-central1)
| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $2.93 |
| NVIDIA H100 80GB | $9.80 |
| NVIDIA L4 | $0.64 |
| TPU v2 Single (8 cores) | $5.175 |
| TPU v3 Pod (32 cores) | $36.80 |
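A custom training job bills the host machine and any separately attached accelerators per hour. A minimal estimator, using the us-central1 rates quoted above (the specific machine and GPU choices are illustrative):

```python
# Illustrative custom-training job cost: host machine plus attached GPUs,
# each billed per hour at the us-central1 rates from the tables above.

MACHINE_HOURLY = {"n1-standard-4": 0.2185, "n1-highmem-8": 0.5442}
GPU_HOURLY = {"NVIDIA_L4": 0.64, "NVIDIA_A100": 2.93}

def training_job_cost(machine: str, gpu: str, gpu_count: int, hours: float) -> float:
    return (MACHINE_HOURLY[machine] + gpu_count * GPU_HOURLY[gpu]) * hours

# n1-highmem-8 with two L4 GPUs for a 12-hour training run:
print(round(training_job_cost("n1-highmem-8", "NVIDIA_L4", 2, 12), 2))  # 21.89
```

Spot VMs or reserved instances, mentioned above, would discount the same formula's machine and GPU rates.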
Google Vertex AI Pricing: Forecasting and Prediction
7. Vertex AI Forecast
Vertex AI Forecast provides time-series forecasting for tabular data using AutoML or ARIMA+ models. Pricing is based on training time and the number of forecasted data points. Costs are tiered depending on usage volume and whether you're using AutoML or ARIMA+ models.
AutoML Forecasting Pricing
| Stage | Usage Volume | Price |
|---|---|---|
| Prediction | 0 to 1M points/month | $0.20 per 1,000 predictions |
| | 1M to 50M points/month | $0.10 per 1,000 predictions |
| | >50M points/month | $0.02 per 1,000 predictions |
| Training | N/A | $21.252 per hour |
ARIMA+ Forecasting Pricing
| Stage | Pricing |
|---|---|
| Prediction | $5.00 per 1,000 points |
| Training | $250 per TB × Candidate Models × Backtesting Windows |
| Explainability | Included (via time series decomposition) |
Note: Each data point is a single time step in the forecast horizon. Up to five quantiles are included at no extra cost.
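Assuming the AutoML prediction tiers are graduated (each band of monthly volume bills at its own rate, which is how such tiers typically work but should be confirmed), the monthly prediction charge can be sketched as:

```python
# Graduated-tier arithmetic for AutoML forecasting predictions, using the
# per-1,000-prediction rates from the table above. Assumption: each band
# of monthly volume bills at that band's rate.

TIERS = [(1_000_000, 0.20), (50_000_000, 0.10), (float("inf"), 0.02)]

def forecast_prediction_cost(points: int) -> float:
    cost, prev_cap = 0.0, 0
    for cap, rate_per_1k in TIERS:
        in_tier = min(points, cap) - prev_cap
        if in_tier <= 0:
            break
        cost += in_tier / 1000 * rate_per_1k
        prev_cap = cap
    return cost

# 60M forecasted points in a month:
print(round(forecast_prediction_cost(60_000_000), 2))  # 5300.0
```

The per-point unit cost thus falls sharply at scale: the first million points cost $200, while the ten million beyond the 50M threshold cost only $200 more.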
8. Running Prediction and Explanation Jobs
Vertex AI charges for predictions based on the machine type and configuration used by the prediction nodes. These nodes are billed per vCPU, RAM, and optional GPU usage, whether serving online requests, running batch jobs, or remaining idle in a ready state. Pricing also applies to explanations generated by the model.
Online and Batch Prediction Pricing
| Item | Price (USD) |
|---|---|
| vCPU | $0.025–$0.039 per hour (varies by machine type) |
| RAM | $0.0025–$0.0053 per GiB-hour |
| GPU (e.g., A100 40GB) | $3.37 per hour |
| Example-Based Indexing | $3.00 per GB of index (data size × float size) |
Example Calculation (Batch Job)
- Job: 3961 inputs, 0.72 sec each
- Time: 47.5 minutes (approx.)
- Cost (n1-highcpu-32 @ $0.09099/hr): ~$0.063
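The arithmetic above can be reproduced with a short estimator. Treat it as a naive lower bound: real batch billing adds node startup time and minimum billing increments, and may distribute work across several nodes, which is why the naive single-node figure does not exactly match the quoted total.

```python
# Naive batch-prediction estimate: total compute seconds (inputs × per-input
# latency) billed at the node's hourly rate from the job description above.

def batch_job_cost(inputs: int, sec_per_input: float, hourly_rate: float) -> float:
    return inputs * sec_per_input / 3600 * hourly_rate

minutes = 3961 * 0.72 / 60
print(round(minutes, 1))  # 47.5
print(round(batch_job_cost(3961, 0.72, 0.09099), 3))
```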
Google Vertex AI Pricing: Supporting AI Services
In addition to model training, prediction, and generative AI tools, Vertex AI offers a range of supporting services for building production-ready ML systems. These include tools for pipelines, metadata tracking, experiment monitoring, model registry, optimization, and vector-based search. Below is a breakdown of their pricing.
9. Vertex AI Pipelines
Vertex AI Pipelines help automate ML workflows with reproducible runs.
| Item | Price (USD) |
|---|---|
| Pipeline Run Fee | $0.03 per run |
| Compute Usage | Charged at standard training VM rates |
| External Services | Billed separately (e.g., Dataflow) |
10. Vertex AI Feature Store
Used to manage and serve features for training and inference.
Online Operations Pricing
| Operation Type | Price (USD) |
|---|---|
| Data Processing Node | $0.08 per hour |
| Optimized Serving Node | $0.30 per hour (includes 200 GB) |
| Bigtable Serving Node | $0.94 per hour |
| Bigtable Storage | $0.000342 per GiB-hour |
Notes:
- Offline operations have the same pricing as BigQuery for ingestion, querying, and storage.
- Pricing above is for the new version of the feature store. Google offers separate pricing for the legacy AI Feature Store.
11. Vertex ML Metadata
Stores metadata for tracking datasets, models, and runs.
| Metric | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
12. Vertex AI TensorBoard
Used for logging, visualization, and tracking model experiments.
| Resource | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |
13. Vertex AI Vizier
Optimizes hyperparameters through black-box search.
| Usage Tier | Price (USD) |
|---|---|
| First 100 trials/month | Free |
| Additional trials | $1 per trial (excl. random/grid) |
| Random/Grid Search | Always free |
14. Vertex AI Vector Search
Used for semantic search over embeddings with Approximate Nearest Neighbor (ANN) indexing.
| Component | Price (USD) |
|---|---|
| Index Serving (e2-standard-2) | $0.0938 per node-hour |
| Index Build/Update (Batch) | $3.00 per GiB processed |
| Streaming Update (Insert) | $0.45 per GiB |
Storage-Optimized Tier
| Unit Type | Price (USD) |
|---|---|
| Capacity Unit (CU) | $2.30 per hour |
| Write Unit | $0.45 per GiB |
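To see how serving and build charges combine, here is a hypothetical monthly estimate: two e2-standard-2 serving nodes running around the clock (roughly 730 hours) plus four batch rebuilds of a 20 GiB index, at the rates above.

```python
# Monthly Vector Search estimate: continuous index serving (per node-hour)
# plus periodic batch index rebuilds (per GiB processed). Rates from the
# tables above; the workload shape is hypothetical.

NODE_HR = 0.0938       # e2-standard-2 serving, USD per node-hour
BUILD_PER_GIB = 3.00   # batch index build/update, USD per GiB

def vector_search_monthly(nodes: int, hours: float,
                          rebuilds: int, index_gib: float) -> float:
    return nodes * hours * NODE_HR + rebuilds * index_gib * BUILD_PER_GIB

print(round(vector_search_monthly(2, 730, 4, 20), 2))  # 376.95
```

In this scenario the always-on serving nodes and the rebuilds contribute comparable amounts, so both rebuild frequency and node count are worth tuning.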
15. Vertex AI Model Monitoring
Monitors deployed model performance and data drift.
| Resource Type | Price (USD) |
|---|---|
| Data Analyzed | $3.50 per GB |
| Additional Services | Billed separately (e.g., BigQuery, Explain) |
16. Vertex AI Workbench
Development environment for managing notebooks and experiments.
Compute & Memory Pricing
| Resource | Price (USD) per Hour |
|---|---|
| vCPU (E2) | $0.0261 |
| vCPU (N1, N2, A2) | $0.0379 |
| Memory (E2) | $0.0035 per GiB-hour |
| Memory (Others) | $0.0051 per GiB-hour |
Accelerator Pricing
| GPU Type | Price (USD) per Hour |
|---|---|
| A100 | $4.40 |
| T4 | $0.525 |
| V100 | $3.72 |
| P100 | $2.19 |
Disk Pricing
| Disk Type | Price (USD) per GiB-hour |
|---|---|
| Standard | $0.0000658 |
| SSD | $0.0002795 |
| Hyperdisk / Extreme / Balanced | $0.000164 – $0.000205 |
Management Fees
| Workbench Type | Resource | Price (USD) |
|---|---|---|
| Managed Notebooks | vCPU | $0.05 per vCPU |
| | GPU (Standard) | $0.35 per GPU |
| | GPU (Premium) | $2.48 per GPU |
| User-Managed | vCPU | $0.005 per vCPU |
| | GPU (Standard) | $0.035 per GPU |
| | GPU (Premium) | $0.25 per GPU |
Cost Optimization Strategies for Vertex AI
1. Reduce Token Usage Via Prompt Engineering and Truncation
A significant portion of generative model costs in Vertex AI is determined by the number of tokens processed per request. By engineering prompts to be concise and relevant, users can reduce the overall token count, directly lowering expenses. Truncating unnecessary prompt details and focusing only on the core task helps avoid extra computational effort, especially for large language models where verbose prompts can double or triple the token bill.
It’s important to measure and monitor how prompt changes affect cost and model output quality. Automated tools and regression testing can ensure that prompt reductions do not negatively impact performance. Teams should establish guidelines for maximum prompt length and consistently audit deployed prompts to maintain efficiency as new use cases or models are introduced.
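One way to enforce such a guideline is a hard token budget applied before each model call. The sketch below uses a whitespace word count as a crude stand-in for a real tokenizer and keeps the newest context chunks first; both the budget value and the heuristic are illustrative.

```python
# Prompt-size guardrail: drop the oldest context chunks until the prompt
# fits a team-defined token budget. A production version should count
# tokens with the model's real tokenizer, not a whitespace split.

MAX_PROMPT_TOKENS = 1_000  # illustrative team budget

def truncate_context(chunks: list[str], budget: int = MAX_PROMPT_TOKENS) -> list[str]:
    """Keep the most recent chunks that fit within the token budget."""
    kept, used = [], 0
    for chunk in reversed(chunks):       # newest chunk first
        tokens = len(chunk.split())      # stand-in for a real token count
        if used + tokens > budget:
            break
        kept.append(chunk)
        used += tokens
    return list(reversed(kept))          # restore chronological order
```

Logging `used` alongside model-quality metrics makes it easy to audit whether a tighter budget is degrading answers.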
2. Optimize Embeddings and Retrieval Workflows
Vertex AI users often leverage large pre-trained models to generate embeddings for similarity search or retrieval augmented generation. Each embedding generated or retrieved incurs a fee, and unnecessary or redundant generation can inflate costs. Optimizing data pipelines to eliminate duplicate computations, such as by caching previous embeddings or only updating representations upon data changes, offers immediate savings.
Batch processing and judicious selection of the data to be embedded can also control charges. For instance, embedding only the most relevant or frequently accessed data rather than the entire corpus can significantly decrease volume-based expenses. Regular audits of retrieval workflow logs can identify unused or rarely accessed embeddings, guiding pruning strategies to further manage spend.
3. Select Cost-Efficient Model Families
Selecting the right model family has a direct impact on cost efficiency. Vertex AI offers a variety of models with different capabilities and associated pricing, ranging from lightweight models suitable for high-throughput scenarios to premium models for complex tasks requiring higher quality. By matching use cases to the most cost-effective model, organizations can prevent overpaying for unnecessary capability.
It’s crucial to benchmark model performance against project requirements before scaling usage. Prototyping with small workloads can help determine if smaller or older-generation models suffice with minimal impact on accuracy or latency. Regular reviews should reassess model choices as newer, more optimized offerings become available, potentially unlocking further savings without sacrificing output quality.
4. Use Caching and Batching To Reduce Inference Calls
Frequent, individual inference requests made to Vertex AI endpoints can accumulate high costs. Implementing caching for repeated queries and batching inference jobs allows organizations to reduce the number of API calls and fully utilize endpoint throughput. Batching enables multiple requests to be processed simultaneously, maximizing the efficiency of computational resources and driving down per-request expenses.
Automated cache management and configurable batch intervals help maintain response times while moderating expenditures. Organizations should monitor endpoint logs for frequent or redundant requests and test different batching configurations to find the optimal balance between user experience and operational cost. Adding metrics dashboards to track these changes facilitates ongoing optimization as traffic patterns evolve.
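Both techniques can be sketched in a few lines: an LRU cache short-circuits repeated identical queries, while a micro-batcher groups pending inputs into one endpoint call. `predict_batch` is a stand-in for a real Vertex AI endpoint client, and the batch size is a tunable assumption.

```python
# Caching + batching sketch: cached_predict avoids repeat calls for
# identical prompts; micro_batch amortizes one endpoint call over many
# inputs. predict_batch stands in for a real endpoint client.

from functools import lru_cache

def predict_batch(prompts: tuple[str, ...]) -> list[str]:
    # Placeholder for one endpoint call handling many inputs at once.
    return [p.upper() for p in prompts]

@lru_cache(maxsize=4096)
def cached_predict(prompt: str) -> str:
    return predict_batch((prompt,))[0]

def micro_batch(prompts: list[str], batch_size: int = 8) -> list[str]:
    out: list[str] = []
    for i in range(0, len(prompts), batch_size):
        out.extend(predict_batch(tuple(prompts[i:i + batch_size])))
    return out

print(micro_batch(["a", "b", "c"], batch_size=2))  # ['A', 'B', 'C']
```

`cached_predict.cache_info()` exposes hit rates, which is exactly the kind of metric worth putting on the dashboards mentioned above.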
5. Control Regional Spending and Deployment Footprint
Vertex AI pricing can vary by region due to differences in available hardware, network egress, and integration dependencies. Deploying endpoints or workloads only in essential regions helps avoid duplicative spending. Geo-fencing workloads to where data and users reside reduces latency and unnecessary inter-region data transfer, contributing to predictable and localized cost management.
Monitoring deployment footprints across projects is also crucial for preventing resource sprawl. Automated tooling can identify idle or underutilized endpoints, prompting decommissioning or regional consolidation. Organizations should periodically audit regional resource allocation and align deployment policies with usage patterns to ensure they are not incurring overlapping charges for the same workload in multiple geographies.
Conclusion
Vertex AI offers powerful tools for developing and deploying machine learning systems, but costs can escalate quickly without careful management. Understanding the pricing model for each service, from generative AI and custom training to prediction, pipelines, and supporting infrastructure, is essential for making informed architectural and operational decisions. By aligning model choices, workload strategies, and optimization practices, teams can take full advantage of Vertex AI’s capabilities while keeping costs under control.

