Finout Blog Archive

Vertex AI: Pricing for Top 16 Vertex Services in 2026

Written by Asaf Liveanu | Jan 18, 2026 8:46:33 AM

How Does Google Price Vertex AI?

Google Vertex AI is a managed machine learning (ML) platform for the end-to-end development lifecycle for AI and ML projects on Google Cloud. It provides tools for data preparation, model training, hyperparameter tuning, deployment, and monitoring within a unified interface.

Vertex AI pricing varies by service, with core costs driven by model training, prediction (online and batch), and storage. For example, custom training is typically priced per node-hour, AutoML predictions can be priced per node-hour or per 1,000 predictions, and generative AI models are priced per 1 million tokens. Other factors, such as pipeline runs ($0.03 per run), data preprocessing, and specialized services like Vertex AI Search or Vision AI, have their own pricing structures.

This is part of a series of articles about AI costs


Google Vertex AI Pricing: Generative and Agentic AI 

1. Generative AI 

Google Vertex AI offers a flexible pricing model for generative AI services, allowing developers and enterprises to choose from several Gemini model variants based on performance and cost requirements. 

The platform supports a variety of input and output types including text, images, audio, and video, with pricing determined by model type, token volume, and context length. In addition, optional features like grounding with Google Search or Maps come with separate usage-based charges.

Below is a breakdown of generative AI pricing on Vertex AI.

Gemini 3 (Text/Image/Video/Audio)

(All prices in USD per 1M tokens.)

| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $2.00 | $4.00 | $0.20 | $0.40 | $1.00 | $2.00 |
| Text Output | $12.00 | $18.00 | N/A | N/A | $6.00 | $9.00 |
| Image Output | $120.00 | N/A | N/A | N/A | $60.00 | N/A |


Gemini 2.5 Pro (Text, Image, Video, Audio Inputs)

| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Input | $1.25 | $2.50 | $0.125 | $0.250 | $0.625 | $1.25 |
| Text Output | $10.00 | $15.00 | N/A | N/A | $5.00 | $7.50 |

Gemini 2.5 Flash

| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Audio Input | $1.00 | $1.00 | $0.100 | $0.100 | $0.50 | $0.50 |
| Text Output | $2.50 | $2.50 | N/A | N/A | $1.25 | $1.25 |
| Image Output | $30.00 | $30.00 | N/A | N/A | $15.00 | $15.00 |
| Tuning (1M Training Tokens) | $5.00 | N/A | N/A | N/A | N/A | N/A |

Gemini 2.5 Flash Live API

| Input/Output Type | Price (per 1M Tokens) |
|---|---|
| Text Input | $0.50 |
| Audio Input | $3.00 |
| Video/Image Input | $3.00 |
| Text Output | $2.00 |
| Audio Output | $12.00 |

Gemini 2.5 Flash Lite

| Usage Type | ≤200K Tokens | >200K Tokens | ≤200K Cached Tokens | >200K Cached Tokens | Batch API ≤200K | Batch API >200K |
|---|---|---|---|---|---|---|
| Text/Image/Video Input | $0.10 | $0.10 | $0.010 | $0.010 | $0.05 | $0.05 |
| Audio Input | $0.30 | $0.30 | $0.030 | $0.030 | $0.15 | $0.15 |
| Text Output | $0.40 | $0.40 | N/A | N/A | $0.20 | $0.20 |

Grounding Features (Add-On Costs)

| Feature | Free Usage Tier | Price Beyond Free Tier |
|---|---|---|
| Google Search Grounding | 1,500 prompts/day (Flash, Flash Lite); 10,000 prompts/day (Pro) | $35 per 1,000 grounded prompts |
| Web Grounding for Enterprise | None | $45 per 1,000 grounded prompts |
| Google Maps Grounding | None | $25 per 1,000 grounded prompts |
| Grounding with Your Data | None | $2.50 per 1,000 requests |


Notes:

  • These rates are applicable only for successful requests (HTTP 200). 
  • Token usage includes both new and previous turns in live sessions, and long context inputs (>200K tokens) incur higher charges. 
  • For enterprise-scale needs, such as over 1 million grounded prompts daily, contact Google for a custom quote.

Given the complexity of tiered pricing and token counting, calculating these costs manually can be difficult. You can use this Vertex AI Cost Calculator to instantly estimate your spend based on your specific model choice and volume.
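
To make the token math concrete, here is a minimal sketch that prices a single Gemini 2.5 Pro request using the rates from the table above. The function name and its flat-tier assumption (the whole request billed at the tier selected by the input length) are illustrative, not Google's official billing formula; verify current rates before budgeting.

```python
# Illustrative Gemini 2.5 Pro request pricing (rates in USD per 1M
# tokens, taken from the table above; subject to change). Assumes the
# whole request is billed at the tier chosen by input length.

RATES_PRO = {
    "input_short": 1.25,    # input, <=200K tokens
    "input_long": 2.50,     # input, >200K tokens
    "output_short": 10.00,  # text output, <=200K-token context
    "output_long": 15.00,   # text output, >200K-token context
}

def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    long_ctx = input_tokens > 200_000
    in_rate = RATES_PRO["input_long" if long_ctx else "input_short"]
    out_rate = RATES_PRO["output_long" if long_ctx else "output_short"]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 10K-token prompt with a 2K-token answer:
print(f"${gemini_pro_cost(10_000, 2_000):.4f}")  # $0.0325
```

At batch rates the same request would cost half as much, which is why the Batch API is worth considering for offline workloads.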

2. Vertex AI Agent Engine

Vertex AI Agent Engine enables developers to run production-scale AI agents with compute and memory usage–based billing. Charges apply only to active runtime usage; idle agents are not billed. Pricing is calculated per second and includes a generous free tier.

| Resource | Free Tier (Per Month) | Paid Rate (After Free Tier) |
|---|---|---|
| vCPU | First 180,000 vCPU-seconds | $0.0994 per 3,600 vCPU-seconds (1 hour) |
| RAM | First 360,000 GiB-seconds | $0.0105 per 3,600 GiB-seconds (1 hour) |
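
A back-of-the-envelope monthly bill can be sketched as below, under the assumption that the free tier is simply deducted from total usage; confirm the exact billing granularity in Google's documentation.

```python
# Sketch of a monthly Vertex AI Agent Engine bill using the rates above.
# Assumes the monthly free tier is deducted first (an assumption).

VCPU_RATE_PER_SEC = 0.0994 / 3600  # USD per vCPU-second
RAM_RATE_PER_SEC = 0.0105 / 3600   # USD per GiB-second
FREE_VCPU_SEC = 180_000
FREE_RAM_SEC = 360_000

def agent_engine_cost(vcpu_seconds: float, gib_seconds: float) -> float:
    billable_cpu = max(0.0, vcpu_seconds - FREE_VCPU_SEC)
    billable_ram = max(0.0, gib_seconds - FREE_RAM_SEC)
    return billable_cpu * VCPU_RATE_PER_SEC + billable_ram * RAM_RATE_PER_SEC

# An agent active 100 hours on 1 vCPU / 2 GiB uses 360,000 vCPU-seconds
# and 720,000 GiB-seconds; only the overage beyond the free tier is billed.
print(round(agent_engine_cost(360_000, 720_000), 2))  # 6.02
```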


Related content: Read our guide to Gemini pricing (coming soon)

Google Vertex AI Pricing: Model Training 

3. AutoML Models

Vertex AI AutoML allows users to train models on image and tabular data without writing code. The pricing is divided into three main activities: training the model, deploying it to an endpoint, and making predictions. Charges are based on the time and resources used, and costs accrue for as long as a model remains deployed, even if no predictions are made.

| Operation | Image Classification | Image Object Detection | Tabular (Classification/Regression) |
|---|---|---|---|
| Training (per hour) | $3.465 | $3.465 | $21.252 |
| Training (Edge model, per hour) | $18.00 | $18.00 | N/A |
| Deployment & Online Prediction (per hour) | $1.375 | $2.002 | Same as custom-trained model |
| Batch Prediction (per hour) | $2.222 | $2.222 | See Vertex AI Forecast section |
| Explainable AI (for predictions) | Included in prediction price | Included | Included (may use more nodes) |
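
As an illustration of how these line items combine, the sketch below totals training and endpoint time for an image classification model. The node-hour counts are hypothetical; the rates come from the table above.

```python
# Rough end-to-end cost of an AutoML image classification project,
# using the per-node-hour rates from the table above. Hour counts are
# hypothetical; note that a deployed endpoint bills even when idle.

TRAIN_RATE = 3.465   # $/node-hour, training
DEPLOY_RATE = 1.375  # $/node-hour, deployment & online prediction

def automl_image_cost(train_hours: float, deployed_hours: float,
                      nodes: int = 1) -> float:
    return (train_hours * TRAIN_RATE + deployed_hours * DEPLOY_RATE) * nodes

# 8 node-hours of training plus an endpoint left up for a 30-day month:
print(round(automl_image_cost(8, 720), 2))  # 1017.72
```

The endpoint dominates the bill here, which is the practical argument for undeploying idle models.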


4. Ray on Vertex AI

Ray on Vertex AI allows users to run distributed training workloads using Ray, an open-source framework for scalable AI and Python applications. Users can define custom clusters using various machine types, including CPU and GPU configurations, and pricing is based on the underlying compute and accelerator resources consumed during training.

Ray Training Pricing (us-central1)

| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-4 | $0.228 |
| n1-highmem-16 | $1.136 |
| n2-standard-32 | $1.865 |
| a2-highgpu-8g* | $35.264 |
| a3-highgpu-8g* | $105.399 |
| e2-standard-16 | $0.643 |
| c2-standard-16 | $1.002 |
| m1-ultramem-160 | $30.207 |


(*) GPU cost included in total instance price.

Common Accelerator Pricing (us-central1)

| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $3.52 |
| NVIDIA H100 80GB | $11.76 |
| TPU v3 Pod (32 cores) | $38.40 |

Disk Pricing

| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000066 |
| pd-ssd | $0.000279 |


5. Vertex AI Neural Architecture Search

Vertex AI Neural Architecture Search (NAS) lets users automatically discover optimized model architectures. Pricing is based on compute resources used during search and training, and includes both CPU and GPU options. You can choose from predefined scale tiers or fully customize the setup.

NAS Training Pricing (us-central1)

| Machine Type | Price per Hour (USD) |
|---|---|
| n1-standard-16 | $1.14 |
| n1-highmem-32 | $2.84 |
| n2-standard-64 | $4.66 |
| e2-standard-32 | $1.61 |
| c2-standard-16 | $1.25 |
| a2-highgpu-8g* | $45.13 |

*Includes GPU cost

Accelerator Pricing

| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $4.40 |
| NVIDIA T4 | $0.53 |
| NVIDIA V100 | $3.72 |

Disk Pricing

| Disk Type | Price per GiB-Hour (USD) |
|---|---|
| pd-standard | $0.000082 |
| pd-ssd | $0.000349 |


6. Custom-Trained Models

Vertex AI gives users full flexibility to train models with their choice of frameworks and infrastructure. Pricing is determined by the compute, accelerator, and disk resources used. Users can also run jobs on Spot VMs or reserved instances to reduce costs.

Common Machine Type Pricing (us-central1)

| Machine Type | Hourly Price (USD) |
|---|---|
| n1-standard-4 | $0.2185 |
| n1-highmem-8 | $0.5442 |
| e2-standard-16 | $0.6165 |
| g4-standard-96 | $10.35 |
| a2-highgpu-8g* | $35.40 |
| a3-ultragpu-8g* | $99.77 |
| m1-ultramem-160 | $28.95 |


(*) GPU included in machine type.

Common Accelerator Pricing (us-central1)

| Accelerator Type | Price per Hour (USD) |
|---|---|
| NVIDIA A100 | $2.93 |
| NVIDIA H100 80GB | $9.80 |
| NVIDIA L4 | $0.64 |
| TPU v2 Single (8 cores) | $5.175 |
| TPU v3 Pod (32 cores) | $36.80 |
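
Since a custom job's hourly rate is the base machine rate plus any attached accelerators, the arithmetic can be sketched as below. Rates are copied from the tables above (us-central1); the machine and GPU choices are illustrative.

```python
# Hourly cost of a custom training job = base machine + attached GPUs.
# Rates are us-central1 figures from the tables above and may change.

MACHINE_RATES = {"n1-standard-4": 0.2185, "n1-highmem-8": 0.5442}
GPU_RATES = {"NVIDIA_L4": 0.64, "NVIDIA_A100": 2.93}

def training_cost(machine, hours, gpu=None, gpu_count=0):
    hourly = MACHINE_RATES[machine] + GPU_RATES.get(gpu, 0.0) * gpu_count
    return hourly * hours

# 10 hours on n1-standard-4 with two L4 GPUs:
print(round(training_cost("n1-standard-4", 10, "NVIDIA_L4", 2), 3))  # 14.985
```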


Google Vertex AI Pricing: Forecasting and Prediction 

7. Vertex AI Forecast

Vertex AI Forecast provides time-series forecasting for tabular data using AutoML or ARIMA+ models. Pricing is based on training time and the number of forecasted data points, with prediction rates tiered by monthly usage volume and differing between AutoML and ARIMA+.

AutoML Forecasting Pricing

| Stage | Usage Volume | Price |
|---|---|---|
| Prediction | 0 to 1M points/month | $0.20 per 1,000 predictions |
| Prediction | 1M to 50M points/month | $0.10 per 1,000 predictions |
| Prediction | >50M points/month | $0.02 per 1,000 predictions |
| Training | N/A | $21.252 per hour |
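
The tiered prediction rates can be applied programmatically. The sketch below assumes graduated tiers, i.e., each rate applies only to the points that fall within its band; confirm against Google's billing documentation if your usage sits near a boundary.

```python
# Graduated-tier cost for AutoML forecasting predictions, using the
# rates above. Assumes each tier's rate applies only to points inside
# that tier (an assumption -- verify against Google's billing docs).

TIERS = [  # (tier upper bound in points, USD per 1,000 predictions)
    (1_000_000, 0.20),
    (50_000_000, 0.10),
    (float("inf"), 0.02),
]

def forecast_prediction_cost(points: int) -> float:
    cost, lower = 0.0, 0
    for upper, rate_per_1k in TIERS:
        in_tier = max(0, min(points, upper) - lower)
        cost += in_tier / 1000 * rate_per_1k
        lower = upper
    return cost

# 5M points/month: 1M at $0.20/1k plus 4M at $0.10/1k.
print(round(forecast_prediction_cost(5_000_000), 2))  # 600.0
```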

ARIMA+ Forecasting Pricing

| Stage | Price |
|---|---|
| Prediction | $5.00 per 1,000 points |
| Training | $250 per TB × Candidate Models × Backtesting Windows |
| Explainability | Included (via time series decomposition) |


Note: Each data point is a single time step in the forecast horizon. Up to five quantiles are included at no extra cost.

8. Running Prediction and Explanation Jobs

Vertex AI charges for predictions based on the machine type and configuration used by the prediction nodes. These nodes are billed per vCPU, RAM, and optional GPU usage, whether serving online requests, running batch jobs, or remaining idle in a ready state. Pricing also applies to explanations generated by the model.

Online and Batch Prediction Pricing

| Item | Price (USD) |
|---|---|
| vCPU (per hour) | $0.025–$0.039 (varies by machine type) |
| RAM (per GiB-hour) | $0.0025–$0.0053 |
| GPU, e.g., A100 40GB (per hour) | $3.37 |
| Example-Based Indexing | $3.00 per GB of index (data size × float size) |

Example Calculation (Batch Job)

  • Job: 3961 inputs, 0.72 sec each
  • Time: 47.5 minutes (approx.)
  • Cost (n1-highcpu-32 @ $0.09099/hr): ~$0.063

Google Vertex AI Pricing: Supporting AI Services

In addition to model training, prediction, and generative AI tools, Vertex AI offers a range of supporting services for building production-ready ML systems. These include tools for pipelines, metadata tracking, experiment monitoring, model registry, optimization, and vector-based search. Below is a breakdown of their pricing.

9. Vertex AI Pipelines

Vertex AI Pipelines help automate ML workflows with reproducible runs.

| Item | Price (USD) |
|---|---|
| Pipeline Run Fee | $0.03 per run |
| Compute Usage | Charged at training VM rates |
| External Services | Billed separately (e.g., Dataflow) |


10. Vertex AI Feature Store

Used to manage and serve features for training and inference.

Online Operations Pricing

| Operation Type | Price (USD) |
|---|---|
| Data Processing Node | $0.08 per hour |
| Optimized Serving Node | $0.30 per hour (includes 200 GB) |
| Bigtable Serving Node | $0.94 per hour |
| Bigtable Storage | $0.000342 per GiB-hour |

Notes: 

  • Offline operations have the same pricing as BigQuery for ingestion, querying, and storage.
  • Pricing above is for the new version of the feature store. Google offers separate pricing for the legacy AI Feature Store.

11. Vertex ML Metadata

Stores metadata for tracking datasets, models, and runs.

| Metric | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |


12. Vertex AI TensorBoard

Used for logging, visualization, and tracking model experiments.

| Resource | Price (USD) |
|---|---|
| Storage | $10 per GiB per month |


13. Vertex AI Vizier

Optimizes hyperparameters through black-box search.

| Usage Tier | Price (USD) |
|---|---|
| First 100 trials/month | Free |
| Additional trials | $1 per trial (excluding random/grid search) |
| Random/Grid Search | Always free |

14. Vertex AI Vector Search

Used for semantic search over embeddings with Approximate Nearest Neighbor (ANN) indexing.

| Component | Price (USD) |
|---|---|
| Index Serving (e2-standard-2) | $0.0938 per node-hour |
| Index Build/Update (Batch) | $3.00 per GiB processed |
| Streaming Update (Insert) | $0.45 per GiB |

Storage-Optimized Tier

| Unit Type | Price (USD) |
|---|---|
| Capacity Unit (CU) | $2.30 per hour |
| Write Unit | $0.45 per GiB |

15. Vertex AI Model Monitoring

Monitors deployed model performance and data drift.

| Resource Type | Price (USD) |
|---|---|
| Data Analyzed | $3.50 per GB |
| Additional Services | Billed separately (e.g., BigQuery, Explain) |

16. Vertex AI Workbench

Development environment for managing notebooks and experiments.

Compute & Memory Pricing

| Resource | Price (USD) per Hour |
|---|---|
| vCPU (E2) | $0.0261 |
| vCPU (N1, N2, A2) | $0.0379 |
| Memory (E2) | $0.0035 per GiB-hour |
| Memory (Others) | $0.0051 per GiB-hour |

Accelerator Pricing

| GPU Type | Price (USD) per Hour |
|---|---|
| A100 | $4.40 |
| T4 | $0.525 |
| V100 | $3.72 |
| P100 | $2.19 |

Disk Pricing

| Disk Type | Price (USD) per GiB-hour |
|---|---|
| Standard | $0.0000658 |
| SSD | $0.0002795 |
| Hyperdisk / Extreme / Balanced | $0.000164 – $0.000205 |

Management Fees

| Workbench Type | Resource | Price (USD) |
|---|---|---|
| Managed Notebooks | vCPU | $0.05 per vCPU |
| Managed Notebooks | GPU (Standard) | $0.35 per GPU |
| Managed Notebooks | GPU (Premium) | $2.48 per GPU |
| User-Managed | vCPU | $0.005 per vCPU |
| User-Managed | GPU (Standard) | $0.035 per GPU |
| User-Managed | GPU (Premium) | $0.25 per GPU |


Cost Optimization Strategies for Vertex AI

1. Reduce Token Usage Via Prompt Engineering and Truncation

A significant portion of generative model costs in Vertex AI is determined by the number of tokens processed per request. By engineering prompts to be concise and relevant, users can reduce the overall token count, directly lowering expenses. Truncating unnecessary prompt details and focusing only on the core task helps avoid extra computational effort, especially for large language models where verbose prompts can double or triple the token bill.

It’s important to measure and monitor how prompt changes affect cost and model output quality. Automated tools and regression testing can ensure that prompt reductions do not negatively impact performance. Teams should establish guidelines for maximum prompt length and consistently audit deployed prompts to maintain efficiency as new use cases or models are introduced.
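
A simple guard that enforces a prompt token budget illustrates the idea. The four-characters-per-token ratio below is only a rough heuristic for English text; for billing-accurate numbers, count tokens with the model's own tokenizer (the Vertex AI SDK exposes a count_tokens call).

```python
# Crude prompt budget guard. Uses a ~4 chars/token heuristic, which is
# only an approximation -- use the model tokenizer for exact counts.

CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def enforce_budget(prompt: str, max_tokens: int = 2000) -> str:
    """Truncate a prompt to an approximate token budget, keeping the
    beginning (task instructions usually come first)."""
    if estimate_tokens(prompt) <= max_tokens:
        return prompt
    return prompt[: max_tokens * CHARS_PER_TOKEN]

long_prompt = "summarize: " + "x" * 100_000
short = enforce_budget(long_prompt, max_tokens=500)
print(estimate_tokens(short))  # 500
```

In practice the truncation step should be smarter than a character slice (for example, dropping the least relevant context first), but the budget check itself belongs in every request path.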

2. Optimize Embeddings and Retrieval Workflows

Vertex AI users often leverage large pre-trained models to generate embeddings for similarity search or retrieval augmented generation. Each embedding generated or retrieved incurs a fee, and unnecessary or redundant generation can inflate costs. Optimizing data pipelines to eliminate duplicate computations, such as by caching previous embeddings or only updating representations upon data changes, offers immediate savings.

Batch processing and judicious selection of the data to be embedded can also control charges. For instance, embedding only the most relevant or frequently accessed data rather than the entire corpus can significantly decrease volume-based expenses. Regular audits of retrieval workflow logs can identify unused or rarely accessed embeddings, guiding pruning strategies to further manage spend.
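
The caching idea can be as simple as keying embeddings by a content hash, so unchanged documents are never re-embedded. `fake_embed` below is a stand-in for a real embedding call (for example, a Vertex AI text-embedding model); the class itself is a hypothetical sketch, not a library API.

```python
import hashlib

# Content-hash embedding cache: a document is re-embedded only when its
# text changes. embed_fn is whatever billable embedding call you use.

class EmbeddingCache:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self._store = {}  # sha256(text) -> embedding
        self.calls = 0    # number of billable embedding requests made

    def get(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self.embed_fn(text)
            self.calls += 1  # only cache misses hit the API
        return self._store[key]

def fake_embed(text):  # placeholder for the real model call
    return [float(len(text))]

cache = EmbeddingCache(fake_embed)
for doc in ["alpha", "beta", "alpha", "alpha"]:
    cache.get(doc)
print(cache.calls)  # 2 -- duplicates were served from the cache
```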

3. Select Cost-Efficient Model Families

Selecting the right model family has a direct impact on cost efficiency. Vertex AI offers a variety of models with different capabilities and associated pricing, ranging from lightweight models suitable for high-throughput scenarios to premium models for complex tasks requiring higher quality. By matching use cases to the most cost-effective model, organizations can prevent overpaying for unnecessary capability.

It’s crucial to benchmark model performance against project requirements before scaling usage. Prototyping with small workloads can help determine if smaller or older-generation models suffice with minimal impact on accuracy or latency. Regular reviews should reassess model choices as newer, more optimized offerings become available, potentially unlocking further savings without sacrificing output quality.

4. Use Caching and Batching To Reduce Inference Calls

Frequent, individual inference requests made to Vertex AI endpoints can accumulate high costs. Implementing caching for repeated queries and batching inference jobs allows organizations to reduce the number of API calls and fully utilize endpoint throughput. Batching enables multiple requests to be processed simultaneously, maximizing the efficiency of computational resources and driving down per-request expenses.


Automated cache management and configurable batch intervals help maintain response times while moderating expenditures. Organizations should monitor endpoint logs for frequent or redundant requests and test different batching configurations to find the optimal balance between user experience and operational cost. Adding metrics dashboards to track these changes facilitates ongoing optimization as traffic patterns evolve.
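
Client-side micro-batching can be sketched in a few lines. `predict_batch` below stands in for any endpoint that accepts multiple instances per request, as Vertex AI online prediction endpoints do; the batch size and flush policy are illustrative.

```python
# Micro-batcher: accumulate instances and send them as one batched call
# instead of N single calls, cutting per-request overhead and API calls.

class MicroBatcher:
    def __init__(self, predict_batch, max_batch: int = 8):
        self.predict_batch = predict_batch
        self.max_batch = max_batch
        self.pending = []
        self.api_calls = 0

    def submit(self, instance):
        """Queue an instance; flush automatically when the batch fills."""
        self.pending.append(instance)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return None

    def flush(self):
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.api_calls += 1  # one billable call per batch
        return self.predict_batch(batch)

batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch=4)
for i in range(10):
    batcher.submit(i)
batcher.flush()  # send the remaining partial batch
print(batcher.api_calls)  # 3 calls instead of 10
```

A production version would also flush on a timer so that small batches do not wait indefinitely, trading a bounded amount of latency for fewer calls.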

5. Control Regional Spending and Deployment Footprint

Vertex AI pricing can vary by region due to differences in available hardware, network egress, and integration dependencies. Deploying endpoints or workloads only in essential regions helps avoid duplicative spending. Geo-fencing workloads to where data and users reside reduces latency and unnecessary inter-region data transfer, contributing to predictable and localized cost management.

Monitoring deployment footprints across projects is also crucial for preventing resource sprawl. Automated tooling can identify idle or underutilized endpoints, prompting decommissioning or regional consolidation. Organizations should periodically audit regional resource allocation and align deployment policies with usage patterns to ensure they are not incurring overlapping charges for the same workload in multiple geographies.

Conclusion

Vertex AI offers powerful tools for developing and deploying machine learning systems, but costs can escalate quickly without careful management. Understanding the pricing model for each service, from generative AI and custom training to prediction, pipelines, and supporting infrastructure, is essential for making informed architectural and operational decisions. By aligning model choices, workload strategies, and optimization practices, teams can take full advantage of Vertex AI’s capabilities while keeping costs under control.