Snowflake Cost Optimization Techniques: Part Two

Mar 13th, 2023

Table of Contents

  1. Rightsize Warehouses
  2. Compress Data
  3. Optimize Snowflake Tables
  4. Archive Historical Data
  5. Use Time Travel Wisely
  6. Enable Regular Snowflake Cost Audits
  7. Understand Your Snowflake Costs With Finout

In our part one blog post on Snowflake cost optimization, we looked at how Snowflake pricing works and how to drive costs down without compromising performance.

We wanted to take the existing advice a bit further and add a few extra tips to help you save on your infrastructure costs. Take your time to experiment with them to see what has the greatest impact on your monthly bill.

Rightsize Warehouses

Snowflake charges for warehouses based on their size and duration of usage. For example, if you have a medium-sized warehouse that runs for 100 hours a month, you will be charged for 100 hours of usage at the hourly rate for a medium-sized warehouse.

| Warehouse Size | Servers | Credits per Hour | Credits per Second |
|----------------|---------|------------------|--------------------|
| X-Small        | 1       | 1                | 0.0003             |
| Small          | 2       | 2                | 0.0006             |
| Medium         | 4       | 4                | 0.0011             |
| Large          | 8       | 8                | 0.0022             |
| X-Large        | 16      | 16               | 0.0044             |
| 2X-Large       | 32      | 32               | 0.0089             |
| 3X-Large       | 64      | 64               | 0.0178             |
| 4X-Large       | 128     | 128              | 0.0356             |
| 5X-Large       | 256     | 256              | 0.0711             |
| 6X-Large       | 512     | 512              | 0.1422             |

But even though smaller warehouses consume fewer credits, they're not necessarily cheaper than large warehouses. A query can run for an hour on an X-Small warehouse or for 15 minutes on a Medium warehouse. In both cases, it will consume one credit, but it will execute more quickly in the larger warehouse. 

| Query | Runtime on X-Small Warehouse (1 credit/hour) | Runtime on Medium Warehouse (4 credits/hour) | Time Saved |
|-------|----------------------------------------------|----------------------------------------------|--------------------|
| A     | 1 hour                                       | 15 minutes                                   | 45 minutes         |
| B     | 2 hours                                      | 30 minutes                                   | 1 hour, 30 minutes |
| C     | 30 minutes                                   | 7.5 minutes                                  | 22.5 minutes       |

Nailing down the right size for your warehouses takes some trial and error to find the configuration that delivers the shortest runtime at the lowest overall cost.
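
A practical way to experiment is to resize a warehouse, run your workload for a while, and compare credit consumption before and after the change. Below is a minimal sketch, assuming a warehouse named "my_wh" (an illustrative name); the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view reports credits used per warehouse, with some latency.

```
-- Resize an existing warehouse and let it suspend quickly when idle
ALTER WAREHOUSE my_wh SET
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND = 60          -- suspend after 60 seconds of inactivity
    AUTO_RESUME = TRUE;

-- Compare credit consumption per warehouse over the last 30 days
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;
```
Rerunning the second query after a resize shows whether the smaller (or larger) warehouse actually lowers total credit usage for the same workload.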

Compress Data

Compressing data before loading it into Snowflake can help you drastically reduce storage costs. Snowflake supports various compression formats for data loading, including gzip, bzip2, Brotli, Zstandard, and Deflate.

The amount of compression achieved depends on the data type and compression algorithm used.

For example, suppose you have a dataset that is 1GB in size, and you compress it using gzip compression as below:

| Metric                  | Value  |
|-------------------------|--------|
| Original data size      | 1 GB   |
| Compressed data size    | 200 MB |
| Compression ratio       | 5:1    |
| Compression percentage  | 80%    |

In that scenario, the compressed file size decreased to approximately 200 MB, saving you 80% on storage expenses.
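
As a rough sketch of how such a pre-compressed file might be loaded, assuming an internal stage named "my_stage", a local file "my_data.csv.gz", and a target table named "my_SFtable" (all illustrative names):

```
-- Upload the already-gzipped file to an internal stage
-- (files that are already compressed are not compressed again)
PUT file:///tmp/my_data.csv.gz @my_stage;

-- Load it, telling Snowflake how the staged file is compressed
COPY INTO my_SFtable
FROM @my_stage/my_data.csv.gz
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP');
```
The staged file stays compressed until it is loaded, so you pay for 200 MB of stage storage rather than 1 GB.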

Optimize Snowflake Tables

In Snowflake, tables are an essential component of your data warehouse. Tables are used to store and organize your data, and they have a significant impact on the performance and cost of your Snowflake instance.

To optimize table usage in Snowflake, follow these best practices:

Take advantage of columnar storage

Snowflake stores table data by column rather than by row. Columnar data compresses more efficiently, which keeps storage requirements small, and queries read only the columns they reference. You can build on this by defining a clustering key, which helps Snowflake skip the micro-partitions a query doesn't need.

```
CREATE TABLE my_SFtable (
    column1 INTEGER,
    column2 VARCHAR,
    column3 DATE
)
CLUSTER BY (column1);
```
In this example, we are creating a table named "my_SFtable" with three columns. We are also clustering the table on the "column1" column to improve query performance.

Use table compression

Snowflake compresses table data automatically when it writes its internal micro-partitions, so there is no compression setting on the table itself. Where you do choose a compression algorithm is on the files you load and unload; keeping those files compressed reduces the storage they occupy in stages and, combined with the automatic table compression, can result in significant cost savings.

```
-- Compression is declared on a file format, not on the table
CREATE OR REPLACE FILE FORMAT my_zstd_format
    TYPE = 'CSV'
    COMPRESSION = 'ZSTD';

COPY INTO my_SFtable
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_zstd_format');
```
Here, we are creating a file format named "my_zstd_format" that uses the Zstandard (ZSTD) compression algorithm and loading ZSTD-compressed files from a stage (the stage name "my_stage" is illustrative) into the "my_SFtable" table.

Partition tables

Snowflake partitions table data automatically into micro-partitions, so there is no manual PARTITION BY clause to write. What you can control is how rows are grouped into those micro-partitions by defining a clustering key. Queries that filter on the clustering key are then limited to the relevant micro-partitions, which improves performance and reduces compute costs because less data is scanned.

```
CREATE TABLE my_table (
    column1 INTEGER,
    column2 VARCHAR,
    column3 DATE
)
CLUSTER BY (column3);

-- Filters on the clustering key let Snowflake skip irrelevant micro-partitions
SELECT *
FROM my_table
WHERE column3 >= '2022-01-01' AND column3 < '2022-04-01';
```
In this example, we are creating a table named "my_table" with three columns and clustering it on the "column3" date column, so date-range filters on that column scan only the micro-partitions that contain matching values.

Use aggregates

Aggregating your data before querying can also reduce the amount of data being processed, resulting in lower compute costs. When possible, use GROUP BY and aggregate functions to summarize your data.

```
SELECT column1, SUM(column2)
FROM my_table
GROUP BY column1;
```
In the example above, we are grouping data from the "my_table" table by "column1" and summing the values of "column2."

Avoid joins

Joining tables can significantly increase the amount of data being processed, resulting in higher compute costs. When possible, try to limit the number of tables being joined in a query.

```
SELECT *
FROM table1
JOIN table2 ON table1.id = table2.id;
```
This example joins "table1" and "table2" on the "id" column. If you only need a few fields, select those columns explicitly rather than using SELECT *, so the join processes less data.

Create materialized views

Materialized views are pre-computed views that can improve query performance by caching results. A materialized view is defined by a SELECT statement, its results are persisted like a table, and Snowflake keeps them up to date automatically when the underlying data changes. Keep in mind that this storage and automatic maintenance consume credits of their own, so materialize only the queries you run frequently.

```
CREATE MATERIALIZED VIEW my_materialized_view AS
SELECT column1, COUNT(column2)
FROM my_table
GROUP BY column1;
```
To demonstrate this, we are creating a materialized view named "my_materialized_view" that counts the number of occurrences of "column2" for each value of "column1" in the "my_table" table.


Archive Historical Data

You should use archival storage for data that you access infrequently and are unlikely to need often in the future.

Archive storage pricing is based on the amount of data stored per month, measured in terabytes (TB) per month. The standard pricing model is as follows:

First 6 months: $23/TB/month
Next 6 months: $15/TB/month
After 1 year: $10/TB/month

Storage costs are typically lower than compute costs, but they can quickly add up over the years. Suppose you wish to archive 5 TB of historical data for a period of two years:

First 6 months: 6 months x 5 TB x $23/TB/month = $690
Next 6 months: 6 months x 5 TB x $15/TB/month = $450
After 1 year: 12 months x 5 TB x $10/TB/month = $600

Total cost for 2 years of archive storage: $1740

Considering that standard storage in Snowflake can cost up to $40/TB/month ($4,800 for the same 5 TB over two years), simply archiving rarely accessed data saves roughly 64% in this example, and the gap keeps growing the longer the data is retained.
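
If you want to move rarely used rows out of standard table storage altogether, one common pattern is to unload them to an external stage (for example, a cloud storage bucket) and then delete them from the active table. This is only a sketch: the stage name "archive_stage" is an assumption, and "my_table"/"column3" follow the earlier examples.

```
-- Unload historical rows to an external stage as compressed Parquet files
COPY INTO @archive_stage/my_table/2021/
FROM (SELECT * FROM my_table WHERE column3 < '2022-01-01')
FILE_FORMAT = (TYPE = 'PARQUET');

-- After verifying the unloaded files, remove the rows from the active table
DELETE FROM my_table WHERE column3 < '2022-01-01';
```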

Use Time Travel Wisely

Snowflake's Time-Travel feature allows you to query data as it appeared at a specific point in time. 

The issue with Time-Travel is that it requires additional storage, and the longer you retain Time-Travel data, the higher the storage costs. To reduce them, limit the duration of Time-Travel retention or use it only where necessary: adjust the DATA_RETENTION_TIME_IN_DAYS object parameter to define the Time-Travel period at the database, schema, or table level.
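
For example (assuming a table named "my_table" and a database named "my_db"):

```
-- Keep only one day of Time Travel history for a single table
ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Or set the default retention for every table in a database
ALTER DATABASE my_db SET DATA_RETENTION_TIME_IN_DAYS = 1;
```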

 

To further reduce Time-Travel storage on very large tables, you can also split frequently changing or historical data into separate tables, so that a longer retention period applies only to the data whose history is actually relevant to your business needs.

Enable Regular Snowflake Cost Audits

Snowflake is a usage-based service, which is why it's crucial to keep monitoring your usage and adjust your approach where needed.

You can run a number of checks to dig deep into your Snowflake usage and identify areas for improvement:

```
SELECT *
FROM table(information_schema.query_history())
WHERE query_type = 'SELECT';
```
This example queries the "information_schema.query_history" table to retrieve a list of all SELECT queries executed in Snowflake.

```
SELECT SUM(bytes)
FROM information_schema.tables
WHERE table_name = 'MY_TABLE';
```
In this case, we are querying the "information_schema.tables" view to retrieve the total number of bytes of storage used by the "my_table" table (table names created without quotes are stored in upper case, hence 'MY_TABLE').

To retrieve a list of all queries that took longer than 1000 milliseconds to execute, run:

```
SELECT query_id, execution_status, start_time, end_time, total_elapsed_time
FROM table(information_schema.query_history())
WHERE total_elapsed_time > 1000
ORDER BY start_time DESC;
```
You can use this information to identify queries that may need to be optimized further.

After identifying the cost drivers, you can take steps to optimize your usage and reduce costs. This may include reducing storage usage by deleting unused data or optimizing queries to reduce compute costs.

```
DELETE FROM my_table
WHERE column3 < '2022-01-01';
```

To illustrate, we are deleting rows from the "my_table" table whose "column3" date falls before January 1, 2022.

Understand Your Snowflake Costs With Finout

Manually tracking your Snowflake costs is often complex and time-consuming. 

With a cost monitoring tool like Finout, you can easily track and optimize the cost of your Snowflake environment: set up cost alerts, get recommendations to rightsize your Snowflake warehouses, identify the cost of individual queries and storage, and much more.

Plus, Finout can track Snowflake costs together with all your other cloud infrastructure including AWS, DataDog, Azure, GCP, Databricks, and more, all from a single dashboard.

Try our platform for free.
