In today’s data-driven enterprise environment, businesses increasingly rely on Snowflake’s cloud data warehouse for large-scale data analytics and storage. However, optimizing Snowflake is crucial for managing growing data warehouse costs and maintaining efficiency. This guide shares actionable strategies for reducing Snowflake expenses while boosting query performance and scalability. Snowflake offers many benefits, but it can also get expensive if you don’t apply optimizations, a lesson I learned when my organization’s Snowflake bill grew exponentially as usage increased.
Understanding Snowflake’s Cost Structure
Before diving into optimization strategies, it’s crucial to understand how Snowflake’s pricing model works. Unlike traditional data warehouses, Snowflake’s consumption-based pricing model means you pay for what you use, but this also presents unique opportunities for optimization.
Note: all prices are subject to change and should be used for relative magnitude only. Be sure to check Snowflake’s documentation for current, up-to-date cost structures before committing to a new contract or subscription.
There are three core cost components in Snowflake’s cost structure: Compute, Storage, and Data Transfer.
Compute Costs
Compute costs are billed in credits. Credits are Snowflake’s primary billing unit for compute resources. Every time you run a virtual warehouse, you consume credits based on the warehouse size and runtime. Let me walk through a couple of scenarios in configuring a Virtual Warehouse.
- Credits/hour are based on warehouse size:
- X-Small (XS): 1
- Small (S): 2
- Medium (M): 4
- Large (L): 8
- X-Large (XL): 16
- 2X-Large (2XL): 32
- Examples:
- Scenario 1: Daily ETL Workload – A Large warehouse running 4 hours/day on 22 business days per month uses 704 credits, which at $2/credit comes to $1,408/month.
- Calculation:
Daily Credits: 8 credits/hour × 4 hours = 32 credits
Monthly Credits: 32 credits × 22 business days = 704 credits
Monthly Cost: 704 credits × $2/credit = $1,408
- Scenario 2: BI Query Workload – A Medium warehouse running for 8 hours/day over 30 days costs 960 credits/month at $3/credit = $2,880
- Calculation:
Daily Credits: 4 credits/hour × 8 hours = 32 credits
Monthly Credits: 32 credits × 30 days = 960 credits
Monthly Cost: 960 credits × $3/credit = $2,880
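If you want to check these back-of-the-envelope numbers against what your account actually consumes, the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view records credits per warehouse. Here is a minimal sketch; the $2/credit figure is an assumption, so substitute your contracted rate.

-- Actual credits consumed per warehouse over the last 30 days
SELECT
  warehouse_name,
  SUM(credits_used) AS credits_last_30_days,
  SUM(credits_used) * 2 AS approx_cost_usd   -- assumes $2/credit; use your contracted rate
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;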
Storage Costs
Storage costs cover data that Snowflake persists: active table data, staged files, Time Travel history, and Fail-safe copies. Rates depend on the underlying cloud platform (AWS, Azure, or GCP) and region.
Types of storage affect your bill:
- Active Storage: Currently used data
- Compressed and optimized automatically.
- Rates typically range from $23 to $40 per TB per month.
- Varies by region and cloud provider.
- Time Travel Storage: Historical data versions
- Cost varies based on retention period.
- Standard Edition: 1 day included.
- Enterprise Edition: Up to 90 days
- Additional storage needs can significantly impact costs.
- Fail-safe Storage: 7-day disaster recovery backup
- No separate line item; Fail-safe bytes are billed as part of standard storage
- Included in overall storage calculations.
- Critical for compliance and recovery
Real-World Storage Example:
- Items stored:
- Average Data Storage: 5 TB
- Time Travel Storage: 1 TB
- Fail-safe Storage: 2 TB
- Storage Rate: $40/TB/month
- Calculation:
- Total Storage: 8 TB
- Monthly Storage Cost: 8 TB × $40/TB = $320
Data Transfer Costs
Snowflake charges for data transfer based on different scenarios and directions of data movement. Let’s break down each type with explanations and an example.
- Cross-region data movement
- Costs range from $20 to $200 per TB.
- Varies by cloud provider and regions involved.
- Some regions have higher costs due to infrastructure.
- Example: Moving 1 TB of data between AWS regions:
- US East to US West: ~$20/TB
- US to Europe: ~$100/TB
- US to Asia: ~$200/TB
- External table access
- Costs depend on source location.
- Frequency of access impacts total cost
- Caching can help reduce repeated transfer costs.
- Data sharing between accounts
- Provider account bears compute costs
- No transfer costs for same region sharing
- Cross-region sharing incurs additional fees.
- Snowpipe ingestion
- Additional charges apply for auto-ingestion infrastructure.
- Per-file ingestion fees may apply
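Transfer charges can be hard to attribute after the fact. If your account exposes the ACCOUNT_USAGE.DATA_TRANSFER_HISTORY view (availability and exact column names can vary), a sketch like the following shows where outbound bytes are going:

-- Outbound data volume by target cloud/region over the last 30 days
SELECT
  target_cloud,
  target_region,
  transfer_type,
  SUM(bytes_transferred) / POWER(1024, 4) AS tb_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY tb_transferred DESC;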
Dig deeper into these components
Understanding these cost components and their variations is crucial for:
- Budget planning
- Resource allocation
- Optimization strategies
- Cost forecasting
- Capacity planning
Each component can be optimized independently, but they often interact with each other. For example, better compute optimization might require more storage for materialized views, or reduced storage costs might lead to increased compute costs for real-time processing.
If you want to know more, check out the Snowflake documentation on understanding overall cost.
Now that you have a fair idea of how cost is calculated in Snowflake, let’s dive into optimizations, starting with warehouse optimization, which is typically the largest cost contributor.
Immediate Cost Optimization Strategies
In this section, I will cover some of the more immediate cost optimizations you can use to save money pretty quickly.
Auto-Suspension Configuration in Snowflake
The auto-suspension feature is crucial for controlling costs, especially for workloads with intermittent activity. Warehouses take about 1-2 seconds to resume, so the impact on query performance is minimal for most use cases.
Minimize idle costs using the following SQL:
ALTER WAREHOUSE COMPUTE_WH SET
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
- What this does:
- Sets the warehouse to automatically suspend after 5 minutes of inactivity.
- Enables automatic resume when new queries arrive.
- Cost Impact:
- If your warehouse runs 24/7 without auto-suspension, you’re paying for idle time.
- For a Medium warehouse (4 credits/hour):
- Without Auto-suspend: 24 hours × 4 credits = 96 credits/day ($192/day at $2/credit)
- With Auto-suspend (8 active hours): 8 hours × 4 credits = 32 credits/day ($64/day)
- Daily Savings: $128 or 67% saving.
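Before tuning individual warehouses, it helps to audit which ones currently have auto-suspend disabled or set to a long timeout. One way to do that, sketched below, is to filter the output of SHOW WAREHOUSES with RESULT_SCAN; the 10-minute threshold is just an illustration.

-- Find warehouses with auto-suspend disabled or set above 10 minutes
SHOW WAREHOUSES;
SELECT "name", "size", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" IS NULL     -- disabled
   OR "auto_suspend" = 0         -- disabled (shown as 0 in some versions)
   OR "auto_suspend" > 600;      -- longer than 10 minutes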
Multi-Cluster Configuration for Scalability
Enable auto-scaling for dynamic workloads to adjust clusters to match workload demands. This is particularly useful for scenarios with variable query loads, such as:
- Morning report generation peaks
- End-of-month financial processing.
- Concurrent user query spikes during business hours
ALTER WAREHOUSE COMPUTE_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';
- What this does:
- Sets minimum cluster count to 1 (baseline capacity)
- Allows automatic scaling up to 3 clusters based on workload.
- Uses standard scaling policy for balanced scaling decisions.
- Cost Impact:
- While multiple clusters increase concurrent processing capability, they multiply credit consumption:
- Single Cluster (Medium): 4 credits/hour
- Three Clusters: 12 credits/hour
- Peak Hour Cost Comparison (Enterprise Edition at $3/credit):
- Single Cluster: 4 credits × $3 = $12/hour
- Three Clusters: 12 credits × $3 = $36/hour
Note: These costs are in addition to the standard warehouse size pricing discussed earlier. Multi-clustering is particularly cost-effective when query queuing would otherwise impact business operations.
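A simple way to check whether queuing justifies multi-cluster is to look at queued load per warehouse. Here is a sketch against ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY; the warehouse name COMPUTE_WH and the 7-day window are placeholders.

-- Hours in the last 7 days where queries queued on COMPUTE_WH
SELECT
  DATE_TRUNC('hour', start_time) AS hour,
  AVG(avg_running) AS avg_running,
  AVG(avg_queued_load) AS avg_queued_load
FROM snowflake.account_usage.warehouse_load_history
WHERE warehouse_name = 'COMPUTE_WH'
  AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
HAVING AVG(avg_queued_load) > 0
ORDER BY 3 DESC;

If this returns few or no rows, a single cluster is probably enough; sustained queued load during business hours is the signal that a higher MAX_CLUSTER_COUNT will pay for itself.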
Specialized Warehouse Creation for Workload Segmentation
Separate ETL and BI workloads. Different workload types have distinct resource needs and timing patterns. For example:
- ETL processes often need more compute power but run in shorter bursts.
- BI queries typically require less power but run throughout the day.
- Data science workloads might need large warehouses for specific operations.
Here’s how to configure warehouses for these different needs:
ETL:
-- ETL Warehouse
CREATE WAREHOUSE ETL_WH WITH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
BI:
-- BI Warehouse
CREATE WAREHOUSE BI_WH WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
What this does:
Creates separate warehouses for ETL and BI workloads.
- ETL warehouse:
- LARGE size for processing power (8 credits/hour)
- 10-minute auto-suspend for batch jobs
- Starts suspended to prevent unnecessary credits.
- BI warehouse:
- MEDIUM size for interactive queries (4 credits/hour)
- 5-minute auto-suspend for interactive usage
- Starts suspended to prevent unnecessary credits.
Cost Impact Example:
ETL Warehouse (4 hours/day):
- 8 credits/hour × 4 hours = 32 credits/day
BI Warehouse (8 hours/day):
- 4 credits/hour × 8 hours = 32 credits/day
- Total Daily Credits: 64 credits
- Daily Cost (Standard Edition): 64 credits × $2 = $128
Compared to single LARGE warehouse running 12 hours:
- 8 credits/hour × 12 hours = 96 credits
- Daily Cost: 96 credits × $2 = $192
Daily Savings: $64
The separation of workloads often leads to better overall performance despite using smaller warehouses for some tasks. ETL jobs can run faster on a dedicated large warehouse, while BI queries benefit from consistent, right-sized resources. This approach can actually reduce total processing time and thus credit consumption.
Best Practices for Cost-Efficient Warehouse Management
Understanding how to effectively manage warehouse resources is crucial for both cost control and performance optimization. Here are key strategies based on real-world implementations:
- Size Planning:
- Start with smaller warehouses and scale up based on performance metrics.
- Monitor query execution times and credit consumption.
- Use query history to identify optimal warehouse sizes.
- Workload Segregation:
- Separate ETL and BI workloads to different warehouses
- Match warehouse size to workload characteristics.
- Consider time zones for global operations.
- Auto-scaling Strategy:
- Set an appropriate MIN_CLUSTER_COUNT based on minimum concurrent users.
- Adjust MAX_CLUSTER_COUNT based on peak load requirements.
- Monitor scaling patterns to optimize settings.
- Monitoring Query Performance:
SELECT
  warehouse_name,
  COUNT(*) AS query_count,
  AVG(execution_time) AS avg_execution_time,
  SUM(credits_used) AS total_credits,
  SUM(credits_used) / COUNT(*) AS credits_per_query
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 3 DESC;
Query Optimization Techniques in Snowflake
Based on experience optimizing enterprise-scale data platforms, here are key query optimization strategies that have proven effective in real-world scenarios:
Leverage Clustering for Better Query Performance
Clustering in Snowflake refers to the physical organization of table data based on one or more columns. Think of it as similar to traditional database indexing, but with some key differences. Here’s how to implement it:
CREATE OR REPLACE TABLE sales_data (
  transaction_date DATE,
  store_id VARCHAR,
  product_id VARCHAR,
  sale_amount NUMBER(38,2)
)
CLUSTER BY (transaction_date, store_id);
Explanation:
Clustering helps physically organize data to improve query performance. While clustering generally helps queries that filter on the clustered columns, it provides little benefit for:
- Queries that scan all data (full table scans)
- Queries that don’t filter on clustered columns.
- Tables with frequent updates to clustered columns
In this example, we cluster by transaction_date and store_id. When queries filter on these columns, Snowflake can skip scanning irrelevant data micro-partitions.
- Reduces I/O and improves query execution time
- Particularly effective for frequently queried columns
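Once a table is clustered, you can check whether the key is actually doing its job. Snowflake’s SYSTEM$CLUSTERING_INFORMATION function returns JSON with metrics such as average clustering depth; lower depth generally means better pruning. Against the example table above:

-- Inspect clustering quality for the clustered columns
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_data', '(transaction_date, store_id)');

-- Or just the average depth as a single number
SELECT SYSTEM$CLUSTERING_DEPTH('sales_data', '(transaction_date, store_id)');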
Use Materialized Views for Frequent Queries
Materialized views can significantly improve performance for frequently run complex queries, but they come with maintenance overhead:
Pre-compute results to optimize query execution:
-- Create a materialized view for aggregated daily sales
CREATE OR REPLACE MATERIALIZED VIEW daily_sales_summary AS
SELECT
  DATE_TRUNC('day', transaction_date) AS sale_date,
  store_id,
  product_category,
  SUM(sale_amount) AS total_sales,
  COUNT(*) AS transaction_count,
  AVG(sale_amount) AS average_transaction_value
FROM sales_data
GROUP BY 1, 2, 3;
Explanation:
Materialized views pre-compute and store query results
Pros:
- Reduces computational overhead for repetitive queries.
- Automatically maintained by Snowflake.
- Can dramatically improve query performance.
Cons:
- Consumes additional storage.
- Updates consume credits.
- May not be cost-effective for infrequently accessed data.
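Because automatic maintenance consumes credits, it is worth tracking what a materialized view costs to keep fresh. Here is a sketch using the ACCOUNT_USAGE.MATERIALIZED_VIEW_REFRESH_HISTORY view; exact column names may differ slightly in your account.

-- Credits spent refreshing materialized views over the last 30 days
SELECT
  database_name,
  schema_name,
  table_name AS materialized_view_name,
  SUM(credits_used) AS refresh_credits
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY refresh_credits DESC;

If refresh credits exceed the compute saved by the queries that hit the view, it is a candidate for dropping.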
Predicate Pushdown Optimization
Understanding predicate pushdown helps write more efficient queries:
-- Inefficient Query
SELECT *
FROM large_table
WHERE complex_transformation(column) = 'value';

-- Optimized Query
SELECT *
FROM large_table
WHERE column IN (
  SELECT column
  FROM lookup_table
  WHERE complex_condition
);
Explanation:
- Predicate pushdown moves filtering conditions closer to data source.
- Reduces amount of data processed and transferred.
- Allows Snowflake to leverage clustering and micro-partition pruning.
- Minimizes unnecessary data scanning and movement.
Window Function Optimization
Window functions are powerful SQL features that perform calculations across a set of rows related to the current row. While extremely useful, they can impact performance if not optimized properly. Snowflake provides specific optimizations and features to make window functions more efficient.
Understanding QUALIFY Clause
Snowflake introduces the QUALIFY clause as a unique optimization feature that isn’t available in standard SQL. This clause allows filtering on window functions without using subqueries or CTEs (Common Table Expressions), resulting in more efficient execution and cleaner code.
Common Window Function Scenarios and Optimizations:
In this section I will show a few examples of using window functions:
Top N Records Per Group
-- Less Efficient Approach (Using CTE)
WITH ranked_sales AS (
  SELECT
    store_id,
    sale_amount,
    ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sale_amount DESC) AS rank
  FROM sales_data
)
SELECT *
FROM ranked_sales
WHERE rank <= 10;

-- Optimized Approach (Using QUALIFY)
SELECT
  store_id,
  sale_amount,
  ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sale_amount DESC) AS rank
FROM sales_data
QUALIFY rank <= 10;
Performance Impact:
- Eliminates the need for temporary result storage.
- Reduces memory usage.
- Allows Snowflake to optimize the entire operation as a single unit.
- Typically, 20-30% faster for large datasets
Join Optimization Techniques
Understanding join behavior and optimization is crucial for query performance:
-- Inefficient Join
SELECT
  s.store_name,
  t.transaction_amount
FROM stores s
LEFT JOIN transactions t ON s.store_id = t.store_id;

-- Optimized Join with Join Filtering
SELECT
  s.store_name,
  t.transaction_amount
FROM stores s
LEFT JOIN transactions t
  ON s.store_id = t.store_id
  AND t.transaction_date >= '2023-01-01'
  AND t.transaction_amount > 100;
Explanation:
- Join filters in the ON clause can improve performance by:
- Reducing the amount of data before the join
- Allowing better use of clustering and partition pruning
- For LEFT JOINs, conditions in the ON clause and the WHERE clause behave differently:
- ON clause: filters only the right table before the join.
- WHERE clause: filters the combined result after the join.
Choose based on your specific needs and data relationships.
Best Practices for Query Optimization
- Use Appropriate Data Types
- Choose the smallest possible numeric precision.
- VARCHAR in Snowflake automatically optimizes storage.
- Use VARCHAR with an appropriate length.
- Leverage Query Result Caching
- Results are cached for 24 hours.
- Caching works for identical queries including:
- Same text (including spaces and comments)
- Same parameter values
- Same session settings
- Minimize Data Movement
- Use local joins when possible.
- Avoid unnecessary data transformations.
- Monitor and Analyze Query Performance
- Use the EXPLAIN command to understand the query execution plan.
- Review QUERY_HISTORY for performance insights.
-- Query Performance Analysis
SELECT
  query_text,
  total_elapsed_time,
  bytes_scanned,
  bytes_spilled_to_local_storage,
  bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 10;
Look for patterns in poor-performing queries:
- Large amounts of data scanned.
- Excessive spilling to storage
- Long execution times
- High credit consumption
Effective query optimization in Snowflake requires a combination of:
- Strategic table design
- Efficient query writing
- Understanding of Snowflake’s query execution mechanics
- Continuous monitoring and refinement
By implementing these techniques, you can significantly reduce query execution time and minimize credit consumption.
Storage Optimization in Snowflake
Managing storage effectively is crucial for both cost control and performance. Let’s explore key strategies through practical scenarios.
Manage Time Travel Periods
Time Travel allows you to access historical data, but it comes with storage costs. Here’s how to optimize it:
-- Example: Set 7-day retention for frequently changing tables
ALTER TABLE sales_data SET DATA_RETENTION_TIME_IN_DAYS = 7;
-- Example: Reduce retention for large, infrequently accessed tables
ALTER TABLE historical_logs SET DATA_RETENTION_TIME_IN_DAYS = 1;
Management Strategy:
- Set longer retention (7+ days) for critical business data.
- Use minimal retention (0-1 days) for temporary or staging tables.
- Monitor Time Travel storage costs through system views.
- Consider business requirements vs. storage costs when setting retention periods.
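For the monitoring bullet above, the ACCOUNT_USAGE.TABLE_STORAGE_METRICS view breaks storage down into active, Time Travel, and Fail-safe bytes, which makes it easy to spot tables where a long retention period is expensive. A minimal sketch:

-- Tables carrying the most Time Travel and Fail-safe storage
SELECT
  table_catalog,
  table_schema,
  table_name,
  active_bytes      / POWER(1024, 3) AS active_gb,
  time_travel_bytes / POWER(1024, 3) AS time_travel_gb,
  failsafe_bytes    / POWER(1024, 3) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
ORDER BY time_travel_bytes DESC
LIMIT 20;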
Use Compact Storage Formats
Compare these approaches to understand storage optimization:
-- Non-optimized table definition
CREATE TABLE verbose_sales (
  date_time TIMESTAMP_NTZ(9),      -- More precision than needed
  product_id VARCHAR(16777216),    -- Default max length
  sale_amount NUMBER(38,6)         -- Excessive precision
);
-- Optimized table definition
CREATE TABLE optimized_sales (
  date_time DATE,                  -- Sufficient for daily sales
  product_id VARCHAR(10),          -- Based on actual ID length
  sale_amount NUMBER(12,2)         -- Realistic precision for currency
);
Best practices:
Managing Time Travel periods
- Regular audit of retention periods
- Set different retention policies based on data importance.
- Monitor and adjust based on actual usage patterns.
- Consider regulatory requirements.
Implementing efficient data archival strategies
- Archive infrequently accessed data to lower cost storage
- Use clone for zero-copy backup.
- Implement staged archival process based on data temperature.
Using appropriate data types and compression
- Choose minimal but sufficient numeric precision.
- Use DATE vs. TIMESTAMP when the time component isn’t needed.
- Leverage automatic compression features.
Leveraging zero-copy cloning
- Use for development environments.
- Create backup copies without duplicating storage.
- Test data modifications safely.
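Zero-copy cloning itself is a one-line statement: the clone initially shares micro-partitions with its source, so only data that subsequently changes adds storage. The object names below are illustrative:

-- Create a development copy of a production table without duplicating storage
CREATE TABLE dev_db.public.sales_data CLONE prod_db.public.sales_data;

-- Clones can also target a point in time via Time Travel
CREATE TABLE sales_data_backup CLONE sales_data
  AT (OFFSET => -3600);   -- table state as of one hour ago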
Cost Management Framework
Implement proactive cost controls through resource monitors. Here’s a practical example:
-- Example of resource monitor creation
CREATE RESOURCE MONITOR monthly_limit WITH
  CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
-- Applying the resource monitor to a warehouse
ALTER WAREHOUSE COMPUTE_WH SET RESOURCE_MONITOR = monthly_limit;
Implementation strategies with real-world considerations:
- Setting up resource monitors:
- Create different monitors for development vs. production.
- Set appropriate thresholds based on historical usage.
- Consider business hours vs. off-hours requirements.
- Implementing tagging for cost allocation (see the tagging example after this list):
- Tag resources by department.
- Track project-specific usage.
- Enable chargeback mechanisms.
- Establishing monitoring and alerting:
- Define alert recipients by role.
- Set up escalation paths.
- Configure notification channels (email, Slack, etc.)
- Creating cost governance policies:
- Document cost management procedures.
- Define approval processes for quota increases.
- Establish regular review cycles.
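For the tagging strategy mentioned above, Snowflake’s object tagging (Enterprise Edition and up) lets you attach a cost-center label to warehouses and then report on it. Here is a sketch with illustrative names; the TAG_REFERENCES view lives in the ACCOUNT_USAGE schema and its columns may vary slightly by edition.

-- Create a tag and attach it to a warehouse
CREATE TAG IF NOT EXISTS cost_center COMMENT = 'Team or department for chargeback';
ALTER WAREHOUSE COMPUTE_WH SET TAG cost_center = 'finance';

-- See which warehouses carry which cost center
SELECT object_name, tag_value
FROM snowflake.account_usage.tag_references
WHERE tag_name = 'COST_CENTER'
  AND domain = 'WAREHOUSE';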
Snowflake Monitoring and Governance Best Practices
Like any computer system, it is important to monitor a Snowflake deployment to make sure it is running properly. And for a cloud-based, consumption-billed platform like Snowflake, it is even more crucial to make sure you are not paying for unnecessary resources.
Monitor Query and Warehouse Usage
Track query execution times and credit usage:
Here’s a monitoring query with example results:
-- Example query for monitoring warehouse usage
SELECT
  warehouse_name,
  COUNT(*) AS query_count,
  AVG(execution_time) AS avg_execution_time,
  SUM(credits_used) AS total_credits
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 3 DESC;
-- Sample Results:
-- ANALYTICS_WH | 1500 queries | 45.2s avg | 250.5 credits
-- ETL_WH | 300 queries | 180.5s avg | 425.8 credits
-- DEV_WH | 800 queries | 15.3s avg | 85.2 credits
Look for patterns such as:
- Unusually long execution times
- Unexpected credit consumption
- Query patterns outside business hours.
- Warehouse sizing mismatches
Implementation of Alerts
Create a comprehensive alerting system that notifies appropriate teams:
CREATE OR REPLACE PROCEDURE SEND_ALERT(
  WAREHOUSE_NAME STRING,
  CREDIT_USAGE NUMBER,
  ALERT_LEVEL STRING,
  RECIPIENT_ROLE STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  // Alert routing based on severity and role
  let recipients = {
    'DBA': ['dba@company.com'],
    'Finance': ['finance@company.com'],
    'DevOps': ['devops@company.com']
  };

  // Alert message customization
  let message = `Alert Level: ${ALERT_LEVEL}
Warehouse: ${WAREHOUSE_NAME}
Credit Usage: ${CREDIT_USAGE}
Time: ${new Date().toISOString()}`;

  // Send to the appropriate channel based on role
  // Add your notification logic here (e.g., call an external function or write to an alerts table)

  // Placeholder return so the procedure compiles and reports what it would send
  return 'Alert prepared for ' + RECIPIENT_ROLE + ': ' + message;
$$;
Conclusion
Optimizing Snowflake for performance and cost efficiency requires a combination of query optimization techniques, storage management, warehouse optimization and cost governance strategies. By implementing these methods, businesses can achieve significant cost savings while improving query execution times. Start applying these strategies today to unlock the full potential of your Snowflake data warehouse.
Remember that optimization is an iterative process – monitor, adjust, and refine your strategies based on actual usage patterns and business needs. Start with the highest-impact areas for your organization and gradually implement more sophisticated optimizations as needed.