In today’s data-driven enterprise environment, businesses increasingly rely on Snowflake’s cloud data warehouse for large-scale data analytics and storage. However, optimizing Snowflake is crucial for managing growing data warehouse costs and maintaining efficiency. This guide shares actionable strategies for reducing Snowflake expenses while boosting query performance and scalability. Snowflake offers many benefits, but it can also get expensive if you don’t apply optimizations, a lesson I learned when my organization’s Snowflake bill grew exponentially as usage increased.
Understanding Snowflake’s Cost Structure
Before diving into optimization strategies, it’s crucial to understand how Snowflake’s pricing model works. Unlike traditional data warehouses, Snowflake’s consumption-based pricing model means you pay for what you use, but this also presents unique opportunities for optimization.
Note: all prices are subject to change and should be used for relative magnitude only. Be sure to check Snowflake’s documentation for current, up-to-date cost structures before committing to a new contract or subscription.
There are three core cost components in Snowflake’s cost structure: Compute, Storage, and Data Transfer.
Compute Costs
Compute costs are billed in credits. Credits are Snowflake’s primary billing unit for compute resources. Every time you run a virtual warehouse, you consume credits based on the warehouse size and runtime. Let me walk through a couple of scenarios in configuring a Virtual Warehouse.
- Credits/hour are based on warehouse size:
- X-Small (XS): 1
- Small (S): 2
- Medium (M): 4
- Large (L): 8
- X-Large (XL): 16
- 2X-Large (2XL): 32
- Examples:
- Scenario 1: Daily ETL Workload – A Large warehouse running 4 hours/day on 22 business days per month uses 704 credits, which at $2/credit comes to $1,408/month.
- Calculation:
Daily Credits: 8 credits/hour × 4 hours = 32 credits
Monthly Credits: 32 credits × 22 business days = 704 credits
Monthly Cost: 704 credits × $2/credit = $1,408
- Scenario 2: BI Query Workload – A Medium warehouse running for 8 hours/day over 30 days costs 960 credits/month at $3/credit = $2,880
- Calculation:
Daily Credits: 4 credits/hour × 8 hours = 32 credits
Monthly Credits: 32 credits × 30 days = 960 credits
Monthly Cost: 960 credits × $3/credit = $2,880
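If you want to check these back-of-the-envelope numbers against what your account actually consumes, the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view records credits per warehouse. Here is a minimal sketch; the $2/credit figure is an assumption, so substitute your contracted rate.

-- Actual credits consumed per warehouse over the last 30 days
SELECT
  warehouse_name,
  SUM(credits_used) AS credits_last_30_days,
  SUM(credits_used) * 2 AS approx_cost_usd   -- assumes $2/credit; use your contracted rate
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;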
Storage Costs
Storage costs cover data that Snowflake persists: active table data, staged files, Time Travel history, and Fail-safe copies. Rates depend on the underlying cloud platform (AWS, Azure, or GCP) and region.
Types of storage affect your bill:
- Active Storage: Currently used data
- Compressed and optimized automatically.
- Rates typically range from $23 to $40 per TB per month.
- Varies by region and cloud provider.
- Time Travel Storage: Historical data versions
- Cost varies based on retention period.
- Standard Edition: 1 day included.
- Enterprise Edition: Up to 90 days
- Additional storage needs can significantly impact costs.
- Fail-safe Storage: 7-day disaster recovery backup
- No separate line item; Fail-safe bytes are billed as part of standard storage
- Included in overall storage calculations.
- Critical for compliance and recovery
Real-World Storage Example:
- Items stored:
- Average Data Storage: 5 TB
- Time Travel Storage: 1 TB
- Fail-safe Storage: 2 TB
- Storage Rate: $40/TB/month
- Calculation:
- Total Storage: 8 TB
- Monthly Storage Cost: 8 TB × $40/TB = $320
Data Transfer Costs
Snowflake charges for data transfer based on different scenarios and directions of data movement. Let’s break down each type with explanations and an example.
- Cross-region data movement
- Costs range from $20 to $200 per TB.
- Varies by cloud provider and regions involved.
- Some regions have higher costs due to infrastructure.
- Example: Moving 1 TB of data between AWS regions:
- US East to US West: ~$20/TB
- US to Europe: ~$100/TB
- US to Asia: ~$200/TB
- External table access
- Costs depend on source location.
- Frequency of access impacts total cost
- Caching can help reduce repeated transfer costs.
- Data sharing between accounts
- Provider account bears compute costs
- No transfer costs for same region sharing
- Cross-region sharing incurs additional fees.
- Snowpipe ingestion
- Additional charges apply for auto-ingestion infrastructure.
- Per-file ingestion fees may apply
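Transfer charges can be hard to attribute after the fact. If your account exposes the ACCOUNT_USAGE.DATA_TRANSFER_HISTORY view (availability and exact column names can vary), a sketch like the following shows where outbound bytes are going:

-- Outbound data volume by target cloud/region over the last 30 days
SELECT
  target_cloud,
  target_region,
  transfer_type,
  SUM(bytes_transferred) / POWER(1024, 4) AS tb_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY tb_transferred DESC;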
Dig deeper into these components
Understanding these cost components and their variations is crucial for:
- Budget planning
- Resource allocation
- Optimization strategies
- Cost forecasting
- Capacity planning
Each component can be optimized independently, but they often interact with each other. For example, better compute optimization might require more storage for materialized views, or reduced storage costs might lead to increased compute costs for real-time processing.
If you want to know more, check out the Snowflake documentation on understanding overall cost.
Now that you have a fair idea of how cost is calculated in Snowflake, let’s dive into optimizations, starting with warehouse optimization, which is typically the largest cost contributor.
Immediate Cost Optimization Strategies
In this section, I will cover some of the more immediate cost optimizations you can use to save money pretty quickly.
Auto-Suspension Configuration in Snowflake
The auto-suspension feature is crucial for controlling costs, especially for workloads with intermittent activity. Warehouses take about 1-2 seconds to resume, so the impact on query performance is minimal for most use cases.
Minimize idle costs using the following SQL:
ALTER WAREHOUSE COMPUTE_WH SET
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
- What this does:
- Sets the warehouse to automatically suspend after 5 minutes of inactivity.
- Enables automatic resume when new queries arrive.
- Cost Impact:
- If your warehouse runs 24/7 without auto-suspension, you’re paying for idle time.
- For a Medium warehouse (4 credits/hour):
- Without Auto-suspend: 24 hours × 4 credits = 96 credits/day ($192/day at $2/credit)
- With Auto-suspend (8 active hours): 8 hours × 4 credits = 32 credits/day ($64/day)
- Daily Savings: $128 or 67% saving.
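Before tuning individual warehouses, it helps to audit which ones currently have auto-suspend disabled or set to a long timeout. One way to do that, sketched below, is to filter the output of SHOW WAREHOUSES with RESULT_SCAN; the 10-minute threshold is just an illustration.

-- Find warehouses with auto-suspend disabled or set above 10 minutes
SHOW WAREHOUSES;
SELECT "name", "size", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" IS NULL     -- disabled
   OR "auto_suspend" = 0         -- disabled (shown as 0 in some versions)
   OR "auto_suspend" > 600;      -- longer than 10 minutes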
Multi-Cluster Configuration for Scalability
Enable auto-scaling for dynamic workloads to adjust clusters to match workload demands. This is particularly useful for scenarios with variable query loads, such as:
- Morning report generation peaks
- End-of-month financial processing.
- Concurrent user query spikes during business hours
ALTER WAREHOUSE COMPUTE_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';
- What this does:
- Sets minimum cluster count to 1 (baseline capacity)
- Allows automatic scaling up to 3 clusters based on workload.
- Uses standard scaling policy for balanced scaling decisions.
- Cost Impact:
- While multiple clusters increase concurrent processing capability, they multiply credit consumption:
- Single Cluster (Medium): 4 credits/hour
- Three Clusters: 12 credits/hour
- Peak Hour Cost Comparison (Enterprise Edition at $3/credit):
- Single Cluster: 4 credits × $3 = $12/hour
- Three Clusters: 12 credits × $3 = $36/hour
Note: These costs are in addition to the standard warehouse size pricing discussed earlier. Multi-clustering is particularly cost-effective when query queuing would otherwise impact business operations.
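A simple way to check whether queuing justifies multi-cluster is to look at queued load per warehouse. Here is a sketch against ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY; the warehouse name COMPUTE_WH and the 7-day window are placeholders.

-- Hours in the last 7 days where queries queued on COMPUTE_WH
SELECT
  DATE_TRUNC('hour', start_time) AS hour,
  AVG(avg_running) AS avg_running,
  AVG(avg_queued_load) AS avg_queued_load
FROM snowflake.account_usage.warehouse_load_history
WHERE warehouse_name = 'COMPUTE_WH'
  AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
HAVING AVG(avg_queued_load) > 0
ORDER BY 3 DESC;

If this returns few or no rows, a single cluster is probably enough; sustained queued load during business hours is the signal that a higher MAX_CLUSTER_COUNT will pay for itself.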
Specialized Warehouse Creation for Workload Segmentation
Separate ETL and BI workloads. Different workload types have distinct resource needs and timing patterns. For example:
- ETL processes often need more compute power but run in shorter bursts.
- BI queries typically require less power but run throughout the day.
- Data science workloads might need large warehouses for specific operations.
Here’s how to configure warehouses for these different needs:
ETL:
-- ETL Warehouse
CREATE WAREHOUSE ETL_WH WITH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
BI:
-- BI Warehouse
CREATE WAREHOUSE BI_WH WITH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
What this does:
Creates separate warehouses for ETL and BI workloads.
- ETL warehouse:
- LARGE size for processing power (8 credits/hour)
- 10-minute auto-suspend for batch jobs
- Starts suspended to prevent unnecessary credits.
- BI warehouse:
- MEDIUM size for interactive queries (4 credits/hour)
- 5-minute auto-suspend for interactive usage
- Starts suspended to prevent unnecessary credits.
Cost Impact Example:
ETL Warehouse (4 hours/day):
- 8 credits/hour × 4 hours = 32 credits/day
BI Warehouse (8 hours/day):
- 4 credits/hour × 8 hours = 32 credits/day
- Total Daily Credits: 64 credits
- Daily Cost (Standard Edition): 64 credits × $2 = $128
Compared to single LARGE warehouse running 12 hours:
- 8 credits/hour × 12 hours = 96 credits
- Daily Cost: 96 credits × $2 = $192
Daily Savings: $64
The separation of workloads often leads to better overall performance despite using smaller warehouses for some tasks. ETL jobs can run faster on a dedicated large warehouse, while BI queries benefit from consistent, right-sized resources. This approach can actually reduce total processing time and thus credit consumption.
Best Practices for Cost-Efficient Warehouse Management
Understanding how to effectively manage warehouse resources is crucial for both cost control and performance optimization. Here are key strategies based on real-world implementations:
- Size Planning:
- Start with smaller warehouses and scale up based on performance metrics.
- Monitor query execution times and credit consumption.
- Use query history to identify optimal warehouse sizes.
- Workload Segregation:
- Separate ETL and BI workloads to different warehouses
- Match warehouse size to workload characteristics.
- Consider time zones for global operations.
- Auto-scaling Strategy:
- Set an appropriate MIN_CLUSTER_COUNT based on minimum concurrent users.
- Adjust MAX_CLUSTER_COUNT based on peak load requirements.
- Monitor scaling patterns to optimize settings.
- Monitoring Query Performance:
SELECT
  warehouse_name,
  COUNT(*) AS query_count,
  AVG(execution_time) AS avg_execution_time,
  SUM(credits_used) AS total_credits,
  SUM(credits_used) / COUNT(*) AS credits_per_query
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 3 DESC;
Query Optimization Techniques in Snowflake
Based on experience optimizing enterprise-scale data platforms, here are key query optimization strategies that have proven effective in real-world scenarios:
Leverage Clustering for Better Query Performance
Clustering in Snowflake refers to the physical organization of table data based on one or more columns. Think of it as similar to traditional database indexing, but with some key differences. Here’s how to implement it:
CREATE OR REPLACE TABLE sales_data (
  transaction_date DATE,
  store_id VARCHAR,
  product_id VARCHAR,
  sale_amount NUMBER(38,2)
)
CLUSTER BY (transaction_date, store_id);
Explanation:
Clustering helps physically organize data to improve query performance. While clustering generally helps queries that filter on the clustered columns, it provides little benefit for:
- Queries that scan all data (full table scans)
- Queries that don’t filter on clustered columns.
- Tables with frequent updates to clustered columns
In this example, we cluster by transaction_date and store_id. When queries filter on these columns, Snowflake can skip scanning irrelevant data micro-partitions.
- Reduces I/O and improves query execution time
- Particularly effective for frequently queried columns
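Once a table is clustered, you can check whether the key is actually doing its job. Snowflake’s SYSTEM$CLUSTERING_INFORMATION function returns JSON with metrics such as average clustering depth; lower depth generally means better pruning. Against the example table above:

-- Inspect clustering quality for the clustered columns
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_data', '(transaction_date, store_id)');

-- Or just the average depth as a single number
SELECT SYSTEM$CLUSTERING_DEPTH('sales_data', '(transaction_date, store_id)');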
Use Materialized Views for Frequent Queries
Materialized views can significantly improve performance for frequently run complex queries, but they come with maintenance overhead:
Pre-compute results to optimize query execution:
-- Create a materialized view for aggregated daily sales
CREATE OR REPLACE MATERIALIZED VIEW daily_sales_summary AS
SELECT
  DATE_TRUNC('day', transaction_date) AS sale_date,
  store_id,
  product_category,
  SUM(sale_amount) AS total_sales,
  COUNT(*) AS transaction_count,
  AVG(sale_amount) AS average_transaction_value
FROM sales_data
GROUP BY 1, 2, 3;
Explanation:
Materialized views pre-compute and store query results
Pros:
- Reduces computational overhead for repetitive queries.
- Automatically maintained by Snowflake.
- Can dramatically improve query performance.
Cons:
- Consumes additional storage.
- Updates consume credits.
- May not be cost-effective for infrequently accessed data.
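Because automatic maintenance consumes credits, it is worth tracking what a materialized view costs to keep fresh. Here is a sketch using the ACCOUNT_USAGE.MATERIALIZED_VIEW_REFRESH_HISTORY view; exact column names may differ slightly in your account.

-- Credits spent refreshing materialized views over the last 30 days
SELECT
  database_name,
  schema_name,
  table_name AS materialized_view_name,
  SUM(credits_used) AS refresh_credits
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY refresh_credits DESC;

If refresh credits exceed the compute saved by the queries that hit the view, it is a candidate for dropping.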
Predicate Pushdown Optimization
Understanding predicate pushdown helps write more efficient queries:
-- Inefficient Query
SELECT *
FROM large_table
WHERE complex_transformation(column) = 'value';

-- Optimized Query
SELECT *
FROM large_table
WHERE column IN (
  SELECT column
  FROM lookup_table
  WHERE complex_condition
);
Explanation:
- Predicate pushdown moves filtering conditions closer to data source.
- Reduces amount of data processed and transferred.
- Allows Snowflake to leverage clustering and micro-partition pruning.
- Minimizes unnecessary data scanning and movement.
Window Function Optimization
Window functions are powerful SQL features that perform calculations across a set of rows related to the current row. While extremely useful, they can impact performance if not optimized properly. Snowflake provides specific optimizations and features to make window functions more efficient.
Understanding QUALIFY Clause
Snowflake introduces the QUALIFY clause as a unique optimization feature that isn’t available in standard SQL. This clause allows filtering on window functions without using subqueries or CTEs (Common Table Expressions), resulting in more efficient execution and cleaner code.
Common Window Function Scenarios and Optimizations:
In this section I will show a few examples of using window functions:
Top N Records Per Group
-- Less Efficient Approach (Using CTE)
WITH ranked_sales AS (
  SELECT
    store_id,
    sale_amount,
    ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sale_amount DESC) AS rank
  FROM sales_data
)
SELECT *
FROM ranked_sales
WHERE rank <= 10;

-- Optimized Approach (Using QUALIFY)
SELECT
  store_id,
  sale_amount,
  ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sale_amount DESC) AS rank
FROM sales_data
QUALIFY rank <= 10;
Performance Impact:
- Eliminates the need for temporary result storage.
- Reduces memory usage.
- Allows Snowflake to optimize the entire operation as a single unit.
- Typically, 20-30% faster for large datasets
Join Optimization Techniques
Understanding join behavior and optimization is crucial for query performance:
-- Inefficient Join
SELECT
  s.store_name,
  t.transaction_amount
FROM stores s
LEFT JOIN transactions t ON s.store_id = t.store_id;

-- Optimized Join with Join Filtering
SELECT
  s.store_name,
  t.transaction_amount
FROM stores s
LEFT JOIN transactions t
  ON s.store_id = t.store_id
  AND t.transaction_date >= '2023-01-01'
  AND t.transaction_amount > 100;
Explanation:
- Join filters in the ON clause can improve performance by:
- Reducing the amount of data before the join
- Allowing better use of clustering and partition pruning
- For LEFT JOINs, conditions in the ON clause and the WHERE clause behave differently:
- ON clause: filters only the right table before the join.
- WHERE clause: filters the combined result after the join.
Choose based on your specific needs and data relationships.
Best Practices for Query Optimization
- Use Appropriate Data Types
- Choose the smallest possible numeric precision.
- VARCHAR in Snowflake automatically optimizes storage.
- Use VARCHAR with an appropriate length.
- Leverage Query Result Caching
- Results are cached for 24 hours.
- Caching works for identical queries including:
- Same text (including spaces and comments)
- Same parameter values
- Same session settings
- Minimize Data Movement
- Use local joins when possible.
- Avoid unnecessary data transformations.
- Monitor and Analyze Query Performance
- Use the EXPLAIN command to understand the query execution plan.
- Review QUERY_HISTORY for performance insights.
-- Query Performance Analysis
SELECT
  query_text,
  total_elapsed_time,
  bytes_scanned,
  bytes_spilled_to_local_storage,
  bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 10;
Look for patterns in poor-performing queries:
- Large amounts of data scanned.
- Excessive spilling to storage
- Long execution times
- High credit consumption
Effective query optimization in Snowflake requires a combination of:
- Strategic table design
- Efficient query writing
- Understanding of Snowflake’s query execution mechanics
- Continuous monitoring and refinement
By implementing these techniques, you can significantly reduce query execution time and minimize credit consumption.
Storage Optimization in Snowflake
Managing storage effectively is crucial for both cost control and performance. Let’s explore key strategies through practical scenarios.
Manage Time Travel Periods
Time Travel allows you to access historical data, but it comes with storage costs. Here’s how to optimize it:
-- Example: Set 7-day retention for frequently changing tables
ALTER TABLE sales_data SET DATA_RETENTION_TIME_IN_DAYS = 7;
-- Example: Reduce retention for large, infrequently accessed tables
ALTER TABLE historical_logs SET DATA_RETENTION_TIME_IN_DAYS = 1;
Management Strategy:
- Set longer retention (7+ days) for critical business data.
- Use minimal retention (0-1 days) for temporary or staging tables.
- Monitor Time Travel storage costs through system views.
- Consider business requirements vs. storage costs when setting retention periods.
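For the monitoring bullet above, the ACCOUNT_USAGE.TABLE_STORAGE_METRICS view breaks storage down into active, Time Travel, and Fail-safe bytes, which makes it easy to spot tables where a long retention period is expensive. A minimal sketch:

-- Tables carrying the most Time Travel and Fail-safe storage
SELECT
  table_catalog,
  table_schema,
  table_name,
  active_bytes      / POWER(1024, 3) AS active_gb,
  time_travel_bytes / POWER(1024, 3) AS time_travel_gb,
  failsafe_bytes    / POWER(1024, 3) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
ORDER BY time_travel_bytes DESC
LIMIT 20;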
Use Compact Storage Formats
Compare these approaches to understand storage optimization:
-- Non-optimized table definition
CREATE TABLE verbose_sales (
  date_time TIMESTAMP_NTZ(9),      -- More precision than needed
  product_id VARCHAR(16777216),    -- Default max length
  sale_amount NUMBER(38,6)         -- Excessive precision
);
-- Optimized table definition
CREATE TABLE optimized_sales (
  date_time DATE,                  -- Sufficient for daily sales
  product_id VARCHAR(10),          -- Based on actual ID length
  sale_amount NUMBER(12,2)         -- Realistic precision for currency
);
Best practices:
Managing Time Travel periods
- Regular audit of retention periods
- Set different retention policies based on data importance.
- Monitor and adjust based on actual usage patterns.
- Consider regulatory requirements.
Implementing efficient data archival strategies
- Archive infrequently accessed data to lower cost storage
- Use clone for zero-copy backup.
- Implement staged archival process based on data temperature.
Using appropriate data types and compression
- Choose minimal but sufficient numeric precision.
- Use DATE vs. TIMESTAMP when the time component isn’t needed.
- Leverage automatic compression features.
Leveraging zero-copy cloning
- Use for development environments.
- Create backup copies without duplicating storage.
- Test data modifications safely.
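Zero-copy cloning itself is a one-line statement: the clone initially shares micro-partitions with its source, so only data that subsequently changes adds storage. The object names below are illustrative:

-- Create a development copy of a production table without duplicating storage
CREATE TABLE dev_db.public.sales_data CLONE prod_db.public.sales_data;

-- Clones can also target a point in time via Time Travel
CREATE TABLE sales_data_backup CLONE sales_data
  AT (OFFSET => -3600);   -- table state as of one hour ago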
Cost Management Framework
Implement proactive cost controls through resource monitors. Here’s a practical example:
-- Example of resource monitor creation
CREATE RESOURCE MONITOR monthly_limit WITH
  CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
-- Applying the resource monitor to a warehouse
ALTER WAREHOUSE COMPUTE_WH SET RESOURCE_MONITOR = monthly_limit;
Implementation strategies with real-world considerations:
- Setting up resource monitors:
- Create different monitors for development vs. production.
- Set appropriate thresholds based on historical usage.
- Consider business hours vs. off-hours requirements.
- Implementing tagging for cost allocation (see the tagging example after this list):
- Tag resources by department.
- Track project-specific usage.
- Enable chargeback mechanisms.
- Establishing monitoring and alerting:
- Define alert recipients by role.
- Set up escalation paths.
- Configure notification channels (email, Slack, etc.)
- Creating cost governance policies:
- Document cost management procedures.
- Define approval processes for quota increases.
- Establish regular review cycles.
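For the tagging strategy mentioned above, Snowflake’s object tagging (Enterprise Edition and up) lets you attach a cost-center label to warehouses and then report on it. Here is a sketch with illustrative names; the TAG_REFERENCES view lives in the ACCOUNT_USAGE schema and its columns may vary slightly by edition.

-- Create a tag and attach it to a warehouse
CREATE TAG IF NOT EXISTS cost_center COMMENT = 'Team or department for chargeback';
ALTER WAREHOUSE COMPUTE_WH SET TAG cost_center = 'finance';

-- See which warehouses carry which cost center
SELECT object_name, tag_value
FROM snowflake.account_usage.tag_references
WHERE tag_name = 'COST_CENTER'
  AND domain = 'WAREHOUSE';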
Snowflake Monitoring and Governance Best Practices
Like any computer system, it is important to monitor a Snowflake deployment to make sure it is running properly. And for a cloud-based, consumption-billed platform like Snowflake, it is even more crucial to make sure you are not paying for unnecessary resources.
Monitor Query and Warehouse Usage
Track query execution times and credit usage:
Here’s a monitoring query with example results:
-- Example query for monitoring warehouse usage
SELECT
  warehouse_name,
  COUNT(*) AS query_count,
  AVG(execution_time) AS avg_execution_time,
  SUM(credits_used) AS total_credits
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 3 DESC;
-- Sample Results:
-- ANALYTICS_WH | 1500 queries | 45.2s avg | 250.5 credits
-- ETL_WH | 300 queries | 180.5s avg | 425.8 credits
-- DEV_WH | 800 queries | 15.3s avg | 85.2 credits
Look for patterns such as:
- Unusually long execution times
- Unexpected credit consumption
- Query patterns outside business hours.
- Warehouse sizing mismatches
Implementation of Alerts
Create a comprehensive alerting system that notifies appropriate teams:
CREATE OR REPLACE PROCEDURE SEND_ALERT(
  WAREHOUSE_NAME STRING,
  CREDIT_USAGE NUMBER,
  ALERT_LEVEL STRING,
  RECIPIENT_ROLE STRING
)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  // Alert routing based on severity and role
  let recipients = {
    'DBA': ['dba@company.com'],
    'Finance': ['finance@company.com'],
    'DevOps': ['devops@company.com']
  };

  // Alert message customization
  let message = `Alert Level: ${ALERT_LEVEL}
Warehouse: ${WAREHOUSE_NAME}
Credit Usage: ${CREDIT_USAGE}
Time: ${new Date().toISOString()}`;

  // Send to the appropriate channel based on role
  // Add your notification logic here (e.g., call an external function or write to an alerts table)

  // Placeholder return so the procedure compiles and reports what it would send
  return 'Alert prepared for ' + RECIPIENT_ROLE + ': ' + message;
$$;
Conclusion
Optimizing Snowflake for performance and cost efficiency requires a combination of query optimization techniques, storage management, warehouse optimization and cost governance strategies. By implementing these methods, businesses can achieve significant cost savings while improving query execution times. Start applying these strategies today to unlock the full potential of your Snowflake data warehouse.
Remember that optimization is an iterative process – monitor, adjust, and refine your strategies based on actual usage patterns and business needs. Start with the highest-impact areas for your organization and gradually implement more sophisticated optimizations as needed.