Essential Strategies for Optimizing Snowflake Performance and Reducing Costs


In today’s data-driven enterprise environment, businesses increasingly rely on Snowflake’s cloud data warehouse for handling large-scale data analytics and storage needs. However, optimizing Snowflake performance is crucial to manage growing data warehouse costs and maintain efficiency. This guide shares actionable strategies for reducing Snowflake expenses while boosting query performance and scalability. Snowflake offers many benefits, but it can also become expensive if you don’t apply optimizations; I learned this firsthand when my organization’s Snowflake bill grew exponentially as usage increased.

Understanding Snowflake’s Cost Structure

Before diving into optimization strategies, it’s crucial to understand how Snowflake’s pricing model works. Unlike traditional data warehouses, Snowflake’s consumption-based pricing model means you pay for what you use, but this also presents unique opportunities for optimization.

Note: all prices listed here are subject to change and should be used for relative magnitude only. Be sure to check Snowflake’s documentation for current cost structures before committing to a new contract or subscription.

There are three core components in Snowflake’s cost structure: Compute, Storage, and Data Transfer.

Compute Costs

Compute costs are billed in credits, Snowflake’s primary billing unit for compute resources. Every time you run a virtual warehouse, you consume credits based on the warehouse size and runtime. Let me walk through a couple of virtual warehouse scenarios.

  • Credits/hour are based on warehouse size:
    • X-Small (XS): 1
    • Small (S): 2
    • Medium (M): 4
    • Large (L): 8
    • X-Large (XL): 16
    • 2X-Large (2XL): 32
  • Examples:
    • Scenario 1: Daily ETL Workload – A Large warehouse running for 4 hours/day over 22 business days costs 704 credits/month at $2/credit = $1,408.
      • Calculation:
        Daily Credits: 8 credits/hour × 4 hours = 32 credits
        Monthly Credits: 32 credits × 22 business days = 704 credits
        Monthly Cost: 704 credits × $2/credit = $1,408
    • Scenario 2: BI Query Workload – A Medium warehouse running for 8 hours/day over 30 days costs 960 credits/month at $3/credit = $2,880
      • Calculation:
        Daily Credits: 4 credits/hour × 8 hours = 32 credits
        Monthly Credits: 32 credits × 30 days = 960 credits
        Monthly Cost: 960 credits × $3/credit = $2,880

Storage Costs

Storage costs cover the data that Snowflake persists on your behalf: active table data, staged files, Time Travel history, Fail-safe copies, and so on, billed at the rates of the underlying cloud platform (AWS, Azure, or GCP).

The following types of storage affect your bill:

  • Active Storage: Currently used data
    • Compressed and optimized automatically.
    • Rates typically range from $23 to $40 per TB per month.
    • Varies by region and cloud provider.
  • Time Travel Storage: Historical data versions
    • Cost varies based on retention period.
    • Standard Edition: 1 day included.
    • Enterprise Edition: Up to 90 days
    • Additional storage needs can significantly impact costs.
  • Fail-safe Storage: 7-day disaster recovery backup
    • No direct charge in most editions
    • Included in overall storage calculations.
    • Critical for compliance and recovery

Real-World Storage Example:

  • Items stored:
    • Average Data Storage: 5 TB
    • Time Travel Storage: 1 TB
    • Fail-safe Storage: 2 TB
    • Storage Rate: $40/TB/month
  • Calculation:
    • Total Storage: 8 TB
    • Monthly Storage Cost: 8 TB × $40/TB = $320

Data Transfer Costs

Snowflake charges for data transfer based on different scenarios and directions of data movement. Let’s break down each type with explanations and an example.

  • Cross-region data movement
    • Costs range from $20 to $200 per TB.
    • Varies by cloud provider and regions involved.
    • Some regions have higher costs due to infrastructure.
    • Example: Moving 1 TB of data between AWS regions:
      • US East to US West: ~$20/TB
      • US to Europe: ~$100/TB
      • US to Asia: ~$200/TB
  • External table access
    • Costs depend on source location.
    • Frequency of access impacts total cost
    • Caching can help reduce repeated transfer costs.
  • Data sharing between accounts
    • Provider account bears compute costs
    • No transfer costs for same region sharing
    • Cross-region sharing incurs additional fees.
  • Snowpipe ingestion
    • Additional charges apply for auto-ingestion infrastructure.
    • Per-file ingestion fees may apply

Dig deeper into these components

Understanding these cost components and their variations is crucial for:

  • Budget planning
  • Resource allocation
  • Optimization strategies
  • Cost forecasting
  • Capacity planning

Each component can be optimized independently, but they often interact with each other. For example, better compute optimization might require more storage for materialized views, or reduced storage costs might lead to increased compute costs for real-time processing.

If you want to know more, check out the Snowflake documentation on cost management.

Now that you have a fair idea of how cost is calculated in Snowflake, let’s dive into optimizations, starting with warehouse optimization, which is typically the largest cost contributor.

Immediate Cost Optimization Strategies

In this section, I will cover some of the more immediate cost optimizations you can use to save money pretty quickly.

Auto-Suspension Configuration in Snowflake

The auto-suspension feature is crucial for controlling costs, especially for workloads with intermittent activity. Warehouses take about 1-2 seconds to resume, so the impact on query performance is minimal for most use cases.

Minimize idle costs using the following SQL:
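
A minimal sketch, assuming a warehouse named BI_WH (substitute your own warehouse name):

ALTER WAREHOUSE BI_WH SET
  AUTO_SUSPEND = 300   -- suspend after 300 seconds (5 minutes) of inactivity
  AUTO_RESUME = TRUE;  -- resume automatically when the next query arrives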

  • What this does:
    • Sets the warehouse to automatically suspend after 5 minutes of inactivity.
    • Enables automatic resume when new queries arrive.
  • Cost Impact:
    • If your warehouse runs 24/7 without auto-suspension, you’re paying for idle time.
    • For a Medium warehouse (4 credits/hour):
      • Without Auto-suspend: 24 hours × 4 credits = 96 credits/day ($192/day at $2/credit)
      • With Auto-suspend (8 active hours): 8 hours × 4 credits = 32 credits/day ($64/day)
      • Daily Savings: $128 or 67% saving.

Multi-Cluster Configuration for Scalability

Enable auto-scaling for dynamic workloads so that the number of clusters adjusts to match demand. This is particularly useful for scenarios with variable query loads, such as:

  • Morning report generation peaks
  • End-of-month financial processing.
  • Concurrent user query spikes during business hours
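
A configuration sketch (the warehouse name BI_WH and the cluster limits are illustrative; multi-cluster warehouses require Enterprise Edition or higher):

ALTER WAREHOUSE BI_WH SET
  MIN_CLUSTER_COUNT = 1          -- baseline capacity
  MAX_CLUSTER_COUNT = 3          -- scale out to at most three clusters under load
  SCALING_POLICY = 'STANDARD';   -- balanced scale-out/scale-in decisions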

What this does:

  • Sets minimum cluster count to 1 (baseline capacity)
  • Allows automatic scaling up to 3 clusters based on workload.
  • Uses the standard scaling policy for balanced scaling decisions.

Cost Impact:

While multiple clusters increase concurrent processing capability, they multiply credit consumption:

  • Single Cluster (Medium): 4 credits/hour
  • Three Clusters: 12 credits/hour

Peak Hour Cost Comparison (Enterprise Edition at $3/credit):

  • Single Cluster: 4 credits × $3 = $12/hour
  • Three Clusters: 12 credits × $3 = $36/hour

Note: These costs are in addition to the standard warehouse size pricing discussed earlier. Multi-clustering is particularly cost-effective when query queuing would otherwise impact business operations.

Specialized Warehouse Creation for Workload Segmentation

Separate ETL and BI workloads. Different workload types have distinct resource needs and timing patterns. For example:

  • ETL processes often need more compute power but run in shorter bursts.
  • BI queries typically require less power but run throughout the day.
  • Data science workloads might need large warehouses for specific operations.

Here’s how to configure warehouses for these different needs:

ETL:
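
A sketch of a dedicated ETL warehouse (the name ETL_WH is a placeholder):

CREATE WAREHOUSE IF NOT EXISTS ETL_WH
  WAREHOUSE_SIZE = 'LARGE'        -- 8 credits/hour
  AUTO_SUSPEND = 600              -- 10 minutes, suited to batch jobs
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;     -- don't consume credits until the first job runs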

BI:
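
And a corresponding sketch for interactive BI usage (the name BI_WH is again a placeholder):

CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'MEDIUM'       -- 4 credits/hour
  AUTO_SUSPEND = 300              -- 5 minutes, suited to interactive queries
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;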

What this does:

Creates separate warehouses for ETL and BI workloads.

  • ETL warehouse:
    • LARGE size for processing power (8 credits/hour)
    • 10-minute auto-suspend for batch jobs
    • Starts suspended to prevent unnecessary credits.
  • BI warehouse:
    • MEDIUM size for interactive queries (4 credits/hour)
    • 5-minute auto-suspend for interactive usage
    • Starts suspended to prevent unnecessary credits.

Cost Impact Example:

ETL Warehouse (4 hours/day):

  • 8 credits/hour × 4 hours = 32 credits/day

BI Warehouse (8 hours/day):

  • 4 credits/hour × 8 hours = 32 credits/day
  • Total Daily Credits: 64 credits
  • Daily Cost (Standard Edition): 64 credits × $2 = $128

Compared to single LARGE warehouse running 12 hours:

  • 8 credits/hour × 12 hours = 96 credits
  • Daily Cost: 96 credits × $2 = $192

Daily Savings: $64

The separation of workloads often leads to better overall performance despite using smaller warehouses for some tasks. ETL jobs can run faster on a dedicated large warehouse, while BI queries benefit from consistent, right-sized resources. This approach can actually reduce total processing time and thus credit consumption.

Best Practices for Cost-Efficient Warehouse Management

Understanding how to effectively manage warehouse resources is crucial for both cost control and performance optimization. Here are key strategies based on real-world implementations:

  • Size Planning:
    • Start with smaller warehouses and scale up based on performance metrics.
    • Monitor query execution times and credit consumption.
    • Use query history to identify optimal warehouse sizes.
  • Workload Segregation:
    • Separate ETL and BI workloads to different warehouses
    • Match warehouse size to workload characteristics.
    • Consider time zones for global operations.
  • Auto-scaling Strategy:
    • Set appropriate MIN_CLUSTER_COUNT based on minimum concurrent users.
    • Adjust MAX_CLUSTER_COUNT based on peak load requirements.
    • Monitor scaling patterns to optimize settings.
  • Monitoring Query Performance:
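
One way to start is a query along these lines against the SNOWFLAKE.ACCOUNT_USAGE share (the 7-day window is arbitrary), which surfaces the heaviest recent queries per warehouse:

SELECT
    warehouse_name,
    query_id,
    total_elapsed_time / 1000 AS elapsed_seconds,
    bytes_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;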

Query Optimization Techniques in Snowflake

Based on experience optimizing enterprise-scale data platforms, here are key query optimization strategies that have proven effective in real-world scenarios:

Leverage Clustering for Better Query Performance

Clustering in Snowflake refers to the physical organization of table data based on one or more columns. Think of it as similar to traditional database indexing, but with some key differences. Here’s how to implement it:
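
A sketch, using a hypothetical SALES_TRANSACTIONS table:

-- Define a clustering key on an existing table
ALTER TABLE sales_transactions
  CLUSTER BY (transaction_date, store_id);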

Explanation:

Clustering helps physically organize data to improve query performance. While clustering generally helps queries that filter on clustered columns, it may not benefit:

  • Queries that scan all data (full table scans)
  • Queries that don’t filter on clustered columns.
  • Tables with frequent updates to clustered columns

In this example, we cluster by transaction_date and store_id. When queries filter on these columns, Snowflake can skip scanning irrelevant data micro-partitions.

  • Reduces I/O and improves query execution time
  • Particularly effective for frequently queried columns

Use Materialized Views for Frequent Queries

Materialized views can significantly improve performance for frequently run complex queries, but they come with maintenance overhead:

Pre-compute results to optimize query execution:
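
A sketch, reusing the hypothetical SALES_TRANSACTIONS table (note that materialized views require Enterprise Edition or higher):

CREATE MATERIALIZED VIEW daily_store_sales_mv AS
SELECT
    transaction_date,
    store_id,
    SUM(amount) AS total_sales,
    COUNT(*)    AS transaction_count
FROM sales_transactions
GROUP BY transaction_date, store_id;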

Explanation:

Materialized views pre-compute and store query results

Pros:

  • Reduces computational overhead for repetitive queries.
  • Automatically maintained by Snowflake.
  • Can dramatically improve query performance.

Cons:

  • Consumes additional storage.
  • Updates consume credits.
  • May not be cost-effective for infrequently accessed data.

Predicate Pushdown Optimization

Understanding predicate pushdown helps write more efficient queries:
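
For instance, keeping filters as simple comparisons on base columns (rather than wrapping the columns in functions) lets Snowflake push the predicate down to the storage layer and prune micro-partitions. The SALES_TRANSACTIONS and STORES tables below are hypothetical:

-- The date filter is a simple predicate on a clustered column, so Snowflake
-- can prune micro-partitions before the join and aggregation take place
SELECT
    s.store_id,
    SUM(s.amount) AS total_sales
FROM sales_transactions s
JOIN stores st
  ON s.store_id = st.store_id
WHERE s.transaction_date >= '2024-01-01'
  AND st.region = 'WEST'
GROUP BY s.store_id;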

Explanation:

  • Predicate pushdown moves filtering conditions closer to data source.
  • Reduces amount of data processed and transferred.
  • Allows Snowflake to leverage clustering and micro-partition pruning.
  • Minimizes unnecessary data scanning and movement.

Window Function Optimization

Window functions are powerful SQL features that perform calculations across a set of rows related to the current row. While extremely useful, they can impact performance if not optimized properly. Snowflake provides specific optimizations and features to make window functions more efficient.

Understanding QUALIFY Clause

Snowflake introduces the QUALIFY clause as a unique optimization feature that isn’t available in standard SQL. This clause allows filtering on window functions without using subqueries or CTEs (Common Table Expressions), resulting in more efficient execution and cleaner code.

Common Window Function Scenarios and Optimizations:

In this section I will show a few examples of using window functions:

Top N Records Per Group
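
A sketch of returning the top three transactions per store with QUALIFY, avoiding a subquery (same hypothetical SALES_TRANSACTIONS table as before):

SELECT
    store_id,
    transaction_id,
    amount
FROM sales_transactions
QUALIFY ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY amount DESC) <= 3;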

Performance Impact:

  • Eliminates the need for temporary result storage.
  • Reduces memory usage.
  • Allows Snowflake to optimize the entire operation as a single unit.
  • Typically, 20-30% faster for large datasets

Join Optimization Techniques

Understanding join behavior and optimization is crucial for query performance:
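
A sketch contrasting the two placements of a filter, using hypothetical CUSTOMERS and ORDERS tables:

-- Condition in the ON clause: only the right-hand table is filtered,
-- so every customer still appears in the result
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
  ON  c.customer_id = o.customer_id
  AND o.order_date >= '2024-01-01';

-- Condition in the WHERE clause: rows are filtered after the join,
-- which drops customers with no matching 2024 orders
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';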

Explanation:

  • Join filters in the ON clause can improve performance by:
    • Reducing the amount of data before the join
    • Allowing better use of clustering and partition pruning
  • For LEFT JOINs, conditions in the ON clause and the WHERE clause behave differently:
    • ON clause: Filters only the right table, before the join
    • WHERE clause: Filters the joined result, after the join

Choose based on your specific needs and data relationships.

Best Practices for Query Optimization

  1. Use Appropriate Data Types
    • Choose the smallest possible numeric precision.
    • VARCHAR in Snowflake automatically optimizes storage
    • Use VARCHAR with appropriate length
  2. Leverage Query Result Caching
    • Results are cached for 24 hours.
    • Caching works for identical queries including:
      • Same text (including spaces and comments)
      • Same parameter values
      • Same session settings
  3. Minimize Data Movement
    • Use local joins when possible.
    • Avoid unnecessary data transformations.
  4. Monitor and Analyze Query Performance
    • Use EXPLAIN command to understand query execution plan.
    • Review QUERY_HISTORY for performance insights.

Look for patterns in poor-performing queries:

  • Large amounts of data scanned.
  • Excessive spilling to storage
  • Long execution times
  • High credit consumption

Effective query optimization in Snowflake requires a combination of:

  • Strategic table design
  • Efficient query writing
  • Understanding of Snowflake’s query execution mechanics
  • Continuous monitoring and refinement

By implementing these techniques, you can significantly reduce query execution time and minimize credit consumption.

Storage Optimization in Snowflake

Managing storage effectively is crucial for both cost control and performance. Let’s explore key strategies through practical scenarios.

Manage Time Travel Periods

Time Travel allows you to access historical data, but it comes with storage costs. Here’s how to optimize it:

-- Example: Set 7-day retention for frequently changing tables
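-- (A sketch; the table name ORDERS is a placeholder)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;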

-- Example: Reduce retention for large, infrequently accessed tables
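-- (A sketch; the table name STG_DAILY_LOADS is a placeholder)
ALTER TABLE stg_daily_loads SET DATA_RETENTION_TIME_IN_DAYS = 0;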

Management Strategy:

  • Set longer retention (7+ days) for critical business data.
  • Use minimal retention (0-1 days) for temporary or staging tables.
  • Monitor Time Travel storage costs through system views.
  • Consider business requirements vs. storage costs when setting retention periods.

Use Compact Storage Formats

Compare these approaches to understand storage optimization:

-- Non-optimized table definition
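-- (A sketch with placeholder names and columns)
CREATE TABLE customer_events_raw (
    event_id    NUMBER(38,0),
    event_time  TIMESTAMP_NTZ,
    status      VARCHAR(16777216),
    amount      FLOAT
);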

-- Optimized table definition
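-- (Same placeholder columns, right-sized)
CREATE TABLE customer_events (
    event_id    NUMBER(10,0),
    event_date  DATE,           -- DATE instead of TIMESTAMP when time of day isn't needed
    status      VARCHAR(20),
    amount      NUMBER(12,2)    -- exact, minimal precision instead of FLOAT
);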

Best practices:

Managing Time Travel periods

  • Regular audit of retention periods
  • Set different retention policies based on data importance.
  • Monitor and adjust based on actual usage patterns.
  • Consider regulatory requirements.

Implementing efficient data archival strategies

  • Archive infrequently accessed data to lower cost storage
  • Use clone for zero-copy backup.
  • Implement staged archival process based on data temperature.

Using appropriate data types and compression

  • Choose minimal but sufficient numeric precision.
  • Use DATE vs TIMESTAMP when time isn’t needed.
  • Leverage automatic compression features.

Leveraging zero-copy cloning

  • Use for development environments.
  • Create backup copies without duplicating storage.
  • Test data modifications safely.
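
For example, a zero-copy clone of a production database for a development environment (the names below are placeholders) consumes no additional storage until the clone’s data starts to diverge:

CREATE DATABASE dev_analytics CLONE prod_analytics;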

Cost Management Framework

Implement proactive cost controls through resource monitors. Here’s a practical example:

-- Example of resource monitor creation
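-- (A sketch; the monitor name and the 1,000-credit monthly quota are placeholders)
CREATE RESOURCE MONITOR monthly_budget_rm
  WITH CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY     -- warn account administrators
    ON 100 PERCENT DO SUSPEND;  -- suspend assigned warehouses once the quota is spent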

-- Applying resource monitor to warehouse
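-- (The warehouse name is a placeholder)
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_budget_rm;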

Implementation strategies with real-world considerations:

  • Setting up resource monitors:
    • Create different monitors for development vs. production.
    • Set appropriate thresholds based on historical usage.
    • Consider business hours vs. off-hours requirements.
  • Implementing tagging for cost allocation:
    • Tag resources by department.
    • Track project-specific usage.
    • Enable chargeback mechanisms.
  • Establishing monitoring and alerting:
    • Define alert recipients by role.
    • Set up escalation paths.
    • Configure notification channels (email, Slack, etc.)
  • Creating cost governance policies:
    • Document cost management procedures.
    • Define approval processes for quota increases.
    • Establish regular review cycles.

Snowflake Monitoring and Governance Best Practices

Like any computer system, it is important to monitor Snowflake to make sure it is running properly. For a cloud-based, consumption-billed system like Snowflake, it is even more crucial to make sure you are not paying for unnecessary resources.

Monitor Query and Warehouse Usage

Track query execution times and credit usage:

Here’s a monitoring query with example results:

-- Example query for monitoring warehouse usage
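-- (A sketch against ACCOUNT_USAGE; the 7-day window and rounding are arbitrary)
SELECT
    q.warehouse_name,
    COUNT(*)                                   AS query_count,
    ROUND(AVG(q.total_elapsed_time) / 1000, 1) AS avg_execution_seconds,
    m.credits_used
FROM snowflake.account_usage.query_history q
JOIN (
    SELECT warehouse_name, SUM(credits_used) AS credits_used
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
) m
  ON q.warehouse_name = m.warehouse_name
WHERE q.start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY q.warehouse_name, m.credits_used
ORDER BY m.credits_used DESC;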

-- Sample Results:

-- ANALYTICS_WH | 1500 queries | 45.2s avg | 250.5 credits

-- ETL_WH | 300 queries | 180.5s avg | 425.8 credits

-- DEV_WH | 800 queries | 15.3s avg | 85.2 credits

Look for patterns such as:

  • Unusually long execution times
  • Unexpected credit consumption
  • Query patterns outside business hours.
  • Warehouse sizing mismatches

Implementation of Alerts

Create a comprehensive alerting system that notifies appropriate teams:
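
One option is a Snowflake alert combined with the SYSTEM$SEND_EMAIL stored procedure. The sketch below assumes a notification integration named MY_EMAIL_INT already exists; the warehouse, threshold, schedule, and recipient are placeholders, and keep in mind that ACCOUNT_USAGE views can lag by up to a few hours:

CREATE OR REPLACE ALERT hourly_credit_spike_alert
  WAREHOUSE = monitoring_wh
  SCHEDULE = '60 MINUTE'
  IF (EXISTS (
        SELECT 1
        FROM snowflake.account_usage.warehouse_metering_history
        WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
        GROUP BY warehouse_name
        HAVING SUM(credits_used) > 10
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
        'my_email_int',
        'data-platform-team@example.com',
        'Snowflake credit spike detected',
        'A warehouse consumed more than 10 credits in the last hour.');

-- Alerts are created in a suspended state and must be resumed before they run
ALTER ALERT hourly_credit_spike_alert RESUME;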

 

Conclusion

Optimizing Snowflake for performance and cost efficiency requires a combination of query optimization techniques, storage management, warehouse optimization and cost governance strategies. By implementing these methods, businesses can achieve significant cost savings while improving query execution times. Start applying these strategies today to unlock the full potential of your Snowflake data warehouse.

Remember that optimization is an iterative process – monitor, adjust, and refine your strategies based on actual usage patterns and business needs. Start with the highest-impact areas for your organization and gradually implement more sophisticated optimizations as needed.


About the author

Anil Kumar Moka


Anil Kumar Moka is a Polyglot Data Software Engineer and Visionary Technical Leader at a leading U.S. bank, bringing 17 years of expertise in data engineering, data security, software engineering and cloud solutions. His innovative enterprise-wide data remediation platforms safeguard over 100 million customers while achieving 95% faster processing times and multi-million dollar cost savings. An AWS Certified Solutions Architect and published researcher, Anil regularly authors influential white papers and technical articles on cloud computing and data engineering. His leadership in designing scalable solutions and driving technological excellence has established him as a thought leader in the industry. Anil holds a Bachelor of Technology in Electrical and Electronics Engineering, complemented by multiple professional certifications that underscore his commitment to technical mastery.