In 2010, an estimated 1.2 zettabytes (one zettabyte equals one trillion gigabytes) of data were created. In 2011, an estimated 1.8 zettabytes were created. And by 2020, it is estimated that data will grow by another 50 times. If that is hard to believe, it is estimated that the IT staff who will be managing this data will grow only about 1.5 times.1 Oh, my aching hard drive!
So why is data growing so fast? There are many reasons, but some of the most common ones include …
- Growing businesses
- Business acquisitions
- Increased use of data warehouses and OLAP cubes
- Business protection, such as backups, mirroring, and much more
- An increased need for keeping data, even older data, online
- More reporting needs from core business systems
- Government and other regulatory bodies
- More “bigger” data stored (images, files, etc.)
… and many more2
As a DBA, do you see any of this slowing down? Of course not. An organization’s storage footprint will continue to grow rapidly, and as DBAs we need to start thinking of new ways to deal with this growth.
When a lot of managers think about data storage, they think of the size of production databases. What they don’t often realize is that most of the data in their organization’s production databases is duplicated, often many times over. In fact, in most organizations, the copies of production databases make up the bulk of the storage footprint, while the production databases themselves make up only a small portion of it.
For example, do any of these sound familiar to you?
- Backups: How many copies do you keep of each database? One, two, seven, thirty, or more?
- Database Snapshots: While not a lot of organizations use database snapshots, those that do can easily duplicate production databases many times over.
- Log Shipping: While log shipping is not a sexy way to enhance high availability, it is still popular, and will double the size needed for each production database being log shipped.
- Mirroring (SQL Server): Although database mirroring has been deprecated and will eventually be dropped from SQL Server, it is still a very popular way to protect data. Of course, mirroring doubles the data storage needed per database.
- AlwaysOn Availability Group Replicas: SQL Server 2012 introduced AlwaysOn technology, which will eventually replace mirroring. AlwaysOn’s ability to create multiple database replicas at the same time will further increase the need for disk space.
- SAN Mirroring/Snapshots: If you have a large SAN environment, you may be employing SAN mirroring or snapshots, which requires, at a very minimum, double the space of a production database.
- Image Backups: Some organizations perform full image backups of their servers, duplicating production data.
- Reporting: Some organizations use copies of production databases for off-loading of reporting or other read-only needs.
- Imports/Exports: If you use SSIS to import or export large amounts of data, the temporary space it takes is often very large, and of course duplicates data already stored in production databases.
- Development: If you develop your own applications, at the very minimum you will need a development copy of each production database. In some cases you may need more, assuming each developer wants their own development copy.
- Test: Whether you develop your own apps, or use third-party apps, you will need at least one test copy of each production database for testing.
- Staging: Some development shops also have separate staging servers where one or more copies of a production database reside.
- Replication: If a production database is a publisher, you will have a distribution database, and an unknown quantity of subscriber databases, where data can be duplicated over and over.
- Training: Some organizations with internal training use copies of production databases to train new employees on how to use internal applications.
- Archival: Often when data is no longer actively used, it is archived into different databases, even though it still may remain online in some cases. As time passes, archival data never stops growing.
- Potential Litigation Defense: Some companies keep copies of production databases for long periods of time, if not indefinitely, just in case they get sued and need to produce data to prove their case.
- Compliance: Government, regulatory, and auditing firms require that data be stored for specific lengths of time, further duplicating data. This also includes auditing databases that store auditing records.
Not all of these affect every organization, and some of these overlap each other, but I am sure that all of you have experienced many of the above data duplication issues.
According to one study2, the most common way organizations deal with non-stop data growth is to buy more storage. While the cost per gigabyte of storage has been coming down over the years, especially on low-end storage devices, the cost of adding more storage to a SAN, NAS, or DAS is not cheap, especially if you include the actual total cost of storage. Here is what I mean.
When many people think about the cost of adding more storage, they are thinking about the cost of adding additional disk drives. But this is seldom the only cost. Depending on your storage ecosystem, adding more disk drives may involve the following additional costs, above and beyond the cost of the disk drives themselves:
- Upgrading hardware (adding more drives sometimes requires more cabinets, HBAs, controllers, switches, cables, etc.)
- Consulting and/or installation costs
- Annual hardware maintenance contracts
- Annual hardware and software support costs
- Additional software license fees
- Additional administration time (more staff) to manage the extra hardware
- Backup battery/generator costs
- Electricity costs
- Cooling costs
- Physical space costs
According to some research, the Total Cost of Ownership (TCO) for SAN storage can range from $5,000 to $10,000 per terabyte.
As DBAs we have little or no control over the above costs, but there is a way we can make a difference in reducing our organization’s storage footprint: employing data compression at both the database and backup levels. Of course these are not your only options, but they are low-hanging fruit that is easy and relatively inexpensive to implement. In fact, they are generally much less expensive and easier to implement than purchasing additional hardware.
Note: Some editions of SQL Server include native data compression and compressed backups. In many cases, these features are only available in the Enterprise Edition, or only in more recent versions. Even if you are using an edition that includes native data or backup compression, the native compression features are basic and don’t offer the features and flexibility of SQL Storage Compress and SQL Backup Pro.
Using SQL Storage Compress to Shrink the Storage Footprint of Duplicate Databases
SQL Storage Compress takes your live database files and compresses them on disk, saving as much as 90% (73% is average) of the original size of MDF and NDF files. A database compressed with SQL Storage Compress behaves exactly like any normal SQL Server database, significantly reducing your SQL Server database disk storage needs and helping you to reduce your storage costs. It can be used for many different purposes, including
- Log Shipping
- Potential Litigation Defense
- Auditing Databases
… And more
For example, if you have a 1 TB production database and there happen to be five copies of it in your organization (live copies, not backups), then, assuming the 73% average data compression it offers, SQL Storage Compress can turn a 5 TB storage footprint into only 1.35 TB, a huge savings in your organization’s storage footprint.
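The arithmetic behind that example is worth making explicit. Here is a quick sketch; the 73% figure is the quoted average, and actual ratios depend on how compressible your data is:

```python
# Back-of-the-envelope savings from compressing live copies of a database.
# The 73% ratio is the quoted average for SQL Storage Compress; real
# results vary with the compressibility of your data.

def compressed_footprint_tb(db_size_tb, copies, compression_ratio=0.73):
    """Total on-disk size of `copies` compressed copies of one database."""
    return db_size_tb * copies * (1 - compression_ratio)

# Five live 1 TB copies at the 73% average:
print(f"{compressed_footprint_tb(1.0, 5):.2f} TB")  # 1.35 TB instead of 5 TB
```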
So how does SQL Storage Compress work? Before I can answer this question, you need a high-level understanding of how SQL Server interacts with the Windows Operating System (OS). While SQL Server performs many of its own tasks internally using the SQLOS (SQL Server Operating System), one of the things it doesn’t handle is the reading and writing of data directly to disk. Whenever SQL Server needs to read or write to a file, it passes the I/O request to the Windows I/O Manager, which is part of the OS’s kernel. At that point, the Windows I/O Manager passes the I/O request to a device driver, and eventually the data is read from, or written to, disk.
When SQL Storage Compress is set up, it installs and starts what is called the HyperBac Control Service. This service runs like any other OS service, under the LocalSystem security account. Its job, in cooperation with the Windows I/O Manager, is to intercept the read and write requests from SQL Server, compressing or decompressing the data as necessary. Because SQL Storage Compress works within the context of the Windows I/O Manager, SQL Server is none the wiser: the instance simply passes its I/O requests to the Windows I/O Manager and reads and writes its database files as it always does, unaware that they are being compressed and decompressed on disk.
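The interception idea can be sketched in miniature. The toy below is not how SQL Storage Compress is implemented (the real product works as a kernel-level filter below the file system), but it shows the same principle: the caller uses ordinary file reads and writes while the bytes that land on disk are silently compressed.

```python
import gzip
import os
import tempfile

# Miniature analogue of transparent file compression: the caller just
# "writes a file" and "reads it back", while gzip silently shrinks what
# actually lands on disk, much as a filter driver compresses I/O requests
# below SQL Server without the database engine knowing.

path = os.path.join(tempfile.mkdtemp(), "data.mdf.gz")
payload = b"a page of database data " * 1000

with gzip.open(path, "wb") as f:   # an ordinary write, as far as the caller knows
    f.write(payload)

on_disk = os.path.getsize(path)    # far smaller than len(payload)

with gzip.open(path, "rb") as f:   # an ordinary read; decompression is invisible
    restored = f.read()

print(f"{len(payload)} bytes stored as {on_disk} bytes on disk")
```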
As you might imagine, there is no such thing as a free lunch. SQL Storage Compress requires additional CPU resources to perform its work, just as native SQL Server compression requires additional CPU horsepower. On the other hand, because the data is compressed, I/O is substantially reduced, helping to boost I/O performance.
Using SQL Backup Pro to Shrink the Storage Footprint of Backups
It is rare to see an organization keep only a single backup of its production databases. I have worked in companies where dozens of backups of each production database are kept, and they can use a lot of space if they are not compressed.
Besides offering multiple levels of compression, saving up to 95% (80% is about average), SQL Backup Pro offers many other features, such as significantly reduced backup and restore times, backup encryption, centralized administration, backup verification, and much more.
For example, if you are keeping two weeks of daily backups of a 1 TB production database, SQL Backup Pro can turn that 14 TB of backups into about 4 TB, a huge 10 TB of savings for just a single production database.
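The numbers in that example can be checked with a few lines of arithmetic. Note that the "about 4 TB" figure implies a conservative ratio of roughly 73%; at the 80% average quoted above, the result would be smaller still:

```python
# Storage needed for a retention window of daily full backups, at various
# compression ratios. A ~73% ratio matches the "about 4 TB" example in the
# text; 80% is the quoted average for SQL Backup Pro.

def retention_footprint_tb(backup_size_tb, days_kept, compression_ratio):
    """Total size of `days_kept` daily full backups at a given ratio."""
    return backup_size_tb * days_kept * (1 - compression_ratio)

print(retention_footprint_tb(1.0, 14, 0.0))             # 14.0 TB uncompressed
print(round(retention_footprint_tb(1.0, 14, 0.73), 1))  # 3.8 TB ("about 4 TB")
print(round(retention_footprint_tb(1.0, 14, 0.80), 1))  # 2.8 TB at the 80% average
```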
SQL Backup Pro is easy to administer from a single computer, and it can create backups for hundreds of databases by automatically creating the necessary SQL Agent jobs on the relevant instances. As jobs are running, or have finished, you can view them in the management console, making backup administration easy for every DBA.
So how does SQL Backup Pro work? Generally speaking, the SQL Backup Pro GUI can be installed on any computer that has network access to your SQL Servers. In addition, an agent service must be installed on each SQL Server instance to be backed up. Once all the software is installed, the GUI can be used to create compressed backup jobs for any SQL Server instance, all of which can be monitored from the GUI’s main screen.
While native SQL Server backup compression offers only a single compression option (which also requires additional CPU resources), SQL Backup Pro offers four different levels of compression. The only tradeoff for getting additional compression is using a few more CPU cycles: level one offers the least compression and uses the least additional CPU, while level four offers the most compression but uses slightly more CPU resources.
For example, if you have a large database that needs to be backed up, and you perform the backup at a time of day when your server is not particularly busy and has CPU resources to spare, you can choose level four compression. Depending on the compressibility of the data, level four compression can produce a backup as much as 50% smaller than level one compression, which can make a big difference when you are backing up a huge database.
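SQL Backup Pro’s four levels can’t be demonstrated outside the product, but the same level-versus-CPU tradeoff shows up in any general-purpose compressor. As an illustration, here is Python’s built-in zlib (levels 1 and 9 here are zlib’s own scale, not SQL Backup Pro’s):

```python
import time
import zlib

# Demonstrate the compression-level tradeoff: higher levels spend more CPU
# time to produce a smaller result. The sample data stands in for a
# highly compressible backup stream.
data = b"INSERT INTO Orders VALUES (42, 'widget', 19.99);\n" * 20000

for level in (1, 9):  # zlib's lowest and highest compression levels
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    saved = 100 * (1 - len(compressed) / len(data))
    print(f"level {level}: {saved:.1f}% smaller in {elapsed_ms:.2f} ms")
```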
As DBAs, we can’t solve the problem of the enormous growth of data in our organizations, but by following best practices, and employing tools such as SQL Storage Compress and SQL Backup Pro, especially on duplicate databases, we can make a significant dent, helping our organizations save money as they reduce their storage footprint.
- 1 IDC Report, Extracting Value from Chaos, June 2011.
- 2 IOUG Report, The Petabyte Challenge, August 2011.