Why is Database Cloud Backup any different?
General PC, SAN, or server backup solutions take no account of the transactional nature of relational databases, and they are optimized for general files. They are not well suited to the task of restoring a database.
Almost any disaster-recovery service-level agreement demands that a production database can be restored to a point in time, within a particular time, and in a way that leaves the database in a consistent state. It also demands that the data be stored in a way that complies with all statutory requirements for security.
The general categories of Database Cloud Backup
The Managed Backup Service
Most cloud-based managed services for data are developed from generic enterprise-wide disaster-recovery systems. They deal with the special requirements of databases by providing agents that are installed on the server and interact with the database server’s ‘native’ backup service by hooking into the SQL Server Virtual Device Interface (VDI).
For corporate backup of databases, Seagate EVault, EMC Avamar, NetBackup and CommVault all provide Backup as a Service (BaaS) products via some of the large cloud service providers. Typically, the backup service is generic, but individual installable agents exist for the intelligent backup of application software such as Exchange or SQL Server. These products generally allow users to sign up for new services, change service levels, and perform basic tasks through a web-based portal.
By using this type of solution, companies can draw upon the elastic resources that cloud infrastructure delivers and pay only for what they need. No capital cost is involved.
Offsite Backup to Cloud
Offsite backup is so important that any secure offsite backup is far better than none. Because this merely involves copying existing backup files offsite to secure storage via the internet, there is a wide range of cloud services that can be used, and they vary so much in their design and features that it would be misleading to compare them directly.
Copy of Backup to Cloud (Hosted Storage)
Once the local backup is performed, it can be copied to the cloud. The manufacturers of several NAS devices, for example, provide facilities to back up or restore specific directory trees to cloud BLOB storage (e.g. Synology to Amazon Glacier). Several companies, such as CrashPlan or Backblaze, offer a general backup service for particular directories or filetypes in which the restore of the files to that directory is automated, but the local administrator must then restore the database.
Cloud Backup becomes relevant with larger volumes of data, usually stored in an encrypted, compressed form in BLOB storage such as Amazon Glacier, and restorable only by using the same application that created the backup. There is usually a guarantee of data being stored within a ‘legislative area’ and sometimes a geographical area, but nothing precise.
Cloud vendors such as Azure or Amazon S3 usually offer a higher level of availability and scalability by keeping several redundant copies of data across several facilities. Even if several device failures happen at once, this is quickly detected and any lost redundancy can be repaired. When a cloud service processes a request to store data, it will store copies across several facilities before reporting the success of the request. The user is charged a fee based on the disk space used, bandwidth required and the amount of access required. The provider may throttle the bandwidth allowed, rather than allow the peaks of access that are necessary for a timely database restore.
The main advantage of sending backup data to the cloud is that costs accrue only when the resources are used, and are scaled to the user’s performance needs. It also allows the available storage space to be increased rapidly.
Cloud ‘drive’ Storage (File Lockers)
There are a number of companies that provide a file-based general ‘backup’ service to the Cloud. These include services such as SkyDrive, Box, SugarSync, DropBox, iCloud, Mozy, Google Drive, Carbonite and Memopal. These can be used for offsite database backups.
Cloud ‘drive’ storage provides a good way for individuals to have storage that can be accessed from a number of different devices, or synchronized between them. It can also be used to copy precious files offsite in case of disaster. Some Cloud storage services allow sharing of files between individuals. There have been security lapses in the past.
Even though problems are unlikely with the bigger players, there have been problems with cloud backup providers. Streamload, which became MediaMax and then The Linkup, is an example of a cloud backup service that went bad before expiring altogether; it even managed to delete a proportion of its clients’ data before doing so.
One problem with using this sort of solution as a substitute for traditional offsite database backup arises if you hold personal or financial data. To comply with the law, offsite database backups must be locked to a particular location, with guaranteed security in depth all the way from strong encryption to the logging of attempted intrusions. This isn’t usually available at the ‘file-locker’ end of the market.
Another problem with this solution is that, if you are planning to store a number of backups in a file locker, you are likely to rack up storage costs more quickly than with a specialist provider. This is because backup files, being block-based, are particularly amenable to deduplication, and the managed BaaS providers generally use this to minimize storage costs.
What does a Managed Service involve?
These services are based on three components: the agent, the server, and the portal.
The agent is similar to a standard third-party database backup tool in that it acts as a driver for a logical backup device on the SQL Server. To make the best use of the restricted bandwidth of the internet, it uses compression techniques to reduce the size of the backup, and then usually encrypts the result before saving it to disk locally and shipping it securely, as a compressed data stream, to offsite cloud storage. The agent also performs restores in a logical reversal of the process. There is generally a system for retrying in the case of an interruption to internet connectivity, and a throttling of bandwidth to prevent the process taking all the available connectivity.
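The agent’s compress-encrypt-ship pipeline can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor’s implementation: zlib stands in for the agent’s proprietary compression, the XOR step is merely a placeholder for real encryption such as AES, and all function names are invented for the example.

```python
import zlib

def package_backup(data: bytes, key: int = 0x5A) -> bytes:
    # Compress first (compressing after encryption would gain nothing),
    # then 'encrypt'. The XOR is a stand-in only; a real agent would
    # use strong encryption such as AES.
    compressed = zlib.compress(data, 9)
    return bytes(b ^ key for b in compressed)

def unpack_backup(blob: bytes, key: int = 0x5A) -> bytes:
    # A restore is the logical reversal of the process:
    # decrypt, then decompress.
    compressed = bytes(b ^ key for b in blob)
    return zlib.decompress(compressed)

def ship_offsite(blob: bytes, send, max_retries: int = 3) -> bool:
    # Retry when internet connectivity is interrupted; a real agent
    # would also throttle bandwidth and back off between attempts.
    for _ in range(max_retries):
        try:
            send(blob)
            return True
        except ConnectionError:
            continue
    return False
```

Note the ordering: compression must precede encryption, because encrypted data looks random and no longer compresses.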
The server is generally a cloud-based application. It keeps track of the backups, locates the right backup set for a restore, and runs the deletion of old backups in conformance with the retention policy. Deduplication is generally also done by the server, since it is most effective with large sets of similar files.
The portal is the administrative console that allows users to monitor the progress of backups, to initiate restores, and to check on the charges accrued through use of the service.
Why consider a managed service?
So far, managed services have been adopted most often by companies wishing to implement a company-wide disaster-recovery solution. Normally it is chosen as a way of meeting the recovery-time and recovery-point objectives (RTO and RPO) of the business. A managed service is likely to meet the needs of users who want the entire backup process, including the offsite backup, to be done by someone else. A managed cloud service is also ideal for home users, and small businesses with thirty or fewer servers who lack the expertise to administer backups, or who do not have the necessary storage infrastructure to do backups in-house. It is particularly suitable within companies that have spare internet bandwidth, and is particularly valuable for PCs for which backups can be scheduled out-of-hours, or where staff travel frequently.
For companies that have outlying offices, sites, and home-workers, managed backup systems provide an excellent way of ensuring that even the data outside the corporate network is backed up to the same standard as the rest of the enterprise. For busy DBAs, it is a good way of ensuring that outlying databases are included in a disaster-recovery plan.
For many companies that are attempting to grow rapidly, BaaS offers the only practical way to put an effective disaster-recovery plan in place.
Is a managed service a complete answer to backups?
Because a managed database backup solution is geared mainly towards disaster recovery, the DBA in a company that already has a managed backup service still has to do backups for other purposes such as rapidly copying databases, object-level recovery, filegroup restores, archiving, consistency checking, and audit. This means that the saving in local disk space can prove to be less than suggested. An enterprise-wide disaster-recovery scheme can also remove from the DBA the opportunity to do test restores to ensure that the end-to-end process works.
The Database Administrator may not require a full restore. It is quite common to do a filegroup or object-level restore. The DBA may need to use backups to follow an audit trail to determine when a particular data change was made, or to do a forensic search for a DDL change in a production system. Backups may be needed for staging servers, as part of a deployment process. With the more general BaaS offerings, it is unclear how the service will accommodate these requirements.
Advantages of Cloud Database Backup
The general advantage of any automated system that copies backups to cloud storage is that an offsite backup becomes so much easier to do that it is less likely to be neglected. It is particularly useful where a database is included in a turnkey system, such as an accounting or payroll system, because the vendor can then ensure that offsite backups are taken.
Why isn’t it as simple as uploading backup files?
Bandwidth at the right time
Most Cloud Backup and Restore systems are based on the assumption of plentiful internet bandwidth. In reality, bandwidth restriction can limit the speed and efficiency of both backup and restore processes.
It is possible, by the use of data compression and by employing several concurrent data streams, to attain data throughput of around 50 Mb/s, but this is unrealistic without special software, which would be too expensive for SMEs. Experience has also proved that internet connectivity is unlikely to be good after a widespread natural disaster such as a hurricane.
In the event of a restore being required, the available internet bandwidth is most likely to throttle the process, since it is likely to be lower than the potential restore speed. Often, in the case of a disaster, internet connectivity is diminished or absent, and restores have to be done from disk delivered by a courier. Providers such as Amazon offer a service that sends a full backup of data on a portable hard drive by express mail after a disaster affecting the local operational database. Even with this service, it could prove difficult for inexperienced staff to identify the correct backup, especially if they are used to enjoying a hands-off BaaS service.
Even if the local internet connectivity survives a disaster intact, it would still take over two days to download a 400 GB data backup, assuming a 15 Mbps transfer rate.
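The arithmetic is easy to check with a short Python sketch (decimal gigabytes assumed, link speed in megabits per second):

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    # gigabytes -> bits, divided by the link rate in bits per second
    bits = size_gb * 1000**3 * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 3600

# A 400 GB backup over a 15 Mbps link takes roughly 59 hours,
# i.e. around two and a half days.
```

The same function also shows why upload is the harder problem: at the typical 4 Mbps upload speed, shipping that backup offsite in the first place would take almost four times as long.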
An alternative solution after a disaster is for the provider of the managed solution to offer a hot standby that can be spun up and restored to. This would be fine for a public service, but a local company would still need internet connectivity to access the new cloud server.
In a nutshell: the time it takes to copy the data to the cloud must be less than the interval between backups.
The amount of data that can be backed up effectively in a given time is governed by the internet bandwidth available. This currently averages out, for each connection, at a download speed of 15 Mbps and an upload speed of just 4 Mbps, far less than the rate of saving to local storage. If the data rate of the backup process gets near the internet speed, it will hog all the available bandwidth, slowing down other internet business processes. Even with ‘plentiful’ bandwidth, backup will still be very slow compared to backup to a NAS or SAN.
To provide point-in-time recovery, backup must be fairly continuous. If bandwidth is in short supply, then it will have to be throttled, and if the backup rate exceeds the limit, the data will have to be buffered and transferred offsite at times of the day when the business has few demands on internet bandwidth.
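One way to picture this buffering is as a simple scheduler that batches queued backup chunks into transfer windows, never planning more data per window than the bandwidth budget the business can spare. This is a hypothetical sketch, not any product’s algorithm:

```python
def schedule_offsite(chunks_mb, budget_mb_per_window):
    # Greedily pack buffered backup chunks (sizes in MB) into
    # successive off-peak transfer windows, keeping each window
    # within the agreed bandwidth budget.
    windows, current, used = [], [], 0
    for chunk in chunks_mb:
        if used + chunk > budget_mb_per_window and current:
            windows.append(current)   # window full; start the next one
            current, used = [], 0
        current.append(chunk)
        used += chunk
    if current:
        windows.append(current)
    return windows
```

A real agent would throttle at the packet level rather than in fixed windows, but the trade-off is the same: the more the transfer is deferred, the further the recovery point drifts behind the live database.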
What a DBA should look out for in a managed service.
The agent installed by the provider to do the backups makes demands on the CPU resources of the database server, particularly if the agent uses some form of data compression or deduplication on the data to moderate demands on bandwidth.
The use of a Database BaaS does not allow the user to avoid their obligations for compliance with legislation. This demands security for data transmission, the management of access to backups, and security within the client software itself. There must, for example, be over-the-wire encryption, and at-rest encryption while the data is stored in remote data centers.
Database BaaS shifts the cost of providing backup from a capital expense to an operational cost. However, the latter is, in many cases, more difficult to predict, especially if working on a metered rate.
Database BaaS is only practical if advanced compression techniques are used. However, deduplication technology can introduce a question mark over resilience.
Block-level deduplication means that a data block is only held once, and pointers replace subsequent identical blocks. If this applies only to the backup, then resilience is not greatly affected, but if the algorithm is taken to its logical conclusion, then data blocks are stored only once for the database, over time, and only the difference is backed up. Although it makes for spectacularly high compression, it introduces a potential for the backups to be more vulnerable.
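Block-level deduplication is easy to sketch. In this illustrative Python version, each distinct block is stored once under its hash and a backup is reduced to a ‘recipe’ of hashes; the fragility described above is visible in the restore step, which depends on every referenced block surviving:

```python
import hashlib

def dedupe(backup: bytes, block_size: int = 4096):
    # Each distinct block is stored exactly once, keyed by its hash;
    # the backup itself shrinks to an ordered list of hashes.
    store, recipe = {}, []
    for i in range(0, len(backup), block_size):
        block = backup[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # identical blocks stored once
        recipe.append(digest)
    return store, recipe

def rehydrate(store, recipe) -> bytes:
    # Restore depends on every referenced block still being intact:
    # one lost or corrupt block damages every backup that points at it.
    return b"".join(store[d] for d in recipe)
```

Because many backups may share a recipe entry, a single corrupted block in the store can silently invalidate the entire retention history rather than one night’s backup, which is the resilience question raised above.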
Recent advances in deduplication have removed some of these doubts, but a wise DBA will do regular test restores.
Backup strategies are not Recovery strategies
The high levels of compression and deduplication that make BaaS feasible can cause problems when timing is crucial, at the point of recovery. Deduplication has to be reversed to effect the restore process, and with some deduplication algorithms this is not always possible at the client/agent end. If done at the server, and this is likely with current services, it will result in having to transmit the entire backup over the internet. Before using Database BaaS, it is important to get timings from a series of recoveries over a range of data, differing in type and volume.
With the wide range of Cloud-based backup facilities available for offsite-backup, it is no longer excusable to omit an offsite backup of data. The canny DBA is likely to pick the service that is most appropriate for their particular requirements, the type of data, and their company’s disaster-recovery plan. In considering this, the time taken to restore the database is likely to be an important factor.
The corporate managed service certainly provides offsite backup, but is too closely geared to the needs of disaster-recovery to be a generic solution to database backup.
For many small enterprises, Database Backup as a Service is a godsend, because it provides resilience, offsite archiving and disaster recovery that are impossible any other way. Whatever the size of the enterprise, a managed service is likely to be part of an overall backup regime. A managed service will provide improved resilience, as it can back up the data for the minority of staff or locations that cannot be reached by the current corporate backup solution, such as the data held by ‘road warriors’ and home-workers.
Editor’s note: this was originally published 06 March 2013