5 key problems from a SQL Server to Azure SQL Managed Instance migration (and how to avoid them)

Like2
Comments 0

Share to social media

Azure SQL Managed Instance (MI) sits in a useful middle ground. It has most of the functionality of a full on-premises SQL Server, with Microsoft handling the engine, backups, OS, and hardware. Plus, it supports cross-database queries and SQL Agent jobs, with fewer limits than Azure SQL Database – making it popular for migrating transactional workloads.

The migration itself isn’t seamless, however. This article reveals the five surprises I encountered from a real production SQL Server to Azure SQL MI migration – and what I learned from them. Includes guidance and advice to make your migration as seamless as possible.

A bit of background

I was recently on a project to migrate a very transactional installation of SQL Server to Azure SQL Managed Instance (MI). SQL Managed Instance is a good stepping stone between a full, on-premises SQL instance / Azure VM and an Azure SQL Database. It has most of the functionality of a full, on-prem instance, with management of the SQL engine, backups, OS and underlying hardware done by Microsoft. It allows you to use cross database queries and run SQL Agent jobs, with fewer limitations than Azure SQL Database migrations.

The migration process isn’t completely seamless. During the migration of this system, we encountered several surprises – five of which are detailed in this article. Hopefully, this will help you avoid, or at least be prepared for these differences from the on-prem version. This also reinforces the importance of testing each aspect of your migration.

The 5 key problems I encountered – and what I learned from them

There’s no support for DNS aliases

If you plan on using DNS aliases to assist the transition to managed instances, know that it isn’t supported. There may be some way to hack a solution, but that’s generally not a good idea. We tried to use aliases to allow clients/ external systems to make the DNS name transition before our production migration.

To put it simply, this was a failure. The Azure environment blocks these aliases. You can only use the Microsoft assigned DNS entry. There are likely also certificate issues if using your own alias. You may be able to hack some solution, but that would be fragile at best, and unsupported.

Plan on client applications migrating at the same time as your server migration. In a best-case scenario, the number of clients hitting your system directly will be limited. But be ready for this coordination when you migrate.

Zone redundancy

An attractive aspect of migrating to MI is the high-availability options available. Zone redundancy synchronizes your server between zones in Azure (refer to the references for detailed zone redundancy information). This is the high-availability option. If a datacenter has issues, your server will fail-over to another zone in your region, when using zone redundancy.

We had significant problems with zone redundancy. After our initial deployments, we enabled zone redundancy in our dev environment. This was successful. After testing, we tried to enable zone redundancy in production. Because server and CPU resources weren’t available in this region (during this deployment), all of our deployments failed. Even environments that were established failed to deploy with no changes.

After many discussions internally, talking to our Microsoft liaisons, and opening a Microsoft ticket, we found that the documentation was incomplete. Zone redundancy was not available in our region and likely will not be available until the 4th quarter of this year. We had this error even though we had successfully enabled zone redundancy in one of our environments.

This is error we saw in our DevOps environment during our attempted deploy:

The lesson learned from this is, confirm that the options you want to use are available before you plan your migration. The documentation is incomplete. Demand for services in Azure is very high and documented resources may be inaccurate. Consult your Azure representative to verify your plan. Even if it works once, you may be unable to deploy later due to these limitations.

Fast, reliable and consistent SQL Server development…

…with SQL Toolbelt Essentials. 10 ingeniously simple tools for accelerating development, reducing risk, and standardizing workflows.
Learn more & try for free

TDE-enabled databases

Moving to Azure Managed Instances is a commitment. If you want to have all of the benefits of a managed server, that probably includes having Microsoft manage the keys for your TDE (encryption at rest) enabled databases. The initial transfer of your databases is a little complicated if it’s TDE enabled. You either need to manage your own keys, which complicates things considerably in the long run, or make the switch to Microsoft managed keys.

If you continue to use your own keys, you need to manage them in Azure SQL MI. This negates some benefits of migrating to MI (primarily, key management and unmanaged PITR restore options between environments). Assuming you don’t want to manage keys, the steps to migrate a TDE enabled database are the following:

  • Disable TDE on the source database

  • Backup the source database

  • Copy the backup to an MI accessible location (blob storage)

  • Restore the backup

  • Enable TDE on the MI

  • Delete the unencrypted backups

  • Re-enable TDE on the source system (if the source database is not immediately deleted)

Using Microsoft managed keys allows Point-in-Time-Recovery (PITR) to become seamless between environments, including databases protected with TDE, and key management is handled for you. The extra steps are well worth the migration effort. Remember to allow extra time in your migration for the decryption / encryption steps. This will vary considerably depending on your database size, MI performance tier, and source hardware.

Traditional backup/restore

Backup and restore options when using MI are limited. Point in time restore (PITR) is the recommended method and provides the most functionality. It allows restores between different databases on the same server to be easily synchronized. One of the primary reasons for moving to MI instead of individual SQL Databases is cross database queries and dependencies between databases. PITR makes this easier.

You can still use the traditional backup / restore process in MI environments, with limitations. You need to use a COPY_ONLY backup. To limit scope of our migration effort, we continued to use our old backup / restore process between environments using a COPY_ONLY backup scenario. This worked most of the time. Most of the time is a problem with things like backups and restores. Some of our restores failed. Since our databases were interdependent, one database failing left the entire environment unreliable and potentially unusable.

Our failure and diagnosis looked like the following:

  • Copy Only restore attempted
    • Failed after an extended period of time with “Unable to cycle error log” message
      • Unable to cycle error log file from <FileLocation> to <FileLocation> due to OS error ‘32(The process cannot access the file because it is being used by another process.). A process outside of SQL Server may be preventing SQL Server from reading the files.
    • The restore may work on different servers / environments
    • Database restores for the same server may work for different databases

  • Errors start to show up in the server error log soon after the restore starts (well before the error message is returned to the restore process)
    • Page corruption
      • RESTORE detected an error on page (1:12345) in database <GUID> as read from the backup set.
      • [CorruptedPageList: InsertEntryImITS]: Page (1:45678) in database <GUID> is suspect
      • Error: 3183, Severity: 16, State: 1

  • DBCC CHECKDB run on source database
    • No issues
    • This was the nightmare scenario – needed it to be ruled out as soon as possible

  • RESTORE VERIFYONLY run against backup file
    • No issues
    • Backup was able to be restored to a different server

The restore process didn’t fail on every server. This, in addition to checking the source with CHECKDB and the restore with VERIFYONLY, indicated that it was a problem with the destination server. This item also involved opening a Microsoft ticket after checking with our Microsoft architect. The issue seemed to be I/O buffer related. Using the following options during the restore fixed the issue:

  • Restore options added
    • BLOCKSIZE = 65536
    • MAXTRANSFERSIZE = 4194304

This fixed the issue, but PITR is the recommended method for SQL MI backups and restores. It’s what you should plan on using. Using a traditional, COPY_ONLY, backup method is technical debt. The first question asked by every level of support was, “Why aren’t you using PITR?”. Move to PITR as soon as possible for all of your MI processes.

SQL agent job errors

The final blocker for us to migrate to SQL MI was SQL Agent job errors. In our non-production environments, we had errors requiring investigation and changes to the servers by Microsoft. They were intermittent issues, making fixing this very difficult. Resolving the errors required us to open Microsoft support tickets.

The Agent job issue was relatively easy to fix, since our Microsoft architect was able to find it in the internal support database. That database is not available to customers, so it was only accessible by our support team.

We don’t interact with localhost, so it was clear it wasn’t our issue. We also don’t specify port 11,000. We also saw ports above 11,000 (i.e., localhost,11007) – which we also don’t specify. (both of these items were verified by querying the job step table, dbo.sysjobsteps, in msdb). The fix for this was out of our hands. The Microsoft team had to roll back an ODBC driver on the server. After the driver was reverted, this error disappeared.

Agent Jobs are one of the reasons for selecting SQL MI, so this was very disruptive. We will eventually move to a different technology, such as Elastic Jobs or ADF, but be sure thoroughly test your jobs under load.

Summary

SQL Managed Instances are attractive to enterprises that don’t want to be responsible for all of the maintenance of a full SQL Server installation. That includes backups and engine maintenance. We encountered a few issues that would have stopped our migration if we hadn’t tested first or had help from the various Microsoft teams.

We were able to hit our deadlines and had a successful migration, but – it wasn’t a smooth process. There are a few items that may hamper your migration. If you know about these issues before you begin, you are more likely to be successful. Test thoroughly, talk to the technical team about hardware availability in your region, and be ready to open tickets to resolve issues. And check your error logs during and after the migration. I’ll discuss other differences and surprises in SQL MI in later posts.

References

Simple Talk is brought to you by Redgate Software

Take control of your databases with the trusted Database DevOps solutions provider. Automate with confidence, scale securely, and unlock growth through AI.
Discover how Redgate can help you

FAQs: Migrating to Azure SQL Managed Instance

1. Are DNS aliases supported in Azure SQL Managed Instance?

No. Azure blocks DNS aliases for SQL MI, and you can only use the Microsoft-assigned DNS entry. Custom aliases also create certificate issues. Plan to migrate client applications at the same time as the server – there’s no clean way to stage the DNS transition ahead of cutover.

2. Is zone redundancy always available for Azure SQL MI?

No. Zone redundancy availability depends on region and hardware generation, and the documentation can be incomplete or out of date. It’s possible to enable zone redundancy successfully in one environment and then have production deployments fail with ManagedInstanceZoneRedundantFeatureTemporarilyDisabledPerHardwareGeneration. Confirm availability with your Azure representative before planning around it.

3. How do you migrate a TDE-enabled database to Azure SQL Managed Instance?

If you’re switching to Microsoft-managed keys (recommended): disable TDE on the source, back it up, copy to blob storage, restore on MI, re-enable TDE on MI, delete the unencrypted backups, and re-enable TDE on the source if it’s staying online. Managed keys make Point in Time Recovery seamless across environments. Allow extra migration time for the decrypt/encrypt steps.

4. Why do traditional COPY_ONLY restores fail on Azure SQL MI?

Some COPY_ONLY restores fail with “Unable to cycle error log” or page corruption errors that aren’t present on the source (CHECKDB and RESTORE VERIFYONLY both pass). It’s an I/O buffer issue on the destination. Adding BLOCKSIZE = 65536 and MAXTRANSFERSIZE = 4194304 to the restore resolves it – but PITR is the recommended approach and what Microsoft support will steer you toward.

5. Why are SQL Agent jobs failing with "tcp:localhost,11000" errors?

This is a Microsoft-side issue tied to an ODBC driver on the managed instance, not something in your job configuration. The fix requires Microsoft to roll back the driver on the server, which means opening a support ticket. Test Agent jobs under load before cutover.

Article tags

About the author

Ben Johnston

See Profile

Ben is a data architect from Iowa and has been working with SQL Server since version 6.5 in the late 90's. Ben focuses on performance tuning, warehouse implementations and optimizations, database security, and system integrations.