Questions About AWS Database Migration Service (AWS DMS) That You Were Too Shy to Ask

Can you imagine it? You are in a group of smart database people, and they are debating the finer points of AWS DMS, and you don't even know what the letters stand for. You feel too shy to ask those basic questions that seem ridiculous once you're up to speed. Laerte Junior answers all the questions you need answered when facing the prospect of getting familiar with Amazon's useful Database Migration Service.

  1. What is AWS DMS (database migration service)?
  2. I am planning to use DMS to migrate my data to the cloud. Is there anything in particular that I need to be concerned about?
  3. I’m told that I need to be concerned with the optimum size of the Replication Instance. What is a Replication Instance?
  4. When would I need MULTI-AZ?
  5. Do I really need to enable MULTI-AZ if I don’t need to use CDC?
  6. Is there some performance metric that I should be concerned about in the replication instance?
  7. If I check my replication instance and find that it isn’t the size I need, can I just resize it?
  8. How can I know which replication instance to use?
  9. Does DMS have some tool to help me in the migration that includes the Schema?
  10. Do I have limits on the number of replication instances for DMS?
  11. What are endpoints in DMS jargon?
  12. What are the sources and targets for DMS?
  13. Can I use events and notifications in DMS, so that when creating a replication instance, I can be notified by email?
  14. Can I use CLI for the operations in DMS?
  15. Can I perform data transformation on the fly?
  1. What is AWS DMS (database migration service)?

    DMS is a tool that helps you to migrate your data to the cloud and also enables ongoing replication. It supports most of the widely used databases, so it is a fantastic tool for heterogeneous migrations. It’s simple and reliable to use. It is also robust because it continually monitors the source, target, replication instances and network so that, if something happens that affects the task, it can restart the process from where it stopped once the problem has been cleared.

    DMS is one of those services from AWS that I really enjoyed working with. Of course, it has some limitations but I can wholeheartedly recommend using DMS if you want to migrate your data to the cloud.

  2. I am planning to use DMS to migrate my data to the cloud. Is there anything in particular that I need to be concerned about?

    Planning is the most important part of any migration, of course. As with any other tool, AWS DMS has its limitations, and you will need to follow some general good practices to migrate your data using DMS. These include separating big tables from small tables into several DMS tasks, reducing the load on the source, loading multiple tables in parallel and enabling the task log. As well as understanding how to use the tool, it helps to pay special attention to migrating LOB data. Most importantly, you will need to determine the right size for your replication instance.

    A lot more detail on best practice for using AWS DMS is contained in AWS DMS Best Practices: Determining the Optimum Size for a Replication Instance
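    As an illustration of separating big tables from small ones, here is a minimal Python sketch that builds one table-mapping document per task using DMS selection rules. The schema name, table names, sizes and the 50 GB threshold are all hypothetical, and the right split depends on your own data.

```python
import json


def selection_rules(schema, tables):
    """Build a DMS table-mapping document that includes only the given tables."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),      # must be unique within the mapping
            "rule-name": str(i),    # must be unique within the mapping
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return {"rules": rules}


def split_by_size(table_sizes_gb, threshold_gb):
    """Return (big, small) table-name lists, split at threshold_gb."""
    big = sorted(t for t, gb in table_sizes_gb.items() if gb >= threshold_gb)
    small = sorted(t for t, gb in table_sizes_gb.items() if gb < threshold_gb)
    return big, small


# Hypothetical tables and sizes (GB), used only for illustration.
sizes = {"orders": 250.0, "order_lines": 400.0, "customers": 3.5, "countries": 0.01}
big, small = split_by_size(sizes, threshold_gb=50)

# One mapping per task: the big tables in one task, the small ones in another.
big_task_mapping = json.dumps(selection_rules("sales", big), indent=2)
small_task_mapping = json.dumps(selection_rules("sales", small), indent=2)
print(big_task_mapping)
```

    Each JSON string would then be supplied as the table mappings of its own DMS task.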

  3. I’m told that I need to be concerned with the optimum size of the Replication Instance. What is a Replication Instance?

    It’s the most important part of a DMS implementation. All the migration work, data conversion and data manipulation that a data migration requires is done in the replication instance. It’s an EC2 instance that runs everything, and DMS will try to keep all its data in memory to speed up the process. The wrong choice of resources such as disk, CPU or memory for the replication instance can slow down the tasks, or even cause them to fail.

    Replication Instances for AWS Database Migration Service

  4. When would I need MULTI-AZ?

    Replication instances can also be set up in a MULTI-AZ environment, which gives you failover support by creating a standby replica of your replication instance in another Availability Zone. When you use CDC (change data capture, the mechanism behind ongoing replication), it is often a good option because you will have a backup, although the ongoing replication may be slower, since everything that is done on one node is replicated to the standby. If you are not using CDC, you don’t need to enable it because you can always run the load tasks again.

  5. Do I really need to enable MULTI-AZ if I don’t need to use CDC?

    Actually, no. It’s advisable to enable MULTI-AZ only if you will use CDC.

    Basically, your task can be one of three types: load-only, load and ongoing replication, or just ongoing replication.

    It only makes sense to use MULTI-AZ when you are using CDC (ongoing replication), because a load-only task needs no standby: if it fails, you can simply run the load again. If you are using load and ongoing replication, you can set the task to stop after the initial load. This lets you run the load with MULTI-AZ disabled, enable it once the load has finished, and then resume the task so that ongoing replication starts at that point.
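    The rule of thumb above can be written down as a small Python sketch. The keys are the migration-type values that the DMS API accepts for a task; the helper function and its name are only an illustration of the advice in this answer, not part of DMS.

```python
# Migration types accepted by the DMS API, mapped to the names used above.
MIGRATION_TYPES = {
    "full-load": "Load-only",
    "full-load-and-cdc": "Load and ongoing replication",
    "cdc": "Ongoing replication only",
}


def multi_az_worthwhile(migration_type):
    """Hypothetical helper: True when the task involves CDC (ongoing replication)."""
    if migration_type not in MIGRATION_TYPES:
        raise ValueError(f"unknown migration type: {migration_type!r}")
    # A load-only task can simply be run again, so a standby adds little.
    return migration_type != "full-load"


for migration_type, label in MIGRATION_TYPES.items():
    print(f"{label}: MULTI-AZ worthwhile = {multi_az_worthwhile(migration_type)}")
```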

  6. Is there some performance metric that I should be concerned about in the replication instance?

    There are two very important metrics in the replication instance: Memory and Swap. These two variables are negatively correlated, so if your free memory is low then the swap will probably be high: when there is insufficient memory, memory contents have to be swapped to disk. This prevents errors, but swapping to disk is bad for performance, so keep an eye on both metrics. If the replication instance memory is low and the swap is high, there is a high probability that your task will fail or run extremely slowly. Remember that DMS will try to do all its transformations in memory in the replication instance, and only when there is no more free memory does it swap out to disk.
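    In CloudWatch, these two values are published for a replication instance as the FreeableMemory and SwapUsage metrics, both in bytes. A minimal Python sketch of the check described above might look like this; the 10% and 5% thresholds are assumptions chosen for illustration, not AWS recommendations.

```python
GIB = 1024 ** 3


def memory_pressure(freeable_memory_bytes, swap_usage_bytes, total_memory_bytes,
                    min_free_fraction=0.10, max_swap_fraction=0.05):
    """Classify a replication instance as 'ok', 'warning' or 'critical'.

    The thresholds are illustrative assumptions and should be tuned.
    """
    low_memory = freeable_memory_bytes < min_free_fraction * total_memory_bytes
    high_swap = swap_usage_bytes > max_swap_fraction * total_memory_bytes
    if low_memory and high_swap:
        return "critical"   # low memory and heavy swapping together
    if low_memory or high_swap:
        return "warning"
    return "ok"


# Hypothetical readings for an instance with 8 GiB of memory.
print(memory_pressure(4 * GIB, 0, 8 * GIB))
print(memory_pressure(GIB // 2, 2 * GIB, 8 * GIB))
```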

  7. If I check my replication instance and find that it isn’t the size I need, can I just resize it?

    Yes, you can. You can resize your replication instance with just a few mouse-clicks in the DMS panel. It’s easy, simple and quick.
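    If you prefer a script to the console, the same change can be requested through the API. The following Python sketch builds the parameters for the DMS ModifyReplicationInstance call; the ARN and instance class are hypothetical placeholders, and the boto3 call is wrapped in a function so that nothing is sent to AWS until you call it with real credentials.

```python
def resize_request(instance_arn, new_class, apply_immediately=True):
    """Build the parameters for a ModifyReplicationInstance call."""
    return {
        "ReplicationInstanceArn": instance_arn,
        "ReplicationInstanceClass": new_class,
        # If False, the change waits for the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def resize(request):
    """Send the request; needs boto3 and AWS credentials to be configured."""
    import boto3  # imported here so the sketch runs without boto3 installed
    return boto3.client("dms").modify_replication_instance(**request)


# Hypothetical ARN and instance class, for illustration only.
request = resize_request(
    "arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE",
    "dms.c4.xlarge",
)
print(request)
```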

  8. How can I know which replication instance to use?

    To answer this question, we need to understand a little bit about how DMS works internally. All the tables are loaded individually and, by default, eight tables are loaded at a time in a task. All the transactions (transformations) that DMS applies to each table are done in memory, and when there is no more memory it swaps to disk. Also, all the tasks are logged, so you will need space on disk for that. Based on this, you need to provision for:

    • Table size – Large tables take longer to load, and the ongoing transactions must be cached (memory) until the table is loaded. After that, the transactions are applied and no longer need to be cached.
    • DML – A busy transactional database will generate a lot of transactions, and that will require memory resources. Remember that transactions are applied after the table is loaded.
    • Transaction sizes – Again, the transactions need to be cached before they are applied, so if you have a huge transaction that accumulates many megabytes of data, that data will need to be cached first (memory) before it is applied.
    • Size of migration – Of course, large migrations take longer and, because each task is logged, large log files will be created.
    • Number of tasks – More tasks, more log files.

    So, based on these aspects, you can choose the right replication instance for you. As we saw in the last question, you can resize your replication instance at any time.
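    The number of tables loaded in parallel and the task logging mentioned above are both controlled through the task settings JSON. A minimal Python sketch follows, assuming only the two settings discussed here; a real task settings document has many more sections, which DMS fills with defaults.

```python
import json


def task_settings(parallel_tables=8, enable_logging=True):
    """Build a partial DMS task-settings document (a sketch, not a full one)."""
    if parallel_tables < 1:
        raise ValueError("parallel_tables must be at least 1")
    return {
        # How many tables are loaded at the same time (the default is 8).
        "FullLoadSettings": {"MaxFullLoadSubTasks": parallel_tables},
        # Task logging; remember that log files need disk space.
        "Logging": {"EnableLogging": enable_logging},
    }


# Fewer parallel tables eases memory pressure on a small replication instance.
settings_json = json.dumps(task_settings(parallel_tables=4), indent=2)
print(settings_json)
```

    Lowering the number of tables loaded in parallel trades load time for memory, which can be a reasonable alternative to moving to a larger instance class.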

  9. Does DMS have some tool to help me in the migration that includes the Schema?

    Yes, DMS includes an awesome tool called the Schema Conversion Tool (SCT). This application will help you to migrate between heterogeneous databases. If I want to migrate from SQL Server to MySQL, though I am not sure why I would ever want to do that, the SCT will read the original schema from SQL Server and then create a new schema for MySQL. You can apply the converted schema, or even create a DMS task, from SCT. I recommend the use of this tool.

    AWS Schema Conversion Tool Documentation

  10. Do I have limits on the number of replication instances for DMS?

    Yes, you do. For the explanation see here:

    Limits for AWS Database Migration Service

  11. What are endpoints in DMS jargon?

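    Endpoints are the connection definitions for your source and target databases: each task reads from a source endpoint and writes to a target endpoint.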

  12. What are the sources and targets for DMS?

    DMS can deal with a number of different RDBMS products, though the sources and targets are most likely to be Oracle, SQL Server or MySQL (on-premises and RDS). Check the AWS DMS documentation to see the versions of the databases that are currently supported.

  13. Can I use events and notifications in DMS, so that when creating a replication instance, I can be notified by email?

    Yes, and it works like a charm. Through SNS (Amazon Simple Notification Service), you can be notified when a DMS event occurs, and you can use any kind of notification that SNS supports in your region, such as email, text, call or HTTP endpoint.

    AWS DMS groups events into categories. You can subscribe to a category and you will be notified whenever an event in that category occurs. As an example, if you want to be notified when the replication instance changes its class, you can subscribe to ‘Category Configuration Change, DMS Event ID – DMS-EVENT-0012’ that corresponds to “REP_INSTANCE_CLASS_CHANGING – The replication instance class for this replication instance is being changed.”

    You can go further than this by having the event notification trigger a Lambda function that takes some action. The sky is the limit.

    For a complete list of events, check the AWS Database Migration Service documentation, specifically Working with Events and Notifications.
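    As a sketch of how such a subscription could be created from a script, the Python below builds the parameters for the DMS CreateEventSubscription call. The subscription name, SNS topic ARN and instance identifier are hypothetical, the category names should be checked against the documentation, and the boto3 call is kept in a separate function so that nothing is sent until you call it with real credentials.

```python
def event_subscription_request(name, sns_topic_arn, instance_ids, categories):
    """Build the parameters for a CreateEventSubscription call."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": "replication-instance",
        "EventCategories": list(categories),
        "SourceIds": list(instance_ids),
        "Enabled": True,
    }


def create_subscription(request):
    """Send the request; needs boto3 and AWS credentials to be configured."""
    import boto3  # imported here so the sketch runs without boto3 installed
    return boto3.client("dms").create_event_subscription(**request)


# Hypothetical names and ARN, for illustration only.
request = event_subscription_request(
    name="dms-instance-changes",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:dms-alerts",
    instance_ids=["my-replication-instance"],
    categories=["configuration change", "creation"],
)
print(request)
```

    Any email address subscribed to the SNS topic would then receive a message for events in those categories.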

  14. Can I use CLI for the operations in DMS?

    Yes, of course, and I recommend that you do, because some operations can only be done through the CLI. It’s a little more complex than the console, but I am pretty sure you will be able to manage it.

  15. Can I perform data transformation on the fly?

    No, unfortunately at this point DMS does not allow you to change data on the fly. If you need to do it, you will basically need to change it in the source. Using Oracle as the source, you can make your data transformations in views and DMS will pull the data from them. This option is in Extra connection attributes, in the Advanced section of the Oracle source endpoint. Bear in mind that the view will become a table in the target schema.