Database migration with AWS DMS

3 minute read

AWS Database Migration Service (DMS) is a cloud service that enables one-time or continuous migration of data between a source and a target: these could be relational or NoSQL databases, or other types of data stores.

At a high level, DMS is an EC2 instance which runs one or more replication tasks that perform ETL (extract-transform-load) from a source endpoint to a target endpoint.

Replication tasks can run in one of three different modes (migration type options):

  • Full Load: this migrates all (and only) data that existed when the task started
  • Change Data Capture (CDC) only: this only replicates data changes that happened after the task started
  • Full Load + CDC: this is a combination of the above two; first, all existing data is migrated; then, all changes that have happened in the meanwhile and will happen from now on will be replicated

The behaviour of a replication task is determined by so-called table-mapping rules, which include:

  • selection rules: which tables, views, and schemas to load
  • transformation rules: what transformations to apply to source tables and schemas before they are loaded in the target
  • table-setting rules: how DMS tasks should load data for individual tables

Here is an example of such rules, in JSON:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "sel_shoes_brand",
      "object-locator": {
        "schema-name": "src",
        "table-name": "shoes_brand"
      },
      "rule-action": "include",
      "load-order": 2,
      "filters": []
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "sel_shoes_model",
      "object-locator": {
        "schema-name": "src",
        "table-name": "shoes_model"
      },
      "rule-action": "include",
      "load-order": 1,
      "filters": []
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "change_column_type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "shoes_model",
        "column-name": "neutral_running"
      },
      "rule-action": "change-data-type",
      "data-type": {
        "type": "boolean"
      }
    }
  ]
}

The above snippet showcases selection and transformation rules: for example, the change_column_type rule changes the type of the neutral_running column to boolean. This can be useful, for example, when migrating from one database engine to another (e.g. to convert MySQL’s tinyint to PostgreSQL’s boolean). Check the documentation for an exhaustive list of available actions.

Note: The load-order fields in the above selection rules instruct DMS to first load the whole shoes_brand table (load_order: 2), and only then load the shoes_model table (load_order: 1): the higher the value, the higher its priority. If your tables have foreign keys, this is critical to ensure a predictable replication order.

Replication tasks can be created using the DMS web console, the AWS CLI, or CloudFormation. For example, this is what a CloudFormation template would look like:

AWSTemplateFormatVersion: "2010-09-09"
Description: My Replication Task
Resources:
  ReplicateMyData:
    Type: AWS::DMS::ReplicationTask
    Properties:
      MigrationType: full-load-and-cdc
      ReplicationInstanceArn: arn:aws:dms:eu-west-1:123456789012:rep:my-replication
      ReplicationTaskIdentifier: my-replication-task
      SourceEndpointArn: arn:aws:dms:eu-west-1:123456789012:endpoint:ABC
      TargetEndpointArn: arn:aws:dms:eu-west-1:123456789012:endpoint:XYZ
      ReplicationTaskSettings: |
        {
          "TargetMetadata": { ... },
          "FullLoadSettings": { ... },
          "Logging": { ... },
          "ControlTablesSettings": { ... },
          "StreamBufferSettings": { ... },
          "ChangeProcessingDdlHandlingPolicy": { ... },
          "ErrorBehavior": { ... },
          "ChangeProcessingTuning": { ... },
          "PostProcessingRules": null,
          "CharacterSetSettings": null,
          "LoopbackPreventionSettings": null,
          "BeforeImageSettings": null,
          "FailTaskWhenCleanTaskResourceFailed": false
        }        
      TableMappings: |
        {
          "rules": [
            ...
          ]
        }        

and here’s how such a template could be deployed (assuming you are already logged into AWS CLI):

aws cloudformation deploy --template-file my_replication_task.yaml --stack-name my-stack --capabilities CAPABILITY_IAM

After its deployment, the task’s lifecyle can be controlled either in the DMS web console or using the AWS CLI. Below are some of the available commands:

# start it
aws dms start-replication-task --replication-task-arn $ARN --start-replication-task-type start-replication
# stop it
aws dms stop-replication-task --replication-task-arn $ARN
# resume it, after it has been stopped
aws dms start-replication-task --replication-task-arn $ARN --start-replication-task-type resume-processing