Database migration with AWS DMS
AWS Database Migration Service (DMS) is a cloud service that enables one-time or continuous migration of data between a source and a target: these could be relational or NoSQL databases, or other types of data stores.
At a high level, DMS is an EC2 instance which runs one or more replication tasks that perform ETL (extract-transform-load) from a source endpoint to a target endpoint.
Replication tasks can run in one of three different modes (migration type options):
- Full Load: this migrates all (and only) data that existed when the task started
- Change Data Capture (CDC) only: this only replicates data changes that happened after the task started
- Full Load + CDC: this is a combination of the above two; first, all existing data is migrated; then, all changes that have happened in the meanwhile and will happen from now on will be replicated
The behaviour of a replication task is determined by so-called table-mapping rules, which include:
- selection rules: which tables, views, and schemas to load
- transformation rules: what transformations to apply to source tables and schemas before they are loaded in the target
- table-setting rules: how DMS tasks should load data for individual tables
Here is an example of such rules, in JSON:
{
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "sel_shoes_brand",
"object-locator": {
"schema-name": "src",
"table-name": "shoes_brand"
},
"rule-action": "include",
"load-order": 2,
"filters": []
},
{
"rule-type": "selection",
"rule-id": "2",
"rule-name": "sel_shoes_model",
"object-locator": {
"schema-name": "src",
"table-name": "shoes_model"
},
"rule-action": "include",
"load-order": 1,
"filters": []
},
{
"rule-type": "transformation",
"rule-id": "3",
"rule-name": "change_column_type",
"rule-target": "column",
"object-locator": {
"schema-name": "%",
"table-name": "shoes_model",
"column-name": "neutral_running"
},
"rule-action": "change-data-type",
"data-type": {
"type": "boolean"
}
}
]
}
The above snippet showcases selection and transformation rules: for example, the change_column_type
rule changes the type of the neutral_running
column to boolean
. This can be useful, for example, when migrating from one database engine to another (e.g. to convert MySQL’s tinyint
to PostgreSQL
’s boolean
). Check the documentation for an exhaustive list of available actions.
Note: The load-order
fields in the above selection rules instruct DMS to first load the whole shoes_brand
table (load_order: 2
), and only then load the shoes_model
table (load_order: 1
): the higher the value, the higher its priority. If your tables have foreign keys, this is critical to ensure a predictable replication order.
Replication tasks can be created using the DMS web console, the AWS CLI, or CloudFormation. For example, this is what a CloudFormation template would look like:
AWSTemplateFormatVersion: "2010-09-09"
Description: My Replication Task
Resources:
ReplicateMyData:
Type: AWS::DMS::ReplicationTask
Properties:
MigrationType: full-load-and-cdc
ReplicationInstanceArn: arn:aws:dms:eu-west-1:123456789012:rep:my-replication
ReplicationTaskIdentifier: my-replication-task
SourceEndpointArn: arn:aws:dms:eu-west-1:123456789012:endpoint:ABC
TargetEndpointArn: arn:aws:dms:eu-west-1:123456789012:endpoint:XYZ
ReplicationTaskSettings: |
{
"TargetMetadata": { ... },
"FullLoadSettings": { ... },
"Logging": { ... },
"ControlTablesSettings": { ... },
"StreamBufferSettings": { ... },
"ChangeProcessingDdlHandlingPolicy": { ... },
"ErrorBehavior": { ... },
"ChangeProcessingTuning": { ... },
"PostProcessingRules": null,
"CharacterSetSettings": null,
"LoopbackPreventionSettings": null,
"BeforeImageSettings": null,
"FailTaskWhenCleanTaskResourceFailed": false
}
TableMappings: |
{
"rules": [
...
]
}
and here’s how such a template could be deployed (assuming you are already logged into AWS CLI):
aws cloudformation deploy --template-file my_replication_task.yaml --stack-name my-stack --capabilities CAPABILITY_IAM
After its deployment, the task’s lifecyle can be controlled either in the DMS web console or using the AWS CLI. Below are some of the available commands:
# start it
aws dms start-replication-task --replication-task-arn $ARN --start-replication-task-type start-replication
# stop it
aws dms stop-replication-task --replication-task-arn $ARN
# resume it, after it has been stopped
aws dms start-replication-task --replication-task-arn $ARN --start-replication-task-type resume-processing