Moving databases between servers with minimal downtime

Recently we decided to roll out new database infrastructure for many of our backed systems. One of the challenges we faced after deploying the new infrastructure was how to migrate the data from the current production database server to the new server without having to take the system offline while we moved the data. The easiest way to move the data would have been to take the service offline, move the data over, and then point clients at the new server; however, this outage is a massive inconvenience for customers, and so we wanted to find a way to do it without any downtime, and without any data loss. 

Moving the data

The first method we thought of was to snapshot the data and load it onto the new server without shutting down the service. However, this meant that any new data or updates that came in while we were loading the snapshot onto the new server would have been lost if we switched the clients to the new server. 

Replication to the rescue! If you enable the binary-log in MariaDB, you can instruct mysqldump to store the current GTID in the output file. We loaded the snapshot onto the new server, and then set up simple replication from the old server to the new one. Since the master GTID was also loaded as a part of the snapshot, our replica server easily fetched all the changes since the snapshot, and we had identical data between the two systems. Even as new transactions were executed on the old server, they were replicated to the new server, so there would be no data loss whenever we switched the clients over. 

Pointing clients at the new server

Now that we had identical data on both servers, we needed to tackle the problem of pointing the clients at the new server. It was important that all the clients moved simultaneously because our data replication is only one way. Changes made on the new server won’t be pushed back to the old server, and state may quickly diverge. This can at best lead to duplicate records, and at worst may cause some catastrophic failures, depending on your data. We considered simply updating the DNS records the clients used to connect to the server; however, we realized that due to propagation time and caching the change wouldn’t be instant for all clients. 

Instead of trying to find a way to move all the clients at once, we wondered if there was a way to remove that requirement. We briefly considered enabling bidirectional replication between the two servers. Since updates from either service would replicate to the other, we could be sure that data would be identical across both. We quickly realized that bi-directional replication has issues resolving auto-incrementing IDs consistently as we almost immediately ran into duplicate key errors. This is a common problem in multi-master replication if you have two clients writing to the same database at the same time on different servers. 

The method we used settled on to tackle this problem was to make use of ProxySQL, a layer 7 proxy server for MySQL and MariaDB. By setting up ProxySQL on the original server to proxy connections to the new server, we created two entrances to the same system instead of two separate systems replicating data. This meant that it didn’t matter which server a given client was connected to, it all went to the same destination. When we were sure that the two servers were in-sync, we severed the replication, quickly stopped the MariaDB process on the original server and enabled the ProxySQL process in its place. Clients experienced a momentary blip, but were immediately reconnected. 

Since it no longer mattered which server a client was connected to, we didn’t have to move them all at once. We were able to update the DNS record and watch as connections on the original server dwindled and eventually ceased. Once we were sure nothing was connecting to the old server, we simply shut it down.

Summary

With the help of replication and ProxySQL, we were able to completely migrate all the data from one database server to another, re-point all clients to the new server, and decommission the old server, without any configuration changes needed on the client side, and with only a momentary blip of downtime. 

Pitfalls and suggestions

  • Because the method to ensure consistent data relies heavily on replication, be sure to get some experience with how replication works in MariaDB/MySQL before using this technique on production data, especially GTID-based replication. 
  • ProxySQL requires a copy of all user credentials for the backend server it is connecting to. If you swap to ProxySQL before configuring the users, clients will not be able to connect. Percona has written a useful tool that can sync users from the backend server to ProxySQL: https://github.com/percona/proxysql-admin-tool
  • Triple check that your ProxySQL setup works correctly by binding it to another port and checking connections before switching to it. You’ll need to restart ProxySQL with the --reload flag when you change the config file. ProxySQL ignores the config file after the first run unless the reload flag is passed in.
  • Replication will work between servers running different versions of MariaDB, but to ensure that everything works smoothly, the replica should be running the higher version. This can be useful if you need to upgrade the OS version of your server and the versions of MariaDB differ. 
  • Replication from a MySQL server to a MariaDB server works, but with some considerations. Be sure to read up on it before attempting.
  • This technique can be used to pull a standalone server into a cluster, by replicating the data from the standalone server into the master node of the cluster. Keep in mind that some services handling dynamic routing or failover may treat the master node differently when it is replicating from another source than if it were a simple master node. In particular, replication-manager will reclassify the node as a “relay”, and complain that the cluster has no master node.
  • If you’re migrating data into a server (or cluster) that already has data, be sure NOT to override any databases that already exist. In particular, be sure not to override the mysql database that contains user credentials. You can pass in the --databases flag to mysqldump when taking the snapshot and only include specific databases you want to move. You will also need to set up a replication filter to ensure that only those databases are replicated.