Backing up MySQL Part 3: mysqlpump

In this blog, we walk you through the brother of mysqldump - mysqlpump - and tell you everything about it. Have a read!

In the MySQL world, there are a couple of ways to take backups of your data. Backups in MySQL can be categorized into two distinct categories: logical or physical. MySQL also comes with a lot of tools helping us achieve our backup objectives: we have walked you through one of the most popular offerings in this space – Percona XtraBackup – in the second part of this series, and the first part of this series covered one of the main logical MySQL backup tools called mysqldump. As far as backups are concerned, though, we are far from done – for example, mysqldump alone has a brother called mysqlpump. Since the two CLI-based backup tools are frequently confused with each other, the aim of this blog is to clear the confusion up a little.

What is mysqlpump?

mysqlpump is a backup utility that is used via the command-line interface. The tool is very similar to mysqldump in that it provides us with the ability to take logical backups, but also different at the same time – the goal of mysqlpump is to be an extendable, parallel-supporting replacement of mysqldump. In their blog from 2015, MySQL team said that one of the primary aims of introducing mysqlpump was not be forced to implement legacy functionality that is provided by mysqldump.

mysqlpump was introduced to the world of MySQL in 2015 – the tool was added to MySQL 5.7.8.

Using mysqlpump

The most basic use case of mysqlpump looks like this (the tool can be used in the same fashion both on Linux and on Windows):

mysqlpump is designed in such a way that it will only dump tables that were created by a user – in other words, it will refrain from backing up tables that were created internally by MySQL (it won‘t touch any of the internal tables.)

The one feature that is exclusive to mysqlpump, though, is the ability to issue backups using multiple threads at once – to figure out how many threads your system has, follow these steps:

  1. Linux users – issue a command like lscpu or lscpu | egrep ‘Thread|CPU\(s\)’ and look at the output (you should see something similar to “Thread(s) per core.”)
  2. Windows users – open up task manager and look at the number next to “Logical processors” – they represent the amount of threads in the system:

Once you know the number of threads in your system, think about the number of threads you wish (and can) allocate to mysqlpump – things might seem difficult and complex, but your thought process shouldn’t take more than a couple of seconds – mysqlpump is usually quick to complete too.

Once you know the number of threads in your infrastructure, configure the number of threads that can be used together with mysqlpump. Threads are to be configured using the --default-parallelism and --parallel-schemas options. Also, consider providing the username and password that you’re using to access MySQL itself through my.cnf for secure access: passwords provided through the CLI can be observed by accessing the history of issued commands. If you don’t want to do that, feel free to provide a username and a password, but be aware that mysqlpump itself will issue you warnings if you do so:

mysqlpump --default-parallelism=[threads] > data.sql

Since the “specialty” of mysqlpump is to back up data using multiple threads, we can also define the number of threads that we want the tool to use to dump a specific database – this functionality of mysqlpump can be exceptionally useful if we’re working with bigger sets of data, but need a logical backup to be taken.

mysqlpump – the Details

  • To specify more threads to dump a large database and less threads for smaller databases, look at the option named parallel-schemas. If you define your parameters like so, 6 threads will be used for larger databases, and 2 for smaller ones:
    mysqlpump --parallel-schemas=6:large_db1,large_db2 --parallel_schemas=2:small_db [--default-parallelism=x] > backup.sql
    Note the --default-parallelism option: a default number of threads when dumping other databases (databases that are not defined) can also be specified.
  • To back up only specific databases, define your query like so (here db_1 and db_2 represent two separate databases):
    mysqlpump –-databases db_1 db_2 > databases.sql
    You can also accomplish the same goal like so:
    mysqlpump --include-databases=db_1,db_2 --result-file=dump.sql
  • mysqlpump can also perform an “empty” dump (meaning that it can only back up the schema of the database, but not the data contained within):
    mysqlpump --include-databases=data_1,db2,db3 --skip-dump-rows --result-file=mydata.sql
  • mysqlpump also gives us the ability to use wildcards inside of any parameter. That means that if we have a lot of databases, backing a part of them (only those starting with the letter x, for example) is a breeze:
    mysqlpump --include-databases=x% --result-file=s_databases.sql
  • We can also exclude databases from being backed up like so (the following query would exclude all databases starting with test_ and back up all the rest):
    mysqlpump --exclude-databases=test_% --result-file=data.sql
  • We can also work with entire patterns (the following query would exclude all tables matching the demo pattern from the beginning):
    mysqlpump --exclude-tables=__demo --result-file=backup.sql
  • And last but not least, mysqlpump is also able to work with events and routines – all the same, just specify events or routines instead of databases and tables:
    mysqlpump --[include|exclude]-[events|routines]=title1,title2 --result-file=verycoolbackup.sql

mysqlpump can do a number of other things as well – all of the information on mysqlpump, as with everything MySQL-related, can be found at the documentation.

The Downsides of mysqlpump

However, as powerful as mysqlpump is, it’s also not without its weaknesses. Research made by Giuseppe Maxia back in September 2015 suggests that mysqlpump is faster than mysqldump, but only slightly – the blogger provides an example where mysqldump takes 3 minutes and 33 seconds to execute, whereas mysqlpump takes 2 minutes and 55 seconds – the difference is there, but we presume that the blogger has expected it to be way bigger than it was.

The research was made with approximately 20 million rows inside of the tables in the database – a figure big enough for both scripts to handle, but not large enough to invoke SELECT * INTO OUTFILE (a command used to back up bigger sets of data.) This suggests that mysqlpump is good if we find ourselves needing to use specific options not available to mysqldump, but doesn’t provide much of an upside otherwise.

Summary

mysqlpump entered the MySQL scene in 2015 with the release of MySQL 5.7.8. At first it was thought that it could be a reliable replacement for mysqldump and also offer a couple of “exotic” options not available to its counterpart, but as time went by mysqlpump proved that it still needs more refinement to be considered a reliable replacement for mysqldump.

Both mysqldump and mysqlpump are part of the MySQL’s ecosystem to this day, yet people steer towards mysqldump more than they do towards mysqlpump – while mysqlpump has its use cases, it is thought that its speed is not significant enough to outweigh its counterpart.

We hope that you’ve enjoyed reading this article, stay around the Redgate blog to learn more about everything database-related, and until next time.