Smarter MySQL backups and restores

Published Thursday, Jan 06 2011

Introduction

This is the first part of a mini series that will cover various topics related to MySQL administration (in particular MySQL backups and restores) as well as several tips and tricks on using MySQL in our applications. It will be a way for me to finally reorganize some notes in a more readable format, and at the same time it will be a good chance to share these notes, hoping you also find them useful! In this first part, I thought I’d put together some sort of “reference” on how to use mysqldump for backups, how to restore these dumps, and pretty many cool tricks in between. Check out also the next post for a better alternative to mysqldump that can perform hot backups of MySQL databases, without downtime, especially useful with large databases.

If you work with MySQL at any level, chances are you may need to backup your databases occasionally or on a regular basis. MySQL is a great RDBMS: it performs very well in most scenarios, has got all you need for most applications, and it’s free (and hopefully will remain so), coming bundled with a number of useful tools and utilities, also free.

But one thing about MySQL that has always disappointed me, though, is that while there is a choice of free tools for backing up databases, the tools that come bundled with MySQL server for this purpose aren’t really great. Absolutely the most popular one among these is mysqldump, as it’s very easy to use, and performs backups by producing simple text files containing DROP/CREATE/INSERT SQL statements; once replayed on the destination MySQL instance, these statements will recreate all the databases, tables and data. mysqldump can also create dumps in other formats such as CSV and XML, however SQL is the most used format.

I am not a particularly passionate fan of mysqldump, as I prefer using alternative tools that are also free and that perform dramatically better than mysqldump in a number of cases. One of the main reasons though, why I don’t like mysqldump too much and try to avoid using it whenever I can, is performance. mysqldump works just fine with small databases, but it can be a terrible pain with large or very large databases, especially if you are so unlucky to have to restore one! Pick any database with tables containing more than a few million rows, and try restoring it. Unless you are very patient and are not in rush, it is likely that it’ll take longer than you think would be reasonable, and perhaps you’ll give up and look for something else.

However, mysqldump is, still, the only backup tool that most MySQL users know and use, and it’s available wherever MySQL is. Also, at times a plain SQL-based backup may be useful, as it can often be restored more easily to a different version of MySQL without particular difficulties.

So, here are a few useful tips that can help you save some time when backing up MySQL databases with mysqldump, and restoring any of these dumps, plus some nice tips.

MySQL backups and restores, the basics…

Backups, or “dumps”, are extremely easy to create with mysqldump. Here are a few examples:

# Backup a single database
# NOTE: you can also use the shortcuts -u and -p to specify username and password
mysqldump --user=... --password=... your_database_name > your_database_name.sql

# Backup multiple databases:
mysqldump --user=... --password=... --databases db1 db2 > backup.sql

# or all the databases in one shot:
mysqldump --user=... --password=... --all-databases > backup.sql

# Optionally, it's easy to compress the backup, for example with gzip:
mysqldump --user=... --password=... your_database_name \
| gzip > your_database_name.sql

# And, if your database also has other objects (apart from tables) such as functions,
# views, and stored procedures, you can back them up too with --routines:

mysqldump --user=... --password=... --routines \
your_database_name > your_database_name.sql

# It may also be useful to include a timestamp in the target file name, so to know
# right away when a backup was taken:

mysqldump --user=... --password=... \
your_database_name > "your_database_name-$(date +%Y-%m-%d-%H.%M.%S).sql"

# If you are backing up data with the purpose of restoring it to an instance you want
# to use as replication slave, then the option --master-data is handy as it adds to
# the dump the information needed to configure the connection with the
# replication master:

mysqldump --user=... --password=... --all-databases --master-data > backup.sql

Restoring is pretty simple too. Usually it is done this way:

mysql -u.. -p... db_name < db_name.sql

The command above is very similar to the previous one, but here we tell MySQL client to import and replay the SQL statements contained in our backup file (or dump). As we’ll see later in this post, there are also smarter ways of backing up and restoring dumps created with mysqldump, that would work better in some cases.

Speeding things up

Dumps

With the most recent versions of MySQL commonly in use these days, the single most useful option available with mysqldump is –opt. This option is actually a sort of shortcut for a group of other options; among these, there are a few ones that can help speed up backups, but also -more importantly- restores.

Update: it looks like the –opt option is now switched on by default on the latest versions of MySQL, so the following details are for reference in case you use a very recent version (and you should!).

These options are:

--add-locks: Affects restores: this option ensures that each table is locked while restoring, so to allow dropping and recreating the tables. At the same time, because a table remains locked to other transactions while restoring the data, inserts happen more quickly, therefore reducing the time taken to restore the content of the table

--create-options: Affects restores: makes the creation of a table quicker by merging into the CREATE TABLE statement anything that has to do with defining the structure of the table.

--disable-keys: Affects restores: it helps when restoring databases using MyISAM as storage engine. Delays the creation of the indexes for a table until all the data in that table has been restored. This results in an overall faster restore of the table vs updating indexes while restoring the data.

--extended-insert: Affects both dumps and restores: this can speed up A LOT restores, as it produces in the final SQL dump INSERT commands with multiple sets of values, resulting in the insertion of multiple rows at once. As a side benefit vs having a separate INSERT statement for each row, the resulting SQL dump will also be smaller, taking up less storage space.

--lock-tables: Affects dumps: improves dumping of MyISAM tables, by locking all the tables during the dump.

--quick: Affects dumps: when dumping large tables, this option prevents buffering the whole tables before dumping them to the backup file. Instead, rows are fetched and dumped right away to file, resulting in an overall faster and lighter dump thanks to reduced load on the system.

Restores

As I’ve just suggested, the –opt argument also helps with speeding up restores. However there’s another trick that I use whenever I need to restore a dump, as it can save a lot of time. When a dump is being restored, I simply disable a number of checks that MySQL has to perform to ensure the integrity of the data, with foreign keys and more, and then I enable these again soon after the data has been completely restored.

One problem is that if the dump file I need to restore are pretty large, it’s not a good idea to edit the content of the file to make these changes. So.. cat to the rescue!

I basically use cat to produce a new string output containing the changes I just described plus the content of the original dump file, and then I stream this output directly to the target MySQL instance for restore:

(
echo "SET AUTOCOMMIT=0;"
echo "SET UNIQUE_CHECKS=0;"
echo "SET FOREIGN_KEY_CHECKS=0;"
cat dump.sql
echo "SET FOREIGN_KEY_CHECKS=1;"
echo "SET UNIQUE_CHECKS=1;"
echo "SET AUTOCOMMIT=1;"
echo "COMMIT;"
) | mysql -u... -p... target_database

Believe me, I would never want to restore a large database without this trick! Another tip that I’d like to suggest is having a shell function ready to use so you won’t have to type this command whenever needed. For example:

MYSQL_USER="..."
MYSQL_PASSWORD="..."

function restore() {
echo $1;
(
echo "SET AUTOCOMMIT=0;"
echo "SET UNIQUE_CHECKS=0;"
echo "SET FOREIGN_KEY_CHECKS=0;"
cat "$1.sql"
echo "SET FOREIGN_KEY_CHECKS=1;"
echo "SET UNIQUE_CHECKS=1;"
echo "SET AUTOCOMMIT=1;"
echo "COMMIT;"
) | mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" "$1"
}

You can then use this function this way:

restore db_name

Which expects there’s a file named db_name.sql. The above is just an example. As we’ll see later, you may want to add this shortcut to another shell function I am suggesting in the section “Other useful tips”.

Working with remote hosts

When you work with different environments (such as development, testing, staging, quality assurance, production, etc.), you most likely have to deal with different hosts, often located in different data centres, perhaps in different geographical regions and far away from each other. In these cases, it may be difficult to transfer MySQL dumps from an environment to another, and then restore them to different server instances. This is especially true if the connection available between the two ends is not speedy, or if the database dumps -even if compressed- are too large.

In such cases, instead of dumping databases to files, migrating these files to a different host over the Internet or other connection, and restoring the data to the target host, it is possible to skip the intermediate step and restore directly to the target host while the data is still being backed up. This is called streaming, and can be done in various ways.

Between hosts sharing the same network / between two MySQL instances on the same host / between two hosts with no firewall restrictions

The most straight forward scenario is when the source MySQL instance and the target instance share either the same host or at least the same network, or they are even in different networks and hosts but are allowed to communicate directly with each other. That is, there are no firewall rules forbidding this point-to-point communication. In these cases, you can dump data from an instance and restore it to the other instance while the dump is still being performed:

mysqldump --u... -p... --opt source_db \
| mysql -u... -p.... --host=target_host target_db

With SSH

When direct communication between the hosts is forbidden, or when the data is sensitive and you need encryption, you can use SSH. In the following example, a database is being restored directly from a remote host to a local instance:

ssh user@remote_host mysqldump -u... -p... --opt source-db \
| mysql -u... -p... target_db

Similarly, if you want to restore a local database to a remote host:

mysqldump -u... -p... source-db \
| ssh user@remote_host mysql -u... -p... --opt target_db

If you, instead, only want to dump a remote database locally, or dump a local database to a remote file system:

# dumping remote => local
ssh user@remote_host 'mysqldump -u... -p... dbname \
| gzip' > backup-`date +%Y-%m-%d`.sql.gz

# dumping local => remote
mysqldump -u... -p... dbname | gzip \
| ssh user@remote_host "cat > backup-`date +%Y-%m-%d`.sql.gz"

With netcat

I wanted to mention a third option that uses netcat and that perhaps is not very popular, but that I like a lot and works really well when there are no firewall restrictions between source and target MySQL instances. I wouldn’t use this technique when I need to transfer sensitive data as it does not use encryption as SSH does, however whenever this is not an issue transferring data this way can be a lot speedier, especially if I need to transfer really large databases!

Between SSH and netcat, it is perhaps more correct to talk about streaming if the tool in use is netcat. So it requires two simple steps. First, we need to tell netcat to listen on a custom port -for example 4567- on the target host, and to save the data it will receive from the source host to a local file:

nc -l 4567 > backup.sql.gz

As you can see, I’ve appended the extension .sql.gz to the file’s name. This is because netcat will be receiving compressed data, as we have seen in other examples. Next, we’ll execute mysqldump on the source host, and stream it output -compressed- to the target host:

mysqldump -u... -p... --opt dbname | gzip | nc -w1 target_host_ip_or_hostname 4567

One cool thing is that netcat will automatically stop listening, and will close the target dump file, as soon as the dump is completed and therefore the streaming is stopped.

Other useful tips

Exporting only the schema, not the data

If you want to quickly create a database having the same schema as an existing database, you can dump only the schema, but not the data, from the existing one and then restore the dump as usual:

mysqldump -u... -p... --no-data dbname > dbname.sql

Dumping one or multiple databases and automatically deleting dumps older than X days

If you backup your database(s) regularly, for example with a cron job, you may want to keep an eye on the disk space used by backups. It is easy to automatically delete backups older than X days, as follows:

mysqldump ... && find /backup-location/* -mtime +15 -exec rm {} \;

In the example above, we dump the databases first (I have omitted arguments for mysqldump as these will depend on your backup needs), then we find and delete backups older than 15 days.

Dumping multiple databases, but to separate files

As we’ve seen in the second section, the option –all-databases gives you the possibility to backup all the databases available. However, the databases will all be backed up to the same dump file, so sometimes you may prefer having a separate dump for each database. This can be useful in the case you need to restore one database, but need to leave the other ones alone. First, we need to find out which databases are available, and perhaps exclude some of them from the backup. Say, for instance, that you want to backup all the databases but mysql, information_schema. First, we list all the databases with

mysql -u... -p... -e "show databases;"

You should see something similar to this:

+-----------------------------+
| Database |
+-----------------------------+
| information_schema |
| database1 |
| database2 |
| mysql |
+-----------------------------+

We need to manipulate this output a little bit, so that we can extract from it just the database names, filtering out all the other characters that are not needed:

mysql -u... -p... -e "show databases;" | tr -d "| "

Which produces:

Database
information_schema
database1
database2
mysql

Almost there. We now want to remove the Database heading, and also exclude the databases mysql and information_schema, as per the example (unless you want to back them up too):

mysql -u... -p... -e "show databases;" \
| tr -d "| " \
| egrep -v "(Database|mysql|information_schema)"

Here we are:

database1
database2

We now have the clean list of the names of the databases we want to backup to separate files. All we have to do is chain this command with a simple for..do loop that will execute mysqldump for each database:

for db in \
`mysql -u... -p... -e "show databases;" \
| tr -d "| " \
| egrep -v "(Database|mysql|information_schema)"`; \
do mysqldump -u... -p... --opt --routines --databases $db > "$db.sql"; done

By executing the command above, you will create as many .sql dump files as the number of databases you have backed up.

Restoring from multiple dumps

If you dump each database to a different file, you can restore all these databases -should you need to restore all of them- this way:

for file in `ls *.sql`; do echo $file && mysql -u... -p... \
"`echo $file | sed "s/\.sql//"`" < $file; done

Note: Some frameworks, such as Rails and -usually- others based on Ruby, encourage using a different database for each environment, for a number of reasons. Convention wants that these databases be named after the environment, for example: blog_development, blog_test, and blog_production. Sometimes you may need to restore, for example, production databases to a development environment, perhaps to test some new code on live data. How would you do this, if you have backed up the production databases to separate files as described in the previous tip? In this case, your dumps would be named like dbX_production.sql (or something similar), while your development databases -provided they already exist and that you want to overwrite them with production data- would be named like dbX_development.

You can restore all the databases from the dumps with one command as follows:

for file in `ls *.sql`; do echo $file && sed "s/production/development/" $file \
| mysql -u... -p... "`echo $file | sed "s/production/development/" \
| sed "s/\.sql//"`" ; done

In the example above, we use sed to replace the “production” with “development”. So, for example the dump named db_production.sql would be restored to the database db_development without having to rename files, which can be useful if you want somehow to automate these commands.

Shell shortcuts

As I need often these commands and others, I find it useful to create some bash aliases and functions (as I usually use bash as shell), so to save some typing. I usually add to my bash profile these lines:

MYSQL_USER="..."
MYSQL_PASSWORD="..."
ENVIRONMENT="development"

mysql() {
case $1 in
alldbs ) mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -e "show databases;" \
| egrep -v "(Database|mysql|information_schema)" ;;
backup ) <commands seen in previous section> ;;
restore ) <commands seen in previous section> ;;
* ) eval '`which mysql` -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" $@';;
esac
}

# usage:
# mysql alldbs => to list the available databases
# mysql backup => to dump all the databases
# and so on...

So I just add to that function as many shortcuts as I need for the commands that I use most often. Of course, if you create such shortcuts in a production environment that can be accessed by others, or just for improved security, you may want to avoid adding your credentials in the bash profile.

…and One More Thing

In these pages I am sharing with you several useful tips on mysqldump, but hold on! As promised in the introduction this is only part of a sort of mini series on MySQL, and certainly not the last on backing up and restoring MySQL data.

As I’ve already mentioned, I only use mysqldump with small databases, while with large databases or with more complex setups (replication to name one), I prefer using other tools.

I will publish pretty soon another post describing these tools, and how to use them in those more complex setups where mysqldump, simply put, may not be enough.

So, stay tuned! In the meantime, if you find this sort of mini “reference” on mysqldump & co. useful, don’t forget to subscribe to my feed and to share with others!

Previous: A serialisable and validatable tableless model to get rid of a few tables and speed things up

Next: MySQL hot backups with Percona Xtrabackup