Resque: automatically kill stuck workers and retry failed jobs

Resque is a great piece of software by Github that makes it really easy to perform some operations (‘jobs’) asynchronously and in a distributed way across any number of workers. It’s written in Ruby and backed by the uber cool Redis key-value data store, so it’s efficient and scalable. I’ve been using Resque in production for a couple of years now, after it replaced Delayed Job in my projects, and love it. If your projects do something that could be done asynchronously, you really should check it out if you haven’t yet.

At OnApp we’ve been using Resque for a while to process background jobs of various types, with great results: in a few months, we’ve processed a little over 160 million jobs (at the moment of this writing), and out of that many only 43K jobs have been counted as failed so far. However, many of these failed jobs were retried successfully at a subsequent attempt, so the number of jobs that actually failed is a lot smaller, perhaps a few thousand.

Out of 160M+ jobs, that’s a very small percentage of failures. But even though the system has been rock solid for the most part, jobs can still fail every now and then depending on the nature of the jobs, excessive load on the worker servers, temporary networking and timeout issues, or design-related issues such as race conditions and the like. Sometimes you will also find that workers can get “stuck”, usually requiring manual intervention (as in: kill / restart the workers, manually sort out failed jobs).

So I wanted to share a simple script I am using in production to automatically find and kill these “stuck” workers, and then retry any jobs that are flagged as ‘failed’, whether because the workers were killed or for other reasons. The purpose is to keep the workers running and minimise the need for manual intervention when something goes wrong.

Please note that I use resque-pool to manage a pool of workers more efficiently on each worker server. Therefore if you manage your workers in a different way, you may need to adapt the script to your configuration.
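
For context, resque-pool is configured with a simple YAML file that maps queue names to the number of workers watching each queue. Here is a purely illustrative sketch: the queue names are taken from the ps output further down, and the worker counts are made up.

# config/resque-pool.yml (illustrative only)
production:
  push_notifications: 2
  usage_data_collection: 4
  check_client_balance: 2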

You can find the little script in this gist, but I’ll briefly explain here how it works. It’s very simple, really. First, the script looks for the processes that are actually working off jobs:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque
10088 resque-pool-master: managing [10097, 10100, 10107, 10113, 10117, 10123, 10138, 10160, 10167, 10182, 10195]
10097 resque-1.20.0: Forked 16097 at 1337878130
10100 resque-1.20.0: Forked 16154 at 1337878131
10107 resque-1.20.0: Waiting for cdn_transactions_collection
10113 resque-1.20.0: Waiting for usage_data_collection
10117 resque-1.20.0: Waiting for usage_data_collection
10123 resque-1.20.0: Waiting for check_client_balance
10138 resque-1.20.0: Waiting for check_client_balance
10160 resque-1.20.0: Waiting for geo_location
10167 resque-1.20.0: Forked 16160 at 1337878131
10182 resque-1.20.0: Forked 16163 at 1337878132
10195 resque-1.20.0: Waiting for services_coordination
16097 resque-1.20.0: Processing push_notifications since 1337878130
16163 resque-1.20.0: Processing push_notifications since 1337878132

This is an example from one of our worker servers. The Processing processes are those that are actually working off jobs, so these are the ones we are after, since these are the processes that can sometimes get “stuck” for one reason or another. So the script first looks for these processes only, ignoring the rest:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque | grep Processing
18956 resque-1.20.0: Processing push_notifications since 1337878334
19034 resque-1.20.0: Processing push_notifications since 1337878337
19052 resque-1.20.0: Processing usage_data_collection since 1337878338
19061 resque-1.20.0: Processing usage_data_collection since 1337878338
19064 resque-1.20.0: Processing usage_data_collection since 1337878339
19066 resque-1.20.0: Processing usage_data_collection since 1337878339

Next, the script loops through these processes, and looks for those that have been running for over 50 seconds. You may want to change this threshold, but in our case all jobs should usually complete in a few seconds, so if some jobs are still found after almost a minute, something is definitely going on.

ps -eo pid,command |
  grep [r]esque |
  grep "Processing" |
  while read PID COMMAND; do
    if [[ -d /proc/$PID ]]; then
      SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

      if [ $SECONDS -gt 50 ]; then
        kill -9 $PID
        ...

        QUEUE=`echo "$COMMAND" | cut -d ' ' -f 3`

        echo "
The forked child with pid #$PID (queue: $QUEUE) was found stuck for longer than 50 seconds.
It has now been killed and job(s) flagged as failed as a result have been re-enqueued.

You may still want to check the Resque Web UI and the status of the workers for problems.
" | mail -s "Killed stuck Resque job on $(hostname) PID $PID" email@address.com

        ...
      fi
    fi
  done

I was looking for a nice and easy way to find out how long (in seconds) a process had been running, and the expression you see in the code snippet above was the nicest solution I could find (hat tip to joseph for this).
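
For reference, here is that calculation broken out into a small standalone function, which may make it a bit easier to read (just a sketch; like the original expression, it assumes the usual USER_HZ of 100 clock ticks per second):

# Elapsed seconds for a given PID: field 22 of /proc/<pid>/stat is the process
# start time in clock ticks since boot, and /proc/uptime is the host uptime in seconds.
elapsed_seconds() {
  local pid=$1
  local uptime=$(awk -F. '{print $1}' /proc/uptime)
  local started=$(( $(awk '{print $22}' /proc/$pid/stat) / 100 ))
  echo $(( uptime - started ))
}

# usage: elapsed_seconds 16097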

If any of the Resque processes that are working off jobs are found running for longer than 50 seconds, then these are killed without mercy and a notification is sent to some email address just in case.

First, note that this way we don’t actually kill the Resque workers, but the child processes forked by the workers in order to process jobs. This means that the workers remain up and running, and soon after they’ll fork new processes to work off other jobs from the queue(s) they are watching. This is the nicest part, in that you don’t need to manually kill the actual workers and then restart them in order to keep the worker servers going.

Second, killing those processes will cause the jobs that they were processing to fail, so they will appear in Resque’s “failed jobs” queue. The second part of the script takes care of this by running a rake task that re-enqueues all failed jobs and clears the failed jobs queue. For starters, you’ll need to add this rake task to your application. If you are already using Resque, you will likely have a lib/tasks/resque.rake file, otherwise you’ll have to create one (I’m assuming here it’s a Rails application).

In any case, add the following task to that rake file:

desc "Retries the failed jobs and clears the current failed jobs queue at the same time"
task "resque:retry-failed-jobs" => :environment do
(Resque::Failure.count-1).downto(0).each { |i| Resque::Failure.requeue(i) }; Resque::Failure.clear
end

Back to the script: if it has found and killed any stuck workers, it then proceeds to run the above rake task so as to retry the failed jobs:

ps -eo pid,command |
  grep [r]esque |
  grep "Processing" |
  while read PID COMMAND; do
    if [[ -d /proc/$PID ]]; then
      SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

      if [ $SECONDS -gt 50 ]; then
        ...
        touch /tmp/retry-failed-resque-jobs
        ...
      fi
    fi
  done

if [[ -f /tmp/retry-failed-resque-jobs ]]; then
/bin/bash -c 'export rvm_path=/usr/local/rvm && export HOME=/home/deploy && . $rvm_path/scripts/rvm && cd /var/www/sites/dashboard/current/ && /usr/local/bin/rvm rvmrc load && RAILS_ENV=production bundle exec rake resque:retry-failed-jobs'
fi

You may notice that I am forcing the loading of RVM before running the rake task; this is because I need to upgrade some stuff on the worker servers, but you may not need to run the rake task this way.

This is basically it: the script just kills the stuck workers and retries the failed jobs without requiring manual intervention; in almost all cases, I no longer have to worry about them, besides wondering whether there’s a design issue that might cause workers to get stuck and that therefore needs to be addressed (which is a good reason to keep an eye on the notifications). There might be other monitoring solutions of various types out there, but this simple script is what has been working best for me so far on multiple worker servers with tens of workers.

The final step is to ensure that this script runs frequently so as to fix problems as soon as they arise. The script is extremely lightweight, so in my case I just schedule it (with cron) to run every minute on each server.
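
For reference, the crontab entry looks something like this (the script name and path here are just an example):

# check for stuck Resque workers every minute
* * * * * /scripts/check_resque_workers.sh > /dev/null 2>&1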

Know of a better way of achieving the same result? Please do let me know in the comments.

KVM LVM virtual machines: backups, cloning, and more

This is a follow up to my previous post, so if you haven’t set up a KVM LVM virtual machine host yet, you may want to read that post first.

In the previous post I described how to set up a very simple virtualised environment based on KVM, using LVM for the storage layer. I also mentioned that I usually do part of the administration with the bundled GUI, virt-manager, while I use command line tools for those administration tasks that aren’t available in the GUI; in this second part I’ll describe some of the ones I use most often.

Backing up a KVM LVM virtual machine

When I need to back up a virtual machine, I usually just make a backup of the virtual disk attached to it. It certainly makes sense to also back up the configuration of the virtual hardware and any other settings concerning the virtual machine, but since creating a new virtual machine with virt-manager requires only a few clicks, I usually back up just the virtual disk.
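
That said, if you do want to keep a copy of the virtual hardware configuration as well, virsh can dump it as XML; something along these lines should do (the backup path is just an example):

virsh dumpxml <vm name> > /storage/backups/<vm name>.xml

# if ever needed, the VM can then be recreated from that dump with:
# virsh define /storage/backups/<vm name>.xml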

A virtual disk is not very different from any raw disk, so the simplest way of backing up one is using the dd utility. For simplicity, I am assuming here that you have followed my advice in the previous post to use LVM for the storage pool containing your virtual disks. If that’s the case, each virtual machine’s disk will be an LVM logical volume and it will be mapped as /dev/<volume group>/<virtual machine>. You can list the available logical volumes with the lvs command.

For example, since in my case I store the virtual machines in the virtual-machines volume group, I can run the following command to list the virtual disks currently available in that volume group:

root@vmserver:~# lvs virtual-machines
  LV           VG               Attr   LSize Origin Snap%  Move Log Copy%  Convert
  backup-tests virtual-machines -wi-a- 7.81g
  ha1          virtual-machines owi-ao 7.81g

As you can see, the disk for this sample virtual machine is available at /dev/virtual-machines/backup-tests.

If I want to make a backup of this raw disk as it is, I can run:

dd if=/dev/virtual-machines/backup-tests of=<backup file name> bs=512K

This will create a copy of the raw disk identical in size to the original disk. Intuitively, if is the source disk, of is the name of the file that’ll contain the backup copy of the disk, and bs is the block size. The optimal value for this last parameter may depend on the particular configuration (both hardware and software) of your system, but in my case I found that 512K consistently yields the highest transfer rates.

To restore a virtual machine’s disk from a backup, I can just swap source and target disks:

dd if=<backup file name> of=/dev/virtual-machines/ha1 bs=512K

Of course, remember to shut down the virtual machine before restoring the disk to a previous state.
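
You can do that from virt-manager, or with virsh if you prefer the command line; for example:

virsh shutdown <vm name>

# make sure it's actually powered off before touching its disk
virsh list --all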

You may also want to compress your backups to save storage space:

# backup
dd if=/dev/virtual-machines/backup-tests bs=512K | gzip -9 > <backup file name>

# restore
gunzip -c <backup file> | dd of=/dev/mapper/<target device> bs=512K

One thing you’ll notice if you haven’t used the dd utility before is that it doesn’t show any progress by default, which can be annoying if you are copying a large disk. The utility does, however, print some stats if you send the USR1 signal to its process while it’s copying. So here’s a tip to show some sort of pseudo-realtime progress during the copy:

dd if=/dev/virtual-machines/ha1 of=<backup file name> bs=512K& pid=$!; \
sleep 1; while [[ -d /proc/$pid ]]; do kill -USR1 $pid && sleep 1; done
[1] 4986
251+0 records in
250+0 records out
131072000 bytes (131 MB) copied, 0.994183 s, 132 MB/s
513+0 records in
512+0 records out
268435456 bytes (268 MB) copied, 1.99666 s, 134 MB/s
774+0 records in
773+0 records out
405274624 bytes (405 MB) copied, 2.99813 s, 135 MB/s

[...]

Update 23/5/2012: Reader David Wittman suggests a really nice tip for an easier and nicer way of showing progress during the copy using pv (pipe viewer):

root@vmserver:~# dd if=/dev/virtual-machines/ubuntu-1004-server-template bs=512K | \
pv -s 5G | \
dd bs=512K of=/storage/kvm-templates/ubuntu-1004-server.amd64
^C+03MB 0:00:08 [ 124MB/s] [======================> ] 19% ETA 0:00:3

You will notice that due to the piping the copy speed is reduced a little (with the same block size, in my case the copy was ~10MB/sec slower), but this is indeed a much nicer way of showing progress, and it requires less typing too.
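
Note that pv isn’t usually installed by default; on Ubuntu it should just be a matter of:

apt-get install pv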

Backing up with LVM snapshots

The method I just described for backing up a virtual disk only works well if the virtual machine isn’t currently in use, i.e. it is powered off. If changes are being made to the disk while a backup is in progress, the backup might be invalid and a subsequent restore could either fail or restore an unusable virtual machine. This is one of the reasons why I recommend using LVM for the storage pool: it makes it possible to instantly create a snapshot of a disk, on the fly, without even having to shut down the virtual machine first. The snapshot is a reliable “view” of the disk at the exact moment the snapshot was taken, so any changes made to the disk during the backup won’t affect that “view” of the disk at that point in time.

This enables us to make reliable backups of a virtual machine’s disk even while the virtual machine is running, by simply backing up the snapshot rather than the main disk. The first step is creating the snapshot:

lvcreate -L1G -s -n /dev/virtual-machines/ha1-snapshot /dev/virtual-machines/ha1
Logical volume "ha1-snapshot" created

You can verify the presence of the snapshot for example with lvs again:

root@vmserver:~# lvs virtual-machines
  LV           VG               Attr   LSize Origin Snap%  Move Log Copy%  Convert
  backup-tests virtual-machines -wi-a- 7.81g
  ha1          virtual-machines owi-ao 7.81g
  ha1-snapshot virtual-machines swi-a- 1.00g ha1      0.01

As you can see there’s a new logical volume with ha1 as its origin. LSize is different though: it doesn’t represent the actual size of the snapshot, but rather the maximum amount of changes to the original volume (made since the snapshot was created) that the snapshot can hold. The example sets this to 1G, but even a smaller value should be enough for making backups. It is also important to note that snapshots can seriously impact the performance of the disk subsystem: this is because snapshots work with a copy-on-write strategy, which means that the performance of the disk layer can degrade proportionally to the number of snapshots created (you can have as many snapshots as you need for a given origin volume, provided you are aware of the performance penalty).
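
While a snapshot exists, it’s also worth keeping an eye on that Snap% column: if the snapshot fills up (i.e. more changes are written to the origin volume than the snapshot can hold), the snapshot becomes invalid and any backup taken from it is useless. If needed, a snapshot can be grown like any other logical volume; for example:

# check how full the snapshot is
lvs virtual-machines

# grow the snapshot by another gigabyte if it's getting close to full
lvextend -L+1G /dev/virtual-machines/ha1-snapshot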

These performance implications suggest that LVM snapshots should not be used as a backup strategy per se. A good approach is to temporarily create a snapshot of a volume, make a backup copy from the snapshot, and discard the snapshot upon completion of the backup. This way, if the backup is quick and there aren’t many changes being made to the original volume in the meantime, the performance penalty of the snapshot remains very small and short-lived.

So, in the previous example, once I have created a snapshot I can proceed with the actual backup, by making a copy of the snapshot instead of the original volume:

dd if=/dev/virtual-machines/ha1-snapshot of=<backup file name> bs=512K& pid=$!; \
sleep 1; while [[ -d /proc/$pid ]]; do kill -USR1 $pid && sleep 1; done

Once this is completed, I can discard the (no longer needed) snapshot:

lvremove /dev/virtual-machines/ha1-snapshot
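
Putting it all together, a minimal sketch of the whole snapshot-based backup could look like this (the volume names are those from the example above, while the backup path is just an assumption):

VG=virtual-machines
LV=ha1
SNAP=${LV}-snapshot
BACKUP=/storage/backups/${LV}-$(date +%Y%m%d).img.gz

lvcreate -L1G -s -n ${SNAP} /dev/${VG}/${LV}          # create the snapshot
dd if=/dev/${VG}/${SNAP} bs=512K | gzip > ${BACKUP}   # back up the snapshot, compressed
lvremove -f /dev/${VG}/${SNAP}                        # discard the snapshot when done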

Cloning a virtual machine

virt-manager has a handy cloning functionality, so in most cases you will want to just use that. The command line alternative is to first create a new logical volume of the same size as the original, and then either restore the disk from a backup or copy directly from the original disk’s logical volume; this new disk is then attached to the new VM clone. For example, if restoring from a backup:

# first, take note of the exact size in bytes of the original disk #
root@vmserver:~# ls /storage/templates/ubuntu-10.04-server.disk -l
-rw-r--r-- 1 root root <bytes> May 8 15:22 /storage/templates/ubuntu-10.04-server.disk

# create the new logical volume
lvcreate -L <bytes>b -n <new volume name> <volume group>

# proceed with the restore with dd #
dd if=/storage/templates/ubuntu-10.04-server.disk \
of=/dev/<volume group>/<new volume> bs=512K

The example basically shows how I create a new virtual machine’s disk from a “template” (ubuntu-10.04-server.disk is a backup of a virtual machine that already has all the tools and packages I always need on my VMs).

Note: when cloning a virtual machine with the command line method (I can’t remember whether this applies to VMs cloned with virt-manager too), the cloned VM might have some networking issues due to the different MAC address. In particular, you might see errors such as “SIOCSIFADDR: No such device”. To fix this, just run:

echo "" > /etc/udev/rules.d/***net.rules

Then reboot the virtual machine and networking should work as usual.

Mounting a virtual machine’s partitions on the host

In some cases, you may need to mount a virtual machine’s disk (or, better, the partitions in it) directly on the host. How to do this slightly differs depending on whether you are using LVM also inside the virtual machine or not, so I’ll show the steps required in each case.

If LVM is not used also within the virtual machine disk

Check which partitions are available inside the virtual machine’s disk:

fdisk -l /dev/<volume group>/<volume>

Make the partitions available for mounting:

kpartx -av /dev/<volume group>/<volume>

Verify that the partitions are now available for mounting:

ls /dev/mapper/<volume group>-<volume>*

Mount a partition (e.g. the first one is usually the boot/main partition):

# recommended -o ro for read only
mkdir -p /mnt/<mount name> && mount -o ro /dev/mapper/<partition> /mnt/<mount name>

Verify the partition has been correctly mounted:

ls /mnt/<mount name>

You can at this point backup the content of the partitions or do whatever you want with it. Once done, it’s time to…
Clean up when the mount is no longer needed:

umount /mnt/<mount name> && kpartx -dv /dev/<volume group>/<volume>

If LVM is also used within the virtual machine disk (“nested” LVM)

Check which partitions are available inside the virtual machine’s disk:

fdisk -l /dev/<volume group>/<volume>

Make the partitions available for mounting:

kpartx -av /dev/<volume group>/<volume>

# vgs: will show a new volume group that is basically the volume group defined
# inside the virtual machine's disk, but now directly available on the host;
# take note of the name of this volume group as you'll need it for the next command

vgs

# Example output on my vm server - note the new "ubuntu" volume group;
# this is actually the volume group inside the virtual machine, now accessible
# directly from the host.
# VG #PV #LV #SN Attr VSize VFree
# storage 2 2 0 wz--n- 2.73t 0
# ubuntu 1 2 0 wz--n- 7.57g 28.00m
# virtual-machines 1 4 1 wz--n- 1.36t 1.34t
# vmserver 1 2 0 wz--n- 279.22g 12.00m

vgchange -ay <that volume group>

Check that the partitions are now available for mounting:

Slightly different from the non-nested-LVM scenario. In this case, ls /dev/mapper will still show some items named after the virtual machine disk (i.e. <volume group>-<volume>X), plus as many new items as there are partitions inside the virtual machine disk, named <volume group>-<partition name>. These are the partitions you want to mount. For example:

root@vmserver:~# ls /dev/mapper/ -l
total 0
crw------- 1 root root 10, 236 May 7 17:46 control
lrwxrwxrwx 1 root root 7 May 7 20:49 storage-backups -> ../dm-0
lrwxrwxrwx 1 root root 7 May 7 20:49 storage-storage -> ../dm-1
lrwxrwxrwx 1 root root 8 May 21 20:05 ubuntu-root -> ../dm-13
lrwxrwxrwx 1 root root 8 May 21 20:05 ubuntu-swap_1 -> ../dm-14
lrwxrwxrwx 1 root root 7 May 18 11:34 virtual--machines-backup--tests -> ../dm-5
lrwxrwxrwx 1 root root 7 May 21 18:51 virtual--machines-ha1 -> ../dm-6
lrwxrwxrwx 1 root root 8 May 21 20:05 virtual--machines-ha1p1 -> ../dm-10
lrwxrwxrwx 1 root root 8 May 21 20:05 virtual--machines-ha1p2 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 21 20:05 virtual--machines-ha1p5 -> ../dm-12
lrwxrwxrwx 1 root root 7 May 21 18:51 virtual--machines-ha1-real -> ../dm-8
lrwxrwxrwx 1 root root 7 May 21 18:51 virtual--machines-ha1--tmp -> ../dm-7
lrwxrwxrwx 1 root root 7 May 21 18:51 virtual--machines-ha1--tmp-cow -> ../dm-9
lrwxrwxrwx 1 root root 7 May 7 20:49 vmserver-root -> ../dm-2
lrwxrwxrwx 1 root root 7 May 7 20:49 vmserver-swap_1 -> ../dm-3

The important ones in my example are ubuntu-root and ubuntu-swap_1. These represent the actual partitions inside my virtual machine’s disk, and in this example I would want to mount the main (root) partition.

Mount a partition (e.g. the root partition, ubuntu-root in my example):

# recommended -o ro for read only
mkdir -p /mnt/<mount name>
mount -o ro /dev/mapper/<volume group>-<partition> /mnt/<mount name>

Again, you can at this point backup the content of the partitions or do whatever you want with it. Once done,

Clean up when the mount is no longer needed:

umount /mnt/<mount name>

# the volume group here is the one from inside the virtual machine;
# "ubuntu" in my previous example

vgchange -an <volume group>

# the volume group here instead is the one that contains the virtual machine's disk.
kpartx -dv /dev/<volume group>/<volume>

That’s pretty much it for now on KVM LVM based virtual machines; know of other tips or otherwise useful administration tasks? Please let me know in the comments. I might also update the post later with more.

KVM virtual machines: setting up the host

Note: this is a sort of mini series in two parts on getting started with KVM. More to come in the next post which I will publish in the next few days.

Why KVM virtual machines?

It occurred to me today that I haven’t written any new posts in a while… I have been very busy with work but I also moved from the UK to Finland, recently, and haven’t had a chance to even think about the blog in the meantime. But now that I am settled, I thought a good way to start again would be writing about a subject that is new to this blog (and somewhat long overdue): virtualisation.

The post won’t be a detailed description of the various technologies etc. – there’s Wikipedia for that. I’ll describe here the setup and basic maintenance of a virtualisation environment similar to what I regularly use myself for development and testing; it’s a fairly simple setup, although its operation is done through a mix of a GUI and, often, command line tools. For the virtual machine host I use KVM. Again, I won’t go into the details of the technology here (I don’t know much about its architecture anyway, besides how to use it), but I’ll briefly highlight the reasons why I like it and use it for development and testing:

  • KVM is already part of the Linux kernel, therefore it doesn’t require the installation of separate software (although you may have to install some packages, depending on your distro, in order to enable the hypervisor). So it’s not as complicated to install and configure as other hypervisors can be.
  • KVM is based on a simpler architecture than that of other hypervisors, in that it leaves the management of several things to Linux itself, rather than having to deal with scheduling, devices, etc. Therefore KVM is “focused” on just the particular features that concern the hypervisor. Because of this, and because it’s part of the Linux kernel, chances are that its development will be pretty fast over time compared to that of competing hypervisors.
  • KVM already has some great support from companies such as Redhat, IBM, Intel, Cisco, etc. with some of them (e.g. Redhat) pushing KVM as the preferred hypervisor these days.
  • Performance is great in most scenarios.
  • KVM works with normal, unmodified operating systems, unlike other hypervisors.

Caveats

  • KVM requires CPUs with hardware support for virtualisation (Intel VT-x or AMD-V) in order to work. However, nearly all server CPUs as well as many desktop ones offer virtualisation capabilities these days, so this is no longer really an issue. My work machine is a MacBook Pro, but I also own a PC with hardware virtualisation support, which works great as a KVM virtual machine host.

The setup I am about to describe works fairly well for me, since I just need to manage a few virtual machines at any time, but if you are looking for something more scalable (especially if you are looking to sell services based on virtualisation) and/or would prefer managing your virtualisation environment almost completely with a user friendly GUI, you really should look into a complete solution such as OnApp‘s cloud management software (disclaimer: I work for OnApp), which would give you a lot more than just a management tool for your virtualisation layer, and supports several hypervisors. You can even register for a fully featured free license that will let you manage hypervisors with up to 16 cores.

If you are looking for a super quick setup, and development & testing is the reason why you want to set up a virtualised environment, then the setup I am about to describe will be fine. You can also look into alternatives such as Virtualbox if you want to minimise the administration with command line tools.

If you go for my setup, be warned that graphics performance of VMs with Windows will be terrible, so if you need to virtualise Windows you may want to use something else. I still use Parallels Desktop on my Mac, rather than KVM virtual machines, whenever I need to test something on Windows, and its performance is just great.

Setup – introduction

Please note that I mostly use Ubuntu as my distro, so the following instructions will be based on Ubuntu. However, it shouldn’t be difficult to adapt them to other distros. Also note that while it is not strictly required, it is usually recommended to use a 64 bit Linux distro for the KVM virtual machine host, so as to ease management of amounts of memory larger than 4GB, among other things. I also recommend installing the server edition of the OS, so as to leave more resources available to the KVM virtual machines. Thanks to X11 forwarding to your work machine, using a server edition of Linux on the host won’t prevent you from using a GUI to perform part of the administration of your virtual machines, as we’ll see later.

I’ll assume here that you’ve already installed the OS on your host, and that you want to store the KVM virtual machines disks as logical volumes through LVM. Using LVM makes it easier to manage the available storage in general, and it also makes it easier to backup your virtual machines (without shutting them down) thanks to LVM snapshots, as we’ll see later in this post.

First of all, you need to ensure you can use KVM virtual machines on your host, since this depends on the CPU installed:

# as sudo
root@vmserver:~# egrep -c '(vmx|svm)' /proc/cpuinfo
4

The number you see in the output is the number of cores in your CPU(s) that have hardware support for virtualisation; if the number is 0, it means you won’t be able to run KVM virtual machines on your host. In this case, you may also want to check whether the hardware support is enabled in the BIOS or not.
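
Optionally, you can also check whether the kvm kernel modules are loaded (depending on your distro they may only appear after installing the KVM packages described further down):

lsmod | grep kvm

# you should see kvm plus either kvm_intel or kvm_amd, depending on your CPU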

Setting up LVM

Unless you’ve already partitioned your system disk with LVM when installing the OS on your host, you’ll need to install LVM.

sudo apt-get install lvm2 lvm-common

Next, you need to “initialise” each physical storage device that you want to use with LVM. This is done with the pvcreate command. You can list the physical disks available with:

fdisk -l /dev/{s,h}d? 2> /dev/null | grep '^Disk /dev.*'

(or just use the quicker df if you know what the info displayed means). For example in my case this is what I see:

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdb: 300.1 GB, 300069052416 bytes
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes

You can see that I have a 300GB drive which I use as system drive for the OS, and 3 x 1.5TB drives that I use for KVM virtual machines, backups and general storage. So in my case to use these disks with LVM I had to run:

pvcreate /dev/sda
pvcreate /dev/sdc
pvcreate /dev/sdd

You can check that LVM now “knows” about the drive(s) with pvs:

root@vmserver:~# pvs
  PV         VG               Fmt  Attr PSize   PFree
  /dev/sda   virtual-machines lvm2 a-     1.36t  1.36t
  /dev/sdb5  vmserver         lvm2 a-   279.22g 12.00m
  /dev/sdc   storage          lvm2 a-     1.36t      0
  /dev/sdd   storage          lvm2 a-     1.36t      0

You can also use the pvscan or pvdisplay commands to see a few more details.

You can now create volume groups; a volume group is a collection of one or more physical disks, and contains one or more logical volumes created on those disks. A logical volume is simply some sort of allocation of space that belongs to the parent volume group, and will contain the partitions in turn containing the actual data.

For example on my host I dedicated one of the 1.5TB physical disks (/dev/sda) to a volume group named virtual-machines for use with KVM virtual machines, and the other two to a volume group named storage for backups and general storage. From the previous snippet you may notice in the “VG” column that I also have another volume group named vmserver (VG stands for volume group). This only contains the 300GB physical disk I use as system disk; the reason I have the OS also on a volume group is that when I installed Ubuntu on this host I chose to use LVM also to partition the system disk. You can create volume groups with the vgcreate command:

# create a volume group with a single disk
vgcreate <name of the volume group> /dev/sda

# create a volume group with multiple disks
vgcreate <name of the volume group> /dev/sda /dev/sdb

It is also possible to add to or remove disks from an existing volume group, which is one of the reasons why LVM is pretty flexible:

# add a disk to an existing volume group
vgextend <name of the volume group> /dev/sdc

# remove a disk from an existing volume group
vgreduce <name of the volume group> /dev/sdc

You can even rename volume groups, if needed:

vgrename <old name of the volume group> <new name>

Like for the physical disks, you can list the volume groups with vgs:

root@vmserver:~# vgs
  VG               #PV #LV #SN Attr   VSize   VFree
  storage            2   2   0 wz--n-   2.73t      0
  virtual-machines   1   1   0 wz--n-   1.36t  1.36t
  vmserver           1   2   0 wz--n- 279.22g 12.00m

Or vgscan / vgdisplay if you want more details to be displayed. If you want to show details for a single volume group, just pass the volume group’s name as argument.

We’ll see later how to manage logical volumes. For now, make sure you have a volume group ready to contain the KVM virtual machines disks as logical volumes (in my case, it is “virtual-machines”).
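
Just for reference, creating a logical volume manually is a one-liner with lvcreate; later on we’ll let virt-manager create the virtual machines’ volumes for us, but as a quick sketch (the name and size here are arbitrary):

# create an 8GB logical volume named "test" in the virtual-machines volume group
lvcreate -L 8G -n test virtual-machines

# and remove it again when no longer needed
lvremove /dev/virtual-machines/test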

Setting up KVM virtual machines

(Note: from this point on I am assuming you’ve started a session as root/sudo since it’s easier)

Installing / enabling KVM is pretty easy. For example on Ubuntu all you need is:

apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils virt-manager

You can verify that KVM is correctly configured with the kvm-ok command:

root@vmserver:~# kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

And, to verify that it is possible to connect to KVM:

root@vmserver:~# virsh -c qemu:///system list
Id Name State
----------------------------------

If you see the same output everything is OK. The list should be empty for now since you don’t have any virtual machines yet at this stage.

Creating the first virtual machine

When you’ve installed the necessary packages with apt-get, you’ve also installed a GUI that you can use to create KVM virtual machines and perform some of the administration without having to use command line tools. I never manage to remember all the commands and all the syntax, so I find it handy. If you have followed my advice earlier and have installed a server edition of the OS, you won’t be able to run this GUI directly on the host, since the host OS won’t have a full desktop environment configured.

You can still use the GUI from your work machine, though, thanks to X11 forwarding. First, you need to ensure that your user on the hypervisor host is allowed to run virt-manager:

sudo adduser `id -un` kvm
sudo adduser `id -un` libvirtd

Then log off and on again.

Now, if you use Linux on your work machine start a new SSH session to the KVM server passing the argument -X:

ssh -X <hostname or IP of your virtual machine host>

And then just run virt-manager, which is the command you need to start the KVM administration GUI. You may also need to set the DISPLAY environment variable – you’ll see an error if that’s the case.

On a Mac, you can instead use the -Y argument, which doesn’t require you to set the DISPLAY env variable:

ssh -Y <hostname or IP of your virtual machine host>

If the GUI doesn’t show up when you run virt-manager from within an SSH session, add the -v argument and SSH will show some information that might help you figure out what went wrong. If the output shows something like “connection refused” or similar, you may need to authorise your virtual machine host to run X11 apps on your Mac. To do this, run xterm on your Mac, and from that terminal run:

xhost + <IP of the virtual machine host>

At this point you should be able to run virt-manager remotely, from your work machine. You should see the GUI in the picture below if X11 forwarding is working correctly:

[screenshot: the virt-manager main window]

Now from Edit choose Connection Details, and open the Storage tab. Click the + button to add a storage pool, and select logical: LVM Volume Group from type. Then select the volume group you’ve created earlier, and that you want to use to store your virtual machines:

[screenshot: adding an LVM storage pool in virt-manager]

Choose a name (I usually choose LVM so as to remember that I am using a volume group to store my KVM virtual machines), and create the storage pool. I also recommend removing the default storage pool (- button) so as to avoid placing KVM virtual machines in it by mistake, since it doesn’t use LVM. Your Connection Details window should now look similar to the one in the following picture, with only the LVM storage pool configured:

[screenshot: the Connection Details window with only the LVM storage pool configured]

Remember to check the Autostart option on the LVM storage pool and to click Apply to save the changes. You can now close the Connection Details window and click New to create your first virtual machine. The wizard is pretty straightforward: just give the VM a name, and optionally select the type of the OS you want to install in the VM (I am not sure if it actually makes any difference TBH); then select whether you want to insert the OS installation disc into the optical drive of the host, or whether you want to use an ISO image. Finally, select the amount of memory you want to allocate to your machine and the number of cores you want it to use (just pick the max available).

Next, you’ll have to create the virtual disk for your VM: choose Select managed or other existing storage => Browse. Then select the LVM storage pool (it should be the only one available), and then create a new volume within that storage pool.

[screenshot: creating a new volume in the LVM storage pool]

Choose the newly created volume and complete the creation of your virtual machine. The VM will be automatically started and you’ll be able to install and setup your virtual machine from a nice graphical console:

[screenshot: the virtual machine's graphical console]

Fixing” networking

One thing that you will notice soon when setting up and working with your KVM virtual machines is that they cannot communicate with the other devices and computers on the local network, nor can you SSH into them directly from your work machine. This is because of the default NAT setup that KVM virtual machines come with unless you customise the configuration of their (virtual) network NICs.

To fix this, and allow your VMs to communicate with your normal network, you need to switch to a bridged configuration.

First of all, it’s easier if your host has a static IP address. If that’s not the case, edit /etc/network/interfaces (as sudo/root) and change:

auto eth0
iface eth0 inet dhcp

with

auto eth0
iface eth0 inet static
address 192.168.1.100
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1

or something similar depending on your network configuration. Then add the following section and save the file.

auto br0
iface br0 inet dhcp
bridge_ports eth0
bridge_stp off
bridge_fd 0
bridge_maxwait 0

Now edit /etc/resolv.conf and set the correct nameservers if needed. Next,

sudo apt-get install bridge-utils
sudo /etc/init.d/networking restart

The host should now be good to go. You can check whether the bridge has been correctly set up with:

root@vmserver:~# ifconfig br0
br0 Link encap:Ethernet HWaddr 00:01:29:a6:5e:45
inet addr:192.168.0.8 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::201:29ff:fea6:5e45/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1665925 errors:0 dropped:0 overruns:0 frame:0
TX packets:1036864 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1735996352 (1.7 GB) TX bytes:578254981 (578.2 MB)

Next, you need to update your virtual machine(s). From virt-manager, open a VM’s details window, and in the configuration of the virtual NIC change from NAT,

[screenshot: the virtual NIC configured with the default NAT source]

to

[screenshot: the virtual NIC configured to use the br0 bridge]

You will need to shut down and restart the VM for the new configuration to take effect. Once that’s done, you should be able to SSH into the VM from another device on the normal network, and to reach those devices from within the VM.
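
If you prefer the command line to virt-manager, the same change can be made by editing the VM’s definition with virsh; the relevant fragment of the domain XML would look roughly like this (a sketch, with br0 being the bridge configured above):

virsh edit <vm name>

# then change the <interface> section so that it points at the bridge:
#
#   <interface type='bridge'>
#     <source bridge='br0'/>
#   </interface>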

Conclusion part 1; what’s next

This concludes this first part of this post on getting started with KVM in general. Over the next few days I will publish a second part on:

  • how to backup a VM’s raw disk
  • how to mount a virtual disk’s partition to a location on the host, so as to access the data directly
  • how to take advantage of LVM snapshots for consistent backups
  • possible issues you may encounter when cloning virtual machines

So, if you are interested in knowing more, stay tuned! In the meantime, I hope you’ll find this first part useful. As usual, please let me know in the comments if there’s anything you’d like to add.