CentOS Parallels VM and missing network configuration

I was using CentOS with Parallels today, and had problems with networking after cloning a template VM into several VMs. Basically, after cloning the template, each clone reports only the loopback interface and one eth interface, which appears to be inactive, so of course the Internet doesn’t work:

[root@centos ~]# ifconfig -a
eth1      Link encap:Ethernet  HWaddr 00:1C:42:22:36:26
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:2464262 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1221954 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3716624972 (3.4 GiB)  TX bytes:106808282 (101.8 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:3502 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3502 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:359663 (351.2 KiB)  TX bytes:359663 (351.2 KiB)

[root@centos ~]# ping 8.8.8.8
connect: Network is unreachable

I am not too familiar with CentOS, so I googled a bit and found out that networking is often not enabled by default in a minimal CentOS installation.

Anyway, in case someone runs into the same issue: if you run ifup, it complains that the configuration for the eth interface cannot be found:

[root@centos ~]# ifup eth1
/sbin/ifup: configuration for eth1 not found.
Usage: ifup <device name>

I’ve had this particular issue – missing network configuration – only with CentOS VMs, but networking doesn’t work with Ubuntu VMs either after cloning. On Ubuntu, however, I usually run

rm /etc/udev/rules.d/70-persistent-net.rules

and then reboot the VM, and that usually fixes it. I tried the same on the CentOS clones but it didn’t work.

It turns out that on the CentOS clones there is a profile for the loopback interface and a profile for eth0, but not for eth1 – which is the interface I see in the VMs after cloning – and that’s the reason why the configuration could not be found:

[root@centos ~]# ls /etc/sysconfig/network-scripts/ifcfg*
/etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-lo

So the way I fixed the missing configuration was by making a copy of the eth0 profile for eth1, and updating the content of the new profile with the correct device name and MAC address. First, make a copy of the profile:

[root@centos ~]# cd /etc/sysconfig/network-scripts/
[root@centos network-scripts]# cp ifcfg-eth0 ifcfg-eth1

Then, open the new profile with any editor and make sure the DEVICE name is eth1 (or whatever ethX it is for you if you have removed/added virtual NICs) and that HWADDR is set to the MAC address of the VM:

DEVICE=eth1
HWADDR=00:1C:42:22:36:26
TYPE=Ethernet
UUID=6326455c-37eb-48f7-b2a4-0dbf113e3c93
ONBOOT=no
NM_CONTROLLED=yes
BOOTPROTO=dhcp
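Note that with ONBOOT=no the interface is not brought up automatically at boot, so you would have to run ifup again after each reboot. If you prefer the interface to come up automatically, you may also want to change that line to:

ONBOOT=yes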

You can find the MAC address in the Network > Advanced Settings of the virtual machine:

[Screenshot: the VM’s Network > Advanced Settings pane in Parallels, showing the MAC address]

Then, run

[root@centos network-scripts]# ifup eth1

Determining IP information for eth1... done.

And Internet should now work:

[root@centos network-scripts]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=25.9 ms
...

That’s it. I’m not sure why this happens, but at least it’s easy to fix.

MySQL: Generate column names dynamically from row values

Let’s say you have a table, or the results of a query, with some values by date in different rows. You want to generate column names dynamically from these dates, and show the relevant results as values for these new columns instead of as values in different rows. So you basically want to transpose rows into columns and have dynamically generated column names at the same time. Unfortunately MySQL doesn’t allow the use of functions to generate column names, and as far as I know it doesn’t have any out-of-the-box way of generating column names dynamically in general (please let me know if I am mistaken; I’m keen to learn something new). It is definitely possible, however, with a trick using prepared statements. Let me show you with an example.

I have a table, named hcu_collection, which I use to collect some data for each of a number of software licenses. The relevant columns in this example are license_id, collect_date and an integer column named total_hv_cores (it’s just an example from a real app, so ignore the meaning of this column as it’s not important). Say I want to know the MAX(total_hv_cores) by date for each license over the past 3 days. I can use a simple query like the following:

SELECT license_id, collect_date, MAX(total_hv_cores) cores
FROM hcu_collection
WHERE collect_date >= SUBDATE(CURDATE(), 2)
GROUP BY license_id, collect_date
ORDER BY collect_date ASC;

which produces these results:

+------------+--------------+-------+
| license_id | collect_date | cores |
+------------+--------------+-------+
|         18 | 2015-12-04   |  1108 |
|         67 | 2015-12-04   |   436 |
|        102 | 2015-12-04   |   140 |
...
...
|      12174 | 2015-12-10   |    78 |
|      12380 | 2015-12-10   |   624 |
...
...

What I want instead is a table that looks like the following, for example for the past 3 days:

+------------+-----------+-----------+-----------+
| license_id | Tue 08/12 | Wed 09/12 | Thu 10/12 |
+------------+-----------+-----------+-----------+
|          2 | 238       | 238       | 246       |
|          3 | 60        | 68        | 68        |
|          4 | 12        | 16        | 12        |
|          7 | 212       | 212       | 220       |
...
...

As mentioned, I am not aware of any built-in way for MySQL to achieve this, so the way I have done it is by generating a query dynamically which, when executed, generates the column names from the dates as I want.

The first step is to create a temporary table with the results from the original query, for convenience, since we are going to need these results more than once in the query that will be generated dynamically.

DROP TABLE IF EXISTS tmp_results;

CREATE TEMPORARY TABLE tmp_results AS
SELECT license_id, collect_date, MAX(total_hv_cores) cores
FROM hcu_collection
WHERE collect_date >= SUBDATE(CURDATE(), 2)
GROUP BY license_id, collect_date
ORDER BY collect_date ASC;

Next, we need to generate a new query dynamically. Here’s an example:

SELECT CONCAT('
    SELECT license_id, ', cores_by_dates, '
    FROM tmp_results
    GROUP BY license_id
    ORDER BY license_id'
)
INTO @query
FROM
(
    SELECT GROUP_CONCAT(CONCAT('IFNULL(MAX(CASE WHEN collect_date=''', actual_date, ''' THEN cores END), ''-'') AS "', col_name, '"')) cores_by_dates
    FROM (
        SELECT actual_date, DATE_FORMAT(actual_date, '%a %d/%m') AS col_name
        FROM (SELECT DISTINCT collect_date AS actual_date FROM tmp_results) AS dates
    ) dates_with_col_names
) result;

The important bit is

SELECT GROUP_CONCAT(CONCAT('IFNULL(MAX(CASE WHEN collect_date=''', actual_date, ''' THEN cores END), ''-'') AS "', col_name, '"')) cores_by_dates
FROM (
    SELECT actual_date, DATE_FORMAT(actual_date, '%a %d/%m') AS col_name
    FROM (SELECT DISTINCT collect_date AS actual_date FROM tmp_results) AS dates
) dates_with_col_names

which generates something like:

IFNULL(MAX(CASE WHEN collect_date='2015-12-08' THEN cores END), '-') AS "Tue 08/12",IFNULL(MAX(CASE WHEN collect_date='2015-12-09' THEN cores END), '-') AS "Wed 09/12",IFNULL(MAX(CASE WHEN collect_date='2015-12-10' THEN cores END), '-') AS "Thu 10/12"

We save this new query in @query so that we can use it to prepare a statement:

PREPARE statement FROM @query;

Last, we just need to execute it:

EXECUTE statement;

This shows the results I want, with the dates as column names. Don’t forget to deallocate the prepared statement after fetching the results:

DEALLOCATE PREPARE statement;

Note: depending on how many dates you use to generate the columns, you may exceed the limit allowed for GROUP_CONCAT's result length (the default is 1024 bytes). So you may need to add something like

SET SESSION group_concat_max_len = 1000000;

before the dynamic generation of the query.
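For reference, here is the whole sequence put together as a single script – a sketch that assumes the same hcu_collection table and columns used throughout this post:

SET SESSION group_concat_max_len = 1000000;

DROP TABLE IF EXISTS tmp_results;

CREATE TEMPORARY TABLE tmp_results AS
SELECT license_id, collect_date, MAX(total_hv_cores) cores
FROM hcu_collection
WHERE collect_date >= SUBDATE(CURDATE(), 2)
GROUP BY license_id, collect_date;

SELECT CONCAT('
    SELECT license_id, ', cores_by_dates, '
    FROM tmp_results
    GROUP BY license_id
    ORDER BY license_id'
)
INTO @query
FROM
(
    SELECT GROUP_CONCAT(CONCAT('IFNULL(MAX(CASE WHEN collect_date=''', actual_date, ''' THEN cores END), ''-'') AS "', col_name, '"')) cores_by_dates
    FROM (
        SELECT actual_date, DATE_FORMAT(actual_date, '%a %d/%m') AS col_name
        FROM (SELECT DISTINCT collect_date AS actual_date FROM tmp_results) AS dates
    ) dates_with_col_names
) result;

PREPARE statement FROM @query;
EXECUTE statement;
DEALLOCATE PREPARE statement;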

Hope it can be useful to someone.

I’m back!

Well, it’s been a while since I’ve last updated this blog and I just wanted to make a short little post here to update y’all on a few things. It seems it’s not just me – the blog thing is kinda dying lately from what I can see from blogs I used to visit often and that are now rarely updated.

Anyways, I’m back after a long hiatus. I somehow missed it and am going to post again every now and then mainly with tips, tricks, and solutions to problems I encounter in my work life, especially on web development, as usual.

Not sure where I should start. Perhaps with a super brief update about myself?

Last time I blogged was in April 2014, so yeah, it’s been a while. I can’t say/remember much of what happened during the rest of that year, so I’ll say something quick about this past year.

In a nutshell, 2015 has been an overall good year, although I can’t say there have been any particularly important events. I still live in Finland – well, it doesn’t look like I’m going anywhere else anymore – and I still work for OnApp, managing a small team of developers based in London. So as usual I work remotely, and visit the London office every now and then.

I’m still happy with my job, although sometimes I feel like it would be nice to do something different. But it’s nice to be able to kinda manage my own time since I work from home, although I usually try to be available during UK office hours. I wish I had more free time to work on some side project though.

Work aside, unfortunately I had to give up boxing mainly for health related reasons. I really miss it, as it’s the only kind of sport I would never get bored with. Anything else I tried just didn’t work for me. So the result is that I am not fit at all.. and that’s not a good thing, especially because I spend most of the time sitting in front of a computer. Anyways…

Here in Finland days are quickly getting shorter and darker, and the weather in general is depressing. Would be nice to spend some time in a sunnier place at the moment.

A little site news for the devs and bloggers who might be reading this: I have just switched from static pages hosted on Github Pages to Ghost. So I started with a self hosted WordPress, switched to Jekyll, then back to WordPress, then I decided to stop blogging and just to keep the existing posts available to Googlers I generated a static version of the blog which I published on Github pages. So it looks like I spent more time switching blogging software than actually writing posts. LOL.

Eventually, I switched to Ghost, but this time I am using a hosted service and I am kinda committed to it (I already paid for the first year). I didn’t want to bother with having to keep the blog software up to date on my servers, so it’s easier this way.

I really like Ghost! It’s pretty fast and its very simple, clean design makes blogging fun again. It’s not cluttered and bloated like WordPress, and it’s so nice to just write posts in Markdown (I know you can do this in WordPress too) with a live preview on the side. It makes blogging much easier and quicker. I can just focus on what I want to write more easily.

I guess that’s all for now. Stay tuned for updates and thank you for stopping by 🙂

Cheers
Vito

Downtime and DDoS against PowerDNS.net

This site is back to normal now, after problems caused by a DDoS were resolved earlier today.

The attack was not against the site/server directly, but against the DNS service I’ve used until this morning, PowerDNS.net, resulting in my domains not being accessible for around 12 hours between 09:12:06PM GMT yesterday and 09:07:06AM GMT today (according to Pingdom).

Luckily this is just a personal blog and not a business, otherwise it could have cost me money. Nevertheless I am glad that everything is back to normal now. It’s a shame that the site was offline for that long, but my wife and I may also not have received emails for a while, and I am more worried about the email services than the website when the domains are not accessible.

While searching on Twitter for clues as to what was going on, I learnt that PowerDNS and PowerDNS.net are actually two distinct companies, even though they have the same logo – how confusing. Several people (me included) were asking @powerdns for help which they couldn’t provide, while @PowerDNSNet, the company under attack (PowerDNS.Net Hosting by Trilab), remained silent.

No notice, email, explanation, or status update on Twitter was given during the outage. Frustrating and unprofessional. Only a few hours ago a tweet appeared in the PowerDNS.net feed saying:

Some of our ip’s have been nulled by our provider as traffic for them affected infrastructure and created latency/packet loss.

The lack of communication during the outage was enough for me to switch to the Amazon Route 53 service. Besides, PowerDNS.net has failed multiple times lately; I know that you can’t blame a provider for suffering an attack, but ultimately the customer is affected. I hope that Amazon’s scale will at least make it more difficult for an attack to bring the service down.

A DDoS against a DNS service or registrar is a reminder of how easy it is these days for sites to go down even without being attacked directly.

As far as DNS services are concerned, the lesson learned is that using two services together rather than a single one may be a good idea, so I will likely use something else alongside AWS Route 53. As I said, email in particular is very important, and I don’t want it to be affected if a DNS service is experiencing downtime.

Easier backups with duplicity and xtrabackup

A little while ago I wrote a couple of scripts to take backups with duplicity and xtrabackup more easily; I am a little allergic to all the options and arguments you can use with both duplicity and xtrabackup, so these scripts use simple configuration files instead.

You can find these scripts on Github at https://github.com/vitobotta/admin-scripts.

xtrabackup

Xtrabackup is a great tool for taking backups (both full and incremental) of your MySQL databases without bringing them offline. When you first launch the script – admin-scripts/backup/xtrabackup.sh – without arguments, it will generate a simple configuration file at ~/.xtrabackup.config containing the following settings. You only need to set the MySQL credentials, customise the source and destination paths, and choose how many backup chains to keep:

MYSQL_USER="..."
MYSQL_PASS="..."
MYSQL_DATA_DIR=/var/lib/mysql
BACKUPS_DIRECTORY=/backup/mysql/
MAX_BACKUP_CHAINS=4

A backup chain is, as usual, made of one full backup plus subsequent incrementals. When taking backups, the script accepts a single argument, either full or incr. As the names suggest, in the first case a full backup is taken, while in the second case an incremental is taken. Backups are stored in the destination directory with the structure below:

/backup/mysql
├── full
│   ├── 2014-03-04_20-39-39
│   ├── 2014-03-09_02-00-04
│   ├── 2014-03-16_02-00-01
│   └── 2014-03-23_02-00-02
└── incr
    ├── 2014-03-04_20-39-53
    ├── 2014-03-04_20-41-21
    ├── 2014-03-05_02-00-02
    ├── 2014-03-05_13-00-02
    ├── 2014-03-06_02-00-07
I chose to store the incrementals separately from the full backups so as to always have full backups ready for a simple copy if needed, but restoring from incrementals works just fine. To restore, you can choose any of the available backups – either full or incremental. To see the list of all available backups, use the list argument, which shows something like this:

> admin-scripts/backup/xtrabackup.sh list
Loading configuration from /root/.xtrabackup.config.
Available backup chains (from oldest to latest):

Backup chain 1:
...

Backup chain 2:
...

Backup chain 3:
Full: 2014-03-16_02-00-01
Incremental: 2014-03-16_13-00-01
Incremental: 2014-03-17_02-00-02
...
Incremental: 2014-03-21_13-00-01
Incremental: 2014-03-22_02-00-01
Incremental: 2014-03-22_13-00-02
Backup chain 4:
Full: 2014-03-23_02-00-02
Incremental: 2014-03-23_13-00-01
Incremental: 2014-03-24_02-00-03
Incremental: 2014-03-24_13-00-01
Incremental: 2014-03-25_02-00-01
Incremental: 2014-03-25_13-00-02

Latest backup available:
Incremental: 2014-03-25_13-00-02

Then, to restore any of the backups available you can run the script with the restore argument, e.g.

admin-scripts/backup/xtrabackup.sh restore 2014-03-25_02-00-01 <destination directory>

Once the restore is complete, the final result will be a destination directory ready for use with MySQL, so all you need to do at this stage (as the script will suggest) is:

  • stop MySQL
  • replace the content of MySQL’s datadir with the contents of the destination directory you’ve used for the restore
  • ensure the MySQL datadir is owned by the mysql user
  • start MySQL again

MySQL should happily work again with the restored data.
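For example, a minimal sketch of those final steps, assuming the restore destination was /restore/mysql and the datadir is the default /var/lib/mysql (both paths are assumptions):

service mysql stop                    # the service may be named mysqld depending on the distro
mv /var/lib/mysql /var/lib/mysql.old  # keep the old datadir around until the restore is verified
cp -a /restore/mysql /var/lib/mysql   # the directory produced by the restore command above
chown -R mysql:mysql /var/lib/mysql
service mysql start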

duplicity

The other script is a wrapper which makes it a bit easier to take backups of data with duplicity; it too uses a configuration file instead of lots of options and arguments, and this configuration file is generated as ~/.duplicity.config when you first run the script with no arguments. Its content is as follows:

INCLUDE=(/backup /etc /home /root /usr/local/configuration /var/log /var/lib/mysql /var/www)

BACKUPS_REPOSITORY="rsync://user@host//backup_destination_directory/"

MAX_FULL_BACKUPS_TO_RETAIN=4
MAX_AGE_INCREMENTALS_TO_RETAIN=1W
MAX_AGE_CHAINS_TO_RETAIN=2M
MAX_VOLUME_SIZE=250

ENCRYPTION=1
PASSPHRASE=...

# Set ENCRYPT_KEY if you want to use GPG pub key encryption. Otherwise duplicity will just use symmetric encryption.
# ENCRYPT_KEY=

# Optionally use a different key for signing
# SIGN_KEY=
# SIGN_KEY_PASSPHRASE=

COMPRESSION_LEVEL=6 # 1-9; 0 disables compression; it currently works only if encryption is enabled

VERBOSITY=4 # 0 Error, 2 Warning, 4 Notice (default), 8 Info, 9 Debug (noisiest)

# Comment out the following if you want to run one or more scripts before duplicity backup.
RUN_BEFORE=(/root/admin-scripts/backup/xtrabackup.sh)

# Comment out the following if you want to run one or more scripts after duplicity backup.
#RUN_AFTER=()

Most of these settings should be self-explanatory, but a few notes:

  • BACKUPS_REPOSITORY uses duplicity’s rsync backend by default, so of course you need SSH access to the destination server.
  • MAX_VOLUME_SIZE: duplicity automatically splits the backup into volumes, and the script uses settings that have duplicity generate one volume while the previous one is being asynchronously transferred to the destination, which should make backups faster. The ideal value is difficult to determine as it depends on many things, but in my case a value of 250, with the other settings I use for compression and encryption, makes backups fairly fast.
  • ENCRYPTION enables/disables the encryption of the backup. If you are backing up on site to servers you own and that no one else controls, I’d disable this option so as to make backups quicker; otherwise I recommend enabling it if others have access to the backup files. Encryption can be done either with GPG keys or without keys, using symmetric encryption with a passphrase.
  • COMPRESSION_LEVEL: I’d recommend the value 6, as from my tests higher compression slows down backups for little gain. As the comment in the configuration file suggests, compression is currently available only when encryption is also enabled.

Lastly, as you can see, you can choose to run other scripts before and/or after the duplicity backup. In the configuration above I run the xtrabackup script first, so that the backup taken with duplicity also includes the latest MySQL backup, which I find pretty useful. Like the other script, this one requires the full or incr argument when taking backups; this argument is automatically passed to the scripts specified in RUN_BEFORE and RUN_AFTER so, for example, when taking an incremental backup with duplicity, an incremental backup with xtrabackup is taken first.
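For example, to take an incremental backup with duplicity (which, with the configuration above, takes an incremental xtrabackup backup first), you would run something like the following – note that the wrapper’s filename here is an assumption, so check the repository for the actual name:

admin-scripts/backup/duplicity.sh incr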

Restoring latest backup available

Example:

duplicity -v debug rsync://user@host//backup_directory <destination>

Note: Duplicity will not overwrite an existing file.

duplicity – other useful commands

Restoring from backups with duplicity is a little more straightforward than backing up, so I haven’t really added any commands for this in the script. However I’ll add here, for reference, some useful commands you may need when restoring or doing other tasks directly with duplicity. These examples assume you use duplicity with symmetric encryption, in which case you need to have the PASSPHRASE environment variable set and exported:

export PASSPHRASE=... # the passphrase you've used in the configuration file; you'll need this with all of the following commands

If you add these commands in some other scripts, remember to unset this variable with

unset PASSPHRASE
Listing available backups

duplicity -v debug collection-status rsync://user@host//backup_directory

Listing all files in the current backup

duplicity -v debug list-current-files rsync://user@host//backup_directory

Restoring by date / specific files (e.g. 3 days ago)

duplicity -v debug -t 3D --file-to-restore FILENAME rsync://user@host//backup_directory <destination>

Also:

duplicity -v debug --restore-time 1308655646 rsync://user@host//backup_directory <destination> (unix time)
duplicity -v debug --restore-time 2011-06-21T11:27:26+02:00 rsync://user@host//backup_directory <destination>

Note: the timestamps shown when listing the available backups are already in the local timezone, while the time on the server is in UTC. So a backup made e.g. on 24/2/2014 at 02:00 on the server will be listed as Mon Feb 24 04:00:35 2014. Restoring this backup means using the timestamp xxxx-xx-xxT02:00:00+02:00.

If you are looking for free tools, these scripts and commands should cover most of your backup needs on servers.

Using Nginx to comply with a third-party API’s rate limits

API rate limits: the problem

I have just started a little pet project today that involves the integration of APIs of various social networks. In order to prevent abuse, among other reasons, these APIs usually restrict the number of requests that a client (normally identified by IP address) can make in a given amount of time, through rate limiting practices; an example is the Reddit API, which according to its access rules only allows 30 requests/minute per client.

Complying with this sort of API rate limits at application level, while possible, can be quite complicated, because you need to maintain some shared state across the various instances of the application so that the limits are not exceeded regardless of which instance is making requests at any given time. I’m a Ruby developer, so in the past I have used a gem called SlowWeb to comply with a third-party API’s rate limits. Unfortunately this gem is no longer maintained (the last updates were 3 years ago), and in any case it is limited in that it doesn’t work across multiple instances of an application, since it doesn’t share any state between them.

A simple solution

Wouldn’t it be cool if there was a way to comply with a third-party API’s rate limits independently from our application, and without reinventing the wheel? There would no longer be any need to maintain shared state across multiple instances of the application, since the rate limiting would be handled separately. There’s a simple answer to this: web servers. It is trivial to implement such a solution with a web server like Apache or Nginx.

I normally use Nginx, so I’ll give you a very simple example (for Reddit API) with this web server. First, we need to add the following lines to Nginx’s main configuration:

http {
    ...

    limit_req_zone $binary_remote_addr zone=api_name:10m rate=30r/m;

    ...
}

Then we need to add the following lines to a virtual host we’ll dedicate as wrapper for the third party API:

server {
    listen 80;
    server_name your_url.ext;

    location / {
        limit_req zone=api_name burst=30;
        proxy_pass http://api_url.ext/;
    }
}

That’s it! Now you can just use your custom URL in your application and stop worrying about the API rate limits. How it works is very simple: Nginx uses the built-in HttpLimitReqModule to limit the number of requests per session/client in a given amount of time. In the example above, we first define a ‘zone’ specifying that we want to limit requests to 30 per minute; then, in the virtual host, we have Nginx proxy all requests to the API’s URL, allowing some “burstiness” (which you can drop if the third-party API doesn’t tolerate bursts). Another bit of configuration you may want to add to the Nginx virtual host is caching, but I usually prefer handling that at application level, for example with Redis.
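To sanity-check the setup, you can hammer the proxy with something like the loop below (your_url.ext is the placeholder used above and the endpoint is just an example). With rate=30r/m and burst=30, excess requests are delayed and, once the burst allowance is exhausted, rejected with a 503 status:

# fire 100 quick requests at the proxy and print only the HTTP status codes
for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{http_code}\n" "http://your_url.ext/some/endpoint" &
done
wait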

Know of other tricks to easily comply with API rate limits? Please let me know in the comments.

Jenkins CI with Rails projects

I’ve had to set up a Jenkins server for Rails projects today, so I thought I’d write a post about it. Hopefully it will save someone time. I’ll assume here that you already know what Jenkins and CI are, and that you prefer setting up your own CI solution rather than using a commercial CI service. I will give instructions on how to set up Jenkins on an Ubuntu server, so dependencies may differ if you use another Linux distribution.

Dependencies

For starters, you need to install some dependencies in order to configure a fully functional Jenkins server for RSpec/Cucumber testing with MySQL, and Firefox or Phantomjs for testing features with a headless browser. You can install all these dependencies as follows – these dependencies also include everything you need to correctly install various gems required in most projects:

sudo apt-get install build-essential git-core curl wget openssl libssl-dev libopenssl-ruby libmysqlclient-dev ruby-dev mysql-client libmysql-ruby xvfb firefox libsqlite3-dev libxslt-dev libxml2-dev libicu48

Once these dependencies are installed, if you use Selenium with your Cucumber features, you will have Firefox ready for use as a headless browser thanks to xvfb, which simulates a display. When xvfb is installed, the headless browser should already work with Jenkins with the project configuration I will show later. If that’s not the case, you may need to write an init.d script so that xvfb can run as a service. Here’s the content of such a script (/etc/init.d/xvfb):

#!/bin/sh
XVFB=/usr/bin/Xvfb
XVFBARGS=":1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset"
PIDFILE=/var/run/xvfb.pid
case "$1" in
  start)
    echo -n "Starting virtual X frame buffer: Xvfb"
    start-stop-daemon --start --quiet --pidfile $PIDFILE --make-pidfile --background --exec $XVFB -- $XVFBARGS
    echo "."
    ;;
  stop)
    echo -n "Stopping virtual X frame buffer: Xvfb"
    start-stop-daemon --stop --quiet --pidfile $PIDFILE
    echo "."
    ;;
  restart)
    $0 stop
    $0 start
    ;;
  *)
    echo "Usage: /etc/init.d/xvfb {start|stop|restart}"
    exit 1
    ;;
esac

exit 0

Of course you’ll need to make this file executable and then start the service:

chmod +x /etc/init.d/xvfb
/etc/init.d/xvfb start

In this example xvfb is configured to make the virtual display :1 available, so, to make sure any app that requires a display finds it, you need to set the DISPLAY environment variable in your shell rc/profile file:

export DISPLAY=:1

If instead of Selenium/Firefox you are using Phantomjs as the headless browser for your Cucumber features, you need to install Phantomjs first. At the time of this writing, the version of Phantomjs packaged in Ubuntu (13.04) is quite old and Cucumber/Capybara will complain about it, so you need to install a newer version (e.g. 1.9) manually from the official binaries:

cd /usr/local/src
wget https://phantomjs.googlecode.com/files/phantomjs-1.9.0-linux-x86_64.tar.bz2
tar xjf phantomjs-1.9.0-linux-x86_64.tar.bz2
ln -s /usr/local/src/phantomjs-1.9.0-linux-x86_64/bin/phantomjs /usr/bin/phantomjs

Now if you run phantomjs --version it should return 1.9.0.

Jenkins

Once the dependencies are sorted out, it’s time to install Jenkins. It’s easy to do by following the instructions you can also find on Jenkins’ website. I’ll add them here too for convenience:

wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins

Jenkins’ UI should now be available on port 8080 (optionally you may want to configure a web server such as Nginx as a frontend to Jenkins). The first thing I recommend doing through the UI is enabling security, otherwise anyone will have access to projects etc. You can secure Jenkins in many ways, but for the sake of simplicity I will suggest the simplest one here, which is based on authentication with username and password. So go to Manage Jenkins > Configure Global Security, and check Enable security. Still on the same page, select Jenkins’ own user database under Security Realm and leave Allow users to sign up enabled for now.

Once this is done, follow the Sign up link in the top right corner of the page and sign up, creating a new user. Then go back to the Configure Global Security page, select Matrix-based security under Authorisation and grant all permissions to the user you have just registered. Then, disable Allow users to sign up – unless you do want other people to be able to sign up, rather than manually creating new users as needed.

Then log out and log in again just to make sure everything still works OK. If you have problems after these steps and can no longer access Jenkins, you can reset the security settings and try again.

Job configuration

I’ll assume here you are configuring Jenkins for a Rails project and that you use Git as SCM. Jenkins doesn’t support Git out of the box unfortunately, but you can easily fix this by installing the plugins GIT Plugin and GIT Client Plugin. You can install plugins under Manage Jenkins > Manage plugins > Available, where you can search for those plugins and select to install them (and, I recommend, to restart Jenkins after the plugins are installed so that the changes are effective immediately).

The next step is to create and configure a job. Head to the main page, then follow New Job; give the job a name and choose the type of job you want to create. In most cases you want to choose Build a free-style software project. You will be taken to the configuration page for the job. Under Source code management, choose Git and enter the URL of your app’s repository in Repository URL. Before doing this though, make sure you can pull the code on the server by configuring SSH access and anything else needed – basically do a test pull manually from the terminal and ensure it works. Under Branches to build, enter one or more branches that you want Jenkins to test against, e.g. */development.

Next, it is very likely that you want Jenkins to build the job automatically each time code is pushed to any of the branches the job is ‘watching’. There are a few ways to do so, called Build triggers on the job configuration page. The two methods I use are Trigger builds remotely with an authentication token and Poll SCM; in the first case, you’ll need to enter a token and then add a hook to the Git repository so that the trigger is automatically activated when new code is pushed. For example, in Bitbucket, you can do this on the page Hooks of the administration area of the repository; the hook to add is of type Jenkins and the format is:

http://USER:TOKEN@JENKINS_URL:8080/

The second method involves enabling Poll SCM in the job configuration page but without a schedule; then you’d add a POST hook with format:

http://JENKINS_URL:8080/git/notifyCommit?url=REPO_URL

In this case you may want to restrict these POST requests with a firewall or similar. Either way, Jenkins will be notified whenever code is pushed and a build will be triggered.
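You can also simulate the second type of notification manually (using the same placeholders as above) to check that a build gets triggered:

curl "http://JENKINS_URL:8080/git/notifyCommit?url=REPO_URL"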

Next, add an Execute shell build step under Build, and paste the following:

# load the jenkins user's profile (assumed to initialise rbenv) and select the Ruby version
. /var/lib/jenkins/.bash_profile
rbenv global 1.9.3-p484
rbenv rehash
# install the project's gems and prepare the configuration and test database
bundle install
cp config/database.yml.example config/database.yml
mkdir -p tmp/cache
RAILS_ENV=test bundle exec rake db:migrate db:seed
# run the RSpec specs, then the Cucumber features in the headless browser via xvfb
RAILS_ENV=test bundle exec rspec spec
DISPLAY=localhost:0.0 xvfb-run -a bundle exec cucumber features

Please note that I am assuming here that you have installed Ruby under the user jenkins (which is created automatically when installing Jenkins) with rbenv. If you have installed Ruby in a different way, you will have to adapt the build step accordingly. You may anyway have to make changes depending on your project, but the build step as suggested above should work with most projects.

The last piece of configuration left is email notifications, which you can customise as you like. Remember though to set Jenkins’ own email address under Configure system > Jenkins location.

That’s it – you can now test Jenkins by manually running a build or by pushing some code. Hope it helps.