Rails: Signing out from devices

In an app I’m working on, I wanted users to be able to sign out from any device they are signed in on, by invalidating logins. There’s a gem called authie that does this, so you may want to check it out; here I’ll show the very simple implementation I went with, which works well enough for me. The goal is to:

  • create a login whenever a user signs in, with IP address, user agent and a unique device ID;
  • at each request, check whether a login exists for the given user/device ID combination and if it doesn’t, force sign in;
  • update the login at each authenticated request just in case the IP address (thus the location) changes while a session is active (optional);
  • delete the login when the user signs out from the device;
  • list all the active logins in the user’s account page with browser/OS info, IP address, and approximate location (city & country);
  • allow the user to delete any of those logins to sign out from the respective device.

I like doing authentication from scratch (see this Railscast) so that’s what I am using here but if you use something like Devise instead, it won’t be very different.
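
For context, the auth_token used below is a permanent token stored on the user, Railscast-style. A minimal sketch of the User side (the auth_token column is implied by the code below; the generation callback and has_secure_password are my assumptions, not necessarily what the app uses):

class User < ApplicationRecord
  has_secure_password
  has_many :logins, dependent: :destroy

  # Generate the long-lived token read from the auth_token cookie below.
  before_create do
    self.auth_token ||= SecureRandom.urlsafe_base64
  end
end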

The first thing we need for this simple implementation is to generate a Login model:

rails g model Login user:belongs_to ip_address user_agent device_id:index

The Login model will be basically empty as it will only do persistence:

class Login < ApplicationRecord
  belongs_to :user
end

Then in the create action of my SessionsController I have something like this:

  def create
    @sign_in_form = SignInForm.new

    if user = @sign_in_form.submit(params[:sign_in_form])
      device_id = SecureRandom.uuid

      if params[:sign_in_form][:remember_me]
        cookies.permanent[:auth_token] = user.auth_token
        cookies.permanent[:device_id]  = device_id
      else
        cookies[:auth_token] = user.auth_token
        cookies[:device_id]  = device_id
      end

      user.logins.create!(ip_address: request.remote_ip,
                          user_agent: request.user_agent,
                          device_id: device_id)

      redirect_to ...
    else
      redirect_to sign_in_path, alert: "Invalid email or password."
    end
  end

So each time a user successfully signs in from a device we create a login with a unique device ID.

In the ApplicationController, I have:

  def current_user
    @current_user ||= begin
      if cookies[:auth_token].present? and cookies[:device_id].present?
        if user = User.find_by(auth_token: cookies[:auth_token])
          if login = user.logins.find_by(device_id: cookies[:device_id])
            # optional
            login.update!(ip_address: request.remote_ip, user_agent: request.user_agent, updated_at: Time.now.utc)
            user
          end
        end
      end
    end
  end
  helper_method :current_user

  def authenticate
    redirect_to sign_in_path unless current_user
  end

I didn’t bother here, but perhaps you can prettify the current_user method; one possible flatter shape is sketched after the list below. So, in order to assume the user is successfully authenticated for the request, we expect:

  • both the auth_token and device_id cookies to be present;
  • the auth_token to be associated with an existing user;
  • a login to exist for the user with the device_id stored in the cookies;

otherwise we redirect the user to the sign in page.
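
On the prettifying note, an equivalent but flatter shape for current_user could be the following (same logic as above, just restructured; my variant, not the app’s code):

  def current_user
    @current_user ||= begin
      user  = User.find_by(auth_token: cookies[:auth_token]) if cookies[:auth_token].present?
      login = user.logins.find_by(device_id: cookies[:device_id]) if user && cookies[:device_id].present?

      if login
        # optional; update! also bumps updated_at
        login.update!(ip_address: request.remote_ip, user_agent: request.user_agent)
        user
      end
    end
  end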

Finally, in the SessionsController I have a destroy action which deletes both the login and the cookies from the browser:

  def destroy
    current_user.logins.find_by(device_id: cookies[:device_id]).destroy
    cookies.delete(:auth_token)
    cookies.delete(:device_id)
    flash[:notice] = "Successfully signed out."
    redirect_to sign_in_path
  end

Remember to add a route for the logins destroy action (used further below to sign out other devices), e.g.:

resources :logins, only: [:destroy]

Next, we want to list the active logins in the user’s account page so that they can sign out from any of those devices. So that the user can easily tell logins apart, I am using:

  • the device_detector gem to identify browser and operating system;
  • the Maxmind GeoIP2 API with the geoip2 gem to geolocate IP addresses so we can display the approximate location for each login. This is just one of many ways you can geolocate IP addresses; I am using Maxmind for other things too so using the Maxmind API works fine for me but you may want to use a different service or a local database (for performance). Also see the geocoder gem for another option.

In the LoginsHelper I have:

module LoginsHelper
  def device_description(user_agent)
    device = DeviceDetector.new(user_agent)
    "#{ device.name } #{ device.full_version } on #{ device.os_name } #{ device.os_full_version }"
  end

  def device_location(ip_address)
    if ip = Ip.find_by(address: ip_address)
      "#{ ip.city }, #{ ip.country }"
    else
      location = Geoip2.city(ip_address)
      if location.error
        Ip.create!(address: ip_address, city: "Unknown", country: "Unknown")
        "Unknown"
      else
        Ip.create!(address: ip_address, city: location.city.names[:en],
                   country: location.country.names[:en])
        "#{ location.city.names[:en] }, #{ location.country.names[:en] }"
      end
    end
  end
end

I am leaving these methods in the helper, but you may want to move them into a class or something. device_description, as you can see, shows the browser/OS info; for example, for my Chrome on Gentoo it shows Chrome 52.0.2743.116 on GNU/Linux. device_location then shows city and country, like Espoo, Finland, if the IP address is in the Maxmind database. If the IP address is invalid, or it is something like 127.0.0.1 or a private IP address, the Maxmind API will return an error, so we’ll just show “Unknown” instead.

This is just an example: you may want to avoid the API call (if using an API) altogether when the IP is a private address; another optimisation could be performing the geolocation asynchronously with a background job when the user signs in, instead of performing it while rendering the view.

You can also see another model here, Ip. This is a simple way to cache IP addresses with their locations so we don’t have to make the same API request twice for a given IP address. So next we need to generate this model:

rails g model Ip address:index country city

Again, I am showing an example here; you may want to move the geolocation logic to the Ip model or to a separate class, up to you (a possible shape is sketched below).
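
As a rough illustration, moving the logic into the Ip model could look like the sketch below. The Geoip2.city call and the “Unknown” convention come from the helper above; the locate/geolocate method names and the private-IP guard via IPAddr (whose private? helper needs a reasonably recent Ruby) are my own assumptions:

require "ipaddr"

class Ip < ApplicationRecord
  # Returns "City, Country" for the given address, caching lookups;
  # returns "Unknown" for private/invalid addresses and failed lookups.
  def self.locate(ip_address)
    return "Unknown" if private_address?(ip_address)

    ip = find_by(address: ip_address) || geolocate(ip_address)
    ip.city == "Unknown" ? "Unknown" : "#{ ip.city }, #{ ip.country }"
  end

  def self.private_address?(ip_address)
    addr = IPAddr.new(ip_address)
    addr.loopback? || addr.private?
  rescue IPAddr::InvalidAddressError
    true
  end

  # Queries the Maxmind API and caches the result, successful or not.
  def self.geolocate(ip_address)
    location = Geoip2.city(ip_address)
    if location.error
      create!(address: ip_address, city: "Unknown", country: "Unknown")
    else
      create!(address: ip_address, city: location.city.names[:en],
              country: location.country.names[:en])
    end
  end
end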

We can now add something like the following to the user’s account page:

<h2>Active sessions</h2>
These are the devices currently signed in to your account:
<table id="logins">
  <thead>
    <tr>
      <th>Device</th>
      <th>IP Address</th>
      <th>Approximate location</th>
      <th>Most recent activity</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <%= render @logins %>
  </tbody>
</table>

where @logins is assigned in the controller:

@logins = current_user.logins.order(updated_at: :desc)

The _login.html.erb partial contains:

<tr id="<%= dom_id(login) %>" class="login">
  <td><%= device_description(login.user_agent) %></td>
  <td><%= login.ip_address %></td>
  <td><%= device_location(login.ip_address) %></td>
  <td><%= time_ago_in_words(login.updated_at) %></td>
  <td>
    <% if login.device_id == cookies[:device_id] %>
      (Current session)
    <% else %>
      <%= link_to "<i class='fa fa-remove'></i>".html_safe, login_path(login), method: :delete, remote: true, title: "Sign out", data: { confirm: "Are you sure you want to sign out from this device?" } %>
    <% end %>
  </td>
</tr>

Besides browser/OS/IP/location we also show an X button to sign out from devices unless it’s the current session. It looks like this:

(screenshot: the active sessions table)

Finally, a little CoffeeScript view to remove the row from the page once the login has been deleted:

$("#login_<%= @login.id %>").hide ->
    $(@).remove()

and the destroy action (note we assign @login, since the view above references it):

class LoginsController < ApplicationController
  def destroy
    @login = current_user.logins.find(params[:id])
    @login.destroy
  end
end

That’s it! Now if the user removes any of the logins from the list, the respective device will be signed out.

Setting up a Ubuntu server for Ruby and PHP apps

There are several guides on the Internet on setting up a Ubuntu server, but I thought I’d add here some notes on how to set up a server capable of running both Ruby and PHP apps at the same time. Ubuntu’s latest Long Term Support (LTS) release is 14.04, so this guide will be based on that release.

I will assume you already have a server with the basic Ubuntu Server Edition installed – be it a dedicated server or a VPS from your provider of choice – with just SSH access enabled and nothing else. We’ll be bootstrapping the basic system and installing all the dependencies required for running Ruby and PHP apps; I usually use Nginx as the web server, so we’ll also be using Phusion Passenger as the application server for Ruby, and fastcgi for PHP, to make things easier.

First steps

Before anything else, it’s a good idea to update the system with the latest updates available. So SSH into the new server with the IP and credentials you’ve been given and (recommended) start a screen session with

screen -S <session-name>

Now change the root password with

passwd

then open /root/.ssh/authorized_keys with an editor and ensure no SSH keys have already been added other than yours; if you see any keys, I recommend you comment them out and uncomment them only if you ever need to ask your provider for support.

That done, as usual run:

apt-get update
apt-get upgrade -y

to update the system.

Next, edit /etc/hostname with vi or any other editor and change the hostname to the one you will be using to connect to this server; also edit /etc/hosts and add the correct hostname there as well. Reboot:

reboot now

SSH access

It’s a good idea to use a port other than the default one for SSH access, and a user other than root. In this guide, we’ll be:

  • using the example port 17239
  • disabling root access and enabling access for the user deploy (only) instead
  • switching from password authentication to public key authentication for good measure.

Of course you can choose whichever port and username you wish.

For convenience, on your client computer (that is, the computer you will be connecting to the server from) edit ~/.ssh/config and add the following content, using whichever host alias you prefer:

Host my-server
Hostname <the ip address of the server>
Port 22
User root

So you can more easily SSH into the new server with just

ssh my-server

As you can see for now we are still using the default port and user until the SSH configuration is updated.

Unless your public key has already been added to /root/.ssh/authorized_keys during the provisioning of the new server, still on the client machine run

ssh-copy-id <hostname or ip of the server>

to copy your public key over. You should now be able to SSH into your server without a password.

Back on the server, it’s time to set up the user you will use to SSH into the server instead of root:

adduser deploy

Edit /etc/sudoers (ideally with visudo) and add:

deploy ALL=(ALL:ALL) ALL

On the client, copy your public key for the deploy user as well, so you can SSH into the server as deploy using your key:

ssh-copy-id deploy@my-server

You should now be able to log in as deploy without a password.

Now edit /etc/ssh/sshd_config and change settings as follows:

Port 17239
PermitRootLogin no
PasswordAuthentication no
UseDNS no
AllowUsers deploy

This will:

  • change the port
  • disable root login
  • disable password authentication so we are forced to use public key authentication
  • disable DNS lookups, to speed up logins
  • only allow the user deploy to SSH into the system

Restart SSH server with:

service ssh restart

Keep the current session open just in case for now. On the client, open again ~/.ssh/config and update the configuration of the server with the new port and user:

Host my-server
Hostname <the ip address of the server>
Port 17239
User deploy

Now if you run

ssh my-server

you should be in as deploy without a password. You should no longer be able to log in as root though; to test, run:

ssh root@my-server date

you should see an error:

Permission denied (publickey).

Firewall

Now that SSH access is sorted, it’s time to configure the firewall to lock down the server so that only the services we want (such as ssh, http/https and mail) are allowed. Edit the file /etc/iptables.rules and paste the following:

# Generated by iptables-save v1.4.4 on Sat Oct 16 00:10:15 2010
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -d 127.0.0.0/8 ! -i lo -j DROP
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 587 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 17239 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables [Positive[False?]: " --log-level 7
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A INPUT -j LOG
-A INPUT -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -j ACCEPT
COMMIT
# Completed on Sat Oct 16 00:10:15 2010
# Generated by iptables-save v1.4.4 on Sat Jun 12 23:55:23 2010
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Sat Jun 12 23:55:23 2010
# Generated by iptables-save v1.4.4 on Sat Jun 12 23:55:23 2010
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p tcp --dport 25 -j REDIRECT --to-port 587
COMMIT
# Completed on Sat Jun 12 23:55:23 2010

It’s a basic configuration I have been using for some years. It blocks all incoming traffic except SSH access, web traffic (since we’ll be hosting Ruby and PHP apps) and mail. Of course, if you’ve chosen an SSH port other than the 17239 used in the example, make sure you specify it here.

To apply the setting now, run:

iptables-restore < /etc/iptables.rules

and verify with

iptables -L

You should see the following output:

Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
DROP all -- anywhere loopback/8
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:http
ACCEPT tcp -- anywhere anywhere tcp dpt:https
ACCEPT tcp -- anywhere anywhere tcp dpt:submission
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:17239
LOG all -- anywhere anywhere limit: avg 5/min burst 5 LOG level debug prefix "iptables [Positive[False?]: "
ACCEPT icmp -- anywhere anywhere icmp echo-request
LOG all -- anywhere anywhere LOG level warning
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere

Now if you reboot the server these settings will be lost, so you need to persist them in one of two ways:

1) open /etc/network/interfaces and add, in the eth0 section, the following line:

post-up iptables-restore < /etc/iptables.rules

So the file should now look similar to the following:

auto eth0
iface eth0 inet static
address ...
netmask ...
gateway ...
up ip addr add 10.16.0.5/16 dev eth0
dns-nameservers 8.8.8.8 8.8.4.4
post-up iptables-restore < /etc/iptables.rules

OR,

2) Run

apt-get install iptables-persistent

Either way, reboot now and verify again with iptables -L that the settings are persisted.

ZSH shell, editor (optional)

If, like me, you prefer ZSH over Bash and Vim as your editor, first install ZSH with:

apt-get install zsh git-core
curl -L https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh | sh
ln -s ~/dot-files/excid3.zsh-theme ~/.oh-my-zsh/themes

Then you may want to use my Vim configuration for a nicer editor environment (note that the theme symlink above assumes this dot-files repo has been cloned):

cd; git clone https://github.com/vitobotta/dot-files.git
cd dot-files; ./setup.sh

I’d repeat the above commands for both the deploy user and root (as usual, you can use sudo -i for example to log in as root). Under deploy, you’ll additionally need to run:

chsh

and specify /usr/bin/zsh as your shell.

Dependencies for Ruby apps

You’ll need to install the various dependencies required to compile Ruby and install various gems:

apt-get install build-essential curl wget openssl libssl-dev libreadline-dev libmysqlclient-dev ruby-dev mysql-client ruby-mysql xvfb firefox libsqlite3-dev sqlite3 libxslt1-dev libxml2-dev

You’ll also need to install nodejs for asset compilation (Rails apps):

apt-get install software-properties-common
add-apt-repository ppa:chris-lea/node.js
apt-get update
apt-get install nodejs

Next, as deploy:
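
Install rbenv and ruby-build, then compile the Ruby you want (2.2.4 in this example; these are the standard rbenv installation steps, assumed here since the rest of this section relies on rbenv):

git clone https://github.com/sstephenson/rbenv.git ~/.rbenv
git clone https://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
~/.rbenv/bin/rbenv install 2.2.4
~/.rbenv/bin/rbenv global 2.2.4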

Ensure the following lines are present in the shell rc files (.zshrc and .zprofile) and reload the shell so the new Ruby can be “found”:

export PATH="$HOME/.rbenv/bin:$HOME/.rbenv/shims:$PATH"
eval "$(rbenv init -)"

ruby -v should now output the expected version number, 2.2.4 in the example.

Optionally, you may want to install the rbenv-vars plugin for environment variables support with rbenv:

git clone https://github.com/sstephenson/rbenv-vars.git ~/.rbenv/plugins/rbenv-vars
chmod +x ~/.rbenv/plugins/rbenv-vars/bin/rbenv-vars

Dependencies for PHP apps

Install the various packages required for PHP-FPM:

apt-get install php5-fpm php5-mysql php5-curl php5-gd php5-intl php-pear php5-imagick php5-mcrypt php5-memcache php5-memcached php5-ming php5-ps php5-pspell php5-recode php5-snmp php5-sqlite php5-tidy php5-xmlrpc php5-xsl php5-geoip php-apc php5-imap

MySQL

I am assuming here you will be using MySQL – I usually use the Percona distribution. If you plan on using some other database system, skip this section.

First, install the dependencies:

apt-get install curl build-essential flex bison automake autoconf bzr libtool cmake libaio-dev libncurses-dev zlib1g-dev libdbi-perl libnet-daemon-perl libplrpc-perl libaio1
gpg --keyserver hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | sudo apt-key add -

Next edit /etc/apt/sources.list and add the following lines:

deb http://repo.percona.com/apt trusty main
deb-src http://repo.percona.com/apt trusty main

Install Percona server:

apt-get update
apt-get install percona-xtradb-cluster-server-5.5 percona-xtradb-cluster-client-5.5 percona-xtradb-cluster-galera-2.x

Test that MySQL is running:

mysql -uroot -p

Getting web apps up and running

First, install Nginx with Passenger for Ruby support (see also Passenger’s own installation docs):

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 561F9B9CAC40B2F7
apt-get install apt-transport-https ca-certificates

Edit /etc/apt/sources.list.d/passenger.list and add the following:

deb https://oss-binaries.phusionpassenger.com/apt/passenger trusty main

Update sources:

chown root: /etc/apt/sources.list.d/passenger.list
chmod 600 /etc/apt/sources.list.d/passenger.list
apt-get update

Then install Phusion Passenger for Nginx:

apt-get install nginx-extras passenger

Edit /etc/nginx/nginx.conf and uncomment the passenger_root and passenger_ruby lines, making sure the latter points to the version of Ruby installed with rbenv, otherwise it will point to the default Ruby version in the system. Make the following changes:

user deploy;
worker_processes auto;
pid /run/nginx.pid;

events {
  use epoll;
  worker_connections 2048;
  multi_accept on;
}

http {
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
  server_tokens off;
  …
  passenger_root /usr/lib/ruby/vendor_ruby/phusion_passenger/locations.ini;
  passenger_ruby /home/deploy/.rbenv/shims/ruby;
  passenger_show_version_in_header off;
}

Restart nginx with

service nginx restart

Test that nginx works by opening http://the_ip_or_hostname in your browser.

For PHP apps, we will be using fastcgi with unix sockets. Create a pool file for each app in /etc/php5/fpm/pool.d/, e.g. /etc/php5/fpm/pool.d/myapp, using the following template:

[<app name>]
listen = /tmp/<app name>.php.socket
listen.backlog = -1
listen.owner = deploy
listen.group = deploy

; Unix user/group of processes
user = deploy
group = deploy

; Choose how the process manager will control the number of child processes.
pm = dynamic
pm.max_children = 75
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500

; Pass environment variables
env[HOSTNAME] = $HOSTNAME
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp

; host-specific php ini settings here
; php_admin_value[open_basedir] = /var/www/DOMAINNAME/htdocs:/tmp

To allow communication between Nginx and PHP-FPM via fastcgi, ensure each PHP app’s virtual host includes some configuration like the following:

location / {
  try_files $uri /index.php?$query_string;
}

location ~ \.php$ {
  fastcgi_split_path_info ^(.+\.php)(/.+)$;
  fastcgi_pass unix:/tmp/<app name>.php.socket;
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root/$fastcgi_script_name;
}

Edit /etc/php5/fpm/php.ini and set cgi.fix_pathinfo to 0. Restart both FPM and Nginx:

service php5-fpm restart
service nginx restart

Congrats, you should now be able to run both Ruby and PHP apps.

Backups

There are so many ways to back up a server… What I usually use on my personal servers is a combination of xtrabackup for MySQL databases and duplicity for file backups.

As root, clone my admin scripts:

cd ~
git clone https://github.com/vitobotta/admin-scripts.git
apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A

Edit /etc/apt/sources.list and add:

deb http://repo.percona.com/apt trusty main
deb-src http://repo.percona.com/apt trusty main

Proceed with the installation of the packages:

apt-get update
apt-get install duplicity xtrabackup

Next refer to this previous post for the configuration.

Schedule the backups with crontab -e by adding the following lines:

MAILTO = <your email address>

00 02 * * sun /root/admin-scripts/backup/duplicity.sh full
00 02 * * mon-sat /root/admin-scripts/backup/duplicity.sh incr
00 13 * * * /root/admin-scripts/backup/xtrabackup.sh incr

Mailing

  • install postfix and dovecot with
apt-get install postfix dovecot-common mailutils
  • run dpkg-reconfigure postfix and set the following:
      • General type of mail configuration -> Internet Site
      • System mail name -> same as the server’s hostname
      • Root and postmaster email recipient -> your email address
      • Force synchronous updates on mail queue -> no
      • Local networks -> leave default
      • Mailbox size limit (bytes) -> set 10485760 (10MB) or so, to prevent the default mailbox from growing with no limits
      • Internet protocols to use -> all

  • SMTP authentication: run

postconf -e 'home_mailbox = Maildir/'
postconf -e 'smtpd_sasl_type = dovecot'
postconf -e 'smtpd_sasl_path = private/auth'
postconf -e 'smtpd_sasl_local_domain ='
postconf -e 'smtpd_sasl_security_options = noanonymous'
postconf -e 'broken_sasl_auth_clients = yes'
postconf -e 'smtpd_sasl_auth_enable = yes'
postconf -e 'smtpd_recipient_restrictions = permit_sasl_authenticated,permit_mynetworks,reject_unauth_destination'
  • TLS encryption: run
mkdir /etc/postfix/certificate && cd /etc/postfix/certificate
openssl genrsa -des3 -out server.key 2048
openssl rsa -in server.key -out server.key.insecure
mv server.key server.key.secure
mv server.key.insecure server.key
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

postconf -e 'smtp_tls_security_level = may'
postconf -e 'smtpd_tls_security_level = may'
postconf -e 'smtp_tls_note_starttls_offer = yes'
postconf -e 'smtpd_tls_key_file = /etc/postfix/certificate/server.key'
postconf -e 'smtpd_tls_cert_file = /etc/postfix/certificate/server.crt'
postconf -e 'smtpd_tls_loglevel = 1'
postconf -e 'smtpd_tls_received_header = yes'
postconf -e 'myhostname = <hostname>'
  • SASL:
      • edit /etc/dovecot/conf.d/10-master.conf and uncomment the following lines so that they look as follows (the first line is a comment, so leave it commented out):

# Postfix smtp-auth
unix_listener /var/spool/postfix/private/auth {
  mode = 0666
}
  • edit /etc/dovecot/conf.d/10-auth.conf and change the setting auth_mechanisms to “plain login”
  • edit /etc/postfix/master.cf and a) comment out smtp, b) uncomment submission
  • restart postfix: service postfix restart
  • restart dovecot: service dovecot restart
  • verify that all looks good:

root@nl:/etc/postfix/certificate# telnet localhost 587
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 <hostname> ESMTP Postfix (Ubuntu)
ehlo <hostname>
250-<hostname>
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN

Test email sending:

echo "" | mail -s "test" <your email address>

There’s a lot more that could be done, but this should get you started. Let me know in the comments if you run into any issues.

Jenkins CI with Rails projects

I’ve had to set up a Jenkins server for Rails projects today, so I thought I’d write a post about it. Hopefully it will save someone time. I’ll assume here that you already know what Jenkins and CI are, and that you prefer setting up your own CI solution rather than using a commercial CI service. I will add instructions on how to set up Jenkins on an Ubuntu server, so dependencies may differ if you use another Linux distribution.

Dependencies

For starters, you need to install some dependencies in order to configure a fully functional Jenkins server for RSpec/Cucumber testing with MySQL, and Firefox or Phantomjs for testing features with a headless browser. You can install all these dependencies as follows – these dependencies also include everything you need to correctly install various gems required in most projects:

sudo apt-get install build-essential git-core curl wget openssl libssl-dev libopenssl-ruby libmysqlclient-dev ruby-dev mysql-client libmysql-ruby xvfb firefox libsqlite3-dev libxslt-dev libxml2-dev libicu48

Once these dependencies are installed, if you use Selenium with your Cucumber features you will have Firefox ready for use as a headless browser thanks to xvfb, which simulates a display. When xvfb is installed, the headless browser should already work with Jenkins with the project configuration I will show later. If that’s not the case, you may need to write an init.d script so that xvfb can run as a service. Here’s the content of such a script (/etc/init.d/xvfb):

#!/bin/sh
XVFB=/usr/bin/Xvfb
XVFBARGS=":1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset"
PIDFILE=/var/run/xvfb.pid
case "$1" in
start)
echo -n "Starting virtual X frame buffer: Xvfb"
start-stop-daemon --start --quiet --pidfile $PIDFILE --make-pidfile --background --exec $XVFB -- $XVFBARGS
echo "."
;;
stop)
echo -n "Stopping virtual X frame buffer: Xvfb"
start-stop-daemon --stop --quiet --pidfile $PIDFILE
echo "."
;;
restart)
$0 stop
$0 start
;;
*)
echo "Usage: /etc/init.d/xvfb {start|stop|restart}"
exit 1
esac

exit 0

Of course you’ll need to make this file executable and then start the service:

chmod +x /etc/init.d/xvfb
/etc/init.d/xvfb start

In this example xvfb is configured to make the virtual display :1 available; to make sure any app requiring it ‘finds’ it, you need to set the DISPLAY environment variable in your shell rc/profile file:

export DISPLAY=:1

If instead of Selenium/Firefox you are using Phantomjs as the headless browser with your Cucumber features, you need to install Phantomjs first. At the moment of this writing the latest release of Ubuntu is 13.04, which ships with an old version of Phantomjs by default; Cucumber/Capybara will complain that this version is too old, so you need to install a newer version (e.g. 1.9) from source:

cd /usr/local/src
wget https://phantomjs.googlecode.com/files/phantomjs-1.9.0-linux-x86_64.tar.bz2
tar xjf phantomjs-1.9.0-linux-x86_64.tar.bz2
ln -s /usr/local/src/phantomjs-1.9.0-linux-x86_64/bin/phantomjs /usr/bin/phantomjs

Now if you run phantomjs --version it should return 1.9.0.

Jenkins

Once the dependencies are sorted out, it’s time to install Jenkins. It’s easy to do by following the instructions you can also find on Jenkins’ website. I’ll add them here too for convenience:

wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins

Jenkins’ UI should now be available on port 8080 (optionally you may want to configure a web server such as Nginx as a frontend to Jenkins). The first thing I recommend doing through the UI is to enable security, otherwise anyone will have access to projects etc. You can secure Jenkins in many ways, but for the sake of simplicity I will suggest here the simplest one, which is based on authentication with username and password. So go to Manage Jenkins > Configure Global Security, and check Enable security. Still on the same page, select Jenkins’ own user database under Security Realm and leave Allow users to sign up enabled for now.

Once that’s done, follow the Sign up link in the top right corner of the page and sign up, creating a new user. Then, back on the Configure Global Security page, select Matrix-based security under Authorisation and add all permissions to the user you have just registered. Then disable Allow users to sign up – unless you do want other people to be able to sign up, rather than manually creating new users as needed.

Then log out and log in again just to make sure everything still works OK. If you have problems after these steps and can no longer access Jenkins, you can reset the security settings and try again.

Job configuration

I’ll assume here you are configuring Jenkins for a Rails project and that you use Git as SCM. Jenkins doesn’t support Git out of the box unfortunately, but you can easily fix this by installing the plugins GIT Plugin and GIT Client Plugin. You can install plugins under Manage Jenkins > Manage plugins > Available, where you can search for those plugins and select to install them (and, I recommend, to restart Jenkins after the plugins are installed so that the changes are effective immediately).

The next step is to create and configure a job. Head to the main page, and then follow New Job; give the job a name and choose the type of job you want to create. In most cases you want to choose Build a free-style software project. You will be taken to the configuration page for the job. Under Source code management, choose Git and enter in Repository URL the URL… of your app’s repository. Before doing this though, make sure you can pull the code on the server by configuring SSH access and anything else needed – basically do a pull test manually from the terminal and ensure it works. Under Branches to build enter one or more branches that you want Jenkins to test against, e.g. */development.

Next, it is very likely that you want Jenkins to build the job automatically each time code is pushed to any of the branches the job is ‘watching’. There are a few ways to do so, called Build triggers on the job configuration page. The two methods I use are Trigger builds remotely with an authentication token and Poll SCM; in the first case, you’ll need to enter a token and then add a hook to the Git repository so that the trigger is automatically activated when new code is pushed. For example, in Bitbucket, you can do this on the page Hooks of the administration area of the repository; the hook to add is of type Jenkins and the format is:

http://USER:TOKEN@JENKINS_URL:8080/

The second method involves enabling Poll SCM in the job configuration page but without a schedule; then you’d add a POST hook with format:

http://JENKINS_URL:8080/git/notifyCommit?url=REPO_URL

In this case you may want to restrict these POST requests with a firewall or similar. Either way, Jenkins will be notified whenever code is pushed and a build will be triggered.
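
You can also verify the wiring by hitting the endpoint manually, e.g. for the Poll SCM variant (same placeholders as above):

curl "http://JENKINS_URL:8080/git/notifyCommit?url=REPO_URL"

If there are new commits, polling runs and a build is scheduled.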

Next, add an Execute shell build step under Build, and paste the following:

. /var/lib/jenkins/.bash_profile
rbenv global 1.9.3-p484
rbenv rehash
bundle install
cp config/database.yml.example config/database.yml
mkdir -p tmp/cache
RAILS_ENV=test bundle exec rake db:migrate db:seed
RAILS_ENV=test bundle exec rspec spec
DISPLAY=localhost:0.0 xvfb-run -a bundle exec cucumber features

Please note that I am assuming here that you have installed Ruby under the user jenkins (which is created automatically when installing Jenkins) with rbenv. If you have installed Ruby in a different way, you will have to adapt the build step accordingly. You may anyway have to make changes depending on your project, but the build step as suggested above should work with most projects.

The last piece of configuration left is email notifications, which you can customise as you like. Remember though to set Jenkins’ own email address under Configure system > Jenkins location.

That’s it – you can now test Jenkins by manually running a build or by pushing some code. Hope it helps.

Multi tenancy with Devise and ActiveRecord’s default scope

Multi tenancy with default scope

Multi tenancy in a Rails application can be achieved in various ways, but my favourite one is using ActiveRecord’s default scope as it’s easy and provides good security. Essentially, the core of this technique is to define a default scope on all the resources owned by a tenant, or account. For example, say you have a tenant model named Account which has many users. The User model could define a default scope as follows:

class User < ActiveRecord::Base
  # ...
  belongs_to :account
  default_scope { where(account_id: Account.current_id) }
  # ...
end

Do see this screencast by Ryan Bates for more details on this model of multi tenancy.
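
For reference, Account.current_id is not something Rails provides; in the screencast it is just a class-level accessor on the tenant model (e.g. cattr_accessor :current_id). A thread-local variant, which is safer with threaded servers, could look like this sketch:

class Account < ActiveRecord::Base
  def self.current_id=(id)
    Thread.current[:current_account_id] = id
  end

  def self.current_id
    Thread.current[:current_account_id]
  end
end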

The problem with this technique is that it often gets in the way of authentication solutions like Devise, which happens to be one of the most popular ones. One common way of implementing multi tenancy with Devise is using subdomains, as suggested in Ryan’s screencast; this works well because it’s easy to determine the tenant/account by just looking up the subdomain, regardless of whether the user is signed in or not. There are cases though when you don’t want to, or can’t, use subdomains; for example, an application that enables vanity URLs with subdomains only for paid users, while using standard authentication for non-paying users. In such a scenario your application needs to implement multi tenancy both with and without subdomains.

So if you need to use the typical Devise authentication while also implementing the multi tenancy with the default scope to isolate the data belonging to each account, this combination won’t work out of the box. The reason is that the user must be already signed in, in order for Devise’s current_user to be defined, and with it – through association – the current account:

class ApplicationController < ActionController::Base
  # ...

  before_filter :authenticate_user!
  around_filter :scope_current_tenant

  private

  # ...

  def scope_current_tenant
    Account.current_id = current_user.account.id if signed_in?
    yield
  ensure
    Account.current_id = nil
  end
end

If the user is not signed in, Account.current_id cannot be set, therefore the default scope on the User model will add a condition (to all the queries concerning users) that account_id must be nil. For example, when the user is attempting to sign in, a query like the following will be generated to find the user:

SELECT `users`.* FROM `users` WHERE `users`.`account_id` IS NULL AND `users`.`email` = 'email@example.com' LIMIT 1

As you can see it looks for a user with account_id not set. However, it is likely that in a multi tenancy application each user belongs to an account, therefore such a query will return no results. This means that the user cannot be found, and the authentication with Devise will fail even though a user with the given email address actually exists and the password is correct. This isn’t the only problem when using Devise together with default scope for multi tenancy without subdomains. Each Devise feature is affected:

  • authentication: the first problem you won’t miss when enabling the default scope in an application that uses Devise for authentication is simply that you won’t be able to sign in. This is because the user cannot be found, for the reasons explained earlier;
  • persistent sessions: once you get the basic authentication working, you will soon notice that the session is not persisted across pages. That is, once signed in you will need to sign in again when you change page in your application. Here the default scope gets in the way when retrieving the user using the session data;
  • password recovery: there are two problems caused by the default scope in the password recovery process. First, as usual, the user cannot be found when supplying a valid email address; second, when reaching the ‘change my password’ form upon following the link in the email the user receives, that form will be displayed again upon submission, and the user won’t actually be able to set the new password. Some investigation when I was trying to fix this showed the reason: since the user cannot be found in that second step of the process (because of the default scope, of course), the token is considered invalid and the password recovery form is rendered again with a validation error;
  • resending confirmation email: this is quite similar to the password recovery; first, the user cannot be found when requesting that the confirmation instructions be sent again; second, the token is considered invalid and the confirmation form is displayed again and again when reaching it by clicking the link in the email.

In order for Devise to find the user in all these cases, it is necessary that it ignore the default scope. This way the query like the one I showed earlier won’t include the condition that the account_id must be nil, and therefore the user can be found. But how to ignore the default scope? As Ryan suggests in his screencast, it’s as simple as calling unscoped before a where clause. unscoped also accepts a block, so that anything executed within the given block will ignore the default scope.
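
For example, using the email from the query above, the two forms compare like this:

# default scope applies: the account_id IS NULL condition sneaks in
User.where(email: "email@example.com").first

# default scope ignored for everything inside the block
User.unscoped { User.where(email: "email@example.com").first }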

So in order to get the broken features working, it is necessary to override some methods that Devise uses to extend the User model, so that these methods use unscoped. I’ll save you some time with researching and just add here the content of a mixin that I use for this purpose:

module DeviseOverrides
  def find_for_authentication(conditions)
    unscoped { super(conditions) }
  end

  def serialize_from_session(key, salt)
    unscoped { super(key, salt) }
  end

  def send_reset_password_instructions(attributes={})
    unscoped { super(attributes) }
  end

  def reset_password_by_token(attributes={})
    unscoped { super(attributes) }
  end

  def find_recoverable_or_initialize_with_errors(required_attributes, attributes, error=:invalid)
    unscoped { super(required_attributes, attributes, error) }
  end

  def send_confirmation_instructions(attributes={})
    unscoped { super(attributes) }
  end

  def confirm_by_token(confirmation_token)
    unscoped { super(confirmation_token) }
  end
end

See the use of unscoped. Then, simply extend the User model with this mixin (which I keep in the lib directory of the app):

class User < ActiveRecord::Base
  # ...
  extend DeviseOverrides
  # ...
end

That’s it. You should now have Devise working just fine with the default scope for multi tenancy in your Rails application, without subdomains. While I was investigating these issues I was wondering: would it be a good idea to update Devise’s code to ensure it always uses unscoped by default? In my opinion this wouldn’t affect existing behaviour and would make this way of doing multi tenancy easier, without having to override any code. What do you think? If you know of a quicker, easier way of achieving the same result, do let me know!

Bitwise operations in Ruby, and an example application to testing with Rspec

I often need to test that something should happen or should be possible depending on a variable number of conditions that are linked together in some arbitrary way. In one application, for example, I have a model called License, and one of the requirements is that it should be possible to ‘activate’ an existing license only if its status is either :new or :suspended. Without going into too much detail about the domain specific to this application, it is clear in this example that a license should be activable only if the following two conditions are met at the same time:

  • the license should exist, that is – in typical Rails terms – the license should be persisted, and
  • the status of the license should be either :new or :suspended

So I have an instance method on the License model that looks like this:

class License < ActiveRecord::Base
  ...
  def activable?
    persisted? and (status_new? or status_suspended?)
  end
  ...
end

persisted? is a method available on all ActiveRecord models, while status_new? and status_suspended? are methods dynamically generated by a module that, included in a model, lets you manage any attribute as an ‘enum’ field that can only contain one of a list of possible values (so, in the example, License can have any of the statuses [:new, :active, :suspended, :revoked, :expired, :transferred]).
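
Just to make the example self-contained, here is a rough sketch of what such a module might generate (my guess at the shape, not the app’s actual code):

module Statuses
  STATUSES = [:new, :active, :suspended, :revoked, :expired, :transferred]

  # Defines status_new?, status_active?, and so on, comparing the
  # status attribute with each allowed value.
  STATUSES.each do |value|
    define_method("status_#{value}?") { status.to_s == value.to_s }
  end
end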

How would you go about testing that this method behaves as expected? One obvious way could be by taking into account all the possible combinations of true/false values that the three variables above (persisted?, status_new?, status_suspended?) could have. persisted? is a framework thing (and as such we won’t test it) and the methods dynamically generated concerning the license status are unit-tested separately, plus there are integration tests to ensure everything is working together properly. So it is safe, in this case, to just stub all the methods with the purpose of testing when a license can be activated, in isolation.

So we could have something like this:

describe "License" do
  subject(:license) { build(:license) }
  ...
  it "can only be activated if persisted, and new or suspended" do
    license.stub(:persisted?).and_return(false)
    license.stub(:status_new?).and_return(false)
    license.stub(:status_suspended?).and_return(false)

    license.should_not be_activable

    # ... some other combinations

    license.stub(:persisted?).and_return(true)
    license.stub(:status_new?).and_return(true)
    license.stub(:status_suspended?).and_return(false)

    license.should be_activable

    # ... some other combinations
  end
  ...
end

One other way some would achieve the same thing is:

describe "License" do
  subject(:license) { build(:license) }
  ...
  it "can only be activated if persisted, and new or suspended" do
    # license isn't persisted yet
    [:new, :suspended].each { |status| license.status = status; license.should_not be_activable }
    (License.statuses - [:new, :suspended]).each { |status| license.status = status; license.should_not be_activable }

    license.save!
    # license is persisted
    [:new, :suspended].each { |status| license.status = status; license.should be_activable }
    (License.statuses - [:new, :suspended]).each { |status| license.status = status; license.should_not be_activable }
  end
  ...
end

Either way, the bottom line is that we’ll likely have to somehow loop through all the possible combinations of boolean values that the three variables persisted?, status_new?, status_suspended? can have, and then determine for each of these combinations what the expected behaviour is (in the example: whether the license can be activated or not). While both examples (as in “Rspec examples”) would work, both suffer from quite a bit of duplication and reduced readability, in that it is not clear right away, just by looking at the code, what the relationship is between the conditions we’re taking into account. In our case, we have an “A and (B or C)” kind of relation that is instantly clear only if you read the title of the Rspec examples.

Another way of testing the same thing, which I prefer, is as follows:

describe "License" do
  subject(:license) { build(:license) }
  ...
  it "can only be activated if persisted, and new or suspended" do
    boolean_combinations(3).each do |persisted, _new, suspended|
      license.stub(:persisted?).and_return(persisted)
      license.stub(:status_new?).and_return(_new)
      license.stub(:status_suspended?).and_return(suspended)

      if persisted and (_new or suspended)
        license.should be_activable
      else
        license.should_not be_activable
      end
    end
  end
  ...
end

As you can guess, thanks to the boolean_combinations method (more on that later) we’re still looping through all the possible combinations, but with no duplication, and the example is more readable. The advantage is that just by looking at the code, you can understand right away what the relationship is between what I’m calling “the variables” and how the various conditions are linked together. In particular, the code

...
if persisted and (_new or suspended)
  license.should be_activable
else
  license.should_not be_activable
end
...

clearly says when a license should be activable. It’s good from a “driving code by tests/specs” standpoint in that it suggests the “code we wish we had”. Before going ahead, one note on having multiple expectations in the same example: I usually prefer to keep one expectation per example, as it is often recommended for clarity and simplicity; however in cases like the above all the expectations in the single example define only together the behaviour being tested. None of them, taken individually, would define any behaviour of the subject of the test, and the alternative would be several almost identical specs with perhaps some nested contexts (depending on the combinations of conditions being tested) that would IMO be overkill in such cases. This is particularly true if you have even more than 3 different variables involved in the conditions being tested.

So, back to the boolean_combinations method: how do we generate all the possible combinations of boolean values, given any number of variables? If you’ve studied some electronics in particular, the answer should be pretty obvious. What we need is the same kind of truth table often used to figure out how to simplify, or reduce, some logical operations (for example with Karnaugh maps or similar methods). Such a table looks like the following (for 3 variables, as in our example) and should be pretty familiar:

a | b | c
---+---+---
F | F | F
F | F | T
F | T | F
F | T | T
T | F | F
T | F | T
T | T | F
T | T | T

One possible ‘Rubyish’ way to generate this table is with the combination method available on arrays in Ruby. It expects the number of items you want for each combination, and will produce all the possible combinations of the given array’s items, e.g.

1.9.3p194 :038 > ['a', 'b', 'c'].combination(2).to_a
=> [["a", "b"], ["a", "c"], ["b", "c"]]

In our example however we have a slightly different case, since we want combinations of 3 items each, but each of the items can have either true or false value. If we tried with

1.9.3p194 :040 > [true, false].combination(3).to_a
=> []

we would be out of luck because the array doesn’t have enough items to match the number of items required for each combination, therefore the result is an empty array. We can work around this by just ‘extending’ the original array, for example by multiplying it for the number of items we want in each combination:

1.9.3p194 :041 > ([true, false]*3).combination(3).to_a
=> [[true, false, true], [true, false, false], [true, false, true], [true, false, false], [true, true, false], [true, true, true], [true, true, false], [true, false, true], [true, false, false], [true, true, false], [false, true, false], [false, true, true], [false, true, false], [false, false, true], [false, false, false], [false, true, false], [true, false, true], [true, false, false], [true, true, false], [false, true, false]]

The good thing is that this way we do get all combinations we are looking for; the bad thing is that we get lots of duplicates – the more items per combination, the more duplicates. So we’d have to use uniq on the result to remove those duplicates:

1.9.3p194 :044 > ([true, false]*3).combination(3).to_a.uniq
=> [[true, false, true], [true, false, false], [true, true, false], [true, true, true], [false, true, false], [false, true, true], [false, false, true], [false, false, false]]

This does indeed produce all the combinations we are after, but it’s not terribly efficient, although it is short and simple.

Using bitwise logic

Another way to generate all the boolean combinations, in a perhaps less ‘Rubyish’ but more efficient way, is to use a bitwise operation.

First, we need to know in advance how many possible combinations we have with the given number of variables, since this will be required in the algorithm we’ll see shortly; one way to figure out the number of combinations (or range) is with the shift-left operator (which shouldn’t be confused with the same-looking operators on types other than integers):

number_of_combinations = 1 << n

As the name of this operator might suggest to some, what it does is shift each bit of the binary representation of the number to the left by n positions. The number of positions is simply the number of elements we want in each combination (that is also the number of variables we want to produce the boolean table for). So, in our example, we have:

number_of_combinations = 1 << 3

The binary representation of 1 is 0001 (using some leading zeros for clarity), so if we shift each bit by 3 positions to the left, we get 1000, which is the binary representation of the number 8. Similarly, 1 << 2 means shifting each bit of 0001 by 2 positions to the left, so we get 0100, which is the binary representation of the number 4. You’d quickly guess that the operation 1 << n is equivalent to the operation 2**n (the number two raised to the nth power), so the formula usually used to calculate the number of combinations is instead:

number_of_combinations = 2**number_of_items

as it’s a bit easier to remember and understand. In our case we have 2**3 = 8 combinations. The table we’ve seen earlier is equivalent to the following table:

a | b | c
---+---+---
0 | 0 | 0
0 | 0 | 1
0 | 1 | 0
0 | 1 | 1
1 | 0 | 0
1 | 0 | 1
1 | 1 | 0
1 | 1 | 1

Or also, using the variables in our example:

 persisted | status_new | status_suspended | combination index (I)
-----------+------------+------------------+----------------------
     0     |     0      |        0         |           0
     0     |     0      |        1         |           1
     0     |     1      |        0         |           2
     0     |     1      |        1         |           3
     1     |     0      |        0         |           4
     1     |     0      |        1         |           5
     1     |     1      |        0         |           6
     1     |     1      |        1         |           7

Interestingly, each combination of 1s and 0s on each line of the table is basically the binary representation of the index I of the combination. So 4 = 100, 6 = 110, and so on.

So, if we look at this table we can see that for each combination with index I, each variable will have value true or false depending on whether their corresponding bit in the binary representation of I is set or not. For example, in the combination with index 7, all the three variables will have value true since their corresponding bits in the binary representation of 7, which is 111, are all set. Similarly, in the combination with index 5, persisted and status_suspended will have value true since their bits in the binary representation of 5 (101) are set, while status_new will be false because its bit in the binary representation of 5 isn’t set.

We can say the same thing this way too: given a combination with index I, and an element of the original array with index J, the element J will have value true in the combination I only if the Jth bit (from the right) of the binary representation of I is set.

Given a number m, how do we check if the bit at position n of the binary representation of m is set? The “canonical” way to do this is to perform a binary and operation between m and the number that has only the bit at position n set. A binary and is simply an and operation performed bit by bit. So for example, if we wanted to check whether the 3rd bit of the binary representation of 16 (which is 10000) is set, we would do:

16 = 10000 &
 4 = 00100
     -----
     00000 = 0 (3rd bit not set)

since 00100 (or 4) is the binary number having only the 3rd bit set. So the operation is equivalent to 16 & 4. In this example, the result of the bit-by-bit and operation is 0, meaning that the 3rd bit of 10000 (16) isn’t set – and in fact it is not. As another example, let’s now check if the 4th bit of 27 is set. The binary representation of 27 is 11011, while the binary number having only the 4th bit set is 10000, or 16. So we have:

27 = 11011 &
16 = 10000
     -----
     10000

The result is 10000, or 16, and the “rule” is that, given:

  • m, a number
  • n, the position of one bit in the binary representation of m
  • o, the number that has only the nth bit set

the nth bit of the binary representation of m is set if the operation m & o yields a result that differs from zero.

Back to the table above, let’s check for example why persisted has a value of true in the combination with index 5 (“m”). Persisted, in the table, corresponds to the third bit (“n”) of the binary representation of 5, so we need to find out if the 3rd bit from the right in 101 (= 5) is set; the number with only the 3rd bit set (“o”) is 100, or 4, so the operation we need is (using numbers directly in decimal notation):

1.9.3p194 :048 > 5 & 4
=> 4

The result is != 0, meaning that the 3rd bit of 5 is indeed set and, in turn, that for that combination, persisted has value true.

Applying all of the above to write a first version of an algorithm to generate all the boolean combinations we wanted in the first place, we get:

def boolean_combinations(number_of_elements)
  number_of_combinations = 2 ** number_of_elements
  combinations = []

  (0...number_of_combinations).each do |combination_index|
    combination = Array.new(number_of_elements)

    (0...number_of_elements).each do |element_index|
      combination[element_index] = (combination_index & 2**element_index) != 0
    end

    combinations << combination.reverse
  end

  combinations
end

This indeed produces all the combinations we’re after:

1.9.3p194 :106 > boolean_combinations(3).each {|c| p c}; nil
[false, false, false]
[false, false, true]
[false, true, false]
[false, true, true]
[true, false, false]
[true, false, true]
[true, true, false]
[true, true, true]
=> nil

We can simplify this code a bit. Firstly, with the bit reference operator fix[n] (which returns 0 or 1) available on Fixnum objects, we can get the nth bit of the binary representation of a given number right away, and then return either true or false depending on whether that bit is set or not. So the code

...
combination[element_index] = (combination_index & 2**element_index) != 0
...

is equivalent to

...
combination[element_index] = combination_index[element_index] == 1
...
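
For instance, in irb (5 is 101 in binary):

1.9.3p194 :001 > 5[0]
 => 1
1.9.3p194 :002 > 5[1]
 => 0
1.9.3p194 :003 > 5[2]
 => 1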

Secondly, we can slightly simplify the two nested loops with map:

def boolean_combinations(number_of_elements)
  (0...2**number_of_elements).map do |i|
    (0...number_of_elements).map { |j| i[j] == 1 }
  end
end

I’m not sure of the performance difference (we are talking about tiny arrays here) but perhaps I would prefer the other, more readable version. So I keep a file in spec/support containing the boolean_combinations method, and use it in some Rspec examples as shown earlier:

describe "License" do
  subject(:license) { build(:license) }
  ...
  it "can only be activated if persisted, and new or suspended" do
    boolean_combinations(3).each do |persisted, _new, suspended|
      license.stub(:persisted?).and_return(persisted)
      license.stub(:status_new?).and_return(_new)
      license.stub(:status_suspended?).and_return(suspended)

      if persisted and (_new or suspended)
        license.should be_activable
      else
        license.should_not be_activable
      end
    end
  end
  ...
end

There are various other operations that can be done with bitwise logic, including some tricks involving databases that I’ll perhaps show in some other post.

Update 16/08/2012: reader Luca Belmondo sent me a comment pointing out that the built-in Array#repeated_permutation does exactly the same thing.

1.9.3p194 :004 > [true, false].repeated_permutation(3).to_a
=> [[true, true, true], [true, true, false], [true, false, true], [true, false, false], [false, true, true], [false, true, false], [false, false, true], [false, false, false]]

I hadn’t noticed this method before, nor Array#repeated_combination (it looks like both were introduced in Ruby 1.9), so this is a fresh reminder that it’s always better to check out what’s already available before reinventing the wheel. Thanks Luca!

Resque: automatically kill stuck workers and retry failed jobs

Resque is a great piece of software by Github that makes it really easy to perform some operations ('jobs') asynchronously and in a distributed way across any number of workers. It's written in Ruby and backed by the uber cool Redis key-value data store, so it's efficient and scalable. I've been using Resque in production for a couple of years now, after it replaced Delayed Job in my projects, and I love it. If your projects do something that could be done asynchronously, you really should check it out if you haven't yet.

At OnApp we've been using Resque for a while to process background jobs of various types, with great results: in a few months, we've processed a little over 160 million jobs (at the moment of this writing), and of those, only 43K jobs have been counted as failed so far. However, many of these failed jobs were retried successfully on a subsequent attempt, so the number of jobs that actually failed is a lot smaller, perhaps a few thousand.

Out of 160M+ jobs, that's a very small percentage of failures. But although the system has been rock solid for the most part, jobs can still fail every now and then depending on the nature of the jobs, excessive load on the worker servers, temporary networking or timeout issues, or design-related issues such as race conditions and the like. Sometimes you will also find that workers get "stuck", usually requiring manual intervention (as in: kill/restart the workers, manually sort out failed jobs).

So I wanted to share a simple script I am using in production to automatically find and kill these "stuck" workers, and then retry any jobs flagged as failed, whether because the workers were killed or for any other reason. The purpose is to keep workers running and minimise the need for manual intervention when something goes wrong.

Please note that I use resque-pool to manage a pool of workers more efficiently on each worker server. Therefore if you manage your workers in a different way, you may need to adapt the script to your configuration.

You can find the little script in this gist, but I’ll briefly explain here how it works. It’s very simple, really. First, the script looks for the processes that are actually working off jobs:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque
10088 resque-pool-master: managing [10097, 10100, 10107, 10113, 10117, 10123, 10138, 10160, 10167, 10182, 10195]
10097 resque-1.20.0: Forked 16097 at 1337878130
10100 resque-1.20.0: Forked 16154 at 1337878131
10107 resque-1.20.0: Waiting for cdn_transactions_collection
10113 resque-1.20.0: Waiting for usage_data_collection
10117 resque-1.20.0: Waiting for usage_data_collection
10123 resque-1.20.0: Waiting for check_client_balance
10138 resque-1.20.0: Waiting for check_client_balance
10160 resque-1.20.0: Waiting for geo_location
10167 resque-1.20.0: Forked 16160 at 1337878131
10182 resque-1.20.0: Forked 16163 at 1337878132
10195 resque-1.20.0: Waiting for services_coordination
16097 resque-1.20.0: Processing push_notifications since 1337878130
16163 resque-1.20.0: Processing push_notifications since 1337878132

This is an example from one of our worker servers. The Processing processes are those that are actually working off jobs, so these are the ones we are after, since these are the processes that can sometimes get "stuck" for one reason or another. So the script first looks for these processes only, ignoring the rest:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque | grep Processing
18956 resque-1.20.0: Processing push_notifications since 1337878334
19034 resque-1.20.0: Processing push_notifications since 1337878337
19052 resque-1.20.0: Processing usage_data_collection since 1337878338
19061 resque-1.20.0: Processing usage_data_collection since 1337878338
19064 resque-1.20.0: Processing usage_data_collection since 1337878339
19066 resque-1.20.0: Processing usage_data_collection since 1337878339

Next, the script loops through these processes, and looks for those that have been running for over 50 seconds. You may want to change this threshold, but in our case all jobs should usually complete in a few seconds, so if some jobs are still found after almost a minute, something is definitely going on.

ps -eo pid,command |
grep [r]esque |
grep "Processing" |
while read PID COMMAND; do
  if [[ -d /proc/$PID ]]; then
    SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

    if [ $SECONDS -gt 50 ]; then
      kill -9 $PID
      ...

      QUEUE=`echo "$COMMAND" | cut -d ' ' -f 3`

      echo "
      The forked child with pid #$PID (queue: $QUEUE) was found stuck for longer than 50 seconds.
      It has now been killed and job(s) flagged as failed as a result have been re-enqueued.

      You may still want to check the Resque Web UI and the status of the workers for problems.
      " | mail -s "Killed stuck Resque job on $(hostname) PID $PID" email@address.com

      ...
    fi
  fi
done

I was looking for a nice and easy way to find out how long (in seconds) a process had been running, and the expression you see in the code snippet above was the nicest solution I could find (hat tip to joseph for this).
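For reference, here's a minimal Ruby sketch of the same computation (assuming Linux's /proc filesystem and, like the script, 100 clock ticks per second):

# Age of a process in seconds: system uptime minus the process start time.
# The 22nd field of /proc/[pid]/stat is the start time, in clock ticks since boot.
def process_age_in_seconds(pid)
  uptime    = File.read("/proc/uptime").split.first.to_i
  starttime = File.read("/proc/#{pid}/stat").split[21].to_i / 100
  uptime - starttime
end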

If any of the Resque processes that are working off jobs are found running for longer than 50 seconds, then these are killed without mercy and a notification is sent to some email address just in case.

First, this way we don't actually kill the Resque workers themselves, but the processes they fork in order to process jobs. This means that the workers remain up and running, and soon after they'll fork new processes to work off other jobs from the queue(s) they are watching. This is the nicest part: you don't need to manually kill the actual workers and then restart them to keep the worker servers going.

Second, killing those processes will cause the jobs that they were processing to fail, so they will appear in Resque’s “failed jobs” queue. The second part of the script takes care of this by running a rake task that re-enqueues all failed jobs and clears the failed jobs queue. For starters, you’ll need to add this rake task to your application. If you are already using Resque, you will likely have a lib/tasks/resque.rake file, otherwise you’ll have to create one (I’m assuming here it’s a Rails application).

In any case, add the following task to that rake file:

desc "Retries the failed jobs and clears the current failed jobs queue at the same time"
task "resque:retry-failed-jobs" => :environment do
(Resque::Failure.count-1).downto(0).each { |i| Resque::Failure.requeue(i) }; Resque::Failure.clear
end

Back to the script: if it finds and kills any workers that it found stuck, it then proceeds to run the above rake task to retry the failed jobs:

ps -eo pid,command |
grep [r]esque |
grep "Processing" |
while read PID COMMAND; do
  if [[ -d /proc/$PID ]]; then
    SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

    if [ $SECONDS -gt 50 ]; then
      ...
      touch /tmp/retry-failed-resque-jobs
      ...
    fi
  fi
done

if [[ -f /tmp/retry-failed-resque-jobs ]]; then
  /bin/bash -c 'export rvm_path=/usr/local/rvm && export HOME=/home/deploy && . $rvm_path/scripts/rvm && cd /var/www/sites/dashboard/current/ && /usr/local/bin/rvm rvmrc load && RAILS_ENV=production bundle exec rake resque:retry-failed-jobs'
fi

You may notice that I am forcing the loading of RVM before running the rake task; this is because I need to upgrade some stuff on the worker servers, but you may not need to run the rake task this way.

This is basically it: the script just kills the stuck workers and retries the failed jobs without requiring manual intervention; in almost all cases, I no longer have to worry about them, beyond wondering whether there's a design issue that might cause workers to get stuck and that therefore needs to be addressed (which is a good reason to keep an eye on the notifications). There might be other monitoring solutions of various types out there, but this simple script is what has been working best for me so far on multiple worker servers with tens of workers.

The final step is to ensure that this script runs frequently, so as to fix problems as soon as they arise. The script is extremely lightweight, so in my case I just schedule it (with cron) to run every minute on each server.
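For example, with a crontab entry along these lines (the script path here is hypothetical):

* * * * * /scripts/kill_stuck_resque_workers.sh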

Know of a better way of achieving the same result? Please do let me know in the comments.

share_counts Ruby gem: The easiest way to check how many times a URL has been shared on social networks!

I was looking for a way to quickly check at once how many times a URL has been shared on the most popular social networks and aggregators, but I couldn't find any. So I wrote some code to query these social networks' APIs, and I thought it might be useful to others, so why not gem it up? In fact, I've already had confirmation that it may be useful to others: I published the gem only hours ago and, even though I am only talking about it now, almost 20 people have already downloaded it!

At the moment, the gem (named share_counts) supports the following social networks and aggregators:

  • Reddit
  • Digg
  • Twitter
  • Facebook (Shares and Likes)
  • LinkedIn
  • Google Buzz
  • StumbleUpon

I may add support for other networks if needed, and I will most likely extend the gem with other methods to leverage more of what the APIs offer, so stay tuned.

Using the gem

Github repo: https://github.com/vitobotta/share_counts.
On RubyGems: https://rubygems.org/gems/share_counts/stats.

Once you have installed the gem with the usual gem install share_counts, it’s very easy to use. For example, if you want to check the Reddit score for a story, you can call the method reddit with the URL as argument:

ruby-1.9.2-p0 :001 > require "share_counts"
=> true

ruby-1.9.2-p0 :016 > ShareCounts.supported_networks
=> ["reddit", "digg", "twitter", "facebook", "fblike", "linkedin", "googlebuzz", "stumbleupon"]

ruby-1.9.2-p0 :002 > ShareCounts.reddit "https://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to reddit...
=> 5

It works the same way with the other supported networks. Only for Facebook are there two methods available rather than one, since Facebook has both "shares" and "likes":

ruby-1.9.2-p0 :003 > ShareCounts.facebook "https://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to facebook...
=> 1

ruby-1.9.2-p0 :004 > ShareCounts.fblike "https://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to fblike...
=> 0

You can also get both shares and likes together:

ruby-1.9.2-p0 :007 > ShareCounts.fball "https://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to fball...
=> {"share_count"=>1, "like_count"=>0}

You can also get the share counts for all the supported services in one call, or specify which ones you are interested in:

ruby-1.9.2-p0 :005 > ShareCounts.all "https://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to reddit...
Redis caching is disabled - Making request to digg...
Redis caching is disabled - Making request to twitter...
Redis caching is disabled - Making request to facebook...
Redis caching is disabled - Making request to fblike...
Redis caching is disabled - Making request to linkedin...
Redis caching is disabled - Making request to googlebuzz...
Redis caching is disabled - Making request to stumbleupon...
=> {:reddit=>4, :digg=>1, :twitter=>2, :facebook=>1, :fblike=>0, :linkedin=>2, :googlebuzz=>0, :stumbleupon=>0}

ruby-1.9.2-p0 :006 > ShareCounts.selected "https://vitobotta.com/awesomeprint-similar-production/", [ :reddit, :linkedin ]
Redis caching is disabled - Making request to reddit...
Redis caching is disabled - Making request to linkedin...
=> {:reddit=>4, :linkedin=>2}

In these cases you’ll get back a hash instead.

At this point, you may have noticed the message "Redis caching is disabled" being printed with each call. That's because I had caching disabled. Since I've noticed that a) some of these social networks' APIs aren't available/working 100% of the time, and b) some of them may apply rate limiting if you make too many requests in a short period of time, the gem also supports caching with Redis.

By default caching is disabled, since you may not be running Redis, you may want to use some other caching in your application, or you may not want to use caching at all. So if you do want caching, the first step is to enable it. By default, share_counts assumes that Redis is listening on 127.0.0.1:6379, but you can override this by setting the global variable $share_counts_cache in advance if you already have a reference to a Redis connection (and you're using the same redis gem used by share_counts), by passing that reference as an argument, or by specifying host and port when you enable caching:

# Using caching with the default settings
ruby-1.9.2-p0 :009 > ShareCounts.use_cache
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

# Using an existing reference to a connection to Redis
$share_counts_cache = a_Redis_connection

# Same thing as above, but by passing the reference to the connection as argument
ruby-1.9.2-p0 :010 > ShareCounts.use_cache :redis_store => a_Redis_connection

# Specifying host and port for the connection to Redis
ruby-1.9.2-p0 :010 > ShareCounts.use_cache :host => "localhost", :port => 6379
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

Cached share counts expire by default in 2 minutes, but you can again override this by setting the global variable $share_counts_cache_expire to a value in seconds.
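For example:

# cache share counts for 10 minutes instead of the default 2
$share_counts_cache_expire = 600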

So, let’s compare now using the gem with and without caching:

ruby-1.9.2-p0 :002 > require 'benchmark'
=> true

# Enabling caching
ruby-1.9.2-p0 :003 > ShareCounts.use_cache
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

# First run, values are not cached
ruby-1.9.2-p0 :004 > Benchmark.realtime { ShareCounts.all "https://vitobotta.com/awesomeprint-similar-production/" }
Making request to reddit...
Making request to digg...
Making request to twitter...
Making request to facebook...
Making request to fblike...
Making request to linkedin...
Making request to googlebuzz...
Making request to stumbleupon...
=> 3.7037899494171143

# Now values are cached
ruby-1.9.2-p0 :005 > Benchmark.realtime { ShareCounts.all "https://vitobotta.com/awesomeprint-similar-production/" }
Loaded reddit count from cache
Loaded digg count from cache
Loaded twitter count from cache
Loaded facebook count from cache
Loaded fblike count from cache
Loaded linkedin count from cache
Loaded googlebuzz count from cache
Loaded stumbleupon count from cache
=> 0.003225088119506836

You can see which URLs have been cached, along with the share counts available for each, with the cached method:

ruby-1.9.2-p0 :013 > ShareCounts.cached
=> {"https://vitobotta.com/awesomeprint-similar-production/"=>{:fblike=>0, :stumbleupon=>0, :linkedin=>2, :googlebuzz=>0, :facebook=>1, :twitter=>2, :digg=>1, :reddit=>5}}

Also, if you need you can clear the cached values:

ruby-1.9.2-p0 :013 > ShareCounts.cached
=> {"https://vitobotta.com/awesomeprint-similar-production/"=>{:fblike=>0, :stumbleupon=>0, :linkedin=>2, :googlebuzz=>0, :facebook=>1, :twitter=>2, :digg=>1, :reddit=>5}}
ruby-1.9.2-p0 :014 > ShareCounts.clear_cache
=> ["ShareCounts||fblike||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||stumbleupon||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||linkedin||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||googlebuzz||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||facebook||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||twitter||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||digg||https://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||reddit||https://vitobotta.com/awesomeprint-similar-production/"]
ruby-1.9.2-p0 :015 > ShareCounts.cached
=> {}

Notes:

  • If a request fails for one network, its share count won't be cached and will remain set to nil. This way you can easily know whether a service's API failed by just checking whether its share count for the given URL is nil (see the sketch after this list).
  • Since you may be already using Redis in your app for something else, the gem namespaces the keys so that if you clear the cache, only its keys will be deleted.
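For instance, a quick way to spot failed services could look like this (a sketch; the URL is just an example):

counts = ShareCounts.all "https://vitobotta.com/awesomeprint-similar-production/"
failed = counts.select { |network, count| count.nil? }
puts "These APIs failed: #{failed.keys.join(', ')}" unless failed.empty?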

A look at the code

The code is on Github if you want to have a look. Here I’ll highlight a few things.

All the methods to retrieve share counts for each supported service are wrapped in the module ShareCounts, as you may already have guessed:

module ShareCounts

  extend Common
  extend Caching

  def self.supported_networks
    %w(reddit digg twitter facebook fblike linkedin googlebuzz stumbleupon)
  end

  def self.reddit url
    try("reddit", url) {
      extract_count from_json( "http://www.reddit.com/api/info.json", :url => url ),
                    :selector => "data/children/data/score"
    }
  end

  ...

In particular, the try method will try to fetch the requested share count(s) either by making an HTTP request (or multiple requests, depending on which share counts are being requested) or, if caching is enabled, by reading from the cache.

def try service, url, &block
  cache_key = "ShareCounts||#{service}||#{url}"

  if cache_enabled?
    if result = from_redis(cache_key)
      puts "Loaded #{service} count from cache"
      result
    else
      puts "Making request to #{service}..."
      to_redis(cache_key, yield)
    end
  else
    puts "Redis caching is disabled - Making request to #{service}..."
    yield
  end
rescue Exception => e
  puts "Something went wrong with #{service}: #{e}"
end

Since most of these APIs follow a common pattern, HTTP requests are made with the assumption that APIs will return a JSON response, with or without a callback method; if a callback method is provided, the response is first manipulated to extract just the JSON data we need.

The make_request method will attempt a request to a network's API a maximum of three times, with a timeout of 2 seconds for each attempt. There's a reason for this: while I was testing these APIs, I noticed that in most cases, if a request didn't return within a couple of seconds, it then either timed out after a long time or returned a 503 Service Unavailable status code. From this point of view, I must say I was surprised to see that Digg's API was likely the least reliable of the bunch, returning a 503 code too often, even though I wasn't making too many requests in a short period of time, so I doubt this was due to rate limiting. Anyway, the combination of a 2-second timeout and three attempts means we expect a response from each service within a few seconds, which is a good compromise if you use caching.

To make requests, I am using one of my favourite gems, rest-client (from the Github user archiloque's fork, since it seems to be more up to date than the original one by Heroku's Adam Wiggins):

def make_request *args
  result   = nil
  attempts = 1

  begin
    timeout(2) do
      url      = args.shift
      params   = args.inject({}) { |r, c| r.merge! c }
      response = RestClient.get url, { :params => params }

      # if a callback is specified, the expected response is in the format "callback_name(JSON data)",
      # with the response ending with ";" and, in some cases, "\n"
      result = params.keys.include?(:callback) \
        ? response.gsub(/^(.*);+\n*$/, "\\1").gsub(/^#{params[:callback]}\((.*)\)$/, "\\1") \
        : response
    end
  rescue Exception => e
    puts "Failed #{attempts} attempt(s)"
    attempts += 1
    retry if attempts <= 3
  end

  result
end
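To make the two gsub calls concrete, here's what they do to a JSONP-style response (the payload is made up for illustration):

response = 'google_buzz_set_count({"http://example.com/":3});' + "\n"

response.gsub(/^(.*);+\n*$/, "\\1")
        .gsub(/^google_buzz_set_count\((.*)\)$/, "\\1")
# => '{"http://example.com/":3}'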

As for extracting the actual share counts from each API's response, I was pleased to see a common pattern in the usage of JSON, so it was as easy as writing a simple method that "queries" the JSON data in a way that somewhat recalls XPath for XML. The arguments are the JSON data and a single-key hash of the form :selector => "where/the/share/count/is":

def extract_count *args
  json = args.shift
  result = args.first.flatten.last.split("/").inject( json.is_a?(Array) ? json.first : json ) { |r, c|
    r[c].is_a?(Array) ? r[c].first : r[c]
  }
end
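For example, with a stripped-down version of reddit's response (simplified here for illustration), the selector walks the nested structure like this:

json = { "data" => { "children" => [ { "data" => { "score" => 5 } } ] } }

extract_count json, :selector => "data/children/data/score"
# => 5 ("children" holds an Array, so the method steps into its first element)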

The stuff needed for caching with Redis is in a separate mix-in. If you haven't used Redis yet, you can see its most basic usage by looking at this code.

To initialise a connection and optionally specify host and port:

def use_cache *args
  arguments = args.inject({}) { |r, c| r.merge(c) }
  $share_counts_cache ||= arguments[:redis_store] ||
    Redis.new(:host => arguments[:host] || "127.0.0.1", :port => arguments[:port] || "6379")
end

To read from and write to Redis:

def from_redis(cache_key)
  value = $share_counts_cache.get(cache_key)
  return if value.nil?
  Marshal.load value
end

def to_redis(cache_key, value)
  $share_counts_cache.set cache_key, Marshal.dump(value)
  $share_counts_cache.expire cache_key, $share_counts_cache_expire || 120
  value
end

Then we have methods to return all the cached values used by the gem, and to clear those cached values:

def cached
  urls = ($share_counts_cache || {}).keys.select { |k| k =~ /^ShareCounts/ }.inject({}) do |result, key|
    data    = key.split("||")
    network = data[1]
    url     = data[2]
    count   = from_redis("ShareCounts||#{network}||#{url}")
    (result[url] ||= {})[network.to_sym] = count unless ["all", "fball"].include? network
    result
  end
  urls
end

def clear_cache
  ($share_counts_cache || {}).keys.select { |cache_key| cache_key =~ /^ShareCounts/ }.each { |cache_key|
    $share_counts_cache.del cache_key
  }
end

As you can see, keys are sort of "namespaced", and I am using inject to build a hash of the cached URLs and their share counts.

APIs: a few exceptions to the “rule”

As said, most of the APIs of the supported social networks follow a common pattern in their usage of JSON data. However, there were two exceptions. The first, a small one, is Google Buzz's API, which returns a JavaScript object (instead of an array) whose only property is the URL passed as argument; the value of that property is the actual share count on Google Buzz. So in this case, rather than using the extract_count method as with the other JSON-based APIs, all I had to do was get the value of that property once the JSON response was parsed:

def self.googlebuzz url
  try("googlebuzz", url) {
    from_json("http://www.google.com/buzz/api/buzzThis/buzzCounter",
      :url => url, :callback => "google_buzz_set_count" )[url]
  }
end

The second exception is StumbleUpon. I was surprised and disappointed to see that they don't have an API yet (unless I missed it); it looks like StumbleUpon is a little behind the competition on this front. Luckily, despite the lack of an API, it wasn't much more difficult to fetch share counts for SU too: once I had identified the HTML returned when their button is displayed, I could use Nokogiri to extract the share count with XPath:

def self.stumbleupon url
  try("stumbleupon", url) {
    Nokogiri::HTML.parse(
      make_request("http://www.stumbleupon.com/badge/embed/5/", :url => url )
    ).xpath( "//body/div/ul/li[2]/a/span" ).text.to_i
  }
end

So this was a quick look at the code as it is now, but I expect to add more methods to fetch more information from the APIs, so keep an eye on the Github repo if you plan on using this gem.

Also, if you have any suggestions on what to add or how to improve it, please let me know in the comments.