Gentoo: using Postfix with an external SMTP service

Sometimes I want my computer at home to send email notifications to my address, for example reports from scheduled tasks.

The problem is that if you just install Postfix or another MTA with a default configuration, emails sent from your home computer may be flagged as spam, or mailing may not work at all due to restrictions ISPs often put in place, also to prevent spam.

One workaround is to configure e.g. Postfix to use an external SMTP service such as SendGrid to send the emails. Here I’ll show how to do this on Gentoo.

The first thing you need to do is install Postfix. Edit /etc/portage/package.use and add:

>=mail-mta/postfix-3.1.0-r1 sasl

(of course, you may have to specify a different version). Then run:

sudo emerge -av mail-mta/postfix

I also suggest installing mailutils, as it includes a utility you can use to test email sending:

sudo emerge -av net-mail/mailutils

Next, edit /etc/postfix/sasl_passwd and add the following line, which contains the address and port of the SMTP service and the credentials required for authentication:

[smtp.sendgrid.net]:587 username:password

Then create a database from this file and restrict its permissions with the following commands:

sudo postmap /etc/postfix/sasl_passwd
sudo chown root:root /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db
sudo chmod 0600 /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db
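
Optionally, you can verify the lookup table works with postmap's query mode; the output should match the credentials line you added to the file:

```shell
# Query the hash table for the relay entry; prints the stored "username:password"
postmap -q "[smtp.sendgrid.net]:587" hash:/etc/postfix/sasl_passwd
```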

Also run:

sudo newaliases
sudo postmap /etc/mail/aliases

Now edit /etc/postfix/main.cf and add the following:

relayhost = [smtp.sendgrid.net]:587
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_use_tls = yes
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
myhostname = <hostname>
mydomain = <hostname>

Please note that you need to set an FQDN hostname on your computer that has already been validated with the SMTP service.
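
For example, to check and set the hostname (myhost.example.com is just a placeholder for your validated FQDN):

```shell
# Show the current fully qualified hostname
hostname --fqdn

# Set it permanently on a systemd-based system
hostnamectl set-hostname myhost.example.com
```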

Finally, restart Postfix:

sudo systemctl restart postfix.service

You can test that mailing works with the mail utility:

echo blah | mail -s "test" <your email address>

To check the logs you can run:

journalctl -f -u postfix
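
If a test message doesn't arrive, it can also help to inspect the mail queue with the standard Postfix tools:

```shell
# List messages currently sitting in the queue, with any deferral reasons
postqueue -p

# Attempt to deliver all queued mail immediately
postqueue -f
```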

That’s it. All emails sent from your computer will now be sent through the third-party SMTP service.

Migrating a Google Analytics property to another account

I’ve had two Google Analytics accounts for a few years now: the first with just one property (for this blog), and the other for the other sites/apps I manage. Today I wanted to migrate the blog property to the second account so as to keep everything under the same account, and I was happy to see that this is now possible. I’m not sure when things changed, but I had tried this some time ago without success.

I did find a help page by Google about this, but it was confusing as to which exact permissions I had to enable and where, so here’s what I did in case someone else finds it confusing too.

So, assuming you own a Google Analytics account A and another account B, and want to migrate/move a property from A to B, the first thing you need to do is open the property settings under Admin in account A:

[screenshot]

Then you have to add the user of account B under User Management and enable all the permissions for it. Here’s the confusing part: there is a User Management section for both the property and the account. From reading the Google help pages it wasn’t clear which one it was; it turns out you want to head to the account‘s user management:

[screenshot]

Here you need to add the email address for account B and enable all the permissions:

[screenshot]

Once you’ve done this, head back to Admin > Property Settings and click the Move property button.

[screenshot]

Select account B from the drop-down and confirm the changes. That’s it. Give it a few minutes and the property will be moved to account B.

Encrypted Gentoo Installation on MacBook Pro

It looks like it’s been a while again since I last posted something… but here I am. About three months ago I was planning to replace my late 2013 iMac 27″ with a Mac Pro; overall I liked the iMac a lot but from time to time I do some video editing/encoding and the iMac got very hot and noisy each time. So I was originally thinking to switch to a Mac Pro mainly for this reason. However there was no sight of new Mac Pros and the ones currently available are still ridiculously expensive considering that we are talking about hardware released in 2013; with much less money you can easily build yourself a much more powerful machine, and so I did. I sold the iMac and with half the amount I’d have spent for a Mac Pro I bought all the parts (plus two 27″ monitors, new keyboard/mouse and webcam!) and built a very powerful machine with recent hardware. It’s very fast and very quiet even overclocked.

I initially thought I’d use the new PC as a Hackintosh and install macOS on it as the primary OS, but having used a Hackintosh in the recent past I didn’t want to deal again with the hassle of getting the computer to work with macOS, knowing that with each big update there is a risk the OS could stop working altogether.

So the primary candidate was Ubuntu since I have been using it on servers for many years, but I then decided to install Gentoo Linux instead. IMO the installation isn’t as complicated and difficult as many think it is, so I eventually installed Gentoo on my two MacBook Pros as well as the desktop. I must say that so far I am loving it and I don’t miss OSX/macOS at all since I found and got used to the alternative apps for Linux.

Why Gentoo?

Some of the reasons why I wanted to give Gentoo a try as my primary OS are:

  • you can install binary packages, but most software is compiled and thus optimised for your hardware, which means installing stuff takes longer but you usually get a faster system in return (the gentoo is the fastest swimming penguin on Earth);
  • you really install only what you want/need. It’s not like most other distros which install a lot of stuff and features that you may never use. Instead with Gentoo you only install what you actually need and just the dependencies required; for example if you use Gnome like me, you can configure the system so that it doesn’t install all the packages required for KDE and so on. With USE flags you can even customise features on a per package basis if you wish;
  • Gentoo differs from other distros also in that it uses a rolling release system, so you can just install the system once and keep it frequently updated with the latest versions of everything, rather than having to perform a bigger upgrade in one go each time a new release is out; you must update your system frequently though for this to work well;
  • documentation is perhaps the best one I’ve seen so far for Linux distributions.

Installing Gentoo on a MacBook Pro with full disk encryption

There are several guides on the Internet (especially the official Gentoo Handbook) which show how to do a typical Gentoo installation, but I thought I’d add here my own notes on how to do this specifically on a MacBook Pro with full disk encryption and LVM, so it can hopefully save some time vs reading several guides to achieve the same. I want to keep this as short as possible so I won’t go into the details for every command, which you can easily find yourself. Here I will just describe the steps necessary to get a system up and running quickly, and will update the post each time I install Gentoo, if needed.

First, a few notes:

  • the two MacBook Pros on which I have installed Gentoo are a mid-2010 and an early-2011, so they are not very recent; you might find you have to tweak the installation process a little if you own a more recent MBP but most of the process will be the same;
  • while learning the installation process I sometimes had to force eject the installation CD/DVD during boot. I found that you can do this by holding the touchpad’s left button while the MBP is booting;
  • once you install the system, you may find that your MBP takes around 30 seconds before actually booting, seemingly frozen on the white screen after the startup chime; to fix this you will need to boot the system from an OSX/macOS installation media or use the Internet recovery, and launch the following command from a terminal:
bless --device /dev/disk0s1 --setBoot --legacy

You need to replace /dev/disk0s1 with the correct name for your disk device which you can find with the diskutil list command;

  • during the installation the network interface may not work automatically until you get everything sorted; you can use the
ip link show

command to find the correct name for your network interface, which as we’ll see later you will need to manually activate.

  • you can use either the Gentoo CD or the DVD to install the system. The difference is that the CD only boots in BIOS mode while the DVD can also boot in EFI mode. So if you want to do an installation in EFI mode you will have to use the DVD. In my case, I have chosen to install Gentoo in BIOS mode on both my MBPs, because when the system boots in BIOS mode the integrated Intel graphics card is automatically disabled, forcing you to use the discrete ATI or nVidia card instead; if you want to avoid possible issues which may arise when having both the integrated card and the discrete card enabled, I recommend you also install the system in BIOS mode; it’s just easier. This is what I will show here.
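
As mentioned, from the live environment you may have to bring the network interface up manually; a minimal sketch, assuming the interface is called enp2s0f0 (replace it with whatever ip link show reports):

```shell
# Activate the interface and request a DHCP lease (dhcpcd ships with the install media)
ip link set enp2s0f0 up
dhcpcd enp2s0f0
```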

The installation media

So, to get started with the installation first burn the Gentoo CD/DVD image which you can download here, then insert the CD/DVD in the optical drive and turn the MBP on while holding the Alt key, so you can choose to boot the system from the installation media. If you are using the DVD version you will be able to choose whether to boot the system in “Windows” mode or EFI mode. Choose “Windows” mode. You will then see the bootloader screen with some options; press “e” to temporarily edit the boot configuration and add the nomodeset argument to the line which starts with linux. This avoids some issues with the graphics card during boot. Continue with the boot process, making sure you boot into a terminal if you are using the DVD installation disk, otherwise it will load the “Live” version of Gentoo.

Disk and partitions

Next, assuming that you are going to install Gentoo as the only OS, or at least as the first OS (I won’t show here how to install multiple operating systems), you will want to wipe the disk and create the necessary partitions. If you want, you can create separate partitions for /home etc., but here I will assume you want a single main partition for simplicity. Run

fdisk /dev/sda

Press “p” to see the current partition scheme of the disk; to delete the first partition press “d” followed by the number of the partition you want to delete (starting from 1); repeat this until all the partitions have been removed from the configuration of the disk. Then you need to create the new partitions.

First, create the BIOS partition by pressing “n”, then “p” (to specify that you want to create a primary partition), and then “1” as the partition number; fdisk will now ask for both the first sector and the last sector for this partition; enter “2048” first and then “+2M” so that the size of the partition is 2MB. Next, create the boot partition by pressing “n”, then “p”, “2” (second partition); accept the default value for the first sector and enter “+128M” for the last sector so to have a 128M boot partition. Now press “a” and then “2” to make this partition bootable.

The last partition you need to create is /dev/sda3 which will later be encrypted and contain both the root partition for the OS and the data, and the swap partition. Press “n” again, followed by “p”, then “3”; accept the default values for both the first sector and the last sector so that this partition will take the remaining space on the disk.

If everything is OK you will see something like the following by pressing “p”:

Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 6143 4096 2M 4 FAT16 <32M
/dev/sda2 * 6144 268287 262144 128M 83 Linux
/dev/sda3 268288 468860079 468591792 223.5G 83 Linux

The changes you have made haven’t been written to disk yet, so to confirm these changes and actually wipe the disk and create partitions press “w” then exit fdisk.
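
For reference, the same layout can also be created non-interactively with sfdisk instead of fdisk. This is just a sketch mirroring the example above, so double-check the device name and sizes before running it:

```shell
# DANGER: this replaces the whole partition table on /dev/sda
sfdisk /dev/sda <<'EOF'
label: dos
start=2048, size=2MiB, type=4
size=128MiB, type=83, bootable
type=83
EOF
```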

Now run

mkfs.vfat -F 32 /dev/sda2

to format the boot partition. Next it’s time to set up the encrypted partition. To activate the kernel modules required for the encryption run

modprobe dm-crypt
modprobe aes (if it returns an error it means that no hardware cryptographic device is present; in this case run "modprobe aes_generic" instead)
modprobe sha256

Next, to set up encryption and LVM run

cryptsetup luksFormat /dev/sda3 (type uppercase YES and enter a passphrase which you will use to unlock the encrypted disk)
cryptsetup luksOpen /dev/sda3 main
pvcreate /dev/mapper/main
vgcreate vg /dev/mapper/main
lvcreate -L 1GB -n swap vg
lvcreate -l 100%FREE -n root vg

Please note that I am using cryptsetup here with the default settings, but you can tweak the luksFormat command if you want to achieve higher security. Please refer to the man pages for more details. Next run vgdisplay to verify that all the space has been allocated to the encrypted partitions, then run:

mkswap /dev/vg/swap
swapon /dev/vg/swap
mkfs.ext4 /dev/vg/root
mount /dev/vg/root /mnt/gentoo
mkdir /mnt/gentoo/boot
mount /dev/sda2 /mnt/gentoo/boot
cd /mnt/gentoo

These commands will prepare and activate the swap partition, format the root partition as ext4 and mount both the boot and root partitions.
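
As an example of the luksFormat tweaking mentioned above, a hardened invocation might look like this; the parameters are illustrative rather than a recommendation, so check cryptsetup(8) before using them:

```shell
# Stronger-than-default cipher, key size, hash and iteration time (slower to unlock)
cryptsetup luksFormat \
  --cipher aes-xts-plain64 \
  --key-size 512 \
  --hash sha512 \
  --iter-time 5000 \
  /dev/sda3
```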

Installing the base system

Now you are ready to download the archive which contains the base system and install it. Run

links https://www.gentoo.org/downloads/mirrors/

which will launch a text based browser. Choose a mirror close to your location and download a stage3 archive from releases/amd64/autobuilds. Then run

tar xvjpf stage3-*.tar.bz2 --xattrs

to extract all the files for the base system on the root partition. Next run

nano -w /mnt/gentoo/etc/portage/make.conf

and change the following settings:

CFLAGS="-march=native -O2 -pipe"
MAKEOPTS="-j5"
USE="mmx sse sse2 -kde gtk gnome dvd alsa cdr emu efi-32 efi-64 -bindist xvmc"
INPUT_DEVICES="evdev synaptics mtrack tslib"
VIDEO_CARDS="nouveau" (for nVidia graphics cards, or "radeon" for ATI cards)

Set MAKEOPTS to the number of cores + 1. Please note that I am assuming here you want to use Gnome, that’s why I have gnome but -kde in the USE setting. If you want to use something else you will have to change the USE setting. Now run

mirrorselect -i -o >> /mnt/gentoo/etc/portage/make.conf

and choose a mirror which will be used to download software from the repos hereinafter. Next,

mkdir /mnt/gentoo/etc/portage/repos.conf
cp /mnt/gentoo/usr/share/portage/config/repos.conf /mnt/gentoo/etc/portage/repos.conf/gentoo.conf

and then run

cp -L /etc/resolv.conf /mnt/gentoo/etc/

to configure DNS resolution for the installation process. Now run

mount -t proc proc /mnt/gentoo/proc
mount --rbind /sys /mnt/gentoo/sys
mount --rbind /dev /mnt/gentoo/dev
mount --make-rslave /mnt/gentoo/sys
mount --make-rslave /mnt/gentoo/dev

after which you are ready to chroot into the new system:

chroot /mnt/gentoo /bin/bash
source /etc/profile
export PS1="(chroot) $PS1"

It’s time to configure which system “profile” you want to use to configure and install the software. Run

emerge-webrsync
eselect news read
eselect profile list
eselect profile set X (where X is the profile you want to use, I use gnome/systemd)

Now install all the packages required to reflect the system profile you have chosen – as said I will assume you also have chosen gnome/systemd.

emerge --ask --update --deep --newuse @world

This will take some time, so go and enjoy a coffee. Once it’s done, choose your timezone, e.g.:

echo "Europe/Helsinki" > /etc/timezone
emerge --config sys-libs/timezone-data

and configure the locale:

nano -w /etc/locale.gen
locale-gen
eselect locale list
eselect locale set X (choose one)

So that these changes take effect, run

env-update && source /etc/profile && export PS1="(chroot) $PS1"

Configuring and compiling the Kernel

Now download the kernel sources with

emerge --ask sys-kernel/gentoo-sources

To ensure that the kernel will support encryption, run

echo "sys-kernel/genkernel-next cryptsetup" >> /etc/portage/package.use/genkernel-next

Then install genkernel which is a tool you can use to configure and compile the kernel.

emerge --ask sys-kernel/genkernel-next

You now need to edit /etc/fstab to ensure the boot partition is mounted at boot:

nano -w /etc/fstab

and add:

/dev/sda2 /boot vfat defaults 0 2

Next install cryptsetup and LVM:

emerge -av sys-fs/cryptsetup sys-fs/lvm2

Then edit /etc/genkernel.conf and make the following changes:

MRPROPER="no"
MAKEOPTS="-j5"
LVM="yes"
LUKS="yes"
REAL_ROOT="/dev/vg/root"
INITRAMFS_OVERLAY="/boot/overlay"
BUSYBOX="yes"
MENUCONFIG="yes"

Here also set MAKEOPTS to the number of cores + 1.
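
Rather than hard-coding the value, you can derive “number of cores + 1” with nproc (part of coreutils):

```shell
# Compute the job count from the number of available cores
JOBS=$(( $(nproc) + 1 ))
echo "MAKEOPTS=\"-j${JOBS}\""
```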

To compile the kernel, run:

genkernel --no-zfs --no-btrfs --install all

Now you can customise the kernel if you wish, or leave the defaults as they are; it's up to you. As you can see I am passing the --no-zfs --no-btrfs arguments since I don’t use these file systems, so the compilation takes a little less time.

Once the kernel has been compiled, edit /etc/fstab once again and add the entries for the root, swap and cdrom devices:

/dev/vg/root / ext4 noatime 0 0
/dev/vg/swap none swap sw 0 0
/dev/cdrom /mnt/cdrom auto noauto,user 0 0

Networking

Check which name your network interface has with

ip link show

then edit /etc/conf.d/net and change it so it looks as follows:

config_enp2s0f0="dhcp"

Of course, replace enp2s0f0 with the name of your network interface. Next, run

cd /etc/init.d
ln -s net.lo net.enp2s0f0 (again, use the name of your network interface here)
rc-update add net.enp2s0f0 default

Miscellaneous

At this stage you may want to set your root password with

passwd

Also install sysklogd with

emerge --ask app-admin/sysklogd

Bootloader

To install the bootloader, run

emerge --ask sys-boot/grub:2

Then edit /etc/default/grub and change the GRUB_CMDLINE_LINUX setting as follows:

GRUB_CMDLINE_LINUX="init=/usr/lib/systemd/systemd crypt_root=/dev/sda3 root=/dev/mapper/vg-root dolvm rootfstype=ext4 nomodeset"

This makes sure the correct settings are used each time you update the bootloader. In this example we specify that systemd, encryption and lvm must be used during boot otherwise it will not be possible to access the encrypted partitions. We also add nomodeset to avoid problems with the graphics card as explained earlier. Next,

grub-install /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg

You should now be able to boot into the new system:

exit
cd
umount -l /mnt/gentoo/dev{/shm,/pts}
umount /mnt/gentoo{/boot,/sys,/proc}
shutdown -r now

Hopefully the system will start from the disk. If all is OK, run

systemd-machine-id-setup
hostnamectl set-hostname vito-laptop (choose whichever hostname you wish here)

Next edit /etc/systemd/network/50-dhcp.network and change the contents as follows:

[Match]
Name=en*

[Network]
DHCP=yes

To activate networking now and ensure it is activated at startup, run

systemctl enable systemd-networkd.service
systemctl start systemd-networkd.service

At this stage I’d add the main user account with

useradd -m -G users,wheel,audio,video -s /bin/bash vito
passwd vito

Of course use your chosen account name instead of “vito”.

Graphics card and environment

To install the drivers for your graphics card and X, run

emerge --ask --verbose x11-base/xorg-drivers
emerge --ask x11-base/xorg-server
env-update
source /etc/profile

Next, to install Gnome edit /etc/portage/package.use/gnome-session and add

gnome-base/gnome-session branding

Then run

emerge --ask gnome-base/gnome
eselect news read
gpasswd -a vito plugdev (your account name instead of 'vito')

Edit /etc/conf.d/xdm and set GDM as the display manager, then run

echo "exec gnome-session" > ~/.xinitrc
systemctl enable gdm.service
systemctl start gdm.service
shutdown -r now

If all went well, the system will now boot into Gnome.

Touch pad

If the touch pad isn’t working you will need to recompile the kernel. Run

genkernel --no-zfs --no-btrfs --install all

and enable the following settings before saving and exiting, which will trigger recompilation:

Device Drivers --->
  USB support --->
    EHCI HCD (USB 2.0) support
      Root Hub Transaction Translators
      Improved Transaction Translator scheduling
    Generic EHCI driver for a platform device
  Input device support --->
    Mice --->
      Apple USB BCM5974 Multitouch trackpad support

Keeping the system up to date

As I mentioned earlier, it is recommended you update the system frequently to avoid problems with big updates. To update the system, I usually run the following commands weekly:

emerge --sync
emerge -avuDU --with-bdeps=y @world
emaint --check world
emerge -av --depclean
emerge --update --newuse --deep @world
revdep-rebuild
perl-cleaner --all
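
For convenience, the routine above can be wrapped in a small script (the path and name are up to you) so a single command runs the whole sequence and stops at the first failure:

```shell
#!/bin/bash
# Weekly Gentoo maintenance; abort as soon as any step fails
set -e

emerge --sync
emerge -avuDU --with-bdeps=y @world
emaint --check world
emerge -av --depclean
emerge --update --newuse --deep @world
revdep-rebuild
perl-cleaner --all
```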

Conclusions

I actually had some more notes about using proprietary drivers for the graphics card (instead of the open source nouveau or radeon drivers) and a few more things, but I can’t find them at the moment. I will update the post if I find them or if I go through the installation process again. Anyway the steps described in the post will get you up and running with an encrypted installation with gnome/systemd.

Let me know in the comments if this post has been somehow useful.

Microsoft buys a chunk of Nokia. Surprise!

We saw this coming, didn’t we? It was all planned, and well orchestrated; it’s been a good strategy: send someone to the other company, help it sink, then buy it at a discount price. It was all a con. I love conspiracies. The “mole”, ex-Microsoft Stephen Elop, was supposed to save Nokia (read his memo on Nokia being a ‘burning platform’) but he is saving Microsoft instead. He has demolished Nokia by first making it undervalued and then giving its devices and services business away to Microsoft on the cheap, for just $7 billion… Is it just me or does this sound like a bargain? Is that significant portion of Nokia’s business really worth only $7 billion? Wow. I would have guessed/expected much more. Of course Elop will soon be back at Microsoft and will likely follow Steve Ballmer as CEO.

Many say that this may anyway be a good move given that both companies are struggling and out of options, but will two struggling companies make a good one?

By buying the devices and services part of Nokia’s business and acquiring a long term license for its patents, Microsoft will likely try to position itself as a direct competitor for Apple and Google, since it will be able to engineer both software and hardware for its new mobile devices. But I wonder what we should expect in the near future, since neither Microsoft nor Nokia have amazed me with their products or services in the past several years. Will we see a new Kin? 😀

I also wonder what will happen to the other partners that Microsoft currently has and that use the Windows Phone OS on their devices; will they be willing to compete with Microsoft directly or will they jump ship and join the other manufacturers that use Android instead on their devices? Some may argue that this didn’t happen when Google acquired Motorola for $12.5 billion last year, but it’s still possible.

It’s a shame, really, that a company that once upon a time ruled the world of mobile is slowly dying. My first phone was a Nokia, as were many of the phones I owned before my first iPhone. It saddens me to think that we will never see another Nokia smartphone 😦

At least there still is Angry Birds! 😀

Resque: automatically kill stuck workers and retry failed jobs

Resque is a great piece of software by GitHub that makes it really easy to perform some operations (‘jobs’) asynchronously and in a distributed way across any number of workers. It’s written in Ruby and backed by the uber cool Redis key-value data store, so it’s efficient and scalable. I’ve been using Resque in production for a couple of years now, since it replaced Delayed Job in my projects, and love it. If your projects do something that could be done asynchronously, you really should check it out if you haven’t yet.

At OnApp we’ve been using Resque for a while to process background jobs of various types, with great results: in a few months we’ve processed a little over 160 million jobs (at the time of this writing), and of those only about 43K jobs have been counted as failed so far. However, many of these failed jobs were retried successfully on a subsequent attempt, so the number of jobs that actually failed is a lot smaller, perhaps a very few thousand.

Out of 160M+ jobs that’s a very small percentage of failures. But although the system has been rock solid for the most part, jobs can still fail every now and then depending on the nature of the jobs, excessive load on the worker servers, temporary networking and timeout issues, or design-related issues such as race conditions and the like. Sometimes you will also find that workers can get “stuck”, (usually) requiring manual intervention (as in: kill/restart the workers, manually sort out failed jobs).

So I wanted to share a simple script I am using in production to automatically find and kill these “stuck” workers, and then retry any jobs that are marked as failed because the workers were killed, or for any other reason. The purpose is to keep workers running and minimise the need for manual intervention when something goes wrong.

Please note that I use resque-pool to manage a pool of workers more efficiently on each worker server. Therefore if you manage your workers in a different way, you may need to adapt the script to your configuration.

You can find the little script in this gist, but I’ll briefly explain here how it works. It’s very simple, really. First, the script looks for the processes that are actually working off jobs:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque
10088 resque-pool-master: managing [10097, 10100, 10107, 10113, 10117, 10123, 10138, 10160, 10167, 10182, 10195]
10097 resque-1.20.0: Forked 16097 at 1337878130
10100 resque-1.20.0: Forked 16154 at 1337878131
10107 resque-1.20.0: Waiting for cdn_transactions_collection
10113 resque-1.20.0: Waiting for usage_data_collection
10117 resque-1.20.0: Waiting for usage_data_collection
10123 resque-1.20.0: Waiting for check_client_balance
10138 resque-1.20.0: Waiting for check_client_balance
10160 resque-1.20.0: Waiting for geo_location
10167 resque-1.20.0: Forked 16160 at 1337878131
10182 resque-1.20.0: Forked 16163 at 1337878132
10195 resque-1.20.0: Waiting for services_coordination
16097 resque-1.20.0: Processing push_notifications since 1337878130
16163 resque-1.20.0: Processing push_notifications since 1337878132

This is an example from one of our worker servers. The Processing processes are those that are actually working off jobs, so these are the ones we are after, since these are the processes that can sometimes get “stuck” for one reason or another. The script therefore looks for these processes only, ignoring the rest:

root@worker1:/scripts# ps -eo pid,command | grep [r]esque | grep Processing
18956 resque-1.20.0: Processing push_notifications since 1337878334
19034 resque-1.20.0: Processing push_notifications since 1337878337
19052 resque-1.20.0: Processing usage_data_collection since 1337878338
19061 resque-1.20.0: Processing usage_data_collection since 1337878338
19064 resque-1.20.0: Processing usage_data_collection since 1337878339
19066 resque-1.20.0: Processing usage_data_collection since 1337878339

Next, the script loops through these processes and looks for those that have been running for over 50 seconds. You may want to change this threshold, but in our case all jobs should normally complete in a few seconds, so if a job is still running after almost a minute, something is definitely going on.

ps -eo pid,command |
grep [r]esque |
grep "Processing" |
while read PID COMMAND; do
  if [[ -d /proc/$PID ]]; then
    SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

    if [ $SECONDS -gt 50 ]; then
      kill -9 $PID
      ...

      QUEUE=`echo "$COMMAND" | cut -d ' ' -f 3`

      echo "
The forked child with pid #$PID (queue: $QUEUE) was found stuck for longer than 50 seconds.
It has now been killed and job(s) flagged as failed as a result have been re-enqueued.

You may still want to check the Resque Web UI and the status of the workers for problems.
" | mail -s "Killed stuck Resque job on $(hostname) PID $PID" email@address.com

      ...
    fi
  fi
done

I was looking for a nice and easy way to find out how long (in seconds) a process had been running, and the expression you see in the code snippet above was the nicest solution I could find (hat tip to joseph for this).
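
As an aside, if your version of procps supports it, ps can report a process’s elapsed time in seconds directly, which avoids the /proc arithmetic; shown here against the current shell purely as an illustration:

```shell
# Elapsed run time, in seconds, of this very process
SECONDS_RUNNING=$(ps -o etimes= -p $$ | tr -d ' ')
echo "$SECONDS_RUNNING"
```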

If any of the Resque processes that are working off jobs are found running for longer than 50 seconds, then these are killed without mercy and a notification is sent to some email address just in case.

First, this way we don’t actually kill Resque workers, but other processes forked by the workers in order to process jobs. This means that the workers remain up and running and soon after they’ll fork new processes to work off some other jobs from the queue(s) they are watching. This is the nicest part, in that you don’t need to manually kill the actual workers and then restart them in order to keep the worker servers going.

Second, killing those processes will cause the jobs that they were processing to fail, so they will appear in Resque’s “failed jobs” queue. The second part of the script takes care of this by running a rake task that re-enqueues all failed jobs and clears the failed jobs queue. For starters, you’ll need to add this rake task to your application. If you are already using Resque, you will likely have a lib/tasks/resque.rake file, otherwise you’ll have to create one (I’m assuming here it’s a Rails application).

In any case, add the following task to that rake file:

desc "Retries the failed jobs and clears the current failed jobs queue at the same time"
task "resque:retry-failed-jobs" => :environment do
  (Resque::Failure.count-1).downto(0).each { |i| Resque::Failure.requeue(i) }; Resque::Failure.clear
end

Back to the script, if it finds and kills any workers that it found stuck, it then proceeds to run the above rake task so to retry the failed jobs:

ps -eo pid,command |
grep [r]esque |
grep "Processing" |
while read PID COMMAND; do
  if [[ -d /proc/$PID ]]; then
    SECONDS=`expr $(awk -F. '{print $1}' /proc/uptime) - $(expr $(awk '{print $22}' /proc/${PID}/stat) / 100)`

    if [ $SECONDS -gt 50 ]; then
      ...
      touch /tmp/retry-failed-resque-jobs
      ...
    fi
  fi
done

if [[ -f /tmp/retry-failed-resque-jobs ]]; then
  /bin/bash -c 'export rvm_path=/usr/local/rvm && export HOME=/home/deploy && . $rvm_path/scripts/rvm && cd /var/www/sites/dashboard/current/ && /usr/local/bin/rvm rvmrc load && RAILS_ENV=production bundle exec rake resque:retry-failed-jobs'
fi

You may notice that I am forcing the loading of RVM before running the rake task; this is because I need to upgrade some stuff on the worker servers, but you may not need to run the rake task this way.

This is basically it: the script just kills the stuck workers and retries the failed jobs without requiring manual intervention; in almost all cases, I don’t have to worry about them anymore, beyond wondering whether there’s a design issue that might cause workers to get stuck and that therefore needs to be addressed (which is a good reason to keep an eye on the notifications). There might be other monitoring solutions of various types out there, but this simple script is what has worked best for me so far on multiple worker servers with tens of workers.

The final step is to ensure that this script runs frequently, so as to fix problems as soon as they arise. The script is extremely lightweight, so in my case I just schedule it (with cron) to run every minute on each server.
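For reference, the crontab entry looks roughly like this (the script path is just an example, so adjust it to wherever you keep the script):

```shell
# Check for stuck Resque workers every minute; discard output to avoid cron mail.
* * * * * /usr/local/bin/kill-stuck-resque-workers.sh >/dev/null 2>&1
```

You can add it with crontab -e as the user that is allowed to kill the worker processes.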

Know of a better way of achieving the same result? Please do let me know in the comments.

Thermal paste: how to reapply it on a Macbook Pro

Thermal paste: why you may want to reapply it

My main machine these days is a mid-2010 15″ Macbook Pro powered by a dual core i5 (2.53GHz) CPU. I also have a better performing Hackintosh at home, but because of its portability I find myself using the MBP more. It’s a thing of beauty and I love it, as I have loved the previous Macs I’ve had the pleasure to work with both at home and at work. Since I purchased it, however, it has always been plagued by excessive heat issues: even 85C at idle or under very, very light load! If I said that I could perhaps fry an egg on its surface at times, I don’t think I would be too far from the truth. The laptop got really, really hot at times, to the point that touching it for more than a couple of seconds was often painful rather than just uncomfortable. The CPU is rated to work just fine at temperatures of up to 105C, so until recently I didn’t really care too much about the heat since, after all, the Mac seemed to work fairly OK and I didn’t have any other problems with it.

That was until I got really tired of the noisy fans (always running at their maximum speed of 6K RPM), and I started to wonder whether I should bring it in. I had already tried the usual stuff, like resetting the SMC or the PRAM/NVRAM, with poor results. I also noticed that the excessive heat was, without doubt, affecting the performance of the laptop quite badly, much more so than I’d have thought possible considering that the temperatures always stayed well within the 105C maximum, so from that point of view the CPU was still operating “safely”; lately, however, the laptop seemed to be performing like a much older machine… which was no good. For example, Handbrake video encoding tasks were ridiculously slower than I’d expect from this sort of machine, and I also noticed that the kernel_task process was almost constantly at the top of my activity monitor with 200-300% CPU utilisation. What the.. ?

Some research on the web confirmed my suspicions that there was indeed a relation between the temperatures, always high but strangely never above 90-92C, and the general slowness, in particular with tasks such as video encoding or heavy testing. Apparently, this is due to the CPU throttling that kernel_task in Mac OS performs to prevent heat from damaging the hardware. I did some simple tests to confirm this, such as watching a Flash HD video for a while and then killing it: the video would play fine in the beginning and then become a little sluggish as the CPU got hotter; at that point, kernel_task would be throttling the CPU and stay at the top with very high CPU utilisation and higher priority, so Flash would naturally be slowed down as a consequence and the temperature would never go above 92C or so; once I killed Flash, things would return to normal again and kernel_task would go back to a much lower CPU utilisation. This is – I think – the mechanism used by the software to manage CPU utilisation and heat.

From personal experience, I am aware that heat issues on laptops are often caused by a poor application of the stock thermal paste (also known as “thermal interface material” or TIM), provided that the cooling system is otherwise functioning. The reason is simple: the thermal paste – as the name suggests – is supposed to facilitate the transfer of heat from the CPU/GPU to the heatsink. This only works efficiently, though, if a very thin layer of thermal paste is applied between CPU and heatsink in a way that minimises the chance of creating “air bubbles” (air is a poor thermal conductor). The problem is that very often the stock thermal paste is applied in factories in ridiculously large amounts that spread out of the CPU’s die and most certainly achieve the opposite effect, slowing down, instead of facilitating, the transfer of heat from CPU to heatsink. Sadly, Apple doesn’t seem to be any different from other manufacturers in this respect, despite the higher prices and the generally wonderful design and build quality. Plus, the stock thermal paste used by some manufacturers is often quite cheap, and not based on a very efficient thermally conductive material.

In the past, I have almost always reapplied the thermal paste in my computers and replaced the stock paste with something better (especially with desktops that I liked to overclock) very often with great results, but in this particular case, having purchased the very expensive Apple Care cover together with the laptop, ideally I didn’t want to void the warranty.

Disassembling the laptop, removing the thermal paste and reapplying it, then reassembling the laptop… obviously doesn’t fit in Apple’s description of “user replaceable parts” (only hard drives and memory can be replaced / upgraded on Macbook Pros by the owner without affecting the warranty).

The “surgery”

Having said that, I had already opened my MBP previously, not only to upgrade the memory and replace the HDD with an SSD, but also to replace the optical drive with a cheap version of the OWC Data Doubler I found on eBay, so that I can use the ultra speedy 240GB SSD for the OS and applications while keeping the stock 500GB HDD installed as additional storage for iTunes and iPhoto libraries, and that kind of stuff that takes a lot of disk space.

I am not sure how likely it is that an Apple employee would notice that I have also replaced the optical drive with the data doubler, if I were to bring the laptop to an Apple Store after restoring its original configuration, but I am aware that my warranty is already virtually void. So I didn’t really want to waste a lot of time bringing the laptop back to its original state just to be able to bring it in and try to get some Apple employee to replace the thermal paste due to heat issues… too much hassle; plus, from reading the Apple Support Communities, it looks like only very few people have managed to get this done by Apple, and in all cases the job was done as poorly as in the factory, or even worse. So I just decided to do it myself, and therefore purchased the best thermal paste available at the moment, the IC Diamond 7 Carat – you can find a small syringe on eBay for a few quid. Here’s what the syringe looks like:

20120122-1nd4si3jht8anbxn6yanjjix25-jpg-1c6b9a

As the name suggests, the IC Diamond is a special thermal paste in that it contains 92% pure diamond, which has a much better thermal conductivity than materials such as the silver used in other popular thermal pastes like the Arctic Silver 5. Besides the great thermal conductivity, another big advantage of this paste is that it is not electrically conductive, so it’s not as risky to use as the silver based ones. Unfortunately, though, the IC Diamond is a very hard paste and it can be tricky to apply, especially on laptops where heatsinks are very light; for this very reason I would recommend this kind of thermal paste more for desktop computers than for laptops. On desktop computers, the best way of applying a thermal paste is to place a pea-sized amount on the middle of the CPU’s die and let a heavy heatsink press and spread the paste evenly across most of the surface of the die… on laptops, because the heatsinks are very light and exert only light pressure when fixed against the CPU, this technique would not work well, especially with a thermal paste as hard as the IC Diamond. In these cases, the easiest way is to just spread a small amount of paste manually across the surface of the die (I usually use an old credit card or something similar) until a very thin layer covers the whole surface. This won’t ensure that no air bubbles form between the CPU and the heatsink, but it works fairly well in most cases. Alternatively, you may want to use something like the Arctic Cooling MX-4, which is another pretty efficient thermal paste (albeit not as efficient as the IC Diamond on paper) and is also not electrically conductive, plus it is a lot easier to apply than the IC Diamond.

If you are experiencing the same kind of heat-related issues with your own MBP, and want to try reapplying the thermal paste – provided you are well aware that this will virtually void the warranty, if any – I’d recommend you get some proper screwdrivers first. I don’t have any particular set of tools with me, but I usually use these screwdrivers that I got when I purchased the OWC SSD, and they are great for laptops (note the little bluish tool on the right, which is very useful with Apple’s ribbon cables):

20120122-g4acexb61er18j547wjygxkn67-jpg-1c6b9a

The only problem I had when I disassembled my MBP was removing the battery: apparently Apple has once again changed the screws used for the battery, and in my case I had to purchase a tri-wing screwdriver from Maplin, like the one in the picture, to be able to remove it:

20120122-1i7cbfpepd36f9p657daqc69gf-jpg-1c6b9a

I didn’t need any other tools, apart from some alcohol – I use Isopropanol – and some lint free cloth to properly clean both the CPU and the GPU after removing the old thermal paste and before reapplying the new one:

bottle-jpg-1c6b9a-1

20120122-pjw4dcmndxi9npkj8nf3f4rwa4-jpg-1c6b9a

Besides tools, you obviously need to be a little patient and have steady hands if you go the same route and want to reapply the thermal paste. It’s not a really complicated operation, but – perhaps needless to say – you must be very careful. And did I mention that this will void your warranty, if any? (I warned you)

Before jumping to the results, here are a few more pics of my laptop as I disassembled it, so that you can get an idea of what to expect from the inside if you have never opened a MBP – if you aren’t sure how to remove the back cover of your MBP, please stop here 🙂

Inside

20120122-g2gr57tgsasi5wux1i3wtiucag-jpg-1c6b9a

You can see that my laptop was quite dusty inside. This is kinda important: because reapplying the thermal paste will void your warranty, I recommend you first try cleaning out all the dust, especially from the fans. If too much dust and dirt is preventing regular air flow, you might be shocked by the difference that just cleaning the fans can make to the temperatures! In my case I saw a drop of 5C or even more just after cleaning the fans.

Dusty fans

20120122-8btxy7mp11yndyr9mrsiufapht-jpg-1c6b9a

Fans after some cleaning

fan-jpg-1c6b9a

20120122-855i7tgd4u5u9rwm33pbbudwh8-jpg-1c6b9a

The battery

20120122-11q5nupb99m7ta6i1qhap2adje-jpg-1c6b9a

Depending on the model it may be a bit annoying to remove, and may require a special screwdriver (a tri-wing in my case).

The logic board

20120122-1jtrj9cm1peihyqu5nq216tj3u-jpg-1c6b9a

To reapply the thermal paste, you need to remove the logic board from the case of the laptop. It’s nothing complicated, but you do need to be careful with the small ribbon cables here and there.

Before…

20120122-tt7iqrrjwxn4iwkscxyeesecyp-jpg-1c6b9a
20120122-paufw7w4hshh1y9e9cumx3wp47-jpg-1c6b9a
This is how the CPU, the GPU and their heatsinks looked before cleaning up and reapplying the thermal paste. You can see what a ridiculously large amount of compound had been applied, and how poorly the application was made. No wonder both my CPU and GPU were choking due to the heat!

Spring cleaning…

20120122-r9ki2x7scha44wpi2cjw49pq2m-jpg-1c6b9a

And this is how the CPU/GPU looked after removing the old thermal paste and cleaning up properly with the Isopropanol (I forgot to take a pic of the heatsinks at this stage). Funnily enough, the CPU was so clean that the Apple logo of my iPhone – which I used to take the pictures – was reflected in its die!

After…

20120122-nbb7b2q34f8eync9sj6r31y48y-jpg-1c6b9a.jpeg

Finally, this is how the chips looked after carefully applying the new thermal paste with a credit card. It’s definitely not my best application, but the IC Diamond was so hard! I would probably have used the Arctic Cooling MX-4 had I known the IC Diamond was so difficult to apply; I don’t think I would have seen a massive difference between the two, after all, even though the theoretical difference would suggest otherwise.

Amazing results!

Reapplying the thermal paste can yield different results depending on various factors (how good the stock paste is, how well or badly it has been applied, which other thermal paste you want to replace it with, and how well you apply it). In my case, the results were pretty amazing!

Light load

As said earlier, the temperature of my CPU was most of the time at least 85C under light load or even at idle; under heavy load, the temperature would rise to 95C max and the system would then slow down badly due to CPU throttling. This is how my Activity Monitor looked most of the time:

20120123-dqudwhaahrcwmeexxp7ym84p9q-jpg-1c6b9a

You can see kernel_task and its ridiculously high CPU utilisation at a moment when I was just surfing the web with Safari (nothing else), and even with Flash disabled!

After reapplying the paste, everything changed: the average temperature of the CPU now sits just above 50C – a drop of more than 30C! – under the same light load or even more (web browsing and a few more things running at the same time), with the fans always running at their minimum speed of 2K RPM!

20120123-jrhnsjrx9cs8bubkmqgmhg6jwg-jpg-1c6b9a

This is when the laptop is plugged in; when I use it on battery, I haven’t yet seen the temperature of the CPU go above 30C!

Heavy load

The difference is even more noticeable under heavy load. Before, when running video encoding tasks with Handbrake, kernel_task would use up to 350% CPU, slowing down Handbrake a lot. Now, this is what I see after reapplying the thermal paste:

20120123-qiqstmaawed39w4hw4gp8ec7yh-jpg-1c6b9a

See kernel_task? It’s basically gone – it only appears from time to time for its normal duties and no longer shows a ridiculously high CPU utilisation. Needless to say, the encoding is much, much faster now, and as you can see it is Handbrake that uses the CPU! I can even do other things like web surfing or coding/testing at the same time, and it all works great, as it should. Before, I could basically forget about doing anything else while video encoding.

And look at the temperature! Video encoding is one of the most CPU intensive tasks, yet even with the fans at 6K RPM I haven’t seen the CPU go above 80C.

This is a massive improvement. My laptop feels a lot snappier now – it almost feels like a CPU upgrade – and it is finally almost silent when I do anything other than video encoding. I am definitely happy with the improvement and would recommend the same “fix” to others who may be experiencing the same issues. It’s cheap, it doesn’t take longer than 30 minutes overall, and you just need to be a little careful. Remember about the warranty, though!

Rest in peace, Steve

I would like to salute a man whose vision, creativity and achievements have indeed contributed for decades to improving our lives. A genius whose incredible eye and passion for great design and user experience will live forever and be a source of inspiration for many generations to come.

After a long battle with pancreatic cancer lasting several years, Apple’s legendary co-founder and former CEO Steve Jobs died today at 56. It’s a very sad day not only for all technology enthusiasts, but for everybody.

He has been a most inspiring person, and not only through the amazing products he delivered; his life and business experience was remarkable too: he co-founded Apple from nothing, bringing the company to wild success in just a few years; when his vision diverged from that of the board, he was fired from the very company he had co-founded, “in a very public way” as he himself recalled years later; yet he didn’t give up: he went on to found NeXT and build some of the technologies that would later become part of the core of the Mac OS operating system; when a then struggling Apple acquired NeXT, he became Apple’s CEO again and over the following years managed to bring the company back to life and transform it into the most valuable company in the world, in what has often been referred to as “the greatest second act in the history of business”. And at the same time, he made Pixar the best animation studio around, delivering some of the most amazing animated movies ever made.

Apple have updated their home page to show a tribute to Jobs:

apple-home-steve-jobs-png-1c6b9a

The picture links to a page dedicated to him:

apple-site-steve-jobs-png-1c6b9a

“Apple has lost a visionary and creative genius, and the world has lost an amazing human being. Those of us who have been fortunate enough to know and work with Steve have lost a dear friend and an inspiring mentor. Steve leaves behind a company that only he could have built, and his spirit will forever be the foundation of Apple. If you would like to share your thoughts, memories, and condolences, please email rememberingsteve@apple.com.”

Apple’s new CEO Tim Cook also sent a note to all Apple employees:

“Team,
I have some very sad news to share with all of you. Steve passed away earlier today.
Apple has lost a visionary and creative genius, and the world has lost an amazing human being. Those of us who have been fortunate enough to know and work with Steve have lost a dear friend and an inspiring mentor. Steve leaves behind a company that only he could have built, and his spirit will forever be the foundation of Apple.
We are planning a celebration of Steve’s extraordinary life for Apple employees that will take place soon. If you would like to share your thoughts, memories and condolences in the interim, you can simply email rememberingsteve@apple.com.
No words can adequately express our sadness at Steve’s death or our gratitude for the opportunity to work with him. We will honor his memory by dedicating ourselves to continuing the work he loved so much.
Tim”

It seems like Wikipedia’s page dedicated to Jobs has already been updated too:

“Steven Paul Jobs (February 24, 1955 – October 5, 2011) was an American computer entrepreneur and inventor. He was co-founder, chairman, and chief executive officer of Apple Inc. Jobs also previously served as chief executive of Pixar Animation Studios; he became a member of the board of directors of The Walt Disney Company in 2006, following the acquisition of Pixar by Disney. He was credited in Toy Story (1995) as an executive producer.”

Needless to say, messages about Jobs’ death are now trending on Twitter, and most blogs and news sites are reporting the news.

I would like to remember Steve Jobs with a quote on innovation, since I find his words most inspiring in my work:

“Innovation is not about saying yes to everything. It’s about saying no to all but the most crucial features.”

And also with his unforgettable commencement speech at Stanford University in 2005.

Thank you and rest in peace Steve, I’ll dearly miss your genius, your products, your exciting keynotes and your “one more thing”!