Setting up RancherOS for Rancher and Kubernetes

For many years, my operating system of choice for servers has been Ubuntu. I tried several other distros over the years, but for one reason or another I always went back to Ubuntu. It’s a solid operating system that works very well on both desktops and servers. Recently, however, I have started to use Kubernetes heavily, with the goal of migrating everything away from “old school” deployments.

While I have never had any particular issues with Ubuntu, I was looking for a more “stable” alternative for use with Kubernetes. With Kubernetes everything runs in containers, which I can update individually whenever I want or need to, so I didn’t like the idea of also depending on OS updates as frequently as with Ubuntu. Nothing wrong with updating! But if I use containers for just about everything, then I prefer updating the OS only when critical security updates or a major release are available. This way I need to reboot servers less frequently, which helps with keeping apps running. Of course, one should always try to architect apps to be highly available, so that they are not affected if servers need to be rebooted every now and then, but anyway… every little helps.

So while looking for alternatives I’ve also tried CentOS - which is known to be more stable - but to be honest I never liked it too much. Then I came across the concept of “container optimised” operating systems, and learnt about RancherOS - surprisingly late considering that I had already been using Rancher for a little while (in case it’s the first time you hear about Rancher, it’s an awesome management interface for Kubernetes clusters). RancherOS is a special kind of operating system, in that everything - really - runs as containers, including system services. It mainly consists of two separate Docker instances, a System Docker for OS/system related stuff, and a User Docker for user managed containers. RancherOS is an OS made of containers for use with containers, so it sounds like the perfect choice for Kubernetes but also for Rancher, which I use (and absolutely love) to manage it.

RancherOS is a super lightweight operating system with just the minimum components required to run Docker. Besides being lightweight, a minimal OS also translates into a smaller attack surface out of the box. I’ve been running RancherOS for only a few days, but I already love it now that I have sorted out a few things and understand better how it works.

Here I am going to show how to install it first, then I will give a couple of tips for things to do once the OS is installed. For my servers I use Hetzner Cloud because of the amazing price/performance, but the instructions below can be easily adapted for other providers.

Installation

Hetzner Cloud has a RancherOS image available, but at the moment it’s version 1.4.0, so it’s oldish. The current stable release is 1.5.2, so that’s what we are going to install. The very first thing you need to do is, of course, create one or more servers, depending on what you are going to use RancherOS for. It doesn’t matter which operating system you choose while creating the servers, because it will be replaced by RancherOS anyway. Optionally, if for example you are going to use RancherOS for Kubernetes nodes that will manage storage (with something like OpenEBS or similar), add one or more disks (Hetzner calls them “volumes”) to the server.

Once the server has been created, go to Rescue in the server’s control panel and enable the rescue system by clicking on Enable Rescue & Power Cycle. Within a minute you should be able to SSH into the server’s rescue system with

ssh root@<server ip>

Once in the rescue system, you need to install the kexec-tools package, which is required to boot into a kernel other than the one currently running. Here I am assuming the original OS is Ubuntu.

DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --show-progress kexec-tools

Next, download the RancherOS ISO - you can check the latest release available here. At the moment it is 1.5.2.

wget https://github.com/rancher/os/releases/download/v1.5.2/rancheros.iso

Wipe the disk

echo -e "w\nq" | fdisk /dev/sda

mount the ISO

mount -t iso9660 rancheros.iso /mnt

and boot into it

kexec --initrd /mnt/boot/initrd-v1.5.2 --command-line="rancher.password=some-password" /mnt/boot/vmlinuz-4.14.122-rancher
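The kernel and initrd file names above are tied to the 1.5.2 release, so they will differ for other ISOs. A minimal sketch for looking them up in the mounted ISO instead, assuming there is exactly one vmlinuz-* and one initrd-* file under /mnt/boot (find_boot_file is a hypothetical helper, not part of RancherOS):

```shell
# Print the first file in a directory whose name starts with "<prefix>-".
# Assumption: the ISO ships one vmlinuz-* and one initrd-* under <mount>/boot.
find_boot_file() {
  dir="$1"      # e.g. /mnt/boot
  prefix="$2"   # "vmlinuz" or "initrd"
  for f in "$dir/$prefix"-*; do
    # If nothing matches, the glob stays literal and this check fails.
    [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}

# Usage (commented out so that pasting the block has no side effects):
# KERNEL=$(find_boot_file /mnt/boot vmlinuz) || exit 1
# INITRD=$(find_boot_file /mnt/boot initrd) || exit 1
# kexec --initrd "$INITRD" --command-line="rancher.password=some-password" "$KERNEL"
```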

Of course, set a proper password. The SSH connection will be interrupted, so you need to SSH in again, forcing password authentication (I am not sure/can’t remember if this is actually required):

ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password rancher@<server ip>

You will be logged in to RancherOS now. Next you need to prepare the configuration file that will be used by the installer. First set the hostname

HOSTNAME=...

and the IP address of eth0

IP=`ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`
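The grep above depends on the older ifconfig output format that prints "inet addr:". As a hedged alternative, the sketch below parses the one-line output of the ip tool instead; it assumes ip is available in the console you are logged in to, and first_ipv4 is a hypothetical helper name:

```shell
# Extract the first IPv4 address from `ip -4 -o addr show dev <iface>` output.
# Reads stdin, so the parsing can be checked without touching the network.
first_ipv4() {
  # Field 4 is the address, usually with a prefix length ("203.0.113.10/32");
  # split() drops the "/32" part and also copes with a bare address.
  awk '{ split($4, parts, "/"); print parts[1]; exit }'
}

# Usage (commented out; requires the ip tool and an eth0 interface):
# IP=$(ip -4 -o addr show dev eth0 | first_ipv4)
```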

If you have added a volume to the server, set the DISK variable too so it can be used for mounting

DISK=`ls /dev/disk/by-id/scsi-0HC*`

The above will find the correct disk/volume device. Finally, create the config file:
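Note that the glob returns several paths if more than one volume is attached, which would break the mounts entry in the config. A small sketch that insists on exactly one match (single_match is a hypothetical helper):

```shell
# Succeed only if the caller's glob expanded to exactly one existing path.
single_match() {
  # The caller passes the glob unquoted, so each match arrives as one argument.
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "expected exactly one match, got: $*" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage (commented out; the path only exists on a server with a volume attached):
# DISK=$(single_match /dev/disk/by-id/scsi-0HC*) || exit 1
```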

cat <<EOF > cloud-config.yml
#cloud-config
hostname: $HOSTNAME
ssh_authorized_keys:
  - ...
mounts:
- ["$DISK", "/mnt/my-disk", "xfs", ""]
rancher:
  console: ubuntu
  resize_device: /dev/sda
  docker:
    tls: true
  network:
    post_cmds:
      - dhcpcd -e force_hostname=true eth0
    dns:
      nameservers:
        - 8.8.8.8
        - 1.1.1.1
    interfaces:
      eth0:
        address: $IP/32
        netmask: 255.255.255.255
        gateway: 172.31.1.1
        pointopoint: 172.31.1.1
        mtu: 1400
        dhcp: false
      lo:
        address: 127.0.0.1/8
EOF

Of course, set your SSH key(s). You can remove the mounts section if you haven’t added a volume to your server. You can see that I am specifying a console here; this is because RancherOS uses an Alpine-based console by default, but you can choose something else like Ubuntu/Fedora/CentOS. Please note that if you want persistence, you need to switch from the default console to another one. Also, the configuration makes the chosen console available, but we’ll need to switch to it manually, as we’ll see in a moment. resize_device is required to ensure that the filesystem created by RancherOS takes up the whole capacity of the main disk when installing the OS. The network settings for eth0 here are specific to Hetzner Cloud, so you will have to change them if you are using another provider.
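Because the heredoc expands $HOSTNAME, $IP and $DISK while the file is written, an empty variable silently ends up as an empty value in cloud-config.yml. A minimal sketch of a check to run before generating the file (require_vars is a hypothetical helper):

```shell
# Return non-zero if any of the named variables is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "variable $name is empty" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out; drop DISK from the list if you did not add a volume):
# require_vars HOSTNAME IP DISK || exit 1
```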

Once you have created the config file, it’s a good idea to validate it just in case there are mistakes:

sudo ros config validate -i cloud-config.yml

Now we are ready to install RancherOS on disk:

sudo ros install -i rancher/os:v1.5.2 -t gptsyslinux -c cloud-config.yml -d /dev/sda --append "rancher.password=some-password"

Again, set a proper password. The installer will reboot the system once you confirm; once you are logged in again, set up Docker TLS support by running:

IP=`ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}'`

sudo ros config set rancher.docker.tls true
sudo ros tls gen --server -H localhost -H rancher -H $IP
sudo system-docker restart docker
sudo ros tls gen

Next, unless you have removed the console setting, switch to the chosen console

sudo ros console switch ubuntu

This will kick you out, so you’ll have to log in again; then you will be able to install packages with apt if you chose the Ubuntu console, or the equivalent for another console. Congrats, RancherOS is now installed on disk.

Post-installation

Like I mentioned earlier, being a minimal OS, RancherOS has a smaller attack surface. However, on all my servers I always do at least three things right after installing the OS: configure/harden SSH, configure a firewall, and install fail2ban.

SSH configuration

If you edit /etc/ssh/sshd_config directly to make your changes, for example to disable password authentication, you will soon notice that your changes are lost when you reboot the server. Instead, you need to customise the config template at /etc/ssh/sshd_config.tpl.
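As an illustration, here is a small sketch for setting a directive in the template. It assumes the template contains ordinary sshd_config directives, one per line (I have not checked the exact contents of the file), and set_sshd_option is a hypothetical helper:

```shell
# Set "<key> <value>" in an sshd config file: replace an existing
# (possibly commented-out) directive, or append it if it is absent.
set_sshd_option() {
  file="$1"; key="$2"; value="$3"
  if grep -Eq "^[#[:space:]]*$key[[:space:]]" "$file"; then
    tmp=$(mktemp)
    # Rewrite matching lines; write back with cat to keep the file's permissions.
    sed -E "s|^[#[:space:]]*$key[[:space:]].*|$key $value|" "$file" > "$tmp" &&
      cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (commented out; run as root, since the template is owned by root):
# set_sshd_option /etc/ssh/sshd_config.tpl PasswordAuthentication no
```

Keep a session open while testing so you don’t lock yourself out if key authentication turns out not to work.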

Firewall

I am not sure if editing iptables rules directly on the host would work, because not everything is persisted as one would expect in RancherOS. I haven’t tried, and because everything in RancherOS runs as a container, I thought: well, let’s use a container for the firewall as well. So I created a super simple image that dynamically adds some iptables rules when the container is started, and removes those rules when the container is stopped/removed. At the moment the very simple script in the image does two things: it lets you open only some ports to the public (locking down everything else) and/or allow any connection to the server, on any port, from specific IP addresses. Using this image is as simple as running

docker run --name firewall --env OPEN_PORTS="22,80,443" --env ACCEPT_ALL_FROM="ip1,ip2" --env CHAIN="DOCKER-FIREWALL" -itd --restart=always --cap-add=NET_ADMIN --net=host vitobotta/docker-firewall:0.1.0

Of course customise the ports you want to open and the IP addresses, if any, that should be allowed full communication with the server - I use this for example to allow communication between the nodes of a Kubernetes cluster. You can see the Dockerfile and the script here.
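To give an idea of what such a container might do with those variables, here is a rough sketch that only prints iptables commands rather than running them. This is not the actual script from the image: the rule order and the firewall_rules name are assumptions for illustration.

```shell
# Print (not execute) iptables commands for a dedicated chain:
# allow trusted IPs fully, allow the listed TCP ports, drop the rest.
firewall_rules() {
  chain="$1"; open_ports="$2"; accept_from="$3"
  echo "iptables -N $chain"
  echo "iptables -A $chain -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  echo "iptables -A $chain -i lo -j ACCEPT"
  old_ifs=$IFS; IFS=,          # split the comma-separated lists
  for ip in $accept_from; do
    echo "iptables -A $chain -s $ip -j ACCEPT"
  done
  for port in $open_ports; do
    echo "iptables -A $chain -p tcp --dport $port -j ACCEPT"
  done
  IFS=$old_ifs
  echo "iptables -A $chain -j DROP"
  echo "iptables -I INPUT -j $chain"   # hook the chain into INPUT last
}

# Usage: review the generated commands before running anything.
# firewall_rules DOCKER-FIREWALL "22,80,443" "ip1,ip2"
```

Keeping the rules in their own chain is what makes cleanup on container stop straightforward: the jump from INPUT can be removed and the chain flushed and deleted without touching other rules.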

fail2ban

For this I was lucky because I found a ready image created by somebody else which also implements email notifications for events like when an IP is banned etc. At the moment I am using this for SSH only, but I will try and customise it further later for my apps/specific uses. For SSH, you need to create a jail first:

mkdir -p fail2ban/jail.d

cat <<EOF > fail2ban/jail.d/sshd.conf
[sshd]
enabled = true
port = ssh
filter = sshd[mode=aggressive]
logpath = /var/log/syslog
bantime  = 86400
findtime  = 14400
maxretry = 3
EOF

Customise the settings if needed. I am annoyed by the many attempts to login to my servers, so here I chose to ban for one whole day any IP that fails a login 3 times within 4 hours.
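bantime and findtime are given in seconds here, so a tiny helper makes it easier to pick other values (hours_to_seconds is just an illustrative name):

```shell
# Convert whole hours to seconds for fail2ban's bantime/findtime settings.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

hours_to_seconds 24   # bantime used above: 86400
hours_to_seconds 4    # findtime used above: 14400
```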

To run fail2ban:

docker run -it -d --name fail2ban --restart always \
  --network host \
  --cap-add NET_ADMIN \
  --cap-add NET_RAW \
  -v $(pwd)/fail2ban:/data \
  -v /var/log:/var/log:ro \
  -e F2B_LOG_LEVEL=DEBUG \
  -e F2B_IPTABLES_CHAIN=INPUT \
  -e F2B_ACTION="%(action_mwl)s" \
  -e TZ=EEST \
  -e F2B_DEST_EMAIL=... \
  -e F2B_SENDER=... \
  -e SSMTP_HOST=... \
  -e SSMTP_PORT=... \
  -e SSMTP_USER=... \
  -e SSMTP_PASSWORD=... \
  -e SSMTP_TLS=YES \
  crazymax/fail2ban:latest

I have chosen action_mwl as action so whenever an IP is banned, I receive a notification that includes whois details on the IP.

With a custom SSH config, a firewall and fail2ban, I have at least some basic “protection” from bots etc.

Backups and restores of Rancher data

Since one of the things I use RancherOS for is Rancher itself, I needed to figure out a way to manage backups and restores of Rancher’s data. After spending a little time on it, I created an image that does just that. It can be used to perform manual or scheduled backups (optionally with email notifications), and restores from either a local backup or a copy stored in S3-compatible storage (I use Restic for this). It’s simple and works pretty well in my testing. I wrote instructions on how to use it in the README of the repo on GitHub, so I won’t repeat those here.

Conclusion

Like I said earlier, I have used RancherOS for only a few days, but I am really pleased so far with the setup for both Rancher and Kubernetes clusters, especially after sorting out the security basics and the backups of Rancher’s data. Hopefully these tips can be useful to someone :)