Monthly Archives: May 2013

Using keepalived for a self-balancing cluster

Load balancing traffic between servers can sometimes lead to headaches depending on your topology and budget. Here I’ll discuss how to create a self-load-balanced cluster of web servers that distribute HTTP requests among themselves while serving them at the same time. Yes, this means that you don’t need dedicated load balancers!

I will not go into the details of how to configure your kernel for ipvsadm etc. since it’s already well covered on the web. Instead, I’ll focus on the challenges and subtleties of achieving load balancing based only on the realservers themselves. I expect you, the reader, to have a minimal knowledge of the terms and usage of ipvsadm and keepalived.

The setup

Let’s start with a diagram and some principles describing our topology.

  • 3 web servers / realservers (you can do the same using 2)
  • Local subnet: 192.168.0.0/24
  • LVS forwarding method: DR (direct routing)
  • LVS scheduler: WRR (you can choose your own)
  • VIP: 192.168.0.254
  • Main interface for VIP: bond0

[Diagram: keepalived_dr]

Let’s take a look at what happens, as this explains a lot about why we need to configure the servers in such a special way.

black arrow / serving

  1. the master server (the one that holds the VIP) receives an HTTP connection request
  2. the load balancing scheduler decides that the master itself will serve this request
  3. the local web server handles the request and replies to the client

blue arrow / direct routing / serving

  1. the master server receives an HTTP connection request
  2. the load balancing scheduler decides the blue server should handle this request
  3. the HTTP packet is handed to the blue server as-is (only its destination MAC address is rewritten; the IP packet itself is not modified)
  4. the blue server receives a packet whose destination IP is the VIP, but it doesn’t hold the VIP (tricky part)
  5. the blue server’s web server handles the request and replies to the client
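To make the scheduler step a bit more concrete, here is a minimal sketch of the kind of IPVS table the MASTER works with: one TCP virtual service on the VIP using the wrr scheduler, and every realserver (the master included) added in direct routing mode, which ipvsadm calls "gatewaying" (-g). The realserver addresses 192.168.0.1 to 192.168.0.3 and the equal weights are hypothetical examples. In the real setup keepalived builds this table from its own configuration, so the script only prints the equivalent ipvsadm commands instead of running them.

```shell
#!/bin/sh
# Print the ipvsadm commands describing a DR + WRR virtual service.
# Nothing is applied here: review the output first.
# Hypothetical values: the realserver IPs and the weight of 1.

VIP=192.168.0.254
PORT=80
REALSERVERS="192.168.0.1 192.168.0.2 192.168.0.3"

print_ipvs_rules() {
    vip=$1
    port=$2
    shift 2
    # -A: add a virtual service, -t: TCP service address, -s: scheduler
    echo "ipvsadm -A -t $vip:$port -s wrr"
    for rip in "$@"; do
        # -a: add a realserver, -r: its address, -g: direct routing
        # (gatewaying), -w: weight used by the wrr scheduler.
        # DR cannot remap ports, so the realserver port equals the VIP port.
        echo "ipvsadm -a -t $vip:$port -r $rip:$port -g -w 1"
    done
}

# $REALSERVERS is left unquoted on purpose: one argument per IP
print_ipvs_rules "$VIP" "$PORT" $REALSERVERS
```

Giving every node the same weight makes wrr behave like plain round robin; lowering the master’s weight is one way to leave it some headroom for its balancing work.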

IP configuration

Almost all the tricky part lies in solving point #4 of the blue server example. Since we’re using direct routing, we need to configure all our servers so that they accept packets destined to the VIP even when the VIP is not configured on their receiving interface.

The solution is to have the VIP configured on the loopback interface (lo) with host scope on the keepalived BACKUP servers, while it is configured on the main interface (bond0) on the keepalived MASTER server. This is what is usually done when you use pacemaker and ldirectord with IPaddr2, but keepalived does not handle this kind of configuration natively.

We’ll use the notify_master and notify_backup directives of keepalived.conf to handle this:

notify_master /etc/keepalived/to_master.sh
notify_backup /etc/keepalived/to_backup.sh

We’ll discuss a few problems to fix before detailing those scripts.

The ARP problem

Now some of you wise readers will wonder about the ARP cache corruption that happens when multiple hosts claim to own the same IP address on the same subnet. Let’s fix this problem now, since the kernel does have a way of handling it properly. Basically, we’ll ask the kernel not to advertise the server’s MAC address for the VIP under certain conditions, using the arp_ignore and arp_announce sysctls.

Add these lines to the sysctl.conf of your servers:

net.ipv4.conf.all.arp_ignore = 3
net.ipv4.conf.all.arp_announce = 2

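Read the kernel’s ip-sysctl documentation for the detailed meaning of those values. In short, arp_ignore = 3 stops the server from answering ARP requests for addresses configured with host scope (our VIP on lo), and arp_announce = 2 makes it use its best local address for the target as the source of its own ARP requests, instead of the packet’s source address (which may be the VIP).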
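Keep in mind that sysctl.conf is only read at boot or when you run sysctl -p as root. Here is a minimal sketch of a check you could run on each server to make sure the values are actually active. It reads the "all" entries under /proc/sys; SYSCTL_ROOT is a hypothetical override of that path, only there so the check can be tried against a fake tree. Note that the kernel applies the larger of the "all" and per-interface values, so this is a first sanity check rather than the whole story.

```shell
#!/bin/sh
# Verify that arp_ignore=3 and arp_announce=2 are active.
# SYSCTL_ROOT is a hypothetical override (defaults to /proc/sys).
# Returns 0 if both values match, 1 if not, 2 if they can't be read.

check_arp_sysctls() {
    root=${SYSCTL_ROOT:-/proc/sys}
    ignore=$(cat "$root/net/ipv4/conf/all/arp_ignore") || return 2
    announce=$(cat "$root/net/ipv4/conf/all/arp_announce") || return 2
    if [ "$ignore" = 3 ] && [ "$announce" = 2 ]; then
        echo "ARP sysctls OK"
        return 0
    fi
    echo "arp_ignore=$ignore (want 3), arp_announce=$announce (want 2)" >&2
    return 1
}

# Usage on a realserver: print a hint when the values are not active yet
check_arp_sysctls || echo "hint: run 'sysctl -p' as root after editing /etc/sysctl.conf" >&2
```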

The IPVS synchronization problem

This is another problem arising from the fact that the load balancers are also acting as realservers. When keepalived starts, it can spawn a synchronization daemon on the master and backup nodes so that your load balancers’ IPVS tables stay in sync. This is needed for a fully transparent failover, as it keeps track of session persistence so that clients don’t get rebalanced when the master goes down.

Well, this is the limitation of our setup: clients’ HTTP sessions served by the master node will fail if it goes down. Note that the same will happen to sessions served by the other nodes, because we have to get rid of this synchronization to get our setup working. The reason is simple: IPVS table sync conflicts with the acceptance of the packets by the VIP set up on our loopback interface. The two mechanisms can’t coexist, so you’d better use this setup for stateless (API?) HTTP servers, or only if you’re okay with this eventuality.

Final configuration

to_master.sh

#!/bin/bash

ip addr del 192.168.0.254/32 dev lo
ipvsadm --restore < /tmp/keepalived.ipvs

  1. drop the VIP from the loopback interface (keepalived will set it up on the master interface)
  2. restore the IPVS configuration saved when this node became a backup

to_backup.sh

#!/bin/bash

ip addr add 192.168.0.254/32 scope host dev lo
ipvsadm --save > /tmp/keepalived.ipvs
ipvsadm --clear

  1. add the VIP to the loopback interface, with host scope
  2. keep a copy of the IPVS configuration: if we become master again, we’ll need it back
  3. clear the local IPVS configuration so it doesn’t conflict with our own web serving
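After a transition, it’s handy to check that the notify scripts did their job: the VIP should sit on bond0 on the MASTER and on lo on every BACKUP. Here is a minimal sketch that reads the one-line output of ip -o -4 addr show and prints the interface(s) carrying a given address. The function name vip_interface is hypothetical, and the field positions assume the usual iproute2 one-line format (interface name in field 2, address/prefix in field 4).

```shell
#!/bin/sh
# Print the interface(s) holding a given IPv4 address, reading the
# output of "ip -o -4 addr show" on stdin.
# Expected here: bond0 on the MASTER, lo on the BACKUP nodes.

VIP=192.168.0.254

vip_interface() {
    # Split "192.168.0.254/32" on "/" and compare the address part exactly
    awk -v addr="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == addr) print $2 }'
}

# Only query the live system when the ip tool is available
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | vip_interface "$VIP"
fi
```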

Conclusion

Although it offers some serious benefits, remember the main limitation of this setup: if the master fails, all sessions of your web servers will be lost. So use it mostly for stateless stuff, or only if you’re okay with this. My setup and explanations may have some glitches, so feel free to correct me if I’m wrong somewhere.

rabbitMQ: v3.1.1 released

EDIT: okay, they just released v3.1.1, so here it goes on portage as well!

highlights

  • relax validation of x-match binding to headers exchange for compatibility with brokers < 3.1.0
  • fix bug in ack handling for transactional channels that could cause queues to crash
  • fix race condition in cluster autoheal that could lead to nodes failing to re-join the cluster

3.1.1 changelog is here.

I’ve bumped the rabbitMQ message queuing server on portage. This new version (3.1.0) comes with quite a nice bunch of bug fixes and features.

highlights

  • eager synchronisation of slaves by policy (manual & automatic)
  • cluster “autoheal” mode to automatically choose nodes to restart when a partition has occurred
  • cluster “pause minority” mode to prefer partition tolerance over availability
  • improved statistics (including charts) in the management plugin
  • a good number of performance improvements
  • some nice memory leak fixes

Read the full changelog.

Squid proxy: blocking downloads of some file extensions

It is a common request to have squid block the download of certain files based on their extension in the URL path. A quick look at Google’s results on the subject apparently gives us an easy way to get this done with squid.

The common solution is to create an ACL file listing regular expressions for the extensions you want to block, and then apply it in your http_access rules.

blockExtensions.acl

\.exe$

squid.conf

acl blockExtensions urlpath_regex -i "/etc/squid/blockExtensions.acl"

[...]

http_access allow localnet !blockExtensions

Unfortunately this is not enough to prevent users from downloading .exe files. The mistake here is assuming that the URL will strictly end with the extension we want to block. Consider the two examples below:

http://download.com/badass.exe     // will be DENIED as expected

http://download.com/badass.exe?    // WON'T be denied as it does not match the regex !

Squid uses the extended regex engine, the same flavour as egrep. So we need to change our blockExtensions.acl file to handle the optional ?whatever query string that may trail the URL path. Here’s a version that handles both cases:

blockExtensions.acl

\.exe(\?.*)?$
\.msi(\?.*)?$
\.msu(\?.*)?$
\.torrent(\?.*)?$
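Since squid uses the same extended regex flavour as egrep, you can try the ACL file against a few URL paths with grep -E before reloading squid. Here is a minimal sketch of that: it writes the patterns above to a temporary file and uses -i to mirror the -i flag of the urlpath_regex ACL. The function name is_blocked and the sample paths are made up for the demo.

```shell
#!/bin/sh
# Test the extension-blocking regexes with grep -E (extended regex,
# like squid). The sample URL paths below are hypothetical.

acl_file=$(mktemp)
trap 'rm -f "$acl_file"' EXIT

# Quoted 'EOF': the backslashes are written to the file untouched
cat > "$acl_file" <<'EOF'
\.exe(\?.*)?$
\.msi(\?.*)?$
\.msu(\?.*)?$
\.torrent(\?.*)?$
EOF

# Succeeds when the URL path matches any pattern of the ACL file
is_blocked() {
    printf '%s\n' "$1" | grep -qEi -f "$acl_file"
}

# Paths are quoted so the shell doesn't treat "?" as a glob
for path in "/badass.exe" "/badass.exe?" "/badass.EXE?mirror=2" "/update.msu" "/readme.txt"; do
    if is_blocked "$path"; then
        echo "DENIED  $path"
    else
        echo "allowed $path"
    fi
done
```

Running the same loop against a file containing only the original \.exe$ pattern shows /badass.exe? slipping through.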

You will still be hated for limiting people’s need to download and install shit on their Windows boxes, but you implemented it the right way and no script kiddie can brag about bypassing you 😉

Fujifilm GF670W

It’s been so long since I switched to film-only photography that a few months ago I decided to sell all my digital equipment. I already own a Nikon FM2, which I love, but I have to admit that I was and still am totally amazed by the pictures taken by my girlfriend’s Rolleiflex 3.5F. The medium format rendering was what I was craving, and I knew that sooner or later I’d step into the medium format world. Well, I didn’t have to wait: when we were in Tokyo to celebrate New Year 2013, I fell in love with the perfect match between my love for wide angles and medium format film photography: the Fujifilm GF670W!

For my upcoming birthday, I got myself my new toy in advance so I could use it on my road trip around France (I’ll talk about it soon, it was awesome). Oddly, the only places in the world where you can get this camera are the UK and Japan, so I bought it from the very nice guys at Dale photographic. Here is the beast (literally):

[Photo: IMG_20130412_215344]

Yes, this is a big camera, and it comes with a very nice leather case and a lens hood. It is a rangefinder camera with a comfortable viewfinder; it accepts 120 and 220 film and can shoot in standard 6×6 and 6×7!

In the medium format world, the 55mm lens is actually a wide angle, as it is comparable to a 28mm in the usual 24×36 world. Its specs are not impressive on paper, with an f/4.5 maximum aperture and shutter speeds from 4s to 1/500s (as fast as a 1956 Rolleiflex), but the quality is just stunning: it’s sharp and shows practically nonexistent chromatic aberration.

Want proof? Here are some shots from my first roll, uploaded at full resolution:

[Photo: 07760003]

[Photo: 07760006]