Faelix runs Debian stable as its operating system of choice, and jessie was released at just about the same time as our new batch of servers arrived. We had already evaluated a few of the alternatives, but Google's Ganeti project stood out as the best fit for our needs. One of the most important factors was to find something which would make it easy to transition our existing customers' VMs. Faelix has over a decade of experience using Xen, and Xen continues to be a powerful virtualisation technology. We also liked Ganeti's management of disks, particularly its integration with DRBD. And when it turned out to be written in Python, that was just the icing on the cake.
We had already begun a network rearchitecture project a few months earlier, and the combination of Ganeti and Open vSwitch fit perfectly into this vision. Customers are now assigned their own network segment (or, potentially, more than one) which can be used for public and private traffic, and we map those segments through to VLANs within Open vSwitch. Keeping things simple with Ethernet also made life easier for our blended "virtual+colo" customers: for example, a customer's private NAS appears on the same layer-2 network as their front-end VMs. We had considered an overlay technology such as MPLS/VPLS or VXLAN, but felt that these would either limit our options for IPv6 support or be a little too close to the bleeding edge for a production network. We plan to evaluate them in the future, but haven't jumped in with both feet just yet.
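To give a rough idea of what that mapping looks like on a node, here is an illustrative sketch using Open vSwitch's command-line tools; the bridge name, port names and VLAN numbers are placeholders rather than our real configuration:

# hypothetical example: one integration bridge per node, with each guest
# interface attached as an access port on that customer's VLAN
ovs-vsctl add-br switch1
ovs-vsctl add-port switch1 vif7.0 tag=1234    # customer A's VM
ovs-vsctl add-port switch1 vif9.0 tag=1235    # customer B's VM
ovs-vsctl add-port switch1 eth1               # uplink carries all VLANs as a trunk by default

In practice we don't add guest ports by hand, of course; that is what the vif scripts described below take care of.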
One of the challenges we ran into when deploying the Debian-packaged versions of Xen, Ganeti and Open vSwitch was around VLAN tagging. After a few false starts, we ended up writing our own "vif" scripts for Xen.
/etc/ganeti/node_info.py:
#!/usr/bin/python
# Look up an instance's NIC parameters from the Ganeti RAPI and print them
# as bash array assignments for the vif script below to eval.
import requests
import sys

hostname = sys.argv[1]
if hostname.endswith("--incoming"):  # to cope with migrations
    hostname = hostname[:-10]
req = requests.get("https://CLUSTERNAME:5080/2/instances/" + hostname,
                   verify=False)  # e.g. when RAPI uses a self-signed certificate
data = req.json()
for (i, x) in enumerate(data['custom_nicparams']):
    vlan = x['vlan']
    while vlan.startswith("."):  # sometimes end up with .1234 instead of 1234
        vlan = vlan[1:]
    print('vlan[%d]="%s"' % (i, vlan))
    print('link[%d]="%s"' % (i, x['link']))
    print('mode[%d]="%s"' % (i, x['mode']))
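As a purely hypothetical example, running the script for an instance with a single NIC on VLAN 1234 would print something like the following (the instance and link names are made up), which the vif script below pulls in with eval to populate bash arrays:

$ /etc/ganeti/node_info.py example-vm.faelix.net
vlan[0]="1234"
link[0]="switch1"
mode[0]="openvswitch"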
/etc/xen/scripts/vif-ganeti-ovs:
#!/bin/bash
# Xen vif hotplug script: look up the domain's NIC parameters from Ganeti
# and hand the interface to Open vSwitch via Ganeti's net-common helpers.
source /usr/lib/ganeti/net-common
dir=$(dirname "$0")
. "$dir"/vif-common.sh

dev=$vif                  # e.g. vif12.0, as passed in by Xen
dev_=${dev#vif}
domid=${dev_%.*}          # Xen domain id
devid=${dev_#*.}          # NIC index within the domain
domname=$(xl domname "$domid")

# populate the vlan[], link[] and mode[] arrays from the RAPI lookup
eval "$(/etc/ganeti/node_info.py "$domname")"

# variables expected by setup_ovs from net-common
INTERFACE=$dev
MODE=${mode[$devid]}
LINK=${link[$devid]}
VLAN=${vlan[$devid]}

setup_ovs
success
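One piece of wiring not shown above is telling Ganeti to use this script for its Xen domains. Assuming the xen-pvm hypervisor, something along these lines should do it via the vif_script hypervisor parameter (treat this as a sketch rather than our exact configuration):

gnt-cluster modify -H xen-pvm:vif_script=/etc/xen/scripts/vif-ganeti-ovs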
Overall this combination of technologies is working really well for us. DRBD gives us the ability to migrate customers' virtual machines between physical servers, which is important for performing essential maintenance. Our customers can continue to mix and match VMs with physical equipment in our racks. And it all "fits" nicely in our upgraded network architecture.
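For example, draining a node ahead of planned maintenance is a single command (the instance and node names here are placeholders):

gnt-instance migrate example-vm       # move one instance to its DRBD secondary
gnt-node migrate node2.example.net    # or evacuate every primary instance from a node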