Setting up LVS (Linux Virtual Server) for use with dCap
Newer Linux kernels, as well as our UltraLight kernels, have LVS built in. See
http://kb.linuxvirtualserver.org/wiki/Main_Page for LVS knowledge-base information and
http://kb.linuxvirtualserver.org/wiki/FAQ for the FAQ.
We currently run dcap doors on every pool node of our dCache installation (three per node, on ports 22136, 22137 and 22125). The choice of dcap door is controlled by a dcache.conf file stored in the /pnfs namespace (see section 6 of
http://trac.dcache.org/projects/dcache/wiki/ChimeraSetup for details).
We would prefer a more intelligent way of selecting a door for dcap/dccp access, so we tried LVS.
Setup Primary LVS Server with Virtual IP
We first need to assign a virtual IP (VIP) that clients can specify to access the service. I set up dcap0.aglt2.org at 192.41.231.200 for this test.
The mini-howto at
http://kb.linuxvirtualserver.org/wiki/Mini_Mini_Howto and
man ipvsadm have information on the setup.
Using 192.41.231.200, I first set up head02.aglt2.org as the LVS server. I created a simple shell script to do this (also stored in /afs/atlas.umich.edu/hardware/LVS):
#!/bin/bash
#
# Setup IPVS for dcap0.aglt2.org virtual IP
#
#######################
# Define virtual ip (vip) to use
vip=192.41.231.200
# Enable virtual IP on NIC alias
ifconfig bond0.4001:0 down
ifconfig bond0.4001:0 $vip netmask 255.255.255.255 broadcast $vip up
# Clear existing IPVS config
ipvsadm --clear
# Setup service for dcap0
ipvsadm -A -t $vip:22136 -s wlc
ipvsadm -a -t $vip:22136 -r 192.41.230.24:22136 -g -w 1000
ipvsadm -a -t $vip:22136 -r 192.41.230.27:22136 -g -w 500
ipvsadm -a -t $vip:22136 -r 192.41.230.33:22136 -g -w 800
This script brings up a new IP alias on our bond0.4001 "NIC". Connections to the virtual IP we specified will be "weighted least-connection" (wlc) balanced across 3 real servers (UMFS04, UMFS07 and UMFS13), each with a different weight.
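The script above configures only port 22136, while our doors also listen on 22137 and 22125. A minimal sketch of generating the same ipvsadm commands for any port follows; it only prints the commands and does not run them. The helper name gen_ipvs_rules and its rip=weight argument format are hypothetical, and reusing the same weights on the other ports is an assumption.

```shell
#!/bin/bash
# Dry-run sketch: print (do not execute) the ipvsadm commands for one
# virtual service. gen_ipvs_rules is a hypothetical helper, not part of ipvsadm.
# Usage: gen_ipvs_rules VIP PORT RIP=WEIGHT [RIP=WEIGHT ...]
gen_ipvs_rules() {
    local vip port entry rip weight
    vip=$1
    port=$2
    shift 2
    # -A adds the virtual service; -s wlc selects weighted least-connection
    echo "ipvsadm -A -t ${vip}:${port} -s wlc"
    for entry in "$@"; do
        rip=${entry%%=*}      # text before '='
        weight=${entry##*=}   # text after '='
        # -a adds a real server; -g selects direct routing; -w sets the weight
        echo "ipvsadm -a -t ${vip}:${port} -r ${rip}:${port} -g -w ${weight}"
    done
}

# Same real servers and weights as the script above, for all three door ports
for port in 22136 22137 22125; do
    gen_ipvs_rules 192.41.231.200 "$port" \
        192.41.230.24=1000 192.41.230.27=500 192.41.230.33=800
done
```

After reviewing the printed commands, they could be piped to bash as root on the LVS server.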
If ipvsadm is not installed, install it with:
yum install ipvsadm
Running this script on head02 results in a new interface:
[root@head02 ~]# ifconfig bond0.4001:0
bond0.4001:0 Link encap:Ethernet HWaddr 00:15:C5:F2:7C:B5
inet addr:192.41.231.200 Bcast:192.41.231.200 Mask:255.255.255.255
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
Now we need to set up the real servers to respond to the requests.
Setting up Real Servers
Each real server needs to be able to respond to packets destined for the virtual IP we have chosen. To do this we want a "local" instance of the virtual IP that hosts other than the real server itself do not see. We use the loopback (lo) device to set this up, along with some ARP behavior changes to make it work correctly. See
http://kb.linuxvirtualserver.org/wiki/ARP_Issues_in_LVS/DR_and_LVS/TUN_Clusters for more details.
I set up another script (also in AFS) to handle the real servers:
[root@umfs04 ~]# cat setup_ipvs_dcap0.sh
#!/bin/bash
#
###################
# Pick VIP
vip=192.41.231.200
# Fix ARP on host
echo 1 > /proc/sys/net/ipv4/conf/eth3/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/eth3/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
ifconfig lo:0 $vip netmask 255.255.255.255 broadcast $vip up
This makes sure the real server will not answer ARP requests for the vip address but will still process the requests forwarded to it by the LVS server. You will need to determine the correct interface to use (likely not eth3!). See
http://kb.linuxvirtualserver.org/wiki/Using_arp_announce/arp_ignore_to_disable_ARP for details on the arp_ignore and arp_announce settings.
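Since the echo commands write to /proc, the values are lost on reboot, so it can be useful to confirm they are still in place. A minimal sketch of such a check, assuming the same values as the script above (arp_ignore=1, arp_announce=2); the function name check_arp_settings and its optional base-directory argument are hypothetical:

```shell
#!/bin/bash
# check_arp_settings IFACE [BASE]
# Reports any ARP sysctl that differs from the values set by the real-server
# script. BASE defaults to the real /proc tree; it is a parameter only so the
# check can be pointed at a test directory. The function name is hypothetical.
check_arp_settings() {
    local iface base rc scope pair name want have
    iface=$1
    base=${2:-/proc/sys/net/ipv4/conf}
    rc=0
    for scope in "$iface" all; do
        for pair in arp_ignore=1 arp_announce=2; do
            name=${pair%%=*}
            want=${pair##*=}
            have=$(cat "$base/$scope/$name" 2>/dev/null)
            if [ "$have" != "$want" ]; then
                echo "$scope/$name: expected $want, found ${have:-nothing}"
                rc=1
            fi
        done
    done
    return "$rc"
}
```

Running check_arp_settings eth3 on a real server (substituting the correct interface) prints nothing and returns 0 when all four values match.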
The result on one of the real servers is a new interface:
[root@umfs04 ~]# ifconfig lo:0
lo:0 Link encap:Local Loopback
inet addr:192.41.231.200 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
Testing LVS for dCache dcap
To test this I ran dccp on a test file already in our /pnfs/aglt2.org/atlashotdisk/ area:
dccp -d19 dcap://192.41.231.200:22136/pnfs/aglt2.org/atlashotdisk/shawn.test2 /tmp/shawn.test7
You can use the ipvsadm -L --stats command to see the resulting statistics (after running the command above many times):
[root@head02 ~]# ipvsadm -L --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
-> RemoteAddress:Port
TCP dcap0.aglt2.org:22136 14 182 0 13916 0
-> umfs13.aglt2.org:22136 5 65 0 4970 0
-> umfs07.aglt2.org:22136 3 39 0 2982 0
-> umfs04.aglt2.org:22136 6 78 0 5964 0
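To compare the observed distribution with the configured weights, the Conns column can be turned into percentages. A minimal sketch, assuming the column layout shown above and plain integer counters (ipvsadm may abbreviate large counters, in which case its --exact option should help); conn_share is a hypothetical helper name:

```shell
#!/bin/bash
# conn_share: read "ipvsadm -L --stats" output on stdin and print, for each
# real server, its connection count and share of all connections.
# Hypothetical helper; assumes the column layout shown in this page.
conn_share() {
    LC_ALL=C awk '
        # Real-server rows start with "->"; the header row has no numeric Conns field
        $1 == "->" && $3 ~ /^[0-9]+$/ { conns[$2] = $3; total += $3 }
        END {
            if (total == 0) exit
            for (s in conns)
                printf "%s %d %.1f%%\n", s, conns[s], 100 * conns[s] / total
        }' | LC_ALL=C sort
}

# Intended use on the LVS server (needs root):
#   ipvsadm -L --stats | conn_share
```

For the 14 test connections above this gives 42.9%, 21.4% and 35.7% for umfs04, umfs07 and umfs13, close to the 43.5%, 21.7% and 34.8% implied by the weights 1000, 500 and 800.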
The weights for each real server should be adjusted to match their relative "power" for serving requests.
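To see how the weights influence the choice, here is a simplified model of wlc scheduling: a new connection goes to the server with the smallest ratio of active connections to weight. This is only a sketch of the idea; the kernel scheduler also takes inactive connections into account, which is ignored here. The function pick_wlc, its name:active:weight argument format and the connection counts in the example are all hypothetical.

```shell
#!/bin/bash
# pick_wlc NAME:ACTIVE:WEIGHT [...]
# Simplified model of weighted least-connection: print the server with the
# smallest ACTIVE/WEIGHT ratio. Inactive connections are ignored here.
pick_wlc() {
    local best best_c best_w entry name rest c w
    best=""
    best_c=0
    best_w=1
    for entry in "$@"; do
        name=${entry%%:*}
        rest=${entry#*:}
        c=${rest%%:*}
        w=${rest#*:}
        # a weight of 0 means the server receives no new connections
        [ "$w" -gt 0 ] || continue
        # compare c/w < best_c/best_w by cross-multiplying (integers only)
        if [ -z "$best" ] || [ $((c * best_w)) -lt $((best_c * w)) ]; then
            best=$name
            best_c=$c
            best_w=$w
        fi
    done
    echo "$best"
}

# Made-up connection counts with the weights used on this page:
# 10/1000, 4/500 and 9/800, so umfs07 has the lowest ratio
pick_wlc umfs04:10:1000 umfs07:4:500 umfs13:9:800
```

Doubling a server's weight therefore lets it hold roughly twice as many connections before another server is preferred.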
NOTE: One thing I noticed is that we have far too many dcap doors set up for AGLT2. For the upcoming dCache upgrade I will reduce this to only 1 dcap door per pool node (rather than 3). The doors are only used for the control channel; the pool holding the file responds on the data channel for the client request. I also still need to verify that LVS can work with nodes at MSU, which are on a different subnet (192.41.236.0/23 instead of 192.41.230.0/23).
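The subnet question matters because the -g option used above selects direct routing, which the LVS documentation describes as requiring the real servers to be reachable on the same physical network segment as the LVS server. A rough sketch for checking whether a candidate real server shares the VIP's /23 follows; sharing a subnet is only a proxy for sharing a segment, the helper names are hypothetical, and 192.41.236.10 is a made-up MSU address used purely for illustration.

```shell
#!/bin/bash
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 PREFIXLEN: succeed if both addresses fall in the same
# network for the given prefix length (hypothetical helper)
same_subnet() {
    local a b mask
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# 192.41.230.24 is umfs04; 192.41.236.10 is a made-up MSU address
for rip in 192.41.230.24 192.41.236.10; do
    if same_subnet 192.41.231.200 "$rip" 23; then
        echo "$rip: same /23 as the VIP"
    else
        echo "$rip: different /23 from the VIP"
    fi
done
```

If direct routing turns out not to work for the MSU nodes, ipvsadm also offers tunneling (-i) as a forwarding method, though that has not been tested here.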
Using an LVS-serviced IP, we can replace the dcache.conf contents with a single (VIP) entry like:
dcap0.aglt2.org:22136
This replaces the existing (long) list of all possible dcap doors at AGLT2. We can then use LVS on head02 (or anywhere else we want to put it) to handle dcap requests and load-balance them across the real servers.
--
ShawnMcKee - 02 Dec 2010