Build of msurx ROCKS 5.5 frontend
This page gives a procedural view of the build of msurx with ROCKS 5.5 and SL58; some differences between ROCKS 5.5 and 5.3 are noted. As described here, the frontend is working and uses our rolls for building MSU Tier3 worker nodes.
This page originally described the 5.4.3 version (used for development) and has been updated for 5.5. Some links still point to 5.4.3.
VM setup
Create a VM with:
- 2 CPUs
- 4GB memory
- 30GB and 40GB disks
- 2 NICs, one on local and one on public AGLT2 VLANs (used E1000 NIC type for compatibility during install)
- CD linked to ISOs
We suggest forcing the first boot into the BIOS and enabling the diagnostic screen at bootup; this makes it easier to get to the grub boot menu.
ROCKS frontend kickstart
The ROCKS docs for the frontend build (screen shots etc.) are here.
Fetch the ROCKS 5.5 and SL58 disk1 ISOs to an area available via NFS so they can be mounted as CDs on the VM client.
Boot the VM with the ROCKS DVD attached (I shuffled the boot device order in the VM's BIOS menu to put CD before HD). At the ROCKS grub splash screen, type "build".
Configure networking using the manual configuration option (IPv4 only) and set the address info for the public interface; the installer will then switch to graphical mode.
Select these 4 rolls from ROCKS Jumbo DVD:
- base
- ganglia
- kernel
- web-server
Then provide the SL58 ISO and select the "LTS" roll. Everything needed is on disk1; disk2 may add conflicting or extraneous features.
Configure cluster/cert metadata.
Configure IP for private network.
Gateway: 10.10.128.1
DNS server: 10.10.128.8,10.10.128.9
NTP server: 10.10.128.8
Select manual partitioning:
- / 26GB sda
- swap 4GB sda
- /export 40GB sdb
You'll need to switch back to the ROCKS ISO. The installer will reboot automatically once the install has completed.
For the MSU cluster, set the domain name to msulocal.
SSH to the frontend as root. Make sure that the ROCKS DB has the correct name for the private network ("msulocal") and that /etc/hosts includes the correct private entry for the host.
$ rocks set network zone private msulocal
$ rocks sync config
$ cat /etc/hosts | grep msurx
10.10.128.11 msurx.msulocal msurx
192.41.236.11 msurx.aglt2.org
Fix resolv.conf
Remove the local nameserver from /etc/resolv.conf, so that the normal 10.10.128.8 and 10.10.128.9 nameservers are referenced, but 127.0.0.1 is not.
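After the edit, /etc/resolv.conf should look roughly like this (keep any existing search line as-is):
nameserver 10.10.128.8
nameserver 10.10.128.9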
Put CFE client keys in place
First we push the key pairs used by CFEngine to the node; these are used to bootstrap the frontend to CFE and also in client builds.
The ROCKS build pushes the CFE client keys to the clients, but the keys are not stored in SVN. The keys exist on older ROCKS frontends and also on cap in a tarball. They need to be world readable so that they may be included in kickstart.
Scp the tarball to the new frontend. Then ssh to the frontend and do the following:
$ mkdir /var/cfengine
$ cd /var/cfengine
$ tar -xf /root/ppkeys-stash.tar
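For the scp step above, a minimal sketch run from the new frontend (the tarball location on cap is an assumption; use wherever the stash actually lives):
$ scp root@cap:/root/ppkeys-stash.tar /root/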
Bootstrap to CFEngine
The initial config is to bootstrap the node to CFEngine.
SSH to node as root using the password set during kickstart.
Fetch cfengine rpm, install it, source /etc/profile.d/cfengine3.sh, create policy_server.dat and policy_path.dat files, copy keys, then bootstrap cf-agent against 10.10.128.10 (msucfe.msulocal). Here I've copied the keys from the ppkeys-stash dir populated above.
$ wget http://mirror.msulocal/mirror/aglt2/5/x86_64/cfengine-community...
$ yum localinstall cfengine3.rpm
$ source /etc/profile.d/cfengine3.sh
$ cd /var/cfengine
$ cp ppkeys-stash/root-10.10.128.10.pub ppkeys
$ cp ppkeys-stash/root-10.10.128.11.priv ppkeys/localhost.priv
$ cp ppkeys-stash/root-10.10.128.11.pub ppkeys/localhost.pub
$ chmod 600 ppkeys/*
$ echo '10.10.128.10' > policy_server.dat
$ echo '/var/cfengine/masterfiles-prod' > policy_path.dat
Now run cf-agent with the bootstrap option, then run cf-agent repeatedly until all promises are kept:
$ cf-agent -B -s 10.10.128.10
$ cf-agent
$ cf-agent -K
$ tail /var/cfengine/promise_summary.log
SVN checkout
The CFEngine run will have installed the ssh_keys needed for SVN checkout. Now check out the ROCKS SVN trunk.
$ source /root/svn-tools/svn-readonly.sh
$ cd /export/rocks
$ svn checkout svn+ssh://ndt.aglt2.org/rocks/trunk svn-trunk
Note that the full local repo is 11 GB.
Bind mounts
We use bind mounts to create the desired filesystem paths. Only /export/rocks/install/tools and /export/rocks/install/config need to be created, from directories in the svn-trunk local repo. Add this to /etc/fstab:
# Bind Mounts for ROCKS SVN local repos
/export/rocks/svn-trunk/config /export/rocks/install/config none bind 0 0
/export/rocks/svn-trunk/tools /export/rocks/install/tools none bind 0 0
Create mount points and enable mounts:
$ mkdir /export/rocks/install/config
$ mkdir /export/rocks/install/tools
$ mount -a
Build and install rolls
For each of the rolls from svn-trunk/rolls-src that are to be installed, substituting ROLLDIR and ROLLNAME, do:
$ cd ROLLDIR
$ make clean; make roll
$ rocks add roll ROLLNAME*.iso
$ rocks enable roll ROLLNAME
To loop through all of the rolls, creating and adding them:
$ cd rolls-src
$ for aroll in agl-base agl-cfengine agl-condor agl-cvmfs agl-dell agl-msut3 agl-osg3 agl-puppetlabs agl-update-sl58; do cd $aroll; make clean; make roll; cd ..; done
$ for aroll in agl-base agl-cfengine agl-condor agl-cvmfs agl-dell agl-msut3 agl-osg3 agl-puppetlabs agl-update-sl58; do cd $aroll; rocks add roll *.iso; cd ..; done
$ rocks enable roll agl-base
$ ...
$ rocks list roll
NAME VERSION ARCH ENABLED
base: 5.4.3 x86_64 yes
ganglia: 5.4.3 x86_64 yes
kernel: 5.4.3 x86_64 yes
web-server: 5.4.3 x86_64 yes
LTS: 5.4.3 x86_64 yes
agl-base: 0.13 x86_64 yes
agl-cfengine: 0.12 x86_64 yes
agl-condor: 0.03 x86_64 yes
agl-cvmfs: 0.03 x86_64 yes
agl-dell: 0.02 x86_64 yes
agl-msut3: 0.02 x86_64 yes
agl-osg3: 0.02 x86_64 yes
agl-puppetlabs: 0.01 x86_64 yes
agl-update-sl58: 0.02 x86_64 yes
Global attributes:
(Note that there is an issue with "primary_net" in our setup; see the Later Additions section below.)
rocks set attr Kickstart_PrivateDNSDomain msulocal
rocks set attr Kickstart_PrivateDNSServers 10.10.128.8,10.10.128.9
rocks set attr Kickstart_PrivateSyslogHost 10.10.128.15
rocks set attr primary_net public
rocks add attr agl_site MSU
rocks add attr cfe_policy_host 10.10.128.10
rocks add attr cfe_policy_path /var/cfengine/masterfiles-rocks
Routing:
This is the starting point:
$ rocks list route
NETWORK NETMASK GATEWAY
224.0.0.0: 255.255.255.0 private
255.255.255.255: 255.255.255.255 private
0.0.0.0: 0.0.0.0 10.10.128.11
192.41.236.11: 255.255.255.255 10.10.128.11
$ rocks remove route 224.0.0.0
$ rocks remove route 255.255.255.255
$ rocks remove route 0.0.0.0
$ rocks remove route 192.41.236.11
$ rocks add route 224.0.0.0 private netmask=240.0.0.0
$ rocks add route 0.0.0.0 192.41.236.1 netmask=0.0.0.0
$ rocks add route 10.10.0.0 10.10.128.1 netmask=255.255.240.0
This is the end point:
$ rocks list route | sort -r
NETWORK NETMASK GATEWAY
224.0.0.0: 240.0.0.0 private
10.10.0.0: 255.255.240.0 10.10.128.1
0.0.0.0: 0.0.0.0 192.41.236.1
The routes above are for: ganglia/multicast, UM private network, default gateway. Update the frontend with these routes:
$ rocks report host route msurx | grep '^any' > /etc/sysconfig/static-routes
Check that the default route is correct in /etc/sysconfig/network (on msut3-rx this was on the 10.10 side; not sure if some other step was missed on that host):
GATEWAY=192.41.236.1
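A quick check (it should show the line above):
$ grep GATEWAY /etc/sysconfig/network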
T2DCX appliance:
rocks add appliance T2DCX graph=default node=t2-dcx membership=T2
rocks add appliance attr T2DCX ganglia_address 239.2.12.61
Note that 56K nodes are in a different ganglia group, 239.2.12.73; we set the ganglia_address for each such host as well.
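A sketch for one such host (the hostname here is hypothetical):
rocks add host attr c-56-1 ganglia_address 239.2.12.73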
T3DCX appliance:
rocks add appliance T3DCX graph=default node=t3-dcx membership=T3
rocks add appliance attr T3DCX ganglia_address 239.2.12.67
To add a worker node, use the helper script that reads the "nodeinfo" "db" from the tools directory:
/export/rocks/install/tools/rocks-add-host cc-113-1
or manually:
rocks add host cc-113-1 cpus=8 rack=113 rank=1 membership=T3
rocks add host interface cc-113-1 eth0
rocks set host interface ip cc-113-1 eth0 10.10.129.54
rocks set host interface name cc-113-1 eth0 cc-113-1
rocks set host interface mac cc-113-1 eth0 00:1e:c9:ac:34:a6
rocks set host interface module cc-113-1 eth0 bnx2
rocks set host interface subnet cc-113-1 eth0 private
rocks add host interface cc-113-1 eth1
rocks set host interface ip cc-113-1 eth1 192.41.237.54
rocks set host interface name cc-113-1 eth1 c-113-1
rocks set host interface mac cc-113-1 eth1 00:1e:c9:ac:34:a8
rocks set host interface module cc-113-1 eth1 bnx2
rocks set host interface subnet cc-113-1 eth1 public
rocks add host attr cc-113-1 ganglia_address 239.2.12.67
rocks set host boot cc-113-1 action=install
rocks sync config
Note the slight change in the "interface name" command: it no longer needs (or allows) the domain name to be included.
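For contrast, under the older ROCKS the equivalent command would have included the domain, roughly:
rocks set host interface name cc-113-1 eth1 c-113-1.aglt2.org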
Do rocks create distro
Now ready to build our customized ROCKS OS.
$ /export/rocks/install/tools/release-tools/rocks-create-distro-rolls.sh
The script does "rocks create distro" and some ancillary actions.
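If running the core step by hand, a minimal sketch (the wrapper script does more than just this):
$ cd /export/rocks/install
$ rocks create distro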
Yum update frontend
Yum update the frontend against itself. This picks up the sl-security updates from the agl-update-sl58 roll.
$ yum clean all
$ yum update
$ reboot
Tests
Check that a client kickstart config can be generated (the host needs to be defined in the db; see above):
$ rocks list host profile cc-113-1 > /tmp/cc-113-1.pro
$ rocks list host xml cc-113-1 > /tmp/cc-113-1.xml
Check that the client is listed in dhcpd.conf and that a PXE config is available:
$ grep cc-113-1 /etc/dhcpd.conf
$ ls /tftpboot/pxelinux/pxelinux.cfg
Remaining issues
- http access OK?
- resolv.conf on frontend, fixed by hand to point to msuinfo and msuinfox
- ganglia
- in /etc/ssh/ssh_config, turn off X11 forwarding
- note that /share/apps on clients is from msurxx (as set in auto.share)
Later Additions
Added Salt minion setup during build. As with CFEngine, the client keys are stored on the frontend and injected into the client during build. Also in parallel with CFE, the Salt master is elsewhere (not on the ROCKS frontend). During frontend setup, the agl-saltstack roll is included, and the tarball of Salt keys is unpacked to /var/salt.
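A sketch of the key unpack on the frontend, assuming a Salt key tarball analogous to the CFE stash (the tarball name and location are assumptions):
mkdir -p /var/salt
cd /var/salt
tar -xf /root/salt-keys-stash.tar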
Work around for primary_net
Changed the attribute used at line 358 of /opt/rocks/lib/python2.4/site-packages/rocks/commands/run/host/__init__.py to "primary_net_ssh" and added that attribute to the db. (This file modification is now in rocks_fe CFE script.)
rocks add attr primary_net_ssh private
Repo Mirror Web Server
Moved this to msut3-rx
Note: this could be changed to an automount, say auto.web ...
Currently, the files for mirror.msulocal/mirror are stored on msu4.msulocal:/exports/vmware/mirror and are served by the ROCKS FE web server. (Once the VMware cluster is in production, we can probably relocate these files and perhaps change the web serving setup. The mirror dir uses 87GB today ...)
So we needed to export the directory from msu4 to the web server: edit /etc/exports on msu4 and run "exportfs -a" to activate the changes. Then mount it on the ROCKS FE by adding this to /etc/fstab:
# NFS mount of repo mirror
msu4.msulocal:/exports/vmware/mirror /var/www/html/mirror nfs defaults 0 0
Do the mount:
mkdir /var/www/html/mirror
mount -a
Drop the jdk 1.7.0
It is not known whether this change impacts any use of Java on the cluster.
The ROCKS initial install includes a jdk 1.7.0 x86_64 rpm (/export/rocks/install/rolls/base/5.5/x86_64/RedHat/RPMS/jdk-1.7.0_03-fcs.x86_64.rpm); however, SL58 and its updates are still on 1.6.0. Remove the 1.7.0 rpm from the base roll area so that it is not preferred over the 1.6.0 rpms during distro builds. The status of the Oracle jdk rpms is in flux and this setup may need to change soon; SL may move to an open jdk.
mkdir /root/archive
mv /export/rocks/install/rolls/base/5.5/x86_64/RedHat/RPMS/jdk-1.7.0_03-fcs.x86_64.rpm /root/archive/
/export/rocks/install/tools/release-tools/rocks-create-distro-rolls.sh
Then remove the 1.7.0 rpm and install 1.6.0:
yum erase jdk.x86_64
yum install jdk.x86_64
yum update jdk.i586
--
TomRockwell - 25 Apr 2012