Introduction
This page describes a prototype build of a file server under Rocks 5, along with caveats and difficulties. In this instance the file server is destined for dCache; however, at this level there is no real difference between a dCache file server and an NFS file server.
The prototype build work was performed on umfs16 at UM.
Difficulties and caveats
- Initial actions
- It is assumed that the host is already in the Rocks DB, with its interfaces defined.
- rocks set host boot [hostname] action=install
- If the cfengine XML node is active, action must be taken on manage.aglt2.org to put the general public cfengine key for file servers in place for the machine.
- If PXE boot is first in the BIOS boot order, ahead of the hard disk, the next two items can be skipped as they are not relevant.
- On our file servers the hard disk is first in the boot order, so PXE boot must be forced at the console by choosing it with the F12 key.
- Even with the Myricom cards disabled in the BIOS, a console prompt still forces a choice between the internal NIC and the Myricom NIC for PXE boot. The default selection is the internal NIC.
- If the MD1000 disk shelves are built, in use, and powered on, anaconda raises one popup per shelf at the start of the build, asking whether that drive should be initialized. The exact question is:
- The partition table on device sdb [or c, d, e] (DELL PERC 6/E Adapter 28608000 MB) was unreadable. To create new partitions it must be initialized, causing the loss of ALL DATA on this drive. This operation will override any previous installation choices about which drives to ignore. Would you like to initialize this drive, erasing ALL DATA?
- Click the red "No" button in each successive popup and the install will proceed as expected.
- The build waits at this point and does not proceed until this console action is taken.
- No disk is modified until that action is taken.
- No attempt has yet been made to avoid this popup.
- TDR has suggested simply powering off the shelves during the build, as a means of protecting the data on them.
- TDR has suggested that anaconda commands could be used to automatically bypass these disks.
- Neither dCache nor NFS is set up by the build. Both must be set up manually, using the scripts detailed below.
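TDR's suggestion above, that anaconda commands be used to bypass the shelf disks, would first need a list of which devices belong to the MD1000 shelves. The sketch below is an untested illustration under stated assumptions, not part of this build: it scans /sys/block for disks whose SCSI model string contains "PERC 6/E" (the controller named in the anaconda popup) and prints a kickstart ignoredisk line for them. The function names are hypothetical, device names seen at install time may differ from those on the running system, and how such a line would be fed into the Rocks-generated kickstart is not covered here.

```shell
#!/bin/bash
# Sketch only: find disks behind the MD1000 shelf controller (PERC 6/E)
# and emit a kickstart "ignoredisk" line. Assumes the model string in
# /sys/block/<dev>/device/model contains "PERC 6/E", as the popup suggests.

# Print the names (sdb, sdc, ...) of matching disks, one per line
find_shelf_disks() {
    local sysblock="${1:-/sys/block}"
    local dev model
    for dev in "$sysblock"/sd*; do
        [ -r "$dev/device/model" ] || continue   # also skips an unmatched glob
        model="$(cat "$dev/device/model")"
        case "$model" in
            *"PERC 6/E"*) basename "$dev" ;;
        esac
    done
}

# Join the matches into one ignoredisk line; fail if there are none
ignoredisk_line() {
    local disks
    disks="$(find_shelf_disks "$@" | paste -sd, -)"
    [ -n "$disks" ] || return 1
    echo "ignoredisk --drives=$disks"
}
```

Run with no argument on a file server, ignoredisk_line scans /sys/block; with four shelves appearing as sdb through sde, it would be expected to print a line of the form ignoredisk --drives=sdb,sdc,sdd,sde.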
XML Methodology
Just as the worker node XML is based on the agl-appl-dell-worker XML node, the file server is based on the agl-appl-dell-gridftp XML node. There is an override to the sshd setup of replace-ssh.xml within the server-security XML node; otherwise most of the configuration is fairly vanilla in scope. This sshd setup will be overridden by cfengine, but I felt the correct prototype should be in place in advance of that.
The "fss-dcache" appliance at UM is the basis for the build. Our file servers have a mix of DRAC cards, either "DRAC 5" or "iDRAC6", so we invoke both XML nodes and let the "syscfg" report determine which one is applied to the machine.
Pre and post scripts turn off many of the Rocks-specific services, such as rocks-grub, and further override the BIOS setup to place the hard disk first in the boot order, as we will not want to PXE boot the file server machines automatically again.
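As an illustration of the service-disabling part of the post script, the fragment below turns off named init services with chkconfig, skipping any that have no init script. Only rocks-grub is named on this page; the function name and the INITD and DRY_RUN variables are hypothetical, added so the fragment can be exercised outside a build. The BIOS boot-order change via syscfg is not sketched, since its options are not given here.

```shell
#!/bin/bash
# Sketch of a post-install fragment: disable Rocks-specific services.
# Only rocks-grub is named on this page; others would be added as needed.
# INITD and DRY_RUN are hypothetical knobs that make the fragment testable.

disable_services() {
    local initd="${INITD:-/etc/init.d}"
    local svc
    for svc in "$@"; do
        if [ -x "$initd/$svc" ]; then
            # With DRY_RUN set, print the command instead of running it
            ${DRY_RUN:+echo} /sbin/chkconfig "$svc" off
        else
            echo "skipping $svc: no init script in $initd" >&2
        fi
    done
}

# In a post section: disable_services rocks-grub
```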
All remaining file server configuration is performed by cfengine, with one exception: dCache, NFS, and the associated disk-shelf mounts are not set up automatically. These must be done manually at this time.
Build procedure
- Start with the file server in its current state.
- A bracketed name, e.g. [umfs16], indicates the machine on which the following commands are run.
- Save the host certs, xfs disk mounts, and network configurations for later restoration.
[umfs16]
cd
./fs_saves.sh    # first copy this script into /root from the tools repo in svn
[somewhere else]
scp umfs16.local:/root/restore_umfs16.tar .
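For readers without the svn tools repo at hand, the sketch below suggests the kind of state such a save step collects. It is a hypothetical illustration, not fs_saves.sh itself: the function name, the ROOT test variable, and the file locations (/etc/grid-security for the host cert and key, ifcfg-* under /etc/sysconfig/network-scripts, xfs lines from /etc/fstab) are all assumptions. It only mirrors the page in writing /root/restore_<host>.tar.

```shell
#!/bin/bash
# Hypothetical sketch of what a pre-build save might collect.
# This is NOT fs_saves.sh from svn; every path below is an assumption.
# ROOT (empty by default) lets it run against a copy of / for testing.

save_fs_config() {
    local root="${ROOT:-}"
    local host="${1:-$(hostname -s)}"
    local out="${2:-$root/root/restore_${host}.tar}"   # absolute path expected
    local staging rc
    staging="$(mktemp -d)" || return 1

    # Keep only the xfs lines of fstab (the disk-shelf mounts)
    awk '$3 == "xfs"' "$root/etc/fstab" > "$staging/fstab.xfs"

    (
        cd "${root:-/}" || exit 1
        files=()
        # Host cert/key and interface configs; missing files are skipped
        for f in etc/grid-security/hostcert.pem etc/grid-security/hostkey.pem \
                 etc/sysconfig/network-scripts/ifcfg-*; do
            [ -e "$f" ] && files+=("$f")
        done
        # Paths relative to /, then fstab.xfs from the staging directory
        tar -cf "$out" "${files[@]}" -C "$staging" fstab.xfs
    )
    rc=$?
    rm -rf "$staging"
    # The tarball contains the host key, so keep it root-only
    [ "$rc" -eq 0 ] && chmod 600 "$out"
    return "$rc"
}
```

Storing paths relative to / means a restore could unpack with tar -C /, but how the real restore_fs_saves.sh consumes its tarball is not described here.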
- Rebuild the file server with Rocks.
- Restore the files saved before the build.
[umfs16]
cd /root/tools
# first copy restore_<machine>.tar into this directory from wherever it was saved
./restore_fs_saves.sh
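For the cleanup of tar files called for in the next item, a minimal sketch is shown below. It assumes the leftovers follow the restore_<machine>.tar naming used on this page; the function name is hypothetical. It uses GNU shred -u to overwrite each file before unlinking it, because the tarballs hold the host key.

```shell
#!/bin/bash
# Sketch: securely remove leftover restore tarballs, which hold host certs.
# Assumes they follow the restore_<machine>.tar naming used on this page.

cleanup_restore_tars() {
    local dir f
    for dir in "$@"; do
        for f in "$dir"/restore_*.tar; do
            [ -f "$f" ] || continue   # glob matched nothing
            # Overwrite the contents, then unlink the file
            shred -u "$f" && echo "removed $f"
        done
    done
}

# Example: cleanup_restore_tars /root /root/tools
```

The copy kept on the other machine by the earlier scp step needs the same treatment. The shred man page warns that overwriting in place may not be effective on some journaling filesystems, so treat this as best effort.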
- Follow the directions given to clean up and delete all tar files made, as they contain the host certs for the file server.
- Note that the restore above replaced the ifcfg-eth* files with the pre-build versions. The versions written by the Rocks build were saved in the following directory, should the builder prefer to keep them in place:
- If bonding was previously in use, either reboot or restart the network service.
- Make sure the certificates are in place; cron installs them only at 20 minutes past each hour, so run the sync by hand if needed:
- /bin/bash /root/tools/rsync-certificates.sh
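The bonding and certificate items above can be checked with two small helpers, sketched below. Both are hypothetical: uses_bonding assumes that a bonded slave's ifcfg file carries a MASTER= line, and check_certs assumes the standard grid layout of CA certificates under /etc/grid-security/certificates and the host cert at /etc/grid-security/hostcert.pem, neither of which is taken from rsync-certificates.sh itself.

```shell
#!/bin/bash
# Sketch of post-restore checks. Paths are assumed standard locations,
# not read from rsync-certificates.sh or the restore script.

# True if any restored ifcfg file enslaves its NIC to a bond
uses_bonding() {
    local dir="${1:-/etc/sysconfig/network-scripts}"
    grep -qs '^MASTER=' "$dir"/ifcfg-*
}

# True if CA certificates are present and the host cert has not expired
check_certs() {
    local gs="${1:-/etc/grid-security}"
    if ! ls "$gs"/certificates/* >/dev/null 2>&1; then
        echo "no CA certificates in $gs/certificates" >&2
        return 1
    fi
    if [ ! -r "$gs/hostcert.pem" ]; then
        echo "missing $gs/hostcert.pem" >&2
        return 1
    fi
    # -checkend 0 exits non-zero if the certificate has already expired
    openssl x509 -in "$gs/hostcert.pem" -noout -checkend 0 >/dev/null
}

# Example on the file server:
#   uses_bonding && service network restart
#   check_certs || /bin/bash /root/tools/rsync-certificates.sh
```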
- Install dCache, if desired.
- The dcache service is enabled ("chkconfig on") but is not started.
[umfs16]
cd /root/tools/dcache
./install_dcache.sh
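Since the dcache service is enabled but not started, it may be worth confirming the boot-time setting before inspecting the machine. The helper below parses one line of chkconfig --list output; its name, and the choice of runlevel 3 as the file servers' default, are assumptions.

```shell
#!/bin/bash
# Sketch: confirm a service is enabled at boot before starting it by hand.
# Assumes runlevel 3 is the default runlevel on the file servers.

# True if a "chkconfig --list" line shows the service on in runlevel $2
service_enabled() {
    local line="$1" rl="${2:-3}"
    local re="[[:space:]]${rl}:on([[:space:]]|\$)"
    [[ $line =~ $re ]]
}

# Example on the file server:
#   service_enabled "$(/sbin/chkconfig --list dcache)" 3 && echo "dcache enabled at boot"
```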
- Inspect the machine. If satisfied, then:
--
BobBall - 27 Jan 2010