Adding New OSS to Lustre

Below are the steps needed to add a new OSS (object storage server) to Lustre:

  • Install or re-purpose an SL5.5 node
  • Update all BIOS/Firmware/Drivers and run yum update
  • Make sure the network is (re)configured to use bonding. Note that since Lustre uses the private subnet for OST access, the private VLAN should be the untagged one on the bonded interface.

Note that we have scripts and RPMS in /afs/atlas.umich.edu/hardware/Lustre. You should check there to see what is available.

Once the node is properly prepared you can begin to install and configure Lustre:

  • Install the following RPMS for Lustre (the examples here are for 1.8.4 with the ext4 variant; modify according to the version being used):
    • kernel-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
    • kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
    • kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm (NOTE: this one must be installed with 'rpm -Uvh')
    • e2fsprogs-1.41.10.sun2-0redhat.rhel5.x86_64.rpm
    • lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
    • lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
    • lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
    • lustre-tests-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm

Since this is an "install" rather than an upgrade you can use the rpm -ivh form except for the kernel-headers RPM (use -Uvh there). There is documentation about setting up Lustre at https://hep.pa.msu.edu/twiki/bin/view/AGLT2/LustreNew
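
The -ivh/-Uvh rule above can be sketched as a small dry-run helper. This is only an illustration: the function names rpm_flag and print_install_commands are hypothetical, and the script prints the rpm commands rather than running them so the list can be reviewed first.

```shell
# Pick the rpm flag for one package file:
# -Uvh for kernel-headers, -ivh for everything else.
rpm_flag() {
    case "${1##*/}" in            # strip any leading directory
        kernel-headers-*) echo "-Uvh" ;;
        *)                echo "-ivh" ;;
    esac
}

# Print (do not run) the install command for each RPM given as an argument.
print_install_commands() {
    for pkg in "$@"; do
        echo "rpm $(rpm_flag "$pkg") $pkg"
    done
}

print_install_commands \
    kernel-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    e2fsprogs-1.41.10.sun2-0redhat.rhel5.x86_64.rpm \
    lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm \
    lustre-tests-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4-ext4.x86_64.rpm
```

Once the printed commands look right, they can be run by hand as root from the directory holding the RPMS.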

After installing the RPMS, reboot into the Lustre kernel. The next step is to create the appropriate partitions for the OSTs. There are some scripts in AFS which can help with OST creation at the hardware level; in particular, setup_lustre_ost.sh can help create RAID-6 partitions on Dell hardware. We use the Dell omconfig utility to create RAID-6 arrays on the disk shelves.

In previous cases we chose to create two RAID-6 arrays per 15-disk shelf. This better matched the I/O capabilities of Lustre but wastes disk space (4 out of 15 disks hold parity). For UMDIST03 we will use 6 RAID-6 partitions, 1 per 15-disk MD1000 shelf, resulting in 6 "devices" from /dev/sdb to /dev/sdg.

For UMDIST06 we are reusing the existing RAID-6 partitions from the prior NFS setup. One RAID-60 array spanning 2 shelves was rebuilt as two RAID-6 arrays. These arrays must then be formatted for use within Lustre. There is a script called format_lustre.sh which can be used as a template to create the Lustre filesystem for the OSTs. The parameters that may require tuning are the stripe and inodes settings. We use the following guidelines to set these values:

  • The stripe is the mount option for the number of stripe blocks. Each Lustre block is 4096 bytes. Our RAID-6 arrays in use on UMDIST03 are set up with 512KB stripe elements, so each stripe element contains 128 Lustre blocks (512KB / 4KB). A RAID-6 array on one 15-disk MD1000 shelf has 13 data disks participating (the equivalent of 2 disks holds parity). Therefore we should set the stripe to 1664 (13 x 128).
  • The 'inodes' value is chosen assuming each file is 8-9MB. For a 9TB OST (UMDIST03) that means about 1,100,000 inodes.
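
The arithmetic behind these two guidelines can be checked with shell integer math. This is a minimal sketch; the variable names are illustrative, and the inode estimate assumes decimal units (9TB = 9,000,000MB), which the page does not state explicitly.

```shell
BLOCK_BYTES=4096          # Lustre block size
STRIPE_ELEMENT_KB=512     # RAID stripe element size
SHELF_DISKS=15            # disks in one MD1000 shelf
PARITY_DISKS=2            # RAID-6 uses two disks' worth of parity

blocks_per_element=$(( STRIPE_ELEMENT_KB * 1024 / BLOCK_BYTES ))   # 128
data_disks=$(( SHELF_DISKS - PARITY_DISKS ))                        # 13
stripe=$(( blocks_per_element * data_disks ))                       # 1664

# Inode estimate for a 9TB OST, assuming decimal units.
OST_MB=9000000
inodes_at_8mb=$(( OST_MB / 8 ))   # 1125000
inodes_at_9mb=$(( OST_MB / 9 ))   # 1000000

echo "stripe=$stripe"
echo "inodes between $inodes_at_9mb and $inodes_at_8mb"
```

The two inode bounds bracket the figure of about 1,100,000 quoted above.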

Once the formatting is complete we need to set up /etc/fstab to mount the OSTs by UUID. Use the make_lustre_fstab.sh script:

[root@umdist03 ~]# ./make_lustre_fstab.sh
[root@umdist03 ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/var              /var                    ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-sda3         swap                    swap    defaults        0 0
head02.aglt2.org:/pnfs        /pnfs            nfs rw,hard,nfsvers=3 0 0
UUID=407204a4-e32f-4be3-9d70-e340ea9b6d68 /mnt/ost11 lustre _netdev 0 0
UUID=4e5322fc-221c-420d-8664-aefaae334d17 /mnt/ost12 lustre _netdev 0 0
UUID=deab207e-2ccd-4172-8fd2-156aebbf3d0f /mnt/ost21 lustre _netdev 0 0
UUID=3dcc417e-95b7-4ec5-9013-c31afdff56c5 /mnt/ost22 lustre _netdev 0 0
UUID=bbc8be6e-f047-4764-80ee-03bd0d2a70e5 /mnt/ost31 lustre _netdev 0 0
UUID=4c140eb3-e4ff-412c-af32-7cc809dcf273 /mnt/ost32 lustre _netdev 0 0
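
For reference, the shape of each OST entry can be reproduced with a small helper. This is not the make_lustre_fstab.sh script itself, only a sketch of the kind of line it produces; the function name ost_fstab_line is hypothetical, and the blkid call in the comment assumes blkid reports a UUID for the formatted OST device.

```shell
# Format one /etc/fstab entry for a Lustre OST mounted by UUID.
#   $1 = filesystem UUID, $2 = mount point
ost_fstab_line() {
    printf 'UUID=%s %s lustre _netdev 0 0\n' "$1" "$2"
}

# On a real OSS the UUID could be read from the device, for example:
#   ost_fstab_line "$(blkid -s UUID -o value /dev/sdb)" /mnt/ost11 >> /etc/fstab
# Here the first entry from the listing above is reproduced:
ost_fstab_line 407204a4-e32f-4be3-9d70-e340ea9b6d68 /mnt/ost11
```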

Next we need to set up /etc/modprobe.conf to provide the required Lustre lnet configuration. We only need to add a single line, which includes the "routing" required to support the physics subnet:

options lnet networks=tcp0(bond0) routes="tcp2 10.10.1.[50-52]@tcp0"
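
If this step is scripted, it helps to make the edit idempotent so the line is not added twice on a re-run. The following is a sketch only: add_lnet_options is a hypothetical helper, and the demonstration writes to a scratch file rather than to /etc/modprobe.conf.

```shell
# Append the lnet options line to a modprobe config file, unless the file
# already has an "options lnet" line (an existing line is left untouched,
# even if its contents differ).
add_lnet_options() {
    conf="$1"
    line='options lnet networks=tcp0(bond0) routes="tcp2 10.10.1.[50-52]@tcp0"'
    if ! grep -q '^options lnet ' "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Demonstration on a scratch file; on the OSS the argument would be
# /etc/modprobe.conf.
demo=$(mktemp)
add_lnet_options "$demo"
add_lnet_options "$demo"    # second call is a no-op
cat "$demo"
rm -f "$demo"
```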

Then run depmod -a. The last thing to do before starting is to create the mount points for the OSTs:
mkdir /mnt/ost11
...
mkdir /mnt/ost32
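
The same mount points can be created in one pass with a loop. This is a sketch with a hypothetical helper name; the OST suffixes are the ones that appear in the /etc/fstab listing for UMDIST03.

```shell
# Create ost<ID> mount points under a base directory.
#   $1 = base directory (e.g. /mnt), remaining arguments = OST suffixes
make_ost_mountpoints() {
    base="$1"
    shift
    for id in "$@"; do
        mkdir -p "$base/ost$id"
    done
}

# On the OSS itself (as root) this would be:
#   make_ost_mountpoints /mnt 11 12 21 22 31 32
```

Using mkdir -p means the helper can be re-run safely if some of the directories already exist.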

Now we are ready to "startup" Lustre. All we need to do is mount the OSTs:

[root@umdist03 ~]# mount -a -t lustre
[root@umdist03 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             33099928   9824332  21567088  32% /
/dev/sda2             19840924    890940  17925844   5% /var
tmpfs                 16470428         0  16470428   0% /dev/shm
head02.aglt2.org:/pnfs
                     10995116277760 1006731885472 9988384392288  10% /pnfs
AFS                    9000000         0   9000000   0% /afs
/dev/sdb             9513526012    498492 9037203392   1% /mnt/ost11
/dev/sdc             9513526012    451816 9037250068   1% /mnt/ost12
/dev/sdd             9513526012    498656 9037203228   1% /mnt/ost21
/dev/sde             9513526012    483604 9037218280   1% /mnt/ost22
/dev/sdf             9513526012    491812 9037210072   1% /mnt/ost31
/dev/sdg             9513526012    500160 9037201724   1% /mnt/ost32

Now you can check dmesg to verify things are OK. Also look with 'lctl dl':

[root@umdist03 ~]# lctl dl
  0 UP mgc MGC10.10.1.140@tcp 277aa67c-7391-13f2-8f84-91a9674c7765 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter umt3-OST001c umt3-OST001c_UUID 249
  3 UP obdfilter umt3-OST001d umt3-OST001d_UUID 245
  4 UP obdfilter umt3-OST001e umt3-OST001e_UUID 251
  5 UP obdfilter umt3-OST001f umt3-OST001f_UUID 253
  6 UP obdfilter umt3-OST0020 umt3-OST0020_UUID 250
  7 UP obdfilter umt3-OST0021 umt3-OST0021_UUID 240
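
When checking several servers, the 'lctl dl' output can be filtered for any device whose state column is not UP. The helper below is a sketch, not part of the Lustre tools; the name check_lctl_devices is hypothetical, and it relies only on the column layout shown above.

```shell
# Read `lctl dl` output on stdin and print every device whose state
# (second column) is not UP. Exit status is 0 if all devices are UP,
# 1 otherwise. Blank lines are ignored.
check_lctl_devices() {
    awk 'NF > 0 && $2 != "UP" { print; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# On the OSS:
#   lctl dl | check_lctl_devices && echo "all Lustre devices are UP"
```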

That's it. The new OSS should now be online and working within Lustre.

-- ShawnMcKee - 02 Sep 2010
Topic revision: r3 - 02 Sep 2010, ShawnMcKee