Lustre Backup
Following:
http://wiki.lustre.org/manual/LustreManual18_HTML/BackupAndRestore.html
Snapshots
On umfs15, hourly and daily snapshots of the MDT volume are taken on a regular schedule. Any of these snapshots can be used to roll back the MDT volume, or can be "cloned" into a new volume.
Tar/getattr backups to bambi
We write Lustre metadata backups to NFS shares on our backup server; from there they are picked up by a regularly scheduled tape backup.
Filesystem backups: bambi.local:/pool1/lustre-backup
The following scripts clone (and later delete) the latest available snapshot on umfs15; the clone is then mounted and used as the source for our backup (see the backup script further down this page). The sunmdt-backup device is a multipath device defined in /etc/multipath.conf. Note the commands that flush and reload the multipath maps.
"scriptuser" is a user defined on umfs15 with the rights "changeGeneralProps, clone, createShare, destroy" on shares in the Lustre project (configured on the Exceptions tab of the user's properties). The public portion of root's SSH RSA key from lmd01 and lmd02 was installed for this user to allow passwordless login from the root account on the lmd systems.
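If the key pair does not already exist on an lmd host, it can be generated as below. This is a minimal sketch: the scratch path is only for illustration (in production the key lives in /root/.ssh on lmd01 and lmd02), and the .pub half is what gets registered for scriptuser on the appliance.

```shell
# Generate an RSA key pair with no passphrase (illustrative path; in
# production this would be /root/.ssh/id_rsa on lmd01 and lmd02).
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"
# The public half printed here is what gets installed for scriptuser on umfs15.
cat "$KEYDIR/id_rsa.pub"
```

Once installed, `ssh -T scriptuser@umfs15` from root on the lmd hosts should log in without a password prompt.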
Now the creation script:
#!/bin/bash
# ssh -T to disable pseudo-terminal allocation (avoids informational message about it)
# (the appliance-side commands sent over this session to clone the snapshot
# are not shown here)
ssh -T scriptuser@umfs15 < /dev/null
if [ $? -eq 0 ]; then
# flush and reload the multipath maps so the clone's device appears
multipath -F &> /dev/null
multipath -v0 &> /dev/null
else
echo "Problem creating BACKUP-MDT clone of snapshot"
exit 1
fi
Here is the script to delete the clone:
#!/bin/bash
# (the appliance-side commands sent over this session to destroy the
# BACKUP-MDT clone are not shown here)
ssh -T scriptuser@umfs15 < /dev/null
if [ $? -eq 0 ]; then
# reload the multipath maps now that the clone's device is gone
multipath -v0 &> /dev/null
else
echo "Problem deleting BACKUP-MDT"
exit 1
fi
Here is the script used to write the backup. It runs both of the previous scripts.
#!/bin/bash
mount|grep -q "/mnt/mdt"
if [ $? -eq 0 ]; then
# echo "/mnt/mdt mounted, exiting"
exit 0
fi
echo "Cloning latest snapshot to mount..."
/root/create-mdt-backup-volume.sh
if [ $? -eq 0 ]; then
echo "Clone successful..."
else
echo "Clone not successful...running clone delete script and quitting"
/root/delete-mdt-backup-volume.sh
exit 1
fi
# datestamp yyyy-mm-dd
DATE=`date +%F`
SNAPSHOT="/mnt/mdt_snapshot"
DEVICE=`ls /dev/mapper/mpath*`
WC=`echo $DEVICE | wc -w `
if [ $WC -gt 1 ]; then
echo "Found more than one mpath device, risk of mounting wrong device, quitting"
/root/delete-mdt-backup-volume.sh
exit 1
fi
ls $DEVICE &> /dev/null
if [ $? -ne 0 ]; then
echo "Snapshot device not found, quitting"
exit 1
fi
# now follow backup steps
VOLUME="/mnt/lustre-backup"
# mount our destination
if ! mount|grep -q $VOLUME; then
mount $VOLUME > /dev/null
fi
if mount|grep -q $VOLUME; then
# mount it
echo "Mounting snapshot..."
mount -t ldiskfs $DEVICE $SNAPSHOT
# back up EAs
echo "Backing up EAs..."
cd $SNAPSHOT
time getfattr -R -d -m '.*' -P . > ea.bak
echo "Backup of EAs complete..."
# backup filesystem data
echo "Backing up filesystem data..."
time tar czf /mnt/lustre-backup/mdt-backup-$DATE.tgz --sparse .
echo "Backup of filesystem complete..."
# get out of dir
cd /tmp
echo "Unmounting and deleting snapshot..."
# unmount
umount $SNAPSHOT
/root/delete-mdt-backup-volume.sh
if [ $? -eq 0 ]; then
echo "Deleted backup snapshot..."
fi
# finally, get rid of old backups
echo -e "\n"
echo "Removing backups more than 14 days old"
find $VOLUME -type f -mtime +14 -exec rm {} \;
else
echo "Output location $VOLUME not mounted in mds-tar-backup.sh"
/root/delete-mdt-backup-volume.sh
exit 1
fi
echo -e "\n"
echo "Script mds-tar-backup.sh finished"
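The 14-day cleanup at the end of the script can be rehearsed in a scratch directory before trusting it against the real backup share (the directory and file names below are examples, not the production paths):

```shell
# Simulate the retention step against a throwaway directory.
WORKDIR=$(mktemp -d)
touch -d "20 days ago" "$WORKDIR/mdt-backup-old.tgz"   # stale backup
touch "$WORKDIR/mdt-backup-new.tgz"                    # fresh backup
# Same expiry logic as the script: remove files older than 14 days.
find "$WORKDIR" -type f -mtime +14 -exec rm {} \;
ls "$WORKDIR"
```

Note that `-mtime +14` counts whole 24-hour periods, so it matches files that are at least 15 days old.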
Configuring backup server and mounts
Setting up the NFS shares on bambi:
zfs create pool1/lustre-backup
zfs create pool1/lustre-image
zfs set sharenfs=root=lmd01.local pool1/lustre-image
zfs set sharenfs=root=lmd01.local pool1/lustre-backup
Fstab on lmd01:
bambi.local:/pool1/lustre-image /mnt/lustre-image nfs rsize=32768,wsize=32768,tcp 0 0
bambi.local:/pool1/lustre-backup /mnt/lustre-backup nfs rsize=32768,wsize=32768,tcp 0 0
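When a restore is needed, the manual linked at the top has the full procedure: format a new MDT, mount it as ldiskfs, untar the archive with --sparse, then reapply the saved extended attributes with `setfattr --restore=ea.bak`. The EA half of that round trip can be rehearsed on any xattr-capable filesystem; the paths and the `user.demo` attribute below are stand-ins, not Lustre's real trusted.* EAs.

```shell
# Rehearse the EA dump/restore pair in a scratch directory.
WORKDIR=$(mktemp -d)
echo payload > "$WORKDIR/file1"
setfattr -n user.demo -v demoval "$WORKDIR/file1"  # stand-in for Lustre's EAs
cd "$WORKDIR"
getfattr -R -d -m '.*' -P . > ea.bak               # same dump as the backup script
setfattr -x user.demo file1                        # simulate the EA being lost
setfattr --restore=ea.bak                          # restore step from the manual
```

On a real restore this runs as root at the top of the freshly untarred MDT, so that the trusted.* attributes are captured and reapplied.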
--
BenMeekhof - 04 May 2010