Using ZFS on Linux for AGLT2 AFS Fileservers
Recently ZFS on Linux became available. ZFS has many attractive features, including copy-on-write (COW), data integrity via checksums, and inexpensive snapshots.
The AFS file servers (linat06/07/08.grid.umich.edu) need upgrading to AFS 1.6.2 and Scientific Linux 6.4. As part of the upgrade we will try using ZFS as the backend storage for AFS volumes.
Creating Initial AFS Server VM
Our AFS cell, aglt2.org, has been virtualized in VMware vSphere 5.x for more than a year. To migrate to a new OS and AFS version, we intend to create a temporary AFS server (atback1.grid.umich.edu) to host AFS volumes from linat06/07/08 while we recreate each VM.
The initial atback1.grid.umich.edu VM (linat06n) was created using Ben's Cobbler setup (see https://www.aglt2.org/wiki/bin/view/AGLT2/CobblerInfrastructure). This setup was used to build the VM with Scientific Linux 6.4, 64-bit.
After the base host was built, the VM had a new iSCSI LUN (~1TB) attached from the UMFS15 (Oracle NAS) system. This will become /viceph once we get ZFS installed.
The following OpenAFS RPMs were installed:
openafs-plumbing-tools-1.6.2-0.144.sl6.x86_64
openafs-1.6.2-0.144.sl6.x86_64
kmod-openafs-358-1.6.2-0.144.sl6.358.0.1.x86_64
openafs-krb5-1.6.2-0.144.sl6.x86_64
openafs-server-1.6.2-0.144.sl6.x86_64
openafs-client-1.6.2-0.144.sl6.x86_64
kmod-openafs-1.6.2-5.SL64.el6.noarch
openafs-kpasswd-1.6.2-0.144.sl6.x86_64
openafs-authlibs-1.6.2-0.144.sl6.x86_64
openafs-module-tools-1.6.2-0.144.sl6.x86_64
openafs-compat-1.6.2-0.144.sl6.x86_64
openafs-devel-1.6.2-0.144.sl6.x86_64
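For reference, a minimal sketch of pulling in this set with yum, assuming the SL6 repositories that provide these OpenAFS builds are enabled on the host:
yum -y install openafs openafs-server openafs-client openafs-krb5 kmod-openafs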
Install ZFS
First we set up the ZFS repo in /etc/yum.repos.d/zfs.repo:
[zfs]
name=ZFS on Linux for EL 6
baseurl=http://archive.zfsonlinux.org/epel/6/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
[zfs-source]
name=ZFS on Linux for EL 6 - Source
baseurl=http://archive.zfsonlinux.org/epel/6/SRPMS/
enabled=0
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
This is installed via =rpm -ivh /afs/atlas.umich.edu/home/smckee/public/zfs-release-1-2.el6.noarch.rpm=
or, alternately, directly from zfsonlinux.org:
yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el6.noarch.rpm
Once the repo is in place you can do:
yum -y install zfs
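Once the install finishes, you can confirm that the spl and zfs modules built and load cleanly (a quick check; output will vary by kernel):
modprobe zfs              # load the zfs module (pulls in spl)
lsmod | grep -E 'spl|zfs' # verify both modules are loaded
dkms status               # confirm DKMS built them for this kernel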
You can create a new zpool on a suitable device (our 1TB iSCSI LUN) via zpool create <poolname> </dev/sdX>. I had to force the creation:
zpool create -f zfs /dev/sdb
My iSCSI device appeared on the VM as /dev/sdb. The command above creates a zpool named zfs. I then created a new filesystem (dataset) called zfs/viceph via:
zfs create zfs/viceph
The next important step is to set the right mountpoint so that zfs/viceph shows up as /viceph:
zfs set mountpoint=/viceph zfs/viceph
Now when zfs starts, the dataset zfs/viceph is mounted at /viceph where AFS can find it.
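A quick sanity check of the pool, dataset and mountpoint at this stage:
zpool status zfs   # pool health
zfs list           # datasets and their mountpoints
df -h /viceph      # confirm the mount is visible to the OS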
We need to make sure zfs starts automatically via:
chkconfig --add zfs; chkconfig zfs on
NOTE: mounting for zfs is not controlled by mount but by zfs mount:
zfs mount /viceph
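Running zfs mount with no arguments lists the currently mounted datasets, and zfs mount -a mounts everything that has a mountpoint set (effectively what the init script does at boot):
zfs mount     # list mounted ZFS datasets
zfs mount -a  # mount all datasets with mountpoints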
The last step is to disable atime updates and turn on lz4 compression (see the tuning notes below).
Snapshots on ZFS
The zfs filesystem supports snapshots. There are some nice cron-based utilities that implement an "auto-snapshot" capability. I installed the one from https://github.com/zfsonlinux/zfs-auto-snapshot. Just unzip the package (a copy is in ~smckee/public/zfs-auto-snapshot-master.zip), run make, then make install.
You can see the resulting cron entries in /etc/cron*. The snapshots show up in the ZFS mount areas under <MOUNTPOINT>/.zfs (NOTE: this area is not visible, but you can cd to it and run ls).
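Snapshots can also be listed and taken by hand, independent of the cron jobs (the snapshot name below is just an example):
zfs list -t snapshot                 # show existing snapshots
zfs snapshot zfs/viceph@pre-upgrade  # take a manual snapshot
ls /viceph/.zfs/snapshot/            # browse snapshot contents read-only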
Enable AFS on New Server
We first needed to copy over the /usr/afs/local directory from linat06.grid.umich.edu. Once on atback1, we removed any of the sysid* files and edited the NetInfo and NetRestrict files to suit the atback1 IP addresses.
Next, copy over the /usr/afs/etc contents from linat06, which includes the KeyFile.
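A sketch of those copy steps, assuming root ssh access from atback1 to linat06 (standard Transarc-style paths as used on our servers):
scp -r root@linat06.grid.umich.edu:/usr/afs/local /usr/afs/
rm -f /usr/afs/local/sysid*   # the server generates a new sysid on startup
# edit /usr/afs/local/NetInfo and NetRestrict for atback1's IP addresses
scp -r root@linat06.grid.umich.edu:/usr/afs/etc /usr/afs/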
Make sure zfs is started:
service zfs start
Verify /viceph is mounted on atback1:
df
Start AFS server:
service afs-server start
Verify AFS server is running:
[root@atback1 ~]# bos status atback1.grid.umich.edu
bos: running unauthenticated
Instance fs, currently running normally.
Auxiliary status is: file server running.
Instance dafs, disabled, currently shutdown.
Auxiliary status is: file server shut down.
At this point we have the new temporary server running. We can now proceed to move all the RW volumes from linat06 to this new file server. (First test a few relatively unused volumes and verify access continues to work.) Once linat06 has all RW volumes moved off, we can remove all the RO replicas (vos remove linat06.grid.umich.edu /vicepe <VOLUME>). When /vicepe is empty we can build a new linat06 VM (renaming the old one) and attach the iSCSI volume which hosted /vicepe to it. We then install linat06 with OpenAFS and zfs (as we did above for atback1), format /vicepe as zfs, mount it and enable AFS. Then migrate everything back to linat06 from atback1.
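The individual moves can be done with vos; a sketch for one volume, assuming -localauth on the server (the volume name here is just an example):
vos move -id user.example \
    -fromserver linat06.grid.umich.edu -frompartition /vicepe \
    -toserver atback1.grid.umich.edu -topartition /viceph -localauth
vos listvol linat06.grid.umich.edu /vicepe   # track what remains on linat06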
Then we do the same process for linat07 and then linat08.
An AFS file server needs certain ports open to function. One example of changes to /etc/sysconfig/iptables is below. First, near the top, add:
:AFS-INPUT - [0:0]
Later in a suitable location put these lines:
-A INPUT -j AFS-INPUT
-A AFS-INPUT -p udp -m udp --dport 7000:7010 -j ACCEPT
-A AFS-INPUT -p udp -m udp --sport 7000:7010 -j ACCEPT
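After editing the file, reload iptables so the new chain takes effect, and verify the rules are in place:
service iptables restart
iptables -L AFS-INPUT -n   # confirm the AFS rules loaded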
Test access from a client. If the client has problems, try fs checkservers and fs checkvolumes.
Notes on Memory Issues
ZFS on Linux has a problem where the ARC cache uses about twice as much memory as it should. The ZFS developers are aware of this problem (found by CERN) and will have a fix in a future version (beyond 0.6.1). Meanwhile the recommendation from CERN is to create an /etc/modprobe.d/zfs.conf that restricts the maximum ARC memory usage to about 25% of physical memory:
# Set max ARC to 25% of physical memory (8G on this VM, so about 2G)
options zfs zfs_arc_max=2147483648
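If the VM is resized later, the value can be recomputed rather than hard-coded; a small sketch that derives 25% of physical memory from /proc/meminfo (takes effect on the next module load or reboot):
# MemTotal is reported in kB; *1024 for bytes, /4 for 25%
ARC_MAX=$(awk '/MemTotal/ {printf "%d", $2 * 1024 / 4}' /proc/meminfo)
echo "options zfs zfs_arc_max=$ARC_MAX" > /etc/modprobe.d/zfs.conf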
Notes on DKMS and ZFS
Sometimes a new kernel may have problems getting DKMS to properly build the needed spl and zfs modules. The following sequence should work to force a build:
dkms uninstall -m zfs -v 0.6.1 -k 2.6.32-358.11.1.el6.x86_64 #(Put in the correct versions as needed)
dkms uninstall -m spl -v 0.6.1 -k 2.6.32-358.11.1.el6.x86_64
dkms build -m spl -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms install --force -m spl -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms build -m zfs -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms install --force -m zfs -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms status
Notes on Tuning ZFS for AFS (and vice-versa)
There was a thread on the OpenAFS list that had some good suggestions. One involves a future version of AFS:
> - if running OpenAFS 1.6 disable the sync thread (the one which
> syncs data every 10s) It is pointless on ZFS (and most other
> file systems) and all it usually does is it negatively impacts
> your performance; ZFS will sync all data every 5s anyway
> There is a patch to OpenAFS 1.6.x to make this tunable. Don't
> remember in which release it is in.
This is the -sync runtime option, which will be in the release after
1.6.2 (it is present in the 1.6.3pre* releases). To get the behavior
that Robert describes here, give '-sync none' or '-sync onclose'. Or
technically, '-sync always' will also get rid of the sync thread, but
will probably give you noticeably worse performance, not better.
For our setup I edited /usr/afs/local/BosConfig and changed the fileserver lines to include '-sync onclose':
parm /usr/afs/bin/fileserver -sync onclose -pctspare 10 -L -vc 800 -cb 96000
...
parm /usr/afs/bin/dafileserver -sync onclose -pctspare 10 -L -vc 800 -cb 96000
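Note that bosserver only reads BosConfig when it starts, so the server processes need a restart for the new flags to take effect; one way, using the init script from above, then verifying the parm lines:
service afs-server restart
bos status atback1.grid.umich.edu -long   # verify the new fileserver flags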
The thread is http://lists.openafs.org/pipermail/openafs-info/2013-June/039633.html
Another tuning used on linat06 is disabling access time updates. OpenAFS doesn't rely on them, so you save some unnecessary I/O. You can disable atime for the entire pool, and by default all filesystems within the pool will inherit the setting:
zfs set atime=off zfs/vicepe
We also turned on compression:
[root@linat06 ~]# zfs set compression=lz4 zfs/vicepe
[root@linat06 ~]# zfs get "all" zfs/vicepe | grep compress
zfs/vicepe compressratio 1.00x -
zfs/vicepe compression lz4 -
Note that compression only affects files created after it is set.
--
ShawnMcKee - 21 Jul 2013