Storage Element
An SRM/dCache instance is added to the site as a grid-accessible Storage Element. dCache is a very flexible package for combining multiple filesystems into one namespace, which allows us to use many (commodity) disks and RAID arrays as one system. SRM provides a standard grid interface to the storage. The VDT project produces an automated install package for these programs.
References:
Install Process
The install process has these general steps:
- Learning about dCache --- see the references above
- Design and Planning
  - Deciding what hardware to use, what network connections will exist, mount points, etc.
- Node Preparation
  - Host certificates, disk areas, etc.
- Install Configuration
  - Run the config-node.pl script to create site-info.def; edit site-info.def as needed.
- Actual Install
  - Install nodes in the proper order.
- Services Start
- Final Configuration
  - Directory tags in pnfs
  - Poolmanager config
- Testing
Design
Services / Servers
We have two 2U servers, each dual dual-core (Xeon E5320) with 8 GB RAM, to use in this install. msu2 has an internal 4 x 750 GB drive array; msu4 has an MD1000 shelf with 15 x 750 GB drives. They are installed with SLC 4.5, 64-bit, via ROCKS. The RAID array is formatted with XFS and mounted at
/exports/pool
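For reference, a sketch of the format-and-mount steps (the device name /dev/sdb is an assumption; substitute the actual RAID device):
# format the RAID volume with XFS and mount it persistently
mkfs.xfs /dev/sdb
mkdir -p /exports/pool
echo "/dev/sdb  /exports/pool  xfs  defaults  0 0" >> /etc/fstab
mount /exports/pool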
msu2 will be used as the admin node. Services will include postgres and pnfs; SRM will also run here.
msu4 will serve as a door/pool node with the pool, dcap, and gridftp services. These do not require postgres databases. The pool will be an approximately 4 TB RAID array.
Each node has two 1G Ethernet connections, on the .local (private) and .aglt2.org (public) networks.
Local worker nodes will not mount pnfs; they will use dcap or gridftp to transfer files rather than local protocols.
List of External Services
These are available on .local and .aglt2.org networks (services bound to all IPs).
| Service | Accessible From | URL | Comments |
| SRM | World | srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org | GSI authenticated SRM |
| DCAP | Should be local only, but not sure | dcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | Unauthenticated DCAP |
| GSIDCAP | World | gsidcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | GSI authenticated DCAP. Should this be supported? |
| GridFTP | World | gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | GSI authenticated GridFTP |
List of targets for DZero:
Planned upgrades
A 10G Ethernet interface will be added to msu4. This will probably get a .aglt2.org address, but we might try putting multiple VLANs on the interface so it is on .local as well. The PERC5 card on msu4 should be upgraded to a PERC6 (the shelf spare will be used temporarily), and the card should be moved to an x8 PCI slot.
Add pools on the Dell compute nodes. These will be managed as a "read pool" from which compute nodes get their job and minbias input files for DZero VO Monte Carlo generation. We are considering changing the disk partitioning so that the dCache pool area is all of one disk. It is undecided whether these pools will be exposed (for reading) to remote sites.
Node Preparation
The nodes are installed with SLC 4.5, 64-bit.
Certs
Both machines need host certificates placed in /etc/grid-security.
(This is a local script used for pushing the certs to ROCKS nodes:)
perl /home/install/extras/install_hostcert.pl msu2
perl /home/install/extras/install_hostcert.pl msu4
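Whatever mechanism puts them there, the end state should be the standard host credential layout; permissions matter, since the key must be readable only by root:
ls -l /etc/grid-security/
-rw-r--r-- 1 root root ... hostcert.pem
-r-------- 1 root root ... hostkey.pem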
Install Configuration
Get the VDT install tarball. Here I'm using vdt-dcache-SL4_64-2.1.6.tar.gz from
http://vdt.cs.wisc.edu/software/dcache/server/
Unpack the tarball in a temporary working area. Change into the install directory and run
config-node.pl
root@msu4 /tmp/vdt-dcache-SL4_64-2.1.6/install# ./config-node.pl
Found java at /usr/bin/java. Version is 1.6.0_03.
Installed java version matches bundled java version.
How many admin nodes (non-pool and non-door nodes) do you have? 1
The recommended services for node 1 are:
lmDomain poolManager pnfsManager dirDomain adminDoor httpDomain utilityDomain gplazmaService infoProvider srm replicaManager
Enter the FQDN for the node 1: msu2.aglt2.org
Which services do you wish to run on node msu2.aglt2.org (Enter for defaults)?
Do you wish to use the SRM Watch? [y or n]: y
How many door nodes do you have? 1
Enter the FQDN of door number 0: msu4.aglt2.org
Enter the private network that the pools are in.
If this does not apply, just press enter to skip:
Enter the number of dcap doors to run on each door node [default 1]:
Enter a pool FQDN name(Press Enter when all are done): msu4.aglt2.org
Enter the first storage location (Press Enter when all are done)): /exports/pool0
Enter another storage location (Press Enter when all are done)):
Enter another pool FQDN name(Press Enter when all are done):
Created site-info.def file.
Changes to site-info.def
I want to make the following changes:
- Change MY_DOMAIN away from the default of aglt2.org. That domain is already in use on the cluster and I want to avoid baking in a name conflict.
- Force an install of Java, needed since not all nodes will have a proper Java (the config-node.pl run above was on a host with a previous VDT install of Java).
- Put logs in /var/log/dcache. dCache produces multiple logs and I want to be able to find them all; otherwise they go into /var/log.
- Change the RESET_DCACHE vars to "yes". This does a reset to clean out old state (exactly what gets cleared is unclear).
Changed these vars:
MY_DOMAIN="msu-t3.aglt2.org"
JAVA_LOCATION="/opt/d-cache/jdk1.6.0_03/bin/java"
INSTALL_JDK=1
JDK_RELOCATION=/opt/d-cache
JDK_FILENAME=jdk-6u3-linux-amd64.rpm
DCACHE_LOG_DIR=/var/log/dcache
RESET_DCACHE_CONFIGURATION=yes
RESET_DCACHE_PNFS=yes
RESET_DCACHE_RDBMS=yes
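A quick, illustrative check that the overrides took effect before installing:
grep -E '^(MY_DOMAIN|JAVA_LOCATION|INSTALL_JDK|JDK_|DCACHE_LOG_DIR|RESET_DCACHE)' site-info.def.msu-t3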
Dryrun
Run the installer with the --dryrun option and it will list the actions it would take. The -s option specifies the site-info file to use:
# ./install.sh --dryrun -s site-info.def.msu-t3
Actual Installs
To perform an install, copy the VDT tarball and site-info file to the node, unpack the tarball, and run the install.sh script with the site-info config file. Refer to the VDT documentation for the order in which to install nodes; for this simple setup, the admin node just needs to be installed first. A sketch of the commands follows.
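Roughly, the per-node steps look like this (hostname and paths are from this install; adjust as needed):
# copy the tarball and site-info file to the node
scp vdt-dcache-SL4_64-2.1.6.tar.gz site-info.def.msu-t3 msu2:/tmp/
# then on the node: unpack and run the installer against the site-info file
ssh msu2
cd /tmp
tar xzf vdt-dcache-SL4_64-2.1.6.tar.gz
cd vdt-dcache-SL4_64-2.1.6/install
./install.sh -s /tmp/site-info.def.msu-t3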
Starting Services
After the install has been done on all nodes, start the services.
Admin node
It seems that postgresql and pnfs do not need to be started separately:
root@msu2# /opt/d-cache/bin/dcache-core start
Starting dcache services:
Starting lmDomain Done (pid=17437)
Starting dCacheDomain Done (pid=17487)
Starting pnfsDomain Done (pid=17542)
Starting dirDomain Done (pid=17597)
Starting adminDoorDomain Done (pid=17645)
Starting httpdDomain Done (pid=17709)
Starting utilityDomain Done (pid=17763)
Starting gPlazma-msu2Domain Done (pid=17880)
Starting infoProviderDomain Done (pid=17989)
Starting replicaDomain Done (pid=18061)
Using CATALINA_BASE: /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME: /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME: /opt/d-cache/jdk1.6.0_03
Pinging srm server to wake it up, will take few seconds ...
Done
Change admin password
Log in and change the default password.
[rockwell@cap ~]$ ssh -l admin msu2.local -p 22223 -c blowfish
admin@msu2.local's password:
dCache Admin (VII) (user=admin)
[msu2.aglt2.org] (local) admin > cd acm
[msu2.aglt2.org] (acm) admin > create user admin
[msu2.aglt2.org] (acm) admin > set passwd newpass newpass
[msu2.aglt2.org] (acm) admin > ..
[msu2.aglt2.org] (local) admin > logoff
dmg.util.CommandExitException: (0) Done
[msu2.aglt2.org] (local) admin > Connection to msu2.local closed.
Issues
- Needed to open the firewall for remote connections to pnfs (this config uses the .aglt2.org network only). This was noticed when the pool node could not mount /pnfs/msu-t3.aglt2.org; see the mount check sketched after this list.
- replicaDomain.log grew rapidly with the error message "Group Resilient Pools is empty". This is probably due to having replication enabled but not configured. Going to disable the replica service for now (we have only one pool at the moment anyway). Turn it off in
/opt/d-cache/config/dCacheSetup
on the admin node and restart services.
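A quick check from the pool node that pnfs is exported and mountable (the hostname and the /pnfsdoors export are from this install; showmount talks to the portmapper, so port 111 must also be reachable):
showmount -e msu2.aglt2.org                             # should list /pnfsdoors
mount msu2.aglt2.org:/pnfsdoors /pnfs/msu-t3.aglt2.org  # dcache-core does this mount at startup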
Pool / Door
Start dcache-core and then dcache-pool.
root@msu4# /opt/d-cache/bin/dcache-core start
/pnfs/msu-t3.aglt2.org/ not mounted - going to mount it now ...
Starting dcache services:
Starting dcap-msu4Domain Done (pid=559844)
Starting gridftp-msu4Domain Done (pid=559911)
Starting gsidcap-msu4Domain Done (pid=559980)
root@msu4# /opt/d-cache/bin/dcache-pool start
start dcache pool: Starting msu4Domain Done (pid=560141)
root@msu4# mount ...
msu2.aglt2.org:/pnfsdoors on /pnfs/msu-t3.aglt2.org type nfs (rw,intr,noac,hard,nfsvers=2,addr=192.41.231.12)
Tests
See
https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/ValidatingDcache
Install the client rpms (not included in the VDT install tarball):
rpm -i /home/install/contrib/4.3/x86_64/RPMS/dcache-*
Try with dccp on msu4:
root@msu4# /opt/d-cache/dcap/bin/dccp /tmp/vdt-dcache-SL4_64-2.1.6.tar.gz /pnfs/msu-t3.aglt2.org/data/
If you get an error about "Can't open destination file", make sure that the pool is actually set up correctly (paths are correct and a proper link exists in poolmanager). Actual permissions on the pool are probably not the issue, since this test is run as root. Here is an example of this error condition:
root@msu2 /pnfs/msu-t3.aglt2.org# /opt/d-cache/dcap/bin/dccp /tmp/test.txt testy/
Command failed!
Server error message for [1]: "No write pool available for <teststore:testgroup@osm>" (errno 20).
Failed open file in the dCache.
Can't open destination file : "No write pool available for <teststore:testgroup@osm>"
System error: Input/output error
Try dccp from a user account (permissions are wrong in both pnfs and the pool for this):
rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt /pnfs/msu-t3.aglt2.org/testy/
Failed create entry in pNFS.
Can't open destination file : Can not create entry in pNfs
System error: Operation not permitted
Now, with permissions in pnfs fixed (chmod 777), it surprisingly worked:
rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt /pnfs/msu-t3.aglt2.org/testy/
8 bytes in 0 seconds
rockwell@msu2 ~$ ls -l /pnfs/msu-t3.aglt2.org/testy/
total 0
-rw-r--r-- 1 root root 32 Apr 9 14:50 modprobe.conf.rocks
-rw-r--r-- 1 rockwell umatlas 8 Apr 10 21:45 test.txt
The data file on the pool is owned by root --- OK, this is how it should be. The above also works from msu4 (which has pnfs mounted).
Using dcap:// --- a dcap door is running on msu4 (need to disable it...):
root@msu2 ~# /opt/d-cache/dcap/bin/dccp dcap://msu4/pnfs/msu-t3.aglt2.org/testy/test.txt .
8 bytes in 0 seconds
root@msu2 ~# cat test.txt
yo baby
To the gsidcap door, with a grid proxy:
rockwell@msu2 ~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/OSG/globus/lib
rockwell@msu2 ~$ /OSG/globus/bin/grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Thomas D. Rockwell 611410
Enter GRID pass phrase for this identity:
rockwell@msu2 ~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/d-cache/dcap/lib
rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt gsidcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/testnew.txt
Error ( POLLIN POLLERR POLLHUP) (with data) on control line [3]
Failed to create a control line
Error ( POLLIN POLLERR POLLHUP) (with data) on control line [5]
Failed to create a control line
Failed open file in the dCache.
Can't open destination file : Server rejected "hello"
System error: Input/output error
rockwell@msu2 ~$ echo $?
255
That was actually a successful transfer; I guess the error messages (and the 255 exit status) have to be ignored.
gridftp
rockwell@msu2 ~$ time /OSG/globus/bin/globus-url-copy file:/tmp/gsiftp.txt gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/
real 0m1.230s
user 0m0.060s
sys 0m0.000s
srmls, with a grid proxy as above:
rockwell@msu2 ~$ time /opt/d-cache/srm/bin/srmls srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/
512 /pnfs/msu-t3.aglt2.org//
512 /pnfs/msu-t3.aglt2.org//testy/
512 /pnfs/msu-t3.aglt2.org//data/
real 0m4.029s
user 0m3.170s
sys 0m0.110s
Issues / Debugging
The error messages from failed transfers are often not very helpful, so a systematic test method is useful.
SRM version 2 probably gives the best error messages.
Consider this problem, which was due to a misconfiguration of the paths in storage-authzdb:
rockwell@msu2 ~$ /OSG/globus/bin/globus-url-copy file:///tmp/afile.txt gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/dzero/cache/
error: globus_ftp_client: the server responded with an error
550 File not found
rockwell@msu2 /tmp$ time /opt/d-cache/srm/bin/srmcp -2 file:////tmp/afile.txt srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt
Mon Apr 21 11:50:30 EDT 2008: srmPrepareToPut update failed, status : SRM_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed : at Mon Apr 21 11:50:29 EDT 2008 state Pending : created
RequestFileStatus#-2147481647 failed with error:[ at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root]
Mon Apr 21 11:50:30 EDT 2008: PutFileRequest[srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt] status=SRM_AUTHORIZATION_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root
Mon Apr 21 11:50:30 EDT 2008: java.io.IOException: srmPrepareToPut update failed, status : SRM_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed : at Mon Apr 21 11:50:29 EDT 2008 state Pending : created
RequestFileStatus#-2147481647 failed with error:[ at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root]
srm client error: stopped
java.lang.Exception: stopped
at gov.fnal.srm.util.Copier.run(Copier.java:287)
at java.lang.Thread.run(Thread.java:619)
real 0m4.001s
user 0m3.863s
sys 0m0.130s
The second error message points to just what the problem is...
Test progression: note that srmls mainly tests authentication, an srmmkdir will show whether a write can be done to /pnfs, and an actual copy will exercise the pool. A sketch follows.
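A minimal sketch of that progression using the bundled SRM clients (the URL and paths are from this install; run with a valid grid proxy):
SRM="srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org"
/opt/d-cache/srm/bin/srmls "$SRM/"                  # step 1: mainly tests authentication
/opt/d-cache/srm/bin/srmmkdir "$SRM/testy/newdir"   # step 2: tests a write to /pnfs
/opt/d-cache/srm/bin/srmcp -2 file:////tmp/afile.txt "$SRM/testy/newdir/afile.txt"   # step 3: exercises a pool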
Auth and Auth
Authentication and Authorization use the gPlazma cell, which is built with a plug-in architecture and provides multiple types of AA to the other dCache cells.
dcache.kpwd
This is a simple mode. The file /opt/d-cache/etc/dcache.kpwd is set up on the node running the gPlazma cell (other nodes do not need this file installed). The file has lines mapping grid subjects to local usernames, and lines mapping usernames to access:uid:gid:homedir:rootdir:root tuples; an illustrative pair is sketched below.
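A sketch of such a pair of lines (the DN, username, uid/gid, and paths here are illustrative, following the structure described above; check the dCache documentation for the exact field meanings):
# map a grid subject (certificate DN) to a local username
mapping "/DC=org/DC=doegrids/OU=People/CN=Some User 123456" someuser
# map the username to access:uid:gid:homedir:rootdir:fsroot
login someuser read-write 1001 1001 / /pnfs/msu-t3.aglt2.org /
Before the mapping was in place, transfers were denied: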
[rockwell@cap ~]$ setup vdt
[rockwell@cap ~]$ globus-url-copy gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/test.txt file:///tmp/test.txt
error: the server sent an error response: 530 530 Authorization Service failed: diskCacheV111.services.authorization.AuthorizationServiceException: authRequestID 693270851 caught exception
Exception thrown by diskCacheV111.services.authorization.KPWDAuthorizationPlugin: dcache.kpwd Authorization Plugin: Authorization denied for user rockwell with Subject DN /DC=org/DC=doegrids/OU=People/CN=Thomas D. Rockwell 611410
Fixed dcache.kpwd file and now it works:
[rockwell@cap ~]$ globus-url-copy gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/test.txt file:///tmp/test.txt
[rockwell@cap ~]$ cat /tmp/test.txt
yo baby
Scheme for DZero
To support DZero MC processing, which relies on a small and stable set of subjects and just one local user, we will set up gPlazma to use grid-mapfile. We will also have a higher-priority check against dcache.kpwd so that local and test users can be supported. This will allow, for instance, my rockwell subject to be manually remapped to something other than what is in the grid-mapfile for testing. We will migrate to a full check against a GUMS server in the future.
In
/etc/grid-security
on msu2, copy in the grid-mapfile from msu-osg. Edit the file
/etc/grid-security/storage-authzdb
to look like this:
version 2.1
authorize samgrid read-write 825664 55673 / /pnfs/msu-t3.aglt2.org/dzero/samgrid /
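For reference, the fields in that authorize line are, as I understand the format (verify against the dCache book):
# authorize <username> <access> <uid> <gid> <home> <root> <fsroot>
authorize samgrid read-write 825664 55673 / /pnfs/msu-t3.aglt2.org/dzero/samgrid /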
Set up the file
/opt/d-cache/etc/dcache.kpwd
Set up gPlazma to use these two mechanisms, with higher priority for dcache.kpwd. On msu2, edit
/opt/d-cache/etc/dcachesrm-gplazma.policy
# Switches
saml-vo-mapping="OFF"
kpwd="ON"
grid-mapfile="ON"
gplazmalite-vorole-mapping="OFF"
# Priorities (lower priority is tried first, if the method is enabled above)
saml-vo-mapping-priority="1"
kpwd-priority="3"
grid-mapfile-priority="4"
gplazmalite-vorole-mapping-priority="2"
Note: to get this change to the .policy file applied, gPlazma needed a restart. (I just restarted all of dcache-core, but there should be a way to do it in the admin console as well; a sketch of what I ran is below.) The mapping files are checked on each authentication, so changes there are picked up automatically.
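What I ran on msu2 (restarting just the gPlazma domain from the admin console may also work, but I have not confirmed the exact commands):
/opt/d-cache/bin/dcache-core stop
/opt/d-cache/bin/dcache-core start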
SRM
The Storage Resource Manager is a new interface to dCache that provides additional functionality to grid users. SRM is also standardized, so other storage systems may provide the same interface, though dCache is currently the main production-ready implementation.
SRM features (see the dCache book; a sketch of the directory functions follows the list):
- space management
  - including space reservations/tokens
- data transfers
  - pre-transfer srmPrepareToPut and srmPrepareToGet
- request status
- directory functions
- permissions functions
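As a concrete illustration of the directory functions, using the clients bundled under /opt/d-cache/srm/bin (the URL and paths are from this install; the client names assume the SRM v2.2 client suite):
SRM="srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org"
/opt/d-cache/srm/bin/srmmkdir "$SRM/testy/tmpdir"    # create a directory
/opt/d-cache/srm/bin/srmls "$SRM/testy/"             # list it
/opt/d-cache/srm/bin/srmrmdir "$SRM/testy/tmpdir"    # remove it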
Install and config
SRM is installed with everything else by the VDT installer. Its config is in /opt/d-cache/config/dCacheSetup, and it is enabled per node in /opt/d-cache/etc/node_config.
Errors
After dCache had run for a while without trouble, lots of jobs started at once and we saw this in the gridftp log on msu4:
05 May 2009 14:53:57 Socket OPEN (ACCEPT) remote = /192.41.231.46:52023 local = /192.41.231.14:2811
java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:01 Cell(GFTP-msu4@gridftp-msu4Domain) : Thread : listen got : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:05 Cell(GFTP-msu4@gridftp-msu4Domain) : Thread : ClinetThread-/192.41.231.46:52023 got : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:07 Cell(GFTP-msu4@gridftp-msu4Domain) : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:07 Cell(GFTP-msu4@gridftp-msu4Domain) : java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
[... the "java.lang.OutOfMemoryError: GC overhead limit exceeded" line repeats many more times ...]
The Java processes don't have much memory in use; maybe there is a JVM memory limit parameter that can be increased?
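The per-domain JVM options are set in /opt/d-cache/config/dCacheSetup; from memory the relevant line looks something like the following (the exact variable name and current value should be verified in your dCacheSetup before editing; raising -Xmx gives the domains more heap):
# illustrative -- check the actual line and default value in dCacheSetup
java_options="-server -Xms64m -Xmx512m ..."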
Restarted services on msu4.
--
TomRockwell - 26 Mar 2008