Upgrading dCache at AGLT2 from 2.10.55-1 to 2.13.23-1
We are upgrading to the next golden release of dCache on February 23, 2016. We have set up CFEngine with the correct RPMs and versions and turned cfengine3 off on all our nodes. Earlier work upgraded us to the most recent PostgreSQL (9.5.1) and to the most recent dCache release in the 2.10 line, 2.10.55-1.
We will be following the community-created upgrade guide at
https://github.com/dCache/upgrade-guide-213
Procedure
- If you use SRM, update to the latest available SRM client version. We are using the EPEL versions gfal2-plugin-srm-2.10.3-1.el6.x86_64 and srm-ifce-1.23.3-1.el6.x86_64.
- Upgrade to the latest patch-level release of dCache. Done: we are on 2.10.55-1.
- Prior to upgrading, run dcache check-config. All nodes show no problems except the two xrootd doors, dcdum01 and dcdmsu01. They show some non-standard values:
  [INFO] dcdmsu01.conf:26: Property summary is not a standard property
  [INFO] dcdmsu01.conf:27: Property detailed is not a standard property
  [INFO] dcdmsu01.conf:29: Property xrootd.n2n.site is not a standard property
  [INFO] dcdmsu01.conf:30: Property vo is not a standard property
- If the node relies upon any database (check via dcache database ls), tag the current schema version by running dcache database tag dcache-2.10. Ran successfully on head01 and head02.
- If you have any third-party plugins that offer new services, remove them and get updated versions. We have placed updated RPMs in our repo for dcache-plugin-xrootd-monitor.noarch 7.0.0-0 and dcache-xrootd-n2n-plugin.noarch 6.0.7-0.
- Run dcache services and compare its output with the table of changed services in the upgrade guide linked above. Any changed service that is in use must be replaced with its alternative after upgrading.
  - head01 is running the following changed services: admin, broadcast, loginbroker, httpd, spacemanager
  - head02 is running the following changed services: dir, pnfsmanager
  - We commented out the removed services in the layout files on head01 and head02.
- Ensure Java 8 is installed as the default:
  [root@head01 layouts]# java -version
  openjdk version "1.8.0_51"
  OpenJDK Runtime Environment (build 1.8.0_51-b16)
  OpenJDK 64-Bit Server VM (build 25.51-b03, mixed mode)
- Install dCache 2.13 by running yum update dcache. Done on all nodes.
- Reset /etc/dcache/logback.xml. This appears to have been done already by the yum update, so no action was needed.
- If you used head, pool or single as the layout name, fix it. Does not apply at AGLT2.
- Run dcache check-config, fix any errors, and repeat. We found problems in dcache.conf and the layout files, fixed all of them in the CFEngine source files, and updated the configuration. All OK.
- HSM-related steps: not applicable to AGLT2.
- No customizations of web doors at AGLT2.
- We still need to download/update PCells.
- Ran dcache database upgrade. On head01 it took about 3 seconds; on head02 it took 1 minute 15 seconds.
- Starting dCache ran into a problem in the dCacheDomain on head01.
  - dCache complained about line 533 of the poolmanager.conf file: "psu set linkGroup attribute fake-link-group HSM=none". The fix was to remove the obsolete psu set linkGroup attribute lines, after which dCacheDomain starts.
  - Next bug: a problem with the 'dcache' role for the srm domain. We had not specified all DB names and users in dcache.conf, and the new default user is 'dcache' instead of the 'srmdcache' user we use. Fixed in dcache.conf.
- No dcache check-config problems noted. Including bug resolution, the upgrade took about 2 hours. It would have been MUCH faster if we had made the right config changes up front (Step 16 above).
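When comparing the non-standard properties that dcache check-config reports on the two xrootd doors, a small Python sketch like the one below could group them by file. It assumes the `[INFO] <file>:<line>: Property <name> is not a standard property` message shape recorded in the check-config step and ignores every other kind of message; the function name is illustrative.

```python
import re
from collections import defaultdict

# Matches check-config messages such as
# "[INFO] dcdmsu01.conf:26: Property summary is not a standard property"
NONSTANDARD = re.compile(
    r"\[INFO\]\s+(?P<file>[^:\s]+):(?P<line>\d+):\s+"
    r"Property\s+(?P<prop>\S+)\s+is not a standard property"
)


def nonstandard_properties(output):
    """Map each config file to a list of (line, property) pairs it reports."""
    found = defaultdict(list)
    # finditer also copes with messages that were run together on one line
    for m in NONSTANDARD.finditer(output):
        found[m.group("file")].append((int(m.group("line")), m.group("prop")))
    return dict(found)
```

Feeding it the captured check-config output from each door would show at a glance whether dcdum01 and dcdmsu01 carry the same site-specific properties.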
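The comparison of dcache services output against the guide's table of changed services could be scripted roughly as follows. This is a sketch under two assumptions: that each row of the output looks like `DOMAIN SERVICE CELL LOG` (possibly with a header row starting with `DOMAIN`), which should be checked against your own output, and that the "changed" set is only the list AGLT2 found in use on head01 and head02, not the guide's full table.

```python
# Changed services that AGLT2 found in use on head01/head02 (from this log).
# This is NOT the guide's full table; extend it from the guide as needed.
CHANGED = {"admin", "broadcast", "loginbroker", "httpd", "spacemanager",
           "dir", "pnfsmanager"}


def changed_services_in_use(services_output, changed=CHANGED):
    """Return (domain, service) pairs whose service is in `changed`.

    Assumes rows of the form 'DOMAIN SERVICE CELL LOG', optionally preceded
    by a header row starting with 'DOMAIN'; adjust if your output differs.
    """
    hits = []
    for row in services_output.splitlines():
        cols = row.split()
        if len(cols) < 2 or cols[0] == "DOMAIN":
            continue  # skip blank lines and the header row
        if cols[1] in changed:
            hits.append((cols[0], cols[1]))
    return hits
```

Run against the saved output of dcache services on each head node, the result is the list of layout entries that need attention before starting 2.13.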
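The Java 8 check can be automated when many nodes need verifying. The sketch below parses `java -version` output, which Java prints on stderr, and handles both the legacy `1.8.0_51` numbering (major version 8) and the newer `11.0.2` style; the function names are illustrative, not part of any dCache tooling.

```python
import re
import subprocess


def java_major_version(version_text):
    """Extract the major Java version from `java -version` output.

    '1.8.0_51' -> 8 (legacy scheme), '11.0.2' -> 11 (newer scheme).
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if m is None:
        raise ValueError("no version string found")
    first, second = int(m.group(1)), m.group(2)
    # Legacy versions start with "1." and put the real major number second
    return int(second) if first == 1 and second is not None else first


def default_java_major():
    """Run the default `java` on this node and return its major version."""
    # `java -version` writes its banner to stderr, not stdout
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(proc.stderr)
```

On head01, with the output shown in the Java step, `default_java_major()` would return 8.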
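The poolmanager.conf fix on head01 amounted to deleting every psu set linkGroup attribute line. A minimal Python sketch of that edit is below; it matches only that exact command (no other linkGroup commands are touched) and keeps a .bak copy before rewriting. The helper names are hypothetical, and the file should only be edited while the domain hosting PoolManager is stopped.

```python
import re
from pathlib import Path

# dCache 2.13 complained about lines like this at AGLT2, e.g.
# "psu set linkGroup attribute fake-link-group HSM=none"
OBSOLETE = re.compile(r"^\s*psu\s+set\s+linkGroup\s+attribute\b")


def strip_obsolete(text):
    """Return (cleaned_text, removed_lines) with linkGroup attribute lines dropped."""
    kept, removed = [], []
    for line in text.splitlines(keepends=True):
        (removed if OBSOLETE.match(line) else kept).append(line)
    return "".join(kept), removed


def clean_file(path):
    """Rewrite `path` in place if needed, keeping a .bak copy of the original."""
    p = Path(path)
    original = p.read_text()
    cleaned, removed = strip_obsolete(original)
    if removed:
        p.with_name(p.name + ".bak").write_text(original)
        p.write_text(cleaned)
    return removed
```

Returning the removed lines makes it easy to log exactly what was dropped, which is useful when the same change also has to go into the CFEngine source copy.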
Issues After Upgrade
- We have been unable to view http://head01.aglt2.org:2288/webadmin/cellinfo. Apparently something related to step 13 needed fixing.
  - FIXED: The problem was that we had an /etc/dcache/httpd.conf file. We moved it aside and restarted httpdDomain, and the page works now.
- The FAX upstream redirection seems to be failing after the update.
  - We will need to consult with Ilija on this one.
  - Gerd found the fix: the plugin order must have the redirector second instead of third. We fixed the CFEngine config, and upstream redirection is working again (Feb 25, 2016).
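Gerd's fix comes down to the position of one entry in the xrootd door's ordered plugin list. Assuming, as we understand the door configuration, that the plugins are given as a comma-separated list, a tiny helper like the hypothetical `move_plugin` below could enforce the ordering when the CFEngine template is generated. The plugin names in any example are placeholders, not the real FAX plugin identifiers.

```python
def move_plugin(plugins, name, position):
    """Return a comma-separated plugin list with `name` moved to `position`.

    `position` is 1-based, so position=2 makes `name` the second entry.
    Raises ValueError if `name` is not in the list.
    """
    items = [p.strip() for p in plugins.split(",") if p.strip()]
    if name not in items:
        raise ValueError(f"{name!r} not in plugin list")
    items.remove(name)
    items.insert(position - 1, name)
    return ",".join(items)
```

For example, `move_plugin("auth, monitor, redirector", "redirector", 2)` yields `"auth,redirector,monitor"`, putting the redirector second as in the fix above.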
--
ShawnMcKee - 23 Feb 2016