Procedures followed to bring gate03 online as a test gatekeeper
NOTE: This page is changing as the procedures and tests evolve. This note will be removed once testing is complete and the odd comments scattered below are cleaned up.
Bringing up the cloned gate03
On gate03 (a clone of gate01), in single-user mode as it came up the first time, did:
vdt-control --off
chkconfig condor off
reboot (to get correct IP and hostname)
- Had to rebuild VMware Tools; without it there were no NICs and services failed to start.
- Turned off the 3 cfengine services.
- Commented out crontab entries via ##
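The crontab step above can be sketched in shell; the sample crontab entry, the temp file, and the use of sed are illustrative assumptions, not from the original notes:

```shell
# Sketch: disable crontab entries by prefixing active lines with "##",
# so they are easy to spot and restore later. The entry below is a
# made-up example; on the real host this would act on the saved crontab.
crontab_file=$(mktemp)
printf '%s\n' \
    '0 * * * * /usr/local/bin/sync-certs' \
    '##already disabled line' > "$crontab_file"
# Prefix only lines that are not already comments
sed -i 's/^\([^#]\)/##\1/' "$crontab_file"
cat "$crontab_file"
```

Re-enabling later is the reverse: strip the leading `##` from the lines that were touched.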
Differences between Pacman and RPM install
See this URL:
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/RPMWhatsNew
Some components are not yet packaged as RPMs and you still need to get them from existing Pacman installs. In particular, GUMS and the Gratia Collector are not yet provided via RPM.
- $VDT_LOCATION no longer exists
- $VDT_LOCATION/setup.sh no longer exists and isn't needed. For user jobs that still expect $OSG_GRID/setup.sh to exist, a dummy has been placed in /etc/osg/wn-client/setup.sh, and you can set $OSG_GRID to /etc/osg/wn-client.
- $GLOBUS_LOCATION no longer exists
- The single config.ini file has been replaced by a directory of files in /etc/osg/config.d/*.ini. They are read in alphabetical order.
- configure-osg has been renamed osg-configure.
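The dummy setup.sh arrangement from the list above can be sketched as follows; a temporary directory stands in for /etc/osg/wn-client, and the file contents are an assumption (the point is only that it exists and is sourceable):

```shell
# Sketch: create the no-op setup.sh that user jobs expecting
# $OSG_GRID/setup.sh can still source after the RPM conversion.
osg_grid=$(mktemp -d)          # stand-in for /etc/osg/wn-client
cat > "$osg_grid/setup.sh" <<'EOF'
# dummy setup.sh - intentionally does nothing
EOF
export OSG_GRID="$osg_grid"
. "$OSG_GRID/setup.sh" && echo "sourced OK"
```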
Creating the CE
Follow the directions here to make Resource Group AGLT2_TEST. Next, follow the directions here to create AGLT2_TEST_CE for gate03. Following this, an OSG ticket is created, and the gatekeeper in the new resource group is marked as Inactive until the next week's OSG management meeting. I attended this meeting, answered questions for the admins, and the resource was then activated.
This activation is necessary to initiate reporting to the BDII; otherwise releases are not tagged as available. To see this, the following commands are useful:
- lcg-info --vo atlas --list-ce --attr Tag|grep gate03.aglt2.org
- lcg-info --vo atlas --list-ce --attr Tag|grep -A 100 gate03.aglt2.org
- Following an edit of /opt/osg/osg/etc/config.ini, did:
- [gate03:etc]# configure-osg -v
- [gate03:etc]# configure-osg -c
- Must also disable:
- Should now be able to start condor and do
- Go into gums servers linat02/03/04 and add the "Host To Group Mappings" for gate03, identical to gate01 mapping.
- Add new gate03 hostcert and httpcert (service)
Add rsv account as shown in these directions:
SchedConfig Changes
Copy AGLT2-condor.py to AGLT2_TEST-condor.py, changing gate01 to gate03. This will look like a production queue.
(
Later, to run the osg-wn-client rpm set on the WN, change this line:
'envsetup' : 'source /afs/atlas.umich.edu/OSGWN/setup.sh;'
to:
'envsetup' : 'source /etc/osg/wn-client/setup.sh;'
)
- Request the addition of the test gatekeeper for HC testing. To do this, send email; the exchange went as follows:
Hi Gianfranco,
Could you please add AGLT2_TEST queue to MC HC test?
Thanks, Yuri (ADCoS expert)
Is it enough to add it to the 2 tests that do not trigger auto-exclusion, or do you want the full suite (including auto-exclusion/inclusion)?
For gate03, the 2 tests are sufficient. It can take up to 24 hours to begin such testing.
OK, I have added the queue to template 450 (PFT Evgen_trf 16.6.5.1) and 164 (PFT Reco_trf 16.6.5.5.1),
both of which are not used for auto-exclusion.
Test plan
- Submit test jobs to gate03 in standard way
- Works, both via globus tests and submission from splitter.
- Set the queue in test state
- [ball@gate01:~]$ curl --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --capath /etc/grid-security/certificates 'https://panda.cern.ch:25943/server/controller/query?tpmes=setmanual&queue=AGLT2_TEST-condor'
- Set queue nickname='AGLT2_TEST-condor', siteid='AGLT2_TEST' to manual
- [ball@gate01:~]$ curl --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --capath /etc/grid-security/certificates 'https://panda.cern.ch:25943/server/controller/query?tpmes=settest&queue=AGLT2_TEST-condor&comment=HC.Test.Me'
- Changed status of queue nickname='AGLT2_TEST-condor', siteid='AGLT2_TEST' from offline to test
- Asked pandashift for a batch of test jobs....
- Got a few, but the HC testing above is the true test of the gate03 workings.
- HC testing succeeding.
- Upgrade condor to 7.8.1
- rpm -Uvh ...
- cd /usr/bin; mv condor_submit real_condor_submit; cp -p new_condor_submit condor_submit
- Test jobs continue to successfully run. Moving on.
- Note: grid jobs submitted from splitter to gate03, and running on a Condor 7.6.6 WN, will not correctly transfer output files back to splitter
- The same is true for locally run jobs on gate03; output files are called _condor_stdout and _condor_stderr
- There is no such issue if the WN is running 7.8.1
- Submit test jobs from gate03 using the osg-wn-client rpm set. Make sure they work.
- Convert gate03 to the rpm set and again submit jobs using the osg-wn-client rpm set.
- OSG 3.x comes with GT5. When we switch, please send me (Jose Caballero) and John Hover an email so we can adjust the pilot factory.
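The two panda controller calls in the test plan above share one URL shape. A small helper makes the pattern explicit; the function name is hypothetical, while the host, port, and query parameters are the ones used in the curl commands above:

```shell
# Build a panda controller URL from an action (tpmes), a queue name,
# and an optional comment, matching the curl invocations above.
panda_ctl_url() {
    tpmes=$1; queue=$2; comment=$3
    url="https://panda.cern.ch:25943/server/controller/query?tpmes=${tpmes}&queue=${queue}"
    [ -n "$comment" ] && url="${url}&comment=${comment}"
    echo "$url"
}
panda_ctl_url settest AGLT2_TEST-condor HC.Test.Me
```

The real calls still go through curl with the grid proxy as the client cert, exactly as shown in the plan.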
yum --enablerepo=osg install empty-ca-certs
yum --enablerepo=osg install osg-ce-condor
yum --enablerepo=osg install globus-gram-job-manager-managedfork
Carefully check over all of these files:
[gate03:yum.repos.d]# cd /etc/osg/config.d
[gate03:config.d]# ll
total 48
-rw-r--r-- 1 root 866 Oct 31 2011 01-squid.ini
-rw-r--r-- 1 root 1698 Mar 9 16:54 10-misc.ini
-rw-r--r-- 1 root 2370 Oct 20 2011 10-storage.ini
-rw-r--r-- 1 root 341 Aug 29 2011 15-managedfork.ini
-rw-r--r-- 1 root 1204 Dec 7 2011 20-condor.ini
-rw-r--r-- 1 root 1453 Feb 23 13:05 30-cemon.ini
-rw-r--r-- 1 root 8003 Apr 2 15:30 30-gip.ini
-rw-r--r-- 1 root 1884 Oct 31 2011 30-gratia.ini
-rw-r--r-- 1 root 339 Dec 7 2011 40-localsettings.ini
-rw-r--r-- 1 root 1442 Mar 9 16:54 40-network.ini
-rw-r--r-- 1 root 2325 Aug 29 2011 40-siteinfo.ini
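As noted in the Pacman/RPM differences above, these ini files are read in alphabetical order, which the numeric prefixes enforce; a quick sketch of the effective ordering using a few of the file names from the listing:

```shell
# Lexicographic sort reproduces the order in which the files are applied
printf '%s\n' 10-misc.ini 01-squid.ini 40-network.ini 30-gip.ini | sort
```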
Preparing for the rpm install
The RSV must be installed separately from the CE
Actions taken
- Installed yum-priorities; epel-release was already in place.
- Installed the osg repos; the default priority is already in place.
rpm install methods
- On gate02 do yum install osg-ca-certs
- On gate01 and gate03 do yum install empty-ca-certs
- gate01: remember to set the queues to "brokeroff" to drain activated jobs
- Reference this page, which indicates the following should be performed:
- edit /etc/yum.repos.d/osg.repo to add/change the following lines:
- exclude=condor empty-condor*
- enabled=0
- When ready, choose "yum install osg-ce-condor" as the rpm to install.
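With the edits above in place, the relevant stanza of /etc/yum.repos.d/osg.repo would look roughly like this; everything other than the two edited lines is assumed to remain as shipped:

```ini
[osg]
# (name, baseurl/mirrorlist, and gpg settings as shipped by the osg release rpm)
exclude=condor empty-condor*
enabled=0
```

With enabled=0, the repo is only consulted when explicitly requested, which is why the install commands use yum --enablerepo=osg.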
--
BobBall - 23 Jul 2012