The following steps should first be tested on gate03; if they work there, repeat them on gate01 and gate02.
Please note: the gatekeeper runs both condor and condor-ce. Both are updated from the OSG repo, but htcondor-ce has its own version number, separate from the condor version, which is defined by the OSG release.
condor-ce receives jobs from different VOs according to the job router, and condor schedules the jobs onto the worker nodes of the cluster.
Identify the current version of osg-release from
https://repo.opensciencegrid.org/osg/3.4/el7/release/x86_64/
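As a small sketch of this version check, the helper below pulls the version-release string out of an osg-release RPM file name as it would appear in the repository listing, so that it can be compared with what rpm reports on the node. The function name and the file name in the usage comment are illustrative assumptions, not copied from the repository.

```shell
#!/bin/sh
# Hypothetical helper: extract "VERSION-RELEASE" from an osg-release RPM
# file name by stripping the package-name prefix and the ".noarch.rpm" suffix.
rpm_version_release() {
    name=${1#osg-release-}
    printf '%s\n' "${name%.noarch.rpm}"
}

# Usage (the file name is an assumed example of the repo naming):
#   rpm_version_release osg-release-3.4-9.osg34.el7.noarch.rpm
# Compare the result with the version installed on the node:
#   rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' osg-release
```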
In osg.cf (cfengine), change the osg-release entry to the available version:
"osg-release-3.4" package_version => "9.osg34.el6", rpm_url("https://repo.opensciencegrid.org/osg/3.4/el7/release/x86_64");
On the node, if an older osg-release is installed, remove it first, then run
#yum remove osg-release -y;cf-agent -Kf failsafe.cf;cf-agent -K -b osg;
#yum update 'osg*' -y
#yum update htcondor-ce htcondor-ce-condor htcondor-ce-client -y
#cf-agent -Kf failsafe.cf;cf-agent -K
#systemctl restart condor-ce;systemctl status condor-ce
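The sequence above could be wrapped in a small script so that it can be rehearsed on gate03 before touching gate01/02. This is only a sketch: the `run` and `update_gatekeeper` names and the `DRY_RUN` switch are assumptions introduced here, and the steps are chained with `&&` so that a failing step stops the sequence, which differs from the `;` separators used above.

```shell
#!/bin/sh
# Dry-run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the documented steps.
update_gatekeeper() {
    # Remove osg-release only if some version is currently installed;
    # cf-agent is then expected to install the version set in osg.cf.
    if rpm -q osg-release >/dev/null 2>&1; then
        run yum remove -y osg-release || return 1
    fi
    run cf-agent -Kf failsafe.cf &&
    run cf-agent -K -b osg &&
    run yum update -y 'osg*' &&
    run yum update -y htcondor-ce htcondor-ce-condor htcondor-ce-client &&
    run cf-agent -Kf failsafe.cf &&
    run cf-agent -K &&
    run systemctl restart condor-ce &&
    run systemctl status condor-ce
}
```

Calling `update_gatekeeper` with the default `DRY_RUN=1` only prints the commands; running it as root with `DRY_RUN=0` executes them in order.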
Log files are in /var/log/condor-ce/
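After the restart, a quick scan of that directory can show whether anything went wrong. The sketch below prints the last few lines containing "error" (case-insensitive) from each file; the function name and the default of five lines are assumptions, and the match on the word "error" is a rough filter rather than a complete health check.

```shell
#!/bin/sh
# Hypothetical helper: show the last N lines mentioning "error" in each
# log file of a directory (default: /var/log/condor-ce, N=5).
ce_log_errors() {
    dir=${1:-/var/log/condor-ce}
    n=${2:-5}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        matches=$(grep -i 'error' "$f" | tail -n "$n") || true
        if [ -n "$matches" ]; then
            printf '== %s ==\n%s\n' "$f" "$matches"
        fi
    done
    return 0
}

# Usage:
#   ce_log_errors                         # scan /var/log/condor-ce
#   ce_log_errors /var/log/condor-ce 20   # show up to 20 lines per file
```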
To view the yum transaction history
#yum history
#yum history info transaction-id
#condor_q
Check that jobs keep arriving in the queue and start running
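To make this check less of an eyeball exercise, the job states can be counted from the `JobStatus` attribute (1 = idle, 2 = running, 5 = held). The sketch below reads one status code per line and prints a summary; the `count_job_states` name is an assumption, and the `-allusers` option requires a reasonably recent condor_q.

```shell
#!/bin/sh
# Hypothetical helper: summarize HTCondor JobStatus codes read from stdin.
# 1 = idle, 2 = running, 5 = held; other codes are ignored.
count_job_states() {
    awk '$1 == 1 { i++ }
         $1 == 2 { r++ }
         $1 == 5 { h++ }
         END { printf "idle=%d running=%d held=%d\n", i, r, h }'
}

# Usage on the gatekeeper (batch queue):
#   condor_q -allusers -af JobStatus | count_job_states
# Assumed equivalent for the CE queue, if condor_ce_q is available:
#   condor_ce_q -allusers -af JobStatus | count_job_states
```

Running it a few minutes apart and seeing the running count stay above zero while idle jobs keep arriving suggests that the CE is routing jobs again.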
Check the job completion status in either the PanDA AGLT2_Test queue or the ATLAS analytics platform
-- WenjingWu - 07 Feb 2020