Controlling the ATLAS Queues, and the pilot rate
Much of the basic command structure is documented in this document. There is also a newer document about setting the queue states here. In addition, we have implemented a local cron script, based upon the script written by Charles Waldman, that dynamically sets the total number of queued pilots waiting for a task.
Setting the queue states
We have 2 queues here: AGLT2-condor (production) and ANALY_AGLT2-condor (analysis). A valid proxy must be used for these commands to work. So, for example, following my own "grid-proxy-init" I can issue the following command, where Q_state is one of setonline, setoffline, or settest:
curl --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --capath /etc/grid-security/certificates 'https://panda.cern.ch:25943/server/controller/query?tpmes=Q_state&queue=ANALY_AGLT2_SL6-condor'
Note that the newer document above also suggests appending the following to the URL in the curl command:
&comment=THE.ELOG.NUMBER
where THE.ELOG.NUMBER is one of the relevant eLog or GGUS entries.
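As a rough sketch only, the URL used in the curl command can be assembled by a small shell helper, so that a mistyped Q_state is caught before anything is sent. The function names panda_queue_url and panda_set_queue are hypothetical and are not part of any Panda tool; the host, port, path, and the tpmes, queue, and comment parameters are copied from the commands on this page, and the curl options are the same ones shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Panda): print the controller URL
# for a queue state change.
# Usage: panda_queue_url Q_state queue [comment]
panda_queue_url() {
    state=${1:-}
    queue=${2:-}
    comment=${3:-}
    # Accept only the three states named on this page.
    case "$state" in
        setonline|setoffline|settest) ;;
        *) echo "unknown Q_state: $state" >&2; return 1 ;;
    esac
    if [ -z "$queue" ]; then
        echo "no queue given" >&2
        return 1
    fi
    url="https://panda.cern.ch:25943/server/controller/query?tpmes=${state}&queue=${queue}"
    # Append the optional eLog or GGUS reference as the comment.
    if [ -n "$comment" ]; then
        url="${url}&comment=${comment}"
    fi
    printf '%s\n' "$url"
}

# Hypothetical wrapper: send the request with the caller's grid proxy,
# using the same curl options as the command above.
panda_set_queue() {
    proxy="/tmp/x509up_u$(id -u)"
    url=$(panda_queue_url "$@") || return 1
    curl --cert "$proxy" --cacert "$proxy" \
         --capath /etc/grid-security/certificates "$url"
}
```

With these definitions, "panda_queue_url setoffline AGLT2-condor THE.ELOG.NUMBER" only prints the URL for inspection, while "panda_set_queue" with the same arguments would actually issue the request and so still needs a valid proxy.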
We will only set our own queue state if we are doing some local testing. Panda shifters are responsible for bringing us back online after a problem is noted, and must first send us test pilots that are confirmed to work.
Recent (2011) changes in the handling of Analysis queue testing now favor setting the queue to brokeroff with the comment HC.Test.Me. This allows HammerCloud to test and validate the site before jobs may resume. The syntax of such a command follows; it was changed on 9/17/2013 for SL6.
curl --cert /tmp/x509up_u`id -u` --cacert /tmp/x509up_u`id -u` --capath /etc/grid-security/certificates 'https://panda.cern.ch:25943/server/controller/query?tpmes=settest&queue=ANALY_AGLT2-condor&comment=HC.Test.Me'
--
BobBall - 12 Mar 2009