Setting up Condor-CE
Condor-CE is a replacement for Globus on our gatekeepers. Condor-G can still be used to submit jobs to the gatekeeper, but from there the JobRouter daemon takes over: it determines the criteria under which a particular route is satisfied (the Requirements expression of the route), then rewrites the ClassAd variables specified in that route before sending the job directly to the local condor queue of the node.
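The route mechanics just described can be sketched with a minimal single-route configuration. Everything here (the queue name, priority, and localQue value) is illustrative only, not part of our production setup:

```
/* Hypothetical minimal JOB_ROUTER_ENTRIES with a single route.    */ \
/* A job matching the Requirements expression has the set_ values  */ \
/* rewritten into its ClassAd before entering the condor queue.    */ \
JOB_ROUTER_ENTRIES = \
  [ \
    GridResource = "condor localhost localhost"; \
    Name = "Example Route"; \
    TargetUniverse = 5; \
    Requirements = target.queue=="example"; \
    set_localQue = "Default"; \
    set_JobPrio = 10; \
  ]
```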
Interesting things learned while doing this
- HTCondor-CE uses the public NIC, and ports 9619 and 9620 must be open for tcp on that NIC
- ClassAd variables specified in the Requirements statement MUST have the "target" prefix, for example
- Requirements = target.queue=="mp8"; \
- Be sure there are no blanks after the line continuation backslash
- Make sure there are no blanks on either side of the "=="
- This turns out NOT to be the case; such blanks are accepted. The statement that I sent to the developers: "The acceptance, or failure, of a route is non-trivial to work with because of a lack of debugging information in the logs from the process that parses it out."
- ClassAd variables in other statements should NOT have the "target" prefix
- GlobusRSL and InputRSL are not available to be parsed in the Requirements macro
- GlobusRSL and InputRSL ARE available, already parsed, for use by other macros
- There can be no blank lines, anywhere, within the job routes, unless they terminate in the line continuation backslash
- Comments use C-style, not "sharp" style, for example
- /* This is a comment line */ \
- If multiple job route Requirements statements evaluate to true, they will be used on a round-robin basis for such jobs
- It is not possible to use a Condor schedd on a different machine, as read access to the spool directory must be possible. This dashed a speculation that HTCondor-CE could easily be used to load balance across multiple schedds on multiple machines.
- Several ClassAd macros are set up for us via the file /usr/share/condor-ce/condor_ce_router_defaults . If a macro is set in that file via "eval_set", then any redefinition of that variable must also use "eval_set". For example, even though the assigned value of RequestCpus in this example is a static value, eval_set must be used:
- eval_set_RequestCpus = 8; \
- It has been suggested that it is inadvisable to change the default JOB_ROUTER_DEFAULTS macro
- With "JOB_ROUTER_DEBUG = D_FULLDEBUG" set, the JobRouterLog file in /var/log/condor-ce contains a specification of the "umbrella" constraint: an OR of the "Requirements" constraints of all the routes. It is the only really easy way I have found to verify that no route was mangled in setting up the configuration. After some more global constraints from the default statements, and some trailing text, the following can be found for the job_router setup below with its 10 routes.
- ( (target.queue is undefined && target.Owner == "usatlas2") || (target.queue == "splitterNT") || (target.queue is undefined && ( regexp("usatlas2",target.Owner) || regexp("usatlas3",target.Owner) )) || (target.queue == "analy") || (target.queue == "xrootd") || (target.queue is undefined && regexp("usatlas1",target.Owner)) || (target.queue == "mp8") || (target.queue == "splitter") || (target.queue == "Tier3Test") || (target.queue is undefined && ifThenElse(regexp("usatlas",target.Owner),false,true)) )
- If a route does not show up in this umbrella match, then any submitted job that was supposed to take that route will sit Idle forever unless it also matches another route.
- Submitted jobs should specify this non-default setting, because local universe jobs are excluded from any match by default
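Several of the rules above (the "target." prefix in Requirements only, C-style comments, and eval_set for anything the defaults set with eval_set) can be seen together in one hypothetical fragment; the route name and values are made up for illustration:

```
JOB_ROUTER_ENTRIES = \
  /* C-style comments only; no blank lines anywhere in the list */ \
  [ \
    Name = "Illustration"; \
    TargetUniverse = 5; \
    /* The "target." prefix is required inside Requirements ...  */ \
    Requirements = target.queue=="mp8"; \
    /* ... but must NOT be used in other statements              */ \
    set_localQue = "MP8"; \
    /* RequestCpus is assigned via eval_set in                   */ \
    /* condor_ce_router_defaults, so eval_set is needed here too */ \
    eval_set_RequestCpus = 8; \
  ]
```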
The AGLT2 JobRouter setup and usage
The challenge for AGLT2 (beyond just learning to do this) is to cover all the situations we must consider. In the end there were 10 separate routes, as shown below. To accomplish the dynamic reconfiguration of MCORE priorities that was previously done on gate04 (the production gatekeeper), the condor.pm and condor_submit script logic must be re-purposed into condor-ce configuration files in the /etc/condor-ce/config.d directory. With condor_submit, a random value was always thrown to determine whether an individual job should be allowed access to the dynamically configured machines. With HTCondor-CE this instead becomes a setup with 15-minute granularity: an HTCondor-CE configuration file is updated and condor-ce is then reconfigured.
Two static files are used.
- 55-aglt2-configure.conf
- Default values for the MCORE prioritization plus some other overrides
- 99-aglt2.conf
- This is the JOB_ROUTER_ENTRIES configuration
At 15 minute intervals the file "60-aglt2-mp8prio.conf" is written, with override values for those in the 55-aglt2-configure.conf file. The override values are written by the same cron task sequence that has run on gate04 to accomplish this, with some additions specific to Condor-CE.
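As an illustration, the generated override file might carry values of this form. The macro names IdleMP8Pressure and LastAndFrac do appear in the routes below, but the values shown here are made up:

```
# 60-aglt2-mp8prio.conf -- regenerated every 15 minutes by the cron task.
# Values below are illustrative only.
IdleMP8Pressure = True
LastAndFrac = False
```

After the file is rewritten, the cron task would then reconfigure the CE (e.g. with condor_ce_reconfig) so the JobRouter picks up the new values.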
The JobRouter configuration file content
# Customizations for AGLT2
#
# This is file 99-aglt2.conf
#
# For reference see also /usr/share/condor-ce/condor_ce_router_defaults
#
JOB_ROUTER_ENTRIES = \
/* Still to do on all routes, get job requirements and add them here */ \
/* ***** Route no 1 ***** */ \
/* ***** Analysis queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="analy"; \
Name = "Analysis Queue"; \
TargetUniverse = 5; \
eval_set_IdleMP8Pressure = $(IdleMP8Pressure); \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && (IfThenElse((Owner == "atlasconnect" || Owner == "muoncal"),IfThenElse(IdleMP8Pressure,(TARGET.PARTITIONED =!= TRUE),True),IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True))); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.analy.",Owner); \
set_localQue = "Analysis"; \
set_IsAnalyJob = True; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 6; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = ifThenElse(maxMemory isnt undefined, \
ifThenElse(maxMemory <= 4096, 3968, maxMemory), 3968 ); \
eval_set_RequestCpus = ifThenElse(xcount isnt undefined, xcount, 1); \
set_RequestAnalyTask = 1; \
eval_set_JobMemoryLimit = ifThenElse(maxMemory isnt undefined, \
ifThenElse(maxMemory <= 4096, 4194000, maxMemory * 1024), 4194000); \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 2 ***** */ \
/* ***** splitterNT queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="splitterNT"; \
Name = "Splitter ntuple queue"; \
TargetUniverse = 5; \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True); \
eval_set_AccountingGroup = "group_calibrate.muoncal"; \
set_localQue = "Splitter"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 10; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 3 ***** */ \
/* ***** splitter queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="splitter"; \
Name = "Splitter queue"; \
TargetUniverse = 5; \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True); \
eval_set_AccountingGroup = "group_calibrate.muoncal"; \
set_localQue = "Splitter"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 15; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 4 ***** */ \
/* ***** xrootd queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="xrootd"; \
Name = "Xrootd queue"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.analy.",Owner); \
set_localQue = "Analysis"; \
set_IsAnalyJob = True; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 35; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 1; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 5 ***** */ \
/* ***** Tier3Test queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="Tier3Test"; \
Name = "Tier3 Test Queue"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && ( IS_TIER3_TEST_QUEUE =?= True ); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.analy.",Owner); \
set_localQue = "Tier3Test"; \
set_IsTier3TestJob = True; \
set_IsAnalyJob = True; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 20; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 6 ***** */ \
/* ***** mp8 queue ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = ifThenElse(target.queue is undefined, \
false, \
ifThenElse(target.xcount is undefined, \
false, \
target.queue=="prod" && target.xcount==8)); \
Name = "MCORE Queue"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && (( TARGET.Cpus == 8 && TARGET.CPU_TYPE =?= "mp8" ) || TARGET.PARTITIONED =?= True ); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.mcore.",Owner); \
set_localQue = "MP8"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 26; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = ifThenElse(maxMemory isnt undefined, \
ifThenElse(maxMemory <= 32768, 32640, maxMemory), 32640 ); \
set_RequestAnalyTask = 0; \
eval_set_RequestCpus = ifThenElse(xcount isnt undefined, xcount, 8); \
set_JobMemoryLimit = 39845000; \
set_Slot_Type = "mp8"; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 6A ***** */ \
/* ***** mp8 test queue, Production ***** */ \
/* ***** Really only requests one cpu ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="mp8Test"; \
Name = "MCORE Test Queue"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && ( TARGET.PARTITIONED =?= True ); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.mcore.",Owner); \
set_localQue = "Default"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 125; \
set_Rank = Cpus*1.0; \
eval_set_RequestCpus = 1; \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Slot_Type = "mp1"; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 6B ***** */ \
/* ***** mp8 test queue, Analysis ***** */ \
/* ***** Really only requests one cpu ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue=="mp8TestA"; \
Name = "MCORE Test Queue Analysis"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && ( TARGET.PARTITIONED =?= True ); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.mcore.",Owner); \
set_localQue = "Analysis"; \
set_IsAnalyJob = True; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 125; \
set_Rank = Cpus*1.0; \
eval_set_RequestCpus = 1; \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 1; \
set_JobMemoryLimit = 4194000; \
set_Slot_Type = "mp1"; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 7 ***** */ \
/* ***** Installation queue, triggered by usatlas2 user ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue is undefined && target.Owner=="usatlas2"; \
Name = "Install Queue"; \
TargetUniverse = 5; \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && ( TARGET.IS_INSTALL_QUE =?= True ) && (TARGET.AGLT2_SITE == "UM" ); \
eval_set_AccountingGroup = strcat("group_gatekpr.other.",Owner); \
set_localQue = "Default"; \
set_IsAnalyJob = False; \
set_IsInstallJob = True; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 16; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 8 ***** */ \
/* ***** Default queue for usatlas1 user ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = ifThenElse(target.queue is undefined, \
regexp("usatlas1",target.Owner), \
ifThenElse(target.xcount is undefined, \
regexp("usatlas1",target.Owner), \
target.queue=="prod" && target.xcount==1)); \
Name = "ATLAS Production Queue"; \
TargetUniverse = 5; \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True); \
eval_set_AccountingGroup = strcat("group_gatekpr.prod.prod.",Owner); \
set_localQue = "Default"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 1; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = ifThenElse(maxMemory isnt undefined, \
ifThenElse(maxMemory <= 4096, 3968, maxMemory), 3968 ); \
eval_set_RequestCpus = ifThenElse(xcount isnt undefined, xcount, 1); \
set_RequestAnalyTask = 0; \
eval_set_JobMemoryLimit = ifThenElse(maxMemory isnt undefined, \
ifThenElse(maxMemory <= 4096, 4194000, maxMemory * 1024), 4194000); \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 9 ***** */ \
/* ***** Default queue for any other usatlas account ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue is undefined && (regexp("usatlas2",target.Owner) || regexp("usatlas3",target.Owner)); \
Name = "Other ATLAS Production"; \
TargetUniverse = 5; \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True); \
eval_set_AccountingGroup = strcat("group_gatekpr.other.",Owner); \
set_localQue = "Default"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_JobPrio = 2; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
] \
/* ***** Route no 10 ***** */ \
/* ***** Anything else. Set queue as Default and assign to other VOs ***** */ \
[ \
GridResource = "condor localhost localhost"; \
eval_set_GridResource = strcat("condor ", "$(FULL_HOSTNAME)", " $(JOB_ROUTER_SCHEDD2_POOL)"); \
Requirements = target.queue is undefined && ifThenElse(regexp("usatlas",target.Owner),false,true); \
Name = "Other Jobs"; \
TargetUniverse = 5; \
eval_set_LastAndFrac = $(LastAndFrac); \
set_requirements = ( ( TARGET.TotalDisk =?= undefined ) || ( TARGET.TotalDisk >= 21000000 ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer ) && IfThenElse(LastAndFrac,(TARGET.PARTITIONED =!= TRUE),True); \
eval_set_AccountingGroup = \
ifThenElse(Owner=="ops"||Owner=="mis",strcat("group_gatekeeper.other.",Owner), \
ifThenElse(Owner=="gpn"||Owner=="hcc",strcat("group_opporA.CMS.",Owner), \
ifThenElse(Owner=="sbgrid",strcat("group_opporB.SBGrid.",Owner), \
ifThenElse(Owner=="glow",strcat("group_opporB.glow.",Owner), \
ifThenElse(Owner=="ligo",strcat("group_opporA.ligo.",Owner), \
strcat("group_VOgener.",Owner) ))))); \
set_localQue = "Default"; \
set_IsAnalyJob = False; \
set_IsTestJob = False; \
set_IsUnlimitedJob = False; \
set_IsTier3TestJob = False; \
set_IsShortJob = False; \
set_IsMediumJob = False; \
set_IsLustreJob = False; \
set_Rank = ifThenElse(TARGET.PARTITIONED =?= True, (64-TARGET.DetectedCpus+Cpus)*1.0, (TARGET.DetectedCpus+16-SlotId)*1.0); \
eval_set_RequestMemory = 3968; \
set_RequestAnalyTask = 0; \
set_JobMemoryLimit = 4194000; \
set_Periodic_Remove = ( ( RemoteWallClockTime > (3*24*60*60 + 5*60) ) || (ImageSize > JobMemoryLimit) ); \
]
Sample submit job
Below is a sample job, submitted via the grid to our gatekeeper, destined for HTCondor-CE routing.
Universe = grid
grid_resource = condor gate03.aglt2.org gate03.aglt2.org:9619
+remote_queue = "analy"
+maxMemory = 4500
+xcount = 8
+maxWallTime = 1000
Executable = long_example.csh
Arguments = hello world
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_output_files = local_file_list.txt
Output = condor_ce_example.out
Log = condor_ce_example.log
Error = condor_ce_example.err
#
x509userproxy = /tmp/x509up_u55617
use_x509userproxy = true
Queue
Links to interesting and useful documentation
--
BobBall - 01 Aug 2014