Monitoring D0 Jobs

Samgrid monitoring is at http://samgrid.fnal.gov:8080/. The list of recent jobs for the samgrid scheduler that is used for MSU jobs is here.

Joel Snow's "Dial-a-Job Daemon" automatically submits the jobs that are run at MSU. See its list of running jobs at http://www-d0.fnal.gov/computing/mcprod/dajd/dajd_status.html

Click on the link on the right and drill down. You'll find a list of batch jobs; drilling into any of them shows its progress.

If jobs are failing to transfer files (for example, you don't see that minbias files have been transferred), suspect the dcache system. First check whether gridftp is functioning by looking at its logs. If needed, stop and start the dcache service on msu4, or on both msu4 and msu2.
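Before reading the logs, it can help to confirm that the gridftp door accepts TCP connections at all. The following is a minimal sketch, not part of the original procedure. It assumes the door listens on the standard GridFTP control port 2811, which may not match this site's configuration, and the host to test is left as a parameter because this page does not say which node runs the door. The helper name port_is_open is made up for illustration.

```python
import socket

# Assumption: 2811 is the standard GridFTP control port; the local
# dcache setup may use a different one.
GRIDFTP_PORT = 2811


def port_is_open(host, port=GRIDFTP_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


# Example call (the host name is an assumption, pick the node that
# actually runs the gridftp door):
#   port_is_open("msu4")
```

A True result only shows that something is listening on the port. It does not prove that transfers work, so the gridftp logs remain the main source of information.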

Gatekeeper

The node msu-osg is the gatekeeper and runs the condor batch system.

List batch jobs with the condor_q command, and check the state of the machines in the pool with condor_status.
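For a quick overview of the queue, the condor_q output can be condensed into a count of jobs per state. This is a rough sketch, assuming it is run on the gatekeeper node with condor_q on the PATH. The function names are made up for illustration. The numeric codes are the standard Condor JobStatus values. On newer Condor versions condor_q may show only the current user's jobs unless told otherwise.

```python
import shutil
import subprocess
from collections import Counter

# Standard Condor JobStatus codes.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}


def summarize_job_status(raw):
    """Count jobs per state from text holding one numeric JobStatus per line."""
    counts = Counter()
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            code = int(line)
        except ValueError:
            counts["Unknown"] += 1
            continue
        counts[JOB_STATUS.get(code, "Unknown")] += 1
    return dict(counts)


def query_condor_queue():
    """Run condor_q locally; return None if the command is not installed."""
    if shutil.which("condor_q") is None:
        return None
    # -format prints the JobStatus attribute of each job, one per line.
    result = subprocess.run(
        ["condor_q", "-format", "%d\n", "JobStatus"],
        capture_output=True, text=True, check=False,
    )
    return summarize_job_status(result.stdout)


if __name__ == "__main__":
    summary = query_condor_queue()
    if summary is None:
        print("condor_q not found; run this on the gatekeeper node")
    else:
        for state, count in sorted(summary.items()):
            print(f"{state}: {count}")
```

A large number of Held jobs is usually the first thing worth investigating with condor_q directly.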

Check the gatekeeper log at /exports/osg-0.8/globus/var/globus-gatekeeper.log
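The gatekeeper log can grow large, so one way to check it is to look only at the most recent lines that mention a problem. The sketch below does that with a plain keyword filter. The keywords are guesses and not taken from the actual log format, so adjust them after looking at a few real entries. The function name recent_matches is made up for illustration.

```python
from collections import deque

LOG_PATH = "/exports/osg-0.8/globus/var/globus-gatekeeper.log"

# Assumption: these keywords are illustrative guesses, not a list of
# messages the gatekeeper is known to write.
KEYWORDS = ("fail", "error", "denied")


def recent_matches(path, keywords=KEYWORDS, tail=500):
    """Return lines among the last `tail` lines that contain any keyword.

    Matching is case-insensitive; trailing newlines are stripped.
    """
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the final `tail` lines of the file.
        last_lines = deque(handle, maxlen=tail)
    lowered = tuple(k.lower() for k in keywords)
    return [
        line.rstrip("\n")
        for line in last_lines
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    try:
        for line in recent_matches(LOG_PATH):
            print(line)
    except OSError as exc:
        print(f"cannot read {LOG_PATH}: {exc}")
```

Reading the log may require suitable permissions on msu-osg.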

-- TomRockwell - 19 Nov 2008
Topic revision: r2 - 01 Dec 2008, TomRockwell