Monitoring D0 Jobs
Samgrid monitoring is at
http://samgrid.fnal.gov:8080/
The list of recent jobs for the Samgrid scheduler that is used for MSU jobs is
here.
Joel Snow's "Dial-a-Job Daemon" automatically submits the jobs that are run at MSU. See its list of running jobs at
http://www-d0.fnal.gov/computing/mcprod/dajd/dajd_status.html
Click on the link on the right and drill down. You'll find a list of batch jobs; drilling into these shows their progress.
If jobs are failing to transfer files (you don't see that minbias files have been transferred), suspect the dcache system. First check whether gridftp is functioning by looking at its logs. You can stop and start the dcache service on msu4, or on both msu4 and msu2.
Gatekeeper
The node msu-osg is the gatekeeper and runs the condor batch system.
List batch jobs with the condor_q command; condor_status shows the state of the machines in the pool.
Check the gatekeeper log at /exports/osg-0.8/globus/var/globus-gatekeeper.log
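The gatekeeper checks above can be combined into one short script. This is a minimal sketch, assuming it is run on msu-osg with the Condor tools on PATH. The helper name log_tail and the 50-line default are illustrative assumptions, not part of Condor or Globus; the log path is the one given above.

```shell
#!/bin/sh
# Gatekeeper log path as given in this page.
GATEKEEPER_LOG=/exports/osg-0.8/globus/var/globus-gatekeeper.log

# log_tail FILE [N]: print the last N lines of FILE (default 50).
# Hypothetical helper for illustration; returns 1 if FILE is not readable.
log_tail() {
    file=$1
    lines=${2:-50}
    if [ -r "$file" ]; then
        tail -n "$lines" "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}

# Run the Condor commands only where they are installed (the gatekeeper).
if command -v condor_q >/dev/null 2>&1; then
    condor_q        # jobs currently in the queue
    condor_status   # state of the machines in the pool
fi

# Show the most recent gatekeeper log entries; do not abort if unreadable.
log_tail "$GATEKEEPER_LOG" 50 || true
```

Running it on any other node should simply skip the Condor commands and report that the log cannot be read.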
--
TomRockwell - 19 Nov 2008