The following details the IPMI setup on the AFS file servers linat06, linat07 and linat08 (H8DAR T) Node Interface MAC Addr IP Addr SwitchPort Notes...
The AGIS System and Changing SchedConfig Parameters at AGLT2 Why AGIS Information about changeover to AGIS came in this Email of 2/28/2013 Ladies and gentleme...
Michigan/AGLT2 SuperComputing 2015 Network Demonstrations This year the University of Michigan and AGLT2 are again participating in SuperComputing 2015. The venue...
APC Hardware This page will describe maintenance tasks and miscellaneous information on the APC hardware used in the MSU server room (PDUs and Rack Air Removal Uni...
Initial install of ATLAS 12.0.1 software kit (Bob, Ed, Shawn 18 Jul 06). Follow instructions on InstallingAtlasSoftware, and DraftNewInstallForWB (which is more r...
The ATLAS LIVE monitor is a NEC P521 AVT purchased near the beginning of April 2011. It is located in the hallway just outside West Hall 348 (the Michigan ATLAS ...
This document describes how a user with a grid certificate can access files stored in the AGLT2 dCache system. All files in dCache need to be copied to a local fi...
IO with EOS command lines Users can access (list and read) files in CERN EOS from non-lxplus nodes without being authenticated. In order to get full permission (wr...
Adding Files to Space Token Areas To add files already in dCache to space token areas we need to do a number of things: * Create a new space token to hold the ...
Replaced Disks That Show "Foreign" Status Such a simple thing, but such a pain. We've all seen this, replace a failed disk in a RAID array with a salvaged disk, ...
Adding New OSS to Lustre Below are the steps needed to add a new OSS (storage server) to Lustre * Install or re-purpose an SL5.5 node * Update all BIOS/Firm...
Download openafs kernel source rpm and install on build system. The SRPM from https://linat05.grid.umich.edu/pub/SLC/4x/custom/SRPMS/openafs 1.4.6 1.1_AGLT2.src....
Monitoring: AGLT2 Compute Summary Page The initial idea was to simply extract and rearrange some lines of the HTML Ganglia page for the MSU site, in order to reor...
AFS Tape Backups with Amanda Amanda Commands For operations with amanda, you should be the amanda user on bambi: "su amanda". The exception is "amrecover". Her...
How to associate a directory with certain pools Use case: we need to designate certain pools for sc07, which means when we write data to a certain directory, data wi...
Intro A simple example of how to get Condor to run Athena jobs. Submitting a Simple Fastsim Job Transform Make a file called test.cmd. This sets up the basic co...
Controlling the ATLAS Queues, and the pilot rate Much of the basic command structure is documented in this document. There is also a newer document about setting...
ATLAS Software PLEASE READ FIRST: This is a guide to installing and using Atlas software of the Tier3. It is for reference for Administrators only. If you need a ...
Auto Test Programs over AGLT2 Cluster Related PNFS mount point test Purpose Make sure every computer node has "/pnfs/aglt2.org" mounted, and every gridftp door ...
Benchmark On 2950 (tuning at the kernel Level) the basic information of the system2950 basic info basic device configuration For the following tests, the basic...
* X4500 iozone results: (full test and output in attached zipfile) /opt/iozone/iozone -Rb run1.result -P 0 -s 33357824k -r 512k -l 1 -u 12 -F /zpool1/tmp1 ...
Bi-weekly AGLT2 site meeting notes Thursday, June 26, 2008 See SecurityPlanning Security Notes Need to check syslog-ng configuration for all hosts and base on mo...
Details of common settings for the CMC on a Dell Blade Chassis are shown via screen shots in the attached MS Word document. Note that power supplies on the chassi...
Building Lustre RPMs for a new kernel These are very old (version 1.8) directions When we move to a new kernel on a machine where lustre must also be mounted, ne...
Building Rolls ROCKS includes the roll mechanism for packaging software for distribution. A detailed developer guide is available, it contains all the info I nee...
11/13/2007 Replaced Dell cables with Gore cables on the following machines after seeing physical link counters increase: umfs07,umfs09,umfs10,umfs11 1/25/2008...
SVN repository: insert repository here How to use and usage pointers: insert things here Notes on CalibDataClass and structure of script: notes here Creating dev...
Configuration of the UM CERN Computing Cluster in BAT 188 In November 2014, the UM CERN Computing Cluster was upgraded to SLC6. Some old hardware was retired, new...
Grid Certificate Distribution at AGLT2 The certificates in /etc/grid-security/certificates are used by the OSG authentication stack. It is a regularly updated, st...
Transition from CFEngine v2 to v3, and Build dCache Pool Servers Introduction As documented elsewhere in this Wiki, cfengine2 is currently (Oct 2012) in use to c...
Configuring a simple routing change with cfengine 2 for linux Unfortunately, this is one of those "not very portable" things that other config management tool adv...
Checking out and editing CFEngine policy Some general notes and information Policies are exported from /var/cfengine/policy on umcfe or msucfe. Any directory un...
We are using Charles's proddisk_clean.py to clean our proddisk regularly. 1. Log in to dq2.aglt2.org as dq2. 2. Set up the environment: export LFC_HOST=lfc.aglt2.org source /a...
Cleaning Up the srmspacefile Table (SRM Space token Allocations) We recently found out that our srmspacefile table in dcache was inconsistent with our actual spac...
MSU 2008 May BNX2 DKMS Ganglia Found that the existing bnx2 network driver was the cause of the large spikes in the ganglia network plots. It intermittently pu...
Cluster Control This is the main page for information about the Cluster Monitoring and Control tools. At this time, the state of two conditions is maintained an m...
AGL ClusterHardware Topics CPU Eval August 2006. A vendor has promised to lend a dual Woodcrest node for evaluation. We don't have a confirmation of a ship dat...
Manuals Cobbler manual: http://www.cobblerd.org/manuals/ For information on the Cheetah template language used in kickstart templates: http://www.cheetahtemplate....
The following is a brief list of common errors found on the Panda production and analysis monitoring pages. All errors that are generally the fault of AGLT2 are n...
Common Tasks This page covers the common tasks for the MSU sysadmin and the privilege level required to perform it. Tier 2 root * Rebuild file server nonroo...
AGLT2 Compute Node Health Assessment Utilities General Goals These compute node health assessment utilities were designed to assist in managing the AGLT2 compute...
Setting up Condor CE Condor CE is a replacement for globus on our gatekeepers. Condor G can still be used to submit jobs to the gatekeeper, but then the JobRoute...
Job Queuing at Michigan State NOTE: THIS PAGE IS PRETTY MUCH OUT OF DATE The queuing system at Michigan State has not yet been established. Job Queuing at the Unive...
Condor Batch System This is the main page for administrative info about the Condor batch system(s) in use at AGLT2. User info is at CondorUser. A description of t...
Planning Condor Configuration Updates for AGLT2 Now that AGLT2 is running on an SL6.4 OS we can plan on implementing some new features in Condor that will take ad...
Fill the umatlas repository Download the rpms Download the relevant rpms into the umatlas repo from condor repo http://research.cs.wisc.edu/htcondor/yum/stable ...
What To Do after losing the dcache partitions of a Node? Due to ROCKS rebuilding failures, dcache partitions could be wiped during the rebuild, thus we have ...
MSU Raritan sh /usr/local/Raritan/Raritan MPC/5.0.3.5.36/start.sh The first time running it, you will need to tell this client where the KVM is: create "new prof...
Shutdown/Startup procedures for AGLT2 Clusters Procedures to cleanly bring All AGLT2 activity to a halt * service cfengine3 stop * This prevents changes...
If a line starts with $ it is a command to be run as a normal user, if it starts with # it is a command to be run as root. * $ cd /net/data07/tests/stress_test/ ...
SingleTopDPDMaker This page is aiming at presenting a tool producing D3PDs (TTrees) and D2PDs for analyses within the Athena framework. The SingleTopDPDMaker tool...
dCache Config Overview dCache from Database to Filesystem Below is a view of the chain of configuration for the dCache system starting with the PostgreSQL databa...
Dcache on AGLT2 service distribution: all the dcache services are physically distributed to 37 nodes. Two of them are head nodes (head01.aglt2.org and head02.aglt...
Upgrading dCache Starting on the morning of May 4th 2009 AGLT2 began upgrading from dCache 1.8.0 15p12 to 1.9.2.5 as well as migrating to Chimera. This was motiva...
Upgrading dCache at AGLT2 from 2.10.55 1 to 2.13.23 1 We are upgrading to the next golden release of dCache on February 23, 2016. We have setup CFEngine to have t...
(Re)Configuring the log4j.properties in dCache to output to Syslog At AGLT2 we have set up a central loghost running syslog-ng which also has a php-syslog-ng web i...
This page is OBSOLETE All site services now run at BNL. AGLT2 DQ2/DDM Verification and Debugging One of the central tasks for our Tier2 is to support a DQ2 ser...
Just starting this out... Tom, May 21 Data Storage Locations What locations are in use and how they are used. We have a number of storage locations for AGLT2: ...
02/29/2008 Dimm 2 alerting. Cleared alert. Has not returned as of 03/03/2008. Replaced anyways. First shipment of a single DIMM did not work (wrong brand to ...
1/28/2008 kipmi0 started taking up 100% of a CPU core. Machine didn't respond to IPMI requests. Power reset (at the outlet) fixed the problem. 2/1/2008 same p...
Drive 0 generated amber alert on front panel. Alert cleared after reseating drive in bay. Reloaded node, drive 0 then failed Dell diagnostics test. replace...
Error: E1000 Failsafe E1229 CPU2 VCORE Replaced motherboard and CPU 2 from Dell support. System not showing error. Now stops and asks for F1 or F2(setup) to co...
Dcache TroubleShooting Case 1: node c 104 2 has some broken files, meaning the md5sum of the local file doesn't match the md5sum of the source file. We decided to rec...
debug on the dccp on dcache admin node root@head01 ~# dccp dcap://head01.aglt2.org/pnfs/aglt2.org/data/tdhtest3 /tmp/test3 12317 bytes in 0 seconds It works,also ...
Benchmark On 2950 (tuning at the kernel Level) the basic information of the system2950 basic info basic device configuration For the following tests, the basic...
Cacti Setup for Dell Nodes The Dell PE1950 and PE2950 nodes have a large number of fans and temperature probes which are not exposed via SNMP. This presents a pro...
AGLT2/Dell.OmreportOmconfig There is a Dell ROCKS Roll, do we want that? AGLT2/Dell.DellOrderStatusMSU check status of a Dell order for MSU AGLT2/Dell.DellService...
HS06 Measurements Performed at the Dell Innovations Lab in August/September, 2017 32 bit Results, Summary Machine/Model ChipSet Speed BIOS Settings RA...
IDRAC web interface IDRAC web interface can provide the virtual console and the log files of the system. How to access the idrac web interface URL: https://idrac ...
PC6248 Reformat Note that using the command show dir seems to be a good stress test. Even switches that pass the check disk commands below can fail running the show...
Power Connect SNMP Lots of info is available via SNMP from the PowerConnects. References * http://wiki.xdroop.com/space/snmp/Switching Tables * http://for...
Dell Poweredge x950 Hardware Notes Information about the Dell 1950 and 2950 nodes. Dell Docs BIOS The Fall '07 order arrived with BIOS v1.5.1. This was release...
Provisioning the Dell 2950/MD1000 Storage Servers Recipe for getting a new PE2950 and MD1000 combination going as an fss nfs appliance in our ROCKS cluster. Refer...
Details on Configuring the /etc/multipath.conf for our Nexsan Setup The multipath -v2 -d with our configuration gives: root@umfs03 ~ # multipath -v2 -d create: m...
Installing DQ2 References * https://twiki.cern.ch/twiki/bin/view/Atlas/PandaDataService Procedure Shawn created a host cert from doegrids.org for umfs02.grid...
A list to track when we are forced to completely shut off power to a DRAC card to reset it. Hopefully these are isolated incidents, possibly caused by firmware/BI...
Drive Replacement on the MD1000 Obviously, this needs some testing... A drive has failed and XFS got errors and took the filesystem offline. * unmounted filesy...
Setting up Two OSG Gatekeepers for a Single Condor Cluster We need to load balance access to our Condor cluster because of a possible time out issue we are seeing...
UMVM0x (ESX) Network Reconfiguration June 2011 The details about the network changes on the UMVM0x nodes are below organized by host with before and after NIC/por...
Installation of EX9208 Installation steps 1 Lowering of existing junipers At risk, 3 person job 1 Move fiber box 'aside' to the top of the rack, and a...
Dark Sector Event Generation This section describes getting Itay Yavin's dark sector event generation files into a format usable for ATLAS simulation. Itay's cod...
Extend Compute Need to update extend compute for use at MSU and for changes in ROCKS 4.3. Actions in version from early Nov 2007 * install libgfortran with an...
Extending LVM Disks on VMware VMs We sometimes have partitions fill during operations and when those partitions are on VMs and using LVM we can easily extend them...
Dear Atlas US grid participant: REMINDER The next USATLAS Facilities and Operations phone meeting will occur Wed 1200 1330 CDT This is a reminder about a c...
FTS Channel Management Instructions The FTS channels for AGLT2 can be managed using the glite software. You will need an ATLAS VOMS production role (voms proxy i...
Fixing OMD Setup for AGLT2 Use cases There are a number of issues we are hitting as we try to set up OMD for AGLT2 use. Below is the current list of issues. When ...
Puppet Infrastructure Setup from our repo: To bootstrap foreman and puppetmaster from our svn code: Initial setup notes Installed from puppetlabs repo: http://y...
Monitor Disk Activity with iostat and Ganglia The iostat utility from the sysstat package provides information about disk operations and throughput. It works in ...
NB, condor is now in umatlas repo, and it along with yum priorities are now installed via cf3. epel should be installed as well. root@gate02 ~ # cat /etc/...
Setting up gate02.grid.umich.edu as our AGL Tier2 Gatekeeper There are a number of steps we followed to get our new gatekeeper running. Hardware We had an Int...
We have a bunch of scripts which rely on the space token information; therefore, I defined 2 hash-based subroutines in the dcache perl library. Every time you add/r...
Getting a Grid Certificate for ATLAS Use To get a new certificate or renew an existing one, you can use the CERN CA to request the grid certificate: https://ca....
Testing Glusterfs For testing purposes only we used the Redhat Storage Appliance demo which has gluster tools pre installed. Docs are here: http://docs.redhat.com...
Installing Google Chrome on SL7 for use with VMWare * Create the google chrome repo on umt3int05 google chrome name=google chrome baseurl=http://dl.google.co...
Setup of GRAM Auditing for AGLT2 (OSG 0.8.1) The current OSG installation (0.8.1) has Globus 4.0.5 which supports a new "auditing" feature. You can request that ...
A Plan for the ROCKS Graph ROCKS Graphs ROCKS uses the Redhat Anaconda installer to do installs. Using the Anaconda installer provides many advantages: * It ...
High Available Lustre MDS failover nodes with Redhat cluster tools Background: The Lustre Meta Data Server is integral to using Lustre. If it is not available...
Hardware Deployment This page will discuss the steps to deploy new hardware in the MSU server room. Preparing for a purchase Determine resource usage * Where...
HS06 Measurements Performed at AGLT2 We have made a variety of measurements at AGLT2 during September of 2009 in preparation for the upcoming purchase cycle. We p...
The following is a copy of an email off the ROCKS mailing list from May 8, 2006: On Mon, 2006-05-08 at 23:04 +1000, Jonathan.Ennis-King@csiro.au wrote: I've...
How to copy files from the dcache system to your local machines? Our dcache system now supports three protocols to access files in dcache: Dcap, SRM and GSIFTP proto...
How To Get Data From Dcache directly by Root when using Root, you can specify the root files in 2 different ways: 1 download these files from the dcache director...
How to add a new pool node to dCache/Chimera * Provision node via ROCKS (or other method) * Install correct kernel (if not done above): currently 2.6.38 UL5...
LOST all my edits by a key sequence...(from Emacs)... * First read dCache book for info: http://www.dcache.org/manuals/Book/start/in install.shtml http://www....
How To Add New Pools to Dcache This page is obsolete for dCache/Chimera. See HowToAddNewPoolsToChimera instead. On our existing dcache system, I add some New ...
How to Add New Storage to dCache We will look at the example of umfs16, where 12 new pools were added. Following the xfs file system creation, all disks were mou...
How to Copy data to dcache We can use the dccp, srmcp or globus-url-copy utilities to copy data to dcache, but dccp doesn't support recursive copying; srmcp will finally call ...
Directions on Draining then Removing a Pool Set pool readonly. Can start drain right away, but will likely miss a few files ssh to admin domain \c PoolManager \c...
Easy Move dcache data from pool to pool Use case: we want to retire the non-resilient pool umfs07_2 from dcache, which holds 1.7TB of data, so we need to move these data t...
How to convert a RO volume to a RW volume in AFS 1) Get tokens as 'admin' via 'kinit admin' followed by 'aklog' 2) Check the mount: 'fs lsmount /afs/atlas.umi...
How To Extend PoolView Purpose Allows to group pools on the web pages either according to the PoolManager groups or to customized groups. How to ? Uncomment th...
How to Move data from pool to pool (how to drain a pool) *use case*: we want to retire the node c 2 33 from the pool nodes, so we need to move the data from po...
How To rename or RM pnfs dir Rename Just do the rename as the 'root' user. You will get the error message "mv: cannot move `xxx' to `yyy': Stale NFS fil...
Setup Atlas Space Token background details about why space reservation is needed, refer to srm space reservation dcache book. steps to set up space reservation ...
Here is from an email: In dCacheSetup (in Aglt2, it is poolSetup)you need to define: metaDataRepositoryImport=org.dcache.pool.repository.meta.file.FileMetaDataRe...
Setup Srm Space reservation background details about why space reservation is needed, refer to srm space reservation dcache book. steps to set up space reservat...
Procedure for Installing or Upgrading dCache Servers Procedure for installing dCache servers. This is tested for use on the dCache storage nodes / gridftp doors....
Upgrading dcache storage hardware This is a procedure to follow when you don't wish to create a new pool but instead copy an old one to new hardware and have it o...
IO performance test with tuning (reset readahead) Here are some IO performance (we focus on the two IO patterns: read and write) tests with tuning of the readahead ...
All these tests are run with IOZONE, and the max file size is set to about twice the RAM size of each server. IOZONE parameters: -a run automatic mode, -R generat...
AGLT2 IP Addresses Information on IP addresses for the Tier2 For a detailed list of IPs at MSU, ask Tom for msu ips.ods or see configs/msu/network/msu ips.csv For ...
Implementing Network QoS at AGLT2 Recently we have seen periods where our LANs have been congested and packets are dropped. This has resulted in some of the monit...
This document describes how to install amanda on Centos7, and also connect it to a new tape library EMC ML3. About EMC ML3 It has 2 drivers, It has 32 usabl...
Basic steps to install dCache First, install a package from dcache.org. As of this writing we run on 1.9.4 3. cf.dcache in cfengine can manage config files for ...
Installing a Main line Kernel on Scientific Linux 6.4 or CentOS 7.2 64 bit Installing a main line kernel is as simple as putting in place the correct elrep...
Installation of OSG 0.6.0 on gate01.aglt2.org The installation procedure for OSG 0.6.0 on gate01.aglt2.org is below. It was installed on April 2nd, 2007. Please...
Install or Upgrade OSG at AGLT2 The main difference between these instructions and the usual documentation is that we use worker node and wlcg client installation...
KVM at MSU While most of the VM infrastructure resides within vSphere, we want to keep a backup windows install with the standalone client to debug problems with ...
LFC SQL Queries Below are some potentially useful SQL queries to check the status of the LFC. These are my test queries and I don't guarantee they are correct ...
Resizing LVM Partitions Some CERN systems were built with little space in /, with the bulk of the space in /home. However, this means HTCondor, that wants at lea...
Install Problem (really libaio architecture issue) During the "Configuration Assistants" startup I have an error. The "Oracle Net Configuration Assistant" succee...
The desire was to set up a local installation of the dq2 tools, eg, dq2_get and dq2_ls. Previous setups used by the UM group did not work for a variety of reasons...
Lustre 2.10 with ZFS 0.7.1 from standard repo This page documents building the Lustre 2.10 RPMs on CentOS 7.3 using the default yum install of ZFS 0.7.1. The ste...
Lustre At Aglt2 Lustre Deployment MDS (metadata server) We have a failover pair of metadata servers, lmd01 and lmd02; both servers can access the same device (/...
Lustre Backup Following: http://wiki.lustre.org/manual/LustreManual18_HTML/BackupAndRestore.html Snapshots On umfs15 there are regularly scheduled hourly and dai...
DKMS on Lustre servers spl(required by zfs), zfs and lustre zfs are all using dkms on all OSS to build their kernel modules automatically once the system has a ne...
This is to deploy a test lustre file system, assuming all software repositories are installed. I deploy a test file system on 2 nodes, with lustre 2.10.4, then I will t...
General steps to follow on Migrating data from one OST to another 1. Set the source OST in read only status from mds server, if you do not want the files to be mi...
Lustre Configuration and Setup for AGLT2 In March 2010 we revisited our exploration of Lustre for use at AGLT2. This was motivated in part by the release of Lus...
Lustre Reinstall Notes After testing Lustre "in production" (mostly tests by Tiesheng) we have decided to go ahead with our plans to utilize Lustre to replace the...
Test results comparing zfs to ldiskfs The tests below run a test Lustre system (mgs umdist10) through its paces, starting with a zfs 0.6.4.2 straight up install...
Update Lustre on a testbed from 2.10.4 (SL7.6, zfs 0.7.9) to 2.12.3 (SL7.7, zfs 0.7.13) We are trying to upgrade lustre server from 2.10.4 (SL7.6, zfs 0.7.9) to 2...
Notes on upgrade of Lustre to 2.1.6 from 1.8.4 With the implementation of SL6.4 everywhere it became necessary to also upgrade Lustre from 1.8.4, which was not re...
Notes on setting up and configuring Lustre version 2.7 Index of Sections Source rpms We have chosen to use the kernel distributed with the rpms from the Lustre ...
MultiCore Condor Set UP Introduction AGLT2 implements a mix of static and dynamic job slots for MultiCore jobs. At the time of this writing, we use 10 static sl...
Installation and Configuration of Dell MD3460 Storage Basic Hardware This page refers specifically to hardware purchased in August 2016 using RBD 2016 funds. A s...
See also: * MSUDZeroOsgSE about the storage element * MSUDZeroOsgStartup Restarting the system * MSUDZeroOsgTests Testing the OSG site * MSUDZeroOsgJo...
Monitoring D0 Jobs Samgrid monitoring is at http://samgrid.fnal.gov:8080/ The list of recent jobs for the samgrid scheduler that is used for MSU jobs is here. Jo...
Storage Element An SRM/dCache instance is added to the site as a grid accessible Storage Element. dCache is a very flexible package for combining multiple filesy...
Restarting the MSU OSG Grid How to restart the system after an outage. Bring Up and Check Services Cluster Services General cluster services are required, for in...
Testing That OSG Site is Functional Central tests OSG centrally tests all sites a few times a day. * List of all sites * Current result for MSU OSG * C...
These instructions use scripts and files found on senna at /home/koll/ * indicates step contains drill specific instructions Drill Preparation* * Check that ...
MSU Hardware Catalog This page lists hardware at MSU. Subpages provide more details and link to hardware documentation. Rack View * WesternSciRack2005 The ra...
This page is obsolete Hardware maintenance is now logged at http://glpi.aglt2.org/ MSU Hardware Repairs Until we have a better system, I'm recording hardware rep...
Room/Site Infrastructure Monitoring at MSU Liebert Air Handlers The two Liebert System/3 Air Handler units have Intellislot Web / 485 cards. See: * http://ww...
Lustre Basics The Lustre file system is made of three types of servers: the management server (MGS), meta data servers (MDS), and object storage servers (OSS). Ea...
for big three phase PDUs The rearmost PDU is 1. In these racks the rearmost PDU is inverted (its cord comes out the top). * place label like "MAC 00:00:00:00...
Submitting Jobs Often you will find yourself with the need to run batch jobs on Condor. This should be done entirely on the tier3's work/fast disks instead of on g...
Running a single command on condor runcommand.sh is a script for submitting a single line to be executed on the MSU tier 3 without having to mess around with cond...
Condor Monitoring Commands To see a view of the available job slots, use the command "condor_status". To see a view of the jobs in the system submitted from your c...
Setting up the Bypass Queue The users requested a queue that would bypass the timed queues, i.e., a queue with no limits on it. The agreed upon way to denote such...
Installing new drives in a login node 1 Check that no jobs are running on the login/submit node and that no users are currently logged on (you can check who is...
Setting up a new login/submit node Hardware 1 Pick a machine that will host the new login node if one has not already been picked. (Discuss with Philippe) ...
Setting up the timed queues The users requested several timed queues that would hold a job after it had exceeded a certain amount of runtime. These queues each h...
Starting HTCondor on a login/submit node To start condor on a login/submit node, do the following: 1 As a super user/root, use the following command: $ service...
MSU Tier 2 Administration MSU's computing resources make up approximately half of AGLT2. These machines are jointly administrated by MSU and UM. This page will br...
User Info for MSU Tier3 Regulations Your usage of the cluster must conform with MSU's acceptable use statement http://www.msu.edu/au/ Privacy The cluster is a m...
Workflows for modifying HTCondor configuration When modifying condor there are two broad phases any steps taken can be put into: the testing phase and the impleme...
Overview of MSU's Tier3 HTCondor Setup Intro Video for Admins Types of Machines on HTCondor HTCondor consists of three types of machines: submit nodes, worker n...
How to update the tier3 rack spreadsheet The spreadsheet that holds all of the rack information is located here. In order to edit this, you will need a google acc...
How to update visio 1 Log into senna. 1 Open up a terminal and run the command "rdesktop -g 1280x1000 hepwin.pa.msu.edu" 1 Log into this machine with the...
Nov 2008 T2 Hardware Things that can happen whenever: * configure PDUs power strips * install power cords to PDUs * label needed network cables * get ...
MSU Tripp Lite UPS The storage racks at MSU have Tripp Lite SU6000RT3UHV UPSs. These are 6KVA models. One of the two PDU (power strips) in the rack is fed from ...
pe2950 Utility Node Install Have a pe2950 with 2x 250GB drive and 4x 750 GB drives. Want to set it up to support a variety of cluster services including running ...
Backing up and moving VMs If the VM is running you need to pull a snapshot and backup, otherwise the .vmdk may not be consistent. Spaces in VM names for some back...
Replacing disk in zpool (potentially with larger disks) ZFS documentation Replacing disk in zpool (potentially with larger disks) Note: use "parted" "mklabel ...
* RebuildComputeNode Rebuilding a ROCKS compute node * RespondToDownNode What to do with a down node * ControlledShutdown How to bring the cluster down nice...
Management of Dcache main services to maintain head01 : dcache core head02: postgresql pnfs dcache core pool nodes: dcache core dcache pool main configurati...
Manual Replication of Hot Files in dCache Particularly for the Health Check, we need multiple copies of the source file All work is performed in either a browser ...
MDTChambers MDT status application Application public link (CERN login required): https://atlasop.cern.ch/atlas point1/muon/MDTchambers/ Following are some i...
Merging Existing Space Tokens When we setup space tokens for AGLT2 we assumed we needed a space token for each VO/Role that needed to be able to write to a space ...
Hardware Transition Planning from head01 (old R610) to head01 temp (new R630) We purchased a new Dell R630 to act as replacement hardware for our existing head01 ...
Migrating VMs into ESXI Once VMs are in ESXI, sloshing them between hosts is easy. But moving existing hardware and VMs into ESXI can be a little tricky. VMware p...
Migrating files to newly added OST so as to balance content It is desirable to distribute access over as many Lustre OSS as possible, so when a new OSS umdist04.a...
Local AGLT2 Monitors There are many monitors we've implemented. These include both AGLT2 and general USATLAS pages. Summaries * AGL Compute Summary page of Ph...
MSU OSG OSG site information and policy. Currently the MSU OSG site is 100% allocated to "SAMGrid" processing for the DZero Experiment. An SRM/dCache v2.2 SE is l...
Results of the MySQL DQ2 tests: umfs02:~ # su dq2 umfs02:~ $ cd /opt/dq2 You have new mail in /var/spool/mail/dq2 umfs02:dq2 $ source config/AGLT2/environmen...
Installation of NDT on ndt.aglt2.org See also Patrick McGuigan's page at NDTInstallation. Installation overview (more details below) 1. Applied the web100 ker...
Some Addresses At U-M 198.32.43.193 an interface on Nile All UM Networks and purposes: http://www.itcom.itd.umich.edu/backbone/umnet/ Tool to list all known IP ass...
Potentially Useful Network Equipment Info about hardware we are considering using. Dell Powerconnect 6248 This is one of a new (Fall 2006) fixed switches that s...
Network Issues at AGLT2 This page is intended to capture the network related issues at AGLT2 Network Issues after UltraLight Router at Starlight (R04CHI) was Ret...
Planning for the production network. NetworkHardwareInfo Near term To Do List Here is a list of network related items that need doing as of February 4, 2011: ...
Network Testing and Debugging for AGLT2 During the last year we have seen many indications that all is not right with our network connections to BNL (and perhaps ...
Network Tuning and Testing On September 18, 2007 Dimitri Katramatos, Kunal Shroff and Shawn McKee tried to test and tune the following machines at BNL and Michiga...
Procedures followed to bring gate03 online as a test gate keeper NOTE: This page is changing as the procedures and tests evolve. This note will be removed once te...
Creation of New dCache Headnodes (Dell R610) in January 2011 As part of our Fall 2010 procurements we purchased 2 Dell R610 nodes to host the dCache services (he...
Procedure to Migrate dCache headnodes (head01/head02.aglt2.org) to new hardware and operating system. During fall 2009 and winter 2010, AGLT2 is migrating all lin...
Evaluation and testing of Nexsan SATABeast with B60E expansion Unpacking and Installation See photos and some comments here: https://picasaweb.google.com/ben.mee...
MSU Each rack has 2 PDUs named PDU RACKNUM N.msulocal where N is 1 or 2. For racks with UPSs, the 1 PDU is on the UPS. You can connect to the web interface usin...
Numpy and Scipy at AGLT2 The numpy and scipy software packages are in common use at AGLT2, but the installed versions are somewhat old, having to do with the dea...
OSG Account Setup To support OSG VOs we must set up UNIX user/group accounts for the VOs. We have done this for the AGLT2 sites (UMATLAS and AGLT2_UM) and have o...
OSG CE 0.4.1 Install Instructions Introduction This document is intended for administrators responsible for installing and configuring: OSG Compute Element (CE) ...
* InstallUpgradeOSG Modified April 5, 2011, for OSG 1.2.19 install, B.Ball * UpdateOSGOnGatekeeper How to update the OSG and condor ce on gatekeepers ...
The content below was copied from the OSG install Twiki page on June 5, 2006. This was done to allow us to use this Twiki to record install details for our OSG i...
I removed all of the packages from the Dag repository on linat05. To get the list of packages and remove them I used these commands: rpm -qia | grep -B1 -A1 "Ven...
OpenAFS and Kerberos on Windows Software prerequisites Kerberos for windows. The current release of OpenAFS 1.7.4 recommends the Heimdal Kerberos implementation. ...
Setting up Oracle on Linux The following documents the installation and setup of Oracle at the University of Michigan for use by the ATLAS Muon Calibration and Al...
Installing Updated Muon Calibration Schema New schema was made available in early February 2008. Since the changes were significant I totally removed the origin...
Some info on Oracle setup at AGLT2 * Oracle Installation on linux for the ATLAS Muon Calibration/Alignment centers. * Oracle MuonDB updated (new) schema Feb...
Oracle Upgrade from 10.2.0.2 to 10.2.0.3 Prior to installing the Rome muon calibration DB for replication we needed to update our Oracle installation. I received...
Installing pCache and LSM at AGLT2 We are interested in setting up both a Local Site Mover (LSM) and pCache on our worker nodes. The goals are: * Reduce the I...
Useful PNFS/Chimera SQL Queries NOTE: This page assumes you are running Chimera/PNFS rather than the older PNFS from dCache 1.8.x or earlier. First query: Fix PN...
The PanDA Auto Exclusion process for ANALY_AGLT2 Introduction Procedures here were documented by D. van der Ster in this talk. To see this you will need a CERN ...
Client tools for Panda Analysis jobs Intro The panda client package contains following tools to submit/manage analysis jobs on PanDA. The following instructions...
See https://metalink.oracle.com/metalink/plsql/f?p=130:14:3681850522787148914::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:...
Install Postgresql on CentOS/RHEL/SL with Replication for Esmond This Wiki topic covers installing Postgresql with replication to support the Esmond DB. You will ...
Upgrading Postgresql on CentOS/RHEL/SL with Hot standby Systems This Wiki topic covers upgrading our existing PostgreSQL version 9.3.11 on Scientific Linux 6.7 64...
Upgrading Postgresql from 9.5 to 10.5 We want to go to the most recent Postgresql for use by dCache, at least on head01.aglt2.org. Currently Postgresql is version...
Postgresql on ZFS AGLT2 has been running Postgresql on top of ZFS on our head01.aglt2.org (dCache headnode) for more than 1 year. Recently we came across an inter...
The instructions use the c6 1 24 1 (Dell C6420) as an example Switch ports Available ports Look for all the switches for available ports The c6420 nodes need 2 ...
This section is already implemented for new users when the account is set up. Protecting SSH Keys (or X509 Certificates) on AFS This section applies to users with...
Proof on demand Quick start This section briefly describes how to set up a proof on demand system. Follow these steps: * Set up a root version or skip this step...
Overview See these URLs for an overview: https://twiki.cern.ch/twiki/bin/view/Atlas/PandaRun https://twiki.cern.ch/twiki/bin/view/Atlas/PandaTools Setup procedur...
Index of other pages ForemanPuppetInitialSetup unorganized notes from initial setup. Mostly you won't need these. HOWTO: Build new host with foreman Mostly se...
How msurxx was setup Create config files in SVN In the ROCKS SVN repo, below hostconfigs, copy msurxii.aglt2.org to msurxx.aglt2.org. Checkout (nominal location...
Re Adding to Database If a host needs to be re added to the database (it was erased, or the front end is being rebuilt), get the host info from SVN. root@msurxi ...
Introduction A prototype build of a file server in Rocks 5 is described, along with caveats and difficulties. In this instance, the file server is destined to be...
Configuring the Frontends Record of configuration done to frontends. Updated Dec 16th for msurx build. Connecting You can ssh to the frontend as root to perform...
Installing the Frontends The frontends are installed on VMware clients. Note that you must have a valid resolvable IP address and name or the install will fail. ...
Host Configs in SVN Have modified host config files tracked in SVN. Also have ROCKS DB entries tracked. Pull these out and put on the frontend. Storage area We...
ROCKS 5 Routing ROCKS 5 has a flexible scheme for setting routing rules. Similar to the attribute scheme, routes can be set on a global, OS, appliance and host l...
Frontend Config in SVN Have a scheme to track frontend config in SVN. A directory structure is created at /var/svn, below here modified configurations are copied....
VMware Hosting of Frontend Wish to run the frontends in VMware ESXi. The primary benefit is server consolidation. This also provides a good way to make a full b...
Raritan Dominion MSU MSU has a Dominion KX132. This is a 32 port model. In 2008, the Dominion KX series has been replaced in Raritan's line with the Dominion KX ...
How to empty all OST on an OSS, then re create the underlying Lustre file systems Motivation The underlying striping for a Lustre OST, as seen in the mail list, ...
Procedure for rebuilding a compute node In general, compute node rebuilding is fairly easy and the ROCKS should be maintained so that compute nodes can be rebuild...
(Re)Configuration of gPlazma on AGLT2 Due to issues with SRM failing that were traced to probable issues in gPlazma we are planning to implement some changes to g...
MSU Hummm... Normal procedure is to plug keyboard/monitor into node and see if there are any kernel messages on screen. On Dells also note errors from LCD. U M ...
Background Info: Unexpected Power Loss on file servers During backup generator test on 12 May 09 at the MSU BPS bldg, most UPSs received an errant EPO (Emergenc...
Recovering from a Lost Pool When we lose a pool we need to do a number of things to recover. Once we determine we have really lost the pool we will need to find t...
Reinstallation of Oracle for the Muon Calibration Center On May 2nd, 2009 our primary Oracle server (umors.grid.umich.edu) was compromised because we had forgott...
* SwitchAccess including how to find where a node is on the network * NodeConsoleAccess Including via KVM and IPMI/DRAC * NodePowerControl including PDUs an...
Removing PNFS (Chimera) Ghosts There is the possibility that the chimera DB can become out of sync with the actual files stored on disk. The t_dirs table holds t...
Replicating the Oracle Controlfile For safety it is good to replicate the controlfile for an Oracle DB. Our muoncal instance (after reinstalling in May 2009)...
Node down in Ganglia If a node is down in ganglia do this (assumes ganglia config is ok...): * From frontend, ping private interface * OK: try ssh to nod...
Reworking AGLT2's Logging Setup In upgrading atgrid we have an opportunity to migrate from syslog ng and php syslog ng to something new. The ELK stack (Elasticse...
To build the rocks distribution (in /home/install/ or /export/rocks/install/): rocks create distro To have a node reinstall: rocks set host boot hostname action...
Well, I haven't actually done it, but here's the directions. Just reverse the architectures since we're running a x86_64 cluster and are going to be kickstarting ...
Useful Rocks Links * Rocks v5.2 User Guide the User Guide... * Kickstart XML Reference Just remember that the "var" tag no longer works and that we use...
ROCKS 5.3 * R53abMSURXX Setup of msurxx * R5abFileServer Building a file server and configuring with cfengine * UMRocksiSetup Taking umrocksi.aglt2.org f...
Manually Adding Nodes to Database The insert ethers command used to support manually adding nodes; it no longer does. However, this can be performed using the rock...
Intro Here are listed releases or "tags" of the ROCKS installation. Issues with each can be added here. Summer 2011 making an attempt to maintain this page go...
lighttpd service is running on client during install. it matches URLs that have HOST == 127.0.0.1. Then does a redirect of "/install/(. )$" = rocks by.py?filena...
ROCKS Node Info A feature that is weak or missing from ROCKS is a way to add user defined parameters on a per node basis. There is a mechanism for adding user de...
Cross Kickstarting in Rocks 4.3 So you want to cross Kickstart nodes that aren't the same architecture as your front end? Don't worry, rocks can do that, or it's ...
Building a ROCKS client worker node Build Server Status The software revision for ROCKS and CFEngine that are active on the ROCKS frontend are shown here, the ne...
DNS in ROCKS References: * http://rscott.org/dns/ DNS Oversimplified This page written for ROCKS 4.3 with the update given below. ROCKS will manage the config ...
ROCKS5 has a whole new scheme for user customization of partitioning. The annoyances with getting custom partitioning done in ROCKS4 seem to be gone; we no longer ne...
ROCKS FAQs Introduction FAQs are divided into groups... Database management using ROCKS command Add an appliance This will add new entries in the membership an...
First begin by installing your Frontend by following the directions in the Rocks users guide, make sure you include the service pack roll, as it's required and fi...
To manually add a node in Rocks v5.2 you only really need two commands provided that the appliance type already exists, you could check that it's in the list when...
ROCKS 4.3 Install From Scratch Install log of ROCKS 4.3 and SLC45 on a Dell Poweredge 2950 server and 1950 client. The client will be installed over the network ...
Generic Kickstart Want to perform a network install of a node that won't be a ROCKS client but using the ROCKS frontend as the kickstart server. Have tried and f...
ROCKS Graphs Default graph in ROCKS52 Initial new AGL graph structure in ROCKS52 How To make them View on frontend web page (under "misc admin"), or... rocks lis...
ROCKS Installer This page is a description of how the ROCKS installer boots, how it requests the kickstart file for the node, how the server generates the kicksta...
ROCKS This is the main local page for the ROCKS cluster software. Subpages: * BuildingRocksRolls * RocksAglReleases Notes on configs used in production ...
ROCKS MySQL Database ROCKS stores configuration information in a MySQL database. Normal operations on configuration are performed with the rocks command, but thi...
ROCKS Frontend on VMware Running ROCKS frontend on VMware can simplify some management and recovery operations. The ROCKS frontend functions don't require much c...
Configuring to PXE boot Servers from the Rocks 5.5 HeadNodes From Ben on 10/10/2012, the following procedure can be used to PXEboot a machine into SL6 via Rocks 5...
Install on Frontend Setup EPEL yum repo On test ROCKS5 frontend, install puppet server from EPEL repo. Note that redhat.com is not reachable from aglt2.org. But...
Using RCS in ROCKS and How ROCKS Uses RCS ROCKS uses RCS on files that are written or appended with the file tag in kickstart xml. This provides some possibili...
Recovery Roll The ROCKS server can automatically build a recovery roll that allows the server's configuration to be reproduced on a new ROCKS server install. Thi...
ROCKS Site Sync We have two ROCKS clusters and wish to keep their configurations synchronized. This page will describe how to do that. Note that a closely relat...
ROCKS Kickstart XML Style Guide and How To The kickstart XML and the accompanying scripts (extras directory) specify much of the configuration for nodes on our cl...
Managing the ROCKS Installer with Subversion See local subversion pages at Subversion Creating a Branch or Tag SVN root@msurox /home/install # svn copy m "c...
Test Frontend Wish to be able to separate production and test use of frontend. The idea is that the normal production frontend can be maintained with a well defi...
ROCKS Test Server Wish to have a second server on cluster to enable testing of new ROCKS configurations. It seems that it will be simpler to have an entirely sep...
Directory /home/install/tools under SVN control. Directory /home/install/tools/bin intended for adding to PATH as desired. Main.TomRockwell 29 May 2009
Update Installer Kernel Warning: this is a kludge. Darn seems to work fine with the r610 hardware, but on the existing pe1950s, hard disk doesn't get mounted for rei...
Routine Tasks This page will describe the tasks that must be done on a regular basis. Daily * Check email from users / admins / ticketing system / monitoring ...
Setup and Running ATLAS Software (from Ed Diehl email) I have found in the past that the validation scripts have errors themselves, or there are other obscure pro...
SEC.pl (System Event Correlator) There is a nice two part article on SEC which describes how it works and what it provides. I encourage you to look it over. Th...
Creation of gate04.aglt2.org, the SL6 gatekeeper Core dump style... Not bothering for now to make this pretty, just recording the actions taken 2 cd /etc/yu...
Issues Fixed in CFEngine for SL7.3 Upgrade Known Issues Need to limit yum output on overnight updates so that so many Emails are not sent. The update_dell_firmwa...
AGLT2 SRM Hangs Starting in late April 2009 AGLT2 was having more and more dCache/SRM issues. One problem that significantly increased in frequency was SRM faili...
Security Monitoring: Setup and Configuration * Snort setup information and configuration * Syslog ng setup for AGLT2 * SEC setup and configuration usin...
Security Planning Config Changes to Tighten Security Ideas from June 26 meeting: * Firewall changes, see SystemInstallChecklist * See below...implement...
See http://www.sensatronics.com/index.php/industrial monitors/model e4.html Need to connect using serial port to make IP configuration. It has a web server and ...
Tier2 Services at UM Services for Tier2 job submission and remote monitoring are distributed across several physical machines at UM. Below is a breakdown of what ...
Setting up ATLAS Area HOTDISK for AGLT2 As of mid September 2009 we need to provide a new space token area in Tiers of ATLAS called 'HOTDISK'. There are a few com...
Athena can be tricky to set up and run under your user account. These are some minimal directions to follow. The ATLAS Computing Workbook is chock full of helpfu...
In order to setup your GRID Certificate, you need to have already completed the initial steps of requesting the certificate, registering for membership in the ATL...
Setup and Configuration of the AGLT2 MD3820i This details our installation and configuration of our new MD3820i (UMVMSTOR03). We received both units on August 4th...
Installing gssklog/gssklogd on our cluster We have user home spaces (including grid "group" accounts) in our AFS cell (atlas.umich.edu). Currently any user tryi...
Setting up LVS (Linux Virtual Server) for use with dCap Newer linux kernels have LVS built in (as well as our UltraLight kernels). See http://kb.linuxvirtualserv...
Setup and Configuration of the AGLT2 MD3600i July 25, 2011 This details our installation and configuration of our new MD3600i (UMVMSTOR02) plus MD1200 shelf. W...
Setting Up OMD on AGLT2 Systems Monitoring for AGLT2 has used lots of different software: Ganglia, Syslog ng, Cacti, Nagios, Shinken, Rancid, Monit/MMonit, OpenMa...
Log in to any of the interactive machines (unt3int01 05) and run the following commands: #localSetupATLAS or run #source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/u...
Setting up SSH Keys for AGLT2 SSH is able to use a variety of methods for authenticating users. Each method has security strengths and weaknesses. The normal user...
Notes on Building ShawnGenerator Below are the series of steps I used to create my "ShawnGenerator". This generator is based upon source code from Loek Hooft va...
This page keeps a "shopping list" of needed equipment and parts as well as a reference for suppliers. Add items with a date. When items are purchased, please ma...
HowTo Shutdown a Pool Node While In Production We sometimes need to restart/reboot pool nodes and would like to make this as least disruptive to the production sy...
Site Blacklisting via DQ2 Commands If you want to exclude your site from DQ2 you need to use the dq2 set location status command. The specific command is: dq2 set...
Using Slony to Replicate dCache Postgresql DBs We have been running postgresql 9.0.x on our dCache headnodes for almost two years. Currently we have the follo...
Running ATLAS software on SLC43 x86_64 We had installed the mars01.cern.ch mars05.cern.ch systems with SLC43 and an experimental kernel. Though we could install ...
ATLAS Software Tutorials Introduction This is a listing of tutorials for using the ATLAS Software. When possible, tutorials specific to usage on the G...
Solaris Installation notes A lot of this was compiled and figured out via extensive reading of this forum. There are a number of confusions and misleading direct...
Squid rpm Installation or Update Follow directions at this OSG Twiki page for installing. More directions are available at General CERN Twiki page. Summary step...
Startup and Shutdown This page will describe the shutdown and start up procedures for the cluster Unplanned power problems This will be a discussion of how to gr...
Configuring new storage This page describes how to configure newly purchased storage. 1 Start server 1 At BIOS prompt for storage controller setup, press Ct...
How to submit test jobs in panda from umt3 interactive nodes 1 get permission to access BNL CVS BNL CVS is a mirror to Cern CVS, it is a readonly CVS, you can us...
Subversion Subversion is a software revision control system designed to be an improvement on CVS. It generally replicates the features of CVS. References * T...
Proxy server This system has no access to outside world, and in general systems can't talk to non edu locations. To be able to download things, use proxy server. ...
Installation, Setup, and Usage reference for Sun hardware and Solaris SolarisOnPE2950 Installation of Solaris 10 (08/07) on umfs11 SunX4540ConfigFreeBSD In...
Hardware notes here...little to say about setup. As the manual also says, hit appropriate keys during boot and configure the management/IPMI card using an exter...
Operational notes Setup zfs filesystems and quotas for users Mkdir as usual: mkdir /atlas/data19/bmeekhof chown bmeekhof:umatlas /atlas/data19/bmeekhof Set quota...
This is now obsolete thor01 now runs FreeBSD. See SunX4540ConfigFreeBSD. Zpool configuration Destroy existing pools (pool1 through pool4): zpool destroy pool1 T...
Info about actually using the 4540 is at SunX4540ConfigSolaris Migrating pre installed X4540 Solaris 10 to boot from flash drive NOTE: There's probably no need ...
04 10 2008 It sure did. Once again I put "usatlasgrid" in as public read only community and copied to startup config. Not much definite to say about it, but on...
MSU Two switches are at msu sw1.local and msu sw2.local. To find a given node's switch ports(s): Option 1 access the switch web interface and browse for the port ...
System Install Checklist (UM Systems) Attached to this document is a tarball containing reasonable examples, many of which can probably be used with no modificati...
Testing of New dCache Storage Node Build Here is the testing result for a new storage node (summary: it works). After running 'install_dcache.sh' on UMFS16...
Getting the MSU site up in ROCKS Had some initial difficulty getting the compute nodes installed. Restarted from scratch with the following plan: * Clean up /...
Installation Details for VDT v1.6.0 We are trying the VDT v1.6.0 installation on gate01.grid.umich.edu following instructions at the VDT 1.6.0 release note page. ...
AGLT2 Account and Resource Policy In order to meet the requirements placed upon our Tier2 we are implementing a GUMS/VOMS/PRIMA configuration (so called "Full Pri...
This document helps the UM Tier3 users to diagnose their condor job problems. Submitting Machines Tier 3 users can submit their condor jobs from the following ma...
0. Useful webpages This webpage is a mix between a tutorial and reference. If you are just interested in a quick overview of useful condor commands, just google "...
Setup and Configuration of USATLAS Tier 3 Queue Background To assist Fred Luehring in testing remote, Tier 3 job queues for USATLAS, we have set up a test PanDA ...
Using TortoiseSVN with Putty/Pageant for SVN access to AGLT2 SVN server Prerequisites Install TortoiseSVN from http://tortoisesvn.net/downloads.html Setup putty a...
Track Transfers to dCache Just some quick notes on tracking specific transfers to dCache. * First, make sure the SRM logging is verbose enough (going to catali...
Trouble Atlas Atlas Analysis Job mishandled OSG APP Paul, Bob, OSG_APP should be "/atlas/data08/OSG/APP". "atlas_app/atlas_rel" are subdirectory created when i...
Setups taken from bl 13 1 when it was set up as an interactive machine root@bl 13 1 network scripts # cat ifcfg eth0 (This NIC is NOT trunked) DEVICE=eth0 HWAD...
Athena Code Checkout Overview At Michigan we have Athena "kits" installed which have only Athena binaries located at /afs/atlas.umich.edu/atlas/software/kits . ...
Athena Software Setup at Michigan Overview We have installed an Atlas software "mirror" and the Athena "Multi" installation of all 12.X.Y Athena versions on the ...
Athena Kit Installation Technical Details Overview This document gives the details about how the Michigan Athena kit installation is installed and how to update ...
Atlantis Event Display The Atlantis event display is installed in directory /afs/atlas.umich.edu/atlas/software/atlantis. To run the Atlantis you type the followi...
Michigan Computers Overview The Michigan computer cluster consists of several interactive machines, and 2 condor batch queue clusters. Here is the current list ...
Condor Setup at Michigan Overview Condor is a University of Wisconsin system to run batch jobs on CPU farms and/or random groups of desktop machines. Condor jobs a...
Copying files to/from CERN AFS copying from AFS with 'cp' If you get AFS tokens (with klog cell CERN.CH) you can copy files directly from CERN AFS space with sp...
Pacman Notes Overview pacman is a "package manager" which is used to install Athena Kit Distributions. This document gives some additional notes about how to us...
SSH password free login Introduction SSH offers the possibility to login without using passwords using shared keys. Not only is this more convenient than using ...
Syslog NG Log Processing on ATGRID The syslog ng software can generate a significant amount of data over time. On atgrid.grid.umich.edu we have a RAID5 system be...
Installing and Using X2Go for UM T3 Users This page describes how to get the X2Go client software, install it on Windows (explicitly, it will likely also go on a ...
Michigan Test Page This page is to test the responsiveness of the wiki. It has no other purpose. * My text goes here * Let us DO another bullet A. Item 1 A. ...
Tier3 for Users For information on using ATLAS software please see this section of our index page: WebHome#AGLT2_User_Information Information here includes how to...
UMATLAS yum repository NOTE: As of January, 2018, sysprov02, an SL7 VM, has replaced sysprov01, and sysprov01 has been shut down. All refs to sysprov01 below have...
Have not tried swapping cable, card,switch port yet. Oddly slow network speeds between this system and any other with iperf. Doesn't matter if umfs10 is client...
03/25/2008 replaced DIMM B, 1B and DIMM B, 1A with same from oc 1 23. Watching for new alerts, also watch 1 23. 03/24/2008 machine check exception. Believe ba...
Update Kerberos on our Servers The kerberos servers were installed long ago when DES was the primary encryption. We need to change to using newer more secure algo...
How to update OSG and condor ce on gatekeepers The following steps should be first tested on gate03, if it works, then do it on gate01/02 Please note: the gate ke...
CPLD Firmware Updates The CPLD has to be updated outside the OS environment using a bootable USB drive or creating a bootable ISO image and uploading the image to...
Updating LFC for AGLT2 The LFC host for AGLT2 is lfc.aglt2.org. This is a VMware VM (SL5.2/x86_64). As of September 13, 2009 the LFC software was installed in /...
Upgrading from GPlazma1 to GPlazma2 on dCache 2.2.12 In preparation for upgrading to dCache 2.6 we need to reconfigure our dCache GPlazma configuration to go from...
Upgrade Postgres on AGLT2 dCache ADMIN node During our recent upgrade of dCache we also ended up upgrading our Postgres installation on the dCache PNFS node (head...
ROCKS Upgrade Procedures It would be helpful to write down how the following are performed: * Upgrade ROCKS version (minor and major) * Upgrade OS version (...
Updating umrocksi.aglt2.org to the most recent SL5.4 rpm set Following is a description of the process followed to ready a Rocks 5.3 client node built with the SL...
On December 23, 2005 I "upgraded" umfs01.grid.umich.edu from the i686 (32 bit) version of Scientific Linux 4.1 to the x86_64 (64 bit) version of Scientific Linux ...
Updating and Upgrading dCache Headnodes for AGLT2 June 2013 The dCache headnodes head01.aglt2.org and head02.aglt2.org were transitioned to VMware VMs in 2012. As...
Upgrade Planning for AGLT2 SL5 Systems Introduction We need to upgrade our remaining SL5 systems to SL6 soon. We should use this page to track which systems stil...
Reconfigure dCacheConfig to use both Cell and Module As noted before, the preferred gPlazma configuration uses the cell method but it has been pointed out that th...
Useful Links This page will link to many pages useful for day to day administration Monitoring * Ganglia Monitoring * PerfSonar (latency) (bandwidth) * ...
Intro This is a guide for using Athena on the Tier 3 that all users should follow. Useful webpages: * ATLAS computing workbook * nice introduction, a bit ol...
Using CVMFS at AGLT2 General Information CVMFS is a new method of distributing ATLAS software that relies on using central repositories of software on servers lo...
Setting up Geant4 A 64 bit gcc 4.1.2 Geant4 installation for SL5 is located at /afs/atlas.umich.edu/opt/geant4. All available data packages are included. For a li...
Using "Monit" for Monitoring and Repairing AGLT2 Services NOTE: THIS PAGE IS NOW MOSTLY OBSOLETE, WITH MONIT INSTALLED VIA CFENGINE The monit application monitors...
Using OMD and GLPI for AGLT2 We have some nice tools installed to monitor our systems and software (OMD/Check_MK) and track the resolution of problems (GLPI). It ...
Using Pathena to Submit Jobs Once you are logged into the tier3, you need to source the panda setup scripts by doing source /afs/cern.ch/atlas/offline/external/GR...
Want to do test installs nodes in a VMWare ESXi guest. Expect that more things can be made to work similarly to an install on a physical host, but expect that th...
VMware perfSONAR Plots for Debugging Network Issues As part of AGLT2's VMware setup, we have created three ESX host specific perfSONAR instances running OWAMP (on...
VMWare Setup and Updates This page should keep track of VMware related setup/updates and information. Update to vSphere 5.1 This section will document the detail...
VMWare vSphere Upgrade at AGLT2 In September 2012 VMware released a significant upgrade to vSphere: 5.1. There are a number of nice features that we want to benefit...
How to Vacuum Postgresql DB Sometimes, when too many deletion or update operations have happened to the postgresql database, it runs into a transactionid probl...
Video Conferencing Help Asking for help or suggestions Email: aglt2 umich@umich.edu 348 West Hall Howto Guides Set outputs for each screen On the "HDMI ...
Virtuozzo Information and HowTo We have been testing Virtuozzo on our new virtualization hardware. Virtuozzo runs multiple "servers" on a host system, sharing t...
WLCG Accounting for Tier 2 Sites This page contains some plots showing WLCG Tier 2 accounting results for Tier 2's worldwide. Currently the plots are only availa...
Atlas Great Lakes Tier2 Web How to contact us * For problems, contact us through our signatures here: * Main.WenjingWu AGLT2 Manager and University of M...
AGLT2 Web Preferences The following settings are web preferences of the AGLT2 web. These preferences overwrite the site level preferences in . and , and c...
These nodes use the MSI barebones chassis, OEM for the IBM 325. We purchased 20 of them from Western Scientific Spring 2005. They were all upgraded to dual core...
Rack Equipment * msi05 compute node WesternNode2005 * sm06 compute node TeamHPCNode2006 Rack Layout Notes: * RU Slot 42 is at the top of rack; 1 is at t...
Installing and Using X2Go We are dropping the Remote Desktop machine aglt2rd, and replacing it with a Linux machine set, starting with bridge um at the UM site, a...
Tests of ZFS on SunX4540 running FreeBSD 9.0 For setup notes see SunX4540ConfigFreeBSD For nearest comparison of similar system running Solaris see BenchmarkOnX4...
Introduction xCAT is a cluster management tool originally developed at IBM and now Open Source. xCAT v1 was rewritten with much of the same functionality but a n...
Installing Follow the Install chapter of "Top Document" xCAT2top.pdf xCAT is installed with the command: yum install xCAT (There's some stuff to do before and aft...
yum cron Configuration in SL7 Un modified yum cron ALWAYS sends emails upon completion. This is an overwhelming flood given the number of systems we have. We th...
Using ZFS on Linux for AGLT2 AFS Fileservers Recently ZFS on Linux became available. ZFS has lots of nice features including Copy On Write (COW), data integrity v...