HowTo Shutdown a Pool Node While In Production
We sometimes need to restart or reboot pool nodes and would like to do this with as little disruption to the production system as possible. There are two things to do:
- Set the pools 'rdonly' so that no new "write" requests go to the pools on this node
- Disable the doors on this node so they won't service new requests
Using dcache-pool-control
This tool automates the manual instructions below, setting pools read-only (shutdown) or back to normal read-write (startup). It is installed under /root/tools on head01, but you can copy it somewhere accessible to your regular user account and run it from another system, as long as your public key is installed in the admin account. The following sections describe the details.
[root@head01 ~]# /root/tools/dcache-pool-control
Usage: dcache-pool-control shutdown|startup [pool] [pools or pool servers]
Example: dcache-pool-control shutdown umfs16
Example: dcache-pool-control shutdown pool umfs16_2 umfs16_4
Setting the pools rdonly
You'll need to log in to the dCache administrative interface. Access is controlled by the public keys in /etc/dcache/admin/authorized_keys2. If you are an admin here, we have probably already put your public key into that file. There is also a public key for local "root" on head01.
To enter the interface:
ssh -l admin -p 22224 head01.aglt2.org
Next 'cd' to the PoolManager and disable the pools on this node for "writes". Assuming UMFS04 is the node we are shutting down:
cd PoolManager
psu set pool UMFS04_1 rdonly
...
psu set pool UMFS04_5 rdonly
To "undo" this you will need to run
psu set pool UMFS04_1 notrdonly
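The commands above follow a regular pattern, so they can be generated rather than typed one by one. The following is only a minimal sketch: the function name gen_psu_cmds is hypothetical, and it assumes the pools on a node are numbered consecutively (NODE_1 through NODE_N), as in the UMFS04 example. It is not a description of how dcache-pool-control itself works.

```shell
#!/bin/sh
# Hypothetical helper (not part of dCache or dcache-pool-control):
# print the PoolManager commands that set pools NODE_1 .. NODE_N
# to the given mode (rdonly or notrdonly).
# Assumption: pools are numbered consecutively starting at 1.
gen_psu_cmds() {
    node=$1              # pool node prefix, e.g. UMFS04
    count=$2             # number of pools on the node, e.g. 5
    mode=${3:-rdonly}    # rdonly (default) or notrdonly
    echo "cd PoolManager"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "psu set pool ${node}_${i} ${mode}"
        i=$((i + 1))
    done
}

# Print the commands for the five pools on UMFS04
gen_psu_cmds UMFS04 5 rdonly
```

The output can be reviewed and then pasted into the admin session. Passing notrdonly as the third argument prints the matching "undo" commands.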
Now the PoolManager will not assign any "write" requests to this pool. However, P2P could still be sending writes to the pool. If we want to make sure that NO writes go to these pools we can 'cd' to the pool itself and set it rdonly:
cd UMFS04_1
pool disable -rdonly
To "undo" this you will need to 'cd' to the pool and issue
pool enable
I am not sure of the impact of directly issuing pool disable -rdonly on the pool (will that immediately kill P2P transfers or just disallow new ones?).
Disabling Node Doors
This section is no longer relevant to pool nodes. We now have dedicated dCache doors (dcdumXX, dcdmsuXX).
The LoginBroker service in dCache is responsible for assigning doors to requests. If we want to make sure that a specific node doesn't service new requests we just need to disable the corresponding doors in the LoginBroker:
cd LoginBroker
disable GFTP-umfs04
disable dcap-gsi-umfs04
disable dcap-umfs04
disable dcap1-umfs04
disable dcap2-umfs04
Note that you use the enable form of the command once you want the door back in service.
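Since the door names above share a fixed set of prefixes, the disable and enable commands can be generated for a given node. This is only a sketch: the function name gen_door_cmds is hypothetical, and the prefix list is taken from the umfs04 example, so it assumes other nodes use the same door naming.

```shell
#!/bin/sh
# Hypothetical helper (not part of dCache): print the LoginBroker
# commands that disable or enable the doors on one node.
# Assumption: every node has the same door prefixes as umfs04.
gen_door_cmds() {
    action=$1    # disable or enable
    node=$2      # lowercase node name, e.g. umfs04
    echo "cd LoginBroker"
    for prefix in GFTP dcap-gsi dcap dcap1 dcap2; do
        echo "${action} ${prefix}-${node}"
    done
}

# Print the commands that put the umfs04 doors back in service
gen_door_cmds enable umfs04
```

Check the generated names against the door list shown by ls in the LoginBroker cell before pasting them into the admin session.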
Once the pools are rdonly and the doors are disabled you need to wait for existing connections and activity to finish. Using PCells to track the movers and transfers is a good way to see what is happening (use the text screens to see details and sort by node). You can also go into the LoginBroker cell and do ls to see what is happening. It may take a while to clear out all ongoing actions.
During a restart/reboot there may still be "failures" caused by requests for unique files that exist only on the "down" pools. If the reboot is quick enough there should be minimal disruption.
--
ShawnMcKee - 20 Dec 2010