HowTo Shut Down a Pool Node While in Production

We sometimes need to restart or reboot pool nodes and would like to do so with as little disruption to the production system as possible. There are two things to do:
  • Set the pools 'rdonly' so that no new "write" requests go to the pools on this node
  • Disable the doors on this node so they won't service new requests

Using dcache-pool-control

This tool automates the manual instructions below. It sets pools read-only (shutdown) or returns them to normal read-write operation (startup). It is installed under /root/tools on head01. You can also copy it to a location accessible from your regular user account and run it from another system, as long as your public key is installed for the admin account. The next section explains the underlying steps in detail.

[root@head01 ~]# /root/tools/dcache-pool-control

Usage: dcache-pool-control shutdown|startup [pool] [pools or pool servers]

Example: dcache-pool-control shutdown umfs16

Example: dcache-pool-control shutdown pool umfs16_2 umfs16_4
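The usage line and examples suggest two ways to name targets. By default, the arguments are taken as pool servers (umfs16). After the pool keyword, they are taken as individual pools (umfs16_2 umfs16_4). The tool's source is not reproduced here. What follows is a hypothetical Python sketch of that argument handling only, assuming this reading of the usage line. The names parse_pool_control_args and PoolControlRequest are illustrative and are not part of dcache-pool-control.

```python
"""Hypothetical sketch of how dcache-pool-control's arguments might be read.

Assumption (inferred from the usage line, not from the tool's source):
without the 'pool' keyword the targets are pool servers (e.g. umfs16);
with it, the targets are individual pools (e.g. umfs16_2 umfs16_4).
"""
from typing import List, NamedTuple

USAGE = "Usage: dcache-pool-control shutdown|startup [pool] [pools or pool servers]"


class PoolControlRequest(NamedTuple):
    action: str              # "shutdown" or "startup"
    targets_are_pools: bool  # True if the 'pool' keyword was given
    targets: List[str]       # pool servers or pool names


def parse_pool_control_args(argv: List[str]) -> PoolControlRequest:
    """Parse arguments (without the program name) according to the usage line."""
    if not argv or argv[0] not in ("shutdown", "startup"):
        raise ValueError(USAGE)
    action, rest = argv[0], argv[1:]
    # The optional 'pool' keyword switches the targets from servers to pools.
    targets_are_pools = bool(rest) and rest[0] == "pool"
    if targets_are_pools:
        rest = rest[1:]
    if not rest:
        raise ValueError("no pools or pool servers given\n" + USAGE)
    return PoolControlRequest(action, targets_are_pools, list(rest))
```

For example, parse_pool_control_args(["shutdown", "pool", "umfs16_2", "umfs16_4"]) yields a request with targets_are_pools set to True and the two pool names as targets.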

Setting the pools rdonly

You'll need to log in to the dCache administrative interface. Access is controlled by the public keys in /etc/dcache/admin/authorized_keys2. If you are an admin here, your public key has probably already been added to that file. The file also contains a public key for the local "root" account on head01.

To enter the interface:
ssh -l admin -p 22224 head01.aglt2.org

Next, 'cd' to the PoolManager and disable "writes" for the pools on this node. Assuming UMFS04 is the node being shut down:
cd PoolManager
psu set pool UMFS04_1 rdonly
...
psu set pool UMFS04_5 rdonly

To undo this, run the notrdonly form for each pool, for example:
psu set pool UMFS04_1 notrdonly
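Typing one psu set pool line per pool is error prone on nodes with many pools. The following minimal Python sketch prints the PoolManager commands for one node. It assumes the pools are named NODE_1 through NODE_n, as in the UMFS04 example. That naming pattern is an assumption, so check it against the actual pool list. The pool count is a parameter you must supply. You would paste the printed lines into the admin session yourself; the sketch does not connect to dCache.

```python
"""Print the PoolManager commands that mark a node's pools rdonly or notrdonly.

Assumption: pools follow the <NODE>_<n> naming used in the UMFS04 example.
"""
from typing import List


def psu_commands(node: str, n_pools: int, read_only: bool = True) -> List[str]:
    """Return 'psu set pool' lines for pools <node>_1 .. <node>_<n_pools>."""
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    mode = "rdonly" if read_only else "notrdonly"
    return [f"psu set pool {node}_{i} {mode}" for i in range(1, n_pools + 1)]


# The five UMFS04 pools from the text, prefixed by the cd into PoolManager.
print("\n".join(["cd PoolManager"] + psu_commands("UMFS04", 5)))
```

Calling psu_commands("UMFS04", 5, read_only=False) produces the matching notrdonly lines for the undo step.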

Now the PoolManager will not assign any "write" requests to this pool. However, pool-to-pool (P2P) transfers could still be writing to it. To make sure that NO writes go to these pools, 'cd' to each pool itself and set it rdonly:
cd UMFS04_1
pool disable -rdonly

To undo this, 'cd' to the pool and issue:
pool enable
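The pool-level step has to be repeated in every pool cell on the node. The following minimal sketch builds that command sequence under two assumptions. First, you pass the pool names explicitly. Second, the admin shell needs '..' to leave one cell before you 'cd' into the next, as the classic dCache admin shell does. Verify the second assumption against the admin shell version in use.

```python
"""Build per-pool commands that disable (or re-enable) writes at the pool itself.

Assumption: as in the classic dCache admin shell, '..' leaves the current cell
before 'cd' into the next one. Verify against the admin shell in use.
"""
from typing import Iterable, List


def pool_cell_commands(pools: Iterable[str], disable: bool = True) -> List[str]:
    """Return a 'cd <pool>' / 'pool disable -rdonly' (or 'pool enable') / '..' sequence."""
    action = "pool disable -rdonly" if disable else "pool enable"
    commands: List[str] = []
    for pool in pools:
        commands += [f"cd {pool}", action, ".."]
    return commands


print("\n".join(pool_cell_commands(["UMFS04_1", "UMFS04_2"])))
```

Passing disable=False produces the pool enable sequence for the undo step.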

I am not sure of the impact of directly setting the pool to 'pool disable -rdonly' (will that immediately kill P2P transfers or just disallow new ones?).

Disabling Node Doors

This section is no longer relevant to pool nodes, since we now have dedicated dCache door hosts (dcdumXX, dcdmsuXX).

The LoginBroker service in dCache is responsible for assigning doors to requests. To make sure that a specific node doesn't service new requests, disable its doors in the LoginBroker:
cd LoginBroker
disable GFTP-umfs04
disable dcap-gsi-umfs04
disable dcap-umfs04
disable dcap1-umfs04
disable dcap2-umfs04

When you want a door back in service, use the enable form of the same command (for example, enable GFTP-umfs04).
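In the older layout this section describes, doors ran on the pool nodes, and the LoginBroker lines followed a fixed pattern per node. The following minimal sketch generates them. It assumes the door-name prefixes listed above (GFTP, dcap-gsi, dcap, dcap1, dcap2) are the complete set for a node. Other nodes or later configurations may differ, so the prefixes are a parameter.

```python
"""Generate LoginBroker disable/enable lines for the doors on one node.

Assumption: door names follow <prefix>-<node> with the prefixes used above.
"""
from typing import List, Sequence

DOOR_PREFIXES = ("GFTP", "dcap-gsi", "dcap", "dcap1", "dcap2")


def login_broker_commands(node: str, disable: bool = True,
                          prefixes: Sequence[str] = DOOR_PREFIXES) -> List[str]:
    """Return 'cd LoginBroker' then 'disable <door>' (or 'enable <door>') per door."""
    verb = "disable" if disable else "enable"
    return ["cd LoginBroker"] + [f"{verb} {prefix}-{node}" for prefix in prefixes]


print("\n".join(login_broker_commands("umfs04")))
```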

Once the pools are rdonly and the doors are disabled, wait for existing connections and activity to finish. Tracking the movers and transfers in PCells is a good way to see what is happening (use the text screens to see details and sort by node). You can also go into the LoginBroker cell and run ls. It may take a while for all ongoing activity to clear.
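Waiting for activity to drain amounts to a polling loop with a timeout. The following minimal sketch shows that loop. count_active is a hypothetical callback you provide, for example one that returns the number of movers you counted for the node. The text does not say how to obtain that number programmatically, and the interval and timeout defaults are placeholders.

```python
"""Poll until a node's activity has drained, or give up after a timeout."""
import time
from typing import Callable


def wait_for_drain(count_active: Callable[[], int],
                   interval_s: float = 60.0,
                   timeout_s: float = 4 * 3600.0) -> bool:
    """Return True once count_active() reports 0, or False if timeout_s elapses first.

    count_active is a hypothetical, site-provided callback; nothing here talks to dCache.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        active = count_active()
        if active == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        print(f"{active} transfers still active; checking again in {interval_s:.0f}s")
        time.sleep(interval_s)
```

Returning False instead of raising leaves the decision to reboot anyway, or to keep waiting, with whoever runs the script.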

During a restart or reboot there may still be failures caused by requests for files whose only copy is on the pools that are down. If the reboot is quick, the disruption should be minimal.

-- ShawnMcKee - 20 Dec 2010
Topic revision: r3 - 07 Jan 2015, BenMeekhof