Hardware Transition Planning from head01 (old R610) to head01-temp (new R630)
We purchased a new Dell R630 to act as replacement hardware for our existing
head01 system (on R610 hardware). This document describes the necessary steps to switch to the new hardware with minimal downtime.
The R630 has been installed with SL6.6 64-bit and named
head01-temp (10.10.1.43; 192.41.230.43). It currently has the same version of Postgresql installed as is on the current
head01.
We have configured it to be a hot-standby streaming replica of the current
head01 Postgresql DBs.
There are three main areas that need some detailed planning:
- dCache
- Postgresql
- Host configuration (IP/name; crontabs; RPMS)
dCache
Since dCache is not running on
head01-temp we can copy the complete configuration over from the current
head01 system to make sure it is ready to start and identically configured. After installing the right dCache RPMs, the following files need to be copied over from the existing
head01:
Note that some of these files are straight from the dCache rpm, some are modified when GUMS needs to change servers, and some must be constructed one way or the other. Below I will mark those straight from the rpm (*) and those that are constructed (#) and those that are modified (m) starting from the rpm content.
As of Oct 23, 2015, all but the * files, and those marked obsolete, are correctly synced by cfengine.
/root/
.pgpass
/etc/dcache/
dcache.conf
gplazma.conf m (GUMS modified)
logback.xml *
dcachesrm-gplazma.policy m (GUMS modified)
hostcert.p12 #
dcache.kpwd
certificates.jks #
tc-config.xml *
info-provider.xml m
httpd.conf
/etc/dcache/admin
authorized_keys2
server_key.pub
server_key
host_key.pub
host_key
ssh_host_dsa_key.pub
ssh_host_dsa_key
authorized_keys (obsolete)
/etc/dcache/layouts
head01.conf
/var/lib/dcache/config
poolmanager.conf
LinkGroupAuthorization.conf
passwd
/etc/grid-security
hostkey.pem
hostcert.pem
monit.pem
storage-authzdb
gsi-authz.conf (obsolete)
grid-vorolemap
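Before the cutover it may help to confirm that every file in the list above actually landed on the new host. The following is a minimal sketch, not part of the original procedure: check_files is a hypothetical helper, and the paths in the example call are only a subset of the list that would need extending.

```shell
# Hypothetical helper: report which expected files are missing.
# Usage: check_files ROOT FILE...
#   ROOT is a path prefix ("" for the live system, or a staging directory).
# Exit status is 0 when everything is present, 1 otherwise.
check_files() (
    root=$1
    shift
    missing=0
    for f in "$@"; do
        if [ ! -e "$root$f" ]; then
            echo "MISSING: $f"
            missing=1
        fi
    done
    exit "$missing"
)

# Example call on head01-temp (illustrative subset of the list above):
#   check_files "" /root/.pgpass /etc/dcache/dcache.conf \
#       /etc/dcache/layouts/head01.conf /etc/grid-security/hostcert.pem
```

The function only tests for existence; it does not compare contents with the copies on the old head01.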
Make sure the
dcache-server
service is chkconfig'ed off.
The
/etc/grid-security/vomsdir
and
/etc/grid-security/certificates
directories must be set up and configured as well. There is an
/etc/cron.d/rsync-certificates.cron
which needs setting up. See Host-config below.
dCache can be upgraded on this host anytime before the transition.
Once these are in place we should be ready to run 'dcache start' as soon as the old host is shut down, postgresql is updated and running, and the host is reconfigured as
head01.
Postgresql Details
We have set up the same version of
postgresql-9.3 running on the new
head01 system. It is currently configured to be a hot-standby streaming replication server. The primary task is to ensure that all changes from the current
head01 are propagated to this host before the original
head01 is shut down. The transition steps are detailed below. The idea is to make sure the hot-standby postgresql on the new head01 is current, shut it down, move the postgresql configuration files from the current (old) host into place, and restart.
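One way to check that the standby is current is to compare the write-ahead-log location reported by the primary with the replay location reported by the standby. The sketch below is an illustration under assumptions, not the procedure actually used: standby_caught_up is a hypothetical helper, and the psql invocations in the comments assume a local superuser named postgres. The SQL function names are the PostgreSQL 9.3 ones.

```shell
# Hypothetical helper: succeed if the standby's replay location has reached
# the primary's current location. Both arguments are the "X/Y" hexadecimal
# strings that PostgreSQL 9.3 prints for xlog locations.
standby_caught_up() (
    primary=$1
    standby=$2
    p_hi=$((0x${primary%%/*}))
    p_lo=$((0x${primary##*/}))
    s_hi=$((0x${standby%%/*}))
    s_lo=$((0x${standby##*/}))
    if [ "$s_hi" -gt "$p_hi" ]; then
        exit 0
    fi
    if [ "$s_hi" -eq "$p_hi" ] && [ "$s_lo" -ge "$p_lo" ]; then
        exit 0
    fi
    exit 1
)

# Assumed way to obtain the two values (run after dCache has been stopped,
# so that no new writes arrive on the primary):
#   on the old head01:  psql -U postgres -Atc "SELECT pg_current_xlog_location();"
#   on head01-temp:     psql -U postgres -Atc "SELECT pg_last_xlog_replay_location();"
# Then, with the two strings in hand:
#   standby_caught_up "$primary_loc" "$standby_loc" && echo "standby is current"
```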
The following files will need updating when we transition from hot-standby mode to master:
In
/var/lib/pgsql
are scripts used to "seed" hot-standby hosts from
head01. These should be copied to the new host.
[root@head01 pgsql]# pwd
/var/lib/pgsql
[root@head01 pgsql]# ls
9.3             pg_hba.conf~        reseed_hot_standby-9.3.sh       reseed_hot_standby-o-head01.sh   tmp.sh
lost+found      pg_ident.conf       reseed_hot_standby-n-head01.sh  reseed_hot_standby.sh
make_backup.sh  postgresql-9.2-nfs  reseed_hot_standby-nhead01.sh   reseed_hot_standby.sh.07Jun2013
pg_hba.conf     postgresql.conf     reseed_hot_standby-nhead01.sh~  setup_ivukotic.psql
The important configuration files are stored in
/var/lib/pgsql/9.3/data
:
pg_hba.conf
pg_ident.conf
postgresql.conf
These just need to be copied into place on the new system once its postgresql is up to date and shut down. I have set up
/var/lib/pgsql/head01
and
/var/lib/pgsql/head01-temp
to host these files for the new master and hot-standby setups respectively.
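Moving the master configuration into place could look roughly like the sketch below. It is an illustration only: install_pg_conf is a hypothetical helper, and the ".standby" backup suffix is an arbitrary choice. One step in it is an assumption that does not appear in these notes: a 9.3 hot standby normally has a recovery.conf in its data directory, and the server re-enters recovery at startup while that file is present, so the sketch renames it.

```shell
# Hypothetical helper: copy the three configuration files from a staging
# directory into the data directory, keeping the old copies as *.standby.
# Usage: install_pg_conf STAGING_DIR DATA_DIR   (run with postgresql stopped)
install_pg_conf() (
    staging=$1
    datadir=$2
    # Refuse to touch anything unless all three staged files exist.
    for f in pg_hba.conf pg_ident.conf postgresql.conf; do
        if [ ! -f "$staging/$f" ]; then
            echo "missing $staging/$f" >&2
            exit 1
        fi
    done
    for f in pg_hba.conf pg_ident.conf postgresql.conf; do
        if [ -f "$datadir/$f" ]; then
            cp -p "$datadir/$f" "$datadir/$f.standby" || exit 1
        fi
        cp -p "$staging/$f" "$datadir/$f" || exit 1
    done
    # Assumption, not in the original notes: move recovery.conf aside so the
    # server starts as a master instead of resuming standby mode.
    if [ -f "$datadir/recovery.conf" ]; then
        mv "$datadir/recovery.conf" "$datadir/recovery.conf.standby" || exit 1
    fi
    exit 0
)

# Example:
#   install_pg_conf /var/lib/pgsql/head01 /var/lib/pgsql/9.3/data
```

cp -p preserves the ownership of the staged files when run as root, so the copies under /var/lib/pgsql/head01 should already be owned by the postgres user.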
Host-config
To check RPMS, get a sorted list from each host.
On head01-temp:
rpm -qa --queryformat='%{NAME}\n' | sort > head01-temp-rpms.txt
On head01:
rpm -qa --queryformat='%{NAME}\n' | sort > head01-rpms.txt
Generate lists:
comm -2 -3 head01-rpms.txt head01-temp-rpms.txt > rpms-only-on-head01.txt
comm -1 -3 head01-rpms.txt head01-temp-rpms.txt > rpms-only-on-head01-temp.txt
comm -1 -2 head01-rpms.txt head01-temp-rpms.txt > rpms-on-both.txt
I then install on
head01-temp the packages listed in rpms-only-on-head01.txt, i.e. those present on head01 but missing from the new host.
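To review what would be installed before touching the new host, the list of packages found only on head01 can be turned into a single yum command line and printed rather than executed. This is a small sketch; print_install_cmd is a hypothetical helper and not part of the procedure above.

```shell
# Hypothetical helper: print (do not run) a yum command that would install
# every package named in the given file, one package name per line.
print_install_cmd() (
    list=$1
    if [ ! -s "$list" ]; then
        echo "nothing to install"
        exit 0
    fi
    xargs echo yum -y install < "$list"
)

# Example, on head01-temp after copying the list over:
#   print_install_cmd rpms-only-on-head01.txt
```

Printing first makes it easy to strip out packages that should not follow the host, such as drivers specific to the R610.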
Will need GPG keys for RPMS.
scp /etc/pki/rpm-gpg/* root@10.10.1.43:/etc/pki/rpm-gpg/
Will need to temporarily enable the
OSG and
EPEL repos to get
vo-client
,
voms
and
voms-clients
installed.
scp /etc/cron.d/* root@10.10.1.43:/etc/cron.d/
Make sure
AFS is chkconfig'ed on.
Need to add
/pnfs
mount to
/etc/fstab
:
head02.aglt2.org:/pnfs /pnfs nfs rw,hard,nfsvers=3 0 0
NOTE: We don't add the LABEL=pgsql-head01 mount since it is controlled by ZFS on new head01.
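Adding the mount line by hand is fine, but if the step is ever scripted it should not append a duplicate on a second run. A minimal sketch, with ensure_line as a hypothetical helper name:

```shell
# Hypothetical helper: append LINE to FILE unless an identical line
# is already present (exact, whole-line, fixed-string match).
ensure_line() (
    file=$1
    line=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        exit 0
    fi
    printf '%s\n' "$line" >> "$file"
)

# Example:
#   ensure_line /etc/fstab 'head02.aglt2.org:/pnfs /pnfs nfs rw,hard,nfsvers=3 0 0'
#   mkdir -p /pnfs
```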
Networking reconfig
The following locations need updating to move
head01-temp to
head01 on the network:
/etc/sysconfig/network-scripts
ifcfg-em1
ifcfg-em2
ifcfg-em3
The
em1 and
em2 (both 10G) participate in a
LACP
bonded configuration. The ifcfg-em3 (1G) is currently running as
head01-temp (10.10.1.43).
To prepare for the network change I made two subdirectories under
/etc/sysconfig/network-scripts
:
head01-temp
and
head01
.
I then copied all the ifcfg-* files into both and edited the copies in
head01
to match the network information for head01.local and head01.aglt2.org.
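The staging of the two directories can be sketched as follows. stage_ifcfg is a hypothetical helper; it only makes the copies, and the edits to the head01 set still have to be made by hand.

```shell
# Hypothetical helper: create head01-temp/ and head01/ under the given
# network-scripts directory and copy every ifcfg-* file into both.
stage_ifcfg() (
    dir=$1
    mkdir -p "$dir/head01-temp" "$dir/head01" || exit 1
    for f in "$dir"/ifcfg-*; do
        # Skip the unexpanded pattern (no matches) and anything not a file.
        [ -f "$f" ] || continue
        cp -p "$f" "$dir/head01-temp/" || exit 1
        cp -p "$f" "$dir/head01/" || exit 1
    done
    exit 0
)

# Example:
#   stage_ifcfg /etc/sysconfig/network-scripts
```

Keeping an untouched copy under head01-temp gives a way back to the current network identity if the switch has to be reverted.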
We also need to update:
/etc/sysconfig/network
and change
HOSTNAME=head01.aglt2.org
The
/etc/zfs/zpool.cache
may have something coded in it as well, so we might need to do:
zpool set cachefile='/etc/zfs/zpool.cache' pgsql
I created a script to move to the new network config:
[root@head01-temp network-scripts]# cat mv-net-to-head01.sh
#!/bin/bash
#
# Move network setup from current to head01.aglt2.org
#######################################################
# On SL6 the init script is named "network"
/sbin/service network stop
# Call /bin/cp directly to bypass any interactive "cp -i" alias
/bin/cp -f /etc/sysconfig/network-scripts/head01/* /etc/sysconfig/network-scripts/
/bin/cp -f /etc/sysconfig/network.head01 /etc/sysconfig/network
hostname head01.aglt2.org
/sbin/service network start
echo " Restarted network as head01.aglt2.org"
exit
#######################################################
Sequence to transition
| On head01(old) | On head01(new) |
| | chkconfig postgresql-9.3 and dcache-server off |
| dcache stop | |
| | Verify postgresql-9.3 running; wait ~2 minutes |
| service postgresql-9.3 stop | |
| | service postgresql-9.3 stop |
| Reconfigure host/network as head01-temp | |
| shutdown -h now | |
| | Once head01(old) is down, run mv-net-to-head01.sh |
| | Run cf-agent |
| | Move postgresql configuration for head01 into place |
| | reboot |
| | Verify ZFS properly started pgsql location (/nvme*) |
| | service postgresql-9.3 start; verify proper startup |
| | dcache start; check logfiles; verify proper startup |
| | chkconfig postgresql-9.3 and dcache-server to 'on' |
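The step "once head01(old) is down" can be guarded with a small polling loop so that the network move does not start while the old machine still answers. This is a sketch under assumptions: wait_until_down is a hypothetical helper, the probe command is whatever check is trusted locally, and OLD_HOST_ADDR in the example is a placeholder for an address the old machine still answers on.

```shell
# Hypothetical helper: run a probe command repeatedly until it fails.
# Usage: wait_until_down TRIES DELAY_SECONDS PROBE [ARG...]
# Returns 0 as soon as the probe fails (host considered down),
# or 1 if the probe is still succeeding after TRIES attempts.
wait_until_down() (
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if ! "$@" >/dev/null 2>&1; then
            exit 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    exit 1
)

# Example on the new host (OLD_HOST_ADDR is a placeholder):
#   wait_until_down 60 5 ping -c 1 -W 2 "$OLD_HOST_ADDR" \
#       && /etc/sysconfig/network-scripts/mv-net-to-head01.sh
```

A single failed ping is weak evidence, so it is still worth confirming on the console or iDRAC that the old machine has powered off.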
--
ShawnMcKee - 08 Jun 2015