Removing PNFS (Chimera) Ghosts
The Chimera DB can get out of sync with the files actually stored on disk. The t_dirs table holds the "tree" of the /pnfs namespace in Chimera, and the t_locationinfo table holds the physical location of each ipnfsid. "Ghosts" are entries that exist in t_dirs but have no corresponding location in t_locationinfo.
To find them I proceed as follows:
- Connect to the chimera DB with psql and write the ipnfsid values from both t_dirs and t_locationinfo to files (\o sends query output to the named file):
chimera=> \o /tmp/pnfsid_dirs.log
chimera=> select ipnfsid from t_dirs;
chimera=> \o /tmp/pnfsid_locationinfo.log
chimera=> select ipnfsid from t_locationinfo;
chimera=> \q
root@head02 ~# wc /tmp/pnfsid_dirs.log
3517723 3517723 94580872 /tmp/pnfsid_dirs.log
root@head02 ~# wc /tmp/pnfsid_locationinfo.log
2979423 2979423 79808840 /tmp/pnfsid_locationinfo.log
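- Edit the above output files to remove the column header, the dashed separator line, the "(N rows)" footer and any blank lines
- Sort the output files, since the comm utility on Linux requires sorted input (sort -u also drops duplicate IDs):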
sort -u /tmp/pnfsid_dirs.log > /tmp/pnfsid_dirs.log.sorted
sort -u /tmp/pnfsid_locationinfo.log > /tmp/pnfsid_locationinfo.log.sorted
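Check the unique counts. The t_dirs list shrinks noticeably because an ipnfsid appears in t_dirs once for every directory entry that refers to it: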
root@head02 ~# wc /tmp/pnfsid_dirs.log.sorted
3164344 3164344 84768704 /tmp/pnfsid_dirs.log.sorted
root@head02 ~# wc /tmp/pnfsid_locationinfo.log.sorted
2978103 2978103 79773966 /tmp/pnfsid_locationinfo.log.sorted
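The dump, the manual header cleanup and the sort can be collapsed into one command per table. The sketch below is an alternative, not what was run above: it relies on psql's -A (unaligned) and -t (tuples only) options, which drop the column header, separator and "(N rows)" footer, and it assumes psql reaches the chimera database with default credentials (add -h/-U options for your site). dump_sorted_ids is a hypothetical helper name. LC_ALL=C gives a plain byte-order sort; comm should later be run under the same locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump a query's ipnfsid column to a sorted, de-duplicated file.
# Assumes psql connects to the "chimera" database with default credentials.
dump_sorted_ids() (
    # Subshell body so pipefail does not leak into the caller's shell;
    # with pipefail a psql failure is reported even though sort succeeds.
    set -o pipefail
    local query=$1 outfile=$2
    # -A: unaligned output, -t: tuples only (no header, separator or footer)
    psql -At -d chimera -c "$query" | LC_ALL=C sort -u > "$outfile"
)

# Usage mirroring the manual steps above:
# dump_sorted_ids 'select ipnfsid from t_dirs'         /tmp/pnfsid_dirs.log.sorted
# dump_sorted_ids 'select ipnfsid from t_locationinfo' /tmp/pnfsid_locationinfo.log.sorted
```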
NOTE: At this point we have two sets of PNFSIDs to reconcile, BUT we can't simply assume that an ipnfsid present in t_dirs and NOT present in t_locationinfo is a "ghost", because directories never have an entry in t_locationinfo. So we first find the list of POSSIBLE "ghosts" and then remove any directories contained in it.
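We can get the list of directory PNFSIDs from t_inodes, where directories have itype=16384 (octal 040000, the POSIX S_IFDIR file type). Dump it as before, then remove the header and footer lines:
chimera=> \o /tmp/pnfsid_directories.log
chimera=> select ipnfsid from t_inodes where itype=16384;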
Next sort these:
sort -u /tmp/pnfsid_directories.log > /tmp/pnfsid_directories.log.sorted
Check the count:
root@head02 ~# wc /tmp/pnfsid_directories.log.sorted
176727 176727 4747829 /tmp/pnfsid_directories.log.sorted
We can use this list to remove directory entries from any potential "ghost" list.
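To check what an itype value denotes, the file-type bits can be masked out as in the sketch below. It assumes itype follows the POSIX st_mode encoding, which matches directories being 16384; itype_name and the S_IF* variables are illustrative names, not part of Chimera. This matters for the ghost list: only regular files (32768) are expected to have t_locationinfo rows, so other non-directory types, presumably including symbolic links (40960), would also pass the directory filter and may be worth checking for before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a t_inodes.itype value into a file type name.
# Assumes itype uses the POSIX st_mode file-type bits (S_IFMT mask 0170000).
S_IFMT=$((8#170000))   # 61440: mask selecting the file-type bits
S_IFDIR=$((8#040000))  # 16384: directory
S_IFREG=$((8#100000))  # 32768: regular file
S_IFLNK=$((8#120000))  # 40960: symbolic link

itype_name() {
    local fmt=$(( $1 & S_IFMT ))   # drop any permission bits
    if   (( fmt == S_IFDIR )); then echo dir
    elif (( fmt == S_IFREG )); then echo file
    elif (( fmt == S_IFLNK )); then echo symlink
    else echo other
    fi
}
```

For example, itype_name 16384 prints dir and itype_name 32768 prints file.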
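- Next get a list of POTENTIAL ghosts, i.e. IDs in t_dirs that are not in t_locationinfo (comm -2 -3 suppresses the lines found only in the second file and the lines common to both, leaving only the lines found only in the first):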
comm -2 -3 /tmp/pnfsid_dirs.log.sorted /tmp/pnfsid_locationinfo.log.sorted > /tmp/pnfsid_ghosts_v1.log
- Use the possible list of ghosts to create a final list of "ghosts" by removing the pnfsids of the directories:
comm -2 -3 /tmp/pnfsid_ghosts_v1.log /tmp/pnfsid_directories.log.sorted > /tmp/pnfsid_ghosts_final.log
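The two comm steps can also be chained into one pipeline, since comm reads standard input when a file argument is "-". A minimal sketch, with find_ghosts as a hypothetical name; it assumes all three inputs were sorted under the collation comm runs with, here LC_ALL=C, so files sorted under another locale should first be re-sorted with LC_ALL=C sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the two comm steps in one pipeline.
# All three inputs must be sorted under the same collation (LC_ALL=C here).
find_ghosts() {
    local dirs=$1 locations=$2 directories=$3
    # comm -23 keeps only lines unique to the first input:
    #   step 1: in t_dirs but not in t_locationinfo (potential ghosts)
    #   step 2: of those, drop directory PNFSIDs ("-" reads step 1 from stdin)
    LC_ALL=C comm -23 "$dirs" "$locations" | LC_ALL=C comm -23 - "$directories"
}
```

For example: find_ghosts /tmp/pnfsid_dirs.log.sorted /tmp/pnfsid_locationinfo.log.sorted /tmp/pnfsid_directories.log.sorted > /tmp/pnfsid_ghosts_final.log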
Here are the actual results:
root@head02 ~# wc /tmp/pnfsid_directories.log.sorted
176727 176727 4747829 /tmp/pnfsid_directories.log.sorted
root@head02 ~# comm -2 -3 /tmp/pnfsid_dirs.log.sorted /tmp/pnfsid_locationinfo.log.sorted > /tmp/pnfsid_ghosts_v1.log
root@head02 ~# wc /tmp/pnfsid_ghosts_v1.log
186261 186262 4995475 /tmp/pnfsid_ghosts_v1.log
root@head02 ~# comm -2 -3 /tmp/pnfsid_ghosts_v1.log /tmp/pnfsid_directories.log.sorted > /tmp/pnfsid_ghosts_final.log
root@head02 ~# wc /tmp/pnfsid_ghosts_final.log
9572 9573 249029 /tmp/pnfsid_ghosts_final.log
So we have 9572 "ghosts" in our PNFS space.
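As a cross-check, the same set can be computed inside PostgreSQL with an anti-join. This is a sketch under assumptions: it uses only the tables and columns named above, it connects with default credentials, and GHOST_SQL and list_ghosts_sql are hypothetical names. On an unchanged database, list_ghosts_sql | wc -l should agree with the line count of the final ghost file; a mismatch could point at stray header or blank lines left in the dumps, or at namespace changes between the two runs.

```shell
#!/usr/bin/env bash
# Hypothetical cross-check: compute the ghost list inside PostgreSQL.
# Assumes psql reaches the "chimera" database with default credentials.
GHOST_SQL="
SELECT DISTINCT d.ipnfsid
  FROM t_dirs d
 WHERE NOT EXISTS (SELECT 1 FROM t_locationinfo l WHERE l.ipnfsid = d.ipnfsid)
   AND NOT EXISTS (SELECT 1 FROM t_inodes i
                    WHERE i.ipnfsid = d.ipnfsid AND i.itype = 16384)"

# Print ghost PNFSIDs one per line, without header or footer (-A -t).
list_ghosts_sql() {
    psql -At -d chimera -c "$GHOST_SQL"
}
```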
We then use a simple Perl/DBI script to remove these entries from the t_dirs, t_level_2 AND t_inodes tables, in that order.
root@head02 ~# perl remove_pnfsid_ghosts.pl /tmp/pnfsid_ghosts_final.log
Starting at Fri May 22 15:38:18 2009
Processed 1 entries at Fri May 22 15:38:18 2009
Processed 101 entries at Fri May 22 15:40:09 2009
Processed 201 entries at Fri May 22 15:42:10 2009
...
Processed 9001 entries at Fri May 22 18:36:03 2009
Processed 9101 entries at Fri May 22 18:38:01 2009
Processed 9201 entries at Fri May 22 18:39:59 2009
Processed 9301 entries at Fri May 22 18:41:57 2009
Processed 9401 entries at Fri May 22 18:43:54 2009
Processed 9501 entries at Fri May 22 18:45:52 2009
Finished deleting 9571 records from CHIMERA DB at Fri May 22 18:47:17 2009
The script looks like:
#!/usr/bin/perl
#
# remove_pnfsid_ghosts.pl - This script reads from a file the list
# of ghost pnfsids to be removed from the chimera DB
#
# Shawn McKee <smckee@umich.edu> on May 22, 2009
####################################################
use strict;
use warnings;
use DBI;
use DBD::Pg;

# The input file contains a list of PNFSIDs (1/line) of ghost PNFS entries
# to be removed from the set of CHIMERA tables.
my $infile = shift(@ARGV) or die "Usage: $0 <file of ghost PNFSIDs>\n";

# RaiseError => 1 makes any failing DBI call die with the database error,
# so prepare/execute need no explicit "or die" checks.
my $dbh = DBI->connect("DBI:Pg:dbname=chimera;host=head02.aglt2.org", "<user>", "",
                       { RaiseError => 1 });

# Rows are deleted from t_dirs, t_level_2 and t_inodes, in that order.
my $sth1 = $dbh->prepare('delete from t_dirs where ipnfsid=?');
my $sth2 = $dbh->prepare('delete from t_level_2 where ipnfsid=?');
my $sth3 = $dbh->prepare('delete from t_inodes where ipnfsid=?');

print " Starting at " . localtime(time()) . "\n";
open(my $in, '<', $infile) or die "Unable to open $infile: $!\n";
my $cnt = 0;
while (my $line = <$in>) {
    # Use the first non-blank token; skip blank lines instead of reusing a stale $1.
    next unless $line =~ /(\S+)/;
    my $pnfsid = $1;
    if (($cnt++ % 100) == 0) {
        print " Processed $cnt entries at " . localtime(time()) . "\n";
    }
    $sth1->execute($pnfsid);
    $sth2->execute($pnfsid);
    $sth3->execute($pnfsid);
}
close($in);
$dbh->disconnect;
print " Finished deleting $cnt records from CHIMERA DB at " . localtime(time()) . "\n";
exit 0;
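The run above took a little over three hours (15:38:18 to 18:47:17, about 11,340 s for 9,571 IDs, roughly 1.2 s each). One plausible cause, not verified here, is that t_dirs has no index on ipnfsid, so every per-ID delete scans the table. A set-based alternative is sketched below: it loads the ghost list into a temporary table with psql's \copy and issues the three deletes, in the same order as the script, inside one transaction, so either all of them apply or none do. Assumptions: the input file holds exactly one bare PNFSID per line with no header, footer or leading spaces; psql connects with default credentials; delete_ghosts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical set-based alternative to remove_pnfsid_ghosts.pl.
# Assumes one bare PNFSID per line in the input file, and psql reaching
# the "chimera" database with default credentials.
delete_ghosts() {
    local ghost_file=$1
    [ -r "$ghost_file" ] || { echo "cannot read $ghost_file" >&2; return 1; }
    # ON_ERROR_STOP=1: psql exits on the first error, COMMIT is never reached
    # and the open transaction is rolled back.
    psql -v ON_ERROR_STOP=1 -d chimera <<EOF
BEGIN;
CREATE TEMP TABLE ghosts (ipnfsid text PRIMARY KEY);
\copy ghosts FROM '$ghost_file'
-- Same order as the Perl script: t_dirs, t_level_2, then t_inodes.
DELETE FROM t_dirs    WHERE ipnfsid IN (SELECT ipnfsid FROM ghosts);
DELETE FROM t_level_2 WHERE ipnfsid IN (SELECT ipnfsid FROM ghosts);
DELETE FROM t_inodes  WHERE ipnfsid IN (SELECT ipnfsid FROM ghosts);
COMMIT;
EOF
}
```

For example: delete_ghosts /tmp/pnfsid_ghosts_final.log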
--
ShawnMcKee - 22 May 2009