03/25/2008 - Replaced DIMM B, 1B and DIMM B, 1A with the DIMMs from the same slots in oc-1-23. Watching for new alerts here; also watch oc-1-23.
03/24/2008 - Machine check exception. I believe a RAM chip is bad.
...
Machine check events logged
Machine check events logged
Machine check events logged
Machine check events logged
...
Then I ran mcelog:
[umopt1:~]# mcelog --dmi
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 5163806bf8f1c
ADDR 1780b20a0
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 3208
bit40 = error found by scrub
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d404410032080a13 MCGSTATUS 0
Resolving address 1780b20a0 using SMBIOS
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
Memory device 39 for address 1780b20a0 too short 18 expected 27
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 516856b26bdbd
ADDR 1780b20a0
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 3208
bit40 = error found by scrub
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d404410032080a13 MCGSTATUS 0
Resolving address 1780b20a0 using SMBIOS
Memory device 39 for address 1780b20a0 too short 18 expected 27
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 516d35293049c
ADDR 1780b20a0
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 3208
bit40 = error found by scrub
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d404410032080a13 MCGSTATUS 0
Resolving address 1780b20a0 using SMBIOS
Memory device 39 for address 1780b20a0 too short 18 expected 27
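
To cross-check the decode, here is a minimal Python sketch that pulls the same flags out of the STATUS value by hand. The meanings of bits 40, 46 and 62 are taken from the mcelog output above; the syndrome layout (low byte in bits 54:47, high byte in bits 31:24) is my assumption about the AMD K8 northbridge status register, and the function names are made up for illustration.

```python
# Sketch: decode selected fields of the MCE STATUS value from the log above.
# Bit meanings for 40, 46 and 62 come from the mcelog output itself.
# The syndrome layout is an ASSUMPTION (AMD K8 northbridge status register).

STATUS = 0xD404410032080A13  # copied from the "STATUS" line of each record


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return (value >> n) & 1 == 1


def chipkill_syndrome(status: int) -> int:
    """Assumed layout: low byte in bits 54:47, high byte in bits 31:24."""
    low = (status >> 47) & 0xFF
    high = (status >> 24) & 0xFF
    return (high << 8) | low


def decode(status: int) -> dict:
    """Collect the flags and the syndrome that mcelog reported."""
    return {
        "found_by_scrub": bit(status, 40),
        "corrected_ecc": bit(status, 46),
        "overflow": bit(status, 62),
        "syndrome": format(chipkill_syndrome(status), "x"),
    }


if __name__ == "__main__":
    for name, value in decode(STATUS).items():
        print(f"{name}: {value}")
```

All three records carry the same STATUS and the same ADDR (1780b20a0), so one decode covers them; repeated corrected errors at a single address are consistent with one failing memory location rather than scattered faults.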
-- BenMeekhof - 24 Mar 2008