03/25/2008 - Replaced DIMM B, 1B and DIMM B, 1A with the same DIMMs from oc-1-23. Watching for new alerts here; also watch oc-1-23.


03/24/2008 - Machine check exception. Suspect a bad RAM chip.


...

Machine check events logged

Machine check events logged

Machine check events logged

Machine check events logged
... 

Then I ran mcelog:

[umopt1:~]# mcelog --dmi

MCE 0

HARDWARE ERROR. This is *NOT* a software problem!

Please contact your hardware vendor

CPU 2 4 northbridge TSC 5163806bf8f1c 

ADDR 1780b20a0 

  Northbridge Chipkill ECC error

  Chipkill ECC syndrome = 3208

       bit40 = error found by scrub

       bit46 = corrected ecc error

       bit62 = error overflow (multiple errors)

  bus error 'local node response, request didn't time out

      generic read mem transaction

      memory access, level generic'

STATUS d404410032080a13 MCGSTATUS 0

Resolving address 1780b20a0 using SMBIOS

WARNING: SMBIOS data is often unreliable. Take with a grain of salt!

Memory device 39 for address 1780b20a0 too short 18 expected 27

MCE 1

HARDWARE ERROR. This is *NOT* a software problem!

Please contact your hardware vendor

CPU 2 4 northbridge TSC 516856b26bdbd 

ADDR 1780b20a0 

  Northbridge Chipkill ECC error

  Chipkill ECC syndrome = 3208

       bit40 = error found by scrub

       bit46 = corrected ecc error

       bit62 = error overflow (multiple errors)

  bus error 'local node response, request didn't time out

      generic read mem transaction

      memory access, level generic'

STATUS d404410032080a13 MCGSTATUS 0

Resolving address 1780b20a0 using SMBIOS

Memory device 39 for address 1780b20a0 too short 18 expected 27

MCE 2

HARDWARE ERROR. This is *NOT* a software problem!

Please contact your hardware vendor

CPU 2 4 northbridge TSC 516d35293049c 

ADDR 1780b20a0 

  Northbridge Chipkill ECC error

  Chipkill ECC syndrome = 3208

       bit40 = error found by scrub

       bit46 = corrected ecc error

       bit62 = error overflow (multiple errors)

  bus error 'local node response, request didn't time out

      generic read mem transaction

      memory access, level generic'

STATUS d404410032080a13 MCGSTATUS 0

Resolving address 1780b20a0 using SMBIOS

Memory device 39 for address 1780b20a0 too short 18 expected 27
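
All three events report the same ADDR and the same STATUS value, so one decode covers them. As a cross-check on the mcelog output, here is a minimal Python sketch that pulls the reported flag bits and the Chipkill syndrome out of STATUS d404410032080a13. The flag names are copied from the output above. The syndrome layout (low byte in bits 47-54, high byte in bits 24-31) is an assumption based on the K8-style northbridge status register, not something stated on this page; it does reproduce the 3208 that mcelog printed. The function names are hypothetical.

```python
# Sketch: decode the STATUS value from the mcelog records above.
STATUS = 0xD404410032080A13

# Flag bits exactly as named in the mcelog output.
FLAGS = {
    40: "error found by scrub",
    46: "corrected ecc error",
    62: "error overflow (multiple errors)",
}


def decode_flags(status):
    """Return the names of the known flag bits that are set in status."""
    return [name for bit, name in sorted(FLAGS.items()) if (status >> bit) & 1]


def chipkill_syndrome(status):
    """Assemble the 16-bit Chipkill syndrome.

    ASSUMPTION: K8-style layout, with the low syndrome byte in
    bits 47-54 and the high syndrome byte in bits 24-31.
    """
    low = (status >> 47) & 0xFF
    high = (status >> 24) & 0xFF
    return (high << 8) | low


if __name__ == "__main__":
    for name in decode_flags(STATUS):
        print(name)
    print("Chipkill ECC syndrome = %x" % chipkill_syndrome(STATUS))
```

Run against the value above, this lists the same three flags mcelog reported and a syndrome of 3208. A corrected, scrub-detected error that keeps recurring at one address with the overflow bit set is consistent with a failing DIMM rather than a one-off event.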


 

-- BenMeekhof - 24 Mar 2008
Topic revision: r2 - 25 Mar 2008, BenMeekhof