Channel: Business PCs, Workstations and Point of Sale Systems topics

Z620 2xCPU 96GB: Kernel reports MCE memory errors after two days of running


Hi everyone,

 

HP Z620 (158A) with 2x E5-2670 and 96 GB (12x 8 GB) of RAM, latest BIOS, Ubuntu Linux.

 

I'm seeing strange behavior on this system, which I use as a homelab server. Maybe someone has solved this before.

 

After about two days of uptime, the kernel starts reporting MCE errors like this:

EDAC MC0: 32024 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x1044 offset:0x840 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)
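
To make the line above easier to compare across reboots, here is a minimal Python sketch that splits it into fields. The regular expressions are my own reading of this one sample line, not an official EDAC log grammar, and the function name `parse_edac_line` is hypothetical.

```python
import re

# Sample line copied verbatim from the kernel log quoted above.
LINE = (
    "EDAC MC0: 32024 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 "
    "(channel:1 slot:0 page:0x1044 offset:0x840 grain:32 syndrome:0x0 - OVERFLOW "
    "area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)"
)

# Assumed layout: "EDAC MC<n>: <count> <CE|UE> <message> on <label> (<details>)".
HEADER = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) "
    r"(?P<msg>.+?) on (?P<label>\S+) \((?P<detail>.*)\)"
)
# key:value pairs inside the parentheses, e.g. "channel:1" or "err_code:0001:0091".
PAIR = re.compile(r"(\w+):(\S+)")


def parse_edac_line(line):
    """Return a dict of fields from one EDAC message, or None if it does not match."""
    m = HEADER.search(line)
    if m is None:
        return None
    detail = m.group("detail")
    return {
        "mc": int(m.group("mc")),          # memory controller index
        "count": int(m.group("count")),    # error count reported in this message
        "kind": m.group("kind"),           # CE = corrected, UE = uncorrected
        "label": m.group("label"),         # DIMM label as printed by the driver
        "overflow": "OVERFLOW" in detail,  # overflow flag present in the details
        "fields": dict(PAIR.findall(detail)),
    }


info = parse_edac_line(LINE)
print(info["kind"], info["count"], info["label"], info["fields"]["socket"])
```

For the sample line this reports a corrected error (CE) count of 32024 on channel 1, slot 0 of socket 0, with the overflow flag set.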

 

A reboot helps, and the system then runs for another two days.

 

I tried swapping the memory modules around, but the DIMM location in the message remains the same. I think the modules themselves are completely fine at this point.
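
One way to check whether the errors follow a module or stay with a slot is to record the kernel's EDAC counters before and after each swap. The sketch below assumes the EDAC driver is loaded and exposes `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mc*/`; the function name `read_error_counts` is hypothetical, and the root path is a parameter so it can be pointed elsewhere.

```python
from pathlib import Path

# Default sysfs location of the EDAC memory controller entries (assumed present).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_error_counts(root=EDAC_ROOT):
    """Return {"mc0": {"ce_count": n, "ue_count": n}, ...} for each controller found."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc_dir / name
            if counter.is_file():
                # Each counter file holds a single decimal number.
                entry[name] = int(counter.read_text().strip())
        counts[mc_dir.name] = entry
    return counts
```

Calling `read_error_counts()` once after boot and again when the errors appear shows which memory controller accumulates corrected errors, without relying on the log messages alone.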

 

Memtest86 reports no errors.

 

If I remove the riser board that carries the second CPU and its memory (it's detachable), no errors are reported for a week (I have not tested longer). But losing one CPU seems like overkill just to get rid of this issue.

 

My next step was to add 'mce=ignore_ce' to the boot options. As designed, the MCE errors are no longer reported. But after two days I noticed that overall system performance had degraded drastically.

For example, an app that starts in 16 seconds on a freshly booted system takes 101 seconds to start after two days. The system was 99% idle before I started it.
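
To put a number on the slowdown and track it over the two days, a small timing helper can be run periodically. This is only a sketch: `time_command` and `slowdown` are hypothetical names, and it measures a command that runs to completion, which only approximates the startup time of an interactive app.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def slowdown(baseline_s, current_s):
    """Return how many times slower the current run is than the baseline."""
    return current_s / baseline_s


# The figures from the post: 16 s on a fresh boot, 101 s after two days.
print(f"slowdown: {slowdown(16, 101):.1f}x")

# Example measurement with a trivial command (the Python interpreter itself).
print(f"elapsed: {time_command([sys.executable, '-c', 'pass']):.3f} s")
```

With the numbers above, the app start is roughly 6.3 times slower after two days of uptime.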

 

Now I'm puzzled. What should I try next?

