
Question WHEA Logger 18 crashes

Nov 11, 2022
Computer Type: Desktop
GPU: AMD RX 5700
CPU: RYZEN 5 3600 6 CORE 12 THREADS
Motherboard: MSI B450 TOMAHAWK MAX (MS-7C02)
BIOS Version: 3.F0 07/23/2022
RAM: CORSAIR VENGEANCE® RGB PRO 16GB (2 x 8GB) DDR4 DRAM 3200MHz C16
PSU: SilverStone Essential Gold ET650-HG 650W
Operating System & Version: Windows 11 Pro 22H2 Build 22621.819
GPU Drivers: 21.50.21.11-220428a-382767C-AMD-Software-Adrenalin-Edition
Chipset Drivers: AMD B450 CHIPSET DRIVERS VERSION 4.09.23.507
Background Applications: LGHUB, iCUE, AMD Adrenalin, HWINFO64, Malwarebytes
Description of Original Problem: Whenever I play any game, the whole system crashes at random. It can happen after one hour of playing or after a few days. Every time it's WHEA Logger 18 with a random APIC ID (it has crashed on basically every core ID). The PC was built in 2019; these errors started at the beginning of this year.
Troubleshooting: I have tried every software fix, such as sfc. I also turned off CPU overclock and PBO. Temps are always stable, around 65 °C. The GPU is overclocked to 1850 MHz at 1100 mV max, and the VRAM is at 1860 MHz. GPU temps are also stable, around 75 °C. I ran all OCCT tests without errors. I also ran MemTest86 for multiple hours without errors. I also tried changing RAM timings and voltage. RAM is currently running at the default 2666 MHz, no XMP. At this point I am trying to determine which hardware component is responsible for the errors. I suspect it's the RAM or CPU, but I am not 100% sure. Any suggestions on what I should try next?
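To check whether the WHEA Logger 18 events really are spread across every core rather than clustered on one or two, the APIC IDs can be tallied from the event messages. The following is only a minimal sketch: it assumes each event message contains a line of the form `Processor APIC ID: <number>`, and the function name `count_apic_ids` and the sample messages are hypothetical, not taken from the logs in this thread. A roughly even spread is a hint, not a diagnosis.

```python
import re
from collections import Counter

# Assumed line format inside each event message; adjust if your export differs.
APIC_ID_PATTERN = re.compile(r"Processor APIC ID:\s*(\d+)")


def count_apic_ids(messages):
    """Count how often each APIC ID appears across the given event messages."""
    counts = Counter()
    for message in messages:
        match = APIC_ID_PATTERN.search(message)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample messages for illustration only.
    sample = [
        "A fatal hardware error has occurred.\nProcessor APIC ID: 4",
        "A fatal hardware error has occurred.\nProcessor APIC ID: 9",
        "A fatal hardware error has occurred.\nProcessor APIC ID: 4",
    ]
    for apic_id, n in sorted(count_apic_ids(sample).items()):
        print(f"APIC ID {apic_id}: {n} event(s)")
```

Paste or load the message text of each event 18 entry into the list; messages without a matching line are simply skipped.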
 
How long has the GPU been overclocked, and at what voltage?
 

Pretty much since the start. The OC is:
1850 MHz clock speed at 1100 mV (it was also set to 1200 mV in the past, with the same behavior). VRAM at 1860 MHz. Power limit +20%. The temperature was never above 80 °C; most of the time it's around 70 °C. Do you think the WHEA crashes are from the GPU OC?
 
What happens if you put the GPU back to default settings?
I will try it tomorrow. But it would still be weird for a normal GPU OC to cause a system crash. It also crashes randomly, and most of the time the GPU is not at full load. I have also run multiple benchmarks after the OC without issue.
 
Also, is the VRAM hotter than the hot spot?
Not always, but it is possible for the VRAM to be hotter than the hot spot. It depends on where the hot spot sensor is on this specific model. The best way I know to test whether the VRAM is fried is a heavy benchmark that runs a focused test on each part of the GPU, one that doesn't just stress the GPU core but also the CPU, VRAM, and fans, stressing the whole card and giving feedback. I think OCCT might have one that can do it, but I'm not sure anymore; it's been a while since I looked into this type of testing.
 
I have run OCCT for one hour (the limit for the free version) without problems. I can try it again, maybe more than once.
 

This is the only thing I'm finding on stressing your vram right now
 

This software only uses 1 GB of VRAM for me (it's from 2009). I have run another OCCT VRAM test without issue.
 
I don't think the GPU is overheating. It's mostly around 65 °C, and 75 °C for brief moments (I have Radeon Chill turned on) in GPU-intensive games. 80 °C was reached only during the OCCT benchmark. I also did some more research, and I suspect a faulty PSU could also be causing the random crashes.
 
I was thinking it could be PSU-related in the beginning; however, I highly doubt it. Your PSU is a high-quality unit. That doesn't mean it can't be faulty, but it's generally less likely to be.
 
I will try a different PSU for now and see if it really is a bad PSU. Actually, a bad PSU is the only thing that makes sense to me; no other component is really showing any type of problem.