Question Asrock X399 System Crash

Jun 22, 2019
6
0
10
Hi all,

I just built a new PC system and I'm having big issues with system stability. The system crashes in different situations (sometimes at OS start, or after 2 hours even when there's no application running).

I get the following different error codes (bluescreen): PAGE_FAULT_IN_NONPAGED_AREA, DRIVER_IRQL_NOT_LESS_OR_EQUAL, PFN_LIST_CORRUPT or MEMORY_MANAGEMENT
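For reference when reading the minidump analysis: a debugger such as WinDbg typically reports the crash as a hexadecimal bugcheck code (for example `BugCheck 1A`) rather than by name. This is a minimal Python sketch that maps the four stop codes named above to their numbers (0x50, 0xD1, 0x4E, 0x1A); the helper name `bugcheck_name` is made up for illustration.

```python
# Stop codes for the four bluescreens listed above
# (values from Microsoft's bug check code reference).
BUGCHECK_NAMES = {
    0x1A: "MEMORY_MANAGEMENT",
    0x4E: "PFN_LIST_CORRUPT",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def bugcheck_name(code: str) -> str:
    """Translate a hex bugcheck code such as '1A' or '0x0000001a' into its name."""
    value = int(code, 16)  # with base 16, int() also accepts an optional '0x' prefix
    return BUGCHECK_NAMES.get(value, f"unknown (0x{value:X})")


if __name__ == "__main__":
    for code in ["50", "0xD1", "0000004e", "1a", "3b"]:
        print(code, "->", bugcheck_name(code))
```

Codes that aren't in the table are reported as unknown instead of raising an error, so the lookup can be run over any list of codes copied from the dump output.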

These components are built into the system:

  • Asrock Taichi X399 Bios Ver. P3.50
  • AMD Threadripper 1920X
  • G.Skill 32GB Quad-Channel Mode (F4-2400C15Q-32GFT)
  • AMD Radeon RX Vega64
  • Power: be quiet powerzone 850W
  • OS: Windows 10 Pro on a Kingston 240GB SSD via SATA
Today I reinstalled the whole system and changed to a brand-new SSD (before, there was a SanDisk). Then I installed all the drivers from the Asrock website, installed the newest Radeon Vega64 driver and checked all drivers for updates.
I ran some stress tests on the CPU, GPU and RAM and everything works fine. I also did a system file check (command: sfc /scannow) and a memory check (command: mdsched) and there were no problems.
But when I restart or wait for 2-3 hours, the system crashes and randomly shows one of the bluescreens I mentioned above.

After a reboot, Windows shows an info message that the "Radeon Wattman settings were restored to default because of a system crash".

Here are the mini dump files to download: http://s000.tinyupload.com/?file_id=54936420509262466246

I really hope you experts can help me!
 

PC Tailor

Glorious
Ambassador
Welcome to the forums my friend!

Can you confirm if the issue occurs in safe mode?

As for the Radeon Wattman Settings error:
This can technically occur for a number of reasons.

How new is the GPU? How old is the PSU?
Do you have latest BIOS installed?
Do you have latest GPU drivers installed?
Do you have latest Windows updates installed?
A poor quality or defective PSU can also cause these issues if it is not supplying adequate power to the system.
A defective GPU can also cause these issues, but we should eliminate software / firmware first.

Since you have multiple dump files, @gardenman will be able to run them quickly (I also happen to be getting some symbol errors on this debug). I will continue to debug when I get a chance, though.
 
Jun 22, 2019
6
0
10
Welcome to the forums my friend!

Can you confirm if the issue occurs in safe mode?

As for the Radeon Wattman Settings error:


Since you have multiple dump files, @gardenman will be able to run them quickly (I also happen to be getting some symbol errors on this debug). I will continue to debug when I get a chance, though.
Thank you for your answers. I can't confirm yet whether the issue occurs in safe mode, but I will test it today for a couple of hours.

  • The GPU and PSU are brand-new (like all components; I built a completely new system)
  • The latest BIOS was pre-installed by the manufacturer
  • The latest GPU drivers are installed; I also tried an older driver version and did a clean uninstallation in safe mode
  • How can I check whether the PSU is supplying adequate power?

I'm curious about the dump files; hopefully they contain information about the issue.
 

PC Tailor

Glorious
Ambassador
Thank you for your answers. I can't confirm yet whether the issue occurs in safe mode, but I will test it today for a couple of hours.
I will await your feedback regarding safe mode then.

the latest BIOS was pre-installed by manufacturer
What version is this BIOS?

How can I check whether the PSU is supplying adequate power?
The most effective way to test this is to simply replace the PSU with a known good-quality unit and see if the issue persists.
There are some alternative methods, but they don't guarantee that the PSU is working; they just help identify anything obvious:
  • You can use a multimeter, power up the PSU separately from the system and test the voltage output of each rail. Each rail should be within a +/- 5% tolerance. However, this only demonstrates voltages at idle, and many issues can occur under load, so this does not capture that.
  • You can also use software such as HWiNFO to check the voltages of each rail under load and check for the same tolerance. However, software is not always accurate, and once again, if the voltages are OK, it doesn't mean the PSU is OK, just that the voltages appear to be stable.
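The tolerance check described in both methods can be sketched in Python, assuming the usual nominal values of the main ATX rails (+12 V, +5 V, +3.3 V) and the +/- 5% window quoted here. The readings in the example are placeholders you would copy by hand from a multimeter or from the monitoring software, and the function names are made up for illustration. As said above, a pass only means the voltages look stable, not that the PSU is healthy.

```python
# Assumed nominal voltages of the main ATX rails (-12 V and 5VSB left out).
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05  # the +/- 5% window quoted in the post


def rail_window(nominal: float, tolerance: float = TOLERANCE) -> tuple[float, float]:
    """Return the (low, high) voltage limits for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def check_rails(readings: dict[str, float], tolerance: float = TOLERANCE) -> dict[str, bool]:
    """Map each measured rail to True if it sits inside its tolerance window."""
    results = {}
    for rail, measured in readings.items():
        low, high = rail_window(NOMINAL_RAILS[rail], tolerance)
        results[rail] = low <= measured <= high
    return results


if __name__ == "__main__":
    # Placeholder readings, e.g. noted down while the system is under load.
    readings = {"+12V": 12.10, "+5V": 4.70, "+3.3V": 3.34}
    for rail, ok in check_rails(readings).items():
        low, high = rail_window(NOMINAL_RAILS[rail])
        status = "OK" if ok else "OUT OF TOLERANCE"
        print(f"{rail}: {readings[rail]:.2f} V (allowed {low:.2f}-{high:.2f} V) {status}")
```

Taking several readings at idle and under load and running each set through `check_rails` makes it easier to spot a rail that sags only when the system is stressed.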
 

PC Tailor

Glorious
Ambassador
To add to this, I am attempting to debug your files and currently have a lot of kernel symbol errors, so I will also wait to see whether @gardenman encounters the same issue.
This could well be corruption in the dump file, an issue on my debugging side, or potentially a hardware error on your system that has caused the modules to become a bit flaky.

This could well coincide with the fact that you are having multiple different BSOD errors, which can be indicative of hardware problems.
It may be worth running memtest to verify the integrity of your RAM modules.
 
Jun 22, 2019
6
0
10
I will await your feedback regarding safe mode then.
The system has now run for two hours in safe mode without any problems, but I'll wait another 4 hours just to be sure.

What version is this BIOS?
BIOS Version is Asrock P3.50 (latest version)
The most effective way to test this is to simply replace the PSU
The PSU I use is known for good quality, and the fact that it is new makes it very unlikely that the issue is caused by the PSU. But I will definitely check it if we don't find the issue somewhere else.
I am attempting to debug your files and currently have a lot of kernel symbol errors
Yes, there are several kernel errors; I can see them in the Event Viewer (eventvwr.exe). Most of them have the source Kernel-Power (Event ID 41).
It may be worth running memtest to verify the integrity of your RAM modules.
I ran memtest86+ yesterday and it didn't find any errors, but I'll run it again tonight, just to be sure.
 

gardenman

Splendid
Moderator
I also got multiple symbol errors for all of the files. I made a quick program to remove the errors from the output. Results with the errors removed: https://pste.eu/p/vUqo.html There's some useful info there: system information, driver list and bugcheck errors.
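The filter program itself wasn't posted, so this is only a guess at the general idea: a Python sketch that drops symbol complaints from saved debugger output. It assumes those complaints are the lines starting with `*** ERROR:` or `*** WARNING:` (such as `*** WARNING: Unable to verify timestamp for ntoskrnl.exe`); `strip_symbol_noise` and `clean_file` are hypothetical names, and other noise lines would need their own patterns.

```python
import re
from typing import Iterable, Iterator

# Assumption: the symbol complaints to hide are the lines that the debugger
# prefixes with "*** ERROR:" or "*** WARNING:".
SYMBOL_NOISE = re.compile(r"^\s*\*\*\* (ERROR|WARNING):")


def strip_symbol_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that are not symbol warnings or errors."""
    for line in lines:
        if not SYMBOL_NOISE.match(line):
            yield line


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy a saved analysis text file, leaving out the symbol noise."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(strip_symbol_noise(src))
```

Usage would be something like `clean_file("analysis.txt", "cleaned.txt")` on the text saved from the debugger. Filtering line by line keeps the bugcheck, driver list and system information intact while removing only the matched warnings.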

Possible Motherboard page: https://www.asrock.com/mb/AMD/X399 Taichi/index.asp
It appears you have the latest BIOS already installed, version 3.5.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.
 

PC Tailor

Glorious
Ambassador
The PSU I use is known for good quality and the fact that it is new makes it very unlikely that the issue is caused by the PSU
Not necessarily. The Powerzone PSUs are probably around Tier 4-ish, so not great, but certainly not awful. The internals of the Powerzone are well assembled, and it uses Fortron as its OEM, who are fairly good but make a wide range of products. The Teapo capacitors are Tier 2-ish, so they're usually used in mid-tier units, which is roughly where the Powerzone sits, from my understanding.

Not suggesting it is the PSU, but it can be, and I've had plenty of new PSUs turn out faulty, so it's something to consider, as you said.

Yes, there are several kernel errors; I can see them in the Event Viewer (eventvwr.exe). Most of them have the source Kernel-Power (Event ID 41).
Apologies, the kernel symbol errors are unrelated to the Event Viewer. As gardenman also mentioned, to debug the .dmp files we have to access various Microsoft symbols in order to interpret the modules leading up to the crash. The kernel modules are basically your core OS modules, and symbols are what the debuggers use to identify them. Both gardenman and I had issues with the symbols in your dump files, which may suggest corruption.

If the issue does not occur at all in safe mode, then it could be software. So we'll have to see.
It may also be worth running HD Sentinel to verify your storage drives, as Windows uses part of your storage as (virtual) memory.
 
Jun 22, 2019
6
0
10
OK, now it's getting interesting. I wanted to do a CLEAN installation of the GPU driver and used DDU from Guru3D to uninstall the drivers in safe mode. After installing the latest AMD Vega64 driver, the screen changed to this:

[Screenshot: 3j0IbNX.jpg]


When I restart, it looks normal again, but this can't be right!?
 
Jun 22, 2019
6
0
10
I don't think it's the GPU. I replaced it with an older GeForce GTX 560 Ti and after a while the system crashed again (same error codes).

I then removed 2x8GB of RAM from two slots so that only dual-channel memory (2x8GB) is active, and now the system runs stable for hours. I swapped in the other 2x8GB pair and got the same result.

But when I reinstall all four modules (4x8GB in quad-channel mode), the system crashes again. I have two questions now:

  • Maybe the RAM simply isn't compatible in quad-channel mode (even though Asrock says it is)?
  • Or is there a major problem with the Threadripper CPU, with one of the memory channels broken?
 

PC Tailor

Glorious
Ambassador
Occurring in safe mode usually does mean it is hardware.
The dumps gardenman has run also show a lot of symbol errors (potentially from corruption) and do point to memory corruption. The drive can also be considered part of memory; however, if HD Sentinel checks out, it at least reduces the likelihood that the drive is the cause.

If swapping the RAM modules around appears to have worked, note that some chipsets can struggle to handle quad-channel memory. Are you running any overclock anywhere at all?

It may also be worth running memtest and checking the results.