[SOLVED] New build fails stress test at stock

Alexis81

Feb 29, 2020
Hi

I wonder if I could get some advice. I have the following new components:

Gigabyte Z390 UD
i5 9600K - Also tried 9600KF (see below)
2x8GB Crucial Ballistix 3200 (run at stock 2400 for testing)
Corsair CX650 PSU
WD Blue NVMe 1TB
Arctic Freezer 34 CO
Win 10 Pro x64

Also using a used GTX 1080 and 2 used Crucial MX500 SSDs (all working fine with my old machine)

Only about a week old, no issues in games etc so far, no overclock has been applied at any point.

I wanted to see if I could overclock, so I tested stability and temps at stock first. Prime95 fails the small FFT test on one core almost instantly, and it has also caused instant lock-ups and a blue screen. The Blend test fails when it gets to the small FFT part. LinX fails within 10 minutes. Temps are OK; the highest core temp I've seen is 71°C.

I assumed the CPU was bad as it was always the same core that failed in Prime95, so I ordered another (a 9600KF this time due to price). Dropped it in, cleared the CMOS, and the same issue happens. It seems to be a different core failing in Prime95 now, and I have seen failures on more than one core. My LinX test on the new CPU failed on the first run, within 10 seconds.

When Prime95 fails, the other cores seem to continue OK, but I haven't left it going for more than about 20 minutes after the first failure.

The latest BIOS for the board (F9) is installed. CMOS reset multiple times. I have tested the RAM with the Windows memory test and memtest64 for about 30 minutes without issues. Clean Win 10 install.

My next instinct is to try another motherboard. I don't have another, so I will be getting one delivered. Is it worth going for a better board, with better VRM?

Seems unlikely to be the PSU as no issues occur running the GTX 1080 under full load while gaming. Can't think what else could be wrong!

Any input would be appreciated.

Many thanks

Alexis
 
Solution
OH - you probably need to increase the cpu current limit in bios. It should be in the advanced cpu power settings somewhere.
The default 100% power limit is still being hit, even with AVX disabled, it just takes a little longer. You shouldn't need a crazy number like 200% current - 140-150% should be enough.
Take one stick of RAM out and test again. Then swap for the other stick and test just that one.

Thanks, I just tried that, testing each stick independently. Instant Prime95 error on one core with the first stick. Blue screen with the other stick after a few seconds of P95.

I might be wrong but doesn't the small FFT test just use the CPU cache, rather than the RAM?
 
I agree it was worth testing the RAM. I had a look at the socket when replacing the CPU, couldn't see any damage. I guess I'll just try a different board.
 
Got a new motherboard, a Z390 Aorus Pro this time, and still have the same issue. More frequent bluescreens with LinX/P95 now, it seems: MACHINE_CHECK_EXCEPTION or CLOCK_WATCHDOG_TIMEOUT, usually within a minute.

I got my old PSU out and tried that but the same happens. Tried another fresh Windows install on another SSD + cable with iGPU only but still the same problem. No issues with temps... I'm completely stumped. 😕
 
As I was reading through some of your posts...
-I would suggest not using LinX - ditto for Prime95 with AVX enabled. The loads they put on the cpu and VRMs just aren't practical.
I'd bet the old Z390 UD's VRMs were being strangled when you were still using it. Prime 95 AVX disabled, and Cinebench R20 'infinite loop' are more reasonable cpu stress tests.
I tried LinX once... for about 10 seconds. Never looked at it again.

-Then again, maybe you were already running AVX disabled? You didn't really specify; it's enabled by default, if I recall correctly.

-What is your case?

-"Unlikely that both sticks would be bad. Also unlikely that you'd get two faulty CPUs."
I agree with this statement - but how hot were the motherboard VRMs getting? HWINFO can tell you.
 
Thanks for your reply. AVX was enabled on all tests. I just tried p95 small fft with AVX2 and AVX both disabled and it lasted longer... But still bluescreened in less than 5 mins. I didn't get a chance to see the vrm temps before it failed.

Because it fails P95 with AVX almost instantly (within 5 seconds), I doubt that temps are the issue. I've been testing with the side of the case off for most of today in a cool room. It's a Corsair 275Q case with 3 fans in the front and one at the back, all 120mm Arctic PWM 1300rpm fans running at around 800 rpm at idle, so there's a fair bit of ventilation.

I won't be able to test again until tomorrow evening now. But I'll try to note the vrm temps and report back. Cheers
 
Retested same as above: small FFTs, no AVX, iGPU only. Bluescreen after about 6 minutes.
There are lots of temp sensors in HWiNFO... The VRM MOS sensor never got above 40°C, and there is a VR Loop1 sensor that peaked at around 55°C. This is the highest temp I could see across all sensors. Core temps never went above 55°C. Ambient room temp is 16°C (need to put the heating on!!)
This test seems a bit weak without AVX, but my system still fails it 😔
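If the sensors are logged to a CSV file during a run, a short script can pull the peak of every numeric column, which helps when the machine bluescreens before the readings can be noted by hand. This is only a sketch: the column names and sample rows below are made up to mirror the figures above, and a real log will use whatever labels the logging tool writes.

```python
import csv
import io
from typing import Dict

# Hypothetical sample log; column names and values are illustrative only.
SAMPLE = """Time,VRM MOS [C],VR Loop1 [C],Core Max [C]
19:00:01,38,52,54
19:00:02,40,55,55
19:00:03,39,54,53
"""


def column_peaks(csv_text: str) -> Dict[str, float]:
    """Return the highest value seen in each numeric column of a CSV log."""
    peaks: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, raw in row.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):
                # Skip non-numeric cells such as timestamps or empty fields.
                continue
            if name not in peaks or value > peaks[name]:
                peaks[name] = value
    return peaks


if __name__ == "__main__":
    for sensor, peak in column_peaks(SAMPLE).items():
        print(f"{sensor}: peak {peak}")
```

For a real log, read the file contents into a string and pass it to `column_peaks` in place of `SAMPLE`.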

I also just tried running Prime95 in Linux from an Ubuntu USB drive. Left AVX on and got an instant worker failure as per usual.

At this point I'm considering sending everything back and going AMD instead! I feel like it's 2 defective cpus but it does seem unlikely.
 
OH - you probably need to increase the cpu current limit in bios. It should be in the advanced cpu power settings somewhere.
The default 100% power limit is still being hit, even with AVX disabled, it just takes a little longer. You shouldn't need a crazy number like 200% current - 140-150% should be enough.
 
Thanks, I found CPU power limit and per core limit options in the BIOS. Both were set to Auto. The only options for these were Auto, Enabled and Disabled. I set both to Disabled and re-ran the small FFT no-AVX test. It lasted longer but still bluescreened after 20 mins, unfortunately. According to HWiNFO and Core Temp the CPU was drawing 75 W during the test, so well under the TDP?
 
That doesn't sound like the right one.
I'll go and take a look at the online mobo manual.
 
OK.
Bios > MIT > Advanced Cpu Core Settings.
Power Limit TDP (Watts) / Power Limit Time: Enter a stupidly high number and press Enter. The motherboard should then default to the highest possible value it offers.
Core Current Limit (Amps): 300-500 should be enough - or you can just do the above. You're just trying to test the stock frequencies anyway.
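For a rough sense of scale between the watt and amp settings, current is approximately package power divided by core voltage (P = V * I). The sketch below is a back-of-envelope estimate only: `estimated_core_current` and `headroom` are hypothetical helper names, the input numbers are illustrative rather than measured on this system, and the model ignores uncore, iGPU, and VRM losses.

```python
def estimated_core_current(package_power_w: float, vcore_v: float) -> float:
    """Rough current estimate in amps from P = V * I.

    Assumption: all package power is drawn at Vcore, which slightly
    overstates the core current (uncore, iGPU, and VRM losses are ignored).
    """
    if vcore_v <= 0:
        raise ValueError("vcore_v must be positive")
    return package_power_w / vcore_v


def headroom(limit_a: float, package_power_w: float, vcore_v: float) -> float:
    """Amps left between the estimated draw and a configured current limit."""
    return limit_a - estimated_core_current(package_power_w, vcore_v)


if __name__ == "__main__":
    # Illustrative numbers only, not readings from this build.
    amps = estimated_core_current(100.0, 1.25)
    spare = headroom(300.0, 100.0, 1.25)
    print(f"~{amps:.0f} A estimated, {spare:.0f} A below a 300 A limit")
```

With numbers in that range, a 300 A limit leaves a wide margin, which is the point of entering a deliberately high value while testing stock frequencies.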
 
Maybe a silly question but where did you buy the CPU/motherboard/RAM from? Everything was bought new, correct? 2 motherboards and 2 CPUs all bad?

I have a 9600K on a Gigabyte Z390 Gaming SLI and haven't had any issues.

All new from Amazon

I'll give those BIOS settings a try.
 
I think they changed the layout of the BIOS since the manual was made. The option I found before does seem to be the closest to what you're describing, but I get more options when I choose Enabled:

Package power limit1 TDP (W)
Package power limit1 time
Package power limit2 (W)
Package power limit2 time
Platform power limit1 etc etc
Dram power limit1 etc etc

I think when I set it to disabled initially it disables all limits? Otherwise which ones should I change?
 
Have you tested your cpu at stock with prime95/linpack extreme/occt small data set?
 
I see. All of those seem to expand upon 'Power Limit TDP (Watts) / Power Limit Time'.
What about 'Core Current Limit' though? If it's no longer there, then it would be integrated with the above settings.

When you disable them, then the bios will run Intel defaults.
Enable, and enter unrealistic numbers as I mentioned earlier.
 
I might have solved this... Changed loadline calibration from Auto to Medium and just passed 15 minutes of P95 small FFTs with AVX on!!!
I'm thinking Intel/Gigabyte are underestimating vdroop when all cores are taxed heavily.
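A rough way to picture vdroop is a simple resistive load-line model: the voltage the CPU sees under load is the set voltage minus load current times the load-line resistance, and a stronger loadline calibration setting behaves like a smaller resistance. The figures below are assumptions chosen for illustration, since the actual load-line values behind this board's Auto and Medium settings are not known here, and `loaded_vcore` is a hypothetical helper.

```python
def loaded_vcore(set_voltage_v: float, load_current_a: float, loadline_mohm: float) -> float:
    """Voltage at the CPU under load for a simple resistive load-line model.

    V_load = V_set - I_load * R_loadline, with R_loadline given in milliohms.
    """
    return set_voltage_v - load_current_a * (loadline_mohm / 1000.0)


if __name__ == "__main__":
    # Assumed values for illustration, not measurements from this system.
    set_v = 1.25      # voltage requested at idle, in volts
    load_a = 100.0    # all-core stress-test current, in amps
    for label, mohm in (("steeper load-line", 1.6), ("flatter load-line", 0.8)):
        print(f"{label} ({mohm} mOhm): {loaded_vcore(set_v, load_a, mohm):.3f} V under load")
```

In this model the steeper load-line loses twice as much voltage at the same current, which would fit a chip that is stable in light loads but fails once every core is loaded.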

Phaaze88 - I'll look at that setting if it fails again. I don't think there's any power throttling going on though as cpu power draw currently showing as 118W during prime. Thanks for your help :)
 
You can make very quick adjustments upward in clock speed (adjusted to your preference according to how many cores active), Vcore offset, etc., within Intel's XTU...

At any sign of instability, e.g., a blue screen, a hard reset, or a power-off without a proper shutdown, it will default back to all stock settings...

Good luck, and happy gaming!