Skylake Bugs Aren't Odd, They're Prime

Status
Not open for further replies.
Wot? It's not like there's a "compute prime number" instruction that has a bug. Whatever's at the root of this problem surely affects things other than computing primes. That's just a handy way for people to stress the AVX2 engines of their CPUs. But AVX2 is used for many things, and we don't even know if the bug is specific to AVX2.

This might come as a surprise to you, but there are workloads that involve heavy-duty number crunching, besides just computing prime numbers.

Without knowing the details of the bug, there's no way to tell how broad its impact could be. It might even be the cause of other instability people have seen with Skylake that isn't regular or reproducible enough to be traced back to this problem.
 
AVX2 at a lower clock is still faster than non-AVX2 code at a higher clock. It's called a trade-off. Desktop Haswell chips increase the voltage when AVX2 is in use, while Xeon Haswells reduce the clock speed and keep the voltage the same.
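To put rough numbers on that trade-off: theoretical peak throughput is roughly clock × FLOPs per cycle per core × cores. The sketch below assumes commonly quoted peak figures for a Haswell-class core (about 4 double-precision FLOPs/cycle on the 128-bit SSE path, about 16 with 256-bit AVX2 plus FMA) and hypothetical 4.0 GHz and 3.5 GHz clocks. None of these are measurements from this thread.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle_per_core: int, cores: int) -> float:
    """Theoretical peak double-precision GFLOP/s: clock x per-core FLOPs/cycle x cores."""
    return clock_ghz * flops_per_cycle_per_core * cores

# Assumed, illustrative figures for a 4-core Haswell-class chip (not measurements):
#   - 128-bit SSE path: ~4 DP FLOPs/cycle/core (one 2-wide add + one 2-wide multiply)
#   - 256-bit AVX2 + FMA path: ~16 DP FLOPs/cycle/core (two 4-wide FMAs, 2 FLOPs each)
CORES = 4
sse_peak = peak_gflops(4.0, 4, CORES)    # hypothetical 4.0 GHz when not running AVX2
avx2_peak = peak_gflops(3.5, 16, CORES)  # hypothetical 3.5 GHz after an AVX clock drop

print(sse_peak, avx2_peak)  # 64.0 224.0
```

Real code rarely reaches peak, so the actual gap is smaller, but it shows why a modest AVX clock offset can still come out ahead.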

Not on the 4790K. Mine goes straight for 100C before it thermally throttles. That doesn't seem like a great way of reducing the clock, you know, making it so hot that the chip-death protection cuts in.

Look up how to limit your TDP with Intel's XTU. Then worry about real-world loads instead of synthetic AVX loads. If you want to run these intense types of workloads, you should have excellent cooling and know how to overvolt a little to stabilize.
If you are using the stock Intel cooler on your 4790K, I can tell you that it's barely enough for the XTU stress test and definitely not enough for Prime95. When a heavy AVX load hits Haswell, it raises your voltage by +0.100 V (more than the "supposed" maximum). This spikes temperatures, so you either need better cooling or need to throttle with a TDP limit in XTU. Your wattage and amperage are out of control, as you'll see when you open XTU.
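A back-of-the-envelope look at why a +0.100 V bump heats things up: dynamic CMOS power scales roughly as C·V²·f. This sketch assumes a hypothetical 1.20 V baseline (not a measured 4790K figure) and an unchanged clock.

```python
def relative_dynamic_power(v_old: float, v_new: float,
                           f_old: float = 1.0, f_new: float = 1.0) -> float:
    """Ratio of dynamic CMOS power P ~ C * V^2 * f, with capacitance C held constant."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Hypothetical 1.20 V baseline (not a measured value), bumped by +0.100 V at the same clock.
ratio = relative_dynamic_power(1.20, 1.30)
print(f"{(ratio - 1) * 100:.1f}% more dynamic power")  # 17.4% more dynamic power
```

This ignores leakage, which also grows with voltage and temperature, so the real increase is likely larger.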
 


Sounds like you have a firmware issue with your NVMe SSD. Dell and Microsoft have released BIOS updates fixing NVMe drives that run at full power all the time and kill battery life. Shutdown should take longer, since the system essentially hibernates before shutting down. Boot time should be fine, but there are lots of reasons for slow loads, especially if you are literally shutting down. Try restarting and see whether that's faster than shutting down and turning it back on. Did you try to save space by disabling the hiberfile or shrinking the pagefile?
 


I'd start with this from the AVX Wikipedia page.

"- Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (integer operations are expected in later extensions).
- Increases parallelism and throughput in floating point SIMD calculations.
- Reduces register load due to the non-destructive instructions.
- Improves Linux RAID software performance (required AVX2, AVX is not sufficient)
...
Prime95/MPrime, the software used for GIMPS, started using the AVX instructions since version 27.x"

https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

Then under FMA:

"Fused multiply–add can usually be relied on to give more accurate results. However, Kahan has pointed out that it can give problems if used unthinkingly.[2] If x² − y² is evaluated as ((x×x) − y×y) using fused multiply–add, then the result may be negative even when x = y due to the first multiplication discarding low significance bits. This could then lead to an error if, for instance, the square root of the result is then evaluated."
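Kahan's x² − y² case can be reproduced in a few lines. This sketch emulates a fused multiply-add with exact rational arithmetic from Python's fractions module, rounding only once at the end (math.fma itself only exists in Python 3.13 and later). The helper names are hypothetical. With x = y = 0.1, the separately rounded version gives exactly 0.0, while the fused version returns the rounding error of 0.1 × 0.1, which is negative.

```python
import math
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # int / int true division in CPython is correctly rounded: a single rounding step.
    return exact.numerator / exact.denominator

def diff_of_squares_fma(x: float, y: float) -> float:
    """x*x - y*y with x*x kept exact inside the FMA but y*y rounded first."""
    return fused_multiply_add(x, x, -(y * y))

def diff_of_squares_plain(x: float, y: float) -> float:
    """x*x - y*y with both products rounded separately."""
    return x * x - y * y

x = 0.1
print(diff_of_squares_plain(x, x))  # 0.0
print(diff_of_squares_fma(x, x))    # a tiny negative number, not 0.0

try:
    math.sqrt(diff_of_squares_fma(x, x))
except ValueError as err:
    print("sqrt failed:", err)
```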

This also sounds like it could be a TSX issue like the one found in Haswell. Maybe they tried to re-implement it in Skylake and now we are getting race conditions without proper mutual exclusion...
 
Why? I don't see your point. I know vector instructions and FMA quite well. I actually use these in my job.

I doubt this is a TSX issue. If it were, Prime95 is one of the last places I'd expect problems to appear. At the very least, it would be plaguing all kinds of other activities and software first. And if it were, it probably couldn't be fixed via a BIOS patch.

Plus, after the embarrassment they suffered with previous TSX bugs, I'm guessing they tested it properly in Skylake.

So, until there's any word about TSX from Intel or the reporters of this bug, I'd classify that as wild speculation. There's a lot that can go wrong in hardware which could result in system lockups.
 


Either your system is customized or something is wrong with it. None of Intel's CPUs will get anywhere near 100C when loaded at 100% with stock cooling at a reasonable ambient temperature.

 


Prime95 and some other stress tests can do that in many situations even without AVX2 if you use the stock cooler. With AVX2 acting like that, I wouldn't be surprised. Those coolers aren't meant for heavy workloads.
 
^ What he said. Prime95 v28 or later heavily saturates the chip with AVX2/FMA instructions and the chip voltage spikes. I saw 150W in CoreTemp with it running 8 threads on my 4790K. At that point, even a custom closed-loop cooler is going to have trouble keeping the chip under the 100C throttle point. The problem, apparently, is that the areas of the chip that get soaked with heat under this load don't make great contact with the heat spreader.
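A crude steady-state estimate shows why 150W is hard to handle: junction temperature is roughly ambient plus power times the total junction-to-ambient thermal resistance. The 150W figure is the CoreTemp reading above; the 25C ambient and 0.5 C/W resistance are hypothetical values chosen for illustration.

```python
def junction_temp_c(ambient_c: float, power_w: float, r_theta_c_per_w: float) -> float:
    """Steady-state estimate: T_junction = T_ambient + P * R_theta (junction-to-ambient)."""
    return ambient_c + power_w * r_theta_c_per_w

# 150 W is the CoreTemp reading from the post; the 25 C ambient and 0.5 C/W
# total resistance (die, TIM, heat spreader, cooler) are hypothetical.
print(junction_temp_c(25.0, 150.0, 0.5))  # 100.0
```

A single lumped resistance also hides hotspots: if the hottest parts of the die make poor contact with the heat spreader, their effective resistance is higher and they reach the throttle point sooner than the average suggests.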

 