AMD Ryzen Threadripper & X399 MegaThread! FAQ & Resources


jaymc

Distinguished
Here guys, but doesn't this also mean that the benchmarks we've been getting on 32-core Epyc Windows Server are all severely hampered by the same bad scheduler?

I mean, it's not a small discrepancy; a lot of the workloads are 2.5x faster in Linux.

I can't wait to see the actual numbers for a 32-core Epyc Windows Server box against Xeon when this is fixed...
Expect the gap to increase a lot... well, 2.5x in certain workloads...

If this makes all the previous benchmark results on Threadripper and Epyc wrong... well, lol. This is huge.

I wonder what the 64 Core is going to look like, ridiculous I'd say :D

 


There are a couple of things going on here. I'd need some really low-level performance stats to know for sure, but my suspicion is simply that the Windows scheduler really isn't designed for this type of workload.

Unlike Linux, Windows is designed to let threads jump between cores in order to maximize thread uptime. On Linux, for example, if a thread gets bumped by another thread, it will be re-assigned to the same CPU core. The downside is that the thread may sit waiting for a period of time even if another CPU core is capable of running it. This costs some performance for that specific thread, but you don't have to worry about accessing CPU cache across cores or hitting main memory as often as threads get re-assigned. By contrast, Windows will schedule a thread on whatever core is capable of running it at that time. This results in threads jumping between cores, increasing uptime but also increasing the amount of cache/memory access required.
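
To make that concrete, this is roughly what pinning a thread to one core looks like on the Linux side. Just a sketch (compile with -pthread); the worker body and the core index are placeholders, not anything either OS does internally:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Worker that benefits from staying put: its working set stays warm in one
   core's L1/L2 instead of being refetched after every migration. */
static void *worker(void *arg)
{
    int core = *(int *)arg;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* Restrict this thread to a single core; the scheduler may still preempt
       it, but it will always be resumed on that same core. */
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);

    /* ... cache-sensitive work goes here ... */
    return NULL;
}

int main(void)
{
    int core = 2;   /* placeholder core index */
    pthread_t t;
    pthread_create(&t, NULL, worker, &core);
    pthread_join(t, NULL);
    return 0;
}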

If you have only a few threads, the Windows approach isn't going to significantly hurt performance, but as the core count on the CPU starts to increase you need to start considering the extra cache/memory access this approach results in. The problem gets significantly amplified in a NUMA environment. Essentially: the Windows thread scheduler is optimized for a single application that uses no more than a handful of threads.

As for the results on Linux, do notice scaling starts to decline noticeably beyond 16 threads, which is about what I'd expect. If I'm particularly bored (very little work for me right now) I might actually compute some hard numbers to demonstrate.
 


Games don't.

Other workloads will, but scaling declines once you get beyond 16 cores due to scheduling overhead. Coincidentally, you see that trend in the result set, where scaling drops significantly when moving beyond 16 cores.
 

jaymc

Distinguished


That's not exactly correct; a true DX12 or Vulkan (Mantle) title should be able to scale well above 8 cores. Actually, it would be capable of breaking the workload into as many threads as there are free cores, or as many as is beneficial for that matter.
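
Just to illustrate what I mean by breaking the workload up: a rough pthreads sketch that asks the OS for the logical core count and spawns that many workers. render_slice and the item count are made-up placeholders, not anything out of a real DX12/Vulkan engine:

#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#define WORK_ITEMS 4096          /* placeholder workload size */

struct slice { int begin, end; };

static void render_slice(int i) { (void)i; /* placeholder per-item work */ }

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int i = s->begin; i < s->end; i++)
        render_slice(i);
    return NULL;
}

int main(void)
{
    /* Ask the OS how many logical cores are online and spawn that many workers. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1) n = 1;

    pthread_t *threads = malloc(n * sizeof(*threads));
    struct slice *slices = malloc(n * sizeof(*slices));

    for (long t = 0; t < n; t++) {
        slices[t].begin = (int)(t * WORK_ITEMS / n);
        slices[t].end   = (int)((t + 1) * WORK_ITEMS / n);
        pthread_create(&threads[t], NULL, worker, &slices[t]);
    }
    for (long t = 0; t < n; t++)
        pthread_join(threads[t], NULL);

    printf("processed %d items on %ld threads\n", WORK_ITEMS, n);
    free(threads);
    free(slices);
    return 0;
}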
 


The problem is that as you scale up, especially for memory-heavy tasks like GPU rendering, you start to run into a lot of scheduling/resource overhead. Unless you have either insanely fast memory access or much larger CPU caches than we currently have, these effects will limit how well you can scale. The APIs can handle it, but the rest of the system can't.
 
Well, gamerk, in case you haven't noticed, we do have insanely fast memory that is grossly underused ATM. Maybe when developers start actually taxing the memory subsystem we'll start noticing the bottlenecks you're talking about, but if you ask me, we haven't seen any of them yet.

My take is the same as before: we haven't seen a proper paradigm shift for economic reasons, not technical ones.

Cheers!
 


Latency is more important here than bandwidth; you do NOT want a thread assigned and then waiting several hundred CPU cycles to get the data it needs.
 
Not disagreeing, but I'm pretty sure that is why NUMA has such a huge impact in Linux for AMD. Memory management for threading is really important, like you say, and AMD has it covered with NUMA (to a degree) for massively parallel workloads. Changing the topology of how you arrange the memory for the CPUs has a deep impact on latencies across the hardware. I like the trade-off, to be honest: less effective memory for better average access times is most definitely a great trade-off. This just adds to my comment of "we do have fast memory anyway".

What I'd like to see is Microsoft pushing a NUMA-enabled version of Windows 10 for AMD. I know Server has it, but you have to enable it. So I don't think (or I hope?) it would be a massive cost for Microsoft to push it into the consumer space. There's a reason to do it now, at the very least, so why not? Hell, maybe there's already a hidden option (since the kernel is largely the same anyway?) that could enable it? Maybe there's already a way?
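
For what it's worth, the NUMA topology is already visible to applications even on consumer Windows through the Win32 API, so at least the plumbing is there. Something like this (a rough sketch with most error handling trimmed; the node choice and allocation size are arbitrary):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highest = 0;

    /* Report how many NUMA nodes the kernel sees (0 means a single node / UMA). */
    if (!GetNumaHighestNodeNumber(&highest)) {
        fprintf(stderr, "GetNumaHighestNodeNumber failed: %lu\n", GetLastError());
        return 1;
    }
    printf("NUMA nodes: %lu\n", highest + 1);

    /* Allocate 1 MiB explicitly on node 0, instead of letting the first-touch
       policy decide where it lands. */
    void *buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, 1 << 20,
                                   MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, 0);
    if (buf)
        VirtualFree(buf, 0, MEM_RELEASE);

    return 0;
}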

Cheers!
 


The downside to NUMA is that you need your workloads to NEVER touch the same memory data, otherwise everything grinds to a halt. That's why Windows does so poorly with NUMA, given the way it handles threading (which is optimized for non-NUMA).
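
In practice that means each thread keeps its working set on its own node, e.g. something along these lines with libnuma on Linux (a sketch; the buffer size and node choice are placeholders, and you'd do this once per worker thread):

#include <numa.h>      /* libnuma; link with -lnuma */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = 0;                /* placeholder node choice */
    size_t size = 64 << 20;      /* placeholder 64 MiB working set */

    /* Memory backed by node 0's local controller, so a thread running on
       that node never pays the cross-node latency penalty for it. */
    void *buf = numa_alloc_onnode(size, node);
    if (buf == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    /* Keep the calling thread on the same node as its data. */
    numa_run_on_node(node);

    /* ... per-thread work on buf ... */

    numa_free(buf, size);
    return 0;
}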
 

genz

Distinguished


Sounds like a problem for... drum roll
https://en.wikipedia.org/wiki/Windows_on_Windows

Maybe they can get a kernel scheduler on each NUMA channel. :lol:

Seriously though, with the number of 2S and 4S systems produced throughout history, are you honestly telling me that OS schedulers start to destroy performance gains at only 16 threads? There have been 8-core setups since 2007 (quad-core Xeons in 4S configs). That's very, very close to where we currently are in the consumer space already.
 


This might sound weird or even annoyingly direct, but... do you really think serious work on servers with a multitude of CPUs is done with Windows installed?

That is one of the many reasons why you never really use Windows Server for anything *remotely* serious.

All the people developing .NET applications must be delusional if they think their code is going to be running critical applications or critical infrastructure on Windows, if at all. At best, web apps or crappy balancing machines. Funny thing: did you know IIS craps out when the CPU is at 100%? It can't accept new connections, and it's still a thing they haven't fixed. Dayum!

Anyway, the point is MS hasn't really taken NUMA seriously, for some bizarre reason I don't personally know, even though they do "support" it in Windows Server.

Cheers!
 


MSFT doesn't take NUMA seriously because the scheduler they use really isn't designed for that type of workload. At minimum they'd have to re-write their thread scheduler from scratch, which is something they likely don't want to do at this juncture. And I suspect there are a lot of low-level Windows internals that don't play well with NUMA.

Windows was designed around Uniform Memory Access (UMA); it's no great surprise that it starts to break in a NUMA environment.

As for threading, the main problem as you increase thread count is that the OS scheduler/memory access starts to become a larger and larger problem, often leading to significant decreases in performance scaling. You also need to remember that other tasks are trying to run too, and having one application take all the CPU resources often leads to all applications losing performance as they constantly bump each other's threads while trying to finish their tasks. In an environment where your application is the only thing running and you have direct control of memory access, you could scale to infinity. But neither of those things is typically true, and scaling declines as thread count increases as a result.
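
As a back-of-the-envelope illustration only (the serial fraction and per-thread overhead below are invented numbers, not measurements): a fixed cost per extra thread is enough to make speedup peak and then fall off, which is roughly the shape of that 16-core knee.

#include <stdio.h>

int main(void)
{
    /* Toy model only: time(n) = serial + parallel/n + overhead*n.
       The 5% serial fraction and 0.4% per-thread overhead are made-up
       values, chosen just to show the shape of the curve. */
    const double serial = 0.05, parallel = 0.95, overhead = 0.004;

    for (int n = 1; n <= 64; n *= 2) {
        double t = serial + parallel / n + overhead * n;
        printf("%2d threads: speedup %.2fx\n", n, 1.0 / t);
    }
    return 0;
}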

As for Windows on Windows, MSFT is just converting all 32-bit memory accesses into their 64-bit equivalents in real time in order to keep the OS happy. Everything is still going through the Win32 APIs under the hood.
 
https://www.hardocp.com/news/2019/01/03/amd_ryzen_threadripper_2990wx_performance_regressions_linked_to_windows_bug/

Called it. Sounds like Windows was trying to use only one NUMA node, and spent almost all its time trying to juggle threads onto just that one node.
 


More like, who was doubting that was the case?

It's been known for ages that Windows doesn't do NUMA well. At least, in the enterprise world.

Cheers!
 

genz

Distinguished
It reads to me like a hardcoded fix for an Intel bug getting in the way. The OS shouldn't be avoiding memory controllers, period, even if the reason is that there's no path to I/O on the XCC chips.

Windows seems to fall over this regularly: issuing a bugfix for first-gen tech in a form that doesn't specifically target the hardware with the problem.
 


The problem is the scheduler has gotten so heavily optimized that anything different tends to break in some really weird ways.

But yeah, it is odd the scheduler was trying to put all the threads on just one node, while avoiding the others entirely. That's unusual.
 
Hello everyone! I purchased the 2990WX back in September, mainly for 3D rendering. It was great out of the box, running at 100%. Recently I noticed that while rendering in KeyShot the chip seems to be throttled. September 2018: 3 GHz at 100% on all cores; April 2019: 2.23 GHz at ~65% on all cores. I'm not sure where to start; can someone please point me in the right direction?

Notes:
- Wendell's Coreprio does not seem to help.
- No other programs are running.


AMD 2990WX, Enermax liquid cooling
Asus ROG Zenith Extreme
128 GB Corsair Vengeance LPX
EVGA 1080 Ti
EVGA 1600 W PSU
(attached benchmark screenshot: lee-souder-cpu-bench.jpg)
Well, I suggest you start your own Question thread, as this one is not meant for that kind of advice. Once you do, I'll put my thoughts in it :D

Cheers!
 
I will say that Linux's approach to thread scheduling (thread pools) makes more sense as core counts get higher and higher; you get more consistent performance than with Windows' approach, which has the side effect of threads jumping between cores as they get bumped/rescheduled. Windows theoretically offers higher total system throughput because the highest-priority thread(s) always run, but any performance benefit is often lost when threads get rescheduled to different CPU cores.

That being said, Linux's approach of giving every thread in a CPU core's thread pool an equal chance to run (fair scheduling) gimps performance in certain tasks (games) where one or two threads do a disproportionate amount of work.
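
The usual workaround is for the application itself to tell the scheduler that one thread matters more, e.g. by lowering that thread's nice value on Linux. A sketch only; negative values normally need CAP_SYS_NICE or root, and the -5 is an arbitrary example:

#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* On Linux the nice value is a per-thread attribute, so passing the
       kernel thread ID to setpriority() bumps only this thread. */
    pid_t tid = (pid_t)syscall(SYS_gettid);

    if (setpriority(PRIO_PROCESS, tid, -5) != 0)
        perror("setpriority");
    else
        printf("thread %d nice value is now %d\n",
               (int)tid, getpriority(PRIO_PROCESS, tid));

    return 0;
}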

Regardless, it's becoming clear that Windows needs to redesign its thread scheduler going forward. That and tossing the NTFS file system are two overdue changes MSFT needs to get working on.
 