AMD Unveils Zen Microarchitecture, Demos Summit Ridge Performance


Multi-core and multi-threaded CPUs have been mainstream for nearly 15 years now, yet software that spends most of its time in a single active thread is still the most common kind. There have been multiple attempts at tacking multi-threading extensions onto many programming languages, and most have been unsuccessful beyond specialized math libraries. I would not be so optimistic that quad cores becoming entry-level will have any effect on 15 years of mostly failed attempts at making multi-threading more developer-friendly.

It does not look like multi-threading will be breaking out of its special use-cases (embarrassingly parallel math) any time soon.
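To make that concrete, here is a minimal C++ sketch (function and variable names made up for illustration) of the one case that reliably does scale, the embarrassingly parallel kind: each worker sums its own disjoint slice of an array, so there is no sharing, no locking, and nothing for the threads to fight over:

#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Each worker sums a disjoint slice; no locks are needed because nothing is shared.
double parallel_sum(const std::vector<double>& data, unsigned workers)
{
    if (workers == 0) workers = 1;
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end   = (w == workers - 1) ? data.size() : begin + chunk;
        pool.emplace_back([&, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

Anything where the workers have to coordinate mid-flight is where those 15 years of pain come from.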
 
Eh, maybe I'm overly optimistic. I guess I see it this way: while multi-core CPUs have been available for a while, most consumers were still on dual cores. The potential for parallelization there is helpful, but the impact isn't as big as on quad cores. So while it could be done, the returns weren't as noticeable. If Zen can indeed usher in a period where quad cores are the norm, perhaps devs will revisit the idea. Yes, it can be difficult to properly thread code, and devs are usually overworked, so they need to take the fastest path to get things shippable on time. But if quads become normal and eight-thread CPUs are prevalent, maybe the potential returns become enticing enough for coders to put in the extra work.

But as I said, it could do great things. It could also not mean anything, and things will continue on with the status quo. But with the silicon and clock-speed walls we've seen in the past few years, the CPU industry has had to go broad instead of fast. I think coders will follow that as best they can (not all code can, or should, be parallelized, of course).
 

Before worrying about parallelizing across four cores, you need to figure out how to make your code work across two cores. You won't get any benefits until you figure out an effective and efficient way of accomplishing at least that much. Once you have, you can revisit your algorithm and try to expand it to four cores, or to an arbitrary number of cores. Many times, multi-threading that works across two cores requires a complete overhaul to work reasonably well across four, and things only get worse from there.

Aside from embarrassingly parallel stuff and independent task delegation (e.g. mixing audio, unpacking textures, prefetching data from storage), multi-threading is usually more trouble than it is worth. This is why most games have one thread using 100% of one core, a second thread using 50% of a second core, and a bunch of minor threads each using 0-5% of a core for another 25-50% in aggregate when you spy on per-thread CPU usage with Process Explorer. The game may have 50 threads, but the total compute time is only about two whole cores' worth.
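For anyone curious what that independent task delegation looks like in code, here is a rough C++ sketch (not from any real engine; the class and function names are made up): the main game loop stays on one core while a single worker drains a queue of side jobs such as unpacking a texture or mixing an audio block. The worker only burns CPU when something is actually queued, which is exactly why it shows up in Process Explorer as a thread idling at a few percent:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class SideJobQueue {
public:
    // Called from the main/game thread: hand off an independent job.
    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Body of the low-utilization worker thread.
    void run_worker() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // e.g. unpack a texture, mix audio, prefetch a file
        }
    }

private:
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
};

You would launch it with something like std::thread worker(&SideJobQueue::run_worker, &queue); the point is that the main thread never waits on these jobs, so none of this helps the one thread that is actually pegging a core.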
 


With DX12's multi-threaded command buffer recording, which allows the CPU to accept and dispatch command buffer submissions on all cores, I think we will see more usage of more than four cores in the gaming arena. It brings some extra multi-threading on top of the one strong thread and a couple of weak threads we normally see today.


 

Nothing stopped game developers from splitting the frame preparation work across multiple threads and then letting the display thread forward the data to the drivers before, other than most developers not being willing to put in that extra work. I doubt that DX12 relieving some of the overhead from developers (no longer needing to merge lists prepared by different threads or put them in a queue for the dispatch thread, a trivial operation in the first place) will change this all that much. Developers will still need to go through all the extra multi-threaded workload partitioning work that they didn't want to do before to make use of DX12's multi-threaded dispatch.

DX12 is not a panacea. It won't miraculously enable DX12 games to go massively multi-threaded. Most of the burden is still in programmers' laps; all DX12 does is loosen the API bottleneck.
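To put a rough shape on what that multi-threaded dispatch looks like, here is a hand-wavy C++/D3D12 sketch (RecordSlice and the pre-created command lists are placeholders, not anything from the actual SDK samples): each worker records draws for its slice of the scene into its own command list, and one thread submits the whole batch with a single ExecuteCommandLists call. Deciding how to partition the scene so those slices are genuinely independent is the part that is still entirely on the developer:

#include <cstddef>
#include <d3d12.h>
#include <thread>
#include <vector>

// Placeholder: real code would set pipeline state, bind resources and issue
// the draws for slice `sliceIndex`, then Close() the list so it can be executed.
void RecordSlice(ID3D12GraphicsCommandList* list, int sliceIndex)
{
    (void)sliceIndex;
    // ... record draws for this slice of the scene ...
    list->Close();
}

void SubmitFrame(ID3D12CommandQueue* queue,
                 const std::vector<ID3D12GraphicsCommandList*>& lists)
{
    // The recording itself is what DX12 lets you spread across cores.
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < lists.size(); ++i)
        workers.emplace_back(RecordSlice, lists[i], static_cast<int>(i));
    for (auto& w : workers)
        w.join();

    // One cheap submission of everything that was recorded in parallel.
    std::vector<ID3D12CommandList*> raw(lists.begin(), lists.end());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}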
 
Thank you! That's what I keep telling people, especially the AMD fanboys. DX12 and async compute/shading aren't magic wands that developers can wave over their games and, hey presto, get performance improvements. It takes the application of intelligence and skill to get results, and if companies aren't willing to do it properly, there's a chance it will turn out poorly (I'm looking at you, Hitman).

Yes, Microsoft and AMD are helping game developers with DX12 and mGPU issues (I don't know about Nvidia, beyond whatever proprietary stuff they're doing in their GameWorks black boxes; are they helping devs who aren't using GW?), but it still requires effort.
 


I think this is the crux of the issue. The price of an unlocked quad-core Core i7 is quite astronomical given the price of an almost-equivalent Skylake "Pentium" G4400.

Consider this. Take a single G4400 core, use four of those, and unlock the multiplier.

Intel is already playing a dangerous marketing game to keep the sales of the very best chips ticking (pun unintended) along.

Don't get me wrong, a 4.4 GHz-and-beyond quad- or six-core Core i7 with Hyper-Threading is an absolute beast, but there are many risk factors at play. For example, if Intel didn't force-lock the multiplier, Core i3 and even Pentium Skylake chips could do a lot with 2 cores, especially if pushed to 4 GHz and above.

Consider this as well: many of our most demanding applications are finally moving to the GPU, as heralded years ago. Video editing, raytracing, 3D rendering (offline), 3D rendering (gaming), physics calculations, video encoding, video decoding, and UI compositing are all much better on a GPU, and that's also where a lot of enthusiast dollars are going.

Intel has made a notable effort in GPUs, but it's still quite far behind.

TL;DR Intel is in a pickle, but AMD keeps missing its punches while Intel sits there wide open for a licking.




Red vs. Blue aside, there is a lot of tumult in the PC gaming industry. As you point out, the core problem is things like DX12 and GameWorks promising so much but always somehow stumbling at the finish line, usually because of things like cheap outsourcing of console ports! And the DLC is just salt in the wound of how PC gamers are treated.

At this point playing a decent DX11 game that you feel gives you value-for-money is probably what most people strive for. Sure, the GPU driving that is important but at the end of the day the GPU is still hamstrung by game developer/publisher practices.

Overblown Steam reviews aside, "Wait for sale" is unsurprisingly common nowadays.
 
Are you referring to Larrabee? If so, please go read this, written by Tom Forsyth, who worked on Larrabee:
Why didn't Larrabee fail?

Erm, how exactly do you figure that Intel is in a pickle?

Net income for 2016 Q2:
AMD: $0.07 billion
Intel: $1.30 billion

R&D spend in 2015:
AMD: $0.95 billion
Intel: $12.13 billion

Even if we only count 10% of Intel's total R&D spend as going to its x86 chips, as opposed to its other projects, and assume those other projects have zero impact on their CPUs, Intel will still have spent about $1.2 billion on R&D for their CPUs.
 
It's sort of like the chicken and the egg, isn't it? I really praise AMD for helping people go back down to lower levels of coding by forcing Mantle and its descendants down the throats of the Khronos Group and Microsoft.

(Speaking of low level, INT 21h, anybody? ;-)

 


You can go even lower than INT 21h by using only BIOS interrupts, or better yet IN/OUT. :bounce:
 
I've never understood why AMD hasn't been able to optimize its core driver stack to be more efficient, rather than introducing Mantle (and, by extension, getting DX12 underway at Microsoft). It's been quite clear for a while now that any game graphics engine that isn't very well threaded from the start has an advantage on Nvidia hardware, and it is generally accepted that Nvidia's drivers have lower CPU overhead overall than AMD's.

It's why so far we've seen very little gain in Nvidia's DX12 benches: their DX11-and-below driver optimizations are already very good, and thus DX12 doesn't offer a whole lot of benefit in real-world gaming.

DX12 seems to alleviate the inefficiencies in AMD's driver stack (i.e., fewer unoptimized barriers between the engine and the hardware), and thus we end up seeing the true power of the AMD hardware when DX12/Mantle/Vulkan are involved. Now, granted, Nvidia's driver team is apparently many times the size of AMD's, but this has been the case for years... you would think that by now SOMEBODY on Team Red would have figured out how Nvidia is able to create such an efficient driver stack for DX11 and below.
 
I'm pretty sure @srmojuze is talking about their HD Graphics iGPUs. That architecture has untapped potential, which we'll all behold once Intel stops messing around with eDRAM and puts a big chunk of MCDRAM in package.

Alternately, they could take on the GPUs from AMD and Nvidia directly and scale their HD Graphics up into a standalone card. I'd bet it could easily beat their own Knights Landing (Xeon Phi) at GPU compute if they scaled it anywhere near as big. Now, I don't realistically expect they'll go this route, but it's always possible.
 
As Tom Forsyth said in his blog post, Intel had (still has?) engineers begging to create a powerful GPU but were denied the chance for whatever reason.

I personally suspect internal politics, but it could be something along the lines of their legal team being scared that Intel could be seen as trying to create another monopoly-like situation, where a titan (pun intended) comes along and crushes all existing players. But hey, that's just an off-the-cuff thought, not even a theory. 🙂
 

Intel is already sort of there based on Broadwell's results; all it needs is to scale up a bit and add 1-2GB of eDRAM/HBM/HMC at 200GB/s to nuke most of the remaining sub-$150 dGPU segment.
 
And there you go, they haven't done it. They have the smarts, they have the manufacturing capability, but they've chosen not to do it.

A Polaris-sized 230mm² die based on Intel's true 10nm manufacturing process, anyone? 🙂

I'm not 100% sure, but does anyone know how Broadwell's IGP compares to my 6670 DDR3 1GB? Tom's GPU Hierarchy Chart only lists Intel's 530 and Iris Pro 6200. Broadwell has the 520, right? (Well, the i5-6200U has it at any rate.)
 

There isn't much good benchmarking data for Intel IGPs, thanks to most review sites having little interest in IGPs and most people who would buy an i5-5675C/i7-5775C not being interested in using the IGP. The large price premium, lack of availability, and imminent Skylake launch turned the remainder of the market away.

Broadwell has the Iris Pro 6200, which is roughly on par with the R7 240 (better in some cases) and 30-50% ahead of the next-fastest IGP at the time. I imagine that having 1GB of eDRAM instead of 128MB would have raised the numbers quite a bit, especially at anything above low-detail 720p. Broadwell got hurt kind of badly by the extra latency introduced by the L4 cache, so the next iteration may need to treat the eDRAM as a NUMA region instead of a cache to avoid that.
 
For paper comparisons, I love Wikipedia.

https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Radeon_HD_6xxx_Series
https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Eighth_generation

But, on paper, my GTX 980 Ti should only be about twice as fast as my old HD 7870. The reality was much different.
 


No argument from me; in fact, I had the same concerns. I do DBA work for a living and am aware of a lot of the issues around multi-threading. However, getting stronger single-threaded performance is getting damn hard as the process node shrinks start to slow down, so something has to give.

 

The only thing that can 'give' is application developers delegating more work to worker threads when the results aren't performance/timing-critical to the control thread. But the delegated items need to be large enough to make the delegation overhead worth the effort, which is not going to happen for the countless quick checks and adjustments control threads typically need to do.

Of course, developers could also choose to live with the single-threaded performance brick wall instead of bothering to attempt re-factoring their code to make it more threadable. I suspect the majority of software will remain in this category and only software/games that really need to push the envelope will implement more than minimalist or automatic (compiler/library/API/framework/etc.) threading.
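As a toy illustration of that overhead argument (the numbers will vary wildly by machine, and the names here are made up), compare doing a trivial "quick check" inline versus handing the same work to std::async: the cost of spinning up the task and collecting the result is orders of magnitude bigger than the work itself, which is why only reasonably chunky jobs pay for the handoff:

#include <chrono>
#include <future>
#include <iostream>

int tiny_check(int x) { return x * 2 + 1; }   // stand-in for a quick check/adjustment

int main()
{
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    const auto t0 = clock::now();
    volatile int a = tiny_check(21);                        // inline, on the control thread
    const auto t1 = clock::now();

    auto fut = std::async(std::launch::async, tiny_check, 21);
    volatile int b = fut.get();                             // same work, delegated to a worker
    const auto t2 = clock::now();

    std::cout << "inline:    " << duration_cast<nanoseconds>(t1 - t0).count() << " ns\n"
              << "delegated: " << duration_cast<nanoseconds>(t2 - t1).count() << " ns\n";
    (void)a; (void)b;
    return 0;
}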
 