News AMD Ryzen 9 4950X Rumor: Could it Spell Trouble for Intel Gaming Supremacy?

King_V

Illustrious
Ambassador
Reports indicate that this also pertains to an engineering sample of the 16-core part, making it the successor to the 3950X, likely to be called the 4950X -- unless AMD jumps straight to the 5000 nomenclature for the Zen 3 "Vermeer" parts, in which case it will likely be called the 5950X. A move such as this wouldn't be all too surprising given that the current 4000-series chips are APUs based on the Zen 2 architecture.

I know this isn't the main point of the article, but PLEASE, AMD, stop with the x000 model of the APU being based on the (x-1)000 CPU architecture. Have them match up!
 
Could the 100MHz increase make the difference in games?
It depends on what its 4.8 means. The old part had 4.6 advertised, so that's ~200MHz more on paper.
But then add PBO and the like, and this chip might scratch the 5.0GHz mark without manual OC.
Games' top performance is usually limited by a single "master" thread that schedules work onto the other threads.
So a 5% faster master thread lets you keep, say, 13 threads busy instead of 12 like last time -- quite a lot of gain.
That's an idealized example, so MAYBE up to 10% in real games?
I think it will be needed to keep up with whatever Nvidia sends our way...

If it's Zen 3, you should be more interested in the hardware changes that already make each MHz something like 15% faster (google IPC).
If that rumor is true, this adds up to a roughly 20% faster CPU.
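As a back-of-the-envelope sanity check (the ~5% clock and ~15% IPC figures are the thread's rumors, not confirmed numbers), clock and IPC gains compound multiplicatively:

```python
# Speculative numbers from the post above: ~5% higher boost clock, ~15% IPC uplift.
clock_gain = 0.05
ipc_gain = 0.15

# Performance scales roughly with IPC * clock, so the two gains multiply.
combined = (1 + clock_gain) * (1 + ipc_gain) - 1
print(f"combined uplift: {combined:.1%}")
```

That lands right around the ~20% figure quoted above.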
 

InvalidError

Titan
Moderator
Could the 100MHz increase make the difference in games?
The 100MHz is mostly irrelevant. What will make a big difference in games is having twice as many cores and L3 cache per CCX to reduce the amount of off-CCX traffic and associated latency: CCX-to-CCX traffic is where Ryzen takes its largest performance hits vs Intel and core clock frequency does not help with CCX-to-CCX latency, especially if traffic goes across CCDs. It is also the main reason why fabric clock is such a big deal for Ryzen.
 

JayNor

Reputable
May 31, 2019
Norrod's presentation last year doesn't offer much hope for higher clock rates.

View: https://www.youtube.com/watch?v=VqFk_Yae-kU&feature=youtu.be&t=411


Looks like 3D manufacturing could enable some performance increase... stacking memory on top, for example, or putting the cores on top of the I/O chiplet, as Intel did with Lakefield.

Norrod states clock rates could regress at the next node. Some might say the "next node" has already regressed, with both Intel 10nm and AMD 7nm designs unable to match Intel's 14nm boost clocks, which are up around 5.3GHz now.

TSMC's claims suggest some trade-off of performance and power improvements on smaller nodes, but AMD previously commented that TSMC's statements were about very simple circuits and not necessarily applicable to its large logic designs.

There are performance solutions employing HBM stacks. Perhaps that can be extended to multiple HBM stacks and wider HBM interfaces.

There are performance solutions employing integrated accelerators. Intel is doing a lot of that with the P5900 solutions and has created a Xeon product with an embedded FPGA. Maybe we'll see more of that. EMIB stitching can result in lower-energy multi-chip solutions within a package, as Intel demoed with the Kaby Lake-G chip.
 
The 100MHz is mostly irrelevant. What will make a big difference in games is having twice as many cores and L3 cache per CCX to reduce the amount of off-CCX traffic and associated latency: CCX-to-CCX traffic is where Ryzen takes its largest performance hits vs Intel and core clock frequency does not help with CCX-to-CCX latency, especially if traffic goes across CCDs. It is also the main reason why fabric clock is such a big deal for Ryzen.
The other big question is what the all-core (or typical) clocks will be for heavy workloads. Intel's 9900K for example runs 4.7GHz all-core, all day long, even though the boost clock is 5.0GHz and the base clock is 3.6GHz. From what I saw in testing, the 3950X typically ran most heavily threaded applications (including games!) at around 4.2GHz. So Intel is ~300MHz less than boost and AMD is ~500MHz less than boost. Zen 3 could change that, and if it gets typical clocks of ~4.5GHz instead of ~4.2GHz, that would be a decent bump in performance.

But yeah, latency stuff is going to be a bigger factor I think. Paul and I have also talked, and Renoir being OEM-only for desktop is very likely AMD trying not to spoil the Zen 3 party.
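Those deficits are easy to quantify (figures are from the post above; the 4.5GHz Zen 3 all-core number is hypothetical):

```python
# All-core vs advertised boost clocks quoted above (GHz).
intel_boost, intel_all_core = 5.0, 4.7   # Core i9-9900K
amd_boost, amd_all_core = 4.7, 4.2       # Ryzen 9 3950X

intel_deficit = 1 - intel_all_core / intel_boost
amd_deficit = 1 - amd_all_core / amd_boost
zen3_gain = 4.5 / 4.2 - 1                # hypothetical 4.5GHz all-core on Zen 3

print(f"9900K runs {intel_deficit:.1%} below boost")
print(f"3950X runs {amd_deficit:.1%} below boost")
print(f"4.2 -> 4.5GHz all-core would be {zen3_gain:.1%} faster")
```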
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
But yeah, latency stuff is going to be a bigger factor I think. Paul and I have also talked, and Renoir being OEM-only for desktop is very likely AMD trying not to spoil the Zen 3 party.
Core-to-core latency and RAM-to-CCX latency are things AMD needs to work on. The chiplet approach has that inherent detriment, and there are numerous solutions to it short of going back to a monolithic die.

The cIOD is just the old-school northbridge moved onto the CPU substrate with shorter traces, though not quite as good as a fully integrated northbridge. It's one step back taken for yield gains with the CCX packages. Nothing wrong with that, but there are optimizations to be had in the RAM slot traces: placing DIMMs relative to the cIOD's memory controller interfaces so that latency is largely similar across all DIMMs. That would require AMD to issue a design directive placing DIMM slots on both sides of the CPU socket, with the memory controller ports arranged to match, to squeeze RAM latency into a very narrow band instead of the wide, variable band it has now.

As far as Zen 3 and Renoir go, I think Renoir is its own surprise hit, and Zen 3 targeting desktop won't affect it, since the markets for the two are largely separate due to use case.
 

gg83

Distinguished
Jul 10, 2015
The 100MHz is mostly irrelevant. What will make a big difference in games is having twice as many cores and L3 cache per CCX to reduce the amount of off-CCX traffic and associated latency: CCX-to-CCX traffic is where Ryzen takes its largest performance hits vs Intel and core clock frequency does not help with CCX-to-CCX latency, especially if traffic goes across CCDs. It is also the main reason why fabric clock is such a big deal for Ryzen.
Right. I think that's why the 3300X is a great performer? There's no CCX-to-CCX traffic?
 

InvalidError

Titan
Moderator
Right. I think that's why the 3300X is a great performer? There's no CCX-to-CCX traffic?
Yes, that's why it is nearly impossible to overclock the 3100 enough to make it catch up with the 3300X despite having the same total number of cores and L3$ on the same architecture and why the 3300X can give the technically far more powerful 3600(X) a run for its money in some cases.
 

Flemishdragon

Commendable
Mar 15, 2019
I'd rather have it running 5GHz+ and much more power hungry -- at least give us the option; Intel is more power hungry anyway. Even for an extra 200MHz, your favorite apps just feel that much snappier, sometimes a lot, and it's worth it!
 

PBme

Reputable
Dec 12, 2019
The other big question is what the all-core (or typical) clocks will be for heavy workloads. Intel's 9900K for example runs 4.7GHz all-core, all day long, even though the boost clock is 5.0GHz and the base clock is 3.6GHz. From what I saw in testing, the 3950X typically ran most heavily threaded applications (including games!) at around 4.2GHz. So Intel is ~300MHz less than boost and AMD is ~500MHz less than boost. Zen 3 could change that, and if it gets typical clocks of ~4.5GHz instead of ~4.2GHz, that would be a decent bump in performance.

But yeah, latency stuff is going to be a bigger factor I think. Paul and I have also talked, and Renoir being OEM-only for desktop is very likely AMD trying not to spoil the Zen 3 party.


Exactly what I thought when seeing the 4.8GHz 'boost'. I have had the 3950X since December, and it is really a 4.2-4.3GHz chip at most for all-core loads. I have little doubt the boost makes a difference in specific situations, but it lasts such a short duration that it is irrelevant for any sustained compute jobs. I'm far more interested in the IPC difference and what clocks it can sustain with all, or even some, cores loaded.

And of course, if anyone's primary goal is gaming, the premium this 16 core model goes for would give you far better returns being spent on a better graphics card.
 

barryv88

Distinguished
May 11, 2010
Vermeer might just be the ultimate Zen that many have been waiting for. Things to consider that can contribute to its increased IPC over Zen2:
  • 7nm+ node should bring around 5-10% IPC uplift over current 7nm.
  • Architectural changes. Could easily add another 5-10%? Hard to tell.
  • Clock speed increases. Another 3%?
  • Single CCX over dual CCX designs (for 6C and 8C models)?

All of this could add up to 20% over Zen 2. If that's the case, Intel would be in big trouble, and Rocket Lake will really have to come out guns blazing, provided it's not too hamstrung by Intel's 14nm limitations.
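Taking the midpoints of those speculative ranges, the uplifts compound multiplicatively rather than simply adding (all percentages are the post's guesses, not confirmed figures):

```python
# Midpoints of the ranges above: node ~7.5%, architecture ~7.5%, clock ~3%.
node, arch, clock = 0.075, 0.075, 0.03

naive_sum = node + arch + clock
compounded = (1 + node) * (1 + arch) * (1 + clock) - 1
print(f"naive sum: {naive_sum:.1%}, compounded: {compounded:.1%}")
```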
 
Vermeer might just be the ultimate Zen that many have been waiting for. Things to consider that can contribute to its increased IPC over Zen2:
  • 7nm+ node should bring around 5-10% IPC uplift over current 7nm.
  • Architectural changes. Could easily add another 5-10%? Hard to tell.
  • Clock speed increases. Another 3%?
  • Single CCX over dual CCX designs (for 6C and 8C models)?
All of this could add up to 20% over Zen 2. If that's the case, Intel would be in big trouble, and Rocket Lake will really have to come out guns blazing, provided it's not too hamstrung by Intel's 14nm limitations.

Usually a shrink brings a performance increase because they can fit more transistors into the same space, which would be an architectural change, or because it uses less power, which they can put towards increasing clocks. You are adding the same thing three times here.
 

barryv88

Distinguished
May 11, 2010
Usually a shrink brings a performance increase because they can fit more transistors into the same space, which would be an architectural change, or because it uses less power, which they can put towards increasing clocks. You are adding the same thing three times here.
I was trying to get a rough estimate of the IPC increase (taking mostly everything into consideration). I'm going with 17% or so. Whereabouts would you reckon?
 
I was trying to get a rough estimate of the IPC increase (taking mostly everything into consideration). I'm going with 17% or so. Whereabouts would you reckon?
The difference will be all over the place depending on what you benchmark, just like going from Zen 1 to Zen 2: some things will see no improvement, while others might see a 15-20% difference if today's implementation is handicapping them that much.

Gaming doesn't rely on IPC-heavy code; games are coded for the Jaguar cores of the PS4 and Xbox One. That is why Zen, which already has higher IPC than Intel, is still slower in games: it runs at considerably lower clocks when all cores are loaded.
 
Practically all gaming benchmarks disagree: high-performance PC gaming requires BOTH strong IPC and high clock frequencies; you can't sacrifice much of either for more of the other.
Gaming benchmarks are mostly flyby 3D renders without any user input, pathfinding, or AI. Maybe you get some scripted events, but even then it's in the perfectly controlled environment that the benchmark creates; it's little more than 3D rendering with the help of the game engine.
 

InvalidError

Titan
Moderator
Gaming benchmarks are mostly flyby 3D renders without any user input, pathfinding, or AI
Doing a benchmark without any of those things would require fully pre-recorded animation, wouldn't be much of a benchmark, and would require a stupid amount of extra storage. The more logical way of doing a repeatable benchmark is to replay inputs using fixed/recorded PRNG seeds for AIs and other algorithms, in which case the load on the game engine is effectively the same as actual play, since all of the pathfinding, AI, and whatever else needs to be recalculated for playback.
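A minimal sketch of that replay idea (all names and numbers here are hypothetical): record the inputs once, then replay them against a fixed PRNG seed so the "AI" decisions, and therefore the engine workload, are recomputed identically every run:

```python
import random

def run_benchmark(recorded_inputs, seed=1234):
    """Replay recorded inputs with a fixed seed so 'AI' decisions repeat."""
    rng = random.Random(seed)             # deterministic stand-in for AI/pathfinding
    state = 0
    for player_input in recorded_inputs:
        ai_decision = rng.randint(0, 3)   # recomputed each run, not pre-recorded
        state = (state * 31 + player_input + ai_decision) % 1_000_003
    return state

inputs = [1, 0, 2, 2, 1]                               # "recorded" input stream
assert run_benchmark(inputs) == run_benchmark(inputs)  # identical replays
```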
 
Doing a benchmark without any of those things would require fully pre-recorded animation,
No, all you need is the camera's x, y, z position and view angles; you feed that into the 3D engine and out comes a pretty picture.
Here's a simple example: you can see him move the camera, and at the bottom are the coordinates.
And you don't even need a huge list of coordinates; you just generate the path algorithmically.
View: https://www.youtube.com/watch?v=K6EIYtTVJ0Y

wouldn't be much of a benchmark
Indeed most aren't.
and require a stupid amount of extra storage.
They are using the game assets that are already part of the game... so yes, a stupid amount of storage, but none of it extra.


Just look at the benchmark of Strange Brigade: they maxed it out to the complete limit. It's a completely static scene, whereas other benches at least pretend that some things are moving.
View: https://www.youtube.com/watch?v=RY9jPbaAOOo