News Ryzen 7 7800X3D Smashes Core i9-14900K in Factorio Gaming Benchmark

I like to keep things simple...;) As I said, lots of software is not optimized to run with maximum performance on the user's hardware. Maybe that is simple, but it is also true. I just see little advantage to buying slower hardware simply because some software, limited in its hardware support or overall design, won't run very fast with it. Your point seems to be that I should buy a slower/less capable CPU because my software might not be able to take advantage of a much faster one. Now, if I were considering a business situation where such limited software would be all that I might run, then maybe that would make sense. But it would only make sense if that slower CPU were cheaper and a better buy, I think. And then, suppose the software is revamped to take advantage of hardware the slower CPU simply doesn't have? For me, it's best bang for the buck, always. So we'll have to agree to disagree, I suppose.
Yup, this train of thought worked great for Nvidia with tessellation and now with rendering: just drown games in tessellation or rendering work they don't really need, purely to make your cards look much faster than the competition, and people will go and buy them, "take my money" style.

Having games that actually need that much cache is fine and fair, but you can't optimize a game that doesn't need much cache into using much cache; you can only make it chew through cache uselessly to make a product look faster than the competition.

Devs will start making more games that need large amounts of cache if and when the consoles get large caches, and even then, not all games can use that much cache.
 
But isn't it the whole point to have software and hardware that augment each other? What is the point of spamming cores, when some software doesn't fully utilize them? So it is time someone thinks outside the box and doesn't just focus on increasing cores and/or clock speed.
But what is the point of spamming cache, when some software doesn't fully utilize it?
Both Intel and AMD make their normal line-up of CPUs with the amount of cache they use because that's what the golden mean is.
 

ilukey77

But what is the point of spamming cache, when some software doesn't fully utilize it?
Both Intel and AMD make their normal line-up of CPUs with the amount of cache they use because that's what the golden mean is.
That is when cache and cores become king ..

which is why Intel is so far behind ..

Once AMD has mastered the high-end X3D CPUs, Intel will be grabbing their ankles and coughing ..

If they want to stay in the competition, they need to be working on bigger-cache, higher-core-count CPUs to compete!!

There is literally no logical reason, once an 8950X3D has its cache stacking and core count sorted and can crush Intel in production and all forms of gaming, to even bother with Intel!!

(Not to mention efficiency, which is why Intel's 13900K brute-forcing its wins with massive power draw is hurting them as well.)
 

bit_user

But isn't it the whole point to have software and hardware that augment each other? What is the point of spamming cores, when some software doesn't fully utilize them? So it is time someone thinks outside the box and doesn't just focus on increasing cores and/or clock speed.
I'm sure Intel and AMD do detailed performance analysis of common workloads to see where the bottlenecks are. The idea of adding cache to help alleviate memory bottlenecks isn't new, but going to the trouble of chip-stacking to do it was a first*.

There's another thing AMD did that some might not appreciate, but that's to optimize L3 cache for local CCD access, at the expense of access by other CCDs. Because of this, Zen 4 has lower (same-CCD) L3 latency and higher L3 bandwidth than Alder or Raptor Lake.

Here's the latency diagram for the Zen 4:

[Chart: Zen 4 memory latency (ns)]

Source: https://chipsandcheese.com/2022/11/08/amds-zen-4-part-2-memory-subsystem-and-conclusion/
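
Latency curves like the one above are typically produced with a pointer-chasing microbenchmark. Here's a minimal sketch of the idea in C (my own illustration, not chipsandcheese's actual harness): each load depends on the previous one, so the time per step approximates load-to-use latency at a given buffer size.

```c
/* Minimal pointer-chasing latency sketch (illustrative only).  Each load
 * depends on the previous one, so the time per iteration approximates
 * load-to-use latency at that buffer size. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase_ns_per_load(size_t bytes, size_t iters)
{
    size_t n = bytes / sizeof(size_t);
    size_t *buf = malloc(n * sizeof(size_t));
    if (!buf) return -1.0;

    /* Sattolo's algorithm: one big random cycle, so the chase visits every
     * slot and the prefetchers can't predict the next address. */
    for (size_t i = 0; i < n; i++) buf[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
    }

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        idx = buf[idx];                      /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = idx; (void)sink;  /* keep the loop from being elided */
    free(buf);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}

int main(void)
{
    /* Sweep buffer sizes from L1-resident up to DRAM-resident. */
    for (size_t kib = 16; kib <= 256 * 1024; kib *= 2)
        printf("%7zu KiB: %6.1f ns/load\n",
               kib, chase_ns_per_load(kib * 1024, 5u * 1000 * 1000));
    return 0;
}
```

Compile with something like gcc -O2; the reported ns/load should step up each time the buffer outgrows another cache level.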

Here, although Zen 4 lags on L1 & L2 bandwidth, it really pulls ahead on single-core L3 and memory bandwidth:

[Chart: Zen 4 single-threaded bandwidth]


Incidentally, @Alvar "Miles" Udell, if you look at these two charts, they show why I'm sure that speedup you pointed out is a mere coincidence. The latency of L3 is less than 1/8th that of DRAM, meaning at most 8x the transaction rate via L3 compared with DRAM. In terms of bandwidth, the ratio is a more modest 2.7x. Together, these give you a rough upper bound on how much one could gain by working exclusively out of L3 vs. DRAM.
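
To make that bound concrete, here's a back-of-the-envelope sketch in C. It assumes a purely latency-bound workload and uses the ~8x L3-vs-DRAM latency ratio read off the chart above; the hit fraction f is just a free parameter, not a measured value.

```c
/* Amdahl-style upper bound: if a fraction f of formerly-DRAM accesses are
 * instead served from L3 at ~1/8th the latency, the best-case speedup is
 * 1 / ((1 - f) + f / 8).  A rough model, not a measurement. */
#include <stdio.h>

int main(void)
{
    const double l3_speedup = 8.0;   /* approx. DRAM latency / L3 latency */

    for (double f = 0.5; f <= 1.0001; f += 0.1)
        printf("f = %.1f  ->  max speedup %.2fx\n",
               f, 1.0 / ((1.0 - f) + f / l3_speedup));
    return 0;
}
```

Even with every access served from L3 (f = 1.0), this model tops out at 8x; anything beyond that would have to come from something other than raw memory latency.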

And for those of us wondering how the latency situation might've improved in Raptor Lake: apparently, it didn't. The only differences noted there were due to the cache size increase.


Not analyzed there were some of the ring bus tweaks that primarily affect core-to-core communication.

* We could debate about Ponte Vecchio, but that's a GPU and also its cache tile is self-contained. The impressive thing AMD did was use 3D stacking to extend a chiplet's integral L3.
 

bit_user

Both Intel and AMD make their normal line-up of CPUs with the amount of cache they use because that's what the golden mean is.
There's nothing magic about their cache size decisions. It has nothing to do with "golden mean", but rather a cost/benefit tradeoff they made to balance system-level bandwidth, latency, scalability, power, and cost. I'm sure these are tricky and very much data-driven decisions, based in part on the costs & characteristics of the process node and detailed performance analysis.

Increasing cache sizes typically comes at the expense of latency and cost. Power is probably a secondary concern. These CPUs are designed to maximize performance/$ for the desktop, and that's compromised by the need to also provide good performance/W for servers and laptops.

The E/C-core development takes us in an interesting direction, wherein the P-cores are unleashed to pursue even better performance at the expense of greater cost and higher power, because they don't need to scale very well. E/C-cores will always be superior for highly-threaded workloads, so you can balance a few really fast, huge, power-hungry P-cores with lots of E/C-cores, in order to achieve good scaling on smaller & mid-sized CPUs. I see large CPUs increasingly going in the direction of E/C-cores only, especially as P-cores get bigger and more power-hungry.

This is best illustrated by the absolute bloodbath that ensued when AMD's 128 C-core Bergamo processor was pitted against Intel's Sapphire Rapids. Go to the last page of this article, to see the Geomean scores & power figures:

The stock Bergamo system was tied for the lowest power, yet it smashed everything else - including AMD's own Genoa-X!
 

n19htmare

It's not the fact that it's faster that is noteworthy, it's the margin by which. 64% is extreme.
Which correlates with the extra cache on the 7800X3D (96 MB vs. 36 MB). If the map fits entirely in the cache, then it's going to be that much faster.
What happens when the map doesn't entirely fit in the cache and has to go out to memory? Have you seen those benchmarks? Unless you're strictly keeping maps at 96 MB to fit the 7800X3D, this is useless.
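
That cliff is easy to demonstrate with a toy "map update" loop that touches entities in random order and reports ticks per second as the working set grows. This is a hypothetical sketch, not Factorio's actual update loop, and the 36 MiB / 96 MiB figures are just the L3 sizes being discussed.

```c
/* Toy "map update" sweep: touch every entity once per tick, in random order,
 * and report ticks/second as the working set grows.  Illustrative only; the
 * throughput should drop noticeably once the working set no longer fits in
 * L3 (~36 MiB on Raptor Lake, ~96 MiB on the 7800X3D). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct entity { uint64_t state; uint8_t pad[56]; };   /* one 64-byte line */

int main(void)
{
    for (size_t mib = 8; mib <= 512; mib *= 2) {
        size_t n = (mib * 1024 * 1024) / sizeof(struct entity);
        struct entity *map = calloc(n, sizeof(*map));
        size_t *order = malloc(n * sizeof(size_t));
        if (!map || !order) return 1;

        /* Shuffle the visit order so the pass isn't a simple streaming scan. */
        for (size_t i = 0; i < n; i++) order[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }

        const int ticks = 20;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int tick = 0; tick < ticks; tick++)
            for (size_t i = 0; i < n; i++)
                map[order[i]].state += 1;              /* the "update" */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("%4zu MiB working set: %7.1f ticks/s\n", mib, ticks / sec);
        free(map);
        free(order);
    }
    return 0;
}
```

On a part with 96 MiB of L3, the curve should stay flat further out than on a 36 MiB part, which is exactly the effect being argued about here.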
 
There's nothing magic about their cache size decisions. It has nothing to do with "golden mean", but rather a cost/benefit tradeoff they made to balance system-level bandwidth, latency, scalability, power, and cost.
When did the golden mean become magic?! It's science, or philosophy if you will; it's about finding the best spot between two extremes, in this case cost and benefit.
The golden mean or golden middle way is the desirable middle between two extremes.
 

bit_user

With the performance differences between the two always having been rather minuscule and debatable,
Always??

Prior to Zen, AMD wasn't even in the game. Zen 1 was only good for scalable workloads, where you could take advantage of the extra cores. For lightly-threaded tasks, it wasn't very competitive against Skylake/Kaby Lake.

Zen 2 was a big leap for AMD, almost catching Intel while they were largely stalled by their 10 nm troubles and the logjam effect that had on microarchitecture improvements. Zen 3 cemented AMD's lead over Intel's 14 nm CPUs.

Since then, it's been a game of leap-frog, with the lead going back & forth: Golden Cove, then Zen 4, then Raptor Cove. Yes, they're now close enough that you can afford to step back and make a real cost/benefits-based decision, rather than being stuck in one camp or the other.

I believe that the ongoing 'love and hate' relationship between Intel and AMD only has two outcomes. One is either a Blue or a Red fan, no matter all of the back-and-forth performance rhetoric.
Some of us are just trying to analyze what we see. One doesn't always have to take sides.

If you're not into these levels of details, then I think you'd best be served just by looking at the benchmarks by respected reviewers, and leave it at that. I think the technical details are only worth following if you're into that sort of thing - and mostly for their own sake, rather than to try and inform purchasing decisions.

But like my Mother always said: “A wish is just a wish and dreams very seldom come ever true!”
I respect the dreamers and doers who bring their thoughts and ideas into reality, in the form of these amazing machines. I want to appreciate the challenges they faced, how they tried to overcome them, and celebrate their achievements, when they did. I'm as much on their side as I am the consumer's.
 

bit_user

When did the golden mean become magic?! It's science, or philosophy if you will; it's about finding the best spot between two extremes, in this case cost and benefit.
Eh, that's such a generic statement it's effectively meaningless. Modern CPU design is a game of cost/benefit tradeoffs. It's also very data-driven, with whole chip-level simulations being used to run a variety of workloads, in order to test out different parameter combinations.

To me, calling it "Golden Mean" suggests there's an obvious answer. It seems dismissive of all the work that goes into making these decisions and how context-dependent they are.

The other thing is that it doesn't seem to acknowledge there's not one right answer for everyone. The reason AMD didn't put 3D V-Cache on all of their Ryzen 7000 desktop CPUs or EPYC Genoa CPUs is that they recognize it's not the best cost/benefit tradeoff for everyone. This is especially notable with EPYC, where it never even reaches the clock ceiling that the Ryzen X3D CPUs hit. So, the only real tradeoff there is just perf/$, and that's highly workload-dependent.
 

bit_user

You are like one of those people in sitcoms that always understands something the wrong way just so that the plot can happen...
I explained why I didn't like that term, and I did it without an ad hominem attack. I accept that different people can hear the same words differently, which is why I took the time to explain my perspective. You're welcome to disagree with me, but please do so respectfully.

To suggest that I'm just trying to create drama completely misses the substance of my posts. I make these posts to inform myself & others. It's not as if I don't have other things I could be doing. If you don't find them worthwhile, you're quite welcome to ignore them.
 

bit_user

To help appreciate the evolution of CPU caches, I've made a couple charts.

[Chart: AMD (Zen) cache sizes by generation]

Since Zen 1, AMD has been relatively conservative about increasing cache size. The only changes they made were:
  1. Doubling L3 from Zen 1 -> 2
  2. Doubling L2 from Zen 3 -> 4
  3. Introducing optional 3D V-Cache in Zen 3
If we include 3D V-Cache, that does map to exactly one change per generation.

For Intel, I thought it'd be interesting to go back a bit further:

[Chart: Intel cache sizes by core generation]

What's striking is how much Intel seemed to like the 64/256/2048 combination, which lasted all the way until their first 10 nm core. At that point, I guess they decided it made sense to have a smaller, lower-latency L1D cache and double L2. (FYI: Palm Cove was the core of the ill-fated Cannon Lake CPU.) Then, in Tiger Lake's Willow Cove, they boosted L2 again to 1.25 MiB, where it stayed until Raptor Cove.

It might look odd to have nearly as much L2 cache per core as L3, but two things to keep in mind are that:
  • L3 is shared by all cores, while L2 (for P-cores) is private.
  • Around the time Intel started increasing L2, I believe they switched their L3 to a victim cache, which effectively makes L3 supplemental rather than redundant.

BTW, to keep the comparison nice & clean, I limited the stats on Alder & Raptor Lake to just the P-cores.
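
As a rough illustration of that second bullet, here's a tiny effective-capacity sketch. The numbers are Raptor Lake-ish assumptions (8 P-cores with 2 MiB of private L2 each, 36 MiB of shared L3), and the model is deliberately simplified: an inclusive L3 duplicates whatever sits in the L2s, while a victim (exclusive-style) L3 holds lines the L2s evicted, so the two tiers roughly add up.

```c
/* Simplified effective-capacity comparison for an inclusive vs. a victim L3.
 * Numbers are Raptor Lake-ish assumptions, not official figures. */
#include <stdio.h>

int main(void)
{
    const double l2_per_core_mib = 2.0;
    const int    p_cores         = 8;
    const double l3_mib          = 36.0;
    const double total_l2_mib    = l2_per_core_mib * p_cores;

    /* Inclusive: every L2 line also occupies L3, so distinct data <= L3. */
    printf("inclusive L3: ~%.0f MiB of distinct cached data\n", l3_mib);

    /* Victim: L3 holds what the L2s evicted, so capacities roughly add. */
    printf("victim L3   : ~%.0f MiB of distinct cached data\n",
           l3_mib + total_l2_mib);
    return 0;
}
```

In other words, under this simplified view the large per-core L2 isn't wasted capacity; once L3 stops duplicating it, it grows the total on-die footprint, which is why I called it supplemental rather than redundant.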
 
A real 'black mark' for Intel - releasing a new CPU that performs worse than an existing AMD.
This isn't the first time a processor has been released that performed worse than its predecessors.

Also, I will note that you are basing that statement on one benchmark. This is also why they run multiple benchmarks, including real games, synthetics, etc. Not every processor is designed the same way, and different programs will work differently with those processors.
 

bit_user

For Intel, I thought it'd be interesting to go back a bit further:
Since that was kinda fun, I decided to keep going. I went all the way back to the first Pentium, since that's the point where I think they first subdivided L1 into separate halves for code & data. Note that I didn't include all versions, but essentially just the ones which best slotted into the process node progression.

[Chart: Intel cache sizes from the original Pentium onward]

The first thing to note is that Pentium Pro & II (Deschutes) had L2 on a separate die. That's why we see the regression in Coppermine, which is monolithic.

Next, it's a little weird how the first two Netburst cores had smaller L1D than the Pentium II and Pentium III. Maybe that was done to reduce latency or in an attempt to reclaim some die-space.

Presler is the newer of the Pentium Ds and was the two-die version. I think only the Extreme Edition might've sold with 2 MiB L2 enabled, but I decided to include that spec to be consistent with showing what the actual hardware could do.

Core 2 is where the previous plot started. It seems pretty clear that Nehalem simply took advantage of having L3 to make L2 leaner and meaner. That formula seems to have been successful enough to last for an entire decade and across 4 nodes!
 
Since that was kinda fun, I decided to keep going. I went all the way back to the first Pentium, since that's the point where I think they first subdivided L1 into separate halves for code & data. Note that I didn't include all versions, but essentially just the ones which best slotted into the process node progression.
Missing Tualatin for great shame! I kid, but they were fantastic CPUs which Intel intentionally kept away from the consumer market for the most part due to the P4 being trash in comparison.
 
That part surprised me. Intel calls it Intel 7, in order to align themselves with how TSMC and Samsung call their roughly comparable nodes.

However, the very same node currently known as Intel 7 was once called 10 nm ESF (Enhanced SuperFin). So, the author isn't wrong to say "10 nm".

Hope Intel 4 (7 nm) for Meteor Lake is mass-production ready. I don't know if this is the first time it's being used in their consumer chips, or what the yield rates are like.

Any idea if Intel has disclosed a process density comparison with its competitors again for Intel 4?
 
Hope Intel 4 (7 nm) for Meteor Lake is mass-production ready. I don't know if this is the first time it's being used in their consumer chips, or what the yield rates are like.

Any idea if Intel has disclosed a process density comparison with its competitors again for Intel 4?
Intel 4, at least from the public information Intel has released, is a one-and-done as far as their products are concerned.

None of the numbers may really matter at all (aside from giving a clue about Intel 3), but AFAIK this is still the latest: https://www.anandtech.com/show/1744...il-2x-density-scaling-20-improved-performance
 

HWOC

I don't understand why it seems to be hard for many people to accept that the X3D chips excel in some workloads and are pretty bad in others. So what if it's faster than Intel's greatest in a couple of games, or that it sucks in some workloads? No one is forced to buy one! And no one is forced to buy the latest Intel chip either.

I just got my 5800X3D delivered, to replace a 5600X, and I don't expect to see any real-world difference in any game that I play whatsoever. :LOL: I bought it because I love the brutalist idea behind it: just slap more concrete on top of the existing old building. Or, you know, something like that, but in CPU terms. I do play Factorio, but I've never had an issue with it running slowly. And heavier games are bottlenecked by my RTX 3070 anyway, running 3 x 1080p monitors. But a small drop of saliva does drip from the side of my mouth when I think about that beautiful stack of cache inside that chip... :yum:
 

bit_user

I don't understand why it seems to be hard for many people to accept that the X3D chips excel in some workloads and are pretty bad in others. So what if it's faster than Intel's greatest in a couple of games, or that it sucks in some workloads?
What's notable is the amount of benefit it provides. I don't think we've ever seen it provide such a completely massive benefit before.

Others have noted the benchmark could be artificial and not representative of normal gameplay.
 

bit_user

I'm confused, why is the 7800X3D listed twice?
The results appear to have been taken straight from the leaderboard, which indeed has those two separate entries. Can't say why. The top of the page contains the following notice:

"Results vary depending on CPU clock, memory clock, memory timings, etc., so take these numbers with a grain of salt."

Link from the article:

The top result does appear to be an outlier, FWIW.
 

HWOC

For the top two 7800X3D results, the first one is stated to be using 2 x 32 GB 505 MHz RAM, the second 2 x 16 GB 600 MHz RAM. Factorio is known to love low-latency memory; the timings were not specified in the scores.
 

NinoPino

For me, what's a little surprising is that its sweet spot lands so squarely within the additional capacity provided by the 3D VCache. Had its working set been just a bit larger or smaller, maybe the X3D models' advantage would've evaporated.
It's not for sure; maybe a larger cache could improve things a lot more. Maybe the map they chose to benchmark fits right within that specific X3D cache size.

I'll bet the game's developers probably weren't aware of this. Given a bit of time, they could probably optimize it to work a lot better on non-X3D models.
I agree with you, but developers cannot do miracles: if the map is large, has a lot of agents, etc., the memory footprint cannot shrink below a certain natural limit.

I don't know about you, but I'd be pretty surprised to see a game which scaled well to so many cores. That would also be newsworthy!
Off the top of my head, I can think of chess engines or fully software-rendered games, and I bet there are a lot of games that could potentially use many cores if optimized that way.
The problem is that in recent years the power of graphics cards has relegated the CPU's importance to the background. So who bothers to optimize the CPU side if it's limited by GPU power anyway?
 

NinoPino

I have no idea where you got that idea.
I think the reason @waltc3 says so is that you insist on arguing as if cache size is not very important.

The only point I tried to make is that the benefit of having more cache is workload-dependent.
This is an obvious point, and all of us know this.
But if I ask you to choose between a CPU with 1 MB of cache and another with 4 MB, with the same clock, architecture, etc., which CPU do you think goes faster?
Generalising: more cache is better, do you agree?
 