Discussion: AMD Ryzen



Sounds realistic to me. If that's how it turns out, could be a pretty nice boon for market share, at least from system builders. I would definitely consider that for me and my girlfriend.
I've forgotten, has anything been "confirmed" or close to confirmed in terms of PCIe lanes?
 


I would expect the hex-core to be the best bang for the buck as well. Recent benchmarks I've seen show six cores even @3GHz outperforming quad cores @3.6GHz. Game engines are slowly getting better at using more cores, and the architecture of the latest consoles has likely played a big part in pushing that along.
 


Yeah, AMD have been planning for that ever since they got the Xbox One, PS4, and Wii U using their APUs. If most game developers want to write games for the Xbox and PS4 and port them to PC (or the other way around), they need to use all the cores there are, since the PS4 and Xbox don't have that much horsepower per core.
There have been two roads for CPUs: the first, which Intel took, was making CPUs with ever-higher IPC (instructions per clock); the second, which AMD took, was making CPUs with more cores. And since games were all single-threaded back in the day, Intel became the "future proof" system everyone got, and AMD was left with less than 5% market share. Now the "future" has just made a turn, a turn in which more cores matter. That is why Intel came out with the 10-core CPU.
But since IPC matters whether you're multi-core or not, it's a win-win for us users :)
 


As I've noted for years, the PS3 and 360 already had heavily multi-threaded CPUs (the 360's tri-core Xenon ran six hardware threads, and the PS3's Cell had even more cores), so can we please stop the "more threading because new consoles" nonsense? You're repeating the same arguments from 2006 all over again. We're getting slightly more threading in GPU driver land because the APIs (DX12/Vulkan) now allow it, but as we're already seeing, this isn't leading to any significant performance gain, but rather a general lowering of baseline requirements due to reduced CPU overhead.

The situation with regard to threading in games is the same as it's ever been. Games, by design, simply do not scale beyond a handful of useful threads, because they have tasks that need to be performed in a specific order. As a result, games will always favor more per-core performance rather than more cores.

Seriously, we're just repeating the same exact arguments we made back in the BD thread from 2009. Don't make me start quoting my arguments from eight years ago, because they are just as correct today as they were then.
 


The six-core was between 2% (unnoticeable) and 12% faster than the quad-core (when using the GTX 1080). When using the GTX 1070 the gap was even smaller. A quad-core at 4GHz (i.e., 11% higher clocks than what they tested) would close the highest gap they measured (12%) with the six-core and win in the rest of the benches, where the gap was smaller.

Independent DX12 benchmarks show similar results:

We benchmark an eight-core CPU in three DirectX 12 tests, and the results may disappoint you.

I’d say for the vast majority of gamers, the sweet spot lies somewhere between a quad-core with Hyper-Threading and a six-core on the Intel side of the aisle. A Skylake Core i5-6600K will be fine for DirectX 11 games and probably the vast majority of the early DirectX 12 games, but the lack of Hyper-Threading will eventually hurt.

I think a highly clocked quad-core i7 will remain a better choice than a six-core or eight-core for the next few years.
 


The core count on the consoles was chosen by game developers, and they rejected anything above 8 cores (i.e., more than 8 threads) because it would be difficult to program for.

It is worth recalling that traditionally only six cores (six threads) have been available for games, with both Sony and Microsoft giving developers partial access to a seventh core only recently. Therefore games will be optimized for 6–7 threads and will be handled well by a 4C/8T processor with higher clocks.

Also, Intel didn't come out with the 10-core CPU because games will use more threads. Benchmarks show that 8C/16T is already irrelevant even in the most favorable cases (synthetic gaming benchmarks). Intel has increased the core count to ten because Intel always increases the core count with each new die shrink, thanks to the extra die space. This has been the pattern since Intel's 45nm chips at least:

45nm: 4-core
32nm: 6-core
22nm: 8-core
14nm: 10-core
10nm: ______

It is not difficult to predict that core count will be increased again for Intel 10nm.
 
*sigh*

Games have been using over 50 threads for decades now; the problem has always been that, at the end of the day, only one or two of them can do meaningful work at any one point in time. DX12 allows the GPU thread to be broken up into smaller units, but that by itself is only going to help lower-class CPUs, and really won't affect more powerful i5/i7 parts, as they weren't bottlenecked in the first place. The rest of the game will still largely remain single-threaded, simply because that's what game processing demands.

Seriously, we're repeating the same exact arguments again. "Consoles are using six cores, so PCs will scale that well too, so BD is a great design for the future". "Quad core CPUs are dead". And so on. Nothing has changed, and anyone who understands how to actually write software knows that games will continue to be dominated by a handful of threads that do the majority of the work. Period.
 
I really am confused by the high core count on some processors. I was contemplating the 6800K, but the benchmarks Tom's did showed that the 6700K was quicker in most of the workloads. Does that mean only a handful of developers program their tools with more than four cores in mind? How can you even tell whether something will profit from more cores, apart from direct benchmarks?

Seeing all this made me doubt Zen a bit. If they can't deliver single-core performance, then it seems there will be no need for an 8-core CPU, apart from server applications, maybe.
 


It's really quite simple: Most tasks CPUs perform simply do not scale well.

CPUs are good for doing one workload at a time.
GPUs are good for massively parallel workloads.
Multi-Core CPUs are good for doing multiple independent workloads at the same time.

The "independent" part is the problem. Take games: You have multiple parts of the game engine that need access to the same data. As a result, you have to perform most of the processing in a specific order to ensure memory is not corrupted, which limits how much extra performance you can gain by adding more CPU cores. That's why games typically have two heavy threads [Main game engine and Main render thread], plus some helper threads, which explains why top-tier i3 processors can offer similar performance to i7's, or why Pentium class CPUs can sometimes hang with the FX-8350.

Another way to think of it: Take a dual core CPU. Now remove one CPU core, but double the clockspeed. Both of these CPUs have the same theoretical maximum performance, but the higher clocked single core CPU will ALWAYS offer better performance, because it's harder to keep both CPU cores of the dual core CPU fed.
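
Back-of-the-envelope version of the same point, using Amdahl's law (the 70% parallel fraction is an invented number, purely for illustration):

[code]
#include <cstdio>

// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
// fraction of the work that can actually run in parallel.
int main() {
    const double p = 0.7;                 // assume 70% of the work parallelizes
    const int counts[] = {1, 2, 4, 8};
    for (int n : counts) {
        double s = 1.0 / ((1.0 - p) + p / n);
        std::printf("%d core(s): %.2fx\n", n, s);
    }
    // Prints 1.00x, 1.54x, 2.11x, 2.58x. A single core clocked 2x higher
    // is a flat 2.00x on everything, beating the dual core's 1.54x.
}
[/code]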
 


It's relevant to Zen, as the perception is that it will not be able to clock as high as Intel parts. So they will again need to use more cores to remain competitive.

Example here, where you can see a six-core at a lower clock slightly outperforming a quad-core at a higher clock.
[two benchmark screenshots]


More details: http://forums.anandtech.com/showthread.php?t=2476469
 
Ah, I wasn't very specific. I meant video/photo rendering and such: applications where I thought that more cores would make a tremendous difference, but it seems that is not the case, or at least seldom. Apart from synthetic benchmarks, of course.
 


That is oversimplifying it, and it's not entirely correct. If you need artificial locks on data structures or incur context switches, then having twice the per-core performance won't make the single core better than the dual core. You will still need to move data across different memory spaces, and the end result would be the same in either case.

At the end of the day, having more cores is always better, because the *main* threads can run on cores less obstructed by other threads that want some CPU time. The problem with having 50 threads is when they're not spread evenly across separate processes (each with its own memory space and CPU slice). When devs start actually using threading as it is meant to be used, you will see a massive uplift in performance. I can give you a very simple example: JBoss shows *massive* performance scaling with more cores available, going from 1.4 to 1.5 and then to 1.6. When you start actually making the atomic algorithms inside the program go wide, you can't go wrong.

For the "games" case, given the amount of stuff you need to synchronize, then you will need a decent amount of help from the CPU to make it worth your while. DX12 is the first step in the right direction. Async queues are a MASSIVE improvement for RTS'es and once Blizzard picks it up, you will notice why. Ashes is a good showcase, but it's just the first incarnation. Whenever you have a simple problem that can go as wide as you want, more cores are always the answer 😛

Cheers!
 
That is oversimplifying it, and it's not entirely correct. If you need artificial locks on data structures or incur context switches, then having twice the per-core performance won't make the single core better than the dual core. You will still need to move data across different memory spaces, and the end result would be the same in either case.

If you have two threads that need access to the same structure at the same time, you shouldn't have two threads to start with. Putting that aside and taking your example, the single-core case would be slightly faster, simply because you remove the overhead involved when the CPU has to transfer data across its per-core CPU caches, and the OS scheduling is slightly faster too (only one CPU to consider).

[quote]At the end of the day, having more cores is always better, because the *main* threads can run on cores less obstructed by other threads that want some CPU time. The problem with having 50 threads is when they're not spread evenly across separate processes (each with its own memory space and CPU slice). When devs start actually using threading as it is meant to be used, you will see a massive uplift in performance. I can give you a very simple example: JBoss shows *massive* performance scaling with more cores available, going from 1.4 to 1.5 and then to 1.6. When you start actually making the atomic algorithms inside the program go wide, you can't go wrong.[/quote]

Disagree. You now get into the icky world of Bandwidth versus Latency.

Take the Linux perspective for a moment: every thread gets an equal chance to run and is dispatched to a single CPU core, with the cores rebalanced periodically. Seems ideal, right? Until people found that, whoops, high-priority threads were waiting excessive periods of time to execute, and issues with the core load-balancing algorithms were found. Sure, you got nice latency numbers, but the Linux approach is crap when you have just a handful of threads you really don't want interrupted.

In that regard, Windows is the better solution: the highest-priority thread(s) always run, every other thread in the system be damned. Running threads have their priority slightly lowered; waiting threads have their priority raised. This ensures applications that have a few heavy threads [like games] get optimum performance at the expense of the rest of the system [it's also why running other high-priority applications, like FRAPS, kills gaming performance].
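
A toy model of that Windows-style rule (the numbers are purely illustrative, and this is nothing like the real NT scheduler internals):

[code]
#include <algorithm>
#include <iostream>
#include <vector>

struct Task { const char* name; int priority; };

int main() {
    // Two heavy "game" threads plus two background threads.
    std::vector<Task> ready = { {"game-engine", 10}, {"render", 10},
                                {"updater", 4}, {"indexer", 3} };
    for (int tick = 0; tick < 8; ++tick) {
        // The highest-priority ready task always runs.
        auto run = std::max_element(ready.begin(), ready.end(),
            [](const Task& a, const Task& b) { return a.priority < b.priority; });
        std::cout << "tick " << tick << ": " << run->name << "\n";
        for (auto& t : ready) ++t.priority;   // waiting tasks get boosted
        run->priority -= 2;                   // the running task decays
    }
    // The two heavy threads hog the CPU; the background tasks only creep
    // in once their boosts catch up, instead of getting an equal share.
}
[/code]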

The optimum CPU is an infinitely fast single core. Failing that, I'll take the faster cores over more cores pretty much every time, if total CPU performance is equal. The faster cores have a lower chance of being individually bottlenecked, and if they aren't, the only downside compared to the multi-core solution is slightly higher latency.

For the "games" case, given the amount of stuff you need to synchronize, then you will need a decent amount of help from the CPU to make it worth your while. DX12 is the first step in the right direction. Async queues are a MASSIVE improvement for RTS'es and once Blizzard picks it up, you will notice why. Ashes is a good showcase, but it's just the first incarnation.

Again, if you need to synchronize, you shouldn't have threads there.

All DX12 is doing is breaking up the render thread. The total amount of work done by the CPU is still roughly the same. And if a core wasn't bottlenecked prior, there's no real performance gain as a result of breaking up the render thread. Hence why lower tier CPUs see the largest performance gains.

Async queues are basically the GPU form of pipelining: simply keeping currently unused portions of the GPU doing work to improve performance. Nothing special, aside from the fact that DX11 didn't allow this type of processing.
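
A rough sketch of what "breaking up the render thread" amounts to, in plain C++ (no real D3D12 calls here; CommandList and recordChunk are invented stand-ins): recording spreads across cores, but submission still happens in order, and the total work stays about the same.

[code]
#include <future>
#include <iostream>
#include <string>
#include <vector>

using CommandList = std::vector<std::string>;   // stand-in for a real command list

CommandList recordChunk(int chunk) {            // each task records independently
    return { "draw batch " + std::to_string(chunk) };
}

int main() {
    std::vector<std::future<CommandList>> jobs;
    for (int c = 0; c < 4; ++c)                 // DX12-style: record in parallel
        jobs.push_back(std::async(std::launch::async, recordChunk, c));

    for (auto& j : jobs)                        // ...but submit sequentially, in order
        for (const auto& cmd : j.get())
            std::cout << "submit: " << cmd << "\n";
}
[/code]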

Whenever you have a simple problem that can go as wide as you want, more cores are always the answer 😛

Problem is, most tasks that are performed on the CPU aren't wide problems. That's why we developed GPUs, to handle those EXACT types of problems.
 


Welp, the company firewall decided to eat my POST data, so all the text was lost. I'm too lazy to re-write everything, haha.

TL;DR was: you can grab *any* program's source code today and make it parallel, extracting more performance from it. Current frameworks just waste a lot of resources doing stupid threading they don't need, while the crucial parts that could be made parallel aren't, because it's complex.

Also, Dijkstra-type problems (routing) and sorts: they need synchronization, but if they are not made parallel, their performance sucks.
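
Something like this minimal parallel merge sort sketch, say (the depth cutoff is an arbitrary choice for illustration): the two halves run independently, but every merge is a synchronization point where one side has to wait for the other.

[code]
#include <algorithm>
#include <future>
#include <iostream>
#include <vector>

void psort(std::vector<int>& v, std::size_t lo, std::size_t hi, int depth) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    if (depth > 0) {
        // The halves are independent, so they can go wide...
        auto left = std::async(std::launch::async,
                               psort, std::ref(v), lo, mid, depth - 1);
        psort(v, mid, hi, depth - 1);
        left.get();                      // ...but the merge must wait: sync point
    } else {
        psort(v, lo, mid, 0);
        psort(v, mid, hi, 0);
    }
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}

int main() {
    std::vector<int> v{5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
    psort(v, 0, v.size(), 2);            // depth 2 -> up to 4 concurrent tasks
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
[/code]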

Cheers!
 


That is just what I replied above:

http://www.tomshardware.co.uk/forum/id-2986517/discussion-amd-zen/page-2.html#18113818

The gap between the 6-core @3GHz and the 4-core @3.6GHz was very small with the GTX 1080, and smaller still with the 1070. A quad-core @4GHz or higher clocks would tie or win.
 


If it were as easy as you say, then every game would run great on AMD hardware.
 


Well, if you notice, most software that can actually go wide runs very decently on AMD hardware (PD, that is). Then you have the second beast in the room, which is target optimization in software and compilers. We've already discussed it a million times, but to make it short: most people developing software can't target AMD systems as their primary platform. Intel is way more widespread on the one hand, and on the other they provide the best compiler out there for Windows AND Linux. Only on Linux, through GCC/G++ and LLVM, can you see how good PD can be when properly optimized for. It still won't reach/touch Intel, but it's way, way better than on Windows.

Cheers!
 


First off, you're generalizing. "Most software". "Decently". Let's be clear: the types of applications that show off PD the best tend to be benchmarks that, heaven forbid, use the types of algorithms that scale across CPU cores. Which isn't shocking in the least; I predicted BD would be a benchmarking queen over two years before it released!

Secondly, let's just stop with the scaling argument right now. Let's look at Fallout 4:

http://gamegpu.com/rpg/rollevye/fallout-4-test-gpu.html

[Fallout 4 CPU benchmark charts: overall, AMD, and Intel results]


About what you'd expect performance-wise, with the FX-8350 matching Haswell i5s. But note the core scaling, and note the performance difference between AMD and Intel. Despite the fact that FO4 is actually stressing all eight cores of an 8350, it's still losing to the lower-clocked 2600K. There's no CPU bottleneck being removed; you're just seeing the difference in IPC in the results. Both CPUs are spending about the same amount of time doing work, but Intel simply executes more instructions during that timespan. Simple as that.
 


I am generalizing, because it's true...



I don't know about FO4, but look at Ashes and then compare it to StarCraft II. That is more in line with what I am talking about.

Gamerk, StarCraft is a *dual*-threaded game by *design*. Even on Intel machines it won't go past loading two cores (at least in Wings of Liberty; I haven't checked the newer releases). Do you really believe an RTS would not benefit from more cores to give a better gaming experience? Given the nature of how it works, more cores should be utilized as they become available.

Any game that has a lot of things going on that are not graphical in nature and belong to "AI" should be threaded somehow (I am sure they are), and you can start counting other things as well. I remember how Crytek made CryEngine 3 use the CPU for some effects when they needed them, and it actually made CPUs with more cores show off a bit more. I also remember posting here a screenshot of RAGE using all of my Phenom II's cores at the time.

I bet DOOM, since it's based on id Tech 5+, should be very well threaded as well.

Cheers!
 
Gamerk, StarCraft is a *dual*-threaded game by *design*. Even on Intel machines it won't go past loading two cores (at least in Wings of Liberty; I haven't checked the newer releases). Do you really believe an RTS would not benefit from more cores to give a better gaming experience? Given the nature of how it works, more cores should be utilized as they become available.

And how would you do that? Let's face it, RTS AI basically boils down to "what units do I build" and "where do I move them". The AI is usually reduced to the point where entire groups of units are handled by one AI just to simplify the processing requirements. AI requirements in modern games are, frankly, trivial to process.

Look at it this way: say I make a unique thread for each individual unit produced, plus additional threads to manage the high-level processing for each player AI. OK, I've got 400+ threads, and I need to run every single one of them at least once (preferably at least a dozen times) in a 16ms window.

See the very fundamental problem here? There's simply not enough time to get every single thread dispatched and processed. The AIs will start to lag as the threads that service them sit waiting, taking action maybe two or three frames later than they otherwise would. All your threading has done is introduce a ton of latency into the AI subsystem.

Simpler solution: have one thread go through each cluster of units (grouping individual units together to reduce the amount you need to process), then throw the results back to the main game engine. You get through the AIs faster, you don't have any latency issues, and given that the total processing cost is MAYBE 10% of a single CPU core, the requirements can be considered trivial.
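
In code, the cluster approach looks something like this (Unit, Cluster, and the decision logic are all invented for illustration):

[code]
#include <iostream>
#include <vector>

struct Unit    { float x = 0, y = 0; int order = 0; };
struct Cluster { std::vector<Unit> units; };   // units grouped under one decision

// One pass on one thread: decide once per cluster, fan the result out to
// every member. No 400-thread dispatch, no latency piling up in the AI.
void aiUpdate(std::vector<Cluster>& clusters) {
    for (auto& c : clusters) {
        int order = 42;                        // evaluate threats, pick a move (stub)
        for (auto& u : c.units) u.order = order;
    }
}

int main() {
    std::vector<Cluster> armies(4, Cluster{std::vector<Unit>(100)});
    aiUpdate(armies);                          // 400 units, one thread, well under 16 ms
    std::cout << armies[0].units[0].order << "\n";
}
[/code]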

What people forget is that in gaming you have relatively strict processing deadlines to meet, and every additional thread you create means some other thread is going to be waiting. More threads equals more latency.

Any game that has a lot of things going on that are not graphical in nature and belong to "AI" should be threaded somehow (I am sure they are), and you can start counting other things as well. I remember how Crytek made CryEngine 3 use the CPU for some effects when they needed them, and it actually made CPUs with more cores show off a bit more. I also remember posting here a screenshot of RAGE using all of my Phenom II's cores at the time.

I did an analysis of CryEngine 3 back in the BD/PD thread. The extra workload comes mainly from 12 or so DX helper threads. The thread breakdown was essentially the main engine, the main render thread, twelve render helper threads, and about 50 threads that contribute less than 1% of the processing time each. So essentially, 99% of the CPU workload was either the game engine or DX-related. Which makes sense, as the stuff in the engine can't easily be made parallel.


Look people, putting aside the rendering process and everything that comes along with it, think about what a game actually does. Pretty much none of it, outside of AI, is even remotely taxing on a Pentium 4, let alone modern processors. And outside of Strategy games, even AI isn't terribly taxing. Almost all the CPU load in gaming is related to the rendering process; everything else is pretty much trivial. There's no benefit to threading anything there, and there are many situations where doing so would REDUCE performance.
 
Well gamerk, I have already given you two examples of how going wide actually helps (Ashes through DX12, and JBoss as a general program; RAGE as a bonus from the old days when we discussed dual vs quad). You gave FO4 as a counterexample showing it doesn't add much, but I can see in those graphs that dual cores are out of the window and you need *at least* a full quad-core (i5 or 8350) to play it. As time goes by, you will see the scaling keep moving in that direction.

I will keep my view that we still have a LOT of potential to unlock in current software (not only games) if we go wide. I have seen it up close and personal, and it's interesting how a simple problem that is almost impossible to solve in a linear fashion becomes trivial when you think wide. Mind you, they are not NP problems 😛

Balancing threads is complicated, and in no way am I saying that something designed to work in a linear fashion can suddenly, magically, be ported to work wider. It requires a redesign and a lot of effort in learning and executing a different paradigm.

You remind me of the people here at the office who still think mainframes are the ultimate solution to every problem, haha. Big monolithic designs.

Cheers!
 
A friend of mine likes to quote from some movie I've never heard of, about a bunch of lackluster superheroes. One of them is supposed to have the "strength of 3 men": not great by superhero standards, but he can still do as much as 3 people put together. He's also supposed to be "as fast as 3 men". Of course, the joke here is that 3 men can't run any faster than 1 man; just try it.

What 3 men can do is "more" which isn't exactly the same as "faster." Let's say you're helping a friend move from a downstairs apartment to an upstairs apartment. He's got 30 boxes of stuff to carry and it takes 10 minutes per box. It would take 1 man 30 trips for a total of 300 minutes. 3 men could do the job in 10 trips each, for a total of 100 minutes, 1/3 the time. Individually, none of them went any "faster" but together they did "more."

In both cases, it's 30 total trips. But with three people you're parallelizing the workload; they're all making trips simultaneously, and that's great. But it can't always work like that. Let's say he's got 30 boxes' worth of stuff to carry, but only 1 box to put it in. So you load up the box, take it upstairs, unload it, bring it back downstairs, refill it, etc. Now it doesn't matter how many people you have; you can only make one trip at a time. There's no way to do "more" here; the only way to improve things would be to try to make each trip "faster".

That's because the start of each trip is now dependent upon the completion of the previous trip. You can't start a new trip until the old trip finishes, in this case because you need the box to fill and refill. But you can imagine that box as being some data, and the loading/unloading as operations being performed on that data. If those operations have to be performed sequentially, parallelizing won't help.
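
In numbers (all figures straight from the story above):

[code]
#include <iostream>

int main() {
    const int boxes = 30, minutesPerTrip = 10;

    // Independent trips: the work divides evenly across the movers.
    const int teams[] = {1, 3};
    for (int movers : teams)
        std::cout << movers << " mover(s), 30 boxes: "
                  << (boxes / movers) * minutesPerTrip << " minutes\n";

    // One shared box: every trip depends on the previous one finishing,
    // so extra movers change nothing: 30 sequential trips either way.
    std::cout << "any number of movers, 1 shared box: "
              << boxes * minutesPerTrip << " minutes\n";
}
[/code]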



TL;DR more != faster
 