Discussion: AMD Ryzen



What you just described in your analogy is what I know as an "indivisible algorithm". I am *very* sure that a big chunk of the code out there hasn't reached that point yet. If you do reach that point, by means of the framework you're using or something, then there is nothing else you can do.

In your own example, you do recognize that if the box *can be divided* you can make good use of parallelism (whether it's faster or not is given by other factors). The same applies to *any* solution design in computer science: why make the box indivisible when you can go with smaller boxes for even *more* people? There is a reason Computer Scientists and Mathematicians make such good money when asked to optimize stuff. They re-design solutions to fit specific hardware; tailor-made solutions in software for specific hardware arrays and configurations, if you like. So, imagine you have control over how to create the box and you decide to make it indivisible. Then yes, making those 3 people carry the single box might not be as efficient/easy as making the stronger person carry it. You can modify your analogy to fit any scenario possible with CPU width and IPC, but what you and gamerk are missing here is that current code has not reached the point of threading where we can say "it can't go wider". Currently, most developers are at "I can't divide this box, because the framework doesn't let me"; enter DX12 to alleviate part of that. Then you have "ok, I divided the box, but then the box allocation is making a huge queue"; enter better schedulers! And so on.

I am not saying it is easy (again) but it is *doable* by your average code monkey with good architectural supervision.
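
To make the box analogy concrete, here is a minimal Java sketch (the summing workload is made up purely for illustration): the same "box" carried by one thread, then split across whatever cores the machine has.

import java.util.stream.LongStream;

public class DivideTheBox {
    public static void main(String[] args) {
        // One person carries the whole box: a single thread sums the range.
        long serial = LongStream.rangeClosed(1, 100_000_000L).sum();

        // Smaller boxes for more people: the common fork/join pool splits
        // the same range across however many cores the machine exposes.
        long parallel = LongStream.rangeClosed(1, 100_000_000L).parallel().sum();

        System.out.println(serial == parallel); // same answer either way
    }
}

Whether the parallel version is actually faster depends on how big the box is and how much it costs to split it, which is exactly the trade-off above.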

Cheers!
 


Taking the FO4 case again, while the older i3 2100 struggles, the newer 4330 is putting up numbers to match the 2500k, outperforming the quad-core FX-4300, and outright tying the octo-core FX-8150. Kind of hard to take the "more cores" argument seriously when you, again, have a dual core matching a faster clocked octo-core.

Your Ashes argument is silly, given that the i3 4330 outperforms the FX-8370, which totally undermines your argument:

[Graph: Ashes of the Singularity heavy batch benchmark, Core i3-4330]


[Graph: Ashes of the Singularity heavy batch benchmark, FX-8370]


Don't have much data on Rage, but I remember it optimized and changed graphical settings during gameplay, which would make it a very unreliable data point.

I will keep my view that we still have a LOT of potential to unlock in current software (not only games) if we go wide. I have seen it up close and personal, and it's interesting how a simple problem that is almost impossible to solve in a linear fashion becomes trivial when you think wide. Mind you, they are not NP problems 😛

I argue the exact opposite: There's no way to make many of these problems parallel without increasing latency to the point of reducing overall performance.

Balancing threads is complicated, and in no way am I saying that something designed to work in a linear fashion can suddenly, magically, be ported to work wider. It requires a re-design and a lot of effort in learning and executing a different paradigm.

Linux uses per-core runqueues, and they just found a bunch of problems within the scheduler where threads were getting starved and prevented from running. As a general rule, load balancing threads is going to reduce overall performance, because you have to balance based on past, not current or future, behavior. Threads that sleep for long periods and then become very active [a game's GPU thread being a good example of this] will tend to get load balanced onto a CPU core already doing more than the average amount of work, leading to a situation where the GPU thread is blocked by the main game thread, tanking performance. And yes, this can happen in Linux currently, and I've debugged cases where I've personally observed it happening.

Windows does it right: if a thread is capable of running and there's a CPU core open, run the damn thread. Locking threads to specific cores costs more performance than you'd ever save.

You remind me of the people here at the office who still think Mainframes are the ultimate solution to all problems, haha. Big monolithic designs.

Today, we call mainframes "Cloud Computing". Same deal, and it's apparently the next big thing.

Look, we did this back in the 80's. We found software simply didn't scale beyond a couple of processors. And we created GPUs and specialized ASICs to handle the stuff that does. Sure, there are PARTS of a program that can be scaled decently, but when those individual parts take 5% of a single CPU core of a 8 core CPU, what's the point?
 


The graphs you pointed out are indicative only of how far behind PD is compared to current Intel CPUs. I do know you posted them to prove the game itself does not scale well, but those graphs aren't what you should be using to disprove it. You need something like this:

[Graph: Ashes of the Singularity DX12 CPU scaling]


Well, it's from SA, but seems a good case study: http://semiaccurate.com/2016/03/01/investigating-directx-12-cpu-scaling/ 😛

It seems they made it scale *well* up to 4 cores. And it *does* show great CPU scaling. My point here was that Ashes was just the first game to demonstrate how well you can now thread stuff for games thanks to DX12 and how it handles the graphics pipeline. Or that is my understanding at least.

The next hurdle would be for UE, Source (which IIRC, is well threaded) and other big players to make their frameworks scale up better.



There is always a sweet spot, given by all the middleware you have. My best example here is the Java VM. That thing has been improving a lot in how it exposes threading to the programmer. And with Java 8, programs written and certified for it are scaling amazingly. JBoss was my particular example, but I can start naming others. Ironically enough, I think Minecraft was monolithic, haha. So you have, for example, from bottom to top: Java->JBoss->Spring/Struts/J2EE/etc. JBoss is the very foundation of what you can do in your application server, so the better it can handle heavy loads, the better you can program for it. I did a lot of scaling testing, simulating from 10 concurrent users to 10000 users, and you wouldn't believe how well it handles the loads across the CPUs compared to its predecessors. I also saw this, to a lesser degree, with other web-oriented software as time went by and they started nailing threading better. Same CPUs in our racks, better performance due to upgrades in software alone.
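
As a toy illustration of the kind of thing Java 8 exposes (a minimal sketch; the user names and the work done per user are made up, not taken from our JBoss setup):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class Java8Threading {
    public static void main(String[] args) {
        // Hypothetical user names, purely for illustration.
        List<String> users = Arrays.asList("alice", "bob", "carol");

        // CompletableFuture (new in Java 8) lets you describe independent work
        // and hand the scheduling over to the common fork/join pool.
        List<CompletableFuture<String>> lookups = users.stream()
                .map(u -> CompletableFuture.supplyAsync(() -> "profile of " + u))
                .collect(Collectors.toList());

        // join() blocks until each piece of work has finished.
        lookups.forEach(f -> System.out.println(f.join()));
    }
}

The point is that the programmer only describes independent pieces of work and the runtime spreads them over the cores.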

I am not arguing directly against what you're saying, so let me repeat it: the current code out there is not yet threaded enough that going wider would *decrease* performance. I bet they might have done a terrific job for the parts of the code that fully use whatever cores are available, but that is just a fraction of the full design.



Yes, schedulers are important to the equation. I am not saying otherwise. If you don't know how the scheduler handles workloads, then it hampers the overall design.



In the 80s the latency introduced by the surrounding technology was holding back a lot of what is possible today (think interconnects). In the 80s people did not think you would be able to program using fully integrated IDEs, nor that you would be able to compile stuff on a laptop for your day-to-day work. No one needs more than 4KB of RAM, remember? Well, in the scientific sense of the word you don't, but still, haha.

The theorycraft of everything in the 80s was lost in the 90s, and the only things that survived are the programming paradigms.

Cheers!
 
The graphs you pointed out are indicative only of how far behind PD is compared to current Intel CPUs. I do know you posted them to prove the game itself does not scale well, but those graphs aren't what you should be using to disprove it. You need something like this:

[Graph: Ashes of the Singularity DX12 CPU scaling]


Well, it's from SA, but seems a good case study: http://semiaccurate.com/2016/03/01/investigating-directx-12-cpu-scaling/ 😛

It seems they made it scale *well* up to 4 cores. And it *does* show great CPU scaling. My point here was that Ashes was just the first game to demonstrate how well you can now thread stuff for games thanks to DX12 and how it handles the graphics pipeline. Or that is my understanding at least.

You are making the same exact mistake that everyone else keeps making: Scaling != Performance.

As long as no individual CPU core gets bottlenecked, performance is driven by how much work a core is able to accomplish. And taking the i3 vs PD case: sure, you can get 90% usage across all cores, but when not bottlenecked, a modern i3 beats everything in AMD's lineup, regardless of scaling, simply due to IPC.

Point being, one core at 80% will perform EXACTLY the same as eight cores at 10%. The CPU is doing the same amount of work over the same timespan. The second just makes Task Manager's CPU usage charts look pretty.

The next hurdle would be for UE, Source (which IIRC, is well threaded) and other big players to make their frameworks scale up better.

The framework isn't the problem. As I've noted before, there's not much there you can scale, period.

I am not arguing directly against what you're saying, so let me repeat it: the current code out there is not yet threaded enough that going wider would *decrease* performance. I bet they might have done a terrific job for the parts of the code that fully use whatever cores are available, but that is just a fraction of the full design.

That's how programs ARE designed.

Seriously, open up Task Manager and add the "Threads" column. EVERYTHING uses 80+ threads. Hell, your typical game launcher uses 60 or so. The problem is, frankly, the stuff you can thread takes trivial CPU time to execute. So the fact the CPU is juggling 60+ threads goes unnoticed, because the total processing load is under 1% and Task Manager and other programs can't even measure it.

The main executive CAN'T be threaded; you have things that need to execute in sequential order, which limits threading to operating within sub-components of the system. And most of those sub-components are trivial on the CPU.

The main render thread, prior to DX12, CAN'T be threaded for the same reason: Each step of the graphics pipeline was executed in sequence.

In the 80s the latency introduced by the surrounding technology was holding back a lot of what is possible today (think interconnects). In the 80s people did not think you would be able to program using fully integrated IDEs, nor that you would be able to compile stuff on a laptop for your day-to-day work. No one needs more than 4KB of RAM, remember? Well, in the scientific sense of the word you don't, but still, haha.

The theorycraft of everything in the 80s was lost in the 90s, and the only things that survived are the programming paradigms.

The theories still held. MIT built a rig with several hundred individual CPUs back in the 80's, and they found that after about a half-dozen CPUs, the software simply couldn't scale. There's only so much you can do when you have a sequential series of operations. The stuff they could thread was light-workload, independent actions that weren't time sensitive. Everything else couldn't be scaled to any reasonable degree. DARPA found much the same deal in the 90's. And so on.

Amdahl's Law continues to hold true: Your performance increase by making code parallel is limited by the portions of the code that are executed sequentially. If 90% of your code is made up of serial operations, the best case performance improvement you can make by making your code parallel is 10%. It's as simple as that. And guess what? Memory access is sequential. Math is (cute compiler optimizations aside) sequential. Hardware access and IO are sequential. The majority of code execution is sequential.
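
If you want to put numbers on it, here's a quick back-of-the-envelope Java sketch (the 90% serial figure is just the example above, nothing measured):

public class Amdahl {
    // Amdahl's Law: speedup = 1 / (serialFraction + parallelFraction / cores)
    static double speedup(double serialFraction, int cores) {
        double parallelFraction = 1.0 - serialFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    public static void main(String[] args) {
        // 90% serial code: even with a ridiculous number of cores the speedup
        // never gets much past 1.11x, i.e. roughly the 10% mentioned above.
        for (int cores : new int[]{2, 4, 8, 1_000_000}) {
            System.out.printf("90%% serial, %d cores -> %.2fx%n",
                    cores, speedup(0.90, cores));
        }
    }
}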

Sure, SOME things can be made parallel. Database operations. Encoding. Rendering. Pretty much anything you can break into individual processing chunks that are independent of each other can be made parallel, often to infinity. But general code cannot. And barring a fundamental rethinking of computing, nothing is going to change this.

<mod edit: please check private messages>
 


But scaling *is* performance at the end of the day. All things being equal, on the same hardware, software that can run more of its workload in parallel will perform better. I am not making an incorrect assumption nor assessment: for the same hardware, if you thread better, you get more performance out of your program(s). I don't see why that is so shocking to you, honestly.



Yes, frameworks *are* the problem. They lay out the foundation of what you can solve and how. When you start with a white page, you can write whatever you want and draw whatever you want, order the content however you want, and so on. When you get a page with guidelines, your freedom of design becomes limited. It's a trade-off the industry is happy to live with, and I'm not saying it's a bad thing, but when you can't extract more performance *out of your code*, then you have to start taking a look at the frameworks you're using. I don't see why this logic is so complicated to understand, or why it is incorrect to you.



Different frameworks allow different ways of threading. There is a gigantic difference between a "soft thread" and a "hard thread". You also have different mechanisms to make the best use of their differences to your advantage. This is also part of how the OS works and allows the different frameworks to expose threading to the programmers.

Again, a good example is threading in C, C++ and Java. They are *very* different from each other and expose different mechanisms for threading. I would imagine C# is also different, but it should be similar to Java at least.

Practical example in java:

1.- Main -> cycle -> print -> end.
2.- Main -> thread -> thread_n: print -> wait -> end.

The main thread is kept alive in both, and has to outlive the child threads, yes. You have a "main" thread for the whole of the program's life. That does not mean you can't design your program to *not* depend on it and go as wide as you want. Trade-offs! No extreme is good, and each serves a purpose. Example 2 could be faster, but print results in an order you don't want, so you will need to synchronize the threads! Etc, etc... Example 1 is *faster* to code and *easier* to test, and it *might* be as fast as Example 2, but you will hit a performance wall soon enough given how tech is moving forward.
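
In actual Java, the two examples look roughly like this (a minimal sketch; the loop counts and thread names are placeholders):

public class MainVsThreads {
    public static void main(String[] args) throws InterruptedException {
        // Example 1: Main -> cycle -> print -> end, all on the main thread.
        for (int i = 0; i < 3; i++) {
            System.out.println("serial " + i);
        }

        // Example 2: Main -> thread -> thread_n: print -> wait -> end.
        Thread[] workers = new Thread[3];
        for (int i = 0; i < 3; i++) {
            final int n = i;
            workers[i] = new Thread(() -> System.out.println("worker " + n));
            workers[i].start();
        }
        // join() is the "wait" step: the main thread outlives its children.
        // The print order is no longer guaranteed, which is the synchronization
        // trade-off mentioned above.
        for (Thread w : workers) {
            w.join();
        }
    }
}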



I won't argue that. Computing is built on top of a fundamentally sequential paradigm and there is only so much you can actually make parallel. Parallelism is just an illusion and all of that. You will reach a point where you can't go wider anymore. Just like you can't go below a sub-atomic particle, right?

BUT, my point is that programs can still take good advantage of going wider. Using more than 8 cores is perfectly feasible. There's a reason we're seeing machines with 384 *CPUs* in them and not a single CPU and 40 or 50 GPUs.

All of the bottlenecks you are mentioning are non-issues for many many programs and problems out there being solved using monolithic/sequential approaches. I am including games as well in this bag.

Cheers!
 
But scaling *is* performance at the end of the day. All things being equal, on the same hardware, software that can run more of its workload in parallel will perform better. I am not making an incorrect assumption nor assessment: for the same hardware, if you thread better, you get more performance out of your program(s). I don't see why that is so shocking to you, honestly.

A processor with two cores doing 20% work will perform exactly the same as if it had one core doing 40% work. The core(s) aren't a bottleneck in either case, so performance is dominated by IPC/clock. That's why making things parallel that don't need to be does nothing aside from giving everyone involved headaches.

Yes, frameworks *are* the problem. They lay out the foundation of what you can solve and how. When you start with a white page, you can write whatever you want and draw whatever you want, order the content however you want, and so on. When you get a page with guidelines, your freedom of design becomes limited. It's a trade-off the industry is happy to live with, and I'm not saying it's a bad thing, but when you can't extract more performance *out of your code*, then you have to start taking a look at the frameworks you're using. I don't see why this logic is so complicated to understand, or why it is incorrect to you.

Frameworks don't prohibit what the programmer can do, they just provide an interface you need to live within. Sure, there are some rather horrid ones, but within the blank slate the framework starts you out with, you're free to do whatever the heck you want.

Different frameworks allow different ways of threading. There is a gigantic difference between a "soft thread" and a "hard thread". You also have different mechanisms to make the best use of their differences to your advantage. This is also part of how the OS works and allows the different frameworks to expose threading to the programmers.

Threads are threads as far as the OS is concerned. How they are exposed by a specific framework is largely irrelevant; you create a thread, the OS is going to schedule it on some CPU at some point in the future. Some frameworks may suck royally when it comes to thread control (pthreads in particular, Java ain't too hot either), but we learn to live within the various limitations that may exist.

Again, a good example is threading in C, C++ and Java. They are *very* different from each other and expose different mechanisms for threading. I would imagine C# is also different, but it should be similar to Java at least.

On Windows, they all invoke Windows::CreateThread in one form or another. Granted, some languages (Java) are lacking in fine grain thread control (Java doesn't allow you to set thread affinity, for example, pthreads has no concept of suspending a thread, and so on), but all that is invisible to the OS.

Practical example in java:

1.- Main -> cycle -> print -> end.
2.- Main -> thread -> thread_n: print -> wait -> end.

The main thread is kept alive in both, and has to outlive the child threads, yes. You have a "main" thread for the whole of the program's life. That does not mean you can't design your program to *not* depend on it and go as wide as you want. Trade-offs! No extreme is good, and each serves a purpose. Example 2 could be faster, but print results in an order you don't want, so you will need to synchronize the threads! Etc, etc... Example 1 is *faster* to code and *easier* to test, and it *might* be as fast as Example 2, but you will hit a performance wall soon enough given how tech is moving forward.

The second example is bad code. General rule: Never thread IO. Devices care about IO, and IO will almost certainly go out in sequential order. To ensure this via threading, you have to synchronize every IO thread against each other, which will bring processing to a screeching halt.

I won't argue that. Computing is built on top of a fundamentally sequential paradigm and there is only so much you can actually make parallel. Parallelism is just an illusion and all of that. You will reach a point where you can't go wider anymore. Just like you can't go below a sub-atomic particle, right?

BUT, my point is that programs can still take good advantage of going wider. Using more than 8 cores is perfectly feasible. There's a reason we're seeing machines with 384 *CPUs* in them and not a single CPU and 40 or 50 GPUs.

All of the bottlenecks you are mentioning are non-issues for many many programs and problems out there being solved using monolithic/sequential approaches. I am including games as well in this bag.

Windows will use however many cores you expose to it. Using cores is trivial. The problem is writing an application that isn't solving a fundamentally parallel problem in such a way that those cores get used to any significant degree.

At the end of the day: Serial tasks can not be made parallel. CPUs can get some gains via ILP, but at the end of the day, if my program is of the form of:

1: Do A
2: Do B (A)
3: Do C (B)
4: End

there's only so much I can do. I might be able to make A parallel, or B, or C, but the program flow is serial, and there's NOTHING about that I can change.
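
To put that in code terms, here's a throwaway Java sketch (A, B and C are placeholder steps): even if every step is handed to a thread pool, the data dependency forces them to run one after another.

import java.util.concurrent.CompletableFuture;

public class SerialFlow {
    public static void main(String[] args) {
        // Do A, then B(A), then C(B): each stage needs the previous result,
        // so the chain runs in order no matter how many cores are available.
        String result = CompletableFuture
                .supplyAsync(() -> "A")      // 1: Do A
                .thenApply(a -> a + "->B")   // 2: Do B (A)
                .thenApply(b -> b + "->C")   // 3: Do C (B)
                .join();                     // 4: End
        System.out.println(result);         // prints A->B->C
    }
}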
 


But it's fun to discuss that... Ok 🙁



For me, at least from what I know of C and my beginner days (now I use Java mainly, which is the easy life, lol), a hard thread has its own address space and memory reserved by the OS, whereas a soft thread doesn't; it shares everything with the parent. That is a huge difference. Think "fork" and "thread" in C. How the OS handles that is completely different as well, since you're effectively launching a new program. That is a form of parallelism that is also applicable in some cases. Think of Chrome and its approach to threading. That's a nice example of how to do "heavy threading" vs "soft threading".
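
If it helps, here's a minimal Java sketch of that same distinction as I understand it (the command launched on the "hard" side is just a stand-in):

import java.io.IOException;

public class HardVsSoft {
    public static void main(String[] args) throws IOException, InterruptedException {
        // "Soft" thread: shares the parent's heap and address space.
        Thread soft = new Thread(() -> System.out.println("soft thread, shared memory"));
        soft.start();
        soft.join();

        // "Hard" thread: a whole separate OS process with its own address space,
        // the way Chrome isolates its tabs. The command here is just illustrative.
        Process hard = new ProcessBuilder("java", "-version")
                .inheritIO()
                .start();
        hard.waitFor();
    }
}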



Yes, that scenario is not easy to "thread". The only thing to notice is that, sometimes, you can still keep on separating the logic to make it even more threaded, but complexity keeps you away from it.

I've seen a couple of problems like that in what I do, and I've managed to make them go parallel with backhanded tricks in the JVM and squeeze out a bit more performance, but the amount of effort doesn't make it worth your while, haha.

In any case, I'll stop now gamerk. It's been fun, but it seems we're taking it too far, haha.

Cheers!
 
Yes, that scenario is not easy to "thread". The only thing to notice is that, sometimes, you can still keep on separating the logic to make it even more threaded, but complexity keeps you away from it.

Make a program too complex, and people complain that it breaks all the time. There's a point where stability wins the day over performance. That's why we aren't using hand-coded assembly anymore :/

For me, at least from what I know of C and my beginner days (now I use Java mainly, which is the easy life, lol), a hard thread has its own address space and memory reserved by the OS, whereas a soft thread doesn't; it shares everything with the parent. That is a huge difference. Think "fork" and "thread" in C. How the OS handles that is completely different as well, since you're effectively launching a new program. That is a form of parallelism that is also applicable in some cases. Think of Chrome and its approach to threading. That's a nice example of how to do "heavy threading" vs "soft threading".

Taking your language, a "hard" thread is really a new process [see the Processes tab in Task Manager] as far as the OS is concerned, and programs like Chrome use it to free themselves of the old 2GB address space limitation that Win32 programs were limited by. In today's day and age, this type of programming belongs in a dumpster. If you need more than 2GB of address space, compile as Win64 and call it a day. It was a defensible approach a decade ago, not so much now.


And yes, we're running in circles, so let's stop it here. Half expecting this entire conversation to disappear within a few days anyway.
 


Conversation was ok so far, we encourage vigorous debate within the rules of course, but check your PMs.
 


What current CPU/PC has 44 cores? XD

Are they running that thing in a rendering farm or something? haha.

Cheers!
 


They got their hands on an E7-8880 maybe?
 


That is still a 30-thread CPU (according to Intel's page), so it would have to be a double-socket server or something. Maybe they're running it on an E5-2699 v4 or something similar. That thing is 44 threads.

Cheers!
 


44 cores or threads? Lovely also to see a Nvidia therein.
 


This and the other benchmarks posted before show that a higher-clocked 4C will be the preferred option for gaming rather than a lower-clocked 8C.
 


Intel's newest Xeon Phi has 72 (Atom) cores per CPU and you can put several chips in one system.

Just to be pedantic.
 


I'm still hoping we move on to 6+ heavily loaded threads.

Edit:


KL has more than that, doesn't it?

Cheers!
 
Hi all.
Saw a rumors post: "http://www.extremetech.com/computing/230305-rumors-suggest-amds-zen-servers-will-pack-32-cores-serious-heat"
32 cores? And pushed to 2017?
I know they are coming out with 8 cores for sure; Lisa said it at the AMD conference on June 1, 2016. She also said everything about Zen is on schedule. Doesn't that mean Q4 2016? I'm a bit lost.
 


The Broadwell E7-8880 v4, just released June 6th: 22 cores, 44 threads. (Ivy was v2, 15 cores, 30 threads.) MSRP $5895.00. Supposedly available at launch, so they could be sprouting up in the wild now.

http://ark.intel.com/products/93792/Intel-Xeon-Processor-E7-8880-v4-55M-Cache-2_20-GHz

And it's not even the top end; if you've got a body part or two left over, like a spare kidney, you can drop $7174.00 on the 8890 for an extra 2 cores.

 


She always said "availability" in 2016 and "sampling" in 2016. Full volume production is only coming in 2017. And the claim that Zen would be released in October was a fabrication by the media, as confirmed by AMD's Bridgman.



This is from the financial analyst day. "Availability in 2016" doesn't necessarily mean you can buy it in 2016. They stuck to that, and Lisa Su said at Computex that SR samples will be available to a few selected partners in the coming weeks, and she announced wider availability of samples in Q3.



The more probable scenario is a paper launch in late 2016 and a real launch in 2017.
 


Zen was always "Limited Sampling" in Q4 2016, which basically means "not commercially available". Then tech websites bought into some bogus data and declared Zen months ahead of schedule.

The Zen release for consumers has ALWAYS been Q1 2017. All that means is, despite news otherwise, Zen is right on schedule.
 


We should really appreciate this too. So often a new CPU release is delayed up to 6 months (sometimes more) beyond the original release date. (Like Broadwell and Skylake *cough cough*)
 


Ah, I looked at the V2, not the V4. That explains it, lol.

I'd love to run some tests in it though! XD

Cheers!
 