AMD CPU speculation... and expert conjecture

Page 721
Status
Not open for further replies.

Reepca

Honorable
Dec 5, 2012
156
0
10,680


I had a long post written up about dependencies between colliding objects between-updates that likely completely misinterpreted your post, so I scrapped it (partially resurrected later).

So if I understand this correctly, physics isn't part of the main game thread, but rather runs separately - constantly updating the state of all objects based on time passed and interactions between objects in a loop. In this way, it is task parallel. However each object is... well... its own object! An individual thread can process the interactions between each object and everything else. In this way it is data-parallel.

But the part that bugs me about this is that for any time-chunk, things happen within that time-chunk. Taking this to an extreme example, suppose that one time chunk lasted 10 seconds. Two people are moving towards each other and, simply checking their paths against each other, one would think they would collide, except that 1 second into this time chunk person A gets hit by a train and flies far away. If we only looked at the interactions between person B and the world, person A and the world, and the train and the world (individually, in parallel) we would end up with persons A and B colliding *and* the train hitting person A. The way I imagine it, nothing is "simultaneous" in a time chunk - something happening within 10 ms of another thing clearly doesn't mean they happen at the same time. Collisions should be detected, sorted by time, processed in order, and every time a collision is processed all objects are moved forward to that point in time (and the corresponding change in position) and any objects that thought they were going to collide with either of those involved in that collision must re-check for the soonest collision... in my head, at least. Due to dependencies between the objects, don't the interactions have to be processed in order?
 
Again, you're stuck thinking linearly, as though you need to calculate the effects of an object into the future. When you do the calculations the objects haven't moved yet; they only move after you've done the math. With each update you refresh the state of every object in the world. So in your case, with objects A and B on a collision course and object A being intercepted by object C, you would calculate each of those out as time progressed, not from life to death.

What I described is how physics simulators work, the kind of thing that calculates what happens at a nuclear level, and they are massively parallel. Large scientific experiments are run on very powerful computers using gigantic arrays of processors, and they are run using this method. You don't have a single core running at 40 GHz trying to calculate the particle interactions in a plasma; instead you have dozens of cores working simultaneously, calculating the interactions, collisions, and kinematic effects of billions of ions. Resolution is measured in time: how small the gap is between calculations of what has happened, not what will happen.
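The "refresh every object from the previous state" scheme described above can be sketched in a few lines. This is a toy 1-D example I'm making up for illustration, not any particular engine's code; the point is that every object's new state reads only the frozen snapshot, so the per-object updates are independent and could be handed to parallel workers:

```python
def step(objects, dt):
    """Advance every object by dt, reading only the start-of-tick snapshot."""
    # Freeze the world as it was at the start of this tick.
    prev = [dict(o) for o in objects]
    new_world = []
    for o in prev:
        # Each iteration depends only on `prev`, never on partially-updated
        # state, so this loop is the data-parallel part.
        new_world.append({"x": o["x"] + o["v"] * dt, "v": o["v"]})
    return new_world

world = [{"x": 0.0, "v": 1.0}, {"x": 10.0, "v": -1.0}]
world = step(world, 0.5)   # both objects advanced half a second
```

Shrinking `dt` is the "resolution measured in time" point: the smaller the tick, the less can go wrong inside one.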

Hopefully you're not trying to argue that physics is serial when every scientific simulation demonstrates otherwise on a daily basis.

In fact, here, I'm going to let the folks who own hardware physics-processing technology speak about it.

http://www.nvidia.com/object/physx_faq.html

Why is a GPU good for physics processing?
The multithreaded PhysX engine was designed specifically for hardware acceleration in massively parallel environments. GPUs are the natural place to compute physics calculations because, like graphics, physics processing is driven by thousands of parallel computations. Today, NVIDIA's GPUs have as many as 480 cores, so they are well-suited to take advantage of PhysX software. NVIDIA is committed to making the gaming experience exciting, dynamic, and vivid. The combination of graphics and physics impacts the way a virtual world looks and behaves.

Out of all the possible computation problems you could argue are serial, physics is the last place you should have started.
 


Ok, the power consumption isn't as bad as I expected (though there is a noticeable jump as it's overclocked; remember, it is only a dual-core part). The thing is, though, your example is still the absolute worst case for AMD. At the end of the day, would you (or anyone) honestly recommend a Core 2 Duo or Core 2 Quad over an FX 6XXX or FX 8XXX part for modern-day workloads?

In a few very specific scenarios (like x87 instructions), AMD aren't where they should be; however, looking at a wider set of benchmarks (including a good range of games), I'm 100% certain that even an FX 4XXX part is a worthwhile improvement over anything from Core 2 (and OK, you *might* be able to outrun an FX 4 part when the Core 2 is overclocked, but then you just need to overclock the FX to redress the balance).

The tougher question is when you compare the FX (which are actually quite old) to the latest i3 from Intel. The new i3 is capable of running most things as well as or better than an FX 8XXX part, with the possible exception of a few best-case applications, and all the while using significantly less power. I think 18 months ago the FX represented very good value compared to Sandy / Ivy parts, but I can't deny AMD have dragged this gen out a lot longer than they should have.

Zen may not be a home run like A64 was; however, AMD would have to screw up really badly for Zen to leave them in any worse position than they're in now in that space :p
 

truegenius

Distinguished
BANNED


i guess, they know how to do this much better than anyone else :whistle:
 
Ok, the power consumption isn't as bad as I expected (though there is a noticeable jump as it's overclocked; remember, it is only a dual-core part). The thing is, though, your example is still the absolute worst case for AMD. At the end of the day, would you (or anyone) honestly recommend a Core 2 Duo or Core 2 Quad over an FX 6XXX or FX 8XXX part for modern-day workloads?

The real question is, if someone comes asking whether it's worth replacing a 9xxx-based C2Q with an FX-based rig, whether there is justification to say "yes".

That's AMD's problem: high-tier C2Qs aren't significantly worse than the FX lineup, so you can't justify a system rebuild around FX if you currently have a C2Q; it's not justified on price/performance.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Never say never.
 
Gamerk316, there is more to a computer than gaming. You guys talk like nobody does anything with a computer but build gaming rigs. Just the opposite is true. You represent a small minority of computer users. And even many gamers use computers for other things. Not to mention that many of these tests are sanitized. For example, you're not streaming with a C2Q, but you easily could with most of the FX parts.
 


Actually, a C2Q is plenty for streaming. Streaming is actually not that hard on the CPU, all things considered. Games are one of the most stressful things a normal user could be doing.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
First, this forum software is awful. I had a long, nice post written out and it ate my post because I was logged out. Thanks for loading the posting form with AJAX for some stupid reason: when you navigate back to the page, what you typed in the textbox gets deleted, when it would normally be saved there by the browser.

Anyways, Intel is no saint in not making large IPC gains. http://www.tomshardware.com/reviews/processor-architecture-benchmark,2974-14.html

It's difficult to tell which ones see performance improvements from new instructions and which don't. Conroe only supports up to SSE3 and SB supports up to AVX. But there's a less-than-10% IPC improvement in Photoshop CS5 from Conroe to SB over the span of ~4 years. AMD is not the only one having trouble with general IPC improvements. But Conroe doesn't clock nearly as high as SB does, and the same can be said for Conroe and Piledriver. So IPC by itself is a moot point, which we've been over countless times.

http://techcrunch.com/2008/07/29/overclock-world-record-q6600-24ghz-run-at-51ghz/
The Q6600 took liquid nitrogen to hit 5.1 GHz, and I run 5.1 GHz 24/7 on my Piledriver with just a simple XSPC kit. And I'm far from the only person to hit 5.1 GHz on an FX chip. Just for fun, the Piledriver world-record overclock is 72% higher than the Q6600 overclock.
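For what it's worth, that 72% figure is consistent with the commonly cited Piledriver LN2 record of roughly 8.79 GHz (an approximate figure, not from the linked article) against the Q6600's 5.1 GHz:

```python
q6600_record_ghz = 5.1   # LN2 record from the TechCrunch link above
fx_record_ghz = 8.79     # approximate Piledriver (FX-8350) LN2 record

# Relative gain of the FX record over the Q6600 record.
gain = (fx_record_ghz - q6600_record_ghz) / q6600_record_ghz
print(f"{gain:.0%}")     # prints "72%"
```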

And to be fair to Intel, a good Q6600 overclock was in the high-3 GHz range, and that's what new Intel chips now ship at stock. Now, if we were looking at chips that were not improving top frequency or even stock frequencies, then IPC would be something to consider. You know, sort of like IPC not changing very much between SB and Haswell when using the same instruction set, yet SB clocking to 5 GHz under water and Haswell not making it that far.
 


A 6000 series, sure. A 9000 series clocked in the mid/high 3GHz range can still perform. Yes, FX is better, but not "I'm going to spend $600 building a new PC" better.

And yes, I still have an OC'd QX9650 rig lying around (790i platform, DDR3 RAM); it still does the job. Sluggish around the edges, but still in the mid-tier i5 range based on the performance I'm getting out of it. I'd imagine a bit worse for memory-dominated tasks, but it's not like it's a Pentium 4 or something.

Point being, I laugh every time I see people doing CPU sidegrades, like going SB->Haswell. There's literally no point to it anymore, and hasn't been since the late C2Q era.
 

logainofhades

Titan
Moderator
Yeah, Sandy to Haswell is pretty pointless, unless you have a hardware failure and have little choice in the matter. Hence why I went Sandy to Ivy: I used my i5 2400 setup to get my file server back up and running, and made a trip to Microcenter for my 3570K and board.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780




The Core 2 Quad probably has something like a 600 MHz advantage in terms of IPC compared to Piledriver, so the Q6600 is probably a good margin ahead of the FX in gaming, but I can see the FX winning in heavily threaded apps.

How old is the Q6600? Exactly. AMD have not managed to match the IPC of a 7/8-year-old CPU.
 

logainofhades

Titan
Moderator


3.6 GHz was easy with a Q6600, as most C2Qs could easily hit a 400 MHz FSB. It was beyond 400 that many had trouble. My X3210, which was the Xeon equivalent of a Q6400, ran at 3.6 GHz @ 450 FSB. It was quite the golden CPU.
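The arithmetic behind those overclocks: a Core 2 chip's clock is simply FSB times a fixed multiplier (the 9x and 8x multipliers below are the stock values I believe those parts used; treat them as assumptions):

```python
def core2_clock_mhz(fsb_mhz, multiplier):
    # Core 2 era overclocking: core clock = front-side bus x multiplier,
    # so raising the FSB raises the core clock in lockstep.
    return fsb_mhz * multiplier

q6600 = core2_clock_mhz(400, 9)   # the "easy" 3.6 GHz at a 400 FSB
x3210 = core2_clock_mhz(450, 8)   # same 3.6 GHz, but needing the 450 FSB
```

This is why the lower-multiplier X3210 needed a "golden" 450 FSB to reach the same 3.6 GHz the Q6600 got at 400.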


 
Gamerk316 wrote

A 6000 series, sure. A 9000 series clocked in the mid/high 3GHz range can still perform. Yes, FX is better, but not "I'm going to spend $600 building a new PC" better.

And yes, I still have an OC'd QX9650 rig lying around (790i platform, DDR3 RAM); it still does the job. Sluggish around the edges, but still in the mid-tier i5 range based on the performance I'm getting out of it. I'd imagine a bit worse for memory-dominated tasks, but it's not like it's a Pentium 4 or something.

Point being, I laugh every time I see people doing CPU sidegrades, like going SB->Haswell. There's literally no point to it anymore, and hasn't been since the late C2Q era.

I see your point now. I agree with most of this. I don't understand the "upgrades" from Sandy Bridge to Ivy Bridge to Haswell. They are so small, they just aren't worth the money.
 

Reepca

Honorable
Dec 5, 2012
156
0
10,680


I'm not trying to "argue" anything, I'm just trying to reconcile what you're saying with what I know/think. I'm not sure I'm communicating my question clearly, though.

What I think when I hear "parallel physics" is having a thread that runs X other threads to update X other objects, then repeats after they are all updated. It measures the amount of time between each update through this loop and updates the objects as far as time has passed. "Updating" each object involves checking for collisions/interactions with the rest of the system and changing the state accordingly.

Explain to me how my understanding is wrong thus far, if it is.

However, I would also think that when updating each object, surely it needs to be done in order based on time? If there are 10 seconds between updates and you don't process interactions in order, then you end up with everything that happened in those 10 seconds happening at the same time, which doesn't happen in the real world; there is no Planck length of time, it is continuous.

The processing of object A's collision with object C must occur before the processing of object B's collision with object A. If it doesn't, then the simulation will say that object B collided with an object that could not have possibly been there. I can't see any way around it.
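What Reepca is describing is essentially event-driven collision processing: predict every pairwise collision, then resolve the soonest one first. A toy 1-D sketch (hypothetical example of mine, constant velocities, no real engine implied):

```python
import heapq

def collision_time(a, b):
    """Time at which particles a and b meet (1-D, constant velocity),
    or None if their paths never cross in the future."""
    dv = a["v"] - b["v"]
    if dv == 0:
        return None
    t = (b["x"] - a["x"]) / dv
    return t if t > 0 else None

def first_collision(particles):
    # Predict every pairwise collision, then pull the soonest one.
    # After resolving it, a real simulator would re-predict events for the
    # two objects involved; that is exactly the re-check Reepca describes.
    events = []
    for i in range(len(particles)):
        for j in range(i + 1, len(particles)):
            t = collision_time(particles[i], particles[j])
            if t is not None:
                heapq.heappush(events, (t, i, j))
    return heapq.heappop(events) if events else None

# A and B head toward each other (they would meet at t=5), but C (the
# "train") catches A at t=1, so the A-B event must not be resolved first.
world = [{"x": 0.0, "v": 1.0},    # person A
         {"x": 10.0, "v": -1.0},  # person B
         {"x": -2.0, "v": 3.0}]   # the train, C
```

Here `first_collision(world)` picks the t=1 train impact, not the (now invalid) t=5 meeting, which is the ordering constraint the post argues for.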
 

jdwii

Splendid


Not to mention streaming is starting to be done on the GPU. I use ShadowPlay, for instance, and I see almost zero difference in performance. Plus, if it can game, it can easily do Facebook or Word; I've heard this argument before. Moar cores doesn't mean moar multitasking.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780


Suddenly, AMD's failures are all our fault... AMD is in a sh*t position, and every time they answer Intel's products it seems as if they throw a diarrhea attack.

Bulldozer was released in 2011, and their next arch comes in 2016... that's 5 years of Faildozer. Why on earth did it take them 5 years to understand Bulldozer was that bad? They should have done something immediately after Bulldozer; they were in a much, much better position then than they are now.
 

8350rocks

Distinguished


To be fair to AMD.

You are discussing IPC in a software code base for a single specific game, written with archaic code that was not much more relevant 7/8 years ago and is essentially over 30 years old now. AMD has never supported it well, and likely never will, since the only software to run x87 code since Nehalem launched is the following:

SuperPi
Skyrim

AMD cares not for worrying about either of those...

However, here is a cinebench R11.5 video comparing the 2: http://www.youtube.com/watch?v=DiXkg3-RA8c

Both are overclocked there, the Q6600 a lot more than the AMD is, though the AMD still holds the GHz advantage in terms of clockspeed.

All said and done... it depends on what you are doing. Even then, each architecture will have strengths and weaknesses. So you really cannot say AMD cannot beat C2Qs unless you add the caveat that it is only while running 30-year-old code.

By comparison:

A Core i7-5960X is incapable of processing punch-card instructions from old computers... does that mean the punch-card machines are faster than a 5960X?

Be reasonable.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


You cannot compare Intel to AMD the way you are doing. You are thinking linearly, but the world is nonlinear. It is much, much easier and cheaper to improve the IPC of a slower chip such as Piledriver by 10% than to improve the IPC of a faster chip such as Haswell by 10%.

Some here have mentioned that Haswell's IPC is not an improvement over Ivy Bridge. That is right if you limit measurements to x86 software. If you consider AVX software, then Haswell brings up to 70% IPC gains over Ivy:

http://www.pugetsystems.com/blog/2013/08/26/Haswell-Floating-Point-Performance-493/
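The 70% figure is plausible on paper: Haswell added two FMA units, doubling peak per-core floating-point throughput over Ivy Bridge's separate AVX add and multiply ports. The numbers below are the usual textbook peak figures, not measurements:

```python
# Peak single-precision FLOPs per core per cycle (textbook peak figures).
ivy_flops = 8 + 8          # one 256-bit AVX add port + one multiply port
haswell_flops = 2 * 8 * 2  # two 256-bit FMA ports, 2 FLOPs per FMA lane

speedup = haswell_flops / ivy_flops   # 2.0x peak; ~1.7x measured is credible
```

Real FMA-heavy code lands below the 2x peak, which is roughly where Puget's ~70% gain sits.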
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The development of Zen/K12 started when Jim Keller returned to AMD in 2012. According to Feldman, it takes a three-to-four-year time frame and $300 million to $400 million in development costs to build an x86-based server chip based on a new microarchitecture.

http://www.xbitlabs.com/news/cpu/display/20130709232003_AMD_Amazon_Facebook_and_Google_Could_Develop_Their_Own_Chips.html

Thus Zen cannot be ready before 2016-2017.
 

jdwii

Splendid
Juan, I thought Jim Keller said he didn't have to start from scratch, and that it was just going to take the best of Bulldozer, Jaguar, and Phenom with an updated process? If so, that probably cuts down the time.

Always a question for you guys: do you think that if Samsung/GlobalFoundries had 14-16nm ready, AMD would be able to get Zen/K12 out the door faster?
 