AMD CPU speculation... and expert conjecture

Page 675
Status
Not open for further replies.

8350rocks

Distinguished
Ok Juan, assume it is powered by one of your 5W APUs. In HPC that would likely be the case.

It does not change the situation, by that time it may even be something like a 3W ARM processor for all we know.
 




I was going to give a hypothetical scenario with an Apple A7 SoC as the "dCPU" of such system.

AMD will end up creating a GPU with an iCPU to have HSA numbers that justify such design decision. We'll be able to do raytracing in realtime, but the games will still run like crap because the CPU won't be fast enough for the regular "legacy" serial stuff.

I'll stop here. Another dead end discussion.

Cheers!
 


It's impossible to do calculations without first having the data to do those calculations on. The data MUST be copied; nothing will save you from that. What they are talking about is that you, the programmer, don't need to copy the data manually or manage the coprocessor's (GPU in this case) memory space. Traditionally you'd have to first copy your dataset into the coprocessor's local memory, then establish a session with it, send your instructions, receive the result and send some more. Essentially you had to micromanage everything about the coprocessor; that micromanagement was a headache and imposed some real limitations on coding. With something like HSA, the coprocessor manages itself. You just send the instructions with the pointers, and the coprocessor copies the data on its own; chances are the data is already there anyway, because the prefetcher read the code ahead of time and started the copy before it was needed.
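A rough sketch of the two programming models described above. The class names and the "kernel" are purely illustrative stand-ins, not a real HSA or OpenCL API:

```python
# Toy model of the two coprocessor programming models described above.
# These classes are hypothetical stand-ins, not any real vendor API.

class DiscreteGPU:
    """Coprocessor with its own memory: the host must copy data in and out."""
    def __init__(self):
        self.local_memory = {}

    def upload(self, name, data):
        self.local_memory[name] = list(data)   # explicit copy over the bus

    def run_kernel(self, name):
        self.local_memory[name] = [x * 2 for x in self.local_memory[name]]

    def download(self, name):
        return list(self.local_memory[name])   # explicit copy back


class HSADevice:
    """HSA-style coprocessor: it shares the host address space, so a kernel
    just receives a 'pointer' (here, the list itself) and works in place."""
    def run_kernel(self, data):
        for i in range(len(data)):
            data[i] *= 2                        # no upload/download step


# Traditional model: micromanage the device through three transactions.
gpu = DiscreteGPU()
dataset = [1, 2, 3]
gpu.upload("buf", dataset)
gpu.run_kernel("buf")
result = gpu.download("buf")

# HSA model: hand over the pointer and let the device touch memory directly.
hsa = HSADevice()
shared = [1, 2, 3]
hsa.run_kernel(shared)
```

The point of the contrast is the host-side bookkeeping: the discrete path needs an upload, a dispatch, and a download, while the shared-memory path is a single call.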
 

Reepca

Honorable
Dec 5, 2012
May be slightly off-topic, but I figure you guys probably know just as much as anyone - what ever happened to those enthusiast Kaveri laptop parts (sounds like an oxymoron, doesn't it?) like the FX-7600p? Black Friday has come and gone, and still no FX-7600p laptops - what happened to all those design wins AMD was talking about back in July?

What's your analysis of the situation?
 

colinp

Honorable
Jun 27, 2012
Good question and one I've been asking too. Apart from one monster sized MSI laptop, all the Kaveri laptops I've seen have been ULV parts. And AMD GPUs are almost entirely absent from laptops at the moment.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


Good post! Let us do some computations.

The iCPU on Kaveri occupies about 50% of the die. On a 7nm node the iCPU would occupy 3% of the same die. Double the number of cores from 4 to 8 and it will occupy 6%. Now double the size of each core and the iCPU still occupies only 12% of the die.

But this is for a mainstream-size die. The supercomputers will not use mainstream parts but top-level high-performance parts. AMD engineers don't provide the size of their extreme-scale APU. I could try to guess it, but luckily Nvidia engineers give the size of their own extreme-scale 'APU': 650mm2.

The above 8-core iCPU (each core the size of a Steamroller module) would occupy about 5% of the whole 650mm2 die. A separate dGPU with all the die spent on GPU transistors would provide only 5% more "raw performance" than the APU. But this 5% more is in an idealized world, assuming a perfect interconnect with zero latency and infinite bandwidth for the dGPU.

In the real world, a real interconnect will reduce performance due to latency and bandwidth bottlenecks, with the final result that the dGPU will be slower. This is the reason why neither AMD nor Nvidia engineers use dGPUs for their extreme-scale supercomputers.
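The area arithmetic above can be checked with a few lines, assuming area scales with the square of the linear feature size and a Kaveri die of roughly 245mm2 (my figure, not AMD's):

```python
# Quick check of the die-area scaling discussed above. The Kaveri die
# size and the 50% iCPU share are assumptions for illustration.

kaveri_die = 245.0          # mm^2, approximate Kaveri die size at 28 nm
icpu_fraction_28nm = 0.50   # CPU cores plus L2 take roughly half the die

shrink = (28 / 7) ** 2      # 28 nm -> 7 nm: 4x linear shrink, 16x in area
icpu_7nm = icpu_fraction_28nm / shrink    # ~3.1% of the same-size die
icpu_8core = icpu_7nm * 2                 # double the core count: ~6.3%
icpu_8core_fat = icpu_8core * 2           # double each core's size: ~12.5%

# Express that CPU block as a share of a 650 mm^2 extreme-scale die.
icpu_area_mm2 = icpu_8core_fat * kaveri_die   # ~30 mm^2
share_of_650 = icpu_area_mm2 / 650.0          # ~4.7%, i.e. about 5%

print(round(icpu_7nm * 100, 1), round(share_of_650 * 100, 1))  # 3.1 4.7
```

Under those assumptions the numbers land close to the 3% and 5% figures quoted in the post.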

http://www.amd.com/en-us/press-releases/Pages/extreme-scale-hpc-2014nov14.aspx

DARPA-goals.png
 

juanrga

Distinguished
BANNED
Mar 19, 2013


No. The architectural improvements will also speed up the regular "legacy" serial stuff. Each iCPU core has more IPC than current Piledriver/Steamroller cores. Moreover, the improved cache and memory subsystem provides an extra boost to serial "legacy" performance by reducing latencies compared to a dCPU such as the FX-8350.
 

cemerian

Honorable
Jul 29, 2013


Every single speculation about your theoretical APU is based on current-size AMD CPU cores/modules. It's as if you intentionally refuse to accept that the CPU has to improve (in AMD's case, by a lot), which also means the part of the die the CPU occupies will be a lot larger than 2-5%. If the CPU architecture and performance stayed the same you would be right, but nobody who cares about CPU performance is happy with current AMD parts, so the proportions would remain about the same. Also, what makes you think we will still be on the same PCIe connections between CPUs and GPUs, as if progress will stop completely on that front and everyone will just focus on APUs? I get it, you love AMD and their APUs, and you probably don't game or use your CPU too much, but for those who do, their CPU performance is seriously lacking. The HEDT market wants bigger/faster cores and a dGPU with the whole die occupied by just the GPU, ideally the same for the CPU (no iGPU on the die), with a lot better performance than your theoretical APUs. The dGPU will not die, for this simple fact.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


First, there is no "speculation". I am using data given by AMD and Nvidia for their respective designs.

Second, it is not my "theoretical apu". I am discussing the APUs presented by AMD and Nvidia.

Third, I already said that each new CPU core from AMD will be faster than a Piledriver/Steamroller module and about as big. Nvidia's own cores will provide 50% more IPC than Denver, but still the iCPU occupies less than 10% of the total die. Again, this is not "speculation"; it follows from the details of the designs given by AMD and Nvidia engineers.

Fourth, I did not even mention PCIe in my above post. I mentioned interconnects in general, because the limits that I have mentioned apply to any future interconnect, be it PCIe 5.0, HTX 6.0, or otherwise.

Fifth, I am trying to explain to you why AMD engineers and Nvidia engineers are not using dGPUs for their extreme-scale designs. You can ignore my explanation, you can ignore the numbers, and you can ignore the physics behind them, but that will not change the fact that their designs for the future do not use dGPUs.

Sixth, the HEDT argument was debunked before and by more than one poster...
 

cemerian

Honorable
Jul 29, 2013


And yet the new supercomputers to be built will use Volta dGPUs paired with IBM POWER8 dCPUs, not APUs.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


The article confirms that AMD continues accelerating the transition away from PC market to new markets: embedded, semi-custom...

Kumar confirms that the semi-custom business won a design for an ARM-based product and another for a traditional x86 product. He doesn't give details, but I think they are the SoCs for Apple.

He confirms the good relationships with Globalfoundries, repeats that "AMD is now producing computing, graphics, and semi-custom products all at Global Foundries" and praises Globalfoundries for "the FinFET cooperation deal with Samsung." Does someone still doubt that AMD 14nm FinFET products will be made by Globalfoundries?

The article confirms that "AMD’s outlook on the discrete graphics business is less rosy" and "considers the performance of AMD’s graphics business to be disappointing".
 

juanrga

Distinguished
BANNED
Mar 19, 2013


But those are not extreme-scale supercomputers; they are ~20x slower supercomputers... For your information, IBM engineers are working on a future extreme-scale supercomputer design based on a heterogeneous node with the accelerator included on the same die as the CPU (*) instead of using dCPU+dGPU.

(*) I also have a copy of their APU-like design. The difference with AMD or Nvidia is that, instead of using GPU cores, IBM engineers use another kind of throughput core for the on-die accelerator. I am not going to give details because this is an AMD thread; I will only mention that the IBM design has unified memory and uses a technology similar to the PIM in the design presented by AMD.
 


Yes, I know where AMD is headed very well. It's boring for me. HEDT will be grabbed by Intel and AMD will leave us to rot buying Intel only stuff. The "dCPU" + "dGPU" landscape won't change anytime soon. I can't see beyond 2016 though, so 2020 will be interesting.

Cheers!
 

blackkstar

Honorable
Sep 30, 2012


So basically, if you go for weak, efficient parts, you end up with more interconnects, thus making everything more expensive.

I.e. a 2m/4c APU like Kaveri with 512 GCN cores, in order to compete with a big dGPU with, say, 4096 cores, would need 8 times the interconnects to spread all those GCN cores out over eight 512-core APUs?

An HPC APU is not so viable unless you make huge APUs? Which means the entire thing isn't viable for HEDT, as a 650mm^2 APU would cost a load of money. We spend all this time arguing how important efficiency is, yet we ignore that a huge, efficient HPC cluster of ARM APUs would need a ton more interconnects, thus raising the price significantly.
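The scaling above can be sketched with quick arithmetic (all figures are illustrative; the link counts depend entirely on the fabric topology you assume):

```python
# Rough sketch of the interconnect-count argument: spreading the same
# number of GCN cores over small APUs multiplies the chips to connect.

big_dgpu_cores = 4096
apu_cores = 512

nodes = big_dgpu_cores // apu_cores    # 8 small APUs match one big dGPU

# One link per node to a shared switch grows linearly with node count...
switch_links = nodes                   # 8 links

# ...but a fully connected mesh between the APUs grows quadratically,
# which is where the cost argument really bites.
mesh_links = nodes * (nodes - 1) // 2  # 28 links

print(nodes, switch_links, mesh_links)
```

Either way, the single big die needs zero off-chip links for the same core count, which is the efficiency-versus-cost trade-off being argued here.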
 

Cazalan

Distinguished
Sep 4, 2011
When you switch to the new HPC benchmark, HPCG, as opposed to the ancient LINPACK, the results change quite a bit. The favor seems to go back to regular CPU (homogeneous) systems.

http://www.hpcwire.com/2014/12/01/architectural-surprises-underpin-new-hpc-benchmark-results/

It's also saying that reaching exascale with a more realistic benchmark is going to take another decade or more. The most powerful supercomputer in the world doesn't even reach 1 petaflop with that benchmark.

 

juanrga

Distinguished
BANNED
Mar 19, 2013


#1 and #3 remain heterogeneous machines. The K machine jumped from #4 to #2. The new benchmark reflects the cost of moving data outside the CPU. Prof. Dongarra confirms my claim that the performance problem is in the interconnect, and it is noted that the problem is temporary and will disappear in a next generation of supercomputers where the GPU is on the same die as the CPU (aka an APU):

On that note, take a look at the results and notice the red markings. Those mean the presence of GPUs or coprocessors. If you start making a few quick connections, you’ll see that the GPU accelerated systems that tend to shine on LINPACK really don’t pull the same power they do on this real-world application-oriented benchmark. As Dongarra said, “It’s not unlike HPL where GPUs and coprocessors have a lower achievable percent of peak because it’s harder to extract performance from these. It’s not just programming ease either, it’s about the interconnect. When that problem goes away it will change the game dramatically.”

Of course, this is not a message of doom and gloom when it comes to GPUs and coprocessors on these large machines—at least in the future. Since a great deal of the interconnect problem with coprocessors and GPUs will be a thing of the past once data never has to leave its chip home. In the meantime, it’s important for this benchmark to get traction—as well as the codes and systems—for this new generation of processors that will kick off the newest Knight’s family processors and work by the OpenPower Foundation and NVIDIA to nix the hop and keep the movement on the die.
 


I don't see adaptive v-sync 8(

Also, I wonder if they actually added "advanced" features to the "advanced" tab for the drivers... The "advanced" options are stupid basic ones.

Cheers!
 

jdwii

Splendid


I'm honestly puzzled by this. Juan said 4 modules or 8 Piledriver cores would be 3% of the total die space of the APU; you could triple that size (brute-force the design, though I'm pretty sure that's not happening this time around) and the CPU would still use only 9% of the die space. I could go on and on about why more transistors doesn't always mean more performance, but I'll just let you look at Intel vs AMD benchmarks from 2008 and up.
 

jdwii

Splendid


Yeah, let's see the type of iGPU power they have for us gamers. I just got done with their new 4600 GPU in Haswell and it can't even hold up to GameCube-emulation graphics. I guess they will have to improve that end if dGPUs get replaced.

 

jdwii

Splendid


Not bad for the people who already own their hardware. Yes, they need to put adaptive v-sync in their drivers, and I like FXAA a lot too, though I'm not sure how others feel about it (since it works for all games in Nvidia's settings).
 

szatkus

Honorable
Jul 9, 2013


Here, mate: http://cdn.videocardz.com/1/2014/12/AMD-Catalyst-Omega-Driver_Feature-List.jpg

FreeSync ;)
 


Oh, I'm certainly not talking about their "APUs" overtaking the market, but they do trail AMD in OCL stuff on their own. I'm talking about how the dCPU and dGPU will still be relevant for many more years, and since AMD's path no longer focuses on CPU prowess, Intel will take over HEDT too darn easily. I'm sure they'll improve their "APUs", but the CPU portion will be better than AMD's.



But FreeSync requires a compatible monitor, which I don't have. Adaptive v-sync is easy to add as an option, I'm sure of that. RadeonPro does it outside the driver, so it can't be THAT hard to include.

Cheers!
 

jdwii

Splendid
Just to clarify, for people who might not know what adaptive v-sync is: it's nothing more than a v-sync switch. At or above 60Hz (or your monitor's refresh rate) it turns v-sync on, and below that it turns it off. Simple yet amazing; I use it for all my games.
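The rule described above fits in a few lines. This is only a toy model of the decision; real drivers apply it per frame inside the presentation path:

```python
# Toy model of the adaptive v-sync rule: enable v-sync only while the
# game can render at or above the monitor's refresh rate, and disable
# it otherwise to avoid the stutter of dropping to half refresh.

def adaptive_vsync(fps, refresh_hz=60):
    """Return True (v-sync on) when fps >= refresh rate, else False."""
    return fps >= refresh_hz

print(adaptive_vsync(75))   # above 60 Hz: v-sync on
print(adaptive_vsync(45))   # below 60 Hz: v-sync off, tearing allowed
```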
 