AMD CPU speculation... and expert conjecture

Page 579

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Hah! There were quite a few at the time. It was before Intel had adopted 64-bit x86.


Anyway, SM15000 in the news again. Another design win.

http://finance.yahoo.com/news/netnames-reimagines-data-center-amds-120000535.html

- More than $1.5 million in expected annual operating cost savings
- Reduction in physical server rack space by 83 percent
- Consolidation of 500 servers into four SM15000 servers
- Reduction in ongoing operating expense by 75 percent
 

8350rocks

Distinguished


Yes, the SeaMicro solutions are pretty amazing. The density is ridiculously good.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Some confidential AMD slides about future APUs have been reproduced by WCCFTECH:

http://wccftech.com/evidence-amd-apus-featuring-highbandwidth-stacked-memory-surfaces/

They show a dual memory architecture, e.g. HBM + DDR4. The slides confirm that APUs will be the basis for AMD's exascale supercomputers.

Very interesting that NTV (near-threshold voltage) operation is mentioned in the AMD slides. This is an approach also pursued by Intel for exascale-class chips.
 

8350rocks

Distinguished
Juan, the best APU is <1 TFLOPS; meanwhile, the best dGPU is >11 TFLOPS.

dGPU tech is growing faster than APU tech, and the best APU is >1,100% behind the best dGPU.

At the point where interconnects become an issue, something that is 1,100% slower will not be compensated for by better memory tech. If you are running compute on the GPU, you would require 11 x 95 W parts, or over 1,000 W worth of APUs, to match a single 400 W dGPU in terms of raw compute...

Can you not see why everyone here says it will not happen that way...? You would have to have an APU with the same compute as Kaveri running at 35 W, with seamless interconnects and low latency, to make it worthwhile. We are not there... nor will we be anytime soon. By the time that happens, we will likely have dGPUs that can do 30-40 TFLOPS.
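
For reference, here is the power arithmetic behind that claim as a quick sketch, using the round numbers quoted above (actual perf/W obviously varies by workload):

```python
# Rough power/compute comparison using the figures quoted in this post.
apu_tflops, apu_watts = 1.0, 95.0      # ~1 TFLOPS Kaveri-class APU at 95 W
dgpu_tflops, dgpu_watts = 11.0, 400.0  # ~11 TFLOPS high-end dGPU at ~400 W

apus_needed = dgpu_tflops / apu_tflops      # APUs to match one dGPU in peak compute
total_apu_power = apus_needed * apu_watts   # combined APU power draw

print(f"APUs needed: {apus_needed:.0f}")                                   # -> 11
print(f"APU cluster: {total_apu_power:.0f} W vs dGPU {dgpu_watts:.0f} W")  # -> 1045 W vs 400 W
```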
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
So basically Intel/Nvidia/AMD are all working on these rather large APU/CPU modules: Knights Landing, Pascal, ???.

I just wonder how big these things will get. Like the 600-800 mm^2 POWER MCMs.

 


I think there is more to the argument than this, though. The purpose of an 'APU' is flexibility. The dGPU is designed with a very focused function in mind. Modern dGPUs can work on many more things than older ones; however, the uArch is still very narrow in what it can and can't be used for. Those 11 TFLOPS are only available for very specific workloads. The APU, in contrast, has a mix of compute resources such that it can be used on many different workloads, albeit at lower overall efficiency. Personally I think the APU is a very elegant solution if used in the correct context, and I can also see why this type of system would be desirable for HPC applications.

I guess the question here is more one of time-scales than anything else. At the moment the transistor and heat budget of an APU limits the GPU portion too much to allow it to compete with a dGPU. The other thing to consider is that outright performance isn't the key, but rather perf/W. Now, it's interesting you mention a 35 W Kaveri, as Kaveri has been shown to maintain a large bulk of its 95 W performance all the way down to 45 W. Comparing a cluster of 45 W Kaveris against a high-end dGPU might not be so clear cut (especially with a mixed workload).
 

Remember, dGPUs use the same technology as APUs, and thus any advancement in APU technology will also apply to a dedicated graphics processor. So the faster and more energy efficient you make the integrated graphics processor, the faster and more energy efficient the dedicated one becomes. That is why you see a consistently large difference between mainstream dGPUs and the highest-end iGPUs. It's all about transistor and power budgets.
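
For what it's worth, the perf/W framing can be sketched as below; every figure here is an assumed placeholder rather than a measured Kaveri or dGPU number, so it only illustrates the calculation, not the verdict:

```python
# Hypothetical perf/W comparison: a cluster of low-TDP APUs vs one big dGPU.
# All figures are assumed placeholders for illustration only.
apu_gflops, apu_tdp = 700.0, 45.0      # assumed 45 W Kaveri-class config
dgpu_gflops, dgpu_tdp = 5600.0, 250.0  # assumed high-end dGPU

print(f"APU:  {apu_gflops / apu_tdp:.1f} GFLOPS/W")
print(f"dGPU: {dgpu_gflops / dgpu_tdp:.1f} GFLOPS/W")
```

With a mixed (partly serial, partly parallel) workload the APU side would also get credit for its CPU cores, which is the point made above.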
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Don't forget IBM! The IBM talk mentioned replacing the outdated discrete architecture with disjoint memory by 'APUs' (CPU + coprocessor) with unified memory.

It is unsurprising that everyone (AMD, Nvidia, Intel, IBM, Fujitsu...) is working on the same paradigm, because the laws of physics are the same for all.

Regarding sizes, one Intel doc mentions 400 mm^2; Nvidia mentions 290 mm^2. Those sizes include cores + caches + NoC + I/O, but don't include stacked DRAM and other elements. AMD doesn't mention sizes in any doc that I have, but they would be similar to the Nvidia designs. What is interesting is that AMD is already considering including SSDs in SoCs.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Relevant link about PCRAM

http://www.micron.com/about/innovations/pcm
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The purpose of the APU is to bring the advantages of integration. HSA brings further performance gains and programming simplifications.

As you correctly note, it is a question of time-scales. Everyone in the industry knows that APUs will kill dGPUs; the question is when.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The best APU is about twice as fast as what you claim, and it is not the fastest APU that could be made today, only the fastest APU that Sony wanted to pay for.

APU tech is growing faster than dGPU tech. In his recent 25x20 talk, AMD's Papermaster mentioned how APUs will bring efficiency gains of about 25x, whereas dGPUs will only improve by about 2.5x in the same time span:

11 TFLOPS dGPU x 2.5 = 27.5 TFLOPS dGPU

1.8 TFLOPS APU x 25 = 45 TFLOPS APU

The APU will be much faster than the dGPU.

Every engineer at AMD, Nvidia, Intel, IBM, Fujitsu... knows this. It is unsurprising that the recent slides reproduced by WCCFTECH clearly mention that AMD has selected APUs as the solution for exascale computers.

Not sure what you are arguing here...
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
Since when does 25x efficiency equal 25x performance? They still have to fab these things. You'd be looking at a 50-billion-transistor APU (not counting DRAM) to hit 45 TFLOPS.
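
As a rough sanity check of that estimate, scaling linearly from a current GPU (using the R9 290X's roughly 6.2 billion transistors and ~5.6 TFLOPS FP32 as the reference point; linear scaling is obviously a simplification):

```python
# Back-of-envelope transistor estimate, scaling linearly from Hawaii (R9 290X).
ref_transistors_bn = 6.2   # billions of transistors in an R9 290X (approx.)
ref_tflops = 5.6           # approximate FP32 peak of an R9 290X
target_tflops = 45.0

estimate_bn = target_tflops / ref_tflops * ref_transistors_bn
print(f"~{estimate_bn:.0f} billion transistors")   # -> ~50 billion
```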
 
AMD Teases “Core Is Back” Teaser and Core Evolution Video – New CPU Announcement Possibly Imminent
http://wccftech.com/amd-teases-core-is-back-teaser-core-evolution-video-cpu-announcement-possibly-imminent/
Mentions the AMD A-series, so... not a standalone CPU, but most likely an APU.

Looks like AMD now has enough incentive to bundle stacked memory with the APUs:
http://wccftech.com/evidence-amd-apus-featuring-highbandwidth-stacked-memory-surfaces/
They mention "significant benefit for high volume markets" in one of the promo slides. I hope this means we won't have to imagine a Core i7-4770R (space heater) with Iris Pro when we imagine high-bandwidth stacked DRAM. :p

Examining AMD’s Driver Progress Since Launch Drivers: R9 290X & HD 7970
http://www.eteknix.com/examining-amds-driver-progress-since-launch-drivers-r9-290x-hd-7970/
Seems like AMD's launch drivers have improved since 2011. Good work.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Mobile Kaveri is very competitive against Haswell:

http://www.anandtech.com/show/8119/amd-launches-mobile-kaveri-apus/

I expect Mobile Carrizo to bring similar competition.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


That link has already been given on this page. Don't miss the discussion. :p
 

jdwii

Splendid


Nice, that is some great progress. Quite happy to hear that from AMD. I wonder what their Nvidia driver article will look like.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Have you ever considered why AMD's Papermaster announced during his "25x20" talk that AMD wants to improve APU efficiency by 25x by the year 2020? Why not 18x or 6.3x?

http://www.amd.com/en-us/press-releases/Pages/amd-accelerates-energy-2014jun19.aspx

The answer is that there is an international goal in the scientific/engineering community (hey, I am also a scientist, guys, and one that works in "hard science" ;)) to increase the performance of computers by 50x while increasing power consumption by only 2x. Thus you need a 25x increase in efficiency to hit that goal.

All of the 25x efficiency gain will be translated into 25x more performance, with the remaining 2x coming from doubling the power consumption.
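
Spelled out, the relationship between those targets is simply:

```python
# 2020 targets as quoted above: 50x the performance at only 2x the power.
perf_gain = 50.0    # desired performance increase
power_gain = 2.0    # allowed power increase

efficiency_gain = perf_gain / power_gain   # required perf/W improvement
print(efficiency_gain)                     # -> 25.0, i.e. the "25x20" target
```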

WCCFTECH only reproduced some slides of the AMD talk. This slide from the Nvidia talk is much more interesting because it details the goals for the year 2020:

[Image: DARPA-goals.png]


Curious eyes will notice that the left-hand side of the slide mentions "GPUs", whereas on the right-hand side they are replaced by the "72000 HCNs".

"GPUs" are the discrete Tesla cards used in current supercomputers such as Titan, whereas "HCN" means Heterogeneous Compute Node, aka an APU in AMD parlance. Evidently Nvidia is too proud to use AMD's marketing names and invented its own term: HCN.

Of course, this is all about HPC/server/HEDT, where all the efficiency gains will be spent on increasing performance. In the mobile market, part of that efficiency will be used to reduce power consumption and increase battery life.

Thus in laptops we will see ~10 TFLOPS APUs, but in HPC/server/HEDT we will see ~40 TFLOPS APUs.

As I have mentioned plenty of times before in this thread, Nvidia's APU/SoC targets 40 TFLOPS and is rated at 300 W. In fact, I can be more precise: their APU brings a peak of 40.96 TFLOPS.

P.S.: your transistor count is off by a factor of ~2.5x or so.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780




Better. If that piece about stacked DRAM is true, it would be a menace to Iris.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680
I would suspect that the vast majority of CPUs with Iris have been sold to Apple, who will stay with Intel until they leave x86 altogether.

What will stacked DRAM do for pure CPU performance, and what will be the impact on TDP?
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Depends on latencies. Probably less than a few percent in memory-heavy cases.
The added TDP will be rather negligible, at least if it's similar in size to Intel's 128 MB eDRAM.
 

Yeah, most of those went to Apple.
Simply put, stacked DRAM will help accelerate compute workloads because the CPU cores won't have to go all the way down to slower system memory for data. One of the most apparent benefits may be faster single-core performance, with the stacked DRAM acting as a large L3 cache (in Carrizo APUs/SoCs), if that's how it's utilized. It will also depend on the CPU's memory access performance, the stacked DRAM's own performance (latency), what type of cache it is treated as, etc. I am still looking into how these work.
Usually stacked DRAM would require additional cooling and raise the package TDP, since more transistors equals more heat. It will depend on the design; with better power management and memory access behaviour, it might not result in a high TDP. I haven't seen HBM's power consumption figures yet.
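
A minimal sketch of why a large on-package cache helps, using the standard average-memory-access-time formula (all latencies and hit rates below are made-up illustrative numbers, not Carrizo or HBM figures):

```python
# Average memory access time (AMAT) with and without a large stacked-DRAM cache.
# All latencies (ns) and hit rates are illustrative assumptions only.
def amat(l2_ns, l2_miss_rate, next_level_ns):
    return l2_ns + l2_miss_rate * next_level_ns

dram_ns = 80.0       # assumed system-memory latency
stacked_ns = 30.0    # assumed stacked-DRAM (L3-like) latency
hit_rate = 0.7       # assumed fraction of L2 misses caught by the stacked DRAM

without_stack = amat(10.0, 0.3, dram_ns)
with_stack = amat(10.0, 0.3, hit_rate * stacked_ns + (1 - hit_rate) * dram_ns)

print(f"AMAT without stacked DRAM: {without_stack:.1f} ns")  # -> 34.0 ns
print(f"AMAT with stacked DRAM:    {with_stack:.1f} ns")     # -> 23.5 ns
```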
 


No, the purpose of APUs is to get a halfway decent GPU on the die so you don't require an external GPU, essentially creating an SoC platform that can be used as an entry into mobile markets.

Your entire argument, that you can put a CPU and a dGPU on the same die and outperform both as standalones within a very tight heat/power budget, is insane. I can confidently state that at no point will any APU beat a CPU + dGPU combination within the same power budget.
 