AMD CPUs, SoC Rumors and Speculations Temp. thread 2



I recall the old discussion and how everyone else disagreed with the predictions, saying it was impossible. Then AMD published its latest roadmap, with plans to start replacing dGPUs with SoCs, and some people changed their opinion. Later, AMD engineers published the EHP paper and virtually confirmed all the arguments given in this forum two years ago.
 


I will remark once again that the reason AMD doesn't make a large APU today (e.g. 600 mm²) is the DDR bottleneck. It is the same reason low-end graphics cards use DDR memory while the top cards use GDDR5.

AMD could fabricate a large APU with tons of CUs today, but its performance would be bottlenecked by a slow DDR3/DDR4 memory subsystem. Improved memory subsystems are needed, and that new memory tech has been developed over the last few years: AMD will use HBM to build high-performance (>4 TFLOPS DP) APUs.

[Diagram: AMD Greenland GPU-based HPC APU]
 


Evidently, the 200-300W APU announced by AMD (check the roadmap) will not use the same cooler as a 65W Carrizo-based APU. That is as evident as saying that the 200W Phi CPU from Intel doesn't use the same cooler as a 65W Skylake CPU.

 
I will remark once again that the reason AMD doesn't make a large APU today (e.g. 600 mm²) is the DDR bottleneck. It is the same reason low-end graphics cards use DDR memory while the top cards use GDDR5.

Very low end GPUs use GDDR3, not DDR. Two totally different technologies.

Secondly, the memory bottleneck is still going to exist. All AMD is doing is putting HBM on the package, but the main system memory is still going to be DDR4, meaning you are still bound by DDR transfer rates.

AMD is basically doing with APUs exactly what discrete GPUs already do: create a large cache of very fast memory on the package that you can preload ahead of time, so the GPU doesn't stall out. The only difference is that AMD is doing this over the main memory bus rather than through PCI-E. And the cost of this is going to be a very large, expensive, power-hungry die that is not going to offer great profit margins for the company.
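For what it's worth, the "preload ahead of time" pattern described above is just double buffering. A minimal Python sketch of the structure (the transfer/compute functions are hypothetical stand-ins, not a real GPU API; in a real driver the transfer would be an asynchronous DMA that truly overlaps the compute):

```python
def transfer(chunk):
    # stand-in for an asynchronous PCIe/memory-bus upload; in a real driver
    # this would be a DMA transfer that overlaps with the compute below
    return list(chunk)

def compute(dev_buf):
    # stand-in for a GPU kernel
    return sum(dev_buf)

def process(chunks):
    results = []
    next_buf = transfer(chunks[0])              # preload the first chunk
    for i in range(len(chunks)):
        cur_buf = next_buf
        if i + 1 < len(chunks):
            next_buf = transfer(chunks[i + 1])  # prefetch the next chunk
        results.append(compute(cur_buf))        # the "GPU" never waits on the bus
    return results

print(process([[1, 2], [3, 4], [5, 6]]))        # [3, 7, 11]
```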
 


https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series#Desktop_products
https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series#Mobile_products

The lowest ones use DDR3, the low-to-middle ones use GDDR5, and the top ones use HBM.



No. Current mobile/desktop/server APUs only have DDR3 memory, and the iGPU is bandwidth-bottlenecked. For comparison, the GPU on the Fury X card has access to 512 GB/s of bandwidth provided by its HBM stacks, whereas the iGPU on the top Kaveri APU (dual-channel DDR3 @ 2133 MHz) can only access 34 GB/s, which is shared with the CPU cores. That is why the Kaveri APU is limited to 8 CUs: the slow DDR3 memory [1].
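To make that gap concrete, here is the back-of-the-envelope arithmetic behind the 34 GB/s and 512 GB/s figures (peak theoretical numbers; the Kaveri channel width and HBM pin rate are the usual published values):

```python
def peak_bw_gbs(channels, bus_width_bits, transfers_mt_s):
    # peak bandwidth in GB/s = channels * bytes per transfer * MT/s / 1000
    return channels * (bus_width_bits / 8) * transfers_mt_s / 1000

# Kaveri: dual-channel DDR3 @ 2133 MT/s, 64-bit channels
print(peak_bw_gbs(2, 64, 2133))    # ~34.1 GB/s, shared between CPU and iGPU

# Fury X: four HBM1 stacks, each with a 1024-bit bus @ 1000 MT/s
print(peak_bw_gbs(4, 1024, 1000))  # 512.0 GB/s
```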

Check the diagram given above. The iGPU on the high-performance APU announced by AMD has access to HBM memory stacks with 512 GB/s of bandwidth. Evidently the DDR bottleneck of current iGPUs is eliminated, and engineers can add enough CUs to get >4 TFLOPS.
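A rough sketch of what ">4 TFLOPS" implies in CU count, assuming a GCN-style 64-lane CU, a 1:2 DP rate, and ~1 GHz clock (all assumptions for illustration, not confirmed specs for the announced APU):

```python
LANES_PER_CU = 64        # GCN-style CU: 64 shader lanes (assumption)
FLOPS_PER_LANE = 2       # fused multiply-add = 2 FLOPs per lane per clock
DP_RATE = 0.5            # assumed 1:2 double-precision rate for an HPC part
CLOCK_HZ = 1.0e9         # assumed ~1 GHz engine clock

def dp_tflops(cus):
    return cus * LANES_PER_CU * FLOPS_PER_LANE * DP_RATE * CLOCK_HZ / 1e12

print(dp_tflops(64))     # ~4.1 TFLOPS DP: roughly 64 CUs under these assumptions
```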

Of course, the main system memory of the APU will be DDR4. But this is exactly the same situation as with a dCPU+dGPU configuration: the Fury card has a private memory pool of HBM, but the main system memory continues to be DDR3 or DDR4 on the mobo.

As I have already explained multiple times, the APU concept is better, because the iGPU can directly access the main memory system via hUMA links, whereas the dGPU has to access the main memory system indirectly via a slow PCIe interconnect, wasting CPU cycles on synchronizing accesses (the dGPU is a second-class device and works in slave mode).
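A toy illustration of the two access models being contrasted (stand-in functions for a sketch, not AMD's actual hUMA API):

```python
def pcie_copy(buf):
    # stand-in for an explicit staging copy over PCIe (costs bus time + CPU sync)
    return list(buf)

def gpu_kernel(buf):
    # stand-in for a GPU kernel that doubles every element in place
    for i in range(len(buf)):
        buf[i] *= 2

def run_on_dgpu(data):
    dev = pcie_copy(data)        # host -> device copy
    gpu_kernel(dev)
    return pcie_copy(dev)        # device -> host copy

def run_on_apu(data):
    gpu_kernel(data)             # one coherent address space: no staging copies
    return data

print(run_on_dgpu([1, 2, 3]))    # [2, 4, 6], two extra copies over the bus
print(run_on_apu([1, 2, 3]))     # [2, 4, 6], zero extra copies
```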

[1] In fact, AMD had planned a superior Kaveri model with 6 CPU cores and a larger number of CUs; that higher-performance APU was to be served by GDDR5 memory. It was canceled because one of the providers of the SO-DIMM modules went out of business. The Kaveri die still has a GDDR5 memory controller.
 


I'm sure some may have said it was impossible, but certainly not everyone. "Fusion is the Future" has been AMD's mantra since buying ATI. They have been on track to include more and more transistors for both the CPU and iGPU since the beginning, going from 800M-transistor APUs to roughly 5B-transistor APUs in just a few years' time. There is no engineering restriction limiting the progression to larger APUs.

The question has always been whether it is economically viable to do so, and when; not whether it is possible.
 


That's the big question. In the consumer space, which is what we are discussing, everything comes down to effective costing. Currently, and for the next twenty years, you get better high-end capability by separating the GPU from the CPU. The physics is pretty solid on this; thermodynamics is a harsh mistress. Now, practically anything can be solved if you throw enough money at it. Willing to pay a few thousand USD for a CPU? Suddenly they can make a huge 600~800mm^2 die that outputs 300~400W of power, which is what you would get if you fused something like an i7 onto the side of a Fury X. Create an interconnect between them with powerful stacked memory locally and it's no different than having them separate. You still gotta deal with the insane thermal issue, but again, with enough money you can create a custom solution to handle that.

Of course, you could just separate them; it would be a fraction of the total cost and you would get identical performance in consumer workloads.

See, everything that's come out of Juan's fingertips has been related to HPC workloads: running statistical analysis on global financial market trends, calculating the gravitational density of a neutron star, or running a multi-dimensional simulation of a plasma during fusion. And for those situations he is actually correct; the locality of the giant vector co-processor helps a ton because of how sensitive those simulations can be to latency. Running them on a powerful dGPU is still better than running them on a weak APU, but a powerful, and expensive, integrated APU would be the best of both. None of it is applicable to consumer workloads, though. Essentially he has said nothing this entire time and is just a giant distraction.

It's like a group discussing the Super Bowl while someone insists that the Red Sox will win the World Series. The Red Sox could very well win it, but it's completely irrelevant to the discussion at hand, no matter how many charts and statistics they quote.
 
As I have already explained multiple times, the APU concept is better, because the iGPU can directly access the main memory system via hUMA links, whereas the dGPU has to access the main memory system indirectly via a slow PCIe interconnect, wasting CPU cycles on synchronizing accesses (the dGPU is a second-class device and works in slave mode).

The PCI-E bus is fast enough to keep the GPU fed; it's not like you see GPUs drop to 50% utilization because they are waiting for data to come across the bus. That's the wonder of pre-loading what you need ahead of time. Yes, you add a constant ~1 ms of latency writing back to main memory, but given that you have a ~16 ms deadline to play with (assuming 60 FPS), this can be managed. AMD is fixing something that doesn't really need to be fixed.
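The arithmetic behind that claim, using the usual theoretical PCIe 3.0 x16 figure and a hypothetical per-frame payload size:

```python
frame_budget_ms = 1000 / 60          # ~16.7 ms per frame at 60 FPS
pcie3_x16_gbs = 15.75                # theoretical PCIe 3.0 x16 bandwidth (GB/s)
payload_mb = 64                      # hypothetical per-frame upload

transfer_ms = payload_mb / 1024 / pcie3_x16_gbs * 1000
print(f"{frame_budget_ms:.1f} ms budget, {transfer_ms:.1f} ms on the bus")
# 16.7 ms budget, 4.0 ms on the bus: manageable if you preload ahead of time
```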
 



It does need to be addressed, but the priority is not the same for each market segment. With the Radeon group being separated off again, their big APUs are now more on the semi-custom timelines. They have designs for them in the queue, but those likely won't be taped out until enough customers are lined up or the grant money is high enough.

Meanwhile they are continuing with Zen CPUs and also big dGPUs, which will satisfy the workstation/HEDT market segment. VR is going to ramp this year, with three main projects and an anticipated ~6M units shipped, which should help drive dGPU sales this year.
 
I guess we're going to Seattle now:

http://arstechnica.com/information-technology/2016/01/amds-datacenter-arm-processors-finally-hit-the-market/

AMD has been vague about both pricing and performance. The company says that the top-end part will cost around $150, with the others coming in below. This is quite a bit cheaper than Intel's Xeon D processors, which pack Broadwell-class processor cores with a bunch of I/O connectivity. These parts start at $199 and go up as high as $675. However, AMD concedes that the Xeon Ds are considerably faster. The company says that Intel's 2013-era Atom-based C2000 series systems-on-chips are a better comparison for these new models.

I have my smugface on right now.
 


The problem is that these are ARM-based chips, and no matter how loudly the ARM fanboys proclaim otherwise, the best ARM-based uArch is not nearly as powerful as current x86 uArch CPUs.

It is also not a very good thing to see a company say that, as it puts doubt on their own product. Why would a person buy something the company itself doesn't think can compete with current rivals?
 
happy new(!) year, guys.

The Silver Lining of the Late AMD Opteron A1100 Arrival
http://www.anandtech.com/show/9956/the-silver-lining-of-the-late-amd-opteron-a1100-arrival
Silver Lining Systems puts AMD's ARM server in a different class
http://semiaccurate.com/2016/01/14/silver-lining-systems-puts-amds-arm-server-in-a-different-class/
AMD Launches the Opteron A1100 Series of 64-bit ARM CPUs for Servers
http://semiaccurate.com/2016/01/14/amd-launches-the-opteron-a1100-series-of-64-bit-arm-cpus-for-servers/
it literally means silver lining.

JEDEC Publishes HBM2 Specifications - Will Scale Up To 32GB, 8-Hi Stacks, with 1 TB/s Bandwidth (quick arithmetic on these numbers after this list)
http://wccftech.com/jedec-publishes-hbm2-specifications-scale-8hi-32gb-stacks-1-tbs-bandwidth/
AMD Seattle Support In The Linux Kernel Still Getting Squared Away
http://www.phoronix.com/scan.php?page=news_item&px=AMD-A1100-Kernel-Support
AMD HSA Support Finally Appears Ready To Be Merged In GCC
http://www.phoronix.com/scan.php?page=news_item&px=AMD-HSA-GCC-Merge-Ready
New AMD CPUs To Support Power Monitoring With Linux 4.5
http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.5-Fam15h-Power
It Will Be Interesting To See If AMD Supports Coreboot For Zen
http://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen-Will-It-Coreboot
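
A quick sanity check of the numbers in the JEDEC HBM2 headline above, using the published per-pin rate and stack interface width:

```python
pins_per_stack = 1024     # HBM keeps a 1024-bit interface per stack
gbps_per_pin = 2          # HBM2 tops out at 2 Gbps per pin
stack_height = 8          # "8-Hi" stack
die_gb = 1                # one 8 Gb die = 1 GB

stack_bw_gbs = pins_per_stack * gbps_per_pin / 8   # 256 GB/s per stack
stack_cap_gb = stack_height * die_gb               # 8 GB per stack

print(4 * stack_bw_gbs)   # 1024 GB/s -> the "1 TB/s" figure, with four stacks
print(4 * stack_cap_gb)   # 32 GB    -> the "32 GB" figure, with four stacks
```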




for all the utter misinformation over performance, tdp, uarch, and pricing spread in this thread and the old one, the reality sure doesn't come as a surprise. 32W TDP on a 28nm mobile process, 25W on a quad - both without an igpu, and clock rates of 2GHz and under... it's still good for a starter, but... as i've been saying all along(!), seattle was never gonna take the world by storm. anywho, i think ars's comparison to the xeon-d is unfair. yes, both address the same...ish market, but the bdw cores are so much more powerful, and then there's the process node advantage, better power management, the ecosystem, and the BS publicity from arm as well as its media friends... though that last one has got nothing to do with tech. vanilla arm cores are never gonna cut it; be on the lookout for custom cores on smaller nodes and a gradually maturing ecosystem.
 
Still better than the X-Gene offering, but architecture isn't going to save you when it's 28nm versus 14nm Broadwell.

As many here said: too little, too late. It's a long-term play at best. They needed it to get the fixes into Linux for K12.
 

i may be wrong, but iirc they bragged about their cpu part being faster than intel's (forgot which one) and then withdrew the claim soon after.

amd needs to get power management support into linux before k12 gets here. i gotta say, i am cautiously looking forward to seeing what k12 can do. seems like qualcomm wised up and has been catching up to (if not pulling ahead of) amd.
 


Even with rumors going on around that discrete GPUs won’t stay in the market in the long term, the statement is simply dismissed by Raja who further says that the market for dGPU will always be on the rise as compute demand will never decrease.

Hey, the guy running AMD's GPU division agrees with me about the future of dGPUs.
 


Seattle was originally announced in 2013 for release in 2014:

http://ir.amd.com/phoenix.zhtml?c=74093&p=irol-newsArticle_pf&ID=1830578

But AMD delayed it again and again, until Intel arrived with the 14nm Xeon D. At this point the Seattle Opteron only makes sense as a development/testing platform until K12 arrives.
 


Calling others "ARM fanboys" is a direct route to starting a flame war, but let us focus on facts.

The Seattle Opteron is not using "the best ARM-based uArch", not even close. That Opteron uses the same A57 cores you can find in several phones. The A57 is a phone-class core, and in fact it is not even the best phone core: the A72, Cyclone, Twister, and other phone-class cores are better.

The closest x86 'equivalent' of the A57 core would be a Silvermont core from Intel or a Puma core from AMD. I wrote 'equivalent' because both Silvermont and Puma consume more power and are not really suitable for use in phones; there is no phone with quad Puma cores. Silvermont uses a better process node (22nm FinFET) to somewhat compensate for its inefficiency.

I don't understand why some people insist on conflating a tiny phone core like the A57 with a larger server core like the Broadwell core used in the Xeon D line, and then make wrong claims about the ISA. To put things in perspective, the A57 core is about 20% of the size of a Broadwell core (at the same process node), i.e. one Broadwell core is bigger than a cluster of 4x A57 cores. The ARMv8 ISA is better than the x86 ISA, but the ARM ISA is not magical and doesn't let you get the same performance with less than one fourth the transistors.

Why don't you compare an x86 server core against an ARM server core? The Vulcan core is a server-class ARM core and is expected to be about as fast as Haswell. AMD's K12 is another server-class ARM core (you will not find phones with K12 inside). The K12 core is expected to have the performance of 2x A57 cores, which would put it nearly at Broadwell level. Keller already said that K12 is "wider" than Zen.



AMD knows that Seattle is too late for production servers, but AMD can still sell it for development/testing purposes.
 


Are you sure? The WCCFTECH article concludes with:

AMD’s Latest Packaging and Integration To Help Development of HPC APUs

The letter “P” is very dear to Raja as it means four key components for RTG while designing next-gen GPUs. The four “Ps” are Performance, Power, Price and, last but not least, Packaging. AMD is the first graphics vendor to ship an HBM-powered graphics card and they have gained some experience from it in the packaging department. Tight integration of the GPU and HBM silicon on the same substrate (interposer) leads to some crucial learning that helps in designing next-generation solutions for compute-hungry audiences.

AMD High Performance Compute Platforms

For some time, we have been hearing about HPC APUs which are, simply put, high-end APUs that will come with a fast discrete graphics chip, several next-gen x86 cores and tons of HBM memory, all integrated on a package and all chips linked via the fast interconnect that has been talked about in this article. We have previously reported extensive details regarding the High-Performance Computing APU and Exascale Heterogeneous Processors from AMD. If all goes well, we will see an update in this regard when the Zen processors hit the market, probably at the end of this year.

WCCFTECH uses the term "discrete graphics" for what I and the rest of the world call "integrated graphics". APUs use integrated graphics.
 
If Raja says dGPU then I take that as dGPU. He notes the different requirements for HPC, workstation and consumer graphics.

Even with rumors going on around that discrete GPUs won’t stay in the market in the long term, the statement is simply dismissed by Raja who further says that the market for dGPU will always be on the rise as compute demand will never decrease.
 

I agree with your assessment. Raja Koduri is well aware of what iGPU versus dGPU entail, and dGPUs are not going anywhere anytime soon.
 