AMD CPU speculation... and expert conjecture

Page 514 of the Tom's Hardware community forums.
Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The 14nm node opens the door to the design of an 8-core APU, where 8 big cores would represent about 25% of the total die area. Once one has 8 big cores inside an APU, the need for a separate dCPU vanishes.

The real game changer is the 10nm node.



Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console uses fast system memory beyond the slow DDR memory used in PCs.

The memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on its next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s BW). Intel will start selling 'CPUs' with packaged MCDRAM (500GB/s BW) next year.

For the sake of comparison, a GTX680 has 192 GB/s BW available.
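The GTX680 figure can be sanity-checked with the standard peak-bandwidth formula (effective transfer rate times bus width). A minimal sketch, using the card's public specs (6008 MT/s GDDR5 on a 256-bit bus); back-of-the-envelope only, not a benchmark:

```python
# Peak memory bandwidth: effective transfers per second x bytes per transfer.
def peak_bw_gbs(effective_mts, bus_width_bits):
    """Peak bandwidth in GB/s from effective rate (MT/s) and bus width (bits)."""
    return effective_mts * 1e6 * (bus_width_bits / 8) / 1e9

# GTX 680: 6008 MT/s effective GDDR5 on a 256-bit bus
print(f"GTX 680: {peak_bw_gbs(6008, 256):.0f} GB/s")  # ~192 GB/s
```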
 

truegenius

Distinguished
BANNED

what would you like to say to those who use features like quad sli or crossfire?

and in 2020, if someone tries to play a game on his 4-monitor setup, each at 8k resolution, then how would these APUs store the game data needed for processing by the iGPU? would amd use the crystals of some broken crystal ball?
 

so, uhm.. what type of memory does PS4 use? i guess slow ddr is ... slow.
 

jdwii

Splendid




I expect to see a huge performance benefit going from a Trinity (Richland) APU in a laptop to a Kaveri, since Kaveri has around 15% more performance per clock compared to the Phenom in my testing, and it's clocked higher as well: 2.7GHz vs 2.5GHz for the A10-5750M. Probably a good 20% boost on average in CPU performance, with a nice boost in GPU performance since it's going to use GCN. Honestly, I might upgrade my old Llano laptop to this if I can find a good laptop with the best APU for $650 or less.
 

sapperastro

Honorable
Jan 28, 2014
191
0
10,710


DDR5. Unsure of the speed or latency.

 

juanrga



I would say the same thing I said the first time, and the second time, and the third time... that I was asked about that.



Since you seem so confident, just let me know: how much DRAM memory do these APUs have?
 


Memory is nowhere near the main bottleneck in a system right now, and while in servers faster RAM would be nice, it is not what is needed for an APU to shine, as current APUs do taper off.

The PS4 APU is actually a bit underwhelming if you look at it from a GPU standpoint. It has the same number of shaders as an HD7870, yet even counting the CPU portion it is slower in TFLOPS (1.84 vs 2.54) than an HD7870.

And pulling 200-300W of heat off a CPU is fine. It would be the 250-300W of heat from the GPU, plus the 80-150W+ on top of it, that would be hard for air coolers to handle as easily.
 

juanrga



The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.

I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
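The two figures can be reproduced from bus widths: the PS4 pairs its GDDR5 (2750MHz, 5500 MT/s effective at double data rate) with a 256-bit bus, while a typical PC runs dual-channel DDR3-2133 on a 128-bit combined bus. A minimal sketch of the arithmetic:

```python
# Peak bandwidth: effective transfer rate (MT/s) x bus width in bytes.
def peak_bw_gbs(effective_mts, bus_width_bits):
    return effective_mts * 1e6 * (bus_width_bits / 8) / 1e9

ps4  = peak_bw_gbs(5500, 256)   # 176 GB/s
ddr3 = peak_bw_gbs(2133, 128)   # ~34 GB/s
print(f"PS4: {ps4:.0f} GB/s, dual-channel DDR3-2133: {ddr3:.1f} GB/s")
```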
 

juanrga



DDR5 doesn't exist. The latest JEDEC standard is DDR4, which is still not on the market.

GDDR5 != DDR5
 

how is gddr5 "beyond" the "slow" ddr memory in pcs?
 

jdwii



And yet CPUs work better with lower latency and GPUs work better with more bandwidth.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


APUs are a compromise, plain and simple. Kaveri had to compromise CPU speed for GPU speed. There are some advantages though, like cost.

In maybe 5-6 years you'll be able to get consumer APUs with 3D memory at a reasonable price. It all depends on how quickly Micron/SKHynix/Samsung can ramp their production and get the costs down.

For reference, right now the only board you can get with 3D memory on it is $25k. I should be getting mine (for work) sometime this summer.
 

juanrga



Memory bandwidth is the main bottleneck for current APUs. It is the reason why Richland increased native DRAM frequency from 1866MHz to 2133MHz. It is the reason why top APUs from Intel introduce an L4 cache. It is the reason why the Xbox One APU uses ESRAM, and it is the reason why the PS4 APU uses fast GDDR5 instead of slow DDR3.

AMD had originally planned a faster top Kaveri APU with 6 CPU cores and a more powerful GPU (one that would hit 1TFLOP). That original APU concept used fast GDDR5 memory. However, there were problems with one of the memory suppliers, so AMD canceled its original plan, reduced the CPU cores to 4, lowered the GPU clocks, fused off the (now unused) on-die GDDR5 memory controller, and released the final top Kaveri APU that you can purchase in stores, the one that relies on slow DDR3 memory.

The PS4 APU has been designed to fit inside a console. The HD7870 has been designed to fit inside a desktop. Consoles are limited at about 100W TDP. Desktops can have up to 10x more TDP than consoles.

If you were to use an HD7870 for a console, you would cut its TDP to about one third to fit it in the 100W thermal limit, which implies lowering the clocks if you want to maintain the same number of execution units.

Assuming cubic scaling, cutting dissipation to one third corresponds to lowering the clocks by a factor of ~1.4, which reduces TFLOPS from 2.54 to ~1.81.
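The cubic-scaling estimate can be checked numerically: if dynamic power scales roughly with frequency cubed (since voltage tends to scale with frequency), cutting power to one third means dividing the clock by the cube root of 3, about 1.44, in the same ballpark as the ~1.4 and ~1.81 figures above. A rough sketch:

```python
# P ~ f^3 assumption: power ratio determines the clock ratio.
power_ratio = 1 / 3                    # target: one third the TDP
clock_factor = power_ratio ** (1 / 3)  # f_new / f_old ~ 0.693 (divide clock by ~1.44)
tflops_hd7870 = 2.54
tflops_scaled = tflops_hd7870 * clock_factor  # ~1.76 TFLOPS
print(f"clock divided by {1 / clock_factor:.2f} -> {tflops_scaled:.2f} TFLOPS")
```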


Then you would have a CPU+dGPU configuration that would be more expensive, less reliable, and clearly slower. Both Sony and Microsoft rejected a CPU+dGPU configuration for their consoles' hardware from the first minute they started thinking about the design.
 

juanrga



I have mentioned how a GDDR5 module provides twice the bandwidth of a similarly clocked DDR3 module.



As I mentioned plenty of times in this thread, a CPU is an LCU whereas a GPU is a TCU; the "L" stands for "Latency" and the "T" for "Throughput".

But a CPU working better with lower latency doesn't imply that a CPU is insensitive to bandwidth.

From the textbook Programming Many-Core Chips

It’s becoming clearer that memory bandwidth rather than memory latency is the next bottleneck that needs to be addressed in future chips. Consequently we have seen a dramatic increase in the size of on-chip caches, a trend that is likely to continue as the amount of logical gates on a chip will keep increasing for some time to come, still following Moore’s law. There are however several other directions based on novel technologies that are currently pursued in the quest of improving memory bandwidth:

I bolded the relevant part. He then reports some of the research directions, including stacked RAM, which I mentioned above in one of my former posts. From the HMC consortium FAQ:

What is the problem that HMC solves?
Over time, memory bandwidth has become a bottleneck to system performance in high-performance computing, high-end servers, graphics, and (very soon) mid-level servers. Conventional memory technologies are not scaling with Moore's Law; therefore, they are not keeping pace with the increasing performance demands of the latest microprocessor roadmaps. Microprocessor enablers are doubling cores and threads-per-core to greatly increase performance and workload capabilities by distributing work sets into smaller blocks and distributing them among an increasing number of work elements, i.e. cores. Having multiple compute elements per processor requires an increasing amount of memory per element. This results in a greater need for both memory bandwidth and memory density to be tightly coupled to a processor to address these challenges. The term "memory wall" has been used to describe this dilemma.

Why is the current DRAM technology unable to fully solve this problem?
Current memory technology roadmaps do not provide sufficient performance to meet the CPU and GPU memory bandwidth requirements.

What are the measurable benefits of HMC?
HMC is a revolutionary innovation in DRAM memory architecture that sets a new standard for memory performance, power, reliability, and cost. This major technology leap breaks through the memory wall, unlocking previously unthinkable processing power and ushering in a new generation of computing.
+ Increased Bandwidth — A single HMC unit can provide more than 15X the bandwidth of a DDR3 module.
+ Reduced Latency – With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction.
+ Power Efficiency — The revolutionary architecture of HMC allows for greater power efficiency and energy savings, utilizing 70% less energy per bit than DDR3 DRAM technologies.
+ Smaller Physical Footprint — The stacked architecture uses nearly 90% less physical space than today’s RDIMMs.
+ Pliable to Multiple Platforms — Logic layer flexibility allows HMC to be tailored to multiple platforms and applications.

In fact their site correctly claims that this new memory technology is needed for exascale supercomputers:

Eventually, HMC will drive exascale CPU system performance growth for next generation HPC systems.

To throw some numbers into the discussion, the HMC 1.0 specification already considers sustained bandwidths on par with a modern L3 cache from Intel. It is as if your CPU had access to a 100x bigger L3 cache.
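The bandwidth-wall argument can be made concrete with a toy roofline model: attainable throughput is the minimum of the chip's peak FLOPS and (arithmetic intensity times memory bandwidth). The numbers below are illustrative assumptions, not vendor specs:

```python
# Toy roofline: a low-arithmetic-intensity kernel is capped by memory
# bandwidth long before it reaches the chip's compute peak.
def attainable_gflops(peak_gflops, bw_gbs, flops_per_byte):
    return min(peak_gflops, bw_gbs * flops_per_byte)

# Hypothetical 1 TFLOP APU, streaming kernel at 0.5 FLOP/byte:
ddr3 = attainable_gflops(1000, 34, 0.5)    # dual-channel DDR3-class BW: 17 GFLOPS
hmc  = attainable_gflops(1000, 160, 0.5)   # HMC-class BW: 80 GFLOPS
print(ddr3, hmc)
```

Even with HMC-class bandwidth the kernel stays bandwidth-bound, which is why every extra GB/s translates directly into performance at this end of the roofline.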

http://www.extremetech.com/computing/167368-hybrid-memory-cube-160gbsec-ram-starts-shipping-is-this-the-technology-that-finally-kills-ddr-ram

http://www.extremetech.com/computing/152465-microns-320gbsec-hybrid-memory-cube-comes-to-market-in-2013-threatens-to-finally-kill-ddr-sdram

As I mentioned above, AMD plans to offer HBM as an option for the K12 APU. I can also confirm that the ultra-high-performance design by Nvidia doesn't use any L3 cache, but an ordinary L1/L2 hierarchy and then stacked DRAM with a total throughput of about 1.6TB/s.
 

juanrga



Don't confuse the general concept with a particular implementation of it.
 

can you repost it, please? i follow this thread. i am absolutely certain that you have never mentioned how gddr5 is "beyond" the "slow" ddr memory in pcs. by that i mean that you have never provided any technical explanation on that matter. my query is sorta two-parter. part 1 - how is gddr5 beyond ddr3 i.e. what makes gddr5 so much more advanced than ddr3 in terms of technology and performance - assuming that's what you mean by "beyond". part 2 - how is ddr (i guess you mean ddr3) slower than gddr5?
 

truegenius


what ? i didn't got your question
:sweat: i didn't born in pondicherry, didn't studied in Britain thus a little bit slow in English :cheese:
so me want you question explain :whistle: ( cavemen english :p )


because they don't need to have swappable GPUs, thus a single-chip solution was better.


average joe doesn't use exascale. indeed, even extreme joe's extreme desktop cpu is not an exascale cpu
 

juanrga

As I mentioned before, AMD is dropping CMT in favour of SMT for its new architecture:

http://www.xbitlabs.com/news/cpu/display/20140510165441_AMD_to_Introduce_New_High_Performance_Micro_Architecture_in_2015_Report.html

Not a lot of details about the new micro-architecture are known at present. What is recognized for sure is that it will drop CMT in favour of some kind of SMT (something akin to Intel’s HyperThreading) technology to improve performance in both single-threaded and multi-threaded cases.

About the unfounded rumor that AMD would return to SOI: Keller admits that he is "happy" with the 14/16nm FinFET process and that it "looks pretty good". He also admits he is targeting frequencies close to 4GHz for the K12 core.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680


16-core Steamroller? Not according to your crystal ball!
 

8350rocks

Distinguished


Alright, I am not dignifying your lunacy with another response.

If you do not post a link to a verifiable source documenting exactly any claims you make from here on out, I am just ignoring you...

Additionally, not one of your claims about ARM have panned out, and AMD themselves have not stated anything confirming any of your claims.
 

8350rocks



HPC != HEDT

If that were the case, then you would see 256-node home PCs with coprocessors for FP ops, and games would be designed to run on 100+ cores. Graphics would look like Dreamworks-quality CGI from Hollywood movies, and your average power bill in the States would run about $300-400 monthly.

Additionally...what are your qualifications? How is it you feel so much more qualified to speculate (being generous with that word, as it is mostly garbage you spout) about future PC technologies? There are many here with MANY more years of experience in RELEVANT fields who disagree with you, not just me. So...feel free to speculate away; however, I have a feeling your magic 8 ball is going to break soon...
 

8350rocks



From your same article:

What is, perhaps, more important is that one of AMD's official documents detailed a sixteen-core AMD Bulldozer-derived processor. If the company pursues this opportunity and goes for a 16-core chip featuring Steamroller or Excavator cores, its new chips based on the new micro-architecture will only be available in 2016 or even 2017.

I know nothing...and I know no one at AMD, yep...that is certainly provable with that statement above.

According to your magic 8 ball, there would be no more HEDT dedicated dCPUs. But what do we find? PLANS FOR A NEW HEDT dCPU!?! Holy cow...what's next chicken little? The sky is falling?
 

8350rocks



Ironically, ALL the developers working on the PS4 and XB1 will tell you the decision to use ESRAM on the XB1 is its Achilles heel, actually a hindrance rather than a help. The ability to use all addressable memory for tasks as a coherent block on the PS4 is far more useful, and the ESRAM will only hinder XB1 performance moving forward (as we are currently seeing with lower FPS on cross-platform games versus what the PS4 can provide).
 

juanrga



My original words are quoted above in this same message. I mentioned how GDDR5 is beyond the "slow" DDR3. I didn't mention "why", but I can do it now that you ask. GDDR5 transfers data on both edges of a separate write clock that runs at twice the rate of the command clock, effectively doubling the data rate per pin relative to DDR3. This is why a single 3000MHz GDDR5 module provides double the peak throughput of a 3000MHz DDR3 module: 48GB/s vs 24GB/s, respectively.
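The 48 vs 24 GB/s arithmetic can be reproduced for a single 64-bit module; here "3000MHz" is taken as the DDR3 module's effective transfer rate, with GDDR5 moving twice as much data at the same nominal clock, matching the post's framing:

```python
# Per-module bandwidth: effective rate x rate multiplier x bytes per transfer.
def module_bw_gbs(effective_mts, rate_multiplier, bus_width_bits=64):
    return effective_mts * 1e6 * rate_multiplier * (bus_width_bits / 8) / 1e9

ddr3  = module_bw_gbs(3000, 1)  # 24 GB/s
gddr5 = module_bw_gbs(3000, 2)  # 48 GB/s, double the data rate per clock
print(ddr3, gddr5)
```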
 