Report: AMD Carrizo APUs To Get Stacked On-Die Memory

Status
Not open for further replies.

InvalidError

Titan
Moderator
Since many of Intel's roadmaps seemed to indicate Intel was planning to make their 128MB Crystalwell L4$/GDR standard across most of their lineup next year, I would have been more surprised if AMD did not announce something similar to avoid falling even further behind.
 

PEJUman

Distinguished
Apr 27, 2008
58
6
18,665
Most CPU benchmarks for Intel do not seem to scale with memory bandwidth (at least when compared to AMD APUs). I think AMD processors would benefit a lot more from on-package DRAM (à la Crystalwell); who knows, maybe this will allow them to finally catch up to Intel again. We really need AMD.

Intel has a tendency to coast when allowed. It was Athlon that drove them to the Core microarchitecture and to abandon NetBurst. Now we have the Sandy-Ivy-Haswell coast again...
 

InvalidError

Titan
Moderator

But their IGP does.

Most desktop applications require a balance between bandwidth, latency and processing power. Once you pass the typical bandwidth and latency requirements for typical workloads for a given architecture, benefits drop off sharply. Intel simply happens to be a few miles ahead of AMD at decoupling their CPUs from memory latency and bandwidth under most circumstances.

GPUs on the other hand are almost entirely dictated by bandwidth since almost every computational challenge GPUs face can be made easier and faster with more, faster memory to cache results and duplicate frequently accessed items across memory channels to accommodate more concurrent accesses.
 

knowom

Distinguished
Jan 28, 2006
782
0
18,990
AMD still needs to get its ducks in a row in terms of power efficiency, because it's miles behind Intel on a clock-for-clock basis. With the right hardware and know-how, Intel CPUs simply run at much better voltages for their clock-for-clock performance output.
 

Drejeck

Honorable
Sep 12, 2013
131
0
10,690
APUs are really interesting products and they are really evolving into something completely unseen. The Xbox One experiment and the X360 with on-board SRAM gave excellent results in performance/efficiency. Plus, they never managed to build a fast L3 cache, and the obvious workaround is a larger cache to avoid cache misses and get more bandwidth, even if the customer is not going to buy that expensive 2400 CL9 RAM kit.
 
Good news from AMD :)

The AMD + Radeon combo should have brought a really good product. We are waiting for a huge leap from the previous APU lineup.

Keep fighting back and bring good competition. It will benefit all of us customers.
 

razzb3d

Distinguished
Apr 2, 2009
163
0
18,690
I have a bad feeling about this. Sticking 64-256MB of on-die memory, at say half CPU speed, is pointless. It's too little for the GPU to use as a framebuffer and too slow for the CPU to use as L3 cache.

A good idea would be to slap 1GB to 2GB of GDDR5 in there over a 192-256 bit bus, so the GPU can use that as a framebuffer (think A10-8800K with built in R9 270) and the CPU part can use that for L3 cache. Another cool thing is that you should be able to start your PC with all DIMM slots empty since RAM is on-die.

Why a 192 to 256 bit bus? Because it needs to be as fast as possible. If it's over a 128 bit bus like system memory, the CPU will not be able to use it effectively as L3 cache. Why GDDR5? Because the L3 cache should be as fast as the CPU, so it will not slow it down. Think FX 8350: when you OC the northbridge to CPU speed (and with it the L3 cache) you notice a significant performance improvement in demanding tasks (especially FPU-related ones).
 
I find it funny that CPUs have sort of come full circle. CPUs around the time of the original Pentium had cache on the motherboard in addition to the RAM, and then they had cache on the Slot 1 and Slot A cartridges, which basically had RAM connected to the CPU at half speed on the same board.

They moved away from it in favor of faster smaller cache on the CPU itself, and now they are moving back to having RAM, probably running at half speed again, connected on the CPU but not on the CPU die itself to increase performance. Interesting how history is repeating itself.
 

InvalidError

Titan
Moderator

128MB may not sound like much but it is large enough to cache most of the most frequently used textures for the IGP if you tune settings accordingly. Crystalwell has about half the latency of system RAM and 4X as much bandwidth (50GB/s read + 50GB/s write, which is on par with 6GT/s 128-bit GDDR5 without GDDR's extra latency). That should also be fairly useful for multitasking: regularly accessed data evicted from the CPU caches no longer has to be fetched from system RAM again every time a context switch flushes it out.
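
To sanity-check the bandwidth comparison above, here is a minimal Python sketch of the usual peak-bandwidth arithmetic (transfer rate times bus width). The 6GT/s 128-bit GDDR5 and 50GB/s + 50GB/s figures come from the post; the dual-channel DDR3-1600 baseline for system RAM is an assumption added for illustration, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_gt_s * bus_width_bits / 8


# Figure from the post: 6 GT/s GDDR5 on a 128-bit bus.
gddr5_128 = peak_bandwidth_gb_s(6.0, 128)

# Assumption (not from the post): dual-channel DDR3-1600, i.e. 1.6 GT/s over 2 x 64 bits.
ddr3_dual = peak_bandwidth_gb_s(1.6, 128)

# Figure from the post: 50 GB/s read + 50 GB/s write.
crystalwell = 50 + 50

print(f"GDDR5 6 GT/s, 128-bit: {gddr5_128:.1f} GB/s")
print(f"DDR3-1600 dual channel: {ddr3_dual:.1f} GB/s")
print(f"Crystalwell vs DDR3: {crystalwell / ddr3_dual:.1f}x")
```

Under those assumptions the GDDR5 configuration lands at 96 GB/s, close to Crystalwell's combined 100 GB/s, and the ratio against dual-channel DDR3-1600 comes out near the 4X mentioned above.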

The frame buffer uses relatively little bandwidth compared to other more important things like the Z-buffer so it would be one of the last things that gets dumped in there when there is spare space.

Today's high-end GPUs may have a seemingly impressive 6GB RAM but what is most of that RAM used for? Resource duplication across multiple channels for read bandwidth multiplication.
 

alextheblue

Distinguished
A good idea would be to slap 1GB to 2GB of GDDR5 in there over a 192-256 bit bus, so the GPU can use that as a framebuffer (think A10-8800K with built in R9 270) and the CPU part can use that for L3 cache.
You want the CPU to use GDDR5 as L3 cache? No. Just no. Caches need to have very low latency. GDDR5 has horrendously HIGH latency. Worse than regular ol' DDR3.

What they're doing with stacked DRAM is basically the best of both worlds. It's faster than falling back to main memory (bandwidth AND latency) so it can act as an L3 cache if necessary, but it will mainly benefit HSA and graphics. Invalid's post covers the graphics side of the equation pretty well.
 

WINTERLORD

Distinguished
Sep 20, 2008
1,775
15
19,815
I wonder what they mean by much higher performance. Is married law coming about, or will it be the same 15-30% increase every year like usual?
Update: I meant murfies law*
 

oj88

Honorable
May 23, 2012
91
0
10,630
AMD needs to make some ~47W TDP mobile APUs, not just up to 35W.

The just-released 35W FX-7600P Kaveri is pretty good for a gaming laptop. Does anyone know which laptop manufacturer is making a 17.3" model with it?
 

falchard

Distinguished
Jun 13, 2008
2,360
0
19,790
Nice to see AMD utilizing tech they developed for Microsoft. 128MB is not much, but it's big enough to hold a 4K-resolution buffer. Obviously usage depends on the game. If a game is about the visuals and needs 8 GB GDDR5 with 2048 Stream Processors at 1 GHz each, that's a different beast to tackle. But if it's a strategy game that makes a lot of AI calls, or if the memory is used to process physics, it could be quite something.
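
For a rough idea of how much of 128MB a 4K buffer would take, here is a small Python sketch. It assumes a 3840x2160 target at 4 bytes per pixel (32-bit color), neither of which is stated in the post, and the helper name is made up for illustration.

```python
def framebuffer_mib(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one uncompressed buffer in MiB."""
    return width * height * bytes_per_pixel / (1024 * 1024)


# Assumption: "4K" means 3840x2160 with 32-bit color.
uhd_buffer = framebuffer_mib(3840, 2160)

# How many whole buffers of that size fit in 128 MiB of on-package memory.
buffers_in_128 = int(128 // uhd_buffer)

print(f"One 4K buffer: {uhd_buffer:.1f} MiB")
print(f"Whole buffers that fit in 128 MiB: {buffers_in_128}")
```

Under those assumptions a single buffer is about 31.6 MiB, so four such buffers would fit in 128 MiB with a little room to spare.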
 

Drejeck

Honorable
Sep 12, 2013
131
0
10,690
Dunno why I can't quote Razzb3d. Anyway, GDDR5 latency is typically CAS 15, DDR3 is typically CAS 9. In terms of real-time access they are quite similar, but the higher bandwidth favors larger chunks of data, which a CPU doesn't require. AMD has never built an L3 cache fast enough, but giving the caching algorithm more memory means fewer cache misses. I'd prefer static RAM for its higher performance. 32/64 MB would be sufficient with an efficient caching algorithm, but it costs more and probably has higher power consumption. GPUs are really latency tolerant. The bus width (the bits) is not important for CPUs. Maybe you can't even imagine how fast an Intel L3 cache is, or an L1. Those caches are for repetitive tasks, not for storing data, frame buffering or system memory (even if they are getting closer to a SoC).
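
The point about CAS numbers versus real-time latency can be sketched in Python: latency in nanoseconds is the CAS cycle count divided by the command clock. The CAS values are from the post; the clock speeds (800 MHz for DDR3-1600, 1500 MHz for 6 Gbps GDDR5) are assumptions for illustration, and CAS is only one component of total access latency.

```python
def cas_latency_ns(cas_cycles: int, command_clock_mhz: float) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    return cas_cycles / command_clock_mhz * 1000


# CAS 9 is from the post; the 800 MHz clock (DDR3-1600) is an assumption.
ddr3_ns = cas_latency_ns(9, 800)

# CAS 15 is from the post; the 1500 MHz command clock (6 Gbps GDDR5) is an assumption.
gddr5_ns = cas_latency_ns(15, 1500)

print(f"DDR3 CAS 9 at 800 MHz: {ddr3_ns:.2f} ns")
print(f"GDDR5 CAS 15 at 1500 MHz: {gddr5_ns:.2f} ns")
```

With these assumed clocks the two come out within a couple of nanoseconds of each other, which is the sense in which the cycle counts look very different while the real-time figures are similar.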
 
APUs are really interesting products and they are really evolving into something completely unseen. The Xbox One experiment and the X360 with on-board SRAM gave excellent results in performance/efficiency. Plus, they never managed to build a fast L3 cache, and the obvious workaround is a larger cache to avoid cache misses and get more bandwidth, even if the customer is not going to buy that expensive 2400 CL9 RAM kit.

I wouldn't say unseen considering the concept of a shared CPU and GPU with common memory was the goal all along with a very good idea of what the benefits are. Maybe unseen to the average person, not those in the computer industry for 20+ years.

Also, sticking a fast buffer next to a processor isn't always the best way to solve the issue of main memory being too slow. The small, fast buffer will simply fill up, causing a bottleneck which in terms of gaming would mean massive stutter. Game developers aren't too happy with the Xbox One because of this, and they have to spend expensive man-hours trying to avoid it, unlike with the PS4.
 