Memory Scaling, AMD's Trinity-Based APUs, And Game Performance

mohit9206 · Feb 6, 2013

blazorthon :

http://www.xbitlabs.com/articles/graphics/display/amd-trinity-graphics_6.html#sect3
Radeon HD 7660D performs worse than the Radeon HD 6570.
Below is another review from Rage3D.com which basically compares the A10-5800k (Radeon HD 7660D) vs. the A10-5800k DG (Radeon HD 7660D + Radeon HD 6670 dual graphics), Core i3 + Radeon HD 6670 and lastly Core i7 + Radeon HD 6670. There are other configurations though.

http://www.rage3d.com/reviews/fusi [...] x.php?p=11

As can be seen, the A10-5800k performs worse than the A10-5800k DG, Core i3 + Radeon HD 6670 and Core i7 + Radeon HD 6670. The exception is Civilization V, where all four provide similar performance. However, the A10-5800k DG does perform well.

If you continue to insist that the integrated Radeon HD 7660D is as good as a Radeon HD 6670 w/ DDR3 RAM (probably about 10% slower than w/ GDDR5 RAM), then you should back it up with some benchmarks.

bak0n · Feb 6, 2013

[citation][nom]richardginn[/nom]How about GDDR5 RAM and Trinity? I was thinking you could create a hybrid motherboard that would have 2 or 4 DDR slots and 1 GDDR slot in it. You can use the DDR RAM if the GDDR slot is empty, but if you wanted even more performance, put in one stick of GDDR RAM. I do not know how much the GDDR RAM would cost though.[/citation]
WAY back in the day they actually had RAM for video that you could "upgrade" on a motherboard. But that was around when color monitors were coming out...

blazorthon · Feb 6, 2013

[citation][nom]natoco[/nom]Where are the prices of the memory you used for the benchmarks? Most bottom-of-the-range discrete GPUs use ddr5, let alone ddr4. What a waste of money. Really, how cheap do you want to go when you're going to use the thing for years? A lot of people spend more than the cost of a GPU at the supermarket per week, and that does not last years, so why be such Scrooge McDucks? When food costs more than a GPU and you whinge about cost, you need a reality check.[/citation]

No, video cards use GDDR5. There is no such thing as DDR5 and DDR4 is not out yet either. GDDR5 is based on DDR2 and DDR3. Crash already explained the fallacy in your complaint about the price discussions, so I'll leave that as it is.

Also, IDK about other people, but even for my large family, I don't spend more than the cost of a high-end graphics card at the supermarket each week. Far more in bills and such, yes, but not in the supermarket. Also, I like to have money held over in case of emergencies. For example, I helped some family in New York during the power outage after the *super storm* and that cost about two thousand dollars. I wouldn't want to have dealt with those issues if I couldn't afford it because I had spent that money on a computer instead of power generators, food/water, and such, especially if that came at the cost of necessities for my immediate family.

Also, many (perhaps most) of the bottom-end discrete cards use DDR3 (usually on a 128-bit interface or, less commonly these days, a 64-bit interface), not GDDR5 like the mid-range and high-end cards.

blazorthon · Feb 6, 2013

[citation][nom]InvalidError[/nom]You do realize that FBDIMMs are actually SERIAL using TMDS signaling, right? This is exactly what I said... serial interfaces have a much easier time getting through sockets and slots at very high speeds because every bit can be locked on individually, with no need to worry about clock skew between bits the way wide high-speed parallel interfaces do.

DDR uses strobe signals to break down the 64-bit DIMM bus into separately "clocked" 8-bit groups, but even this starts getting touchy beyond 2GT/s: if you have a 100ps setup and hold time on the bus, you have less than 300ps of wiggle room left for everything else.

But the overall latency x period yields roughly the same effective latency in terms of nanoseconds, and gimping your clock rates achieves nothing more than sacrificing bandwidth. In benchmarks, higher bandwidth with lower effective latency practically always wins even if latency is numerically higher.

Most DDR2-800 (and even DDR2-1066) RAM I can see on Newegg is 5-5-5, not 6 or 7, while DDR3-1066 is overwhelmingly 7-7-7.

Let's crunch some numbers...
- DDR2-800-5 = 5 / 800 = 6.25ns effective latency
- DDR3-1066-7 = 7 / 1066 = 6.5ns
- DDR3-1600-9 = 9 / 1600 = 5.625ns
- DDR4-2133-14 = 14 / 2133 = 6.5ns (based on photos of pre-launch packaging)

The ~50% cycles latency bump at the transition boundary between technologies usually favors the older stuff in benchmarks. DDR3-1600 may have a higher latency cycle count than DDR2 or lower-speed DDR3, but it still has lower effective latency even without having to pay for premium low-latency bins at higher frequency and latency cycle count.

If the main objective is feeding an IGP, it makes no sense to sacrifice bandwidth for lower latency, and in most CPU benchmarks it makes little to no sense either when the clock rises high enough to offset latency cycle bumps. For mainstream RAM, whatever the current definition of mainstream may be, we have been around the 6ns mark for most of the past 10 years.

As for "chip count not being a problem", interface width and JEDEC are.

Chips with wider data busses draw more current from their IOB power plane and need their internal architecture to be that much wider and likely slower. This means wider chips run significantly hotter (2/4/8X as much stuff happening inside, may actually require a heat spreader) and also means it becomes more difficult to adequately bypass/filter the power supply to support higher speeds.

The other thing is that JEDEC defined the DIMM interface as having one strobe signal per 8-bit data group, so having one 8-bit chip per strobe signal (or a pair for double-sided DIMMs) is practically dictated by the standard itself - try finding a 1GB DDR3 DIMM that isn't an 8x128MB configuration even though today's 1GB ICs would make a single-chip DIMM theoretically possible.

Many things sound nice in theory but hit brick walls in practice.[/citation]
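For anyone who wants to redo the latency arithmetic in the quote above, here is a minimal sketch. It assumes the usual convention that CAS latency is counted in I/O clock cycles and that DDR-type memory makes two transfers per clock, so the clock is half the advertised transfer rate. Under that assumption the absolute figures come out at twice the quoted ones, while the ordering between modules stays the same. The function name `cas_latency_ns` is made up for illustration.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds for DDR-type memory.

    Assumes CAS latency is specified in I/O clock cycles and that the
    I/O clock runs at half the advertised transfer rate (double data rate).
    """
    io_clock_mhz = transfer_rate_mts / 2
    # cycles / MHz gives microseconds, so scale by 1000 for nanoseconds
    return cas_cycles / io_clock_mhz * 1000


# The four modules listed in the quoted post
modules = [
    ("DDR2-800 CL5", 800, 5),
    ("DDR3-1066 CL7", 1066, 7),
    ("DDR3-1600 CL9", 1600, 9),
    ("DDR4-2133 CL14", 2133, 14),
]

for name, rate, cl in modules:
    print(f"{name}: {cas_latency_ns(rate, cl):.2f} ns")
```

Either way of counting leads to the same comparison: DDR3-1600 CL9 comes out ahead of DDR2-800 CL5, which in turn is slightly ahead of DDR3-1066 CL7.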

Yes, I realize that FB-DIMMs are serial. My point was that there are solutions.

Again, the DDR3-1066 modules are all either older modules or poorly binned modules. My point was that yes, at first, new DDR memories tend to have higher latencies, but that gets fixed later on. Furthermore, I repeat what I've already said multiple times: DDR4 is already supposed to have lower latency than DDR3 even when it comes out for us consumers and even if that fails to be done, it'll almost certainly be done by the time it gets into APU systems. However, I do admit my mistake in forgetting to put the 5 in the DDR2 numbers. I'll edit that into my previous post; thanks for pointing that out. It still doesn't change what I said, but it was a mistake of mine nonetheless.

You completely ignored the whole point of what I said about dropping frequency to tighten timings for an apples to apples comparison and took it to an irrational conclusion that had nothing to do with what I was saying. Like I said, that was about an apples to apples comparison. I never said that it would be practical for anything more than that.

I know how timings and frequency relate to latency. I've lectured people about it before (several times) and even Crashman pointed out that he knew that I knew this.

The chips on modern memory modules are all 32 bit wide chips regardless of how many there are on each module. Talking about differing widths changing things when they are all the same width doesn't make any sense. Differing chip counts on modules are there for having differing capacity modules and such reasons. Furthermore, if you care to look at modules historically that had lower chip counts such as eight or four, they were always capable of having higher performance than the full sixteen chip modules. In fact, some early DDR3 motherboards required high frequency modules to be modules that only had eight chips specifically because that was easier on the memory controller.

Having a single-chip module is not theoretically possible because it takes two chips for the 64-bit connection. Heck, there might be reasons why there need to be four chips, seeing as I've never known of any two-chip memory modules. Also, I've had four- and eight-chip modules of DDR2 and DDR3. Some of them had heat spreaders, some of them didn't. However, none of them needed one except when overclocked significantly beyond even their rated specs.

Many things may hit brick walls in practice, but we tend to overcome such obstacles. We adapt and overcome. That is how technology advances. For example, like I suggested earlier, we could take the concept of multiple serial lanes instead of true parallel connections such as is used in PCIe. With sixteen lanes, a PCIe 3.0 x16 slot has a theoretical bandwidth of nearly 16GB/s both ways and its practical bandwidth seems capable of getting close to that from what I've seen by looking at PCIe 3.0 x16 benchmarks. Like you suggested earlier, we can use eDRAM caches, although I suspect that 1GB with current fabrication processes is too optimistic because although much denser than the densest SRAM, eDRAM is still usually something like several times less dense than DRAM in memory modules according to what I've read about it.
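The "nearly 16GB/s" figure for PCIe 3.0 x16 can be checked with a quick sketch. It assumes the published PCIe 3.0 parameters of 8 GT/s per lane and 128b/130b line encoding, and it ignores packet and protocol overhead, so real transfers land somewhat lower. The function name `pcie3_bandwidth_gbs` is hypothetical.

```python
def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s.

    Assumes 8 GT/s per lane and 128b/130b encoding; protocol overhead
    (packet headers, flow control) is not modelled.
    """
    transfer_rate_gts = 8.0          # raw gigatransfers per second per lane
    encoding_efficiency = 128 / 130  # payload bits per bit on the wire
    bits_per_byte = 8
    return transfer_rate_gts * encoding_efficiency * lanes / bits_per_byte


for lanes in (1, 4, 8, 16):
    print(f"x{lanes}: {pcie3_bandwidth_gbs(lanes):.2f} GB/s per direction")
```

For sixteen lanes this works out to roughly 15.75 GB/s in each direction, which matches the "nearly 16GB/s both ways" above.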

Again, you speak of mainstream RAM. Again, I reply that the memory we were specifically talking about at the start of this, DDR3-2400, is not mainstream, so mainstream memory is not the point of this discussion.

palladin9479 · Feb 7, 2013

mohit9206 :

The Rage3D review won't come up, but the first one has the 7660D with DDR3-1866; this article demonstrated the benefits of DDR3-2133, which is now the same price. Also, the card in the first review isn't using DDR3 but 1GB of 128-bit GDDR5 memory. The primary limitation of APUs is the memory interface to the system memory, so dGPUs that have faster memory will perform better than the 7660D.

The first article didn't really tell you that the 6570 they were testing comes in two flavors, at least not until after the benchmarks. One runs DDR3 and represents the vast majority of available cards; the other is much rarer and runs GDDR5. There is a very large difference between those two versions, so large that the GDDR5 version is rarely sold in favor of the more expensive models.

Blaz's assertion is correct: the 5800K with DDR3-2133 memory will outperform all the budget dGPUs on the market and equal many of the low-end mid-grade cards (anything without high-speed GDDR5 memory). It's only when you get to the upper mid-range / high end that the APU falls flat, and at that point you should be going for a strong CPU + dGPU combo anyway.
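To put rough numbers on the memory interface point, the sketch below computes theoretical peak bandwidth as transfer rate times bus width. The DDR3 speeds are the ones mentioned in this thread on a 128-bit dual-channel interface. The 4000 MT/s GDDR5 rate is an assumed example value for a 128-bit card, not a figure from either review, and the APU's GPU also has to share its bandwidth with the CPU, which the sketch does not model.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a memory interface."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s


configs = [
    ("APU, dual-channel DDR3-1866", 1866, 128),
    ("APU, dual-channel DDR3-2133", 2133, 128),
    ("APU, dual-channel DDR3-2400", 2400, 128),
    # Assumed example data rate for a 128-bit GDDR5 card
    ("Card, 128-bit GDDR5 (assumed 4000 MT/s)", 4000, 128),
]

for label, rate, width in configs:
    print(f"{label}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

Under these assumptions the APU gets about 29.9 GB/s with DDR3-1866 and about 34.1 GB/s with DDR3-2133, against 64 GB/s for the assumed GDDR5 card, which is in line with faster system memory helping the APU while GDDR5 cards stay ahead.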

Jarmo · Feb 14, 2013

So... how about 4-6 GB of fast memory vs 8-16 GB of slower but cheaper?
Which gives better results?

Because I don't know.

palladin9479 · Feb 15, 2013

Jarmo :

You must first ask the question "What are you doing?".

Assuming you're playing video games, you want two channels of the fastest memory you can get. APUs are incredibly sensitive to memory performance; actually, ALL GPUs are sensitive to memory performance. The only time you'd need large quantities of slower memory is if you're running some large applications that run primarily on the CPU.

Wisecracker · Feb 15, 2013

palladin9479 :

Wellll ... :lol: ... Sorta/Kinda

AMD is generally at the mid-point of the Fusion arch-design that began with Llano and the Unified North Bridge (UNB). The UNB is the arbitrator of memory traffic from the GPU via onion and garlic. Mmmmm, that's tasty!

The 128-bit (2x64-bit) interface should not be that big a limitation when used efficiently, and AMD is transitioning the arch to do so. The *interface* can become saturated because there is so much thrashing, flushing, caching, hitting, missing, writing, re-writing, etc., taking place.

As AMD advances HSA efficiency and bandwidth utilization improves; and when you roll in DDR4 with *unified* direct memory addressing, further improvements to the UNB/memory controller, IOMMU (could open a whole new world in dual-graphics and the 'compute' ceiling), GCN SIMD Engine Array, etc., things could come together quite nicely for AMD. Decent coding and even page-faults for the graphic core local memory should increase efficiency, too.

All these slides are from the Developer Summit 2 years ago (PDF), so as AMD has been seemingly making slow, incremental improvements, the Fusion HSA has really started to come together. Hope they can pull it off.

palladin9479 · Feb 15, 2013

You completely misunderstood what I said. An APU is not just a CPU, it's also a ... GPU. The primary limiter to the GPU's performance is the shared system memory interface, hence the big performance bumps from going to higher memory bandwidth.

Wisecracker · Feb 15, 2013

palladin9479 :

That's what you said. It may not have been what you meant.

If the *interface* was the primary limitation, faster RAMs and timings would not make any difference.

Right?

(no worries, mate)

palladin9479 · Feb 16, 2013

Wisecracker :

Context is key

Assuming you're playing video games, you want two channels of the fastest memory you can get. APUs are incredibly sensitive to memory performance; actually, ALL GPUs are sensitive to memory performance. The only time you'd need large quantities of slower memory is if you're running some large applications that run primarily on the CPU.

Murray B · Mar 3, 2013

When Intel integrated the FPU into the CPU many complained and claimed it was far better to use a discrete solution. Integrated was better in most ways and today floating point is included in most CPUs. Now many are complaining that graphics processors are being integrated and many are claiming the discrete solution is better. It's deja vu.

Crashman · Mar 3, 2013

[citation][nom]Murray B[/nom]When Intel integrated the FPU into the CPU many complained and claimed it was far better to use a discrete solution. Integrated was better in most ways and today floating point is included in most CPUs. Now many are complaining that graphics processors are being integrated and many are claiming the discrete solution is better. It's deja vu.[/citation]
Nope. Nobody ever developed a "super FPU" for the discrete market. High-end GPUs will always be designated for high-end purposes; it wouldn't make financial sense for a company to add a billion GPU transistors to its CPU.

Murray B · Mar 3, 2013

Sorry not to be clear. When Intel integrated the FPU into the 486 there was an uproar over integration especially since the Cyrix part was about twice as fast as the competition for the same money. The integration was unavoidable then and will likely be just as unavoidable now. It won't be long before every CPU will contain many GPU cores. Discretes are now obsolescent.

Crashman · Mar 3, 2013

[citation][nom]Murray B[/nom]Sorry not to be clear. When Intel integrated the FPU into the 486 there was an uproar over integration especially since the Cyrix part was about twice as fast as the competition for the same money. The integration was unavoidable then and will likely be just as unavoidable now. It won't be long before every CPU will contain many GPU cores. Discretes are now obsolescent.[/citation]
But you're saying the same thing you said before, except that you're mentioning another company that also integrated an FPU (Cyrix). Cyrix also made the Media GX...which gets back to my point. Which you should read, above, before repeating yourself again.

analytic1 · Mar 13, 2013

What do you think the benefits would be if APUs came with 10MB (or any amount that would fit on chip) of eSRAM embedded on them?

blazorthon · Mar 13, 2013

analytic1 :

I think that you mean eDRAM and I'd say that the potential benefits could be huge, or at least greatly diminish the reliance on high memory bandwidth to get better performance.

billcat479 · Apr 1, 2013

I still question this testing. Don't get me wrong, it's important for people thinking of going that route, but I think you have to separate out the CPU issues, as these are far from top-of-the-line CPUs even for AMD. I would think that testing memory for anything other than the basic FPS improvement is useless if the CPU has a hand in the micro-stutter.
It would help if each could be tested in isolation, with comparable CPU/GPU clock speeds, to find out whether one or both add to this problem.
What good is putting DDR10,000 in it when the CPU can still throw a monkey wrench into the mix?
I think these are a good step for AMD, and I really hope they work harder on the CPU side and get better shared work out of a combined APU, but they are still low-budget systems and are not really up to playing the newest games perfectly. I really hope they can do it though; we need AMD around to keep computers affordable.

blazorthon · Apr 1, 2013

billcat479 :

I don't think that the CPU is a big problem, though you may be right, and I certainly wouldn't mind an experiment to test that. Trinity's high-frequency quad-core CPUs are pretty good. They're not i5s or such, but they often keep pace with the i3s just fine.
