AMD CPU speculation... and expert conjecture


ColinAP

Honorable
Jan 7, 2014
18
0
10,510


How times have changed. </deadpan>
 


Edison was brilliant at taking the research others had done, turning it into a product, patenting it to death, then selling it. As an inventor, he was so-so at best.

Things are actually getting interesting now. Universities are starting to patent the underlying tech they discover, which is going to break current product models over the long haul. I can't stress enough how big a deal this is.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Sounds like a show I just watched on Leonardo da Vinci. Many of the works he has been credited with, the tank, the helicopter, various flying devices, were all conceived 50-100 years before him. He just improved the designs and had better drawings. Which is great in its own right, but it shows how people tend to idolize certain individuals regardless of their true achievements.
 

this launch seems to be region specific.
that, and i don't see mullins making much headway in the face of the impending bay trail flood (eagerly subsidized by intel). it'd be a good chip, but amd doesn't have enough money to pay the oems.
i have lowered my cpu expectations (more than usual) from both red and blue this year
http://www.fudzilla.com/home/item/34107-semiconductor-sales-rebounded-in-january

edit: i blame the tn panel for color-blindness!
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


To me it sounds like AMD is having trouble getting OEMs to foot the bill for the APUs, so they're going back to tried-and-true sockets, even though the BGA product should be cheaper. Things like the KBN-I/5200 have remained out of stock. I was going to buy one a while ago, but they took about 6 months too long to show up.

http://www.newegg.com/Product/Product.aspx?Item=N82E16813135363

I was expecting these to be much cheaper than they currently are. The chips are tiny compared to Kaveri. If the rumored price reduction for Win8 for small systems is true then this segment could take off again.

Edit: They really should have predicted this push-back from the OEMs and offered this socket much earlier in the life-cycle.
 

con635

Honorable
Oct 3, 2013
644
0
11,010

Thanks, good read. It fits in with the AMD roadmap: Kaveri = greater parallelism, Carrizo = more performance. I think the next APU will get upgrades on the CPU side rather than the GPU. Hope so anyway; I was all set to buy Kaveri this month, but I think I'll just add a dGPU to my Trinity and wait a little longer.

 
http://www.extremetech.com/computing/177099-secrets-of-steamroller-digging-deep-into-amds-next-gen-core

Probably the most important tidbit:

Improved branch prediction, however, doesn’t seem to deliver the performance gains we would’ve liked to see. Instruction fetch is the next area to consider — though we can’t benchmark it directly. Agner’s tests, however, may shed some light on the problem. According to his work, the fetch units on Bulldozer, Piledriver, and Steamroller, despite being theoretically capable of handling up to 32 bytes (16 bytes per core) tops out in real-world tests at 21 bytes per clock. This implies that doubling the decode units couldn’t help much — not if the problem is farther up the line. Steamroller does implement some features, like a very small loop buffer, that help take pressure off the decode stages by storing very small previously decoded loops (up to 40 micro-instructions), but the fact that doubling up on decoder stages only modestly improved overall performance implies that significant bottlenecks still exist.

And the conclusion:

At the same time, however, it’s also clear that dual decoders wasn’t the fix that many AMD enthusiasts were hoping it would be. L1 cache contention remains problematic, as does the low set associativity. Integer throughput is poor partly because only two of Steamroller’s four integer pipelines are practically useful for most work. The long pipeline ensures that branch prediction misses will always hit the chip hard. The chip’s L2 latency remains much higher than its Intel counterpart and its memory controller is much slower.

The question of whether next year’s Carrizo can “fix” the Bulldozer architecture depends entirely on which design attributes are holding the core back. The only thing we know for certain about the core at this point is that Excavator includes support for AVX2. If Steamroller’s low performance is primarily caused by the shared fetch unit, than decoupling that system and adding 256-bit registers for AVX2 could significantly improve the core’s integer performance. If, on the other hand, the chip’s low performance is directly related to its long pipeline and high cache contention in the L1, it’s going to be much harder to solve.

Not holding my breath; I've made my views on the entire arch perfectly clear these past four years.
 


I said a long time ago that cache contention would be the biggest problem for the BD uArch. It's only made worse by the fact that AMD is significantly behind Intel in cache, prefetch, and branch prediction technology. They will never be able to make up that difference without an insane amount of cash for R&D; it just ain't gonna happen. They know this, so instead they've gone in a different direction, designing the APU and building an easily modifiable modular uArch.
 

8350rocks

Distinguished
@juanrga:

Your error lies in assuming you can get an APU to equal the FLOPS of a dGPU.

Because of die size constraints, this will only happen when superconductors can feasibly be used in APUs.

As it stands, any time you say an APU can do X FLOPS, I will be able to point to a dGPU from that same moment that outperforms it by roughly 5-fold or more. Seeing as GPUs are increasing performance per generation an order of magnitude faster than CPUs at this point, APUs will NEVER catch dGPUs until the pre-conditions I laid out above are met, and even then it will likely be a LONG time before that day comes.

EDIT: When you have an orders-of-magnitude performance disparity, latency becomes less of an issue, as the sheer computing power overcomes the minute latency differences.
 

ColinAP

Honorable
Jan 7, 2014
18
0
10,510


Nonsense. You'll get even more patents filed in an already overstretched system that is unable to properly test their validity. More of the same, in other words.
 


That's my take as well. Until you can get dGPU power in an APU at decent yields, power draw, and price, dGPUs aren't going anywhere.

EDIT: When you have an orders-of-magnitude performance disparity, latency becomes less of an issue, as the sheer computing power overcomes the minute latency differences.

Not necessarily. Simple example: if I have a minimum 16ms delay transferring data to the GPU for it to do its work, then no matter how much data I can pass along the bus, the GPU will ALWAYS miss that first frame.

The delay is just that: a delay. And during that delay, you can do NOTHING. Pumping more data across the bus simply means fewer individual transfers, but the underlying delay itself can't go away. That's why I dislike the trend toward higher-bandwidth, higher-latency RAM, since the higher latency is more likely to reduce CPU performance than the higher bandwidth is to increase it.*

*I'm ignoring the case of an iGPU, which obviously needs the bandwidth.
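To put some rough, made-up numbers on it, here's a quick C sketch: widening the bus shrinks the copy time, but the total can never drop below the fixed latency floor, which in this hypothetical is already about a whole 60 FPS frame.

#include <stdio.h>

int main(void)
{
    const double bus_latency_ms  = 16.0;           /* hypothetical fixed delay before the GPU sees any data */
    const double frame_budget_ms = 1000.0 / 60.0;  /* ~16.67 ms per frame at 60 FPS */
    const double payload_mb      = 64.0;           /* made-up amount of data to move */

    /* Keep doubling the bus bandwidth: the copy time shrinks toward zero,
     * but the total can never drop below the fixed 16 ms latency floor. */
    for (double bw_gbps = 4.0; bw_gbps <= 64.0; bw_gbps *= 2.0) {
        double copy_ms = payload_mb / (bw_gbps * 1000.0 / 8.0) * 1000.0;
        double total   = bus_latency_ms + copy_ms;
        printf("%5.1f Gb/s: copy %6.2f ms, total %6.2f ms (frame budget %.2f ms)\n",
               bw_gbps, copy_ms, total, frame_budget_ms);
    }
    return 0;
}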
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780

Not necessarily. Simple example: if I have a minimum 16ms delay transferring data to the GPU for it to do its work, then no matter how much data I can pass along the bus, the GPU will ALWAYS miss that first frame.

The delay is just that: a delay. And during that delay, you can do NOTHING. Pumping more data across the bus simply means fewer individual transfers, but the underlying delay itself can't go away. That's why I dislike the trend toward higher-bandwidth, higher-latency RAM, since the higher latency is more likely to reduce CPU performance than the higher bandwidth is to increase it.*

*I'm ignoring the case of an iGPU, which obviously needs the bandwidth.

I'm not so sure you fully comprehend how memory works.

Latency is measured in clock cycles. If you have higher latency but a higher frequency, it can mean the same or even less actual time between transferring data.

Consider DDR1 at 400MHz with CAS 3 timing. Now consider DDR2 at 800MHz with CAS 6 timing. The absolute latency is the same.

This is why GDDR has awful latency in cycles but it doesn't matter: the effective frequency is so much higher that it makes up for the fact that it takes more clock cycles to transfer data.
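Quick back-of-the-envelope in C, since absolute latency is just cycles divided by clock rate. (Strictly speaking, CAS is counted against the bus clock, which is half the effective data rate, but the ratio, and therefore the conclusion, comes out the same either way.)

#include <stdio.h>

/* Convert a CAS latency in clock cycles at a given clock rate into nanoseconds. */
static double cas_ns(double cas_cycles, double clock_mhz)
{
    return cas_cycles / clock_mhz * 1000.0;  /* cycles / MHz -> microseconds, *1000 -> ns */
}

int main(void)
{
    printf("DDR1-400, CAS 3: %.1f ns\n", cas_ns(3.0, 400.0)); /* 7.5 ns */
    printf("DDR2-800, CAS 6: %.1f ns\n", cas_ns(6.0, 800.0)); /* 7.5 ns -- identical */
    return 0;
}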
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680
I think AMD may see R&D money coming; they are selling the hell out of GPUs. The damn things can't stay on the shelf longer than a couple of days before the miners suck them up.
The latest article I read on AMD's change-ups said they employed literally the best CPU architect in the industry; I think the changes in Steamroller were the quick fixes they could make at the moment.
That same article said the front end was being worked on, so Excavator may see some decent improvements if the article is true.

I think we will see something with a big iGPU on-chip soon; yeah, it's gonna be one hell of a VRM.
No one said they couldn't produce a G32 socket with GPU and CPU in the same socket.
I am still wondering if Rambus will ever make a return. It was fast, yes, but it ran hotter than hell. The improvements made since then should make it viable for use by now.
 

the cache redesign/improvement - is it really that expensive? will it get in the way of hsa performance?

jim keller or some other guys.... i remember that his/their return to amd was hyped as some kind of biblical mega event that would instantly change the face of amd cpus/apus as we know them.... maybe he/they will. i remember he was working for apple, which develops arm cpu cores.
almost the same way resonant clock mesh technology was hyped as the ultimate power-efficiency booster....

i am a bit lost atm. neither intel nor amd seems to have anything interesting in their high-end mainstream lineup. i've been eagerly waiting for something on puma cores.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
If Beema is supposedly 2x the performance/watt of Kabini, does that mean AMD has an update for GCN coming this year as well? I'm assuming most of the gains for Beema are on the GPU side.

They need an answer for Maxwell before the larger die versions of that hit the market.
 

jdwii

Splendid


I hope they have him working on the next CPU design without any stupid cache sharing or other dumb decisions that even the writers at ExtremeTech can point out. Since AMD is basically done with anything beyond 4 x86 cores according to their roadmaps, they need to bring back strong single-core performance: have a CPU that can process serial work as fast as possible and then send parallel code to the iGPU when possible.
Something tells me this obvious kind of approach isn't happening, and they're probably finding new ways to make a processor with 1/10 the performance of an i3 use 1/10 the power so people can play Angry Birds on their Android with x86.
 
Boy, [H] just posted something in their BF4 Mantle review that will spark a firestorm:

http://www.hardocp.com/article/2014/03/04/bf4_amd_mantle_video_card_performance_review_part_1/1#.UxeOHEFRTHU

In each case, the use of Mantle did not increase the best playable settings for each card within the multiplayer environment. The more interesting thing to note is that the DirectX 11 performance of the Catalyst 14.1 and 14.2 Beta drivers appears to be lower than the performance offered by the game's launch drivers from last year. When we look at today's Mantle performance in comparison to our launch day performance, there is little to no benefit from a gameplay experience perspective to enable Mantle at this time.

There's that cynic in me that wonders whether AMD intentionally tanked their D3D driver, even if just a little...
 
the cache redesign/improvement - is it really that expensive? will it get in the way of hsa performance?

Thing to know about cache design is that it's like voodoo magic. A dumb cache is fairly simple: you set it up to read a certain number of bytes ahead of every read request and store that information in cache. Most programs access memory in short sequences, so you can generally bet that if a program accesses byte A, it will also ask for A+1, A+2, A+3, etc. That is what the older designs were like, and there are some severe limitations whenever your program does a set of unexpected jumps. A smart cache attempts to predict what your code is about to do: instead of just reading ahead of every read request, it does a bit of code analysis and heuristics to figure out whether path A or path B will be taken, then reads ahead along that path to decide which memory to load into cache. The algorithm that does the prediction and heuristics is a closely held industrial secret. Intel's is by far the most advanced, and they use it as a cornerstone of their design.

To put it shortly, if AMD were to implement everything identically to the Intel design but using AMD's own cache logic, it wouldn't necessarily be any faster than the current BD uArch. As much as people demonize BD without understanding what AMD was trying to do, it's a fairly interesting design with lots of flexibility.
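Purely as an illustration (this is nobody's actual prefetch algorithm, and it's a far simpler stand-in than the path-prediction heuristics described above), the difference between dumb read-ahead and something slightly smarter looks roughly like this in C: a next-line prefetcher versus a toy stride detector.

#include <stdio.h>

/* Next-line prefetcher: after fetching line A, also fetch A+1. */
static int next_line_hit(long prev, long cur)
{
    return cur == prev + 1;
}

/* Stride prefetcher: remember the last observed stride and predict
 * that the next access continues it (A, A+s, A+2s, ...). */
static int stride_hit(long prev, long cur, long *stride)
{
    int hit = (*stride != 0) && (cur == prev + *stride);
    *stride = cur - prev;            /* update the learned stride */
    return hit;
}

int main(void)
{
    /* Access pattern: a strided walk through cache lines (think column-major
     * array traversal), which defeats a pure next-line scheme. */
    long lines[16];
    for (int i = 0; i < 16; i++)
        lines[i] = i * 4;            /* stride of 4 cache lines */

    long stride = 0;
    int nl_hits = 0, st_hits = 0;
    for (int i = 1; i < 16; i++) {
        nl_hits += next_line_hit(lines[i - 1], lines[i]);
        st_hits += stride_hit(lines[i - 1], lines[i], &stride);
    }
    printf("next-line prefetch hits: %d / 15\n", nl_hits);  /* 0  */
    printf("stride prefetch hits:    %d / 15\n", st_hits);  /* 14 */
    return 0;
}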
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Good to know, but not a very thorough review. What's important is how it does with an average CPU/GPU and with APUs, not a $330 CPU and a $700 GPU.
 

Rum

Honorable
Oct 16, 2013
54
0
10,630


Just stop with the conspiracy theory; it serves AMD no purpose to tank DX performance to make Mantle look better. On top of that, don't you think the disparity would be greater if they were crippling drivers? AMD would not risk that kind of scandal, because it would not only turn off potential buyers but also piss off their user base, which would jump ship to Intel/Nvidia... who are sadly no better! This review is just BF4 doing what it does best, and that is sucking!
 

jdwii

Splendid


Some of those people work at AMD, and those same people said that is the main reason a lot of people were canned, including the CEO. AMD will have a competitive CPU when they ditch the module design; until then, we can expect i3 performance from a quad-core AMD for quite some time, with more power draw. I do not expect anything more, and as in my last comment, I hope they're getting rid of this design. Most people stating otherwise probably aren't even running AMD on their main rig.
 
Some of those people work at AMD, and those same people said that is the main reason a lot of people were canned, including the CEO. AMD will have a competitive CPU when they ditch the module design; until then, we can expect i3 performance from a quad-core AMD for quite some time, with more power draw. I do not expect anything more, and as in my last comment, I hope they're getting rid of this design. Most people stating otherwise probably aren't even running AMD on their main rig.

That is your own opinion, and you're entitled to it. The modular design isn't the cause of the performance issues; it's quite a bit more complicated than that. AMD's cache logic, which is also in Phenom btw, is inferior to Intel's. That is a fact. AMD lacks the raw R&D staff required to create something as advanced as Intel's, so they took a different approach. AMD would not have won the XBONE and PS4 designs without the modular architecture, as the custom chips wouldn't have been possible. Those wins dwarf their PC sales division.
 

jdwii

Splendid


And the opinion of many experts, including ones at AMD. And again, I'm pretty sure AMD won the consoles because they could do everything in one place versus many others, and it's been said AMD did it for a low price.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Rumor has it that the modular tech was licensed from Sonics, so basically anyone could do that if they wanted to. The tech has already been licensed to ARM, so you can count it as about as unique as a DDR3 IMC now.
 