i'm disappointed with the 285, so i just bought a 280x instead (disposed of a 650ti boost). i'm actually installing the card as of this writing. hopefully all goes well.
well that's fine in a sense, the 285 naming scheme locks it in as the last of the r9 28x series. my 280x is working fine, almost twice as fast as the 650ti boost it replaced. funny thing is i'm still encountering the traditional AMD driver issues. AMD can't just do what nvidia is doing with image scaling. looking at it on the lighter side, it put a smile on my face. "it ain't AMD if you don't encounter driver issues."
for the GTA V hopefuls:
Sources hint at GTA V release date for PC, PS4 and Xbox One
http://hexus.net/gaming/news/industry/74093-sources-hint-gta-v-release-date-pc-ps4-xbox-one/
in november. +NaCl
The Matrox design win is actually a major thing for AMD, that will give them massive inroads in workstations, and professional applications. That could seriously be a major boost for their GPUs.
and confused people make repeat customers. that's what amd's marketing dept. wants.
8350rocks :
The Matrox design win is actually a major thing for AMD, that will give them massive inroads in workstations, and professional applications. That could seriously be a major boost for their GPUs.
people in other places are speculating and making expert conjectures about the future of AMD. This is an excerpt of an interesting discussion:
Qualcomm - has own CPU and own GPU
Nvidia - has own CPU and own GPU
Intel - has own CPU and own GPU
Imagination - has own CPU (MIPS) and own GPU
Samsung - doesn't own either one. It buys them from ARM
Why Samsung needs AMD:
■GPU technology especially, which I'm pretty sure can be easily adapted for mobile, just like Nvidia's Kepler. AMD is already spearheading heterogeneous computing and "down to the metal" (or should I say Mantle?) GPU computing
■ARMv8-based K12 core coming soonish (2016) - could be good, and would spare Samsung the research/effort
■x86 APUs can give Samsung somewhat better synergy/lower costs in its laptop business
■Samsung, like Apple, can control its future regarding its processors, and be much more integrated
Why AMD needs Samsung:
■a foundry highly competitive with TSMC, which AMD can use to dump GloFo, which is holding AMD back, something AMD doesn't need right now, as it has enough problems. AMD definitely needs to use a more competitive foundry before it even dreams about competing either with Intel or in mobile
■instant placing in millions of new laptops and in 10-100x more mobile devices
■a ton of R&D cash - something AMD could definitely use a lot more of to be competitive not just in PCs, but in mobile too, otherwise it doesn't have much of a future
■mobile chip experience from Samsung's own mobile chip division
I should say I'm not a huge fan of Samsung these days, but I think these two just need each other too much, and they would match pretty well.
"GPU technology especially, which I'm pretty sure can be easily adapted for mobile, just like Nvidia's Kepler. AMD is already spearheading heterogeneous computing and "down to the metal" (or should I say Mantle?) GPU computing".
Not really. GCN might scale well, but I'm pretty sure it's not designed to scale down. AMD sold ATI's low power tech patents to Qualcomm, remember?
"ARMv8-based K12 core coming soonish (2016) - could be good, and would spare Samsung the research/effort"
Why? Exynos SoCs are pretty darn good. Samsung has nothing to envy AMD from a design point of view. Even more, I'd say Samsung has more experience with low power than AMD in terms of ARM development.
"x86 APUs can give Samsung somewhat better synergy/lower costs in its laptop business"
Intel might object. That's enough risk and cost to Samsung to consider it twice or more.
"Samsung, like Apple, can control its future regarding its processors, and be much more integrated"
They already are. AMD has little to nothing to add, as bad as it sounds.
--
Let's see the other way around.
"a foundry highly competitive with TSMC, which AMD can use to dump GloFo, which is holding AMD back, something AMD doesn't need right now, as it has enough problems. AMD definitely needs to use a more competitive foundry before it even dreams about competing either with Intel or in mobile"
They just got rid of one, why would they want a fab back? Even more, through a merger with Samsung? I'm pretty sure GF has more experience than Samsung in that department. It would be even better to buy the Dresden factory back in that case. Was it Dresden? xD
"instant placing in millions of new laptops and in 10-100x more mobile devices"
Mobile, totally agreed. Laptops? Samsung creates/builds laptops, but using Intel mostly.
"a ton of R&D cash - something AMD could definitely use a lot more of to be competitive not just in PCs, but in mobile too, otherwise it doesn't have much of a future"
Yes, they need more cash for R&D.
"mobile chip experience from Samsung's own mobile chip division"
Yes, plus all the patents in low power tech they own.
--
Overall, I'd say AMD needs Samsung more than the other way around, but I don't think Intel would let Samsung snag AMD that easily. They'd either put a lot of fear factor into a possible merger/buyout through the x86 license, or they'd buy AMD themselves (this would be... interesting).
Thinking of it another way: between AMD and VIA, I'd say VIA is more attractive.
Cheers!
First of all, thanks for considering this issue.
I agree that the current GCN 1.0 is inefficient. It cannot scale down and compete with mobile GPUs, and it cannot scale up and compete with HPC GPUs. If we check the first 100 entries on the Green500 list, we find only one entry with AMD GPUs. However, I think he was referring to future GCN architectures. Koduri is developing a new GCN architecture which is being synchronized with the K12 release. I guess that efficiency will be one of the keys of the new design. Recall that AMD is aiming at the ultra-low-power (sub-2W) market for 2015--2016.
Exynos SoCs are based on standard ARM Cortex designs. Samsung lacks a custom core and cannot compete against others' 64-bit custom cores. There are good reasons why Qualcomm, Apple, Nvidia... spend years and billions on designing custom cores. Moreover, Samsung tried to enter the server market but canceled the project. By acquiring AMD, Samsung could reuse K12 for its server division.
Samsung is fighting Intel in the business where Intel excels: foundries. This is a very complex and risky business. Why would Samsung be worried about the relatively cheap laptop business? Especially when it could reuse designs and basically replace an ARM CPU with an x86 CPU. AMD's Skybridge would look fantastic for this.
Samsung only integrates external cores and GPUs, which it doesn't control. Apple, Qualcomm, Nvidia, AMD... control both the core and the GPU and can adapt/integrate them better.
GloFo 'experience' is irrelevant. In fact, GloFo canceled its own 14XM node, which was uncompetitive and late to the market, and licensed the 14nm node and its technology from Samsung. Thus AMD products will be competitive thanks to being made on technology from Samsung foundries.
I think each needs the other. However, I don't think that x86 is one of the main reasons; I think that Samsung is not really interested in x86. And AMD has already shared its beliefs about x86 (see my signature).
In the real world, an L1 cache typically has a hit rate between 95% and 97%, but the performance impact of those two values in our simple example isn’t 2% — it’s 14%. Keep in mind, we’re assuming the missed data is always sitting in the L2 cache. If the data has been evicted from the cache and is sitting in main memory, with an access latency of 80-120ns, the performance difference between a 95% and 97% hit rate could nearly double the total time needed to execute the code.
Which itself is a major reason why in "real-world" programs, you'll never see 100% core usage, and why high-80's is basically an indication the CPU is maxed out.
A cache is contended when two different threads are writing and overwriting data in the same memory space. It hurts performance of both threads — each core is forced to spend time writing its own preferred data into the L1, only for the other core to promptly overwrite that information. Steamroller still gets whacked by this problem, even though AMD increased the L1 code cache to 96KB and made it three-way associative instead of two-way.
Which explains why Intel is still far ahead in memory based benchmarking. A 5% hit-rate decline could affect your performance by over 20%. That's huge, and the one major thing AMD can work on to improve their performance going forward.
edit: imo the article starts out very well, but skips a few important bits right before going into the bd/pd cache exploration, e.g. cache coherency, snooping and consistency protocols etc. a brief overview of those woulda been nice. that's why the first part seems aimed at single core and the second part at multicore processors, without transition.
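the article's "2% vs 14%" claim checks out if you plug round numbers into the standard average-memory-access-time formula. a quick sketch of the arithmetic, assuming a 1-cycle L1 hit and a 10-cycle L2-served miss (made-up but plausible latencies, not the article's actual figures):

```python
def amat(hit_rate, hit_cost, miss_cost):
    """Average cost per access, in cycles: hits at hit_cost, misses at miss_cost."""
    return hit_rate * hit_cost + (1 - hit_rate) * miss_cost

# misses served from L2 (10-cycle penalty, assumed):
t95 = amat(0.95, 1, 10)   # 1.45 cycles/access
t97 = amat(0.97, 1, 10)   # 1.27 cycles/access
print(f"L2 backstop: 95% hit rate is {t95 / t97 - 1:.0%} slower than 97%")  # 14%

# misses going all the way to DRAM (~100 ns, ~300 cycles assumed):
t95d = amat(0.95, 1, 300)
t97d = amat(0.97, 1, 300)
print(f"DRAM backstop: 95% takes {t95d / t97d:.2f}x the time of 97%")  # 1.60x
```

with a DRAM backstop the gap grows to roughly 1.6x, which is in the ballpark of the article's "nearly double" for a memory-bound access pattern.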
@juan, GCN will never be as efficient in traditional GPGPU as a chip from Nvidia. Nvidia's goals are to remove as much GPGPU as possible to reduce transistor count while not crippling specific GPGPU tasks (like those required for PhysX). If they cripple something and it's more efficient to add transistors for specific tasks (NVENC), they will go that route.
AMD's GPU goals are to create a strong monster for HSA. Their efficiency will come from the large performance gains of HSA over GPGPU. But until we see HSA come to fruition, AMD will lag behind.
I do not think AMD has any reason to chase after low power like in mobile ARM. That is a race to the bottom market and it does AMD no good to be there. They want to be in markets where HSA can do well and AMD can offer a solution that no one else can. An efficient and small GPU is not special and it becomes another competitor to ARM GPUs and smaller Nvidia GPUs. But as far as competing with Nvidia on efficiency, AMD is going to have to compete with someone who feels it's fine to just cut functionality out of a product to raise efficiency when AMD is not in the position to do that thanks to HSA.
But being a GPGPU monster for HSA is going to cost transistors and it's going to put AMD behind Nvidia and company in efficiency in traditional tasks. I think that Nvidia knows this and it's why it's pushing efficiency so hard in marketing material (like reviews screaming GTX 750 Ti is fantastic for PPW).
AMD's value and efficiency here won't show up until we get software that can exploit their hardware. And knowing how things have been in the past, it's not a guaranteed thing.
^^ NVIDIA is not going to do anything that hurts it in compute, given it's a very important market for them. They're highlighting PPW because from now on, they are building up from their mobile designs, rather than the other way around. Going forward, NVIDIA should rarely if ever lose to AMD in PPW. Hence why they highlight it.
@juan, GCN will never be as efficient in traditional GPGPU as a chip from Nvidia. Nvidia's goals are to remove as much GPGPU as possible to reduce transistor count while not crippling specific GPGPU tasks (like those required for PhysX). If they cripple something and it's more efficient to add transistors for specific tasks (NVENC), they will go that route.
AMD's GPU goals are to create a strong monster for HSA. Their efficiency will come from the large performance gains of HSA over GPGPU. But until we see HSA come to fruition, AMD will lag behind.
I do not think AMD has any reason to chase after low power like in mobile ARM. That is a race to the bottom market and it does AMD no good to be there. They want to be in markets where HSA can do well and AMD can offer a solution that no one else can. An efficient and small GPU is not special and it becomes another competitor to ARM GPUs and smaller Nvidia GPUs. But as far as competing with Nvidia on efficiency, AMD is going to have to compete with someone who feels it's fine to just cut functionality out of a product to raise efficiency when AMD is not in the position to do that thanks to HSA.
But being a GPGPU monster for HSA is going to cost transistors and it's going to put AMD behind Nvidia and company in efficiency in traditional tasks. I think that Nvidia knows this and it's why it's pushing efficiency so hard in marketing material (like reviews screaming GTX 750 Ti is fantastic for PPW).
AMD's value and efficiency here won't show up until we get software that can exploit their hardware. And knowing how things have been in the past, it's not a guaranteed thing.
AMD slogan should be:
Lagging today to create the tech of tomorrow, which will be obsolete by then.
It's basically the same strategy they used with Bulldozer. HSA seems awesome but if they can't make use of the tech NOW, they are basically screwed.
Damn it AMD, Damn it... just give me a 4-6 Core Phenom based CPU with 15% increased IPC and 20% increased speed, and i would hasten to the local PC Hardware Store.
like reviews screaming GTX 750 Ti is fantastic for PPW.
AMD's value and efficiency here won't show up until we get software that can exploit their hardware. And knowing how things have been in the past, it's not a guaranteed thing.
it is, just not the way some reviews are presenting. for instance, nvidia's turbo algorithm is very good and well-adapted to maxwell's design. at first i thought nvidia pulled off some kind of feat without a die shrink until i started to look into it. almost bought one too... stupid prices....
amusingly, amd has very similar tech in kaveri, mullins/beema and gcn gpus...
etayorius :
Damn it AMD, Damn it... just give me a 4-6 Core Phenom based CPU with 15% increased IPC and 20% increased speed, and i would hasten to the local PC Hardware Store.
either of those things require design changes that'd turn a phenom ii core into something else, so not gonna happen.
i'm hoping jaguar's successor [strike]zen[/strike] will turn things around. time will tell....
In the real world, an L1 cache typically has a hit rate between 95% and 97%, but the performance impact of those two values in our simple example isn’t 2% — it’s 14%. Keep in mind, we’re assuming the missed data is always sitting in the L2 cache. If the data has been evicted from the cache and is sitting in main memory, with an access latency of 80-120ns, the performance difference between a 95% and 97% hit rate could nearly double the total time needed to execute the code.
Which itself is a major reason why in "real-world" programs, you'll never see 100% core usage, and why high-80's is basically an indication the CPU is maxed out.
A cache is contended when two different threads are writing and overwriting data in the same memory space. It hurts performance of both threads — each core is forced to spend time writing its own preferred data into the L1, only for the other core to promptly overwrite that information. Steamroller still gets whacked by this problem, even though AMD increased the L1 code cache to 96KB and made it three-way associative instead of two-way.
Which explains why Intel is still far ahead in memory based benchmarking. A 5% hit-rate decline could affect your performance by over 20%. That's huge, and the one major thing AMD can work on to improve their performance going forward.
I was actually reading a programmer's comments the other day about the AMD FX, and they said that sometimes it would be easier to use the 8-core FX as just a 4-core FX because of L2 cache issues. I was also amazed that AMD did that to the L1 right from the beginning; I thought they were a bit uneducated, and I was wondering if they even had any kind of college education or work experience in the field.
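the "three-way instead of two-way" change the quoted article mentions matters because an extra way absorbs conflict misses when a few hot lines land in the same set. a toy LRU set-associative cache (made-up sizes, nothing like Steamroller's actual geometry) shows the effect:

```python
# Toy set-associative cache with LRU replacement. An OrderedDict per set
# keeps tags in recency order: front = least recently used.
from collections import OrderedDict

class Cache:
    def __init__(self, num_sets, ways):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, addr):
        s = self.sets[addr % self.num_sets]   # set index from low bits
        tag = addr // self.num_sets
        if tag in s:
            s.move_to_end(tag)                # hit: mark most recently used
        else:
            self.misses += 1
            if len(s) == self.ways:
                s.popitem(last=False)         # set full: evict LRU tag
            s[tag] = True

# three addresses that all map to set 0, touched round-robin:
trace = [0, 8, 16] * 100
for ways in (2, 3):
    c = Cache(num_sets=8, ways=ways)
    for a in trace:
        c.access(a)
    # 2-way thrashes (every access misses); 3-way holds all three lines
    print(f"{ways}-way: {c.misses} misses out of {len(trace)} accesses")
```

with three hot lines fighting over two ways, round-robin LRU evicts exactly the line needed next, so the 2-way cache misses on all 300 accesses while the 3-way cache misses only on the 3 cold accesses.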
@juan, GCN will never be as efficient in traditional GPGPU as a chip from Nvidia. Nvidia's goals are to remove as much GPGPU as possible to reduce transistor count while not crippling specific GPGPU tasks (like those required for PhysX). If they cripple something and it's more efficient to add transistors for specific tasks (NVENC), they will go that route.
AMD's GPU goals are to create a strong monster for HSA. Their efficiency will come from the large performance gains of HSA over GPGPU. But until we see HSA come to fruition, AMD will lag behind.
I do not think AMD has any reason to chase after low power like in mobile ARM. That is a race to the bottom market and it does AMD no good to be there. They want to be in markets where HSA can do well and AMD can offer a solution that no one else can. An efficient and small GPU is not special and it becomes another competitor to ARM GPUs and smaller Nvidia GPUs. But as far as competing with Nvidia on efficiency, AMD is going to have to compete with someone who feels it's fine to just cut functionality out of a product to raise efficiency when AMD is not in the position to do that thanks to HSA.
But being a GPGPU monster for HSA is going to cost transistors and it's going to put AMD behind Nvidia and company in efficiency in traditional tasks. I think that Nvidia knows this and it's why it's pushing efficiency so hard in marketing material (like reviews screaming GTX 750 Ti is fantastic for PPW).
AMD's value and efficiency here won't show up until we get software that can exploit their hardware. And knowing how things have been in the past, it's not a guaranteed thing.
AMD slogan should be:
Lagging today to create the tech of tomorrow, which will be obsolete by then.
It's basically the same strategy they used with Bulldozer. HSA seems awesome but if they can't make use of the tech NOW, they are basically screwed.
Damn it AMD, Damn it... just give me a 4-6 Core Phenom based CPU with 15% increased IPC and 20% increased speed, and i would hasten to the local PC Hardware Store.
It seems like AMD does do this; tech is the future, but to make it there you have to have products to sell today. Although I feel they do, it's important that they stress that.
Damn it AMD, Damn it... just give me a 4-6 Core Phenom based CPU with 15% increased IPC and 20% increased speed, and i would hasten to the local PC Hardware Store.
either of those things require design changes that'd turn a phenom ii core into something else, so not gonna happen.
i'm hoping jaguar's successor [strike]zen[/strike] will turn things around. time will tell....
Honestly, if Phenom II had the improved memory controller, core counts, and frequency ceiling that FX does, it would have been a better chip than Piledriver.
Honestly, if Phenom II had the improved memory controller, core counts, and frequency ceiling that FX does, it would have been a better chip than Piledriver.
llano had upgraded imc.
ph ii x6 cpus had more cores.
i dunno about frequency ceiling.... iirc thuban cpus were clocked lower than deneb cpus so amd already made sorta clockrate vs core count tradeoff with those. i don't think that design woulda allowed for both. i do read about people running overclocked ph ii cpus up to 4.0 GHz and some even 4.5 GHz (at the expense of more power and heat).
pd, especially fx 8xxx cpus should be higher performing than phenom ii. 8 cores and 4GHz+ clockrate would not be possible without bd design. i think that amd woulda made an 8 core phenom if it was possible. edit: an 8c phenom design like that woulda been at least useful in the server department even if low clocked.
IIRC, the memory controller was improved more for FX. Clock for clock, Phenom II traded blows with Piledriver. Look how a 4.0GHz PhII 965 held up against a 4.2GHz stock FX 4350. I would venture to believe a PhII X6 would have performed similarly against an FX 6350, clock for clock. Without a doubt, FX's saving graces were increased core count and clock speeds.
In the real world, an L1 cache typically has a hit rate between 95% and 97%, but the performance impact of those two values in our simple example isn’t 2% — it’s 14%. Keep in mind, we’re assuming the missed data is always sitting in the L2 cache. If the data has been evicted from the cache and is sitting in main memory, with an access latency of 80-120ns, the performance difference between a 95% and 97% hit rate could nearly double the total time needed to execute the code.
Which itself is a major reason why in "real-world" programs, you'll never see 100% core usage, and why high-80's is basically an indication the CPU is maxed out.
A cache is contended when two different threads are writing and overwriting data in the same memory space. It hurts performance of both threads — each core is forced to spend time writing its own preferred data into the L1, only for the other core to promptly overwrite that information. Steamroller still gets whacked by this problem, even though AMD increased the L1 code cache to 96KB and made it three-way associative instead of two-way.
Which explains why Intel is still far ahead in memory based benchmarking. A 5% hit-rate decline could affect your performance by over 20%. That's huge, and the one major thing AMD can work on to improve their performance going forward.
Since his return to AMD, Keller has been working hard on developing about a dozen new techniques related to caches. Keller has also developed a non-standard stack cache, which could be as important to K12 as the uop cache was to SB (IMO).
In fact part of his work on caches is expected to be implemented in Excavator.
It is interesting that some believe that HSA is a kind of magic technology that will make things such as architectural efficiency obsolete. No, it is not.
HSA (Heterogeneous System Architecture) is AMD's response to what everyone else is doing: Intel's neo-heterogeneity, Nvidia's CUDA heterogeneity, Fujitsu's heterogeneity, IBM's... I don't find any special advantage of HSA over the competition. E.g., this is Nvidia's CUDA response to AMD's hUMA
HSA is just a needed ingredient for efficiency, but not the only one. AMD needs efficient CPU and GPU architectures just as everyone else. The laws of physics are the same for all of us.
On 2 September I gave a slide that summarizes the goals of the new AMD. The first item mentions that the old goal of developing inefficient cores for HEDT has been abandoned.
AMD is not concerned about super-low-margin markets like smartphones/tablets... you are talking about having to be a massive player in those markets to make the tiny margins even worth the time investment of bringing something competitive to market.
These are the top 3 higher margin markets left in hardware:
(1) Commercial Servers/Workstations
(2) HEDT Consumer CPUs/APUs
(3) Consumer GPUs
Embedded solutions can be very profitable, but VIA has a lockdown on a lot of that. Not that AMD could not make inroads to do so, but it would be an even more uphill battle, as VIA is likely over 80-90% penetration in embedded x86 solutions.
Embedded solutions can be very profitable, but VIA has a lockdown on a lot of that. Not that AMD could not make inroads to do so, but it would be an even more uphill battle, as VIA is likely over 80-90% penetration in embedded x86 solutions.
Never mind that embedded as a whole is still the one place where PPC is supreme. So they'd be competing against both VIA and IBM.