AMD CPU speculation... and expert conjecture


harly2

Distinguished
Nov 22, 2007

You're always on about this ARM thing. It is clear that ARM in a PC is a waste. Why would anyone waste all that money to convert and emulate Windows or OS X when x86 is doing just fine....what is there to talk about?

That said ARM is mobile... x86 is a fail in phones and tablets. For servers it's obvious the vendors want out from Intel's thumb so there is a place for ARM in servers. But if anyone says ARM in PC just don't respond...that is silly.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


So far as I know, Excavator is on track, and everyone I talk to tells me the same. The Excavator patch was added to GCC at the end of last year and was added to Clang this month. What sense does it make to add support for a canceled architecture? For 2015 we expect FX, Warsaw, Toronto, Cambridge, Carrizo... I don't see any official word from AMD, nor from any tech site, saying what you say.

I agree that AMD is limited by constrained resources, but AMD only needs to tweak the Puma+ core to adapt it to Skybridge in 2015, develop the new K12 core for 2016, and then derive the x86 'sister' core from the K12.

Moreover, we have word from Keller that the cat and Bulldozer lines will be fused into a single line. Thus I see AMD optimizing resources.

If Excavator cores are to be released next year, this implies that the design is already finished at the time of writing. Canceling Excavator now would not bring back the money spent on developing the core.

Moreover, Intel will release Broadwell at the end of this year. I don't see how AMD could compete with 14nm Broadwell using 28nm Kaveri throughout 2015, but I can see them half-competing with 20nm Carrizo until the new core is ready.

Time will tell what AMD has planned.

P.S.: We have known for a long time, since Keller returned, that AMD was preparing a new HEDT product. The words of someone who claims to have insider information but only says what we already know do not impress me, especially when, as another poster noted before, all his insider information from the past year was wrong.
 

8350rocks

Distinguished


Hmm... aside from your misquotes and putting words in my mouth because you never bother to actually READ my posts, I have been more accurate than you have. Note, my information has also evolved; do you think their plans have been static this entire time? You would be a fool to make such an assumption.

I will reveal more when I am permitted. That is all I can say. The day the NDA expires on this stuff you will hear all about it :p
 

juanrga

Distinguished
BANNED
Mar 19, 2013
If the idea of a hybrid Steamroller-Excavator core perplexed you, then this will drive you crazy:

http://wccftech.com/amd-developing-generation-apu-x86-cheetah-arm-cores-features-gcn-20-cores-hsa-support/

I am still trying to digest some of the info in it. It sounds so nonsensical that I am trying to understand how someone could invent something on that level and claim it was said by AMD.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
MANTLE already adopted in four engines by seven developers in 20+ games

[slide image]

MANTLE adoption rates compared to Dx10 and DX11

[slide image]

Software ecosystem for AMD products. I didn't know that Windows Server supports APUs.

[slide image]

jdwii, don't show this slide to your boss: it is about a webpage generated by a PHP script running on an ARM server by AMD.

[slide image]

Finally, a lovely collection of GPUs, including the latest R9 295X2.

[slide image]
 

Cazalan

Distinguished
Sep 4, 2011


I disagree. I don't mind discussing ARM. I had at least 5 projects last year that used ARM cores. It's the constant back and forth of x86 vs ARM that has been done to death, like Windows vs Linux or Console vs PC. It has its place, but it's dull. There has to be a better place to do it without the fanboyism that is going on here.
 

Cazalan

Distinguished
Sep 4, 2011


Applied Micro runs their corporate website on their custom ARMv8 processors. This is what AMD's generic A57 cores will be competing with for 2 years until they make their own custom ARMv8 core. http://www.apm.com

It's basically what Red Hat, Ubuntu and others have been using the last year to get 64bit ready.
 
MANTLE adoption rates compared to Dx10 and DX11

Read the slide better: "Conservative Mantle Adoption" and "Expected Mantle Adoption".

You know what isn't shown? "Actual Mantle Adoption".

You are a marketing department's wet dream.

Software ecosystem for AMD products. I didn't know that Windows Server supports APUs.

Why wouldn't it? The CPU is fully usable, and the APU can be treated as an iGPU for the purposes of running it. More marketing fluff on full display.
 


You want me to start a technical rant about why we were wrong to adopt x86 over 68k? I will if you want me to. :D
 


From my understanding, x86 hardware implementations have moved further and further away from their 'CISC' roots and operate in a very similar way to RISC designs like ARM. I don't think the ISA makes that much difference any more; you could theoretically make a very efficient x86 low power design (which we're kinda getting), or a higher power, high performance ARM design (like AMD is proposing), and anything in-between. The key consideration is software support, which is thankfully getting less dependent on x86.
 

blackkstar

Honorable
Sep 30, 2012


It is still CISC. It still follows the philosophy of "let's add more instructions to make things faster!" instead of "just make a simple core with basic instructions that does those basic instructions as fast as possible."

The guts of modern x86 CPUs may look more like RISC because they are translating complex instructions to simpler ones and then following the RISC philosophy, but at the end of the day, you're going to keep seeing more and more instructions being added to x86 while ARM gets something like NEON if they're lucky.

Also Juan, please stop. Those slides only prove what I've been saying forever. Notice how all the HSA wins for consumer environments are in x86 and there are barely any consumer applications in HSA ARM wins? And notice how x86 is basically a superset of ARM?

But back on AMD CPUs, I'm starting to read from this swirl of rumors that it seems like whatever AMD is doing with their next generation cores will be nothing like how traditional cores are developed. It almost seems like they are doing a lot with what they have available to them. One of the ways I see them doing this is by taking their strategy of making a bunch of "building blocks" of different cores and then developing different "building blocks" to be parts of each core.

Look at what AMD is supposedly doing and look at their budget. They are releasing a new graphics API, HSA, APUs, ARM cores, x86 cores, GPU cores, new platforms for consumer, server, HEDT, mobile. They're crazy busy. There's no way that each of these are being completely distinctly developed. There has to be huge overlap, like Mantle is a tiny subset of HSA applied to graphics or something.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


Then you missed the previous slide, which gives the number of available games?

You are the guy who believed the infamous Nvidia marketing slide about the DX11 magic driver... They must have fooled you, not me. I was the guy who posted the corrected slide here, without the marketing lies. :sarcastic:



Thanks for confirming that the APU can run CPU software, but my doubt is about whether WS supports iGPU acceleration. I know that FirePro GPUs are supported, but I don't know about APU support.



No, x86 is not a superset of ARM. Stop posting stuff like this.

The lack of consumer applications among the HSA ARM wins is easy to explain: HSA ARM hardware has not been released yet. The 2015 Skybridge APUs will be the first HSA-enabled ARM APUs.



What? Mantle and HSA are different things.
 


Great, some random guy's blog, from someone who obviously prefers Linux. As if someone who prefers it would trash it?

I have noticed this much. The tech savvy people who prefer Linux will never acknowledge the faults or possible faults within Linux. They will only praise it while also trashing Windows. Most IT people I have met will acknowledge the same thing I have said, all software has security holes and always will no matter what.

Let's leave this off here, shall we?



I want to know why you think a company that has not had a significant win nor made a decent profit for the past 8 years, since Core 2 came out, is somehow making the right choice, as opposed to a company that has been generating billions a quarter in revenue.

Nothing against AMD; I think their GPUs are still decent, but from a business perspective they have done nothing but make bad choices, and they are finally making some choices that have started to slowly turn them around. Very slowly.

Mantle is a great idea. Problem is it doesn't support enough GPUs. DirectX 12 is supposed to be supported by all current DX11 GPUs. Which one is better for people?

ARM, well, anyone can afford to license it. It is a dirt cheap license. That doesn't mean they will make a magical ARM CPU that comes out and suddenly plows down anything x86, especially considering that AMD has little to no ARM experience compared to Samsung or Qualcomm.

Again, though, as this is all just speculation, I think, Juan, that you need to brace yourself for the chance that you could be wrong about everything, because until it comes out, it will never be anything but a rumor.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


For many years now, both AMD and Intel have implemented x86 processors as RISC machines. Any modern x86 CPU translates x86 instructions into RISC-like micro-ops, which are then executed internally.(*) AMD and Intel use different, and secret, internal formats today.

There are several reasons why x86 is not as efficient as ARMv8. The variable-length instructions and the small register set of x86 interfere with modern compiler optimizations, which means that ARM hardware is used more efficiently by the software, wasting less power to do the same work.

x86 CPUs require a larger area for the decoder. This has two consequences: either you spend more area to do the same work, losing efficiency, which roughly scales as (sqrt area)^-1, or the ARM hardware can use the extra space to improve performance, which roughly scales as (sqrt area).

Indeed, Keller praised both the larger register set and the smaller decoder:

Keller was very complimentary about the ARMv8 ISA in his talk, saying it has more registers and "a proper three-operand instruction set." He noted that ARMv8 doesn't require the same instruction decoding hardware as an x86 processor, leaving more room to concentrate on performance.

This is why Intel's best designs need quad cores and higher TDPs to compete against dual-core ARMv8 hardware with small TDPs, despite Intel having a node advantage: 22nm FinFET vs 28nm bulk.

(*) In some cases a single x86 instruction is translated into on the order of a hundred micro-ops, which are then executed internally.
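Keller's "three-operand instruction set" remark can be made concrete with a trivial example. The sketch below is purely illustrative (the assembly in the comments is typical compiler output for each ISA, not anything quoted from AMD; exact codegen depends on compiler and flags):

```python
# A toy "a = b + c" to illustrate two-operand vs three-operand ISAs.
#
# x86-64 (two-operand: the destination is also a source, so a copy is
# often needed first):
#     mov eax, edi      ; copy b into the result register
#     add eax, esi      ; eax = eax + c (destructive add)
#
# ARMv8 A64 (three-operand: separate destination register):
#     add w0, w1, w2    ; w0 = w1 + w2 in one non-destructive instruction
#
# The extra architectural registers and non-destructive form give the
# compiler more freedom; the fixed 4-byte A64 encoding also keeps the
# decoder simple, which is the area argument made above.

def add_two(b, c):
    # Stand-in for the compiled function sketched in the comments.
    return b + c
```

The point is not that one instruction matters, but that the copy-before-destructive-add pattern repeats throughout compiled x86 code.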
 
I have noticed this much. The tech savvy people who prefer Linux will never acknowledge the faults or possible faults within Linux. They will only praise it while also trashing Windows. Most IT people I have met will acknowledge the same thing I have said, all software has security holes and always will no matter what.

The first step is to admit when you have a problem. That's why Linux's greatest problems never get fixed.

There are several reasons why x86 is not as efficient as ARMv8. The variable-length instructions and the small register set of x86 interfere with modern compiler optimizations, which means that ARM hardware is used more efficiently by the software, wasting less power to do the same work.

And yet the Itanium, with its HUNDREDS of registers, had poorer IPC than x86.

Seriously Juan, you have high speed CPU caches to hide the fact you only have so many CPU registers. And you only need TWO to do math. It's the type of stuff that goes in the L1 anyway, and is essentially free. You aren't losing performance due to sub-optimal register use.
 

8350rocks

Distinguished


ARM beating Intel does not say much, especially since they cannot even really compete with the offerings from AMD at the moment. Now, how does it fare against the cat cores? Last time they ran A15s against the x86 cores from AMD, even at lower clockspeeds the cat cores killed the ARM SoCs.
 


True, but the A15 cores are running at much lower wattages than the cat cores. Performance has to be taken into account along with power or it's meaningless. I can imagine, with the leakage reductions in Puma, that Puma is similar in perf/W to the A15; however, AMD have stated in their slides that they expect a serious performance advantage for the new ARM parts they are building compared to the cat-based server chip they have (of course, that is a newer ARM core they will be using though)...
 

colinp

Honorable
Jun 27, 2012


That can't possibly be true. Our font of all knowledge CPU-wise will happily tell you that the PS4's APU is basically good for nothing.
 

it's written in asm.

the workload is suitable for parallel execution.

the apu consists of cpu, igpu, imc, u.n.b.. ps4's soc has other hardware blocks as well.

the jaguar cpu part will underperform a core i5 IF both are running pc software (e.g. handbrake or photoshop).

the performance improvement in the article is coming from improved coding, that's all. programmers will get gradually better with time as they continue to work with the console.


cheetah... sounds familiar. i gotta check my old post but i sense a "another prediction confirmed!!" :pt1cable:
 


The phone market is saturated, and going to contract in coming years; I expect to see sales fall by half. That's going to put a dent in sales and profits, and make ARM a LOT less attractive. Embedded sales are much more reliable, hence why NVIDIA is moving in that direction.



You can do this driver-side; the problem is (and as I've been repeating for 5 years now), you still need to break the tasks up so you can take advantage of the GPU cores in the first place.

There's NO advantage to loading CPU threads on a GPU; GPUs are HORRID at anything that requires significant serial processing. They are fast because you have many little blocks of processing that are easy to handle by weak cores. Load a CPU thread on a GPU, and performance tanks. You'd get more performance out of a single CPU core working 4 threads than you would offloading them to a GPU.
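The shape of the argument above can be sketched in a few lines of Python (workload names and constants are hypothetical, chosen only to show the structure):

```python
# Two toy workloads illustrating why only some work benefits from
# GPU-style parallelism.

def serial_chain(x, steps):
    # Each iteration consumes the previous result, so there is no
    # parallelism to exploit: a thousand weak cores cannot help here,
    # and one fast CPU core wins.
    for _ in range(steps):
        x = (x * 31 + 7) % 1000003
    return x

def data_parallel(values):
    # Every element is independent of the others: this is the shape
    # that maps well onto many weak GPU-style cores.
    return [v * v for v in values]
```

A "CPU thread" is almost always the first shape; rendering and most compute kernels are the second. Moving the first shape onto hardware built for the second is how performance tanks.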

And let's not forget the GPU is going to be busy doing rendering and compute at the same time, so it's not like the GPU is going to have resources free to handle excess CPU work.
 


I think this is the crux of AMD's whole strategy, though; as you say, you need to feed instructions to the most suitable execution resource. There are plenty of things that are currently handled by CPU cores that would run better on the GPU, and potentially vice versa.

If HSA endures and reaches its goal, then the particular engine doing a task will become transparent to the OS, and the allocation of resources will become automatic. At that point, the assembly of the 'whole' becomes more important than any individual component, which would give AMD an advantage (the total theoretical throughput of their APUs is really high; the problem is utilising it properly in software with the current tools).
 
I think this is the crux of AMD's whole strategy, though; as you say, you need to feed instructions to the most suitable execution resource. There are plenty of things that are currently handled by CPU cores that would run better on the GPU, and potentially vice versa.

If HSA endures and reaches its goal, then the particular engine doing a task will become transparent to the OS, and the allocation of resources will become automatic. At that point, the assembly of the 'whole' becomes more important than any individual component, which would give AMD an advantage (the total theoretical throughput of their APUs is really high; the problem is utilising it properly in software with the current tools).

No, there aren't, and that's the logical failure here. A GPU core is ALWAYS going to be slower than a CPU core, simply due to design. The ONLY reason a GPU is faster is because it has upwards of 1000 cores, and in highly parallel work where all those resources are used, it ends up outperforming 4-core CPUs.

From a CPU's perspective, it is simply executing a stream of instructions. The CPU is more or less blind to how those instructions are organized, and thus can't make assumptions about if the instructions in question are better run on an iGPU or not. That type of data handling is ALWAYS going to be done user level, either by the developer or the compiler.

And in both cases, you have to consider the varying performance differences between the CPU/GPU when deciding whether to offload work or not. The Dolphin Emulator, for instance, has options to use both OpenMP and OpenCL for texture decoding, and typically, on my system (2600k and 770 GTX), both slow the emulator. With a weaker CPU though? Maybe those will result in a speedup. So even making hardcoded assumptions will cause problems. Just look at the idTech5 engine and listen to how people react when their performance options are automatically chosen for them in a sub-optimal manner.
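The Dolphin example points at the only robust way to make the offload decision: measure on the machine at hand instead of hardcoding assumptions. A minimal sketch of that idea (all names here are hypothetical, not Dolphin's actual mechanism):

```python
import time

def pick_backend(backends, sample):
    # Time each candidate implementation on a small representative sample
    # and pick the fastest. Whether "offload to GPU" wins depends entirely
    # on this machine's CPU/GPU balance, which is the point made above.
    timings = {}
    for name, fn in backends.items():
        start = time.perf_counter()
        fn(sample)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

A real implementation would repeat the measurement and cache the choice per machine; the design point is that this decision lives in user-level code (developer or runtime), not in the hardware.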

Hence why HSA is a pipe dream: you need the devs to support it. And if Intel isn't on board, then it's not going to be universally adopted. If AMD's strategy is built around HSA being widely adopted, then they've already failed. Devs do not go out of their way to support processing for the minority. In specialized software? Sure, since there's a market there. But in the real world? No.

It's this same reasoning why I called BD's design a failure two years ahead of time: irrespective of anything else, low-performance cores that mandated significant threading of typically non-threadable tasks to make up the per-core performance deficit were NOT going to succeed. Same concept here.
 

8350rocks

Distinguished


Actually, in the article they noted how close the power consumption was, so in terms of perf/watt the cat cores were actually quite a bit ahead of the A15 cores. You must remember, the cat cores were running @ 1.5 GHz, which is power-sipping for x86, while the A15 cores running at ~2.0-2.2 GHz are maxed out.
 