AMD CPU speculation... and expert conjecture

cdrkf · May 22, 2014

gamerk316 :

With the exception that HSA isn't an AMD only strategy. The majority of Intel CPU's now include an IGP, and HSA has pretty much all the ARM players involved so I think it's very much possible that it will gain a foothold. What I can see happening though is, like other technologies AMD has implemented, they're probably not going to be the ultimate winners of the technology (e.g. AMD's x86-64 instruction set, which is benefiting Intel nicely now).

The thing that HSA needs is time- this is the way all the players are moving (even if they're not calling it that). I agree that the magic self optimising execution system is probably not going to happen, but if everything starts becoming HSA compliant (or something similar even if under another banner) then the software developers will probably start moving that way. Not everything is going to be adjusted for it- as there is a large proportion of software that simply wont benefit. If a little app will run fine on a single thread on a 1ghz arm A7 core there's really no need. However it's the heavy duty software that's going to implement this stuff (so games, cad, rendering, video editing and so on) as those applications are where there is something meaningful to gain.

juanrga · May 22, 2014

jimmysmitty :

Ok. Those are my reasons to be optimist:

1) You seem to ignore that there are two AMDs and that the new AMD is different from the old AMD. As Koduri mentioned the AMD that he joined is very different to the AMD that he leaved.

2) The old AMD management and engineers responsible for past few years were hired. E.g. the entire team responsible for Bulldozer fiasco was hired, including the CEO and the vicepresident of engineering.

3) There is a lot of talent in the new AMD: Read, Keller, Papermaster, Koduri, Feldman...

4) New semicustom strategy to avoid falling PC market.

5) Skybrige/Ambidextrous strategy and new cores developed by Keller in harmony with new GCN arch. developed by Koduri.

6) Despite old management, red numbers, and process disadvantage the old AMD was able to offer competitive products, some of them being better than Intel alternatives. Even Inteltech admits that Beema/Mullins are better than anything that Intel has in that space.

7) Intel process disadvantage is disappearing. 3 years foundry lead is being reduced to 1 year.

8) AMD has wined many relevant customers to competitors: from console monopoly to MacPro.

9) Many exciting hardware/software new projects with broad support: HSA, MANTLE, SBSA...

10) Insert your favorite here.

Now about your other questions. MANTLE is available today in main engines and many MANTLE games are in development. At the time of writing this, DX12 only exist in some slides. Thus anything real as MANTLE is better for people.

Keller was responsible for the design of several ARM CPUs when worked at Apple. And he has excellent experience on x86 as well (K7, K8, x86-64...). He is the right guy to develop a high-performance ARM core.

Of course I can be wrong. But others also can. Time will say.

juanrga · May 22, 2014

gamerk316 :

No sure what you pretend to say here:

(a) That anything used in the Itanium is useless because Itanium "had poorer IPC than X86". But this is clearly wrong, because the Itanium also used SMT and fixed-length instructions and both are used in satisfactory modern architectures. E.g. Intel Sand Bridge Xeons use SMT.

(b) That having more registers is something bad because Itanium "had poorer IPC than X86". Again this is wrong. Precisely the x86-64 ISA doubled the number of general registers of the old x86 ISA: from 8 registers to 16 registers.

(c) That having more registers than x86-64 is something bad because Itanium "had poorer IPC than X86". Again this is wrong. The new armv8 ISA introduces the same number of registers than other satisfactory high-performance ISAs, including the POWER ISA v2.07 from IBM, i.e., the last ISA used in the CPUs that run circles around Intel Xeons.

juanrga · May 22, 2014

con635 :

I gave this news and the WCCFTECH link before (in this same page)

http://www.tomshardware.co.uk/forum/352312-28-steamroller-speculation-expert-conjecture/page-266#13330512

Then I mentioned that the story doesn't make any sense for me. I have been thinking about it and the only possibility to make some sense of what they say (admitting this is not a completely invented story) is if AMD is introducing ARM cores inside the next GCN 2.0.

We know from some leaks that next GCN will include serial processing units to assist the parallel processing units. We know that Nvidia is doing something similar, adding ARM cores to the next gen graphics architecture. And we have rumors that say that Intel will be adding x86 cores to Skylake graphics arch. with the graphics cores being developed by thee same architect responsible of the Knight Landing Phi cores.

gamerk316 · May 22, 2014

juanrga :

Regardless of the process design, AMD’s x86 Cheetah architecture would introduce new technologies geared towards compute. It is clearly mentioned that AMD’s x86 Cheetah cores won’t be enough to power compute needs which the APU is geared towards, hence for each CPU core there would be one dedicated 64-bit ARM core that analyses the incoming tasks & offloads them to the GPU. This is obviously not as efficient as running tasks that are using OpenCL (or any GPU acceleration), but it works with every application so even older programs like an old video editor or a pure CPU benchmark will use the GPU to do most of the work.

Read more: http://wccftech.com/amd-developing-generation-apu-x86-cheetah-arm-cores-features-gcn-20-cores-hsa-support/#ixzz32Tb58Qu9

This is stupid. I really can't stress this enough. It's not even doable as far as I'm concerned, and really can't see this increasing speed at all, simply due to the overhead time to do the analysis, never mind that CPU threads will not run well on the GPU.

Seems to me AMD is still trying to go out of its way to create ways to hide the fact its cores are lacking in performance, rather then fixing the actual problem.

gamerk316 · May 22, 2014

http://www.extremetech.com/computing/182790-amds-next-big-gamble-arm-and-x86-cores-working-side-by-side-on-the-same-chip

Reminding everyone this side-by-side approach has been tried several times before, and all failed for various reasons.

8350rocks · May 22, 2014

juanrga :

con635 :

I gave this news and the WCCFTECH link before (in this same page)

http://www.tomshardware.co.uk/forum/352312-28-steamroller-speculation-expert-conjecture/page-266#13330512

Then I mentioned that the story doesn't make any sense for me. I have been thinking about it and the only possibility to make some sense of what they say (admitting this is not a completely invented story) is if AMD is introducing ARM cores inside the next GCN 2.0.

We know from some leaks that next GCN will include serial processing units to assist the parallel processing units. We know that Nvidia is doing something similar, adding ARM cores to the next gen graphics architecture. And we have rumors that say that Intel will be adding x86 cores to Skylake graphics arch. with the graphics cores being developed by thee same architect responsible of the Knight Landing Phi cores.

Why this is wrong...

GCN can run SSE4 executable x86 code now...why would they need ARM cores to run serial code when they can already do it? (albeit much more slowly than a CPU, it can still be done...)

Cazalan · May 22, 2014

People must be spending too much time in Colorado to be taking that seriously.

8350rocks · May 22, 2014

Cazalan :

LOL +111111111111111111111111111111111

Cazalan · May 22, 2014

Maybe ARM licensing is getting too expensive? Now Qualcomm + Oracle + Imagination and others teaming up for a MIPS open source foundation.

http://www.theregister.co.uk/2014/05/22/mips_maneuvers_for_worlddog_adoption_with_open_source_foundation/

jdwii · May 22, 2014

Once you look into juan's sources you can find his fallacious arguments, remember why i bring that up so much when someone uses fallacious arguments they are not being truthful

cdrkf · May 23, 2014

Cazalan :

As with all things, it's good to have alternative sources. ARM is as dominant in low power and mobile applications as Intel is in PCs and servers, and as we've seen with Intel a single monopoly player isn't a good thing. Once a company becomes that entrenched it's very difficult to dislodge, I mean AMD managed to equal or surpass Intels best offerings from the release of the first Athlon to their first dual core and yet Intel still managed to shut them out *almost* killing the company.

AMD are now diversifying which will give them access to some new markets Intel simply can't manipulate in the same way. ARM aren't playing dirty at the moment (at least nothing I've heard suggests they are, yet) but keeping MIPS alive and kicking is a good way to help keep them honest!

juanrga · May 23, 2014

8350rocks :

I enjoy how you bring to the discussion the older 32 bit armv7 A15 core in each occasion that I mention the new armv8.

In any case, since you insist, the old 32-bit A15 core is slightly behind the jaguar 64-bit core in performance:

Well, for those that were anxious to see how Tegra's K1 performance does compared to the AM1 APUs, it does reasonably well in most workloads for processor-bound tasks. In many cases the Tegra K1 on the Jetson TK1 was competing with the AMD Athlon 5350 but in other cases the ARM SoC was struggling.

But, as you know, AMD will be not using the A15 core, neither any other old 32-bit core. AMD will be using the 64 bit A57 core and then will use a custom 64-bit core.

The A57 comes with two different modes of execution AArch32 and AArch64. In the old 32-bit mode the A57 offers 15--30% IPC gain over A15. First number is for integer and the last is for floating point.

In the new 64-bit mode the A57 offers 55--46% IPC gain over A15. Moreover, the A57 has been designed to target higher frequencies, at the nominal target frequency the A57 will be up to 100% faster than the old A15.

You don't need to do the math. AMD already gave some numbers for the Opteron A1100 vs. X2150. According to AMD, the A57-based Opteron was 2.85x faster than the jaguar-based Opteron and with substantial advantage in power consumption: in short, the ARM core is faster and more effective than the x86 core that replaces.

AMD also reports the performance per core for both A57 and jaguar: 10 over 7. This puts A57 IPC above Piledriver:

A57 IPC > jaguar IPC
jaguar IPC > Piledriver IPC
================
A57 IPC > Piledriver IPC

Of course the new K12 core will be faster than the A57, because the K12 is a custom core. We don't know if AMD will improve the A57 by hitting higher frequencies only or if K12 will bring additional IPC gains. I already posted here my estimations of the performance of the K12 compared to Piledriver and Haswell. I took a conservative viewpoint where the K12-IPC = A57-IPC and the only difference is that K12 hits 4GHz.

I got that K12 @ 4GHz is faster than Piledriver/Steamroller @ 4GHz and close to Haswell performance. People in other forums seems to agree, and some few speculate that K12 will be at Skylake level. Time will say.

juanrga · May 23, 2014

gamerk316 :

8350rocks :

(1) Nvidia officially announced that will be adding ARM CPU cores to their next graphic architecture. The last rumor is that those new graphics cards with integrated ARM cores are ready

http://techreport.com/news/26300/rumor-points-to-bigger-maxwell-gpus-with-integrated-arm-cores

(2) Some leaks suggest that AMD will be adding serial processing units to the next gen of GCN. My doubt is if those serial processors that appear mentioned in the diagrams are the same ARM cores mentioned by the WCCFTECH news or if the tech site is inventing the news.

(3) There are similar rumors for Skylake graphics. Except that the rumor points to Intel adding x86 cores (instead ARM cores) to post-Broadwell graphics architecture.

Both of you can consider this "stupid", "wrong" and the like, but if the WCCFTECH news is half-true, then AMD would be doing the same for GCN 2.0 than Nvidia and Intel are doing for their respective graphics architectures. In the three cases the addition of serial processing units to GPUs has two objectives: first to increase performance and second to simplify heterogeneous programing.

juanrga · May 23, 2014

gamerk316 :

Bulldozer was a fiasco because it was a unbalanced (and even self-contradictory) design and its defects were amplified by Glofo underperforming SOI process. It has nothing to do with heterogeneous computing; in fact, Bulldozer is an homogeneous computing arch.

Traditional silicon scaling is dead. Those few dreaming with a single core @ 100GHz can continue dreaming. It is well-understood that the only way to improve performance significantly is towards parallelism: e.g. 100 cores @ 1GHz.

CPUs are terribly inefficient at parallel workloads. On the other hand, GPU cores are designed for optimal throughput per watt and throughput per area. A GPU core that was "slower" but occupies double space than a CPU core or consumes 50% more power would be completely useless.

When you claim that the developer has to select what computation is made on the CPU and what computation is "offloaded" to the GPU, you are referring to the traditional accelerator approach to GPGPU, where the GPU is a second class processor (a co-processor) slaved to the CPU (first class).

This is not what HSA is about. In essence, HSA is about upgrading GPUs status to first class processors, i.e. at same level than CPUs.

The claim that the decision to where run instructions in an heterogeneous system is "ALWAYS going to be done user level" is untrue. There are a strong heterogeneous compute research on how do this at kernel level. Different schedulers are being invented and researched which select where to run the instructions according to several parameters such as raw performance, power consumption... The final goal is that the operative system decides what part of an heterogeneous system is best suitable to run a given work.

Of course those automatic decisions can be overdrived by the developer (compiler directives) or by the user (OS tunning), if needed. The HSA spec. defines a HSA-kernel with a HSA-aware scheduler which:

The scheduler manages scheduling and context switching of TCU jobs. Scheduling in HSA can be
performed in software, in hardware, or in a combination of both. An HSA implementation can choose how
to split scheduling work between software and hardware to best match system requirements.

I don't know why you insist on mentioning that devs need to support new hardware. This is a general rule since computers were invented and it also applies to HSA. You are not saying anything new. In fact, the HSA manual says:

HSA is a system architecture encompassing both software and hardware concepts. Hardware that
supports HSA does not stand on its own, and similarly the HSA software stack requires HSA-compliant
hardware to deliver the system’s capabilities.

What do me suspicious is that you never use your developers-argument against Nvidia or Intel. When Nvidia develops a new graphics architecture with new features, these have to be supported by developers. DX12 will support new hardware features not supported by DX11 for instance. The same about Intel, if Intel introduces new AVX instructions on hardware, developers have to support them. E.g. you need software prepared for the new Haswell instructions if you want use them.

You are making the mistake of believing that HSA is a only-AMD thing, when HSa is being developed by the HSA Foundation, which includes members that are more bigger than Intel. What you call "minority" is in reality the majority in terms of market share. Intel is considered the underdog in mobile. The success of HSA is independent of if Intel decides to join the foundation or not; this is the same that happened with x86-64, which was a success even with Intel initially rejecting it. Latter, Intel had to reconsider its position and adopt x86-64 due to its increasing popularity.

Finally, I find amazing your comment that HSA is not for the "real world". Thus according to you LibreOffice, WinZip, Adobe Premiere and Photoshop, Linux, Java 8, OpenCL 2.0... all them must be products for some fictitious universe, right?

Ok, then you must also pretend that the next recent HSA demos

http://semiaccurate.com/2014/04/16/amd-demos-fedora-running-arm-hsa-chips/

http://arstechnica.com/information-technology/2014/04/amd-demos-hsa-for-the-server-room-with-java-on-top/

were run in another universe. :sarcastic:

Cazalan · May 23, 2014

So that cheaper Win 8.1 rumor is true. We could see some under $300 APU systems in the near future.

Just has to have Bing as the default search engine, which can be switched.

http://www.dailytech.com/Microsoft+Announces+Windows+81+with+Bing+for+Lowcost+Computers/article34957.htm

griptwister · May 23, 2014

Here's an interesting talk with Jim Keller. I don't know if anyone has posted this, or seen this yet. It's looking pretty interesting. I'm excited!

http://www.youtube.com/watch?v=SOTFE7sJY-Q

juanrga · May 23, 2014

http://www.extremetech.com/computing/182790-amds-next-big-gamble-arm-and-x86-cores-working-side-by-side-on-the-same-chip

speculates about something similar to what WCCFTECH reports. Extremetech suggestion to make a x86-ARM big.LITTLE hybrid doesn't make sense to me. However the alternative that they mention in the page 3, "Another option: ARM cores, tightly coupled to the GPU", is just what I have been mentioning in this thread to make sense of the WCCFTECH news. As I said before the news by WCCFTECH only makes sense for me if the ARM cores are included in GCN 2.0. Some relevant excerpts from the Extremetech article:

Project Denver is a custom ARM core from Nvidia that the company intends to leverage across a wide range of markets, including a future GPU with an integrated ARM core on-board.

An APU implementation of this concept wouldn’t have the monstrous bandwidth to main memory that Nvidia baked into its original Project Denver unveil, but it wouldn’t need that bandwidth. Instead of taking over from the main CPU, AMD could rely on rapid communication between CPU and the ARM core via HSA, while leveraging the other chip for GPU program setup and multithreading. The advantage of this approach is that AMD might be able to handle all the necessary heavy lifting in its own software stack — done properly, applications might just run faster, without needing any additional acceleration. Our Steamroller deep dive discussed how much faster heterogeneous computing is on Kaveri as compared to Richland — integrating ARM cores could provide a further boost.

noob2222 · May 23, 2014

Someone really needs to look at the whole marketing campaign not just a small section of it.

AMDs quoted 10 over 7 is in one specific benchmark pitting the 8 core ARM opteron at an estimated clock speed of at least 2.5 ghz against the quad core 1.9ghz jaguar. Multiply that by 4 you get 10 to 7.6. Wow what a concept.

And where is this jaguar > pd? http://www.anandtech.com/show/7974/amd-beema-mullins-architecture-a10-micro-6700t-performance-preview/3

Last I checked with cinebench the higher numbers meant faster.

jimmysmitty · May 23, 2014

juanrga :

I am not a fan of WCCFTECH. Most of their stories are normally repeats of others and as well they tend to post every flippin rumor and assume it is true until the next rumor comes out.

A bit annoying TBH.

noob2222 :

Shhhhh...... marketing slides = truth. That is why Bulldozer was the superior CPU.

wh3resmycar · May 23, 2014

uhm remember this?

http://www.bit-tech.net/hardware/graphics/2011/02/17/amd-manju-hegde-gaming-physics/1

the only thing AMD gpu accelerated the past half a decade is "H A I R".

BELIEVE THE SLIDE, THE SLIDE IS REAL.

etayorius · May 24, 2014

wh3resmycar :

Did not Sony showed Havok calculating about 1,000,000 Objects through the PS4 GPU? Besides, AMD been doing Bullet for about 2 years... just not on the Mainstream Desktop Games Level.

https://www.youtube.com/watch?v=zPnwmsTokso

Skyrim can barely do 1,000 Objects with CPU Havok, add another 500 objects and the game will stop responding and CTD after several minutes.

https://www.youtube.com/watch?v=i_gzGVWgWgQ

I don`t know about you... but the APU inside PS4 has an AMD Radeon GPU.

If i remember correctly, AMD been talking about GPU Physics and AI GPU soon after they announced MANTLE too, so they may be preparing an announcement for Physics and AI on Radeon GPU`s.

It`s been like 3 years since that article, besides Manju Hedge (Ageia), Erwin Coumans (Bullet & Sony) and some Japanese called Takahiro Harada have been working on AMD to bring some sort of OpenCL Physics to Radeon GPUs.

wh3resmycar · May 24, 2014

etayorius :

wh3resmycar :

Did not Sony showed Havok calculating about 1,000,000 Objects through the PS4 GPU? Besides, AMD been doing Bullet for about 2 years... just not on the Mainstream Desktop Games Level.

https://www.youtube.com/watch?v=zPnwmsTokso

Skyrim can barely do 1,000 Objects with CPU Havok, add another 500 objects and the game will stop responding and CTD after several minutes.

https://www.youtube.com/watch?v=i_gzGVWgWgQ

I don`t know about you... but the APU inside PS4 has an AMD Radeon GPU.

If i remember correctly, AMD been talking about GPU Physics and AI GPU soon after they announced MANTLE too, so they may be preparing an announcement for Physics and AI on Radeon GPU`s.

you must've been new here.

http://
http://

8 years and counting and again, the only thing they've GPU accelerated is HAIR.

so to the dude that keeps showing "AMD slides", good luck.

etayorius · May 24, 2014

wh3resmycar :

You`re being TECHNICAL and you are horrendously wrong, AMD has been doing Physics through the GPU on Havok before and have also been doing GPU OpenCL Physics on Bullet since 2 years ago, you`re stating the only thing they have done is TressFX, i just gave you 2 more examples of AMD accelerating Physics, so next time don`t claim incorrect technicalities.

But you however can say this: AMD has only acceleated Physics through, Havok and Bullet in the last decade, and only TressFX in the consumer level.

You want gimmicks? we already got one called PhysX, which has done absolutely nothing to revolutionize games and has been used only for Clothes and Debris "effects" only.

Yeah i am new, i started posting on this same thread since page 60+ of this same thread, this is the second time someone brings the "New" thing to the thread, which is quite irrelevant to be honest, did not knew i needed some sort of Community approval to comment on this thread, guess ill have to bribe some users to accept me as one of the pack haha.

What about Star Swarm? you don`t see "PhysX" Like brand, but the demo is using the GPU for Physics and AI.

wh3resmycar · May 24, 2014

demos don't count sorry. so the 2 examples you've given are moot. unless you're into watching demos over and over again. my post is about not delivering on their promise.

i'll repeat 8 years and counting, the only thing they've GPU accelerated is HAIR.

EDIT: NVIDIA has a ray tracing demo, does that mean NVIDIA is actually doing consumer grade ray tracing? :pt1cable:

the point of my "critique" is that with AMD slides, you'd know what you're getting into. i'm looking at you the dude with "the slidez"

AMD CPU speculation... and expert conjecture

Judicious

Distinguished

Distinguished

Distinguished

Glorious

Glorious

Distinguished

Distinguished

Distinguished

Distinguished

Splendid

Judicious

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Champion

Distinguished

Honorable

Distinguished

Honorable

Distinguished

Share this page