AMD CPU speculation... and expert conjecture

Page 265
Care to develop your post or must I take it as gospel?

I've explained in excruciating detail the components of a processor, the kinds of instructions it uses, and the differences in how the processors under discussion work. As you have no clue how work actually gets done on the processor, these explanations fly right over your head and you insist on clinging to a bad interpretation of marketing jargon. If you would take the time to actually learn what all these pieces and parts are and how they are put together to form a modern processor, then you would understand the lesson; otherwise I'm just wasting my time.

-=Edit=-

The really really short non-technical answer is this. HSA is nothing but AMD gluing a REALLY BIG FPU to the side of a CPU.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I hoped for a genuine technical answer; instead, this.



It is entirely possible that I was wrong, but without saying "where" and "why" I can neither check it nor learn from it if it was true.
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460
"an x86 core is a "HSA core."

HSA is a unification process between the CPU and the graphics processor. The CPU part handles the serial load while the GPU handles the data-parallel load. Therefore, an x86 core is not an HSA core. In fact, HSA isn't even a core; it's simply the sharing process. Well, that's what I understood the definition of HSA to be...

**Edit** hafijur is on point with this one... I think.
 

lilcinw

Distinguished
Jan 25, 2011
833
0
19,010


If AMD is planning on moving the FPU out of the modules in favor of a GPGPU-style FPU co-processor, how does all of this get scheduled? Will there be a need for some kind of super scheduler that controls the modules and iGPU? Is it the job of the integer cores to schedule it? Will the module front-end send instructions to the iGPU front-end?

Will it even be possible to remove the FPU from the module completely or will it still be needed for legacy purposes? Will the CPU be able to pass x86 extensions to the iGPU or would that incur a performance penalty?

The more I think about it the more questions I come up with and realize how little I understand about what a computer does. Now I wish I was able to watch AMD's HSA presentation Sunday.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


1) Have you any *remotely* credible statistics for this, lol? ... And what about those who don't arbitrarily fall into the gamers category? After all, an FX is the price of a mid- to entry-level Intel and light years superior to those (try Linux)... And what about servers? After all, Orochi, the die design of the FX, is the same die used for the CPU chips of the Open Platform server systems. Open a platform and then tell potential adopters, "now you have to design your own CPUs"!? LOL...

2) "The goal of an octo-core FX CPU would be parallel workloads. However, the GPU is much better at parallel workloads than the CPU." .. though that is true, just go say that to Intel, their Phy X MIC things are about AVX (2 and 512), instructions that a CPU can easily run...

3) Not necessarily. Tahiti was supposed to have the "same virtual space" through IOMMU tricks, an IOMMU which 990FX chipsets have. That didn't come about. But to be hUMA, all it needs is "cache coherency"; then HyperTransport and its HTX slots (long since implemented in server boards) can solve the problem very easily. FX chips could have an IOMMU on die like APUs, and have HTX slots from the CPU (like in servers, and not unlike PCIe slots for the 12-thread high-performance Intel)... I even presented HTX+PCIe combo slot patents, meaning the FX only needs a PCIe bridge on die as well. hUMA is basically an evolution of NUMA (non-uniform memory access, which every AMD server system can employ)... it's NUMA + IOMMU...

5) You mean the presstitutes, the pressluts? ... I wouldn't worry too much about the press (unless you like being led by the nose); there is still plenty of good, or at least not bad, press, you only have to look harder. Besides, basing technical decisions on "subjective" appreciations of a dying world, the Windows(tm) PC world, is hardly a coherent performance assessment, even less for a design that will serve the server world, which is more and more Linux-like... and it would be the worst mistake AMD could make.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Oh! Yes, that may well be correct; the AM3+ kind of socket is EOL...

The rest I doubt very much. Besides, if they make a Steamroller AM3+, then they will have to make G34 or GC36 sockets for servers... repeating the same positioning as now. Better to have the same socket for the server and FX lines.

And saying an FX can also be hUMA so soon (and it can be; *if* and when, only AMD knows) could impact Kaveri sales very negatively... and worse, if it is not a server socket already prepared for HTX slots, they would have to invent a totally new socket for this, which would have low penetration in any case (the enthusiast segment is a very small market).

No... if an SR FX comes later, it will use the same socket as a server board, like the foreseen GC36 (DDR3 and DDR4 support), only with no need for registered ECC memory... and it *could* be hUMA (not saying it will, only saying it could). IOMMU and hUMA for other accelerator boards (FPGA-like) could be very useful over HTX slots, as is done for some HPC and server systems now (only now those are not hUMA, they are NUMA)... not only for GPGPU. And it would all fall under the Open Platform initiative.

To me, Warsaw is Steamroller; the same die could be FX, just as Orochi rev C for servers is the same die as the current FX (save for very minor, irrelevant tweaks). So saying AMD hasn't announced an SR FX can be false if one reads properly.

 
Page 136 refuses to load for me; I keep getting error 500 or just a blank page.

If AMD is planning on moving the FPU out of the modules in favor of a GPGPU-style FPU co-processor, how does all of this get scheduled? Will there be a need for some kind of super scheduler that controls the modules and iGPU? Is it the job of the integer cores to schedule it? Will the module front-end send instructions to the iGPU front-end?

Will it even be possible to remove the FPU from the module completely or will it still be needed for legacy purposes? Will the CPU be able to pass x86 extensions to the iGPU or would that incur a performance penalty?

The more I think about it the more questions I come up with and realize how little I understand about what a computer does. Now I wish I was able to watch AMD's HSA presentation Sunday.

It's possible. People really don't know what the "FPU" does and they end up making silly assumptions. FPU is the Floating Point Unit, the unit that does binary math on values that have a decimal place. Processors only know binary math, and in binary there is no decimal place; values are always whole integers. To represent decimal data we end up splitting the value into two pieces, what's before the decimal and what's after, doing work on them separately and putting the value back together. That is very expensive computationally to do on a generic integer unit. So instead we design a unit that's hardwired to do those kinds of calculations and doesn't need to do tons of integer work to simulate it. That is all an "FPU" is. Later we added the ability to do basic vector math, and that eventually evolved into what we call an "FPU" but is really a SIMD co-processor. It still does math on decimal numbers but can also do math on large arrays much faster than a scalar integer processor.
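A rough sketch of that in C++, if it helps (the 16-bit fixed-point scale below is an arbitrary choice for illustration, not how any real FPU works internally): emulating one "decimal" multiply on an integer-only unit takes a widening multiply plus a shift to re-normalize, while the FPU does the same job with a single hardware multiply.

Code:
// Sketch: what "decimal" math costs on an integer-only unit vs. an FPU.
// The fixed-point scale (16 fractional bits) is an arbitrary illustrative choice.
#include <cstdint>
#include <cstdio>

int main() {
    const int32_t SCALE = 1 << 16;                 // 16 fractional bits
    int32_t a_fx = 3 * SCALE + SCALE / 4;          // 3.25 as a scaled integer
    int32_t b_fx = 1 * SCALE + SCALE / 2;          // 1.50 as a scaled integer

    // Integer-only emulation: widening multiply, then shift to re-normalize.
    int64_t wide = static_cast<int64_t>(a_fx) * b_fx;
    int32_t c_fx = static_cast<int32_t>(wide >> 16);

    // FPU: one hardware multiply does the same job.
    float c_fp = 3.25f * 1.5f;

    std::printf("fixed-point: %f  fpu: %f\n", c_fx / double(SCALE), c_fp);
    return 0;
}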

You can't divide work into "serial/parallel"; it doesn't make sense. Instead it needs to be divided into scalar and vector, which is extremely close but not exactly the same. Memory operations and logical compares are both things that can be done either in serial or in parallel depending on the task, and neither can be done well on a vector processor. It's really important for people to realize that vector processors are VERY limited in their scope. When they're good they're REALLY REALLY good, but when they're not they absolutely suck. They are poor at control and general-purpose processing but absolutely amazing at array math.

The best way to put it is that vector processors are beyond amazing at calculating the density of a neutron star, simulating the gravitational effects of a black hole, calculating particle physics and rendering polygons. They're absolutely horrible at everything else, up to and including running your operating system and playing video games.
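To make the scalar-versus-vector point concrete, here is a hedged sketch in C++ using the CPU's own SSE unit as a stand-in for wider vector hardware (same principle, narrower lanes): the scalar loop touches one element per instruction, while the SSE loop handles four at a time; that is exactly the regular array math vector units are great at, and it says nothing about branchy control code.

Code:
// Sketch (x86 with SSE): the same array add done scalar and then vector.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(16) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(16) float c[8];

    // Scalar: one element per add instruction.
    for (int i = 0; i < 8; ++i)
        c[i] = a[i] + b[i];

    // Vector: four elements per add instruction (128-bit SSE lanes).
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&c[i], _mm_add_ps(va, vb));
    }

    for (float x : c) std::printf("%g ", x);
    std::printf("\n");
    return 0;
}

A real GPU compute unit is far wider than four lanes, but the shape of the work is the same.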

Also it bears mentioning that the iGPU on SR would be significantly weaker than any dGPU present on the market at that time. It would not be a "very powerful blah blah", but a "decently powerful blah blah". It wouldn't touch the raw computing performance of something like a 770 / 7970; this is physics speaking. It's a nice coprocessor and I could see it being handy in quite a few situations, but let's try to be real here. Its single biggest advantage is latency: no need to ship a data package off to the dGPU when you can get it done locally, so it's useful for vector math that's not large enough to warrant sending to the big guy.

Now to the question posted above: it would heavily depend on the internal language of the GCN architecture and its compatibility with SSE and other common SIMD instruction sets. It would be a good bet that, since AMD has been working on this for a while, GCN was designed to be compatible with the common SIMD operations and a bolt-on FP scheduler could be added. We won't really be seeing this with SR though, possibly with whatever comes after.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


AM3+ was released more than a year ago; I doubt there will be any new chipset for AM3+. I also doubt very much there will be any new CPU for AM3+, except perhaps a minor new SKU/tweak of Vishera eventually (even that is not very likely), like Centurion already was.

SR FX, if they do keep that name (they could change it anyway), will in any case be a new socket.

That is my strong bet. Sorry to disappoint some eager waiters... and even AMD... but better the truth sooner than shattered illusions later. I can be wrong or right, but I suspect AMD will not announce anything in this department now; they simply can't, it's very bad marketing.

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


I think that is not entirely correct. There is no silver bullet, but many effects and "physics" can benefit greatly if coded with hUMA in mind, especially if using the CPU as well as the GPU, and even if the majority of the game code is, as usual, JITed and only the driver does its thing. Also there are "task" queues visible from user space. Now this could be very interesting, only it is not about hUMA, it is about HSA... HSA games!
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


HSA is not a process. HSA stands for Heterogeneous System Architecture. This is an architecture for unified computation using different kinds of compute units (CUs).

As I said before, HSA introduces two kinds of compute units: the LCU and the TCU.

The LCU is a generalization of the CPU. An LCU supports the native CPU instruction set and also supports the HSAIL instruction set.

The TCU is a generalization of the GPU. TCUs support only the HSAIL instruction set.

This is the sense in which "an x86 core is a HSA core"; of course the inverse is not true. I must confess that I referred to "x86 core" and "HSA core" because palladin9479 was using that nomenclature in his posts. It is more correct to talk about LCUs and TCUs in HSA.

As I said before, an ordinary application uses each compute unit for a specific kind of task. E.g. TCUs excel at parallel workloads, but HSA goes beyond that (thanks to hUMA) and allows both kinds of unit to work together on the same task. I already quoted this before:

Both processors can use the same pieces of data at the same time. You don't need to copy stuff and this allows for completely new algorithms that utilize CPU and GPU at the same time. This is interesting since a GPU is very strong, but extremely dumb. A CPU is extremely smart, but very weak. Since you can utilize both processors at the same time for a single task you have a system that is extremely smart and extremely strong at the same time.
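To make the zero-copy idea concrete, here is a rough analogy in plain C++ (not real HSA code; two host threads merely stand in for the latency and throughput units, and everything in it is invented for illustration): both sides touch the very same buffer through the same pointer, with no staging copy into a separate device memory.

Code:
// Analogy only: two threads sharing one buffer stand in for a CPU and a
// coherent HSA-style compute unit. No copy step is needed because both see
// the same address space; a discrete GPU would need an explicit transfer.
#include <thread>
#include <vector>
#include <numeric>
#include <cstdio>

int main() {
    std::vector<float> data(1024, 1.0f);

    // "Throughput unit": crunches the bulk of the array in place.
    std::thread worker([&data] {
        for (float& x : data) x *= 2.0f;
    });
    worker.join();

    // "Latency unit": immediately reads the very same memory, no copy-back.
    float sum = std::accumulate(data.begin(), data.end(), 0.0f);
    std::printf("sum = %g\n", sum);   // 2048
    return 0;
}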

I hope this clarifies some things. Note that it is a common misunderstanding that HSA only deals with CPUs and GPUs. This is not true; HSA also considers DSPs, for instance.
 


And none of that applies to abstracted graphics APIs like DirectX and OpenGL, specifically the parts that ship data off to the GPU to be drawn / shaded / etc. Things like HSA / hUMA are applied at a very low level, damn near bare-metal level. In HLLs and abstracted layers you don't work directly with hardware addressing or even opcodes. This is really gamers' territory, but I do know that both graphics APIs must work under the assumption that the GPU is a separate entity to maintain compatibility with dGPUs. This makes many of the aforementioned assumptions impossible, at least at the application level. You won't be using hUMA on a PC to make rendering graphics faster. Consoles are a different story, but definitely not PCs.

From the system's point of view a bolted-on non-x86 vector processor is just a coprocessor similar to the SIMD units. If they're using a bolt-on scheduler then it'll just act like a 16x or 32x shared 128-bit FPU; if they're addressing it as a separate unit entirely then you'll need compiled binary code to make use of it. 99% of consumer software is not Java or any other JIT language; it's compiled to binary before it ships from the manufacturer. The closest you get is the DirectX / OpenGL APIs, but those are handled by the graphics drivers from AMD / Nvidia; not even developers get to see inside them.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


I think that in the sense implied by the previous poster, that is, GCN replacing the FlexFPU, or the FlexFPU being built around GCN like a GPU inside a CPU, it will never happen... or at least not in the foreseeable future.

The problem, I think, is not the kind of instructions and the nature of the data in the SIMD paradigm. Or in a sense it is, but the problem is really all the filtering and texturing-like features very specific to a graphics processor. Even if completely ray-traced, the nature of most of the render or post-render operations (ROPs) would be enough to clog any CPU-like uarch where latency is much more important.

So it's not the "shaders" that are the real problem; it's everything but the shaders. Worse, many render, texturing, and filtering ops also interact directly with shaders (interpolation can be striking). So keeping all that huge, latency-insensitive data flow away from a CPU cache structure is, for now, the best approach. In that sense the division into "latency work" and "parallel work" in the AMD/HSA paradigm is not that artificial.

So for the foreseeable future, an APU with a GPU will continue to have separate elements. Even if ray tracing starts to be used pervasively, there will still be a specific graphics processor with some shaders. In this case the FlexFPU can grow a lot in power and size for ray tracing, but there will still be a graphics-specific "processing" element with "shaders" for the more graphics-specific operations...

In that case the FlexFPU could come to resemble GCN. But if Intel's push with Larrabee is any indication, those FPUs will still be based on AVX-like instructions and vectors, and the "shader" of the GPU-specific element will have a different instruction and data model; it could be much smaller than today, yet maintain very high levels of performance, since graphics could be mostly ray-traced.

This is good for the HSA approach: server/HPC chips could have a lot of FP power but no graphics-specific elements... those specific elements, usually still with a lot of fixed functions, could be smaller and sit on a discrete card on an expansion bus, or be APU-like on the same die. But then the definition of an APU would be blurred; all "compute power" would definitely be on the CPU side in those FPU SIMD-like pipes, and every CPU by definition would be an "accelerator".

I think Intel has not given up yet, and an evolution of Larrabee is what they will pursue. "Shaders" will be Xeon Phi-like cores, the integer cores will lose their FP capability, and there will be a small GPU, like now, for all the graphics-specific stuff. Intel having something like a *module* is also a good bet... yet they could keep truly separate cores. In that case, the same as above would apply to them... but I think it is their habit to "copy" AMD a lot of the time, lol... so module-like, lol.

 

8350rocks

Distinguished


This is entirely true...

@juanrga:

I think what you're missing, and what palladin9479 is trying to explain, is simply this (as best I can describe it):

The "scalar" cores or integer cores, can do physics calculations, but it would be like using an abacus to do math. Can you get there? Sure, eventually...however, it's not efficient.

The "vector" instruction units, or FPUs, do physics, etc. extremely well...like supercomputer capability in a smaller scale sense. However, the FPU is not capable of basic scalar operations. It's like trying to get an x86 OS to run on a POWER architecture machine. It's not going to happen. They don't speak the same language. In a sense, it's simply not made to do those things, and hypothetically, even if you could get it to do serial/scalar operations, if you managed it. The FPU would be so horrible at it, it wouldn't even be worth the effort to get there.

HSA is going to bridge a gap in many ways; however, only vector instructions are going to be shared. GPUs/FPUs do not do integer calculations for s#!%, for lack of a better layman's term.

So, it's like you're talking past each other. You're not grasping what he is saying, and he is hearing the same thing back from you as you said before.

Basically, HSA will only be effective in situations where the software/code is compiled for it, and it will only show massive performance gains in situations where there are a massive number of array calculations/vector instructions.

Additionally... DX and OpenGL, etc., are not going to see massive benefits from HSA initially. In fact, I doubt either one will see much benefit for a while; here's why:

M$ isn't going to give a rat's about putting HSA-enabling improvements into the DX API, primarily because they're never in a hurry to do anything, and are typically too lazy to optimize for something that doesn't show them a direct cash benefit.

On the other hand, OpenGL is run by the Khronos Group... and while they have their heart in the right place (most of the time), they're too discombobulated and lackadaisical to actually get anything done in a time frame that would be even remotely relevant. In other words, when DX12 hits with full HSA-enabled optimization in the DX dev tools, then OpenGL will likely "be looked at" to keep up with DX12. They don't care much for innovating anymore; they just want to avoid falling too far behind DX and M$.

As much as I wish OpenGL was a better API, unfortunately, it has just been left languishing in the hands of inactive or incompetent people who have had no motivation to innovate, and cannot agree on how to fix it. So they keep bolting on new crap on top of the old stuff and nothing is ever streamlined or made to work coherently. Additionally, it's hard to even get a consensus among people who develop with OGL regularly as to what is the best way to do anything, because you can typically do the same thing 5 different ways, and none of them are the fastest or the easiest.

HSA will be a benefit in gaming at some point in the future, and may streamline some things in the way games work now by removing some "minor" bottlenecks at a very low level; however, many games will not be optimized to use it to its full potential until it has the full backing and optimization of the APIs used to develop AAA titles in the modern gaming world.

Hence my argument that I still need an 8-core CPU, because most of the crap I run will likely see less benefit from HSA than even some of the things your typical PC user runs. Specialized tools are often neglected in favor of mainstream applications and software in this regard unless the demand is extremely high. In this case, the demand likely won't be that high for a while, as most development houses have top-notch machines in place that can currently do nothing with hUMA or HSA.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


1) The Steam Hardware Survey. One correction: I said 0.36% and it is 0.31%.

4) As I said before, FX chips are better than many reviews claim. But lack of professionalism / unfairness doesn't hide the fact that the FX chips have had bad press.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


That is where HSAIL enters... it will be native + HSAIL... and HSAIL isn't by definition binary; it's a close-to-the-metal intermediary language that could deal with graphics-specific languages to some extent (it's not specific to that; for that there is AMDIL, that is, OpenGL compiled directly to AMDIL as an example... and I think VS/DX gets a lot of cookies for this too), and it could deal with most (all) of the rest that doesn't need a binary format for a CPU, which could include those GPU task queues (I must confess I'm not sure of this part). Basically the workload will then be JIT-finalized by a specific engine for many archs, and the runtime has some interpreting capabilities too, not only some APIs.

So, questions about why the XBone is not HSA? (I have some, lol)... And isn't it true that Microsoft Visual Studio is used to code games? The same logic will be, or could be, behind an HSA toolchain (no idea what they will call it) based on LLVM developments. HSA APIs for games in that sense will be HSAIL, and more specifically in AMD's case AMDIL also. Everything could be much less bloated than with most current intermediary API abstractions, developers much freer to mix OpenCL and other constructs they like into a game; everything in a sense could be closer to the metal, and the drivers' work alleviated.

That is why I talked of HSA games, and "Never Settle" going "Forever" doesn't surprise me a bit.

So in a sense I don't think an HSA game will run on Intel or with Nvidia without a proper HSA runtime or proper specific drivers... which, being the case so far, means only AMD will provide those, and only if their hardware is present; no other vendor will touch those (unless Nvidia joins HSA).

ARM will be well covered.

So, saying AMD is all about HSA, like risking everything on HSA... any wonder why none of this has anything to do with ditching the FX?

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Funny you say that, because "rounding" operations, transforming FP into integers, are quite common even on GPUs. What's more, the juicy part of AVX2 is exactly the integer vector part, and in AMD, as in Intel, those will run in what you could call the FP pipes with FMAC abilities.
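A small sketch of what I mean, using SSE2 on the CPU side (just an illustration with made-up values): one vector instruction rounds four floats to four integers, and another does an integer vector add on the same lanes; AVX2 widens exactly this kind of integer work to 256 bits.

Code:
// Sketch (x86 with SSE2): vector float->int rounding plus integer vector math.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(16) float in[4] = {1.4f, 2.5f, -3.6f, 7.9f};
    alignas(16) int   out[4];

    __m128  vf = _mm_load_ps(in);
    __m128i vi = _mm_cvtps_epi32(vf);            // round 4 floats to 4 ints at once
    vi = _mm_add_epi32(vi, _mm_set1_epi32(10));  // integer SIMD add on the same lanes
    _mm_store_si128(reinterpret_cast<__m128i*>(out), vi);

    std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
    return 0;
}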



That is not true in the case of hUMA, which was the point that led to this. It depends on the definition of "effective", but hUMA will already provide some benefit even if nothing is changed in the application/game code... more if it's coded to take advantage of hUMA, especially for "compute" jobs.

In any case I think most of the hUMA magic will be in the BIOS (to set it up), in drivers, and in the OSes, which have to have some minor patches to work properly with this. The userland job will be minimal in any case.

In the case of HSA it is "mostly" correct; it must be hardware + software working together, and this software is not exclusively application code. So, more correctly, it is the software above all, because HSA software can run on non-HSA hardware with the proper runtime environment. Intel will never have hUMA as defined by HSA, but they can run HSA software; AMD will provide the proper runtime with their GPU drivers... that is why Broadwell is going mostly BGA: they don't like AMD, they don't want AMD... even if the "closed" approach starts to smell rotten (dead) over on Nvidia's side, lol...

For ARM the initial idea is much the same; they will not have hUMA in the first phase... but they will optimize their runtimes, OSes and drivers to the bone (hUMA can be emulated if needed).

(Real HSA software will always need a specific runtime... or intermediary abstractions like Aparapi for Java)



That is correct; they have to be re-compiled to HSAIL, and HSAIL/AMDIL in the case of games. That is why "Never Settle" has gone "Forever"; additional binary translations or intermediary abstractions to HSAIL/AMDIL first simply involve too much overhead, and any possible gains would be lost in the additional steps... not an option.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790



HSA's goal is not to bridge the gap between scalar and vector instructions (that is why I refuse to use his terminology) but between "latency" and "throughput". This is well observed in the definition of the HSA compute units, which I gave before. There is no such distinction between scalar and vector compute units in HSA. In the end, this is why we are talking about the fusion of CPUs and GPUs and not of 'scalar' cores and FPUs.

Effectively, integer cores are not optimal for physics calculations, but if the only problem was that FPUs are better at physics computations, AMD (or any other vendor) would simply implement a bigger, more powerful FPU in the CPU, and done! Things are more complex.

The reason why AMD (or Intel or anyone else) doesn't do that is because this is difficult and has a big cost due to various factors of the CPU architecture, including the latency-hiding requirement (I think this is the reason why HSA differentiates between latency and throughput compute units). It is much easier to scale FPU performance in a GPU (or similar compute unit). I already provided a graph that shows the increase in floating-point performance of GPUs compared to CPUs over a span of several years.

Since the increase in performance of GPUs is 10x that of CPUs, and this tendency will continue in the coming years, AMD and the HSA partners have developed a new architecture to use all that performance in an optimal and simple way.

Evidently, HSA will only be effective in situations where the software/code is compiled for it. The term "HSA-enabled software" has been used before. I have recalled several times how developers are finding up to 5x performance gains when HSA is enabled. Evidently, if the software ignores HSA, then we will see no gain.

I don't understand why this point about the software is being repeated forever in this thread. Some of Haswell's improvements require software to be recompiled for them. However, I don't see "only if software is compiled for it" repeated forever in talk about Haswell.
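The Haswell comparison can be made concrete with a hedged C++ sketch (nothing AMD- or HSA-specific here; the flags mentioned are just the usual GCC/Clang ones): the same source line only turns into Haswell's fused multiply-add instruction when the binary is built with FMA enabled (e.g. -mfma or -march=haswell); otherwise the compiler emits a plain library call, just as software that is never built for HSA sees no HSA gain.

Code:
// Sketch: the same source benefits from new hardware only when the compiler
// is told to target it (compare builds with and without -mfma).
#include <cmath>
#include <cstdio>

int main() {
    double a = 1.5, b = 2.0, c = 0.25;
    // With FMA enabled at compile time this can become one vfmadd instruction;
    // without it, the compiler emits a call to the fma() library routine.
    double r = std::fma(a, b, c);
    std::printf("%f\n", r);   // 3.250000
    return 0;
}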

Sincerely, I don't care about Windows 8/RT and other failed experiments. I know that AMD has provided HSA support to the Linux kernel and compilers and is now collaborating to enable HSA in LibreOffice.

Correct me if I'm wrong, but isn't OpenGL coming with HSA support this year?

Finally, I already mentioned that I understand that some people want/hope/need a Steamroller octo-core FX chip or similar. I only commented that they are a minority and that all the info at hand indicates that such chips don't exist.
 
I think the main idea behind HSA is to make the iGPU a big self-contained SIMD+FPU part/engine, like palladin says. Unifying memory is the very first step to start making everything more heterogeneous. I'm sure they have plans to share registers and even unify the decoders + dispatchers at some point, after having the abstraction layer for OGL, DX and OCL, plus giving compilers the support. My take is they won't break the mold that much; I'm sure they'll decouple everything at the driver/kernel level so they have a dispatcher or translation layer for each specific DX/OGL/OCL case and put it into good old ASM for the CPU to distribute.

In other words, I think the most sane way to make HSA more likely to succeed is to just make a huge abstraction layer for everything (in the roadmap, IIRC) and then go into the hardware to start fusing components around.

I do believe that the GPU being fused into the CPU is perfectly doable without leaving X86 at all (yes, I feel like we're discussing Larafail), but you'll need a LOT of stuff to make that happen.

Also, I think that's why Intel and nVidia don't want to be part of that. I'm sure the ISA would get really bloated at some point, and they would start regulating what goes in and out of the spec (assuming my take is correct).

Heh, it's the exact same as the time when the FPU was off-die and CPUs would need it to "accelerate" some code (I remember AutoCAD asking for it). It was from 386SX to 486DX IIRC.

Cheers!
 

mlscrow

Distinguished
Oct 15, 2010
71
0
18,640
I might have some good news, fellas, though I wouldn't read too deeply into it: I found AMD's Roy Taylor on Twitter and sent him a message quoting him saying, "...we still love PC gamers and we're absolutely committed to them," and I asked him why we aren't seeing an 8-core Steamroller FX chip yet. He replied, "...keep the faith." Take that for whatever it's worth, but I actually feel good about that response. Fingers crossed.
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460


That's the kind of thing I wanna hear. Thanks for sharing.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
It is much easier to scale FPU performance in a GPU (or similar compute unit). I already provided a graph that shows the increase in floating-point performance of GPUs compared to CPUs over a span of several years.

Ask gamerk about this one: scaling to 12+ cores with 4 CPU cores running @ 3.5 GHz and 8+ CUs @ ~1 GHz.

Seeing this massive improvement will need massive software support. Nothing out there today will see much benefit at all.

IF all you claim is true, AMD is banking on this APU, which at release will be slower than BD. 2-4 years from now it might look OK, but what reason is there to upgrade now? Just wait the 2-4 years for the next big thing, because upgrading now would be a downgrade.

By releasing only a 4-core APU, everyone who has a PII X6 or an FX 6xxx-8xxx will see no reason to upgrade, especially considering most FM2+ motherboards would be a downgrade.

This is why, if AMD abandons AM3+ completely, it will piss off its loyal customers.

The other side of the coin is AMD needs to get this hardware out there so that the software can come.

In the meantime they need to support both mainstream and high end, not just push low-end parts that are designed for "tomorrow's software".
 

Ags1

Honorable
Apr 26, 2012
255
0
10,790
It is much easier to scale FPU performance in a GPU (or similar compute unit). I already provided a graph that shows the increase in floating-point performance of GPUs compared to CPUs over a span of several years.

For completely different programming problems. FP performance for vector use cases can be sky-high, but it is irrelevant to scalar use cases.

 
I believe HSA will completely cut out discrete GPUs, except for the highest of high-end systems, within the next 10 years due to how little we gain from pure graphics hardware these days. The first step for AMD is to get the hardware out, and then the entire ARM consortium will follow. Every SoC in the world will eventually use HSA because they will always have an integrated GPU, and this allows for higher hardware utilization. If AMD can standardize it and get enough support for it, they can be in a good spot for mobile CPUs without discrete GPU parts. I doubt the GPU compute units will be integrated into the FPU any time soon; they will need both for the foreseeable future. What HSA does is make it easy to utilize the GPU portion of the APU, so the programmer can choose not to use the FPU where the GPU cores will be faster. Once HSA becomes commonplace, AMD can eventually do away with all but one FPU in the CPU, which would be used purely for legacy support. I don't believe GPU-accessible registers that can process SIMD instructions will be in the works for a while. You will still need legacy support for FPU instructions even if you can do that, simply because scheduling to the GPU cores will get you conflicts, and you can't always resolve these with pipelineable instructions on the CPU side.

Kaveri having hUMA is a big thing, but it's also not. On the PC front I don't see it coming to great use for a couple of years. The hardware has to be out before the software. Having coherent memory across all heterogeneous processors is required, but it will not be utilized at this point. Simple tweaks to compilers should start to make use of it eventually. I would like to see some good GPGPU performance from it, but I doubt it's a game changer yet. We will have to wait and see.
 
I still don't really understand what HSA & hUMA will do in practical ways. Are the following statements correct (I suspect not in many cases)?
HSA will only boost performance in applications that could be accelerated by a GPU
HSA will add no boost to performance in programs not optimized for it
Also some questions:
Can HSA be used by a dedicated GPU to increase performance using a dedicated card and no APU?
If so, could it use current hardware, i.e. an FX 6300 + a Radeon 7770?
If not, could it boost performance with a next-gen GPU, i.e. an FX 6300 + a Radeon 8xxx or 9xxx?
If either of the above two can help, is it possible we may see systems using a dedicated HSA card, i.e. an FX 6300 + a Radeon 8750 for HSA + a Radeon 8950 for the rest of the game?
Is there any possibility of HSA being used in Intel or Nvidia hardware in the future?
Do you think HSA will be a big deal in a gaming PC in a year's time?
Hope this makes sense.
 