AMD CPU speculation... and expert conjecture

juanrga · Aug 22, 2013

Ags1 :

Because:

1) Only a small percentage of people is interested. I think only about a 0.36% of gamers has a octo-core FX. Moreover, the modular approach does not eliminate the money that AMD would pay to fabs for a low-volume product.

2) The goal of an octo-core FX CPU would be parallel workloads. However, the GPU is much better at parallel workloads than the CPU. Therefore, AMD is substituting one half of the CPU cores by an iGPU. Example? Substitution of octo-core Opteron CPUs by quad-core Berlin APUs.

3) An octo-core FX CPU lacks hUMA.

4) AMD want to maintain and develop a single platform, instead sharing resources between two: FM2-like and AM3-like.

5) APUs have better press than FX-line.

juanrga · Aug 22, 2013

blackkstar :

We would observe something like this

EXT64 · Aug 22, 2013

Hmm, why aren't the Intel (and AMD) CPUs using AVX in that test? If you are going to give the APU AVX code, you should give the non-APU chips it as well to make the test more comparable.

esrever · Aug 23, 2013

FX isn't going to be replaced by FM2. There just won't be steamroller FX until end of 2014 at the earliest. The boards are being phased out for ones that support the FX9XXXXs.

de5_Roy · Aug 23, 2013

ugh.. this 'fx is gonna die' talk is annoying. i know i've said something like this before but i meant the cpus like fx8350 i.e. their hardware, not the brand or the concept of a flagship product. if amd wants, they can take a cake and brand it as fx. it's just a brand.

a flagship can easily be a hexa/quad module apu or a unlocked, high clocked, dual module with a xeon-phi-like accelerator and be branded as an fx. amd is really mum about those but that doesn't mean it's outside speculation. from what i am seeing from the mobo vendors, socket fm2+ might just have a flagship like that in the future.

another speculation: top of the pyramid 8 core jaguar (dual 4 core module) for microservers and dual berlin apu with quad ch. ram, both with igpus validated for server and workstation tasks and/or the gcn cores (compute units) used for compute only (like xeon phi or tesla).

edit: current am3+ socket seems to be enough forward looking features to live a long life. except ddr4 support.. i guess. it'll make upgraders bored out of their minds though... :whistles:

sarinaide · Aug 23, 2013

I hate to say it but last year the Coolaler and OBR leak that Piledriver aka Vishera was to be the last traditional AMD CPU which was vermently shot down is taking root to be once again they were correct. AMD refuse to talk about any AM3+ other then the socket will be run for a short time longer and discontinued. Also by now there would be ES samples of sorts or early prototype chips which don't exist lending to speculation that the avered to leaked document is correct again.

juanrga · Aug 23, 2013

^^^^ This this

juanrga · Aug 23, 2013

Some more info on how hUMA architecture in Steamroller Kaveri improves the use of virtual memory:

Whenever the CPU tries to access a virtual address that's been written out to disk, rather than being resident in physical memory, it calls into the operating system to retrieve the data it needs. The operating system then reads it from disk and puts it into memory. This system, called demand-paged virtual memory, is common to every operating system in regular use today.

It is, however, a problem for traditional CPU/GPU designs. As mentioned before, in traditional systems, data has to be copied from the CPU's memory to the GPU's memory before the GPU can access it. This copying process is often performed in hardware independently of the CPU. This makes it efficient but limited in capability. In particular, it often cannot cope with memory that has been written out to disk. All the data being copied has to be resident in physical RAM, and pinned there, to make sure that it doesn't get moved out to disk during the copy operation.

hUMA addresses this, too. Not only can the GPU in a hUMA system use the CPU's addresses, it can also use the CPU's demand-paged virtual memory. If the GPU tries to access an address that's written out to disk, the CPU springs into life, calling on the operating system to find and load the relevant bit of data, and load it into memory.

Together, these features of hUMA make switching between CPU-based computation and GPU-based computation much simpler. The GPU can use CPU data structures and memory directly. The support for demand-paged virtual memory means that the GPU can also seamlessly make use of data sets larger than physical memory, with the operating system using its tried and true demand paging mechanisms.

palladin9479 · Aug 23, 2013

gamerk316 :

Except that a GDDR5 dGPU will beat out the DDR3 system memory APU every time. That's the whole point of the PS4 using GDDR5, they wanted to avoid the severe graphics performance penalty that comes with slower memory.

Now juanrga is just spinning in circles. I've already mentioned that the current scene is kept in system memory for fast reloading, typically in the 128~256MB range but 512MB isn't unheard of. Go over that 512MB limit and your program (game really) starts to have some severe limitations due to the dumb NT memory subsystem. I can't wait for native 64-bit games to start being released that actually make use of large amounts of system memory.

None of that matters because HUMA will have absolutely zero effect on DirectX and OpenGL. Both systems are abstractions of the actual hardware being used, the game has absolutely no clue if the GPU is sharing the same memory space as the CPU and shouldn't care. Nothing will be treated any different then it is now, for graphics rasterization anyway. HUMA really only applies to the iGPU being used as a powerful vector co-processor, stuff like modeling / math / video editing / ect. that typically makes significant usage of SIMD or OpenCL / GPGPU.

Now it bears mentioning that consoles, like the two coming up, are able to code to the bare metal and don't have to worry about compatibility with that abstracted hardware.

palladin9479 · Aug 23, 2013

de5_Roy :

I fully expect the next "desktop" process to be on the FM2+ platform. Brand names mean nothing to me, I only look at capabilities. A mainstream "desktop APU" is wasting approximately 50% of it's die space and thus becomes a very bad purchase value / cost wise. I'm looking at the economics of the whole thing. The concept of a powerful vector co-processor is way to tantalizing to not take advantage of, yet the entire iGPU doesn't need to be present to make that a reality. If your focus is to create a decently powerful desktop processor (mainstream) with the expectation that there is already a good gaming GPU, then you don't need all the parts of the iGPU, don't even need most of the parts. Take out what you don't need and your left with enough die room for another module (6 cores which seems to be the sweet spot for value right now) and possibly some more cache. Now normally doing all that would be rather expensive, lots of design work required. With BD being a modular design it becomes quite cheap to do that and you can actually make money off it.

I fully expect there to be a ~$200 CPU with 6~8 (probably 6) cores and HSA capability on the FM2+ platform.

palladin9479 · Aug 23, 2013

juanrga :

You do realize that 99%+ of PC games are released on Windows and the primary function for GPU's is gaming. Consoles are consoles, they'll be coded for hardware and I expect them to make the most out of it. Consoles also have extremely low hardware performance due to cost limitations, they rely on coding to hardware to maximize their performance.

HUMA does absolutely nothing for abstracted graphics languages like DirectX / OpenGL. There could be some "magic" happening at the driver layer with memory mapping to speed things up, I expect AMD to do this with their APU drivers.

Please people, learn some ASM and how things actually get done on a system. You can't magically throw acronyms at something and expect results. We don't live in an ideal world where each generation gets to start fresh. In Microsoft made their money by offering this thing called "Backwards Compatibility" which pretty much dominates the industry.

sarinaide · Aug 23, 2013

hafijur :

How is this new, AM3+ released before FX, makes little difference at the end of the day.

lilcinw · Aug 23, 2013

Hot Chips starts Sunday. AMD is slated to have four discussions but they are all related to HSA/SOC/APU.

If nothing else we should have some new official AMD slides we can argue about for the next six months.

sarinaide · Aug 23, 2013

palladin9479 :

Going through your posts in reply to the others there seems to be a difficulty reasoning a break from traditional computing which ultimately HUMA, HSA "Fill in accroymn here" necessitates. I understand its your line of work but AMD as a co-founder of HSA wouldn't have just stumbled on this ideology of breaking away from traditional standards just the other night while sucking thumbs. There is a process to this that was set in motion 6-7 years ago and perhaps answers will come in time and perhaps what you say cannot be done or isn't new may infact be very capable of being done and work. Time will tell.

Going on what the last 2-3 pages have shown and the reasoning behind it, HUMA as a step forward for AMD means a system doesn't need a powerful serial processor which is good in many regards as serial processors demand power scaling as they get stronger and the costs escalate. if this HUMA, HSA thing works then perhaps it is revolutionary as ultimately a GPU scales cheaper and easier on a die.

Nvidia's Maxwell introduces a HUMAesque design and this coincides with Microsoft and Nvidia pending announcements joining HSA foundation clearly shows that it smore than smoke and mirrors

juanrga · Aug 23, 2013

palladin9479 :

juanrga :

You do realize that 99%+ of PC games are released on Windows and the primary function for GPU's is gaming. Consoles are consoles, they'll be coded for hardware and I expect them to make the most out of it. Consoles also have extremely low hardware performance due to cost limitations, they rely on coding to hardware to maximize their performance.

HUMA does absolutely nothing for abstracted graphics languages like DirectX / OpenGL. There could be some "magic" happening at the driver layer with memory mapping to speed things up, I expect AMD to do this with their APU drivers.

Please people, learn some ASM and how things actually get done on a system. You can't magically throw acronyms at something and expect results. We don't live in an ideal world where each generation gets to start fresh. In Microsoft made their money by offering this thing called "Backwards Compatibility" which pretty much dominates the industry.

Sorry, but nothing of this invalidates my point or the quotes from AMD, experts, game developers... that I have reproduced.

Cazalan · Aug 23, 2013

sarinaide :

Like any collaborative projects the potential is great but the results can be a long time coming. The available developer tools look rather slim right now. For academics it's great. Who wants to work on traditional C code.

http://hsafoundation.com/hsa-developer-tools/

One note of interest is Microsoft isn't listed as a contributor but their compiler is listed as one of the HSA tools. (Microsoft C++ AMP)

palladin9479 · Aug 23, 2013

sarinaide :

There is nothing new about it. This has all been done before, usually it's a customized supercomputer running an extremely customized OS. We're just taking those concepts and scaling them down to the consumer level.

Now onto what MANY people seem to have issue with. Processors run binary code, this doesn't look anything like C++ / C# / VB / JAVA or any of the other high level languages. The closest thing you can get to it is ASM which is why I recommend people learn some of that. CPU's run off opcodes that consist of nothing but simple binary math and logical compares. That math can be divided into two groups, scalar and vector. Scalar is serial integer math and logical compares (if A > B jump to address C), Vector is doing math on multiple values simultaneously (Add A to value B, C, D and E). Using a vector processor to do scalar instructions is very inefficient, using an scalar processor to do vector instructions is again very inefficient. Prior to "HSA" you already had a vector co-processor, you called it the "FPU" but it really was just a SIMD vector coprocessor. It already had "HUMA" as it was part of the x86 uArch.

Now as Gamer can attest, most code is scalar in nature with lumps of instructions that is or can be vectorized. This means you will always, without exception, need scalar processors. A vector processor CAN NOT replace a scalar one where as a scalar processor can replace a vector one (thought at an extreme cost). This is why scalar and not vector processors are used in everything.

So no, one "HSA CORE" can not replace an x86 scalar "core". 1000 "HSA COREs" could not replace a single x86 scalar "core". There is no massive change coming, no special magic code that will somehow change the basics of processing. All AMD is doing is creating a standardized language & method for non-uniform processors to talk to each other and the OS to feed them instructions.

Cazalan · Aug 23, 2013

lilcinw :

They do a good job of archiving the presentations too. Not sure how long it takes to post them but some good stuff there.

Cazalan · Aug 23, 2013

palladin9479 :

Granted the ideas aren't new, but going from ad-hoc methods to a standard can be like night and day. The success ultimately depends on how many engineers ($$) these companies are willing to put on it. If it was just AMD it probably wouldn't go anywhere. With names like ARM, Sony, Qualcomm, Samsung , they have a chance of truly changing how easy it is for developers to fully leverage the hardware being made.

Then I see their website still says Copyright 2012 so who knows.

palladin9479 · Aug 23, 2013

Cazalan :

Ohh I quite agree, the standardized method allows generic programmers to take advantage of what previously was only available to the HPC world. It's not really a new capability as GPGPU and OpenCL have existed for awhile now, but it is a more efficient method for doing what those do. Actually that's a really good way to present "HSA". There is nothing that you can do under an "HSA" design that you couldn't already do with OpenCL / GPGPU, HSA is just a faster more efficient method to go about it. The big take away is that is does not replace the workhorse ALU's present in all scalar processors. It doesn't allow you to vectorize what couldn't already be vectorized. It's also why I push for people to learn how to write a basic program in ASM, that will teach them how a CPU process's data and the difference between scalar and vector instructions. How control logic and memory management logic work, how registers and stacks interact with the actual instructions you send the CPU. ASM is one step away from binary, most opcodes you write have a direct binary counterpart and very little rearranging / compiler "magic" happens. Because of that it's the absolute fastest language possible, yet it's incredibly hard to write in and could take you years to write what a C++ program can do in weeks. It's only purpose these days is to write low level drivers and as a teaching tool.

juanrga · Aug 23, 2013

palladin9479 :

Richland/Trininty are very good purchases for gaming and Kaveri will rise the level, whereas introducing serious compute on the desktop.

I like very much AMD's plan of eliminating one-half of the modules in the die (extra modules useless for 99% of people) and adding a powerful compute iGPU in its place. In this way AMD is able to provide a maximum 8x improvement in gigaflops per-watt and boosts of up to 5x in performance for enabled applications.

lilcinw :

What more could they talk about?

juanrga · Aug 23, 2013

I recommend Toms article about AMD APUs to anyone interested in this new thing named HSA, its advantages over OpenCL, and a vision to future

http://www.tomshardware.com/reviews/fusion-hsa-opencl-history,3262.html

I think the final words in that review summarize very well the reason why I am so enthusiast about Kaveri coming with HSA-enabled Steamroller (and why I applaud the most than probable abandon of the FX line):

At a time when cracks are appearing in Moore’s Law and the costs of shrinking fab processes continue to skyrocket, an industry that wants to keep accelerating compute capabilities must turn increasingly to optimizing efficiency. Ultimately, this is what GPGPU and HSA enable. By old methods, how much would a CPU need to evolve in order to facilitate a 5x performance gain? Now, such gains are possible simply through hardware and software vendors adopting an end-to-end platform such as HSA. No pushing the envelope of lithography physics. No new multi-billion-dollar factories. Just more efficient utilization of the technologies already on the table. And through that, the world of computing can take a quantum leap forward.

juanrga · Aug 23, 2013

palladin9479 :

And that makes no sense because an x86 core is a "HSA core".

HSA considers two kinds of compute units and the entire architecture/framework provided global computation using one kind or other or both at once depending of the kind of workload and context.

The main goal of HSA consists on selecting the most adequate compute unit for a given task, but as said above HSA allows for both kinds of compute units to work together in a given task. No, you cannot achieve this with OpenCL/GPGPU.

palladin9479 · Aug 23, 2013

juanrga :

SLAMS HEAD ON DESK

Stop using marketing terms without knowing what they mean. Seriously your starting to sound like an advertising brochure, powerpoints included.

I'll just say that what you just posted is complete nonsense.

You effectively just said this

http://www.youtube.com/watch?v=hkDD03yeLnU

“I’ll create a GUI interface using Visual Basic to see if I can track an IP address.”

Sounds great to viewers everywhere.

juanrga · Aug 23, 2013

palladin9479 :

juanrga :

SLAMS HEAD ON DESK

Stop using marketing terms without knowing what they mean. Seriously your starting to sound like an advertising brochure, powerpoints included.

I'll just say that what you just posted is complete nonsense.

Care to develop your post or must I take it as gospel?

AMD CPU speculation... and expert conjecture

Distinguished

Distinguished

Splendid

Splendid

Splendid

Splendid

Distinguished

Distinguished

Splendid

Splendid

Splendid

Splendid

Distinguished

Splendid

Distinguished

Distinguished

Splendid

Distinguished

Distinguished

Splendid

Distinguished

Distinguished

Distinguished

Splendid

Distinguished

Share this page