AMD CPU speculation... and expert conjecture

Page 270 - Tom's Hardware community thread
Status
Not open for further replies.

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460


Lol, or is it possible AMD is going to continue to use AM3+ and AM4 throughout 2014?
 
Adding two more memory channels definitely needs a new socket as you damn near double your interconnects. It's 200+ pins per memory channel (64-bit width) and then some.

Now having said that, just because an IMC is capable of talking to four memory channels does not mean it will be released on a platform with all four memory channels. AMD tends to target budget / low-mainstream parts, and quad-channel memory would be expensive.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780
I remember reading a rumor that Kaveri will have GDDR5M (not GDDR5), which is faster than DDR4 and slower than GDDR5. If they also use quad channel then AMD may have a win.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's a different platform without APU support. They don't really need quad channel for that platform. The servers do, with their 12-16 cores on socket G34 (1974 pins).

For reference: FM2 (904 pins), FM2+ (906 pins).
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


You're wasting your time.

It's a completely fabricated controversy. What juanrga doesn't understand is that, by his own "theory", if hUMA breaks on an APU with 2 simultaneous memory pools (like DDR3 + GDDR5), it would break even more with something like hNUMA LOL

Matter of fact, in his system with only a CPU and 2 channels of DDR on DIMMs, there are already "different access times": the DIMM further from the CPU will always have worse access times. That is why access is arbitrated, that is why timings can be tweaked, and that is why it is "synchronized" (it's *S*DRAM).

The main point is that it is an MMU for the CPU and an IOMMU for the GPU/accelerators, and you can have "coherent memory", i.e. memory coherency, without cache coherency. If the GPU had an MMU similar to the CPU's, it could easily be ccNUMA. It isn't, because different HSA implementers can have different coherency protocols on the CPU side, and AMD could not force a unified view (MOESI) on this.

So hUMA is already a cc-hNUMA of sorts; otherwise it couldn't have cache coherency with the CPUs behind an MMU and the GPU/accelerators behind an IOMMU. The MMUs on the CPU side only have to co-operate with the IOMMU on the GPU/accelerators; the different "coherency protocols" on the CPU side don't need to change, only to be adapted to work with the IOMMU. And it is this way because a GPU/accelerator can have memory coherency (same virtual address space) without being cache coherent with the CPU. It's a loose, broad implementation, and it must be this way because of the very different accelerators and implementers in HSA.

So what could not fit HSA is exactly juanrga's view, a strictly "uniform" mechanism for all... no one else would be in HSA except AMD.

But we need not worry, it will come to dGPGPU... it's on the roadmaps.



 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


You forget the "drivers", or the "runtime"... the JIT environment that all GPUs now must have, which runs mostly on the CPU and is "transparent" to game developers. Those are quite bloated, and they could make use not only of "automatic vectorization" but of scaling across multiple threads as much as possible.

Besides, most of the geometry could also be offloaded to the GPU; direct userspace task "dispatch/scheduling" seems tailored for this. I don't know the specifics of how much, but some more multi-threading could perhaps apply here too.



You don't need to get rid of the VRAM. Matter of fact, an IOMMU on a VRAM-less GPU doesn't make sense... where would the "memory" be for the "I/O Memory Management Unit" to manage?... The fact that cache coherency is included in the specification only means that some marriage with the ccNUMA specifications is employed... you can already have "coherent memory" without cache coherency...

[UPDATE: It's only 1 memory pool in the PS4 and in Kaveri, but nothing is VRAM-less; matter of fact it's already arbitrated even without the GPU. That single memory pool is NOT divided (that is the old scheme) under hUMA, but shared... sorry for the language, it's like a switch (only to give an idea, it's not literally the case)... the total memory pool is part of the CPU, and the same total memory pool is part of the GPU -> so all system DRAM is like VRAM. For this to work it needs coherency and consistency mechanisms, memory coherency more than cache coherency, and that is what hUMA with IOMMU "semantics" provides. Yet nothing prevents the same coherency/consistency mechanisms from working with several memory pools, even on discrete parts.]

So hUMA seems tailored for multiple memory pools, and discrete is definitely a possibility (though "on platform" -> soldered to the mobo, and "on package" -> inside the socket, will definitely be much more common). So hUMA is already tailored for a Volta-like approach, with HBM memory on the same substrate or on an interposer.
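To picture "shared, not divided" concretely, here is a toy model in plain Python (purely illustrative; the `cpu_view`/`gpu_view` names are mine and have nothing to do with any real HSA API): under the old divided scheme one agent works on a copy, while under the shared scheme both agents hold zero-copy views into the same pool.

```python
# Toy model of "divided" vs "shared" memory pools. Illustrative only;
# no relation to the real hUMA/IOMMU machinery.

pool = bytearray(16)            # the single physical memory pool

# Old scheme: the pool is divided; the "GPU" works on a copy,
# so results must be copied back explicitly.
gpu_copy = bytes(pool)          # snapshot; later edits to pool won't show here

# hUMA-style scheme: both agents hold zero-copy views of the SAME pool.
cpu_view = memoryview(pool)
gpu_view = memoryview(pool)

cpu_view[0] = 42                # "CPU" writes...
print(gpu_view[0])              # -> 42: "GPU" sees it, no copy-back needed
print(gpu_copy[0])              # -> 0: the divided scheme still sees stale data
```

The point of the toy is only visibility: with one shared pool a write by either agent is immediately visible to the other, which is what the coherency/consistency mechanisms have to guarantee.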
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Neither of which is for Kaveri. Kaveri will be for FM2+ (A88)... I think that was stated officially... and that platform only has 2 channels of DDR3. So more or other memory only "on package" (inside the socket).

And eSRAM for the GPU would be quite a bit cheaper... especially if on 28nm FD-SOI, which is ~half the size of 32nm SOI... and it doesn't have to break hUMA or anything.

Kaveri with 4 channels of DDR... seems like one of those... how shall we say it... juanrga things lol
(eSRAM has >100GB/s (and could be tweaked for quite a bit more); 4 channels of DDR3-2400 is ~60GB/s)

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
Strictly speaking about games, Tim Sweeney is right... there will be very little advantage (if any appreciable) unless the game is coded to take advantage of "memory coherency", and that depends on tools and languages, very little of which is there yet for games. I don't know how much HSA-aware drivers/runtimes can do "automatically" for the developer, but I suspect not much.

For physics & effects, which are more and more part of modern games, OpenCL & AMP already have the "paradigm" embedded at the language level, so those will accelerate on HSA platforms out of the box... but the rest of the game?...

What is needed is Visual Studio with LLVM, so VS can compile DX# directly to HSAIL or to a (different?) extension proper for games, like AMDIL (HSAIL and AMDIL could merge). The same is easier for OpenGL. Then aim for the stars to land on the moon in the worst case... and define "HSA Games"... it could be ray-traced [mixed with raster] for illumination... it could be pervasive OpenCL physics & effects... and it could work at higher abstractions, OpenGL-centered or DX-centered...
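The "paradigm embedded at language level" is the data-parallel kernel: one function body applied independently per work-item. A minimal sketch of that pattern, written in plain Python as a stand-in (the `dispatch` runtime here is invented for illustration, not any real OpenCL/AMP API):

```python
# Toy sketch of the OpenCL/C++ AMP data-parallel kernel pattern:
# one function body, executed once per work-item, indexed by its global id.

def saxpy_kernel(gid, a, x, y):
    """Kernel body for one work-item: y' = a*x + y at index gid."""
    return a * x[gid] + y[gid]

def dispatch(kernel, global_size, *args):
    """Stand-in for a runtime dispatch: runs the kernel over all work-items.
    A real OpenCL runtime would schedule these across GPU compute units."""
    return [kernel(gid, *args) for gid in range(global_size)]

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = dispatch(saxpy_kernel, len(x), 2.0, x, y)
print(result)  # -> [12.0, 24.0, 36.0, 48.0]
```

Because every work-item is independent, code written this way parallelizes "out of the box" on whatever device the runtime targets, which is exactly why physics & effects written in OpenCL should benefit from HSA without rewrites.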
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


KDE 4.11 with GCC 4.8.1. I didn't really try that hard for that benchmark; I can usually get it into the low 1:40s, really close to breaking 1:40. I just kind of left Firefox and stuff open and didn't play with "nice" at all or anything. That, and I compiled my kernel with preemption because I get tired of a laggy desktop that stutters (like Windows does) when all 8 cores are loaded.
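For anyone wanting to replicate the setup, the "compiled with preemption" part is the kernel's preemption model; a minimal config sketch (the menu path is the usual x86 location, and the `-j9` job count is just an example for an 8-core box):

```shell
# Kernel preemption model, set in .config before building
# (menuconfig: Processor type and features -> Preemption Model):
CONFIG_PREEMPT=y                      # "Preemptible Kernel (Low-Latency Desktop)"
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT_NONE is not set

# Build at lowered priority so a fully loaded machine stays responsive:
nice -n 19 make -j9
```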

I do enjoy reading Hajifur talking about Windows benchmarks and talking down to game developers about game development. I notice he's been kind of quiet since my $199 CPU managed to beat $800+ of Intel CPUs in raw performance. I think he's growing kind of afraid of me; when I start addressing him he kind of disappears.

Also, I am curious. It sounds more and more like HSA is going to end up as something where you just recompile instead of reprogram? Is that true? I thought that was a massive, massive problem to overcome. I'd find it humorous if I could buy a cheap APU laptop, slap Gentoo on it, and have it outperform my entire CPU-only FX 8350 rendering rig.
 

8350rocks

Distinguished


Well, the first one is a server chip... not too likely to be in the average workstation... plus mighty expensive.

The second one isn't even the same size image file... which makes it an apples-to-oranges comparison.

Intel is not the best at everything... and actually, they're not really the best at most things... just 1 or 2 things. Get over it hafidurp.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780
AMD is killing AM3 and releasing a roadmap for 2014-2015 with no CPU in the next two years... I don't mind building an APU if they do manage 50-200% more performance with HSA and hUMA in NEW applications; the boost of 20-30% IPC will also help in older applications, so no big deal I guess.

http://www.techpowerup.com/189707/amd-updates-product-roadmap-for-2014-2015.html

TechPowerUp:

"In 2014 AMD's AM3 socket will retire after a 5-year run at the markets, as would its first APU socket, FM1. By the end of 2013, APUs would amount for 70 percent of AMD processors, while CPUs (chips devoid of on-die graphics), will amount for 30 percent."

I think that means no CPUs until late 2015.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


That is not exact at all, because the performance of a CPU depends on both single-core performance and scaling (and Steamroller will precisely improve scaling within a module), but I see your point. I got a bit more info about the HSA paradigm in processor design: AMD claims that CPUs are good at detecting parallelism in serial workloads and exploiting it, but this is limited to moderate amounts of parallelism. When adding more cores to the CPU and providing parallel code, the CPU wastes resources trying to (re)discover parallelism that has already been discovered by the programmer, making the extra CPU cores inefficient both in power and in die space.

This is the reason behind moving massive parallelism outside the CPU to a non-latency-oriented compute unit (e.g. a GPU). The GPU is more efficient for this kind of parallelism, not only from a performance point of view but also in power consumption and die space. AMD doesn't explain what they mean by moderate, but I think their current focus on 2-4-core CPUs (maybe 6) gives an idea.




As I said before, there is no way to implement hUMA (and its single memory pool) on an APU+dGPU. AMD is implementing HSA on dGPUs. They are implementing a double memory pool, DDR3+GDDR5, and are working on solving the problems with the different bandwidths/latencies.
 

8350rocks

Distinguished


You actually can; how they handle it is yet to be determined, but they are already going down that road via the current GPU-acceleration tech in the GCN cards.

HTX + PCIe would make this easier, faster, and more efficient. It does not necessarily have to be done that way, but it would likely be the *best* way to do it.

hUMA has nothing to do with the actual memory pool being unified, only to do with the addressing scheme of that memory and how the GPU and CPU address it/view it.

So, none of this breaks hUMA or HSA in any way.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


In that quote I state twice that AMD is going to implement HSA in dGPUs.



It is more like ~80GB/s for the quad-2400.
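The theoretical peak is straightforward to check: DDR3-2400 means 2400 MT/s, and each 64-bit channel moves 8 bytes per transfer. A quick back-of-the-envelope:

```python
# Theoretical peak bandwidth of quad-channel DDR3-2400.

transfers_per_sec = 2400e6      # DDR3-2400 = 2400 MT/s
bytes_per_transfer = 8          # 64-bit channel width = 8 bytes
channels = 4

per_channel = transfers_per_sec * bytes_per_transfer / 1e9
total = per_channel * channels
print(per_channel)  # 19.2  (GB/s per channel)
print(total)        # 76.8  (GB/s total, i.e. ~80 GB/s)
```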
 
Also, I am curious. It sounds more and more like HSA is going to end up something where you just recompile instead of reprogram? Is this true? I thought that was a massive, massive problem to overcome. I'd find it humorous if I could buy a cheap APU laptop, slap gentoo on it, and then have it outperform my entire CPU only FX 8350 rendering rig.

Shouldn't even need a recompile. Due to the non-x86 nature of the HSA cores (or whatever you want to call them), they will likely be invisible to the OS, meaning it's up to the drivers/hardware to handle thread assignment.

Granted, DirectX, OpenGL, OpenCL, DirectCompute, CUDA and the like should all automatically go to the GPU portion of the chip, but I would imagine AMD's implementation will be smart enough to load threads onto HSA cores if the resources are available. [If not, HSA is DOA]
 

8350rocks

Distinguished
http://wccftech.com/amd-roadmap-20142015-products-updated-kaveri-apus-arrive-q1-2014-hawaii-gpu-late-september/

Looks like the FX series currently accounts for 30% of AMD's new CPU shipments...

With numbers like that, I am pretty certain they wouldn't cancel the line. APUs aren't the 80-90% everyone thought they were. Though they are clearly doing well.
 


Isn't Kabini already released?
 

8350rocks

Distinguished
Yes it is...

In other news...Steve Ballmer has announced his impending retirement within the next year:

http://www.tomshardware.com/news/retirement-bill-gates-steve-ballmer-surface-rt-valueact,24009.html

Will it be too little, too late for M$? I don't know if they can recover from the fiasco that is Winblows 8/RT/Blue/Surface/Phone/XBone.

If anyone @ Canonical is watching... launch a massive ad campaign now and get Ubuntu Linux out into the world while you have the best opportunity. If M$ gets their act together in time to right the ship... Linux may be destined for a small % user base indefinitely.



 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460
http://www.amdoverclock.net/amd-updates-product-roadmap-for-2014-and-2015/

And "AMD Gaming" on facebook told their fans:

"Are you a proud AMD Radeon fan?

You'll want to pay close attention to this page starting soon..."

Also, FX accounts for 30% of sales? AM3 is to be phased out, but that means nothing about AM3+. I think you are all reading too much into this. Until I hear from the horse's mouth that AM3+ is canceled, I will believe they are bringing Steamroller to the socket. They know how many people are banking on this...
 

BeastLeeX

Distinguished
Dec 13, 2011
431
0
18,810


I completely agree. Like TechPowerUp said, "In 2014 AMD's AM3 socket will retire after a 5-year run at the markets, as would its first APU socket, FM1." AMD has not said anything about AM3+ yet, and 30% FX processors is not the best number, but AMD should know that people like me are still using Phenom II cores on an AM3+ mobo, waiting for a Steamroller FX. Once the FX chip is released, their sales might get a slight boost. And the FX-6300 is looking like a better deal each passing day, especially for low-to-mid-end rigs. But there is no 6-core Kaveri (that we know of).
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790




Most old software, using only the CPU, will have to be rewritten to use the GPU for compute. This is why AMD joined with the LibreOffice devs to enable HSA. Maybe in some special cases a smart HSA compiler can do some magic for old code.

However, some of the software already using the GPU for compute will work automatically with HSA software without any change:

[image: 19hc25_hsa.png]
 

8350rocks

Distinguished


I wonder if that will be in the Catalyst 13.8 beta drivers when they go "official" as I am running the beta drivers now.
 