AMD CPU speculation... and expert conjecture


juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I consider a 20% improvement and a slight reduction in clock frequency compared to Richland.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The possibility that AMD could release APUs as base gaming systems that can be upgraded with a dGPU, in which case the iGPU could work exclusively on compute, was already considered and discussed earlier in this thread.

In fact, I will add more data supporting this speculation. We know that AMD is moving all the heavy compute tasks from the CPU to the GPU. E.g. the PS4 will use the GPU for complex processes that are traditionally handled by the CPU, such as physics simulation, collision detection, raycasting for audio, decompression...

Therefore, we have for Kaveri:

i) The CPU alone has about the same performance as an i5/FX-6.

ii) Heavy computations are moved to the iGPU, which implies that the CPU will perform better on the remaining tasks.

iii) The iGPU has more performance than a hypothetical dual-socket pair of FX-9590s OC'd at 6GHz.

iv) FX-series R.I.P.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. I did mean what I wrote.



There is no way that Kaveri will be 4.5 GHz. I am convinced that it will be clocked slower than Richland.



There is no SR FX for 2P-4P, and it looks more reasonable to me that Warsaw CPUs will be replaced by an Excavator APU/CPU in 2015, just as Opterons are being replaced by Berlin in 2014. I also hope Excavator comes in 2/4-core configs.

I see no reason for a PD refresh on AM3+. For me, the FX-9000 series is the last desktop line.



Yes, I gave this link a couple of days ago in this thread.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


Got any sources for all your "facts from AMD", or do you just start rumors?

You never seem to provide any sources at all.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


It was more like VR-ZONE launched the rumours that Kaveri is delayed and the FX line is cancelled, and AMD replied to VR-ZONE denying only the rumour about Kaveri...

Of course, this does not imply that AMD confirmed the other rumour, but the asymmetry in their treatment of both rumours is curious.



There are two other advantages of hUMA. The first is that the system does not need to maintain a copy of VRAM contents in RAM, freeing up memory. This means that a hUMA APU with 8 GB has more available memory than a traditional CPU+GPU setup with 8+2 GB.

The second advantage is that hUMA allows both the CPU and the GPU to use the same pieces of data at the same time. During GDC 2012, AMD already mentioned how the possibility of utilizing both processors at the same time allows them to perform certain complex tasks that couldn't be performed with the traditional PC architecture.
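The memory-accounting claim above can be sketched in a few lines (all sizes and function names are illustrative assumptions, not AMD figures):

```python
# Hypothetical accounting sketch (sizes in GB) contrasting a traditional
# CPU+dGPU machine with a hUMA APU: the traditional path stages graphics
# data in system RAM before uploading it to VRAM, so the same bytes exist
# in both pools at least transiently.

def traditional_footprint(asset_gb, ram_gb=8, vram_gb=2):
    """Free RAM/VRAM while an asset is staged in RAM and mirrored in VRAM."""
    ram_used = asset_gb          # staging copy in system RAM
    vram_used = asset_gb         # uploaded copy in VRAM
    return ram_gb - ram_used, vram_gb - vram_used

def huma_footprint(asset_gb, unified_gb=8):
    """hUMA keeps a single copy that CPU and GPU both address directly."""
    return unified_gb - asset_gb

free_ram, free_vram = traditional_footprint(1.5)
free_unified = huma_footprint(1.5)
print(free_ram, free_vram, free_unified)   # 6.5 0.5 6.5
```

The 1.5 GB asset leaves the same free system RAM either way, but the traditional machine has also nearly exhausted its 2 GB of VRAM, while the hUMA machine simply consumed 1.5 GB once.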
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


ummm... Cell in the PS3 used DMA (direct memory access) to talk to the SPEs... plenty of copies. The XBone CPU can talk directly with the ESRAM without copies, at least that is how I understand it. No, it's not the same thing, since it's only 32MB, but it will alleviate those copies a great deal.

And a bonus is that working from the ESRAM will reduce latencies a great deal. For the rest, the XBone GPU will have the same scheme as Llano/Trinity: it will have its own partitioned DRAM space.

 
There are two other advantages of hUMA. The first is that the system does not need to maintain a copy of VRAM contents in RAM, freeing up memory.

...

...

You know that doesn't happen, right? It's actually impossible, due to the non-kernel memory address range on Windows x86 being 31 bits large (2GB max) and the kernel range being another 2GB max. There was a time, back when cards had MUCH smaller memory sizes (think 32~64MB), that this happened; the AGP era, basically. That has long since passed, and now the contents of VRAM are left there or paged out in a 32-bit application. 64-bit Windows is a bit saner: it'll try to keep it there, but can copy it out if you task-switch to another application. What is kept in memory is a copy of the current scene, which tends to be MUCH smaller than the size of the VRAM and allows the program to start rendering again, though it'll have to reload resources.

Anywho, hUMA isn't anything new, just a new word for a concept that's been around since before the 8087 coprocessor. HSA is just a set of language protocols and standards that allow non-uniform processors to talk to each other and cohabitate in the same system without requiring different memory subsystems.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


No... Just because AM3+ might very well be EOL doesn't mean FX is R.I.P. A Steamroller FX might very well need another socket; one notable difference is that it might have PCIe support along with support for HTX slots on board.

PCIe everybody understands, since it could be similar to Intel; HTX people would also understand if they knew AMD "server" boards. The catch is that hUMA "needs" cache coherency, and with HTX it is not only APUs that can have hUMA, but also traditional configs with discrete graphics cards.

Combo "PCIe+HTX" slots: I even presented the AMD patents here... put a PCIe board in there and it will be like today; put the same board but HTX-ready (which will probably be AMD-only for a long time, I doubt Nvidia will have it any time soon, if ever) and that discrete GPGPU will be hUMA.

No wonder Intel freaked out and is making almost all of the Broadwell line BGA... something smells rotten for the Nvidia side, like something is dead lol...

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


There is one notable difference: x87 used the same MMU/TLB as the CPU; it was a closely coupled co-processor that later jumped inside the CPU.

With hUMA, each "heterogeneous processor" uses its own MMU/TLB, along the lines of AMD's IOMMU, whose v2.5 is now an official HSA standard.

That is what makes it different... hUMA is cache coherency + IOMMU (kind of). And it is like this because I think the intention is to make it really flexible: it is not only for closely coupled "processors", it could work with different processors (GPUs, DSPs, even FPGAs), on interconnects soldered onto a board or in another slot or socket.

As a matter of fact, what makes it similar to the original x87 is the FlexFPU in those modules; we can say the FlexFPU is a hUMA closely coupled (same MMU/TLB) co-processor.

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
Another advantage of HSA that nobody talks about is "task queues".

Those "task queues" in hardware will look like just another small buffer, but there is a catch... with advanced enough interconnects they could make the "heterogeneous" processor a bus-master element. That is, the CPU can be put to idle while the heterogeneous part is crunching, which kind of makes it "independent" of the CPU, since those queues will be visible from userspace (to applications) without the CPU having to schedule those tasks.

For a GPU it is like an evolution of the Nvidia Fermi scheduling scheme (better than that), from which Kepler represented a step back (1.5B lol).

For one thing, it means you don't need the absolute top-performance CPU to get the absolute top performance out of the GPU... with dedicated task-queue scheduling, hUMA, and bus mastering / "lightweight notification" (which didn't make it into PCIe v3... thanks Intel... but HTX can have it: http://www.eetimes.com/document.asp?doc_id=1171458 ), the GPU(s) can be at 120% (lol) even in a xfire config (thanks LwN), while the CPU reports periods of idling lol
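The user-visible queue idea can be caricatured in a few lines (a toy model, not the real HSA packet format; names like `doorbell` and the doubling "work" are invented for illustration):

```python
from collections import deque

# Toy model: an application writes work packets straight into a queue that
# lives in its own address space, then rings a "doorbell" the device watches.
# No kernel scheduler runs in between, which is the point of the post above.

class UserModeQueue:
    def __init__(self):
        self.ring = deque()
        self.doorbell = 0          # device polls this; no syscall needed

    def enqueue(self, packet):     # runs in userspace on the CPU
        self.ring.append(packet)
        self.doorbell += 1

def device_drain(q):
    """Stands in for the GPU packet processor; here the 'work' is doubling."""
    results = []
    while q.ring:
        results.append(q.ring.popleft() * 2)
    return results

q = UserModeQueue()
for task in (1, 2, 3):
    q.enqueue(task)
print(device_drain(q))             # [2, 4, 6]
```

Once the packets are enqueued, the CPU has nothing left to do for them; the device consumes the ring on its own, which is why the CPU can sit idle while the GPU crunches.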
 


I think the default is 256/512MB or so reserved by the system for this purpose, but still, even if the working set of data is small, you still have to copy it to the GPU's VRAM. That's the step that's being removed by hUMA.

And I'd argue that the old 32-bit 2GB barrier is still a major limiting factor in PC gaming. We still get the occasional game that isn't compiled as Large Address Aware (*cough* Skyrim *cough*), which enables the full 4GB worth of addressing on Win64 (and 3GB on Win32 if the /3GB switch is set) [and I'm going to totally disregard the existence of PAE/AWE for this discussion]. Obviously, native 64-bit binaries would be best, since you could theoretically support a 16EB address space, pending software/hardware support.
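The address-space limits quoted above are just powers of two, which a quick check confirms:

```python
# Address-space arithmetic behind the Windows limits mentioned above.
GB = 2**30
EB = 2**60

print(2**31 // GB)   # 2  -> default Win32 user-mode space (31 bits, 2 GB)
print(3 * GB // GB)  # 3  -> /3GB switch on Win32
print(2**32 // GB)   # 4  -> Large Address Aware 32-bit process on Win64
print(2**64 // EB)   # 16 -> theoretical 64-bit address space (16 EB)
```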
 


You're thinking of address space being reserved; he was speaking about actual memory being used, as in 2GB of system memory being used to keep a copy of the 2GB of data in VRAM. Basic knowledge of Windows' memory model renders that laughably absurd.

Having both GPU and CPU using the same memory space has huge advantages when it comes to binary languages. Currently, if someone wants to write code to run on the GPU (GPGPU / OpenCL / etc.), it's a completely different code set than what you send to the CPU. With HSA / hUMA you can compile a single binary that contains both sets of code as a single stream. It's basically SSE on steroids.

My point is that the existence of a large, powerful vector co-processor does not reduce the usefulness of multiple scalar processors. They each specialize in different workloads and do not replace each other. What that large vector co-processor allows is physics / AI and other array-math-heavy applications to run better than before, provided you have a dGPU. This is why I speculated that AMD will release a 6~8 core version with a cut-down iGPU that only does HSA processing and acts like a co-processor instead of rendering graphics.
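The scalar-vs-vector split being argued here can be illustrated with two toy workloads (pure Python, no real GPU involved; both functions are invented for illustration): an elementwise multiply-add where every element is independent, which is what a wide vector co-processor eats for breakfast, and a recurrence where each step depends on the previous one, which stays stuck on a scalar core no matter how wide the vector unit is.

```python
# Two toy workloads showing why vector and scalar units complement each other.

def vector_friendly(a, b):
    """Elementwise multiply-add: every lane is independent, so it widens."""
    return [x * y + 1.0 for x, y in zip(a, b)]

def scalar_only(seed, n):
    """Each step depends on the previous one, so it cannot be widened."""
    v = seed
    for _ in range(n):
        v = (v * 31 + 7) % 1000
    return v

print(vector_friendly([1.0, 2.0], [3.0, 4.0]))   # [4.0, 9.0]
print(scalar_only(1, 3))                         # 742
```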
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460
@palladin9479: Out of all the rumors and theories here, that one would make sense if AMD canceled the FX series and the AM3+ or AM4 lineup. If only we had something a little more concrete though... Those Gigabyte mobos are pretty-looking. I'm thinking about getting an FM2+ processor just to say I have a sense of hUMA.
 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810


I don't think this will be happening in Kaveri. I think AMD needs a die shrink + the Excavator arch to do this.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I was discussing a PC with 8 GB of RAM. I was assuming 64-bit everywhere. I didn't mention Windows...

In an ordinary PC architecture, all the textures/shaders/vertices have to be uploaded to VRAM from system RAM. This means that your graphics data will temporarily be in both system RAM and VRAM. It does not matter that, once the upload is done, you may choose to get rid of the copy in system RAM; at some instant the data is duplicated.

The question of how much RAM is used at a given instant is complex. E.g. it depends on whether textures are stored compressed or uncompressed in RAM. If I recall correctly, the Unreal engine uses 4:1 and 8:1 compression ratios for textures. It also depends on the file format. Some formats can contain more than one copy of the same texture, each copy compressed with a different algorithm for use on a specific GPU.

My point holds: an 8 GB hUMA APU PC has more available memory than a traditional PC with 8 GB RAM + 2 GB VRAM. Some game developers recommend 12 GB of RAM for PCs to try to replicate the 8 GB of the PS4 for this reason. Epic uses 16 GB in its PCs.
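A back-of-envelope version of that 12 GB recommendation (the individual numbers are assumptions for illustration, not developer-published figures):

```python
# Rough accounting (sizes in GB): to match a console's 8 GB of unified
# memory, a PC also needs OS overhead plus a staging copy of graphics data
# that gets mirrored into VRAM, which a unified pool does not duplicate.
console_unified = 8.0
os_overhead = 2.0
graphics_staging = 2.0   # duplicated in system RAM while VRAM holds it too
pc_ram_needed = console_unified + os_overhead + graphics_staging
print(pc_ram_needed)     # 12.0
```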



Well, AMD claims that APUs are the center of its universe. Both the server and desktop roadmaps show no Steamroller FXs coming. AMD wants to unify sockets. There are strong rumours that the FX line will disappear...
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


This is already available to the Jaguar cores with hUMA support. From AMD:

PS4 will have hUMA, which means that you no longer need a distinction between a CPU partition and a GPU partition. Both processors can use the same pieces of data at the same time. You don't need to copy stuff, and this allows for completely new algorithms that utilize CPU and GPU at the same time. This is interesting since a GPU is very strong, but extremely dumb. A CPU is extremely smart, but very weak. Since you can utilize both processors at the same time for a single task, you have a system that is extremely smart and extremely strong at the same time.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Then you will laugh at this quotation:

 
^^ The point Palladin was making, though, is that the whole 2GB of VRAM won't be used in one shot (which is correct). More VRAM is just like more RAM in a system: you avoid having to constantly move data in/out of it to make room for the dataset you are currently working on.

So when you eliminate VRAM and let the GPU access main memory directly, guess what? That VRAM bottleneck GOES AWAY. So, for instance, if you took 512MB, 1GB, and 2GB versions of the same exact card on a PC and benchmarked them, you would see the 512MB version of the card do the worst, especially on high settings. Why? Because the GPU is waiting on data to get to the VRAM, which often means something has to move OUT of VRAM to make room. Meanwhile, the GPU is sitting around doing nothing. The 2GB version of the card, by contrast, can just keep loading more and more data into VRAM, so you don't have the performance loss due to VRAM being filled. That's the type of bottleneck hUMA does away with.
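The eviction behaviour described above can be modelled with a toy LRU cache (the sizes, asset list, and render loop are invented for illustration):

```python
from collections import OrderedDict

# Toy LRU model of VRAM: when the working set is bigger than the card's
# memory, the small card keeps evicting and re-fetching (the stall described
# above), while the big card holds everything after the first pass.

def render_frames(vram_mb, assets_mb, frames=10):
    vram = OrderedDict()   # asset id -> size, ordered by recency of use
    refetches = 0
    for _ in range(frames):
        for asset, size in enumerate(assets_mb):
            if asset in vram:
                vram.move_to_end(asset)    # hit: just mark as recently used
                continue
            refetches += 1                 # miss: data must cross the bus
            while sum(vram.values()) + size > vram_mb:
                vram.popitem(last=False)   # evict least recently used asset
            vram[asset] = size
    return refetches

assets = [128] * 8                   # 1 GB working set of 128 MB assets
print(render_frames(512, assets))    # 80: every touch misses, all 10 frames
print(render_frames(2048, assets))   # 8: only the first pass misses
```

With sequential access and a 512 MB card, LRU evicts exactly the asset that will be needed next, so every single access is a refetch; the 2 GB card pays the upload cost once and then runs entirely from VRAM.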
 

Ags1

Honorable
Apr 26, 2012
255
0
10,790
Why would AMD kill FX? They have a modular arch, so it should not take much investment to scale up to four modules even if two modules are the focus. FX may cater to a niche, but it is a much bigger niche than, say, the Nvidia Titan niche. If Intel pushes up their prices, it opens the market to AMD.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


A $200 APU using HSA, benchmarked against a $999 Intel EE using the CPU only, would result in the Intel EE getting slaughtered.

I'm imagining something along the lines of comparing:

AMD CPU + iGPU for OpenCL physics/other GPGPU tasks + dGPU for rendering vs Intel CPU + dGPU for rendering.

Both running the same exact code, but with the Intel CPU having to do the work of the AMD rig's iGPU.

Or imagine a situation we can see now: AMD CPU + Nvidia GPU for PhysX + GPU for rendering vs CPU + GPU for rendering with the same PhysX settings, but with PhysX running on the CPU in the second combination.

AMD has a huge chance with these console wins to get a massive software advantage over Intel in benchmarks, which would be the biggest win AMD could achieve. AMD didn't even have that when they had better CPUs.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Perhaps because it's getting close to 2014, and with AMD talking about hUMA all the time, the FX without any graphics at all starts to look like a relic. Marketing-wise that's a challenge, combined with the initial bad press for Bulldozer FX.

They can always revive it later when it makes sense to do so. With the mobile/low power push today the big chips are being released on a slower cadence. That's even true for Intel.
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460
Hmmm... there must be something big going on... AMD is really cleaning house. Also, I noticed Newegg doesn't have the ASUS 990FX R2.0 Gen 3 listed anymore... I'm starting to buy this "AMD is moving the FX to the FM2 platform" idea.
 

iafro1989

Honorable
Aug 19, 2013
2
0
10,510


That mobo is just out of stock.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


And mapping VRAM <--> RAM does not need to be one MB to one MB. I already mentioned an example of how a texture can occupy more RAM than VRAM.

His explicit mention of the "Windows memory model" makes me believe that he completely misunderstood what I tried to say, because what I was saying is completely independent of the operating system used.

My point was that a traditional PC architecture always has redundant data stored in both RAM and VRAM for applications such as games. How much of that total data is redundant and how much RAM it occupies depends on many factors, from game settings to game-engine internals.

The new hUMA architecture eliminates those redundancies and frees up system memory. That is one of the reasons why some game developers recommend 12 GB of RAM for PCs:

Next-gen consoles adopt 8GB of unified memory as a baseline. In contrast, the PC operates two distinct pools: system memory (DDR3) and video RAM (typically GDDR5). Our advice for graphics is to get a card with as much GDDR5 as you can, but system memory also has to be factored in.

Linus Blomberg of Avalanche recommends 8GB of DDR3, while another of our sources believes that 12GB is a safer bet for future-proofing your PC, bearing in mind the overhead required by Windows combined with the fact that graphics data needs to spool from system RAM into the GPU's onboard memory.

This, from an AMD seminar, is also interesting:

Game developers and other 3D rendering programs have wanted to use extremely large textures for a number of years and they’ve had to go through a lot of tricks to pack pieces of textures into smaller textures, or split the textures into smaller textures, because of problems with the legacy memory model… Today, a whole texture has to be locked down in physical memory before the GPU is allowed to touch any part of it. If the GPU is only going to touch a small part of it, you’d like to only bring those pages into physical memory and therefore be able to accommodate other large textures.

With a hUMA approach to 3D rendering, applications will be able to code much more naturally with large textures and yet not run out of physical memory, because only the real working set will be brought into physical memory.
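The working-set argument in that quotation is easy to put in numbers (the texture and region sizes are assumptions for illustration):

```python
# A 16384x16384 RGBA texture is 1 GB, but if the GPU only samples a
# 2048x2048 region of it, page-granular residency (the hUMA model) only
# needs the touched 4 KB pages instead of locking down the whole texture.
PAGE = 4096
tex_bytes = 16384 * 16384 * 4          # whole texture, legacy lock-down model
touched_bytes = 2048 * 2048 * 4        # the region actually sampled
resident = -(-touched_bytes // PAGE) * PAGE   # round up to whole pages
print(tex_bytes // 2**20)              # 1024 MB locked under the legacy model
print(resident // 2**20)               # 16 MB resident with demand paging
```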
 