AMD CPU speculation... and expert conjecture


nowhere on the page nor on the promo slide does it say that the cpu and gpu can manipulate the same data at once, nor how. the page and the slide talk about sharing resources, using the same memory space, and coherency, but not about the same data, at the same time. the phoronix link i posted says hsa is about letting different processor types share system resources more effectively, and mentions shared pageable memory as an example, but not "CPU and GPU manipulating cooperatively the same data at once". now it looks like you tried to b.s. through an argument, got caught, and tried to b.s. through again (edit: from the looks of it, with the first result from the first page of a google search instead of in-depth research)... and got caught, again.

if more knowledgeable people find my statement incorrect, feel free to prove it with a credible, technical explanation. i'd be happy to be wrong, since it'd mean amd has revolutionized throughput computing and hsa.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


It is true that I predicted this was going to happen (I did the math about a year ago), but here I want to emphasize what AMD already admits openly:

http://www.theregister.co.uk/2014/06/20/amd_25x20_power_efficiency_pledge/

http://www.amd.com/en-us/press-releases/Pages/amd-accelerates-energy-2014jun19.aspx

Mark Papermaster's recent talk about AMD's plans for the year 2020 doesn't mention dGPUs at any moment, because AMD knows that dGPUs don't scale up. Papermaster clearly mentioned that silicon alone cannot bring a 25x efficiency gain and that only APUs with HSA can achieve that goal.

Current A10-Kaveri: ~800 GFLOPS on ~100 W

Multiplying by the 25x efficiency gain:

APU of year 2020: ~20 TFLOPS on ~100 W

Intel will start selling a 200 W processor next year. If we consider that higher TDP, then:

APU of year 2020: ~40 TFLOPS on ~200 W

When Papermaster mentions "25x" he is rounding for marketing purposes. Moreover, the above numbers lack the on-die memory, because we use Kaveri as the baseline. I did all the math and I expect something closer to 250 W for a 40 TFLOPS APU. Nvidia's engineers also did the math, and they expect their own 40 TFLOPS APU to be rated at 300 W TDP.

AMD APU of year 2020: ~40 TFLOPS on ~250 W
Nvidia APU of year 2020: ~40 TFLOPS on ~300 W

TDPs can vary by ~50 W depending on the amount of on-die memory and its bandwidth.
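As a sanity check, the projection above in a few lines of Python (figures as quoted in this post; the 200 W point is a simple linear extrapolation):

```python
# Back-of-envelope check of the 25x projection (figures from this post).
kaveri_tflops, kaveri_watts = 0.8, 100   # A10 Kaveri: ~800 GFLOPS on ~100 W
gain = 25                                # Papermaster's 25x efficiency pledge

tflops_per_watt_2020 = (kaveri_tflops / kaveri_watts) * gain
print(tflops_per_watt_2020 * 100)        # ~20 TFLOPS at 100 W
print(tflops_per_watt_2020 * 200)        # ~40 TFLOPS at a 200 W TDP
```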

There is no way an engineer can obtain 40 TFLOPS from a 250--300 W dGPU. Silicon alone will only provide ~2.8x, thus:

R9-290X: ~5.6 TFLOPS on 290 W

dGPU of year 2020: ~16 TFLOPS on 290 W
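The same exercise for the dGPU case, a minimal sketch using the ~2.8x silicon-only gain from this post:

```python
# Silicon alone: only the ~2.8x process gain applies, at the same TDP.
r9_290x_tflops = 5.6                 # at 290 W, per this post
print(r9_290x_tflops * 2.8)          # ~15.7, i.e. ~16 TFLOPS on 290 W in 2020
```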

...................................................................................

We can continue with this interesting stuff. Using these APUs to build a fast supercomputer for the year 2020:

100000 APUs of year 2020: ~4000000 TFLOPS on ~20000000 W

This corresponds to an exascale supercomputer on 20 MW, which is precisely the DARPA goal for the years 2018-2020. This is why Cray, Nvidia, AMD, Intel, Fujitsu... are designing future supercomputers around APUs/SoCs. Nobody on the whole planet is using dGPUs, because they don't scale up.
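And the aggregation step, again with the per-APU figures used above (the 40 TFLOPS / 200 W point):

```python
# Aggregating the projected APUs into the DARPA-class machine described above.
apus = 100_000
tflops_each, watts_each = 40, 200           # per-APU figures used in this post
print(apus * tflops_each / 1e6, "EFLOPS")   # 4.0 EFLOPS (exascale)
print(apus * watts_each / 1e6, "MW")        # 20.0 MW
```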
 

blackkstar

Honorable
Sep 30, 2012
Juan, you still don't get my point. You act like the APU will solve all the problems of dCPU and dGPU and let everything scale. But you forget that HPC APUs will need more than one APU to scale, and at that point you end up in the same situation as you would with dGPU and dCPU: off-die dCPUs and dGPUs sharing coherent memory. Meaning that what I'm saying is going to have to happen anyway. The only difference with the APU is that each socket is shared between CPU and GPU. You will still need to transfer data across each APU as well; that bottleneck is still going to exist when you stitch all your APUs together. Yet you somehow think that invalidates dCPU and dGPU in every single market because HPC is pushing for many APUs.

So all these issues like PCIe 3.0 latency and such are going to have to be fixed eventually. There is no way around it. And that fix will apply to dGPU and dCPU systems as well.

And as others have noted, APUs will not scale to as many cores as their discrete counterparts. People are also forgetting that dGPUs and (probably) dCPUs will also scale in performance as time goes on. The APU is not the only device that will be increasing in performance until 2020.

The problem with the APU, the way you are describing it, is that it introduces many more bandwidth and latency problems between each APU, because you need more APUs to do the task of a single large GPU.

In 2008, six years ago, we were using the 8800 GTS as the high end and we were transitioning away from the AGP bus. In 2002 we had the 9700 from ATI. Do you see how the APU growing so much in 6 years does not make it a special snowflake?

Let that sink in for a moment.

Your other problem is that you are basically comparing a more efficient mid-range GPU on an APU to a high-end dGPU. Would you care to explain to me how having a CPU with low throughput bolted onto every single GPU in the system would be better than many smaller, more efficient GPUs on their own? Why have those CPUs sitting there contributing to TDP and power consumption when you can just have one CPU with many smaller, more efficient GPUs?

Who is to say that an efficient dGPU like the 7790 or 750 Ti won't scale just like you're saying the APU will? It's clear that high-end parts are having issues, but if you take the GPU from an APU that's scaling like that and do something like put it into a socket (sort of like Knights Landing), you save on having a useless CPU adding to the TDP.

And as I said earlier, dGPUs shared like that will have the exact same problems as sharing many APUs.
 

jdwii

Splendid


Can we stop with the beloved Nvidia crap? As it stands, they have a superior design that uses 30% less power for the same performance. Nvidia does a lot of closed crap, but they are good at what they do... graphics.
 

jdwii

Splendid


Show me a link proving that it has IPC on the level of Haswell; it's your claim, so prove it. Also, let's all sit back and love Juan for acting like TFLOPS are the only performance measure. If that were true, we would see AMD's 280X be around 25% faster ALL the time compared to a GTX 770. Also, claims mean nothing at all, and by his own findings he used 25%, which wasn't anywhere near enough for your claims of 20+ TFLOPS; and you still haven't told everyone that it's not the only performance metric.
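For what it's worth, the raw-TFLOPS gap referred to here can be reproduced from peak-throughput math alone; the shader counts and clocks below are my approximate reference figures (assumptions), not numbers from the post:

```python
# Peak FP32 throughput = 2 FLOPs x shaders x clock (GHz).
def peak_tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000

r9_280x = peak_tflops(2048, 1.00)   # ~4.1 TFLOPS (approximate specs)
gtx_770 = peak_tflops(1536, 1.05)   # ~3.2 TFLOPS (approximate specs)
print(f"280X: ~{(r9_280x / gtx_770 - 1) * 100:.0f}% more peak TFLOPS")  # ~27%
```

That ~27% paper advantage rarely shows up as a ~27% framerate advantage, which is the point: peak TFLOPS is one metric among many.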
 

juanrga

Distinguished
BANNED
Mar 19, 2013


I am telling you that I cannot predict what you asked me about framerates for the year 2016. What part of my message, "It is not easy to predict, because it depends on lots of factors: market evolution, foundry roadmaps, memory maker roadmaps,...", did you not get?

Second, don't forget that those claims about GPUs are not only mine, but also come from GPU companies such as Nvidia.

Third, the market couldn't care less about what you care about ;-) We know that future games will use the GPU more and more for compute, not only for rendering.

Moreover, leading game developers are already studying abandoning the rendering techniques used in current GPUs and replacing them with a new arsenal of compute-based rendering tools. One of the more interesting parts of those new techniques is that they avoid the intermediate layers of graphics APIs, and rely on general languages such as CUDA to directly access the hardware's general compute capabilities, bypassing the graphics pipeline completely.

This is a very, very old demo of rendering using CUDA, but it can give you an idea:

http://hothardware.com/News/NVIDIA-Shows-Interactive-Ray-Tracing-on-GPUs/
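To make the idea concrete, here is a minimal sketch (an illustration of mine, not code from the demo) of compute-based rendering: a sphere ray-traced entirely with general array compute (numpy standing in for CUDA), with no graphics API anywhere in the loop:

```python
import numpy as np

W, H = 320, 240
xs = np.linspace(-1, 1, W)
ys = np.linspace(-0.75, 0.75, H)
px, py = np.meshgrid(xs, ys)                    # one primary ray per pixel
dirs = np.stack([px, py, -np.ones_like(px)], axis=-1)
dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

center = np.array([0.0, 0.0, -3.0])             # sphere 3 units in front
radius = 1.0

# Ray-sphere test: t^2 + 2 t (d . oc) + |oc|^2 - r^2 = 0, ray origin at 0.
oc = -center
b = dirs @ oc                                   # (H, W) dot products
disc = b * b - (oc @ oc - radius * radius)
hit = disc > 0
t = -b - np.sqrt(np.where(hit, disc, 0.0))      # nearest intersection

normals = (dirs * t[..., None] - center) / radius
light = np.array([0.577, 0.577, 0.577])         # unit light direction
shade = np.clip(normals @ light, 0.0, 1.0)      # simple Lambert term
image = np.where(hit, shade, 0.0)               # H x W grayscale framebuffer
print(image.shape, round(float(image.max()), 3))
```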

Finally, it doesn't matter whether you care about KL or not. By outperforming a 290X, KL puts in their place those 'experts' here who claimed it would be impossible to hit 290X-level raw performance. In the second place, KL will put lots of pressure on the GPU divisions of both Nvidia and AMD. Both companies will need to accelerate their respective SoC plans.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


I already suspected that I would be wasting my time explaining this to you, because you would not understand anything and you would return with your perennial fantasies about "caught", "bs", "lies"...

I gave you a pair of links, but I am not surprised you cannot understand them either.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


Your point was understood and debunked earlier, but you act as if it never was.

Your strategy starts with a fallacy. In the post just above yours, I mentioned how one will need about 100000 APUs for a 20 MW HPC system. You start by ignoring what I said with your fallacy "you forget that HPC APUs will need more than one APU to scale".

Then you continue by adding another fallacy. You pretend that an APU in a socket needs to feed APUs in other sockets, which is false. Inside an APU, the CPU feeds the GPU. But an APU doesn't feed another APU. Only in some special cases can an APU require computation from another; moreover, the software is being explicitly designed to reduce such situations to a minimum and run computations locally.

In your CPU-in-one-socket-and-GPU-in-another-socket idea, we would always have to move computations between CPU and GPU, wasting latency, bandwidth, and energy. Your idea brings a much slower and more power-hungry system. This is the reason why no engineer considers your idea (APUs are used instead).
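A toy model of the energy side of that argument; the per-byte costs below are invented placeholders just to show the shape of the trade-off, not measured figures:

```python
# Illustrative constants only; real numbers depend heavily on the link and node.
ON_DIE_PJ_PER_BYTE = 1.0        # assumed: CPU handing data to the GPU on-die
OFF_SOCKET_PJ_PER_BYTE = 20.0   # assumed: crossing a socket-to-socket link

def transfer_joules(gigabytes, pj_per_byte):
    return gigabytes * 1e9 * pj_per_byte * 1e-12

traffic_gb = 100.0  # hypothetical CPU->GPU traffic for one job
print("on-die    :", transfer_joules(traffic_gb, ON_DIE_PJ_PER_BYTE), "J")
print("off-socket:", transfer_joules(traffic_gb, OFF_SOCKET_PJ_PER_BYTE), "J")
```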

Finally, you post another fallacy, pretending that CPUs and GPUs will scale magically forever, when we know this is untrue (you ignored everything I wrote about this).

The funny part is that what you propose, using GPUs in sockets, is an old idea that was evaluated and rejected by AMD many years ago:

AMD's leading software expert Neal Robison said that the Fusion architecture, which integrates general-purpose [x86] processing cores with the highly-parallel stream processors of Radeon GPUs, is a better solution for high-performance computing than installing special-purpose accelerators into CPU sockets. According to AMD, "it makes more sense from the software developers standpoint". Besides, the investment into tools "has already been made so we might as well use it". It looks like the once-proposed Torrenza platform is no longer even considered viable.

"APU is a better and cleaner solution than sticking a GPU in the same socket," said Neal Robison.

http://www.xbitlabs.com/news/other/display/20111211180811_AMD_GPGPU_Accelerators_in_CPU_Sockets_Make_No_Sense.html

I find it amazing that Gamerk expects that the green company that said dGPUs will be replaced by 2020 will continue making dGPUs for him, and that you expect the red company that killed its Torrenza platform to make GPUs in sockets for you.
 

your parroted links do not contain any remote indication of how this works. i call it parroting because you utterly failed to provide any personal understanding of how that would even work. it's safe to call you a liar now, since your own provided info does not contain any such claims made by amd. seems like you can't explain it because you don't even know how it would work. even if amd had such technology, you sure didn't provide any link to it.

edit: i re-read your reply and... "I already suspected that" - you already suspected that you'd get caught lying? or did you expect that if you lied you'd be called out? ...then don't resort to lying, and post credible and properly represented info, LOL.
 

jdwii

Splendid


No, you can't use the same memory at the same time. Also, for future reference, tell people what fallacy they are committing instead of just calling it a fallacy.
 

jdwii

Splendid


They won't get rid of them until they see fit. Juan claims they have no dGPUs planned, but why would they need to tell us what they are making in that market 6 years from now, when most of the information Juan provides mainly talks about servers, not gaming rigs? I could swear Juan claimed that we can't put sound on our video cards, yet sound cards exist; how does that even work using his "logic"?
 

you can use the same memory space - memory coherency. for example, multicore cpus use cache and system memory this way. the cores can read the data, but as soon as one core changes the data, the rest of the processor must be notified of the change.
or in multisocket servers (out of my knowledge scope), where hypertransport or qpi links are used to maintain coherence.
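A toy sketch of that invalidate-on-write behaviour (hypothetical code of mine, loosely MSI-style; real protocols track more states and don't write through like this):

```python
class Cache:
    """One core's private cache: addr -> (state, value)."""
    def __init__(self, name):
        self.name, self.lines = name, {}

    def read(self, addr, memory):
        if addr not in self.lines:            # miss: fetch a Shared copy
            self.lines[addr] = ('S', memory[addr])
        return self.lines[addr][1]

    def write(self, addr, value, memory, peers):
        for p in peers:                       # notify the other cores:
            p.lines.pop(addr, None)           # their copies get invalidated
        self.lines[addr] = ('M', value)       # our copy is now Modified
        memory[addr] = value                  # write-through, for brevity

memory = {0x10: 42}
c0, c1 = Cache('core0'), Cache('core1')
print(c0.read(0x10, memory), c1.read(0x10, memory))   # both cores share 42
c0.write(0x10, 99, memory, peers=[c1])                # core0 changes the data
print(c1.read(0x10, memory))                          # core1 re-fetches: 99
```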
 

jdwii

Splendid
http://cpuboss.com/cpus/Nvidia-Tegra-4-vs-Intel-Core-i7-4650U
Man, I can't even begin to find any benchmarks comparing ARM's best with Haswell; this is the closest thing I've got for IPC. They are clocked around the same (the ARM chip, Tegra 4, is clocked 200 MHz higher), yet it's still 38.4% slower on Geekbench while clocked ~11% higher, meaning Tegra is around half the performance per clock compared to Haswell (cough, PD). Based on proof and not claims, I don't see anyone using ARM getting Haswell IPC soon. Edit: also, looking into it, it seems like Geekbench favors SoCs.
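The per-clock arithmetic implied there, for the record (ratios as quoted in the post):

```python
score_ratio = 1 - 0.384   # Tegra 4 scores ~38.4% lower overall
clock_ratio = 1.11        # ...while running at an ~11% higher clock
print(f"per-clock throughput: ~{score_ratio / clock_ratio:.2f}x Haswell")  # ~0.55x
```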
 
dGPUs will eventually be like sound cards. It will take a while, though. It gets to a point where even gamers don't need or want more than what integrated will give them. Technically, the consoles are already gaming PCs that run integrated graphics. Give it a couple more generations and who knows.
 

szatkus

Honorable
Jul 9, 2013


Couldn't you find a worse benchmark? Use Phoronix; there are a lot of ARM/x86 benchmarks there.
 

jdwii

Splendid


None came up. I tried finding a benchmark for ARM vs Haswell IPC. Very hard.
 

colinp

Honorable
Jun 27, 2012

Don't take that condescending attitude. What I get is that you are making bold claims about dGPUs being obsolete by 2020, but can't even articulate what the picture will look like in 18 months.


Don't take that condescending attitude. The market does care about what its customers want. And gamers want high frame rates at native resolutions with all the sliders to the right.

Finally, to reiterate, can you please stop taking that condescending attitude? I'm here to have a discussion, ask some questions, get some answers, etc.
 

ah, discrete sound cards... good times. yeah, it'll get to a point where consumers will get "good enough" gaming performance (e.g. 1080p @ 60 fps and decent antialiasing) from a cpu and igpu at a cheap price. carrizo might do that.
i wonder how much power those socs will use after a node shrink, and whether someone will jailbreak a ps4 and run an x86 o.s. like ubuntu or freedos on it... an soc like that in an intel nuc or gigabyte brix form factor case would make a great living room gaming pc.
 

szatkus

Honorable
Jul 9, 2013


Because Haswell is so much different from SB and IB.
 

8350rocks

Distinguished


Except the name is decided, just not yet released.

Also, if they aim to win on PPW, and do so at a similar TDP to Intel's, how can you say their goal is not outright performance?

If PPW is greater on a chip with a ~70-80 W TDP than on another chip with a similar/same TDP, is outright performance better? Yes!
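That is just the identity performance = PPW x watts; with made-up numbers:

```python
# At equal TDP, the higher-PPW chip simply is the faster chip (numbers invented).
ppw_a, ppw_b, tdp_watts = 10.0, 8.0, 80
print(ppw_a * tdp_watts, "vs", ppw_b * tdp_watts)   # 800.0 vs 640.0
```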
 

blackkstar

Honorable
Sep 30, 2012


Actually, AVX2 can make a massive difference. But people insist on running Windows only and letting half their CPUs sit idle, because all they run is x87 or SSE2 code all day long, and they are so afraid of moving away that they get all grumpy and defend things like crazy.
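As a rough stand-in for the effect (scalar element-at-a-time code versus a SIMD-capable kernel; the gap here also includes interpreter overhead, so treat it as an analogy, not a measurement of AVX2):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

t0 = time.perf_counter()
s = 0.0
for x, y in zip(a, b):     # one element at a time, like unvectorised code
    s += x * y
t1 = time.perf_counter()

t2 = time.perf_counter()
s2 = float(a @ b)          # vector-unit-friendly kernel, same dot product
t3 = time.perf_counter()

print(f"scalar loop: {t1 - t0:.3f}s   vectorised dot: {t3 - t2:.5f}s")
```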

Also, Juan, my point still stands. Yeah, the previous slide does have APUs, but I took those slides to mean that they were going for APU implementations right now and leaving the door open to dGPU + dCPU HSA/hUMA later on.

No one said a dGPU in HPC needs to be on a discrete card, or needs to be the biggest GPU you can find. An APU is efficient because it has a GPU that offers very good performance per watt, not because it is an APU.

You compare the highest-end dGPU to an APU and then go "wow, the PPW sucks!" The GTX 750 Ti was twice as efficient at cryptocurrency mining as the 260X.

https://litecoin.info/Mining_hardware_comparison

A 290X, not overclocked much, does about 700 KH/s. That's a 300 W card. It is twice as inefficient as the GTX 750 Ti, which does 256 KH/s at 60 W.
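The efficiency math behind that comparison, using the rates and TDPs as quoted:

```python
r9_290x   = 700 / 300   # ~2.3 KH/s per watt
gtx_750ti = 256 / 60    # ~4.3 KH/s per watt
print(f"750 Ti is ~{gtx_750ti / r9_290x:.1f}x more efficient")  # ~1.8x, roughly the 2x claimed
```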

If you are going to compare the APU's GPU on PPW, try comparing it to a GPU of similar size and TDP. You will see that the benefit not only goes away, but that it is worse on an APU, because you need all the CPU circuitry on the same die. You also have to optimize the process for both GPU and CPU instead of just the GPU.

And as I've been saying, you're going to have the problem of inter-chip communication regardless of whether they're GPUs, CPUs, or APUs.
 

szatkus

Honorable
Jul 9, 2013


It's not only Windows; most Linux software is compiled for a Pentium 4-class CPU. Exceptions are distributions like Gentoo, but in general most Linux, Windows, and Mac users install precompiled software. There's also Java, which can be optimized for the particular CPU at runtime, but it's almost non-existent on desktops.

Phoronix is biased, because they use the -march=native flag when compiling their benchmarks, but it's the best we have.
 