AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

colinp

Honorable
Jun 27, 2012
217
0
10,680
This looks like a mistake. The 5800K is a 100 W CPU, meaning that the rest of the system is pulling at *least* 102 W...

Which means that Kaveri is using at most 12 W. Yeah right.

Either that, or someone is comparing an OC'd 5800K against a non-OC'd Kaveri.



 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Yeah, the 5800K number looks pretty high. Something's not right there. Ah well, the real reviews should be out today/tomorrow.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Overnighted too? :) Grats on the new toy!
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Cazalan and gamerk316, all current products that you can purchase in stores have some research behind them. In the specific case of exascale computing, that research is being transferred into products that you can purchase. I already mentioned that AMD's original research was based on FSA, which is the precursor of the HSA shipping with Kaveri today. Nvidia's original design included custom ARM cores, and Nvidia will be shipping Denver this year. Intel's research results will start appearing in the next Xeon Phi.

noob2222, read what I wrote. I repeat: I obtained the IPC increase from "the CPU benchmarks". Gaming at 1080p and high settings is not a CPU benchmark; gaming at 720p and low settings is. See this graph.

qSNrpeA.png


Search "CPU benchmark" in its head, now see the resolution. See the 720p? See the low-settings? Now stops your ridiculous and perpetual attacks against me.

Ranth, I am not sure we are speaking the same language. A principle doesn't fix problems. A principle is not a thing made to fix anything, but a general rule of nature. I am having difficulty understanding where you got the idea that "the principle of locality will fix all problems with thermal-density".

The research started years ago, but it is not finished. E.g. the needed memory technology is still not ready, and there is still no exascale-level supercomputer. I said that the products are scheduled for the year 2018 or so. Therefore, you cannot see any of this in a high-end server today. Again, I don't understand why you are asking me about this.

I believe that you are contradicting yourself. You first stated your belief that a discrete GPU is always more powerful than an APU, and now you claim that Nvidia and AMD are migrating to APUs because they cannot compete with Intel? Nvidia is already developing a CPU to compete with Intel Xeons. Then why don't they release discrete GPUs for exascale? I will tell you why: neither Nvidia nor AMD is developing discrete GPUs for their respective exascale projects, because the engineers know that at the exascale level an APU is much more powerful than a discrete GPU. The reason? I said it before: at the exascale level the principle of locality holds. In fact, I can even give you the order of magnitude identified in the research. At the exascale level an APU is about 10x more powerful than a CPU+dGPU. Said otherwise, a CPU+dGPU with performance similar to the APUs designed by AMD and Nvidia (20--40 TFLOPS) would consume about 10x more power. This is what experts in exascale call the "power wall".
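For rough context, a back-of-the-envelope illustration (my own numbers, assuming the commonly cited exascale targets of about 10^18 FLOPS within roughly a 20 MW power budget, and the ~20 TFLOPS-per-node figure above):

$$
\frac{10^{18}\ \text{FLOPS}}{20 \times 10^{12}\ \text{FLOPS/node}} \approx 5\times10^{4}\ \text{nodes},
\qquad
\frac{20\ \text{MW}}{5\times10^{4}\ \text{nodes}} = 400\ \text{W/node}
$$

That per-node power budget is the kind of constraint the "power wall" argument is about.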

Let me repeat the same questions that you left unanswered: do you believe that all the engineers from AMD, Nvidia, Cray... are selecting APUs for exascale supercomputers because they don't know that a CPU+dGPU is better? Or do you believe that all the engineers from AMD, Nvidia, Cray... are selecting APUs for exascale supercomputers because they know that a CPU+dGPU doesn't work at that scale?

I asked you these questions because (your own words) "I just do not believe you". I did so because I suppose you will agree that all the scientists and engineers working on developing the next exascale supercomputer know the topic a bit.

I am planning to write an article about how an ultra-high-performance APU would look. I think I will write a section explaining why a CPU+dGPU cannot achieve the same performance level. If you can wait a bit, I would like to write a draft for review and use your feedback to explain things better.

Yes, I can give you a link to Intel migrating from a discrete card to an 'APU' (note that Intel calls its APUs CPUs):

Xeon-Phi-Knights-Landing-GPU-CPU-Form-Factor-635x358.png

 

colinp

Honorable
Jun 27, 2012
217
0
10,680


The way it should work is that when you have a piece of software coded to take advantage of HSA, then it is executed on the APU.

Otherwise, the GPUs in the system should be treated like the cores of a CPU, with work being assigned to the most appropriate resource. E.g. if you have some heavy graphics work, then it goes to the dGPU. If you have some physics or something else, then it goes to the iGPU. And if your iGPU and dGPU have the same architecture, then you can have a CrossFire option too.
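A toy sketch of that kind of routing might look like the following (the Device/Workload names are made up purely for illustration; this is not any real HSA or driver API, just the idea of dispatching work by kind):

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device kinds in a mixed APU + discrete GPU system.
enum class Device { CpuCores, IntegratedGpu, DiscreteGpu };

// Hypothetical workload categories a scheduler might recognize.
enum class Workload { Rendering, Physics, GeneralCompute };

// Pick the "most appropriate resource" for a piece of work,
// along the lines described above.
Device pickDevice(Workload w, bool hsaCapable) {
    switch (w) {
        case Workload::Rendering:      return Device::DiscreteGpu;   // heavy graphics -> dGPU
        case Workload::Physics:        return Device::IntegratedGpu; // physics etc.   -> iGPU
        case Workload::GeneralCompute:
            // HSA-aware code runs on the APU's GPU, otherwise fall back to CPU cores.
            return hsaCapable ? Device::IntegratedGpu : Device::CpuCores;
    }
    return Device::CpuCores;
}

int main() {
    std::vector<std::pair<std::string, Workload>> jobs = {
        {"frame rendering",            Workload::Rendering},
        {"rigid-body physics",         Workload::Physics},
        {"image filter (HSA build)",   Workload::GeneralCompute},
    };
    for (const auto& [name, w] : jobs) {
        Device d = pickDevice(w, /*hsaCapable=*/true);
        std::cout << name << " -> device " << static_cast<int>(d) << "\n";
    }
}
```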

There is precedent of a sort, when people used to use an Nvidia GPU for PhysX and an AMD GPU for rendering. Until one of those lovely companies arbitrarily shut off that option.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Power consumption figures are available here

http://foro.noticias3d.com/vbulletin/showthread.php?t=421821&p=5036332&viewfull=1#post5036332

That graph compares the power consumption of a stock Kaveri against an OC'd Trinity set to deliver the same performance. However, stock vs. stock, he says that power consumption is much better on Kaveri than on Trinity.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. You are describing a memory wall. I am discussing a power wall. Both concepts are closely related (both are walls) but are not the same.
 
The problem any sane engineer runs into fast is space: cache is expensive space-wise, which is why we use dense system memory instead and try to prefetch / predict what's needed several steps ahead of time.


Juan, if you're referring to the memory wall as in Patterson's three walls, then you still don't understand palladin's point above ... he is talking about cache memory latency and efficiency on the die ... you're referring to the size of the memory.

For others, here is the basic premise, courtesy of a very good article by Rich Pell, referenced below:

"Power Wall + Memory Wall + ILP Wall = Brick Wall"

- The Power Wall means faster computers get really hot.
- The Memory Wall means 1000 pins on a CPU package is way too many.
- The ILP Wall means a deeper instruction pipeline really means digging a deeper power hole. (ILP stands for instruction-level parallelism.)

http://www.edn.com/design/systems-design/4368705/The-future-of-computers--Part-1-Multicore-and-the-Memory-Wall

Intel's entire cache memory structure is faster, but more importantly more responsive (lower latency) and runs cooler, due to better power (thermal) design of the transistor matrices.

Moreover, their prefetcher is smarter ... and wider.

Irrespective of another gigahertz or more in speed ... AMD hits the power wall ... but their IPC is lower.

Note: one of the reasons postulated for why Intel's IB isn't such a great overclocker is that they may be hitting the power wall, because the cache has shrunk so much it simply can't get rid of the enormous heat given insufficient thermal solutions (less area to conduct heat away from the die).


 


But you can't just "use" a low-level API for a platform that it isn't specifically coded to use. Meaning until Intel and NVIDIA update Mantle with the necessary information to work with its GPU HW at a low level, it will remain usable only for AMD GCN based cards.

So yeah, the odds of that happening are about the same as AMD taking up NVIDIA's offer to help them port over PhysX to DirectCompute.
 


The PhysX precedent was different though, since you had to physically install two separate display drivers to get the option to work properly. Different than a split CPU/APU/dGPU config.

But here's the issue I have with the S/W approach: you are going to lose platform independence if you have to handle using the iGPU of the APU in software. How you are going to use an AMD iGPU is going to be different than how you use an Intel iGPU, due to the differences in HW capabilities. Plus you need to handle HSA on a per-app basis, which isn't the prettiest approach out there.

Frankly, that's one area where the Windows API needs a LOT of work; MSFT needs to update the OS to get past the "There can only be one CPU device and one GPU device and one Audio device" mantra, which would make using such resources a lot easier, from a development perspective.
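For illustration, here is roughly what discovering those devices looks like today with plain OpenCL 1.1 API calls (a minimal sketch with error handling trimmed, nothing vendor-specific). Each vendor ships its own platform, which is exactly why the per-vendor iGPU/dGPU handling currently lands on the application:

```cpp
// Minimal OpenCL device-enumeration sketch: lists every GPU the runtime
// exposes (AMD, Intel, NVIDIA each ship their own platform), which is how
// an application today has to discover iGPU vs. dGPU by hand.
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    if (numPlatforms > 16) numPlatforms = 16;
    cl_platform_id platforms[16];
    clGetPlatformIDs(numPlatforms, platforms, nullptr);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        char platName[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(platName), platName, nullptr);

        cl_uint numDevices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, nullptr, &numDevices) != CL_SUCCESS)
            continue;  // this platform has no GPU devices
        if (numDevices > 16) numDevices = 16;
        cl_device_id devices[16];
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, numDevices, devices, nullptr);

        for (cl_uint d = 0; d < numDevices; ++d) {
            char devName[256] = {0};
            cl_bool unified = CL_FALSE;  // CL_TRUE usually indicates an iGPU sharing system RAM
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(devName), devName, nullptr);
            clGetDeviceInfo(devices[d], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(unified), &unified, nullptr);
            std::printf("%s : %s (%s)\n", platName, devName, unified ? "integrated" : "discrete");
        }
    }
    return 0;
}
```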
 


I'd be a happy man if some engineer out there could find a way to make the time it takes to access main memory near zero, so we can start to do away with CPU cache, since it's just a really expensive [in terms of space/power] memory speed hack. Same concept as GPU VRAM basically; keep a large cache of RAM on the die so you don't have to constantly access main memory.

I've always argued the L3, for most tasks, is just wasted space on the die. It isn't significantly faster than accessing main memory normally would be. Might just be better off enlarging the L2 really...
 
I'd be a happy man if some engineer out there could find a way to make the time it takes to access main memory near zero, so we can start to do away with CPU cache, since it's just a really expensive [in terms of space/power] memory speed hack. Same concept as GPU VRAM basically; keep a large cache of RAM on the die so you don't have to constantly access main memory.

I've always argued the L3, for most tasks, is just wasted space on the die. It isn't significantly faster than accessing main memory normally would be. Might just be better off enlarging the L2 really...

Not possible to get it to near zero, but they can definitely do better than what they do now. Mostly it's the requirement for general-purpose, expandable memory buses that holds it all back. You could, in theory, implement a memory array consisting of a few chips wired directly onto the motherboard. Each chip would have to be its own terminated bus to the CPU's memory controller instead of the daisy-chained method we have now. Daisy-chaining the chips is the #1 reason we have the 7 ns refresh barrier: the electrical signal needs to get past the last chip on the last module in the chain before you're able to send another signal. When it's just CPU -> chip you can get crazy low latencies and tight timings, but you're sacrificing tons of space as the amount of memory per module is limited. Samsung has some 8Gb chips, so four of them in parallel would net you a crazy-fast 4GB of memory, and they say they can make 128Gb modules eventually once their 3D stacking technology is market ready.

Of course, that all has to be soldered onto the board. The moment you introduce some sort of electrical interface you lose a lot of that tight timing, as your signal paths are now longer and rely on touching metal, giving a higher signal loss rate.
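As an aside, the L3-vs-DRAM question a couple of posts up is easy to check on any given chip with a pointer-chasing loop. A rough sketch (buffer sizes and iteration counts are only illustrative, and it measures latency, not bandwidth):

```cpp
// Pointer-chase latency sketch: walks one big randomly ordered cycle so the
// prefetcher can't help, once with a cache-sized working set and once with a
// DRAM-sized one. The per-hop time difference roughly shows cache vs. DRAM latency.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

static double chase(std::size_t elems, std::size_t hops) {
    // Build a single Hamiltonian cycle over the elements in shuffled order.
    std::vector<std::size_t> order(elems);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    std::vector<std::size_t> next(elems);
    for (std::size_t i = 0; i + 1 < elems; ++i) next[order[i]] = order[i + 1];
    next[order[elems - 1]] = order[0];  // close the loop

    std::size_t idx = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < hops; ++i)
        idx = next[idx];                // each load depends on the previous one
    auto t1 = std::chrono::steady_clock::now();
    volatile std::size_t sink = idx;    // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / hops;
}

int main() {
    // ~256 KB usually fits in L2/L3; ~256 MB forces trips out to DRAM.
    std::printf("cache-resident: %.1f ns/hop\n", chase(256 * 1024 / sizeof(std::size_t), 10'000'000));
    std::printf("DRAM-resident:  %.1f ns/hop\n", chase(256 * 1024 * 1024 / sizeof(std::size_t), 10'000'000));
}
```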
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780
I've been reading the AnandTech and ExtremeTech reviews of Kaveri. The GPU side seems good, but on the CPU side both sites seem rather disappointed; it seems Kaveri is a little bit faster than Richland in multithreaded workloads and slower in single-core performance, and both sites claim it isn't even competition for the i3.

So... what is your final verdict on Kaveri, you all?

I think I'll be sticking with my 980BE C3 at 3.91GHz.
 

$hawn

Distinguished
Oct 28, 2009
854
1
19,060
Gosh! Kaveri's CPU side is a joke! It's actually worse than an A10-6800K in some cases. The GPU side also isn't doing anything dramatic, and it's probably being choked by limited memory bandwidth.

The only plus point is the great improvement in power consumption. The 45W APUs really shine. Guess Kaveri's maximum potential will be realized in laptop form factors only. On the desktop side, AMD is now completely dead!!
 
*waits for apology*

Also note, in the gaming benchmarks, it looks like when the res is turned down and settings lowered, Richland tends to win. Once you crank everything up, putting much more emphasis on the GPU portion, Kaveri starts to jump ahead, sometimes significantly. [Never mind that most titles aren't playable at those settings.] The 45W offerings look significantly improved as a result of the stronger GPU.

Which tells me the following:
1) The GPU is significantly stronger than previous generations [no major shock here]
2) CPU performance is basically flat; IPC gains offset by a clock decrease

CPU-wise, Anand put it best:

In the broader sense however, Kaveri doesn't really change the CPU story for AMD. Steamroller comes with a good increase in IPC, but without a corresponding increase in frequency AMD fails to move the single threaded CPU performance needle. To make matters worse, Intel's dual-core Haswell parts are priced very aggressively and actually match Kaveri's CPU clocks. With a substantial advantage in IPC and shipping at similar frequencies, a dual-core Core i3 Haswell will deliver much better CPU performance than even the fastest Kaveri at a lower price.

(Done editing this now I think)
 


Why would you be replacing a dCPU with an APU in a desktop system? The FX6350 / 8350 both beat our (I own one too) 980BEs pretty badly.

AT did their usual and tried to sell you on an Intel solution, both by including a $300+ USD CPU with a cheap GPU and by including the rare and expensive Iris Pro. So even with their usual bias Kaveri came out pretty good, for what it's designed for. It's solid for laptops and SFF desktops.

You'd put it inside something like this,
http://www.mini-box.com/M350-Enclosure-WITH-PICOPSU-150-XT-and-150W-Adapter-KIT

You can get a 120W PSU or a 150/160W PSU depending on what else you cram in there. Drop in two sticks of 4GB DDR3-2133 memory and a 2.5-inch HDD / SSD and you have a killer mini-ITX "console". I've posted pics of my Richland living room PC (used for playing games on my HDTV at 1280x720). Actually, that is a resolution that needs to be tested more often, as it's incredibly common for HDTVs to use it when dealing with an input signal over 30Hz.
 
what.the.hell.amd.
07%20-%20Kaveri%20CC.jpg

"up to 20% ipc uplift(average ~10%) blah blah"
avg 10% on the cpu? 10% ain't gonna cut it on desktops.
edit: this was one of the nda slides i hadn't seen before. still, 10% more on less power and a vastly superior igpu - not bad. it's "not bad" and not "great" because of the pricing, both retail and msrp.
 
Yeah, guys, Kaveri is a part that goes in laptops and $500 PCs. That's about it. CPU side, it loses out to i3s. I know this is going to make some people's heads explode, but that's all this CPU is: a cheap integrated solution.

Still, the Iris Pro benchmarks, while an odd choice at best, do show a disturbing point for AMD: at lower settings, Intel wins due to CPU performance. If Intel ever gets a decent iGPU, it's not that hard to believe they could push AMD out of even this market segment.

From TechReport:

We've seen this dynamic with previous APUs, and it's always made for a tough sell on the desktop. Gamers who actually care about graphics performance are better off with discrete video cards that deliver better visuals and smoother frame delivery, while those who don't care about gaming are better served by Intel chips with higher per-thread performance and lower power consumption (which typically leads to lower noise levels.) APUs occupy this awkward middle ground for so-called casual gamers who want something better than an Intel IGP but not as good as a halfway-decent graphics card. As Jerry Seinfeld would say, "who are these people?" Seriously, I've never met one.
 

sonoran

Distinguished
Jun 21, 2002
315
0
18,790
It sounds like AMD focused on making better chips for laptops this go 'round. Not what high performance enthusiasts want to hear, for sure. But from a business perspective it's probably exactly what they needed to be doing.

And gamerk316...I wouldn't hold my breath while waiting if I were you. ;)

 

Something smelled weird about those, as I have a 6800K sitting about 20 feet from me and it gets better frame rates than what they had. I believe the devil is in exactly which settings were set to what. As you and I both know, certain graphics settings have a large CPU requirement to set up and execute, while others have intense memory utilization. I'll see what Tom's puts out and look for the differences in performance profiles, as this is something I've noticed on my 6800K, and even my older HP A8-3550MX.

@everyone else

http://www.asrock.com/mb/AMD/FM2A88X-ITX+/

Really nice board to pair up with the 65W version in that M350 chassis I linked earlier. Should fit a Noctua NH-L9a for a quiet living room casual gaming box. Put a USB wireless keyboard / mouse dongle on the hidden front USB port and it'll give a clean look.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680
The sad thing isn't that it doesn't match up to an i5, as I think only Juan here was expecting that. The sad thing is that it isn't an obvious upgrade over Richland. It's better at some things, worse at others. Reminds me of BD over PII.
 


I could see that happening. The GPU is more powerful this go-around, and since all settings aren't created equal... In short, the more CPU power used, the better Intel looks in comparison. Hence why benchmarking should be done using presets and max/min settings ONLY; otherwise you get unreproducible results across different sites.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780


Simple: Richland is already 20% faster than a stock 980BE in single-core and about 10% faster in multi-core, so I was expecting Kaveri to improve single-core performance close to i5 levels... I was wrong. Kaveri is a step backwards in single-core compared to Richland, and the multithreaded improvement is an average of 10% over Richland. On the CPU side Kaveri is a complete FAIL; the GPU side is good, but it is being choked by the RAM bandwidth just as Gamer predicted. Seems he was right the whole time.
 