AMD CPU speculation... and expert conjecture

Page 428
Status
Not open for further replies.
Looks like DX12 is focusing on fixing Driver latency related issues; don't I love being right?

http://www.fudzilla.com/home/item/33638-maxwell-is-not-directx-12-next-compatible

From what we heard, DirectX Next actually fixes a lot of latency related issues that are present in DirectX 11 and earlier versions. The new DirectX should have lower driver latency something that developers have complained for quite some time but we are not aware of any major feature set that will come with the DirectX Next.
 

Master-flaw

Honorable
Dec 15, 2013
297
0
10,860
Still praying for Mantle...
Not too concerned about GPU upgrades a year or two down the line... My Xfire setup should last a bit longer than that.
Though it was pretty much a given that DirectX 12 would address such things.
 

Magiks

Honorable
Jan 13, 2014
5
0
10,510
I've been following this thread for a long while now... but I decided to create an account to post this...

Besides being a cheap gaming platform,

Kaveri solves these problems:

redundant copies of texture data stored in memory (hUMA)
high latency for GPGPU work (HSA)

I think the main purpose of the "GPU cores" on APUs is not only rendering graphics, but also making full use of current GPU technology to perform parallel calculations and speed things up. Physics calculation is one example.

"GPGPU? I bet my dGPU can perform more calculations than an iGPU!"

A large dGPU can certainly perform more calculations than an iGPU, but every time you need to do a calculation on a dGPU you first have to copy your data to the dGPU's RAM and then copy the results back once the calculation is done. That means redundancy and latency.
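A rough way to put numbers on this trade-off is a back-of-the-envelope cost model. The sketch below is only an illustration: the function names, the 8 GB/s bus, the 10 microsecond per-copy latency and the "iGPU is 4x slower" factor are all assumed figures, not measurements of any real Kaveri or dGPU setup. It also assumes the results copied back are the same size as the input.

```python
def dgpu_time(data_bytes, compute_s, bus_bytes_per_s, per_copy_latency_s):
    """Discrete GPU: copy the data in, compute, copy the results back."""
    one_copy_s = data_bytes / bus_bytes_per_s + per_copy_latency_s
    return 2 * one_copy_s + compute_s


def igpu_time(dgpu_compute_s, slowdown):
    """Integrated GPU with shared memory: no copies, but slower compute."""
    return dgpu_compute_s * slowdown


# Assumed, illustrative numbers (not benchmarks).
BUS = 8e9          # bytes per second across the bus
LATENCY = 10e-6    # seconds of fixed overhead per copy
SLOWDOWN = 4       # iGPU takes 4x as long as the dGPU on the same kernel
DATA = 1_000_000   # 1 MB of input, same amount of output

for label, compute_s in [("short job", 50e-6), ("long job", 1e-3)]:
    d = dgpu_time(DATA, compute_s, BUS, LATENCY)
    i = igpu_time(compute_s, SLOWDOWN)
    winner = "iGPU" if i < d else "dGPU"
    print(f"{label}: dGPU {d * 1e6:.0f} us, iGPU {i * 1e6:.0f} us -> {winner}")
```

With these made-up numbers the iGPU wins the short job and the dGPU wins the long one: the copies only dominate when the calculation itself is small.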

Clothes phasing through the body, anyone? I think Kaveri's HSA could shine here: the iGPU can be used to calculate all the dynamic vertices and pass them to the dGPU while still delivering good frame rates compared to having the CPU do the calculation...


Above is my assumption of how the current technology works, may not be 100% accurate and not well articulated... lazy...
 

Rum

Honorable
Oct 16, 2013
54
0
10,630


Well if it turns out to be true then Mantle served us all greatly in that it pushed M$ to actually work on and fix their crappy API! But Mantle will still serve a purpose because it will be multi platform compatible.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680
Open question then: Will the software / driver know which bit of work should be done on the iGPU and which should be done on the dGPU?
 

i was okay with it with llano since it was the first one (well, not technically. but that's how i see it). knowing about the ram bottleneck, i was disappointed with trinity because i was expecting amd to improve pd's imc to improve 7660D's performance. kaveri 7850k's igpu will very likely have higher bw to be fed by the memory. if the igpu gets over 30GB/s @max, it'll perform very well. that's over 6-10 gigs the current imc can extract. i am more anxious after seeing how much haswell can squeeze out of ram.

the current benchmarks are making me wonder if hsa-accelerated/assisted programs will be memory bottlenecked. from what i've read, ddr4 isn't looking to improve bandwidth, rather power use - which will be more significant in servers than in desktops.
the rumor of kaveri possibly having GDDR5 support made me ecstatic. it woulda been worth it, especially in laptops and bga desktop solutions. since laptop oems tend to cut as many corners as possible, they might use ddr3 1333-1600, or a single dimm, lowering igpu performance further. then there'd be the thermal restrictions.
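For reference, the theoretical peak figures behind this discussion follow from the standard DDR formula: transfers per second, times 8 bytes per 64-bit channel, times the number of channels. A quick sketch, where the function name and the list of configurations are only for illustration and the speed grades are nominal:

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# (label, nominal MT/s, populated channels)
configs = [
    ("DDR3-1333, single DIMM", 1333, 1),
    ("DDR3-1600, dual channel", 1600, 2),
    ("DDR3-1866, dual channel", 1866, 2),
    ("DDR3-2133, dual channel", 2133, 2),
    ("DDR3-2400, dual channel", 2400, 2),
]

for label, rate, channels in configs:
    print(f"{label}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s peak")
```

On paper, dual-channel DDR3-1600 tops out at 25.6 GB/s and only DDR3-2133 and above clears the 30 GB/s mark. Sustained bandwidth in practice is lower than these peaks.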

tomorrow we'll all find out more, i guess. let's hope amd doesn't split up reviews like the last time. it'll send a bad message.
 


"Multi-platform", meaning PC only, limited to just GCN based AMD GPU's.
 


This is a question I've been trying to get answered, to no avail.

What I think is going on, is that, when by itself [no dGPU], the GPU part of the APU is going to be treated exactly the same, software wise, as a dGPU would. When paired with a compatible dGPU, then you'd get a form of hybrid-CF, which has been around for some time, and would be controlled via the GPU driver.

Now, when you have the APU paired with, say, an NVIDIA GPU, would there be any performance advantage? That's what I'm not sure of, since I've heard ZERO implementation details.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780



Yeah, not even 9% more IPC compared to Trinity, while it's supposed to be up to 20% against Richland. Skyrim is one of those games that benefit from either more speed or higher IPC in CPUs, so Skyrim should be a very, very good way to benchmark Kaveri against Trinity in SingleCore performance.

http://media.bestofmicro.com/0/N/387527/original/F1-2012.png

So yeah... Richland is 11% faster than Trinity in Skyrim, and Kaveri having a lead of less than 9% against Trinity is LAUGHABLE.

I suck at math, but even I know that Kaveri having less than a 9% lead over Trinity in Skyrim puts it behind Richland, which leads the same Trinity by 11%.
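The arithmetic here is just a ratio of two leads over the same baseline. A minimal sketch, using the two percentages quoted in this post (the 9% is an upper bound, since the claim is "less than 9%"); the function and variable names are made up for illustration:

```python
def relative_performance(a_vs_base, b_vs_base):
    """How A compares to B when both are expressed relative to one baseline."""
    return a_vs_base / b_vs_base


richland_vs_trinity = 1.11  # Richland 11% faster than Trinity in Skyrim
kaveri_vs_trinity = 1.09    # Kaveri at most 9% faster than Trinity

kaveri_vs_richland = relative_performance(kaveri_vs_trinity, richland_vs_trinity)
print(f"Kaveri vs Richland: {(kaveri_vs_richland - 1) * 100:+.1f}%")
```

Taking those figures at face value, Kaveri would land roughly 2% behind Richland in this one game, at best.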

I may just pick a Kaveri if it offers at least 10% more IPC in SingleCore performance compared to Richland. I need the SingleCore performance because I mod and play Skyrim and Oblivion heavily, plus other games such as Saints Row 2, which is another SingleCore-based game.

Not too interested in modern games, even less the FPS war games... so MultiThreaded performance is not much of a concern for me; those are just my needs.






I've also been wanting to know whether any application will benefit from hUMA, or whether the application needs to be written specifically to take advantage of hUMA.
 


You've basically just summed up the "Bandwidth vs Latency" discussion.

For GPUs, since they number crunch a LOT of data at any one time, bandwidth is more important to performance than latency is, since you may have several hundred MB worth of data to pump across. That's why dGPUs have, what, 2GB+ of high bandwidth RAM right on the card now? It's generally faster to pump a lot of data across the bus once, rather than a little bit at a time. Hence all the discussions on RAM speeds choking Llano/Kaveri GPU performance.
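The "once, rather than a little bit at a time" point can be sketched with a simple model where every transfer pays a fixed latency and then streams at the bus bandwidth. The 8 GB/s bandwidth, 10 microsecond latency and 4 KB chunk size below are assumed round numbers for illustration only, and the function name is hypothetical:

```python
def transfer_time_s(total_bytes, n_transfers, bandwidth_bytes_per_s, latency_s):
    """Total time to move total_bytes split evenly across n_transfers."""
    return n_transfers * latency_s + total_bytes / bandwidth_bytes_per_s


TOTAL = 256 * 1024 * 1024   # 256 MB of data to move
BANDWIDTH = 8e9             # assumed bus bandwidth, bytes per second
LATENCY = 10e-6             # assumed fixed cost per transfer, seconds
CHUNK = 4096                # assumed size of each small transfer, bytes

one_big = transfer_time_s(TOTAL, 1, BANDWIDTH, LATENCY)
many_small = transfer_time_s(TOTAL, TOTAL // CHUNK, BANDWIDTH, LATENCY)

print(f"one big transfer:     {one_big * 1e3:.1f} ms")
print(f"many small transfers: {many_small * 1e3:.1f} ms")
print(f"ratio: {many_small / one_big:.1f}x")
```

In this model the bandwidth term is identical in both cases; the whole difference is the per-transfer latency being paid 65,536 times instead of once.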

Here's the big HSA question I've yet to see anyone answer: Say you have a Kaveri APU, and NVIDIA GPU. You have some application which can take advantage of compute functions [say OpenCL], and is also coded to support HSA. On THIS particular setup, will the iGPU in the APU have any benefit toward performance?

The issue here is a fairly significant one: Windows only supports one GPU driver at a time, and you can bet NVIDIA isn't going to support a config with AMD GPUs. So how the heck is Windows going to know to use the iGPU of the APU? There'd have to be dedicated HW inside the CPU itself to handle the scheduling, since the iGPU cores would be invisible to the OS as far as scheduling threads is concerned. This implementation, though, opens up the use case where iGPU cores are used in a way they shouldn't be, reducing performance [Remember: Individual GPU cores are very, very weak]. It's these implementation details I still want more information on. [When using an AMD/AMD config, you could handle this case via GPU drivers. Intel/NVIDIA configs are also handled GPU driver side. It's the AMD/NVIDIA case I find interesting...]
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780
I bet that Kaveri would have been sweet if AMD had managed to match Richland's speeds of 4.1GHz default and 4.4GHz turbo. The 20% architectural improvements are surely there, since AMD managed to push almost the same performance as Richland with 15% less clock speed... Damn you GloFo.
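As a sanity check on that reasoning, a first-order model treats performance as IPC times clock speed. This is only a sketch: real workloads also depend on memory, turbo behavior and so on, and the function names are made up for illustration. The 20% and 15% figures are the ones quoted in this post.

```python
def relative_perf(ipc_ratio, clock_ratio):
    """First-order model: performance scales with IPC x clock."""
    return ipc_ratio * clock_ratio


def ipc_needed_to_match(clock_ratio):
    """IPC ratio required to break even at a given clock ratio."""
    return 1.0 / clock_ratio


ipc_gain = 1.20      # claimed ~20% architectural (IPC) improvement
clock_ratio = 0.85   # ~15% lower clock than Richland

print(f"net performance vs Richland: {relative_perf(ipc_gain, clock_ratio):.2f}x")
print(f"IPC gain needed just to match: {(ipc_needed_to_match(clock_ratio) - 1) * 100:.1f}%")
```

Under this model a 20% IPC gain at 15% lower clocks nets out to about 1.02x, which is consistent with "almost the same performance", and matching Richland at those clocks already requires roughly an 18% IPC gain.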

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Of course the iGPU is still bottlenecked. Richland was bottlenecked and it's running the same speed memory (officially). There are more GCN cores so it should be supporting 2400 by default but it's not.

As for HSA you're unlikely to see very much support yet. It's still an emerging standard. So far the OpenCL performance looks a great deal better due to GCN. This Raytrace one showing 2.58x faster. With more programs supporting OpenCL this can be a great advantage going forward.

A10-7850K-OpenCL.jpg


http://wccftech.com/amd-kaveri-a10-7850k-overclocked-45-ghz-benchmarked-a105800k/

Overclockers may have some fun if they can push it to 4.5GHz/900MHz on water like this reviewer.
 

i was thinking, like eyefinity or quick sync, being supported by at least 2-3 widely used softwares e.g. handbrake transcoding, compression softwares scaling to all "12 compute cores", photoshop/gimp/blender, 3ds max etc. but not like vce, the hackneyed encoding quality and perf instead of the promised cool stuff. details will be in the upcoming rant.

ah that preview. i hope that the reviewer was using liquid cooling just to be safe instead of being forced to use it. at first glance it reminded me of reading ivb o.c. analysis a year after sandy bridge came out. :p it's one thing to push leaky pre-release silicon (possibly on an open test bench) and another thing to o.c. retail parts in a retail component-built pc. learn from intel's cruel pranks, customers.
speaking of overclocking, i think sarinaide woulda been here gushing about kaveri performance by now (by now i mean for 2-3 months before launch). i always enjoyed reading his o.c. endeavors.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


The design improvements for Kaveri increase multi-core "IPC" not really single core "IPC". Having dual decoders does nothing for a single core. AMD is going after efficiency and multi-thread performance.

Intel is going to lead in single core "IPC" as they have a very wide 7 port execution engine (including 4 ALU).
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


It depends on what you're going for. I doubt a memory overclock will require water cooling. If that's the main bottleneck then 2400/2666 may be fine on air with a good memory kit. I think I read somewhere the review kits shipped with 2666 memory.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780



Sadly I may not have a use for Kaveri then... I will wait for the big-site reviews, just a few more hours to go, in case Kaveri does have the good SingleCore performance that I need.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
noob2222 What you say is untrue. I took all the CPU benchmarks except the two CB ones, for the reason explained before.

Ranth I already told you which principle: the principle of locality, which was identified in the DARPA report about challenges at the exascale compute level. Let me ask you a pair of questions. Do you really believe that all the engineers working at AMD, Nvidia, Cray, Intel... have selected APUs for their fastest designs because they are unaware that dCPU+dGPU is better? Or do you believe all of them know that the APU is better at that scale? Think before replying.

gamerk316 I am not assuming what you believe. In fact the principle of locality is closely tied to power consumption. That is why I mentioned the "power wall" identified at the exascale level before. Yes, current memory subsystems also violate locality. That's why the exascale-level designs include novel memory architectures and new programming paradigms. Nvidia is developing extensions/modifications to CUDA, and AMD's design follows the lines of their HSA approach.

etayorius Not only do I not know why you insist on saying that Steamroller is 6-9% faster, when CPU benchmarks show about 20% on average (IPC), with several benchmarks where it is above 30% (IPC); I also don't know why you have avoided this part of the Spanish review, which I quoted above

we just have the announced and expected results.

If you are only interested in Skyrim (a game sponsored by Intel) and in "SingleCore" applications and you are not interested in MultiThreaded then an Intel chip (an i3?) must be your best option.
 

tracker45

Distinguished
Jan 23, 2012
409
0
18,790


The real benefit of APUs is that they are quieter than a graphics card and CPU combo.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I expect 10% more IPC for single core in the best case, but with lower clocks Kaveri will be as fast as or slower than Richland in single core. I believe that you missed all the explanations in my BSN* article about the improvements made in the Steamroller module, and also this old slide

excavator.png


Pay attention to the text below Steamroller: "greater parallelism" ==> increase multi-core IPC.

Apparently Excavator will improve single core execution with more ALUs, wider FPUs... but this is topic for another thread ;-)
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780




I have looked into 3D rendering with the GPU, and the GPU renderers always have fewer features than the CPU versions. I've tried Cycles with Blender and Mental Ray with Nvidia. They were all missing really important features that we need, like texture bake.

It's been my personal experience that GPU accelerated versions of applications are always lagging behind the traditional one in features, sometimes by a large margin.

Which is why I'm not falling over for HSA and still angry about not getting a traditional HEDT platform. Yes, HSA will solve a hardware problem, but it introduces a new software problem.

I'm optimistic about Mantle because AMD seems to have done everything it can to make sure that it's as easy as possible to port things over, but you really don't seem to get that so far with just OpenCL. Hopefully HSAIL will change that but it's still more than likely a ways off.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


DARPA funds lots of research. Only a fraction of it ever gets made into a shipping product. Research is just that, research. Every year I read about 10 new battery breakthroughs and maybe one of them will see the light of day. The devil is in the details. Something always knocks them out of production.
 

yes. and that's exactly what needs changing. since amd is in the forefront of hsa, they should make the push to use mainstream solutions using softwares like these. just to the point where mainstream users get to use them as regular desktop softwares. professional users will always use specialized platform, so, it shouldn't hurt revenue. softwares like these are the ones that push for more performance (apart from games) so hsa and amd stands to benefit. day-to-day usage softwares that perform basic tasks won't benefit as much from hsa. otherwise it will be like bd/pd where people simply ignored multicore cpus and went for intel and amd apus. (kaveri is an apu, but amd is hyping it as a "12 compute core" apu (8 of those will never hit cpu-class clockrates at stock).)
showing decent software support at launch would be highly beneficial for amd. this is part of the reason why i disliked amd giving exclusive use of mantle to ea dice. i understand the reasons but in the end it didn't benefit amd much (yet). we've been hearing about mantle improving bf4 performance for months, yet the whole thing sounds like another vaporware. if multiple vendors could display support for actual existing games, it'd be much better for reputation and credibility.
 