AMD CPU speculation... and expert conjecture

Status
Not open for further replies.
@juanrga:

1.) Where does AMD say they're replacing FX with an APU at all? You did not answer the question.

2.) Where did it say specifically 20nm bulk? You did not answer...

3.) You quoted tech websites...not AMD directly. That's the same as saying your speculation will come to pass...(oh wait! you're doing that!). Again, question is unanswered.

I await your quotes from AMD directly and the links to the quotes for verification.
 


If AMD is getting 50% idle times, then they need to fix their drivers.



Max theoretical, assuming no internal/external bottlenecks exist. Absolute increases will be lower, obviously.

FX-8350: 256 GFLOP
Kaveri: 856 GFLOP = 118+737 GFLOP
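Those peaks come from the usual formula, units × clock × FLOPs per cycle. A minimal sketch, assuming an FX-8350 at 4.0 GHz with 8 FLOPs/cycle per core, Kaveri's CPU at 3.7 GHz, and Kaveri's 512 iGPU shaders at ~720 MHz doing 2 FLOPs (one FMA) per cycle:

```python
def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak throughput: execution units x clock x FLOPs per cycle."""
    return units * clock_ghz * flops_per_cycle

fx8350      = peak_gflops(8,   4.0,  8)   # 256.0 GFLOPs
kaveri_cpu  = peak_gflops(4,   3.7,  8)   # 118.4 GFLOPs
kaveri_igpu = peak_gflops(512, 0.72, 2)   # 737.28 GFLOPs (FMA counts as 2)

print(fx8350, kaveri_cpu, round(kaveri_cpu + kaveri_igpu, 1))
```

The CPU + iGPU total comes out at ~856 GFLOPs, matching the figure above.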

On the CPU side, you have a 2x edge in favor of the 8350, so in purely CPU benchmarks, FX is about twice as fast, again, assuming no bottlenecks exist. So figure ~25%-33% edge in favor of FX when factoring that in.

Throwing in the GPU, you have about a 3.5x max theoretical edge, again assuming no bottlenecks. That being said, pair FX with a GTX 660 or something, and the equation changes back in favor of FX. So without knowing what GPU FX is paired with, the second number is kinda useless, since you have NOTHING to compare against.

That's the point Juanrga keeps skipping over: FX WILL be paired with a GPU of some sort, so you MUST factor that in when equating performance. The real issue, as 8350rocks points out, is power/cost. Obviously, Kaveri platform is cheaper, but an FX-8350 paired with a $200 GPU will still be significantly faster.

So to answer the unasked question: In games, Kaveri will roughly equate to Core i3 performance when the latter is paired with an equivalent GPU. So it puts downward pressure on the i3, but that's about it.
 


As explained in my BSN* article, those formulas provide "the maximum floating point performance".

It is the maximum allowed by the architecture: the maximum allowed by the GCN cores and the maximum allowed by the FPU in the module. Recall that the FPU in Steamroller is, performance-wise, the same FPU used in Piledriver.

That doesn't mean the above maximums are the values obtained in practice for any given code. The effective GFLOPs depend on the rest of the architecture (e.g. the front-end's ability to feed the FPU) and on the developer's ability to extract the maximum possible from the silicon.

That said, the GFLOP number gives a good idea of how Kaveri will outperform an FX-8350 when using HSA software. In my BSN* article I included an HSA benchmark:

[Image: HAAR-face-detection-kaveri-pre.png]


An FX-8350 would score less than 20.
 


1) Answered before. I repeat: they are not replacing FX with APUs now but extending FX into 2014. However, they will replace FX in the future. Lisa Su has already stated AMD plans to dominate the market from phones to servers using APUs. I gave the quote before.

2) Nor did she explicitly mention the word "bulk" when discussing the current products made at 28nm: "We are fully top-to-bottom in 28nm now".

3) I quoted tech websites that reproduce what AMD has said. One of the quotes was: "The demand for greater parallel computing capabilities is building through all levels of computing, from mobile devices and PCs to cloud servers and high-performance computing systems, she said."




The same happens with Nvidia cards. Or do you believe that the Titan/780 are 2x faster than the 290X? Because they aren't.



An FX paired with a GPU is not an HSA-enabled system and will not run HSA software the way the Kaveri APU does.

Moreover, both of you forget that the HSA APU can be paired with an HSA-enabled dGPU to make a more powerful HSA system.
 
Leaked details on A8 Kaveri APU

http://translate.google.it/translate?sl=it&tl=en&js=n&prev=_t&hl=it&ie=UTF-8&u=http%3A%2F%2Fwww.bitsandchips.it%2F9-hardware%2F3641-specifiche-della-apu-a8-76x0k-con-moltiplicatore-sbloccato

The link is also interesting for the next part:

This information comes from the same source who confirmed the halt of development of new chipsets for Socket AM3+, the death of Socket AM3+, the 512SP maximum for Kaveri, and the elimination of the IMC dedicated to GDDR5. It is therefore a reliable source.
 
The same happens with Nvidia cards. Or do you believe that the Titan/780 are 2x faster than the 290X? Because they aren't.

NVIDIA and AMD have ALWAYS taken different approaches to GPU design; NVIDIA favors stronger shaders, AMD favors memory bandwidth. This goes back to when DX5/DX6 were not very memory-intensive APIs (which ATI pressured MSFT to change in DX7/8, hence why NVIDIA's FX 5000 series did so badly). NVIDIA could have other bottlenecks not related to draw call overhead. Likewise, AMD could be twice as powerful as NVIDIA but be held back by software.

What I can say is, in the games I play, I'm seeing >85% usage, so no, my 770 isn't idling 50% of the time.
 


The issue is, HSA is going to do nothing in heavy rendering/compiling scenarios. Sure, it might make Photoshop faster...though I would challenge you to show me a design house doing 3D modeling using Photoshop on a wide scale.

You won't find one, because Photoshop is for 2D artists. Now, Autodesk products for 3D modeling...my guess is they aren't going to favor something like HSA just yet. It's too polarizing for the large number of design houses working on Intel machines with NVidia GPUs.

So don't expect HSA to catch on like you think until well after the launch and likely the next generation has released.

Is it innovative? Sure, I cannot fault AMD for their forward thinking.

Is Kaveri an improvement over Richland? Sure, though they very heavily missed the target performance, and I have word as to why that happened (the commentary has been provided already in a previous post).

The issue I see here, is that you are buying all the hype, and I am hearing other things from AMD.

Do they expect it to sell well? Sure, for mainstream PCs. Laptops and OEM desktops.

However, they do not consider it to be HEDT. Your posts manipulating comparisons by including the GPU aspect of Kaveri are entirely misleading. We are discussing CPU-intensive tasks, and last time I checked, 256 > 118 by 100+%. So Kaveri is not something I would put into any graphics workstation or other productivity machine in my office. Until that day comes...Kaveri is not High End. It may be high end for all-in-ones, laptops, tablets, or whatever else they want to put it into. However, it is far and away less raw horsepower than the 8350, and will never exceed the performance of the 8350 in CPU-bound tasks.

 

If theoretical GFLOPs are all that matters, why keep the CPU and instead just use a GPU?

If we are only looking at GFLOPs to measure performance, then we might as well just get rid of the CPU entirely and replace it with GCN cores as the CPU is just wasting die space for low FLOP parts of the chip.

I realize I am going about this in a roundabout kind of way, but you and I both know a system without an x86 CPU isn't going anywhere, because not all workloads can scale to GPU cores.

If you are going to include GPU GFLOPs in APU calculation and Mantle is going to work across dGPUs, why can't I use my Tahiti chip for my total GFLOP count in my rig?

Then I have 318 GFLOPs for my CPU, and my overclocked Tahiti (which is close to Hawaii with a ~40% overclock) has 5,324.8 GFLOPs.

So my rig has 5,642.8 GFLOPs. Suddenly your comparison between FX and APU, with the APU winning, doesn't look like such a winner now.
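Those totals check out with the same peak-FLOP arithmetic, taking the stated 318 GFLOPs CPU figure and assuming the overclocked Tahiti runs at ~1.3 GHz (2048 shaders × 2 FLOPs/cycle):

```python
cpu_gflops    = 318.0              # stated CPU peak
tahiti_gflops = 2048 * 2 * 1.3     # 5324.8 GFLOPs at an assumed ~1.3 GHz
rig_gflops    = cpu_gflops + tahiti_gflops

kaveri_gflops = 856.0
print(round(rig_gflops, 1), round(rig_gflops / kaveri_gflops, 1))
```

This reproduces both the 5,642.8 GFLOPs total and the ~6.6x ratio quoted above.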

It was obvious from watching APU13 AMD has intentions of bringing all of this to dGPUs.

Just to put this into perspective, my system has about 6.6 times more theoretical GFLOP performance than Kaveri.

And yes, I realize you can add a dGPU to Kaveri, but if you add a high-end one, the iGPU's performance is so small it is almost irrelevant in comparison.
 


Nvidia has just presented a new GPGPU with unified memory addressing (aka Nvidia's version of AMD's hUMA):

http://www.brightsideofnews.com/news/2013/11/18/nvidia-launches-tesla-k40.aspx

[Image: UnifiedMemoryCUDA_689.jpg]
 


What about SolidWorks, Pro/E, and Catia? I believe they take advantage of the GPU.
 
i think people are focusing on the wrong strength of kaveri. kaveri is amd's first real sign of shifting fp calculations to gpus, if the leaked bench showing 16% fp regression and ~30% higher integer performance is true. kaveri already has the igpu, so as long as amd can use the igpu as an fp co-processor, it shouldn't be a problem. the problem will be how software sees this new system and how it can properly take advantage of it. although i wonder how the igpu will perform if two or more tasks try to use it simultaneously... like playing games while running some kind of avx code-type-thingies (that's one of the first fpu-taxing tasks google told me ^_^).

although, the fp regression poses a problem for the cpu derivative of berlin compared to other cpus (fx).
 


But the way AMD is IMPLEMENTING it, they are leaving it to the developers to move the processing over. And for serial FP tasks, there won't be a benefit to moving the processing over.

So the FP performance? It IS going to show up in benchmarks.
 


I find it curious that several talks about HSA and CAD were given at APU13.

What "heavily missed target performance"? Overall GFLOPs? Sure, because the drop of GDDR5 support forced AMD to reduce the iGPU frequency. It has nothing to do with bulk vs. SOI as you claim, because the iGPU and the 7750 are both bulk. It is memory bandwidth all along.

At the same time, AMD showed the iGPU running BF4 at least 2x faster than a GT 630, which surprised more than a few, because they were expecting less performance.

I have given you quotes from AMD, including its vice president, that contradict you.

I compared Kaveri to CPUs using ordinary CPU workloads, using HSA workloads and using MANTLE enabled workloads.

I have also shown non-HSA situations where Kaveri will outperform the 8350 CPU, but you insist on ignoring those.

The only manipulation here is you selectively picking what you want, posting misleadingly about it, and ignoring the rest.

No, Kaveri is not for tablets. As mentioned before in this same thread Beema is for tablets.



So anything I wrote beyond GFLOPs is ignored, I see.

Also, the total performance of 856+5000 doesn't seem irrelevant when compared to 256+5000, especially when the former includes HSA hUMA.
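For reference, the totals being compared work out as follows (assuming the same hypothetical ~5000 GFLOPs HSA-enabled dGPU in both systems):

```python
kaveri_system = 856 + 5000   # APU + dGPU
fx_system     = 256 + 5000   # FX + dGPU
print(round(kaveri_system / fx_system, 3))  # ~1.114
```

On paper that is an ~11% edge for the APU system; whether that (plus hUMA) matters is exactly the point being argued.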
 
Here's the issue Juan: in the one benchmark where Kaveri beats PD, PD was underclocked to 1.8 GHz. Which sounds good, until you factor in what happens when a SINGLE PD core gets over 100% work. Here's a hint: you take a giant performance hit. I'm guessing that's what happened here; basically, PD's design ended up bottlenecking when underclocked. If Kaveri were OC'd, rather than PD UC'd, then I'd expect the opposite (and expected) result of PD thrashing Kaveri.

So as always, you need to qualify your statement to "Kaveri beats PD when PD is underclocked to half its normal speed", also noting that the PD arch is not designed to run at those low frequencies. Or did we suddenly decide to get away from stock-vs-stock testing when it became convenient? We're talking performance, not architectural efficiency, right?
 


Exactly. Since day one (since AMD's acquisition of ATI), AMD's concept of the APU has been that of a heterogeneous compute processor, where the GPU is used as a giant FPU:

http://www.hardwarezone.com/feature-amd-fusion-new-era-computing-coming-soon/looking-ahead-fusion-platform

http://www.extremetech.com/computing/93046-the-dao-of-dozer-understanding-amds-next-gen-cpu

AMD [...] has long-term plans to move floating-point workloads towards the GPU with the Fusion architecture.

Kaveri is the first APU that fulfills that long-held dream, and yet some people continue believing that an APU is something for cheap gaming or cheap graphics. LOL

[Image: amd-hsa-2.0-explained-640x291.jpg]


 
Well, I do believe the lower GFLOPs are due to bulk not reaching the same clocks as FD-SOI. The formula uses the clock speed of each component (CPU and GPU) to calculate the FLOPs, so it's kind of obvious that the process DOES matter in terms of how the APU behaves now. An indirect effect, but it still sucks.

And along those same lines, not giving official Turbo speeds is really something to be wary of. They can't back out of the 4GHz mark after Richland broke it so easily. I doubt the initial batch of Kaveri parts will OC as well as Richland, and hence they will have lower performance after tweaking. I wonder how the GCN part will stack up against VLIW4.

Cheers!

EDIT: Looks like HSA software will save Kaveri. If it doesn't, then Kaveri starts to smell like a flop.
 


I would sure as heck hope so.

http://www.geforce.com/hardware/desktop-gpus/geforce-gt-630-oem/specifications

192 cores @ 875 vs 512 @ 700 ... I would hope that kaveri can be faster ...

Who in their right mind, in any shape or form, would expect less performance out of ~2.5x as much hardware? That's a dumb statement in itself.
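As a rough check on the "~2.5x as much hardware" figure, using the specs quoted above (192 cores @ 875 MHz vs 512 @ 700 MHz):

```python
gt630_throughput  = 192 * 875   # core-MHz product for the GT 630
kaveri_throughput = 512 * 700   # core-MHz product for the Kaveri iGPU

print(round(kaveri_throughput / gt630_throughput, 2))  # ratio of core-clock products
print(round(512 / 192, 2))                             # ratio of raw core counts
```

The core-clock product gives ~2.13x and the raw core count ~2.67x, so "~2.5x" is a fair ballpark between the two.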
 


Guess what? Because AMD is leaving it to developers to change the way they do things, an APU is going to be something for cheap gaming.

Now, if AMD simply routed all FP calculations to the APU, or had something trivial (say, a one-liner) to do this in code, then that would be one thing. Instead, AMD is forcing software updates, forcing new coding styles, and forcing something that won't be supported by the other giant in the market. Typically, that doesn't end well; anyone remember how many studios came out behind PhysX?

What I am afraid we are seeing is the beginning of an API war, which means we WILL be returning to the days of software incompatibility. I remember the days when Glide got all the graphical options, OpenGL had the high quality textures, and DirectX had lower quality textures, but hardware T&L and dynamic shadows. If you didn't have a Voodoo, you had reduced quality. That's where things are starting to head, and I don't like it one bit.

I'm *hoping* MSFT preempts all this by greatly expanding the DirectX API, preferably with a physics engine built in (so we get one standard adopted by everyone), and provides a mechanism for lower-level hardware access (which they can do better than anyone, since they can change the OS to make it work).
 


To be fair, NVIDIA's shaders have ALWAYS been faster than AMD/ATI's, but the number of extra shaders Kaveri has is significant enough that AMD is going to win that matchup.
 


which is why I was talking about whether or not it gets adopted. If HSA fails, Kaveri fails.
 
The bright side of APU13 is that it looks like AMD plans to compete on the desktop by delivering FX Piledriver at ever-lower prices, even if there is no FX SR.
 


first off, there is no proof of clock speed.

2nd, the score in this "benchmark" has been increasing constantly.

now it's 3311 integer and 653 fp.

http://www.cosmologyathome.org/show_host_detail.php?hostid=187215

there is no reported clock speed, and seeing as this started out at a 2544 integer score, what are the chances that this is really always running at 1.8 ghz?
 

How about the 8% by which they missed the CPU target, and the 24% by which they missed the GPU target?

They didn't hit the CPU target because of clockspeed, and they didn't hit the GPU target because of clockspeed.

Both of those factors correlate to several things:

Target TDP limits (influenced by bulk vs. FD-SOI)
Target power consumption limits (influenced by bulk vs. FD-SOI)
Clockspeed headroom (influenced by bulk vs. FD-SOI)

The issue is, bulk substrate means higher leakage, which means more power is required to reach the same clockspeed that can be achieved on FD-SOI because of the insulator.

This means, basically, that while Kaveri runs at 3.7 GHz at 95W TDP, Richland runs at 4.1 GHz + Turbo at 100W TDP or 10% faster on a less advanced process with larger/fewer transistors.

Now, your 20% performance gain just took a 10% hit because of loss of clockspeed. See the difference?
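The arithmetic above can be made explicit. Assuming Richland at 4.1 GHz vs Kaveri at 3.7 GHz, and taking the claimed ~20% per-clock gain at face value:

```python
clock_ratio = 4.1 / 3.7                   # Richland's clock advantage over Kaveri
ipc_gain    = 1.20                        # claimed ~20% per-clock improvement
net_gain    = ipc_gain / clock_ratio - 1  # what survives the clock deficit

print(round((clock_ratio - 1) * 100, 1))  # clock deficit in percent
print(round(net_gain * 100, 1))           # net per-core gain in percent
```

The clock deficit works out to ~10.8%, leaving roughly an ~8.3% net per-core gain out of the claimed 20%.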

High leakage in bulk substrate is why Intel uses so many tricks to make FinFET work on bulk. Like 3D transistors, and all the other super expensive to R&D stuff they use in their fab just to keep clockspeeds competitive on bulk process.

Additionally, bulk power consumption is far more temperature-sensitive at lower voltages than FD-SOI's. If you crank the vcore up on a Haswell chip, it gets hot fast. You can turn up the vcore on an FX chip and it doesn't require nearly as drastic cooling to run cooler than the Intel option.

These are all factors that you have not addressed and lead directly into the issues AMD had with bulk process.

They are also why I believe my contact mentioned directly that they would not pursue HEDT on bulk substrate, and would not be doing bulk beyond 28nm.

In the end, if AMD keeps this up, we all lose. FinFET is not the way of the future; it's many times more costly to develop FinFET on bulk, with all the technology Intel uses just to compete, than to take a simple planar UTBB FD-SOI process that competes without all the dog-and-pony-show tricks to get there.

But what do I know about CPUs? You clearly are the master of everything about CPUs, if nothing else because you said so, and no one else could possibly be right about something.

FX > Kaveri


 


Gate-last actually does make a difference; the back biasing and other advantages it affords help. However, the issue is that bulk substrate is just a generic silicon wafer, while FD-SOI (Fully Depleted Silicon On Insulator) has a thin buried layer of insulating material beneath the transistor layer. This has tremendous advantages over a standard silicon wafer.
 