Overclockability on these new APUs is subject to skepticism at this point. I have a feeling we will see Intel-esque variability on a wide scale (i.e. the odds of hitting the silicon lottery with the new APUs will be significantly lower than they are with Richland).
First, it was fear about AMD going bulk, then it was skepticism about clocks (your why-did-they-only-test-1.8GHz argument). Now that we know it clocks at a high 3.7GHz, you have bad feelings about OC. What will be next?
Well, let's break that down shall we...?
1. BECAUSE AMD WENT BULK the clocks are not even what they achieved in Trinity (3.8 GHz + Turbo) much less Richland. Which means I was right, and my assessment was correct. Clock speeds decreased in Kaveri. I told you they would not even break 4.0 GHz in stock configuration, and here I am proven right.
2. Why would AMD not at least put the flagship part at the SAME clockspeed as the flagship Richland unless they had difficulties with performance? If AMD is not getting the performance they want, then that means the headroom must be quite a bit less for them to be at a 400 MHz clockspeed disadvantage to start. AMD has historically pushed clockspeeds to the limits, this time we see a regression. The OC tests will tell the tale.
Why would they lose 9% clockspeed unless they had to? Especially when the architecture gains are roughly ~20% from what we know. That means flagship Kaveri is only a ~11% improvement over Richland as near as we can tell currently. It may end up being less once the cherry picked benchmarks are tested against the more rigorous benchmarks for CPU performance.
In short...they're losing HALF their performance gain by giving up that much clockspeed. THAT is what going to bulk gets you.
Had it been FD-SOI, then it would really be a flat out ~20% gain we would be testing against other benchmarks to determine the range of improvement. As it sits, this generation is likely to see something on the order of a 1-15% improvement because they gave up too much clockspeed going bulk...all when it could have been 10-25% improvement had they been able to keep the clockspeed and make the uarch improvements.
Who was wrong and who was right?
This generation, in tests where flagship Kaveri would really be ~10% better per clock than Richland, it will end up only ~1% faster due to the loss of clockspeed.
They should have gone FD-SOI, as I said all along.
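A quick back-of-the-envelope of that trade-off (a minimal sketch in Python; the ~20% IPC gain and the 4.1GHz vs 3.7GHz flagship clocks are the assumptions argued in this thread, not measured numbers):

```python
# Rough model: net per-core speedup ~= (1 + IPC gain) * (new clock / old clock).
# All inputs are the assumptions from this thread, not benchmark results.
richland_clock = 4.1   # GHz, flagship Richland base clock
kaveri_clock   = 3.7   # GHz, flagship Kaveri base clock
ipc_gain       = 0.20  # claimed ~20% IPC uplift for Steamroller

net = (1 + ipc_gain) * (kaveri_clock / richland_clock)
print(f"net per-core speedup vs Richland: {net:.2f}x ({(net - 1) * 100:.0f}%)")
# ~1.08x, i.e. roughly the ~10% net gain argued above; with only a 10% IPC
# gain the clock deficit would eat nearly the whole improvement.
```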
1. The 3.7GHz virtually matches the 3.8GHz, and we don't know the turbo yet. Yes, we expected a small decrease in clock speeds, but all the fear and negative claims were about it being a very low-clocked part. I recall people saying it would be clocked at 2.6GHz. I recall an Italian site, mentioned here, saying it was 2.9GHz, and so on. I recall someone saying that 3.5GHz couldn't be surpassed because that was the maximum frequency of Intel chips on bulk... I also recall your bad feelings.
The fact that they're not comparing stock-clocked parts is a bit disconcerting. I feel that bodes poorly for what future clockspeeds will be.
The truth is that they have managed to maintain high frequencies without SOI.
2. I can imagine many reasons. First, Richland is made on a mature 32nm process, tweaked over years. Kaveri is the first product released by GloFo on their new 28nm process; given more years, they could no doubt offer higher frequencies. Second, Kaveri has a lower TDP constraint than Trinity/Richland: Kaveri is a 95W-or-less APU. Third, we don't know if AMD is following Intel's approach of lower base clocks and more aggressive turbos. Fourth, Kaveri has a bigger GPU that consumes much more power, so the CPU cannot be clocked as high as it could be with a small GPU if the total TDP is 95W or less.
This will not break OC records, but we have heard rumors that it can OC up to 4.5GHz.
Had they gone for FD-SOI, they would have had to pay more; split resources between two processes (one for dGPUs, another for iGPUs; one for Steamroller, another for Jaguar); follow a dead end (no evolution beyond 20nm at GloFo, while everyone else moves to FinFETs); be delayed (GloFo does not have 28nm SOI ready) and then compete against Broadwell instead of Haswell; and be locked into GloFo (one of the weaker foundries at the moment)...
The move to bulk was smart and means they pay less; concentrate resources on a single 28nm bulk process for all products; follow a rapidly evolving roadmap (20nm bulk for 2015 and then 14nm FinFETs); sell the first Kaveri chips now and prepare its Carrizo successor for Broadwell's arrival by 2015; and stay open to more reliable and aggressive foundries such as TSMC, which has the 16nm node almost ready and is already taping out chips on 10nm FinFET.
A part that caught my attention was how AMD would pair an APU with a dGPU. Previously in this thread two options were discussed. In one, the iGPU was devoted entirely to compute and the dGPU to graphics. In the second option the iGPU and the dGPU would work in tandem, doing compute or graphics or both. This is from the talk:
The application also acquires the ability to take control of the multi-GPU setup and to decide where to run each command issued. That is why AMD has provided in Mantle access to the CrossFire compositing engine, data transfers between GPUs, etc. This will allow multi-GPU modes that go beyond AFR and adapt better, for example, to the use of GPU compute in games or to asymmetric multi-GPU systems, as is the case for an APU combined with a dGPU. For example, it is possible to imagine the dGPU handling the rendering load while the APU handles the post-processing.
Which is exactly where I've been saying things were moving toward, and how I see this all being applied: The APU handles Physics/Compute/OpenCL, while the dGPU focuses on graphics.
But it is the other possibility ;-) "The application also acquires the ability to take control of the multi-GPU setup and to decide where to run each command issued". Therefore both iGPU and dGPU work in tandem, as I said, and the workload is split dynamically. In some cases it will make sense to use the iGPU for compute and the dGPU for graphics, in other cases it will make sense for both the iGPU and dGPU to execute graphics work, and in other cases part of the dGPU can assist the iGPU with compute...
Which is exactly how I DON'T want HSA implemented, because this will never work in the real world.
Remember, first and foremost, you cannot assume that no other application has access to the hardware you are trying to use. If you are not VERY careful, you can easily tank performance in a multi-application environment compared to what it would be if you stuck with a more traditional approach.
Secondly, you will run into problems in regards to future scalability. Today, if you had a GTX 610, you would obviously want the APU doing rendering, and most of the compute work going to the dGPU, based on how fast each one is. But how about the GTX 910; does that assumption still hold? You run into the very dangerous situation where the assumptions you make about the hardware are no longer true in a year or two, which results in your very fine tuned model costing possibly significant performance.
Frankly, what AMD should have done was get MSFT to modify their WDDM model to allow separate devices to be selected as the default GPU device, and default compute device. That solves the problem with having to do this application side, and forcing the developers to make guesses about the state of the hardware.
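For illustration only, here is a minimal sketch of the kind of application-side device selection being criticized above, written with pyopencl (my assumption; nothing here is an AMD or Microsoft API). The hard-coded policy at the end is exactly the sort of guess that can go stale as hardware changes:

```python
# Hypothetical sketch: pick a "render" device and a "compute" device with a
# crude heuristic. The surrounding argument is that any such baked-in guess
# can become wrong on future hardware.
import pyopencl as cl

gpus = [d for p in cl.get_platforms()
          for d in p.get_devices(device_type=cl.device_type.GPU)]
if not gpus:
    raise RuntimeError("no OpenCL GPU devices found")

def is_integrated(dev):
    # Unified host memory is a reasonable (but not guaranteed) hint that the
    # device is an iGPU/APU rather than a discrete card.
    return bool(getattr(dev, "host_unified_memory", False))

igpus = [d for d in gpus if is_integrated(d)]
dgpus = [d for d in gpus if not is_integrated(d)]

# Naive policy from the discussion above: dGPU renders, iGPU does compute.
render_dev  = (dgpus or gpus)[0]
compute_dev = (igpus or gpus)[0]
print("render on :", render_dev.name)
print("compute on:", compute_dev.name)
```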
Oh! So Mantle doesn't actually improve game performance, fps, etc.; it just makes games easier for developers to make.
Still good though.
No. MANTLE improves game performance by eliminating the overhead of the Microsoft DirectX API.
In consoles, avoiding the DirectX overhead allows for about 2x more performance on the same hardware. I don't think we will see that on the PC, but a 30-50% increase in performance seems feasible.
DICE developers have said that the Radeon + MANTLE enabled version of BF4 will ridicule the Nvidia Titan. We will see.
As I've said before, DX11 doesn't have nearly as much overhead as DX9. You are losing, worst case, ~10% maximum performance in non-CPU-bottlenecked situations.
Secondly, when developing for consoles you can make assumptions about the hardware that you can NOT make on PCs. If you are not careful, fine-tuning too much can tank performance on PCs, because you, the developer, do not have final say on when your application even runs, let alone where resources are allocated or when your worker thread actually starts. There's a reason we went away from cooperatively scheduled threads.
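To make the disagreement concrete, here is a toy frame-time model (Python, with entirely made-up numbers) showing why lower per-draw-call API overhead only matters much once the CPU side is the bottleneck:

```python
# Toy model: CPU frame cost = game logic + draw_calls * per-draw API overhead.
# Frame rate is limited by whichever side (CPU or GPU) is slower.
# All numbers are invented for illustration, not measurements of any API.
def fps(draw_calls, api_overhead_us, game_logic_ms=6.0, gpu_frame_ms=12.0):
    cpu_ms = game_logic_ms + draw_calls * api_overhead_us / 1000.0
    return 1000.0 / max(cpu_ms, gpu_frame_ms)

for label, overhead_us in [("high-overhead API", 20.0),
                           ("batched calls",      8.0),
                           ("thin API",           2.0)]:
    print(f"{label:17s}  200 draws: {fps(200, overhead_us):5.1f} fps   "
          f"10k draws: {fps(10000, overhead_us):5.1f} fps")
# With few draw calls every API is GPU-bound and performs the same; with many
# draw calls the CPU becomes the bottleneck and a thinner API wins big, which
# is where the conflicting overhead figures in this thread come from.
```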
Disagreeing is fine, but his writing "to save their ass", "a crappy cpu", "and an idle IGP", "weak ass cores"... clearly denotes hate.
Especially when he is plain wrong. As shown in my article about Kaveri, a 3.7GHz SR CPU performs like a SB/IB i5 in ordinary CPU workloads, losing in the FP-intensive ones but outperforming in the integer workloads. There I assumed 20% IPC over PD, but some late leaks suggest that the final improvement is >30%. So add to the scores I published if the leak is true.
you want to claim I am wrong yet you only offer your opinion. APU cores are weak, always have been.
the lowly fx-4100 is 46% faster than the A10 5700 and clocked slower. The fx-4320 is 58% faster, and the fx-8350 and the i5 3470 are 73% faster. good luck catching that i5.
This is the same architecture. I'd call that a weak ass core any day.
now tell me using these RL results how removing the L3 cache in favor of making the cpu weaker and adding an IGP is such a great thing to ever happen to the computing industry?
How is kaveri going to catch the I5? 8350? ... heck even the 4320 ...
I'd say the a10 5700 is a pretty good spot to start with for Kaveri figures since they both clock at 3.7 ghz. As you put it, "in ordinary cpu workloads" shown above, it's starting at negative 40% already just to get to the 4320.
Kaveri is not going to be the answer you're hoping it will be. It will not catch the i5 3470 very often (if ever); it may here and there, but that will take some luck "in ordinary cpu workloads".
Tell me just exactly how I am "just plain wrong"?
I already said this to you. Also, you continue using essentially the same logic as in your previous attack on the AMD ARM line.
Here you pick very old benchmarks (not optimized for AMD architecture; one of them is an Nvidia-sponsored game) and an old 5700 Trinity APU, and you claim this is how Kaveri will perform.
You miss the estimation of Kaveri CPU performance in the BSN* article, you miss the leaked benchmarks comparing Kaveri to Bulldozer and Piledriver FX, and you miss the BF4 benchmark given by AMD during the October talk, where a Richland APU got 98% of the performance of an FX-6350 and 96% of the performance of the FX-8350 (all three using an R9 280X and playing at 1080p ultra). I think you also continue missing the subsequent discussion on multiplayer BF4. And you miss that AMD is in the consoles now, with a CPU based on Jaguar cores.
Can you compare the performance of the PS4 CPU to Kaveri CPU? I can.
Maybe you still don't understand this, but game developers will be offloading the consoles' CPUs and running the heavy computations on the consoles' GPUs. That is why both consoles have GPGPU abilities and HSA support.
You also miss that MANTLE aims to remove some CPU bottlenecks that exist in current gaming technology. This is from the Oxide talk at APU13:
Mantle Unleashed: How Mantle changes the fundamentals of what is possible on a PC. Over the last 5 years, GPUs have become so fast that it has become increasingly difficult for the CPU to utilize them. Developers expend considerable effort reducing CPU overhead and often are forced to make compromises to fully utilize the GPU. This talk will discuss real-world results on how Mantle enables game engines to fully and efficiently utilize all the cores on the CPU, and how its efficient architecture can eliminate the problems of being CPU bound once and for all.
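Purely as an illustration of the "offload the heavy computations to the GPU" point a couple of paragraphs up, here is a minimal GPGPU sketch (OpenCL via pyopencl, my assumption; console titles use their own platform APIs, and this is not HSA):

```python
# Minimal offload sketch: square a large array on whatever OpenCL device is
# available instead of on the CPU. Illustrative only.
import numpy as np
import pyopencl as cl

a = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()   # picks an available OpenCL device
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, """
__kernel void square(__global const float *src, __global float *dst) {
    int i = get_global_id(0);
    dst[i] = src[i] * src[i];
}
""").build()

mf = cl.mem_flags
a_buf   = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prog.square(queue, a.shape, None, a_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a * a)
```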
get over yourself. Very old benchmarks? I specifically looked for games released in 2013 ... ermago ... thats soo friggin old.
a10 5700 = piledriver without l3 cache
4320 = piledriver fx
8350 = piledriver fx
what's so old about the a10 5700? all that richland brought was higher clock speeds over trinity, and kaveri brought lower clock speeds. you can't compare the 4.2 ghz richland to the 3.7 ghz kaveri and say it's going to be 30% faster on top of being clocked slower. 3.7 ghz piledriver vs 3.7 ghz kaveri is a fair comparison.
If anything I'd describe AMD's APU as a gpu with an integrated cpu, while Intel is making cpus with an integrated gpu.
The reason you're stuck on this "BF4 ERMAGO BENCHMARK" is because it doesn't stress the cpu AT ALL. It's a gpu-bound benchmark. That's why it was chosen; it's a tactic called marketing.
As for the BSN article, it was pulled from your website.
Agreed that "very old" was an exaggeration on my part, but it was needed to compensate for your exaggerated attack on the Kaveri APU (I mean that one where you wrote "ass", "crappy", and "ass").
In the same paragraph I also wrote what I meant by "old" (= "not optimized for AMD architecture"). Sorry, but no, I was not referring to whether it was released in early 2013 or in late 2009.
I am not discussing marketing, you are. I am discussing the technical and economic details behind a company's marketing and execution plans. I am explaining to you that the whole master plan (of which I have given you some elements) is to offload the CPU more and more. In case you don't get it: offloading the CPU means you push more work onto the GPU. Therefore a GPU-bound benchmark is more characteristic of how next-gen games will behave than old Intel+Nvidia games.
I note how you avoided my question about the jaguar-based CPU in the consoles...
noob2222 :
juanrga :
Ranth :
Juan, we do agree that Richland and Piledriver are basically the same, except for power management and the like, right? What does the Richland APU have that the FX doesn't? And specifically, what is it that makes the APU the only one receiving performance increases?
The six factors: except for the Piledriver -> Steamroller/HSA one (which won't help in single-threaded), doesn't everything else apply to FX and older APUs too..?
The APU lacks L3 cache. I don't know if there are other differences which AMD has not disclosed. What I know is what follows.
The BF4 benchmark given at the October talk to OEMs shows a Richland APU performing as well as FX-6350 and FX-8350. The APU gives a 98% and 96% of the performance of each FX respectively.
The 'old' argument that FX is 30-50% faster because it has L3 cache doesn't apply here.
The argument that the Suez map is single player and not well multithreaded doesn't mean anything, because an FX-4350 will not be 30-50% faster than the FX-6350 and FX-8350.
meaningless ... really ...
ya, ... multiplayer really favors the APU core in the 750k doesn't it.
Just to clarify,
750k = piledriver without l3 cache and no IGP.
4350 = piledriver fx
6300 = piledriver fx
8350 = piledriver fx
let's compare the 4.5 ghz piledriver 750k to the 4.7 ghz fx. That only accounts for a 4.5% difference in clock speed; assuming perfect scaling (which it's not), subtract 4.5% from the results.
4350 = 18% faster than the l3 cacheless APU core
6300 = 47% faster than the l3 cacheless APU core
8350 = 76% faster than the l3 cacheless APU core
How many times do you have to be proven wrong? The single player benchmark is meaningless because it doesn't stress the cpu AT ALL. It's strictly a gpu benchmark until you drop below 3.0 ghz.
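For what it's worth, the clock-speed correction being applied in that post is just a ratio; a tiny sketch (Python) using the figures quoted above as inputs:

```python
# Normalize the quoted leads for the 750K's 4.5GHz vs the FX chips' 4.7GHz.
# Assumes perfect clock scaling, which (as noted above) it is not.
clock_ratio = 4.7 / 4.5   # ~1.044, i.e. a ~4.4% clock edge for the FX parts
quoted_lead = {"FX-4350": 0.18, "FX-6300": 0.47, "FX-8350": 0.76}

for chip, lead in quoted_lead.items():
    # Divide out the clock advantage to estimate the lead at equal clocks.
    per_clock = (1 + lead) / clock_ratio - 1
    print(f"{chip}: quoted +{lead:.0%}, roughly +{per_clock:.0%} at equal clocks")
```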
More of the same. Here you pick again the same benchmark that I commented on before in a reply to you, when you linked it the first time. And then you repeat the same thing without reading what was said to you. So typical.
And you always change the story. You stated that Kaveri will equal an i5 in ordinary cpu workloads. Now that you've been proven wrong, it's only in apu-specialized software.
You may be happy with weaker cores coupled with a software solution; it will not go over well when the reviews come out.
And unsurprisingly you come with the same misunderstanding again, one that I have corrected again and again...
In my BSN* article, which evidently you didn't read, I show how Kaveri performs like an i5 using ordinary cpu workloads. I am not using "apu specialized software" as you claim.
Finally, I am happy with the idea of software using my hardware. All my hardware (or most of it), and not only one half or one third of it. There is another person here with similar ideas to mine. He compiled a program for his hardware and now it runs 2x faster; said another way, the former unoptimized program was using only 50% of the performance of his hardware.
You seem to prefer the other way: if the software only uses 50% of the hardware, you upgrade to hardware that is 2x more powerful so that the same software can now ignore 50% of even more.
I mentioned this elsewhere, but the software side of things are very good for AMD right now in regards to gaming.
Remember when FX 8350 first launched? You'd have to dig forever to find good FX 8350 benchmarks and the AMD guys were floating the same few benchmarks which showed it doing well over and over again while there was a bombardment of Skyrim, Shogun 2, Starcraft 2, Warcraft, etc.
Now look at how things have changed. GamerK is trying to disprove the FX and the best he can do is post a few niche examples of modern games that don't run well on FX.
I do think this is why AMD is not in a rush to push SR on HEDT. PD has gone from trailing Intel 3570k by 30% in reviews in all the gaming benchmarks to beating 3770k. Nothing on the chip has changed.
AMD simply has absolutely no reason to even release SR on HEDT. Why release a chip that is 15% faster when you just got a 30% improvement out of software increases by nipping the whole "game developers optimizing for Intel" thing in the bud?
After watching the APU13 keynote on Mantle it's quite clear that AMD is designing Mantle to not only scale where GPU on APU does some other calculation besides rendering, but that this will scale between TWO dGPUs.
Watch the whole thing, I think Johan screwed up. He said you'd see a Mantle situation where one GPU calculated global illumination whilst one just rendered the scene.
Do you know what's missing from that equation?
An APU.
And I hate to beat a dead horse, but it's quite clear AM3+ is completely incapable of HSA.
Yet at the same time running two dGPUs on APU platform is a complete waste because of the PCIe lanes available.
So why talk about Mantle using two dGPUs? AMD doesn't have a publicly available platform that supports everything required for Mantle to use two dGPUs.
Johan dun goofed.
When did Crysis 3 and BF4 become niche examples again?
The ONLY FX that is competitive is the FX-8350; the others, even the lower clocked 8 core variants (8320, etc.) all lag behind IB i5's. The weak individual cores hobble the architecture, even in games that scale well, like BF4 and Crysis 3.
The same trend exists: the FX-8350 still lags behind the i5-3570k, and even the FX-6350 comes in just ahead of the i3 lineup, trailing the i5's by a decent margin. Even the 8 core FX-8320 can't match even the cheapest i5, the 3350p, in BF4.
Hence my point: FX's architecture deficiencies are hidden in part due to clock speed. You typically see, at max OC, Intel pulling farther ahead. You also see the 4-core FX being noncompetitive, and even the 6-core being one step up from the i3-lineup.
Further, the FX-8350, in both Crysis 3 and BF4, lost to the i5-3570k, which comes in just $10 more expensive. And at max OC, the i5-3570k pulls farther ahead (except in the case of a GPU bottleneck being reached, as seen in the second image). So I'd have to recommend the i5-3570k over the FX-8350 in EVERY instance right now. The 6350 is attractive for its price (~$140), but you accept sub-i5 performance by going that route.
That is not the sign of a good architecture, that with a clock speed edge and DOUBLE the cores, you still lag in performance, even in titles that use more than a dozen threads. The only time FX matches high end IB chips is when a GPU bottleneck suppresses the results.
anyone know how they did 2x the performance per watt while still being on 28nm? They should be competitive with baytrail with numbers like this if true.
noob2222 :
esrever :
anyone know how they did 2x the performance per watt while still being on 28nm? They should be competitive with baytrail with numbers like this if true.
Wow , that's just strange. Beema and Mullins weren't canceled to make way for arm cpus? Wonder who said they would be.
the problem with that statement is that amd compared the 15 watt beema to the 25 watt kabini, which we know is the less efficient kabini chip. if amd compared it to the 15 watt kabini (a4 5000), or 25 watt beema vs 25 watt kabini, then we would have a better picture of the improvement. is there a pcmark 8 home result for the a4 5000?
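That objection is easy to put in numbers; a small sketch (Python, with purely hypothetical scores) showing how the choice of baseline TDP changes a "performance per watt" multiplier:

```python
# Hypothetical scores invented purely to show the arithmetic: comparing a 15W
# part against a 25W baseline inflates the perf/W ratio versus a same-TDP
# comparison. None of these numbers are real benchmark results.
scores = {
    "Kabini 25W": 2300,   # hypothetical
    "Kabini 15W": 2000,   # hypothetical (A4-5000 class)
    "Beema 15W":  2400,   # hypothetical
}

def perf_per_watt(name, watts):
    return scores[name] / watts

beema = perf_per_watt("Beema 15W", 15)
print(f"vs 25W Kabini: {beema / perf_per_watt('Kabini 25W', 25):.2f}x perf/W")
print(f"vs 15W Kabini: {beema / perf_per_watt('Kabini 15W', 15):.2f}x perf/W")
# Same Beema numbers, very different headline multiplier depending on baseline.
```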
But Juan, that is a single case out of many. Okay, so in BF4 it won't be 50%, but what about the million other applications? (Which, btw, I don't think is right either; 10-25% is more likely.)
"It's not reflected" in a canned benchmark by AMD, who is trying to sell APUs. Would it not be odd if AMD chose a benchmark where the Richland would perform poorly? That would be terrible marketing. And before you go claim that "this is how it's going to be", you have to have more of the picture. Meaning more than one benchmark.
Of course it is only a single case, but I recall that when I started my discussion of this, I clearly said that AMD was taking it as a "prototype" for "next gen" games.
Yes, it is a benchmark picked by AMD, who is trying to sell APUs at the expense of the FX chips. They could have compared Richland to an Intel chip, but they compared it to their own FX chips. The FX-4350 was not even mentioned.
Of course, look at the die sizes on the APUs...where do you think the margins are better? That 315mm^2 die on the FX series makes them money, but APUs selling for ~70% of the cost, with 60% of the die size makes tons of sense. They can get better yields out of the product because they don't need the big dies like they have in FX.
Now, that of course precludes the fact that FX is still a better CPU when you need the raw horsepower, and so I anticipate they will still sell them by the truck load to boutique builders and DIY PC builders.
You talked about FX 8 cores being 0.4% of the Steam hardware survey, I think it was, and when you consider that there are 50 million people on Steam, most of them on mobile solutions, I would think you would see something along those lines...that's still 200k machines with FX 8 cores using Steam. Considering that there are many people who do not have Steam (myself included), I think that the prediction those numbers give is likely only 10-15% of the actual number of people running such systems in the US. You are also neglecting the very prevalent productivity types with that figure, and I think that plays a large part in the small representation of that sample.
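The arithmetic behind that estimate, spelled out (Python; the 0.4% share and ~50M Steam users are the figures quoted above, and the 10-15% coverage guess is the poster's own assumption):

```python
# Back-of-the-envelope from the figures quoted above.
steam_users  = 50_000_000
fx8_share    = 0.004                  # 0.4% in the hardware survey
fx8_on_steam = steam_users * fx8_share
print(f"FX 8-core machines on Steam: {fx8_on_steam:,.0f}")   # 200,000

# If Steam only captures 10-15% of such systems (the poster's guess):
for coverage in (0.10, 0.15):
    print(f"implied total at {coverage:.0%} coverage: {fx8_on_steam / coverage:,.0f}")
```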
If the demand for FX-8350 CPUs were higher than it is, AMD's margin would be better. Look at the 9000 series. They were overpriced, nobody was purchasing them, and recently their prices dropped by giant amounts. It is always the same: lack of demand.
I consider that the Steam statistics must be pretty accurate from my knowledge of the local market, but look at AMD's own numbers: the FX 8-core represents about 2% of the total revenue generated by both APUs and CPUs. Now try to compute its percentage of AMD's total revenue.
8350rocks :
juanrga :
de5_Roy :
then he kept repeatedly bringing it up saying that benchmark is the reason amd cancelled higher core SR-FX cpus and apparently that's what amd 'told' oem representatives at that event (even thought he failed to provide an audio transcript after repeated asking).
This is pure and simply false (*). The reasons why AMD is not releasing SR FX line are multiple and I explained them here before: transition to APU, reorganization of server plans, and lack of demand. I gave details and further explanations for each one.
What I said about that October talk slide was that it clearly reflects AMD's plans about migrating to an APU strategy, and guess what: Lisa Su (AMD vice president) confirmed my thoughts this week during the opening keynote:
Lisa Su, Senior VP & GM of Global Business Units at AMD, delivered the opening keynote and the message was clear: AMD is positioning its Accelerated Processing Units (APUs) -- which combine traditional multi-core CPUs and a discrete multi-core graphics processing unit on a single chip -- to dominate the market from smartphones to servers.
(*) Like when you said that I never wrote the PFs for the slide but I did in one of my posts in this thread.
LOL...When did AMD say, specifically, that they were not putting out a new FX successor? Without you reading between the lines and interpreting...
I will have word from AMD about future FX successor within the next week. So don't go counting chickens before they hatch again...
And where in the AMD vice president's words does it say that they will release an FX successor?
What if I already have in my hands the official desktop roadmap for 2014?
8350rocks :
juanrga :
palladin9479 :
Anyone attempting to argue the new APU's will be remotely close to a high end dGPU is deluded. That's just not how CPU's work. The highest end APU will be about equal to the low end dCPU's but with a moderately good GPU bolted on. The entire purpose of an APU isn't to replace dCPU's, it's to do budget / compact computing: SFF and similar uses where power and space are limited and going with an all-inclusive solution works out the best.
Anyone attempting to argue that the entire purpose of an APU is to satisfy the budget / compact computing market is deluded.
AMD APUs have been competing as a low-cost alternative in much of the consumer market because the APU concept had not yet been completely developed. Kaveri is the first APU that fulfills that 'ancient' AMD dream that was born with the acquisition of ATI.
It has still not been completely developed for two basic reasons: memory bandwidth and software. But the second is a consequence of the first. Once the memory bandwidth problem is solved with the almost-ready stacked memory, we will see the birth of ultra-high-performance APUs.
P.S.: the bandwidth provided by this new RAM technology is superior to that of the L3 cache in FX chips.
HMC is superior to even GDDR5; however, that technology is years away from widespread adoption in consumer PCs. Sure, supercomputers costing millions and millions of dollars use it...however, there isn't even a standard for a way to integrate it into the motherboard yet...
I didn't mean this to be happening tomorrow. Kaveri is a 2014 product and uses DDR3. Its successor, Carrizo, comes in 2015 and uses DDR3/4. The new high-bandwidth memory technology is something for 2016/17. Then we will start to see ultra-high-end performance APUs (which will not be mainstream). Nvidia is developing an APU with cube memory that is projected to give 10x the performance of the GTX Titan.
If that was the case, we would have already seen the slide for it 1000x because you would be trying to convince us that there is no dCPU coming, and that would be your "holy grail" of evidence. However, I am more willing to bet you have the official 2014 Desktop APU roadmap that everyone else has...
@logain: you don't have to be a native citizen to find a website unreliable. i am not defending pclab.pl; multiplayer games are very hard to consistently benchmark anyway. but.. at least they did it, and used the dice-recommended o.s. their test hardware was more well-rounded than the other site that benched bf4 mp and crysis 3 - gamegpu.ru.
....
now what if some russian poster said that they considered gamegpu.ru unreliable?
Other members from Poland have labeled that site as untrustworthy for results.
And I know a few Russian gamers on other forums who claim the same about GameGPU. I'm sure "biased" in both groups' minds means "doesn't agree with my pre-conceived result set".
Here's the main issue here: I've got very limited samples to choose from here. And the only two that bench significant numbers of CPU's conflict, come from foreign sources (PcLab and GameGPU), and most annoyingly, don't match up with any other set of benchmarks I can find.
Toms and PcLabs match in terms of ordering (2500k ahead of FX-8350), but Toms didn't bench nearly enough CPU's to validate PcLabs' entire result set. The same thing occurred with the Crysis 3 results (and I noted it at the time), in that GameGPU tended to place the FX-8350 and other FX chips higher than other sites (Toms, etc). So we get the situation where no one's result sets agree, and NO ONE is investigating why that's happening.
Point being, until some other site gets serious about benching CPU's, we're stuck with two conflicting result sets.
One thing I AM finding though, significant in the case of BF4, is that Win7.1 and Win8 results tend to differ, significantly in some cases. So we may have some OS effects in this case, which could explain everything. But I doubt any site is going to investigate that issue...
Point being, since the ordering with PcLab tends to agree with Toms (as far as ordering is concerned), I'm sticking with them unless someone can prove the results aren't legit. And no, saying "the results aren't legit" doesn't count by itself.
AMD's 2014 product roadmap will release shortly if not today...
Piledriver FX is going to be the HEDT line through 2014 until further notice. AMD concedes Kaveri is not going to replace FX in HEDT, and they know the segment is crucial for gaming, and it's the only segment growing at this time in DT PC.
AMD expects to see the PD arch gain performance boosts in several facets and remain mostly competitive at the price points they have established through software optimization and other means.
The next HEDT platform already has an internal codename, though he could/would not tell me what it was. There were no promises that we would see the HEDT successor in SR arch, it may not be until excavator that they move forward. The issue is simply that they will not do HEDT on bulk given the way things went with Kaveri. If they had been able to completely meet/exceed the design goals on bulk with Kaveri they would have moved forward with HEDT. However, he expressed the sentiment that they regretted being forced to downgrade their process to bulk and that they know they lost performance to release the APUs in a timely manner.
Next HEDT line may come on 28nm or 20nm, he wasn't clear there, but he was certain it would not be bulk, and felt like they would move back toward FD-SOI if GF could get their "issues" sorted out for 28/20nm SHP. He could not tell me if the next iteration of their HEDT line would be a FM2+/FM3 product, or if they would in fact move forward with AM4 when DDR4 arrives, though he expects that the timing of DDR4 and the new HEDT platform would be pretty close. Supposedly, there is a new IMC in the works for next gen HEDT hardware (to coincide with DDR4).
if amd is so fab-bound and using bulk substrate, it makes sense for them to wait for fd-soi at 20nm and include ddr4 support to move on to an entirely new socket and new 'big' core for hedt. one catch is that current info shows st micro licensing their fd-soi booster tech for 28 and 14nm (xm version) only.
Again, if only AMD could invest in its own fabs, rather than waiting for GloFo to handle its own business...
They hit their release targets more reliably before they spun off GF.
I agree with the sentiment...though at the point where they are, launching a private fab would be near disastrous to their bottom line. They would need to be at a point where they could offset the startup losses from the fab end with product revenue. It will be interesting to see if they go that route down the road.
The XM version is really a hybrid, using 14nm front end and 20nm back end with a gate last approach.
My point was that it was a mistake to spin GF in the first place.
Well, you could argue that it was...
However, you could also argue that it wasn't...since GF has lost significant money since it was spun off, where would AMD be right now? Hemorrhaging cash right now...that's where. From a business perspective it made sense in some ways, and not in others. I think all we are doing is arm chair quarterbacking at this point.
Which is exactly my point. You keep referring to the same website and the same benchmarks to make your point.
When FX 8350 came out, you could find games FX sucked at all over the place.
Going back in time to see FX 8350 reviews, here are the results in google and the games tested
1. Tom's hardware: BF3, Skyrim, Warcraft
2. PCGamer: Shogun 2
3. PCMag: no benchmarks
4. Amazon: no benchmarks
5. TechReport: Skyrim, Batman, BF3, Crysis 2
6. Tom's hardware with the same gaming benchmarks again
7. Engadget roundup
8. Newegg link
9. Anandtech: Skyrim, Diablo 3, Dragon Age, Dawn of War, WoW, SC2.
My point is that to show FX sucked at gaming when FX first came out, there were plentiful resources and pages upon pages of benchmarks showing that FX was behind the competition. BF3 and Crysis 2 were really the only strong games for it back then. Basically, just a handful of outliers.
Now you are stuck posting the same benchmarks from the same website saying "see, nothing has changed!!!!!" If multi-core is not catching on and it's impossible for it to catch on, I want to see pages of google results of modern games where FX is still significantly behind.
GamerK you really need to grow up. The old 10ghz Nehalem died a long time ago and we're never going to see anything like that.
Intel can no longer squeeze more single core performance out of their CPUs. They are even going to add "MOAR COARS" with Haswell-E. AMD has just finally caught up to older Intels in IPC with SR (going by rumors).
There is a big problem here that chips can no longer scale in the ways they previously have, and that we have to do something else to make a difference.
The path you are suggesting, sticking to single-core rendering and not looking for alternatives, leads to a world where there's no longer a reason to upgrade your CPU, because Intel can't make anything faster and AMD is still catching up in single thread.
Tell me how well that works out for AMD and Intel when they need to sell CPUs? You are so backwards thinking it makes my brain hurt. I can imagine you as a crotchety old man sitting in front of his computer going "these stupid kids and their Pentium 1s, it all went downhill after the 486, I wish I could go back to the golden age of computing!"
The fact that it is Linux/OSX native is the biggest boon. Now you could do AAA titles on all 3 major OS platforms...breaking the monopoly Windows has on high end PC Gaming.
and seemingly shift the monopoly to amd. doesn't look like a good thing in long term.
Except AMD has clearly stated that they want to build a standard around it, offering it to Nvidia and Intel.
de5_Roy :
juanrga :
de5_Roy :
then he kept repeatedly bringing it up saying that benchmark is the reason amd cancelled higher core SR-FX cpus and apparently that's what amd 'told' oem representatives at that event (even thought he failed to provide an audio transcript after repeated asking).
This is pure and simply false (*). The reasons why AMD is not releasing SR FX line are multiple and I explained them here before: transition to APU, reorganization of server plans, and lack of demand. I gave details and further explanations for each one.
oh really? false, is it? here's what you said in your post, verbatim:
juanrga :
In their talk they were saying to OEMs that the new APU gives >90% of the FX-8350 performance at a fraction of the cost. That is what the slide #13 says. That doesn't look like promoting the FX line. The FX-4350 wasn't even mentioned by AMD.
I see two options: either (i) AMD will be abandoning the FX-4000 and refreshing the FX-6000/8000/9000 a la Warsaw, or (ii) AMD will drop the entire FX line completely and focus on an APU line only.
I am not assuming anything, I asked you a specific question.
but wait.. there is more, in case you have conveniently forgotten (along with the poorly executed deflection when i asked for the audio version):
juanrga :
de5_Roy :
juanrga :
In their talk they were saying to OEMs that the new APU gives >90% of the FX-8350 performance at a fraction of the cost. That is what the slide #13 says. That doesn't look like promoting the FX line. The FX-4350 wasn't even mentioned by AMD.
talk? there's an audio version? i didn't listen to that. if amd said that the new apu(6790k) gives >90% of fx8350's performance, they were intentionally being vague (>90% of what? what tasks?) to pitch 6790k. benchmarketing at play - largely irrelevant, since independent reviews will reflect the real world perf/price.
^^ bolded the relevant part.
The part that you bolded is "That is what the slide #13 says."
I am saying what message the slide gives, not that the slide is the reason for AMD's plans. You got it backwards.
The reasons why AMD is abandoning the FX line (and not releasing an improvement or refresh) were given by me before:
The reasons why AMD is not releasing SR FX line are multiple and I explained them here before: transition to APU, reorganization of server plans, and lack of demand. I gave details and further explanations for each one.
I love the idea of utilizing software, but not at the expense of only releasing low-end hardware and relying solely on the software to make up for the lack of good hardware. We need both. APUs are not high end. Period.
APUs are a little of both, and a master of none. Without the specialized software, as I have shown, Kaveri needs an 80% or greater boost to catch the low-end i5. It needs 40% to catch the 4320. This is in ordinary gaming cpu workloads.
You're the one not looking at the data I have provided and coming up with lame excuses instead of trying to convince me with hard evidence. Your website and marketing slides that are designed to only show the best possible scenario do not count as evidence. Those will only be true in a few select cases and are not representative of "ordinary workloads".
Kaveri is not "low end hardware".
And as explained before, the benchmarks given use "ordinary workloads". They are not "specialized software" for APUs, nor the "best case".
You don't understand what was made/measured in the BSN* article. I have corrected your misunderstandings again and again and again. Still you insist that those benchmarks are "specialized software" for APUs, when they are not.
anyone know how they did 2x the performance per watt while still being on 28nm? They should be competitive with baytrail with numbers like this if true.
Wow , that's just strange. Beema and Mullins weren't canceled to make way for arm cpus? Wonder who said they would be.
You? Because AMD said "jaguar servers" were replaced by "arm servers"... :lol:
from those slides, another sentence that caught my eye was "thin layer of abstraction". but how thin? amd claims it is extendable to other uarches and forward compatible. doesn't that mean it will eventually become bloated? maybe i'm being too skeptical.
Extendable to other uarches doesn't mean adding layers on top. MANTLE consists of two layers: the API plus the driver. If you substitute the MANTLE driver for GCN with another one for Intel or Nvidia, the MANTLE API will work on top of non-AMD hardware.
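Conceptually (and only conceptually; these names are invented and are not the real Mantle interfaces), that two-layer split is a thin front-end API delegating to interchangeable back-end drivers. A toy sketch in Python:

```python
# Toy illustration of "thin API + swappable driver". All names are invented.
class GpuDriver:
    """Back-end layer: one implementation per vendor/architecture."""
    def submit(self, command_buffer):
        raise NotImplementedError

class GcnDriver(GpuDriver):
    def submit(self, command_buffer):
        print("GCN driver executing", len(command_buffer), "commands")

class OtherVendorDriver(GpuDriver):
    def submit(self, command_buffer):
        print("non-AMD driver executing", len(command_buffer), "commands")

class ThinApi:
    """Front-end layer: the same application-facing calls for any driver."""
    def __init__(self, driver):
        self.driver = driver
        self.commands = []
    def draw(self, mesh):
        self.commands.append(("draw", mesh))
    def dispatch(self, kernel):
        self.commands.append(("compute", kernel))
    def flush(self):
        self.driver.submit(self.commands)
        self.commands = []

api = ThinApi(GcnDriver())   # swap in OtherVendorDriver() without touching app code
api.draw("terrain")
api.dispatch("post_fx")
api.flush()
```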
We are discussing MANTLE not HSA...
As I said before, the overhead in DX9 was ~10x or higher. In DX11 it is reduced to something like 2x, and only if you use batched calls, which reduce game richness and developer freedom.
Not sure why you repeat what I said about optimization on the PC (~30-50%) being less than in consoles (~100%).