juanrga :
You start by partially misquoting me. My claim was that discrete GPUs will be killed by about 2020. The APUsilicon article gives some details on why this will happen.
You claim that APUs are only for the budget segment, but they are not. In fact, our claim is that high-performance APUs will replace top discrete cards. You say we cannot make 500 mm^2 APUs, but the design Nvidia engineers are working on occupies 650 mm^2 on a 7 nm node. AMD engineers don't give details about the size of their APU, but it would be similar.
Cross-firing will not help, because it will increase the problems described in the article.
It is funny that you claim we cannot combine different APUs to provide more performance, when this is exactly what will be done. The APUsilicon article gives a node representation with one central APU plus several assistant APUs.
This is about the tech and the underlying physics. In former posts I discussed the economic reasons why discrete GPUs have no future and will disappear.
The issue with this speculation is twofold:
1.) Whatever you can do with an APU die, you can fit significantly more cores for GPGPU functions onto a dedicated GPGPU die. You can also decrease power consumption, if that is your goal, by designing for this exascale push.
2.) Unless you are discussing TSV interposers, or some other similarly absurd interconnect scheme spanning all the APUs you claim will be used to make exascale computers, you are still not overcoming the issue of interconnects consuming the lion's share of the power. Even TSV interposers consume power, albeit less than normal interconnects.
The issue is that it would not be cost-effective to mount multiple APUs plus HBM/DRAM on interposers. And even while you are reducing the power consumption of the devices themselves, each interconnect still burns a few watts here and there to transfer data.
The biggest issue with larger HPC systems is not the power cost of the processing units themselves, but the interconnects needed to get all of them working together.
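To put rough numbers on the interconnect point, here's a quick back-of-envelope in Python. Every energy figure below is an illustrative assumption (picked to be in the ballpark quoted in exascale talks), not a measured value for any real machine:

```python
# Back-of-envelope: power budget of compute vs. off-chip data movement
# at exascale. All energy figures are illustrative assumptions.

EXAFLOP = 1e18              # target: 10^18 flops per second
PJ = 1e-12                  # one picojoule, in joules

E_FLOP = 10 * PJ            # assumed energy per double-precision flop
E_OFF_CHIP_BYTE = 100 * PJ  # assumed energy per byte over an off-chip link

# Assume each flop moves, on average, only 0.2 byte off-chip --
# an intentionally low byte/flop ratio.
BYTES_PER_FLOP = 0.2

compute_watts = EXAFLOP * E_FLOP
interconnect_watts = EXAFLOP * BYTES_PER_FLOP * E_OFF_CHIP_BYTE

print(f"compute:      {compute_watts / 1e6:.0f} MW")
print(f"interconnect: {interconnect_watts / 1e6:.0f} MW")
```

Even with these generous assumptions, the interconnect power (20 MW) is double the compute power (10 MW), which is the point being made: data movement, not arithmetic, dominates the budget.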
Horst Simon has given multiple presentations on this particular subject and on why it will not happen by 2020, or likely within a decade of that time frame:
http://www.top500.org/blog/no-exascale-for-you-an-interview-with-berkeley-labs-horst-simon/
There is a quasi-summary of one such presentation in interview (Q&A) form.
Also, data movement will cost more than flops (even on the chip). Limited amounts of memory and low memory/flop ratios will make processing virtually free. In fact, the amount of memory is relatively decreasing, scaling far worse than computation. This is a challenge that’s not being addressed and it’s not going to get less expensive by 2018.
A relevant point above.
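The shrinking memory/flop ratio is easy to illustrate with rough numbers. The machine specs below are assumptions for illustration only, not published figures for any specific system:

```python
# Illustrative memory-per-flop ratios for two generations of HPC systems.
# Specs are rough assumptions, not exact published numbers.

machines = {
    # name: (peak flops per second, total memory in bytes)
    "petascale (~2010)": (2e15, 3e14),   # ~2 PF peak, ~300 TB memory
    "projected exascale": (1e18, 1e16),  # ~1 EF peak, ~10 PB memory
}

ratios = {name: mem / flops for name, (flops, mem) in machines.items()}

for name, r in ratios.items():
    print(f"{name}: {r:.4f} bytes of memory per flop/s")
```

Under these assumptions the ratio falls from about 0.15 to 0.01 bytes per flop/s, a ~15x drop: memory capacity is scaling far worse than computation, exactly as the interview argues.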