AMD CPU speculation... and expert conjecture

juanrga · Oct 10, 2013

blackkstar :

juanrga :

Oh, but juaranga, you're missing the point! gamerK found a situation where a game engine doesn't scale to all cores! So therefore games NEVER scale past a few cores and there's no point in ever adding more cores! Your problem is that you're not using gamerk logic! You need to use gamerk logic to understand this!

Another example of gamerk logic where you draw a conclusion from a single piece of evidence:

"I ate a taco, it made me sick, therefore all food makes you sick"

You need to start thinking like that, because that's how gamerk thinks! Please start making your arguments like that. They should follow a formula along the lines of "here is a cherry picked example so allow me to apply this evidence to every situation and then keep repeating it until it becomes accepted!" You also get bonus points for going "IM A GAME DEVELOPER IM AN AUTHORITY ON THIS YOUR OPINION IS INVALID!!!"

Intel moving Haswell-E to 6 core and 8 core configuration when it's still on 22nm and Intel is charging $580 for a 227mm^2 die and they're apparently going to throw that away just for the goodness of consumers, because that's what Intel does, they offer better products at great prices just to help out consumers! THATS THE ONLY REASOONNN!!!!! I'm so glad we have GamerK to explain why Intel is moving to 6 and 8 core on enthusiast market and to show us that they're going to do this just to be nice to us!

(if you can't detect the sarcasm in this post, I feel sorry for you)

Of course I missed it, because I am always wrong :pt1cable:

Recall when he promised us that two main threads would be the rule for next gen games. i3 was going to rule the world of gaming. He was so confident that even had prepared for us the "I said so".

Still this new game is already using 8 threads as some of us predicted, although toms has managed to test a part of the game barely using six-threads and under GPU-limited situation for capping the real potential of six-core i7.

The really interesting part will come with the MANTLE updated edition, which will _optimize_ the support for 8-cores by eliminating the DX constraints

It will be funny to hear again expertise comments about how it couldn't be made.

8350rocks · Oct 10, 2013

juanrga :

Blandge :

It is not about dismissing but about filtering quality. Also no sure from where you got your "bad software" claim.

8350rocks :

Several: Opteron-X, JG, SB, A9, A15... and GFLOPs/DMIPS

When AMD says that Steamroller will be about 30% faster PD, you repeat the number in many threads. When AMD says that its Seattle will be 2x-4x faster than Opteron-X you start this show. It is the same AMD in both cases, but you accept one claim whereas reject angrily the other and attack to anyone who disagree with you: double standard.

Because you're basing ARM performance on an x86 CPU baseline.

Apples, meet Oranges, you're now the same...

See my point?

ARM and x86 do not perform equally in many tasks, you are making the *HUGE* mistake of assuming that ARM performance will be x86 + X%...and it will absolutely not.

That's mainly why many think your claims are ludicrous at best. You have no ARM baseline to judge by...by your own admission the A7 and A9 are not the same thing as A57. Many well know this. You can talk hype all you want, however, this is not the same as an evolution of an architecture, like SB->IB->Hasfail, or BD->PD->SR.

In effect, I think you're doing no more than pulling numbers out of your ...sky... to be diplomatic about it. Then you present them as verifiable facts because someone put them on a marketing slide comparing a ULV x86 to a not yet developed or released ARM core that doesn't even have ES's available for testing.

-Fran- · Oct 10, 2013

I have to agree with Juan on that part as well. Gamerk has been very vocal about (not)core-scaling in games. While I do understand his points, he is fighting with the same paradigm arguments on serial programming. I think palladin gave a great post on that matter a LOT of pages back in the thread, but as you can imagine, I won't go back and look for it. Basic gist of it was: change the paradigm and re-think the problem to go parallel.

Also, I think in Tom's article, they lacked the usual (and lovely) "core scaling" graph they use in some articles. It was weird for them not to show one. Mr Angelini, if you're reading this, please add one next time, for the full release of BF4 and specially the "MANTLE" edition, haha.

I won't get into performance theory, because... You know... Flying cows and all.

Cheers!

noob2222 · Oct 10, 2013

juanrga :

Here is where you fail. Marketing isn't going to give you the best possible scenario, they are giving you what is possible.

2-4x opteron x. they didn't say wich one. You automatically put in the best one they have, the x2150 @ 158 Gflop. What if marketing was refrencing the x1150 @ 72 Gflop, or even the dual core @36 Gflop? This is of course assuming 2 ghz and not 1.5 or 1.0 ghz.

http://www.amd.com/us/products/server/processors/six-core-opteron/Pages/ctp-calculations.aspx

Marketing isn't lying by saying 2-4x faster, they aren't saying its 2-4x faster than the flagship opteron x like you are

In case you didn't know how marketing works, they love to show their best case scenario. Meaning its the x1100 vs the 16-core ARM cpu. It could be anywhere from 72 gflops to 256 and marketing would be correct.

It does not mean without a doubt in anyone's mind that its 256, that would mean that its 8x faster than the dual core or 1ghz optern x, but if they stated that, your figures would turn into 512.

While (Juanrga = wrong) {printf("JUANRGA IS NEVER WRONG")};

blackkstar · Oct 10, 2013

I have seen GamerK's argument too many times.

It happened when people were building Pentium G systems on Reddit. Folks cherry picking benchmarks where Pentium G beat Phenom and then saying Phenom sucked and get Pentium G.

Then new games came out and Pentium G with 2 threads was useless. So everyone started recommended i3s instead of FX 6300 or Phenom 2 instead of Pentium G.

Then, i3 wasn't enough for things like BF3 multi-player so folks started recommending FX 6300 over i3. Now we are starting to see people recommend 8350 over 4670k and 3570k.

It is a constant pattern I've noticed and every single time there is someone telling you to buy Pentium G or Core i3 instead of the AMD they are posting cherry picked benchmarks telling you you don't need two threads, or four threads, or 6 threads, or 8 threads.

It's a never ending cycle. a year from now, if the cycle repeats itself, everyone will be building FX gaming rigs. Mark my words, it's already happened twice already and it's always the same. GamerK probably floats around on /r/buildapc and tells everyone to get Pentium G then i3 and now 3570k.

Mod Edit: Do not call people out or try to cause drama.

jdwii · Oct 10, 2013

8350fx scales ok to BF4 now wondering how it will be when mantle comes out?

8350rocks · Oct 10, 2013

jdwii :

Should be even better with less overhead. Probably see usage dip to 60-70% across the board.

Cazalan · Oct 10, 2013

Looks like BF4 needs MANTLE badly if it is GPU limited by a Titan at 1080P.

It is about time a greater emphasis is put on the software side than trying to brute force it with 6+ billion transistor graphic chips.

blackkstar · Oct 11, 2013

I really can't believe we're doing this.

We go from:

"Look at these 720p, graphics on low benchmarks to compare CPUs"

to

"Look at this 1080p, graphics on ultra benchmark to compare CPU, i3 is the same!"

As an AMD guy, I didn't just spend the last several years having Intel guys posting 720p gaming benchmarks which show differences in CPUs that make FX look weaker suddenly turn around and change to 1080p ultra settings and use that as a CPU bench?

Is this how it's going to be? AMD starts to catch up in gaming so after years of "AMD SUX IN SKYRIM IT GETS 120FPS INSTEAD OF 190FPS AT 720P!" we go to "it doesn't matter you're GPU bottlenecked ANYWAYS SO JUST BUY INTEL!!!"

My face hurts from pushing my palm into it from all of the facepalming.

esrever · Oct 11, 2013

6300 is best budget gaming cpu. In toms bf4 tests, the 8350 did better than the 2500k in terms of mins. I would imagine 8 cores offer smoother gameplay.

jdwii · Oct 11, 2013

If i put 100$ into paypal is their anyway some of you guys would never mention Arm in this article again?

m32 · Oct 11, 2013

jdwii :

ARM IS THE FUTURE! ARM IS GOING TO TAKE OVER X86!!!! PRODUCTIVITY, GAMING, POWER EFFICIENCY AND OTHER STUFF IS ALL ON ARM's SIDE! DON'T BE SCARED OF THE FUTURE!

:lol: I can't wait to see some SR tech. This thread is getting out of hand.

juanrga · Oct 11, 2013

8350rocks :

No.

Before starting. Can you snip the replies to other authors, before replying me? Or it is too work for you? I have made it for you now.

First. I gave you baselineS ==> Plural ==> More than one.

Second. A9/A15 are not "x86 CPU baseline".

Third, one can compare ARM performance to x86 performance, in the same way that one can compare PowerPC performance to x86 performance, and MIPS performance to ARM performance... This is usual.

Fourth. Nobody said you that ARM and x86 perform equally in many tasks. We know that some taks will be best suited to ARM and other best suited to x86. That was emphasized before. That is why the word "about" is used. That is why the symbol "~" is used.

Unsurprisingly, this also happens when comparing x86 to x86. We know that a FX-8350 can be faster than i7-3770k in some task but barely match an i3 in other tasks. But this has never stopped you from making x86 to x86 comparisons or to make comments against the i3. Another instance of how you use double standards.

Fifth. Evidently the A7 and A9 are not the same thing as A57, but we can measure performance. Benchmarks of the A57 against the A15 exist. Benchmarks of the A15 against the A9 exist, and so on.

Finally. The only ludicrous here is your anti-ARM crusade: from your initial nonsense CISC is "Complete" but ARM is not, to your recent "what is the baseline", passing by your attempt to compare architectures by using x86 chip vs ARM cluster benchmarks. LOL

noob2222 :

I love how you make things in your mind.

Before starting, same advice than for rocks. Can you snip the replies to other authors, before replying me? Or it is too work for you? I have made it for you now.

First. You ignore that we are not giving exact numbers. That is what the symbol "~" means.

Second. We are not using marketing alone; also, not all marketing is made equal.

Third. You use the same double standard than rocks. He used marketing slides for posting his "30% faster than PD" claim about Steamroller. Have you replied him saying that 30% must be best case scenario? No. Have you explained him your theory about marketing? No. You only use this kind of argument when it is about ARM. Again double standards.

Fourth. I took the 2-4x opteron claim by AMD as working hypothesis that suggest (as many sites have noted) that jaguar ~ A57, then used independent claims (not by AMD) to obtain that jaguar ~ A57.

Fifth. Quoting now arbitrary Opteron numbers without knowing that you are doing implies that in fact you are pretending that the A57 will be slower than the A15.

I have calculated the GFLOPs of the AMD Seattle chip from a study of the _architecture_ of the chip (note I assumed 2Ghz, when AMD claims >= 2GHz) and compared them to the GFLOPs from other chips: A9, Piledriver, Opteron,...

Marketing didn't play any role in calculating the GFLOPs. :na:

juanrga · Oct 11, 2013

-Fran- :

Gamerk problem is that he is trying to argue against facts such as cpu profilers and benchmarks. He claims that game engines cannot scale well beyond 2 threads (and that OC i3 would be the killing gaming cpu), but we have engines and games that scale well to 6 threads. And we will see engines/games that scale well to 8 threads.

In fact games could scale up to 16 threads if coding adequately. Carmack already shared his thoughts about how he could do several funny things with a 16-core console.

16-core usage wouldn't be mainstream because the consensus among developers is that it would requires special programming techniques. However, we will see how 8-core usage becomes mainstream the next year, thanks to new consoles being 8-core. Any mayor engine (Crytek, Frostbyte, Unreal...) scales up to 8-cores those days.

gamerk316 · Oct 11, 2013

Gamerk problem is that he is trying to argue against facts. He claims that game engines cannot scale beyond 2 threads (topping about 4), but we have engines and games that scale well to 6 threads. And we will see engines/games that will scale well to 8 threads.

You still see two main threads doing the bulk of the work. Then combined with about a dozen or so lighter threads, mostly on the GPU rendering side of the house, which depending on how you build the engine, you can do for DX11 (and ONLY DX11, via multithreaded rendering). I actually plan to do a GPUView analysis at some point, just to see what's going on under the hood. But I'd expect something similar to Crysis 3: Two threads going at least 60-70% of the total workload, which is going to hurt FX at the end of the day.

Interesting to note how well the core i3 holds up though, all things considered. Intel would make a boatload if they made a ~4GHz i3.

Also worth noting: The Beta sucks RAM:

http://www.hardocp.com/article/2013/10/10/battlefield_4_beta_performance_preview/4#.UlfnZUE_tZs

One of the first things that we noticed when we initially starting playing the beta was the high level of CPU usage that was occurring on the author’s personal gaming system, this is an Core i7-2600K system with 16GB of memory and two Radeon HD 7970 cards in CrossFire driving three 1920x1200 monitors. As we were getting acquainted with the basics of game play. Remembering back to Battlefield 3, we were accustomed to seeing 40-50% CPU usage during game play, however, during the Battlefield 4 Beta, we often observed CPU usage in excess of 80-90% on my personal system, at the onset of his game testing.

Moving on to our official review system with the the GeForce 770 GTX, during game play, we observed an average load across all CPU cores in the 90-95% range during each of the testing scenarios. However, with the R9 280X, we were observing CPU usage around 80-85%. Initially we began testing with just 8GB of system memory in the review system. After a significant amount of gameplay, we were noticing that 8GB of memory may not provide enough space for the game. We were experiencing memory being swapped out to the hard drive in virtual memory, meaning we were exceeding 8GB of RAM and this was affecting our smoothness and performance.

We upgraded our test platform to have 16GB of system memory, which is the level that we performed all of our graphed testing at here today. Subjectively, there did feel like there was a difference in the overall gameplay experience by utilizing a larger amount of memory, especially with the GTX 770. More testing into memory utilization needs to be done. The game seems to consume more memory the longer you play. In our testing scenario, we got a maximum of 6.5GB of system RAM utilized just doing our short run-throughs on the previous page. However, it is after several hours of gaming, that the RAM will be pegged through the roof, and in the case of 8GB of system RAM, it just wasn't enough for long sessions of gameplay.

Possible RAM bottleneck when using "only" 8GB. Bears watching when looking at benchmarks.

de5_Roy · Oct 11, 2013

slow day, no SR news. again.

AMD solves 4K display problems with VESA Display ID v1.3
http://semiaccurate.com/2013/10/11/amd-solves-4k-display-problems-vesa-display-id-v1-3/

vr-zone has a funny video showing arm a9 4c vs intel clovertrail+ 2c running nfs game on a tablet. i lol'ed.

etayorius · Oct 11, 2013

I got more news from that Chilenian site, it seems AMD talked to them about an hypothetical 6 Core APU with 768 GCN 2.0, AMD did not said a release date nor if this is actually planned.

http://www.chw.net/2013/10/amd-prepara-un-apu-sextuple-nucleo-con-graficos-con-768-shader-processors/

On an older news in the same site AMD claims that Steamroller (APU or CPU? not clear) will bring greater Performance than what Piledriver brought compared to Bulldozer:

http://www.chw.net/2013/10/amd-steamroller-ofrecera-un-mayor-grado-de-mejora-que-piledriver/

I just want a decent upgrade to my PhenomII x4 980, the FX-8350 just does not cut it... but anything they release later (if not piledriver based) i will pick, the performance will probably be about 30% at minimum compared to my current PhenomII, so that should be enough... if it`s 6 or 8 cores, the better.

In that site they keep talking about a "Piledriver+" arch, so maybe this is the tech used in Richland? it is supposed to be based on an improved Piledriver but the IPC perfomance is even faster than the FX-8350, maybe AMD will stick to Piledriver untill 2014 with some "improved" Vishera tech... i rather get Steamroller... but if the performance is right i will just pick what ever they throw in 2014.

8350rocks · Oct 11, 2013

juanrga :

Baselines of what 64 bit server capable ARM architecture?

Second. A9/A15 are not "x86 CPU baseline".

No, they're tablet chips, which you so astutely pointed out earlier, is not the same thing as a DT or server CPU.

Third, one can compare ARM performance to x86 performance, in the same way that one can compare PowerPC performance to x86 performance, and MIPS performance to ARM performance... This is usual.

Comparing is one thing, making assumption based on marketing slides and hype with no currently equivalent architectural baseline is too many assumptions and extrapolations to be useful. We will find out what Seattle does in 2H 2014. How about no more ARM discussion until we at least have ES's? Everyone good with that? I know I would be fine with it.

Fourth. Nobody said you that ARM and x86 perform equally in many tasks. We know that some taks will be best suited to ARM and other best suited to x86. That was emphasized before. That is why the word "about" is used. That is why the symbol "~" is used.

As per above, you are making too many extrapolations with minimal data. It would be like trying to extrapolate how long a 1960's supercomputer would take to run modern x86-64 instructions if you converted it from punch card instructions. You might be close...you might miss the mark by an entire galaxy worth of error. It's useless information.

Unsurprisingly, this also happens when comparing x86 to x86. We know that a FX-8350 can be faster than i7-3770k in some task but barely match an i3 in other tasks. But this has never stopped you from making x86 to x86 comparisons or to make comments against the i3. Another instance of how you use double standards.

When comparing them, we have as close to an actual "fair" comparison as possible. I also would like to point out that I recommend most people take benchmarks with a grain of salt as you can never be entirely sure of 100% of the variables in play. I give that advice with working hardware you can buy in the real world. Imagine how I feel about trying to guess numbers from something that doesn't exist, and has no prior precedent from a similar architecture...?

Fifth. Evidently the A7 and A9 are not the same thing as A57, but we can measure performance. Benchmarks of the A57 against the A15 exist. Benchmarks of the A15 against the A9 exist, and so on.

As you so astutely mentioned directly above when someone showed you the inferiority of ARM in compute benchmarks:

"Tablet chips != DT or server chips, this is not a fair comparison"

So, trying to extrapolate data from incomparable architectures is ok now? Is that because you are doing it and it serves your purpose? Because everyone else is scratching their heads. It would be like trying to extrapolate Xeon E5 performance from an Intel Atom...or trying to make assumption about Steamroller based on Temash performance. Yet according to you, we are ludicrous for trying to do such things. However, it's clearly ok for you to throw around those numbers and claim they're comparable...right? :sarcasm:

Finally. The only ludicrous here is your anti-ARM crusade: from your initial nonsense CISC is "Complete" but ARM is not, to your recent "what is the baseline", passing by your attempt to compare architectures by using x86 chip vs ARM cluster benchmarks. LOL

Ok, I have had enough of this reference, explain to me what you understand the difference to be between CISC and RISC without cut and paste from Wikipedia. You still have failed to do so, and I think you misunderstand the difference entirely. So let me break it down for you:

Could you get a RISC architecture to do everything a CISC architecture can? Sure, with enough coding effort you likely could.

HOWEVER:

Because you cannot use the same level of abstraction in the code, and the instructions are far simpler, your code would be simpler. This means it would take more code to do the same things comparatively, and the CPU would spend more time processing instructions. Why, you ask? Well, because when you have higher level instruction sets in the CPU uarch, you can use more advanced instructions that take longer than 1 clock cycle to run. This means less code can do more work because you can run more complex instructions that a RISC architecture would have to break down into multiple operations.

So, what that means, is that in RISC, to do the same high level abstraction, you would have more bloated code to get all of the same functionality. Your CPU would be more bogged down running code longer because it doesn't have the high level instructions. Think windows is huge in x86?? Want to bet a DT version of windows for ARM architecture would be even more bloated if they included the same features? Want to bet it would run significantly slower too for many operations because it would just take more time to process the extra code to implement the abstractions?

The answer to the above is yes, it would be slower, it would take more time to run code that requires higher level instructions in x86.

THAT is why ARM will not beat x86 in raw compute. It simply would not happen. No matter how much you try to brute force it, x86 is better at raw compute.

Now get off your dead ARM horse, and stop beating the poor thing, it's dead...ok?

Cazalan · Oct 11, 2013

gamerk316 :

I was surprised how well the i3 held up considering BF4 is supposed to be a game that scales well. The haswell i3 @ 3.6ghz probably would have got close to the 2500K. FX-4350 numbers were not generated either. I'd be curious to see how the Richland APUs perform on one of the new FM2+ motherboards having PCIe 3.0 support.

I consider Intel not releasing i3 @ 4+ghz as a sign they want AMD to stay alive but limping along.

Cazalan · Oct 11, 2013

Regarding exascale computing the x86 world isn't sitting idly by. Intel has 3 research teams working on it. Expecting to achieve 1 Exaflop by 2018 and 4 Exaflop by 2020.

http://www.intc.com/releasedetail.cfm?ReleaseID=586162

noob2222 · Oct 11, 2013

juanrga :

ya, ok so you came up with the figures from real world benchmarks on a chip that won't be made till next year and have the schematics for building it yourself. got it.

While (Juanrga = wrong) {printf("JUANRGA IS NEVER WRONG")};

juanrga · Oct 11, 2013

gamerk316 :

http--www.gamegpu.ru-images-stories-Test_GPU-Action-Battlefield_4_Beta-test-bf_4_intel.jpg

gamerk316 :

It seems like they bottlenecked the GPU by using only 2GB VRAM (BF4 recommends 3GB). Probably then RAM was used to page VRAM and then they run out of main memory and paged to disk.

Also we don't know how many ram they used for tools. BF4 recommends 8 GB only for the game. They probably run the game _and_ a set of tools on a total of 8GB.

juanrga · Oct 11, 2013

etayorius :

No. AMD did not say them that. AMD gave them the next diagram (which the Chilenian site reproduces) and which they interpret incorrectly

Consider the CPU. The Chilenian site interprets that is a six-core CPU, but they are wrong.

Steamroller introduces 3 ALUs per core. Therefore AMD gave them a diagram for one Steamroller module. We can see the two cores, each one with its L1 cache, and then the L2 cache shared between each pair of cores within a module.

I don't have at hand a GCN diagram, but I suspect that the GPU diagram is representing a 3CU configuration.

etayorius :

Nothing new. Piledriver introduced about a 8% IPC improvement because was a minor refresh. Steamroller introduces about a 20% because it is a serious architectural update.

Cazalan · Oct 11, 2013

Interesting paper on the modern RISC vs CISC debate.

http://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf

"While our study shows that RISC and CISC ISA traits are
irrelevant to power and performance characteristics of modern
cores, ISAs continue to evolve to better support exposing
workload-specific semantic information to the execution substrate."

"The consensus was that “with aggressive
microarchitectural techniques for ILP, CISC and RISC
ISAs can be implemented to yield very similar performance.”"

Which backs what Intel has been saying. Every time the re-evaluate the x86 ISA to make it simpler they find no real world benefit. The efficiency is all in the implementation and based on the level of performance they wish to achieve.

Cazalan · Oct 11, 2013

juanrga :

That's the first I've heard of moving to 3 ALUs. Bulldozer/Piledriver only have 2 ALU per core. That would easily put the device on par with the later Intel chips.

AMD CPU speculation... and expert conjecture

Distinguished

Distinguished

Glorious

Distinguished

Honorable

Splendid

Distinguished

Distinguished

Honorable

Splendid

Splendid

Honorable

Distinguished

Distinguished

Glorious

Splendid

Honorable

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Share this page