AMD CPU speculation... and expert conjecture


juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Of course I missed it, because I am always wrong :pt1cable:

Recall when he promised us that two main threads would be the rule for next-gen games. The i3 was going to rule the world of gaming. He was so confident that he even had the "I said so" prepared for us.

Still, this new game is already using 8 threads, as some of us predicted, although Tom's managed to test a part of the game that barely uses six threads, and under a GPU-limited situation that caps the real potential of the six-core i7.

The really interesting part will come with the MANTLE updated edition, which will _optimize_ the support for 8 cores by eliminating the DX constraints.

[Image: MANTLE-4.jpg]

[Image: Frostbite-3-AMD-Mantle-API.jpg]


It will be funny to hear the expert comments again about how it couldn't be done.
 

8350rocks

Distinguished


Because you're basing ARM performance on an x86 CPU baseline.

Apples, meet Oranges, you're now the same...

See my point?

ARM and x86 do not perform equally in many tasks; you are making the *HUGE* mistake of assuming that ARM performance will be x86 + X%...and it absolutely will not be.

That's mainly why many think your claims are ludicrous at best. You have no ARM baseline to judge by...by your own admission the A7 and A9 are not the same thing as the A57. Many know this well. You can talk hype all you want; however, this is not the same as an evolution of an architecture, like SB->IB->Hasfail, or BD->PD->SR.

In effect, I think you're doing no more than pulling numbers out of your ...sky... to be diplomatic about it. Then you present them as verifiable facts because someone put them on a marketing slide comparing a ULV x86 to a not yet developed or released ARM core that doesn't even have ES's available for testing.
 
I have to agree with Juan on that part as well. Gamerk has been very vocal about the (lack of) core scaling in games. While I do understand his points, he is fighting with the same old serial-programming paradigm arguments. I think palladin gave a great post on that matter a LOT of pages back in the thread, but as you can imagine, I won't go back and look for it. The basic gist of it was: change the paradigm and re-think the problem to go parallel — something along the lines of the sketch below.
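For illustration only, here's a minimal sketch of that idea (mine, not palladin's actual post; `update_entity`, the counts and the thread count are made up). Instead of one thread walking every entity in order, the same work is partitioned into independent slices, one per core:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_ENTITIES 4096
#define NUM_THREADS  8          /* e.g. one worker per core on an FX-8350 */

static float entity_state[NUM_ENTITIES];

/* Hypothetical per-entity work: each entity is independent of the others,
 * so the loop can be split across threads instead of run serially. */
static void update_entity(size_t i, float dt)
{
    entity_state[i] += dt;      /* stand-in for real game logic */
}

struct slice { size_t begin, end; float dt; };

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (size_t i = s->begin; i < s->end; ++i)
        update_entity(i, s->dt);
    return NULL;
}

/* Serial paradigm: one thread updates everything.
 * Parallel paradigm: the same work, handed out as independent slices. */
void update_all_parallel(float dt)
{
    pthread_t tid[NUM_THREADS];
    struct slice slices[NUM_THREADS];
    size_t chunk = NUM_ENTITIES / NUM_THREADS;

    for (int t = 0; t < NUM_THREADS; ++t) {
        slices[t].begin = (size_t)t * chunk;
        slices[t].end   = (t == NUM_THREADS - 1) ? NUM_ENTITIES
                                                 : (size_t)(t + 1) * chunk;
        slices[t].dt    = dt;
        pthread_create(&tid[t], NULL, worker, &slices[t]);
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(tid[t], NULL);
}
```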

Also, I think Tom's article lacked the usual (and lovely) "core scaling" graph they use in some articles. It was weird for them not to show one. Mr. Angelini, if you're reading this, please add one next time, for the full release of BF4 and especially the "MANTLE" edition, haha.

I won't get into performance theory, because... You know... Flying cows and all.

Cheers!
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


Here is where you fail. Marketing isn't going to give you the best possible scenario; they are giving you what is possible.

2-4x Opteron X. They didn't say which one. You automatically put in the best one they have, the X2150 @ 158 GFLOPS. What if marketing was referencing the X1150 @ 72 GFLOPS, or even the dual core @ 36 GFLOPS? This is of course assuming 2 GHz and not 1.5 or 1.0 GHz.

http://www.amd.com/us/products/server/processors/six-core-opteron/Pages/ctp-calculations.aspx

Marketing isn't lying by saying 2-4x faster; they aren't saying it's 2-4x faster than the flagship Opteron X like you are.

In case you didn't know how marketing works, they love to show their best-case scenario. Meaning it's the X1100 vs the 16-core ARM CPU. It could be anywhere from 72 GFLOPS to 256 and marketing would be correct.

It does not mean, beyond a doubt in anyone's mind, that it's 256; that would mean it's 8x faster than the dual-core or 1 GHz Opteron X. But if they had stated that, your figures would turn into 512.
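Just to put numbers on how wide that spread is (taking the GFLOPS figures above at face value; which baseline marketing actually meant is anyone's guess):

```c
#include <stdio.h>

/* Candidate Opteron X baselines quoted above (GFLOPS), taken at face value.
 * "2-4x faster" is honest against any of them, which is why the claim on
 * its own pins down almost nothing. */
static const struct { const char *name; double gflops; } base[] = {
    { "dual-core Opteron X", 36.0 },
    { "Opteron X1150",       72.0 },
    { "Opteron X2150",      158.0 },
};

int main(void)
{
    for (size_t i = 0; i < sizeof base / sizeof base[0]; ++i)
        printf("%-20s  2x = %6.0f GFLOPS   4x = %6.0f GFLOPS\n",
               base[i].name, 2.0 * base[i].gflops, 4.0 * base[i].gflops);
    return 0;
}
```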

while (juanrga == wrong) { printf("JUANRGA IS NEVER WRONG"); }
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
I have seen GamerK's argument too many times.

It happened when people were building Pentium G systems on Reddit. Folks were cherry-picking benchmarks where the Pentium G beat Phenom and then saying Phenom sucked, get the Pentium G.

Then new games came out and the Pentium G with 2 threads was useless. So everyone started recommending i3s instead of the FX 6300, or Phenom IIs instead of the Pentium G.

Then, i3 wasn't enough for things like BF3 multi-player so folks started recommending FX 6300 over i3. Now we are starting to see people recommend 8350 over 4670k and 3570k.

It is a constant pattern I've noticed, and every single time there is someone telling you to buy a Pentium G or Core i3 instead of the AMD part, they are posting cherry-picked benchmarks telling you you don't need more than two threads, or four threads, or six threads, or eight threads.

It's a never-ending cycle. A year from now, if the cycle repeats itself, everyone will be building FX gaming rigs. Mark my words, it has happened twice already and it's always the same. GamerK probably floats around on /r/buildapc and tells everyone to get a Pentium G, then an i3, and now a 3570k.

Mod Edit: Do not call people out or try to cause drama.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
Looks like BF4 needs MANTLE badly if it is GPU limited by a Titan at 1080P.

It is about time a greater emphasis was put on the software side, rather than trying to brute-force it with 6+ billion transistor graphics chips.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
I really can't believe we're doing this.

We go from:

"Look at these 720p, graphics on low benchmarks to compare CPUs"

to

"Look at this 1080p, graphics on ultra benchmark to compare CPU, i3 is the same!"

As an AMD guy, did I really just spend the last several years watching Intel guys post 720p gaming benchmarks, which show differences between CPUs and make FX look weaker, only for them to suddenly turn around, switch to 1080p ultra settings, and use that as a CPU bench?

Is this how it's going to be? AMD starts to catch up in gaming so after years of "AMD SUX IN SKYRIM IT GETS 120FPS INSTEAD OF 190FPS AT 720P!" we go to "it doesn't matter you're GPU bottlenecked ANYWAYS SO JUST BUY INTEL!!!"

My face hurts from pushing my palm into it from all of the facepalming.
 

m32

Honorable
Apr 15, 2012
387
0
10,810


ARM IS THE FUTURE! ARM IS GOING TO TAKE OVER X86!!!! PRODUCTIVITY, GAMING, POWER EFFICIENCY AND OTHER STUFF IS ALL ON ARM's SIDE! DON'T BE SCARED OF THE FUTURE!

:lol: I can't wait to see some SR tech. This thread is getting out of hand.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No.

Before starting: can you snip the replies to other authors before replying to me? Or is it too much work for you? I have done it for you now.

First. I gave you baselineS ==> Plural ==> More than one.

Second. A9/A15 are not "x86 CPU baseline".

Third, one can compare ARM performance to x86 performance, in the same way that one can compare PowerPC performance to x86 performance, and MIPS performance to ARM performance... This is usual.

Fourth. Nobody told you that ARM and x86 perform equally in all tasks. We know that some tasks will be better suited to ARM and others better suited to x86. That was emphasized before. That is why the word "about" is used. That is why the symbol "~" is used.

Unsurprisingly, this also happens when comparing x86 to x86. We know that an FX-8350 can be faster than an i7-3770K in some tasks but barely match an i3 in others. But this has never stopped you from making x86-to-x86 comparisons or from making comments against the i3. Another instance of your double standards.

Fifth. Evidently the A7 and A9 are not the same thing as A57, but we can measure performance. Benchmarks of the A57 against the A15 exist. Benchmarks of the A15 against the A9 exist, and so on.

Finally. The only ludicrous thing here is your anti-ARM crusade: from your initial nonsense that CISC is "Complete" but ARM is not, to your recent "what is the baseline", by way of your attempt to compare architectures using x86-chip vs ARM-cluster benchmarks. LOL



I love how you make things up in your mind.

Before starting, same advice as for rocks: can you snip the replies to other authors before replying to me? Or is it too much work for you? I have done it for you now.

First. You ignore that we are not giving exact numbers. That is what the symbol "~" means.

Second. We are not using marketing alone; also, not all marketing is made equal.

Third. You use the same double standard as rocks. He used marketing slides to post his "30% faster than PD" claim about Steamroller. Have you replied to him saying that 30% must be a best-case scenario? No. Have you explained your theory about marketing to him? No. You only use this kind of argument when it is about ARM. Again, double standards.

Fourth. I took AMD's 2-4x Opteron claim as a working hypothesis suggesting (as many sites have noted) that Jaguar ~ A57, then used independent claims (not from AMD) to arrive at Jaguar ~ A57.

Fifth. Quoting arbitrary Opteron numbers now, without knowing what you are doing, implies that in fact you are claiming the A57 will be slower than the A15.

I calculated the GFLOPS of the AMD Seattle chip from a study of the _architecture_ of the chip (note I assumed 2 GHz, when AMD claims >= 2 GHz) and compared them to the GFLOPS of other chips: A9, Piledriver, Opteron,...

Marketing didn't play any role in calculating the GFLOPS. :na:
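For reference, the arithmetic behind a peak number like that is nothing more exotic than cores × FLOPs per cycle × clock. A minimal sketch (the 8 single-precision FLOPs per cycle per core is an assumed figure for the example, not an AMD number):

```c
#include <stdio.h>

/* Peak single-precision GFLOPS = cores x FLOPs/cycle/core x GHz.
 * The per-core FLOPs/cycle figure below is an assumption made for
 * the sake of the example, not an AMD-published number. */
static double peak_gflops(int cores, double flops_per_cycle, double ghz)
{
    return cores * flops_per_cycle * ghz;
}

int main(void)
{
    /* Assumed: 8-core Seattle, A57 at 2 GHz (AMD says >= 2 GHz),
     * 8 SP FLOPs per core per cycle. */
    printf("Seattle (assumed): %.0f GFLOPS\n", peak_gflops(8, 8.0, 2.0));
    return 0;
}
```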
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Gamerk's problem is that he is trying to argue against facts such as CPU profilers and benchmarks. He claims that game engines cannot scale well beyond 2 threads (and that an overclocked i3 would be the killer gaming CPU), but we have engines and games that scale well to 6 threads. And we will see engines/games that scale well to 8 threads.

In fact, games could scale up to 16 threads if coded adequately. Carmack has already shared his thoughts about the fun things he could do with a 16-core console.

16-core usage wouldn't be mainstream, because the consensus among developers is that it would require special programming techniques. However, we will see 8-core usage become mainstream next year, thanks to the new consoles being 8-core. Every major engine (CryEngine, Frostbite, Unreal...) scales up to 8 cores these days.
 
Gamerk's problem is that he is trying to argue against facts. He claims that game engines cannot scale beyond 2 threads (topping out around 4), but we have engines and games that scale well to 6 threads. And we will see engines/games that scale well to 8 threads.

You still see two main threads doing the bulk of the work, combined with a dozen or so lighter threads, mostly on the GPU rendering side of the house, which, depending on how you build the engine, you can do on DX11 (and ONLY DX11, via multithreaded rendering). I actually plan to do a GPUView analysis at some point, just to see what's going on under the hood. But I'd expect something similar to Crysis 3: two threads doing at least 60-70% of the total workload, which is going to hurt FX at the end of the day.

Interesting to note how well the core i3 holds up though, all things considered. Intel would make a boatload if they made a ~4GHz i3.

Also worth noting: The Beta sucks RAM:

http://www.hardocp.com/article/2013/10/10/battlefield_4_beta_performance_preview/4#.UlfnZUE_tZs

One of the first things we noticed when we initially started playing the beta was the high level of CPU usage occurring on the author's personal gaming system: a Core i7-2600K system with 16GB of memory and two Radeon HD 7970 cards in CrossFire driving three 1920x1200 monitors. Remembering back to Battlefield 3, we were accustomed to seeing 40-50% CPU usage during gameplay; however, during the Battlefield 4 beta we often observed CPU usage in excess of 80-90% on that system, right at the onset of testing.

Moving on to our official review system with the GeForce GTX 770, during gameplay we observed an average load across all CPU cores in the 90-95% range in each of the testing scenarios. However, with the R9 280X, we were observing CPU usage around 80-85%. Initially we began testing with just 8GB of system memory in the review system. After a significant amount of gameplay, we noticed that 8GB of memory may not provide enough space for the game. We were experiencing memory being swapped out to the hard drive in virtual memory, meaning we were exceeding 8GB of RAM, and this was affecting our smoothness and performance.

We upgraded our test platform to 16GB of system memory, which is the level we performed all of our graphed testing at here today. Subjectively, it did feel like there was a difference in the overall gameplay experience from utilizing a larger amount of memory, especially with the GTX 770. More testing into memory utilization needs to be done. The game seems to consume more memory the longer you play. In our testing scenario, we saw a maximum of 6.5GB of system RAM utilized just doing our short run-throughs on the previous page. However, after several hours of gaming the RAM usage goes through the roof, and in the case of 8GB of system RAM, it just wasn't enough for long sessions of gameplay.

Possible RAM bottleneck when using "only" 8GB. Bears watching when looking at benchmarks.
 
slow day, no SR news. again.

AMD solves 4K display problems with VESA Display ID v1.3
http://semiaccurate.com/2013/10/11/amd-solves-4k-display-problems-vesa-display-id-v1-3/

vr-zone has a funny video showing arm a9 4c vs intel clovertrail+ 2c running nfs game on a tablet. i lol'ed.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780
I got more news from that Chilean site; it seems AMD talked to them about a hypothetical 6-core APU with 768 GCN 2.0 shader processors. AMD did not give a release date nor say whether this is actually planned.

http://www.chw.net/2013/10/amd-prepara-un-apu-sextuple-nucleo-con-graficos-con-768-shader-processors/

In an older news item on the same site, AMD claims that Steamroller (APU or CPU? not clear) will bring a greater performance improvement than Piledriver brought over Bulldozer:

http://www.chw.net/2013/10/amd-steamroller-ofrecera-un-mayor-grado-de-mejora-que-piledriver/

I just want a decent upgrade to my Phenom II X4 980; the FX-8350 just does not cut it... but I will pick up whatever they release later (if not Piledriver-based). The performance gain will probably be about 30% at minimum compared to my current Phenom II, so that should be enough... and if it's 6 or 8 cores, all the better.


On that site they keep talking about a "Piledriver+" arch, so maybe this is the tech used in Richland? It is supposed to be based on an improved Piledriver, but with IPC even higher than the FX-8350's. Maybe AMD will stick to Piledriver until 2014 with some "improved" Vishera tech... I'd rather get Steamroller... but if the performance is right I will just pick whatever they throw out in 2014.
 

8350rocks

Distinguished


Baselines of what 64-bit, server-capable ARM architecture?

Second. A9/A15 are not "x86 CPU baseline".

No, they're tablet chips, which, as you so astutely pointed out earlier, are not the same thing as a DT or server CPU.

Third, one can compare ARM performance to x86 performance, in the same way that one can compare PowerPC performance to x86 performance, and MIPS performance to ARM performance... This is usual.

Comparing is one thing; making assumptions based on marketing slides and hype, with no currently equivalent architectural baseline, involves too many assumptions and extrapolations to be useful. We will find out what Seattle does in 2H 2014. How about no more ARM discussion until we at least have ES's? Everyone good with that? I know I would be fine with it.

Fourth. Nobody told you that ARM and x86 perform equally in all tasks. We know that some tasks will be better suited to ARM and others better suited to x86. That was emphasized before. That is why the word "about" is used. That is why the symbol "~" is used.

As per above, you are making too many extrapolations with minimal data. It would be like trying to extrapolate how long a 1960s supercomputer would take to run modern x86-64 instructions if you converted them from punch-card instructions. You might be close...you might miss the mark by an entire galaxy's worth of error. It's useless information.

Unsurprisingly, this also happens when comparing x86 to x86. We know that an FX-8350 can be faster than an i7-3770K in some tasks but barely match an i3 in others. But this has never stopped you from making x86-to-x86 comparisons or from making comments against the i3. Another instance of your double standards.

When comparing them, we have as close to an actual "fair" comparison as possible. I also would like to point out that I recommend most people take benchmarks with a grain of salt as you can never be entirely sure of 100% of the variables in play. I give that advice with working hardware you can buy in the real world. Imagine how I feel about trying to guess numbers from something that doesn't exist, and has no prior precedent from a similar architecture...?

Fifth. Evidently the A7 and A9 are not the same thing as A57, but we can measure performance. Benchmarks of the A57 against the A15 exist. Benchmarks of the A15 against the A9 exist, and so on.

As you so astutely mentioned directly above when someone showed you the inferiority of ARM in compute benchmarks:

"Tablet chips != DT or server chips, this is not a fair comparison"

So, trying to extrapolate data from incomparable architectures is ok now? Is that because you are doing it and it serves your purpose? Because everyone else is scratching their heads. It would be like trying to extrapolate Xeon E5 performance from an Intel Atom...or trying to make assumptions about Steamroller based on Temash performance. Yet according to you, we are ludicrous for trying to do such things. However, it's clearly ok for you to throw around those numbers and claim they're comparable...right? :sarcasm:

Finally. The only ludicrous thing here is your anti-ARM crusade: from your initial nonsense that CISC is "Complete" but ARM is not, to your recent "what is the baseline", by way of your attempt to compare architectures using x86-chip vs ARM-cluster benchmarks. LOL

Ok, I have had enough of this reference. Explain to me what you understand the difference between CISC and RISC to be, without cutting and pasting from Wikipedia. You have still failed to do so, and I think you misunderstand the difference entirely. So let me break it down for you:

Could you get a RISC architecture to do everything a CISC architecture can? Sure, with enough coding effort you likely could.

HOWEVER:

Because you cannot use the same level of abstraction in the code, and the instructions are far simpler, each instruction does less. This means it would take more code to do the same things comparatively, and the CPU would spend more time processing instructions. Why, you ask? Well, because when you have higher-level instructions in the CPU uarch, you can use more advanced instructions that take longer than 1 clock cycle to run. This means less code can do more work, because you can run complex instructions that a RISC architecture would have to break down into multiple operations.

So, what that means is that in RISC, to reach the same high-level abstraction, you would have more bloated code to get all of the same functionality. Your CPU would be more bogged down, running code longer, because it doesn't have the high-level instructions. Think Windows is huge on x86?? Want to bet a DT version of Windows for the ARM architecture would be even more bloated if they included the same features? Want to bet it would run significantly slower too for many operations, because it would just take more time to process the extra code that implements the abstractions?

The answer to the above is yes, it would be slower, it would take more time to run code that requires higher level instructions in x86.
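A rough illustration of the instruction-count point (a sketch only; the instruction sequences in the comments are indicative, and real compiler output varies with flags and exact ISA):

```c
/* One C statement, two very different instruction counts.
 * The commented sequences below are indicative only. */
void bump(int *a, int i, int x)
{
    a[i] += x;
    /* x86-64 (memory-operand, CISC style), roughly one instruction:
     *     add dword ptr [rdi + rsi*4], edx
     *
     * Load/store RISC style (e.g. AArch64), roughly three:
     *     ldr w3, [x0, w1, sxtw #2]
     *     add w3, w3, w2
     *     str w3, [x0, w1, sxtw #2]
     */
}
```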

THAT is why ARM will not beat x86 in raw compute. It simply would not happen. No matter how much you try to brute force it, x86 is better at raw compute.

Now get off your dead ARM horse, and stop beating the poor thing, it's dead...ok?


 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I was surprised how well the i3 held up, considering BF4 is supposed to be a game that scales well. The Haswell i3 @ 3.6GHz probably would have gotten close to the 2500K. FX-4350 numbers were not generated either. I'd be curious to see how the Richland APUs perform on one of the new FM2+ motherboards with PCIe 3.0 support.

I consider Intel not releasing an i3 @ 4+GHz a sign that they want AMD to stay alive, but limping along.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


Ya, ok, so you came up with the figures from real-world benchmarks of a chip that won't be made till next year, and you have the schematics for building it yourself. Got it.

while (juanrga == wrong) { printf("JUANRGA IS NEVER WRONG"); }
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


[Image: gamegpu.ru Battlefield 4 beta Intel CPU benchmark chart (bf_4_intel.jpg)]




It seems like they bottlenecked the GPU by using only 2GB of VRAM (BF4 recommends 3GB). Probably RAM was then used to page VRAM, and then they ran out of main memory and paged to disk.

Also, we don't know how much RAM they used for tools. BF4 recommends 8GB for the game alone. They probably ran the game _and_ a set of tools on a total of 8GB.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. AMD did not tell them that. AMD gave them the following diagram (which the Chilean site reproduces), and they interpret it incorrectly.

[Image: APU-CPU6-GPU-768-660x660.jpg]


Consider the CPU. The Chilean site interprets it as a six-core CPU, but they are wrong.

Steamroller introduces 3 ALUs per core; what AMD gave them is a diagram of one Steamroller module. We can see the two cores, each with its own L1 cache, and the L2 cache shared between the two cores of the module.

I don't have a GCN diagram at hand, but I suspect the GPU diagram represents a 3-CU configuration.



Nothing new. Piledriver introduced about an 8% IPC improvement because it was a minor refresh. Steamroller introduces about 20% because it is a serious architectural update.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
Interesting paper on the modern RISC vs CISC debate.

http://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf

"While our study shows that RISC and CISC ISA traits are
irrelevant to power and performance characteristics of modern
cores, ISAs continue to evolve to better support exposing
workload-specific semantic information to the execution substrate."

"The consensus was that “with aggressive
microarchitectural techniques for ILP, CISC and RISC
ISAs can be implemented to yield very similar performance.”"


Which backs what Intel has been saying. Every time they re-evaluate the x86 ISA to make it simpler, they find no real-world benefit. The efficiency is all in the implementation and depends on the level of performance they wish to achieve.

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's the first I've heard of moving to 3 ALUs. Bulldozer/Piledriver only have 2 ALUs per core. That would easily put the device on par with the later Intel chips.
 