AMD CPU speculation... and expert conjecture

Page 614

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I find it interesting that after years promoting the i3 for next gen gaming, now you turn your eyes to 8-core chips... :sarcastic:

I expect the 8-core Haswell to outperform the FX-9000 series in every imaginable metric. But compared with other Intel chips, the gaming picture is less clear-cut.

BF4 core profiles show that the game is optimized for six threads. The six-core FX has an average load of 95%, and the lowest-loaded core was at 92%. This contrasts with the eight-core FX, which has an average load of only 68% and a lowest-loaded core at 55%.

In average FPS, the 8-core FX is only about 10% faster than the six-core FX, when it could be up to 52% faster (assuming perfect linear scaling).

This is a consequence of only six of the consoles' cores being available to games: six single-threaded cores.

Now the situation for Intel is radically different because of hyperthreading. An i3 gets all four of its threads maxed out. The next hyperthreaded model up is a quad-core, eight-thread i7. It has an average load of 68%, with the lowest load at 60% for one of the physical cores. It is evident that BF4 is not maximizing CPU usage. A six-core, twelve-thread i7 has an average load of only 64% (for the loaded threads), the minimum core load decreases to 56% for one of the physical cores, and several of the twelve available threads are not used at all (0% load) because the game doesn't scale that far.

In average FPS, the twelve-thread CPU is only about 30% faster than the eight-thread CPU, when theoretically (linear scaling) it could be up to 54% faster. The difference coincides with the upgrade from 4 to 6 physical cores. In fact, several of the virtual cores that are loaded on the quad-core i7 are empty on the six-core i7, because the scheduler fills the empty physical cores first.
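As a rough illustration of that linear-scaling ceiling, here is a small Python sketch. The chip pairing and clocks (FX-6300 at 3.5GHz vs. FX-8350 at 4.0GHz) are my own assumption about which models the quoted numbers refer to, not something stated in the post.

```python
# Sketch of the "theoretical vs. observed" scaling arithmetic above.
# Assumed pairing: six-core FX-6300 at 3.5 GHz vs. eight-core FX-8350 at 4.0 GHz.

def linear_speedup(cores_old, clock_old, cores_new, clock_new):
    """Best-case gain if FPS scaled perfectly with cores x clock."""
    return (cores_new * clock_new) / (cores_old * clock_old) - 1.0

theoretical = linear_speedup(6, 3.5, 8, 4.0)  # ~0.52, i.e. "up to 52% faster"
observed = 0.10                               # ~10% faster in the quoted averages

print(f"theoretical: +{theoretical:.0%}, observed: +{observed:.0%}")
# The gap between the two is the headroom BF4 leaves unused on the 8-core chip.
```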

In my opinion the future eight-core, sixteen-thread i7 will barely match the six-core, twelve-thread i7, and it may even run the game slower due to the reduction in frequency from 3.5GHz to 3.0GHz.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


The single player, which is usually what gets benchmarked, doesn't scale well to many cores. The demanding part of BF4 is the 64 player multiplayer maps.

Take a look at BF4 benchmarks on i3s on YouTube. Most of them are someone walking down a narrow corridor and going "look, I get 60fps+, the i3 is fine!" Things change when you're in an open map with a ton of things happening.

But 5960x at stock is sort of disappointing. Consider the following:

5960x has 33% more cores than 4960x. Yet the base clock is 20% lower on 5960x and the turbo is 21% lower than the 4960x. The IPC increase from IB -> Haswell is not that good.

It is sort of like original Bulldozer release. Not much IPC increase yet lower clocks with more cores, all with higher TDP.

Anyways, this is an AMD thread. Intel is going to launch a $350 6 core CPU in the next year or so. Whatever AMD does, they're going to have to compete with that. It should finally get interesting here for a bit. It's just a shame a lot of people don't need all those cores.
 
amd can take advantage of the hsw-e launch. let intel go through all the debugging with ddr4 compatibility and early high prices, then launch carrizo (and maybe an hedt platform ;)) with ddr4 support when amd finds it convenient.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Wait ... arm 64 = itanium in terms of software compatibility? That really panned out didn't it.

Itanium and x86 are two completely different ISAs: one is VLIW, the other is CISC. Itanium had to emulate x86, which incurred a performance penalty. Moreover, Intel/HP did not deliver an adequate VLIW compiler, and most programmers used x86 emulation instead of the native ISA:

«Intel went to software developers and said "we need code for this." Code writers said "we need compilers because nothing we have is optimized for this architecture"», says Culver. He faulted Intel for not making optimal compilers available for developers when it shipped, which stopped the processor's momentum dead.

Of course this is not what is happening with ARM64, because compiler support and native applications have been ready for months. Moreover, I mentioned before that the Vulcan core can run both 64-bit and 32-bit applications because it has both AArch64 full-mode and AArch32 user-mode implemented in the ARMv8 decoder...

Evidently, cores such as the A53, A57, Denver, Cyclone... can run both new software and old software without any performance penalty; what is more, old software runs faster! For instance, the A57 core used in AMD's Seattle and Skybridge products runs 32-bit software about 15--30% faster than the A15 core at the same clocks.

ARM64 is backward compatible with ARM32, but the new ARM ISA is implemented separately rather than as an extension. The ARM engineers' decision brings the best of both worlds: backward compatibility and a bright future. Companies will implement future cores (K13? K14?) with only AArch64 support, which will reduce development and fabrication costs while increasing chip efficiency; on the other side, any company fabricating x86 CPUs will have to carry tons of useless legacy baggage in the ISA, which only adds wasted transistors to the silicon.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
5960x has 33% more cores than 4960x. Yet the base clock is 20% lower on 5960x and the turbo is 21% lower than the 4960x. The IPC increase from IB -> Haswell is not that good.

The 33% is correct because it takes the 4960X as the baseline. The other percentages are incorrect because they use the wrong baseline. The base clock of the 5960X is 17% lower than the 4960X's (3.6GHz --> 3GHz), whereas the turbo is 12% lower (4.0GHz --> 3.5GHz). The baselines for computing the percentages are the 4960X frequencies, i.e. 3.6GHz and 4GHz respectively.

The 5960X will be about as fast (90--95%) as the 4960X in single-thread performance, because the lower turbo frequency will be almost compensated by the IPC gain. However, the 5960X will provide about 15--20% more aggregate throughput. Those are values for ordinary x86 software. By using the Haswell AVX extensions, some applications will run up to 2x faster on the 5960X.
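A back-of-the-envelope sketch of the arithmetic in the two paragraphs above; the ~6% Haswell-over-Ivy-Bridge IPC gain used below is my own assumed figure, not one given in the post.

```python
# Percentage drops measured against the 4960X clocks (the baseline):
def pct_drop(old, new):
    return (old - new) / old * 100

print(pct_drop(3.6, 3.0))  # base clock: ~16.7% lower ("17% lower")
print(pct_drop(4.0, 3.5))  # turbo:      ~12.5% lower ("12% lower")

# Single-thread and aggregate-throughput estimates, assuming a ~6% IPC gain
# for Haswell-E over Ivy Bridge-E (assumption, not from the post):
ipc_gain = 1.06
single_thread = (3.5 / 4.0) * ipc_gain         # ~0.93 of the 4960X (turbo clocks)
aggregate = (8 * 3.0) / (6 * 3.6) * ipc_gain   # ~1.18, i.e. ~15-20% more throughput
print(single_thread, aggregate)
```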
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Carrizo only supports DDR3. Internally it supports up to DDR3-2400, but it will probably be rated at 2133MHz officially.

The APU with DDR4 support is Toronto. AMD's Seattle already supports DDR4.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I know the original Charlie articles used by Wikipedia and other sources. The story is more or less as follows.

Denver (then codenamed T50, if my memory doesn't fail me) was originally planned only as an x86 CPU. Nvidia claimed then that Transmeta code morphing granted them the right to develop a non-x86 CPU that could run x86 code. Internally the core would be a VLIW design, and code morphing would be used to translate x86 input to the internal ISA for execution.

The problem is that Nvidia and Intel started a years-long patent war over x86, chipsets, and GPUs. Both companies ended the patent war with a settlement that obligated Intel to abandon Larrabee and obligated Nvidia to abandon its x86-compatible CPU. If you check the details of the settlement in the link below, you can see that not only does Nvidia not obtain an x86 license, but Nvidia is also prohibited from developing an x86 emulator.

http://www.anandtech.com/show/4122/intel-settles-with-nvidia-more-money-fewer-problems-no-x86/2

Once Nvidia got that prohibition, the Denver project was rethought and morphed (pun intended) into an ARM-compatible core, but then Nvidia had to wait for ARM to develop ARMv8. This is why the Denver core has been delayed until today. But, well, the Denver core finally achieves Haswell-class performance with less power consumption. I am now expecting their Boulder core for servers. :D

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Ok, confirmed: the eight-core Haswell loses to the six-core Haswell because the game cannot scale to that many threads and the frequency penalty holds.

[Image: BF4 frame-rate benchmark chart]


Note: there is something odd about the six-core 5820K and 5930K in that graph; it looks as if the labels were interchanged.
 


And what software out there of any note uses ICC to compile? Oh right, outside of a few benchmark programs, NONE. Or do I need to break out the compiler diagnostics, again, and look at what every game in the past decade has been compiled with? I thought we finally put this one to bed?

I find it interesting that after years promoting the i3 for next gen gaming, now you turn your eyes to 8-core chips... :sarcastic:

You apparently missed my "duo is dead" proclamation from around 2009 or so, back when everyone here was pushing the E8600 over the Q9550 (you all know who you are). I predicted, correctly, that two heavy game threads would bog down most duo cores in terms of frame stability, making quads the sweet spot for performance.

I also correctly predicted that games wouldn't be able to scale well beyond that, which is the main reason I called out BD from the start. Games like BF4 are one of the few exceptions, due in part to EA getting multithreaded rendering working decently in DX11.

DX12 could change the math somewhat: if it's easier to build a multithreaded rendering model, then you could see the rendering backend made more parallel, which, ironically enough, would benefit duo cores the most (better core balancing between the two cores, hopefully making latency more stable).

Take a look at BF4 benchmarks on i3s on YouTube. Most of them are someone walking down a narrow corridor and going "look, I get 60fps+, the i3 is fine!" Things change when you're in an open map with a ton of things happening.

Latency in a nutshell. The i3 can service an average of 60 FPS, but the latency chart looks horrid, since it dies on high-workload frames. That's the two-heavy-thread dynamic making games unplayable at the highest settings, even though the averages look fine.
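To make the averages-vs-latency point concrete, here is a toy Python example with made-up frame times (not measured data): the average looks fine while the worst frames are what you actually feel.

```python
# Toy frame-time trace (made-up numbers): mostly smooth frames plus one
# "high-workload" spike, as in an open 64-player map.
frame_times_ms = [13, 13, 12, 14, 13, 45, 12, 13, 12, 13]

avg_fps = 1000 / (sum(frame_times_ms) / len(frame_times_ms))
worst_frame = max(frame_times_ms)

print(f"average: {avg_fps:.0f} FPS")     # ~62 FPS; the headline number looks fine
print(f"worst frame: {worst_frame} ms")  # a 45 ms frame is a visible hitch
```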
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Yes, a priori Denver could run ARM, x86, PowerPC... It could be a smarter alternative to AMD's Skybridge project, which needs different cores for each ISA.

But, as commented before, Nvidia signed a settlement with Intel that explicitly prohibits the use of x86 emulation. Moreover, I don't think Nvidia is really interested in the PowerPC ISA.

I think Nvidia's goals are completely different from AMD's Skybridge. I think Nvidia's main interest in code morphing is reusing the Denver core inside its GPUs. I think Volta is the GPU that will include Denver cores to assist the SM processors: Nvidia's own approach to heterogeneity.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680


So you didn't bother reading the article then?
 

jdwii

Splendid

Gamer, from being on Tom's Hardware for over 4 years or so, I can say you were always right about the things you said. I remember when I first got to Tom's Hardware people were trying to get me to buy a Phenom dual core over an Athlon II X4, something I found funny; I doubt you remember, but you helped me decide not to do that. Then I remember arguing that Bulldozer was going to be amazing, based on invalid logic, until I saw others downplaying IPC so much and thinking they could pull off a Pentium 4 10GHz-style design. It was quite clear that it was made for servers first, and it sucked at that as well, being even with or only slightly better than their 12-core Magny-Cours. Probably down to lower IPC and CMT scaling of about 80%, i.e. 80% x 16 = 12.8 effective cores.
It's funny that certain people think an engine that can use all 8 cores well means the rendering thread will not be as heavy, or that performance will actually go up. It's quite simple: when playing these next-gen games I still see one of my CPU cores at 70-80% while the rest are at 40-60% or 20-30%. DirectX 12, Mantle, and OpenGL NG should help with all of this, but it feels more like a breather than a solution for AMD.
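A minimal sketch of the CMT-scaling arithmetic mentioned above, assuming the 80% figure and the 16-thread Interlagos vs. 12-core Magny-Cours comparison the poster appears to have in mind.

```python
# With ~80% per-module scaling, 16 CMT threads behave like ~12.8 "full" cores,
# which is roughly on par with the 12 real cores of Magny-Cours.
cmt_scaling = 0.80
interlagos_threads = 16
magny_cours_cores = 12

effective_cores = interlagos_threads * cmt_scaling  # 12.8
print(effective_cores, magny_cours_cores)
```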
 


It was never put to bed, because we need to decompile the libraries used as well. Knowing how the front-end executable was compiled doesn't mean much, from what I know. Especially when you're inside Windows, you need to go through MS's compiler most of the time (when using .NET in particular).

I remember you said you'd do it. I could be remembering wrong though.



You predicted? I've been here since the year 2000 and I can't remember that. In all major threads, that is. I do remember you saying that games could not scale beyond 2 threads though. Like a million times you've mentioned it.

Well, from the Core 2 Duo's standpoint, threading was not really something Intel cared about at that point. I don't have exact figures, but scaling was very poor (80%?), especially on their quads... Core 2 Quads were two Duos slapped together. For shame! lol

Phenom II was a very different beast altogether. 93% was it? I believe for the time it was stupid high scaling. Phenom I, I have no idea.

Anyway, time and time again, we've read about how OGL (prior to 3, IIRC) and DX9c (basically, the Xbox 360) were not thread friendly, as the rendering pipe was always modeled and implemented as one heavy thread. Hence piss-poor scaling all over the place. Not even physics was threaded, IIRC.

If you want to get technical about it, you can start threading subroutines of every part of your game architecture into whatever piece size you want, but to do that you must design it from scratch that way (a point Palladin has been very open about, but oh well). Meaning, using frameworks makes the job almost impossible. Not even id Tech 4 was well threaded, from what I remember. And I have this for RAGE (which is id Tech 5) from when I was using my Phenom II:

[Image: RAGE benchmark results on a Phenom II]


Anyway, threading in games is in its infancy IMO, since no major overhauls have been done in like 15 years. It's been only around 5 years since "more than 2 cores" became available, but not even 2 years since it became somewhat important. 2014 has been very lazy in showcasing games with major overhauls underneath, but they're coming. At least, I hope so...



At least we agree on something. Scaling for the unlocked Pentium is horrible due to how gimped it is. It's amazing how much it is in need of AVX and a larger cache. It's stupid. I wonder how an unlocked i3 would look though.

Cheers!
 
Anyway, threading in games is in its infancy IMO, since no major overhauls have been done in like 15 years. It's been only around 5 years since "more than 2 cores" became available, but not even 2 years since it became somewhat important. 2014 has been very lazy in showcasing games with major overhauls underneath, but they're coming. At least, I hope so...

But even then, you still had things threaded. It's not like threading is a brand new thing. The stuff that could be worked on independently could be, and was. Games are using about the same number of threads now as they were 10 years ago. The difference is, you now have the capability in DX/OGL to break up the giant render thread somewhat, albeit in a very limited fashion. And that's the threading you see now. DX12/OGL NG should help a ton in breaking that thread up, but because CPUs aren't the bottleneck (in gaming rigs, at least), you aren't going to see a huge performance benefit. You'll see it at the bottom, not the top.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I remember each one of the million times he told us that, and when he predicted that 8-core processors from AMD would lose to a Haswell i3 in new games such as BF4.

In any case, my point is that the eight-core processors from AMD, including the new FX models presented last month, are a more powerful choice than the FX-6000 series for well-threaded games. I cannot say the same of Intel's eight-core Haswell, which loses to the six-core Haswell.

I would have enjoyed it if Tom's had added some FX-8000 series chips to the review of the new Haswell.



Indeed. Game developers agreed that the poor threading of most last-gen games was a consequence of the hardware choices in previous consoles (dual-thread single-core). They also agreed that the new octo-core consoles would break that barrier, and all of them predicted that the FX-8350 would be a better choice for gaming than an i5 (Eurogamer poll). Since then we have seen new games where the FX-8350 is on par with an i5 or even beats a quad-core i7. However, DX11 and OGL still limit the amount of parallelism that can be extracted from 8 cores.
 

jed

Distinguished
May 21, 2004
314
0
18,780


Please stop this madness. With the FX 6-core vs. the FX 8-core running at the same clock speeds, yes, the two extra cores shine, just like the 5960X vs. the 4960X.

http://www.digitalstormonline.com/unlocked/intel-haswell-e-5960x-review-and-overclocking-benchmarks-idnum324/

 

Ags1

Honorable
Apr 26, 2012
255
0
10,790
Hey Jed, your system is monstrous, I take my hat off:

http://www.headline-benchmark.com/results/403b1068-88fd-42e7-a0a7-8c9ae4d57e7f/403bbefa-261b-4637-90de-a43fe7c8fc00

Back on topic, for me the FX chips are getting better and better. Intel performance is staying flat so AMD don't need to roll out new processors to avoid falling further behind. Instead AMD are driving down price and TDP by improving their yields and binning. And software is steadily going in AMD's direction, with more and more well-threaded applications able to take advantage of the AMD CPUs, including many recent games. In Holland I can get an 8320 for 116 euros. For that money the best that Intel can offer is only an Intel Core i3-4350.
 

jed

Distinguished
May 21, 2004
314
0
18,780


Thx, the new machine I'm building should be even better

http://www.newegg.com/Product/Product.aspx?Item=N82E16819117404&cm_re=5960x-_-19-117-404-_-Product

http://www.newegg.com/Product/Product.aspx?Item=N82E16813132260&cm_re=asus_deluxe_x99-_-13-132-260-_-Product

My new case will house both systems

http://www.caselabs-store.com/cart.php
 

8350rocks

Distinguished


Actually, as far as well-threaded engines go... CryEngine 3 is the most well threaded, and it actually uses the threads. Crysis 3 is still the most taxing game on any system at this point, and it has been out for some time now too...
 

8350rocks

Distinguished


What I find really interesting there... situationally, the OC'd 8350 in that comparison is actually faster at integer and floating-point calculations in certain areas. The one thing that absolutely gets killed on the 8350 is the IMC, which we all knew was an Intel advantage. However, I find it striking that the 8350 is actually better than the 3960X under certain conditions... it lends credence to the thought that if the architecture were running in a better-optimized software environment, it would not really be all that far behind. Considering how well threaded Java can be, it is very interesting how drastically software skews results toward the status quo in general applications.

Makes me wonder how far behind the AMD offerings really are in terms of actual hardware...
 

from the recent benchmarks (techreport's), it looks like the scores can vary with the software, maybe with the way different programs access memory and cache. sisoft sandra's memory bandwidth scores differ slightly from aida64's memory benches. in the aida benches, the stock fx8350's memory read score is better than the lga1150 haswell cpus'. in the memory copy benchmark, the fx8350 is faster than the core i5 2500k. but in the write benchmark, the fx8350 is behind even the pentium a.e. iirc haswell suffers from memory performance regressions, so it's not that unexpected. from the aida benches, imo vishera's and kaveri's bigger weak spot may be in the L2 and L3 caches (latency).

as for the prices, it is great for amd buyers. i doubt we'd ever get to buy an 8-core cpu under $140 if amd had vishera's successor lined up. as a comparison, they priced the fx 6100 around $170 iirc. i think amd's way of designing cpus might have something to do with the price drops. is it possible that amd can reduce prices because they designed only one die?

edit: haven't seen any new amd-related news today, so i'm back poking the old stuff. :p
http://www.phoronix.com/scan.php?page=news_item&px=MTc3Nzg
 

jed

Distinguished
May 21, 2004
314
0
18,780


I would say it's in the benchmark itself, because here are the same two CPUs and the 3960X wins many more with a lower score.

http://www.headline-benchmark.com/results/9a411141-526d-4452-8e78-1c9891c2efa0/b83dfc32-8c85-4c58-83bc-604302a291ab

Well, 8350rocks, if you know of any neutral benchmarks you need tested between the i7 3960X and the FX 8350 or 9590, you and I can handle that; there's no need to wonder when we can make it possible.
 

8350rocks

Distinguished


Well, that is just it, without recompiling a benchmark to favor both architectures equally, outside of something like the headline benchmark, it is nigh impossible to figure out how "neutral" something really is...

I know blackkstar has done some recompiled benchmarks with AMD friendly compiler flags, just curious as to how that would stack up. I am sure he will be reading this at some point, maybe he has an idea.

Stuff that is compiled with MSVC and other compilers is really pretty poorly optimized, because the target is a general group and not a specific set of hardware. If we had something set up to run optimized for both architectures (SSE4, AVX, etc.), then things would be interesting, to say the least.

With stock benchmarks it is difficult to determine the actual code paths compiled unless you recompile completely.

Also, what OS are you on? Assuming windows, though you may be on Linux, or who knows what...
 

jed

Distinguished
May 21, 2004
314
0
18,780


My OS is Windows 8.1 64-bit; again, I'm willing to run any neutral benchmark to satisfy your curiosity.
 