AMD CPU speculation... and expert conjecture

Page 438
Status
Not open for further replies.
^^ could be seattle. :D

from what i've seen with kaveri, it is possible for amd to make an 8 and 16 core "big" cpu with steamroller. whether glofo will optimize such a design for soi and is willing to work with amd is a whole different matter.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's the Software Optimization Guide for Steamroller architectures (Family 21 or 15h) which is publicly available.
The architecture supports up to 8 modules (16 cores).
It identifies multi-socket capable models (40h–4Fh and 2h) with extra HyperTransport 3.0 interfaces.

There's just no indication AMD plans to FAB them based on the available product roadmaps. The capability has always been there of course.
 
Kaveri's promise is strangled by its inherent Achilles heel. If I took a 4770K and fused it with the 7800 iGPU you would probably see performance almost double what they are producing now on the Bulldozer core.

It would be an i3, not an i7; there isn't enough room on the CPU die to do otherwise. It would also be an i3 without L3, and bigger, as Intel doesn't give as much real estate to its iGPU as AMD does. The iGPU on APUs is nearly half the CPU die, so APUs are essentially the same size as a four-module chip but with two modules removed and the iGPU bolted on instead. Also, for your i7 + iGPU idea, the performance would be worse than an i3, as you're sharing TDP with the iGPU, something that current i3 + dGPU setups don't take into account.
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


still be cool to see what it could be. it's nice to dream now and then.
 


It's hypothetical, since the benches always have a 4770 + HD4600. All I am saying is that if you took an Intel CPU with an AMD iGPU, the result would be a substantial difference over the 7800+R7. Steamroller is essentially generation 3 Bulldozer cores with tweaks and refinements on process, but it essentially hasn't moved the x86 goal posts far at all, and Excavator will, as above, be evolutionary, not revolutionary. I am waiting on a revolution, and that will only happen after Excavator (hopefully).

That is not to say I don't like the upgrades and technologies, just been waiting for some x86 kick.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Final version of my article with analysis of data disclosed at APU13 and CES-2014, explanation for clocks, and so on is ready:

http://juanrga.com/en/AMD-kaveri-benchmark.html

Additional data here

http://openbenchmarking.org/result/1401145-PL-AMDA1078505

http://openbenchmarking.org/result/1401123-PL-AMDA1078581

Using the available benchmarks @4GHz, the predictions made are in excellent agreement with the final measurements:

x264-kaveri-comp.png

C-Ray-kaveri-comp.png

Himeno-Benchmark-kaveri-comp.png


The Steamroller architecture performs worse than I expected in x264 and better in the other two, with an average deviation of 0% (average of -7%, 5%, and 1%). This is an excellent agreement between prediction and final measurements. This means that, in the average sense, the Kaveri CPU performs as I predicted in this work.

However, final Kaveri silicon is clocked at 3.7GHz, and scores are a bit lower. For instance, Kaveri overclocked to 4GHz hits 94.45 in the x264 benchmark but only hits 90.34 at the stock frequency. This is a difference of -4% between stock and OC. Next, I present a series of benchmarks for Kaveri at stock.
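
The percentages above can be re-derived from the quoted figures. This is only a minimal sketch for checking the arithmetic: the variable names are illustrative and not part of the article, and the inputs are just the numbers quoted in this post.

```python
# Figures quoted in the post; variable names are illustrative only.
deviations_pct = [-7.0, 5.0, 1.0]   # prediction error per benchmark at 4GHz

# Plain arithmetic mean of the three deviations.
mean_deviation = sum(deviations_pct) / len(deviations_pct)

x264_oc = 94.45      # x264 score with Kaveri overclocked to 4.0GHz
x264_stock = 90.34   # x264 score at the stock 3.7GHz

# Relative change going from the overclocked result to the stock result.
score_delta_pct = (x264_stock - x264_oc) / x264_oc * 100
clock_delta_pct = (3.7 - 4.0) / 4.0 * 100

print(f"mean deviation:    {mean_deviation:+.1f}%")
print(f"x264 stock vs OC:  {score_delta_pct:+.1f}%")
print(f"clock stock vs OC: {clock_delta_pct:+.1f}%")
```

The mean works out to roughly -0.3%, which rounds to the 0% quoted, and the x264 score falls by about 4% for a 7.5% lower clock.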

In the first version of my article I compared the estimations made for Kaveri to measurements of two Sandy Bridge i5 and one Ivy Bridge i5. On this occasion, the Sandy Bridge i5-2400S is replaced by the Haswell i5-4670. This way we can compare the AMD Steamroller architecture to three successive generations of Intel Core processors. Small differences between the scores shown below for the i5-2500K and the i5-3470 and the numbers used in the first version of this article are a consequence of improvements or an occasional regression in new versions of the software used.

x264-kaveri-final.png

John-The-Ripper-kaveri-final.png

Himeno-Benchmark-kaveri-final.png

LAMMPS-Molecular-Dynamics-Simulator-kaveri-final.png


As I predicted, Kaveri is at the Sandy Bridge i5 level in integer tests such as x264 but loses in heavily FP tests. However, Steamroller behaves better than I predicted in FP tests. There is a 30% IPC gain compared to Piledriver. The explanation could be the shorter FPU pipeline (4-->3) and the faster L2 cache.

Special mention goes to the performance in the JTR test. I predicted Kaveri would be faster than the Ivy Bridge i5

John-The-Ripper-kaveri-pre.png


and we can see that Kaveri even outperforms the Haswell i5-4670.

There are other interesting tests I have not included in my article. I will only mention that in a pair of tests the Kaveri CPU was almost as fast as the FX-8350.

Kaveri is a good APU, and it behaves as I predicted.


What tourist reports is only the tip of the iceberg. Most reviews of Kaveri have used W7 (optimized for Intel) instead of W8.1, one of them took a 2400MHz memory kit and underclocked it to 1600MHz before benchmarking Kaveri, all sites used AIDA64 3.xx versions when the new AIDA64 4.0 is HSA enabled, one of the review sites used an x264 binary which is specifically optimized for Intel (including Haswell extensions)...
 

Lessthannil

Honorable
Oct 14, 2013
468
0
10,860


This is exactly what I was thinking. Bulldozer is good at everything a game doesn't need and is terrible at the things games use and need the most. Piledriver has good performance relative to its price, but once you consider that you need to buy an 8 core CPU to do the work of 4, its flaws show.

I don't care for 8 cores. 6 cores is nice, but is in no way necessary. I only need 4 because I am a "typical" desktop user outside of games. The FX 4300's performance is damning. It also seems like a slap in the face that AMD completely ignored per core performance with this revision to focus on something it already does well in. I would have bought a Piledriver/Steamroller FX or Kaveri CPU if it even matched Sandy Bridge or perhaps even Nehalem on single core perf.

I also don't care for using software tricks to make the FXs seem decent. Asking game devs to utilize and have benefits for 8 cores (keep in mind, the number of people who have 8 cores is almost trivial at 0.33% on Steam) was a tall order. Now, AMD is asking for Mantle and HSA to bail them out. Seeing how long 8 core utilization took (and even still some devs won't do that), God knows how long it will take them to swallow a whole graphics API and HSA, or if they will in the first place!
 
^^ amd is trying to design hsa in a way that adoption is faster than multicore adoption. if they can execute it, hsa will spread faster.
as for mantle, if it's vendor-agnostic, it'll help both amd and intel. if someone like valve adopts it in steamos, it'll get a massive boost. right now though, mantle is exclusive to ea/dice, who are valve's competitor. i don't like mantle myself. but if i was an entry level user with a low budget and a kaveri-based pc, i'd use mantle to get as much perf out of kaveri as possible.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


You got it all wrong.

Desktop users can benefit from moar cores. That is why processors from Intel for the desktop have OpenCL support and you can use OpenCL software to run desktop tasks using GPU cores.

Second, the choice of eight cores for the consoles was a game developers' choice, as the Sony head has explained lots of times. Some of them have even explained in public why they prefer moar cores. Check the Eurogamer interview with the Metro Last Light developer:

Oles Shishkovstov: We are talking PS4, right? I am very excited about both CPU and GPU. Jaguar is a pretty well-balanced out-of-order core and there are eight of them inside. I always wanted a lot of relatively-low-power cores instead of single super-high-performance one, because it's easier to simply parallelise something instead of changing core-algorithms or chasing every cycle inside critical code segment (not that we don't do that, but very often we can avoid it).

In fact some developers asked for a 16-core console, but the general consensus was eight.

Third, game developers rewriting game engines to use more cores also benefits Intel users. Intel is stagnant in CPU architecture. Why do you believe that Intel is now following "moar cores"? Intel has announced 8-core Haswell chips, because a 6-core version would offer only a minor advantage over current Ivy Bridge. And Broadwell Xeons will increase the number of cores up to 18, compared to current Xeons.

Fourth, AMD is not asking developers to use MANTLE. Game developers have been asking for something like MANTLE for decades. This is why MANTLE is being developed in collaboration with game developers. MANTLE is the solution to an old problem: API overhead/limits.

Fifth, HSA is not an AMD thing. HSA is an HSA Foundation thing. Pay attention to the founder members and affiliates. HSA is a standard for something that has existed for years (heterogeneous architectures), but that is difficult to use with current tools.

Most top supercomputers are heterogeneous architectures and there is consensus among experts that exascale level of performance can only be achieved with heterogeneity. In fact, Nvidia and Intel are developing their own versions of heterogeneity. HSA Foundation members are providing one solution to existing problems.
 

that's basically a mobile haswell core i3 sku with iris pro tuned for higher tdp. the closest thing is this one @$315, with hd5100. even an a10 7850k will be better for price.
or a cut-down core i5 4570R. it won't necessarily be cheaper. it will cost over $200. the edram alone costs ~$80. the low power i3 4330T costs $138, so the end price of an i3 with iris pro will be $220, at least. that's without the motherboard price, any additional i/o ports, cooler and the rest.
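
A minimal sketch of the price estimate above, using only the two figures quoted in this post; the variable names are illustrative, and the eDRAM cost is the poster's rough estimate, not a published price.

```python
# Both inputs are the figures quoted in the post (US dollars).
edram_cost = 80        # rough eDRAM cost estimate ("~$80")
i3_4330t_price = 138   # quoted price of the low-power i3 4330T

# Floor for a hypothetical i3 with Iris Pro: CPU price plus eDRAM cost,
# before motherboard, extra i/o, cooler and the rest.
estimated_floor = i3_4330t_price + edram_cost
print(f"estimated floor: ${estimated_floor}")
```

The sum lands just under the "$220, at least" figure quoted above.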
 

Lessthannil

Honorable
Oct 14, 2013
468
0
10,860


I find that hard to believe.

http://

 

jdwii

Splendid
"When we asked AMD's CTO Joe Macri if a better solution would have been six CPU cores and fewer GPU cores, the answer was a resolute 'no'. Macri said that very little software is able to take advantage of more than four cores."
Yeah why did they make BD again?
 

i guess it was budget. bd was already delayed before launch, glofo was struggling with 32nm... i think amd didn't have any other alternative. they made the design to scale from servers to desktop to laptops with a minimal number of dies. the future goals of bd were to launch the modular design philosophy and hsa underpinnings. unlike intel, which has probably 2-4 main designs... haswell mainstream, haswell-e/ex/ep, atoms, itanium etc., one budget-friendly scalable design was suitable for amd. that's why the "moar cores", "multithreading is the future", "future is fusion" happened. with hsa, they don't need more than 4 cores for mainstream. after hsa's realization, amd has changed its mantra from "more cores" to "even more (compute) cores but you're legally bound to mention 4 cpu cores plus 4 igpu cores". their tune will change with the stuff they're trying to sell you, just like everyone else.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


(i) Servers, (ii) impossibility to compete with Intel in single core perf, (iii) impossibility to build an 8-core CMP within available process technology.


 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680
I love the "we don't need more cores" Intel mantra garbage. if it were up to intel we'd still be using single core processing, waiting an hour for a game to boot.
people just don't give credit where it is due. AMD was the one that pushed core advancement, it was amd who gave us 64 bit processing, amd is first to heterogeneous computing.
but OMG AMD is so far behind intel.... they are? please explain.
sounds like intel fanbois spazzing in their sleep over intel's so much more advanced architectures......yeah okay.
 


Physics scales. Hence why both AMD and NVIDIA are trying to move processing of physics onto the GPU, since this opens up a LOT you can't do on the CPU, due to the extra power available.
 

jacobian

Honorable
Jan 6, 2014
206
0
10,710


Why is AMD behind?

Bad power efficiency and lower per-core performance. AMD likes to give everyone many cores, but the two cores in each module share too much stuff, reducing their efficiency. The greatest thing about AMD is that you get 8 integer cores for a very affordable price. But what do you need them for? This would work out pretty well on a server, but on the desktop you run into diminishing returns pretty fast. For floating point, you get only one execution unit for every two cores. This sounds awfully like Intel's "hyperthreading".



 

Sigmanick

Honorable
Sep 28, 2013
26
0
10,530


Well, I was under the impression that the AMD FP unit has traditionally been overpowered compared to Intel's. That was part of the reason the original Athlons performed better than the P4s.

compare an i7 2700k vs 8350 FP results and you can see that the 8350 remains ahead. (Yuka @ 3.5G vs Palladin)
http:// 2700k vs 8350

However, if you compare the ph-ii 720 the FP numbers look very similar, minor updates. Assuming I have nothing stupid holding the Ph-ii back, Integer performance gains are astounding to me. http:// 720 vs 8350
 
^^ looks like it. SR could always be 12 core if amd wanted it and if glofo cooperated. 16 core monolithic is... too big.
if i read toms power consumption and efficiency charts right, both 7850k and 7600 can hit over 100w at peak and in similar workloads. i think they hit well over 100w but it's hidden under 6800k's line.
since toms benchmarks are mainly cpu-intensive and multithreaded, i'm guessing that the cpu cores aren't as power efficient and/or amd's/asrock's power management wasn't good enough. moreover, i read somewhere that "adding decoders in OOO cpus increases power use"...? or something like that. if 2 modules can hit over 100w at peak, how high will 8-16 cores go?
 