AMD CPU speculation... and expert conjecture

Page 439
Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Did you read at least the title of the article? They are referring to multi-core CPUs. A GPU needs to be fed by a CPU. Nobody builds a heterogeneous supercomputer with i3s. On the contrary, 8/10/12/16-core CPUs are used to feed the GPUs.



Hum, the main bottleneck in Bulldozer modules was the shared decoder. This has been eliminated in Steamroller. The 20% penalty is now gone. Reviews of Kaveri show that its performance per watt (CPU alone) is at the Ivy Bridge level. And its performance per watt for the whole APU is clearly beyond.

Integer per core performance is at Sandy Bridge level. In some tests the A10-7850k scores 10--15% behind i5-2500k and in others outperforms the i5-2500k.

The floating point performance is much poorer, but the problem is not the shared FPU per module, but the width of the FMAC units. Steamroller is an 8 FLOP/core architecture. Excavator will fix that with 256-bit FMAC units, offering 16 FLOP/core like Sandy/Ivy Bridge.
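The relation between FMAC width and the per-core figures quoted above can be sketched as follows. This is only a rough illustration, assuming one FMAC unit counted per core, single-precision (32-bit) operands, and a fused multiply-add counted as two floating point operations per lane per cycle. The function name and those assumptions are illustrative, chosen so the numbers line up with the post, and are not taken from AMD or Intel documentation.

```python
def peak_flops_per_cycle(fmac_width_bits, precision_bits=32, fmac_units=1):
    """Rough peak FLOPs per cycle for one core.

    Assumptions (hypothetical, for illustration only):
    - each FMAC unit processes fmac_width_bits // precision_bits lanes per cycle
    - a fused multiply-add counts as 2 floating point operations per lane
    """
    lanes = fmac_width_bits // precision_bits
    return lanes * 2 * fmac_units


# 128-bit FMAC, single precision: 4 lanes x 2 ops
narrow = peak_flops_per_cycle(128)
# 256-bit FMAC, single precision: 8 lanes x 2 ops
wide = peak_flops_per_cycle(256)

print(f"128-bit FMAC: {narrow} FLOP/cycle/core")
print(f"256-bit FMAC: {wide} FLOP/cycle/core")
```

Under these assumptions, doubling the FMAC width doubles the peak figure, which is the whole content of the 8 vs 16 comparison; real sustained throughput depends on much more than this.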

I agree that 8 cores is overkill for mainstream use today. Why do you believe that Kaveri is dual/quad-core? It was explained in this thread for months that AMD was targeting the mainstream user (95% of users) with Kaveri. Hence the lack of evolution on the FX line.

In the end, like it or not, AMD and Intel are doing the same. Only the 'tempos' are different. Intel started with strong CPU evolution and is now stagnant, migrating to GPU evolution and moar cores (I already mentioned that Haswell increases cores from 6 to 8 and Broadwell Xeons will increase cores from 12 up to 18). AMD started with a strong evolution of GPU (GCN) and moar cores and is now evolving CPUs to catch up with Intel.
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780

Ags1

Honorable
Apr 26, 2012
255
0
10,790


Well, I've signed it :)
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


AMD doesn't get a lot of credit for a lot of things they do.

If I had a dollar for every person using a GDDR graphics card (which AMD invented) or a 64-bit x86 OS (which AMD invented) on Intel or Nvidia hardware, I'd be buying you all 4 R9 290X cards.

AMD is dropping us little hints that APU only isn't the future. I don't know what exactly is going to happen, and some of these rumors are probably not going to happen. However, it is AMD telling us that we've got more to look forward to than 2m/4c APUs.
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460


I signed it too. Lol, I think it's pretty obvious by now, AMD just isn't using the "FX" name. Steamroller CPUs are coming out. I just don't think they'll be here this year, and they'll only be 15% faster.

 


If they make it in SOI, then you shouldn't laugh. BULK will be busy handling Kaveri and PS4/XB1 supply, so I really doubt GF or TSMC could handle more BULK for a possible FX CPU.

It's not a bad idea to be honest, but since most improvements in SR were made for handling more threads better, I would say PD owners would yawn at it in current workloads that are bottlenecked by single core/thread performance, since they won't show any significant improvements.

What was GF's promise for the next node? 22nm SOI? 22nm BULK FinFet?

Cheers!
 
People still comparing PHII to BD? Guess they'll never learn.

Anyhow, a DT SR chip would be pretty decent as long as they can produce it on something that clocks high. BD's uarch absolutely needs high clock speeds; those 2 ALUs can only do so much when compared to designs that have 3~4 ALUs.

This is mostly AMD trying to tackle the enterprise sector, which favors high core count chips; they won't ever be able to compete here. Low power markets are where most of the cheddar is and where they should focus their efforts. They just need to add a third ALU to each "core", which would cost a relatively small amount of die space and provide the extra integer performance that everyone is complaining about.
 

Sigmanick

Honorable
Sep 28, 2013
26
0
10,530
I had the chips and wanted to see how they compared. Absolutely, BD has improved integer performance, and it isn't just a clock speed variance. On average, the increase was a factor of 500%, when the speed difference is only 125% (2.8 GHz vs 3.5 GHz). FP performance did not change as significantly. The FP seemed to scale fairly linearly with the speed increase.
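To separate the clock speed contribution from the per-clock contribution in figures like these, the overall speedup can be divided by the clock ratio. A minimal sketch, assuming "500%" means the new chip scored 5x the old one and "125%" means a 1.25x clock ratio (3.5 GHz / 2.8 GHz); the function name is hypothetical and the reading of the percentages is an assumption, not something the measurements confirm.

```python
def per_clock_gain(total_speedup, old_clock_ghz, new_clock_ghz):
    """Return the speedup left over after removing the clock-speed ratio.

    total_speedup: new score divided by old score (assumed 5.0 for '500%')
    """
    clock_ratio = new_clock_ghz / old_clock_ghz
    return total_speedup / clock_ratio


clock_ratio = 3.5 / 2.8                   # about 1.25x from clock speed alone
gain = per_clock_gain(5.0, 2.8, 3.5)      # integer case, assumed 5x overall
fp_gain = per_clock_gain(clock_ratio, 2.8, 3.5)  # FP scaling linearly with clock

print(f"clock ratio:            {clock_ratio:.2f}x")
print(f"integer per-clock gain: {gain:.2f}x")
print(f"FP per-clock gain:      {fp_gain:.2f}x")
```

Read this way, the integer result would be roughly a 4x improvement per clock, while an FP result that scales linearly with clock speed corresponds to a per-clock gain of about 1x, i.e. no per-clock change.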

How this is affected by the reduction of ALUs per core is something I don't truly understand yet. So, from my perspective, the proc performs this well despite the loss of 1 ALU/core.

Believe it or not, I will be updating HL Bench with a Clawhammer processor if I can get the old pc to boot - just to see how far efficiency and power have come.
 
so far, i haven't seen anything dampen people's enthusiasm so fast like kaveri's reviews did. with intel, everyone knew it was going to be faster, but boring (i'm sure there's a car analogy here). with amd everyone was excited (with and without clues to why) about kaveri. heck, most people ignored jaguar without realizing its potential (until consoles came out). now kaveri comes out and doesn't o.c. to 5 gigglehurz and everyone is suddenly disappointed. by everyone i mean mostly c.a.l.f.
there are lots of legitimate reasons to be disappointed with kaveri, such as - amd's reluctance to share technical information on glofo's failure to fab kaveri on soi, reluctance to share info about steamroller b cores, absence of hsa-compliant software demonstration at launch, vce 2.0 demonstration, zero information on kaveri's power management, turbo, powertune's mechanisms etc, !@#$ing with independent reviewers by giving them so little time to build a launch review, by not telling us if kaveri's xdma engine, out of the box, can use a hawaii card as primary and its own igpu as compute accelerator and vice versa and so on. oh, then there's the price - never explaining why we should buy an a10 7850k for prices higher than a core i5 3350p (msrp is under $6, but may not include hsf price) or at the same price as haswell core i5 4430.

i wonder if amd has SR-A reserved for soi. would be nice if it was more power efficient than sr-b.

amd should really stop beating around the bush and clearly state why it wants the 4 cpu cores + 8 gpu cores to be treated as 12 compute cores - to be used in a dt system like a discrete cpu capable of 12T and running a discrete graphics card or two. the "12 compute cores hype" really hits a dead end when i (and other casual/entry level users - target demographic for kaveri) try to think "if 12 cores are running the cpu, how will we get graphics?"

there's plenty to be excited about kaveri. it's the penultimate hsa device in its physical form. 'nuff said. well... not really.

Sony engineer unveils the ‘functional beauty’ of the PS4′s cooling system
http://vr-zone.com/articles/sony-engineer-unveils-functional-beauty-ps4s-cooling-system/69852.html
 

jdwii

Splendid


Compared to BD, Phenom (the Stars architecture) is better the majority of the time, even in today's benchmarks. It's safe to say AMD knows BD was a failure, and since they're focusing on 4 cores, the module design is a failure too, and yes, they know this now, I hope. AMD will barely beat the Stars cores with the Excavator design. With PD, AMD needs a 15% advantage in clock speed to beat an X6 Phenom; now it's 10% with Steamroller, and with Excavator they will probably be around 5-10% faster in performance per clock than the Stars design on average, aside from programs using the new instruction sets. This is not an opinion, this is a fact. Again, for the 100th time, AMD would have been better off in x86 performance if they had stuck with the Stars design and improved on it.
 


It's your opinion being regurgitated as fact.

The BD uArch was designed to clock high; the K10 uArch can't achieve high clock rates due to design limitations. This is why the "clock for clock" or "at the same clock" argument fails. The last K10 uArch was Llano, not Phenom II, so if you want to see a K10 APU then compare against that. It was a four-core 32nm implementation of the Stars uArch and had the same limitations. Cost, not clock speed, is what differentiates products from each other. AMD isn't "focusing on four core"; four cores is the max they can fit inside an APU and still have it at a reasonable cost and TDP for consumers. There is no admission of some imagined past grievances as you have stated. The original BD CPU wasn't much of an upgrade for the x4 or x6 crowd, but the 8350 was most definitely an upgrade. I happen to have a 980BE working in my backup system and it's nowhere near as good as my FX8350.

What you, and many others not clued in to how uArchs are made, don't understand is that the fictional Stars design you have invented could never exist. There was no future 32/28nm "Phenom III" that was a scaled down design; it wouldn't have worked without a major redesign. As you have all just witnessed, going to smaller nodes presents very real problems to design, chiefly thermal density and current leakage. Another thing you're not realizing is that the BD uArch is far more efficient in utilizing processor resources than Phenom II is, and while I can understand why they wanted to go with an 8x2 ALU design, I still believe a 6x3 could have been better.

The only real weakness in the BD uArch that isn't present in the K10 is the way they implemented the SIMD FPU. When used in normal workloads it's fine, but when synthetic benchmarks are run on it, it tends to perform poorly due to how its units are linked. I believe AMD went with this design because they eventually wanted to move most of the SIMD functions to the GPU component, and the four 256-bit units present in the BD are sufficient for anything a consumer would throw at it (anything requiring more than four FPUs would be better done on a GPU anyway).

Anyhow, I just got done reviewing historical information and the 8350 crushes both the 980 and the 1100 overall; it beats them in single-threaded and absolutely crushes them in multithreaded. This is primarily due to the 980 and 1100 having their clocks TDP limited; physics is a hard and cruel master. I don't mind people speculating and theory-crafting, but the incessant "it's worse than Phenom II" is horribly incorrect.

fx8350 vs 1100T
http://www.anandtech.com/bench/product/203?vs=697

fx8350 vs 980
http://www.anandtech.com/bench/product/362?vs=697
 

juanrga



They are signing for "8 core or more". I suppose some people need to dream.



Hum. AMD announced many years ago that they were migrating from SOI to bulk. They also explained why: (i) freedom to choose a foundry instead of being locked in to GloFo, and (ii) elimination of extra R&D by unifying the process for all the products.

In recent days AMD disclosed another reason why they chose bulk: the high density of the iGPU in Kaveri is not suitable for an SOI process.

HSA-compliant software is ready. For example AIDA64 4.0 is HSA enabled. Guess what? 'professional' reviews used AIDA64 3.xx in their 'reviews' of Kaveri.

And so on...
 

juanrga



It is just repetition of repetition of repetition...

The GDDR5 version of Kaveri was canceled some time ago, because the SO-DIMMs were not ready due to the bankruptcy of one of the makers of GDDR5M, and because it is impossible to release GDDR5 Kaveri systems with only one supplier. At least two are needed for redundancy.

The diagram that they reproduce is not an 8-core system because a CU is not a core. It could represent Warsaw (Opteron) as planned initially a long time ago

AMD_ArchRoadmap_689.jpg


Again, there is nothing like that in official roadmaps, and the final Warsaw is different

AMD-Opteron-APU-Roadmap-635x473.jpg



 
AMD’s FS1b socket to get new chips and a rebrand
Supply chain sources say that AMD will be dropping the FS1b moniker in favor of rebranding it to AM1.
http://vr-zone.com/articles/amds-fs1b-socket-get-new-chips-rebrand/69950.html

More GPU failures on MacBook Pros
http://www.fudzilla.com/home/item/33702-more-gpu-failures-on-macbook-pros
amd, this time.

AMD A10-7700K "Kaveri" De-lidded
http://www.techpowerup.com/197051/amd-a10-7700k-kaveri-de-lidded.html
seems like amd pulled an intel. as a result, now the so-called "real gamers and enthusiasts" have new apus that are overpriced at the top, uses thermal paste under heatspreader, has higher heat generation per area, much denser than previous gen, not necessarily faster cpu, bulk silicon, reduced clockrate and often regressed performance, uses almost as much power at peak load... and is Not Called Haswell. :lol: *munches on popcorn and watches c.a.l.f. flip out again*. on the plus side, none of this matters where kaveri will very likely shine, laptops (and in some small form factor desktop pcs, finally). i'm looking forward to what mobile kaveri is bringing. xdma, much improved dual gfx are already great. with new drivers, it can't get worse.

Yes, Intel is subsidizing Bay Trail tablets
http://www.fudzilla.com/home/item/33701-yes-intel-is-subsidizing-bay-trail-tablets
temash and mullins are about to face the intel money flood.
 
The only real weakness in the BD uArch that isn't present in the K10 is the way they implemented the SIMD FPU. When used in normal workloads it's fine, but when synthetic benchmarks are run on it, it tends to perform poorly due to how its units are linked. I believe AMD went with this design because they eventually wanted to move most of the SIMD functions to the GPU component, and the four 256-bit units present in the BD are sufficient for anything a consumer would throw at it (anything requiring more than four FPUs would be better done on a GPU anyway).

Here's the issue with that: that setup only works under one of two conditions:

1: No dGPU is present in the system
OR
2: An AMD dGPU is present in the system

Thanks to WDDM, the OS can only handle one graphics driver at a time. And using the iGPU is currently handled on the GPU driver side. HSA, as far as SOFTWARE IMPLEMENTATION goes, is no different than using OpenCL with a dGPU, with the iGPU as the target instead.

So why should I bother with HSA, which only works on a single product line [APUs], when I can use OpenCL and get basically the same thing working across every GPU in existence?

If I were AMD, I would be pushing both MSFT and the Linux kernel developers to allow multiple graphic driver installations for this very reason.
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


An 8-core Intel chip (server edition) pulls 140 watts.
The 8-core 8350 desktop version pulls 125 watts; the server edition is below 100 watts.
A 4-core Intel desktop chip pulls 77 watts, which is more than half the wattage of the 8350. Intel's "power efficiency" isn't that impressive if you ask me.

And the 16-core chips:

"The five 6200 chips -- 6262 HE, 6272, 6274, 6276 and 6282 SE -- run at clock speeds between 1.6GHz and 2.6GHz, and are priced between US$523 and $1,019. The chips draw between 85 watts and 140 watts of power"

So Intel's efficiency really isn't that impressive, and in my opinion this goes to show it is less power efficient than AMD.

 

juanrga



HSA doesn't work only on APUs. In fact, HSA covers a broader hardware range than APUs. And you can also use OpenCL with HSA hardware. Moreover, OpenCL 2.0 has been developed in agreement with the HSA specification. Several key features of OpenCL 2.0 map one to one to key features of HSA.
 

juanrga



Efficiency is not number of cores per watt, but performance per watt.
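The difference between the two metrics can be sketched with a few lines of arithmetic. The wattages below are the 125 W and 77 W figures quoted earlier in the thread; the benchmark scores are entirely hypothetical placeholders invented for illustration, not measurements, and the class and field names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    watts: float
    score: float  # HYPOTHETICAL benchmark score, made up for illustration

    def cores_per_watt(self):
        return self.cores / self.watts

    def perf_per_watt(self):
        return self.score / self.watts


chips = [
    Chip("8-core desktop chip at 125 W", cores=8, watts=125.0, score=100.0),
    Chip("4-core desktop chip at 77 W", cores=4, watts=77.0, score=90.0),
]

# The same two chips can rank differently depending on the metric chosen.
best_by_cores = max(chips, key=Chip.cores_per_watt)
best_by_perf = max(chips, key=Chip.perf_per_watt)

for chip in chips:
    print(f"{chip.name}: {chip.cores_per_watt():.3f} cores/W, "
          f"{chip.perf_per_watt():.3f} points/W")
print("best cores per watt:      ", best_by_cores.name)
print("best performance per watt:", best_by_perf.name)
```

With these placeholder scores the 8-core part wins on cores per watt and loses on performance per watt; with different scores the outcome could go the other way, which is why measured performance is needed before drawing any efficiency conclusion.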

The better efficiency of Intel chips has been the reason why Intel owns ~95% of the server market and AMD the remaining 5%. However, Steamroller efficiency is good and at the Ivy Bridge level.

http://openbenchmarking.org/embed.php?i=1401168-PL-SENSORTES44&sha=8000f0b&p=2
 
BTW, remember all my arguments about best case benchmarking:

embed.php


Truthfact: GTX 650 is up to 3% faster than the R9 290 for $380 less!

Hence my stance on "up to" numbers. Without context, they mean nothing.
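A small sketch of why a best-case number says little on its own: the "up to" figure is just the maximum of the per-test ratios, while a summary such as the geometric mean uses all of them. The per-test ratios below are hypothetical values invented for illustration (only the 3% best case echoes the joke above), and the test names are placeholders.

```python
from statistics import geometric_mean  # available in Python 3.8+

# HYPOTHETICAL per-test performance ratios (card A / card B), for illustration.
# A ratio above 1.0 means card A was faster in that test.
ratios = {
    "test_a": 1.03,  # the single best case, which becomes the 'up to' headline
    "test_b": 0.45,
    "test_c": 0.40,
    "test_d": 0.38,
}

best_case = max(ratios.values())
typical = geometric_mean(list(ratios.values()))

print(f"'up to' claim:  {100 * (best_case - 1):.0f}% faster")
print(f"geometric mean: {typical:.2f}x (below 1.0 means slower overall)")
```

The headline and the summary point in opposite directions for the same data set, which is the point about context being made here.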


[I'm hoping everyone gets I'm making a point here?]
 