AMD CPUs, SoC Rumors and Speculations Temp. thread 2

Status
Not open for further replies.



You can see the pain people are going through with ARM in papers like this.

http://www.teratec.eu/library/pdf/forum/2015/Presentations/A6_06_Forum_TERATEC_2015_Perna_Enginsoft.pdf

The ecosystem needs another 4 years or so to be viable.

I wouldn't be surprised if K12 is delayed again, except for a semi-custom project where they have the funds to build the ecosystem they require.
 

semicustom as in Nintendo NX with 8x K12 cores, small fiji gpu with 4GB HBM or 4-8GB HBM2 or 8GB GDDR5...? 😗

@8350rocks: K12 for HEDT or desktop seems like a red herring to me, as HEDT has been strictly x86 territory no matter how much people delude themselves.
I do agree with the lack of efficiency claim though; vendors like Facebook kinda pulled the rug out from under ARM on that one. Vendor disappointment may be one of the reasons why ARM's development keeps getting delayed. There is always next time.
 


That paper is about ARM+GPGPU. There are similar stories about x86+GPGPU, and that is the reason why GPGPU compute isn't more popular.



Tell that to PayPal or to the AMD semicustom customer of the ARM-powered SoC.
 


Not sure yet what others have said, but this is only true from the consumer's point of view, not the manufacturer's. For example, if it costs AMD $200 to make a part and they have to sell it at a similar price or less, they lose money. As many will tell you, FX CPUs didn't earn them much because of this.
 


Was that aimed at me? lol. Trust me, Nintendo isn't going to use HBM for NX unless they can get one heck of a deal. They want their system to be cheaper than the Wii U; Satoru Iwata stated he believed that price hurt the Wii U. AMD did state that Zen or K12 was being used for a gaming device for 2016, though. Personally I'd expect K12 over Zen for the power savings, since Nintendo likes their systems small and already has experience making games for ARM. I also expect K12 to be cheaper, since ARM designs are cheaper to make.

For 3rd party support, though, Zen or any x86 CPU would be a better choice for compatibility.
 


Just to comment here...

That is not entirely accurate. FX actually did ok, and the margins ended up actually being better than expected after the process matured. The money GloFo had to pay back because of yield issues, nodes behind schedule, etc. turned out to be enough that the CPU division did turn a profit a few quarters, and narrowly lost money a few others. The problem is, when your division is netting positive cash flow, but not enough to offset R&D, etc. to keep things rolling...you get what we have now...and a rough looking balance sheet overall for the division.

 


I mean AMD cores scale well compared to Intel.

For example, it's like having CrossFire or SLI. With one card you get 30 fps; with two you get 50. Naturally you would think 30 + 30 = 60, but it doesn't work that way: scaling is never 100%. It's the same with CPUs. If you run a test and restrict the CPU to one core, it will do, say, 1 unit of work. If we open it up to all 8 cores, it will do around 6 units, meaning a theoretical 2 more units of work were possible but lost in reality.

This ratio of all-core performance to single-core performance is called a multiplier ratio. It shows why the AMD FX chips were able to do so well in multi-core benchmarks while having the same number of threads and much less single-core performance at similar clocks. Without this ratio those benchmarks are pointless and cannot be fully understood.
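
The ratio described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the inputs are the round numbers from the post (1 unit of work on one core, about 6 on eight), not measured benchmark results.

```python
def multiplier_ratio(single_core_score, all_core_score):
    """All-core score divided by single-core score."""
    if single_core_score <= 0:
        raise ValueError("single_core_score must be positive")
    return all_core_score / single_core_score


def scaling_efficiency(single_core_score, all_core_score, threads):
    """Share of ideal linear scaling that was achieved (1.0 = perfect)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return multiplier_ratio(single_core_score, all_core_score) / threads


# Illustrative numbers from the post, not benchmark data:
# 1 unit of work on one core, about 6 units on all 8 cores.
ratio = multiplier_ratio(1.0, 6.0)
efficiency = scaling_efficiency(1.0, 6.0, 8)
lost_units = 8 - ratio

print(f"multiplier ratio: {ratio:.2f}")
print(f"scaling efficiency: {efficiency:.0%}")
print(f"units lost versus ideal scaling: {lost_units:.2f}")
```

The same two functions could be fed single-thread and multi-thread scores from any benchmark that reports both, as long as the chips being compared expose the same thread count.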

Back to the original point: AMD CPUs have historically had better multiplier ratios than Intel parts, and Piledriver's MP ratio is still to this day better than even Skylake's! Unfortunately, with Skylake's nearly twice the single-core performance, Piledriver gets blown out of the water, as we should expect.

Cinebench R15 shows these ratios very well and shows how CPUs stack up in raw power for what they are. Remember to only compare CPUs with the same core (thread) count or else the ratios will be wacky. You can't compare a 24-core Xeon to an 8-core chip; that ratio will be much better on Intel's side unless we divide it by core count, and then I'm getting into too much math already, I can tell 😀
 

You mean we should compare a 4 core / 8 thread i7 against a 4 module / 8 core FX? That happens because resources in FX are duplicated (integer pipeline), while in i7 they are shared (hyperthreading), right?

Zen will be simple cores with hyperthreading too, so we can expect similar performance with similar core count and clocks, matching perf/clock differences, but you suggest that Zen will be faster in multithread because of some unknown multiplier. Sorry, but it doesn't make much sense.
 
Core scaling is not that important compared to brute performance. Phenom had like 93% scaling? And it still under-performed Core 2 Quads with 75-ish % scaling (can't remember the exact number, but it was low), so it's not much of an issue when overall performance is not there to take advantage of that amazing scaling. And core scaling is not quite the same as threading scaling (I don't know what it was called either XD) when looking at SMT or CMT vs pure core scaling. Related, but not quite the same from what I remember.
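
A rough way to see why scaling alone does not decide the outcome is to model all-core throughput as per-core performance times core count times scaling. The sketch below assumes that simple model, and the 93% and 75% figures are the ones recalled from memory in the post, not verified numbers. The 70% per-core figure is purely hypothetical.

```python
def all_core_throughput(per_core_perf, cores, scaling):
    """Rough model: per-core performance x core count x scaling efficiency."""
    return per_core_perf * cores * scaling


def break_even_per_core(own_scaling, rival_scaling):
    """Per-core performance (rival = 1.0) needed to tie at equal core count."""
    return rival_scaling / own_scaling


PHENOM_SCALING = 0.93  # figure recalled in the post, not verified
C2Q_SCALING = 0.75     # "75-ish %" in the post, not verified

needed = break_even_per_core(PHENOM_SCALING, C2Q_SCALING)
print(f"per-core performance needed to tie: {needed:.1%} of the rival's")

# Hypothetical case: a chip at 70% of the rival's per-core performance.
slow_chip = all_core_throughput(0.70, 4, PHENOM_SCALING)
rival = all_core_throughput(1.00, 4, C2Q_SCALING)
print(f"4-core throughput: {slow_chip:.2f} vs {rival:.2f}")
```

Under these assumptions, better scaling only wins if the per-core gap stays smaller than the scaling advantage.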

Cheers!
 


The only multicore that the FX wins at is in highly threaded loads against 4c/8t i7s. On an actual real core level, Intel still wins that.

If that were not so, AMD's BD would be doing much better in the server market, since servers take advantage of multiple cores much better than consumer desktops do, yet AMD has a very small share of the server market.

AMD's implementation of SMT might be better, but I doubt it. Intel has more experience with SMT, but I do think AMD will at least not have the same performance issues SMT had for Intel in its first iteration, Pentium 4.

Core scaling also depends highly on the programs and how well they can take advantage of it.
 

Actually, I remember the tests on Tom's when BD launched. On highly threaded loads, the FX-8150 stayed between the i5 and i7, beating the latter in just one bench.

I don't think BD ever had multithread performance to match the 4c/8t Xeons in the server market.
 


We don't know how much of the pipeline is duplicated for Zen yet. HTT is minimal, but also cheap for Intel to implement. By contrast, CMT gave greater gains, but ate up a lot more die space to implement. We really can't guess at gains until the specifics of the SMT implementation are known.
 


In reality Bulldozer/Piledriver cores scaled rather badly due to the 20% multi-threaded penalty of the CMT approach. FX chips won on highly multi-threaded workloads only when comparing 8-core AMD vs 4-core Intel, 4-core AMD vs 2-core Intel, and so on.

In N-core AMD vs N-core Intel, Intel always wins.

And the gap is larger on servers, because Intel's 16-core CPUs are single-die, whereas the 16-core Opterons are dual-die (which adds an extra performance penalty).
 
Any Carrizo reviews yet?
 
Zen is back on Q4 2016. The process node may be either TSMC 16nm FF or GF 14nm FF+.
http://wccftech.com/amd-contracts-tsmc-produce-zen-16nm-woes-14nm-process-troubles-globalfoundries/

Regarding the debate about the multiplier of FX chips and them only working well on highly multithreaded workloads, that is what I was getting at. For example, in Handbrake encoding, FX chips were just as fast as the 8-thread chips from Intel with all cores enabled, while significantly slower with one core enabled.

Core count doesn't matter to a program, only the threads that can be run at the same time. A CMT 8-thread design gave AMD a big advantage over an SMT 8-thread one. The point here is that 8 cores vs 4 cores / 8 threads looks the same to a program, and therefore, when each core from AMD is 30% slower than each core from Intel, we find this ratio to be important. Where it physically gets its improvement could be said to be the shared L2 or the FPUs, I don't know that part, but I do know that 8 AMD cores (threads) that each do 1 unit of work are roughly equal to 8 Intel threads that each do 1.3 units of work.

This is my point: it's hard to grasp what is so amazing about BD until you understand this key point, and it makes me still believe in the CMT design. If only AMD had a way to improve that single-core performance, they would have blown Intel out of the water on multithreaded workloads.

@juanrga yes, CMT scaled poorly, losing up to 20%, but look at what SMT lost! Up to 45%!

(I'm comparing to the i7 3770K, the top-end CPU that Intel had to compete with Piledriver. It's unfair to compare to Haswell, Broadwell or Skylake, as AMD dropped development of BD and Intel didn't.)
 


The biggest problem is that at the same time AMD was pushing CMT Intel was starting to push 8 core 16 thread CPUs in the server market. If they had a CMT based design out when Intel was still at 4c/8t or maybe even 6c/12t then they might have done better but their CMT design was too late and inefficient.

CMT is a superior version of SMT for sure, since it has extra resources. The only downside is that, because it is so different from SMT, it required the OS to be rewritten to work with it, the same way an OS had to be optimized for SMT.

Maybe in the future CMT might become viable again but considering that Intel has anything from 4c/8t up to (soon) 28c/56t CPUs without the need for CMT AMD might want to focus on catching up in core count and performance per core first then look into a CMT design.

Of course we might also need more info on CMT, as CMT might be why BD is more like Netburst than K8....
 
All SMT implementations simply come down to how much of the pipeline you want to bother duplicating. Pretty much that simple.

BD's problem is this: you have about a 30% per-core performance deficit, plus an additional 20% or so hit when both cores of a BD module are used in non-Integer exclusive workloads. Combine those two factors, and you can see why BD, even in programs that did scale to all cores, would lose to an i7. If AMD had marketed the chip as a quad core with 2-way SMT, it would have been received better, but marketed as an 8-core CPU, people expected more. That's how we ended up at "Intel quads beat AMD's eight core chips".
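
The arithmetic behind combining those two factors can be sketched as below. It assumes the penalties multiply, which the post does not state explicitly, and it uses the post's approximate 30% and 20% figures rather than measured data. No SMT gain for the i7 is assumed; the sketch only computes how much SMT uplift a quad core would need to break even.

```python
def bd_thread_perf(per_core_deficit=0.30, module_penalty=0.20):
    """Per-thread performance relative to one Intel core (= 1.0).

    Assumption: the two penalties combine multiplicatively.
    """
    return (1.0 - per_core_deficit) * (1.0 - module_penalty)


def break_even_smt_gain(bd_threads=8, intel_cores=4):
    """SMT uplift a quad core would need to match a fully loaded BD chip."""
    bd_total = bd_threads * bd_thread_perf()
    return bd_total / intel_cores - 1.0


per_thread = bd_thread_perf()
total = 8 * per_thread
gain_needed = break_even_smt_gain()

print(f"BD per-thread performance: {per_thread:.2f} of an Intel core")
print(f"8 BD threads in Intel-core equivalents: {total:.2f}")
print(f"SMT gain a 4c/8t chip needs to tie: {gain_needed:.0%}")
```

Under these assumptions a quad core only has to gain a modest amount from SMT to pull ahead, which is consistent with the "Intel quads beat AMD's eight core chips" outcome described above.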

And I note again: MSFT wouldn't have needed to patch Windows for BD derived chips if AMD just set the HTT CPUID bit, since the Windows patch basically forces BD to be treated the same way HTT is.

EDIT

The above point actually raises an interesting question: For Zen, when AMD says "X number of cores", how are they defining X? Because I hope they learned their lesson and aren't counting SMT cores in the total again...
 

I'm pretty sure AMD did specify cores/threads, unless it was just some of the rumours that have been published. Everything I have seen has been specifying cores and threads. I doubt they would make that mistake.
 


Considering that BD was marketed as an 8 core CPU when it was not a full 8 core CPU, they can make that mistake. Of course the marketing team for BD was pretty bad anyways.
 

Well, their FX processors were in a "crosszone".
Only time can tell, but I don't think they would make that mistake.
 