AMD CPUs, SoC Rumors and Speculations Temp. thread 2

Status
Not open for further replies.


Glofo is in trouble as well. The oil crisis has reduced the amount of money Mubadala can invest in Glofo. They had to cancel their own 14nm process and license the node from Samsung. The original plan for Apple was to use Samsung for all A9 chips, with Samsung using Glofo as a second source to produce chips beyond the volume capacity of Samsung's own fabs. Due to delays and yield problems, Apple changed plans at the last minute and contracted TSMC for 60% of its A9 chips; now Samsung does the remaining 40% and Glofo does nothing... Glofo invested lots of money and doesn't have the sales volume they expected just a couple of years ago. Glofo needs money to survive even more than AMD does, and the latest round of rumors is that the Chinese want to purchase Glofo for their own chips.

Then we have the weird issue named WSA, which obligates AMD to produce a certain amount of chips at Glofo no matter what, with a financial penalty if AMD doesn't.
 

I REALLY want those Q3 numbers. More indication the bottom may have fallen out.
 
All evidence indicates that AMD is going down... Really sad. I really, really don't want them to go, because Intel and nVidia will start acting like D-s with the prices of their products, and we will also start getting super small performance increases every new gen.

I really hope they make it at least to Zen so they gain some profit, but they are always, always late with their products.
 


Other reports are saying AMD regained market share in graphics.
They also signed an exclusive contract for their Carrizo Pro APU with HP, the 2nd largest PC vendor.
They could probably break even if there isn't another big one time charge, which occurs often with AMD.
 


Agreed, although the release of a high end part (if it's any good) may be more important than you initially give it credit for, for two simple reasons:
1: Halo effect
2: Upgrade path.

Excavator APUs aren't terrible (at least on mobile). They should be fairly competitive with Intel's entry stuff (e.g. Pentium / i3). The big problem AMD has now is that on the FM2+ platform that is your limit (as their better processors are on another socket, and those don't really offer the performance to warrant the upgrade anyway).

With a unified socket and a (hopefully) much better high end option, all of a sudden AM4 looks like a decent buy. Start off with a quad core Excavator, upgrade to a Zen based processor later (either the high end 8 core or a sub version of it; I'm certain, for example, there will be a harvested 6 core Zen if the top part is a monolithic 8 core).
 
It seems some 'reverse-engineering' of Zen patches provides info that Zen ends up being a 4 ALU + 2 AGU + 2 SIMD design with 32KB L1 and 256KB L2, and that each SIMD unit is a 128-bit FMA.

[Image: Zen core diagram, zen-architektur2bcorewou9h.png]


If that is accurate, then my prediction [1] of the SIMD width and FMA units was right, which implies Zen is a 16 FLOP arch (single precision, per core), as I expected. For the sake of comparison, Haswell is 32 FLOP and Skylake (at least Xeons) is 64 FLOP. The Xeon Phi KNL is also 64 FLOP.

I also got the total number of integer/mem pipes right, but I had predicted 3ALU + 3AGU instead of 4ALU + 2AGU.

Although I ultimately proposed a 3ALU+3AGU configuration for Zen, I asked David Kanter about the possibility of Zen using a 4ALU+2AGU configuration back when the Internet was full of the slides (which I later showed to be fake [2]). The discussion with Kanter was:

> I have a question, I predicted 3ALU+3AGUs and the leaked diagram shows six integer
> pipes. Do you believe a 4ALU+2AGU would be a better combination or not?

3 AGU + 3 ALU is a much better mix. Remember that x86 is load+op, so generally you want to sustain nearly a 1:1 ratio of memory to ALU operations. Haswell and Broadwell have extra ALUs to handle branches, etc.

2 AGUs + 4 ALUs would be rather disappointing and also at a severe disadvantage for HPC to Intel.

David

Full analysis with further details:

http://dresdenboy.blogspot.com.es/2015/10/amds-zen-core-family-17h-to-have-ten.html

[1] http://semiaccurate.com/forums/showpost.php?p=235170&postcount=219

[2] http://juanrga.com/en/the-fake-zen-slides.html
 



What does all this mean in mortal terms? Good or bad?
 
Nice piece of information, Juan. Thanks for that.

I think they're focusing on efficiency a bit too much. The layout, being general and all, does not tell the whole picture, but that 8-way L1 D$ cache worries me a bit. 32KB for it does not seem bad, but being 8-way... Uhm... Also, what about the size of the I$ cache? Are they betting on their improved prediction or something? And can the 2 FMACs be "superposed" or will they be completely independent?

In any case, how did you arrive at the 16 FLOP figure?

Sorry, but I can't read too much into that diagram, haha.

Cheers!
 


Intel did the same with the move from Core 2 to Core i. Their L2 was much larger, but due to superior branch prediction in Core i they dropped the L2 size per core and added a ton of L3.

One benefit Intel has had for a while though is that their L3 saves all instructions so that the CPU doesn't have to look to the system memory again which takes more time and is vastly slower. That is something AMD should be planning on implementing IF they want to even come close to catching at least Haswell.
 
In mortal terms it means performance would be between Sandy and Haswell, IFF there are no bottlenecks in the rest of the design (e.g. caches or memory controller). As Kanter confirmed, my 3ALU+3AGU proposal was better for HPC. This could explain why we haven't heard of any HPC win for Zen, and why customers choose Intel and IBM.

I would guess the L1-i cache would be 32KB as well. There are several ways to get the 16 FLOP number. The simplest is to consider that there are four pipes of 128 bits each. A single precision operation takes 32 bits. Therefore performance is (4 x 128 bit/core) / (32 bit/FLOP) = 16 FLOP/core.

The two 128bit FMAC units can be combined to give a 256-bit FMAC unit.
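
For anyone who wants to check the arithmetic, here is a minimal sketch of the 16 FLOP figure counted two ways: the pipe-width method from the post above, and the more common per-FMA-unit method (each single-precision lane of an FMA counts as 2 FLOP). The function names are made up for illustration. The Haswell (2 x 256-bit FMA) and Skylake Xeon (2 x 512-bit FMA) unit configurations are assumptions added here, not taken from the thread; they are chosen because they reproduce the 32 and 64 FLOP figures quoted earlier.

```python
SP_BITS = 32  # width of one single-precision value, in bits


def flop_by_pipe_width(pipes: int, pipe_bits: int) -> int:
    """Counting used in the post: total SIMD bits per cycle / 32 bits per SP op."""
    return pipes * pipe_bits // SP_BITS


def flop_by_fma(fma_units: int, unit_bits: int) -> int:
    """Alternative counting: every SP lane of an FMA does a multiply and an add."""
    lanes = unit_bits // SP_BITS
    return fma_units * lanes * 2


# Zen as described in the thread: four 128-bit pipes, two of them 128-bit FMA units.
zen_by_pipes = flop_by_pipe_width(pipes=4, pipe_bits=128)
zen_by_fma = flop_by_fma(fma_units=2, unit_bits=128)

# Assumed configurations (not from the thread) for the comparison figures.
haswell = flop_by_fma(fma_units=2, unit_bits=256)
skylake_xeon = flop_by_fma(fma_units=2, unit_bits=512)

print("Zen (pipe-width counting):", zen_by_pipes)
print("Zen (FMA counting):       ", zen_by_fma)
print("Haswell (assumed):        ", haswell)
print("Skylake Xeon (assumed):   ", skylake_xeon)
```

Both ways of counting give the same number for Zen, and all of these are peak per-cycle, per-core figures. They say nothing about whether the rest of the core can keep the units fed.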
 


Now you say Haswell is a 32 FLOP / core design, however as I understand it that is *only* in relation to FPU instructions? For standard instructions it's 4 ALUs, same as Zen.

I mean, despite having *double* the FPU power of, say, Ivy, Haswell doesn't perform that much quicker in most games / general software. I'm hopeful based on these specs that Zen could be just as fast as Haswell in a lot of real world software, simply because 256-bit AVX2 instructions are so seldom used... I mean, from the consumer side at least, comparable performance in real world things is the most important factor (e.g. mp3 conversion with lame / iTunes, gaming, video transcoding). Undoubtedly Haswell (and newer) Intel designs will outperform Zen in synthetic benchmarks that use all the new instructions. The problem AMD has currently, though, is that there's nothing they're really good at.
 


The problem I see is that servers do benefit from these advantages that Haswell/Skylake all have and that's where AMD should be focusing as the margins are much higher.

It would be easier to pull themselves out of their deep hole with server design wins than consumer design wins.
 


Ahh...yes...I recall some info about it being 4 ALU/2 AGU with 4 pipes coming out a while back (in the old thread I believe...I know I had discussed it)....

Assuming that they got a lot of the cache improvements into the uarch...this should be quite competitive.
 

Only a sub-set of servers see any benefits.

If Zen's per-core performance is within reasonable distance of Intel's, the question might never come down to who has the better core performance, but rather who can supplement it with the better uncore, etc.
 


Yes and no. AMD was still within performance of Core 2 45nm with K10 45nm but Intel still started taking market share. Once Nehalem hit it was a pretty done deal.

AMD needs to work on a CPU that can implement more than just one type of server design win. Some of the servers that benefit from these advantages Intel has are where the really good money is at.

That said, Intel also has Purley which is up to 28c/56t while Zen is up to 16c/32t with other enhancements around the CPU itself. How will they compete with that if their per core is weaker?

AMD used to focus solely on server CPUs, then trickle the designs down to consumers. K8 and K10 were both very server oriented, and during those times they had more than the current paltry 5% market share. Much more.
 


AMD never had that much of the server market (max maybe 15% share?). They did have a larger slice of the consumer side though.


I mean, from the looks of things Zen may be a more consumer oriented part (obviously they will still be offering it in servers). That isn't necessarily a bad thing if it's competitive enough. AMD's bread and butter is consumer rather than server (always has been), and it's the market they need to stabilize first in order to drive revenue (yes, albeit at lower margins).

The thing is, if AMD can get some significant traction in laptops and desktops again, well there's hope for them. I think AMD's next big play in servers will be a large HBM equipped APU (which in the correct applications could be seriously fast).
 

Things have radically changed since then; however, my point still holds. If performance is close enough, then that won't be the dominant factor in doing deals. More likely a ton of different reasons will all pull their weight.

Intel still has to segment its product lines, which can give some advantages to AMD.
We could consider HSA one, but that hasn't had its fair share of use in the industry yet. However, the customers behind these big contracts we keep talking about here certainly aren't afraid of implementing new technologies.
That sub-market, however, basically already has everyone else's fingerprints all over it. Hard to get into.

AMD won't necessarily have anything to compete there (except perhaps a future 32c/64t CPU or a future APU?).
Do they need to compete with that?
 
Yes, AMD will have to compete with Skylake Purley. While it will probably start on the high end, it will trickle down much like everything else, and they will have to find a way to compete with something that has more power in other ways.

While they don't need to have all the server market, they need as much as they can get, and not in the entry level market but in the top end market that they used to be in.

HSA is something to consider but again it is not like Intel is just sitting on their hands not developing something similar or even better. HBM is a good idea for a big APU but I think for servers HMC is going to be better.

Just will have to wait and see. I am hopeful that Zen has more server orientation to it but so far I am not convinced it does.
 


If you use older binaries then Haswell floating point performance is not much better than Ivy's. But after recompiling, one can find up to 80% higher performance clock-for-clock. Few consumer Windows applications use the new Haswell instructions, but Zen's main target isn't that, but server/HPC.

ALUs are cheap to design and implement. AMD could design a core with 16 ALUs tomorrow. The key is feeding those ALUs instead of having them idle. Zen has only 2 memory ports. Sandy/Ivy have three memory ports. Haswell/Broadwell have four, and Skylake has probably increased this number.
 


According to Lisa Su, Papermaster and others, Zen was explicitly targeting servers and HPC. The PC market is declining and cannot sustain AMD's business. This has been explained again and again since Rory Read. AMD will not get traction in laptops, because by the time AMD has Zen-based APUs, Intel will be on 10nm and a new arch.

AMD's only option to survive was to get a big piece of the server market. And this Zen design is not good enough.
 
http://hexus.net/tech/news/cpu/86954-zen-processor-block-diagram-devised-amd-software-patch/
The link says Sandy Bridge, Ivy Bridge, Haswell and Broadwell all have 4 ALUs and 2 AGUs. Is this right?
 


It is accurate so far as I am aware...Skylake may go to 5, though, I question the necessity and the necessary modifications to feed that and make it work.
 


No.

Sandy Bridge / Ivy Bridge: 3 ALU + 3 memory units
Haswell / Broadwell: 4 ALU + 4 memory units
Skylake: ???
Zen: 4 ALU + 2 memory units
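
To put the list above next to Kanter's "nearly 1:1 memory to ALU" remark, here is a small sketch that computes the ratio for each configuration. The unit counts are copied from this post as-is, not independently verified. The `sustained_alu_ops` bound is a toy model added here as an assumption (every ALU op needs `mem_ops_per_alu_op` memory operations, and nothing else limits throughput); it is not a claim about how any of these cores actually behaves.

```python
# (ALUs, memory units) per core, as listed in the post above. Skylake is unknown.
CONFIGS = {
    "Sandy Bridge / Ivy Bridge": (3, 3),
    "Haswell / Broadwell": (4, 4),
    "Zen": (4, 2),
}


def mem_to_alu_ratio(alus: int, mem_units: int) -> float:
    """Memory units available per ALU; 1.0 is the 1:1 mix Kanter describes."""
    return mem_units / alus


def sustained_alu_ops(alus: int, mem_units: int, mem_ops_per_alu_op: float = 1.0) -> float:
    """Toy upper bound on ALU ops per cycle when each one needs memory operations."""
    return min(float(alus), mem_units / mem_ops_per_alu_op)


for name, (alus, mem_units) in CONFIGS.items():
    ratio = mem_to_alu_ratio(alus, mem_units)
    bound = sustained_alu_ops(alus, mem_units)
    print(f"{name}: mem/ALU ratio {ratio:.2f}, toy bound {bound:.1f} ALU ops/cycle")
```

Under the strict 1:1 assumption the 4 ALU + 2 memory unit layout is capped by its memory units, not its ALUs. Real code has plenty of register-to-register operations, so lowering `mem_ops_per_alu_op` (e.g. to 0.5) closes the gap in this toy model.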
 

Can you please provide a link for this?
 