AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

hcl123

Honorable
Mar 18, 2013
425
0
10,780


This ?

What are the consequences of Kaveri being 28nm bulk instead of SOI? Poorer OC than Richland?

If the kaveri part in 2013 is mobile... none!

Mobile systems are not for OC, and they have much lower clocks than their desktop brethren; you could even lose the warranty on the whole system if you OC. I think it is not even possible on any mobile system, it's BIOS restricted.

If *all* Kaveri SKUs are "bulk", then expect much worse OC potential and lower "commercial" speeds from the start. Only Intel can tweak "bulk" decently; it took years, FinFET and a 'literal' mountain of money to tweak that process. Even so it is quite a bit worse than FD-SOI in all departments.

It's possible that the 28nm SHP that appeared in some Glofo charts is FD-SOI... more like ~22nm compared with 28nm bulk, as Glofo stated. It's also possible that it falls under the FAB2.0 model, which Glofo also said it would allow; that is, the 28nm (~22nm) SHP is AMD's responsibility, based on the STMicro licensing, with Glofo's help. It would essentially be an AMD process under the FAB2.0 model, which is why it disappeared from the "official" Glofo charts again... the 28nm FD-SOI that Glofo offers is not SHP; it has no "booster" techs and is exclusively a low-power process.

So for low power, 28nm bulk, as used for mobile, is a good bet (at least it is not bad)... TSMC's 28nm just proves it... but for high performance, if AMD wants to compete it needs FD-SOI, as simple as that. Intel is going 16nm bulk FinFET, which will give it a great advantage: it can double or more the size of the GPU in those chips. Though *inferior* in every aspect to Radeon, size matters; Intel would have too much of an advantage in size for AMD to be comfortable.

Let's see what happens. I think the "Kaveri delay" might very well be related to FD-SOI, and is for the mainstream/desktop... if not, it's going to be very hard for AMD.

 


Here is my $0.02: I think that the 2013 Kaveri parts are mobile, while the 2014 desktop parts, as HCL pointed out, will likely be related to the GF FD-SOI to be finalized in late 2013, which 8350 pointed out a lot of pages ago.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


That is "old news". Meanwhile AMD paid ~800 million to get out of Glofo entirely, and most probably the deal includes fab processes and wafer supply. It's absolutely ridiculous to think AMD paid so much money only to get rid of Glofo stock; they could just have sold it on the open market and made some money with it, instead of having to pay on top of things...

So whatever happens, it was already paid for... I'm still betting FD-SOI will be a reality... it will be in any case, sooner or later... as a matter of fact it already is a reality for STMicro's ARM offerings. Even 20nm bulk is kaput (EOL for plain planar bulk processes), not only 28nm PD-SOI if it happens; below 28/20nm there are only fully-depleted techs, and that means FinFET or UTBB/ETSOI (Ultra-Thin Body and BOX / Extremely Thin SOI... aka FD-SOI).

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


AMD still has purchase requirements from GF (confirmed in July 18th earnings call). They paid to reduce those requirements AND break exclusivity, but the ties are not completely severed. Like divorced parents AMD is still stuck paying child support.

I really hope it is PD/FD-SOI or we will be extremely lucky to see 3.5Ghz even. GF/TSMC/Samsung don't have any track record providing CPUs running past 3Ghz on bulk. I've seen maybe a couple test chips but they were tiny old ARM cores, not anything near 1B transistors.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That depends on how serious they are about HSA being the future. They could very well trim the CPU cores back to allow more GPU compute units.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


They have already had 16 cores in "one socket" since Interlagos (G34).

Perhaps what you mean is whether they will have 16 cores on a single die?

I don't think so, not with Steamroller, which will still be 2 cores per module, but which for server could probably have 5 modules and so 10 cores per die... 20 cores per socket.

The same 4 modules / 8 cores per die for SR is not bad... and I think the trend will be to double the GFLOP count, not the GIPS count... Steamroller server/FX could have a double FlexFPU or a double-size FlexFPU (it would be better for all workloads, including server).

With Excavator, most probably... it will have 4 cores/threads per module... so a 16-core die will be common; perhaps server could have 5 modules, so a 20-core die and 40 cores per socket (equivalent to G34). Yet Excavator could double the GFLOP count again... that "die shot" that circulated... I strongly suspect it is an Excavator with 4 threads/cores (micro-clusters: 4 AGUs = 1 cluster, 4 ALUs another cluster, so 4 micro-clusters per module) and quadruple the FlexFPU size compared with BD/PD.
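The core counts in this guesswork are just modules x cores per module x dies per socket. A minimal Python sketch, assuming two dies per G34-style socket (as with Interlagos) and the module counts speculated above; the function name and the configuration labels are only illustrative, not AMD product names:

```python
def cores(modules_per_die, cores_per_module, dies_per_socket=2):
    """Return (cores per die, cores per socket) for a module-based design."""
    per_die = modules_per_die * cores_per_module
    return per_die, per_die * dies_per_socket

# Speculated configurations from the post, not confirmed AMD products.
configs = {
    "Steamroller, 4 modules": (4, 2),
    "Steamroller, 5 modules": (5, 2),
    "Excavator, 4 modules": (4, 4),
    "Excavator, 5 modules": (5, 4),
}

for name, (modules, per_module) in configs.items():
    per_die, per_socket = cores(modules, per_module)
    print(f"{name}: {per_die} cores per die, {per_socket} per G34-style socket")
```

The 5-module cases give the 10/20 and 20/40 figures mentioned in the post.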

http://farm6.staticflickr.com/5321/9104546631_4c7a4a023b_o.jpg

A comparison is always nice, and the smaller one on the right could be EXV and could be 28nm FD-SOI (roughly = 22nm bulk).

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
@hcl123 @TheQ6660Inside. I did mean desktop. The link that I provided is old, but it is the one I know of that has AMD's official statement about 28nm. Of course, it is entirely possible that mobile goes bulk and desktop goes SOI.

AMD claims the Kaveri A10 desktop part will be 1.05 TFLOPS. I calculated the performance of the CPU and GPU, and the total makes sense if the GPU is at 0.9GHz and the CPU is at 4.1GHz.
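One way to reproduce that estimate, as a hedged sketch: the 1.05 TFLOPS, 0.9GHz and 4.1GHz figures come from the post, but the per-cycle numbers are assumptions (512 shaders at 2 FLOPs per cycle, and two modules each with two 128-bit FMAC pipes, i.e. 16 single-precision FLOPs per cycle per module), not AMD statements. The function names are made up for illustration.

```python
def gpu_gflops(shaders, clock_ghz, flops_per_cycle=2):
    """Peak single-precision GFLOPS; a multiply-add counts as 2 FLOPs."""
    return shaders * flops_per_cycle * clock_ghz

def cpu_gflops(modules, clock_ghz, flops_per_cycle_per_module=16):
    """Peak single-precision GFLOPS for a module-based CPU.

    16 = 2 FMAC pipes x 4 lanes (128 bit / 32 bit) x 2 FLOPs per FMA (assumed).
    """
    return modules * flops_per_cycle_per_module * clock_ghz

gpu = gpu_gflops(shaders=512, clock_ghz=0.9)  # assumed shader count
cpu = cpu_gflops(modules=2, clock_ghz=4.1)    # assumed 2 modules
total = gpu + cpu
print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {total:.1f} GFLOPS ({total / 1000:.2f} TFLOPS)")
```

Under these assumptions the GPU supplies most of the total and the CPU clock only moves the result by a few percent, so the 4.1GHz figure is not tightly constrained by the 1.05 TFLOPS claim.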

Glofo itself claims that 28nm is bulk and that HPP goes above 3 GHz

http://www.globalfoundries.com/technology/28nm.aspx

But I think that 4.1GHz would be too high even if they tweak/master the process.

I found this pair of very interesting links

http://www.advancedsubstratenews.com/2013/07/globalfoundries-on-cost-vs-performance-for-fd-soi-bulk-and-finfet/

http://www.eetimes.com/document.asp?doc_id=1280415

The first "risk production" will come from a Globalfoundries wafer fab in 4Q13 and volume production will ramp during the first half of 2014,

for the same level of performance, the die cost for 28nm FD-SOI will be substantially less than for 28nm bulk HPP (“high performance-plus”).

I think they fit nicely with your explanation that kaveri is delayed by the transition to 28nm FD-SOI :)
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


At 5% market share I'm wondering if AMD will even try to continue their legacy big core chips. They certainly aren't in 2014, as the roadmap is already out. When Bulldozer was ramping for market they had Cray in their back pocket to order gobs of chips. Cray's new products are all Intel chips. "Once bitten, twice shy." Cray took a big loss when Bulldozer was delayed. No one likes late chips when they hold up 100s of millions in orders.

When was the last time AMD even talked about their high end server chips? Everything in the last year has been about SeaMicro and ARM core chips. SeaMicro doesn't fit those 12C/16C chips.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


8-core Piledriver chips are being replaced by 4-core Steamroller in the server. It seems reasonable that 16-core Piledriver will be replaced by 8-core Steamroller (maybe 6-core).

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
I doubt very much that Cray took any loss because of a late chip... supercomputers are deployed once and last many, many years, so contracts must account for this; besides, there are services, support and upgrades, and Titan was already upgraded to Piledriver.

And there aren't any gobs of chips... there are a couple of big machines; AMD will sell way more FX-9590s than chips to any supercomputer. The big bonus is the advertising potential of supercomputers.

SeaMicro fits "nodes"... it's a cloud cluster... the max config AMD sells now has 256 nodes, each of which could have an 8-core chip, so we are talking 2048 cores.
 

jdwii

Splendid


That would need an EPIC increase in performance per clock; even with magic I feel they wouldn't be able to match their old server CPUs in throughput.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


I am really pessimistic about AMD's big core server offerings. This is a situation where power consumption actually matters beyond fanboys who don't know what TDP is, and a large number of CPUs using more power than the competition can actually mean something beyond a few dollars a year.

I personally took FX 9000 series as re-assurance from AMD that they care about enthusiasts and they are not afraid to make chips that gobble power or make a lot of heat. A sort of way of AMD saying "hey, we know we can't beat Intel in power or heat output but we don't really care about that."

Which, in my opinion, is a good thing. I don't like the FX 9000 series because it's obviously preying on those who aren't so bright, but at the very least it means that AMD isn't forgetting about us.

At least, that's my take on it.

There are also some rumblings on the S|A forums about 20nm not being ready and FD-SOI 20nm not being ready for a while, which might have something to do with the delays.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Ummm... I don't think so... the FLOP number is exclusively GPU.

512sp x 2 (2 operations, each sp pipe is MAD) x 1000MHz = 1024 GFLOPS (~1 TFLOPS) "single precision" (32-bit ops), indicating the speed of the GPU could be a little above 1GHz, like Radeon (1050MHz)

The CPU side is moot... if there is only 1 FlexFPU (like in BD/PD), but each FMAC pipe is 256 bits wide, that is, capable of 8x 32-bit single precision ops per FMAC pipe... or 4x 64-bit double precision ops per pipe...

http://translate.googleusercontent.com/translate_c?depth=1&langpair=auto|en&rurl=translate.google.com&sandbox=0&u=http://www.planet3dnow.de/vbulletin/showthread.php%3Fp%3D4772585&usg=ALkJrhhm0kgKeXyOflR8VJ5w7d1PBEGGMw#post4772585

http://www.planet3dnow.de/photoplog/file.php?n=24314&w=o

It will be:
8x 2 pipes x 2 ops per pipe (MAD) x 4000MHz = 128 GFLOPS single precision, 64 GFLOPS double precision per module... or 256 GFLOPS SP, 128 GFLOPS DP per APU (2 modules)

This is different, that is Excavator (?)
http://farm6.staticflickr.com/5321/9104546631_4c7a4a023b_o.jpg

If each of those FMAC pipes is a 256-bit pipe in the same topology, it will have:

8x 8 pipes x 2 ops per pipe (MAD) x 4000MHz = 512 GFLOPS single precision, 256 GFLOPS double precision per module... or ~1 TFLOPS SP, ~512 GFLOPS DP per APU on the CPU side (2 modules, 8 cores... a 4-module chip like FX would double that, ~1 TFLOPS double precision per die; an MCM like G34 would be ~2 TFLOPS)
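A small Python sketch of the FMAC arithmetic in this post. The pipe counts, the 256-bit width and the 4GHz clock are the speculative figures used above, not confirmed specs, and the function name is made up for illustration:

```python
def module_gflops(pipes, clock_ghz, pipe_bits=256, operand_bits=32):
    """Peak GFLOPS per module: lanes per pipe x pipes x 2 FLOPs (FMA) x clock."""
    lanes = pipe_bits // operand_bits  # 8 lanes for SP, 4 for DP at 256 bit
    return lanes * pipes * 2 * clock_ghz

for label, pipes in (("2 FMAC pipes (BD/PD-style FlexFPU)", 2),
                     ("8 FMAC pipes (speculated Excavator)", 8)):
    sp = module_gflops(pipes, 4.0, operand_bits=32)
    dp = module_gflops(pipes, 4.0, operand_bits=64)
    print(f"{label}: {sp:.0f} SP / {dp:.0f} DP GFLOPS per module, "
          f"{2 * sp:.0f} SP / {2 * dp:.0f} DP per 2-module APU")
```

These are theoretical peaks only; they say nothing about whether the load/store path could actually feed that many pipes.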

See where this is going? (bye bye PhysX)



Yes, but not quite. Glofo FD-SOI will not have "booster" tech... with good booster tech you can increase the clock potential by more than 40% (you have to read a lot of past articles)... so 3 x 1.45 = ~4.3GHz, which puts it at the potential of the 32nm PD-SOI... I think SR on FD-SOI could have even a slightly higher clock than today's Richland/FX.



What do you mean by big cores?... AMD's BD cluster kind of cores couldn't be more streamlined than now; the simple ADD ALU has a FO4 of 10 (Intel is 17)... the size of a chip is what the fab process allows; a Hasfail at 32nm PD-SOI would probably be close to 400mm²... with Iris it would then break into 500mm²...


 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


Honestly it'd be so nice if upon installation you could choose what you want on there. It'd work so much better and overall would take less time, eventually resulting in better performance. ^_^
 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


I apologized, and don't take it to heart.
 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


I'm sorry but I have to say this. If SOI is so much better than bulk, then how come AMD's chips with more cores perform the same if not a little worse (e.g. i5-3570K vs. FX-8350), and create more heat even on a light workload? I'm sorry to bring this up, but I had to put that out there.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
Steamroller Improvements

Micro-architectural

- Store to load forwarding optimization
- Dispatch and retire up to two stores per cycle
- Improved memfile, from last 3 stores to last 8 stores, and allow tracking of stack dependent operations.
- Load queue (LDQ) Size Increased to 48, from 44
- Store queue (STQ) Size Increased to 32, from 24
- Increase dispatch bandwidth to 8 INT ops per cycle (4 to each core), from 4 INT ops per cycle (4 to just 1 core). 4 ops per cycle per core remains unchanged.
- Accelerate SYSCALL / SYSRET.
- Increased L2 BTB size from 5K to 10K and from 8 to 16 banks.
- Improved loop prediction.
- Increase PFB from 8 to 16 entries; the 8 additional entries can be used either as a prefetch buffer or as a loop buffer.
- Increase snoop tag throughput.
- Change from 4 to 3 FP pipe stages.

Self-explanatory... good article to read just below... and discussed in past posts, probably a lot of pages back, given all the "propaganda" effort made in this thread, filling perhaps much more than half of the 120 pages

http://translate.googleusercontent.com/translate_c?depth=1&langpair=auto|en&rurl=translate.google.com&sandbox=0&u=http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi%3Fid%3D1362665945&usg=ALkJrhi4QlK9wVFS0Ws-mNIjkxIlDa6ckg

Integer Execution specific (edt)

BEXTR reg, reg, reg.......EX0 EX1 FastPath Double
MOV reg, reg..............EX0 EX1 FastPath Single
XADD reg, reg.............EX0 EX1 FastPath Single
XCHG reg32, reg32.........EX0 EX1 FastPath Double
XCHG reg64, reg64.........EX0 EX1 FastPath Double

Note: 32-bit/64-bit instructions of this shape can therefore issue to AG0 or AG1 for Models 20h and greater.

So the AGUs will perform more than simple address calculation for load/store operations... it could give a nice boost to performance

http://translate.googleusercontent.com/translate_c?depth=1&langpair=auto|en&rurl=translate.google.com&sandbox=0&u=http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi%3Fcategory%3D1%26id%3D1328302051&usg=ALkJrhgKPLVIZhdNGU8NzH3Fh3qIahOMKw
(People have to copy the whole link and launch it from the browser address bar; this site breaks Google Translate links.)

Floating Point Execution specific (perhaps only excavator, or SR desktop)

FP256 bit in a single pass, meaning most probably the FlexFPU load buffer can align 2 128-bit load accesses, and the FP retirement can write back 256 bits too. Most probably the L1 data cache will remain 128 bits wide per port so as not to impact the latency of integer operations.

http://www.planet3dnow.de/photoplog/file.php?n=24314&w=o

If this is possible, there is something very good about having the FP operations out of the integer path. I think the "module" and BD uarch could prove to be a "superior" design that is here to stay, like it or not... and independent of Windows benchmarks... heck! it could even transition to ARM designs (then it will be really really SUPERIOR... LOL)
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


The fab process has nothing to do with the "potential" performance of the design. That SOI is much better for high performance, which means *higher clock*, I think there is no doubt now. And STMicro showed that FD-SOI, tweaked with back-gate bias, which is not possible with FinFET, is also quite superior for low power... 1GHz at 0.6V (it's commercial now) on ARM A9... (better tweaking in development points to the same being possible at 0.48V; bulk needs ~0.9V for 1GHz).

Besides, you are pointing at Windows benchmarks, which are invariably tweaked for Intel... and this has even less to do with the REAL potential of a design than the fab process (use Linux, more threads, unbiased LOL) (edt)

... alas! You can still use Windows; the benchmarks have NOTHING to do with the software you use, because practically none (if there is one, I don't know of it) of the Windows applications you use is tweaked for Intel, and this goes from freeware to commercial software (so benchmarks DO NOT REPRESENT your software... SORRY to wake you from your illusion too... and in reality you can't say any i5 performs better)... only Linux has better multithreading and more apps with more threads, so you can benefit from more real cores... at least if you go Intel 6-core/12-thread HEDT, then Linux is kind of compelling; in Windows that i5 you point to can even beat a 6-core/12-thread (abhorrent, isn't it?) (edt)

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


The 8/12/16 core chips. AMD already said the APUs outsell the FX by 8 to 1, and there isn't a huge markup there to get much profits back. Similar with the server chips. AMD is channeling all their R&D money into the low power SoC, APU, ARM APU and also custom APU/SoC. They're moving into higher volume lower margin products. Maybe in 2015 they'll introduce more big core chips after they've returned to profitability.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Actually it did cost them another quarter in the red. It was bad enough they even mentioned it in a couple press releases. They had the orders for the computers and were expecting to return to profitability in Q3, but AMD delayed causing them to slip and remain negative for the quarter. I was following them closely because they were one of the few companies to even talk about Bulldozer besides AMD.

http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=1624518&highlight=

"Our 2011 outlook considers the impact of past and currently expected delays in receiving a key component for our systems."

Their big new system at the time was a large array of Bulldozers, which came trickling in, pushing out the orders to the end of Q4. They scraped by with a mere $1.2 million profit for the year.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


more heat? ....

[Image: 5522_43_intel_core_i7_4770k_haswell_4th_gen_cpu_and_z87_express_chipset_review.png]


Where? AMD is known to run ~60C without issues, whereas Intel always likes to push 80+.
 