AMD CPU speculation... and expert conjecture

Page 546
Status
Not open for further replies.

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
Juan, man, you're getting sort of desperate. We have someone here who is directly talking with people who work at AMD. He has insider information and he's biting his tongue as much as possible to not get himself or others in trouble. And yet you ignore him?

As soon as you get blown out, you move the goalposts to servers. I told you repeatedly that you misinterpreted the extremely vague statements about "ARM winning", yet you ignored me. And now we have someone who is basically saying that AMD itself states that it was a very specific case (ULP servers).



AMD ran out of low-hanging fruit in K10 and would have ended up just like Intel is now, making small tweaks which result in tiny IPC increases. The only thing is that AMD doesn't have access to the quality of fabs Intel has (in terms of power consumption), so it would have been a completely futile battle.

Suggesting that AMD stick with K10 means suggesting that AMD could beat or match Intel on raw performance with a traditional CPU core, on an inferior process node and with a tenth of the R&D budget. That is insane. There is no way it would have worked out.

The point is that AMD needed to try something radical because it can't take on Intel head to head. They went for low-IPC, high-frequency, small cores to cram as many as possible on a die. That was the issue. There are simply far too many variables in Bulldozer to pin its failure on a single talking point.

CMT might have worked absolutely fantastically if AMD had decided to aim for a 2m/4c design for the high end, with each module being a very beefy, fat, high-IPC design while still being smaller than 4 beefy cores.

AMD's new radical approach is HSA and a unified system. One of Bulldozer's big problems is it needs software that understands the hardware better. Going HSA and Mantle gives AMD a lot more control over the software, as it's nearly impossible, given AMD's market share, for AMD to push their own compilers with optimizations over competing products.

AMD had to take a chance with something like CMT to be competitive with Intel.
 

8350rocks

Distinguished


LOL...*smirk* I said...nevermind...
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Hey, no one can be more right than Juan. Not even AMD employees!



As far as I know they didn't intend for Bulldozer to have lower IPC than K10. By the time they got the final performance results it was too late (Bulldozer was delayed about 2 years).
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's old news but someone posted a slide at S|A which showed GF was already making 30 thousand 22nm wafers a month in Fab 8. That was speculated but never announced from my understanding.

Not to start the fab drama again, but interesting nonetheless. ;)

[Attached image: iJJlFux.png]


 

jdwii

Splendid


No. CMT works well for 70-80% scaling, but it gets there by lowering the ALU and AGU count per core, which makes it worthless in most single-threaded apps, not to mention the poor idea of sharing an FPU and cache. The whole design screams bad latency and bad single-core throughput. Honestly, the only way they could fix this is by adding another ALU+AGU per core, adding another FPU, and making the cache dedicated. But would that really still be CMT? No, it wouldn't. The design bluntly sucks compared to Intel and is only slightly better than Thuban unless some of the newer instruction sets are used.
 

jdwii

Splendid


On the Dolphin emulator I noticed that OpenGL was so much better on this GPU that I was like, why? My 6950 was good enough to run those games at 1080p close to maxed out (I had AA down). Now with this new GPU I have everything at max and it performs nicely, with some CPU bottlenecks but nothing too bad.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Any proof that Bulldozer has too few ALUs and AGUs? I'm not even asking about the FPU, because there are enough benchmarks showing that one FPU per module is enough for most workloads.

The main bottleneck found by third parties is the cache. I'm sure there are more, but only AMD knows about them.
 

Rum

Honorable
Oct 16, 2013
54
0
10,630


Doh... I didn't know the WCCF source was old news... -_- Kinda hard to sift through all the Juan slide/response barrages to find these things. My bad!
 


There's been a LOT of work on the OGL renderer in Dolphin recently, due in large part to the Android port and having to work around driver bugs (PowerVR chips SUCK at OGL compliance). It's considered the more accurate backend (SW renderer aside), though DX11 is considered faster in "most" situations.

That being said, Dolphin still has a LOT of game-specific issues, and the various rewrites to the engines recently have uncovered a lot of regressions. It's in a much more stable state than it was a year ago, however; it's really improved. I make a habit of visiting the bug tracker every week or so though, just to see if there's something I can help with.
 

8350rocks

Distinguished


Makes you wonder if that is 22nm FD-SOI SHP....if so, that makes some things really interesting does it not...?
 

isn't that the fd-soi fab? i don't know if ibm had a bulk process for 22nm.
tsmc was never a contender, ibm being part of c.p.a. and all.
btw, was that the plant apple tried to buy?
edit:fixed the wrong id quoted. :p
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Well, yes :)

Probably there are a lot of products produced in 22 nm (of course lesser known).

Maybe Carrizo is produced on 22 nm FD-SOI (sounds like a saner choice than bulk 20 nm)? :)
 

jdwii

Splendid


The shared cache is a major problem, probably more so than the shared FPU. AMD claimed back in the early Bulldozer days that they could not feed the pipelines fast enough, or that the pipelines were too rarely used, to take full advantage of the 3 ALU+AGU that Phenom had. But going by benchmarks, Phenom really was about 25% faster per clock than Bulldozer, and based on my own benchmarks I can conclude that PD is around 10% slower per clock than Phenom every time I test the system. Clearly this is not the 33% you would expect from Piledriver having only 2 ALU + 2 AGU vs Phenom's 3. Steamroller did add a lot to the performance per clock: under testing I noticed a 20% increase in performance per clock, pulling ahead of Phenom by around 10-15% in multiple tests including Fritz Chess, Cinebench, and some games such as Far Cry 3, BF4, Crysis 3, and Sleeping Dogs.
However, Intel is 50% stronger per clock than PD (using the benchmarks above). I tested the Haswell i5 at 3.2GHz; both CPUs had turbo off and I only tested with 4 cores (letting Windows 8.1 do the scheduling). I did not have a Steamroller chip for that test, however, so it's fair to say that Intel is around 30% stronger per clock than the A10 7850K. Also important to remember: the A10 7850K has a 90% CMT scaling ratio vs 80% for PD, so in multithreaded workloads with 2 modules the A10 behaves like roughly 3.6 cores vs 3.2 cores (for PD and BD). Steamroller was a great thing and I only wish AMD had made an 8-core Steamroller. I would have bought it even at $250. I know it would have competed really well.
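The core-equivalent and per-clock figures in this post can be worked through as a rough sketch. It assumes the simple model the post appears to use (core count times scaling ratio) and reads the scaling ratios as 90% for the A10 7850K and 80% for PD. The function and variable names are made up for illustration, and the percentages are the poster's own rough estimates, not measured data.

```python
# Rough sketch of the arithmetic in the post. All names are illustrative,
# and every percentage is the poster's estimate, not a measurement.

def effective_cores(cores: int, scaling: float) -> float:
    """Core-equivalents under the simple model: core count x scaling ratio."""
    return cores * scaling

# 2 modules = 4 cores
steamroller_equiv = effective_cores(4, 0.90)  # A10 7850K, 90% scaling
piledriver_equiv = effective_cores(4, 0.80)   # PD / BD, 80% scaling

# Per-clock performance relative to Phenom (= 1.00), chained from the post
phenom = 1.00
piledriver = phenom * 0.90       # PD around 10% slower per clock than Phenom
steamroller = piledriver * 1.20  # Steamroller around 20% faster than PD
haswell = piledriver * 1.50      # Haswell around 50% faster than PD

if __name__ == "__main__":
    print(f"A10 7850K: {steamroller_equiv:.1f} core-equivalents")
    print(f"PD/BD:     {piledriver_equiv:.1f} core-equivalents")
    print(f"Steamroller vs Phenom:  {steamroller / phenom:.2f}x per clock")
    print(f"Haswell vs Steamroller: {haswell / steamroller:.2f}x per clock")
```

Chaining the rounded percentages puts Steamroller about 8% ahead of Phenom and Haswell about 25% ahead of Steamroller per clock, which is in the same ballpark as the 10-15% and 30% quoted from testing.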
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I agree that CMT is only one of the things that hurt AMD, but I disagree that the design philosophy was sound. It is a defective design and even self-contradictory.

Intel is not making anything special now, but merely using CMP and SMT, which are known to work.




CMT is only one of the problems of Bulldozer. Excavator will use modules, but post-Excavator cores will not.




This is a summary of the history of the scientific misunderstanding, hype, fraud, and patent wars surrounding HP's 'invention':

http://arxiv.org/abs/1201.2626

http://link.springer.com/article/10.1007%2Fs00339-011-6578-7

http://arxiv.org/abs/1207.7319

http://www.theregister.co.uk/2011/12/27/memristors_and_mouttet/

http://tinytechip.blogspot.com.es/2011/10/more-evidence-hps-memristor-is-fraud.html

The best summary of HP's announcement of "The machine" is found here:

http://www.infoworld.com/t/cringely/hewlett-packards-machine-vaporware-meet-empty-suit-244265

Congrats to HP for announcing a nonexistent computer running on an imaginary OS. Our lives will never be the same

"The machine" looks great when HP compares it to an ancient computer built with 45nm chips, when compared to competition scheduled for the same year, then "The machine" specs look a bit outdated.




Unlike the PS4 APU, Kaveri was designed to use GDDR5M SO-DIMM modules. This early plan was finally canceled because one of the two suppliers of the SO-DIMM modules went out of business and a single supplier couldn't guarantee production/stock.




What I am getting tired of is laughing so hard at your attempts to deny reality. Do you really mean the same poster who in the past claimed to have insider info from someone at AMD, but everything he told us here was proven wrong?




Hey, both of you are ignoring that we are not discussing anything said by AMD employees, but what a person claims someone at AMD told him.




Jaguar/Puma show how AMD can compete with Intel head to head, even on an inferior node, when CMT is abandoned.

AMD's CMT is adapted from an old architecture developed by DEC in 1996.

CMT would fail even for high-IPC designs, because its two problems are the inefficiency (related to scheduler troubles) and the shared FPU.

AMD's HSA (Heterogeneous System Architecture) is AMD's answer to similar technologies from others: Nvidia's CUDA heterogeneity and Intel's neo-heterogeneity. HSA is interesting and I have praised it in this thread, but HSA is not "radical".
 

jdwii

Splendid
"What I am getting tired is of laughing so hard reading your attempts to negate reality. Do you really mean the same poster what in the past claimed to have insider info from someone at AMD, but everything what he said to us here was proven wrong?"
Juan, you really should stop talking about yourself here, it's a bit 3rd person.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810
Someone thinks they're a patent lawyer now. Watch out HP ur going downnnnnnnnn. Who cares if HP has the most server revenue and 15 Billion in cash. And the 317,500 employees must just be twiddling their thumbs all day. There's just no way that many people could come up with something new to the market. ;)

Blaise Mouttet is the one making the biggest stink, cited in 5 of those links. He also holds patents on memristors so I'm sure he has no bias or motivation to discredit HP. A case of sour grapes maybe, just maybe. The guy has never even held a job in the field. He's just a patent searcher with listed skills of "MS Excel, MS Powerpoint, MS SharePoint". ROFL. No wonder the patent office is so totally messed up. You got newbie engineers making decisions over industry veterans.
 


RE CMT, I'm not saying I think CMT is a good idea now. My point was that back when they started designing Bulldozer (5+ years before it launched) I can understand why they went that way. The one thing CMT has allowed AMD to do (that other design schemes might not have) is to cram a lot of cores onto a reasonably sized die at a relatively large node (compared to what Intel is working with, at least). Also, the design scheme behind CMT doesn't actually dictate poor single-thread performance: if you're using one module for one thread, then, very much like HT, all resources are dedicated to that thread and speed improves.

Also, someone was mentioning the narrower pipelines on Bulldozer, although it isn't quite that simple. Phenom II had 3 combined ALU + AGU units, so at peak throughput it could execute 3 instructions in parallel. Bulldozer / Piledriver decoupled the ALU and AGU units, so they can in actual fact execute 4 instructions in parallel given the correct workload, the idea being that most workloads would use at least 3 of the available units, therefore matching Phenom II. Sadly things haven't worked out that way, although I'm not sure that the integer pipelines are the issue (it's more likely the small caches on Bulldozer compared to Phenom II).

I agree with you on Kaveri. It is a shame that the GDDR5 DIMMs weren't available, as the iGPU in Kaveri + GDDR5 would be epic (1920 x 1080 @ 30+ fps in pretty much any title). I think as new memory technologies become available AMD's APUs are going to get better and better. I'm waiting for the day Tom's recommends an APU in either (both?) of their best CPU / GPU for the money articles.

 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Remember that the Bulldozer family also sucks in single-threaded benchmarks, where a properly implemented shared cache shouldn't be a problem.

I think that 70-90% better performance with a 50% larger core (module) is a good trade-off. Even for FP-intensive computation it's 60-70% (it's almost impossible in practice to create a piece of code which comprises only FP instructions).
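To put rough numbers on that trade-off, here is a minimal sketch that divides the throughput gain by the area increase. It assumes the module is compared against a single core of area 1.0 and that "50% larger" means 1.5x the area. The function name and the labels are invented for illustration.

```python
# Sketch of the module trade-off: throughput gain divided by area increase.
# The area and speedup figures come from the post; the names are illustrative.

def perf_per_area(speedup: float, area_ratio: float) -> float:
    """Throughput per unit of die area, relative to a single core (= 1.0)."""
    return speedup / area_ratio

MODULE_AREA = 1.5  # assumption: a module is 1.5x the area of one core

cases = {
    "integer, 70% gain": 1.70,
    "integer, 90% gain": 1.90,
    "FP-heavy, 60% gain": 1.60,
    "FP-heavy, 70% gain": 1.70,
}

if __name__ == "__main__":
    for label, speedup in cases.items():
        print(f"{label}: {perf_per_area(speedup, MODULE_AREA):.2f}x per area")
    # Baseline for comparison: two full cores, 2.0x throughput for 2.0x area
    print(f"two full cores: {perf_per_area(2.0, 2.0):.2f}x per area")
```

Under these assumptions every case comes out above 1.0, which is the figure for simply adding a second full core (2.0x throughput for 2.0x area).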
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Agree.



Mouttet is only one of the authors in those links. This is a quote from Paul Meuffels and Rohit Soni:

We have shown by means of a thorough analysis in terms of electrochemistry that HP’s “memristor” model is misleading. Our arguments are based upon textbook electrochemistry and can be easily reproduced. There are no real devices which would operate in accordance with HP’s model because the model is by itself in conflict with fundamentals of electrochemistry. There seems to be no way out; otherwise, somebody would have tried to refute our argumentation in the meantime. Thus, HP’s memristor research group does not have found a realistic physical model for a working memristive device.

Scary how you defend HP by posting anonymous ad hominem attacks on people like Mouttet, claiming he is biased instead of just commenting on the extensive collection of data/arguments that he provides. And then you finish with a pathetic attempt to attack his skills.

As you were told in the S|A forums, just stop this ridiculous hype of HP vaporware.










Carrizo will be produced on bulk, for reasons explained again and again and again over months, back when people pretended that Kaveri was SOI... but I smell a new wave of SOI hype/nonsense/lies coming back to this thread.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Everything is possible, and of you two, FX is the more reliable one for me (probably not only for me :) ). And I think AMD engineers know better than we do which node is good for them.

Or maybe not, so why didn't you send your CV to AMD?
 