AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

8350rocks

Distinguished


That's from back when everyone thought it would be an NVidia GPU in both systems. When NVidia was eliminated...the discussion of PhysX was also eliminated. Additionally...the developer kit does not mention PhysX at all.

You have to understand...the reason PhysX works better on NVidia than on AMD is that it is optimized as a part of CUDA. AMD does not use CUDA, so essentially, Havok or Bullet would actually give *superior* performance as an API without having to do a massive rewrite.

*Could* it be done? Sure...but why would you, when AMD already has perfectly viable physics engines that complement their own hardware? The short answer is: you wouldn't. Also, PhysX is not anything developers are particularly crazy about; a physics engine is simply a physics engine. There is not a dramatic performance difference when you compare PhysX to Havok or Bullet...
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


Ah, yes I did. I got a little confused with my wording. 4m total, 2m removed from 6m die. It makes sense to me but it's pretty ambiguous now that you point it out.

PhysX would have to be completely ported to OpenCL. It is probably just running on the CPU with AVX and such enabled. The cores are there and no one knows how to use all of them properly yet, so why not just throw physics at it?

Also, Bullet is great; I use it in Blender. Sony also uses it in its own projects.

http://en.wikipedia.org/wiki/Bullet_%28software%29#Commercial_games
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The news that I linked above is from 8 Mar 2013 (Nvidia's official announcement was one day before). I can assure you that everyone knew then that the PS4 was powered by an AMD APU. In fact, the relevant part of the news is this one:

This marks the first time PhysX has been accelerated by AMD graphics hardware
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


All that is obsolete. "Physics" will be written in OpenCL or another language like C++11/14 or AMP, and could run "simultaneously" on the CPU and GPU, especially because the new definition since AFDS2011 now includes cache coherency. Fairly complex physics can now run simultaneously on CPU+GPU with much simpler code in normal high-level languages.

In hUMA, what remains *unified or uniform* for all possible processing elements in a system is the "virtual memory space" (and this can even include disk swap or virtual files, memory on adapters...)... all processing elements will see the same space, though the physical memory addressing can be different. Now cache coherency makes it easy to share data through caches... all of this is called hUMA... and it is as if "physics" had asked for it lol

http://mygaming.co.za/news/wp-content/uploads/2013/05/AMD-Kaveri-hUMA-shared-memory-600x337.jpg

Just search for some OpenCL physics effects on YouTube and you might see how much and how fast things could be made obsolete.

 

8350rocks

Distinguished

You keep thinking that...meanwhile...I will go by the devkit...which says nothing of PhysX.

AMD does not run CUDA.
 

Nvidia has not said it will run GPU PhysX. The person in the article just assumed it, by the looks of it. Nvidia said PhysX is coming to the systems. They are probably talking about the PhysX engine, which will run on the CPU if there is no Nvidia GPU.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


The development kit for the PS4 is likely clocked at 2.75GHz. The dev kit for the XB1 has 12GB of memory instead of the consumer unit's 8GB.

The PS4 != PS4 dev kit. XB1 != XB1 dev kit. End of story.

They need the extra hardware to run the necessary diagnostic software while testing the program in production.

http://www.gamespot.com/news/sony-loaning-out-ps4-dev-kits-to-developers-free-of-charge-for-a-year-6412084
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Sony officially lists Nvidia (together with Havok and other companies) under the section Physics/AI/Animation for PS4

http://www.scei.co.jp/ps4_tm/index.html

And here is Nvidia's official announcement for the PS4

http://nvidianews.nvidia.com/Releases/NVIDIA-Announces-PhysX-and-APEX-Support-for-Sony-Computer-Entertainment-s-PlayStation-R-4-941.aspx

Of course, PhysX implementation on PS4 doesn't use CUDA.



Maybe, or maybe the person has internal info.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


A week after the PS4 dev kit passed FCC certification, the PS4 itself also passed FCC certification

http://www.engadget.com/2013/07/22/sony-playstation-4-fcc/

sony-ps4-retail-console-passes-through-fcc-details-available-35620-1.jpg

sony-ps4-retail-console-passes-through-fcc-details-available-35620-2.jpg

sony-ps4-retail-console-passes-through-fcc-details-available-35620-3.jpg


And guess what? The PS4 also lists the same 2.75GHz as the maximum clock in the system. Sony said that the CPU is clocked at 1.6GHz. The 2.75GHz is the frequency of the memory (CK = 1.375GHz; WCK = 2.75GHz). Note that 2.75GHz memory in quad-channel provides the 176GB/s bandwidth claimed by Sony.
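
For anyone who wants to check the bandwidth arithmetic, here is a minimal sketch in Python. It assumes GDDR5 moves data on both edges of WCK and that the bus is 256 bits wide in total (my reading of "quad-channel" as 4 x 64-bit; the bus width is not stated in the FCC listing). The function name is only for illustration.

```python
def gddr5_bandwidth_gb_s(wck_ghz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a given WCK frequency and total bus width."""
    # Assumption: data is transferred on both edges of WCK (double data rate),
    # so each pin carries 2 * WCK Gbit/s.
    gbit_per_pin = wck_ghz * 2
    # Multiply by the number of data pins, then convert bits to bytes.
    return gbit_per_pin * bus_width_bits / 8


# WCK = 2.75 GHz on an assumed 256-bit bus
print(gddr5_bandwidth_gb_s(2.75, 256))  # 176.0
```

Under those assumptions the 2.75GHz figure matches Sony's 176GB/s, which is hard to explain if 2.75GHz were a CPU clock.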

I know little about the Xbox One, but the PS4 dev kit has 8GB, the same as the PS4. Part of the memory and some of the cores are reserved for development/testing tools. In principle only six cores and 5.5GB are available to games. Here you can see a CPU profile for a forthcoming PS4 game. The game uses six threads, each one locked to a single core. The OS and the profiler are running on the remaining two cores

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's what happens when you troll too much. People stop believing a word you say. There is an age old fable for this. Perhaps they don't teach it in school anymore.

http://en.wikipedia.org/wiki/The_Boy_Who_Cried_Wolf

 

that's because posts like that would usually be followed by "
hafijur said:
so, then, like, my intel core i5 is like, totally faster than, like this 2.75 ghz jaguar cpu. like totally faster! interu ichiban!! coa ai faibu banzai!!
"
it happened so many, many, many times that he got trolled.

xbone(R)'s gpu explored and other stuff:
http://semiaccurate.com/2013/08/30/a-deep-dive-in-to-microsofts-xbox-one-gpu-and-on-die-memory/
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Some people here speculated that the Steamroller/Kaveri delay is because it is waiting for SOI @ Globalfoundries. However, AMD's official claim is that Kaveri is bulk and the delay is because of HSA. Here is a different perspective

http://hothardware.com/News/AMD-Kaveri-Shipments-Slip-Into-2014-Speculation-Rampant-on-Unannounced-Products1/

According to his sources, the delay was caused because AMD sent the chip too late to Globalfoundries. The chip should have been @ Globalfoundries around Q3-2012, but in November 2012 the chip was still at AMD. There is no explanation for the delay beyond "AMD kept the chip back to put some additional polish on its performance". Unfortunately, this does not explain whether that was because it couldn't match the original prediction or because it is some kind of Steamroller 1.1v with improvements.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810



Some good stuff in there. AMD put way more than just Jaguar + Radeon in that massive SoC. The question is who owns all those extra (~16 or so) custom cores. Can AMD add some of those to their own SoCs? That level of customization likely means their profits will be a bit higher than expected for the part. I can see now where their custom SoC business could take off from here.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The AMD fans here make one or two nonsensical claims per hundred correct claims. One Intel fan here makes one or two correct claims per hundred nonsensical claims.

Here is a gaming example of how a six-core AMD chip can be faster than an i5. *

500x1000px-LL-7d31c35c_proz.jpeg


The octo-core is not fully loaded

350x700px-LL-d3796154_proz20amd.jpeg


A simple computation shows that it is comparable to a six-core at full load. Even five fully loaded cores are enough to outperform the i5.

* I also have other kinds of benchmarks showing how an octo-core AMD leaves an i7-3770k in the dust, but forget those for now...
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's mostly a rehash of "a rather confused DigiTimes story". They're talking about a desktop Kabini, which I wasn't even expecting. They're already out in desktops and AIOs with the BGA chips. Perhaps a slightly higher clocked/TDP version?

It is rather confusing though. At Hot Chips IBM was talking about a 22nm SOI process for their Power 8 chip, which GF would also have access to.

There was speculation before, by S|A, that Kaveri is already on its 2nd revision, the first one being scrapped because it was too slow, which is why we got Richland instead. Did they try it on bulk and then realize they couldn't get the clocks and had to go back to SOI? It wouldn't be the first time a high level manager made a decision without fully understanding the ramifications.

Even Intel struggles with 4GHz (Turbo only) on bulk so what kind of magic can GF perform here? Inquiring minds want to know. The delay could be just because AMD got too busy with the PS4/XBone chips which are more complex than we thought.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The author differentiates between the "rather confused DigiTimes story" and his own sources. Regarding Kaveri the author writes:

The reason Kaveri was late taping out, according to my sources, was that AMD kept the chip back to put some additional polish on its performance.

[...]

If holding Kaveri back a few months gave AMD the time it needed to further improve the core, and the end result is a chip that's 10-15% faster clock-for-clock than Piledriver as opposed to 8-11%, then there's no downside here. If it helps AMD ensure that the chip yields well at high frequency or low power consumption, that's all to the good. In the grand scheme of things, moving the launch from late November to early January isn't going to hurt the company's sales much, but it may lead to significant improvements in the final product.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Nothing of that is stated in the link you provide... only that Kaveri was a very late 2012 tape-out, and, as has "been usual", from tape-out to the streets it takes ~1 year. And this is not because it's AMD or Glofo or Kaveri; it has been ~1 year from tape-out to streets for a long time, for all participants.

The 1st Kaveri on the streets will be bulk, not because it "was revealed" in that article... it was revealed in another interview with an official AMD representative some time ago. But take caution: that is about the 1st Kaveri SKUs, which since Llano are usually "mobile" parts, and for mobile "low power bulk" is not a bad bet.

All of Kaveri will be bulk?... It could be, but "plain planar bulk" is kaput; in the future (VERY SOON) FinFET or FD-SOI will be the only choices... AND THE SOONER AMD MAKES A TRANSITION THE BETTER... The problem is that FinFET at those foundries is still 1 to 2 years away, leaving only FD-SOI as the near-term alternative, which is already in full ramp-up production at the STMicro Crolles fab... and the cherry on top of the cake is that it is easier, with better yields, less variability, more performance, AND CHEAPER than FinFET on either bulk or SOI wafers (it's like screaming at the top of its lungs "pick me up" lol).

And STMicro and Glofo are partners (for more than 1 year now, and Glofo stated FD-SOI would be ready for full ramp-up in 2014), so FD-SOI for the Kaveri "desktop" is not wild "speculation". As with Intel's tick-tock, making a fab process transition on the same design is tremendously easier, all the more so for bulk to FD-SOI, which could happen in less than 6 months if both processes were started almost simultaneously... Getting a few runs on bulk and then porting to FD-SOI is considerably easier, since both processes share the same BEOL and some identical MEOL... Following a bulk chip with the same design on FD-SOI could take a very short time, especially if using the same foundry, which is the case with Glofo. The chips will be nearly identical: the same BEOL, some of the same MEOL, and only the FEOL (front end of line) will differ significantly... All bug catching and errata will be done on the bulk version; it's the same in this department (could be exactly the same).

So even if FD-SOI happens nothing is really late... or AMD's fault.

Kabini is an identical story: it already debuted this summer, and the versions in 2014 can be tweaks and new packages for new form factors... but of course the "rumor mills" are restless, and when it comes to bashing AMD they enter hyperdrive mode (at least it seems like that to me (edit)).

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


I can almost assure you that IBM's 22nm SOI process is FD-SOI... high performance, full of "booster" techs FD-SOI. AMD could have something identical for 28nm FD-SOI, if they followed a Fab 2 model (perhaps why they paid ~700 million to Glofo in 2012)... a little like renting installations and picking a reference design from the foundry (which in this case could be 28nm FD-SOI with back gate bias), but then tweaking it with "booster" techs.



I don't know why, but perhaps that was one main reason the previous CEO... Seifert... was fired.. yes, he was fired. The first tests and simulations on the first bulk runs might have shown "terrible" results... but since it could have been quite cheap he might have wanted to go ahead... like all "economists" (Seifert is an economist, not an engineer) who supposedly run or give directions to the world; that kind is only good at making dung decisions and theories for their bankster-riddled insane masters... out of no less insane and borged statistical approaches...

It could have been the worst debacle for AMD ever... the end!


 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810

Which would coincide with the S|A story but with far fewer details. They've been polishing it for 2 years now. Neither has definitively said which process is being used.

He's showing to be rather spot on for this one: http://semiaccurate.com/2012/11/06/amds-kaveri-apu-slips-again-2014-now/

If indeed they are bulk I think they're going to be limited to an A10-5745M class (2.1/2.9GHz) APU. Certainly Steamroller cores and better GCN will make that a much better mobile chip, but desktop users will yawn and cry foul.

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


How could he not troll!?... The major fault is not his; it lies with articles like this http://hothardware.com/News/AMD-Kaveri-Shipments-Slip-Into-2014-Speculation-Rampant-on-Unannounced-Products1/ ... incredibly biased, with peremptory, generalized, absolute statements based on fake and biased Windows(TM) benchmarks that are not representative AT ALL of the "common Windows software" people use, much less of other OSes and environments. Nevertheless, useful idiots always want something "absolute" to grab on to (wisdom and truth don't fit, being slightly more complex and in no way absolute), and most of the time end up thinking they have the greatest when in fact most of what they run on their systems (common software) is faster on the hardware they think is much inferior LOL

If holding Kaveri back a few months gave AMD the time it needed to further improve the core, and the end result is a chip that's 10-15% faster clock-for-clock than Piledriver as opposed to 8-11%, then there's no downside here. If it helps AMD ensure that the chip yields well at high frequency or low power consumption, that's all to the good. In the grand scheme of things, moving the launch from late November to early January isn't going to hurt the company's sales much, but it may lead to significant improvements in the final product.

Because ultimately, that's what this is about. AMD needs Kaveri to hit, and hit big. While there's no chance of closing the gap between Piledriver and Haswell in a single leap, a 10-15% IPC gain would put Piledriver's efficiency back at Thuban/K10 levels (but at a significantly higher clock speed). At that point, AMD's quad-cores stop being outperformed on a regular basis by Intel dual-core chips and the CPU side of the APU equation gets a decided kick in the pants.

Piledriver can already be ~10% better than K10 for many jobs at the same number of cores and the same clock (HPC), and you can't simply detach the clock ability and MULTITHREADING from this design; it is embedded to the bone. Plainly put, a 15% improvement for SR will put it ahead of Hasfail (less than 10% improvement for a tock) at the same clock and number of threads... but I'm also quite convinced the issue is being addressed in the next crop of benchmark software (not hardware) lol
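
To make the clock-for-clock argument concrete, here is a rough Python sketch. It uses only the percentage ranges from the quoted article (8-11% and 10-15% over Piledriver) and the usual first-order model that per-thread throughput scales with IPC times clock; the function name and the 1.1 clock ratio in the last line are purely hypothetical, not measured data.

```python
def relative_perf(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Throughput relative to a baseline, first-order model: IPC x clock.

    ipc_gain    -- fractional per-clock improvement over the baseline (0.15 = +15%)
    clock_ratio -- new clock divided by baseline clock (1.0 = same clock)
    """
    return (1.0 + ipc_gain) * clock_ratio


# Ranges quoted in the article, all relative to Piledriver at the same clock.
for gain in (0.08, 0.11, 0.10, 0.15):
    print(f"IPC +{gain:.0%} at equal clock -> {relative_perf(gain):.2f}x Piledriver")

# Hypothetical: the same +15% combined with a 10% clock advantage.
print(f"{relative_perf(0.15, clock_ratio=1.1):.3f}x")
```

The point of the model is only that an IPC gain and a clock advantage multiply, which is why the clock headroom of the design can't be left out of the comparison.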

For normal usage it means next to nothing, because even today the "feel" and real race speed (not qualifying benchmarks under biased conditions) on AMD and Intel is practically the same, totally indistinguishable in all the common tasks of a desktop system. "Compute" (OpenCL etc.) and heterogeneous, yes... those could mean much, much more..

But if e-penis means anything... we can say AMD has the crown of the fastest commercial chip (when buying a sports car you won't argue about mileage or noise or price; the same logic seems to have influenced AMD). Arguably, but yes, in actuality and in fact, AMD has the "fastest commercial PC CPU" around (OC and other tweaks are after-market add-ons)... it doesn't mean more than exactly that...

http://www.xtremesystems.org/forums/showthread.php?286815-AMD-FX-9590-in-FlanK3rs-hands-and-in-review

But back to the psyche work... if anyone believes that *bold* uber hogwash statement in that quoted part of http://hothardware.com/News/AMD-Kaveri-Shipments-Slip-Into-2014-Speculation-Rampant-on-Unannounced-Products1/... how can he not troll ??

After praising his own decisions in forums, where it is only the blind, biased "other" fanboys who fail to see the light and the genius of his choices... and *Free Advertising* is a self-aggrandizing phenomenon, not about truth... the only choice left is trolling no matter what; "realizing" or "coming to grips" with anything else means admitting to himself that he has been fooled and is stupid (many egos just can't stand that without severe consequences).

He could loosen up quite a bit... he shouldn't have to feel like a fool; the world is full of deceptions that even wise people can fall for. All it takes is a glimpse at politics (commercial "politics" is not that different).


 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
ummm ... very interesting if true

http://www.xtremesystems.org/forums/showthread.php?282723-AMD-quot-Steamroller-Excavator-quot-info-speculations-and-experience&p=5203772&viewfull=1#post5203772

I saw this info posted on another forum and wanted to see what people here think about it. This is from the AMD supplied gcc "machine descriptor" file:

;; AMD bdver3 Scheduling
;;
;; The bdver3 contains three pipelined FP units and two integer units.
;; Fetching and decoding logic is different from previous fam15 processors.
;; Fetching is done every two cycles rather than every cycle and
;; two decode units are available. The decode units therefore decode
;; four instructions in two cycles.
;;
;; Three DirectPath instructions decoders and only one VectorPath decoder
;; is available. They can decode three DirectPath instructions or one
;; VectorPath instruction per cycle.
;;
;; The load/store queue unit is not attached to the schedulers but
;; communicates with all the execution units separately instead.
;;
;; bdver3 belong to fam15 processors. We use the same insn attribute
;; that was used for bdver3 decoding scheme.]

Pity the poster doesn't give a reference link, not going to pick at GCC lol...

ummm ... could mean SR has the same 4 decode pipes, only arranged like 2. It seems to indicate there are 2 decode engines but sharing 4 decode pipes like before (simply 8 decode pipes seems an overkill, unless there are 2 fetch engines also)... *If* true, It means what is REALLY doubled are the dispatchers (we have been here before).

The fetching only every 2 cycles could mean the Vertical Multithreading "block" is 2 instructions or 2 cycles instead of 1 as in BD/PD (interleaving). That is, fetch from one thread in 2 consecutive cycles, which can correspond to 2 instructions... and then switch to the other thread for 2 cycles... and on and on. If only one thread is present per module then, as before, all cycles go to it.
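
A tiny Python sketch of that reading of the comment, just to show the pattern. This is my guess at the interleaving, not anything documented by AMD or GCC; the function name and parameters are made up for illustration.

```python
def fetch_schedule(n_cycles, block=2, active_threads=(0, 1)):
    """Return which thread owns the shared fetch unit on each cycle.

    block          -- consecutive cycles given to one thread before switching
                      (guess: 2 for bdver3, 1 for BD/PD-style interleaving)
    active_threads -- thread ids currently running on the module
    """
    threads = list(active_threads)
    return [threads[(cycle // block) % len(threads)] for cycle in range(n_cycles)]


print(fetch_schedule(8, block=2))              # [0, 0, 1, 1, 0, 0, 1, 1]
print(fetch_schedule(8, block=1))              # [0, 1, 0, 1, 0, 1, 0, 1]
print(fetch_schedule(4, active_threads=(0,)))  # [0, 0, 0, 0]
```

With a single active thread the block size stops mattering, which matches the "all cycles go to it" case.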

3 FP units and 2 Integer units ? ... now this is a complete surprise if true... how ?

Perhaps 1 MMX unit with 2 pipes capable of some FP... and 2 FMAC units with 2 pipes each ? .. 256bit per pipe ?

The "detached L/S engines" could mean a good dose of run-ahead... more asynchronousness (sorry for the bad word)... possibly "data speculation"...
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
Now this sheds some light

http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01079/bdver3.md

(define_cpu_unit "bdver3-decode0" "bdver3")
(define_cpu_unit "bdver3-decode1" "bdver3")
(define_cpu_unit "bdver3-decodev" "bdver3")

;; Double decoded instructions take two cycles whereas
;; direct instructions take one cycle.
;; Therefore four direct instructions can be decoded by
;; two decoders in two cycles.
;; Vectorpath instructions are single issue instructions.
;; So, we have separate unit for vector instructions.
2 decoders, 2 direct path + 1 vector in each... fetch can do 2 instructions per cycle (double fetch?), so 4 (direct) instructions are decoded in 2 cycles using 2 decoders...
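
The counting in that .md comment can be sketched in Python like this. It is a toy in-order model, assuming only what the comment says (direct instructions take one decode cycle, double-decoded ones take two, two decoders); VectorPath instructions and the separate decodev unit are left out, and this is not how GCC's scheduler or the real hardware is implemented.

```python
import heapq

# Decode latency per instruction kind, taken from the bdver3.md comment.
DECODE_CYCLES = {"direct": 1, "double": 2}


def decode_cycles(instructions, n_decoders=2):
    """Cycles needed to decode the instructions in order on n_decoders units."""
    free_at = [0] * n_decoders  # cycle at which each decoder becomes free
    heapq.heapify(free_at)
    finish = 0
    for kind in instructions:
        start = heapq.heappop(free_at)  # take the earliest available decoder
        end = start + DECODE_CYCLES[kind]
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


print(decode_cycles(["direct"] * 4))  # 2
print(decode_cycles(["double"] * 2))  # 2
```

So in this model four direct instructions and two double-decoded instructions both occupy the two decoders for two cycles, which is the equivalence the comment is describing.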

Now that seems like worse decode for a single thread per module compared with BD/PD lol ... unless after decode there are stream loop buffers like Intel's LSD... or unless the 2 decoders can work for the same thread.

;; Three FP units.
(define_cpu_unit "bdver3-ffma0" "bdver3_fp")
(define_cpu_unit "bdver3-ffma1" "bdver3_fp")
(define_cpu_unit "bdver3-fpsto" "bdver3_fp")

Now it seems worse than BD/PD also lol... it's not units, it's pipes... unless those "ffma#" are 256bit each, and so capable of 2 256bit AVX instructions per cycle; otherwise there isn't really anything to write home about in here.
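
Why the pipe width matters, as a quick Python sketch. It assumes "ffma0" and "ffma1" are the only FMA-capable pipes, that "fpsto" is a store pipe contributing no arithmetic, and that an FMA counts as 2 flops; the 128-bit and 256-bit widths are the two possibilities being discussed, not confirmed specs.

```python
def peak_flops_per_cycle(fma_pipes: int, pipe_width_bits: int, element_bits: int = 32) -> int:
    """Peak floating-point operations per cycle for a set of FMA pipes."""
    lanes = pipe_width_bits // element_bits  # elements processed per pipe per cycle
    return fma_pipes * lanes * 2             # one FMA = multiply + add = 2 flops


print(peak_flops_per_cycle(2, 128))  # 16 single-precision flops/cycle
print(peak_flops_per_cycle(2, 256))  # 32 single-precision flops/cycle
```

Only the 256-bit case doubles the peak per module; with 128-bit pipes a 256-bit AVX instruction would have to occupy both.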
 