AMD CPU speculation... and expert conjecture

Page 279
Status
Not open for further replies.

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Sure, the future bet is heterogeneous everywhere... but the rest is just utter rumor hysteria (you can have hUMA/HSA over discrete adapters).

Just consider that so far the FX and the server SKUs share exactly the same die. This die, "Orochi", has been packaged for C32, for G34 and for AM3+...

More!... there is an Open Platform definition based on G34-type sockets, and the new GC36 socket is *compatible* with G34: it can take the former Opteron 6000 series and has the same form factor as G34 (you can pick up the old G34 chips and put them in the new, compatible sockets)... and it will be prepared for DDR4 and more on-board interfaces (PCIe v3, HT/HTX v4.0).

If FX is dead so will be traditional server CPUs.

Now that would seem to be the worst business move ever... Open Platform 3 seems to have picked up some steam; it's a neat idea to have mobo manufacturers innovate and diverge according to the needs of their customers, based on the same reference design. Now if "server" chips are gone, it's akin to saying "now you have to design your own CPUs for that G34"... it's not logical at all... I can't see Supermicro, Tyan and others embarking on an adventure that is a complete dead end... unless they know and have assurances that you don't.

[ UPDATE: in the end there could be only one socket for server and enthusiast/professional... it seems logical; it would cut a lot of costs in packaging and market positioning... it will be GC36, LGA, prepared for up to 4 channels of DDR4, with PCIe and HT/HTX on board, and could accept not only MCM parts but also single-die parts... in the end it may not be only AM3+ that is dead; C32 will be too. ]
 

hcl123



Don't feed the troll... oh well... I feel like feeding the pigeons lol

If Hasfail is 110% better than Piledriver, that means Piledriver is WAY worse than Athlon K8 lol

Can I recommend some kind of psychological treatment... mild, relaxed, no offense? (before it's too late)

I'm not really interested in pointless bickering, much less over insane arguments, like hooligans arguing about football matches. In the end everyone has brand preferences for some reason or another; that's even healthy... but none of those IDMs are paying anyone to embark on demented campaigns... unless they are paying you... *if not*, I don't think a psychoanalyst recommendation is insulting at all.

[ UPDATE: ok!.. I'm sorry... I think there is a lot you don't comprehend (or conveniently forget)... WHY? WHAT IS GOING ON?

Applying the same logic you do... HASFAIL is 110% better!? ... NO! PILEDRIVER IS ~650% BETTER THAN IB, AND SINCE HASFAIL IS LESS THAN 10% BETTER THAN IB, THAT MEANS PILEDRIVER IS AT LEAST ~640% BETTER THAN HASWELL LOL

PostgreSQL pgbench
http://www.phoronix.com/scan.php?page=article&item=llvm_clang33_3way&num=4

Of course none of the above is true; it's just that compilers and software are much easier and more powerful levers for pushing performance than hardware changes... don't change the uarch, change the benchmark software, and it will always show gobs of better performance... LOL ]
 

hcl123

Careful... trolls turn to stone at the first rays of daylight lol... DB engines run >90% from DRAM (otherwise no one would buy a server system with more than 128GB... and now they are pointing to 1TB lol)
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Nvidia has confirmed Maxwell is 20nm. About Logan, I cannot find any official confirmation only this

http://www.brightsideofnews.com/news/2013/7/25/nvidia-shows-kepler-gpu-inside-the-2014-next-gen-tegra-at-siggraph-2013.aspx

Do you have any link for your 28nm claim?



Inside? LOL. I can assure you 100% that I said something different. I was saying that IBM and Nvidia will release a heterogeneous chip with a Power8 CPU plus a CUDA GPU integrated on the same die, not CUDA "inside" Power8. LOL
 

hcl123



I'm not really much of a digger of internet news, much less hysterical rumors... but where does it say unequivocally that Warsaw is Piledriver?

For servers it seems clear (based on this slide) there will be 2 versions for SeaMicro clusters, one ARM and the other an APU (Seattle and Berlin), and one version that is *compatible* with a *compatible socket* (meaning the same G34 form factor, like FM2+ is *compatible* with FM2) for the Open Platform definition (Warsaw)
https://semiaccurate.com/assets/uploads/2013/06/AMD-Server-Roadmap.png

The only reference to Piledriver is in this slide
https://semiaccurate.com/assets/uploads/2013/06/AMD-Warsaw-Slide.png

But it references Piledriver in the context of

" Seamless migration *and from* Opteron 6300tm series "

* Upgrade options to 12 and 16 Piledriver cores OPN

Those rumors don't make sense at all... 12- and 16-core Piledriver is exactly what is there already... upgrade to the same? ... or is it that 12- and 16-core Piledriver chips (6300 series) will work in the new server boards, and there will then be an upgrade path *and from* them? (use the new ones, or use the old 6300 and upgrade later)

If there isn't a new socket, why the word "compatible" for socket and platform? ... upgrading the same Piledriver-based CPUs to the same sockets and platforms (that is not compatible, it's the same)???

If there is a new socket, then why not SR-based? ... and it all makes more sense if this socket is to G34 as FM2+ is to FM2... right?

(EDT)
And then why shouldn't the new socket, possibly called GC36, be the same for FX-type SKUs?... yes, it will be more expensive, perhaps with 4 DDR channels like Intel HEDT... when Intel Extreme launched, the cry from AMD enthusiasts was exactly "why isn't there a G34 socket I can use?"... perhaps AMD will be delivering exactly that; perhaps they are confident enough about the performance of the new SKUs to make it more expensive... and in the end hUMA/HSA will be possible over discrete Radeon adapters, and seamless if HTX slots are used...

(EDT 2)
And if the HBM memory plans are real, they are going to need something like a G34 socket... even if what sits on the interposer alongside the memory turns out to be only APUs... AMD seems serious about interposed HBM, as the presentation at Hot Chips hints... they will use a silicon interposer for sure (way better), with 1 or 2 CPU/APU dies and a stack of HBM memory, with or without a central controller... a G34-like socket will be needed for "desktop" systems more than ever; there is no way you could put a stack of HBM in the form factor of something like an FM2 socket, no way in hell, so the sooner they get users familiar with a large socket the better...
 

juanrga



Nope, regarding MT the 8350 is at the same level of performance as the 3770k. AMD only needs to improve the 8350 by about 5-10% to catch the 4770k, and that is achieved with the FX-9000 series.

SuperPi is not representative of any real software. Moreover, I already showed you that a BIOS bug affects the performance of AMD chips in SuperPi. A fix is available and it improves SuperPi scores a lot.

GIMP scores at your bit-tech link show a 4-thread Intel chip outperforming an expensive 12-thread Intel chip. It is evident that the software is using only a fraction of the top-end Intel chip. Very, very, very few people would purchase an FX-8350 or a 3930k to play with software that only uses about 50% or 25% of the chip (you seem to be an exception to the rule).




Of course HSA works with discrete GPUs. I already said this many pages ago. I don't understand why you now repeat it to me. Not sure about hUMA. If hUMA is a unified memory architecture the answer is "yes"; if hUMA is a uniform memory architecture the answer is "no". AMD insists that hUMA is uniform. I asked AMD for clarification, but they responded with their hUMA slides, which don't clarify the issue.

Just to emphasize: the HSA specification clearly says that its memory model is unified. Therefore, the confusion here is on AMD's side.



Not today. AMD is covering those with the new Warsaw CPUs. They are to Opteron servers what Centurion was to FX desktops. I don't expect any pure CPU after Warsaw, only an Excavator 4C APU/CPU.
 

juanrga



Hysterical rumours? LOL. What about AMD's official presentation of its server strategy? What about official server roadmaps such as this one

[Image: AMD server roadmap 2014 (amd-server-roadmap-2014-il.jpg)]


which says clearly that 12/16C Warsaw is Piledriver. Steamroller only comes in 4C Berlin.
 
People seem to be assuming that AM3+ = FX = "Desktop Performance". That's not necessarily true, as FM2+ can do everything that AM3+ can do; there's no need to support two sockets anymore. I don't see any more AM3+ chips being produced.

That being said, there is absolutely nothing preventing them from putting 3~4 modules onto a chip with a couple of CUs for HSA, calling it cream muffin (tm) and pricing it at ~$180. Four cores is perfect for a low-wattage notebook / mobile CPU like Kaveri but isn't enough for desktops. AMD has had a six-core CPU presence in the desktop world for far too long to not provide ~something~ for people.
 

hcl123



That would be a HUGE, TREMENDOUS VICTORY for Nvidia... it's a complete surprise; I'm tempted to bet everything against it lol

Besides, no way in hell and all its demon regions would a Power 8 with eDRAM work, or get anywhere close to 4GHz, on crap bulk processes, TSMC's or even Intel's... 2GHz at TSMC would be damn lucky lol (edt)... besides, I believe IBM's eDRAM is not possible without SOI.

So it would have to be Nvidia porting to IBM's secret SOI processes... akin to IBM opening up the secret sauces and powders of its processes...

Yes, another fab with SOI is possible to guard the secrets, and then a joint development... but no one else has 22nm SOI (the PD-SOI needed for eDRAM) except IBM, and no one ever will. So it would have to be IBM bearing the HUGE EXPENSE of porting a huge, complex chip like Power 8 to another SOI process only for the joy of having a kind of APU, when in a future revision they could use an evolved version of 1 or 2 Cell SPE co-processors per Power 8 core, get a gazillion more GFLOPS from that, and do better without the graphics-specific stuff that server systems don't need, since no one will be playing games on a $100k server, nor will there be multiple monitors attached to it... and above all, this Nvidia-less option means not paying a dime to Nvidia for the privilege lol...

OTOH, if you said Nvidia will pay IBM for the privilege of using the new PCIe v3 cache-coherent CAPI for their Quadro offerings, making it a potentially preferred option for IBM systems with GPGPUs, then it sounds much more like IBM, which so far hasn't paid royalties to anyone but has received them from a lot of others... or, in tandem, Nvidia also licenses the Power ISA (the base of Power 8), which has gone "open" like ARM, and then Nvidia pays even more for the privilege (but why would they do that if no "client" software for the Power ISA is out there, unless Apple revives their app base, or, more logically, they invest in porting the huge Linux base to the Power ISA (edt)???)... if it's NOT something along these lines, I think the IBM+Nvidia APU is just another one of those stillborn, ridiculous rumors lol... I will not wait for it with the utmost interest and expectation... I would not wait at all lol

 
Also... who the hell is running an Enterprise Database solution on 128GB of memory? The industry standard is 256GB per box and it goes up depending on whether you're doing horizontal or vertical scaling. Xeons are for cheap commodity boxes that run ESXi with Apache / NT / etc.

As for "cost", you can tell someone isn't informed when they think the cost of the hardware is a selling point. Oracle Enterprise Database 11g is $47,500 per processor, without the support or additional feature costs. The way you find your "processor count" in multi-core systems is to multiply the number of cores by a metric value that Oracle provides (the core factor), usually 0.5 except in the case of IBM Power, which is 1.0 (Oracle sticks it to the competition). So a single T5 CPU is 16 cores (each doing 8 threads), and dual socket is pretty common, for a total of 32 cores * 0.5 = 16 "processor licenses". 16 * $47,500 = $760,000 purchase for a single box. That's without any support or additional options. A SPARC T5-2 box with 512GB of memory will run you about $70,000 ~ $80,000 depending on options (10GbE / etc.). A four-socket T5-4 with 1TB of memory is about $150,000 ~ $160,000 USD and has 64 cores (512 threads), for a licensing cost of $1,520,000 USD. Oracle WebLogic Suite is $45,000 per "processor", though you can purchase standalone versions for $25,000 and under per "processor". Then you have a ton of middleware products for adapting and integrating your Line of Business (LoB) applications into the whole damn thing. These all tend to be purchased together as part of an engineered solution, which is designed and implemented by people like me.
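A quick sketch of the licensing arithmetic above, in Python. The prices and core factors are only the figures quoted in this post (list prices, no support or options), and the helper names are made up for illustration; this is not an Oracle pricing tool.

```python
# Sketch of per-"processor" licensing arithmetic as described in the post.
# Prices and core factors are the post's figures, not an authoritative price list.
DB_EE_PER_PROCESSOR = 47_500  # USD per processor license, no support/options

def processor_licenses(cores: int, core_factor: float) -> float:
    """Licensable "processors" = physical cores x core factor."""
    return cores * core_factor

def license_cost(cores: int, core_factor: float,
                 price: float = DB_EE_PER_PROCESSOR) -> float:
    return processor_licenses(cores, core_factor) * price

# SPARC T5: 16 cores per socket, core factor 0.5 (per the post)
boxes = {
    "T5-2": (2 * 16, (70_000, 80_000)),     # cores, quoted hardware price range
    "T5-4": (4 * 16, (150_000, 160_000)),
}
for name, (cores, (hw_low, hw_high)) in boxes.items():
    cost = license_cost(cores, core_factor=0.5)
    print(f"{name}: {processor_licenses(cores, 0.5):.0f} licenses = ${cost:,.0f}, "
          f"{cost / hw_high:.1f}x to {cost / hw_low:.1f}x the hardware price")

# Same 32 cores with a core factor of 1.0 (the IBM Power case in the post)
print(f"32 cores @ 1.0: ${license_cost(32, core_factor=1.0):,.0f}")
```

With these inputs the sketch reproduces the post's $760,000 and $1,520,000 figures, with the database licenses alone at roughly 9.5x to 10.9x the quoted box price.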

Suddenly the entire idea of "hardware cost" being used to determine purchasing decisions is rather silly.
 

hcl123



Ok!... this one seems more legit; the one at wccftech clearly seemed to be a Photoshop job

Yet the same logic remains: even if it's PD-based... it's a Piledriver tweak with a new socket, and this last part seems much less dubious (compatible G34). Now, good or bad depends on the fab processes... and YET IT DOESN'T SAY ANYTHING CONCRETE about the fate of FX...

FX could then be a new design for late 2014, for example, because the most pressing problem for FX, if it is to grow in "number of cores" or in "clock speed", is that it needs more DDR3 channels, or else the same 2 channels but of higher-speed DDR4 (DDR4-3200, for example)... and OTOH it doesn't need all the links for 4S systems and the other RAS features of top server chips (not used anyway); dropping them would cut wasted power and make dies smaller, with better yields.
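Why channel count matters: theoretical peak DRAM bandwidth is simply channels x transfer rate x 8 bytes per transfer (a standard 64-bit channel). A minimal sketch; the DDR3-1866 speed for current parts and the 4-channel case are illustrative assumptions, and real sustained bandwidth is lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: float, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels x transfers/s x bytes/transfer.
    A standard DDR3/DDR4 channel is 64 bits wide, i.e. 8 bytes per transfer."""
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9

# Illustrative configurations (speeds are assumptions, not product specs)
configs = {
    "2ch DDR3-1866 (AM3+ FX today)": (2, 1866),
    "4ch DDR3-1866 (G34-style package)": (4, 1866),
    "2ch DDR4-3200": (2, 3200),
}
for name, (channels, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, rate):.1f} GB/s")
```

Under these assumptions, two channels of DDR4-3200 (about 51.2 GB/s) get most of the way to four channels of DDR3-1866 (about 59.7 GB/s), which is the trade-off described above.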

DDR4 like that will only arrive by the end of 2014 or the beginning of 2015. In the meantime, if GloFo's 28SHP in those slides (gone as fast as they appeared) is 28nm high-performance PD-SOI, and if it will have ultra-low-k dielectrics, then Centurion under 125W is possible... a new Centurion could go 6GHz or more... Piledriver is not bad; as a matter of fact, "Centurion" arguably holds the crown for the most performant chip around; clock counts... a LOT...

Anyway, so far we can only speculate... more clock derived from a new fab process with the same design, a new die design based on Piledriver that isn't the same die as the server part, or a new design based on SR that isn't the same as the server part (this for sure)... but to me the most important thing is really the new socket, for the reasons explained before.

What does not fit the speculation is "no more FX/server-like chips"; otherwise that Open Platform doesn't make sense, nor would a new "compatible socket" (AT ALL)... it could be PD for all non-APU SKUs throughout 2014 (I doubt it), but after Warsaw there will be more, and there will be more for that new socket, and that same socket is quite good for "enthusiast" desktop systems.

 

juanrga



Wait! I think I got this wrong. Although the wiki claims that "Nvidia cores" will be merged with "Power cores"

http://en.wikipedia.org/wiki/OpenPOWER_Consortium

I have searched for more data and it seems the integration will not be done at the die level; rather, it will be a CPU+dGPU integration

http://www.pcmag.com/article2/0,2817,2422829,00.asp

This makes much more sense. Besides your SOI reasons, this Power8+CUDA integration is aimed at competing with Xeon+Phi.
 

hcl123



Yes, but what you don't seem to understand is that it is "unified" at the **VIRTUAL MEMORY** level, not physical

hUMA SLIDE
http://mygaming.co.za/news/wp-content/uploads/2013/05/AMD-Kaveri-hUMA-shared-memory-600x337.jpg

"Uniform" refers to "virtual memory", not "physical memory", and it was like that even before it was called hUMA (memory coherency); that is why the HSA standard only requires IOMMU v2 (which encompasses this possibility)... I think that was clear from the beginning in 2011: AMD talked about "memory coherency", not "cache coherency", and that the HD7000 series has this "memory coherency" ability was hinted elsewhere (it doesn't do any good yet, precisely because the drivers are lacking... see below). Yet the HD79## series (more precisely) could never have had cache coherency, because it sits on CRAP PCIe, a dead-end design meant to sell Intel CPUs instead...
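A toy model of "unified at the virtual-memory level": one page table that both CPU and GPU would use, whose pages can be backed by different physical pools. This only illustrates the concept, assuming 4 KiB pages and made-up pool names; it is not how AMD's IOMMU v2 or any driver actually implements it.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages

@dataclass(frozen=True)
class Mapping:
    pool: str    # which physical memory backs the page (hypothetical names)
    frame: int   # page-frame number within that pool

class UnifiedAddressSpace:
    """Toy model: one virtual address space shared by CPU and GPU,
    with pages backed by different physical pools."""

    def __init__(self) -> None:
        self.page_table: dict[int, Mapping] = {}

    def map_page(self, vpage: int, pool: str, frame: int) -> None:
        self.page_table[vpage] = Mapping(pool, frame)

    def translate(self, vaddr: int) -> tuple[str, int]:
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        m = self.page_table[vpage]  # a KeyError here stands in for a page fault
        return m.pool, m.frame * PAGE_SIZE + offset

space = UnifiedAddressSpace()
space.map_page(0x10, pool="system DDR3", frame=7)
space.map_page(0x11, pool="adapter GDDR5", frame=2)

# The same pointer value means the same thing to CPU and GPU; only the backing differs.
for vaddr in (0x10 * PAGE_SIZE + 0x20, 0x11 * PAGE_SIZE + 0x20):
    print(hex(vaddr), "->", space.translate(vaddr))
```

The design point is that the addressing is shared while the physical location (and therefore the access cost) can still differ page by page.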

*If* HT/HTX 4.0 materializes, everything can change: cache coherency will be a kind of ccNUMA for the "physical memory" pools, but it will be coherency by *regions* (not all of an adapter's DRAM needs to be "physically" coherent), so an adapter DRAM pool can mix cache-coherent and non-coherent traffic seamlessly (quite different from ccNUMA). And HT 4.0 is foreseen in the GC36 socket rumor, which is why, as an educated guess, I try to put a coherent puzzle together lol
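A hypothetical sketch of "coherency by regions": an adapter's DRAM is split into address ranges, and only accesses to ranges marked coherent would need snooping. Nothing public defines such a table for HT 4.0; the region layout and the routing labels here are invented for illustration.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    start: int      # inclusive
    end: int        # exclusive
    coherent: bool

class RegionMap:
    """Hypothetical per-region coherency table for one adapter's DRAM pool."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.start)
        self.starts = [r.start for r in self.regions]

    def route(self, addr: int) -> str:
        # Find the last region starting at or below addr, then check its end.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.regions[i].end:
            return "snooped (coherent)" if self.regions[i].coherent else "direct (non-coherent)"
        raise ValueError(f"address {addr:#x} is not in any mapped region")

adapter_dram = RegionMap([
    Region(0x0000_0000, 0x1000_0000, coherent=True),    # buffers shared with the CPU
    Region(0x1000_0000, 0x8000_0000, coherent=False),   # private data, e.g. frame buffers
])
print(adapter_dram.route(0x0000_1000))   # snooped (coherent)
print(adapter_dram.route(0x2000_0000))   # direct (non-coherent)
```

Only the shared region pays the coherency cost; traffic to private regions bypasses it, which is the mixing of coherent and non-coherent traffic the post has in mind.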

GC36 RUMOR
http://translate.google.com/translate?langpair=auto|en&u=http://tweakers.net/nieuws/82803/amd-voorziet-boulder-serverchip-in-2014-van-20-steamroller-cores.html&sandbox=0&usg=ALkJrhhOW-FuZA2e5UZWNMgHUV_G3Lz71w
(broken translation link; copy and paste it)
http://tweakers.net/nieuws/82803/amd-voorziet-boulder-serverchip-in-2014-van-20-steamroller-cores.html
(original)

The only thing they need to make this work for all HSA members, most of whom probably won't adopt HT/HTX, is to make their "Lightweight Notification" part of the standard. HT adapters or discrete (soldered, stacked) heterogeneous accelerators could then talk seamlessly over the same PCIe or any other interface without CPU intervention (that is what LWN, an AMD technology, does), and it would also make it easier to have PCIe cache coherency "emulated" in runtimes/drivers.

hUMA intrinsically depends on the OS for this "uniformity" of virtual memory... hUMA/HSA will need a *runtime/driver* even if everything is tightly integrated in a single chip with only one DRAM pool, simply because the principal coordinating mechanism for virtual memory in a system is the OS. Runtimes and drivers are yet to be specified by HSA, but they are foreseen.

Think of HSA as a very low-level virtual machine for this HSAIL definition; it's mostly a JIT approach, though it could be compiled to metal (specific CPU + GPU/accelerator ISAs) at install time or earlier, and to understand it better we can say it is similar to the Java VM... Any program in HSAIL can run on a lot of different architectures and systems *unchanged* (it even has a binary format) (edt). hUMA just adds to this; the cache coherency requirement can be a VERY broad definition (it can be any cache level or memory pool the implementer wants, and is driver/runtime defined). It's not ccNUMA; many different protocols can suit, as long as they speak the MMU/TLB + DMA language of the IOMMU... so most probably there will never be a coherency protocol defined for this "physical memory" aspect of hUMA (HERE IS THE DIFFERENCE).

All this is relevant because most of those rumors that FX/server big cores etc. are dead are, I suspect, based on people having the wrong idea that hUMA/HSA will be only for SoCs and can only function with a single pool of DRAM (nothing discrete applies)... and since AMD is all behind HSA, and the server update is still PD, they conclude AMD has given up on those (which could be very, very wrong... a PD tweak with HTX slots and an HTX Radeon way better than today's, plus "Never Settle" bundles full of HSA features, could just kick Intel into oblivion)


 

hcl123



I could see Intel licensing the CAPI stuff with the utmost enthusiasm... a PCIe v3 cache-coherent Phy X board...

But then of course Intel is the same as, or worse than, IBM... having to pay royalties to someone is something that can make them lose sleep... and I think PCIe cache coherency is foreseen for the next revision of the standard (v4 at 16GT/s, like HT 4.0 supposedly is).
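For scale, raw PCIe link bandwidth per direction is transfer rate x 128b/130b encoding efficiency / 8 bits x lanes (PCIe 3.0 and 4.0 both use 128b/130b line coding). A small sketch; it ignores packet and protocol overhead, so real throughput is lower.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw per-direction bandwidth after 128b/130b line coding (PCIe 3.0/4.0),
    ignoring packet and protocol overhead."""
    bits_per_s_per_lane = gt_per_s * 1e9 * 128 / 130   # one bit per transfer per lane
    return bits_per_s_per_lane / 8 * lanes / 1e9       # bytes/s -> GB/s

for gen, rate in (("PCIe 3.0", 8), ("PCIe 4.0", 16)):
    print(f"{gen} x16: {pcie_gb_per_s(rate, 16):.2f} GB/s per direction")
```

This gives about 15.75 GB/s for a PCIe 3.0 x16 slot and twice that, about 31.51 GB/s, at 16 GT/s.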

By the same logic, IBM could use Blue Gene-like (/Q, not jeans lol) or other supercomputer chips full of FMAC pipes... and not license CAPI cheaply, so I think the CPU+dGPGPU integration is kind of "unilateral"; it's not a joint development of anything, and Nvidia will have to pay for the rights lol

 

i've always liked intel's double-dipping strategy. looks like reed is implementing his own version at amd. in the end, amd is a business like any other; they're not obligated to pamper desktop pc enthusiasts (1-5%) unless doing so made them a sizeable amount of money.
N.B. - i am not a pc enthusiast, just enthusiastic about pcs in general. :D
 
http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
"NVIDIA got Logan silicon back from the fabs around 3 weeks ago, making it almost certain that we're dealing with some form of 28nm silicon here and not early 20nm samples."

http://www.pcper.com/reviews/General-Tech/NVIDIA-Introduces-Kepler-Ultra-Mobile-Market-and-Tegra
"We are assuming that it will be NVIDIA’s second Tegra on 28 nm."

Most tech sites are saying 28nm. I have yet to see any valid claims of 20nm, since the 20nm ramp will likely be in H2 2014 for anyone who isn't Apple.

 

juanrga



As I said before, a uniform memory access model provides a single access time to physical memory. A non-uniform memory access model provides two times: one for local memory and another for remote memory. This is standard stuff.

In the slide, AMD clearly represents a single physical memory pool. This works for the Kaveri APU, but will not work for dGPUs. Once again, the HSA specification clearly states that its memory model is unified. There is not a single mention of uniform or hUMA in it.

If we now want to reinvent words and give them other meanings, then confusion is guaranteed.



IBM has joined with Nvidia because it cannot compete with others using only CPUs. The partnership with Nvidia is IBM's acceptance of heterogeneous compute. For HPC, IBM would either release something like the Phi or complement its CPUs with GPGPUs. IBM chose the latter option and I applaud them.
 

hcl123



The confusion is yours; there is no such thing as a single access time... as in linear time... not even for MPUs (multiprocessor units), which are simply UMA without the "h" or "N"... otherwise they couldn't have more than 1 memory channel, and they couldn't have more than 1 core due to contention.

All memory accesses are "arbitrated"; the need for different access domains is imposed precisely by this "arbitration", due to the enormous differences in latency for coherency probing and access, and precisely because different memory/cache pools can be enormous distances apart (at the system scale).

But when you go "virtual" the arbitration is much easier; yet in the end you will be accessing physical memory of some kind, and the virtual part is only in the arbitration and addressing.

The MPU that you use is UMA, yet quite surely you also use a page file (Windows) or a swap partition (Linux) on disk... this swapping in and out, so to speak, can go on even if your physical DRAM is not fully occupied... the access time differences can be enormous, as you may have noticed at some point... yet that doesn't stop your applications from seeing these so-called virtual spaces as if they were physical memory (underneath it is always "physical", only in this case it's "disk"), and it doesn't make your MPU stop being UMA.
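A toy demand-paging model of the swap point: the program always uses the same page numbers, but an access costs either a DRAM hit or a page fault served from disk. The latencies and the LRU replacement policy are assumptions for illustration; real OS page replacement is more involved, and this sketch only swaps out when DRAM is full.

```python
from collections import OrderedDict

DRAM_NS = 100               # assumed cost of an access to a resident page
SWAP_FAULT_NS = 5_000_000   # assumed cost of a fault served from disk (~5 ms)

class PagedMemory:
    """Toy model: the program sees one flat set of page numbers; whether a page
    is in DRAM or out on swap only changes how long the access takes."""

    def __init__(self, dram_frames: int):
        self.dram_frames = dram_frames
        self.resident = OrderedDict()   # resident pages, least recently used first
        self.faults = 0

    def access(self, page: int) -> int:
        if page in self.resident:
            self.resident.move_to_end(page)      # now the most recently used
            return DRAM_NS
        self.faults += 1
        if len(self.resident) >= self.dram_frames:
            self.resident.popitem(last=False)    # evict the LRU page to swap
        self.resident[page] = None
        return SWAP_FAULT_NS

mem = PagedMemory(dram_frames=3)
trace = [0, 1, 2, 0, 3, 0, 1]
costs = [mem.access(p) for p in trace]
print(costs, "faults:", mem.faults)
```

Every access in the trace uses the same kind of address; only the cost varies, by roughly four orders of magnitude under these assumed latencies.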

[EDT. Besides, the word UNIFIED should have spelled it out for you; if it were about time it would have been SIMULTANEOUS or NON-SIMULTANEOUS... but it is about "address spaces", not "time". A traditional SMP (symmetric multiprocessing) system with 2 sockets is neither UMA nor NUMA; each socket has its own address space that can be accessed simultaneously, and the unifying characteristic is exactly virtual memory ]
 

juanrga



UMA => single access time (uniform access).
NUMA => multiple access times (non-uniform access).

http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access
http://en.wikipedia.org/wiki/Uniform_Memory_Access

Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors).

Not only is HSA using the correct term, unified, but Nvidia also uses the correct term, unified, for its next-gen dGPU

[Image: GPU roadmap (GPU-roadmap-copy-640x356.jpg)]


Nvidia Maxwell GPUs will be able to access their own local memory (GDDR5) and also the remote memory (system DDR3). Two physical memory pools => two access times => the memory model is non-uniform but unified.
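The UMA/NUMA distinction argued here can be sketched as a latency table per (agent, memory pool) pair: one access time everywhere means uniform, more than one means non-uniform, regardless of whether the address space is unified. The latency numbers are hypothetical, chosen only to show the shape of the argument.

```python
def access_model(latency_ns: dict) -> str:
    """UMA if every (agent, pool) access takes the same time, NUMA otherwise."""
    return "UMA" if len(set(latency_ns.values())) == 1 else "NUMA"

def local_pools(latency_ns: dict) -> dict:
    """For each agent, the pool it reaches fastest, i.e. its 'local' memory."""
    best = {}
    for (agent, pool), t in latency_ns.items():
        if agent not in best or t < best[agent][1]:
            best[agent] = (pool, t)
    return {agent: pool for agent, (pool, _) in best.items()}

# Hypothetical latencies in ns; only their relative sizes matter here.
apu_single_pool = {("CPU", "DDR3"): 80, ("GPU", "DDR3"): 80}
cpu_plus_dgpu = {
    ("CPU", "DDR3"): 80, ("CPU", "GDDR5"): 900,    # GDDR5 reached over PCIe
    ("GPU", "GDDR5"): 300, ("GPU", "DDR3"): 1000,  # DDR3 reached over PCIe
}
print(access_model(apu_single_pool))   # UMA
print(access_model(cpu_plus_dgpu))     # NUMA
print(local_pools(cpu_plus_dgpu))      # {'CPU': 'DDR3', 'GPU': 'GDDR5'}
```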

===================================================================

More stuff. Charlie's speculation about the Xbox One CPU being clocked at ~1.9GHz was not right:

We recently just went into full production, so we're now producing en masse Xbox One consoles. We've had real good progress on the system. In fact, we just updated the CPU performance to 1.75 GHz on top of the graphics performance improvement, so the system is really going to shine [and] the games look pretty incredible.

http://arstechnica.com/gaming/2013/09/xbox-one-gets-a-cpu-speed-boost-to-go-with-its-faster-gpu/
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I doubt the big cores are gone forever, but like Intel there will be a bigger lag between updates. AMD is likely waiting for 20nm for its next big-core push.
 

8350rocks

Distinguished


Maxwell was cancelled entirely per S|A...
 

juanrga



http://news.softpedia.com/news/NVIDIA-Maxwell-Graphics-Cards-Confirmed-for-Q1-2014-377838.shtml
 