AMD CPU speculation... and expert conjecture

Page 278
Status
Not open for further replies.

hcl123

Honorable
Mar 18, 2013


Yes, that blog owner reminds me of hafijur lol

"Stacking" won't be here for at least a couple of years more... and a main SoC of >500mm² exists only in his imagination. Even if it were true, there wouldn't be enough space on the same die for anything close to the GPU (dGPU) he claims.

OTOH, Xfire is made much easier, especially if it's not over PCIe... if that CPU northbridge complex follows other AMD APUs, it is already a HyperTransport switch of some kind (ALL AMD CPUs and APUs so far have the same); it's needed for "clean" CPU<->GPU cache coherency. So an 8-bit HT link from there would be enough and very natural... for something soldered on the same mobo.
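To put the 8-bit HT link idea into rough numbers, here is a back-of-the-envelope sketch. The transfer rates are assumptions added here, not figures from the post: HyperTransport 3.1's top rate (3.2 GHz, double data rate, so 6.4 GT/s) and PCIe 3.0 (8 GT/s per lane with 128b/130b encoding). Packet and protocol overhead are ignored.

```python
# Peak one-direction bandwidth of a parallel/serial link, ignoring protocol overhead.
# Assumed rates: HT 3.1 = 6.4 GT/s per bit lane, PCIe 3.0 = 8 GT/s per lane, 128b/130b encoded.

def link_bandwidth_gbps(width_bits: int, gigatransfers: float, encoding_efficiency: float = 1.0) -> float:
    """GB/s per direction = lanes * transfers per second * usable fraction / 8 bits per byte."""
    return width_bits * gigatransfers * encoding_efficiency / 8

ht_8bit = link_bandwidth_gbps(8, 6.4)                  # 6.4 GB/s
ht_16bit = link_bandwidth_gbps(16, 6.4)                # 12.8 GB/s
pcie3_x16 = link_bandwidth_gbps(16, 8.0, 128 / 130)    # ~15.75 GB/s

for name, bw in [("HT 8-bit", ht_8bit), ("HT 16-bit", ht_16bit), ("PCIe 3.0 x16", pcie3_x16)]:
    print(f"{name:>12}: {bw:5.2f} GB/s per direction")
```

On these assumptions an 8-bit HT link carries roughly 40% of a PCIe 3.0 x16 slot's raw bandwidth, so whether it is "enough" depends on how much traffic the second GPU actually needs.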

But from here to actually having some dGPU is miles apart... it could come in later revisions of the chips; I doubt there is more than what's been announced now.

 

griptwister

Distinguished
Oct 7, 2012


Lol, in what sense? In the sense that he has no idea what he's talking about? Or in the sense that the name "hafijur" builds a picture in your mind of a man who looks similar to the guy who makes those blogs? Or both?
 

hcl123

Honorable
Mar 18, 2013


I would... be very surprised... already am. "Deep trench" eDRAM and FD-SOI just don't go well together: the films in FD-SOI are very shallow and thin, and if you break the SOI film to make deep trenches it won't be SOI anymore. PD-SOI could have it (deep buried SOI film), but I was under the impression, after all of IBM's research, development and announcements concerning exactly 22nm FD-SOI, that Power 8 and the next Z chip would be FD-SOI.

From what I know there are ways to do this on FD-SOI, but they seem horribly expensive and complicated (break the SOI and reconstruct it deeper in the wafer substrate).

GloFo hasn't got its act together on 28nm yet; there's nothing on the market at 28nm except chipsets and the like... Going to 22nm, and then PD-SOI in SHP format (edt), would be quite a jump, very NOT likely. OTOH their 28nm FD-SOI is already close to 22nm; it's a ~half-node shrink compared with 28nm bulk "as stated".

If there were a 28nm PD-SOI, and not a half shrink, it would already be wonderful... especially with ultra-low-k dielectrics for the interconnects: 5GHz at or below 125W would be at hand, with a longer pipeline (17 stages like SNB/HSW) and a more clock-friendly FX/server die design (edt). Nevertheless I'm fairly convinced that no "foundry" will do it. For all purposes, "planar" techs other than FD-SOI are completely kaput below 20nm, whether SOI or bulk ("fully depleted" will be the new mantra, either with SOI or FinFET). If IBM is on 22nm PD-SOI it would be the last of its kind... a 650mm² chip at 4GHz makes Intel look silly, and makes Nvidia's dear leader cry like a baby lol ...

But we have to remember IBM is not a foundry; its "special" processes are not for licensing, and though the "Alliance fab techs" are based on those, there are plenty of differences.
 

hcl123

Honorable
Mar 18, 2013


Dedicated co-processor? ... Why complicate things that much with kind-of-proprietary ISAs and models, when something like OpenCL is much more flexible, and with HSA even C++11, AMP... etc. (even CUDA would be better).

I think the trend is exactly toward a "compute"-oriented graphics pipeline...

http://cdn4.wccftech.com/wp-content/uploads/2013/09/APU-+-DGPU.jpg

... with HSA, nested parallel computation is at hand. "Ray tracing" and the like could integrate seamlessly with all the "physics" in a game. Raster would still be here, but it could lose much of its mojo, and it will be based on "hyper-tiling" approaches, which seems to be where DX12 (>DX11.1) is headed. Very large "texture" computations are possible on very small memory with tiling and compression, and that is where ESRAM and tiling come in with MSFT's tests: 3Gbit textures on 4 MB of memory. All the ROPs (Rendering Output Pipeline) could be based on ESRAM with a good dose of tiling, which would alleviate the need for gobs of bandwidth to be efficient.
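The "large texture on small memory" idea can be illustrated with a toy residency cache: the huge virtual texture is cut into fixed-size tiles, and only the tiles actually touched are kept in a small physical pool. This is a hedged sketch, not how any DirectX runtime or the Xbox One ESRAM really works. The 64 KiB tile size is an assumption (the D3D11.2 tiled-resource tile size), the 4 MB pool and 3 Gbit texture are the post's figures, and the sliding-camera access pattern is hypothetical.

```python
from collections import OrderedDict

TILE_BYTES = 64 * 1024          # assumption: 64 KiB tiles (D3D11.2 tiled-resource tile size)
POOL_BYTES = 4 * 1024 * 1024    # the 4 MB of fast memory from the post
TEXTURE_BITS = 3 * 2**30        # the 3 Gbit texture from the post

class TilePool:
    """Toy residency cache: maps virtual texture tiles to a small pool of physical slots (LRU eviction)."""

    def __init__(self, pool_bytes=POOL_BYTES, tile_bytes=TILE_BYTES):
        self.capacity = pool_bytes // tile_bytes      # number of physical tile slots
        self.resident = OrderedDict()                 # virtual tile id -> physical slot, oldest first
        self.free_slots = list(range(self.capacity))
        self.misses = 0

    def touch(self, tile_id):
        """Return the physical slot holding tile_id, 'streaming' it in on a miss."""
        if tile_id in self.resident:
            self.resident.move_to_end(tile_id)        # mark as most recently used
            return self.resident[tile_id]
        self.misses += 1
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            _, slot = self.resident.popitem(last=False)  # evict the least recently used tile, reuse its slot
        self.resident[tile_id] = slot                 # a real renderer would upload the tile data here
        return slot

virtual_tiles = (TEXTURE_BITS // 8) // TILE_BYTES     # tiles in the full texture
pool = TilePool()

# Hypothetical access pattern: each frame the camera sees an 8x6 window of tiles,
# and the window slides one column to the right per frame.
for frame in range(10):
    for y in range(6):
        for x in range(8):
            pool.touch((frame + x, y))

print(f"pool slots: {pool.capacity}, virtual tiles: {virtual_tiles}, tiles streamed: {pool.misses}")
```

With 64 slots and a 48-tile visible window that slides one column per frame, every tile is streamed in exactly once (102 misses over 10 frames), even though the full texture is 6144 tiles. LRU is used only because it is the simplest eviction policy to show.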

 

hcl123

Honorable
Mar 18, 2013


LOL ... you really sound like that Xbox One blog poster...

Power 8 is 96 threads in a single chip, with wide-issue out-of-order cores (16 exec pipes per core; 8 threads per core, so 2 pipes per thread)(edt), and ~2.5MB of L2/L3/L4 per thread...

All its software is "optimized" compile-install, and if that's not enough, the Power ISA is way more powerful and efficient than x86. Its power is <250W... and even at 250W, that makes 250/96 = 2.6W per thread... so an equivalent 20-thread Hasfail-like chip (which seems to be the max for server chips) would have to be a 52W chip at 4GHz, big LOL... and since those will be like 130W at ~3GHz, it seems to me IBM has almost 4x the power/thread efficiency, and probably about the same comparing perf/W (vs Hasfail), on those raw numbers compared with Intel .... SILLY! LOL...
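The per-thread arithmetic is easy to check. This sketch only reuses the figures quoted in the post (250W and 96 threads for Power 8, a hypothetical 20-thread, 130W, ~3GHz Haswell server part); none of them are measurements, and watts per thread says nothing about work done per thread.

```python
# Per-thread power using only the post's figures (rated TDPs, not measured draw).
POWER8_TDP_W, POWER8_THREADS, POWER8_GHZ = 250, 96, 4.0
HSW_TDP_W, HSW_THREADS, HSW_GHZ = 130, 20, 3.0   # the post's hypothetical 20-thread Haswell server chip

def watts_per_thread(tdp_w: float, threads: int) -> float:
    return tdp_w / threads

power8 = watts_per_thread(POWER8_TDP_W, POWER8_THREADS)   # ~2.6 W/thread
haswell = watts_per_thread(HSW_TDP_W, HSW_THREADS)        # 6.5 W/thread

# A 20-thread chip at Power 8's per-thread budget
equivalent_20_thread = power8 * HSW_THREADS               # ~52 W

# Ratio at face value, and again after dividing each by its clock (W per thread per GHz)
ratio = haswell / power8
ratio_per_ghz = (haswell / HSW_GHZ) / (power8 / POWER8_GHZ)

print(f"Power 8: {power8:.2f} W/thread, Haswell: {haswell:.2f} W/thread")
print(f"20-thread equivalent: {equivalent_20_thread:.0f} W")
print(f"ratio: {ratio:.2f}x, clock-normalized: {ratio_per_ghz:.2f}x")
```

On these numbers the face-value per-thread ratio is about 2.5x, and about 3.3x if you also normalize for the 4GHz vs 3GHz clocks.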

IBM has a very good presence in ALL important data centers... if those guys were "biased" against buying chips with rated TDPs above 200W, IBM would be bankrupt... so perhaps they know something you don't, and have other metrics and considerations besides the "propaganda" you are so brainwashed with LOL

[ UPDATE: Besides, Power 8 has an on-die voltage regulator per core with extensive clock and power gating, compared with the single off-die VR per chip of HSW... so Power 8 is prepared for intermittent-load scenarios; it doesn't have to be considered at full load all the time to maintain its superior efficiency ]

 

juanrga

Distinguished
BANNED
Mar 19, 2013


I already mentioned before in this thread the one million particles demo run on PS4 using Havok engine, when I discussed why AMD/Sony want to run physics and AI on the GPU.

The ~470 games include CPU, GPU, and PPU accelerated PhysX. The two that I mentioned explicitly above use GPU acceleration.



Two things to note:

1) the ARM chip will be 4x the Opteron, which means ARM already has Jaguar performance at lower power consumption (Jaguar is currently the most efficient x86 arch).

2) AMD expects ARM server to outsell x86 server in the long run.

Interesting, because this moves the balance towards ARM in the recent poll that I started here in the forums and that nobody replied to. LOL



http://www.eetimes.com/document.asp?doc_id=1280415

http://vr-zone.com/articles/amd-confirms-kaveri-will-be-in-the-hands-of-enthusiasts-in-2014/50308.html#ixzz2bQ2UwCI5
 

juanrga

Distinguished
BANNED
Mar 19, 2013


IBM and Nvidia have joined to integrate CUDA cores with Power8. The result will be a beast!
 

juanrga

Distinguished
BANNED
Mar 19, 2013


Except that IBM POWER is years ahead of Haswell: E.g. POWER8 has SMT8. Haswell HT is like one quarter of that.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


[Image: Efficiency chart (compute-2012-chart-1.png)]


Power7+ (BG/Q) is years ahead of Intel Ivy Bridge. If I recall correctly the new POWER8 is about 2x above the POWER7+.

Intel efficiency what? Intel is the king in Intel-land, but the world is bigger than that.

Sure, IBM is fighting Intel... well no, it is just the contrary: Intel is fighting IBM (LOL). But Intel cannot compete with IBM on CPUs, and the fight is between IBM CPUs and an Intel combo made of CPUs _plus_ accelerators (Phi).

The rest of your thoughts about Steamroller are equally enjoyable. Besides the new nonsense, I enjoy that after spending months in this thread about Steamroller, you discover _now_ that it has two cores per module. LOL What were you doing during months of posting here? Don't reply, all of us know the answer.
 
Is he seriously trying to argue high-end data processing with some of the posters here....

It's one thing to compare different x86 processors to each other under the umbrella of cheap commodity Windows / consumer computing. It's an entirely different ballgame to argue big iron or high-end database systems. Once you enter the RISC world, it's dominated by POWER and SPARC (Oracle's newer offerings have really turned it around). Intel doesn't hold a candle to those designs and there is nothing they can do about it. x86 is VERY BAD at moving massive amounts of data around; it sucks for HPC due to its reliance on variable-length instructions and front-end decoders.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
You don't know. You invented that (as usual) and ignored corrections and information showing the contrary (as usual).

Only a note for you. The new processor is POWER8, the previous was POWER7+, POWER7 is two generations old.
 


Because the applications where you would use PhysX are ALREADY stressing the GPU; it's got no extra time to also compute complicated physics, which limits what you can do with the API in games. Same issue with DirectCompute: the GPU is already overworked, so you get very limited implementations that look nice but add nothing to gameplay (extra particle effects, Lara's hair, some debris blowing around, etc.)

If you want significant amounts of physics effects, you have to use a GPU-like architecture. And since the GPU is overburdened, that means you need a second co-processor. To me, the APU/iGPU looks mighty attractive for doing just that. You can do pretty tech demos on the GPU easily enough, but for games, rendering + PhysX (or any GPU-accelerated API) is simply too much of an FPS hit to be of much use.

My point is this: if I have a discrete GPU but also have a built-in GPU/APU, why can't I ALSO use that? That's the way I want PCs to go.
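The frame-budget argument can be sketched with a few lines of arithmetic. All timings below are hypothetical, chosen only for illustration. The sketch assumes physics on the iGPU can run concurrently with rendering on the dGPU (for example, computed one frame ahead), and it ignores the cost of copying results between the two devices, which in practice is not free.

```python
# Hypothetical per-frame costs in milliseconds (illustrative only, not measurements).
RENDER_MS = 14.0        # rendering on the discrete GPU
PHYSICS_DGPU_MS = 4.0   # the physics work done on the discrete GPU
PHYSICS_IGPU_MS = 10.0  # the same physics work on the slower integrated GPU

def fps(frame_ms: float) -> float:
    return 1000.0 / frame_ms

def frame_ms_shared(render_ms: float, physics_ms: float) -> float:
    """Physics and rendering time-share one GPU, so their costs add up."""
    return render_ms + physics_ms

def frame_ms_offloaded(render_ms: float, physics_ms: float) -> float:
    """Physics runs on the iGPU in parallel with rendering; the slower one bounds the frame.
    Ignores copy and synchronization overhead between the two GPUs."""
    return max(render_ms, physics_ms)

print(f"render only:     {fps(RENDER_MS):.1f} fps")
print(f"physics on dGPU: {fps(frame_ms_shared(RENDER_MS, PHYSICS_DGPU_MS)):.1f} fps")
print(f"physics on iGPU: {fps(frame_ms_offloaded(RENDER_MS, PHYSICS_IGPU_MS)):.1f} fps")
```

As long as the iGPU finishes its physics within the dGPU's render time, the offloaded case costs no frame rate; once it takes longer, the iGPU becomes the bottleneck.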

(Also FYI, NVIDIA implemented GPU PhysX using CUDA)
 

jdwii

Splendid


Sucks for us AM3+ owners who think Piledriver is not a worthy upgrade from Phenom, but I just hope their APU comes fully loaded, beats the 8350/i5 in multitasking and is more competitive in single-threaded apps than the FX-8350. Being a discrete video card guy, I still couldn't care less about the graphics on the APU unless it improves my gaming experience or application performance even when I have a discrete video card, so basically the onboard GPU runs in the background accelerating apps/games.
 

blackkstar

Honorable
Sep 30, 2012


I really don't understand why AMD would refresh a product range where they're bleeding market share to Intel and aren't competitive in performance/watt at all, and then ignore a platform where they have almost 30% market share (gaming CPUs), are (arguably) better than or even with Intel, and the CPU is a part of the entire platform.

If you want my opinion, AM3+ is going to stick with PD until fall 2014 and then we'll see a new desktop chipset. AMD is more than likely avoiding saying outright that there's no AM3+ SR (at least not soon) because it would piss everyone off massively.

As I've said before, APUs can only take you so far, and that's about the $150 performance bracket. There's only so much computational power you can squeeze onto a single die with a sane TDP. Considering high-end AMD dGPUs have more than twice the TDP of your average APU, for AMD to go APU-only would mean that their top gaming CPU would be about FX-4300 level.

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/57615-amd-vishera-fx-6300-fx-4300-review-11.html

Look at that. Do you really think AMD is going to castrate themselves like that?

If AMD's going APU only, their Gaming Evolved program is going to evolve into Intel Inside. AMD needs a CPU with more than 4 cores to be competitive with anything more than an i3.
 

griptwister

Distinguished
Oct 7, 2012
http://www.eteknix.com/asrock-reveal-a88x-fm2-amd-motherboards/

Yet another motherboard manufacturer... idk if we'll see the high end move to APUs. But I just hope they add more cores, because they're gonna need 'em!
 
most revenue comes from oem sales, not retail. fx presents a difficult proposition compared to the apus, which is why apus sell many more despite being 'less powerful' and 'cheaper'. or maybe those two factors should be 'because', not 'despite'. the oft-cited reasons for amd's preference are: good enough and cheap. the apus are both, and they're cheaper. and they have igpus. almost no one (i.e. the general public who drive sales) will pick an fx4000 over an apu. additionally, if hsa and its components allow amd to accelerate multithreaded tasks well, there will be little reason for 'pure' cpus to exist. :D
i don't see how amd can revitalize customers' interest in their aging am3+ lineup by keeping so silent. only the hardcore amd faithful and people who closely follow amd-related information would have any interest. most interest has already shifted to the apus and arm.

from that article, i got the impression that amd should dump hypertransport for pcie. then what was the point of keller's joining? something doesn't match.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


This part:

The Steamroller architecture, which will be introduced with the Kaveri parts, will not be represented on AM3+.

confirms what some of us said here (there is no Steamroller FX), but people insisted on denying it. The discussion about Warsaw is more of the same. There are no 6/8-module Steamroller parts; that is why Berlin comes only as 4 cores and Warsaw's 12/16 cores are based on Piledriver. Warsaw is to servers somewhat what Centurion is to desktop: the last series, and both are aimed at 'legacy' software. Future software is heterogeneous.

AMD is not releasing high-end servers using Steamroller because the market is moving in other directions. AMD's assault on the server market is based on the new ARM CPUs. In fact AMD predicts that sales of ARM will surpass x86, which is why they are not releasing new high-end cores. My only surprise is that their ARM approach is a CPU instead of an APU. I expect future ARM APUs from AMD.

The article mentions that their HSA approach is like their 64-bit approach. This analogy was also mentioned here before. I even remarked that AMD did learn from its mistakes and is releasing HSA via a foundation where big players are participating, instead of pursuing a new technology alone.

The article mentions Intel's approach to HPC heterogeneous computing, but fails to mention an important player: Nvidia. Nvidia is releasing its own heterogeneous approach. Next year Nvidia GPUs will include unified memory and ARM cores for 'standalone' GPGPU computing, and Nvidia will release the long-awaited Logan 'APU'. Moreover, Nvidia has just partnered with IBM, Google, and others to develop high-performance heterogeneous compute.

I agree entirely that "The only real problem here is that GLOBALFOUNDRIES’ 28 nm process is late to market for AMD and their CPUs." Nvidia is about to release 20nm 'APUs' and Intel is moving to 14nm with its heterogeneous approach. In my opinion AMD will only compete on price.

The article speculates about the possibility of releasing "a 3 module Kaveri part next year well after the two module units have been on the market a while". I doubt it, because Excavator is announced for 2015 on 20nm, but it all depends on whether Excavator is an early or late 2015 launch, or even whether it is delayed until 2016. If Excavator is an early 2015 launch and matches predictions (40+% IPC), then I don't expect a 3M Kaveri. If Excavator is delayed or fails to deliver the expected performance, we would see a 3M Kaveri to fill the hole.

Later the article discusses why "core counts should not go up dramatically on the desktop and notebook market". I agree that 4 cores will be the standard. The problem, however, is not due to software, as the article says, but due to CPUs being inefficient at parallel workloads. As the image in the article says, CPUs are optimized for serial work. We will see a massive increase in the number of cores, but only in GPU/accelerator cores, because those are optimized for parallel tasks.
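Whichever explanation you prefer (the article's software argument or serial-optimized CPU cores), Amdahl's law is the standard way to see why adding cores gives diminishing returns unless nearly all of the work is parallel, which is the regime GPUs and accelerators target. The parallel fractions below are illustrative, not measured for any real workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Illustrative parallel fractions and core counts
for p in (0.5, 0.9, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (4, 8, 16, 64))
    print(f"p = {p}: {row}")
```

With half the work serial, even 64 cores stay below 2x; only when the parallel fraction approaches 1 do very high core counts pay off.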
 
nvidia is nowhere close to 20nm. That's why Logan is 28nm and won't be out until mid next year. Maxwell will probably be their first 20nm part, and that's looking like it's not coming out until mid next year too. The big Maxwell, GM100, probably won't be out until later, on a more mature process.
 

Cazalan

Distinguished
Sep 4, 2011


With their modular design AMD could release a 3-4 module APU with a very small number of CUs (1-2) for graphics, or even none if they wanted. Parts like the Athlon 750K are APUs without graphics, and they work in FM2 motherboards. The question is: will they?

They have released parts specifically for bundling with graphics cards for embedded markets. This could be extended for the desktop/workstation space.
 

hcl123

Honorable
Mar 18, 2013


Ok, it seems I was off by a couple of months; 4Q starts in less than a month... (boy! I can't sleep tonight lol)



No "CUDA" cores are inside Power 8... I can assure you (100%)

One remarkable feature of Power 8 is that they (IBM)(edt) devised a "hardware" way to get cache coherency over PCIe v3 without changing the protocol... it's called CAPI (Coherent Accelerator Processor Interface)

http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&rurl=translate.google.com&sandbox=0&sl=ja&tl=en&u=http://pc.watch.impress.co.jp/docs/column/kaigai/20130828_612950.html&usg=ALkJrhj8a_PNrnUkhvKlLbCPgtg9O_VhgA

SLIDE IMAGES
http://pc.watch.impress.co.jp/img/pcw/docs/612/950/11_s.jpg

http://pc.watch.impress.co.jp/img/pcw/docs/612/950/10_s.jpg

For that you have to augment the PCIe interface in your adapters... it could work with CUDA-based and GCN-based ones alike; it's not in the GPGPU uarch, it's in the "interface".
 

hcl123

Honorable
Mar 18, 2013


Utter rubbish!

If Steamroller is 30% better across the board, it will be clearly ahead of Hasfail in single thread (WAY AHEAD)... of course benchmarks will be coded not to show that, but that is a problem of Windows, and not representative of the software you run right now... which could right now be faster on AMD than on Intel LOL (just try it lol).

Besides, Power 8 will NOT be *potentially* faster than HSW in single thread: it has 2 pipes per thread, but only 2 "kind of simple ALUs" per core (add + mul)... HSW will have 3, while SR, if the AGUs can do ALU add + mov (which seems to be the case), will have 4 pipes per thread.

There is a penalty in shoving 8 threads through the same core, and by the same logic AMD's approach will always *potentially* be better for single thread than Intel's share-all-execution approach (4 vs 3 for Hasfail). Of course, in IBM's case single thread is not that important at all; their workloads will be massively parallel (multithreaded), and the beating will be enormous compared to Intel... And in AMD's case, if they ever drop the *Module Approach* (which Intel will copy for sure, have no doubt about that; they have no other way to scale), they not only lose the *potentially* better single thread of CMT (cluster multithreading), they will also have a FlexFPU that cannot scale as well (AMD's FP256, for example, just cannot mix well with 64-bit integer cores).



Sure... they will not run Windows.

And the "allusions" to Intel just show that, when it comes to "business", they behave more like a criminal syndicate than a purely tech-oriented outfit.

As to power (watts), ARM servers will deliver a severe beating... just wait and watch.

 

noob2222

Distinguished
Nov 19, 2007

this:
For the higher module count parts, we have to wait for 22/20 nm process nodes to open up for AMD. This should happen in late 2014.

HAHAHAHA ... especially if this is from GF. Samsung or TSMC, maybe.

As for the article saying GF is holding Kaveri back with delays, I thought 28nm bulk was already shipping.

AMD is going to lose 30% market share if this article holds true.

The G1.Sniper A88X is the very top of the line FM2+ that Gigabyte is showing.
I'm not interested in weak cores that will do "ok" and motherboards that can handle a maximum of 2 PCI-E slots, 1 at x16 and 1 at x4. Ya, that's right, they don't even do x8/x8 Xfire/SLI, not that Kaveri can handle high-end CrossFire or SLI anyway.

AMD downgrading their current computing power for 2 years or more? not interested. Might as well give Intel the keys to the office so they can steal you blind and a@@ rape their customers.
 