AMD CPU speculation... and expert conjecture

-Fran- · Feb 10, 2015

juanrga :

blackkstar :

This image from Hallock's blog article is very interesting:

Because it shows that porting DX11-->DX12 requires only medium effort, whereas porting DX11-->Mantle-->DX12 requires medium+low. Therefore even AMD is giving another reason why game developers will prefer the direct native DX12 path.

HAHAHAHA!

I'm seriously laughing out loud, Juan. You're not taking that slide at face value, right? Seriously? xD!

Cheers!

juanrga · Feb 10, 2015

-Fran- :

Apple, Nvidia, Cavium, APM... were the first ARMv8 on the market, not AMD. APM has been selling XGene1 for a while and are currently sampling the second gen XGene2. APM hardware has been already benchmarked and several partners are selling it even with GPGPUs included. It seems you missed some old news

APM hardware performs better because uses custom core, unlike AMD, which is late to the party and uses A57 cores because lacks a custom core.

-Fran- :

Wait! Are you saying that blackkstar wouldn't take too seriously Hallock's claims about Mantle? Do you refute AMD's claim that Mantle is "the easiest way to port to all platforms"?

juanrga · Feb 10, 2015

blackkstar :

Interesting interview to AMD's Roy Taylor. Let us quote him praising both Windows 10 and DX12:

Q: Why are you excited about Windows 10 – can you get into specifics?

A: “I am a huge fan of Windows 10 and have been using the Tech preview version for some months. I think that Microsoft has done a really incredible job of integrating a seamless experience for smartphone to tablet to notebook to desktop. I will be buying my first Windows phone when Windows 10 versions are available.”

Q: What are the benefits of DirectX12?

A: “I’m super excited about DX12 for a number of reasons, mainly because it embraces multi-threading in a way that developers have been asking about for years. Its low overhead nature will unlock graphics performance in a way we haven’t seen before, much like our own Mantle. DX12 games will perform better on a wider range of hardware, making a great experience possible for more people.”

http://www.itworldcanada.com/blog/amd-vp-roy-taylor-talks-about-windows-10-virtual-reality-security-personal-identification-trends-more/101728

-Fran- · Feb 10, 2015

juanrga :

I didn't know that, to be honest. Thanks for sharing. Now, I'll just wait and see if AMD ends up using a "non-custom" A57 for their ARM entry. I really doubt AMD would use ARM's reference design just like that. I know it's their first ARM entry, but I do think they know better about Servers and can tweak the uArch to something interesting.

juanrga :

Actually, yes. I never take any information at face value. And ESPECIALLY from interviews. Even from technical fellas. Unless it's inside a technical paper, I don't (or try not to) buy into "marketing slides".

Porting has a lot of variables, so saying "yeah, it's totally easy to go from A to B using this tech" leaves a lot of information out. You always have to put into context that information. In the case of Hallock's, they revolve around "creating a game (engine?)" mostly (IIRC). Gamerk has more experience with games (I think), but I can tell you, generally speaking, porting something from technology A to technology B is NEVER EASY. Example: JBoss to Weblogic (or vice versa); DB2 to Oracle (or Postgres, MySQL, Mongo, etc). They sound like "how can it be hard if they're the same?", well, it all *depends* on how tight you have the implementations around the base technology used first.

Cheers!

EDIT: Wrong Tags.

truegenius · Feb 11, 2015

juanrga :

was wondering if dx12 is providing better performance for same hardware over dx11 and they will use dx12 in xboxone so this should atleast improve some performance ( maybe some times very significantly, may get much ahead of ps4 sometimes). So was wondering what will be sony's answer to this to stay ahead of xboxone in terms of performance ? more optimization ? or ps4 version of mantle "mental" 😗?

juanrga · Feb 11, 2015

-Fran- :

AMD uses a standard Cortex A57 core and companion tech (e.g. ARM interconnects). You can find more info here

http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-4-ARM-Servers-epub/HC26.11.410-Opteron-Seattle-White-AMD-HotChipsAMDSeattle_FINAL.pdf

They cannot tweak the uarch, because only licensed the core. The reason why use A57 core was explained before.

-Fran- :

Ok, but you could say all that in a reply to blackkstar, when he used Hallock's blog entry as [strike]gospel[/strike] source to spread more hype about Mantle. Instead, you did react only to my reply to him!

But guess what? I agree with you, Gamerk, and others entirely. In my reply to blackkstar I could just say that AMD's Hallock is spreading hype again. However, I decided to be a bit ironic and emphasize that even Hallock admits that DX11-->DX12 is an "easier" path than using Mantle according to "AMD internal estimates" buzzword.

-Fran- · Feb 11, 2015

juanrga :

Ok, looks like AMD won't be changing much from the A57. Fair enough.

juanrga :

Who is that chap, anyway?

Cheers! 😛

de5_Roy · Feb 11, 2015

*chews on popcorn*
anywho, in today's amd news

AMD Fiji HBM limited to 4GB stacked memory
http://www.fudzilla.com/news/graphics/36995-amd-fiji-hbm-limited-to-4gb-stacked-memory
*wonders what kind of spin this will get...*

Why You Don't See Coreboot Supported By Many Modern Intel Systems
http://www.phoronix.com/scan.php?page=news_item&px=Intel-Boot-Guard-Kills-Coreboot
the real reason is not microsoft.

juanrga · Feb 11, 2015

truegenius :

The PS4 has two APIs, one top-level and another low-level. The low-level API works nearly at metal level and has much less overhead than Mantle providing higher performance than Mantle.

The Xbone doesn't have standard DX11, but a custom version with extensions that allow low-level access to hardware. Thus whereas DX12 will improve performance on the PC side will not change things significantly on the console. Even Microsoft Xbox head admits that

http://www.kitguru.net/gaming/anton-shilov/microsoft-directx-12-will-not-dramatically-improve-xbox-one/

de5_Roy :

It is true that AMD will be limited to 4GB whereas Nvidia will not, but the explanation has nothing to do with 2.5D vs 3D stacking. In fact, it is false that Nvidia will use 3D stacking on the GPU. It is technically impossible. Both AMD and Nvidia use 2.5D stacking for high-end GPUs.

truegenius · Feb 11, 2015

de5_Roy :

is it typo or did they really mean fiji on "28nm"

atleast i saw easily understandable pic of stack memory implementation
here you can see that they are using different die for gpu and memory but are on same package, previously i thought that they will try to do this either on die ( some hypothetical exa scale apu the dgpu killer 😗 ) which will be very costly and maybe not doable or that easy. or on pcb, on pcb won't be that much of use as we have plenty of room on pcb for memory and connectors/ printed wires thus no stacking needed apart from some power saving
now i am just wondering when they say 1024bit wide do they mean connectors for separate 1024 bit (it is serial interface so more bits more wires), if so then how they will manage to do that (i didn't saw any implementation (for consumer) of stacked memory thus hard to render what they will actually look like in terms of connections), won't 1024bit will result in much bigger die !

as you can see above, stacked memory, and i was referring to this type of implementation for apu by saying something like side port memory on package

cemerian · Feb 11, 2015

truegenius :

de5_Roy :

is it typo or did they really mean fiji on "28nm"

atleast i saw easily understandable pic of stack memory implementation
here you can see that they are using different die for gpu and memory but are on same package, previously i thought that they will try to do this either on die ( some hypothetical exa scale apu the dgpu killer 😗 ) which will be very costly and maybe not doable or that easy. or on pcb, on pcb won't be that much of use as we have plenty of room on pcb for memory and connectors/ printed wires thus no stacking needed apart from some power saving
now i am just wondering when they say 1024bit wide do they mean connectors for separate 1024 bit (it is serial interface so more bits more wires), if so then how they will manage to do that (i didn't saw any implementation (for consumer) of stacked memory thus hard to render what they will actually look like in terms of connections), won't 1024bit will result in much bigger die !

as you can see above, stacked memory, and i was referring to this type of implementation for apu by saying something like side port memory on package

it will be the same amount of connectors, but the wider bus will be achieved thx to the memory stacking just like on samsung's Vnand, but those are just my thoughts, could be wrong

juanrga · Feb 11, 2015

truegenius :

If you read the article 20 TFLOPS AMD APU you can check by yourself that the exascale dGPU killer APU uses 2.5D stacked RAM, aka the APU die is connected to the stacked ram modules using an interposer. You cannot use a single die. That is technically impossible.

noob2222 · Feb 11, 2015

Guys, juans crusade against AMD is nothing new. He has always been on a crusade to promote ARM and now that his predictions are falling apart, again, instead of admitting the truth he is placing the blame on AMD for not making his wishes come true.

He should take his own advice and not make predictions without evidence, but thats only used when it fits his agenda of MANTLE SUCKS.

de5_Roy · Feb 11, 2015

truegenius :

i don't know for sure what process node fiji's gonna use. only rumors(+NaCl) so far....

truegenius :

the stacked memory on fiji (acc. to the leaks and rumors) isn't mounted on the pcb, it's on a silicon interposer which is then mounted on a substrate.

when they say 1024 bit wide, they're saying that each memory er.. module has 1024 pit-outs with each pin capable of transferring 1Gb/s (i'll have to look up jedec specs for exact specs) like a gpu i.c. has 256 pin outs in total for 256 bit bus (4x 64bit, each memory controller is 64bit wide and each pinout is 1 bit "wide"). back to the rumors of fiji "package" - since the gpu will have 1024 wires for memory bus, it'll not be economical for the card pcb due to electrical and manufacturing requirements (the card pcb will need more layers among other things afaik). thanks to 2.5D s.i.p. tech, the gpu can connect 1024 wires to the stacked memory and reduce the final wire count off the gpu package. it's 2.5D because the stacked memory is not mounted vertically on the gpu i.c.

having a 1024bit wide bus will increase the memory block size but this is where the node shrink comes into play.

i hope i got the facts right and clear. oh and the memory interface will be DDR.

juanrga · Feb 11, 2015

de5_Roy :

The main reason to use an interposer is not economical but one of performance. Putting the fast stacked memory on an interposer is the way to avoid bottlenecking the memory: both latency and bandwidth. If the fast stacked memory was outside the package, it would be bottlenecked by the off-package communication through the pcb. This is also explained in the 20 TFLOPS AMD APU article mentioned before. This is a relevant image:

The more close the memory is to the computing device more fast is the access to data/instructions. GDDR5 is relatively slow and can be put apart on the card and connected to the GPU die thought the PCB. 2.5D is the next step with memory on the same package than GPU and connected through an interposer. 3D is the last step with the memory directly on top of the GPU die and connected directly via TSVs. 3D stacking is required for the fastest access to memory. That is why the AMD compute exascale node uses 3D stacking for memory bound applications and 2.5D stacking for compute bound applications.

de5_Roy · Feb 11, 2015

juanrga :

lol, i am not touching this. may be after the amusingly public meltdown phase passes...

juanrga · Feb 11, 2015

truegenius :

I believe it is not a typo. 😉

szatkus · Feb 11, 2015

truegenius :

What else? WCCF came up with that 20nm Fiji rumor. Every sane source was indicating 28nm.

truegenius :

HBM != GDDR5, including buses.

juanrga · Feb 11, 2015

tourist :

juanrga :

could be as there will be different variants, just depends on which one is introduced first.

Nope.

truegenius · Feb 11, 2015

juanrga :

if not then it looks they are not much behind nvidia in efficiency
and its coming from [strike]amd hater[/strike] you means good 😀

i currently tested hd7950 for power consumption (using hwinfo64) and found that gpu took 125w and vram took 42w max (tried furmark, heaven, 3dm11, gpuviewer opencl opengl, mix of all to get this reading)
the card was running at stock settings (same as reference), and max power consumption was 167w, thus fairly below 200w ceiling [strike](i guess nvidia would have rated this card at 150w tdp i mean card power, just like gtx980)[/strike] 😗
so it looks like 4096 gcn on 28nm under 300w is possible by using stacked vram (cutting power usage here), reduced computation performance, aggressive voltage/clock setting, and maybe more mature 28nm ?

szatkus :

in articles i see that they are using ddr interface which mean for x bits there is atleast x+1 pins for data transmission
over 1024 connection looks possible on package as circuit's distance/length gets reduce and power consumption is also decreased thus we can reduce wire's thickness too to accommodate more wire's in same width ( HDI ? ), wiki looks to confirm this much connections
[strike]or is ddr for memory to logic chip communication and logic chip to gpu is some different interface with higher bandwidth per second per pin[/strike]

don't bother about these stroked lines thoughout the reply, human brain keeps on thinking things like this 😛

juanrga · Feb 11, 2015

noob2222 :

Not only the recentest data confirms what I said about ARM servers but also disproves those forum posters that claimed that ARM phones/tablets would start to lose demand:

http://community.amd.com/community/amd-blogs/amd/blog/2015/01/29/the-arm-server-ecosystem-continues-to-gain-momentum
http://www.cloudwedge.com/4891-the-rise-of-64-bit-arm-chips-in-servers/
http://www.zdnet.com/article/arm-profits-buoyed-by-demand-for-64-bit-iphones/

juanrga · Feb 11, 2015

truegenius :

I believe this issue was settled before. One wouldn't confound the efficiency of the graphics architecture with the efficiency of the memory subsystem.

HBM reduces power consumption compared to GDDR5. Also I believe the memory stacks will use 20nm which will add to increase the overall card efficiency.

noob2222 · Feb 11, 2015

Hey Juan

I guessed you missed this one. http://www.tomsguide.com/us/tablet-dead-as-we-know-it,news-20401.html
go ahead and keep believing that the decline in advances will not affect tablet sales the same way it did with DT.

I love your blogpost on amd's page posted in january talking about an upcoming conference in november 2014. How old was that original article?

de5_Roy · Feb 11, 2015

noob2222 :

as long as he can direct traffic to his site, he makes moniez and more incentive to spam it's link here and on other forums.

esrever · Feb 12, 2015

Juan can single handedly make this thread into an unfollowable mess.

AMD CPU speculation... and expert conjecture

Glorious

Distinguished

Distinguished

Glorious

Distinguished

Distinguished

Glorious

Splendid

Distinguished

Distinguished

Honorable

Distinguished

Distinguished

Splendid

Distinguished

Splendid

Distinguished

Honorable

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Splendid

Splendid

Share this page