AMD CPU speculation... and expert conjecture

Page 468


Yep, bandwidth isn't the issue for most tasks, latency is. Take the CPU accessing main memory: no matter how much you can pump across the bus, if it takes 1ms to get that data across, regardless of size, you are stuck for 1ms. Hence the CPU cache, which is really just a way to reduce latency as much as possible and keep the CPU fed. VRAM operates the same way.

If the minimum read time is 1ms, regardless of how much data you can pump across the bus, every time you need to read from that bus, you stop processing for that amount of time. That's latency. And it kills performance.
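
To put rough numbers on that (purely illustrative figures, not any real part's specs): total read time is roughly fixed latency plus size divided by bandwidth, so for small reads the latency term swamps everything no matter how fat the pipe is.

```python
# Illustrative sketch only: why latency, not bandwidth, dominates small reads.
# The latency/bandwidth figures below are made-up round numbers, not real hardware specs.

def transfer_time_us(size_bytes, latency_us, bandwidth_gb_s):
    """Rough model: total time = fixed latency + size / bandwidth."""
    bytes_per_us = bandwidth_gb_s * 1e9 / 1e6  # GB/s -> bytes per microsecond
    return latency_us + size_bytes / bytes_per_us

for size in (64, 4096, 1_048_576):  # cache line, page, 1 MiB
    huge_pipe = transfer_time_us(size, latency_us=1000.0, bandwidth_gb_s=500.0)
    low_lat = transfer_time_us(size, latency_us=0.1, bandwidth_gb_s=25.0)
    print(f"{size:>8} B  1ms latency @ 500GB/s: {huge_pipe:9.3f} us   "
          f"0.1us latency @ 25GB/s: {low_lat:7.3f} us")
```

Even with 20x less bandwidth, the low-latency link wins by a wide margin at these sizes.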
 

8350rocks

Distinguished


Meet Dragonfly...500GB/s bandwidth interconnects...

Nothing discrete is outdated or slow...you do not have that much data bandwidth in cache.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Lots of things can be done. It doesn't mean that's the way it will be utilized. I saw it and dismissed it as marketing fluff. Ultimately it isn't even up to Intel; it's the system integrators that choose that. Your Crays, your SuperMicros, your Tyans. The people making building blocks for HPC.




You just said NVidia was working on a mythological 150GB/s interconnect for their exascale APU. Then when I explained to you exactly how that can be done, you say it hasn't been invented yet? LoL! :pt1cable:

Of course internal interconnects will always be faster than external interconnects, that's basic physics. CPUs/APUs may grow to 100B transistors by 2020 but you'll still need tens of thousands of them to get to exascale.
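
To see why, some back-of-the-envelope arithmetic (the per-chip throughput figures below are hypothetical round numbers, not projections for any real product): an exaflop is 10^18 FLOPS, so even very fat chips have to be replicated tens or hundreds of thousands of times.

```python
# Illustrative arithmetic: how many chips an exaflop machine would need.
# Per-chip sustained throughput figures are hypothetical round numbers.
EXAFLOP = 1e18  # FLOPS

for per_chip_tflops in (1, 10, 50):
    chips = EXAFLOP / (per_chip_tflops * 1e12)
    print(f"{per_chip_tflops:>3} TFLOPS per chip -> {chips:,.0f} chips for 1 exaflop")
```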
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


For some reason they cause errors in my code reader; I've had to send two back because something in the programming is not being translated correctly.
 

vmN

Honorable
Oct 27, 2013
1,666
0
12,160
Something I don't quite get is why AMD is continuing with their clumsy scheduling system.
I could imagine a shared scheduler across multiple cores (just like on a GPU) balancing the workload much better and providing better performance.
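
For what it's worth, here is a minimal sketch of that idea: one shared queue feeding several workers, so a slow task on one worker doesn't strand the work statically assigned to it. The task costs and names are invented for illustration; this says nothing about how AMD's actual front-end or a GPU's hardware scheduler is implemented.

```python
# Hypothetical illustration of shared scheduling: one common work queue,
# workers pull the next task whenever they go idle. Not any vendor's design.
import queue
import threading
import time

tasks = queue.Queue()
for cost in [0.05, 0.2, 0.05, 0.05, 0.3, 0.05, 0.05, 0.1]:  # uneven task costs
    tasks.put(cost)

def worker(name):
    while True:
        try:
            cost = tasks.get_nowait()  # pull work as soon as this "core" is free
        except queue.Empty:
            return
        time.sleep(cost)               # stand-in for real work
        print(f"{name} finished a {cost:.2f}s task")

threads = [threading.Thread(target=worker, args=(f"core{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```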
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Now split the bandwidth per Aries chip among the whole number of Xeon CPUs. Cray engineers have measured the bandwidth for a Sandy Bridge Xeon based cluster and got 15GB/s available for each Xeon CPU.

Now compare to the bandwidths per APU given above and you can see why I am far from impressed.
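
The arithmetic behind that per-CPU figure is just a division. The inputs below are assumptions chosen only to reproduce the ~15GB/s quoted above, not values from Cray's documentation:

```python
# Illustrative division only; both inputs are assumptions, not Cray specs.
aries_bandwidth_gb_s = 120.0  # hypothetical aggregate bandwidth of one Aries chip
xeons_per_aries = 8           # hypothetical number of Xeon CPUs sharing that chip

print(f"{aries_bandwidth_gb_s / xeons_per_aries:.0f} GB/s available per Xeon")  # -> 15 GB/s
```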



I find it interesting that before you took Hazra's marketing material (against heterogeneity) as gospel, but now you take Intel's roadmap and sales plans as marketing. When Hazra claims that the discrete card is for legacy users, he is not doing marketing; he is telling you the plans of the company. It is the same situation as when Feldman (AMD) tells you that the Warsaw CPU is only for legacy users and that AMD doesn't plan to release Steamroller/Excavator Opteron CPUs. That is not marketing; it is reality.

As for what supercomputer makers will do, I can tell you that Cray is one of the members of the team designing the supercomputer based on Nvidia APUs. :sarcastic:

I have said that AMD/Nvidia APUs are scheduled for 2018 or so. Why are you surprised that the products are not ready today? If I tell you that the Carrizo APU is scheduled for 2015, will you also be surprised that you cannot purchase one today?

What I tried to say above is that the new interconnects for exascale-level supercomputers are not obtained simply by taking a current interconnect and increasing the bandwidth. I already explained that exascale compute is not achieved by simply scaling up current architectures/designs. A link explaining some of the new paradigms was given. You can continue ignoring it, but that doesn't change anything.

Of course internal interconnects will always be faster than external interconnects, that's basic physics, but while you mention the obvious you ignore the relevant part. At the exascale level there is a 10x power wall, which doesn't exist at the current level. Current supercomputers are based on a CPU+dGPU design, whereas future supercomputers will not be, because of this.
 
AMD bringing dual OS solution to retail
http://www.fudzilla.com/home/item/34051-amd-bringing-dual-os-solution-to-retail
AMD to offer Android emulation on retail products
http://semiaccurate.com/2014/02/26/amd-offer-android-emulation-retail-products/

i want this on a kaveri apu rite nao http://www.techpowerup.com/198227/hybrid-memory-cube-consortium-releases-hmc-2-0-specification.html 160GB/s from a 2GB chip @ 70% less energy mmm... <3 perf power efficiency...

Catalyst Beta 14.2 V1.3 drivers are optimized for Thief
http://techreport.com/news/26085/catalyst-beta-14-2-v1-3-drivers-are-optimized-for-thief
Dual graphics DirectX 9 application issues have been resolved rly :O

AMD Press Talks Up Major Open-Source Linux Driver Features
http://www.phoronix.com/scan.php?page=news_item&px=MTYxNDc

AMD reportedly moves desktop headquarters to China to strengthen competitiveness
http://www.digitimes.com/news/a20140224PD214.html

AMD-powered fit-PC4 released
http://www.fanlesstech.com/2014/02/amd-powered-fit-pc4-released.html



 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That ups it to 60GB/s. Between L2 and L3 cache speeds.
 


What's the minimum latency? Funny how that goes unmentioned.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Why does marketing confuse you so much? Of course Intel is going to tout that Phi can run solo. It differentiates them from the products AMD/NVidia have today that compete with it. You're focusing on a singular aspect that ultimately is not the game changer for that hardware, which is the on-package memory.






Yes I'm aware. Cray works with pretty much every major and minor player in the field. They still work with AMD too. Until you see a product announced it's just one of many things "in the works".

NVidia is also working with IBM to make a traditional PowerPC+GPU solution. But who knows where that will lead now that IBM is selling their fabs. There are a lot of nervous IBM'ers right now. 13,000 layoffs with more to come.

Heck Samsung just joined the OpenPower consortium. I thought ARM was the future. ;)



You identified an NVidia interconnect of approximately 150GB/s and I showed you how it can be done. Do you now need more than 150GB/s?

Sure, they will need something more elaborate than the existing Dragonfly topology, but the underlying building block is still an MGT. The MGTs aren't just scaling up in bandwidth; they're also reducing power. Unless you're envisioning quantum interconnects, we only have parallel and serial buses to work with.
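
Since the building block is a serial transceiver lane, aggregate link bandwidth is just lanes x per-lane rate x encoding efficiency. A quick sketch with hypothetical lane counts and rates (not any vendor's actual spec):

```python
# Illustrative only: aggregate link bandwidth from serial MGT lanes.
# Lane count, lane rate and encoding overhead are hypothetical examples.
def link_bandwidth_gb_s(lanes, gbit_per_lane, encoding_efficiency):
    return lanes * gbit_per_lane * encoding_efficiency / 8  # Gbit/s -> GB/s

print(f"{link_bandwidth_gb_s(lanes=16, gbit_per_lane=25, encoding_efficiency=64/66):.1f} GB/s")
# ~48 GB/s per direction; more lanes or faster lanes scale it toward 150GB/s and beyond
```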



You mean to tell me that discrete circuits will become integrated circuits to reduce power? :sarcastic:
 
Mantle just died:

http://techreport.com/news/26090/mantle-no-more-gdc-sessions-point-to-the-next-directx?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+techreport%2Fall+(The+Tech+Report)

Come learn how future changes to Direct3D will enable next generation games to run faster than ever before!

In this session we will discuss future improvements in Direct3D that will allow developers an unprecedented level of hardware control and reduced CPU rendering overhead across a broad ecosystem of hardware.

If you use cutting-edge 3D graphics in your games, middleware, or engines and want to efficiently build rich and immersive visuals, you don't want to miss this talk.

Driver overhead has been a frustrating reality for game developers for the entire life of the PC game industry. On desktop systems, driver overhead can decrease frame rate, while on mobile devices driver overhead is more insidious--robbing both battery life and frame rate. In this unprecedented sponsored session, Graham Sellers (AMD), Tim Foley (Intel), Cass Everitt (NVIDIA) and John McDonald (NVIDIA) will present high-level concepts available in today's OpenGL implementations that radically reduce driver overhead--by up to 10x or more. The techniques presented will apply to all major vendors and are suitable for use across multiple platforms. Additionally, they will demonstrate practical demos of the techniques in action in an extensible, open source comparison framework.

So Mantle spurred DX and OGL to get their acts together to address their overhead issues, removing the need for Mantle to exist in the first place, which I predicted several months ago...
 

8350rocks

Distinguished
@JUANRGA:

Seriously...?

Look, a single GPU is orders of magnitude more power efficient than a single APU for exascale computing.

What do you not understand? 4x GPU + 1x CPU > 4x APUs.

Are you that stubborn or that ignorant? Which is it?
 

con635

Honorable
Oct 3, 2013
644
0
11,010

I think this was the plan, it was mentioned before. Will the DX improvements be free though? I can't even get 11.x without shelling out for Windows 8. This is a big step forward for budget PC gaming, happy times.

 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I don't think those details have been worked out fully. It is an interconnect using a packet-based protocol, with read and write sizes of 16 to 128 bytes. Latency is likely on par with what you'd get from a QPI link.
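
As a rough picture of what a packet-based read protocol with 16-128 byte payloads could look like (field names, sizes, and the 16-byte granularity below are invented for illustration, not taken from any spec):

```python
# Invented illustration of a packet-based read request/response with a
# bounded payload size; this is a toy model, not the HMC protocol.
from dataclasses import dataclass

@dataclass
class ReadRequest:
    address: int
    size: int  # requested payload in bytes, 16..128 in this toy model

    def __post_init__(self):
        if self.size % 16 or not 16 <= self.size <= 128:
            raise ValueError("payload must be 16..128 bytes in 16-byte steps")

@dataclass
class ReadResponse:
    address: int
    data: bytes

def serve(req: ReadRequest, memory: bytearray) -> ReadResponse:
    return ReadResponse(req.address, bytes(memory[req.address:req.address + req.size]))

mem = bytearray(range(256))
print(serve(ReadRequest(address=32, size=16), mem))
```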
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810



You can thank AMD for twisting their arms. Someone had to do it.
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


M$ will most likely turn it into an app you have to pay for, or to get the next improvements you'll have to buy a new OS.

 

ColinAP

Honorable
Jan 7, 2014
18
0
10,510


That's a paradoxical thing to say. If Mantle hadn't existed in the first place, then how would DX and OGL have been spurred to get their acts together?

Another one of your predictions was that nobody would ever use Mantle. They have. So your prediction rate on Mantle is 50%. Well done, give yourself a pat on the back.
 
So Mantle spurred DX and OGL to get their acts together to address their overhead issues, removing the need for Mantle to exist in the first place, which I predicted several months ago...

Just because MS will support something in the future that may or may not resemble Mantle isn't cause to proclaim it dead. Now we enter the competition phase, where competing standards will each evolve and attempt to solve the problem in different ways. Developers will try out both, and over time one will win out as the accepted standard while the other fades into obscurity. Regardless of which eventually wins, the process itself will enhance the final product for users, as each competitor will be trying to create a bigger, better mousetrap.
 


Remember AMD is the smallest of the three major GPU OEMs, after NVIDIA and Intel. So if DX and OGL improve, even if to a lesser extent than Mantle allows, Mantle goes away simply due to market support.

Secondly, I know for a fact the improvements to DX have been in the works for at least a year; you saw the first bit of the API speedup in 11.1 (look at BF4 in Win8 versus Win7). OGL...is a bit more surprising, since the API is just so badly written at this point.
 

vmN

Honorable
Oct 27, 2013
1,666
0
12,160
The thing is, Mantle is currently only supported by newer AMD cards, which is its biggest problem right now.
We are going through an interesting time, gentlemen.
 