blackkstar :
@juan, it seems like you think that the most important part of a GPU and GPGPU is latency.
A GPU/GPGPU is a TCU (a throughput compute unit) and those are rather insensitive to latency. This is why a GPU memory controller is optimized for bandwidth, not latency.
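A quick way to see why: by Little's law, the data that must be in flight to keep the memory system saturated is bandwidth times latency. A minimal sketch (compiles as a plain .cu file; the bandwidth and latency figures below are illustrative assumptions, not the specs of any real GPU):

    // Little's law back-of-envelope: how much data must be in flight to keep
    // a bandwidth-optimized memory controller busy. Numbers are assumptions.
    #include <cstdio>

    int main() {
        const double bandwidth_GB_s = 320.0; // assumed GPU memory bandwidth
        const double latency_ns     = 400.0; // assumed DRAM access latency
        // in-flight bytes = bandwidth * latency (the 1e9 and 1e-9 cancel)
        const double in_flight_bytes = bandwidth_GB_s * latency_ns;
        printf("~%.0f KB must be outstanding to hide the latency\n",
               in_flight_bytes / 1024.0);
        // A TCU covers this with tens of thousands of resident threads, each
        // holding a few outstanding loads; a latency-optimized CPU core cannot,
        // which is why the CPU cares about latency and the GPU about bandwidth.
        return 0;
    }

That is the sense in which a throughput machine is latency-insensitive: as long as enough requests are in flight, raising the latency barely changes the delivered bandwidth.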
blackkstar :
PCIe is a massive problem no matter how you all want to take it. If it were actually meeting the needs of heterogeneous computing, Nvidia and Intel would be using it. You would have Xeons connected by PCIe because it'd be fast enough.
And it's not.
At this point, Nvidia has NVLink and Intel has QPI. AMD would normally have Hypertransport, but it's in dire need of an upgrade.
I agree.
blackkstar :
Which is why I think that we're going to see AMD come out with a new version of Hypertransport or a replacement that is either more competitive than HT 3.1 (which was last updated in 2008, by the way) or even better than QPI and NVLink.
The biggest problem with everyone who thinks APU is the only future is that if there is no fast interlink between APUs, your HPC load is going to be bottlenecked by the same things that dCPU and dGPU are once you have a workload that uses more than one APU, because one APU won't be able to access the memory of another APU.
There is an important difference. The GPU works as a coprocessor to the CPU. Thus you need to offload the computation to the GPU, and move the result of the computation back to the CPU. This is why a CPU--GPU interconnect is slow and inefficient, and why this model is rejected for exascale computing.
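For context, this is what the coprocessor model looks like in code. A minimal CUDA sketch of the classic offload pattern (the kernel, sizes and names are made up for illustration); the two cudaMemcpy calls are the trips across the CPU--GPU interconnect:

    // Coprocessor/offload model on a dGPU: the data crosses the CPU--GPU
    // interconnect twice for every computation that is offloaded.
    #include <cuda_runtime.h>
    #include <cstdlib>

    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;                         // the actual work
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes);               // CPU-side buffer
        for (int i = 0; i < n; ++i) h[i] = 1.0f;
        float *d;
        cudaMalloc(&d, bytes);                           // GPU-side buffer

        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice); // over the bus: CPU -> GPU
        scale<<<(n + 255) / 256, 256>>>(d, n);           // compute on the GPU
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost); // over the bus: GPU -> CPU

        cudaFree(d);
        free(h);
        return 0;
    }

For short kernels those two copies dominate the runtime, which is why the interconnect becomes the limiting factor as soon as the GPU is treated as a coprocessor.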
With APUs you don't need to offload the computation from one APU to another. The computations are local, and only in a few cases do you need to transfer data from one APU to another over the interconnect.
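Here is the same computation in the shared-memory style, for contrast. The thread is about HSA/hUMA, so take the sketch as an analogy only: I am using CUDA managed memory as a stand-in for true shared memory, and the names and sizes are again made up:

    // Shared-memory (APU-style) model: one allocation visible to both CPU and
    // GPU, no explicit host<->device copies. cudaMallocManaged is only a
    // stand-in here for HSA/hUMA shared memory.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *x;
        cudaMallocManaged(&x, n * sizeof(float));  // one pointer for CPU and GPU
        for (int i = 0; i < n; ++i) x[i] = 1.0f;   // CPU writes the data in place

        scale<<<(n + 255) / 256, 256>>>(x, n);     // GPU works on the same data
        cudaDeviceSynchronize();

        printf("x[0] after GPU pass: %f\n", x[0]); // CPU reads the result in place
        cudaFree(x);
        return 0;
    }

Only when a second APU needs part of this data does anything cross an external link, which is why an APU--APU interconnect is exercised far less often than a dCPU--dGPU one.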
This is the reason why Nvidia will first use NVLink to connect a CPU to a dGPU, but later will use NVLink only to connect their ultra-high-performance APUs. In fact, the interconnect between CPU and GPU cores inside the ultra-high-performance APU has to be so fast that Nvidia is not using a traditional bus system as in current CPUs/APUs.
AMD follows a similar approach to Nvidia. The internal docs show that they use a 100GB/s interconnect for the APU--APU connection. For the sake of comparison, Hypertransport peaks at only 26GB/s.
I doubt that AMD will improve Hypertransport. I believe it is dead. For the Seattle SoC, AMD is taking a different approach with the acquired Freedom Fabric.
blackkstar :
So even then, an APU-only ecosystem still needs the fast links I'm saying we need. But by the time you reach that point, you might as well be using dCPU and dGPU, since they'll both be using HSA over the same links that the APUs would be using anyway. Except dCPU and dGPU don't have to deal with the problems of APUs, such as:
1. GPU and CPU have different requirements from fabrication process.
2. GPU has to sacrifice density in fab process for CPU
3. CPU has to sacrifice clock speed in fab process for GPU
4. Both must contend for die space
5. Both must contend for heat dissipation (as mentioned earlier 100w CPU + 250w GPU = 350w+ total)
6. Power delivery of motherboard must also provide the power of GPU and CPU to a single socket.
7. Multiple APU system loses HSA shared memory advantage and is back to square one with multiple devices that have to talk over (currently) slow busses.
I don't know how else to explain it. To make an HSA APU, you need to make a CPU that supports HSA, and you also need to make a GPU that supports HSA. If you can provide solid bandwidth and latency between dCPU and dGPU, you might as well go that route, since you already have a CPU and GPU design that can do it. Except at this point you're at a massive advantage over the APU, because you can just keep adding devices. Imagine HSA between 4 8m/16c CPUs and 8 Hawaii-class GPUs, all sharing one giant memory pool. It'd be a bloodbath for Phi and Nvidia's NVLink.
1. No. AMD has had problems because it initially chose the wrong SOI process, but their move to bulk and next to FinFETs corrects that.
2. No. AMD's problems here are the result of their use of automated design tools.
3. No. Again, this is exclusive to AMD designs.
4. True for current APUs, but irrelevant for future CPU:GPU ratios as explained before.
5. True for current APUs, but irrelevant for future CPU:GPU ratios as explained before.
6. There is no problem with delivering power to a 300W socket.
7. Not a real problem, because hUMA matters for computations made on the same data. The CPU and GPU on the same die will be working on the same data, but an APU will rarely be working on data from another APU.
Yes, if one could invent a CPU--GPU interconnect that provides solid bandwidth and latency between dCPU and dGPU at the exascale level, then a traditional dCPU and dGPU architecture would be superior. The problem is that this magic interconnect cannot be invented, for the same reason that no engineer can invent a perpetual motion machine or a car that breaks the speed-of-light limit.
The laws of physics favour an APU solution. I already explained why and gave some basic data, like the wire-to-compute ratio. All the engineers know the laws of physics. This is why all the engineers working on exascale are developing APUs, despite some people here denying it like hell.
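To put a rough number on the wire-to-compute argument, here is a back-of-envelope sketch. The energy-per-bit figures are my own illustrative assumptions, in the ballpark quoted in exascale talks, not measurements of any product:

    // Back-of-envelope: power needed just to move data over an external link
    // versus over on-die wires. Energy-per-bit values are assumptions.
    #include <cstdio>

    int main() {
        const double traffic_GB_s    = 100.0; // assumed sustained traffic
        const double off_chip_pJ_bit = 10.0;  // assumed off-package link (SerDes)
        const double on_die_pJ_bit   = 0.5;   // assumed short on-die wires

        const double bits_per_s = traffic_GB_s * 1e9 * 8.0;
        printf("off-chip: ~%.1f W\n", bits_per_s * off_chip_pJ_bit * 1e-12);
        printf("on-die:   ~%.1f W\n", bits_per_s * on_die_pJ_bit   * 1e-12);
        // With these assumptions the external link costs ~20x the power of
        // on-die wires for the same traffic; at the multi-TB/s an exascale
        // node needs, that gap is what rules out the dCPU+dGPU option.
        return 0;
    }

Change the assumed numbers and the exact ratio moves, but the off-chip path stays an order of magnitude worse, which is the physics behind putting the CPU and GPU on the same die.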