cdrkf :
If we look back at processor design, consider the number of formerly external components now on die: the FPU (maths co-processor, for anyone as old as me), level 2 cache, level 3 cache (super socket 7 + K6-III), PCI controllers, USB controllers, PCIe controllers, the entire north and south bridges... Why are graphics and main memory so sacred that they can never be integrated? Eventually the number of transistors available on die reaches a point where including these things becomes inconsequential.
I honestly think if you went back to the early 90's (thinking first 32 bit processors - 386 / 486 era) and showed people a schematic of a current CPU they'd think you were crazy...
a high end, high performance gpu is not low hanging fruit. that's why. by the time you integrate a "high performance" gpu by today's or a future standard, an even higher performance gpu will be possible on the same process, so the integrated gpu won't be "high end, high performance" anymore.
it sort of looks like you're getting emotional about what cannot or should not be put on die. before... pre-65nm i think, fabrication wasn't about trade offs. that was the p4 era iirc. now you can put 7-8 billion transistors on a big die, but you can't run it without trading off clock rate or thermals (heat generation, temperatures, i.e. physical limits). power management and turbo help, but even those have limits and are subject to design limitations as well. this is why everyone goes after the low hanging fruit first. current hsa, mantle, apu, pcie integration and fivr have all gradually become low hanging fruit.

but a high performance gpu is a very different thing. a gpu by itself is a standalone asic with its own processing units and memory hierarchy. you're not integrating an fpu or a pcie controller, you're integrating a full blown asic. heterogeneous computing tech is what makes the gpu usable for general purpose work. when you're running something like a 7850k (~245mm^2 die, ~2.4B transistors?), even in the current paradigm you're switching on a large portion of the i.c. under load, whereas on a cpu-only die you'd be switching on a far smaller portion of the i.c. keep this in mind because it'll become vital very shortly. also note that the soc needs very good power management to keep from overheating (real time load calculation and balancing).

now imagine the "big" soc with the high performance cpu and igpu you prefer: imagine how much of the i.c. you'd be turning on under load (on gpus, more cores are active at once due to parallel processing), and how much higher the heat generation per area would be under load (remember how this affected ivb and haswell, but with a much bigger impact). heat generation per area won't go down much, because you'd be packing more transistors per area and then switching them on. a newer, better process can reduce leakage OR improve performance at the expense of power use. in the latter case you'd have power use and heat generation issues; in the former case you won't get the high performance you'd expect from putting a high performance cpu and gpu together on die, but you'll get lower power use. and those are after you've fixed yield issues - which have plagued every foundry with each shrink. this is where hsa comes in - going after the current low hanging fruit of software and coding overheads.
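to make the "heat per area" point concrete, here's a back-of-envelope sketch. the only real physics in it is the CMOS dynamic power relation P ≈ activity × C × V² × f; the die area, capacitance, voltage and activity fractions are invented purely for illustration, not measurements of any real chip.

```python
# illustrative only: how active-die fraction drives power density.
# CMOS dynamic switching power: P = a * C * V^2 * f.

def dynamic_power(active_fraction, total_cap_nf, voltage, freq_ghz):
    """Dynamic switching power in watts for the active fraction of the die."""
    return active_fraction * (total_cap_nf * 1e-9) * voltage**2 * (freq_ghz * 1e9)

die_area_mm2 = 245.0  # hypothetical kaveri-sized die
cap_nf = 30.0         # assumed lumped switched capacitance, made up

# cpu-only load: a small fraction of the i.c. is switching
cpu_w = dynamic_power(0.25, cap_nf, 1.1, 3.4)
# cpu + igpu load: gpus light up far more of the die in parallel
apu_w = dynamic_power(0.70, cap_nf, 1.1, 3.4)

print(f"cpu-only load: {cpu_w:5.1f} W ({cpu_w / die_area_mm2:.2f} W/mm^2)")
print(f"cpu+gpu load:  {apu_w:5.1f} W ({apu_w / die_area_mm2:.2f} W/mm^2)")
```

same die, same clock, same voltage - the only thing that changed is how much of the i.c. is switching, and power density roughly tripled. that's the core of the thermal argument above.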
oh, and adding components on die adds to total cost, so there are economic concerns as well.
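the cost point isn't linear either, because yield falls as die area grows. a quick sketch using the classic poisson yield model Y = exp(-A × D0); the wafer cost, usable area and defect density here are invented round numbers, not foundry data.

```python
# illustrative only: why a bigger soc costs disproportionately more per die.
import math

WAFER_COST = 5000.0    # assumed cost of one 300mm wafer, $
WAFER_AREA = 70_000.0  # rough usable wafer area, mm^2
D0 = 0.002             # assumed defect density, defects per mm^2

def cost_per_good_die(area_mm2):
    dies = WAFER_AREA / area_mm2           # candidate dies (edge loss ignored)
    yield_frac = math.exp(-area_mm2 * D0)  # poisson yield model
    return WAFER_COST / (dies * yield_frac)

for area in (120, 245, 450):  # small cpu, kaveri-sized apu, "big" soc
    print(f"{area:3d} mm^2 -> ${cost_per_good_die(area):6.2f} per good die")
```

fewer dies per wafer AND a lower fraction of them working, so doubling the area far more than doubles the cost per good die.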