Vogner16
palladin9479 :
cdrkf :
The thing is, eventually the performance benefit of a discrete GPU over an integrated solution will be so small that it will no longer make sense to produce them.
Not for at least twenty years, if not longer. And that's assuming some sort of phenomenal never-before-seen increase in processing technology. We're talking borderline "I found it in a secret alien ship" type increase here.
Take the nVidia 980 Ti, which has approximately 8bn transistors, not counting memory. The Intel i7-6700K has somewhere around 1.75bn transistors, which includes its small onboard iGPU. The FX-8350 has around 1.6bn transistors, and the A10-7850K has 2.41bn transistors, including the iGPU.
That is the kind of power discrepancy that exists between dedicated vector processors (dGPUs) and general-purpose central processors (CPUs/APUs). There is almost an order of magnitude difference between the most powerful iGPU and the most powerful dGPU, not to mention that dGPUs get specialized ultra-wide, ultra-fast memory buses dedicated to them, while iGPUs have to share the central memory implementation.
We are decades away, at a minimum, from having "too much" vector processing power. We've barely scratched the surface of real-time ray tracing and physics, and we've only been experimenting with various implementations of 3D. Imagine what the graphics processing requirements will be once holographic displays become a consumer reality. There is just too much that you can do with a powerful vector co-processor available to the system.
For graphics, the PCIe bus presents zero issues. Programs simply upload their data sets to the dGPU's memory prior to execution, and as execution happens the program just keeps putting data into that memory before it's needed. The interesting thing about vector-style processing is that it's incredibly predictable compared to general processing, and graphics memory is so large that there is never a problem of not having your data present prior to execution time. Thus the only issue with the PCIe (or any other) bus is latency, which only matters if you're trying to use the dGPU as an integrated math co-processor instead of a dedicated graphics / physics co-processor. That is the only real advantage to having a local vector co-processor, and while it's a really good advantage, it's not one that replaces the dGPU.
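To make the "keep putting data into that memory before it's needed" part concrete, here's a rough sketch of how that looks with CUDA streams and double buffering. The chunk size, the scale() kernel, and the two-buffer layout are just illustrative assumptions, not anything from a real engine; the point is that the next chunk crosses PCIe while the current one is being processed, so the bus never sits in the critical path:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in kernel: real code would run whatever vector work the frame needs.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int chunk  = 1 << 20;                    // elements per chunk (assumption)
    const int chunks = 8;                          // number of chunks to stream
    const size_t bytes = chunk * sizeof(float);

    float *h_in;                                   // pinned host memory so the
    cudaMallocHost((void **)&h_in, bytes * chunks);// async copies really overlap
    for (int i = 0; i < chunk * chunks; ++i) h_in[i] = 1.0f;

    float *d_buf[2];                               // double buffer on the dGPU:
    cudaMalloc((void **)&d_buf[0], bytes);         // one being filled while the
    cudaMalloc((void **)&d_buf[1], bytes);         // other is being processed

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);
    cudaEvent_t uploaded[2], consumed[2];
    for (int i = 0; i < 2; ++i) {
        cudaEventCreate(&uploaded[i]);
        cudaEventCreate(&consumed[i]);
    }

    // Prime the first buffer before any compute starts.
    cudaMemcpyAsync(d_buf[0], h_in, bytes, cudaMemcpyHostToDevice, copyStream);
    cudaEventRecord(uploaded[0], copyStream);

    for (int c = 0; c < chunks; ++c) {
        const int cur = c & 1, nxt = cur ^ 1;

        // The kernel only waits until *its* chunk has landed in dGPU memory.
        cudaStreamWaitEvent(computeStream, uploaded[cur], 0);
        scale<<<(chunk + 255) / 256, 256, 0, computeStream>>>(d_buf[cur], chunk, 2.0f);
        cudaEventRecord(consumed[cur], computeStream);

        // Meanwhile the next chunk streams over PCIe into the other buffer,
        // as soon as the kernel that last used that buffer is done with it.
        if (c + 1 < chunks) {
            if (c >= 1) cudaStreamWaitEvent(copyStream, consumed[nxt], 0);
            cudaMemcpyAsync(d_buf[nxt], h_in + (size_t)(c + 1) * chunk, bytes,
                            cudaMemcpyHostToDevice, copyStream);
            cudaEventRecord(uploaded[nxt], copyStream);
        }
    }

    cudaDeviceSynchronize();                       // wait for everything to finish
    printf("streamed %d chunks through the dGPU\n", chunks);

    cudaFree(d_buf[0]); cudaFree(d_buf[1]);
    cudaFreeHost(h_in);
    return 0;
}
```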
What you're going to see isn't the iGPU replacing the dGPU, but rather the iGPU complementing it. You will have a powerful CPU with a low-to-mid-range iGPU and a powerful dGPU. The CPU handles general computing with the iGPU acting as a co-processor, while the dGPU acts as a graphics processor, or handles physics when it's tied to graphics.
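If you want to see how a program might tell the two apart at runtime, here's a rough sketch using the CUDA runtime's device properties. It's only illustrative: CUDA will only enumerate NVIDIA GPUs, so in a mixed Intel/AMD + NVIDIA box you'd do the same thing through D3D12, Vulkan, or OpenCL instead, but the idea of routing work by the "integrated vs discrete" flag is the same:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    int igpu = -1, dgpu = -1;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("device %d: %s (%s, %d SMs)\n", d, prop.name,
               prop.integrated ? "integrated" : "discrete",
               prop.multiProcessorCount);
        if (prop.integrated) igpu = d;   // shares system RAM with the CPU
        else                 dgpu = d;   // has its own wide, fast memory behind PCIe
    }

    // The split described above: small latency-sensitive co-processor jobs stay
    // next to the CPU, bulk graphics / physics work goes to the big card.
    if (igpu >= 0) printf("iGPU (device %d) -> latency-sensitive co-processor work\n", igpu);
    if (dgpu >= 0) printf("dGPU (device %d) -> bulk graphics / physics work\n", dgpu);
    return 0;
}
```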
"ideal world" will software ever really get the full potential of this setup? your proposing dual graphics where the slave card "dgpu" is much more powerful than the master card like what was proposed to be supported with dx12, but what is gained by this is minimal compared to what is proposed to be gained by this because software is to far behind. the ideal solution is not a dual setup of anything but one single chip with transistors to do every calculation within a near 0 latency distance of each other.
To propose that an APU could fit those 18 billion transistors and meet high-end spec would require a physically massive die, like those seen on IBM POWER CPUs. This physical implementation of the ideal APU would be expensive to make, while also requiring new motherboards and coolers, and educating the public on how to build with this radical all-in-one APU: no RAM slots, proper cooler selection, how to handle mounting heights over the HBM and the die, etc... BUT in theory it could be as fast as the top-end separate CPU + dGPU, or your proposed "slave and master GPU" system, and cost less in the long run to make.
The key component here is "cost less to make". If AMD does a study and finds that a 4000-core APU monster is faster than a 3800-core dGPU + CPU setup, but costs as much as a 5000-core dGPU + CPU, then they will lose money building it and won't waste their time, despite how cool it would be to see that tech become available.
I think it's just too expensive to build at 16nm, 10nm, and maybe even 7nm.