The discussion is about the benefits of moving the GPU onto the CPU die. Until the x86 cores are less than ~20% of the total die area I think performance will be too constrained and that enthusiasts will prefer discrete solutions.
The 14nm node opens the door to the design of an 8-core APU, where 8 big cores would represent about 25% of the total die area. Once one has 8 big cores inside an APU, the need for a separate dCPU vanishes.
The real game changer is the 10nm node.
jimmysmitty :
Will we eventually see a powerful GPU on die? Absolutely. I think it is further off than people think though, as it has been said the biggest issue is moving all that extra heat that a dGPU with even GTX680 performance would create.
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
Memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on its next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s BW). Intel will start selling a 'CPU' with packaged MCDRAM (500GB/s BW) next year.
For the sake of comparison, a GTX680 has 192 GB/s BW available.
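For anyone who wants to check these figures: peak memory bandwidth is just the effective transfer rate times the bus width. A quick sketch in Python; the GTX680 numbers (6008MT/s effective GDDR5 on a 256-bit bus) are its public board specs, not something stated in this thread:

```python
# Peak memory bandwidth = effective transfer rate (MT/s) x bus width (bytes).
def peak_bw_gbs(mts, bus_bits):
    """Peak bandwidth in GB/s from transfer rate in MT/s and bus width in bits."""
    return mts * (bus_bits / 8) / 1000

# GTX680: 6008 MT/s effective GDDR5 on a 256-bit bus (public board spec)
print(peak_bw_gbs(6008, 256))  # ~192.3 GB/s, the figure quoted above
```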
what would you like to say to those who use features like quad sli or crossfire?
and in 2020, if someone tries to play a game on his 4-monitor setup, each at 8K resolution, then how would these APUs be able to store the game data needed for processing by the iGPU? would AMD use the crystals of some broken crystal ball?
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
so, uhm.. what type of memory does PS4 use? i guess slow ddr is ... slow.
1.) K12 is not ARM only, that was my point. Yes, they refer to the entire subsystem as K12 for BOTH uarches, but the new x86 cores are K12. FYI: x86 is still not going anywhere...
2.) HEDT is growing: http://www.rockpapershotgun.com/2014/02/04/the-pc-gaming-market-is-flipping-enormous/
At the end of the day, I'd say that Intel's chances for long term success in the tablet space are pretty good - at least architecturally. Intel still needs a Nexus, iPad or other similarly important design win, but it should have the right technology to get there by 2014. It's up to Paul or his replacement to ensure that everything works on the business side.
4.) I said specifically, "AMD will not use HTT". What part of that did you not understand? You said that they would. I can promise you they will not license HTT from Intel, in case your confusion is confounding you, let me link you to SMT: http://en.wikipedia.org/wiki/Simultaneous_multithreading. There are many types of SMT. DO NOT confuse HTT as being the only version of SMT, and CMT is a version of SMT.
5.) LOL...right...10 TFLOP APU...ok...I will believe that when I see it...maybe when they get to picometer nodes...(you still provided no source). Besides, when there is a 10 TFLOP APU, there will likely be a 40-50 TFLOP dGPU that uses only 2x power of the APU.
6.) No source from you...and my SDK says ALL 8 CORES are available for game loads...
7.) I said the patents registered mentioned the highest frequency on the APU to be 2.75 GHz. Additionally...the PS4 is clocked about 2.0 GHz...so your prediction was wrong. Where is your crystal ball now?
This time I include both your first try at an answer and your second try. It is interesting to see how your answers evolve over time, yet continue showing the same disparity with what AMD says.
AMD: K12 is a new high-performance ARM-based core.
8350rocks (1st try): No. K12 != ARM.
8350rocks (2nd try): K12 is not ARM only, that was my point.

AMD: The TAM for x86 is decreasing, while it's increasing for ARM.
8350rocks (1st try): No. The TAM for x86 is increasing and it is gaining ground against ARM.
8350rocks (2nd try): HEDT is growing.

AMD: We will abandon CMT and will return to a classic SMT design.
8350rocks (1st try): No. AMD's new cores will be based on a redesigned CMT architecture (note: CMT is a version of SMT).
8350rocks (2nd try): I said specifically, "AMD will not use HTT".

AMD: We promise a 10TFLOP APU by 2020.
8350rocks (1st try): No. AMD cannot produce a 10 TFLOP APU for doing anything right now.
8350rocks (2nd try): LOL...right...10 TFLOP APU...ok...I will believe that when I see it...

Sony: Only six cores are available to games; two cores are reserved by the OS and for background/dev tasks.
8350rocks (1st try): No. Eight cores are fully available to games because the OS is run on a separate chip.
8350rocks (2nd try): No source from you...and my SDK says ALL 8 CORES are available for game loads...

Sony: Jaguar cores are clocked at 1.6GHz.
8350rocks (1st try): No. Jaguar cores are clocked at 2.75GHz.
8350rocks (2nd try): I said the patents registered mentioned the highest frequency on the APU to be 2.75 GHz. Additionally...the PS4 is clocked about 2.0 GHz...so your prediction was wrong. Where is your crystal ball now?
juanrga :
Some benchmarks of mobile Kaveri against Haswell ULV
Nice to see benchmarks of the FX APUs that I mentioned are coming
I expect to see a huge performance jump going from a Trinity (Richland) APU on a laptop to a Kaveri, since Kaveri has around 15% more performance per clock compared to the Phenom in my testing, and it's clocked higher as well: 2.7GHz vs 2.5GHz for the A10-5750M. Probably a good 20% boost on average in CPU performance, with a nice boost in GPU performance since it's going to use GCN. Honestly, I might upgrade my old Llano laptop to this if I can find a good laptop with the best APU for $650 or less.
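As a rough check of that estimate (the ~15% per-clock figure is the poster's own testing claim, not an official number), the IPC gain and the clock bump compound to about 24%, consistent with the "good 20% boost" guess:

```python
# Compounding the poster's own estimates (not official figures).
ipc_gain = 1.15          # ~15% more performance per clock (poster's testing)
clock_ratio = 2.7 / 2.5  # Kaveri at 2.7GHz vs A10-5750M at 2.5GHz
print(f"estimated CPU uplift: {(ipc_gain * clock_ratio - 1) * 100:.0f}%")  # ~24%
```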
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
so, uhm.. what type of memory does PS4 use? i guess slow ddr is ... slow.
i was expecting a reply from juan...
it is very unusual that he'd take this long to answer such a simple query. maybe real life business or something...
what would you like to say to those who use features like quad sli or crossfire?
I would say the same thing that I said the first time, and the second time, and the third time... that I was asked about that.
truegenius :
and in 2020, if someone tries to play a game on his 4-monitor setup, each at 8K resolution, then how would these APUs be able to store the game data needed for processing by the iGPU? would AMD use the crystals of some broken crystal ball?
Since you seem so confident, just let me know: how much DRAM memory do these APUs have?
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
Memory is nowhere near the main bottleneck in a system right now, and while faster RAM would be nice in servers, it is not what is needed for an APU to shine, as current APUs do taper off.
The PS4 APU is actually a bit underwhelming if you look at it from a GPU standpoint. It has the same number of shaders as an HD7870, yet even counting the CPU portion it is slower in TFLOPS (1.84 vs 2.54) than an HD7870.
And pulling 200-300W of heat off a CPU is fine. It would be the 250-300W of heat from the GPU plus the 80-150W+ from the CPU that would be hard for air coolers to handle as easily.
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
so, uhm.. what type of memory does PS4 use? i guess slow ddr is ... slow.
The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.
I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
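For the curious, here is where both of those numbers come from; the dual-channel DDR3-2133 configuration on the PC side is an assumption for illustration:

```python
# Same formula as before: effective transfer rate (MT/s) times bus width in bytes.
def peak_bw_gbs(mts, bus_bits):
    return mts * (bus_bits / 8) / 1000

# PS4: 2750MHz GDDR5 moves data on both edges of its write clock -> 5500 MT/s, 256-bit bus
print(peak_bw_gbs(5500, 256))  # 176.0 GB/s
# PC (assumed): dual-channel DDR3-2133 on 2 x 64-bit channels
print(peak_bw_gbs(2133, 128))  # ~34.1 GB/s
```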
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
so, uhm.. what type of memory does PS4 use? i guess slow ddr is ... slow.
The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.
I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
how is gddr5 "beyond" the "slow" ddr memory in pcs?
The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.
I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
And yet CPUs work better with lower latency and GPUs work better with more bandwidth.
And yet CPUs work better with lower latency and GPUs work better with more bandwidth.
APUs are a compromise, plain and simple. Kaveri had to compromise CPU speed for GPU speed. There are some advantages though, like cost.
In maybe 5-6 years you'll be able to get consumer APUs with 3D memory at a reasonable price. It all depends on how quickly Micron/SKHynix/Samsung can ramp their production and get the costs down.
For reference, right now the only board you can get with 3D memory on it is $25k. I should be getting mine (for work) sometime this summer.
Memory is nowhere near the main bottleneck in a system right now, and while faster RAM would be nice in servers, it is not what is needed for an APU to shine, as current APUs do taper off.
The PS4 APU is actually a bit underwhelming if you look at it from a GPU standpoint. It has the same number of shaders as an HD7870, yet even counting the CPU portion it is slower in TFLOPS (1.84 vs 2.54) than an HD7870.
Memory bandwidth is the main bottleneck for current APUs. It is the reason why Richland increased native DRAM frequency from 1866MHz to 2133MHz. It is the reason why top APUs from Intel introduce L4 cache. It is the reason why the Xbox1 APU uses ESRAM and it is the reason why the PS4 APU uses fast GDDR5 instead of slow DDR3.
AMD had originally planned a faster top Kaveri APU with 6 CPU cores and a more powerful GPU (one that would hit 1TFLOP). That original APU concept used fast GDDR5 memory. However, there were problems with one of the memory suppliers, so AMD canceled its original plan, reduced the CPU cores to 4, lowered the GPU clocks, fused off the (now unused) on-die GDDR5 memory controller, and released the final top Kaveri APU that you can purchase in stores, the one that relies on slow DDR3 memory.
The PS4 APU has been designed to fit inside a console. The HD7870 has been designed to fit inside a desktop. Consoles are limited at about 100W TDP. Desktops can have up to 10x more TDP than consoles.
If you were to use an HD7870 for a console, you would cut its TDP to about one third to fit the 100W thermal limit, which implies lowering the clocks if you want to maintain the same number of execution units.
Assuming cubic scaling, cutting dissipation to one third corresponds to lowering the clocks by a factor of ~1.4, which reduces TFLOPS from 2.54 to ~1.81.
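A quick sanity check of that arithmetic in Python (assuming dynamic power scales roughly with the cube of clock frequency, via frequency times voltage squared, and TFLOPS scale linearly with clock at a fixed shader count):

```python
# Assumed model: dynamic power ~ f^3, TFLOPS ~ f at a fixed shader count.
hd7870_tflops = 2.54
clock_factor = 1.4               # clocks lowered by a factor of ~1.4
power_factor = clock_factor**3   # ~2.74x less power under cubic scaling

print(f"TDP falls to ~{100 / power_factor:.0f}% of the original")        # ~36%, about one third
print(f"TFLOPS at the lower clock: {hd7870_tflops / clock_factor:.2f}")  # ~1.81
```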
Then you would have a CPU+dGPU configuration that would be more expensive, less reliable, and clearly slower. Both Sony and Microsoft rejected a CPU+dGPU configuration for their consoles' hardware from the first minute they started thinking about the design.
It’s becoming clearer that memory bandwidth rather than memory latency is the next bottleneck that needs to be addressed in future chips. Consequently we have seen a dramatic increase in the size of on-chip caches, a trend that is likely to continue as the amount of logical gates on a chip will keep increasing for some time to come, still following Moore’s law. There are however several other directions based on novel technologies that are currently pursued in the quest of improving memory bandwidth:
I bolded the relevant part. He then reports some of the research directions, including stacked RAM, which I mentioned above in one of my former posts. From the HMC consortium FAQ:
What is the problem that HMC solves?
Over time, memory bandwidth has become a bottleneck to system performance in high-performance computing, high-end servers, graphics, and (very soon) mid-level servers. Conventional memory technologies are not scaling with Moore's Law; therefore, they are not keeping pace with the increasing performance demands of the latest microprocessor roadmaps. Microprocessor enablers are doubling cores and threads-per-core to greatly increase performance and workload capabilities by distributing work sets into smaller blocks and distributing them among an increasing number of work elements, i.e. cores. Having multiple compute elements per processor requires an increasing amount of memory per element. This results in a greater need for both memory bandwidth and memory density to be tightly coupled to a processor to address these challenges. The term "memory wall" has been used to describe this dilemma.
Why is the current DRAM technology unable to fully solve this problem?
Current memory technology roadmaps do not provide sufficient performance to meet the CPU and GPU memory bandwidth requirements.
What are the measurable benefits of HMC?
HMC is a revolutionary innovation in DRAM memory architecture that sets a new standard for memory performance, power, reliability, and cost. This major technology leap breaks through the memory wall, unlocking previously unthinkable processing power and ushering in a new generation of computing.
+ Increased Bandwidth — A single HMC unit can provide more than 15X the bandwidth of a DDR3 module.
+ Reduced Latency — With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction.
+ Power Efficiency — The revolutionary architecture of HMC allows for greater power efficiency and energy savings, utilizing 70% less energy per bit than DDR3 DRAM technologies.
+ Smaller Physical Footprint — The stacked architecture uses nearly 90% less physical space than today's RDIMMs.
+ Pliable to Multiple Platforms — Logic layer flexibility allows HMC to be tailored to multiple platforms and applications.
To throw some numbers into the discussion: the HMC 1.0 specification already considers sustained bandwidths on par with a modern L3 cache from Intel. It is as if your CPU had access to a 100x bigger L3 cache.
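To put the FAQ's "15X the bandwidth of a DDR3 module" claim in perspective (assuming a DDR3-1600 module as the baseline, since the FAQ doesn't say which module it means):

```python
# The FAQ's "15X a DDR3 module" claim, with an assumed DDR3-1600 baseline.
ddr3_module_gbs = 1600 * 8 / 1000   # 64-bit module at 1600 MT/s -> 12.8 GB/s
print(ddr3_module_gbs * 15)         # ~192 GB/s from a single HMC unit, GTX680 territory
```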
As I mentioned above, AMD plans to offer HBM as an option for the K12 APU. I can also confirm that the ultra-high-performance design by Nvidia doesn't use any L3 cache, but an ordinary L1/L2 hierarchy and then stacked DRAM with a total throughput of about 1.6TB/s.
The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.
I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
how is gddr5 "beyond" the "slow" ddr memory in pcs?
I have mentioned how a GDDR5 module provides twice the bandwidth of a similarly clocked DDR3 module.
can you repost it, please? i follow this thread. i am absolutely certain that you have never mentioned how gddr5 is "beyond" the "slow" ddr memory in pcs. by that i mean that you have never provided any technical explanation on that matter. my query is sorta two-parter. part 1 - how is gddr5 beyond ddr3 i.e. what makes gddr5 so much more advanced than ddr3 in terms of technology and performance - assuming that's what you mean by "beyond". part 2 - how is ddr (i guess you mean ddr3) slower than gddr5?
and in 2020, if someone tries to play a game on his 4-monitor setup, each at 8K resolution, then how would these APUs be able to store the game data needed for processing by the iGPU? would AMD use the crystals of some broken crystal ball?
Since you seem so confident, just let me know: how much DRAM memory do these APUs have?
what ? i didn't got your question
i didn't born in pondicherry, didn't studied in Britain thus a little bit slow in English
so me want you question explain ( cavemen english )
juanrga :
Then you would have a CPU+dGPU configuration that would be more expensive, less reliable, and clearly slower. Both Sony and Microsoft rejected a CPU+dGPU configuration for their consoles' hardware from the first minute they started thinking about the design.
because they don't need swappable GPUs, thus a single-chip solution was better.
juanrga :
In fact their site correctly claims that this new memory technology is needed for exascale supercomputers:
the average joe doesn't use exascale; indeed, even an extreme joe's extreme desktop CPU is not an exascale CPU
Not a lot of details about the new micro-architecture are known at present. What is recognized for sure is that it will drop CMT in favour of some kind of SMT (something akin to Intel’s HyperThreading) technology to improve performance in both single-threaded and multi-threaded cases.
About the unfounded rumor that AMD would return to SOI, Keller admits that he is "happy" with the 14/16 FinFET process and that it "looks pretty good". He also admits he is targeting frequencies close to 4GHz for the K12 core.
16-core Steamroller? Not according to your crystal ball!
Alright, I am not dignifying your lunacy with another response.
If you do not post a link to a verifiable source documenting exactly any claims you make from here on out, I am just ignoring you...
Additionally, not one of your claims about ARM have panned out, and AMD themselves have not stated anything confirming any of your claims.
I predicted dGPUs will be replaced by APUs, because dGPUs don't scale up well (APUs will be faster). Some people here disagreed, and strongly so (including personal insults). I reproduced a quote from the Nvidia Research Team agreeing with me that discrete GPUs will be replaced by GPUs on the same die as the CPU. The quote was ignored and/or deleted in replies.
I found another well-known HPC expert, Jack Dongarra, who agrees with me that traditional dCPU (socket) + dGPU (PCIe), or even dCPU (socket) + dGPU (socket), doesn't scale up (I guess he has done the math, like I did):
Another problem that GPUs present pertains to the movement of data. Any machine that requires a lot of data movement will never come close to achieving its peak performance. The CPU-GPU link is a thin pipe, and that becomes the strangle-point for the effective use of GPUs. In the future this problem will be addressed by having the CPU and GPU integrated in a single socket
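To put rough numbers on that "thin pipe", here is a back-of-the-envelope sketch. The figures are assumptions taken from public GTX680 specs (~3.09 TFLOPS single precision, 192GB/s local bandwidth) and PCIe 3.0 x16 (~15.75GB/s in one direction):

```python
# Assumed figures: GTX680 peak compute and memory bandwidth (public specs),
# PCIe 3.0 x16 usable bandwidth in one direction.
tflops = 3.09e12    # single-precision FLOP/s
local_bw = 192e9    # on-board GDDR5, bytes/s
pcie_bw = 15.75e9   # PCIe 3.0 x16, bytes/s

print(f"local bytes per FLOP: {local_bw / tflops:.4f}")  # ~0.0621
print(f"PCIe bytes per FLOP:  {pcie_bw / tflops:.4f}")   # ~0.0051, ~12x thinner
```

Any workload that has to stream its data over the bus rather than keep it resident on the card sees that ~12x gap directly.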
I think the reason you are getting so frustrated while preaching your APU salvation to the heathens is that you are forgetting your audience. All of your arguments for the superiority of an APU over a dGPU are based off the comments of experts focused on HPC.
The average Tom's reader/forum lurker cares nothing about HPC workloads. They are concerned with whether or not their R9 780X will run BF7: Lunar Warfare. The Tom's editors know this which is why the System Builder Marathon every quarter devotes so much time to gaming benchmarks and analysis.
Gaming and HPC have different requirements, and PCIe bandwidth is not a bottleneck in typical games. Until this becomes a significant issue for games, it won't make sense for an enthusiast to move to an APU. Yeah, you crammed 3000 shaders with an HBM cache into an APU, but if you hadn't reserved 1/3 of the die for the CPU you could have 4500 shaders, which will perform better.
Putting 250+ watts into a MB socket will present its own problems in the consumer segment and may require a break with the ATX form factor to adequately cool with acceptable acoustics. AMD does not have the resources required to create a new platform in a market they control less than ten percent of.
APUs are excellent for certain use cases; if you are concerned about total system power you probably should use an APU. Enthusiast class desktops are not one of those cases.
(i)
I am working on an article about what the chips of the year 2020 will look like. The stuff is very interesting, because it includes a paradigm change in computer architecture, and I decided to share my knowledge about it in this thread, also adding some details about AMD's future plans and related plans from Nvidia and Intel.
(ii)
I provided some relevant math/physics details, such as the ratio of local compute to data movement for current silicon and the ratio for future silicon, showing why a dGPU doesn't scale up and has to be rejected.
This drew posts ranging from the educated "I don't think so" to the usual ad hominem from the same guys as always.
(iii)
Then I decided to share some quotes from real experts disagreeing with the laughable comments from those guys. Dongarra is a famous HPC expert, but the Nvidia Research Team is also behind the gaming GeForce cards.
(iv)
The same arguments apply to gaming. The only difference from HPC is the order of magnitude of the problem associated with the nonlinear scaling of future silicon.
I explained that gaming is evolving towards using the GPU for compute as well (e.g. physics or AI). Once you are using the GPU for compute, you have to confront problems that are very similar to those mentioned for HPC.
(v)
I explained that GPUs for gaming are not designed in a vacuum. The disappearance of the top-end discrete GPUs used for compute will affect the development of the cheaper gaming GPUs. The explanation is the same one I used here to predict that AMD wouldn't release Steamroller FX CPUs once the server roadmap was made public and showed the cancellation of the Steamroller Opteron CPUs.
The same people who didn't understand my argument then and told me "wait for the desktop roadmap" are the people who don't understand my argument now and believe that discrete GPUs for gaming will be released even after the Sun disappears.
(vi)
There is no serious problem with 250W sockets. AMD is already selling 220W CPUs and you can find 300W coolers on the market. Moreover, the expensive 200-300W APUs being designed for exascale supercomputers don't need to be reused for gaming, in the same way that the expensive 150W Xeons used in the fastest supercomputers are not found in gaming PCs.
(vii)
Beyond my HPC sources, I also have relevant sources from the gaming and rendering communities. E.g. I have material from a very well-known guy, and he agrees with me that GPUs for gaming will disappear. Not only do we agree on this, but we also agree on what the gaming hardware that replaces the GPUs will look like. He is already thinking about new advanced algorithms/code that will be used for future games. But this material is reserved for my future article; for now it is time to watch the possibly funny reactions of the many 'engineers', 'game-developers', and Mr-I-have-a-friend-at-AMD in this thread.
HPC != HEDT
If that were the case, then you would see 256-node home PCs with coprocessors for FP ops, and games would be designed to run on 100+ cores. Graphics would look like DreamWorks-quality CGI from Hollywood movies, and your average power bill in the States would run about $300-400 monthly.
Additionally...what are your qualifications? How is it you feel so much more qualified to speculate (being generous with that word, as it is mostly garbage you spout) about future PC technologies? There are many here, with MANY more years of experience in RELEVANT fields, who disagree with you, not just me. So...feel free to speculate away; however, I have a feeling your magic 8 ball is going to break soon...
Not a lot of details about the new micro-architecture are known at present. What is recognized for sure is that it will drop CMT in favour of some kind of SMT (something akin to Intel’s HyperThreading) technology to improve performance in both single-threaded and multi-threaded cases.
About the unfounded rumor that AMD would return to SOI, Keller admits that he is "happy" with the 14/16 FinFET process and that it "looks pretty good". He also admits he is targeting frequencies close to 4GHz for the K12 core.
From your same article:
What is, perhaps, more important is that one of AMD's official documents detailed a sixteen-core AMD Bulldozer-derived processor. If the company pursues this opportunity and goes for a 16-core chip featuring Steamroller or Excavator cores, its new chips based on the new micro-architecture will only be available in 2016 or even 2017.
I know nothing...and I know no one at AMD, yep...that is certainly provable with that statement above.
According to your magic 8 ball, there would be no more HEDT dedicated dCPUs. But what do we find? PLANS FOR A NEW HEDT dCPU!?! Holy cow...what's next chicken little? The sky is falling?
Memory bandwidth is the main bottleneck for current APUs. It is the reason why Richland increased native DRAM frequency from 1866MHz to 2133MHz. It is the reason why top APUs from Intel introduce L4 cache. It is the reason why the Xbox1 APU uses ESRAM and it is the reason why the PS4 APU uses fast GDDR5 instead of slow DDR3.
Ironically, the developers working on both PS4 and XB1 will all tell you that the decision to use ESRAM on the XB1 is its Achilles heel and is actually a hindrance rather than a help. The ability on the PS4 to address all memory as one coherent block is far more useful, and the ESRAM will only hinder XB1 performance moving forward (as we are currently seeing with lower FPS on cross-platform games versus what the PS4 can provide).
The PS4 uses GDDR5 @ 2750MHz as system memory. DDR3 @ 3000MHz would offer only about one half of the bandwidth.
I guess your computer's DDR3 system memory peaks at something like 34GB/s. The PS4 system memory peaks at 176GB/s.
how is gddr5 "beyond" the "slow" ddr memory in pcs?
I have mentioned how a GDDR5 module provides twice the bandwidth of a similarly clocked DDR3 module.
can you repost it, please? i follow this thread. i am absolutely certain that you have never mentioned how gddr5 is "beyond" the "slow" ddr memory in pcs. by that i mean that you have never provided any technical explanation on that matter. my query is sorta two-parter. part 1 - how is gddr5 beyond ddr3 i.e. what makes gddr5 so much more advanced than ddr3 in terms of technology and performance - assuming that's what you mean by "beyond". part 2 - how is ddr (i guess you mean ddr3) slower than gddr5?
My original words are quoted above in this same message. I mentioned how GDDR5 is beyond the "slow" DDR3. I didn't mention "why", but I can do it now that you ask me. GDDR5 runs its data clock (WCK) at twice the command clock, so it transfers twice as many bits per pin per cycle as DDR3; this is why a single 3000MHz GDDR5 module provides double the peak throughput of a single 3000MHz DDR3 module: 48GB/s vs 24GB/s, respectively.
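And to close the loop on the arithmetic (reading "3000MHz" the way DDR3 modules are marketed, i.e. 3000MT/s, with a 64-bit module assumed):

```python
# "3000MHz" read as 3000 MT/s on an assumed 64-bit module; GDDR5 moves
# twice as many bits per pin at the same command clock.
def module_bw_gbs(mts, bus_bits=64):
    return mts * (bus_bits / 8) / 1000

print(module_bw_gbs(3000))      # DDR3-3000: 24.0 GB/s
print(module_bw_gbs(3000 * 2))  # GDDR5:     48.0 GB/s
```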