AMD CPU speculation... and expert conjecture


juanrga

Distinguished
BANNED


Skylake for the desktop is coming in August of this year. As far as I know the frequencies are fine; in fact they are higher.

Regarding frequencies on future nodes, this is the data I have collected:


■ Intel engineers are targeting 4.6GHz for CPU cores on a 7nm node.
■ AMD doesn't give official details about the frequencies of the APU, but the APUsilicon article gives some guesses based on AMD labs data.
■ Japanese engineers are targeting 4GHz for CPU cores and 1GHz for the throughput cores, with both kinds of cores on the same die, made on a 10nm node.
■ Nvidia engineers are targeting 2GHz for CPU cores and 1GHz for the GPU cores, with both kinds of cores on the same die, made on a 7nm node. I asked one of the engineers on the project and he confirmed to me that their estimates for the prototype are overly conservative and that final silicon will run at higher frequencies.
■ Samsung has just presented its 14nm node: about 25% higher frequencies than the 20nm node.

Regarding 4K gaming: the APU described in the APUsilicon article is much faster than the 390X. If one can game at 4K, so can the other.



No. You are considering a quad-core Haswell plus GT2, but ignoring that a quad-core Haswell plus GT3 occupies more than 260mm2, both on 22nm. For your information, 260mm2 on 22nm would be about 420mm2 on 28nm, which is roughly twice the size of Kaveri.
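For what it's worth, the ~420mm2 figure follows from a crude ideal-scaling assumption (die area growing with the square of the nominal feature size). A minimal sketch of that arithmetic, where the scaling rule is my own simplification and the 245mm2 Kaveri figure is the commonly cited die size:

```python
# Back-of-envelope area scaling (ideal-scaling assumption; real layouts scale worse).
def scaled_area(area_mm2: float, node_from_nm: float, node_to_nm: float) -> float:
    """Estimate die area after porting between nodes, assuming area ~ feature_size^2."""
    return area_mm2 * (node_to_nm / node_from_nm) ** 2

haswell_gt3_22nm = 260.0   # mm^2, quad-core Haswell + GT3 (figure quoted above)
kaveri_28nm = 245.0        # mm^2, commonly cited Kaveri die size

print(round(scaled_area(haswell_gt3_22nm, 22, 28)))  # -> ~421 mm^2 on 28nm
```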

Other large dies include 8-core Bulldozer (315mm2), 8-core Sandy Bridge (416mm2), 10-core Westmere (513mm2)...

AMD has enough room for an L3 cache, but as several of us explained before, increasing the die size by 50% for a weak 5--10% performance gain makes no sense; this is why AMD kept the Kaveri die small.
 

juanrga

Distinguished
BANNED


You got it. Not only will moving data from one die to another on the motherboard be too costly, but even moving data within the same die will be! As a consequence, engineers are not only abandoning discrete cards for their future designs but are even abandoning current APU layouts where the CPU sits on one side of the die and the GPU on the other.

In one of those future designs (which I like a lot), each group of 64 GPU cores is associated with one CPU core. The layout is 32+1+32 to maximize efficiency by reducing the average distance data has to travel inside the die, and this basic pattern is repeated eight times across the die.
 

noob2222

Distinguished
Juan, don't cite your own articles as a reliable source of information...

4.6GHz on 7nm... rofl. They can target that all they want; it doesn't mean it's going to happen. Don't forget how fast things change.

http://www.kitguru.net/components/cpu/anton-shilov/intels-roadmap-leaks-broadwell-k-broadwell-e-and-skylake-k-due-in-2015/

Broadwell-K is MIA now, when just one year ago it was going to be the best thing ever. Predicting that 7nm is going to reach 4.6GHz stock 5-10 years from now... so somehow this trend of rising thermal density is suddenly going to reverse course? Yeah right, when pigs fly.

As for what you're saying, Nvidia is the only one doing it right: start out lower than the target and surprise customers with a better outcome. Promising 4.6GHz and delivering 2.5 will only enrage people.

Reality and dreaming big are two different things that shouldn't be mixed into one concept.

For some more good laughs on past predictions vs. reality:

http://www.tomshardware.com/news/GlobalFoundries-7nm-processing-Common-Platform-14nm-XM-10nm-XM,21050.html

 

juanrga

Distinguished
BANNED
4.6GHz is an achievable target and perfectly compatible with international trends. There is no technical problem here. Final CPUs could be clocked at 4.3GHz or so instead of 4.6GHz, OK, but without any doubt the engineers' target is much closer to reality than "500 MHz".
 

truegenius

Distinguished
BANNED


Hey there, try this: run your CPU at its lowest speed, like 1.6GHz or 2GHz, on all cores with turbo off (without lowering the stock settings for RAM or the northbridge), then run HWiNFO64, play games, and see what the maximum and average CPU usage were.
This way we cause a CPU bottleneck, and you can check whether the game is merely multithreaded or actually using multiple cores.
If it is using all 8 of your cores at 100% with at least 90% average usage, then it is using all 8 cores; otherwise it is using fewer, which you can calculate from the average usage with this formula: core count * avg CPU usage / 100 = real cores used (if the result is a fraction like 7.3, take it as 8 cores instead of rounding it down to 7).
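A minimal sketch of that calculation in Python (the usage percentages in the examples are made up, just to illustrate the arithmetic):

```python
import math

def real_cores_used(core_count: int, avg_cpu_usage_percent: float) -> int:
    """core count * avg usage / 100, rounded up as described above (7.3 -> 8)."""
    return math.ceil(core_count * avg_cpu_usage_percent / 100)

print(real_cores_used(8, 91))  # 7.28 -> counts as 8 cores
print(real_cores_used(8, 48))  # 3.84 -> counts as 4 cores
```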

I tested GTA4 using this method and found that it only uses 4 cores at most, regardless of its 50+ thread count. The shocking part was that it performed significantly better with 4 cores than with 6 (like 35fps for 4 cores vs. 30fps for 6 cores, at the same clock and game settings without turbo), even after multiple runs.

So in some badly coded games, more cores can actually hurt performance.

Try this experiment and let us know.

Edit: I tested GTA4 again, but now I am not seeing better performance with 4 cores compared to 6. Maybe the new graphics card influenced the result, but I am sure that with the HD 6770 and 2x4GB of RAM I was getting significantly better performance with 4 cores.
 

jdwii

Splendid
^^^ Why not just disable the cores altogether in Task Manager and then turn them back on per application? I did this with all my tests, and I'm not sure what lowering the clock speed really tells you about how many cores a game uses. One thing I loved doing was testing the module design with different games: even with the Windows 8.1 scheduler I would see a few more FPS if I set GTA4 to cores 1-3-5-7 instead of leaving it on auto. I personally saw no gains after the 3rd core was being used.
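If clicking through Task Manager gets tedious, a scriptable way to pin a game to specific logical cores is the third-party psutil package. A rough sketch, where the process name and core list are just examples and sufficient privileges are assumed:

```python
# Pin a running game to chosen logical cores instead of toggling them in Task Manager.
# Requires the third-party psutil package (pip install psutil); example values only.
import psutil

TARGET = "GTAIV.exe"     # hypothetical process name; change to the game under test
CORES = [1, 3, 5, 7]     # e.g. one core per module on an 8-core FX

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == TARGET:
        proc.cpu_affinity(CORES)                 # restrict the process to these CPUs
        print(proc.pid, "affinity:", proc.cpu_affinity())
```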

Actually, I'm going to test Haswell's scaling with HT. The last time I did this with a Pentium 4 it was like 15-20% at best; I'm guessing it's way better now.
 
In Juan's defense, he has never specified what type of clock rate. We already have 8GHz VRAM in current Nvidia cards, so 4.6GHz is not a stretch. There's also the oft-overlooked matter of boost clock/PowerTune/Turbo vs. base clock rate. Then there's TDP, die size, substrate... wait a minute... so many factors are left out that it looks like a blanket statement to me...
And if it's not memory but rather host processor clocks, then he might be secretly back in the SOI camp. :whistle: :lol:
 

truegenius

Distinguished
BANNED


Sometimes software and games do not allow changing core affinity or priority, even when running Task Manager at realtime priority from an administrator account :mouais: so I tried disabling cores in the BIOS only.
I will try it now with Task Manager; this may give an extra 1MB of L2 cache :miam:
 

juanrga

Distinguished
BANNED


I gave the base frequencies of the cores.
 

noob2222

Distinguished
Hey Juan, are your magical CPUs going to run at 0.05mV and not actually require a power source too?

If you're a self-proclaimed scientist, then explain what I have mentioned over and over: how do you get around thermal density in order to avoid an instant meltdown? Don't rely on marketing for your answer.
 
AMD Radeon 300 series to feature current GCN cores, Fiji the only new GPU
http://vr-zone.com/articles/amd-radeon-300-series-to-feature-current-gcn-cores-fiji-the-only-new-gpu/87315.html
The only graphics cards that will feature the new Fiji architecture will be the Radeon R9 390, Radeon R9 390X, and the Radeon R9 395X2 dual-GPU card. The Radeon R9 390X is rumored to arrive with 4096 cores, 4GB 4096-bit HBM memory and the new GCN 1.3 architecture.

AMD A8-7650K APU is available for sale in US
http://www.cpu-world.com/news_2015/2015021901_AMD_A8-7650K_APU_is_available_for_sale_in_US.html
 


Because some titles (I'm looking at you, FC4) hardcode threads to specific cores, which is VERY bad programming practice. You NEVER do this, and if you have to, then something is seriously wrong with your program.
 

juanrga

Distinguished
BANNED


Just as expected. This confirms that the new GCN arch is far from revolutionary. The main point of the new arch will be the use of HBM, which is costly and only makes sense in the top cards. And that is why most cards of the 300 series will be relabeled 200-series cards.

We are just now discussing a similar case regarding Zen APUs on another forum. A group of people, including myself, think that AMD will use HBM only on top APUs (and maybe only on semicustom parts), whereas the rest of the APUs will rely on dual-channel DDR4.
 
How will using HBM on top APUs help? You're still going to be using DDR3/4 as main memory, so the bottleneck still exists. Improving speeds on-chip isn't going to fix that. And using HBM as system RAM is simply not cost-effective; it prices APUs out of the market.
 

8350rocks

Distinguished


He is a self-professed expert who uses his own theorycrafted articles as sources and will never admit he is wrong. Just as he cited the BGA-only chips with Iris Pro iGPUs to make a point that is essentially tied to a product with less than 3% market share of Intel's processor lineup.



I *very seriously* doubt that AMD will use HBM system RAM on any APU that is not a custom design that requires it.

Asking consumers to foot the bill for HBM is going to be like Intel doing BGA chips with a huge iGPU and selling them for double what the top 4-core i7 went for...

It just will not see any market share... now, it would make for tons of those marketing slides you like so much. Though, just like Intel's, that will not be a real-world result, only a tightly controlled, small-sample-size niche.
 

juanrga

Distinguished
BANNED
noob, as I stated before, Samsung's 14nm allows 25% higher frequencies, and desktop Skylake is being released this year with higher frequencies.

The thermal density myth was debunked by overclockers.com. The explanation for Ivy Bridge's higher temperatures is something else.

The engineers' targets aren't "marketing"; their values are in agreement with the international consensus on technological trends. Voltages are predicted to decrease by ~5% compared to the 22nm node, based exclusively on physical material scaling rules (i.e., science and technology, not "marketing"). Moreover, there are extra voltage reductions coming from new aggressive voltage-scaling techniques based on NTV (near-threshold voltage) technology, under active research by both academia and industry:

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6241650

The goal of NTV techniques is to enable extremely low supply voltages by using new kinds of circuits. If you have more doubts, send me a PM.
 

juanrga

Distinguished
BANNED


It will help in a similar way to how the ESRAM helps the Xbox One APU:
[Xbox One APU memory architecture diagrams]

A high-bandwidth memory pool for the GPU reduces the DDR memory bottleneck.
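To put rough numbers on that bottleneck: theoretical peak bandwidth is just transfer rate × bus width × channels. A back-of-envelope sketch using published interface specs (my own arithmetic, not figures from either article):

```python
# Theoretical peak bandwidth = transfers/s * bytes per transfer * channels.
def peak_gb_per_s(mt_per_s: float, bus_bits: int, channels: int = 1) -> float:
    return mt_per_s * 1e6 * (bus_bits / 8) * channels / 1e9

print(peak_gb_per_s(2133, 64, 2))    # dual-channel DDR3-2133  -> ~34 GB/s
print(peak_gb_per_s(3200, 64, 2))    # dual-channel DDR4-3200  -> ~51 GB/s
print(peak_gb_per_s(1000, 1024))     # one first-gen HBM stack -> 128 GB/s
```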
 
Juan, remember the ESRAM on the 360 was VERY carefully managed, and a lot of low-level coding was used to get the most out of it. That type of coding is NEVER done for general-purpose systems, and will not be done on the PC.

Also remember the 360 was a different CPU architecture; 1MB L2 is tiny by modern standards. The ESRAM was basically acting as what is now the L3 cache, just clocked slightly slower. And guess what? We've already agreed that's worth, at best, 10% extra performance.

And again, you still have the bottleneck getting the data to the CPU from main memory over the memory bus. All the ESRAM is doing is acting as a next level cache.
 

szatkus

Honorable


Remember Iris Pro? With eDRAM? Cache helps a lot.
 

noob2222

Distinguished
Juan, do you even know what thermal density is?

25% higher clocks on a ULP chip is not the same as 25% higher clocks at 100W. Try using your own equation for power and see if what you are claiming even makes sense.
 

blackkstar

Honorable
I remember when Intel engineers targeted 10GHz with the NetBurst-era Nehalem.

Someone is ignoring the keywords "up to" as well. I assume you're looking at Samsung's claims of up to 20% improvement from 20nm and just multiplying every frequency you want by 1.2.

http://www.zdnet.com/article/taking-chips-to-10ghz-and-beyond/

Apply some of the logic in this thread to this article, and you have guaranteed 10GHz chips in 2011!

 

juanrga

Distinguished
BANNED


The ESRAM on the Xbox One has to be explicitly managed because it is a small amount (32MB), and you have to guarantee that the relevant data is in the ESRAM; otherwise it has to be read from main memory, which kills performance. But future APUs will have one or more GB of HBM. Check the HPC APU diagram in the APUsilicon article. It represents an extreme APU with 32--64GB of stacked RAM for the iGPU.

Regarding caches, what I said is that the 4MB L3 cache on the FX-4300 CPU gives a 5--10% improvement in gaming. That is different from the role of the cache on iGPUs. Intel's Iris Pro is an example: the 128MB of eDRAM is a necessary element to obtain 40% more performance than Haswell iGPUs without it. The ESRAM on the Xbox One APU is another example: without the ESRAM, the iGPU would be bottlenecked by DDR3 memory. That is why the Xbox One has 14 CUs while the top Kaveri has only 8 CUs.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790




Who said that 25% higher clocks apply uniformly to any chip? Who mentioned TDPs?

The engineers' target of 4.6GHz is feasible. It is 15% higher than current chips. This doesn't mean that power consumption will be 15% higher, because voltages will drop, as I mentioned, by ~5%.
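Taking those two figures at face value, the usual dynamic-power approximation P ∝ C·V²·f (a simplification that ignores leakage and any change in capacitance across nodes) works out roughly like this:

```python
# Dynamic power scales as ~ C * V^2 * f; leakage and capacitance changes ignored.
f_ratio = 1.15          # +15% frequency (the 4.6GHz target vs. current chips)
v_ratio = 0.95          # ~5% lower supply voltage, per the post above
print(f"dynamic power: ~{(f_ratio * v_ratio**2 - 1) * 100:+.0f}%")  # -> ~+4%
```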

A mistake made by a pair of engineers at one company, and only by them, cannot be conflated with different claims from another set of engineers, claims that are in good agreement with the consensus trends from experts in the field (in both academia and industry).

If anyone here thinks that all the scientists and engineers in the field are plain wrong about the scaling trends of future nodes, that remains to be proven, and mentioning the NetBurst architecture is not even half an argument.
 