AMD CPU speculation... and expert conjecture

blackkstar · Sep 21, 2013

hcl123 :

Hahahah. This is hilarious. So FMA did nothing for Hasfail? Meanwhile I see OVER 60% increase in performance when custom compiling LAME single thread benchmark in Gentoo to use AVX/FMA, etc?

I thought Intel would just cripple AMD hard, but it looks like Intel can just cripple AMD by making sure no one gets fast instructions as Intel does better with legacy code and AMD would see massive gains with the instructions enabled while Intel wouldn't.

My god it's horrifying. My testing put FX 8350 stock FASTER than Intel in single thread in LAME benchmark according to Tom's review.

No wonder all the Gentoo guys laugh at people for wanting to CFLAG rice with Gentoo. They're all running Intels and Intel doesn't get anything out of using those instructions.

Meanwhile, over in AMD land, I'm seeing things getting done twice as fast or 60% faster or whatever.

Intel is going to kill x86 by trying to shoehorn an architecture with a CISC philosophy of making new instructions for performance into competing with a RISC architecture by ignoring what makes CISC so strong and pushing CISC into a market where it doesn't belong.

This explains so much why Intel doesn't bother with AVX/FMA/etc on Atom while AMD supports all of that on Jaguar. Intel's AVX/FMA/etc implementation sucks and AMD's is good.

Oh god, the irony of this all. No wonder Intel wants to shove x87 benchmarks in front of everyone''s faces instead of going "look at the new instructions in Haswell! ITS GIVES A 50% SPEED UP!!!!"

Really starting to think IB/SB/Haswell/etc just flat out suck and are horrible designs, but Intel has enough money to "polish a turd"

juanrga · Sep 21, 2013

GOM3RPLY3R :

Well, you wrote "hardware" not "card" the first time, but doesn't matter, because you continue wrong.

Cazalan :

ARM A9 chips have been clocked above the 3Ghz. As I said no problem here.

We are comparing _ARM_ to _x86_, not one chip to another; therefore any extra advantage that an i7 chip has, such as 7x more cache than the ARM phone chip has to be considered. Recall that desktops and supercomputers will be not using phone chips but desktop chips with large caches as well.

I think you continue misunderstanding the Mont-blanc project. They are not going to build a supercomputer with 32 bit phone ancient Tegra 2/3 chips. Nvidia last is Tegra 4. Those phone chips are only used to built a prototype for analysing the possible weak points of an ARM-based cluster. In the same presentation they explain how the detected weak points are due to implementation in phone chips, not problems with ARM architecture. They also explain how the new ARM server chips will solve those issues. Recall that the final ARM supercomputer will use a _supercomputer_ class chip from Nvidia project Denver.

I am not implying that ARM can scale forever, that is physically impossible for _any_ chip. I am merely remarking that ARM chips are improving to a much more higher rate than Intel chips.

ARMv8 adds much more than just 64bit processing.

Of course, there is not free lunch but Intel has tried its best with the Silvermont architecture and one week after Apple came with its first ARM64 core and humiliated Intel.

The GPU in an i7-2770k is very small. The Opteron has not GPU. The GPU in the new Apple A7 is BIG.

juanrga · Sep 21, 2013

What if the rumoured 3 module kaveri was finally replaced by a 2m module version, where the third module is used for iGPU? Then the top Kaveri iGPU wouldn't be a 512SP, but a 832SP

http://wccftech.com/amd-desktop-kaveri-apu-13-cus-enabled-radeon-r5-m200-832-stream-processors-gpu-spotted/#ixzz2fXJUsDAF

blackkstar · Sep 21, 2013

juanrga :

AMD has said they are treating everything as interchangeable modules. SR modules, GPU CUs, Jaguar Cores, etc.

It should be technologically possible for AMD to even go 1M/2c with the rest of the die dedicated to GPU.

I suppose it just depends on how much AMD thinks they can sell of a specific configuration. But 2m/4c with 13CU GPU on $114 enthusiast board would be a heck of a gaming rig. If you were smart about it, you could walk away with a gaming PC (provided you already had a monitor) for less than an XBone.

Gaming DT Is still important to AMD and they aren't leaving that market. Eagerly awaiting what we see on the 23rd, a new high end gaming platform based on SR cores released on the 10 year anniversary of Athlon would be too good to be true.

But it almost always seems like these types of announcements made on social networks are useless and aimed at the less educated consumers, who might not even fall into enthusiast territory. I say this by simply reading replies to the announcements on FX twitter in regards to the sept 23rd posts made by AMDFX.

hcl123 · Sep 21, 2013

jdwii :

I don't know... its all speculation now... but the discrepancy between the earlier 8CU (512sp), might exactly be because the first iteration follows past releases and Kaveri 2013 is mobile -> low power on bulk... while kaveri desktop is 28nm FD-SOI or PD-SOI half node compared with "bulk" ( 30 to 40% shrink) so there is room for much larger GPU... yet the size of both chips should be identical (~260 to 270mm²)

Yet even 13 CUs wont cut the mustard... Hawaii would have between 36 (lower) up to 48 (max of PS4 layout)... and 2014 Radeon on 20nm, since Nvidia will respond, could have up to 64 (4098 sp) if not much more...

You wont put those "monster" GPUs (relative) together with an APU... better to turn off xfire and the APU GPU from getting in the way. And so *IF* APU will be the *only* high end for 2014, is akin to say AMD will be making top Radeons for Intel only... kind of sounds weird even in the most wild speculations!...

hcl123 · Sep 21, 2013

juanrga :

Yes only 2 modules makes sense... APU is for mainstream entry level, is not meant to replace all the market. Even 832sp doesn't make a decent game rig by any standard, not in PC (consoles is different), and if you go xfire even with HSA for games then 4 CPU cores might be too short, better would be the xfire X thing with a CPU.

blackkstar · Sep 21, 2013

juanrga :

Yeah, but problem is that while they are shipping same base and turbo clocks stock, IB just turbos better.

If anything IB should be lower, SB can hit 5ghz with decent cooling but IB is lucky to make it above 4.5ghz.

If you want to talk about maximum potential of these chips, ergo IPC increase and maximum 24/7 clocks, IB loses more than 10% of clockspeed yet never has more than 10% gain in performance.

At maximum clock speed of both chips, IB is actually significantly SLOWER than SB.

You are saying IB is better because it can hit 45mph faster in an area with a 45mph speed limit while SB can do 100mph and IB can only do 85mph. It is good for most people but it does not mean that the slower chip that hits the artificial limits quicker is not a better chip overall. It is better in a specific situation but not better overall. Ergo IB's main improvement comes from improved turbo, probably due to node shrink, as opposed to architecture changes.

Basically, everyone who goes "MUH IVY BRIDGE IPC!!!!" has no idea of how turbo works at all and they assume that if given base and turbo clocks are the same between different chips, that the chips will always run at the same frequency.

CPU and GPU boost is the benchmark scam of the century, we need to start seeing reviews test with turbo on and turbo off. If you have logic which dictates clock rates based on thermals and such, you can just ship reviewers chips that run cooler, which in turn turbo better, which means better benchmarks.

Low and behold Intel usually has 11% increase in clock speed when turbo is enabled while AMD only gets 5% increase. So it should be little surprise why Intel seems to get better gains in single thread, turbo is working better.

But the joke is on the folks who overclock their chips. They go "omg I OCed my chip to 4.5ghz from 3.5ghz now I'm 28% faster in single thread!" when it's really going from 3.9ghz to 4.5ghz which is a 15% increase in clock speed.

Turbo, marketing crime of the century for CPU/GPU companies.

Cazalan · Sep 21, 2013

hcl123 :

The 2014 server roadmap is already out. There is only Warsaw (Piledriver) refresh. Likely just to fulfill contract obligations with Cray.

You're looking at 2015 at the earliest for a new high end server chip. AMD may skip right to Excavator at 20nm or even discontinue the product line. The new management is so heavily focused on the embedded and custom space.

AMD talks about the things they're excited about and none of that has to do with big core chips. At the recent HotChips AMD was talking about Jaguar not Steamroller/Excavator.

Yeah they've said that big core chips aren't dead yet but Intel said that about Itanic as well.

Cazalan · Sep 21, 2013

juanrga :

Yes like I said they have had 10+ years of road-map to copy from, so yes they are advancing at a quicker pace. But now with ARMv8 they're at a peak. What's left to make the next giant leap that won't also cost significant TDP?

There are only so many knobs to turn before you end up with a 100W processor. And they've turned practically all of those knobs. Those on the ARM bandwagon seem unaware of this.

They may have been most efficient with tiny 32bit cores with light workloads. That isn't guaranteed to hold true with 64bit large memory and heavy workloads. It's an entirely different ball game and usage scenario.

So far we've only seen a handful of benchmarks on an Apple A7 which is a tightly tuned and controlled OS. It's more like a game console than a general purpose desktop computer.

hcl123 · Sep 21, 2013

Cazalan :

No... the *new* chip based on SR will have a new socket (must), different from AM3+, from C32 and from G34+. Warsaw is G34+, its compatible with the old G34, so up to 4 DRAM channels and MCM formats.

The new chip should replace FX and all Opteron server parts from the 3200 series up to the 6000 series for single chip up to 2 P... meaning probably a socket with provisions for 3 mem channels and up to 2 x16 HT links (or equivalent PCIe and or combo) + I/O (FSB -> which could be PCIe ). Replaces AM3+ and C32.

The G34+, since its compatible with G34 and 6300 server parts, will have provisions for 4 mem channels, and up to 4 x16 HT links (being 1 internal + misc I/O DCCL for MCM) + I/O .. or equivalent x16 PCIe.

Compatibility with 6300 and Open Server platform must be why its PD, a fast and easy upgrade, the same plenty of server features, on a platform which is the most lucrative for AMD of the kind... while the new chip will be mostly 1P and for enthusiast, being server an adaptation, like intel used to do since long.

Its NOT an high end server chip... matter of fact high end server chip at AMD now, is for Seamicro... if you don't want to say its the MCM G34 like, which will be still PD (warsaw). The new one wont have capabilities for MCM, ~half the links, and they can cut also on RAS features (though registered ECC DRAM should be supported).

[Matter of fact the is new socket with up to 3 mem channels, DDR3 or DDR4, and 3 links for I/O, 1 FSB and 2 Expansion (HTX or PCIe - combo)... could be called FM3, be prepared for APUs with more than 2 m/4 cores and bigger GPUs... WHY NOT ? ---- that is a single socket for everything 1 P, single "processor" platforms ( more only trough expansion) with hUMA/HSA capabilities ]

juanrga · Sep 21, 2013

Confirmation that AMD has a "High Performance ARM core" project. The same guy was working before in Excavator and Piledriver. This could imply that AMD big cores dept. is now working in the custom ARM core leaded by JK.

http://www.linkedin.com/pub/david-li/1/689/33

juanrga · Sep 21, 2013

hcl123 :

I have been saying for days why the 1M/2M Kaveri specs makes sense.

If all this is right, the top iGPU could easily provide 2x the performance of a 6800k. I still believe final freq. will be ~900MHz.

I estimate the 2M CPU ~ i5 ~ FX-6000. Therefore APU will be good for dGPU for __most__ gamers. Rest of gamers can rely on a FX-8000/9000 series (or go for Intel) before Carrizo was here in 2015.

Cazalan :

I think just the contrary. Intel is at the peak of the x86, whereas ARM is starting to release its true potential. Look at AMD, they have benchmarked the new A57 core and found it is better than its own jaguar core, which was already an optimal peak in x86 _64 bit_ space.

I estimate Seattle to offer the raw performance of some of the fastest 16T Xeons, but consuming only a fraction of the power. And I am convinced that custom cores will do it much better.

Nobody knows what will come exactly. E.g. some rumours point to largest caches, whereas other rumours say that Nvidia is working in a non-traditional design without L1/L2/L3 caches, where the CPU is directly connected to a >1TB/s connection to a shared cache with priority access policies. Who knows?

Regarding the A7, the new chip is excellent even taking OS effects.

jdwii · Sep 22, 2013

hcl123 :

Cazalan :

No... the *new* chip based on SR will have a new socket (must), different from AM3+, from C32 and from G34+. Warsaw is G34+, its compatible with the old G34, so up to 4 DRAM channels and MCM formats.

The new chip should replace FX and all Opteron server parts from the 3200 series up to the 6000 series for single chip up to 2 P... meaning probably a socket with provisions for 3 mem channels and up to 2 x16 HT links (or equivalent PCIe and or combo) + I/O (FSB -> which could be PCIe ). Replaces AM3+ and C32.

The G34+, since its compatible with G34 and 6300 server parts, will have provisions for 4 mem channels, and up to 4 x16 HT links (being 1 internal + misc I/O DCCL for MCM) + I/O .. or equivalent x16 PCIe.

Compatibility with 6300 and Open Server platform must be why its PD, a fast and easy upgrade, the same plenty of server features, on a platform which is the most lucrative for AMD of the kind... while the new chip will be mostly 1P and for enthusiast, being server an adaptation, like intel used to do since long.

Its NOT an high end server chip... matter of fact high end server chip at AMD now, is for Seamicro... if you don't want to say its the MCM G34 like, which will be still PD (warsaw). The new one wont have capabilities for MCM, ~half the links, and they can cut also on RAS features (though registered ECC DRAM should be supported).

[Matter of fact the is new socket with up to 3 mem channels, DDR3 or DDR4, and 3 links for I/O, 1 FSB and 2 Expansion (HTX or PCIe - combo)... could be called FM3, be prepared for APUs with more than 2 m/4 cores and bigger GPUs... WHY NOT ? ---- that is a single socket for everything 1 P, single "processor" platforms ( more only trough expansion) with hUMA/HSA capabilities ]

You say it must why do you say that? Newer ram? Not enough power? Needs more pins on CPU? Why does it HAVE to have a different board

de5_Roy · Sep 22, 2013

jdwii :

mmmm... moar shaders. i like moar shaders. i like them moar than moar cores. power gating and binning should enable amd make a hell of a lineup from an apu like that.
but. the memory bandwidth.... i dunno if sandra's not optimized for kaveri .... it looks too low, way too low for a 7700 class gpu. imo it criminally low. i would blame software but amd has not improved their imc since llano. why does amd think they can brute force shaders instead of improving memory bandwidth? !@#$ damnit. T_T

edit: they tweaked sr for 'feeding the cores more', yet they seemingly didn't tweak the imc to 'feed the igpu's compute units more'. for shame! feed the damn igpu first! now i want a haswell-level memory perf (o.c.) in that. better if it's tri/quad-channel. don't care about yields, die area or cost. 😡

sarinaide · Sep 22, 2013

If AMD release the fastest single GPU mark my words, the world will stop and the Green boys will slit wrists in depression because despite not mattering one bit people seem to love the "mine is better than yours game".

And if the APU rumours are right yay will be able to play HD on a Chip. Get the drivers to fix dual graphics and APU's will take a huge percentage of market share back, particularly in a world getting poorer.

juanrga · Sep 22, 2013

For those who said me that Intel was having problems with 14nm

Right Broadwell 14nm. Left Haswell.

juanrga · Sep 22, 2013

de5_Roy :

jdwii :

mmmm... moar shaders. i like moar shaders. i like them moar than moar cores. power gating and binning should enable amd make a hell of a lineup from an apu like that.
but. the memory bandwidth.... i dunno if sandra's not optimized for kaveri .... it looks too low, way too low for a 7700 class gpu. imo it criminally low. i would blame software but amd has not improved their imc since llano. why does amd think they can brute force shaders instead of improving memory bandwidth? !@#$ damnit. T_T

edit: they tweaked sr for 'feeding the cores more', yet they seemingly didn't tweak the imc to 'feed the igpu's compute units more'. for shame! feed the damn igpu first! now i want a haswell-level memory perf (o.c.) in that. better if it's tri/quad-channel. don't care about yields, die area or cost. 😡

Memory Bandwidths:

* AMD Kaveri APU R5 M200 @600 MHz (DDR3-1600) - 15.0 GB/s
* AMD Richland APU HD 8670D @844 MHz (DDR3-2133) – 12.4 GB/s

Therefore expect huge improve, ~18GB/s, with >2133 memory.

Note: Kaveri is dual channel.

de5_Roy · Sep 22, 2013

juanrga :

i know kaveri has dual channel controller. i posted that out of frustration.
toms' recent richland memory scaling article shows sandra memory bw bench getting around 16 GB/s from 2133 ram and 14-15 GB/s from ddr3 1600 ram, more than shown in wccdefgijtech.com's leak.
haswell gets 21 GB/s minimum, from ddr3 1600. if amd hasn't improved kaveri's imc and put even more shaders on the apu(with per-shader perf higher than richland) the igpu will be starved even more. i tell people to stay away from radeon hd 7750's ddr3 version. that card can get 28~ GB/s. therefore, as usual, i am keeping my expectiations low. it's amd(or intel or nvidia or michael bay) after all. hell, stick 1-2 GB gddr5 on it, or on the mobo (somehow) i'll buy one. ;_;

hcl123 · Sep 22, 2013

jdwii :

To scale... more performance more cores, slightly different more I/O capability (its supposed to support PCIe alongside HTT).

All that needs moar pins, moar DRAM channels... different socket.

sarinaide · Sep 22, 2013

de5_Roy :

I have already said that I would much rather remove two DIMMs and use the space to populate it with GDDR5 under passive cooling and just change the interface that the GDDR5 is exclusive to the iGPU only but RAM is shared as per norm with the HUMA stuffs 😀 That said HD5100 vs HD8670D in BF3 is night and day, Intel is still very slide show despite the FPS recorded at smooth, Intel have a iGPU problem while AMD have a bandwidth problem 😀

etayorius · Sep 22, 2013

AMD will celebrate the Athlon 10 Years, i do hope they got something nice to announce at least, they posted again on FaceBook:

*Make sure to come back tomorrow to celebrate the 10 year anniversary of our FX Series processors!*

Come on AMD, at least announce a nice new CPU.

griptwister · Sep 22, 2013

If these are all real... And AMD hasn't made the full on driver enhancements... AMD has already won the battle...

http://wccftech.com/amd-hawaii-r9290x-gpu-volcanic-islands-benchmark-results-exposed-fastest-gpu-planet-1020-mhz-clock-speed/#ixzz2fbjDR4Bk

Maybe I'll go out and buy a GTX 780 lightning if it goes on sale for $450 xD Most likely I'll wait till next year when the performance of midrange GPUs almost double from the last generation. 🙁 I'm kind of okay with that.

hcl123 · Sep 22, 2013

de5_Roy :

It could be faked, a spoofed sissoft sandra thing.

This is the same logic/reason why a Hawaii without a good FX doesn't make sense... the entry level of VI is supposed to have 384sp... with a 512sp APU is already moot why develop such thing, with a 832 then is absurd.

http://www.3dcenter.org/news/amds-radeon-r7-240-r7-250-basieren-auf-dem-oland-chip

blackkstar · Sep 22, 2013

hcl123 :

Oh not HCL but AMD is done with enthusiast market entirely! You can match your $600 graphics card with a $150 high end APU in a $114 high end FM2+ board and you have the ultimate gaming experience! ITS GAMING EVOLVED!!!!

AMD is gonna tell us on Monday, the 10 year anniversary of the Athlon64, that they're don't trying to make super-fast CPUs and instead will move ARM to the desktop!

Ok, I'm done with the sarcasm, but the dream announcement on Monday would be a new gaming platform which supports HSA and is based around Steamroller followed by R9 290x announcement with R9 supporting HSA.

SR should be close to SB/IB in IPC. Jaguar is already close and AMD is saying SR is twice as fast, ergo SR should be dead even. Even 3m/6c SR would put up a heck of a fight and if AMD crammed 4m/8c SR into a 315mm^2 die they're going to have multi-thread performance on 3930k AND a smaller die.

I don't see AMD letting Intel's idleness getting away. And a new gaming platform centered around the anniversary of the day AMD released a product which blindsided Intel would be a marketing stroke of genius.

For all we know, and this is my inner optimist speaking, AMD has kept desktop plans under wraps for a while because they were waiting to release GPU + CPU around the same time to release a gaming platform instead of a CPU and a GPU. And at the same time of anniversary of one of AMD's greatest products?

I wouldn't be surprised if AMD released SR dCPU with Athlon name and it crushed Piledriver, and then left some high stock clock parts as FX models.

crimson87 · Sep 22, 2013

Going through the topic I thought about the following:

If AMD will only release APU's from now on , we would have to use Intel CPU's to drive AMD high end cards right?
Unless there is a 30+ % increase in processing power on those APU's I am seeing this more and more feasible since I am not seeing AMD giving up on the high end graphics market.
Besides , If the only choice for an AMD consumer is to get APU's , we'd better start seeing good results on Crossfire X... I mean , I would not want to waste money on an IGPU I won't use in the end.

AMD CPU speculation... and expert conjecture

Honorable

Distinguished

Distinguished

Honorable

Honorable

Honorable

Honorable

Distinguished

Distinguished

Honorable

Distinguished

Distinguished

Splendid

Splendid

Splendid

Distinguished

Distinguished

Splendid

Honorable

Splendid

Honorable

Distinguished

Honorable

Honorable

Honorable

Share this page