AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

airborn824

Honorable
Mar 3, 2013


Thank you. It seems we should wait for a CPU that supports more DDR channels and PCI Express 3.0, maybe Steamroller. I am an avid AMD user, mostly because of price to performance, and as a business major I am not fond of the things Intel has done to get where they are.
 


Duh? L3 isn't THAT much faster than going to main memory, and once you throw in the latency involved in accessing it, the die space, and the power draw, it simply doesn't make much sense. I never really thought it was that good a use of die space...
 

noob2222

Distinguished
Nov 19, 2007

http://www.anandtech.com/bench/Product/675?vs=700

Scroll down to the game benchmarks and you see where you need L3 cache. The 5800K is upwards of 20% slower at the same clock speed.

http://www.anandtech.com/bench/Product/122?vs=81
http://www.tomshardware.com/reviews/athlon-l3-cache,2416-6.html

Same with the Athlon II vs. a slower-clocked Phenom II: still a 10% loss from lacking L3 cache.

Sure, there are some games out there that couldn't care less about L3 cache, but when you get one that does, you might as well have shot yourself in the foot by buying a CPU without L3.

On the other side, though, office use and productivity are nearly identical; L3 or not doesn't matter if that's the intended use. This is where the APU is at its best: in office application usage, not as a dedicated gaming machine.

As far as main memory vs. L3 cache ...

http://www.sisoftware.net/?d=qa&f=ben_mem_latency

How long 40 clock cycles takes depends on CPU speed: at 4.0 GHz that's 10 ns, at 3.2 GHz it's 12.5 ns, while main memory is 65 ns (full random tests).
I am not sure where you get that main memory is just as fast as L3 cache when it's 8.4 ns vs. 75.6 ns (nearly 10x slower?!) or, at best, 13 clocks (3.9 ns) vs. 7.7 ns on sequential reads on the 2500K. The sequential times would explain why office applications don't really care, but on random access, you're way off.

AMD definitely needs to fix their 73-clock random-access L3 latency.
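
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the cycles-to-nanoseconds conversion used in the post above. The function name `cycles_to_ns` is made up for illustration; the inputs are just the figures quoted in the post, not new measurements.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f ns, so the latency is cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# Latency figures quoted in the post above.
for ghz in (4.0, 3.2):
    print(f"40 clocks at {ghz} GHz = {cycles_to_ns(40, ghz):.1f} ns")

# Random-access gap between L3 (8.4 ns) and main memory (75.6 ns).
print(f"main memory / L3 = {75.6 / 8.4:.1f}x")
```

The same cycle count costs more wall-clock time on a slower-clocked part, which is why the post quotes both clocks and nanoseconds.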
 
I am not sure where you get that main memory is just as fast as L3 cache when it's 8.4 ns vs. 75.6 ns (nearly 10x slower?!) or, at best, 13 clocks (3.9 ns) vs. 7.7 ns on sequential reads on the 2500K. The sequential times would explain why office applications don't really care, but on random access, you're way off.

At least as of the Core 2 Generation, the rule of thumb was:
Data in CPU Register: 0 clock cycle latency
Data in L1 Cache: 1-2 clock cycle latency
Data in L2 Cache: ~20-30 clock cycle latency
Data in RAM: ~80-100 clock cycle latency
Data NOT in RAM: ~100k+ clock cycle latency [Page Faults are expensive!]

Based on what I've seen, L3 takes about 60 clock cycles to access. Anyway, regarding SiSoft's tests, until I see for sure that they are not running into any page faults, my comment stands.
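
As a rough illustration of how those rule-of-thumb latencies combine, here is a small Python sketch of an expected-access-time calculation. The latencies are midpoints of the ranges listed above; the hit rates (95% L1, 80% L2, 50% L3) and the function name `average_access_cycles` are purely hypothetical, picked only to show the arithmetic, and real workloads will differ a lot.

```python
def average_access_cycles(levels, memory_cycles):
    """Expected cycles per access for a simple cache hierarchy.

    levels: list of (latency_cycles, hit_rate) ordered from the closest
            cache outward; hit_rate is the fraction of accesses reaching
            that level which are served there.
    memory_cycles: latency when every cache level misses.
    """
    expected = 0.0
    reach = 1.0  # probability that an access gets this far
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_cycles


# Rule-of-thumb latencies from the post; hit rates are made-up assumptions.
L1 = (2, 0.95)
L2 = (25, 0.80)
L3 = (60, 0.50)
RAM_CYCLES = 90

without_l3 = average_access_cycles([L1, L2], RAM_CYCLES)
with_l3 = average_access_cycles([L1, L2, L3], RAM_CYCLES)
print(f"without L3: {without_l3:.2f} cycles, with L3: {with_l3:.2f} cycles")
```

With these assumed numbers the average only moves from roughly 3.8 to 3.65 cycles, because L3 at ~60 cycles is not far from RAM at ~90. Plug in a wider L3-to-RAM gap or a lower L2 hit rate and the difference grows, so the result depends entirely on which latency figures you believe.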
 


Then for Kepler cards it will be massive! And previous gen AMD cards as well (VLIW4 and 5).

Under that premise, I'm willing to bet that the GTX580 will be so near the GTX680 it will make no sense, hahaha.

Anyway, we had a very long L3 discussion a while back. I'm in the camp of "who cares about L3" :p

Cheers!
 

anxiousinfusion

Distinguished
Jul 1, 2011


Pretty much the number one reason I'm keeping up with this thread. I have unreasonable hopes that three AMD products will be released this summer:

1. Steamroller CPUs
2. Sea Islands GPUs
3. 1090FX chipset

Even if only one or two of these make it, I'll still be happy.
 
Anyway, we had a very long L3 discussion a while back. I'm in the camp of "who cares about L3"

L3 has always been one of those "if you have extra room, why not" kind of things. Of everything on-die, it provides the least performance for its power and space consumption.

I've explained before that L3 is kind of the "oh sh!t" method of last resort before you're forced to stall for a main memory access. It's only there for when your prefetcher fails miserably and the data isn't inside L2.

That being said, SB/IB utilize L3 a bit more than PD/BD do. Intel went with a very small L2 cache of 256 KB, which is tiny by today's standards of 512 KB to 1 MB. They tightly coupled that L2 cache to give it the lowest latency possible and rely on L3 to catch anything that doesn't fit inside the small L2.

So yeah, APUs not having L3 isn't a big deal; they have 2 MB of shared L2 per module. The on-board GPU was far more important than L3 cache for their market value.
 

jdwii

Splendid



AMD has pretty much stated that none of this will be out this year. Non-APU Steamroller parts will be out next year along with the 1090FX (if that's what they call it). The HD 8000 series won't be out until the fourth quarter of this year or even next year (most likely before the Steamroller AM3+ CPUs), if I remember correctly. Richland will be out in the second quarter of this year, maybe even the third quarter, since it's AMD.
 



Interesting...

People can speculate on both sides but he had a very good point about streaming while playing. That's become all the rage these days with how interconnected social media has become. Also goes back to what I've stated in the past about needing to benchmark multiple heavy applications running at once, namely a game along with some sort of rendering software.
 

The 1090 is long overdue. AMD should decide whether they're going to continue with AM3+ or make a new socket (with nothing new to offer) and introduce the 1090FX on that new socket.
Rumor is, there might not be an 8K series. Looks like AMD will release new GCN cards first, with new specs.
You can always count on GloFo to be late. :)

How about this one below:
http://techreport.com/review/23750/amd-fx-8350-processor-reviewed/9
 



Yeah, I'd have to look at exactly what they're doing, because there is no way in hell a four-core i5 should do better than an eight-core FX-8350 with that many threads, unless the "Movie Encoder" was single-threaded. In that scenario it would be a load of 3~4 cores, which explains much. I don't use Windows Live Movie Encoder (many better options are freely available), so I can't really speak to what it's doing.
 

Cazalan

Distinguished
Sep 4, 2011


There are 3 products due this summer.

1) Richland (due slightly before the others)
2) Temash
3) Kabini

Steamroller (Kaveri) is Q4 at the earliest. Unknown why it's delayed so long past the other 28nm parts but it is a much bigger chip.

Really curious as to how well the 28nm (TSMC) CPUs will overclock compared to the 32nm (GF).
 


I think most of us here have already stated that the 5-10 FPS benefit that L3 may sometimes give is not worth the power consumed and die space used up; there are better means to mitigate that.

The Athlon 750K is faster than the Stars-based Athlons and uses less power; its L2 is also much faster, with lower latency than the former Athlon IIs. With a discrete graphics card, the 750K performs within a few percent of the old Phenom II X4 955/965 and the FX-4000 series.

What we can all agree on here is that AMD has a lot of work to do on cache latencies and IMC speeds... oh wait, there is Steamroller: 37% latency reductions and a 40% faster IMC, as the slides say. Let's just wait and see. There are already leaks of Kaveri almost doubling Trinity, and other leaks say that x86 single-threaded performance is now actually respectable compared to Intel's sub-$200 offerings, which is fair enough.
 
^^ that looks more believable.

Tom's Crysis 3 CPU scaling bench:
http://www.tomshardware.com/reviews/crysis-3-performance-benchmark-gaming,3451-8.html

AMD Begins to Send Software Developers Next-Generation APUs.
AMD Starts to Provide Its Next-Gen Fusion APUs to Software Developers
http://www.xbitlabs.com/news/cpu/display/20130304190201_AMD_Begins_to_Send_Software_Developers_Next_Generation_APUs.html

AMD Redoing Radeon HD 7990 Under New Codename - "Malta"
http://www.techpowerup.com/180923/AMD-Redoing-Radeon-HD-7990-Under-New-Codename-quot-Malta-quot-.html

 


37%. I'm trying to work off a smartphone, no computer, and it's killing me.



[Image: Screen Shot 2012-08-28 at 4.38.09 PM_575px.png]


25% fewer mispredicts, which is about within margin of error of Intel's; I-cache misses reduced by 25% is also a big number generation to generation. 5-10% faster scheduling and 30% more ops.

It is very hard to predict how it will scale in the real world, but it will nonetheless be faster than Vishera by some margin.

[Image: Screen Shot 2012-08-28 at 4.38.14 PM_575px.png]


AMD has been vague on the 15% performance improvement. If it applies to general computing across the board, then Steamroller will represent a massive step forward; if it gets more, that's a bonus. It will still be slower in x86 than Intel, but that's expected when Intel pumps 10 times the resources into its microarchitecture. On the Kaveri side, some leaks suggest about double Trinity's performance, which is notable considering Trinity can still play many games not included in benchmark suites at 16x10 and 19x10 at maxed or medium-high settings. While it's not suited to Crysis 3 and the like, what is certain is that AMD's iGPU solution is improving and Dual Graphics is improving, so I do agree with what AMD said: they are only a couple of generations away from mainstream entry-level gaming graphics on a chip.
 

Cazalan

Distinguished
Sep 4, 2011
With AMD's luck, the switch from GF to TSMC will give the IPC gains they wanted but the process will limit them to ~3 GHz. The fastest TSMC 28nm chip I found was a 3.1 GHz ARM A9 test chip, a die that's tiny in comparison to the larger A10-style APUs.
 


AMD should guard against the "smaller is better" myopia; a simple die shrink doesn't necessarily equate to improved performance. IB was not much of an evolutionary step, and Haswell, from what has been seen, is hardly much more, despite the 32nm-to-22nm jump. What Intel has done since Nehalem is improve the IMC significantly, which has probably yielded the biggest benefit to IPC by affecting general performance the most, rather than a simple 10nm shrink. AMD's position is somewhat different: their fabs are outsourced, and limited resources prevent them from aggressively jumping process nodes. Also, AMD cannot go gung-ho to 22nm only to find that it's not the lithography but rather the metal-level architecture that's at fault.


 

noob2222

Distinguished
Nov 19, 2007

I love how TR tries to twist the actual numbers around. First off, they used Skyrim, AMD's worst game in their benches.

Intel 3570K pre-encoding: 104 FPS
AMD 8350 pre-encoding: 77 FPS

With encoding: Intel 82 FPS, AMD 62 FPS

Both dropped by ~20%, so what does it prove? Nothing really. Pretty much every one of the CPUs dropped close to 20% until you look at the dual-core CPUs: the Pentium G2120 and i5 655 were hit the hardest, going from 78 FPS to 37. Conclusion: dual cores are not for multitasking at all.
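
The drop percentages are easy to verify. A quick Python sketch, using only the FPS figures quoted above (the helper name `percent_drop` and the dictionary labels are just for illustration):

```python
def percent_drop(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (FPS without encoding, FPS while encoding), as quoted in the post.
results = {
    "Intel 3570K": (104, 82),
    "AMD FX-8350": (77, 62),
    "Dual cores (G2120 / i5 655)": (78, 37),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_drop(before, after):.1f}% drop")
```

That works out to about 21.2% for the 3570K, 19.5% for the 8350, and 52.6% for the dual cores, which lines up with the "~20% for both, dual cores hit hardest" reading.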

Other than that, take a small sample from a program that already heavily favors Intel, and it still favors Intel if you try other things.
 
Noob, it's because if they didn't bench-market Intel, they would buy into it, and then the 10 billion USD R&D fund would look like it's being used to line pockets and hold office parties, because the 10:1 R&D differential between Intel and AMD is clearly not showing.

Intel shot themselves in the foot by shifting Haswell to a new socket. End users were already irate when IB, a sidegrade to SB, came out with a new chipset and negligible performance gains despite the faster IMC and the goodies thrown onto the new chipsets. I feel they are going to take flak for releasing a new socket for a same-process part which, ironically, is another sidegrade made more expensive by the platform change. Then there is the iGPU: many sane reviewers told people to hold their horses and not expect a golden lollipop, and the hype over GT is going to hit the ground hard when it barely manages to compete with Llano. The ultimate realisation here is that Intel has its fails as well.
 