AMD CPU speculation... and expert conjecture

Page 387
Status
Not open for further replies.


I know CPU-Z picks it up right. The only thing I can think of is that I had an abandoned OC a while back. While the CPU should be at stock now, a few mobo settings are off defaults. That really shouldn't be causing the ID to be wrong though... I'll toy around a little tonight and see if I can figure out what's going on with that.

I'm more interested in the huge jump in performance between the 64 and 32 bit builds on the CPU tests. I mean, I'd expect something, but I saw a 33-45% jump, just by switching to a 64-bit binary. That seems excessive...
 
http://www.pcper.com/news/General-Tech/AMD-A10-7850K-and-A10-7700K-Kaveri-Leaks-Including-Initial-GPU-Benchmarks

But how does this compare? The original source (prohardver.hu) claims that Kaveri will achieve an average 28 FPS in Crysis 3 on low at 1680x1050; this is a 12% increase over Richland. It also achieved an average 53 FPS with Sleeping Dogs on Medium which is 26% more than Richland.
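To put those percentages in absolute terms, here is a rough sketch that backs out the Richland frame rates implied by the quoted Kaveri numbers. The function name is made up for illustration, and it assumes each percentage is a simple uplift over the Richland average at the same settings, which the source does not spell out.

```python
def implied_baseline(new_fps: float, uplift_percent: float) -> float:
    """Return the baseline FPS implied by a new FPS and a percentage uplift."""
    return new_fps / (1.0 + uplift_percent / 100.0)


# Figures quoted from the prohardver.hu leak.
crysis3_richland = implied_baseline(28, 12)        # roughly 25 FPS
sleeping_dogs_richland = implied_baseline(53, 26)  # roughly 42 FPS

print(f"Crysis 3 (low): Richland ~{crysis3_richland:.1f} FPS, Kaveri 28 FPS")
print(f"Sleeping Dogs (medium): Richland ~{sleeping_dogs_richland:.1f} FPS, Kaveri 53 FPS")
```

So the gap works out to about 3 FPS in Crysis 3 and about 11 FPS in Sleeping Dogs, if the leaked averages hold.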

My take here is that Sleeping Dogs is much less stressful on the GPU; AA, for instance, takes a VERY heavy performance hit in SD, which is why it gets a good speedup when using Medium settings. Crysis 3, being much more stressful, chokes the GPU to death.
 


Doesn't make sense. I'm running an FX-8350 @ 4.8GHz and 16GB of DDR3-1600, yet you're getting 2x the memory performance. And the benchmark seems to place an inordinate amount of weight on memory performance for pretty much everything. Also, the 770 is showing higher than a 780 Hydro; Hydros are factory overclocked higher than regular 780s, as they expect you to be using a decent WC setup. I'm gonna dig around; I'm thinking something is up with the memory transfer test. I know AMD's IMC is weaker than Intel's, but it's not that weak, especially seeing as mine is OC'd to 2.2GHz.
 


There are 2 things that increase IPC: one is just higher frequency and the other is more Clocks Per Cycle. But yes, the Richland CPU is faster in single core execution than the FX-8350 CPU. They are both the exact same arch, but Richland just has higher frequency... hey, you're missing some very basic info.

Just like Piledriver is about 15% faster than Bulldozer, with 8% more frequency and 7% more Clocks Per Cycle; it's just part of the raw performance which adds to the IPC.

More IPC means more single core/single thread performance, not very hard to understand.

And don't try to twist it this time... I am not saying Richland is stronger in overall performance. The FX-8350 has more than double the multithreaded performance of Richland, but since Richland has more frequency it beats the FX-8350 in single core performance... just making it clear.
 
It helps when you close Firefox and all the other background stuff I have running.

OK, reposted new results, and surprisingly I did better. I was running Core Temp and Nvidia's GPU widget and actually paid attention during the benchmark. Integer and FPU are spot on. Graphics performance is trash; Java OpenGL isn't the best language for doing these in. Geometry is accurate, but texture and pixel are both severely single-thread CPU bound. My one 780 was at 20~25% tops during those benchmarks while a single core was at 100%. I suggest a rewrite of those metrics, as there is entirely too much CPU overhead involved with each repetition. Java OpenGL has overhead, but not that much. Also found out why the memory transfer was weird: it's a test of your cache subsystem, which we can all agree Intel dominates in. I don't think Java lets you write code that would bypass the CPU's caching to actually test the IMC.

So for that benchmark, CPU performance is a pretty solid test. Graphics is mostly a test of your CPU's single thread ability, and memory is a test of your caching subsystem. Pretty good as long as people know what each thing does. Also, I was using the latest 64-bit Java, 1.7_45.
 


Blah, forgot my 770 is factory OC'd. Still, I'd imagine a factory OC'd 780 would still win, unless temps/throttling are REALLY coming into play.



That would do it. I'm just going to say though, for the GPU tests, a single core on my 2600k never got above ~20% or so, if memory serves.

Still, if anything, this kinda shows how hard it is to write an unbiased benchmark at the OS level. What it looks like is happening is that the CPU side of the house (driver side) is actually causing my OC'd 770 to perform faster than Palladin's OC'd 780, even in GPU based testing. As for the memory test, well, as Palladin stated, AMD's cache issues are going to KILL them, and there really isn't much you can do to get around this as far as testing goes.
 
The 2700k still looking strong, heh. I'll do a little cheating tonight and have some fun (without hacking the code, that is) with the benchmark 😛

And in regards to Kaveri... Well, everything suggests that it *should* be better clock for clock than any Phenom II.

Also, I'm more curious about mobile parts this time around to be honest. I think Intel has a major advantage with Haswell there in terms of power consumption (not TDP, mind you). Hope they do fine there as well.

Cheers!
 


That is standard marketing practice. Can't fault AMD for that when everyone does it.
 


Hmm, it might be. AMD wasn't clear about 14nm or 20nm, and GloFo has been far from clear.

Perhaps what happened was GloFo bailed on 22nm PD-SOI in exchange for going 20nm FD-SOI? All I know of 22nm PD-SOI was that GloFo and IBM were speaking of working on it together around the 2009-2011 time frame. Then, GloFo suddenly dropped any mention of it and now IBM is the only one with 22nm PD-SOI.

Whatever happened I do feel like it's the reason why we have 28nm bulk SteamrollerB instead of SteamrollerA. I'm presuming SteamrollerA was meant for 22nm PD-SOI and AMD was hinting at FX SR when SR was still a 22nm PD-SOI part.

22nm would have had a chance of making a 4m/8c part profitable.
You can see an optimistic die size estimate with this formula:

22^2 / 32^2 * 315 = 149mm^2.

So I would have expected a 22nm PD-SOI 4m/8c dCPU to come in around 150 to 200mm^2. If AMD sold that for $150 as a 3m/6c part, $200 as a low end 4m/8c part and $250 or $300 as a high end 4m/8c part, it would probably have worked out much better than 32nm PD, as long as 22nm die sizes weren't that big.

If 20nm ends up as FD-SOI, a 4m/8c SR part would come in around 123 to 175mm^2. Which, quite honestly, would provide decent margins.
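For anyone who wants to play with the numbers, here is a minimal sketch of the scaling estimate used above. It assumes ideal area scaling with the square of the node name, which is optimistic since real processes do not shrink every structure equally; the function name is just for illustration, and the 315mm^2 figure is the 32nm die size used in the formula above.

```python
def scaled_die_area(area_mm2: float, old_node_nm: float, new_node_nm: float) -> float:
    """Ideal die area after a shrink, assuming area scales with (node)^2."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


BASE_AREA_MM2 = 315  # 32nm 4m/8c die size used in the estimate above

for node_nm in (22, 20):
    area = scaled_die_area(BASE_AREA_MM2, 32, node_nm)
    print(f"{node_nm}nm: about {area:.0f}mm^2 (best case)")
```

The results are the low ends of the 150 to 200mm^2 and 123 to 175mm^2 ranges; the upper ends allow for parts of the die that do not shrink well.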

AMD is going to need it. Their prime case that "Kaveri is good enough CPU" has been that it plays it just fine in single player. You know, the same single player where people run around on youtube going "look at my Pentium G it has better single thread than AMD and it runs BF3 FINE look at me walk through this corridor when I'm the only one in it! SEE 60FPS PENTIUM G IS ALL YOU NEED!"

Mantle will at least alleviate some of the CPU bottleneck that's going to happen in BF4 multiplayer, but it's still going to want a lot of power. Ideally that's a lot of strong cores (think overclocked Intel 6 core), less ideally fewer strong cores, and almost equally less ideally many weak cores. 2m/4c parts and 2c/4t Intels will still get humiliated in large maps.

If you guys can give me a month or so, I can run some tests. I am gonna open up a website soon (I'm writing the CMS from the ground up, so it's taking some time) and I have access to

1ghz P3, 1.7ghz P4, 2.1ghz northwood, 3.2ghz prescott, 2.9ghz or less Opteron 165, Athlon 3800+, 1.6ghz Core 2 Duo laptop, Core i7 920, and FX 8350.

Also for graphics I have a Rage 128 Pro, 9700 Pro, 9700 AIW (but I think it died), X800, GeForce 4 MX 440, 8800 GTS 320MB, two 4870s, a 7970, and probably more 7000 series cards for cryptocurrency mining.

I can run tests soon, but I am waiting on a PSU from Newegg. I only have a spare working 300W one, and it starts to squeal and buzz at me when I hook it up to anything besides the 2.1ghz P4 system. And my main rig is a watercooled box of spaghetti wires and tubing, so I'm not touching it unless I have to.

 


IPC stands for instructions per clock, not single core performance. If anything, CPI, or cycles per instruction (not CPC), is the most important measure of performance per clock: if one processor has to take 2 cycles to do the same work that another processor can do in 1, it's going to be slower unless its clock speed is high enough to overcome that. So again, a given design clocked at 20MHz has the same IPC as the same design clocked at 6GHz.

Richland does have a 2.5% clock speed advantage (or 5% turbo advantage) compared to the FX-8350, but it's missing 8MB of L3 cache, which is probably lowering performance by 3-5% on average and, to use an "up to" statement, by as much as 10% in some cases.
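A quick sketch of that relationship, treating single thread throughput as IPC times clock frequency. This is a simplified model with made-up IPC values purely for illustration; real IPC varies with the workload, and the function names are hypothetical.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Simplified model: throughput = IPC * clock frequency."""
    return ipc * clock_hz


def cpi(ipc: float) -> float:
    """Cycles per instruction is just the reciprocal of IPC."""
    return 1.0 / ipc


# Same design at two clocks: IPC is unchanged, only throughput scales.
ipc_same_design = 1.0  # hypothetical value
slow = instructions_per_second(ipc_same_design, 20e6)  # 20MHz
fast = instructions_per_second(ipc_same_design, 6e9)   # 6GHz
print(f"throughput ratio at equal IPC: {fast / slow:.0f}x")

# A chip needing 2 cycles per instruction (IPC 0.5) must run at twice
# the clock of a 1-cycle chip (IPC 1.0) just to break even.
chip_a = instructions_per_second(0.5, 4.0e9)
chip_b = instructions_per_second(1.0, 2.0e9)
print(f"chip A: {chip_a:.2e} inst/s, chip B: {chip_b:.2e} inst/s")
```

In this model a higher clock raises single thread performance without changing IPC at all, which is the distinction being argued over.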

 


I understand that, but people should not listen to their "up to" statements and then expect everything to be 30% faster when on average it isn't.
 




Thanks for the comments. Palladin, I can't explain your findings on the GPU tests. For example, in the shader test the only thing that changes is the number of iterations of a loop in a shader - the geometry and draw calls are the same, so there is no reason why CPU would climb to 100%. In fact as load on the GPU goes up, the CPU should have less load as it has to do its small part less often. CPU should go up to 100% briefly when the shader is compiled, and it is possible the compilation might take longer in later tests (if the compiler 'unrolls' the loop).

Going by the number of CUDA cores in your 780 and my 430, I calculate you should be seeing roughly 2-3 times more performance in the shader test.

Perhaps the issue is related to the dual-GPU configuration, or possibly to the fact that the code uses OpenGL 2.0 (for portability) and your GPU driver is doing some CPU-intensive backwards compatibility stuff. It may also be an effect that only appears for very powerful GPUs which I can't see on my more modest systems.

You are right that the memory test is primarily a test of the caches - I try to overwhelm the caches by swapping blocks up to 32MB in size, but cache performance still seems to predominate.

That would do it. I'm just going to say though, for the GPU tests, a single core on my 2600k never got above ~20% or so, if memory serves.

Well, that is what I expect to happen, gamerk - this suggests we are seeing an effect specific to running on Palladin's machine.
 


At the basic level one can assume that GF did what's in their best interest, and AMD did what's in their best interest. With AMD being a small fry and now fully divested of any GF stock, they are now at the same level as any of GF's other customers.

GF wants to win high volume contracts from Qualcomm and others for relatively small (higher yield), low power SoCs. This puts SOI on the back burner, as the costs are rocketing up per node. They simply can't afford to do both at the same time. AMD is only buying around 1 billion in parts a year now. If the fab costs 5B then GF would never get their money back, much less make a profit.

We know GF is a second source for IBM 32nm parts but we don't know if that follows through for 22nm. It's unlikely given how quickly they want to bring up 14nm.

Recall NVidia made a big stink about the challenges of the new nodes and called them "essentially worthless" due to the increased costs. Everyone wants to have their cake and eat it too. Damn the physics.

http://www.extremetech.com/computing/123529-nvidia-deeply-unhappy-with-tsmc-claims-22nm-essentially-worthless
 

The integrated graphics processing unit in the APUs is being called R7 series. There is no graphics card in the APUs. The APU die (the square chunk of wafer that sits under the metal heat-spreader) has the GPU fabricated alongside the CPU cores.
 


I ran the test, but didn't save it. My memory scores 15.1 on my 8120 (4.7ghz) @ 2133 8gb so there is something very odd if your memory is running at 2200 on the 8350. Everything else cpu wise is 1-2 pts lower.
 


Richland also officially supports DDR3-2133 so that's an advantage over the FX-8350 which officially supports DDR3-1866.

Granted, people overclock memory already, and that's why people didn't see huge gains from Trinity->Richland if they were already using DDR3-2133.

If Kaveri officially supports DDR3-2400 that's another potential 12% gain. Which helps push those "up to" numbers higher when dealing with any memory bound benchmark.
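Here is a rough sketch of where that 12% comes from, using theoretical peak bandwidth only. It assumes a dual channel setup with 64-bit (8 byte) channels and uses the nominal DDR3 speed grades as transfer rates; the names are just for illustration, and real benchmarks will not scale this cleanly.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel
CHANNELS = 2            # assumption: dual channel


def peak_bandwidth_gbs(mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s for the assumed configuration."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * CHANNELS / 1000.0


def percent_gain(old_mts: float, new_mts: float) -> float:
    """Percentage increase in transfer rate going from old to new."""
    return (new_mts / old_mts - 1.0) * 100.0


for grade in (1866, 2133, 2400):
    print(f"DDR3-{grade}: {peak_bandwidth_gbs(grade):.1f} GB/s peak")

print(f"1866 -> 2133: +{percent_gain(1866, 2133):.1f}%")
print(f"2133 -> 2400: +{percent_gain(2133, 2400):.1f}%")
```

That is an upper bound for purely memory bound work; anything that mostly lives in cache will see far less.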
 


With this description, your definition of IPC is instructions per core and not instructions per clock.

Instructions per clock aren't affected by clock speed, only instructions per core (aka single threaded performance).

Second, Passmark is to be taken lightly, as sometimes the numbers don't make much sense. Look at the single thread performance of the Athlon x2 370k: 1714 @ 2.2 ghz vs the 6800k scoring 1568 @ 4.2 ghz?

Passmark CPU numbers are very basic, testing integer and FP only and ignoring memory and overall performance.

Once you move away from synthetic testing to RL results, the 6800k @ 4.2 ghz can barely hang with the 4300 @ 3.8 ghz. Quite the opposite of what Passmark suggests, 1568 vs 1421.
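A quick sanity check on those scores is to divide each by the listed clock speed. This is only a sketch: score per GHz is a crude stand-in for per-clock performance, and it uses the scores and clocks exactly as quoted above, ignoring turbo behaviour.

```python
# (Passmark single thread score, clock in GHz) as quoted above.
quoted_scores = {
    "Athlon x2 370k": (1714, 2.2),
    "A10-6800k": (1568, 4.2),
    "FX-4300": (1421, 3.8),
}


def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Crude per-clock figure: benchmark score divided by clock speed."""
    return score / clock_ghz


per_clock = {name: score_per_ghz(s, c) for name, (s, c) in quoted_scores.items()}

for name, value in per_clock.items():
    print(f"{name}: {value:.0f} points per GHz")
```

On these numbers the 6800k and 4300 land almost on top of each other per GHz, while the 370k entry comes out at roughly double, which is the kind of result that makes the listing look suspect.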

http://www.guru3d.com/articles_pages/amd_a10_6800k_review_apu,18.html
http://www.ultimatehardware.net/amd/amd_a10_6800_vs_amd_fx_4300.htm

The funny thing about the APU is that very few reviews pit it against a "CPU" such as the 43xx, and generally only against the i3. Kaveri will be the same, shown against the 6800k and not the FX line.
 


If you meant GPU, gamerk posted a link earlier.

http://prohardver.hu/teszt/mit_tudhat_a_kaveri_gpu-ja/teszteles.html

Faster than the 7750 GDDR3, slower than the 7750 GDDR5, probably close to the 7730 GDDR5.

The CPU side is all just guessing right now. I'm guessing 5-10% faster than the 6800k @ stock settings. The lower clock speed is going to hurt overall expectations.
 