AMD CPU speculation... and expert conjecture

That 58% should catapult the 290X's P/P ratio over the Titans and the 780 Ti in SLI, right? I wonder how Mantle will handle the frame pacing issue. Since it's a whole new driver component, it should be included, right?

Cheers!
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


If you want to see a modern x86 server CPU, then check here: http://www.spec.org/cpu2006/results/rint2006.html

The E5-2470 v2 @ 2.40 GHz has a SPECint_rate of 364.

How long until someone slaps 36 ARMv8 cores together to match that? ;)
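Rough back-of-the-envelope on that quip, assuming you just split the quoted score evenly across cores (illustrative only, no real ARMv8 numbers involved):

Code:
#include <cstdio>

// Toy calculation: per-core SPECint_rate share each of the 36 hypothetical
// ARMv8 cores would need to contribute to match the quoted Xeon result.
int main() {
    const double xeon_specint_rate = 364.0;  // E5-2470 v2 result quoted above
    const double arm_cores = 36.0;           // the hypothetical ARMv8 chip
    std::printf("~%.1f SPECint_rate per ARM core needed\n",
                xeon_specint_rate / arm_cores);   // ~10.1
    return 0;
}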
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


By enabling Mantle for Intel, what did they just do to the i3?
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


Mantle numbers are going to create absolute chaos with the whole "yo bro I got dis sweet gaymen PC it's Intel + Nvidia + Corsair with water cooling bro!" crowd. It's going to make a lot of people switch from Nvidia to AMD, people who would never consider AMD CPUs in the first place. Because let's be honest, the Core i3 is not cost effective at any price point compared to AMD's offerings for gaming, and if you go i3 or (god forbid) Pentium/Celeron, you're drinking Intel Kool-Aid. And it's been my experience lurking in forums that newbs to the enthusiast community absolutely love the Nvidia/Intel/Corsair combination. Seeing Radeon get a 50% boost in a high-end review on an Intel hex-core is going to be a massive blow to Nvidia.

Here's a review with the 290X running on a 3960X:
http://www.guru3d.com/articles_pages/gigabyte_radeon_r9_290x_review,20.html
Multiply those scores by 1.58 (or, if you're conservative, 1.40) and see what happens.

If you're too lazy to do the math, that puts the 290X at 107.44 FPS (optimistically).
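For anyone checking the arithmetic: the 107.44 figure implies a 68 FPS DX11 baseline (107.44 / 1.58 = 68), so purely as a sketch with those assumed numbers:

Code:
#include <cstdio>

// Hypothetical Mantle scaling estimate. The 68 FPS baseline is implied by the
// 107.44 figure above; 1.58 and 1.40 are the optimistic/conservative factors.
int main() {
    const double dx11_fps = 68.0;
    std::printf("optimistic (x1.58):   %.2f FPS\n", dx11_fps * 1.58);  // 107.44
    std::printf("conservative (x1.40): %.2f FPS\n", dx11_fps * 1.40);  // 95.20
    return 0;
}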

If someone is looking to play Mantle-enabled games, AMD is going to be a clear winner. And if a game isn't Mantle-enabled, you're not going to end up with a horribly slow card either.

@gamerK, I know that you need to emulate a lot of chips, but I'm referring to what I thought were the largest bottlenecks. Generally, from what I understand, the major bottleneck is usually the CPU emulation itself. The other pieces usually aren't that bad, but I do know audio can be a problem as well.

Mantle's threading of rendering is all abstracted for the coder. At the very least I was thinking it would alleviate a bottleneck. Dolphin has tried things like a GPU-accelerated texture decoder. Emulation has a lot of things that depend on syncing and single-threaded performance, but IMO any chance you get to spread the load around should be used. Switching from DX11 to Mantle should at least help move towards figuring things out and getting things to scale better.

Historically, the developers who get applications to scale to multiple cores have been the most successful ones.
 
@gamerK, I know that you need to emulate a lot of chips, but I'm referring to what I thought were the largest bottlenecks. Generally, from what I understand, the major bottleneck is usually the CPU emulation itself. The other pieces usually aren't that bad, but I do know audio can be a problem as well.

CPU emulation for most of the simpler emulators is actually quite simple; you take a group of guest instructions that serves some purpose and map it onto equivalent x86 instructions. Given the somewhat limited instruction sets these CPUs have (most being RISC CPUs, after all), this is more or less a cheap operation in the grand scheme of things.

Granted, running pure interpreter mode for some of these CPUs [PS2 in particular; try running PCSX2 in interpreter mode and enjoy 2 FPS] is a killer, but HLE is often good enough. Synchronization kills you, and there's nothing you can do about it. That's why a Pentium 3 can handle N64 emulation, but you need a top-tier Pentium 4 to run BSNES at full speed; that extra precision kills performance.
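To make the interpreter point concrete, here's a minimal sketch of a fetch/decode/dispatch loop for a made-up RISC-style guest CPU (the encoding and opcodes are invented for illustration, not any real console ISA). Every guest instruction pays the full dispatch cost on the host, which is why pure interpretation is so much slower than recompilation or HLE:

Code:
#include <cstdint>
#include <vector>

// Toy interpreter for a fictional RISC-like guest CPU. A recompiler would
// instead translate whole blocks of guest code into native x86 once and
// re-run the translation, skipping the per-instruction decode cost.
struct GuestCPU {
    uint32_t regs[32] = {};
    uint32_t pc = 0;
    std::vector<uint32_t> mem;                // guest memory, one word per slot

    void step() {
        const uint32_t insn = mem[pc / 4];    // fetch
        const uint32_t op = insn >> 26;       // decode (made-up encoding)
        const uint32_t rd = (insn >> 21) & 31;
        const uint32_t rs = (insn >> 16) & 31;
        const uint32_t rt = (insn >> 11) & 31;
        switch (op) {                         // dispatch + execute
            case 0: regs[rd] = regs[rs] + regs[rt]; break;   // ADD
            case 1: regs[rd] = regs[rs] - regs[rt]; break;   // SUB
            case 2: regs[rd] = mem[regs[rs] / 4]; break;     // LOAD
            default: break;                                  // unimplemented
        }
        pc += 4;
    }
};

int main() {
    GuestCPU cpu;
    cpu.mem = {0u};   // a single ADD r0, r0, r0 as a smoke test
    cpu.step();
    return 0;
}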
 


If you consider the IGP to be the most important part of the APU, then yes, AMD is ahead of Intel. But Intel has a process lead and uses less power. An i7-4770K beats the FX-8350 in the majority of CPU applications, has decent integrated graphics for entry-level gaming, and uses half the power.

Power is a major factor that people leave out. I wouldn't buy an APU for gaming; I have a desktop for that, and soon I might be able to just stream games via Steam on my home network, so my HTPC wouldn't need anything more than what Intel offers, just a good wireless or wired connection.

Intel is smarter than people think. They do more than CPUs and have been a driving force in the advancement of many technologies people assume came from other companies: PCIe, SATA, USB and NICs, for example. They have been around long enough to see where markets will go, and while they were a bit late to the UMD party, they have the resources to jump in and push hard.
 
Granted, running pure interpreter mode for some of these CPUs [PS2 in particular; try running PCSX2 in interpreter mode and enjoy 2 FPS] is a killer, but HLE is often good enough. Synchronization kills you, and there's nothing you can do about it. That's why a Pentium 3 can handle N64 emulation, but you need a top-tier Pentium 4 to run BSNES at full speed; that extra precision kills performance.

That's because BSNES sucks. The guys at ZSNES did it best, though they had to use ASM. It was able to run the SNES at full emulated speed on a K6-2 400. People using HLLs to write emulators is what kills performance more than anything; to get any sort of real speed you need to code the HW emulation components in ASM.

The sync issue really depends on the HW and the system being emulated; the newer-generation systems are actually less sensitive to sync issues due to their slightly abstracted nature. It's still going to be a really large PITA though. You're never going to run fully parallel emulation, just ain't gonna happen. What can be done is have the primary thread do the HW sync and offload individual chip emulation to its own thread. You end up with a ton of locks and lose raw efficiency, but you gain scalability. The gains are different for each platform and might be more trouble than they're worth.
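A very rough sketch of that "sync point plus one thread per chip" idea, assuming a simple barrier stands in for the primary thread's HW sync (a real emulator's scheduling is far hairier, and the lock traffic is exactly where the efficiency loss comes from):

Code:
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Toy model: each emulated chip runs its timeslice on its own thread, then
// everyone meets at a sync point before the next slice. Scalability goes up,
// per-thread efficiency goes down because of the locking.
struct SyncPoint {
    std::mutex m;
    std::condition_variable cv;
    int waiting = 0, generation = 0;
    const int count;
    explicit SyncPoint(int n) : count(n) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m);
        const int gen = generation;
        if (++waiting == count) { waiting = 0; ++generation; cv.notify_all(); }
        else cv.wait(lk, [&] { return gen != generation; });
    }
};

void run_chip(const char* name, int slices, SyncPoint& sync) {
    for (int s = 0; s < slices; ++s) {
        // ... emulate this chip for one timeslice (real work elided) ...
        std::printf("%s finished slice %d\n", name, s);
        sync.arrive_and_wait();   // wait for every other chip before moving on
    }
}

int main() {
    const int slices = 3;
    SyncPoint sync(3);            // three emulated chips in this toy example
    std::vector<std::thread> chips;
    chips.emplace_back(run_chip, "cpu",   slices, std::ref(sync));
    chips.emplace_back(run_chip, "gpu",   slices, std::ref(sync));
    chips.emplace_back(run_chip, "audio", slices, std::ref(sync));
    for (auto& t : chips) t.join();
    return 0;
}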
 
Power is a major factor that people leave out. I wouldn't buy an APU for gaming; I have a desktop for that, and soon I might be able to just stream games via Steam on my home network, so my HTPC wouldn't need anything more than what Intel offers, just a good wireless or wired connection.

That will never work well. Networks aren't magical data providers; there are very real limits to what you can put on them. 1920x1080x32 comes to 66,355,200 bits per frame. 60 FPS would require about 3797 Mbps of bandwidth, before 8b/10b line encoding happens. Inline dynamic compression forces quality to take a large hit (think YouTube video): it essentially downscales the picture and upscales it at the far end, forcing a loss of information. And that's just layer 4+; at layer 3 your data is being packetized, with latency being introduced. You want to talk about "input lag".
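The raw-bitrate math spelled out (the 3797 figure works out in binary megabits; this is the uncompressed stream, which is exactly why streaming leans on heavy compression):

Code:
#include <cstdio>

// Uncompressed 1080p32 @ 60 FPS bitrate, plus the 8b/10b line-encoding
// overhead mentioned above. "Mbps" here is binary (divide by 1024*1024),
// matching the 3797 figure in the post.
int main() {
    const double bits_per_frame = 1920.0 * 1080.0 * 32.0;               // 66,355,200
    const double raw_mbps = bits_per_frame * 60.0 / (1024.0 * 1024.0);  // ~3797
    std::printf("raw:          %.0f Mbps\n", raw_mbps);
    std::printf("after 8b/10b: %.0f Mbps\n", raw_mbps * 10.0 / 8.0);    // ~4746
    return 0;
}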

We have several systems here that digitize video signals and send them over IP to other sites; it works well enough for communication but absolutely would not work for HD gaming.

In the "power" department APUs beat any dCPU + dGPU combo. 45W for what the 7600 puts out is insane. It's why I recommend APUs for SFF / power- and space-limited setups; that's where they make sense. Once you've got a large, flexible power/space budget, a dCPU + dGPU works better, since you can afford larger cooling surfaces.
 

Lessthannil

Honorable
Oct 14, 2013
468
0
10,860
I wouldn't take the advertised TDPs as what they pull under load, if that was what you were getting at.

http://www.bit-tech.net/hardware/cpus/2014/01/14/amd-a8-7600-kaveri-review/12

It would have been really awesome if it only pulled 45W under load, but that seems way too good to be true. Not even Intel's 22nm + Haswell can do that unless you disable cores, downclock and undervolt. However, the power consumption is great considering that this is a revision of the marred Bulldozer, is on 28nm, and has a relatively big GPU on it.

Anyway, the GHz race was a race to nowhere, and it was going against the grain of a market that is looking for smaller, more power-efficient chips that don't sacrifice performance.
 


That's because people don't understand what TDP means. TDP is the expected amount of heat that will need to be removed in order for the CPU to keep operating. That heat is nothing but electric power converted via ohmic heating, so waste heat is, to a very good approximation, power consumption: a CPU that generates 45W of waste heat has consumed essentially 45W of electricity.

However, for the power consumption tests we re-enable everything in order to gauge real-world power draw. The power draw is measured via a power meter at the wall, so the numbers below represent the total system power draw from the mains, not the power consumption of a CPU itself. Measuring the power draw of any individual component in a PC is tricky to impossible to achieve.

CPUs require electricity to work; they require that electricity at a certain voltage and will pull as much current as they need. It's the motherboard's and PSU's job to convert AC power into useful DC power to feed the system components. No conversion is 100% efficient unless you're in a superconducting environment, so you get inefficiencies along the way, from the initial AC-DC conversion to the various DC-DC conversions after it. Also, PSUs are often sold based on maximum draw, but that's not their most efficient operating point. A PSU needs to operate at between 25 and 75% of its rated draw to be most efficient; some are even tighter, at 35~60% of rated draw. Then you have the motherboard circuitry, chips, buses, timings and so on / so forth. All of that contributes to power draw, especially when load goes up.

If you look at that chart with this in mind you can actually see it in action. The system with the 45W 7600 has a total idle draw of 46W. At full load it becomes 90W, a 44W difference, which matches its stated 45W TDP. Most of that draw would be attributed to the iGPU unless you were OCing the CPU.

Another thing to remember about TDP is that it's not an absolute limit; it's just a guideline for how much waste heat to expect to exhaust in most scenarios. It's also a useful guide to the amount of power a particular part will draw from the PSU. People like to throw 450W PSUs onto everything, but they are horribly inefficient for powering mini-ITX low-power systems. You can use a 90W pico-PSU to power a system with a 45W 7600 and still have headroom for memory / MB / HDD without exceeding 80% of the PSU's capacity.
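A quick sketch of that reading of the bit-tech chart (the 46W idle / 90W load figures are from the linked page; the ~20W I assume for memory/board/drive in the pico-PSU check is a guess for illustration, not a measurement):

Code:
#include <cstdio>

int main() {
    // Wall-socket figures from the linked bit-tech chart (whole system).
    const double idle_w = 46.0, load_w = 90.0;
    std::printf("load - idle = %.0f W (vs. the 45W TDP rating)\n", load_w - idle_w);

    // Pico-PSU headroom check: 90W unit, 45W APU, assumed ~20W for
    // memory / board / HDD (illustrative guess).
    const double psu_w = 90.0, apu_w = 45.0, rest_w = 20.0;
    std::printf("estimated load = %.0f%% of PSU capacity\n",
                100.0 * (apu_w + rest_w) / psu_w);   // ~72%, under the 80% mark
    return 0;
}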
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


MANTLE's aim is to eliminate the CPU bottleneck generated by the bloatware layer of DX. If you are testing a top Intel Extreme chip with a low-end graphics card, you are in a GPU-bottleneck situation and MANTLE provides only single-digit gains.

However, if you test an Intel Extreme chip with a pair of 290Xs, then you are back in a CPU-bottleneck situation and MANTLE provides double-digit gains: BF4 with MANTLE is 58% faster for an i7-3970X Extreme with two R9 290X cards.

Your predictions about MANTLE are destroyed once again. I myself predicted MANTLE would bring 30--50% gains to BF4. Kaveri with an AMD Radeon™ R9 290X falls in the middle with 41%. The 58% for an i7-3970X Extreme with two R9 290X cards is ~8 points on top of my maximum predicted value. That 58% destroys your "almost nothing on the high end" prediction.

Finally, I believe that you also missed this pre-built Kaveri-plus-a-290 system:

http://www.tomshardware.com/news/cyberpowerpc-zeus-mini-kavari-haswell-pc-gaming,25890.html
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


ARM is a high-efficiency design. It provides great performance within a given power constraint.

Performance = efficiency x power

For a 1--2W power constraint, ARM provides more performance than x86 consuming the same 1--2W. We see now that a 25W ARM part provides about 2.8 times more performance than Jaguar x86 cores (22W). Future 50--100W ARM chips will provide more performance than similarly rated x86 chips.
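Plugging the quoted numbers into that relation, taking the "2.8x at 25W vs 22W Jaguar" claim at face value (so this is only as good as that claim):

Code:
#include <cstdio>

// performance = efficiency * power  =>  efficiency = performance / power.
// Normalize the 22W Jaguar part to performance 1.0; all numbers are relative.
int main() {
    const double jaguar_perf = 1.0, jaguar_w = 22.0;
    const double arm_perf    = 2.8, arm_w    = 25.0;   // the "about 2.8x" claim
    const double eff_ratio = (arm_perf / arm_w) / (jaguar_perf / jaguar_w);
    std::printf("implied perf/W advantage: %.2fx\n", eff_ratio);   // ~2.46x
    return 0;
}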

I would take those Wiki numbers with a grain of salt, especially when they put the Piledriver FX-8350 much slower than the Bulldozer FX :heink:

According to my numbers the Piledriver FX-8350 has ~3.6 DMIPS/MHz cores, and I was expecting the A57 core to be a 4.1 DMIPS/MHz design. Thus I was expecting it to be at the Steamroller core's level of performance. My original prediction was

8 A57 @2GHz ~ 4 Steamroller @4GHz
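Spelling that out, reading the post as putting both A57 and Steamroller at roughly the same per-clock throughput (~4.1 DMIPS/MHz, which is the expectation stated above): with equal DMIPS/MHz, eight cores at 2GHz and four cores at 4GHz give the same aggregate figure.

Code:
#include <cstdio>

// Aggregate DMIPS = cores * MHz * DMIPS-per-MHz. With the same per-clock
// figure on both sides, 8 @ 2GHz == 4 @ 4GHz. The 4.1 value is the expected
// A57 / Steamroller-class number quoted above.
int main() {
    const double dmips_per_mhz = 4.1;
    const double a57_total         = 8 * 2000.0 * dmips_per_mhz;   // 65,600
    const double steamroller_total = 4 * 4000.0 * dmips_per_mhz;   // 65,600
    std::printf("8x A57 @2GHz:         %.0f DMIPS\n", a57_total);
    std::printf("4x Steamroller @4GHz: %.0f DMIPS\n", steamroller_total);
    return 0;
}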

However, the numbers shown by AMD say that it is about 1/3 faster than Steamroller.

Intel has a problem as well. Apple, with its Cyclone A7, showed how a dual-core offered the same performance as Intel's latest quad-core clocked higher.

Some scores leaked on the web show the Tegra K1 iGPU beating a higher-TDP-rated Haswell i5 iGPU. But I am more interested in the Denver cores: with a 7-wide design, and Nvidia claiming it will be faster than any A57, the Denver core could be at the Haswell level of integer performance.



Do you really believe that AMD needs 49 Steamroller cores to match an Ivy Bridge Xeon? :sarcastic:

Besides that, I note how you try to compare the raw performance of a 25W SoC to a 95W CPU.

We could invert your 'logic' and compare a 95W ARM SoC to a 25W Intel CPU... but if you want to know what will beat the high-end Xeons, google Nvidia Boulder...
 


No, BSNES is accurate; ZSNES isn't, never has been, and never will be. There's a long laundry list of games with significant issues, workarounds, and hardcoded hacks to get them to work. And some stuff never got fixed: major issues with SMRPG in some areas being one of the more famous examples, or the necessity to toggle between the different rendering engines to get certain games to work properly.

BSNES is much more accurate and doesn't need a single hack for any game to get it to work, but it is much, much, much slower as a result. BSNES is as close to emulating the real-life SNES HW as you can likely get.
 


Did you not learn your lesson with "best case" benchmarking yet?

The 3970X case is interesting, since it's unlikely to be a pure CPU bottleneck even with two GPUs; I wonder if HTT is affecting the results, similar to how it affects Args1's benchmarks. I'd be interested to see HTT-on versus HTT-off results, to see if the graphics driver is occasionally getting stuck on a HTT core, which could KILL performance... I'd like to think game developers are smarter than that, though...

And IMO, Mantle becomes moot if MSFT tackles the driver overhead in DX12, as it is said they are doing. No dev is going to do an OpenGL EX/libgcm backend for the PS4, a DirectX 11.1/native backend for the XB1, a DirectX+Mantle backend for Windows, and an OpenGL/Mantle backend for Linux; they aren't going to re-write their backend on four separate occasions.
 

$hawn

Distinguished
Oct 28, 2009
854
1
19,060


That last line, it shows how little you know about CPUs and power consumption!!
400MHz extra for just 5 watts? Are you f**king kidding me!! This is one good chip that AMD has cherry-picked and provided to the reviewer. Most retail chips won't be so good.

You think AMD would've bothered with an insignificant 5 watts for a desktop APU when it could have gotten ~10% higher clocks?!!
 

8350rocks

Distinguished


Honestly, I am not sticking up for juan here at all, but that is typical. AMD advertises what they can guarantee; there will be fluctuation, but the stock configuration will be 100% achievable on any CPU, with any hardware compatible with the design. Most PD CPUs could hit a 10% overclock without bumping voltage; my FX-8350 is one of numerous such examples.

Really, it shows how little you know about historical trends with a given architecture.
 

Sigmanick

Honorable
Sep 28, 2013
26
0
10,530
As much as we have gotten off topic in the past, I started a separate thread for Star Swarm (free on Steam) and Mantle (14.1 driver not out yet) results.

http://Toms/starswarmresults.Tada!

So, shall this thread stay open until a time when software is benchmarked for the full capabilities of the chip with regard to hUMA and such?
 
I think we would prefer you leave all of the AMD stuff in the one thread here, which is a sticky for the present.

We have enough stickies to manage ... have at it over in Drivers for a Mantle thread, but this one will stay up as a main sticky, thanks.

$hawn ... abusing other users will get you sent on a holiday ... first and last warning.
 