AMD CPU speculation... and expert conjecture


Jaguar-based Opteron X2150 seems crippled compared to the Opteron A1100, going by the specs: the X2150 has 4 cores, 128 GCN shaders, no L3 cache, slower DDR3-1600 memory, no PCIe gen 3.0 support, a 2GHz max clock, no Freedom Fabric support, etc.
http://www.extremetech.com/computing/156882-amds-kyoto-kabini-gives-amd-the-advantage-over-intel-in-low-power-servers-for-now
Intel's Avoton Atoms can have 8 cores, no L3 cache, a 2.40GHz max clock, no PCIe gen 3.0, 64GB max memory capacity, etc.
http://ark.intel.com/products/codename/54859/Avoton
Both Kyoto and Avoton have 4MB of L2 cache and DDR3-1600 memory.
 
The DX9 plugin has been deprecated for quite some time; the DX11 version is vastly superior and is what you normally use now. Perhaps I should have been clearer: the OGL plugin is slower than the DX11 version.

Not to mention I have Compiz running on a 1440p monitor and a 1200p monitor. I spend all this time customizing compiler settings and running Gentoo, only to have the fglrx driver and poorly coded OGL applications cause problems with tons of eye candy enabled.

I'm not gonna whine about fglrx like everyone loves to do, though. I've been using fglrx since the mid-'00s on my X1600 Mobility, and I've been using Windows ATI drivers since the Rage 128 Pro. People who think Catalyst and fglrx suck now must have amnesia, because the old Nvidia drivers from the 8800GTS days and the old ATI/fglrx sucked way, way more than fglrx and Catalyst do now.

I still remember going back to Nvidia with the 8800GTS. Everyone went "NVIDIA, THE DRIVERS ARE SO MUCH BETTER!"

The first time the drivers crashed and SMARTGART wasn't there to save me from a BSOD, I was pretty pissed off at every idiot who told me the drivers were better for Nvidia, haha. At least we don't have to worry about that anymore.

I am just hoping for a Mantle plugin for Dolphin. It would more than likely help a ton with getting the program to thread better, as the render thread could be spread across a number of cores.

I might even give it a shot; I wonder if people would send me bitcoins in exchange for it. I love me some bitcoins. Set up a file server for the house, install a 6950, leave it mining 24/7, make a profit from running a home file server.

AMD has a bit of a history of lackluster OGL performance; that's likely part of the problem.

Secondly, emulators are NEVER going to use more than a handful of threads. That's the primary reason systems like the PS3 are going to need decades to emulate. You are emulating a system, and if you want perfect emulation, you are going down to the individual bus timings. There is simply no way to thread this when you have no control over when those threads actually get executed by the OS. Even if you could [say, using cooperative threading or fibers], you'd still have to put software locks all over the place to keep all the data in sync. It's an inherently single-threaded process; that's why using a separate thread to handle audio processing, while an option, is very unstable and not recommended.
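To make that concrete, here's a minimal C sketch (hypothetical chip names, not any real emulator's code) of why lockstep emulation serializes: each chip may observe state the other just wrote, so the interleaving itself is the computation, and there's nothing independent left to hand to another thread.

    /* Lockstep emulation sketch: two hypothetical chips must agree on
     * emulated time before either can observe the other's state. */
    #include <stdint.h>

    typedef struct { uint64_t cycles; /* ...registers, memory... */ } Chip;

    static void step_cpu(Chip *c) { c->cycles += 2; /* emulate one opcode */ }
    static void step_dsp(Chip *c) { c->cycles += 3; /* emulate one sample */ }

    void run_frame(Chip *cpu, Chip *dsp, uint64_t frame_end)
    {
        /* Whichever chip is behind in emulated time must run next,
         * because the other may read a register it is about to write.
         * No two iterations are independent, so threads buy nothing. */
        while (cpu->cycles < frame_end || dsp->cycles < frame_end) {
            if (cpu->cycles <= dsp->cycles)
                step_cpu(cpu);
            else
                step_dsp(dsp);
        }
    }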

Dolphin's performance problem isn't GPU-side, it's CPU-side.

EDIT

Also, many, many games have known memory leaks on the GPU, a few of which are OGL-specific. I know Mario Golf crushes any card in existence within about a minute of gameplay [2GB of VRAM usage]. Dolphin has a LOT of issues to work out still.

END EDIT

As for bitcoin mining, you can't make a profit these days without specialized FPGAs. Throw in the wild price swings, plus the fact that cheaper alternatives are flooding the market, and you have all the components of a market crash. Anyone who hasn't mined them already has missed the boat, and the chance at a profit.
 


Software limitation. Intel chips can address up to 48 bits, AMD chips up to 56 bits. The hardware is there to handle that much, but to keep the software simple, most OSes put a *sane* limit on the maximum amount of memory you can address. Enterprise versions of Windows Server, for instance, can address 4TB of physical RAM.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx
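For the curious, the hardware-supported widths mentioned above can be read straight from the CPU. A small sketch for GCC/Clang on x86-64 (CPUID leaf 0x80000008 reports physical address bits in EAX[7:0] and linear/virtual address bits in EAX[15:8]):

    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        /* Leaf 0x80000008: address size information */
        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
            puts("CPUID leaf 0x80000008 not supported");
            return 1;
        }
        printf("physical address bits: %u\n", eax & 0xFF);
        printf("virtual address bits:  %u\n", (eax >> 8) & 0xFF);
        return 0;
    }

On current chips this commonly prints somewhere around 36-48 physical bits and 48 virtual bits; the OS then caps usable RAM below the hardware limit, as in the Windows table linked above.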
 

truegenius

Distinguished
BANNED

If comparing to Nvidia, then this history is old :no:
I see in OGL benches that AMD performs better than similarly priced Nvidia GPUs.



*fixed :p :pt1cable: (for example, the memory limitations of desktop Windows)
 

The CPUs will not support more due to the way x86-64 is set up on low-power CPUs; it's not even a software limitation, it's hardware. I think it's probably done to lower power consumption. AMD probably would have loved for its Jaguar-based CPUs to be able to use more memory in their micro servers.
 

truegenius

Distinguished
BANNED
^ A10-7850 + R9 290!

 

No, there've been PCs like that. I want to say that unlike Trinity and Richland, Kaveri has a legitimate shot at small form factor gaming, but I haven't seen temperature and throttling tests yet. I think AMD, in part, sorta softened Kaveri's momentum to hype their Seattle launch/unveiling. We might be seeing more and more Kaveri parts in SFF gaming PCs in the near future.
 

blackkstar

Honorable


Yeah, I know emulators only really use one or two threads, and it's because the consoles being emulated are all single-core systems.

IIRC, one thread goes to emulating the CPU and another is set up for rendering. I was thinking Mantle would get rid of the single-rendering-thread bottleneck (maybe). That's what I was getting at. As for spreading the emulation of a single-core CPU system out, it's pretty much impossible.

I do agree. Plus there seems to be some sort of issue with fglrx where performance takes a huge nosedive after running lots of programs. But I'm not sure if it's fglrx or KDE or Compiz or whatever. It seems to be better since I switched to Compiz from KWin, but there still seem to be certain applications that cause it to happen.

You don't directly mine bitcoins anymore. You mine altcoins and then convert them to bitcoin, or you hold onto the altcoin if you think it's going to be worth something someday. I've spent about $30 on electricity so far, and my cryptocoin portfolio seems to bounce between $140 and $300 in value.
 

Master-flaw

Honorable

Where's this? Can't find an ad for it on Microcenter's site.
Never mind, found it.
 

Finally, some data. I didn't quite understand the unit being used, ms/f. Is it milliseconds per frame?
AMD has enabled only GCN 1.1 (Hawaii, Bonaire, and Kaveri's R7) support for launch.
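(If ms/f is indeed milliseconds per frame, i.e. frame time, which is what frame-rating plots usually show, then converting to fps is just 1000 divided by the frame time, and lower ms/f is better. A trivial sketch:

    /* frame time (ms/f) to frames per second: 16.7 ms/f ~ 60 fps */
    double fps_from_frametime(double ms_per_frame)
    {
        return 1000.0 / ms_per_frame;
    }
)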
 

juanrga

Distinguished
BANNED
A new Passmark baseline for the A10-7850K has appeared, and it almost hits the predicted 6000 points, scoring 5997 at stock:

http://www.passmark.com/baselines/V8/display.php?id=19198305737

Other baselines at stock for Steamroller:

http://www.passmark.com/baselines/V8/display.php?id=19096151266

http://www.passmark.com/baselines/V8/display.php?id=19024964763

The average of the three is 5874. I wonder who submitted this crippled baseline for Kaveri, and why:

http://www.passmark.com/baselines/V8/display.php?id=18936243269

In the baseline above, Kaveri was crippled to Trinity's level of performance:

http://www.passmark.com/baselines/V8/display.php?id=19341764977



It looks better than I expected: the A57 core is about 35% faster than the Jaguar core, with the bonus of consuming much less power! This implies:

8x Steamroller cores @2GHz would hit ~60 SPEC_int
8x A57 cores @2GHz hit ~80 SPEC_int

Thus the A57 cores are above Steamroller's level of performance (I predicted they would be at the same level, but the A57 cores turn out to be better).

Regarding floating point:

8x Steamroller cores @2GHz offer a maximum of 128 GFLOPS
8x A57 cores @2GHz offer a maximum of 128 GFLOPS
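(For reference, those peak numbers follow from cores × clock × FLOPs per cycle, assuming 8 single-precision FLOPs per core per cycle for both designs, e.g. a 128-bit FMA unit at 4 lanes × 2 ops: 8 cores × 2 GHz × 8 FLOPs/cycle = 128 GFLOPS.)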

It is now evident why AMD abandoned the 4/6/8-core Piledriver-based Opterons, why it is releasing the new 12/16-core Warsaw Opterons only for legacy customers, and why it isn't releasing any Steamroller-based Opteron. An 8-core Opteron obtained by dual-packaging quad-core Kaveri CPUs (plus extra L3 cache) would be slower and less efficient than the new Opteron A1100. And without a Steamroller Opteron there is no Steamroller FX.

It is worth mentioning that the 16-core Warsaw Opteron offers 2.6 GFLOPS/W whereas the new 8-core Opteron A1100 offers 5.12 GFLOPS/W, roughly 2x the efficiency of Piledriver! Recall that Steamroller is only 10-15% more efficient than Piledriver.

All those who said that ARM was low-performance and couldn't match x86 for decades were wrong. Another of my predictions in this thread has been verified.

Those who claimed here that a friend at AMD supposedly told them that ARM is an experiment should read what AMD said during the presentation of the new Opterons:

After the announcement at the OpenCompute Summit we had the opportunity to discuss the impact of the new ARM server processors on AMD's business with Andrew Feldman. The most interesting aspect lies in the significance of the x86 and ARM technologies for AMD going forward. As current forecasts show, by 2019 CPUs based on an ARM architecture will comprise 25% of the server market, with the remaining 75% still being x86-based. However, Mr. Feldman actually expects a fifty-fifty share at AMD by that time.

And custom cores such as Nvidia's Denver will be better than the standard A57. There are rumors that AMD will release custom cores for the Cambridge SoC.



I predicted in this thread that we would see Kaveri paired with a 290. I predicted this when I discussed slide #13 of a talk given to OEMs by AMD. My claims received a lot of criticism then, but they are confirmed now.

AMD is also providing some MANTLE performance gains for Kaveri+290:

BF4: 40.9% (1080p) and 40.1% (1600p) performance improvement under Ultra settings and 4xAA on the AMD A10-7700K with an AMD Radeon™ R9 290X.

StarSwarm: 319% (1080p) and 281% (1600p) performance improvement in the “RTS” test on Extreme settings with the AMD A10-7700K and an AMD Radeon™ R9 290X.

http://www.brightsideofnews.com/news/2014/1/29/amd-release-mantle-into-the-wild-huge-improvements-in-cpu-bound-apps.aspx
 
Yeah, I know emulators only really use one or two threads, and it's because the consoles being emulated are all single-core systems.

No, they aren't. You have to emulate SEVERAL specialty chips, DACs, co-processors, and the like, and every one needs to stay perfectly in sync. The SNES, for example, had FOUR chips you had to emulate [not counting expansion chips such as the Super FX]: the main CPU [Ricoh 5A22], the Picture Processing Unit, and the SMP and DSP chips that handled audio and signal processing. The idea that you are emulating a single chip is the mistake most people make when discussing emulation; you are doing a LOT more than that.

Byuu, the author of bsnes and later higan, probably put it best in an interview with Ars Technica a few years back:

http://arstechnica.com/gaming/2011/08/accuracy-takes-power-one-mans-3ghz-quest-to-build-a-perfect-snes-emulator/2/

The primary demands of an emulator are the amount of times per second one processor must synchronize with another. An emulator is an inherently serial process. Attempting to rely on today's multi-core processors leads to all kinds of timing problems. Take the analogy of an assembly line: one person unloads the boxes, another person scans them, another opens them, another starts putting the item together, etc. Synchronization is the equivalent of stalling out and clearing the entire assembly line, then starting over on a new product. It's an incredible hit to throughput. It completely negates the benefits of pipelining and out-of-order execution. The more you have to synchronize, the faster your assembly line has to move to keep up.

Systems like the SNES and NES are simple enough, though, that 99% of stuff works even if your timings are off by as much as 20%. Once you start emulating more modern systems, which run on top of an actual OS capable of doing its own addressing, you need to be essentially timing-accurate, as one bad pointer can blow up the system [similar to a 0xC0000005 Status Access Violation in Windows when a program clobbers memory]. And that synchronization is what kills performance. Throwing multi-core main CPUs into the mix further exacerbates the problem, hence why I'm firm in my belief that we simply don't have the processing power to come even CLOSE to emulating modern systems like the PS3 and 360 [ESPECIALLY the PS3].
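To put a number on that, here's a toy POSIX-threads micro-benchmark (hypothetical, not emulator code) where two threads are forced to alternate the way two lockstepped chips would; nearly all the time goes into the lock/wake handoff rather than useful work:

    #include <pthread.h>
    #include <stdio.h>

    #define STEPS 1000000

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static int turn = 0;   /* 0 = "CPU" thread's turn, 1 = "DSP" thread's */

    static void *chip(void *arg)
    {
        int me = *(int *)arg;
        for (int i = 0; i < STEPS; i++) {
            pthread_mutex_lock(&m);
            while (turn != me)              /* wait for the other chip */
                pthread_cond_wait(&cv, &m);
            /* one emulated time slice of work would go here */
            turn = !me;                     /* hand control over */
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&m);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        int id0 = 0, id1 = 1;
        pthread_create(&a, NULL, chip, &id0);
        pthread_create(&b, NULL, chip, &id1);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("done: every step paid a lock plus a thread handoff");
        return 0;
    }

Time it against a single thread running both "chips" in one loop and the difference is dramatic, which is exactly the assembly-line stall byuu describes.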
 
AMD is also providing some MANTLE performance gains for Kaveri+290:

BF4: 40.9% (1080p) and 40.1% (1600p) performance improvement under Ultra settings and 4xAA on the AMD A10-7700K with an AMD Radeon™ R9 290X.

StarSwarm: 319% (1080p) and 281% (1600p) performance improvement in the “RTS” test on Extreme settings with the AMD A10-7700K and an AMD Radeon™ R9 290X.

http://www.brightsideofnews.com/news/2014/1/29/amd-rele...

Don't forget the outcomes that are not CPU-bottlenecked:

GPU-limited scenario: 2.7% (1080p) and 1.4% (1600p) performance improvement under Ultra settings and FXAA on the Core i7-4960X with an AMD Radeon™ R7 260X

GPU-limited scenario: 5.1% (1080p) and 16.7% (1600p) performance improvement in the “RTS” test on Extreme settings with the Core i7-4960X and an AMD Radeon™ R7 260X

Who was the one here who predicted all the gains would be seen on lower-tier systems? Almost nothing on the high end, as expected.

Also worth noting: who would buy a Kaveri, then pair it with a high-end dGPU? So I'd wager even the CPU-bound numbers are inflated; use a PD CPU and a slightly lower GPU (a 7750?), and I'd expect maybe 10-15% gains at best. It looks like AMD is testing in an artificially bad, unrealistically CPU-bottlenecked situation.
 
AMD A8-7600 Kaveri APU and R7 250 Dual Graphics Testing - Pacing is Fixed!
http://www.pcper.com/reviews/Graphics-Cards/AMD-A8-7600-Kaveri-APU-and-R7-250-Dual-Graphics-Testing-Pacing-Fixed/


ags1's link has DICE's Johan Andersson running BF4 multiplayer on an FX-8350 PC with a 7970 at 1080p.
 


But they also say that the MANTLE patch for the 7970 will come at a later date, so it's not that meaningful, right? Or am I not understanding something there?

Cheers!
 

Yeah. It seemed like a proof-of-concept type thing.
 

truegenius

Distinguished
BANNED

That means the wiki is right :ouch:
I had suspicions too, as the ARM arch is known for low power consumption, and Cortex-A15 and Krait are hitting >2GHz clocks on 4 cores while drawing single-digit watts, with an IPC per core of 3.2-3.5 MIPS/MHz compared to Bulldozer's 3.78 (the wiki shows 2.9 for Piledriver :pfff:).

http://en.wikipedia.org/wiki/Instructions_per_second
AMD's CPUs are getting trashed by ARM's CPUs in IPC per core, and in energy efficiency too.
The performance of mobile CPUs and GPUs is increasing very quickly compared to desktop.

The GPU in Tegra K1 is supposed to reach 365 GFLOPS, which is better than the desktop HD 4400 GPU (max 260 GFLOPS @1350MHz) and equal to the Radeon 7640G (350 GFLOPS @685MHz) :lol:
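(For what it's worth, that 365 GFLOPS figure matches the usual peak-rate arithmetic if one assumes the K1's 192 CUDA cores clocked around 950 MHz: 192 cores × 2 FLOPs/cycle (FMA) × 0.95 GHz ≈ 365 single-precision GFLOPS.)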

So will we be playing GTA 5 on mobile devices before its desktop launch? :D:sarcastic::lol:
 