AMD CPU speculation... and expert conjecture

Page 86
Status
Not open for further replies.

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
Does anyone know why Crytek optimized Crysis 3 for 8 cores? The share of people with 8-core CPUs on the Steam HW Survey was about 0.27% last I checked. Did Crytek really optimize a game for 0.27% of PC users? Do you really believe that? That AMD managed to pull that off?

I take this as proof that we're going to see games become vastly more threaded across 8 cores much faster than the Intel guys are saying.

If you're going to deny that game engines are already turning to 8 threads because of consoles, then you have to admit that Crytek optimized for 0.27% of the PC gaming market.
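For what it's worth, the basic pattern isn't exotic. A minimal sketch of fanning per-frame work out to eight workers; the function names and workload here are purely illustrative, not Crytek's engine code:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # one worker per hardware thread on an 8-core CPU


def simulate_chunk(entities):
    # Stand-in for per-entity game work (AI, physics, animation).
    return sum(e * e for e in entities)


def run_frame(entities):
    # Deal the entity list into NUM_WORKERS roughly equal slices
    # and let the pool run them concurrently, then combine results.
    chunks = [entities[i::NUM_WORKERS] for i in range(NUM_WORKERS)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(simulate_chunk, chunks))


print(run_frame(list(range(1000))))
```

The hard part in a real engine is not the fan-out but keeping the slices independent enough that the threads don't spend their time waiting on each other.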
 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810


I play Crysis Warhead. It is optimized for 3000 cores.
Why optimize for 6/8 cores when you've got 2000-4000 cores to use?
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780

http://www.xbitlabs.com/news/cpu/display/20130321232000.html
^^
A market rumour has it that in April the world’s No. 2 supplier of microprocessors for personal computers

At best the world's No. 3... no wonder people scratch their heads when they hear about >5GHz CPUs; people are put in a "dichotomy box"... the world's No. 1 supplier of "personal microprocessors" is ARM.

By unit count, the whole ARM ecosystem sells more chips in a couple of months than Intel and AMD put together sell in a year... yes, that big a difference...

EDIT: An 8-core Piledriver with more speed is quite possible. Richland in mobile form clocks around 10% higher than Trinity... Trinity, which already had that "temperature oriented turbo". So it's mainly the process, tweaked, with other enhancements. Remember that this is a child process of the same 32nm PD-SOI used to put a ~600mm² chip at 5.5GHz. AMD/GloFo will not get close, for several reasons, but I think a good bump is quite possible.


 
Haven't heard anything about which GCN parts Kaveri will feature; one would assume Sea Islands, since the HD 8600 discrete card that supports Dual Graphics is based on Sea Islands. What is being said is that the iGPU on the top-end Kaveris using GDDR5 achieves higher bandwidth than the HD 7750 and 7770. We'll just have to wait on this a while.
 

truegenius

Distinguished
BANNED
:/ Who cares about rpm (GHz) when the transmission system (IPC) is good enough to provide the performance (overall per-core performance) of 10k rpm at only 5k rpm?
Car analogy B| ( dunno if I used it correctly or not :/ )

Meaning, 1 Sandy Bridge core at 1GHz is almost equal to a K10 core at 1.5GHz in general tasks (with the same instruction sets in use).
So why obsess over only one thing (GHz) when the combination (GHz × IPC) can give a much better result without hitting the wall of physics too early?
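The car analogy reduces to a simple product. A toy calculation using the post's relative, purely illustrative numbers (treating K10 as the 1.0-IPC baseline):

```python
def effective_perf(ipc, ghz):
    # Per-core throughput is roughly instructions-per-clock
    # multiplied by clocks-per-second.
    return ipc * ghz


# If a Sandy Bridge core at 1 GHz roughly matches a K10 core at
# 1.5 GHz, that implies ~1.5x the IPC (relative units, not benchmarks).
sandy_bridge = effective_perf(ipc=1.5, ghz=1.0)
k10 = effective_perf(ipc=1.0, ghz=1.5)
print(sandy_bridge, k10)
```

Which is the post's point: the product is what matters, and you can grow either factor.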
 


They didn't "optimize" for 8 cores. They designed their product with a 3~4 core system and a high performance GPU in mind.
 


IPC is a meaningless number now. The word isn't even being used properly anymore.

The only thing that matters is price per quantity of performance acquired (energy use factors in slightly here).

Any other argument ends up as nothing but dumping a bucket of your favorite colored paint onto yourself while cheering on your favorite team.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


I like speed/clock, every enthusiast does...

And why not have both, GHz and IPC? ... IBM just shows that it's perfectly possible without breaking any laws of physics.

The z is passing 1.5 IPC (instructions per clock), while in the PC world x86 barely averages 1 IPC. This is another myth; performance depends on so many factors that it's hard to know where to begin, but the best and most flexible way to avoid a lot of roadblocks is in the software, not in the CPU cores. The reason the average is ~1 IPC is that a CPU can already spend a lot of time stalling, waiting between one couple of instructions and the next... the memory wall is too big, caches are still too small... more brute crunching force, more pipes sitting stalled, is a steeply diminishing-returns proposition, while clock can be a multiplying factor for everything (within limits, of course).

Of course you can't expect to reach the low power of an ARM solution with that vision... as an example, a sports car will never have the fuel consumption of a small urban vehicle... you simply can't have both, and it's not in the ISA (any of them can target both ends of the spectrum)... but GHz and IPC you can have, if your power targets are large enough...

 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810


Well, what Intel and Nvidia are trying to do is make "one architecture to rule them all". Meaning, they are aiming to have one architecture that can scale from 2W smartphone chips to 125W server chips.
Intel currently has the Haswell arch and the Atom arch. By some estimates, in the era of either Broadwell or Skylake, Intel plans to have only a single arch.
Similarly, Nvidia is trying to have the same CUDA arch in GPUs as well as SoCs.

I am waiting for the ARM architecture to become the next x86, whose perf can't be increased much within the same TDP. I guess we are 2-3 generations from the flattening of the curve.

 

truegenius

Distinguished
BANNED
and why not have both, ghz and ipc ? ... IBM just shows that is perfectly possible without breaking any laws of physics.
I also want this.

For example:
if we run an i5, an FX4 PD, and a Phenom II X4 at the same speed, then the i5 can do the most work, followed by the Phenom II X4, with the FX4 PD in last place.
i.e., BD/PD is only showing GHz and not doing much work, meaning less performance; the GHz is not translating into performance because of the lower performance per clock of the BD/PD arch.

So ramp up GHz, and performance per clock, and cores, and software that uses all those resources smartly and efficiently.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Sure, at a cost 20 or more times higher than any desktop CPU people currently use. They have to cherry-pick the CPUs to get the higher 5.5GHz ones, and those are only sold as MCMs with 4 of them on one giant water-cooled slab. The single-CPU systems they sell based on the same chip (z114) are only rated for a much lower 3.8GHz.

If one really wanted a super desktop computer, they could outfit a liquid-nitrogen-cooled system for much less than the cost of an entry z196 system ($950,000). Just get 2 of these ($20,000) and you can generate 10L of liquid nitrogen per day right in your home.

http://www.elan2.com/product_elan2AT.asp

That should be enough for 4+ hour gaming sessions. When you're not gaming, the system will hold up to 40L of LN2.
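As a sanity check on those numbers, a quick back-of-envelope sketch. The dollar figures and litres are from the post; the hourly LN2 burn rate while gaming is my assumption, picked only to match the "4+ hour sessions" claim:

```python
# Figures from the post above.
cost_generators = 2 * 20_000   # two LN2 generators, in dollars
cost_z196_entry = 950_000      # entry z196 system, in dollars
production = 10                # litres of LN2 generated per day
dewar_capacity = 40            # litres the idle system can hold

# ASSUMED: litres of LN2 boiled off per hour under heavy load.
burn_rate = 2.5

session_hours = production / burn_rate     # sustainable gaming hours per day
buffer_days = dewar_capacity / production  # days of output the dewar stores
cost_ratio = cost_z196_entry // cost_generators

print(session_hours, buffer_days, cost_ratio)
```

Even with a generous burn rate, the home LN2 rig comes in at a small fraction of the mainframe's price, which is the post's point.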

BTW, this is as absurd as comparing desktop PCs with supercomputer mainframes.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I'm sure ARM has hit that x86 wall already. That's why the new A15-core chips are clocked much lower than the 2.5GHz they're designed for.

It's hard to get solid TDP numbers for the later ARM chips. They can do frequency scaling too, and with big.LITTLE cores they use several kinds of performance/watt trade-offs.

LSI now has a 16-core A15 (AXM5516-B) that can go up to 1.6GHz. I couldn't find any TDP numbers for it.

http://www.lsi.com/products/networkingcomponents/Pages/AxxiaCommunicationProcessor5500.aspx


 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810


Yeah, the A15 has double the perf of the A9 at triple the power.
 
I have a question: since all ARM SoCs seem to be made on low-power high-performance or ultra-low-power processes, what would happen if a quad-core ARM SoC with, say, Athlon II X4 or Phenom II level IPC/clock rate were made on a 28nm high-performance process (with the same GCN-based iGPU)? Would it be close to Kabini/Kaveri in terms of performance?
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


You are exaggerating; it doesn't need to be a 600mm² chip... besides, instead of changing MPUs every year or two, you could comfortably do it every 3 or 4 years, and instead of wasting even more money than the chips cost on aftermarket brute-force apparatus, those chips could come already optimized (and even with that apparatus included).

And you can fit all the LN2 you want; there is a point where you can turn up 2GHz more and only get a 10% performance bump. Cache and interconnect are also part of the equation: no matter how much you push the logic, if it's not all well balanced, it's going nowhere fast (for example, IBM has the lowest "ultra low-k" factor of them all, at 2).

And now the foundries are going to kill me lol. This is the real reason: even Intel is a foundry now; they had/still have 3 customers, and now they are even going to make part of Apple's production for their 7 series.

It's a race to absurdity that doesn't benefit "customers" whatsoever. Going smaller is relatively easy and brings none to very few improvements now (mostly if not exclusively in low power). Going "well optimized" is where most of the expense of fab development is, and the dread of foundries, but it is where the most improvements for "end users" might come from. It's mostly for the sake of "volume" instead of quality (if it weren't, we could have EUV by now; it's more than ready for low-volume production, the same with many other optimizing techs).

The price of the z chips is what it is only because IBM doesn't have competition... not really (Nvidia has similar-sized chips and perhaps even worse yields)... if they had competition, I'm sure the price of those chips/systems could be half (at least). And they do what they do because they long ago dumped/shared the costs of fab development with many other players.

Hope this is suitable as a very good warning...

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


This is going to fail badly... at least for Intel I think that would not be the case; I suspect they will always maintain 2 variants, one for low power and the other for high performance, no matter how many variants (and names) evolve in the future.



I'm sure ARM will spill into the desktop too, and they are going to have the same issues. I believe only "architecture licensees" will pursue "high performance"; the RTL guys (and ARM's own designs) will remain "low power".

But ARMv8 64-bit IMO (which has little to do with ARMv7) is even better for "high performance" than x86 (of course, for this the high-performance ARM implementers will not try to be compatible with ARMv7, and will improve substantially on the A57). It's a very clean ISA, and it's a "relaxed memory ordering" ISA with very few fences, contrary to x86, which can only relax stores to loads. It's a killer ISA for implementing various sorts of Hardware Transactional Memory schemes, and it also has fewer dependency quirks, permitting more "decode" bandwidth. This decode bandwidth is the PITA of x86: all these years, no one has really been able to match 4 macro-op decode (usually 4 pipes, but only 3 function simultaneously, even at Intel), and even those 3 can be a real performance hog, consuming a lot of cycles and power (Qualcomm's Snapdragon is already 4-wide decode for "real low power" on ARMv7, which is much worse than v8); and to match that decode ability in full, complicated OoO memory disambiguation schemes have to be implemented.

[ EDIT: forgot!.. and it can be very important for performance too; ARMv8 has 32 GPRs (general purpose registers), x86_64 has 16 ]

Sometimes these trends are so mixed up IMO lol... at least compared with ARMv7, I'm convinced that x86, and MIPS even more, can be better for low power than ARM. It's a question of design (which MIPS doesn't have) and a clean or "reduced" ISA (which x86 doesn't have)... if x86 didn't try to be a GPU, ditched AVX and the YMM register model into the garbage can, and coding/compilers avoided many quirks, I'm convinced it could be as good for low power as ARM (better, depending on design). Of course, some of the performance advantage it now has could be inverted in favor of ARM.

I'm convinced ARM could be very successful in high-performance desktop and/or server.

 


fixed
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


The cost is largely unknown. The minimum configuration they sell is a 2-full-rack system that costs a million dollars. You can't compare these to desktops or consumer-level products.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


http://www.pcgameshardware.com/aid,768604/Crysis-2-Everything-about-DirectX-11-3D-without-perfomance-drop-and-8-core-optimization/News/

Really?
 


Yes really.

Able to utilize and "optimized for" are not the same thing.
 


IBM's chips do not compete with x86; it's a completely different market. They compete primarily with Oracle/Fujitsu and the T/M series.

People REALLY need to stop comparing x86 and Power uarchs; they're both designed to do radically different things. Power and SPARC are both designed for massively parallel workloads and insanely high I/O. Current x86 is focused on desktop applications with only light parallel workloads.
 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810


Massively parallel, or basically unrelated/independent of each other (as in servers).


Current x86 is focused on desktop applications with only light parallel workloads


x86 is designed for desktop, but sadly the implementation is for mobile platforms.
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780
Parallelism is not really "hard-encoded" at the instruction level, at least not at an architectural level. If pertinent, you can always add some new logical instructions (very, very few suffice) for memory ordering/fences (HTM etc.), as well as for instrumentation (profiling), synchronization, prefetch and vector... all ISAs can do that... IMO it could be called "trashing" if not done carefully. x86 by far has the most of that, especially in vector; a whole lot of bloat.

As for ISAs conceived for parallelism from the start, we can say it was the Itanic from a control (branch) perspective, and PowerPC, ARM and Alpha from a memory-ordering perspective.

One is dead (Alpha), another is as good as dead (Itanic)... and the other 2 were always meant for client/embedded, not big parallel servers, no matter how strange it might sound.

Relaxed memory ordering might have been considered an easier-to-program model, not a parallel feature. Only Alpha viewed it as a way to get better SMT Hyperthreading, but Intel failed to have vision back then (otherwise everything would be Intel by now), and killed it after having acquired it (I think it was on purpose). Funny thing, they ended up grafting SMT Hyperthreading on top of the "mule" that carries everything, x86, lol.

Meanwhile many changes were made; SPARC has several models, Power (the parent) is different from PowerPC (yet compatible)... but x86 was always the "ugly duckling" in the picture, full of quirks and cumbersome features, yet its memory model is one of the best (the best) for low memory space; perhaps that is why it caught fire. When you have 640K to 1MB of memory, and that last MB comparatively costs a fortune (it did back then)... you may have a winner.

Things are completely reversed now... x86 is not the best for low power, nor for high performance or parallel work... and memory now comes by the ton, with a several-gigabyte transition happening inside the socket.

In gigabyte terms, Steamroller Kaveri might be the first to go there... Intel has Crystalwell, but I think it's not in the gigabytes for now... AMD has already done it, sort of, with mobile GPUs

http://www.techpowerup.com/img/11-05-03/17a.jpg

Now they might be preparing to take it even closer, "inside the package". Yes, I think it will be a "dual" memory interface of sorts, but only the DDR3 interface for DIMMs will go outside the socket; the other will be "internal" and immutable.

 

mayankleoboy1

Distinguished
Aug 11, 2010
2,497
0
19,810
Meanwhile many changes were made; SPARC has several models, Power (the parent) is different from PowerPC (yet compatible)... but x86 was always the "ugly duckling" in the picture, full of quirks and cumbersome features, yet its memory model is one of the best (the best) for low memory space; perhaps that is why it caught fire. When you have 640K to 1MB of memory, and that last MB comparatively costs a fortune (it did back then)... you may have a winner.

Very interesting info.
 




Not anymore, though. That was back in the '70s and '80s. From the 80386 onward, x86 supported a flat addressing model similar to what everyone else was using. And x64 does away with segment:offset entirely.
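For anyone who never touched real mode, the model being left behind is easy to sketch: the 8086 formed a 20-bit physical address as segment × 16 + offset, so many segment:offset pairs alias the same location (wrap behavior here assumes the A20 line is masked, as on the original 8086):

```python
def real_mode_address(segment, offset):
    # 8086 real mode: physical = segment * 16 + offset,
    # wrapped to the 20-bit (1 MB) address space.
    return ((segment << 4) + offset) & 0xFFFFF


# The classic reset vector F000:FFF0 maps to linear 0xFFFF0.
print(hex(real_mode_address(0xF000, 0xFFF0)))
```

Flat addressing drops all of this: one linear address space, with segmentation reduced to a formality.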
 