Shading, pixel fill rate, texel fill rate, geometry throughput... in fact most of the key graphics performance areas.
This may be true to a degree on paper, but I think it is less true in reality.
e.g. Fill rate is completely misleading if we only look at the paper specs. Sure, the X1900XT(X) has the edge in theoretical fillrate, but it will never realize it due to memory limitations. Further, once you enable MSAA your fill rate is going to go down. Xenos is the exact opposite--the 4 Gigapixel fillrate is a real fillrate, and MSAA never impacts it.
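As a rough back-of-envelope (my own byte counts per pixel, using the ~50GB/s figure that comes up later in this post, and ignoring colour/Z compression and texture traffic), here is how quickly main memory bandwidth caps the fill rate a PC part can actually sustain:

```python
# Bandwidth-limited fill rate, ignoring colour/Z compression and texture
# traffic. The 50 GB/s figure is the one used later in this post; the
# bytes-per-pixel count is an assumption for illustration.

bandwidth    = 50e9   # bytes/s of GDDR3 bandwidth, shared with everything else
bytes_per_px = 16     # colour read+write (4+4) and Z read+write (4+4), assumed
msaa         = 4      # samples per pixel with 4xMSAA

print(f"no AA:  {bandwidth / bytes_per_px / 1e9:.1f} Gpixels/s")            # ~3.1
print(f"4xMSAA: {bandwidth / (bytes_per_px * msaa) / 1e9:.2f} Gpixels/s")   # ~0.78
```

Nowhere near the theoretical peak, and that is before a single texture is fetched.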
Geometry is another example where just reading numbers on a spec sheet COMPLETELY ignores architecture. Sure, Xenos can "only" set up 500M triangles (vertices) a second whereas some of the new GPUs have a setup limit in the billions. The problem with taking that number as "superior" is twofold, though. First, there is a real, physical limitation in memory bandwidth and memory footprint. Meshes take up a fair bit of memory--this is one reason we don't see a lot of 3M poly meshes running around (a rough footprint calc is below). Further, and significantly, just because a GPU has a setup limit of X billion does not mean the vertex shaders can actually transform that many vertices.
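For illustration only (the vertex layout and the vertex-to-triangle ratio are assumptions on my part), here is what a single dense mesh already costs in memory:

```python
# Rough memory footprint of one dense mesh. Layout and ratios are assumed,
# purely to show the order of magnitude.

tris         = 3_000_000
verts        = tris // 2     # assume roughly one unique vertex per two triangles
vertex_bytes = 32            # position + normal + tangent + one UV set (assumed)
index_bytes  = 4             # 32-bit indices

mesh_bytes = verts * vertex_bytes + tris * 3 * index_bytes
print(f"{mesh_bytes / 2**20:.0f} MB")   # ~80 MB, before textures or animation data
```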
In general the obsession with spec sheets, pipelines, flops, etc... to the exclusion of how these things relate to the architecture is very misleading. In my original statement I conceded that the X1900XT is better for the PC space due to the general limitations developers have to work with, but I think you are wrong on this point:
in fact most of the key graphics performance areas
Define "key"? Winning in theoretical peak fillrate and geometry does little good if they are not relevant to game design -or- real world conditions.
Like I said, Xenos has some significant advantages that actually apply to KEY areas of game design.
Does a developer want 8GP+ of theoretical fill rate or does he want eDRAM where he gets "free" HDR and MSAA and a real 4GP fillrate?
eDRAM is obviously a major 360 advantage, but enough to overcome <50% the main memory bandwidth? PS2 had eDRAM too compared to the Xbox which didn't. What I want to know is how much do you really need for a modern framebuffer with lossless compression. Is 256GB/sec overkill?
A 720p 32bit framebuffer with HDR and 4xMSAA is going to be in the 130GB/s range of uncompressed data (taken from B3D, which we both visit from time to time). Once you start adding other framebuffer effects it can go higher (e.g. Xenos also uses the eDRAM for some other behind-the-scenes tasks, and also as part of the early cull and tiling). But we don't need to get too fancy to realize the significant penalty we already have--just look at the impact of AA or HDR, never mind both together. These effects bring modern GPUs to their knees.
I would say being able to produce floating point blending and 4xMSAA with negligible impact to memory is "Key", because we already know 50GB/s is not enough. So I would not call it overkill.
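As a quick sanity check on the 256GB/sec question (the byte counts per sample here are my own assumptions, not the B3D figures), that number is roughly what blended 4xMSAA fill needs at the full 4 Gigapixel fillrate:

```python
# Bandwidth needed for blended 4xMSAA fill at Xenos' peak pixel rate,
# assuming 32-bit colour and 32-bit Z/stencil, each read and written.

pixel_rate = 4e9      # pixels/s, the "real" 4 Gigapixel fill rate
msaa       = 4        # samples per pixel
color_rw   = 2 * 4    # colour read + write (blending)
depth_rw   = 2 * 4    # Z read + write (depth test/update)

bandwidth = pixel_rate * msaa * (color_rw + depth_rw)
print(f"{bandwidth / 1e9:.0f} GB/s")   # -> 256 GB/s
```

Under those assumptions the eDRAM bandwidth lines up with what the ROPs can actually consume at full tilt, not a number padded for show.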
But I think this issue is bigger. You mention Xenos has less bandwidth to the main memory--ok, at face value, if we just want to look at some paper specs, then yeah. But this ignores the reality of the situation. The bottom line is that a significant portion of the 512MB of memory on the X1900XT(X) is being used for the back/framebuffer. Yet on the 360 all the bandwidth associated with the backbuffer is isolated, thus a significant portion of the 22.4GB/s of main memory bandwidth can be utilized. So what good is 512MB of memory if 50MB of it is using 95% of the bandwidth???
Yet again, what appears as an edge on paper becomes less clear in the real world. There is a reason ATI spent 105M transistors on the eDRAM--the same reason Sony (PS2) and Nintendo (GCN) did.
Basically we need to stop treating Xenos like a PC GPU. It is a divergent architecture, and any fruitful discussion comparing Xenos to a PC GPU needs to consider architectural differences and how they impact the numbers. The numbers themselves are ONLY relevant in fixed environments.
Edit: Just as an example of how significant bandwidth is: 7800GTX 512MB, Fear @ 1280x960. No AA: 91FPS. With 4xMSAA: 50FPS.
That is a 45% performance hit! This resolution is only marginally higher than 720p, and yet that is an INSANE performance hit. eDRAM alleviates this significantly.
http://www.extremetech.com/article2/0,1697,1914665,00.asp
BTW FP10 is supported by R580
Last time I checked it supported the FP10 format but did not support FP10 blending (that was on the R520, but I assume the situation is the same on the R580, as I have not read anything indicating it has changed).
CPU/Memory accesses are only going to be a limit if you're limited by that in the first place. How many games get a speed up from the doubling of the CPU interface bandwidth by moving to PCI-E? How big is that jump?
Chicken or the Egg 😉
PC games are designed to the lowest common denominator. So just because PC games are stuck in this area, I am not sure it is valid to offhandedly dismiss the significant advantage the 360 (and PS3) have in CPU<>GPU communication.
My original thesis was "all things even", with each being developed for independently to maximize them--the only fair way to say which is "better", as originally stated. I would concede the X1900XT is a better PC GPU, and Xenos is a better console GPU. But people are trying to make it "either / or", and I think that ignores the different development environments.
Anyhow, on this specific point: to get procedural data from the CPU, or advanced animation (VERY important to graphics!), or to do advanced physics (stuff like fluid dynamics, destructible solid bodies, interactive elements, dynamic particles, etc.), it is a significant advantage to have fast CPU<>GPU communication. PC games don't do this because quite frankly PCs SUCK at this (wags finger at Intel/AMD).
I would argue that this is more "Key" to the end product graphics (visuals and animation) than fillrate. And the X1900XT is always going to be limited to the PC bottlenecks... which is the point: the 360 addresses a number of bottlenecks the PC has.
Which of course means Xenos will not strut its stuff much when doing a PC port. Yawn. It comes into its own though when you begin to think outside the PC-box.
For all it's talked about, I simply don't see any evidence to suggest that the GPU is not working pretty independently from the CPU, with the limited connection between the two being more than enough to service the needs of the game.
Which is completely true of PC game design--because that is the reality of the bottleneck. This has not traditionally been true in consoles. Arstechnica.com has a good article discussing the MS patents in regards to the CPU. In a nutshell, MS dropped Intel to go the route Sony and Nintendo have gone in the past in consoles--which DOES leverage the CPU.
Featureset is a Xenos advantage but how much of one? What specific features does it support that the R580 doesn't in hardware? I know how far it goes beyond the SM3 spec but R580 goes beyond it too.
Off the top of my head: higher-order surfaces (curved surfaces), hardware tessellation, coherent memory access (MEMEXPORT), unified shaders, texture lookup in the vertex shaders, and FP10 blending.
And of course features don't necessarily equal more power.
No, but like all features they are a means to an end. All performance is a means to an end. Put it another way:
If the desired end result is HDR & 4xMSAA, which is better?
If the features nullify the disadvantages (e.g. the bandwidth and ROP performance penalties) then YES, absolutely the features make it more powerful.
But I think we focus too much on "power" instead of this: The end product on the screen.
The more powerful solution is the one that puts up the better graphics--not the one with bigger numbers. On paper a 3.8GHz P4 has 2x the integer performance of a 3800+ AMD64 -- but architecturally the AMD64 will mop the floor with the P4.
So architecture and features are relevant to power--at least if (a) usable resources or (b) end product are the point of discussion.
it still has quite a healthy lead over Xenos' multisampled fill rate because of twice the ROPs and 30% higher clock speed.
From what I have gathered, the ROPs are tied to the eDRAM, which runs at an effective 2GHz. The ROPs are also designed to do certain tasks much quicker.
We would all agree comparing the X1900's 48 fragment shaders to the 7800GTX's 24 fragment shaders is NOT an equal comparison, because the sum of the parts is MUCH different. The same applies here: Xenos ROPs != X1900XT ROPs.
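One way to read the "effective 2GHz" remark (this is a sketch of the arithmetic, not a hardware description): the eDRAM logic handles 4 samples per ROP per clock, so for multisampled work the 8 ROPs at 500MHz behave like conventional ROPs clocked 4x higher:

```python
# Sketch: why 8 ROPs at 500MHz handling 4 samples per clock look like
# conventional ROPs at 2GHz for multisampled fill.

rops              = 8
clock             = 500e6   # Hz
samples_per_clock = 4       # samples handled per ROP per clock (4xMSAA)

print(f"{rops * clock / 1e9:.0f} Gpixels/s")                          # 4
print(f"{rops * clock * samples_per_clock / 1e9:.0f} Gsamples/s")     # 16
print(f"effective clock: {clock * samples_per_clock / 1e9:.0f} GHz")  # 2
```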
Well both displacement maps and HDR are also heavy shader functions so on that side of it, the R580 comes out in front.
The difference being that Xenos, with the hardware tessellation, can do a single-pass displacement algorithm.
On HDR the X1900XT is going to be stuck doing FP16 blending, where Xenos can do FP10 blending, which is basically 2x as fast (the same penalty as standard 8-bit blending).
So for the same result the X1900XT is going to have to work harder in both situations--not to mention HDR is going to add a substantial hit to memory bandwidth in addition to the processing penalties of FP16.
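The 2x figure falls straight out of the storage formats: FP10 (10:10:10:2) packs into the same 32 bits as plain 8:8:8:8, while FP16 (16:16:16:16) needs 64 bits, so every blend touches twice the data. A rough sketch, ignoring Z traffic and any colour compression:

```python
# Colour traffic per blended pixel for different framebuffer formats.
# Blending is a read-modify-write, so count the colour twice.

def blend_traffic_bits(bits_per_pixel):
    return 2 * bits_per_pixel   # read + write

formats = [("Int 8:8:8:8", 32), ("FP10 10:10:10:2", 32), ("FP16 16:16:16:16", 64)]
for name, bits in formats:
    print(f"{name}: {blend_traffic_bits(bits)} bits per blended pixel")
```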
But how much bandwidth does such a frame take up? I remember seeing a calculation at B3D and it was surprisingly small even without compression.
The FRAMEBUFFER itself is not that large a footprint--depending on the resolution and settings it can be anywhere from less than 10MB to over 70MB. It is the backbuffer (commonly lumped in with the framebuffer) that sucks up a lot of bandwidth: that is where all the per-pixel work takes place, and by every account AA and HDR are huge hogs of memory bandwidth.
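To put rough numbers on that footprint range (the formats here are assumed for illustration, and no compression is counted):

```python
# Approximate render-target sizes for a few configurations.

def target_mb(w, h, msaa, color_bytes, depth_bytes=4):
    return w * h * msaa * (color_bytes + depth_bytes) / 2**20

print(f"720p, no AA, 8:8:8:8 colour + Z:  {target_mb(1280, 720, 1, 4):.1f} MB")   # ~7
print(f"720p, 4xMSAA, FP16 HDR + Z:       {target_mb(1280, 720, 4, 8):.1f} MB")   # ~42
print(f"1600x1200, 4xMSAA, FP16 HDR + Z:  {target_mb(1600, 1200, 4, 8):.1f} MB")  # ~88
```

The storage is trivial next to 512MB--the pain is that those bytes get read and rewritten over and over every frame.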
The easy way to test of course is just to get an R580 in a modern game (CoD2 or Oblivion would be ideal), run it at 720p with 4xFSAA and see how performance scales down as you downclock the memory speed.
If all things were equal, yes. Of course we are discussing ports of games that were not designed to take advantage of many of the Xbox 360's closed-box advantages. Further, we are limited by settings--how do we know the settings are identical? So I would not put much stock in this, but one could make the case Xenos has an advantage in CoD2:
http://common.ziffdavisinternet.com/util_get_image/12/0,1425,i=125148,00.gif
This is not Apples-to-Apples, but it does raise a few eyebrows.
Further, on B3D, Jawed had pointed out how at lower resolutions, when bandwidth was not a concern, the X1800XT was very, very fast, but as resolution went up and AA and HDR were added, performance took a huge nosedive. These are known bottlenecks, and that is why ATI (and console makers in the past) have used eDRAM as a solution.
To restate my point, I am not saying Xenos is a better GPU. I don't think things are that simple. I think the X1900XT is a great GPU, and is MUCH better in the PC space. PC developers develop with certain bottlenecks in mind--they are NOT going to push the CPU in a way that requires a lot of interaction with the GPU, because they don't have those resources available. And the X1900XT has more shader performance 1-on-1.
But the problem is saying one is "better" or better in "key graphics performance areas" when just looking at specs.
My point was that if a developer sat down with both systems and built a game to maximize each, with a set budget and development time, I am NOT convinced the X1900XT's end results would be better.
It's great to say, "The X1900XT has 2x the memory bandwidth", but this ignores that they don't use memory the same way! You can add, "Displacement mapping is a shader-heavy task", but this would ignore that the X1900XT does not do texture lookup in the vertex shaders (every Xenos shader does, though) and that Xenos, due to hardware tessellation, can do it in a single pass.
In a nutshell the "Numbers" on the X1900XT are not equivalent to the "Numbers" on Xenos -- because they are used in different ways.
No one says, "My P4 has 2x the peak theoretical performance of an AMD--so it is faster". We know certain architectural issues (like long pipelines, HyperTransport, etc.) make a HUGE difference in utilization.
All I am saying is the same is true in this debate and I don't think it is as clear cut in regards to the end product. But like I said, in the PC space Xenos would be a poor solution and the X1900XT is better. The reverse is true in the consoles (MS could have easily put an X1000 series chip in if they had wanted).
The funny thing, imo, is the X1900XT is actually facing a similar dilemma to Xenos. The X1900XT has 3x the pixel shader performance of the X1800XT--yet it has nowhere near 3x the impact on graphics.
*IF* a game was designed with the X1900XT in mind it would mop the floor with the X1800XT, but as of right now there are situations where the difference is insignificant. PC gamers have no problem identifying that this could--and will--change. It is not hard to imagine situations where the X1900XT could end up being 2x as fast in a real game, so we are all saying the X1900XT is faster... even though the current software does NOT always bear it out.
You may still disagree, but oh well 😉
I posted a discussion of architecture the other day here: http://forumz.tomshardware.com/hardware/Xbox-gpu-single-7800-chip-ftopict168200.html
At least you will see where I (and MS/ATI for that matter) am coming from.