I am not going to compare the 360 GPU (Xenos) to the 7800 (G70), or at least I won't go out of my way to do so, but it is worth noting some of the things Xenos does to address the challenges a traditional PC GPU faces.
• Unified Shaders. The general problem in graphics is that the load between vertex and pixel shaders varies. It varies game-to-game (e.g. HL2 leans toward pixel shaders, while Far Cry, with its huge open areas full of leafy trees, leans heavily on vertex shaders). A fixed ratio can never reach complete efficiency in such situations (e.g. HL2 may underutilize the vertex shaders). The balance also changes scene-to-scene: if you are looking at a wall there may be very few triangles on screen, but if the wall uses parallax mapping, specular highlights, normal maps, etc., it could be very pixel shader heavy. Turn around and you may be looking at a forest where each tree has individual leaves, and the balance flip-flops. Finally, the vertex/pixel shader loads vary throughout the rendering process. While GPUs are pipelined, there are a lot of opportunities for stalls, and some techniques leave one or the other idle.
ATI's solution was to make the shaders a pooled resource. PS and VS are already fairly similar, and with DX10 the language and technical abilities of each are blurred/unified even further. So ATI bit the bullet and committed resources to a scheduler/arbiter, but gained the benefit of pooling the caches and pooling the shaders. ATI further decoupled the TMUs (texture mapping units) and ROPs (raster operations processors) from the pipeline. It just so happens that shader ALUs are comparatively small, which allowed ATI to inflate the shader count without spending much more on TMUs and ROPs. Shaders, not TMUs, are the future of game design, so this was a good move. Further, the ROPs are tied to the eDRAM, which is a big benefit as well.
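If it helps, here is a toy illustration of the pooling idea--purely conceptual, since the real arbiter schedules threads on the ALUs rather than splitting them per frame, and the job counts here are made up:

```cpp
#include <algorithm>
#include <cstdio>

// Toy model of a unified shader pool: 48 ALUs get divided between vertex
// and pixel work in proportion to the outstanding load. Illustrative only;
// the real Xenos arbiter schedules threads dynamically, not per-frame splits.
struct ShaderPool {
    static const int kAlus = 48;

    // How many ALUs to devote to vertex work given the queued jobs.
    int vertexAlus(int vertexJobs, int pixelJobs) const {
        int total = vertexJobs + pixelJobs;
        if (total == 0) return kAlus / 2;
        int v = kAlus * vertexJobs / total;
        return std::min(std::max(v, 1), kAlus - 1); // keep both stages alive
    }
};

int main() {
    ShaderPool pool;
    // Staring at a parallax-mapped wall: almost all pixel work.
    printf("wall scene:   %d of 48 ALUs on vertex work\n", pool.vertexAlus(100, 2000));
    // Looking at a forest full of leafy trees: the balance flip-flops.
    printf("forest scene: %d of 48 ALUs on vertex work\n", pool.vertexAlus(2000, 400));
    return 0;
}
```

A fixed 8 VS / 24 PS split would leave units idle in both of those scenes; the pool just follows the load.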
To put it in a nutshell, Xenos has all the efficiency additions of the X1800 series and then some. Further, Xenos has more shader performance in the 32-bit programmable pipeline (240 GFLOPs versus 187 GFLOPs; G70 @ 550MHz is 255 GFLOPs; remember, FLOPs alone don't tell us much about the architecture). Anyhow, the X1800 series does very well in new shader-rich games (FEAR, CoD2, BF2), so based on the paper stats it is pretty much a given that Xenos is faster.
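For reference, here is roughly where two of those numbers come from (counting a MADD as 2 FLOPs per lane; counting conventions vary, so treat these as approximations rather than gospel):

```cpp
#include <cstdio>

int main() {
    // Xenos: 48 unified ALUs, each doing a vec4 + scalar MADD per clock, at 500MHz.
    double xenos = 48 * (4 + 1) * 2 * 0.5e9;                   // ~240 GFLOPs
    // G70 @ 550MHz: 24 pixel pipes with two vec4 MADD ALUs each,
    // plus 8 vertex shaders with a vec4 + scalar MADD each.
    double g70 = (24 * 2 * 4 * 2 + 8 * (4 + 1) * 2) * 0.55e9;  // ~255 GFLOPs
    printf("Xenos ~%.0f GFLOPs, G70@550 ~%.0f GFLOPs\n", xenos / 1e9, g70 / 1e9);
    return 0;
}
```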
While the architecture is a wild card (we already saw how some devs on the rushed launch could not get a handle on tiling... it takes a while to get the most out of new hardware), ATI has stated that their traditional GPU designs are 50-70% efficient in terms of shader utilization, whereas Xenos is ~95% efficient.
Bottom line: Xenos is a very fast GPU and very well designed for shaders. Of course, I expect R600 to be 2-3x faster in terms of shader performance.
• eDRAM. The biggest bottleneck on PC GPUs is memory. Games take a big hit with anti-aliasing. Games take a big hit with floating point blending ("HDR"). Never mind doing both at the same time! Enter: eDRAM. This is not a new technology. The PS2 and GCN both use it because of the absolutely fantastic bandwidth it offers. The problem? It is small. The 10MB of eDRAM fits a 480p framebuffer with 4xMSAA and FP10 blending. Problem! How about 720p/1080i? Tiles. In a kind of throwback to old TBRs (tile-based renderers like the PowerVR; modern NV and ATI chips are IMRs, or immediate mode renderers), the eDRAM tiles the framebuffer. This gives Xenos some of the best of both IMR and TBR designs. Of course, developers need to do an early Z pass to use this feature (which some games did not do due to the launch crunch), but overall it is an easy enough solution.
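If you want the back-of-envelope math (assuming 8 bytes per sample, i.e. 32-bit FP10 color plus 32-bit depth/stencil; exact layouts can differ a bit):

```cpp
#include <cmath>
#include <cstdio>

// Framebuffer sizes versus the 10MB of eDRAM, assuming 8 bytes per sample
// (32-bit color + 32-bit depth/stencil).
double framebufferMB(int width, int height, int msaaSamples) {
    const double bytesPerSample = 4.0 + 4.0;
    return width * height * double(msaaSamples) * bytesPerSample / (1024.0 * 1024.0);
}

int main() {
    double p480 = framebufferMB(640, 480, 4);    // ~9.4 MB  -> fits in 10MB
    double p720 = framebufferMB(1280, 720, 4);   // ~28.1 MB -> does not fit
    printf("480p + 4xMSAA: %.1f MB\n", p480);
    printf("720p + 4xMSAA: %.1f MB -> %d tiles\n", p720, (int)std::ceil(p720 / 10.0));
    return 0;
}
```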
Now, here is where this rocks. First off, the 8 ROPs are all tied to the eDRAM. The eDRAM has an internal bandwidth of 256GB/s and runs at an effective 2GHz (quad pumped @ 500MHz). While it is true the fillrate is a "mere" 4 gigapixels/s, when 4xMSAA is enabled it jumps to 16 gigasamples/s--basically the ROPs are designed for no-slowdown 4xMSAA. The ROPs also have double-pumped Z, a feature shared with the RV530 (X1600; the X1800/X1900 do not have this). That is 64 Z samples a clock.
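Those fillrate figures fall straight out of the clocks:

```cpp
#include <cstdio>

int main() {
    const double clockHz = 500e6;   // core clock
    const int rops = 8;
    const int msaa = 4;
    double gigapixels  = rops * clockHz / 1e9;   // 4.0 Gpixels/s
    double gigasamples = gigapixels * msaa;      // 16.0 Gsamples/s with 4xMSAA
    int zPerClock      = rops * msaa * 2;        // double-pumped Z: 64 samples/clock
    printf("%.1f Gpixels/s, %.1f Gsamples/s, %d Z samples per clock\n",
           gigapixels, gigasamples, zPerClock);
    return 0;
}
```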
What about HDR? As Dave Baumann points out in his Xenos article at Beyond3d.com, Xenos can do FP10 or FP16 blending with MSAA. What is FP10? FP10 (10-10-10-2 RGBA) is a special format that has the same performance cost as standard 8-bit integer rendering while allowing a much broader range (each channel is stored as a small float with a mantissa and exponent). So while Xenos can do FP16 blending, there really is no need, as it can do FP10 blending with no performance hit.
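To give a feel for the format (the exact per-channel float encoding is a detail I won't swear to, so take the "small float" comment as my assumption), the key point is that a 10-10-10-2 pixel packs into the same 32 bits as plain 8-8-8-8 color:

```cpp
#include <cstdint>
#include <cstdio>

// Pack three 10-bit channels and a 2-bit alpha into one 32-bit word. On
// Xenos each 10-bit field holds a small float (mantissa + exponent) rather
// than a plain integer, but the storage is the point: 4 bytes per pixel,
// exactly like 8-8-8-8, versus 8 bytes per pixel for FP16.
uint32_t packFP10(uint32_t r, uint32_t g, uint32_t b, uint32_t a) {
    return (r & 0x3FF) | ((g & 0x3FF) << 10) | ((b & 0x3FF) << 20) | ((a & 0x3) << 30);
}

int main() {
    uint32_t pixel = packFP10(1023, 512, 0, 3);
    printf("packed pixel: 0x%08X (4 bytes, same footprint as 8-8-8-8)\n", pixel);
    return 0;
}
```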
The bottom line is that the framebuffer/backbuffer is the most bandwidth-hungry part of the GPU. Uncompressed, its bandwidth needs well exceed 100GB/s once MSAA, FP16, etc. are all enabled. The eDRAM effectively eliminates this issue. Instead of the shader pipeline constantly being stalled by memory, the bottleneck falls back onto the shader pipeline itself--which, as noted above, is very efficient.
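A quick sanity check on that claim, assuming every ROP does a full color and Z read-modify-write on every sample (peak demand, not a sustained average):

```cpp
#include <cstdio>

int main() {
    const double samplesPerSec = 16e9;        // 4 Gpixels/s * 4xMSAA
    // Per sample: color read + write plus Z read + write.
    const double fp10Bytes = 4 + 4 + 4 + 4;   // 32-bit color, 32-bit Z
    const double fp16Bytes = 8 + 8 + 4 + 4;   // 64-bit color, 32-bit Z
    printf("peak ROP demand, FP10 + 4xMSAA: %.0f GB/s\n", samplesPerSec * fp10Bytes / 1e9); // ~256
    printf("peak ROP demand, FP16 + 4xMSAA: %.0f GB/s\n", samplesPerSec * fp16Bytes / 1e9); // ~384
    return 0;
}
```

Not coincidentally, the FP10 number lines up with the 256GB/s the eDRAM provides internally.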
One reason we don't see eDRAM in the PC market is that PCs require higher resolutions and many variable resolutions. Tiling works on a console that has 3 or 4 resolution targets developers can optimize for, but on the PC it is a no-go. It should be noted that there is no penalty for 2xMSAA, but 4xMSAA has a 1-5% penalty due to tiling. Think of tiles as dividing your screen into 3 equal columns. The GPU must reprocess the geometry that crosses the "seams" between tiles to make sure they look correct.
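Carrying the earlier arithmetic forward, the three columns work out like this (the column layout is just the example above; the real tile shape is up to the developer):

```cpp
#include <cstdio>

int main() {
    // 720p with 4xMSAA split into 3 column tiles, 8 bytes per sample as before.
    const int tileWidth = 1280 / 3 + 1;   // ~427 pixels per column
    double tileMB = tileWidth * 720.0 * 4 * 8 / (1024.0 * 1024.0);
    printf("each ~%d-pixel column: %.1f MB -> fits in the 10MB of eDRAM\n", tileWidth, tileMB);
    return 0;
}
```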
• Direct bus to the CPU. One of the biggest pains on the PC is that the GPU has to fight the CPU for horrid memory bandwidth (6.4GB/s on most modern PCs) AND has a very high latency connection to the CPU itself. Basically, the GPU is on its own and benefits little from the PC CPU. This is different on the consoles, and Xenos in particular has some nifty ways of alleviating these issues to a degree.
The first is that the CPU and GPU have a dedicated connection (10.8GB/s), so they can communicate without going through the FSB and main memory. Beyond this, the CPU has hardwired compression that compresses vertex data at a minimum ratio of 2:1. This falls in line with MS's procedural synthesis patents (see arstechnica.com for a rundown... do a search on procedural and Xbox). Another neat feature is that the GPU can read straight out of the CPU's L2 cache: the CPU can "lock" a portion of the cache and stream data directly to the GPU, never touching the FSB or main memory bandwidth (I sketch the idea below). This is one reason MS went the direction they did with the CPU. The patents indicate a desire to use one core for procedural data (saving memory for other things and allowing advanced LOD techniques).

The Xbox 360 CPU is kind of a middle ground. It lacks out-of-order execution and advanced branch prediction--so it is no match for a modern PC CPU in many ways. But it does have some really beefy vector units, and what it loses in efficiency it makes up for by having 3 cores. And the design is more flexible/developer friendly than CELL, which has an asymmetric design and stripped-down SPE cores (although it has a lot of them and they do excel at certain tasks... I am not putting down either design, just noting their differences). There is no doubt the Xbox 360 CPU is harder to use efficiently than a PC CPU, and it may not have the peak theoretical performance of CELL, but MS tried to find a middle ground of solid general performance and specialized vector power, all while keeping a design that lets developers run their game code on any core.
Basically, the 360 CPU has some downsides (like most processors), but within the closed box of the 360 it alleviates some of the issues found on the PC. For example, one benefit of this design is that it makes geometry shaders somewhat redundant.
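Going back to the locked-cache streaming idea, the data path conceptually looks something like this. This is NOT the actual XDK interface--the names (writeVertices, kickToGpu) and sizes are made up--it is just to show the producer/consumer flow where the geometry never has to sit in main memory:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// One CPU core procedurally generates vertices into a small buffer (standing
// in for the locked portion of L2) and hands fixed-size chunks to the GPU.
class LockedCacheStream {
public:
    explicit LockedCacheStream(size_t bytes) : buffer_(bytes), used_(0) {}

    // CPU side: append freshly generated vertices to the current chunk.
    bool writeVertices(const Vertex* v, size_t count) {
        size_t bytes = count * sizeof(Vertex);
        if (used_ + bytes > buffer_.size()) return false;   // chunk is full
        std::memcpy(buffer_.data() + used_, v, bytes);
        used_ += bytes;
        return true;
    }

    // "GPU" side: consume the chunk and reset for the next batch. In reality
    // the GPU would fetch straight from the locked cache lines.
    size_t kickToGpu() {
        size_t sent = used_;
        used_ = 0;
        return sent;
    }

private:
    std::vector<uint8_t> buffer_;
    size_t used_;
};

int main() {
    LockedCacheStream stream(128 * 1024);              // pretend: 128KB of locked L2
    Vertex quad[4] = {{0,0,0},{1,0,0},{1,1,0},{0,1,0}};
    while (stream.writeVertices(quad, 4)) {}           // fill a chunk procedurally
    std::printf("streamed %zu bytes without touching main memory\n", stream.kickToGpu());
    return 0;
}
```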
The Xenos GPU also has some nice features that should impact IQ or design.
• Memexport. This feature allows the GPU to coherently read and write system memory, similar to how a CPU does. This is one of the holy grails of GPGPU processing. The Beyond3d.com article deals with this to a degree, but in a nutshell it means Xenos can do some advanced particle effects, and maybe more, on the GPU. Take a look at ATI's ToyShop demo for some examples. This won't be used much immediately, but it will most likely be toyed with in the future.
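To see why that matters, here is the kind of loop memexport lets a shader do--written as plain CPU code purely as an analogy, since a GPU without scatter writes can only output to the fixed pixel it was handed:

```cpp
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// The essence of a scatter write: each work item decides *where* its result
// goes instead of writing to a fixed output location. Memexport gives the
// shaders this ability against system memory; this CPU loop is only an analogy.
void stepParticles(std::vector<Particle>& particles,
                   std::vector<std::size_t>& aliveList, float dt) {
    std::size_t alive = 0;
    for (std::size_t i = 0; i < particles.size(); ++i) {
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        if (p.y > 0.0f)
            aliveList[alive++] = i;   // scatter: the output index is data-dependent
    }
    aliveList.resize(alive);
}

int main() {
    std::vector<Particle> ps(1000, Particle{0, 10, 0, 0, -1, 0});
    std::vector<std::size_t> alive(ps.size());
    stepParticles(ps, alive, 0.016f);
    return alive.empty() ? 1 : 0;
}
```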
• Hardware tessellation. The Xenos setup engine can do 500M triangles (vertices) a second, or 250M triangles a second when tessellating in hardware (1 triangle every 2 clocks). This can be used for subdivision in conjunction with advanced LOD schemes, or for single-pass displacement mapping (as outlined in the ATI Xenos PDF). To do that, though, it needs...
• Vertex texture lookup. ATI's current GPUs lack this (using R2VB instead), and NV's are slow at it. Not Xenos. First, the TMUs are decoupled from the pipeline. Second, they were designed with this in mind. Third, with a unified architecture *every* shader has texture lookup capability, because it is a standard pixel shader feature. This means all 48 shaders can do a texture lookup--even when they are working as vertex shaders. As a result, a lot of effects that were formerly not realistic on a GPU are being opened up, because the hardware is more flexible and dynamic.
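The tessellation + displacement combo from the previous two bullets boils down to this (shown as deliberately simplified CPU-side C++; sampleHeight() stands in for the texture fetch that, on Xenos, the vertex stage can issue itself):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Vertex { float x, y, z; };

// Stand-in for a vertex-stage texture fetch into a height map.
float sampleHeight(float u, float v) {
    return 0.1f * std::sin(6.2832f * u) * std::sin(6.2832f * v);  // gentle bumps
}

// Tessellate a flat patch, then displace every generated vertex along its
// normal (+Y here) by the value looked up "in the texture".
std::vector<Vertex> tessellateAndDisplace(int gridN) {
    std::vector<Vertex> verts;
    for (int j = 0; j <= gridN; ++j) {
        for (int i = 0; i <= gridN; ++i) {
            float u = float(i) / gridN, v = float(j) / gridN;
            verts.push_back({u, sampleHeight(u, v), v});
        }
    }
    return verts;
}

int main() {
    std::vector<Vertex> mesh = tessellateAndDisplace(32);   // 33x33 displaced vertices
    std::printf("generated %zu displaced vertices\n", mesh.size());
    return 0;
}
```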
• HOS. Xenos can do higher order surfaces (basically curved surfaces). This is another DX10-type feature; past implementations have been poor (like N-patches), and there are still some issues with animation, texturing, etc. in the dev tools. But it is there, and it means we may someday see round tires.
---
Basically, the issue is not "what is faster". In the CONSOLE world it does not matter. In an ideal situation, this is how console games get made (at least exclusive ones):
1. Developer tests hardware (usually while porting a rushed launch title!) and identifies strengths and weaknesses of the design
2. Developer then engineers an engine (or engine elements) tailored to the hardware's strengths while trying to minimize its weaknesses
3. In the meantime, the art director(s) on the dev team take these ideas and engineering samples into account and build an art theme and story arc; the general art direction is paired well with the technology (an example: the art in HL2 works well with Source, and the art in Doom 3 works well with the D3 engine, but mix the engines and the art and BLAH! It does NOT look as good... no use doing a technical trick when it does NOT match your art!)
4. With engine specs in hand, the art team develops assets that will push the system/engine
Now this does not always happen, but it is why AAA titles, especially exclusive ones, always look so good. They have the money, time, skilled people, and ability to do it right. And the fact is skilled developers can make a slower machine scream. Look at Metroid Prime and RE4 on the GCN, or Shadow of the Colossus and God of War on the PS2. They are as good as anything on the Xbox (although the Xbox does have more good-looking titles in general). But a great developer can make a great game on any system.
So I would argue:
- Power is moot to a point on consoles. It appears to me Xenos is faster than RSX (based on what we know, i.e. that RSX is a modified G70 at 550MHz), but consoles are as much about total system efficiency, dev time, budget, and art direction. The fact is, how "fast" the consoles are will lean heavily on how good the dev teams are.
- Each design has strengths. Repeat: HL2/ATI & D3/NV ten times. Now do it again faster.
- The biggest difference, I think, will come down to the little things. e.g. Xenos has an elegant and simple solution for FP blending plus AA. The PS3 will have workarounds for this (a lot of different ways of doing it, all at a cost of some sort... some not bad though... btw, the HL2 method is a hack, and while it is called "HDR", HDR is really composed of a number of different technologies and imo Valve's hack does not cut it... of course I love HL2 though)
- It really is about the games. Get the system (PC, PS3, Xbox 360, Rev) that you want to play games on. Because the fact is the Xbox1 was head and shoulders above the GCN and PS2, yet the games were fairly similar. The PS3 and Xbox 360--regardless of the hype Sony and MS put out--are really in the same ballpark in performance.
At worst, one may need to cut down on particle effects there, texture resolution here, cut the sample rate or detail on physics objects by 40% here and there, or have 3k people on screen instead of 4k. If you pressed me for the biggest difference between the consoles, it would be this: memory. The PS3 uses a segmented memory architecture, which could be a pain for devs, but the real problem is bandwidth. Let's say the 256MB GDDR3 pool is used for the framebuffer. RSX will saturate the 22.4GB/s that pool offers with the back/front buffers alone. And here is why that is a problem: the framebuffer is pretty small, in most cases less than 50MB. That means the OTHER 200MB is sitting mostly idle. Not totally wasted, mind you, but it becomes a glorified cache. CELL is also going to need a chunk of the 25GB/s the 256MB XDR pool offers if it is to be fully utilized, so the general issue is going to be balancing all of this. It is not undoable--PS devs are great and figured out the PS2--but it is not as elegant a solution as the 360's memory setup: one big 512MB pool, with the framebuffer isolated in eDRAM, so the UMA bandwidth goes to texture and mesh fetches and to the CPU.

But basically this all comes down to design. The PS3 in many ways has more brute power but has some high hurdles to get over; the Xbox 360 is very streamlined, almost a GameCube 2. Of course, the PS3 has a big advantage in that developers have been using CELL dev kits with SM3.0 hardware in the same performance class as RSX for almost a year. They will have had ~18 months with such hardware by launch in the fall (360 devs had beta hardware with Xenos for no more than 3 months before launch!). Basically, the PS3's fall titles will have been in development on hardware similar to the final product LONGER than the Xbox 360's launch software was.
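For the "framebuffer is small but hungry" point, a rough footprint at 720p (assuming FP16 color with 4xMSAA plus a resolved 32-bit front buffer; real layouts will vary game to game):

```cpp
#include <cstdio>

int main() {
    const double mb = 1024.0 * 1024.0;
    const double samples = 1280.0 * 720 * 4;         // 720p with 4xMSAA
    double backBuffer  = samples * (8 + 4) / mb;     // FP16 color + 32-bit Z, ~42 MB
    double frontBuffer = 1280.0 * 720 * 4 / mb;      // resolved display buffer, ~3.5 MB
    printf("back buffer  ~%.1f MB\n", backBuffer);
    printf("front buffer ~%.1f MB\n", frontBuffer);
    printf("total        ~%.1f MB of the 256MB GDDR3 pool\n", backBuffer + frontBuffer);
    return 0;
}
```

The footprint is tiny next to 256MB, but every one of those bytes gets read and written over and over each frame, which is where the 22.4GB/s goes.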
So expect BIG things from Sony. Because, again, it is about the games, development time, and developer skill more so than the hardware. And Sony has put PS devs in a position to excel at launch, whereas the Xbox 360 launch lineup (minus some games like Kameo, PGR3, Condemned, CoD2, and now GRAW, Oblivion, and FNR3) was a lot of upgraded Xbox titles and cheesy ports done poorly.
As a PC gamer, I find Xenos exciting because it shows the direction GPUs are going. e.g. geometry shaders are an answer to the dilemma of poor system memory bandwidth and CPU issues on the PC: just move that work onto the GPU!
And of course ATI got a lot of money/funding to test out some DX10 features on the 360... and we PC gamers will get more refined hardware as a result.