It is ironic that this review would point out how well previous generation GPUs do versus modern GPUs. This is something I have been dwelling on with regard to the Radeon series, in particular the upcoming Southern Islands GPUs. The rumor is that Cayman will deliver performance in excess of that of the GTX 480 using just 1600 shaders, and Barts will achieve Radeon 5830 level performance with just 800 shaders. The latter is definitely realistic considering that in DX9/10 games the Radeon HD 4890 is clock for clock equivalent to a Radeon HD 5830. I picture the following sequence of events:
Just as ATI designed the cut down 640 shader 4770 to replace the 4830 while testing the 40nm node, they started a cut down 1280 shader part, which would ship with one execution pipeline disabled, to replace the 5830 while testing the 32nm node. At the same time they started tweaking the design of Cypress to produce an RV790 style speed bump for the 40nm Cypress design, which would have been the Radeon HD 5890. With the introduction of the Radeon HD 5830 came the realization that the 800 shader 4890 was clock for clock as fast as the 5830 for DX9/10, and with it a change of plans. Instead of the 32nm replacement for the 5830 being a derivative of Cypress, it would be a 32nm shrink of RV790 with DX11 and double precision fp bolted on, in many cases through microcode. The resulting part would be faster clock for clock than the 4890 in DX9/10 because of the doubling of ROPs that comes from using the Evergreen memory controllers, enough so that the slight shortfall in DX11 performance relative to the Radeon 5830 would be compensated for.
Then disaster struck. Both TSMC and GlobalFoundries cancelled their 32nm nodes. ATI was left with a 50%+ completed high end Northern Islands design that would have been about 400 mm² at 32nm and north of 600 mm² if implemented at 40nm. The entire 32nm team got together and realized that implementing some of the NI DX11 enhancements in the 4890 derivative core would get it up to Cypress level DX11 performance, and that doubling this core would create a GTX 480 level replacement for Cypress. Thus Southern Islands was born. With the team working on the RV790 analog for Cypress joining the Southern Islands project to get SI out the door in the same timeframe NI was supposed to launch, Southern Islands benefitted from the planned clock rate enhancements as well. The RV790 analog project could be safely cancelled because the price drops anticipated with the entry of nVidia's GTX 480/470 did not materialize.
With these assumptions we can predict the performance of Barts. At 850 MHz with half its ROPs disabled, Barts would match the DX9/10 performance of the 4890 and the average DX11 performance of the 5830, with a huge increase in tessellation performance. And without compensating for the increased performance from the ROPs operating at a higher clock, Barts would match the performance of the 5850 if it were clocked at 930 MHz with all 32 ROPs active. Adjusting for the faster clock of the ROPs, it would probably be able to achieve 5850 equivalent performance at 900 MHz. The question is, how well would it perform relative to Juniper? The answer depends on how high the clock for clock performance of Juniper is relative to RV790. This is hard to judge since Juniper has significantly lower bandwidth than RV790, so it is hard to tell how much of the performance difference is due to lower bandwidth and how much is due to lower performance execution units. I went through a convoluted multi-step process to estimate this performance differential, and came up with the RV790 shaders being clock for clock 8% faster than the Juniper shaders, plus or minus 8%. If only there were a more direct way to estimate this ratio.
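One way the 930 MHz figure for matching the 5850 might be reproduced is sketched below. This is only a guess at the arithmetic, not a confirmed method: it assumes performance scales linearly with shader count times clock, and it uses public reference specs that are not stated in the text above (1120 shaders for the 5830, 1440 shaders at 725 MHz for the 5850). The 800 shader counts for RV790 and Barts come from the discussion above. ROP and bandwidth effects are ignored.

```python
# Reference specs assumed for this sketch (public figures, not from the post).
HD5830_SHADERS = 1120
HD5850_SHADERS = 1440
HD5850_CLOCK_MHZ = 725

# Figures taken from the post.
RV790_SHADERS = 800   # Radeon HD 4890
BARTS_SHADERS = 800   # rumored


def cypress_per_shader_efficiency():
    """Per-shader throughput of Cypress relative to RV790.

    Assumes the 4890 and 5830 perform identically at the same clock,
    so 800 RV790 shaders do the work of 1120 Cypress shaders.
    """
    return RV790_SHADERS / HD5830_SHADERS


def barts_clock_to_match_5850():
    """Clock (MHz) at which 800 RV790-class shaders equal a 5850."""
    rv790_equivalent_work = (
        HD5850_SHADERS * HD5850_CLOCK_MHZ * cypress_per_shader_efficiency()
    )
    return rv790_equivalent_work / BARTS_SHADERS


if __name__ == "__main__":
    print(round(cypress_per_shader_efficiency(), 3))  # 0.714
    print(round(barts_clock_to_match_5850()))         # 932
```

Under these assumptions the result lands at roughly 932 MHz, which is close to the 930 MHz quoted above before any adjustment for the faster ROP clock.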
The Radeon HD 5850 has 1.8 times the number of shaders of the Radeon HD 5770 and operates at 85.3% of the frequency. The Geforce GTX 460 has 1.75 times the number of shaders of the Geforce GTS 450 and operates at 86.2% of the frequency. With such a small difference in ratios, and with the shader ratio being higher for the Radeon while the frequency ratio is lower, we would expect the performance of the GTS 450 relative to the HD 5770 to be almost identical to that of the GTX 460 relative to the HD 5850, provided the performance of the 5770 shaders relative to the 4890 is the same as that of the 5830 shaders. The performance of the GTS 450 seems to be much closer to that of the HD 5750, however, so it appears that while the shaders of Cypress are clock for clock about 70% of the performance of RV790, the shaders of Juniper are clock for clock 80% of the performance of RV790. I find it questionable that RV790 would be clock for clock 25% faster than Juniper when, even factoring in my margin of error, my estimate was up to 16%. But I am now confident that RV790 is shader for shader at least 16% faster than Juniper, and that even with half the memory controllers disabled Barts could have just 720 shaders active and clocked at 800 MHz while still matching the performance of the Radeon HD 5770. At 700 MHz with 720 active shaders and all memory controllers and ROPs operational, Barts can probably still match the performance of the Radeon HD 5770. At 700 MHz with 640 active shaders and half the memory controllers disabled, Barts can probably match the performance of the Radeon HD 5750.
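The ratios quoted above can be checked with a few lines of arithmetic. The sketch below assumes the public reference shader counts and core clocks for each card (these numbers are not given in the text, only the resulting ratios are), and also shows how an 80% relative per-shader performance turns into the "25% faster" figure. The helper names are made up for illustration.

```python
# Reference (shader count, core clock in MHz) pairs, assumed for this sketch.
CARDS = {
    "HD 5850": (1440, 725),
    "HD 5770": (800, 850),
    "GTX 460": (336, 675),
    "GTS 450": (192, 783),
}


def ratios(big, small):
    """Return (shader ratio, clock ratio) of the larger card to the smaller."""
    big_shaders, big_clock = CARDS[big]
    small_shaders, small_clock = CARDS[small]
    return big_shaders / small_shaders, big_clock / small_clock


def speedup_from_relative(relative_perf):
    """If chip A runs at relative_perf of chip B, B is this much faster than A."""
    return 1 / relative_perf - 1


if __name__ == "__main__":
    for big, small in (("HD 5850", "HD 5770"), ("GTX 460", "GTS 450")):
        shader_ratio, clock_ratio = ratios(big, small)
        print(f"{big} vs {small}: {shader_ratio:.2f}x shaders, "
              f"{clock_ratio:.1%} of the clock")
    # Juniper at 80% of RV790 per clock means RV790 is 25% faster.
    print(f"{speedup_from_relative(0.80):.0%}")
```

The GeForce shader clock runs at twice the core clock on both cards, so using core clocks here does not change the ratio.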
If the 6770 is the 5850 alternative at $239, and the 6650 the 5830 replacement at $169, ATI could offer a 6630 as the 5770 alternative at $129 and possibly a 6610 as a 5750 alternative at $99.