Actually, as I mentioned, the highest-performing single-GPU card only does about 675 GigaFLOPS. Keep in mind that the numbers differ depending on the level of precision: supercomputers are measured using DOUBLE-precision floating-point, aka 64-bit FP. That level of precision is what's needed for scientific and engineering tasks. Meanwhile, standard 3D rendering, gaming, and media tasks are fine with 32-bit single-precision FP. Hence, a lot of consumer-targeted equipment is measured using single-precision; the teraflop figures from AMD (as well as the entirely made-up teraflop figures for the consoles) refer to single-precision power.
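To see why scientific work insists on 64-bit FP, here's a quick sketch (in Python, simulating 32-bit math by round-tripping through a packed single-precision value; the iteration count is just an illustration I picked):

```python
import struct

def to_f32(x):
    # Round a Python float (64-bit) to 32-bit single precision and back.
    return struct.unpack('f', struct.pack('f', x))[0]

# Add 0.1 a million times in both precisions. The exact answer is 100,000.
total64 = 0.0
total32 = 0.0
for _ in range(1_000_000):
    total64 += 0.1
    total32 = to_f32(total32 + to_f32(0.1))

print(total64)  # stays very close to 100,000
print(total32)  # drifts visibly away from 100,000
```

The single-precision total drifts by hundreds because each add gets rounded to a coarser grid as the running sum grows; that kind of accumulated error is exactly what long-running simulations can't tolerate.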
Depending on the architecture, single-precision FP units can also be used to produce double-precision results, but at a reduced rate. Double-precision throughput ranges from as much as half the single-precision rate (x86 FPUs using AVX, namely Sandy Bridge and Bulldozer, as well as the PowerXCell 8i), to a quarter (Radeon 6000-series GPUs), to a fifth (Radeon 5000-series GPUs, newer nVidia GPUs), down to as low as a tenth (older nVidia GPUs, and the PS3's Cell).
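To make those ratios concrete, here's a toy calculation. The single-precision figures below are assumptions for illustration (roughly 2,700 GFLOPS for a high-end Radeon 6900-series card, which lines up with the ~675 GigaFLOPS double-precision figure mentioned above):

```python
# Hypothetical helper: estimate double-precision throughput from a card's
# single-precision rating and its architecture's DP:SP ratio.
def dp_gflops(sp_gflops, ratio):
    return sp_gflops * ratio

# Assumed single-precision ratings, multiplied by the ratios discussed above.
print(dp_gflops(2700, 1 / 4))  # Radeon 6900-series class: quarter rate
print(dp_gflops(2720, 1 / 5))  # Radeon 5800-series class: fifth rate
```

So a card marketed on a multi-teraflop single-precision number may only manage a few hundred double-precision GFLOPS, which is why the two figures shouldn't be compared directly.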