I think NVidia's approach is perfect. Certain things work better on CPUs, and certain things work better on GPUs. In particular, the hardware structures in GPUs and other accelerators vastly outperform multi-core CPUs on many math-intensive tasks, especially in imaging, video, finance, and geology, while CPUs are still necessary for decision-based logic and control. So you need both types of processors to be effective. CUDA is a perfect development tool to enable this, and LAME is a perfect mainstream application that can benefit from acceleration.
We're past the days when we could just raise the clock speed. New programming models are necessary. Homogeneous multi-core designs (e.g. Larrabee) will fall short. Heterogeneous multi-core designs (many different types of cores) will dominate in the future. Although the PCIe 2.0 bus has plenty of bandwidth, its latency will be an issue, since every hand-off between the CPU and the GPU has to cross it. The best designs will have all the different types of cores on the same chip. So while NVidia has a great development tool with CUDA, hardware designs along the lines of AMD's Fusion may be the way of the future.
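To put rough numbers on the bandwidth-versus-latency point, here is a back-of-the-envelope sketch in Python. It models one transfer as a fixed latency plus size divided by bandwidth. The 8 GB/s figure is the theoretical per-direction peak of a PCIe 2.0 x16 link. The 10 microsecond latency is purely an assumed placeholder, not a measurement, and the names `transfer_time` and `latency_share` are made up for illustration.

```python
# Toy cost model: time to move one buffer across a bus.
# The latency figure is an assumed placeholder, not a measurement.

PCIE2_X16_BANDWIDTH = 8e9   # bytes/s per direction, theoretical peak for PCIe 2.0 x16
ASSUMED_LATENCY = 10e-6     # seconds per transfer; hypothetical value for illustration


def transfer_time(num_bytes, latency=ASSUMED_LATENCY, bandwidth=PCIE2_X16_BANDWIDTH):
    """Return seconds to move num_bytes: fixed latency plus size / bandwidth."""
    return latency + num_bytes / bandwidth


def latency_share(num_bytes, latency=ASSUMED_LATENCY, bandwidth=PCIE2_X16_BANDWIDTH):
    """Return the fraction of the total transfer time spent on fixed latency."""
    return latency / transfer_time(num_bytes, latency, bandwidth)


if __name__ == "__main__":
    # Compare a small, a medium, and a large buffer.
    for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
        print(f"{size:>12} bytes: {transfer_time(size) * 1e6:10.1f} us, "
              f"{latency_share(size):6.1%} latency")
```

With these assumed numbers, a 4 KiB hand-off is almost entirely latency, while a 256 MiB one is almost entirely bandwidth. So the bus is fine for big batch jobs, but it hurts fine-grained back-and-forth between the two kinds of cores, which is the case where putting them on the same chip would help.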