Nvidia: Moore's Law is Dead, Multi-core Not Future

jenesuispasbavard, why can't you just split your range of integration into segments and let one core integrate each segment? Why exactly do you think this task cannot be parallelized? Am I missing something?
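For what it's worth, here's the kind of thing I mean - a rough sketch with my own toy code (plain C++ threads, trapezoidal rule picked arbitrarily), where each core integrates its own segment of the range and the partial results are summed at the end:

[code]
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Trapezoidal rule over [a, b] with the given number of steps.
double integrate(double (*fn)(double), double a, double b, int steps) {
    double h = (b - a) / steps;
    double sum = 0.5 * (fn(a) + fn(b));
    for (int i = 1; i < steps; ++i) sum += fn(a + i * h);
    return sum * h;
}

double f(double x) { return std::sin(x); }   // example integrand

int main() {
    const double a = 0.0, b = 3.141592653589793;             // integrate sin(x) over [0, pi] -> 2
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());

    std::vector<double> partial(cores, 0.0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < cores; ++i) {
        double lo = a + (b - a) * i / cores;                  // each core gets its own segment
        double hi = a + (b - a) * (i + 1) / cores;
        workers.emplace_back([&partial, i, lo, hi] { partial[i] = integrate(f, lo, hi, 1000000); });
    }
    for (auto& w : workers) w.join();

    std::printf("integral = %f\n", std::accumulate(partial.begin(), partial.end(), 0.0));
    return 0;
}
[/code]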
 
Teaching more programmers parallel programming practices is all well and good, but it'll only get us so far. Just as there are problems and algorithms suitable for parallel execution, there are also those that are not, including the majority of mainstream user software.

In the end, I believe that preparing software for parallel or serial execution must be transparent to the programmer for such a paradigm shift to be possible.

Whether this is done through the compiler, operating system or a run-time compiler is largely irrelevant. The point is that expecting every programmer to adopt vastly different coding methodologies depending on which kind of algorithm a particular subroutine uses is neither practical nor realistic.

Perhaps the trend of treating various sub-processors, such as the GPU, as general-purpose processing units available to the system will lead to an upswing in popularity for interpreted languages.

*shrug* Interesting times, no doubt.

Oh yes, regarding the article... it's just NVIDIA being full of themselves again. Even if there is a point or two to be made along those lines.
 
😵 Reading some of these posts... man...

Out of order execution is the norm... really, we have already solved the problem of having to sequence data X and Y before we can sequence data Z when you do tasks X Y Z in parallel. Get over your puny conscious perceptions.

If you want to take it up a notch, pick up a quantum physics book. Then you might sound smart when talking about parallel architecture execution. You really can sequence Z before getting X and Y results. It's not silly math, it's how the universe works.
 
I think, as mentioned, a lot is due to 'habit'; I learned to program in FORTRAN IV in high school in '69, so I'm pretty well 'canalized' to serial execution. My 'habits' and procedural technique were set in stone by Modula-2, which has taken me through C, C++, C# (which really appears to 'want' a Java background), and now .NET. I realize the paradigm shift, and am planning a rebuild to include an ATI stream processor so I can 'take the plunge' into OpenCL - but I realize it'll be deep water for an 'old dog' like me! I'm pretty sure kids who learn CUDA or CL in high school now will see way more opportunities for 'parallelism', but, as pointed out by several - there are large numbers of things you just can't 'split up'! But, damn - it sure seems to me that OSes themselves could vastly benefit from a healthy dose of multiple-coreitis!!!

BTW - I considered using a couple of 470s and a Fermi - BUT - I figured out that I'd have to pull a 220 line up from the basement to accommodate the PSU, and run a second cooling loop back down to the basement, or just use my bedroom/office as a turkey roaster most days...
 
Changing to parallel computing is going to be the equivalent of changing the auto industry from gasoline to hydrogen. It won't be a fast change, and software companies are ill-equipped to deal with this change as of right now.
 
"...while he believes that the future is in parallel processing...."

Yeah... in the future... for now it's a tough thing to see. Look at how many years it took those companies to get used to the Cell processor in the PS3 (well... many of them are still either learning it or just avoiding it)..

Yes, parallel processing is very useful, but we need one or a few pioneers to sacrifice themselves for the greater good... Is Nvidia going to be "the one"?.. 😉

 
I'd rather hold my breath for some sort of breakthrough in thermodynamics allowing us to achieve exponentially higher clock speeds. I'm no physicist, and I certainly don't have the answer, but I want a 10 GHz air-cooled CPU, damnit! 😛
 
It's kind of interesting to see so many people comment on "parallel programming" who have never programmed in their lives, but somehow feel qualified to comment on it with authority. How can you know if you're not a programmer?

It's equally strange that people think multi-threaded apps are a new thing. They're not, even on the PC. Since the 286 and OS/2, programmers have been writing threaded apps. Why bother on a uniprocessor? For one, there are good programming techniques with regard to the interface: you always want it responsive. Microsoft, naturally, doesn't do this well, because you get an hourglass. But let's say you run a business app and you need to do a search in a database. Bad programmers would go get the data and make the person wait. Good ones would spin that off in a thread and keep the interface active for the customer, even if just to tell them you're retrieving data and give a rough idea of where you are with it. Or you can even let them do some work while they wait.

Also, sometimes programs are waiting on a hard disk. You don't want your processor stalled for that if there's more work that can be done. So you spawn a thread for that and do other work with the processor, if possible.
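As a rough sketch of the pattern in the last two paragraphs (my own toy code; the query function and timings are made up, and a real GUI would pump its event loop instead of printing):

[code]
#include <chrono>
#include <cstdio>
#include <future>
#include <string>
#include <thread>

// Stand-in for a slow database or disk operation.
std::string slow_database_query() {
    std::this_thread::sleep_for(std::chrono::seconds(3));
    return "42 rows";
}

int main() {
    // The query runs on its own thread; the "interface" thread stays free.
    auto result = std::async(std::launch::async, slow_database_query);

    while (result.wait_for(std::chrono::milliseconds(250)) != std::future_status::ready) {
        std::printf("Still retrieving data... (interface stays responsive)\n");
        // A real GUI would keep handling input and repaints here.
    }
    std::printf("Query finished: %s\n", result.get().c_str());
    return 0;
}
[/code]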

Also, there were multiprocessing systems for many, many years. This is nothing new.

But the main thing is, it's not a universal solution. It's not just retraining, or putting more effort into it. Some things are actually SLOWER if you do them in parallel, because the overhead of synchronization is damning. You also use more memory, which lowers cache hit rates, also lowering performance.
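A toy illustration of that point (my own made-up sizes; exact timings vary by machine, but the shape of the result usually doesn't): a shared counter behind a mutex gets hammered by every thread, and the locking plus cache-line bouncing typically makes the "parallel" loop slower than the plain serial one.

[code]
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const long long N = 10000000;

    // Serial: a plain loop, no sharing, no locks.
    auto t0 = std::chrono::steady_clock::now();
    volatile long long serial = 0;
    for (long long i = 0; i < N; ++i) serial = serial + 1;
    auto t1 = std::chrono::steady_clock::now();

    // "Parallel": every increment takes the same lock, so the threads spend
    // their time fighting over the mutex and the shared cache line.
    long long shared = 0;
    std::mutex m;
    const unsigned workers = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w)
        threads.emplace_back([&] {
            for (long long i = 0; i < N / workers; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++shared;
            }
        });
    for (auto& t : threads) t.join();
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("serial:   %.1f ms\nparallel: %.1f ms (synchronization overhead)\n",
                ms(t1 - t0).count(), ms(t2 - t1).count());
    return 0;
}
[/code]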

It's also not always about which is faster, it's which the user prefers. If I write a multi-threaded app that's slower than yours by 10%, but it keeps the user informed, and able to see what's going on, and they prefer it, then it's still a better way to do it. At the end of the day, you want the end user to be happy, and everything else is a means to that end. One time I even spawned a dialogue box that asked if the person wanted to play a quick game, if I knew the computer was going to be busy for a long period of time, and there wasn't much else to be done. It used up resources, but the users "felt" like it didn't take as long. The human element is always an important one. People much prefer to click on a button and have it say that you're running this or that, at the moment, than have their app freeze with an hourglass where they can't do anything. It's not that different, but people like it much better. They still feel in control of the computer, not the other way around.

So, stop hoping for the impossible. You'll see more optimizations, but it's not that programmers suck (although, honestly, most do), but there are very defined limits to what should be, and what shouldn't be parallelized.

Hardware companies are going this way because they're having a terribly difficult time increasing serial performance, which is much more universally useful. It's not that parallel is the better approach; it's just the more feasible one now that squeezing out more serial performance has become so hard.
 
"Out of order execution is the norm... really, we have already solved the problem of having to sequence data X and Y before we can sequence data Z when you do tasks X Y Z in parallel."

Out-of-order execution of a handful of instructions in a mostly serial task is vastly different from parallelizing a task over hundreds, thousands, or tens of thousands of mini-cores, though.
 
[citation][nom]killerclick[/nom]Why not take both approaches?[/citation]
I believe the Cell processor was just that.
Essentially, from what I understand, one PowerPC core and seven GPGPU-like (SPE) cores. The Xbox 360 was just a tri-core PowerPC, but with two threads per core.

Or maybe AMD Fusion, where you have a nice quad core with a GPGPU at its side, and a discrete GPU to handle graphics.
 
[citation][nom]matt_b[/nom]On another note, am I the only one finding it amusing that the chief scientist of R&D at Nvidia is stating the CPU consumes too much energy??? Did he forget about the monster they just released, or does he still consider it to be within acceptable power requirements or efficient enough?[/citation]

I was thinking the same thing, then I realized something. He is probably thinking of it in terms of the power requirements per core. With the GTX 480 containing 480 cores and having a 250 W TDP, it is effectively using about 0.5 W per core (that includes reductions from that TDP for RAM). Current CPUs are using 32.5 W per core. If you are going strictly on a rating of power per core, then the Fermi architecture is actually 65 times as efficient as Intel's Bloomfield.
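Just to lay out the arithmetic (130 W TDP over four Bloomfield cores is where the 32.5 W figure comes from; I rounded the GTX 480 down to an even 0.5 W per core after knocking something off the TDP for the RAM):

[code]
#include <cstdio>

int main() {
    double gpu = 250.0 / 480.0;   // GTX 480: 250 W TDP over 480 shader cores ~= 0.52 W/core
    double cpu = 130.0 / 4.0;     // Bloomfield: 130 W TDP over 4 cores = 32.5 W/core
    std::printf("GPU: %.2f W/core, CPU: %.2f W/core, ratio: %.0fx\n", gpu, cpu, cpu / gpu);
    return 0;
}
[/code]

The raw TDP numbers give closer to 62x; the 65x comes from the rounded 0.5 W figure.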
 
"I believe the Cell processor was just that."

In a sense I suppose that's a decent example. Unfortunately it also serves as an example of how difficult it is to properly utilize and tap the power of such an architecture.
 
I see why many critics are skeptical, given that this argument comes from Nvidia.

Somehow, in practice, general performance stopped improving when MHz hit the wall. It's true that Moore's Law is about transistor count, but in the past it translated into performance increases.

I benefit from multi-core and multithreading since I work with 3D modeling and rendering, but aside from some very specific optimized functions, everything else, even in 3D, seems to be progressing very slowly. I have files from 6 years ago, and when I open them now on a 64-bit Xeon @ 3 GHz system with tons of memory, I don't see a substantial speed increase when working in the viewport, at least in Maya. I created those files when the mighty Pentium 4 with Hyper-Threading (mine was 3 GHz) and the ATI 9800XT were the dominant beasts. Other apps may take better advantage of current technology, but I don't see a 3-4x increase.

What is the future? A 64-256 core beast with single-threaded functions still running at 4 GHz? I don't know.
 
[citation][nom]rebb[/nom]I'd rather hold my breath for some sort of breakthrough in thermodynamics allowing us to achieve exponentially higher clock speeds. I'm no physicist, and I certainly don't have the answer, but I want a 10 GHz air-cooled CPU, damnit![/citation]
I agree that I'd rather have a 10 GHz single core than a 2.6 GHz quad core, but unfortunately it won't happen. Intel thought they could get to 5 GHz and beyond back with the P4s, but it never happened. The harder you push these things, the more you run into problems with leakage and huge amounts of heat. That's why crazy stuff like LN2 helps: the lower temperature cuts leakage and carries away heat, so the transistors can switch faster and still stay stable.
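The back-of-the-envelope reason, as I understand it: dynamic power goes roughly as P ≈ α · C · V² · f, and to keep a chip stable at a higher frequency you usually have to raise the voltage as well, so power climbs much faster than the clock does - closer to f³ than to f. That's why doubling the clock on air cooling is such a brutal proposition.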
 
I have a hard time hearing the parallel myth yet again... I used to have a parallel connection to my printer, and that's long been replaced by serial (USB). I had a parallel connection to my hard drive, and that's long since been converted to serial (SATA). You're telling me that for some reason my multi-core SMP processor is going to jump over to a parallel system? I don't get it.
 
Nvidia... yeah, just look at their current new GPUs: thrown-together multi-core processors that suck down energy and barely avoid melting. We need to make GPUs the way ATI does.
 
If processors become parallel, then most parts inside a computer need to be parallel too - except maybe some serial ports. People will still use only one program at a time, but switching between and loading them will be a lot faster. We will still have serial programs, because some tasks just cannot be run in parallel. There's really no need to stick with serial hardware, though; there will always be bottlenecks if we do. It could be possible to make some hardware work both ways, with a switch, so that depending on what you want to do you can pick the most optimal mode.
 
[citation][nom]matt_b[/nom]I totally agree with this statement here. However, if this were to change, and more were trained in how to properly program for parallel computing, then the same could be said about the need to train more on how to properly program for serial/series computing - which is where we are currently in processor design. I think it's more fair to say the insufficiency lies on both sides. On another note, am I the only one finding it amusing that the chief scientist of R&D at Nvidia is stating the CPU consumes too much energy??? Did he forget about the monster they just released, or does he still consider it to be within acceptable power requirements or efficient enough?[/citation][citation][nom]etrom[/nom]Ok, how about turning the billions of working COBOL lines of code running in mainframes of huge companies into parallel computing? You, Mr. Dally, do you accept this challenge? Nvidia is becoming a huge biased company, spreading the lobby of parallel computing to everyone. We are still waiting for something real (and useful) to get impressed.[/citation]
Actually in the context of what he's advocating Nvidia's latest high performance architecture is very power efficient, especially when comparing performance per watt to conventional processors and GPUs. I sure hope you're not basing your criticism about the power efficiency of the Fermi architecture on gaming benchmarks, especially when this article focuses entirely on parallel computing, because well... that would sound a little ignorant on your part.

The people waiting for huge breakthroughs to back up the legitimacy and potential of parallel computing may not be able to appreciate the benefits until it's already upon them. Although Fermi may not be "real" or impressive enough to trigger understanding for some people, it's a genuine step in the right direction, and represents as big of a breakthrough as this area of computing has ever seen.

The link below shows just a few examples of Fermi's performance potential in this area, parallel/GPGPU computing, an area that is still young and open to massive optimizations. Have a look...
http://anandtech.com/show/2977/nvidia-s-geforce-gtx-480-and-gtx-470-6-months-late-was-it-worth-the-wait-/6
 
[citation][nom]matt_b[/nom]On another note, am I the only one finding it amusing that the chief scientist of R&D at Nvidia is stating the CPU consumes too much energy??? Did he forget about the monster they just released, or does he still consider it to be within acceptable power requirements or efficient enough?[/citation]


I was thinking the exact same thing.
 
No comment on parallel vs. serial, but I am surprised at the "programmers can't change fast enough" thread that runs throughout this discussion. Look at the iPhone/iTouch/iPad. Think about how fast thousands (tens of thousands?) of programmers were able to deliver quality products against a new platform. I know you'll argue that parallel programming is much harder than iPhone coding, and it probably is. But if you give the development community a clear, exciting reason to invest in a new direction, they will move amazingly quickly.
 
I think the main drift of what he was saying is that just cramming in multiple CPUs isn't going to work (which is what modern multi-cores are). Instead, a more streamlined approach: break the CPU up into modular components, each of which can have n iterations, and then scale whichever component n times to give you the payoff you need. That's where Fermi and Bulldozer come into the picture, though it's more Bulldozer than Fermi.

And with regard to multi-threading, his example is a little off. What he's trying to get at is that most apps will issue one large operation rather than break it up into modular pieces. For instance, if you wanted to know how many times the words 'and' and 'or' appear in a book within 5 words of each other, the traditional method would be to scan through the text serially until the word 'and' occurs, index that location with respect to the whole book, scan 5 words ahead and behind the index to see if the word 'or' occurs, then return to the index and carry on looking for 'and'... laborious, to say the least. A more efficient way to do this in parallel would be to break the book up into smaller chunks, like chapters, and scan the chunks simultaneously, checking 5 words forward and backward of each indexed word as you go - as in the sketch below.
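Rough sketch of the chunked version (my own toy code and example words, not from any real app; each thread owns a slice of the word list but is allowed to peek up to 5 words past it, so chunk boundaries don't lose matches):

[code]
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

int main() {
    // Stand-in for "the book": a flat list of words.
    std::vector<std::string> words = {"and", "then", "or", "maybe", "and", "so", "on",
                                      "or", "not", "and", "again", "or"};
    const size_t n = words.size();
    const unsigned chunks = std::max(2u, std::thread::hardware_concurrency());
    std::atomic<long> hits(0);

    std::vector<std::thread> pool;
    for (unsigned c = 0; c < chunks; ++c) {
        size_t begin = n * c / chunks, end = n * (c + 1) / chunks;
        pool.emplace_back([&, begin, end] {
            long local = 0;
            for (size_t i = begin; i < end; ++i) {
                if (words[i] != "and") continue;
                size_t lo = i >= 5 ? i - 5 : 0;
                size_t hi = std::min(n, i + 6);           // may read into the next chunk
                for (size_t j = lo; j < hi; ++j)
                    if (j != i && words[j] == "or") { ++local; break; }
            }
            hits += local;                                // one cheap sync per chunk, not per word
        });
    }
    for (auto& t : pool) t.join();
    std::printf("'and' within 5 words of 'or': %ld times\n", hits.load());
    return 0;
}
[/code]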
 
Coles Notes on CPU and GPU
GPUs are good for extremely parallel, extremely heavy processing, but with major branching penalties (very little cache, so divergent branches and misses cost a lot).

CPUs are great at branching but not at raw number-crunching like GPUs.

Solution: do the extreme processing on the GPU and the extreme branching on the CPU, but avoid going back and forth over the PCIe bus (it's slow).
 