-Fran- :
juanrga :
-Fran- :
Well, algorithms scale poorly most of the time, so taking advantage of a full paradigm shift (bazillion cores) will take a lot of years.
In particular, I'm still curious as to how the ASM paradigm will be shifted for the bazillion-core era. You feed the CPUs instructions in a serial manner, right? Am I missing something there?
There are studies about how to run a serial workload over several cores to improve its performance. Even highly serial algorithms have seen speedups of 8x running on 64 cores of a manycore chip. But there are limits to this approach.
Algorithms with large amounts of parallelism are split into threads, each running on a different core.
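To make the "split into threads" part concrete, here is a minimal sketch of my own (names and sizes are arbitrary, not from any real codebase): a reduction where each hardware thread sums its own chunk of an array.

```cpp
// Minimal sketch: a reduction split across hardware threads, each core
// summing its own chunk. Sizes and names here are arbitrary examples.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<double> data(1000000, 1.0);
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(n, 0.0);
    std::vector<std::thread> workers;

    const std::size_t chunk = data.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == n) ? data.size() : begin + chunk;
        // Each thread reduces its own slice independently: no locks, no sharing.
        workers.emplace_back([&, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();

    const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::printf("sum = %.1f\n", total); // 1000000.0
}
```

That kind of split only works when the chunks really are independent; the interesting (and hard) case is the serial stream below.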
If you feed the CPU a single thread, what the CPU does is identify the hidden parallelism in the stream of instructions (the sequence of ASM) and then execute those instructions in parallel to speed things up. The problem is that ordinary CPU logic (superscalar/OoO execution) spends too much power and die area identifying the hidden parallelism in a sequential instruction stream, and it doesn't scale up. Haswell can execute a maximum of eight instructions at once, but the sustained average is about 3 or 4 because the logic is limited to a 192-entry ROB window. To sustain a rate of eight instructions, the ROB would have to grow to 1000-2000 entries, but the area/power penalty is superlinear, and Haswell is already a big/complex core.
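A rough way to picture the "hidden parallelism" the out-of-order logic has to dig out of a serial stream (my own toy example, nothing Haswell-specific):

```cpp
// Toy illustration of instruction-level parallelism (my own example).
// In sum_tree() the four leading additions have no data dependences between
// them, so a wide out-of-order core can issue several of them per cycle.
// In sum_chain() every addition waits for the previous result, so the
// dependence chain caps throughput no matter how wide the core is.
double sum_tree(const double* x) {
    double a = x[0] + x[1];   // independent
    double b = x[2] + x[3];   // independent
    double c = x[4] + x[5];   // independent
    double d = x[6] + x[7];   // independent
    return (a + b) + (c + d); // short final combine
}

double sum_chain(const double* x) {
    double s = x[0];
    s += x[1];                // each add depends on the one before it
    s += x[2];
    s += x[3];
    s += x[4];
    s += x[5];
    s += x[6];
    s += x[7];
    return s;
}
```

Both functions compute the same sum; the hardware has to discover at run time that the first version's additions can overlap, and the window in which it can look for such overlaps is exactly that ROB.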
I can see that and understand why, but you still feed the CPUs (no matter how many there are) in a serial fashion. I want to know if that is a design restriction of CPUs or of OSes. That's why you can do parallel work on serial loads: you get the "illusion" of parallelism. Strictly speaking, a serial algorithm can be made parallel in chunks, but the atomic operations that come out of that will still be fed sequentially to the CPU(s) in assembler/binary. I think I'm missing memory management a bit in my thinking... Ugh, I'll have to go back and read up on memory management again (pointers and registers).
Cheers!
Not sure what you are really asking, but I will try to answer.
Most commercial CPUs are based on the von Neumann paradigm, which is essentially a simple sequential paradigm: one thing is done at a time, as controlled by the program counter. Modern architectures (superscalar, VLIW) use 'tricks' to extract hidden parallelism and speed up the serial stream, but that is all.
There are attempts to develop parallel paradigms, such as the dataflow model.
The dataflow computing model is known to overcome the limitations of the von Neumann paradigm by fully exploiting the parallelism inherent in programs. In the dataflow model, an operation executes as soon as the data for its operands is available, without any pre-determined order. Unlike the von Neumann model, the dataflow model is neither based on memory structures that require inherent state transitions, nor does it depend on a program counter to sequentially execute a program.
However, it has proven difficult to implement in practice:
http://en.wikipedia.org/wiki/Dataflow_architecture
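For a rough feel of the firing rule ("execute when the operands are ready, no program counter"), here is a toy sketch of my own in C++; it only imitates the idea in software and is nothing like a real dataflow machine.

```cpp
// Toy sketch of the dataflow firing rule (my own illustration): a node
// executes as soon as all of its input tokens exist, with no program
// counter imposing an order.
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Node {
    std::vector<std::string> inputs;                       // tokens it waits for
    std::function<double(const std::vector<double>&)> op;  // what it computes
    std::string output;                                    // token it produces
    bool fired;
};

int main() {
    std::map<std::string, double> tokens = {{"a", 2.0}, {"b", 3.0}, {"c", 4.0}};

    // Graph for (a + b) * (b + c). The two additions are independent, so a
    // dataflow machine could fire them in parallel; here we simply fire any
    // node whose operands happen to be ready.
    std::vector<Node> graph = {
        {{"a", "b"},   [](const std::vector<double>& v) { return v[0] + v[1]; }, "s1",   false},
        {{"b", "c"},   [](const std::vector<double>& v) { return v[0] + v[1]; }, "s2",   false},
        {{"s1", "s2"}, [](const std::vector<double>& v) { return v[0] * v[1]; }, "prod", false},
    };

    bool progress = true;
    while (progress) {
        progress = false;
        for (auto& n : graph) {
            if (n.fired) continue;
            std::vector<double> args;
            for (const auto& name : n.inputs) {
                auto it = tokens.find(name);
                if (it == tokens.end()) break;    // operand not ready yet
                args.push_back(it->second);
            }
            if (args.size() == n.inputs.size()) { // all operands available -> fire
                tokens[n.output] = n.op(args);
                n.fired = true;
                progress = true;
            }
        }
    }
    std::printf("prod = %.1f\n", tokens["prod"]); // (2+3)*(3+4) = 35.0
}
```

The hard part in real hardware is exactly what this toy hides: matching tokens to waiting instructions efficiently, which is one of the reasons practical dataflow machines never displaced von Neumann designs.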