-Fran-
Glorious
TMTOWTSAC :
A friend of mine likes to quote from some movie I've never heard of, about a bunch of lackluster superheroes. One of them is supposed to have the "strength of 3 men." Not great by superhero standards, but he can still do as much as 3 people put together. He's also supposed to be "as fast as 3 men." Of course the joke here is that 3 men can't run any faster than 1 man; just try it.
What 3 men can do is "more," which isn't exactly the same as "faster." Let's say you're helping a friend move from a downstairs apartment to an upstairs apartment. He's got 30 boxes of stuff to carry, and it takes 10 minutes per box. It would take 1 man 30 trips, for a total of 300 minutes. 3 men could do the job in 10 trips each, so it's finished in 100 minutes of elapsed time, 1/3 the time. Individually, none of them went any "faster," but together they did "more."
In both cases, it's 30 total trips. But with three people, you're parallelizing the workload: they're all making trips simultaneously, and that's great. But it can't always work like that. Let's say he's got 30 boxes' worth of stuff to carry, but only 1 box to put it in. So you load up the box, take it upstairs, unload it, bring it back downstairs, refill it, and so on. Now it doesn't matter how many people you have, you can only make one trip at a time. There's no way to do "more" here; the only way to improve things would be to try to make each trip "faster."
That's because the start of each trip now depends on the completion of the previous trip. You can't start a new trip until the old trip finishes, in this case because you need the box back before you can refill it. But you can imagine that box as being some data, and the loading/unloading as operations being performed on that data. If those operations have to be performed sequentially, parallelizing won't help.
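To make the dependency concrete, here's a minimal sketch in Python (the carry_box function and the short sleep standing in for a 10-minute trip are made up purely for illustration): independent boxes can be handed to a pool of workers, but a single shared box forces a chain where each trip has to wait for the previous one, so extra workers can't help.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def carry_box(box_id):
    time.sleep(0.01)   # pretend this is one trip upstairs
    return box_id

# Independent boxes: 3 "movers" can make trips at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as movers:
    list(movers.map(carry_box, range(30)))
print("3 movers, independent boxes:", time.perf_counter() - start)

# One shared box: each trip can only start after the previous one
# finishes, so the work is a dependency chain no matter how many
# movers are standing around.
start = time.perf_counter()
for trip in range(30):
    carry_box(trip)    # must finish before the next trip can begin
print("one shared box, any number of movers:", time.perf_counter() - start)
```

In the first case the elapsed time is roughly a third of the second, which is the "more, not faster" point of the analogy.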
TL;DR more != faster
What you just described in your analogy is what I know as an "indivisible algorithm." I am *very* sure that a big chunk of the code out there hasn't reached that point yet. If you do reach that point, because of the framework you're using or something like it, then there's nothing else you can do.
In your own example, you recognize that if the box *can be divided*, you can make good use of parallelism (whether it's faster or not is decided by other factors). The same applies to *any* solution design in computing science: why make the box indivisible when you can go with smaller boxes for even *more* people? There is a reason Computer Scientists and Mathematicians make such good money when asked to optimize stuff: they re-design solutions to fit specific hardware; tailor-made software solutions for specific hardware arrays and configurations, if you like. So, imagine you have control over how to create the box and you decide to make it indivisible. Then yes, making those 3 people carry the single box might not be as efficient/easy as making the strongest person carry it.

You can modify your analogy to fit any scenario possible with CPU width and IPC, but what you and gamerk are missing here is that current code has not reached the point where we can say "it can't go wider." Currently, most developers are stuck at "I can't divide this box, because the framework doesn't let me"; enter DX12 to alleviate part of that. Then you have "OK, I divided the box, but now box allocation is creating a huge queue"; enter better schedulers! And so on.
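As a rough sketch of the "smaller boxes" idea (the sum-of-squares work, the chunk size, and the helper names here are arbitrary, chosen only for illustration): if the designer controls how the work is packaged, one big job can be cut into independent pieces and handed to as many workers as the hardware offers.

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # Stand-in for the real per-chunk work.
    return sum(x * x for x in chunk)

def split(items, chunk_size):
    # "Smaller boxes": cut one big job into independent pieces.
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

if __name__ == "__main__":
    items = list(range(1_000_000))
    chunks = split(items, 100_000)           # 10 boxes instead of 1
    with ProcessPoolExecutor() as pool:      # as many "people" as there are cores
        total = sum(pool.map(process_chunk, chunks))
    print(total)
```

The chunk size is a design decision, which is the point: whether the box is divisible is often chosen by the solution design, not forced by the problem.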
Again, I am not saying it is easy, but it is *doable* by your average code monkey with good architectural supervision.
Cheers!