Benefits of multicore

frobot

Honorable
Feb 25, 2012
31
0
10,530
I just recently decided to go with a 6 core processor because I was interested in learning some parallel programming.
But I did some benchmarks and saw things I didn't expect.
I had read that multiple cores only benefit you when you are running something that utilizes more than one core.
But that's not what my tests are showing.
Also if I turn off 5 cores leaving only one, you can see it in the results even with something that only runs on one core.

I also found something else I didn't expect.
I had assumed that if you plot the times of benchmarks, each time increasing the number of threads and the number of iterations, you would see a drop-off in how much each extra thread helps once you exceed the number of threads the processor can run at once.
Here is what I got -
1-4.png

Red represents the time it takes to execute on multiple cores, and blue represents the time it takes on a single core.
The blue increases linearly as expected but the red always seems to increase in a "wave" pattern, and it doesn't spike up after utilizing all 6 cores.
I've even tested using 50+ threads and the times keep increasing at nearly the same rate.
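For comparison, the expectation described above can be written down as an idealized model. This is only a sketch: the function name `ideal_wall_time` is made up for illustration, and it assumes equal CPU-bound threads, perfect time slicing and zero scheduling overhead, none of which a real system guarantees.

```python
def ideal_wall_time(n_threads, cores, seconds_per_thread):
    """Idealized wall time for n equal CPU-bound threads on `cores` cores.

    Assumes perfect time slicing and no scheduler overhead (hypothetical).
    """
    if n_threads <= 0:
        return 0.0
    # Up to `cores` threads run fully in parallel; beyond that they share cores.
    return seconds_per_thread * max(1.0, n_threads / cores)


for n in range(1, 13):
    single = ideal_wall_time(n, 1, 1.0)
    six = ideal_wall_time(n, 6, 1.0)
    print(f"{n:2d} threads: 1 core {single:5.2f} s, 6 cores {six:5.2f} s")
```

Under these assumptions the 6 core line stays flat up to 6 threads and then climbs at one sixth of the single core slope. Measured curves like the "wave" described above differ from this because the assumptions don't hold exactly.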

I did the test again after shutting off 3 cores and here is what I got -
2-4.png

The time increased even on the single core tests.
After some other tests I figured out this wasn't a fluke so now I'm just wondering if this is supposed to happen, since I always hear that your applications won't increase in speed unless they're designed to use multiple cores.
Or am I understanding this wrong?
 

brythespy

Distinguished
Jun 16, 2011
330
0
18,810
Well, that's true, but even if the application doesn't utilize the extra cores, other applications will use them. [I'm not good at explaining.] Say one app is using 80% of a core, leaving only 20% for other apps.

On a multicore system the app can use 100% of its core, because other applications can use the other free cores. So overall performance will be much better than on a single-core system, regardless of whether the app utilizes multiple cores.
 
The operating system performs a task known as "scheduling" in which it tries to find the optimal method of distributing workloads across available resources. As you can imagine, scheduling has overhead itself.

If the stars all line up properly, the overhead will be fairly low; if they don't, it will be fairly high. Generally speaking, the greater the number of logical threads, the greater the overhead on the scheduler, regardless of how many threads the hardware can physically run at once. However, there are exceptions, and this is why your 6 core test has a slight "bump" at 6 cores and a decrease for a bit after that.

When looking at results such as these, one has to consider certain aspects of the system in question. Everything from the pipeline architecture to branch prediction, cache architecture, ALU architecture, etc. plays a part. Generally speaking, each additional "core" gives diminishing returns in reality, when in theory it should give a 100% increase. The reasons for this are many, but one of the largest is that it's very difficult to parallelize the scheduler itself. So, as we add more and more cores, we can handle increasingly large workloads, so long as those workloads are suitably orthogonal to each other.
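One standard way to put numbers on diminishing returns is Amdahl's law. It is not the argument made above (which is about the scheduler and the cache), just a common textbook model: if only part of the work can run in parallel, every extra core helps less than the one before. The 90% parallel fraction below is a made-up figure for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 90% of the work can run in parallel.
previous = 1.0
for cores in range(1, 7):
    speedup = amdahl_speedup(0.90, cores)
    gain = speedup - previous
    print(f"{cores} cores: {speedup:.2f}x (gain from last core: {gain:+.2f})")
    previous = speedup
```

The gain column shrinks with every core added, even though the model ignores scheduling and cache effects entirely.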

Figuring out exactly why is dependent on your 6 core system's setup. For example, if the L2 cache is shared between every two processors then it's nigh impossible to have a 100% gain from adding that second processor because the cache operations overlap. However, enabling a third processor which does not share its L2 cache with the first two processors will increase highly orthogonal performance more so than adding the second processor did.

Hope this helps a bit
 
... multiple cores only benefit you when you are running something that utilizes more than one core

Welllll ... it's probably a bit more complicated than such a general assumption -- especially if the assumption is the foundation of your overall understanding.

First off -- we don't know what hex-core you are using. That info will provide for more detailed answers and stuff. There are big differences between a Thuban and a Bulldozer -- even more diverse with Intel.

Modern processors do not execute code strictly sequentially. It's a dance between elements of the processor itself, its operating environment, the program it is executing, etc., through scheduling, snooping, load balancing, etc. And a lot of waiting and misses.

And, for some reason, there are instances where three specific cores simply fail miserably in execution - most likely with scheduling issues - and you may have an example of it here. It's a 'doink' that otherwise seldom arises in everyday execution.

As far as the above quote, Windows generally balances load across cores (this is not parallel threading!) so scheduling more so than anything determines which core is hit, BUT ....

You may set the core affinity for a program. I have a few programs that gain 12-13% being locked to a single core.

The point being that with multiple cores you may designate a priority core for execution. Your program is at the front of the line for that core. That frees up remaining cores to handle other programs and background services. Look at it another way: it's just simply a more efficient way to utilize resources.
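If you want to try affinity from code rather than from Task Manager, here is a minimal sketch. It assumes a platform where Python exposes `os.sched_setaffinity` (Linux does, Windows does not), and the helper name `pin_to_one_core` is made up for this example.

```python
import os


def pin_to_one_core():
    """Pin the current process to one allowed core; return that core, or None."""
    # os.sched_setaffinity only exists on some platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    core = min(allowed)                # pick the lowest core we may run on
    os.sched_setaffinity(0, {core})
    return core


core = pin_to_one_core()
if core is None:
    print("No affinity API here; on Windows use Task Manager's 'Set affinity'.")
else:
    print(f"Pinned to core {core}; allowed cores: {os.sched_getaffinity(0)}")
```

Whether pinning actually gains anything depends on the program and the system, so benchmark before and after.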

BUT :lol: don't let your thread be dragged off into a discussion of load balancing, scheduling, core affinity, Bulldozer cache (and modules) ...

What you want to focus on is application symmetric multiprocessing (SMP), where individual worker threads in a program run in parallel across multiple cores. Video conversion, content creation, 3D modeling, data analysis are the big winners, here. Gaming? Not so much.

And to further not answer your questions, the overwhelming majority of programs have no need or reason to run multiple threads. That would just make your system wait longer for something to do.

but wait, THERE'S MORE!

Execution in parallel across CPU cores ain't dead, but the writing is on the wall. Many parallel operations are more easily and efficiently performed across GPU cores. How long it takes to fully get there is anyone's guess.

Next shot up: nVidia. They have a lot riding on it, and early indications are not good so far with GTX680 compute.



 

frobot

This was just more complex than I thought but I'm starting to understand how this happens.
Right now I'm running about 700 total threads, and that is why I see a speed boost where I didn't expect to.
You have to consider all the threads running instead of just the ones from a specific application.
 
99% of those threads run once and then sit around waiting for something to do. They are not completely independent of one another, but for the most part they are operationally independent outside of a specific process, and most likely independent of any other program.

Think of any process on your computer as a room, and a thread as a person in that room. Your program is a set of instructions for the person in the room to carry out.

A thread lives in a process, and executes the instructions of the program. The overwhelming majority of instructions just run once -- no real reason to have them bouncing around a bunch of different cores, BUT

... there are certain programs that can execute instructions/threads in parallel putting a lot of those folks in motion together. This is primarily in video conversion, content creation, 3D modeling, data analysis, lots of scientific stuff, etc.

But the other 99% of your people in the other rooms are still standing around waiting for instructions.
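To see how cheap those waiting "people" are, here is a small sketch that starts a batch of threads which do nothing but wait, then checks how much CPU time the process used in the meantime. The function name and the numbers (50 threads, 0.2 seconds) are arbitrary choices for illustration.

```python
import threading
import time


def idle_threads_cpu_cost(n_threads=50, wait_seconds=0.2):
    """Start threads that block on an Event; return (alive_count, cpu_seconds)."""
    wake = threading.Event()
    threads = [threading.Thread(target=wake.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()

    cpu_before = time.process_time()   # CPU time used by the whole process
    time.sleep(wait_seconds)           # everyone is just standing around
    cpu_used = time.process_time() - cpu_before
    alive = sum(t.is_alive() for t in threads)

    wake.set()                         # hand out the instruction so they finish
    for t in threads:
        t.join()
    return alive, cpu_used


alive, cpu_used = idle_threads_cpu_cost()
print(f"{alive} threads were waiting and used {cpu_used:.4f} s of CPU time")
```

All the threads stay alive while they wait, but the CPU time used should be close to zero, which is why a large total thread count by itself does not load the cores.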

 
Remember the OS. It may be lightweight, but it's there at all times.

Windows is essentially a priority based OS; at any point in time, the thread with the highest priority is run. [How that priority gets determined is a separate topic.] As a result, the OS will constantly go to the CPU and kick your application out to do whatever it has to get done.

With 2+ cores, you don't care. Even if your app is single-threaded, the OS has a second core to work with, so your app can keep running. The OS doesn't have to kick you out, because it has another core it can use.

With one core though, every time the OS needs to get something done, your application gets kicked out. That adds to your app's processing time significantly, as the OS will be taking a lot of time away from your application.

A VERY simplistic explanation, but should get the point across.
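The single core versus multi core argument above can be put as rough arithmetic. This is a sketch under loud assumptions: the app is single-threaded, the OS and background work always win the core when they want it, and that work needs less than one core in total. The function name `wall_time` and the 20% figure are made up for illustration.

```python
def wall_time(app_cpu_seconds, os_load, cores):
    """Rough wall time for a single-threaded app competing with OS work.

    os_load is the fraction of one core the OS and background tasks need
    (a hypothetical number, not a measurement).
    """
    if not 0.0 <= os_load < 1.0:
        raise ValueError("os_load must be in [0, 1)")
    if cores >= 2:
        # The OS work fits on a spare core, so the app keeps its core.
        return float(app_cpu_seconds)
    # One core: the app only gets what the OS leaves over.
    return app_cpu_seconds / (1.0 - os_load)


for cores in (1, 2, 6):
    t = wall_time(10.0, 0.20, cores)
    print(f"{cores} core(s): app needing 10 s of CPU finishes in {t:.1f} s")
```

In this model going from 1 core to 2 helps a single-threaded app, while going from 2 to 6 changes nothing for it.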