Why 4 threads run faster than 2 on my dual core?

tokyotech

Distinguished
Mar 18, 2008
146
0
18,680
I've built a ray tracer in Java. It generates images like this:

computer-ray-tracing.jpg


(this is not mine, it's just an example). I multithreaded it by cutting up the image horizontally and telling each thread to render a chunk of the final image. I did quite a few tests and these are the average times:

1 thread: 35 seconds
2 threads: 21 seconds
4 threads: 20 seconds
8 threads: 21 seconds

I own a Intel Wolfdale @ 3.0 GHz (Dual Core). The improvement when going to 2 threads is obvious, but I can't make sense of why going to 4 threads makes it go even faster. I thought Intel didn't have hyperthreading since Pentium IV, so my 2 cores mean that I can only run 2 threads simultaneously. The dip in 8 threads makes sense because of more context switching.
 
You don't have hyperthreading - that would give a bigger gain than that. I don't know why 4 would be faster, but that isn't a big enough difference to really matter.
 

DXRick

Distinguished
Jun 9, 2006
1,320
0
19,360
It's possible that one thread takes longer to run when you do 2, and total processing time will be based on the longest running thread. When you run 4 threads, the work is better balanced, resulting in the slightly faster run time.
 

tokyotech

Distinguished
Mar 18, 2008
146
0
18,680
I just tested the same program 6 times. I think it's pretty conclusive that 4 threads is faster than 2.

Two threads:
59.203 s
59.468 s
60.468 s

Four threads:
53.204 s
56.109 s
52.422 s
 

Siggy19

Distinguished
Mar 5, 2009
144
1
18,690
It is possible that caching either of the disk or the CPU is providing a slight benefit for 4 threads over 2.

One thing to remember is that even with a single core, the Processor does not run a thread to completion before starting the next one. Instead, it runs part of one thread, then part of the next and so on until everything is done. Therefore, it may be that it happens to be more efficient to do the first part of multiple threads and then do the second part of them than to do parts one and two of the first and then part one and two of the second.
 

radnor

Distinguished
Apr 9, 2008
1,021
0
19,290


The replay/prefetch/cache mechanisms are quite more complex than that. Not to mentions the vectorial protcols.

But anyway, that 101 Informatics post gets my seal of approval !!!!