What's happening to CPU prices?!

Yeah, since they can be made much smaller than transistors today, and scientists have even gotten electrons to flow through the tubes to perform simple tasks. But the tubes are still not stable enough to be used on a large scale and tend to break down. Also, the process to create them is still a bit on the pricey side, so it'll be a while.
 
I don't know much about this stuff. I didn't know it was conductive; do you know how conductive it is? As to breaking down, I heard the half-life on this stuff is something like "infinity", very strong and stable. I read that in Japan they want to use this material to build a pyramid that would hold multiple skyscrapers, something like 700,000 people. Truly a breakthrough product for the future. Just imagine what this stuff could do for airplanes, or for rockets getting material into outer space, where weight is everything.

If I glanced at a spilled box of toothpicks on the floor, could I tell you how many are in the pile? Not a chance. But then again, I don't have to buy my underwear at Kmart.
 
Thanks for your thoughts trooper11!

I only have one small remark:
It won't speed things up, just allow them to have more windows open at once, but most people don't do 10 things at once.
This is a common misconception. There can be multiple execution threads even for one 'window', one application. Every set of independent operations can be executed in parallel.

It can be applied to games as well. Things I can immediately identify as independent to execute are (a minimal sketch follows the list):

- Game logic
- Artificial intelligence (per entity)
- Physics calculations (per entity)
- Sound processing
- Visibility culling
- Graphics processing (transform, lighting, rasterization)
- Human interface
- Etc...
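To make that concrete, here's a minimal C++ sketch (my own illustration, not how any real engine is structured; the update functions are hypothetical placeholders): each independent subsystem is launched as its own task and joined before the frame is finished.

```cpp
#include <future>
#include <iostream>

// Hypothetical per-frame subsystem updates; placeholders, not a real engine API.
void updateAI()      { /* per-entity AI decisions would go here */ }
void updatePhysics() { /* per-entity physics would go here */ }
void mixSound()      { /* sound processing would go here */ }

int main()
{
    // Launch the independent subsystems as parallel tasks for one frame.
    auto ai      = std::async(std::launch::async, updateAI);
    auto physics = std::async(std::launch::async, updatePhysics);
    auto sound   = std::async(std::launch::async, mixSound);

    // The main thread can run the game logic in the meantime...

    // ...then wait for every task before the frame is presented.
    ai.get();
    physics.get();
    sound.get();
    std::cout << "frame done" << std::endl;
}
```

Swap std::async for raw threads or a thread pool and the idea stays the same.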

Back in the days of Quake I, the processor did all these tasks sequentially. Nowadays, sound processing is done on the sound card by an EAX processor, and graphics processing is handled entirely by the GPU. Yes, that's exactly the same as having multiple processors! The only difference is that these processors are specialized for a specific task. The GPU is even an extreme example of multi-threading itself, since every possible operation is done in parallel. The newest graphics cards can process 16 pixels in parallel, while at the same time transforming and lighting several vertices, performing clipping, rasterization, etc.

But still, artificial intelligence and physics take a lot of processing power and are executed sequentially on single-processor systems. With a bit of effort from game developers, it could all be split into separate threads. On a dual-processor system all these threads could theoretically be executed two times faster. There's some overhead for 'synchronizing' operations and sharing data, but there would be a clear overall speedup! With three cores, like the next Xbox will have, even more things can be executed in parallel.

And I've only touched the tip of the iceberg. The game logic probably does things that can be computed independently as well. Even something as simple as a 4x4 matrix calculation could actually be split into 16 separate threads. That would be extreme, but it clearly illustrates that even small algorithms can be parallelized. The same can be done with every loop that has independent iterations. Every iteration can be a separate thread, and even within the iterations things can be independent. In fact, superscalar processors already do this to a very limited extent, because they can look 40 micro-instructions ahead for independent operations.
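As a small illustration of the loop case (a sketch under my own assumptions, nothing from a real codebase): a loop whose iterations are independent can simply have its index range divided between threads.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Each iteration only touches its own element, so any subrange [begin, end)
// can be processed by any thread without locks.
void scale(std::vector<float>& v, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        v[i] *= 2.0f;
}

int main()
{
    std::vector<float> data(1000000, 1.0f);
    std::size_t mid = data.size() / 2;

    // A second thread takes the first half of the iterations...
    std::thread t(scale, std::ref(data), std::size_t(0), mid);
    // ...while the main thread processes the second half.
    scale(data, mid, data.size());
    t.join();  // synchronize before the results are used
}
```

Sixteen threads for a 4x4 matrix would be the extreme end of exactly the same idea.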

There's another easy way to conceptualize threading. Every professional application has several thousand functions (including the external libraries it uses). If every function were a thread and we had a lot of processors, we could just run all independent functions in parallel. I remember my computer architecture professor saying that most applications have around 20 independent functions on average at any given time, often more.

...

Did I say "small remark"? :wink:
 
Oh I know, of course almost any app can in theory run in a multi-threaded environment if it performs more than one task at a time, but my point still stands: do mainstream users need it? Will they need it in 10 years? Maybe, but I doubt it. Of course in the very high end there will be a need earlier, but that is not the largest segment.

Like, when will there ever be a need for multi-threading in things like Word or Internet Explorer, lol. Of course you could use it, but no one NEEDS it.
 
Yeah, sure, that's ten years off. Multi-threading won't save the CPU either; it'll just delay the inevitable, that is, the physical parts hitting their limit. It's possible there could be yet another single-core design using another process.

But even today, most people that buy PCs, from Dell or HP say, don't even need a 3 GHz P4; most get by with 2 GHz just fine. I'm just saying progress gets dragged down by the economics of it.
 
Sure, we could have diamond chips, with multiple cores. We will have something. Speculation at this point is looking at multi-core, but who knows? It's the journey, not the destination, that I like.
 
Yeah, I have to fully agree; it has always been that way.

The only thing that changed is that for a while the top CPUs were being released cheaper than they used to be: a P4 3.4 for $500 compared to a PII 450 for $1000. But stepping down one notch from the top gave huge savings then, just like now. Looking at the Pentium 4 C chips, the best Intel deals have stayed at almost the same price, but you keep getting a higher-clocked chip: the 2.4C was $175, the 2.6C dropped to about that, the 2.8C dropped to about that, and the 3.0 will soon drop to that. The place where it stops is near the top of what platforms can use, as those chips keep their value for a while after being discontinued.

Intel, in a desperate attempt to match the A64 in games, put out the EEs and in effect raised its top consumer-level chip back up in price from the $500 range to the $1000 range it was in back in the Pentium II days. Ugh, too rich and stupid for me.

Anyway, as you know, 2-4 steps down from the top chips have always saved a ton of money and been the better buys, which helps us do-it-yourselfers put our money to better use than buying a Dell or OEM system. Buying the 3.4s now just doesn't really make sense. 2.8-3.2 is where it's at, both for A64 and P4.




ABIT IS7, P4 2.6C, 512MB Corsair TwinX PC3200LL, Radeon 9800 Pro, Santa Cruz, TruePower 430watt
 
Like, when will there ever be a need for multi-threading in things like Word or Internet Explorer, lol. Of course you could use it, but no one NEEDS it.
I need it. For faster code compilation. For faster multimedia encoding. For faster scientific simulation. (Others need it. For faster games.)

I agree many tasks don't need that much processing power, but some tasks can never be fast enough. Code will always get bigger, multimedia formats will always get bigger, and scientific simulations will always get bigger. The faster processors get, the more demanding the next generation of applications becomes.

Besides, this is an argument for multi-core. Unoptimized applications like Word and Internet Explorer could just run a single thread on one core. Applications that do need the extra performance can be optimized to take advantage of the extra cores.

Going from Northwood to Prescott, the number of core transistors nearly tripled, while performance remained the same. That really makes you wonder why they need all these extra transistors for applications that don't even make use of them, doesn't it?
 
>Going from Northwood to Prescott, the number of core
>transistors nearly tripled, while performance remained the
>same. That really makes you wonder why they need all these
>extra transistors for applications that don't even make use
>of them, doesn't it?

No it doesn't... it makes one wonder how badly they screwed up Prescott, or how many unused transistors are in there. NetBurst is a bust and Prescott a trainwreck; don't generalize too much from that.

For comparison, an Athlon FX uses only 105M transistors, and running a native (64-bit) OS and apps it will perform roughly 50 to 100% faster than a Banias with 77M transistors, and still significantly faster than a Dothan with 120+M(?) transistors.

Or as another example, S939 Athlon 64s will have a significantly smaller transistor count than either the current ones or even Banias, while offering better to much better performance.

Really, there are still other and sometimes better ways to extract performance besides multi-core. Multi-core doubles not only (core) transistor count, but (core) die size and power consumption as well. That really isn't always a good trade-off, especially if it often brings zero performance increase (single-threaded performance).

Now don't get me wrong, I see the trend, and I look forward to multi-core just as well, but in addition to increased single-threaded performance, not as a replacement for it. If I believed in the wider-only approach, I'd put my money on Niagara (8-way multi-core, but highly primitive "slow" cores), which I definitely don't for anything but specialized apps or server workloads (and even there I wonder); I surely wouldn't want that near my desktop; might as well run a Beowulf cluster of 386s :)

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>But did you find that date yet in the table that showed a
>worse scenario than today?

How hard can it be? Oh well, here you go:
Q2 1997
Pentium II 233: $637
Pentium II 266: $775
Pentium II 300: $1981

I'm sure that proved the end of innovation and the semi industry hitting a brick wall :)

> I think it's a bit naïve to assume that if a CPU
>technology has continued to grow steadily for the past
>fifteen years, it will grow forever.

I would use "shrink" instead of "grow" in this context 😛
Anyway, it's not 15 years, it's roughly half a century. And during all this time, each year the end was near according to some. You did read that Amdahl quote of mine, didn't you? Well, since I am no Jehovah's Witness, I'll just see it when it happens. Not any time soon though.

>Name one application that would never run faster on
>multiple processors.

Duke Nukem Forever?
Leisure Suit Larry II?
Heck, even "Quake 3 -r_smp 1"
:)

Now can you name me one application that will never benefit from faster clocks or higher single-threaded IPC?

> AMD's less radical approach shows that this is not the
>only way to increase performance. IPC matters, and soon
>they'll realize TLP matters as well.

"Soon"? LOL, K8 was designed from the ground up to support multi-core, and I learned of their 2-way core plans over 6 years ago. The P4 was supposed to go multi-core too (and may still, for Xeon), as will IPF obviously, but the 2-way CMT Dothans are a contingency plan that started no more than a year ago, and I assure you the venerable Pentium Pro core did not have provisions for it.

> I'm currently working on a camera surveillance system
>where several cameras are shown on one monitor, using
>motion detection to pick the most interesting ones to
>display. It's really 'natural' to use a thread per network
>connection

Of course it is. Now consider how you'd speed up the motion detection algorithm of just ONE camera using multi-threading... Tell me it is as easy as implementing MMX or SSE.

> Katmai has 9.5 million transistors, Prescott has 125
>million (75 excluding the cache)

Katmai had 0 KB of on-die level 2 cache. You should count at least the off-die cache chips as well, unless you really think even a dozen Celeron 300 (not A) cores bundled in one chip would be faster at ANYTHING than a 3+ GHz Prescott.

To be even more fair, compare a 0.13 µm 512 KB P3 Tualatin to a 0.13 µm 512 KB Northwood: 44M transistors for Tualatin, 55M for Northwood (according to sandpile). How do you think a 1.4 GHz Tualatin fares against a 3 GHz Northwood? Both use the same process, same cache size, comparable transistor count.

>And these phantom transistors are going to do what, double
>performance?

My guess is those phantom transistors are there for some form of DMT, which might indeed double performance in some cases (if it worked, and didn't send power consumption through the roof). Just like dual core might (nearly) double it in some. But like I said, try and ignore Prescott; it's a wreck as it is, we know that.

>40% of Prescott transistors (cache) is on 30% of its die
>space. This tells me cache fits on 75% of logic space. Well
>if that's your definition of "MUCH more dense" then I can
>still fit in nearly a dozen Katmai cores.

Prescott uses 23 mm² for its 1 MB L2 out of 109 mm², so 21%. If that indeed constitutes 40% of its transistor count (I don't have a reference here, got a link?), it means it's ~2.5x as dense, MUCH denser indeed. I think you got your math wrong here:

Density = transistor count / mm²
Prescott is 125M transistors.
Cache density = 40% × 125M / 23 mm² ≈ 2.2M/mm²
Core density = 60% × 125M / 86 mm² ≈ 0.9M/mm²
Divide the exact numbers and the ratio is nearly spot-on 2.5.

>Are you still sure that current single-processor technology
>is not in trouble?

Yes, I just think Intel is in temporary trouble with their extreme speed-racer design. The writing has been on the wall for a long time though.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
And if The Inquirer is correct, then Intel is also looking at other architectures: Intel's Potomac team gets dissolved

Are you still sure that current single-processor technology is not in trouble?
You can't be too careful when quoting the Inquirer. :evil:
<A HREF="http://www.theinquirer.net/?article=15780" target="_new">Intel Denies Potomac technology canned</A>

He that but looketh on a plate of ham and eggs to lust after it, hath already committed breakfast with it in his heart. -C.S. Lewis
 
And you missed my point again.

You are not a mainstream average Joe user that buys from Dell, are you? No, you're not. I'd say at least 70% of sales go to average Joe users just to get on the internet, listen to some music, type, maybe watch videos and maybe game a little. Now you tell me why they need multi-core now? See, this is where the industry tries to force users to upgrade by making them think they need something they don't.

Yes, maybe you need it; I said that already, in the high-end segment it would be needed, but not where I'm talking about. A person could be happy with a P4 3 GHz or Athlon 64 3000+ for many years and never have problems. For all they need to do it's more than enough power. Besides that, even if they needed something more, there are other components they would upgrade first before the CPU, that's for sure.
 
How hard can it be? Oh well, here you go:
Q2 1997
Pentium II 233: $637
Pentium II 266: $775
Pentium II 300: $1981
Nice try, but those first two are not even close to affordable. And it's not because processors were more expensive overall at that time: just look at the prices for the Pentium I at the same date. Half a year later they did become affordable. Or look one year later and again you see a nice price ramp, where the affordable CPUs are nowhere near the performance of the fastest. So still, I don't see anything close to today's situation.
I would use "shrink" instead of "grow" in this context 😛
Anyway, it's not 15 years, it's roughly half a century. And during all this time, each year the end was near according to some. You did read that Amdahl quote of mine, didn't you? Well, since I am no Jehovah's Witness, I'll just see it when it happens. Not any time soon though.
Well, I don't want to make any predictions either. I just observe the strange prices, and the physical limits they're hitting with current technology. There are several ways to circumvent these limits, and one is multi-core.
Now can you name me one application that will never benefit from faster clocks or higher single-threaded IPC?
None, of course. The real question is: Will applications benefit more from multi-core processors than from the clock increase / IPC increase made possible by doubling the transistor count? Looking at Willamette, it's clear that on the same process there was almost no performance increase from using longer pipelines and lower IPC. Most performance increase was from increased memory bandwidth. And looking at Prescott, now it's even getting hard to increase clock frequency on newer processes. So unless some physical miracle happens that will enable Prescott to reach the promised 6 GHz in a couple of months, what way would you go?
Of course it is. Now consider how you'd speed up the motion detection algorithm of just ONE camera using multi threading... Tell me it is as easy as implementing MMX or SSE.
I'll tell you: it is. Split the image in two and process them separately to detect motion. It's so simple I could implement it in ten minutes. And yes, this is a perfect example of a pure 2x speedup on a dual-core processor. Oh and it doesn't have to stop with two threads and two cores...
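Something like this minimal sketch, under my own assumptions (grayscale frames in flat byte arrays; the names and types are made up, the real system's would differ):

```cpp
#include <cstdlib>
#include <thread>
#include <vector>

// Sum of absolute differences between two frames over the rows [y0, y1).
// Each call reads a disjoint slice of the buffers, so no locking is needed.
long sad(const std::vector<unsigned char>& prev,
         const std::vector<unsigned char>& curr, int width, int y0, int y1)
{
    long sum = 0;
    for (int i = y0 * width; i < y1 * width; ++i)
        sum += std::abs(int(curr[i]) - int(prev[i]));
    return sum;
}

int main()
{
    const int width = 640, height = 480;
    std::vector<unsigned char> prev(width * height), curr(width * height);

    long top = 0;
    // One thread measures motion in the top half of the frame...
    std::thread t([&] { top = sad(prev, curr, width, 0, height / 2); });
    // ...while this thread handles the bottom half.
    long bottom = sad(prev, curr, width, height / 2, height);
    t.join();

    long motion = top + bottom;  // identical result to the sequential loop
    (void)motion;
}
```

Note that the whole frame still gets processed; only the work is divided, so the result is identical to the sequential version.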

You're right about Prescott cache density though. I must have pulled some numbers together too quickly. I know it's 40% of the transistors, since 1 MB at 6 transistors per bit is 50 million.

Anyway, thanks for the instructive discussion. I hope you or anyone else learned something as well. I'd like to end it here because we could go on forever without leading anywhere. So let's just agree to agree on what we agree and agree to disagree on what we disagree. Feel free to answer all my unanswered questions though. :wink:
 
You are not a mainstream average Joe user that buys from Dell, are you?
Actually I have a Dell laptop. :wink:

The most important thing is that the performance is there when it is needed. What I mean is that simple things like surfing the internet don't take much processor power at all, but that doesn't mean everybody is content with a 200 MHz Pentium. We have more powerful systems because some tasks can never be fast enough. Even my mother, a complete computer newbie, complains about the reaction time of some applications. To have a very responsive system, it has to be fast.

Besides, it would be a little simplistic to assume that 70% of computer users don't do anything that uses lots of processing power. They don't buy these relatively expensive machines solely to surf the net for an hour on the weekend. Every amateur has demanding software, and they all want higher performance at a lower price. I remember when everybody said 1 GHz was more than anyone would ever need...
 
So you're saying everyone needs more than a 3 GHz PC now? How is that? Sure, there is always better, but you still don't get my point.

People need whatever these companies tell them they need. You're actually saying that most average users need more than a 3 GHz CPU? Why not 2 GB of RAM too? And, let's see, 1 TB of HD space? lol. Sure it's nice to have, but what do people really need? I'd like to have it, but my parents certainly don't. Yeah, they will complain when something hangs, but you know as well as I do that it's not the CPU's fault most of the time for hang-ups like that. Unless they are editing video or something, I could see that being a factor, but seriously, I know a lot of novice users and believe me, they are happy with their PCs. My aunt and uncle have a 2 GHz Intel machine and they couldn't be happier; they edit photos from a digital camera, burn CDs, use the internet. My mom and dad use even lower-powered machines and they do things like CD burning, video transfer from VHS to DVD, and internet use. Of course it's possible to speed those things up, but they are happy with it, and they won't pay for a new system because it cuts 30 seconds off an encode or writes CDs faster.

You just want it for you, lol. That's cool, so do I. But I couldn't recommend my parents invest in multi-core CPUs just because they could encode DVDs faster, come on.
 
It's not so much what people need as what they want. Faster computers will give more options.
For some the hook will be video+VOIP, for others TV over IP, and movie rentals over IP. Some want a computer to run household functions. Have we even begun to scratch the surface?
 
>Nice try, but those first two are not even close to
>affordable.

Neither are the P4 3.4EE or the 3.2EE, and the 3.4E is $475. I think the situation is rather similar; if anything, the price premium for the "ultra fasts" was even bigger back then, and overall prices have dropped. You call $637 'not even close to affordable', but it was a perfectly normal price for a high-end CPU back then. Sub-$100 CPUs just didn't exist, AFAIR.

>The real question is: Will applications benefit more from
>multi-core processors than from the clock increase / IPC
>increase made possible by doubling the transistor count?

Some will, others won't. Simple, really.

>Looking at Willamette, it's clear that on the same process
>there was almost no performance increase from using longer
>pipelines and lower IPC.

Who said longer pipelines à la NetBurst were the only way?

>Looking at Willamette, it's clear that on the same process
>there was almost no performance increase from using longer
>pipelines and lower IPC.

Even for Willamette, you'd have to be fair and admit that a 2 GHz Willamette (even 2.2s have been made) was considerably faster than a 1 GHz P3.

>Most performance increase was from increased memory
>bandwidth

What makes you so sure a P3 would have benefited as much from that much bandwidth? Surely you remember that similarly clocked (and similarly performing) Athlons gained close to nothing from faster FSBs and DDR RAM; it wasn't until ~1.4 GHz that it really started to matter.

>And looking at Prescott, now it's even getting hard to
>increase clock frequency on newer processes

You keep looking at Prescott, and I keep telling you to ignore it. You can't take one screwed-up design and derive a trend from it. You think a 90nm Northwood would have had the same issues? I think not.

>So unless some physical miracle happens that will enable
>Prescott to reach the promised 6 GHz in a couple of months,
>what way would you go?

No one promised 6 GHz Prescotts, but since you're asking: I'd have scrapped Prescott a long time ago and shrunk Northwood instead. Increase the cache to 1 MB and I think you'd have a nice CPU that should easily reach 4+ GHz without being excessively hot. Either that, or license the K8 design :)

>I'll tell you: it is. Split the image in two and process
>them separately to detect motion.

LOL! You're not multithreading your algorithm (which is what I asked), you're splitting up the workload! It doesn't work that way either: if you can detect motion accurately enough using just half the image, why bother processing the entire image in the first place? I could just as well claim I could speed up the algorithm by a factor of 2 using just one core: just decrease the resolution (or rather the number of pixels to be processed) by a factor of 2, there, done. See? Easy as pie. Single core is twice as fast now. Thing is, your "multithreaded" version won't behave the same as the original one; tracking motion from one part of the image to the other won't work, for instance. Try again, and honestly, tell me it would be easy to multithread the *algorithm*.

>So let's just agree to agree on what we agree and agree to
>disagree on what we disagree. Feel free to answer all my
>unanswered questions though.

Same here 😉

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
LOL! He who laughs last...
You're not multithreading your algorithm (which is what I asked), you're splitting up the workload!
I am multi-threading it. And yes I'm splitting up the workload over multiple threads (processed by multiple cores) to finish the job quicker. That is what you asked for.
It doesn't work that way either: if you can detect motion accurately enough using just half the image, why bother processing the entire image in the first place?
Like I just said, I would still process the whole image. But in less time.
I could just as well claim I could speed up the algorithm by a factor of 2 using just one core: just decrease the resolution (or rather the number of pixels to be processed) by a factor of 2, there, done. See? Easy as pie. Single core is twice as fast now.
Cheating like that still works on a multi-core processor as well.
Thing is, your "multithreaded" version won't behave the same as the original one; tracking motion from one part of the image to the other won't work, for instance.
And what makes you think that? Tracking motion happens per frame. And my multi-threaded version still processes frame by frame.
Try again, and honestly, tell me it would be easy to multithread the *algorithm*.
It's as easy as 1 + 1 = 2. Honestly.

Look, all the required operations are done in loops, loops where each iteration is independent. Therefore, it's easy to let one thread process the first half of the loop and let a second thread process the second half. Only when both threads are done are the results combined, and we can continue processing the next loop in parallel. There's actually a resemblance with SIMD here: instead of processing sequential data in parallel, I would be processing data from totally different locations in parallel. And each thread can still use SIMD as well. The possibilities are really endless...
 
Okay, without more precise information on the algorithm and its bottlenecks and dependencies, I can't usefully comment. From what you posted, I understood you'd actually have one thread process half an image and search for motion there, and another process the other half. As if you'd be detecting motion independently on two separate cameras, which obviously isn't the same thing. If you don't see why, extrapolate to the extreme, and you'd have a thread per pixel, and you really can't determine motion with a 1-pixel image. But I may have misunderstood.

But consider game engines. One can say it should be possible to create threads for AI, physics, rendering. There shouldn't be too many dependencies there; I agree that ought to be doable, even though it will create overhead, meaning the same game will most likely run slower on single-threaded CPUs. If the threads can independently process significant amounts of data, the overhead won't be that big. But if they can, it also means it's likely that at some point the AI will consume a lot of CPU power, and the next moment it's the physics. Which means that with a dual-core chip, very often one thread (therefore core) will not be doing a whole lot while the other is working at full speed, so you're not getting anywhere near a 100% speedup. As the workloads per thread that need no synchronizing and data exchange between the threads decrease, the overhead grows, meaning overall performance will drop, especially on single-core systems (still by far the majority for the foreseeable future). See the problem?

Now, a "better" solution would be if you could actually multithread the physics or AI engines themselves, but good luck doing that. The physics engine would be nothing but dependencies, and I assume the same would apply to the AI. For the rendering it should be quite feasible to chop the workload into smaller independent chunks, but most of that is already done by the GPU (and in parallel) anyway, so not much to be gained there. Same for audio processing, which is usually also offloaded to a dedicated audio chip. So if you're seeing a <10% speedup with Quake 3 running on an SMP machine, I can't say I'm really surprised. If anyone ever achieves >30-40% using SMP with a real game (using a null renderer or something to take the GPU out of the equation), I would be quite impressed and surprised.


= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
Okay, without more precise information on the algorithm and its bottlenecks and dependencies, I can't usefully comment.
That's a much wiser answer than saying it isn't possible to use multi-threading here. Thank you.
As if you'd be detecting motion independently on two separate cameras, which obviously isn't the same thing. If you don't see why, extrapolate to the extreme, and you'd have a thread per pixel, and you really can't determine motion with a 1-pixel image.
The algorithm consists of several parts. One just determines the amount of motion: it simply subtracts the new frame from the previous one and adds up all the (absolute) differences. Theoretically, this could use a 'processor' per pixel (I wouldn't be surprised if hardware motion detection had per-pixel logic for this). The next step is to determine the centroid of the movement. This is done by multiplying every difference by the pixel's position, again an extremely parallelizable operation. Then a rectangle is constructed which contains most of the motion. This is done by adding up the motion per line (horizontally and vertically) to locate the regions with ~80% of the movement (again horizontally and vertically). The intersection of these regions is the rectangle where movement is concentrated. Once more, this is a lot of loops with repetitive operations. One thread per line would theoretically be possible.
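For what it's worth, here's a rough sketch of that projection step (simplified, with made-up types; a parallel version would give each thread a band of rows plus a private column array to merge afterwards, just like the partial sums above):

```cpp
#include <cstdlib>
#include <vector>

// Add up the per-pixel frame differences along every row and every column.
// The row and column sums are later scanned to find the regions holding
// ~80% of the motion.
void projections(const std::vector<unsigned char>& prev,
                 const std::vector<unsigned char>& curr,
                 int width, int height,
                 std::vector<long>& rowSum, std::vector<long>& colSum)
{
    rowSum.assign(height, 0);
    colSum.assign(width, 0);
    for (int y = 0; y < height; ++y)      // row bands could be dealt out to threads
        for (int x = 0; x < width; ++x)
        {
            int d = std::abs(int(curr[y * width + x]) - int(prev[y * width + x]));
            rowSum[y] += d;   // per-row motion
            colSum[x] += d;   // per-column motion (per-thread copy when parallel)
        }
}

int main()
{
    std::vector<unsigned char> a(640 * 480), b(640 * 480);
    std::vector<long> rows, cols;
    projections(a, b, 640, 480, rows, cols);
}
```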

Anyway, I hope you're convinced that two processors would make these calculations really close to two times faster with relatively simple multi-threading. I hope it also sounds a bit more believable now that nearly any algorithm can be multi-threaded efficiently.
As the workloads per thread that need no synchronizing and data exchange between the threads decrease, the overhead grows, meaning overall performance will drop, especially on single-core systems (still by far the majority for the foreseeable future). See the problem?
There's not much to worry about. First of all, threads, contrary to processes, are really efficient: they share the same memory space and everything else. In fact, to the processor they are little more than multiple instruction pointers, and switching between them is nearly instantaneous. With Hyper-Threading the processor even decides which thread to read instructions from every clock cycle. Synchronisation means locking a thread with a mutex until the data it is waiting for arrives. But on a single-core processor everything is processed sequentially anyway, so the next thread starts when the previous one ends. If you're still worried about performance on single-core processors, I can assure you that it's easy to create two versions of the software: a simple command-line option can decide whether to use multi-threading or not.
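For example (my own sketch; the flag name is made up): the same binary can pick the sequential or the threaded path at startup.

```cpp
#include <cstring>
#include <thread>

// Stand-in for part of a frame's work, over the rows [y0, y1).
void process(int y0, int y1) { (void)y0; (void)y1; /* real per-row work here */ }

int main(int argc, char** argv)
{
    // Hypothetical command-line switch: threading is off by default,
    // so single-core machines pay no overhead at all.
    bool multithreaded = (argc > 1 && std::strcmp(argv[1], "--mt") == 0);

    const int height = 480;
    if (multithreaded)
    {
        std::thread t(process, 0, height / 2);  // second core gets the top half
        process(height / 2, height);            // this core does the bottom half
        t.join();
    }
    else
    {
        process(0, height);  // plain sequential path
    }
}
```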
The physics engine would be nothing but dependencies, and I assume the same would apply to the AI.
The physics engine has to test for collisions by checking many polygon-polygon intersections, and the ragdoll calculations use big matrices. Both can use some parallel processing power. I'm no expert at A.I., but I assume it uses graph algorithms and genetic algorithms that need to test a lot of situations. Two threads can do this faster than one. The general rule is that every bottleneck of the application is a loop, and most of the time it does very repetitive, independent operations. This is exactly where threading can really help. So I wouldn't assume too quickly that it's full of dependencies. Maybe outside these loops there are many dependencies, but the real processing power is needed inside them.
So if you're seeing a <10% speedup with Quake 3 running on an SMP machine, I can't say I'm really surprised.
I'm not surprised either. Like I said before, and if I recall correctly, only the A.I. code of Quake 3 is multi-threaded, and it certainly isn't using top technology for it. Besides, I don't think Quake 3 was ever really very CPU-limited. Anyway, I'll stop using Prescott as an example of wasted transistors if you stop using Quake 3 as an example of multi-threading performance. :wink: Let's indeed hope that when multi-core processors finally arrive, developers will quickly learn how to use them optimally...
 
The algorithm consists of several parts. One just determines the amount of motion: it simply subtracts the new frame from the previous one and adds up all the (absolute) differences. Theoretically, this could use a 'processor' per pixel (I wouldn't be surprised if hardware motion detection had per-pixel logic for this). The next step is to determine the centroid of the movement. This is done by multiplying every difference by the pixel's position, again an extremely parallelizable operation. Then a rectangle is constructed which contains most of the motion. This is done by adding up the motion per line (horizontally and vertically) to locate the regions with ~80% of the movement (again horizontally and vertically). The intersection of these regions is the rectangle where movement is concentrated. Once more, this is a lot of loops with repetitive operations. One thread per line would theoretically be possible.
I think that splitting the calculations for a single frame is a bad idea. I think the overhead of syncing the data would take up enough processor time that it wouldn't be that much faster, or maybe even slower. I think the best applications for multi-threading are larger-scale, like having a thread for each camera, complete with its own motion detection, analysis, and tracking algorithms. They could all run independently and write to disk as needed. Heheheh, who do you work for? I'd like to apply for a job.
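That larger-scale design is easy to sketch as well (hypothetical worker function; a real system would hand each thread a camera connection rather than an id):

```cpp
#include <thread>
#include <vector>

// Hypothetical per-camera worker: it would grab frames, run motion
// detection, and write results to disk, independently of the others.
void cameraWorker(int cameraId)
{
    (void)cameraId;  // capture/detect/record loop for this camera goes here
}

int main()
{
    const int cameraCount = 8;
    std::vector<std::thread> workers;
    for (int id = 0; id < cameraCount; ++id)
        workers.emplace_back(cameraWorker, id);  // one thread per camera
    for (std::thread& w : workers)
        w.join();
}
```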

He that but looketh on a plate of ham and eggs to lust after it, hath already committed breakfast with it in his heart. -C.S. Lewis
 
Personally, I agree with what you are saying. Multiple CPU cores offer much potential, but making that a reality would require enormous effort from software engineers, just like Itanium does. This is why Itanium will fail overall: too much work, and people don't wish to buy all new software. But if multi-core becomes the norm, I could see software being written to take advantage of it. It would just be slow and gradual; at least that is how I see it. And dual core will never be 200% of single-core performance unless you're running an out-of-reality benchmark to make dual core look good.

That's how I see it, anyway.

If I glanced at a spilled box of toothpicks on the floor, could I tell you how many are in the pile? Not a chance. But then again, I don't have to buy my underwear at Kmart.