Smithfield: 2Q05, 2.8GHz for $240!!!

It might work... for a Pac-Man type game. In terms of real computing, with non-linear instructions, the thread would break.
Unlikely, as the processing-arbiter would pretty much have to be designed not to break threads, no matter what. So at absolute worst, the processing-arbiter would be forced to un-distribute execution back to a single core to prevent a distributed thread from breaking in the case of a pre-processing 'guess' miss. During any un-distribute cycles needed to fix such a miss, the registers that the other cores holding pieces of the thread were using would stay locked. Other than that there would be no loss, and the actual execution time would remain identical to a single core running the thread. The only possible performance loss compared to any other system would come when so many registers were in use by multiple threads that some of the other cores would have to enter waiting loops until execution of the thread was returned to a single core and the locked registers could be freed again.

That and the slightly longer pipeline would be the only losses in performance. The gains in performance, however, should more than make up for these, especially since in theory the processing-arbiter should allow as many threads as it has resources for per cycle, instead of just one (or two with HT) per core. After all, the processing-arbiter wouldn't just be an enhancement to single-threading, but also an enhancement to multi-threading.
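To pin down what I mean by the un-distribute fallback and the register locking, here's a rough C++ sketch. Everything in it (the Arbiter, Core, and RegisterFile names, the 64-register pool) is invented for illustration; a real arbiter would do this in hardware on micro-code, not in software:

<pre>
#include &lt;array&gt;
#include &lt;vector&gt;

struct RegisterFile {
    std::array&lt;bool, 64&gt; locked{};          // the shared register cluster
    void lock(int r)   { locked[r] = true;  }
    void unlock(int r) { locked[r] = false; }
};

struct MicroOp { int dstReg; bool speculative; };

struct Arbiter {
    RegisterFile regs;

    // Distribute a thread's micro-ops across cores; on a pre-processing
    // 'guess' miss, pull execution back onto a single core.
    void execute(std::vector&lt;MicroOp&gt;&amp; ops, bool guessMissed) {
        if (guessMissed) {
            // Un-distribute: the helper cores stop, but the registers they
            // were using stay locked so nothing else can grab them.
            for (auto&amp; op : ops)
                if (op.speculative) regs.lock(op.dstReg);

            runOnSingleCore(ops);            // same latency as one core alone

            for (auto&amp; op : ops)             // only now can the locks be freed
                if (op.speculative) regs.unlock(op.dstReg);
        } else {
            runDistributed(ops);             // the happy path: split across cores
        }
    }

    void runOnSingleCore(std::vector&lt;MicroOp&gt;&amp;) { /* ... */ }
    void runDistributed(std::vector&lt;MicroOp&gt;&amp;)  { /* ... */ }
};
</pre>

The point of the sketch is just that the fallback path costs nothing beyond the locked registers: the thread runs at single-core speed, and only other threads that happen to want those locked registers end up waiting.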

 
What happens is that it becomes even more powerful because then, like in EPIC, it can start processing several branches simultaneously before it even knows which branch to use.
Well... not really. If the branches are conditional (which is mostly the case BTW), it'll have to wait until the flags have been set by the previous operation.

Honestly, your chip runs like a real-time compiler converting single-threaded code into multi-threaded code. If you really want to do this, why stick to the old and inefficient x86 code? Put the same chip in front of an Itanium (or two) and watch your x86 code fly...
 
Well... not really. If the branches are conditional (which is mostly the case BTW), it'll have to wait until the flags have been set by the previous operation.
Forgive the bluntness, but you clearly don't seem to understand what I'm describing. The processing-arbiter could start calculating the condition and begin processing both branches simultaneously. Then, when the condition has been resolved, it already has some of the results of the right branch (as well as some of the results of the wrong branch, which it simply discards at that point).
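If it helps, here's a toy C++ version of that idea, using std::async purely as a stand-in for "the other core". The function names are made up, and of course a real core would do this with micro-ops, not futures:

<pre>
#include &lt;future&gt;
#include &lt;iostream&gt;

int takenPath(int x)    { return x * 3; }   // work for the 'taken' branch
int notTakenPath(int x) { return x + 7; }   // work for the 'not taken' branch

int eagerBranch(int x) {
    // Start both branch bodies before the condition is known...
    auto a = std::async(std::launch::async, takenPath, x);
    auto b = std::async(std::launch::async, notTakenPath, x);

    bool cond = (x % 2 == 0);   // ...while the condition gets resolved

    // Keep the result of the correct branch, discard the other.
    return cond ? a.get() : b.get();
}

int main() { std::cout &lt;&lt; eagerBranch(10) &lt;&lt; '\n'; }
</pre>

By the time cond is known, part (or all) of the right branch's work is already done; the wrong branch's result just gets thrown away.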

Honestly, your chip runs like a real-time compiler converting single-threaded code into multi-threaded code.
You've got it half right anyway. It's not a real-time compiler converting single-threaded code into multi-threaded code. It's a real-time micro-code reassembler, a large shared register cluster, and a CPU core-utilization manager all in one. The point is not to convert single-threading into multi-threading, but to negate most of the differences between single- and multi-threading at the micro-code level for all cores in the CPU.
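Since the terminology keeps getting mangled, here's a bare-bones skeleton of the three roles in one place. All the type and member names are invented for this sketch; it's only meant to show how the pieces relate, not how they'd actually be built:

<pre>
#include &lt;queue&gt;
#include &lt;vector&gt;

struct MicroOp { /* architecture-neutral micro-code, not x86 opcodes */ };

struct SharedRegisterCluster {
    // One big register pool visible to every core, so a thread's state
    // doesn't have to be copied when its micro-ops move between cores.
    std::vector&lt;long long&gt; regs = std::vector&lt;long long&gt;(256);
};

struct Core {
    bool busy = false;
    void run(const MicroOp&amp;) { /* ... */ }
};

struct ProcessingArbiter {
    SharedRegisterCluster cluster;    // the shared register cluster
    std::vector&lt;Core&gt; cores;          // whatever cores the package has
    std::queue&lt;MicroOp&gt; reassembled;  // re-grouped micro-code, ready to issue

    // Core-utilization manager: each cycle, hand ready micro-ops (from any
    // thread) to whichever cores are idle.
    void dispatchCycle() {
        for (auto&amp; core : cores) {
            if (core.busy || reassembled.empty()) continue;
            core.run(reassembled.front());
            core.busy = true;
            reassembled.pop();
        }
    }
};
</pre>

That's all "reassembler, shared register cluster, utilization manager" means here: one queue of micro-code, one register pool, one scheduler.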

If you really want to do this, why stick to the old and inefficient x86 code? Put the same chip in front of an Itanium (or two) and watch your x86 code fly...
Actually, that's the point: it isn't really an architecture-specific concept. In fact, if you put a micro-code disassembly layer between the incoming instructions and the processing-arbiter, then as long as the processing-arbiter had access to cores with the required execution units and registers, it could run any instruction set that gets turned into micro-code.

So you could put an Itanium core and a Xeon core in the same CPU, have a layer to disassemble x86, EM64T, and IA-64 instructions into micro-code, feed that micro-code into the processing-arbiter, and the processing-arbiter would distribute it to best utilize the resources of the available cores.

And if you combined it with Transmeta's technology and added a little style, you could quite possibly create a CPU that could run <i>any</i> instruction set efficiently, so long as you wrote a micro-code disassembly handler for each instruction set and had the necessary elements in your cores to run the micro-code. (Obviously, things like the bitness of the execution units would determine what kinds of instructions you could support.)
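To make the "one micro-code, many front ends" part concrete, here's a sketch. The instruction-set names are real (x86, EM64T, IA-64), but every function and type in it is hypothetical:

<pre>
#include &lt;cstdint&gt;
#include &lt;vector&gt;

struct MicroOp { /* architecture-neutral micro-code */ };

// One disassembly handler per supported instruction set.
std::vector&lt;MicroOp&gt; decodeX86(const std::vector&lt;uint8_t&gt;&amp;)   { return {}; }
std::vector&lt;MicroOp&gt; decodeEM64T(const std::vector&lt;uint8_t&gt;&amp;) { return {}; }
std::vector&lt;MicroOp&gt; decodeIA64(const std::vector&lt;uint8_t&gt;&amp;)  { return {}; }

enum class Isa { X86, EM64T, IA64 };

// The processing-arbiter only ever sees micro-code, so supporting another
// instruction set is 'just' writing another decoder -- plus having cores
// with the right execution units and bitness behind it.
std::vector&lt;MicroOp&gt; frontEnd(Isa isa, const std::vector&lt;uint8_t&gt;&amp; bytes) {
    switch (isa) {
        case Isa::X86:   return decodeX86(bytes);
        case Isa::EM64T: return decodeEM64T(bytes);
        case Isa::IA64:  return decodeIA64(bytes);
    }
    return {};
}
</pre>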

 
Forgive the bluntness, but you clearly don't seem to understand what I'm describing. The processing-arbiter could start calculating the condition and begin processing both branches simultaneously.
...effectively wasting 50% of the CPU power, cache, FSB bandwidth, etc. No, it doesn't sound good.
You've got it half right anyway. It's not a real-time compiler converting single-threaded code into multi-threaded code. It's a real-time micro-code reassembler, a large shared register cluster, and a CPU core-utilization manager all in one.
Ehh, you could have added a warp drive and voila, the family's one-stop shopping :)

No, I don't think I only got it half right. Call it an assembler, reassembler, or disassembler; what you are suggesting is to take one code and convert it into another, be it an improved x86 code or micro-code (the latter makes it more like a compiler, taking the high-level x86 opcodes and converting them into simpler RISC-like micro-code).

Anyway, we all know (I hope) that a simpler version of your reassembler already exists in all post-486 CPUs: the instruction decoders. What you described already happens, to a lesser extent, in almost any CPU today.
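For anyone following along, this is roughly what those decoders already do: one x86 instruction gets cracked into a few simple RISC-like micro-ops. The struct below is invented for illustration (real micro-op formats aren't public), but the load/add/store split is the standard textbook example:

<pre>
#include &lt;string&gt;
#include &lt;vector&gt;

struct MicroOp { std::string op, dst, src; };

// "add [ebx], eax"  ->  load / add / store
std::vector&lt;MicroOp&gt; crackAddMemReg() {
    return {
        { "load",  "tmp0",  "[ebx]" },   // read the memory operand
        { "add",   "tmp0",  "eax"   },   // do the actual add
        { "store", "[ebx]", "tmp0"  },   // write the result back
    };
}
</pre>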

What doesn't register at this point is the need for two CPUs and all the effort you mentioned to synchronize them. Double the FPU/ALU units, plus a slightly more beefed-up code cache (and everything else that comes with the package, like trace cache, branch prediction, etc.), and there's your additional power. No?
 
If you insist, all I can say is that the chips will be out in a few months. Then I may just drag this thread up, and you can show how you were as good at foretelling the future as the arbiter is.