Reynod :
Though x86 has design constraints (quaint addressing modes, segment registers, and variable-width macro-instructions that require extra decode work, and therefore overhead, since they are handled via microcode), both Intel and AMD have spent a lot of money developing microarchitectures for the x86 instruction set ... hence its efficiency and flexibility in the long term.
I suspect that within the next few CPU architectures you will see Intel start to drop parts of the old 16-bit backend from its processor design. That would free up some die space and allow some power savings right there. It would break legacy compatibility, though...
mayankleoboy1 :
I used to work at an IT company some time back. The work was developing server-side Java applications for database management and retrieval. The devs used threads all the time, but they were all run on one core only; nobody even knew how to run threads in parallel across multiple cores.
Another task was to create macros in Excel and Access that fetched data, processed it, and converted it into graphs, tables, and charts. Again, this was all sequential, even though it could have been made multithreaded. But nobody knew how.
So yes, parallel programming is a dream right now.
Excel (and most of Office) frankly does not scale well, and there's nothing application devs can do about that. Excel's workloads should be scalable for the most part. (Office is badly in need of a redesign...)
As for threads in general, I say this again: the CreateThread API call (http://msdn.microsoft.com/en-us/library/bb202727.aspx) has no facilities whatsoever for assigning a thread to a particular core. The ONLY way to do this on Windows is via the SetThreadAffinityMask call.
And even then, you have to be concerned about data integrity: when do you have to start locking and unlocking, and how do you make sure that if you abort a thread, you release all its locks and resources? This kills performance. Then you have all the implicit locking Windows does as the result of certain API calls. It's actually easy to lock up the main GUI thread if you aren't careful (which I've seen plenty of people do over the years). The preferred method for most programs is to let the Windows scheduler handle core loading, unless you have a workload that is known to be parallel in nature and not likely to be blocked by external I/O.
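To make the data-integrity point concrete, here's a minimal Java sketch (a hypothetical LockDemo class, not tied to any API above): two threads hammer a shared counter, and the lock is what makes each read-modify-write atomic. Drop the lock and increments can interleave and get lost.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: two threads increment a shared counter. The lock makes
// each read-modify-write atomic; without it, updates can interleave and
// increments get silently lost.
public class LockDemo {
    private static final ReentrantLock lock = new ReentrantLock();
    private static long counter;

    private static void increment() {
        lock.lock();
        try {
            counter++;          // safe only while the lock is held
        } finally {
            lock.unlock();      // release even if the body throws
        }
    }

    static long run() throws InterruptedException {
        counter = 0;
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) increment(); };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 200000 every time; remove the lock and it often isn't
    }
}
```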
(Protip: a 'finally' block is NOT guaranteed to run, so don't rely on one to clean up program state. Example: a thread that is killed from an external thread will never go through its finally block, bleeding resources. You clean up resources when you are done with them, period, to avoid this issue. Almost all Java devs are taught wrong in this regard, and write very poor programs as a result. /rant)
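One way to follow the "clean up when you are done, period" advice in Java is try-with-resources (Java 7+), which ties release to scope exit rather than to a manually-written finally. A sketch, using a hypothetical Resource class that just records whether close() ran:

```java
// Sketch of scope-tied cleanup via try-with-resources. The hypothetical
// Resource only records whether close() ran; real code would wrap files,
// sockets, database handles, etc.
public class CleanupDemo {
    static class Resource implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static Resource useAndRelease() {
        Resource r = new Resource();
        try (Resource held = r) {
            // ... do work with 'held' ...
        } // close() runs right here, at end of scope, on success or exception
        return r;
    }

    public static void main(String[] args) {
        System.out.println(useAndRelease().closed); // true
    }
}
```

This still won't save you if the whole process (or JVM) is killed, but it does release the resource at the earliest point you're done with it instead of at some distant cleanup site.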
FYI, some of the relevant API calls:
CreateThread:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682453(v=vs.85).aspx (Note you'll almost always call this via a different API call, such as _beginthreadex)
SetProcessAffinityMask:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686223(v=vs.85).aspx
SetThreadAffinityMask:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
SetThreadIdealProcessor:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686253(v=vs.85).aspx
Note the lack of an easy way to find an HTT core. Aside from looking at the raw CPUID data, I don't know how to determine if a logical core is an HTT sibling or not (I'd be shocked if there wasn't an API call somewhere that exposes this, though...)
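From the Java side it's even more opaque: the JVM reports only a count of logical processors, with HTT siblings counted individually, and has no portable way to tell a physical core from an SMT sibling (on Windows that takes OS-specific calls; GetLogicalProcessorInformation reports which logical processors share a physical core). A trivial sketch of what Java does expose:

```java
// Java exposes only the number of logical processors -- HTT siblings count
// individually. Distinguishing physical cores from SMT siblings requires
// OS-specific calls (e.g. GetLogicalProcessorInformation on Windows).
public class CpuCountDemo {
    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
    }
}
```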
And BTW, there's a massive trap with all these threading options. Let's look at a simple problem: you have two reasonably independent threads that can be run in parallel. How do you schedule them?
Option 1: Leave it to the Windows scheduler.
Option 2: Hardcode the two threads to cores 0/1 (or any two cores)
Option 3: Hardcode the second thread to any core besides core 0
Option 1 is the only correct option.
Option 2 ignores the possibility of heavy workloads already existing on the first two cores. It's unacceptable to assume yours is the only heavy-work process active.
Option 3 ignores the possibility that the first thread will NOT be placed on core 0 to start with, and has the same problem as Option 2 on top of that.
So if you want to start hardcoding your thread logic, you have to start asking how heavily loaded each CPU core is, figure out which ones are doing the least work, and manually assign your threads to those cores. Never mind that in the time it takes to do this, those cores might be back at a heavy workload again. Or some other task will use those cores, and two other cores might be doing less work, but because you hardcoded the thread logic, your thread can't jump to the less overworked cores. Whoops.
See how threading very quickly becomes REALLY complicated?
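Option 1 in code is almost boring, which is the point. A Java sketch (hypothetical SchedulerDemo class): start the two independent threads, never touch affinity, and let the OS scheduler place them and migrate them as the load changes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Option 1 as code: start two independent threads and set no affinity at
// all. The OS scheduler picks the cores and can migrate the threads later --
// exactly the flexibility a hardcoded affinity mask throws away.
public class SchedulerDemo {
    static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
    }

    static long run() throws InterruptedException {
        AtomicLong total = new AtomicLong();
        Thread t1 = new Thread(() -> total.addAndGet(sumRange(0, 1_000_000)));
        Thread t2 = new Thread(() -> total.addAndGet(sumRange(1_000_000, 2_000_000)));
        t1.start(); t2.start();   // no affinity calls anywhere
        t1.join(); t2.join();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // sum of 0..1,999,999 = 1999999000000
    }
}
```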
AFAIK, PhysX is the only physics engine that uses the GPU. Bullet and Havok are still CPU-based.
My main point was that physics engines that can handle multiple-object interactions dynamically will HAVE to be GPU-based, because they involve massively parallel and complicated equations that cannot be run on the CPU at any decent speed. (It would be the equivalent of attempting to do rendering entirely on the CPU. You CAN, it will just be really, really slow.)