Thank you. I know that the question of whether I've posted here before under an alias has come up many times but, no, this is the first time. I got fed up with anonymous posting on my blog and Sharikou's blog. Anonymous posting seems to just make it easier to flame and removes any accountability from the poster. Also, I didn't care for the spoof posting that Sharikou180 has been doing (posting with someone else's ID). Any person with integrity will always own what they say even if they say something really stupid or incorrect. You say, "Oops, I was wrong" and then move on.
------------
To be honest, I've never had a lot of faith in MS's ability to do robust load sharing and management on multi-CPU systems. It is possible that MS could get it right with Vista, but I think it is more likely that versions of Unix and Linux will remain the standard on server applications.
MS has a great history of snatching knowledge from other areas, as they did when they bought MS-DOS from another company, and then with Xerox PARC and MacOS for Windows, and then VMS for Windows NT. The original team leader for Windows was from Xerox PARC, and after he quit, Windows was completed by MS's Mac applications team leader. The lead for Windows NT designed VMS, which is why the initials are one letter higher: V->W, M->N, and S->T. I guess they copied that from HAL. Maybe they'll get it right this time. However, I guess any improvement would be good.
I don't believe the bit instructions LZCNT/POPCNT have anything to do with alignment. POPCNT counts the number of 1 bits in a word, and LZCNT gives the number of leading zero bits (the zeros above the highest set bit). Itanium has POPCNT, which makes it much faster on certain parity benchmarks. However, several of the other hardware improvements should indeed help with alignment, including the prefetch buffer, which at 32 bytes is large enough to not break long instructions in half. I recall when the Motorola 68000 had to have instructions and data aligned on word boundaries.
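To make the difference between the two bit instructions concrete, here is a rough sketch in Python of what each one computes. The function names and the 32-bit default width are my own choices for illustration, not anything from AMD's documentation, and this only models the results, not how the hardware does it.

```python
def popcount(x: int, width: int = 32) -> int:
    """Count the 1 bits in the low `width` bits of x (what POPCNT returns)."""
    mask = (1 << width) - 1          # keep only `width` bits, like a register
    return bin(x & mask).count("1")


def lzcnt(x: int, width: int = 32) -> int:
    """Count the zero bits above the highest set bit (what LZCNT returns).

    For an input of 0 the result is `width`, i.e. the operand size.
    """
    mask = (1 << width) - 1
    return width - (x & mask).bit_length()


def parity(x: int, width: int = 32) -> int:
    """Parity is just the low bit of the population count."""
    return popcount(x, width) & 1


if __name__ == "__main__":
    value = 0b1011
    print(popcount(value), lzcnt(value), parity(value))
```

The parity helper is why a hardware POPCNT helps on parity-style benchmarks: one instruction replaces a loop over the bits. Neither operation says anything about where the data sits in memory, so alignment does not come into it.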
Yes, in terms of performance it looks like:
K8 -> K10
will be the same jump as
Yonah -> Core 2 Duo
or K7 -> K8
There is definitely some convergence going on between Intel and AMD architectures. For example, it is interesting the way that Intel used a hybrid bus on Tulsa to increase performance; this is a half step to native quad. With the large number of similarities between K10 and the C2D architecture, you have to wonder how similar they will be when Intel has both IMC and P2P and both are producing quad cores on 45nm. Intel and AMD also both appear to be pursuing GPU-based computation.