Well, the PIII was built around the P6 architecture, which first appeared with the Pentium Pro and was then used in the PII with a half-speed L2 cache instead of the Pentium Pro's full-speed 256 KB/512 KB/1 MB L2 cache.

Apart from that, you seem unaware that 128-bit SSE execution cuts some calculation times by a factor of 2: the Athlon 64 needs several passes to complete a 128-bit SSE FPU instruction, while the Core 2 Duo needs only one. You also seem unaware that truncated FPU calculations are now done with 128-bit SSE instructions instead of x87 commands.

It's now a classic hobbyhorse for Intel: unable to beat AMD in the IPC race, they tend to create new instructions, allegedly for better performance, but in fact to create incompatibilities in their rival's products. So it was with the P4 and SSE2.

So the FPU improvements are not limited to the x87 commands; they mostly concern the 128-bit SSE units, which sit alongside the classic x87 FPU.
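
To put rough numbers on the "factor of 2" point, here's a toy model in Python. It only counts how many passes a packed 128-bit operand needs through an execution unit of a given width. The 64-bit width used for the "several passes" case is an assumption for illustration, the function names are made up, and the model ignores latency, decoding, and memory, so treat it as a sketch and not a benchmark.

```python
import math

SSE_OPERAND_BITS = 128  # width of one packed SSE operand


def passes_per_instruction(operand_bits: int, datapath_bits: int) -> int:
    """Passes needed to push one operand through an execution unit of the given width."""
    return math.ceil(operand_bits / datapath_bits)


def total_passes(n_instructions: int, operand_bits: int, datapath_bits: int) -> int:
    """Total passes for a stream of identical packed instructions."""
    return n_instructions * passes_per_instruction(operand_bits, datapath_bits)


# Assumption: the narrow design has a 64-bit SSE datapath, the wide one 128-bit.
narrow = total_passes(1000, SSE_OPERAND_BITS, 64)
wide = total_passes(1000, SSE_OPERAND_BITS, 128)

print(narrow, wide, narrow / wide)  # prints: 2000 1000 2.0
```

In this simplified picture the wide unit gets through the same stream of SSE instructions in half as many passes, which is where the "factor of 2" on some calculations comes from. Real code will see less than that, since not everything is packed SSE work.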