The incompatibility issue lies in the 64-bit operating system and is no rumor. Intel has deviated significantly from the JEDEC/AMD64 standard, to the point that a separate compiler is really necessary. If you are using the 32-bit IA-32 architecture, everything will function as normal. But 64-bit is another standard.
"The divergence of AMD and Intel x86 implementations has created certain 'challenges' for compiler vendors. Application developers would like to deliver a single binary that can execute optimally on both architectures. PGI's solution allows users to create separate versions of code for both chips, but enables them to be built into a single PGI Unified Binary. The Portland Group's Michael Wolfe describes the rationale and implementation of this technology, which he presented in an Exhibitor Forum this week at SC06."
http://www.hpcwire.com/hpc/1096734.html
In a nutshell, RAID 5 will only work if the vendor has written separate drivers for EM64T. Otherwise you might as well be trying to run ATI drivers on an nVidia video card. The high performance computing and IEEE web sites will have the answers to these issues long before the general press has the story. Linux Beacon broke the story on this issue some 8 months ago and IEEE confirmed it in May. Apparently someone at the Inquirer reads the HPC sites when no one else seems to.
Wow, I see a lot wrong here.
Start by reading
http://en.wikipedia.org/wiki/X86-64
That having been done, you'll discover there simply is no "standard" for X64...
...Which also should serve as a sign that JEDEC has nothing to say about the instruction sets used by these CPUs.
And I can't see why a separate compiler should be necessary when a sufficiently developed compiler should be able to make do with a simple directive. But I haven't written Assembler in a long time so I suppose anything is possible.
-Brad
Wikipedia barely scratched the surface of the differences. If you had read the HPC article or bothered to look up Wolfe's paper on the web, you would have learned the following, in layman's terms:
"HPCwire: Today what are the main differences between EM64T and AMD64 that a compiler writer needs to be aware of? Are there still ISA differences or is it all micro-architecture?
Wolfe: The main differences are in the chip implementations, the detailed micro-architectures of the processor cores. For instance, Intel EM64T processors have typically been implemented with deeper instruction pipelines and higher clock rates. This increases the importance of good scheduling by the compiler in order to avoid pipeline stalls and extract maximum performance from the chip. With respect to streaming SIMD extensions (SSE) instructions, Intel EM64T chips use parallel floating point pipelines, which provide higher performance for packed arithmetic but no advantage for scalar code.
The AMD64 implementation uses separate pipelined floating point units. This allows for faster double-precision scalar performance, but essentially means that AMD64 has the same peak performance for double-precision scalar or packed SSE instructions. There are also a wide variety of cache sizes and configurations, which are significant to how and when a compiler should generate parallel code on a multi-core processor.
There are also temporal ISA incompatibilities. Intel introduced SSE2 instructions to the Pentium 4, and these were later adopted by AMD as part of AMD64. AMD introduced 64-bit extensions and extended register sets to the x86 architecture with AMD64, and these were eventually adopted by Intel. Intel introduced SSE3 instructions with EM64T, which AMD adopted soon thereafter, and Supplemental SSE3 instructions with Core 2 which create binary incompatibilities between Core 2 processors and current generation AMD64 processors. The PGI Unified Binary allows users to leverage these innovations as they occur, but without generating code that is sub-optimal or simply does not work on competing processors.
For the compiler, we see five distinct categories. First, as mentioned, some instructions are introduced by one vendor, so there is a time period where the ISA is different. Second, the scheduling rules for instructions differ among the processors; typically, the schedule is more critical for the Intel processors, with its deeper pipeline. Third, instruction selection can also be important. We have cases where there are two or more instructions or instruction sequences to give the same result; due to the micro-architectural differences, different instructions or sequences will be faster for the two vendors. Fourth, the various choices for vectorizing for the packed SSE arithmetic can be very specific to the chip; this includes instruction selection and scheduling, but also involves tradeoffs in the breakpoint between scalar and vector code, whether to optimize for aligned operands, and so on. Lastly and also related to vectorization, cache optimizations can depend on the cache size, which differs between the chips and even between different revisions of the same processor"
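To make the "single binary, separate code versions" idea from the interview concrete, here is a minimal sketch of run-time dispatch on CPU features. It is only an illustration under stated assumptions, not how the PGI Unified Binary is actually implemented: the feature labels, the two kernel functions, and `select_kernel` are all hypothetical names, and the "tuned" kernel is just a stand-in for code a compiler might schedule differently for one vendor's chip.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Kernel = Callable[[List[float], List[float]], float]


def dot_generic(a: List[float], b: List[float]) -> float:
    """Baseline version: safe on any CPU the program supports."""
    return sum(x * y for x, y in zip(a, b))


def dot_ssse3_tuned(a: List[float], b: List[float]) -> float:
    """Stand-in for a version built with newer instructions (hypothetical).

    Processes two elements per step, loosely mimicking a packed code path.
    """
    total = 0.0
    for i in range(0, len(a) - 1, 2):
        total += a[i] * b[i] + a[i + 1] * b[i + 1]
    if len(a) % 2:  # leftover element when the length is odd
        total += a[-1] * b[-1]
    return total


# Ordered from most specific to least specific. Each entry pairs the
# features a version requires with the function to call (assumed labels).
VARIANTS: List[Tuple[FrozenSet[str], Kernel]] = [
    (frozenset({"ssse3"}), dot_ssse3_tuned),
    (frozenset(), dot_generic),  # requires nothing, so it always matches
]


def select_kernel(cpu_features: Iterable[str]) -> Kernel:
    """Return the first variant whose required features are all present."""
    available = set(cpu_features)
    for required, kernel in VARIANTS:
        if required <= available:
            return kernel
    return dot_generic  # unreachable with the table above, kept as a guard


if __name__ == "__main__":
    # Feature sets are supplied by hand here; a real program would query the CPU.
    older_chip = {"sse2", "sse3"}
    newer_chip = {"sse2", "sse3", "ssse3"}
    print(select_kernel(older_chip).__name__)
    print(select_kernel(newer_chip).__name__)
```

The point of the sketch is that the choice is made once, at run time, from what the processor reports, so the same program can carry a version using newer instructions without breaking on a chip that lacks them.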
The Wikipedia article ignores the deeper issues that belong in the high-level technical conferences, like the temporal ISA differences. I am afraid that the detailed mathematical and engineering terms of Michael Wolfe's paper are beyond the grasp of all but one or two readers in Tom's forums and obviously beyond the other posters here. Who can post a clear explanation of temporal ISA or why it matters? Who can calculate zero sequence numbers, or has even heard of them? Crok is correct: the external solutions are the easiest route. They are not processor dependent.
As to evilroot's FUD, that is FUD. Here is the link to the IEEE paper presented in March 2005 that provides the mathematical documentation of the feasibility and proper use of the multi-graphics card for a display.
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/ipdps/2005/2312/04/2312toc.xml&DOI=10.1109/IPDPS.2005.121#search=%22ron%20sass%20reconfigured%22
Yes, for low-level operations (gaming) in Vista you will not need a different version, because the code is merely compromised. See Michael Wolfe's discussion above. Certain features like RAID 5 are very high level and, in the case of IBM RAID 10 controllers (based on Fortran 95), flat won't work with EM64T. So if you are doing very basic things, the differences will simply show up as inefficient use of the CPU. Again, see Wolfe's discussion above. If you run high-level floating point apps like SolidWorks, or operate at about 200 GFLOPS or above, then you would gain significantly by getting a recompiled version of Vista. With the use of GPUs and chip accelerators like IBM's Cell or Cell+ or ATI's Stream, 500 GFLOPS will be available on the desktop by Christmas 2007. In today's slow-paced world of 20 GFLOPS the design differences really don't matter. This time next year they will. Recompiling the Linux kernel to accommodate EM64T resulted in a 40% increase in performance for Thunderbird at Sandia: an increase from 38 to 53 teraflops.
http://www.top500.org/system/ranking/8114
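For anyone who wants to check the 40% figure, the arithmetic is easy to reproduce. This is just a quick sketch using the two teraflops numbers quoted in the post; the function name is my own.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # 38 and 53 teraflops are the figures quoted in the post.
    print(f"{percent_increase(38.0, 53.0):.1f}%")
```

That works out to a little under 40%, so the rounded figure in the post matches the two numbers given.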
evilroot, if you are going to leave your FUD display up, why don't you wander over to Lawrence Livermore and see Gauss? It is 256 Quadro 4500s interconnected. Granted, the interconnect is an InfiniBand network rather than the PCI or PCI-e bus. The display is 4.5m by 7m. Resolution would be equal to 3270x2400. http://www.llnl.gov/pao/news/news_releases/2005/NR-05-11-04p.html