Intel says Penryn "complete"

geoffry

Distinguished
Dec 30, 2006
61
0
18,630
It seems you are telling me running a Yorkfield at 1066 is an almost non-issue. You are just checking BW.

Read the Inq, Tom's, TR, or any other HW enthusiast site and they will often say "but that FSB is restricting performance, blah this and blah that," yet no one has actually done the right experiments to prove blah this and blah that.

Jack

Hey Jack, what's the TR you speak of? For the past 2-3 months I've become obsessed with learning more about PC tech, and I check out hothardware, anandtech, toms, inq and 5 or so others every day to try and get something new to read and learn about. If you or any others have some favs it would be appreciated.

Thanks and sorry for the off-topic post.

Some of the better ones:
Tom's, AnandTech, TechReport, LostCircuits

Some good for general product evals:
Hexus.net, Hothardware, hardwarezone, trustedreviews, legitreviews

Good forums:
THGForumz, Xtremesystem.com

Technically oriented:
Realworldtech.com, EETimes

Most of the HW enthusiast sites are good; the one to avoid is HardOCP, as Kyle Bennett (the editor) is a die-hard AMD fan... there aren't any die-hard Intel fan-based HW enthusiast sites. Most of the HW sites are pretty fair in their reporting of data and analysis, except HardOCP.

Keep up with my posting; I typically dig up technical articles that do not originate from anything you will find on any HW site... rather, I try to pull in engineering/scientific technical journal info I can glean from university servers.

I started a decent thread here that I will add to as well:
http://forumz.tomshardware.com/hardware/modules.php?name=Forums&file=viewtopic&t=182836&highlight=process

Jack

Thanks a bunch. Ya, I keep an eye on these forums every day; lots of very interesting topics and debates here. Wish I'd found these 2 months ago, haha :)
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Is the 680i with the 1333 FSB more future-proof than the P965? I am not talking about the PCIe 16x slots. I know I am thinking way too far ahead.

1333MHz FSB Conroes are not far away. :wink:

But for 45nm processors, I don't know if a new VRM will be needed.
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Is the 680i with the 1333 FSB more future-proof than the P965? I am not talking about the PCIe 16x slots. I know I am thinking way too far ahead.

Yes, the 680i should be forward compatible with the 1333 MHz chips -- since nVidia endorses this spec I would be more prone to get a 680i. I am purchasing a 680i Striker as soon as one becomes available... though I do not know about Penryn. I expect Intel will give a demo of Penryn at the spring IDF in China, so keep your eyes peeled for details; they will likely get questioned on what chipset/socket it is running.

Jack

Probably pairing Penryn with X38 and DDR3-1066 memory :wink:
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Most definitely.....

I am also intrigued by the two-socket approach in 2008. It would appear they are doing one socket at a 1366 pin count (as I recall) and one at a 711 pin count (again, as I recall); actual pin counts may vary, and you probably have better info on this in your roadmap thread.

Anyway, it looks like Intel will mask out two forms of Nehalem, one with an IMC and one without --- if this is true they get to have their cake and eat it too with respect to memory flexibility: the IMC version dedicated to the one memory type giving the best performance at the time, and a non-IMC version that allows flexible adoption of memory technology as it evolves... if this is true, it is an interesting tack.

Jack

Are you referring to this? :wink:
http://techreport.com/onearticle.x/11064

The approach is better than Intel's and AMD's current approaches.
K8L will incorporate DDR2 and DDR3 support, but I think the die space will be wasted. :wink:
 

enewmen

Distinguished
Mar 6, 2005
2,249
5
19,815
Is the 680i with the 1333 FSB more future-proof than the P965? I am not talking about the PCIe 16x slots. I know I am thinking way too far ahead.

1333MHz FSB Conroes are not far away. :wink:

But for 45nm processors, I don't know if a new VRM will be needed.

Splitting hairs here. If the VRM is OK, can I just overclock my 1066 to 1333, then accept the 1333 FSB chip?
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Is the 680i with the 1333 FSB more future-proof than the P965? I am not talking about the PCIe 16x slots. I know I am thinking way too far ahead.

1333MHz FSB Conroes are not far away. :wink:

But for 45nm processors, I don't know if a new VRM will be needed.

Splitting hairs here. If the VRM is OK, can I just overclock my 1066 to 1333, then accept the 1333 FSB chip?

I think so :wink:
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
This I do not know... I would guess it depends on whether Intel releases the BIOS code to the older chipsets to recognize the chip and enable the boot strap.

I don't think the BIOS will be a problem, as there are outstanding motherboard manufacturers such as Asus, ASRock and Gigabyte :wink: :wink:
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
Good point --- it would come down to VRM and socket -- Bearlake unofficially supports the older P4s, so I suspect it will be a question of voltage.

Intel will need to use mechanical or electrical definitions to urge users to upgrade. The BIOS is definitely not a problem for motherboard manufacturers.
 

enewmen

Distinguished
Mar 6, 2005
2,249
5
19,815
Madness & Jack,

Thanks for the replies.
It seems the good makers (Asus, Gigabyte, etc.) will keep making BIOS upgrades and supporting future chips for a few years anyway.
I think I know enough now to stop asking stupid questions - for now ;)
 

levicki

Distinguished
Feb 5, 2006
269
0
18,780
Penryn will just be a die shrink of the C2D architecture and some added SSE4 instructions.

Some?!?

It is ~50 instructions; if that is "some" for you, then you will really have to ask your beloved AMD for more.

For those who want to know, there will be CRC32 implemented as an instruction, some new string instructions for regular expression support, a dot product, and a population count. Others will help code vectorization and data type conversion.
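To give you a taste, here is a minimal sketch in C of how a few of them look from the programmer's side. I am assuming the intrinsic names and header (_mm_crc32_u32, _mm_popcnt_u32 and _mm_dp_ps in nmmintrin.h) from the preliminary documentation, so treat them as provisional; build with whatever switch your compiler uses to enable SSE4:

[code]
#include <stdio.h>
#include <nmmintrin.h> /* SSE4.2 intrinsics; also pulls in SSE4.1 */

int main(void)
{
    /* CRC32 computed by the CPU, one dword at a time */
    unsigned int data[4] = { 1, 2, 3, 4 };
    unsigned int crc = 0;
    int i;
    for (i = 0; i < 4; i++)
        crc = _mm_crc32_u32(crc, data[i]);

    /* population count: how many bits are set in a word */
    int bits = _mm_popcnt_u32(0xF0F0F0F0u);

    /* dot product of two 4-float vectors in a single instruction (DPPS) */
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);
    __m128 dp = _mm_dp_ps(a, b, 0xF1); /* use all four lanes, sum into lane 0 */

    printf("crc=%08X bits=%d dot=%.1f\n", crc, bits, _mm_cvtss_f32(dp));
    return 0;
}
[/code]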

Conroe is only 10-15% better than K8 OVERALL!!!

Me being a classic Intel troll, I will tell you that my E6300 at 2.8GHz has the performance of an Opteron 165 running at 3.4GHz. The only problem is that such an Opteron still doesn't exist, let alone cost $190. If you manage to find such an Opteron, I really doubt it will be air-cooled like my E6300. You can do your math now.

Why did Microsoft opt for AMD's 64-bit instruction set AND processors for current and future development of apps?

Umm, because Core 2 Duo wasn't around when they started writing 64-bit code?

Speaking of it, if Microsoft hadn't stuck with AMD for the 64-bit platform, they would have had to invest in some real code porting to IA64 and thus lose their advantage over IBM, Sun, and Oracle in server software, but we would have got way better software and a better platform.

AMD's weak developer support is exactly the reason why 64-bit XP and applications haven't taken off even though it has been 4 (four!) years since the introduction of the first AMD64 chip.

Finally, AMD64 is a stupid name; EM64T better describes what it is all about (Extended Memory 64 Technology) and it doesn't include a vendor name. Btw, those are not 64-bit instructions but extensions, just like 32-bit was an extension of 16-bit in the past. I am still pissed off because AMD got the laurels for that quick hack and for breathing life back into a dead horse (x86), as well as for helping Microsoft keep its software monopoly.
 
levicki said:
Why did Microsoft opt for AMD's 64-bit instruction set AND processors for current and future development of apps?

Umm, because Core 2 Duo wasn't around when they started writing 64-bit code?
Intel's EM64T was around with the Prescott, which was well before the Core 2 Duo and not that much after the Opteron first shipped. Intel at the time figured, correctly, that the only place 64-bit computing was needed was in HPC environments, where 64-bit CPUs like the Power, MIPS64, Alpha, and Itanium were currently being sold. Intel thought that these 64-bit CPUs would filter down as the demand for large memory addressability spread from HPC to medium business to small business, and finally to you and me. Thus the Itanium would displace the 32-bit ix86 CPUs when 64 bits were needed on the desktop. Since your apps and OS would need to be 64 bits, backward compatibility with ix86 could be abolished and few would care.

AMD put a spoke in their plans by offering backward-compatible 64-bit x86 CPUs aimed at the medium business market, where 64 bits was not needed yet but soon would be. This let companies use current 32-bit software and OSes and then upgrade to 64-bit ones later when needed. MSFT saw this as a better business opportunity as well as a way to try and expand Windows up into the larger-server realm dominated by UNIXy OSes. Thus they went with AMD's x86_64.

Speaking of it, if Microsoft hadn't stuck with AMD for the 64-bit platform, they would have had to invest in some real code porting to IA64 and thus lose their advantage over IBM, Sun, and Oracle in server software, but we would have got way better software and a better platform.

Windows XP was actually released for the Itanium, so MSFT already did some porting. Since they got XP released on IA64, it would not have been too difficult to port Office or their other programs over.

AMD's weak developer support is exactly the reason why 64-bit XP and applications haven't taken off even though it has been 4 (four!) years since the introduction of the first AMD64 chip.

It has nothing to do with AMD's developer support. You forget that Intel's EM64T is compatible with AMD64 and that Intel is pushing EM64T also. The reasons that 64-bit Windows has not taken off are the following:

1. Few users had need for over 3-3.5 GB of RAM.
2. Since #1 was true, few had a pressing need for the OS and stuck with Windows XP 32-bit. This did not put pressure on the application vendors and driver writers to make 64-bit code.
3. #2 led people who might have otherwise tried out a 64-bit OS, due to speed enhancements or "just because," not to do it.
4. Everybody knew that Windows Vista was going to be available in 64-bit, and its original early release date that kept getting pushed back led people to not buy the "bastard" XP x86_64.
5. All four of these things worked in a feedback loop to keep everybody on 32-bit XP.

Finally, AMD64 is a stupid name; EM64T better describes what it is all about (Extended Memory 64 Technology) and it doesn't include a vendor name. Btw, those are not 64-bit instructions but extensions, just like 32-bit was an extension of 16-bit in the past. I am still pissed off because AMD got the laurels for that quick hack and for breathing life back into a dead horse (x86), as well as for helping Microsoft keep its software monopoly.

What do you think the "i" in the i386/i486/i586/i686 architectures stands for? Or the "I" in Itanium's IA-64 (the latter is not "Itanium," I'll spot you that.) The spec is also generally referred to as "x86_64," which has no vendor name at all and says *exactly* what it is- 64-bit extensions to x86. x86_64 was a quick hack, yes. But it was actually a pretty elegant hack, as it allowed for compatibility with current 32-bit x86 OSes and programs as well as being able to run new 64-bit code. One can run 32-bit code on an x86_64 chip on a 64-bit OS at about the same speed as running the code in a 32-bit OS. That is a major advantage, and one that just about everybody using a 64-bit OS today uses and most people tomorrow will use as well. It's much better than emulating a CPU of an incompatible architecture- go ask an Itanium owner how 32-bit x86 code runs on it.

x86 really doesn't have that much to do with keeping MSFT as an OS monopoly, perhaps beyond being able to run legacy apps. That isn't trivial, but it alone does not keep an OS a monopoly. x86_64 in 64-bit long mode is a different architecture than 32-bit x86. The 64-bit versions of Windows- both XP x86_64 and Vista 64-bit- require new drivers and libraries to be written by third-party developers. This means that people have a significant migration to do from 32-bit Windows to 64-bit Windows, and many may consider moving to a different OS because that would likely not require much more in migration costs.

Also, MSFT has compiled Windows NT variants for many different CPU architectures: 32-bit x86, 32-bit PowerPC, ARM9, IA64, and x86_64. I'm sure that if the platform of choice migrated to something other than x86, it would not be too hard for them to recompile their applications for it. They have done it in the past, and other OSes (especially Linux) do it right now with little trouble. MSFT has factors other than what CPU architecture we all run that keep them entrenched. I'll not get into that here unless you want me to; then I can go on all night.
 

levicki

Distinguished
Feb 5, 2006
269
0
18,780
Intel's EM64T was around with the Prescott, which was well before the Core 2 Duo and not that much after the Opteron first shipped. Intel at the time figured, correctly, that the only place 64-bit computing was needed was in HPC environments, where 64-bit CPUs like the Power, MIPS64, Alpha, and Itanium were currently being sold. Intel thought that these 64-bit CPUs would filter down as the demand for large memory addressability spread from HPC to medium business to small business, and finally to you and me. Thus the Itanium would displace the 32-bit ix86 CPUs when 64 bits were needed on the desktop. Since your apps and OS would need to be 64 bits, backward compatibility with ix86 could be abolished and few would care.

I agree with that.

AMD put a spoke in their plans by offering backward-compatible 64-bit x86 CPUs aimed at the medium business market, where 64 bits was not needed yet but soon would be. This let companies use current 32-bit software and OSes and then upgrade to 64-bit ones later when needed. MSFT saw this as a better business opportunity as well as a way to try and expand Windows up into the larger-server realm dominated by UNIXy OSes. Thus they went with AMD's x86_64.

I agree with that too. I only fear that AMD didn't do us a favor. They acted short-sightedly -- they only had in mind the short-term profit from siding with Microsoft. And it is short-term in case you wonder, because Intel now has a comparable 64-bit offering, so AMD has lost its exclusivity, not to mention that the 64-bit OS still hasn't taken off, so one really has to wonder how much fuller their pockets are now because of what they did.

Windows XP was actually released for the Itanium, so MSFT already did some porting. Since they got XP released on IA64, it would not have been too difficult to port Office or their other programs over.

It was working mostly in emulation mode, which was slow. Microsoft never did any real porting except for Alpha. Now that I mention it, I remember hearing from a Microsoft MSVC compiler developer that their and AMD's plans for AMD64 go as far back as the year 2000. That seems to coincide with Alpha support being dropped in Windows 2000 Beta 3.

It has nothing to do with AMD's developer support. You forget that Intel's EM64T is compatible with AMD64 and that Intel is pushing EM64T also.

Oh yes it has. Had AMD been more capable, a 64-bit OS would've been released sooner. If AMD had offered a 64-bit compiler, developers would have tried to port their code and realized the potential benefits sooner.

It was only when Intel's high-volume manufacturing machinery started pushing EM64T-capable Celerons to the masses that the popularity and awareness of 64-bitness started to grow.

What do you think the "i" in the i386/i486/i586/i686 architectures stands for? Or the "I" in Itanium's IA-64 (the latter is not "Itanium," I'll spot you that.)

Please, do not mix things. I do not dispute the term AMD64 as an architecture name (although I believe that the architecture name is K8 or Hammer).

I only say that 64-bit instruction set extensions should not be called AMD64 just like SSE2 is not called IntelSSE2.

The spec is also generally referred to as "x86_64" which has no vendor name at all and says *exactly* what it is- 64-bit extensions to x86.

AMD calls it "long mode" in their docs. Because of that unimaginative and non-descriptive term, developers coined AMD64 and x86_64. I must agree that x86_64 is better.

However, all the names are fundamentally wrong, because you neither have a radically new architecture (it is still x86 under the hood) nor a true 64-bit address space (physical or virtual).

x86_64 was a quick hack, yes. But it was actually a pretty elegant hack as it allowed for compatibility with current 32-bit x86 OSes and programs as well as being able to run new 64-bit code as well. One can run 32-bit code on an x86_64 chip on a 64-bit OS at about the same speed as running the code in a 32-bit OS.

I disagree with the "elegant" part. Adding yet another instruction prefix (REX, 0x48 for 64-bit operands) means a penalty for instruction decoding because instruction lengths change.

The result of that, as well as of the bigger footprint of the immediate operands, is that 64-bit code can run several percent slower than the exact same 32-bit code under a 64-bit OS. That is absurd, because you are actually penalized for making native 64-bit applications.

Note that I am speaking about ports of already highly-optimized code; any other (sloppy) code ported to 64-bit will automatically benefit through the (obligatory) use of SSE2 instead of the legacy FPU, so the penalty would in most cases be covered.

However, those who still write critical parts of their code in assembler, like I sometimes do, will have a hard time matching 32-bit performance in 64-bit code.

I tried very hard, and the best I could manage was code only 6.5% slower. The Intel compiler's 64-bit code (and I am talking about the same thing I wrote in assembler) lags 8.3% behind the 32-bit one.

On the other hand, MSVC gains 9.5% in 64-bit code vs. 32-bit code because it doesn't use the legacy instruction mix anymore, but it is still 4.33x slower than my assembler code and 3.96x slower than the code generated by the Intel compiler.

x86 really doesn't have that much to do with keeping MSFT as an OS monopoly, perhaps beyond being able to run legacy apps. That isn't trivial, but it alone does not keep an OS a monopoly.

It is not just about the OS. It is about applications such as Office and server software such as IIS, ISA, MSSQL, Exchange, etc. Not having to port all that but instead just recompiling it is a great advantage over other vendors. Having to spend much more time and money to make and test a proper port would create an opportunity for others to jump in.

x86_64 in 64-bit long mode is a different architecture than 32-bit x86.

I believe that CPU architecture is, for example, NetBurst vs. Core, or K7 vs. K8, but not 32 vs. 64. It is true that in AMD's case 32->64 coincides with an architecture change (K7->K8), but that doesn't mean that x86_64 is a new architecture.

Furthermore, as I already said, it really isn't that different. It doesn't even have any new instructions -- just wider registers, more of them (crippled by a poorly thought-out ABI, if I may add), and more physical and virtual memory available. Nothing else has changed.
 

ajfink

Distinguished
Dec 3, 2006
1,150
0
19,280
Penryn will just be a die shrink of the C2D architecture and some added SSE4 instructions.

Some?!?

It is ~50 instructions; if that is "some" for you, then you will really have to ask your beloved AMD for more.

For those who want to know, there will be CRC32 implemented as an instruction, some new string instructions for regular expression support, a dot product, and a population count. Others will help code vectorization and data type conversion.

Conroe is only 10-15% better than K8 OVERALL!!!

Me being a classic Intel troll, I will tell you that my E6300 at 2.8GHz has the performance of an Opteron 165 running at 3.4GHz. The only problem is that such an Opteron still doesn't exist, let alone cost $190. If you manage to find such an Opteron, I really doubt it will be air-cooled like my E6300. You can do your math now.

Why did Microsoft opt for AMD's 64-bit instruction set AND processors for current and future development of apps?

Umm, because Core 2 Duo wasn't around when they started writing 64-bit code?

Speaking of it, if Microsoft hadn't stuck with AMD for the 64-bit platform, they would have had to invest in some real code porting to IA64 and thus lose their advantage over IBM, Sun, and Oracle in server software, but we would have got way better software and a better platform.

AMD's weak developer support is exactly the reason why 64-bit XP and applications haven't taken off even though it has been 4 (four!) years since the introduction of the first AMD64 chip.

Finally, AMD64 is a stupid name; EM64T better describes what it is all about (Extended Memory 64 Technology) and it doesn't include a vendor name. Btw, those are not 64-bit instructions but extensions, just like 32-bit was an extension of 16-bit in the past. I am still pissed off because AMD got the laurels for that quick hack and for breathing life back into a dead horse (x86), as well as for helping Microsoft keep its software monopoly.


This is essentially aimed at your last bit - calling AMD's 64-bit implementation (and its name) "stupid" is a little off. Blaming the lack of developer support in the consumer market for 64-bit on AMD (or Intel, for that matter) is silly. AMD64 was developed for servers, where 64-bit apps are far more prevalent, then ported over to desktops. I must agree that x86_64 is a far more accurate and apt name.

I must also say: stop bashing AMD for its foresight and technical achievement in its execution and delivery of 64-bit extensions ahead of Intel in the x86 market. 64-bit processors from AMD or Intel are still a bit ahead of their time and probably won't hit their stride until late this year or early next year (I'm not calling any processors gimped, because let's face it, the X6800 or QX6700 are far from being held back in current 32-bit uses).

Also, AMD's 64-bit extensions are more efficient than Intel's. Microsoft choosing their model rather than others was cheaper, simpler, and better than taking in others, as said. Again, as someone said, Intel hadn't brought their EM64T to market yet.

XP64 isn't nearly as bad as people are saying. The driver support wasn't there initially, without question, but from everything I've heard it has much better support now. If I were building a new computer I would probably choose it over 32-bit (boo Vista).

Just my 2 cents.
 
levicki said:
Oh yes it has. Had AMD been more capable, a 64-bit OS would've been released sooner. If AMD had offered a 64-bit compiler, developers would have tried to port their code and realized the potential benefits sooner.
AMD chose to work with the makers of existing compilers- Microsoft for the VC++ compiler, the Free Software Foundation for gcc- rather than making their own compiler a la Intel. I'd be willing to bet that this was not only faster and easier to do but that it would have had a better end product than an AMD compiler. A good compiler is tricky to make, and software generally works best with other software that is designed for and compiled with the same toolchain. Try to install Gentoo or LFS using gcc on a Linux box and tell me how that goes. Then try it with icc. Tell me which one performs better or even compiles completely.

It was only when Intel's high-volume manufacturing machinery started pushing EM64T-capable Celerons to the masses that the popularity and awareness of 64-bitness started to grow.

It wasn't specifically Intel making 64-bit Celerons that made people generally aware of 64-bit OSes, but relatively inexpensive lines of 64-bit chips as a whole. That would include P4 Prescotts as well as Celerons. Of course, AMD's Athlon 64, Turion 64, and Sempron 64 were also relatively inexpensive 64-bit CPUs that were widely used. About the only chips sold in the last few years that weren't 64-bit were a few Semprons and the Pentium M/Core 1 lineage.

Please, do not mix things. I do not dispute the term AMD64 as an architecture name (although I believe that the architecture name is K8 or Hammer).

ix86 is the ISA name, not the CPU name, at least in later incarnations such as i586 and i686. Intel never called the CPU that used the i586 ISA the i586- it was called the Pentium, or internally the P5, P54, P54C, or P55. Likewise, no i686 ISA chip was ever called i686, either. The architecture was called P6, and the CPUs were referred to internally by code names like "Klamath" and "Banias" or by their market name.

This is also borne out by the fact that AMD chips run the ix86 ISA too. The K5 through the non-XP Athlons were i586, and the Athlon XP was i686.

I only say that 64-bit instruction set extensions should not be called AMD64 just like SSE2 is not called IntelSSE2.

Just as the SSE implementations have dropped the little "i" in front of them to just become "SSE," 64-bit x86 is generically called "x86_64," as that incorporates both AMD64 and EM64T. So if you don't like to refer to the 64-bit ISA on AMD processors as "AMD64," say "x86_64" instead; it's the exact same thing.

AMD calls it "long mode" in their docs. Because of that unimaginative and non-descriptive term, developers coined AMD64 and x86_64. I must agree that x86_64 is better.

Long mode simply refers to the mode in which the CPU operates when it executes 64-bit code, as opposed to 32-bit code (protected mode) or 16-bit code (real mode). It does not refer to the specific ISA being used at all. We didn't call the various flavors of 32-bit x86 ISAs all "protected mode," did we?

However, all the names are fundamentally wrong, because you neither have a radically new architecture (it is still x86 under the hood) nor a true 64-bit address space (physical or virtual).

The address space on almost all 32-bit chips isn't 32 bits either; it's actually 36 bits. But it's still a 32-bit chip. The bit number is for how many bits are in a machine word (the width of the general-purpose registers), which in all 32-bit processors is 32 bits and in 64-bit processors is 64 bits. This has nothing to do with memory addressability. It just means that, without any modifications (like 36-bit PAE), UP TO 16EB of address space can be mapped with one word. It doesn't mean that the chip has to be able to address that much memory.

I disagree with the "elegant" part. Adding yet another instruction prefix (REX, 0x48 for 64-bit operands) means a penalty for instruction decoding because instruction lengths change.

I meant "elegant" because of how it solved the problem and its execution. x86 is not a particularly clean or simple ISA, especially compared to some RISC ISAs. But the approach AMD took cleaned up the ISA some and allowed for pretty seamless transitions between 32- and 64-bit code. That's needed in the real world. Of course, a brand-new, clean-slate 64-bit ISA would be more elegant from a design view. But it would be horrible from an implementation view.

The result of that, as well as of the bigger footprint of the immediate operands, is that 64-bit code can run several percent slower than the exact same 32-bit code under a 64-bit OS. That is absurd, because you are actually penalized for making native 64-bit applications.

Most benchmarks that I've seen show 64-bit code running faster than 32-bit code, especially for applications that do a lot of math like encoders. This is somewhat due to the 387 FPU being shut off and SSE being used instead. But there are also more registers available in 64-bit mode than 32-bit mode, and that certainly does not hurt.

Note that I am speaking about ports of already highly-optimized code; any other (sloppy) code ported to 64-bit will automatically benefit through the (obligatory) use of SSE2 instead of the legacy FPU, so the penalty would in most cases be covered.

Code that is highly optimized for anything generally won't recompile for a different target and keep the same performance. But if it was highly-optimized code as opposed to sloppy code in the first place, don't you think that perhaps optimizing it for 64-bit would reduce a lot of the bloat? For example, if you only need 8 bytes for a variable, it should be ported from the 32-bit build's long double to a normal double in 64-bit mode instead of to the 64-bit long double that is 16 bytes.
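To make the size shifts concrete, here is a trivial program you can compile for your different targets; the sizes in the comment are the typical ones I would expect (assumptions on my part, since they vary by compiler and ABI):

[code]
#include <stdio.h>

int main(void)
{
    /* Typical results (assumptions, not guarantees):
       32-bit x86 gcc:  pointer=4, long=4, long double=12
       64-bit x86 gcc:  pointer=8, long=8, long double=16
       MSVC keeps long double at 8 bytes on both targets. */
    printf("pointer     : %u bytes\n", (unsigned)sizeof(void *));
    printf("long        : %u bytes\n", (unsigned)sizeof(long));
    printf("double      : %u bytes\n", (unsigned)sizeof(double));
    printf("long double : %u bytes\n", (unsigned)sizeof(long double));
    return 0;
}
[/code]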

However, those who still write critical parts of their code in assembler, like I sometimes do, will have a hard time matching 32-bit performance in 64-bit code.

I tried very hard, and the best I could manage was code only 6.5% slower. The Intel compiler's 64-bit code (and I am talking about the same thing I wrote in assembler) lags 8.3% behind the 32-bit one.

On the other hand, MSVC gains 9.5% in 64-bit code vs. 32-bit code because it doesn't use the legacy instruction mix anymore, but it is still 4.33x slower than my assembler code and 3.96x slower than the code generated by the Intel compiler.

I've not done any assembly language programming, so I'll take your word for it. But if you write a program and want to run it in 32-bit mode or even on a 32-bit OS, new x86_64 CPUs can still run it. If there were a separate, incompatible 64-bit ISA that was slower, then you'd either have to put up with the theoretical performance hit and run it on new 64-bit chips or dig out older, slower 32-bit CPUs with the faster ISA. You get a decent bit of both worlds with the x86_64 implementation- that's why it's not so bad.

It is not just about the OS. It is about applications such as Office and server software such as IIS, ISA, MSSQL, Exchange, etc. Not having to port all that but instead just recompiling it is a great advantage over other vendors. Having to spend much more time and money to make and test a proper port would create an opportunity for others to jump in.

As far as I have seen and experienced, source code is generally only specific to the kind of OS it will run on (e.g. Windows or UNIX) rather than the CPU type or bit width. I have written programs that I've compiled on a range of different machines, from my 64-bit x86 desktop to my 32-bit x86 laptop to PowerPC Macintoshes. They compiled fine on all of them, and the only thing that changed was the support the compiler introduced upon compiling, such as SSE, AltiVec, 64-bit code, etc. The binaries are not compatible with each other, with the exception that the 32-bit x86 binary would run on the 64-bit x86 machine. So it seems that basically a recompile is all that's needed to make a new-architecture version of almost every program out there. The result won't necessarily be as optimized for the new architecture, but it will run, and probably at least decently. If you've used Microsoft applications before, efficiency is NOT something they care strongly about anyway.

I believe that CPU architecture is, for example, NetBurst vs. Core, or K7 vs. K8, but not 32 vs. 64. It is true that in AMD's case 32->64 coincides with an architecture change (K7->K8), but that doesn't mean that x86_64 is a new architecture.

As far as having something compiled goes, an architecture is the ISA target, not the actual hardware microarchitecture of the chip. I should have been more explicit about this.
 

pausert20

Distinguished
Jun 28, 2006
577
0
18,980
Is the 680i with the 1333 FSB more future-proof than the P965? I am not talking about the PCIe 16x slots. I know I am thinking way too far ahead.

1333MHz FSB Conroes are not far away. :wink:

But for 45nm processors, I don't know if a new VRM will be needed.

QC, it looks like there will be a new VRM needed for the Wolfdale processors. I found out that there is a VRM/VID rework that can be done to allow the Bad Axe 2 to run the new Wolfdale processors.

This means that the Bearlake chipset will have a new VRM to support the 45nm processors.
 

levicki

Distinguished
Feb 5, 2006
269
0
18,780
This is essentially aimed at your last bit - calling AMD's 64-bit implementation (and its name) "stupid" is a little off.

Next time if you bother to reply please read more carefully. Thank you.

Note how I said "AMD64 is a stupid name..." meaning "a stupid name for the 64-bit extensions." I believe I explained my reasoning behind it well enough.

Also note how I haven't said that AMD's 64-bit implementation itself is stupid, although as a developer I personally dislike the way x86_64 has been made in general.

Blaming the lack of developer support in the consumer market for 64-bit on AMD (or Intel, for that matter) is silly. AMD64 was developed for servers, where 64-bit apps are far more prevalent, then ported over to desktops.

Why would that be silly? No matter how excellent a piece of hardware is, without good software, support and documentation it is just an expensive brick. Just look at Ageia. They were selling their SDK at first, and it wasn't cheap; now it is free. Why? Because hardware needs software to survive.

I must also say: stop bashing AMD for its foresight and technical achievement in its execution and delivery of 64-bit extensions ahead of Intel in the x86 market.

Not only ahead of Intel -- ahead of time. I am bashing them for two reasons:

1. They extended the ill-founded x86 YET AGAIN instead of letting it die.

2. They delivered the extensions without a compiler that could utilize them. I will not discuss Linux and the server space, but under Windows we got a 64-bit capable compiler in 2005 with Visual Studio Whidbey, if you don't count the DDK compiler which was available a bit earlier. Isn't 2 (two) years a bit too long to wait for a compiler?

I am also bashing both AMD and Microsoft for:

1. ABI design -- you have 16 GPRs but there are still restrictions on how many you can use (you still use the stack for more than four parameters), and you had better avoid the higher-numbered ones (8-15) because using them makes your instructions one byte longer, which slows down decoding. (See the sketch below.)

And just Microsoft for:

1. Lack of inline assembler support. It is true that the code requires heavy changes in order to work again, but that is easier than having to make a separate .asm file.
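To illustrate the ABI point above, here is a made-up five-argument function and where the Microsoft x64 calling convention puts its parameters (register assignments as published; the function itself is just a sketch):

[code]
/* Hypothetical five-argument function under the Microsoft x64 ABI:
   a -> RCX, b -> RDX, c -> R8, d -> R9,
   e -> the stack, above the mandatory 32-byte shadow space.
   So even with 16 GPRs, the fifth argument already goes through memory. */
long long sum5(long long a, long long b, long long c,
               long long d, long long e)
{
    return a + b + c + d + e;
}
[/code]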

Also, AMD's 64-bit extensions are more efficient than Intel's.

That was the case with Prescott. Core 2 Duo is equally efficient.

XP64 isn't nearly as bad as people are saying. The driver support wasn't there initially, without question, but from everything I've heard it has much better support now.

I know, I am using it. Here are just a few issues I am aware of:

1. If you have a GeForce 7 VIVO card you can kiss your Video In goodbye, because capture drivers for a 64-bit OS have never been released by NVIDIA.

2. Creative sound card drivers have some new bugs, some of them ironically preventing you from using more than 4GB of RAM in combination with their sound cards.

3. The 8800GTX was launched on November 8th. Drivers for x64 were published on December 8th. What would you do with such an expensive product in the meantime? Use it in VGA mode?

4. Force feedback wheel? Check if you have drivers first. When even some Logitech wheels don't have them, I won't even mention no-name vendors.

5. Scanners and printers. Many of them lack drivers.

6. USB ADSL modem? No problem, as long as it is a Thomson (Alcatel) SpeedTouch™ 330, because that is about the only one I know of which has 64-bit drivers.

I could go on with some more details if you want, but I am a bit tired right now.

I'd be willing to bet that this was not only faster and easier to do but that it would have had a better end product than an AMD compiler.

I am beginning to notice a pattern here. Every time AMD does something and it doesn't go all too well, everyone defends them by saying "but it was faster and easier to do it that way."

Never mind that doing it the hard way would be better in the long term; it is dead obvious that short-term earnings beat the crap out of everything else. Quantity instead of quality -- I have heard of it.

Try to install Gentoo or LFS using gcc on a Linux box and tell me how that goes. Then try it with icc.

I only tried to compile the kernel with icc once and it worked. Certain patches were required, but it is not an impossible thing to do and it does improve performance. gcc has improved a lot for x86 in version 4.0, but neither gcc nor MSVC has automatic vectorization and as many high-level optimizations as icc.

I agree that it is not suitable for every task, but in my opinion a CPU vendor must have its own compiler, or otherwise I don't consider them a serious player. There are two things to consider:

1. Who else will know how to write the best performing code if not those who designed the damn chip?

2. What if Microsoft goes down the drain? I know that is not very likely to happen but it is not good to depend on someone else for such an essential feature.

Just as the SSE implementations have dropped the little "i" in front of them

If I don't count the Z80 and Motorola 68000, I have been an active developer since the Pentium 90 days; I remember the "birth" of MMX and SSE. I have read all the official Intel documentation, programming guides and application notes, and I have never seen the little "i" you mention. It is an invention of SiSoftware, adopted by Lavalys and the others. I claim the "i" never existed. I would really like someone to prove otherwise, but not by showing screenshots from Sandra.

Regarding the address space, 32-bit chips are actually 36-bit (physically) but still can't address more than 4GB of RAM without windowing. 64-bit ones have a 48-bit virtual address space and a 40-bit physical address space. Note the difference -- the 32-bit chips have 4 bits more, the 64-bit ones have 16/24 bits less.

They only have 64-bit wide general purpose registers and a 64-bit wide data path, and that data path is something even 32-bit Pentiums had years ago.
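If you want to check that arithmetic yourself, a trivial sketch (nothing vendor-specific in it):

[code]
#include <stdio.h>

int main(void)
{
    /* what 48-bit virtual and 40-bit physical spaces actually amount to,
       vs. the full 2^64 range the "64-bit" name promises */
    unsigned long long virt = 1ULL << 48;
    unsigned long long phys = 1ULL << 40;
    printf("48-bit virtual : %llu TB\n", virt >> 40); /* 256 TB */
    printf("40-bit physical: %llu TB\n", phys >> 40); /*   1 TB */
    return 0;
}
[/code]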

But there are also more registers available in 64-bit mode than 32-bit mode, and that certainly does not hurt.

Sure, but as I said, they don't help much either. Best performance is still with the lower eight of both the GPR and XMM registers, because you don't need a prefix byte to use them. There were already very effective mechanisms, such as hardware register renaming, which countered the need for more registers being exposed to the compiler.

You get a decent bit of both worlds with the x86_64 implementation- that's why it's not so bad.

To be honest, I would really prefer a clean new 64-bit ISA which performs better than x86_64, because I have to rewrite/port most of the code anyway just to bring the 64-bit code performance back to the approximate level of the 32-bit code performance.

Compile this code using icc 9.1.033 with the -O3 -QxT switches and with a gcc of your choice (you will probably have to change the align directive syntax; this one is from the Windows world). If you don't have a Core 2 Duo to run the executable on, just try to compile it anyway; it should get vectorized. Try to compare the generated code just for fun.

[code]
/* 16-byte alignment lets the compiler use aligned SSE loads and stores */
__declspec(align(16)) float src[1024], dst[1024];

for (int i = 2; i < 1024 - 2; i++) {
    dst[i] = src[i - 2] - src[i - 1] - src[i + 2];
}
[/code]

I just can't wait for those SSE4 instructions to become available.

EDIT:
People all over the web are speculating as to what the new SSE4 instructions will be. I am surprised, because I have known all about them for quite some time.

New Instructions
 
I've used icc on certain projects and I did notice that it vectorized code automatically with the right switches enabled. Let's see, I think it was -O3 -xB for my P4 Woody and simply -fast and something with "threads" for the dual Xeon Irwindale box that was really the target of the code. It does make a good difference with the right code, I won't lie. icc can do excellent work in some cases, but it is kind of hampered by not being the default toolchain compiler for much of anything. I'd think that Intel would do themselves a favor by giving "the good stuff" that's in icc to the FSF, so that -march=<some-intel-chip> and the right gcc switches would accomplish the same as the stuff in icc. That would lead to the same result- code that runs excellently on Intel CPUs- but with the added benefit of having this advantage used more widely and regularly. That would only make Intel look better IMHO, and I don't see what the downside is, besides not being able to sell developers compilers that are of limited use.
 

levicki

Distinguished
Feb 5, 2006
269
0
18,780
I don't see what the downside is, besides not being able to sell developers compilers that are of limited use.

Your idea is interesting, but I believe that the problem lies somewhere else. To be able to optimize so efficiently, icc probably does a completely different analysis of the code (and especially the data flow) than gcc does.

Most likely the changes required to merge icc's optimizations into gcc would break gcc in much the same way you find icc broken now for certain tasks. There is a completely different philosophy behind those two compiler engines. One is pragmatic, the other is opportunistic.

There is a document, order #248966, available free of charge (both on a CD and as a download) called "Intel 64 and IA-32 Architectures Optimization Reference Manual," currently at revision 14. gcc developers should spend more time reading that document, because it defines a set of optimization rules for compiler writers, all neatly classified by impact and generality. I am sure that they could match icc's performance if they tried following those rules.