Pretty good explanation of x86-64 by HP

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

In article <gbe8r0pp5ip0dl4d58fvktklbuu35442it@4ax.com>, Tony Hill wrote:
> It could
> be that Intel still has a reasonable amount of inventory of their old
> "Northwood" P4 chips and they want to clear those out first, but that
> certainly doesn't seem to be the case looking at Intel's pricing
> structure and what is being sold by the major OEMs (Intel seems to be
> pushing Prescott VERY hard here).

A friend recently (1 month ago IIRC) wanted a Northwood for his DIY
computer, but he found that none of the usual suspects around here had
them in stock. Eventually he called the importer, who said that
they were out of stock and not getting any more either, so buy a
Prescott instead.

> Long story short, I'm not quite sure what the actual answer is, but
> excessive inventory of 32-bit chips doesn't seem to make sense from
> what I've seen.

Considering the rate chips depreciate I guess manufacturers think
pretty hard about what they can do to minimize inventory.

--
Janne Blomqvist

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Mon, 06 Dec 2004 18:39:46 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:

>George Macdonald wrote:
>> On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
<<snip>>

>> Thanks for the data but no I guess I should have highlighted better what I
>> was getting at: "the memory controller is integrated into and operates at
>> the core speed of the processor", which is what was being
>> discussed/disputed in another thread.
>>
>> I haven't been able to find any hard data from AMD on where the clock
>> domain boundaries are in the Opteron/Athlon64 but if the memory controller
>> is not operating at "core speed" it's now at the stage of Internet
>> Folklore.
>
>Ah, that one is much easier to answer. ;-)
>
>Straight from the horse's mouth:
>http://www.amd.com/us-en/Processors/ProductInformation/0%2C%2C30_118_4699_7981%5E7983%2C00.html
>
> "By running at the processor’s core frequency, an integrated
> memory controller greatly increases bandwidth directly available
> to the processor at significantly reduced latencies."

Ah so there we have it... assuming this has been approved by the technical
folks. :-) BTW I notice that AMD seems to be cutting back on the depth of info
in their technical docs - the Product Data Sheets now consist of one
page... a far cry from the excruciating detail on cache operation etc. we
used to get.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??

keith · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips

On Sun, 05 Dec 2004 23:29:08 -0800, Greg Lindahl wrote:

> In article <8c97r0hqh2sqf8sh89ut3153lpdmddfs76@4ax.com>,
> George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
>
>>I haven't been able to find any hard data from AMD on where the clock
>>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>>is not operating at "core speed" it's now at the stage of Internet
>>Folklore.
>
> Note that the STREAM bandwidth and lmbench latency change with every
> cpu speed bump. So clearly part of the memory controller is at the cpu
> core frequency, or a related frequency, and not at the HT frequency,
> or the SDRAM external bus frequency.

That does *not* mean that the memory controller runs at the core speed.
It would be nuts to assume such. Would you assume the caches of the
PII run at the I/O bus speed?

> Please reduce the cross-post. Followups set to a group I read.

Isn't this a rather egotistical statement? "I don't read other
groups, so no one else matters!" Hint: Others are reading this thread
from other groups! It's posted to *three* related groups (hardly a breach
of USENET protocol).

--
Keith

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote,
in part:

>I found this whitepaper from HP to be pretty good, it is surprisingly
>candid, considering HP was the coinventor of the Itanium. It does a
>pretty good job of explaining and summarizing the similarities and
>differences between AMD64 and EM64T, and their comparison to the
>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>compatible", but IA64 is a different animal altogether.

>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf

I would have preferred if you had given the URL of a page with a *link*
on it to this manual. That would make it easier to back-navigate for
other items of related interest, and it would have meant that the manual
could be downloaded with a right-click without waiting for the browser
plug-in to display the whole manual.

On page 13, under the heading "Power Considerations", I noticed a real
whopper. Or, at least, what _seemed_ to me to be a real whopper
initially.

It is true that for a given implementation, a higher clock speed means
more power consumption. It takes more power to make gates switch faster.

However, if a higher clock speed is obtained by splitting the pipeline
into more itty-bitty pieces, for the same level of instruction latency,
then one still has the same number of gates, each consuming the same
amount of power. (Except for the overhead of the pipelining process...
and one more thing to be noted later.)

What is the point of splitting up a pipeline into smaller pieces? Is it
to put more megahertz in the ad copy? No, it is so that more
instructions can be executing, in different stages, at once. (Which
means that a Pentium IV ought to have explicit vector instructions. Yes,
it has a separate instruction cache and data cache, but there's still
only one bus to *main memory*, and caches do have to get filled from
somewhere.)

Since CMOS gates only consume power when they are changing state, unused
elements of a non-pipelined ALU are not consuming power, so it may well
be that a 14-stage pipelined ALU can consume twice as much power as a
7-stage pipelined ALU.

But that will be because twice as much of it is in use, not because it
is going "twice as fast".

Since they are still sort of right, even if for the wrong reason,
perhaps all I am criticizing is an oversimplification here. But I think
that this can lead to a profound misconception of how microprocessors
work.

John Savard
http://home.ecn.ab.ca/~jsavard/index.html

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

"John Savard" <jsavard@excxn.aNOSPAMb.cdn.invalid> wrote in message
news:41b50db1.4580547@news.ecn.ab.ca...
> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com>
wrote,
> in part:

OK, I won't trim the wonderful newsgroup list, all of whose readers are
breathlessly awaiting my immortal prose....
>
> >I found this whitepaper from HP to be pretty good, it is surprisingly
> >candid, considering HP was the coinventor of the Itanium. It does a
> >pretty good job of explaining and summarizing the similarities and
> >differences between AMD64 and EM64T, and their comparison to the
> >Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
> >compatible", but IA64 is a different animal altogether.
>
>
>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c002
38028.pdf
>
> I would have preferred if you had given the URL of a page with a
*link*
> on it to this manual. That would make it easier to back-navigate for
> other items of related interest, and it would have meant that the
manual
> could be downloaded with a right-click without waiting for the browser
> plug-in to display the whole manual.

What braindamaged newsreader are you using that won't let you right
click the link in the newsreader? Even OE does that. So quit whining
and switch to a decent newsreader.
>
> On page 13, under the heading "Power Considerations", I noticed a real
> whopper. Or, at least, what _seemed_ to me to be a real whopper
> initially.
>
> It is true that for a given implementation, a higher clock speed means
> more power consumption. It takes more power to make gates switch
faster.
>
Probably referring to that esoteric equation P= (sf)*.5*C*V**2 which you
may have encountered. Or perhaps I=Cdv/dt.

> However, if a higher clock speed is obtained by splitting the pipeline
> into more itty-bitty pieces, for the same level of instruction
latency,
> then one still has the same number of gates, each consuming the same
> amount of power. (Except for the overhead of the pipelining process...
> and one more thing to be noted later.)
If one adds pipe stages one has more gates and more latches and more
clock drivers. And the power per gate goes up because of the higher
frequency.
>
> What is the point of splitting up a pipeline into smaller pieces? Is
it
> to put more megahertz in the ad copy? No, it is so that more
> instructions can be executing, in different stages, at once. (Which
> means that a Pentium IV ought to have explicit vector instructions.
Yes,
> it has a separate instruction cache and data cache, but there's still
> only one bus to *main memory*, and caches do have to get filled from
> somewhere.)

Actually one reason for intel to "superpipeline" was to jack up the freq
for the ad copy.
You lost me with the "Pentium IV ought to have explicit vector
instructions" leap.
>
> Since CMOS gates only consume power when they are changing state,
unused
> elements of a non-pipelined ALU are not consuming power, so it may
well
> be that a 14-stage pipelined ALU can consume twice as much power as a
> 7-stage pipelined ALU.
Or maybe 4 times, if the freq is double.
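
A rough worked version of the arithmetic behind that "4 times" figure, assuming the formula quoted earlier in this post is the usual CMOS dynamic-power relation. Reading s as the fraction of gates switching per cycle and f as the clock frequency is an assumption here, as is holding C and V fixed between the two designs:

```latex
P = \tfrac{1}{2}\, s\, f\, C\, V^{2}
\qquad\Longrightarrow\qquad
\frac{P_{14\text{-stage}}}{P_{7\text{-stage}}}
  = \frac{\tfrac{1}{2}\,(2s)\,(2f)\,C\,V^{2}}{\tfrac{1}{2}\,s\,f\,C\,V^{2}}
  = 4
```

Under those assumptions, "twice as much of it in use" and "double the frequency" are separate factors that multiply. The extra latches and clock drivers mentioned above would add to C and push the ratio higher still.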
>
> But that will be because twice as much of it is in use, not because it
> is going "twice as fast".

Clearly they are using "twice as fast" to mean "double the frequency".
Why do you find that so hard to understand?
>
> Since they are still sort of right, even if for the wrong reason,
> perhaps all I am criticizing is an oversimplification here. But I
think
> that this can lead to a profound misconception of how microprocessors
> work.

What ARE you talking about?
>
> John Savard
> http://home.ecn.ab.ca/~jsavard/index.html

Del Cecchi.

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

George Macdonald wrote:
> On Mon, 06 Dec 2004 18:39:46 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
>
>>George Macdonald wrote:
>>
>>>On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
> <<snip>>
>
>>>Thanks for the data but no I guess I should have highlighted better what I
>>>was getting at: "the memory controller is integrated into and operates at
>>>the core speed of the processor", which is what was being
>>>discussed/disputed in another thread.
>>>
>>>I haven't been able to find any hard data from AMD on where the clock
>>>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>>>is not operating at "core speed" it's now at the stage of Internet
>>>Folklore.
>>
>>Ah, that one is much easier to answer. ;-)
>>
>>Straight from the horse's mouth:
>>http://www.amd.com/us-en/Processors/ProductInformation/0%2C%2C30_118_4699_7981%5E7983%2C00.html
>>
>> "By running at the processor’s core frequency, an integrated
>> memory controller greatly increases bandwidth directly available
>> to the processor at significantly reduced latencies."
>
>
> Ah so there we have it... assuming this has been approved by the technical
> folks. :-) BTW I notice that AMD seems to be cutting back on the depth of info
> in their technical docs - the Product Data Sheets now consist of one
> page... a far cry from the excruciating detail on cache operation etc. we
> used to get.

The "Product Data Sheets" are indeed so brief as to be
virtually useless, but there is still a wealth of PDFs
that provide details about just about everything.

The useless Product Data Sheet heads the list of
"AMD Opteron™ Processor Tech Docs" at
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_9003,00.html
but the other PDFs there have mind numbing details about
every little thing that does not give away trade secrets.
For example, read the "BIOS and Kernel Developer's Guide
for AMD Athlon™ 64 and AMD Opteron™ Processors".

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips

In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
keith <krw@att.bizzzz> wrote:

>> Note that the STREAM bandwidth and lmbench latency change with every
>> cpu speed bump. So clearly part of the memory controller is at the cpu
>> core frequency, or a related frequency, and not at the HT frequency,
>> or the SDRAM external bus frequency.
>
>That does *not* mean that the memory controller runs at the core speed.
>It would be nuts to assume such. Would you assume the caches of the
>PII run at the I/O bus speed?

"or a related frequency", i.e. based on the cpu frequency with a
constant divider.

>> Please reduce the cross-post. Followups set to a group I read.
>
>Isn't this a rather egotistical statement?

No, it follows Usenet tradition: post only to groups that you read.

But thanks for giving me the benefit of the doubt.

-- greg

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips

Eugene Nalimov wrote:

> Greg Lindahl wrote:
>
>> These benchmarks were run with the best Opteron compiler
>
> Visual C?
>
> 🙂

Maybe he meant GCC!

;-)

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

Del Cecchi wrote:

> What braindamaged newsreader are you using that won't let you right
> click the link in the newsreader? Even OE does that. So quit whining
> and switch to a decent newsreader.

Speaking of brain-damaged newsreaders, take a look at the mess yours
did when you quoted John's message. I rest my case.

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

"Grumble" <devnull@kma.eu.org> wrote in message
news:cp4djh$vdh$1@news-rocq.inria.fr...
> Del Cecchi wrote:
>
> > What braindamaged newsreader are you using that won't let you right
> > click the link in the newsreader? Even OE does that. So quit whining
> > and switch to a decent newsreader.
>
> Speaking of brain-damaged newsreaders, take a look at the mess yours
> did when you quoted John's message. I rest my case.

A few lines got wrapped. That what you are talking about?

del

Guest · Dec 7, 2004

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips

Del Cecchi wrote:

> Grumble wrote:
>
>> Del Cecchi wrote:
>>
>>> What braindamaged newsreader are you using that won't let you
>>> right click the link in the newsreader? Even OE does that.
>>> So quit whining and switch to a decent newsreader.
>>
>> Speaking of brain-damaged newsreaders, take a look at the mess
>> yours did when you quoted John's message. I rest my case.
>
> A few lines got wrapped. That what you are talking about?

Yessir!

Perhaps OE-QuoteFix might help if you must use OE?

Guest · Dec 8, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

In article <41B578C0.1000400@sgi.com>, Michael Woodacre wrote:
> Another example would be making sure that people understand that when
> Opteron goes dual core, unless you double the memory bandwidth
> available, you effectively cut the bandwidth per core in half. This will
> impact some workloads quite dramatically. Has AMD made public statements
> about supporting higher local bandwidth for the dual core chip?

No public statements that I know of, but there are rumors that the
90nm Opterons, due Real Soon Now, will support DDR2 in addition to
plain old DDR. See e.g.

http://www.xbitlabs.com/news/cpu/display/20040212022200.html

By the time dual core Opterons arrive, I suspect that DDR2-800 will
also be available, thus providing twice the memory BW compared to the
current single core offerings using DDR-400.
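
As a back-of-the-envelope check of that claim, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes a 64-bit data bus per channel and two channels per socket; neither figure is stated in this post, and the function name is purely illustrative. These are theoretical peaks, not measured STREAM numbers.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64, channels=2):
    """Theoretical peak bandwidth in GB/s for one socket.

    bus_width_bits and channels are assumptions (64-bit channels, dual channel).
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


ddr_400 = peak_bandwidth_gb_s(400)    # DDR-400: 400 MT/s
ddr2_800 = peak_bandwidth_gb_s(800)   # DDR2-800: 800 MT/s

per_core_single = ddr_400 / 1         # one core on DDR-400
per_core_dual = ddr2_800 / 2          # two cores sharing DDR2-800

print(f"DDR-400 peak:  {ddr_400:.1f} GB/s, per core (1 core):  {per_core_single:.1f} GB/s")
print(f"DDR2-800 peak: {ddr2_800:.1f} GB/s, per core (2 cores): {per_core_dual:.1f} GB/s")
```

On these assumptions, doubling the transfer rate exactly offsets doubling the core count, so peak bandwidth per core stays where it is today rather than improving.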

--
Janne Blomqvist

Guest · Dec 8, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Mon, 6 Dec 2004 20:16:21 -0600, "del cecchi" <dcecchi.nojunk@att.net>
wrote, in part:

>What braindamaged newsreader are you using that won't let you right
>click the link in the newsreader?

Clicking on the link in the newsreader, supposing I could do that, would
simply cause the link to open in a browser window. Which is exactly what
I achieved by cutting and pasting.

Maybe some newsreaders do allow right-clicking links. Such newsreaders
would probably also do dangerous and reckless things like rendering HTML
posts instead of displaying them in all their <angle bracket> glory.

This could result in having a brain-damaged computer, were I to view the
wrong post by accident.

As the posting in question was a text posting, this means that the
newsreader would have to guess at what constituted an URL, as well, with
no doubt occasional hilarious results.

John Savard
http://home.ecn.ab.ca/~jsavard/index.html

Guest · Dec 8, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:

>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>> It does, but the difference is small, usually less than 10% and often
>> much closer to 0%.
>
>And sometimes 50%...

Sure, there will be extreme cases in everything.

>> Most users don't use their computer to run STREAM though. Even in the
>> HPC community where memory bandwidth is king, STREAM is still a rather
>> extreme case.
>
>I admit I'm from the HPC-sector and memory bandwidth is very important
>to many applications here.

One thing that you need to keep in mind is that you represent a VERY
small minority here in terms of PC server sales. Just because it
matters to your application probably doesn't have much relevance to
the bulk of the buying public, and it almost certainly isn't going to
have implications for what the marketing people write in the trade
rags.

>> Besides, they do recognize that it is NUMA, just that they are saying
>> you don't NEED to worry about that if you don't want to because for
>> the vast majority of times the performance difference is lost in the
>> noise.
>
>It's a pretty strange argument in my eyes, "If you ignore the
>applications that run poorly because of property X, then it makes
>sense to downplay property X." True, but not helpful if you have such
>an application.

Ahh, but it's VERY helpful if you're in the marketing department! :>

In the end, the people that are going to take a performance hit due to
lack of NUMA optimizations probably already know as much and have
factored it into their buying decisions. The people who are talking
to Dell or HPaq's server sales and are thinking about an Opteron
system but are worried that this here NoooMah thingy might cause their
application to run slow most likely don't have to worry about much.
Hence SUMO.

It's all a matter of perspective.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca

Guest · Dec 8, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Wed, 08 Dec 2004 00:19:59 -0500, Tony Hill <hilla_nospam_20@yahoo.ca>
wrote:

>On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:
>
>>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>>
>>> It does, but the difference is small, usually less than 10% and often
>>> much closer to 0%.
>>
>>And sometimes 50%...
>
>Sure, there will be extreme cases in everything.
>
>>> Most users don't use their computer to run STREAM though. Even in the
>>> HPC community where memory bandwidth is king, STREAM is still a rather
>>> extreme case.
>>
>>I admit I'm from the HPC-sector and memory bandwidth is very important
>>to many applications here.
>
>One thing that you need to keep in mind is that you represent a VERY
>small minority here in terms of PC server sales. Just because it
>matters to your application probably doesn't have much relevance to
>the bulk of the buying public, and it almost certainly isn't going to
>have implications for what the marketing people write in the trade
>rags.

I think you're underestimating the size of the "workstation" market, which
will include people finding they can migrate down to PC-grade CPUs to
replace old "higher power" systems as well as people on the lower-end
fringe who may have grown their problem complexity beyond a uni-PC, or who
*could* get by with a fastish PC but like the comfort of the move up to
dual for future growth. Add them to the current established base of CAD,
engineering and modeling etc. applications and there is a decent sized
market.

There are a lot of mathematical/engineering problems out there which are
just part of everyday business computing - many *used* to be considered HPC
and are now quite routine on desktop sized boxes. In many cases,
proprietary (purchased) software is used and the algorithmic methods are
only understood fairly superficially by the user; what that user wants is
response, whether it's measured in minutes, hours or a day or more. The
software vendor thus feels responsible for supplying the best combination
of software and recommended hardware selection.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??

keith · Dec 9, 2004

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips

On Tue, 07 Dec 2004 09:56:44 -0800, Greg Lindahl wrote:

> In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
> keith <krw@att.bizzzz> wrote:
>
>>> Note that the STREAM bandwidth and lmbench latency change with every
>>> cpu speed bump. So clearly part of the memory controller is at the cpu
>>> core frequency, or a related frequency, and not at the HT frequency,
>>> or the SDRAM external bus frequency.
>>
>>That does *not* mean that the memory controller runs at the core speed.
>>It would be nuts to assume such. Would you assume the caches of the
>>PII run at the I/O bus speed?
>
> "or a related frequency", i.e. based on the cpu frequency with a
> constant divider.

Ok, how many "unrelated frequencies" are there in a CPU? Let's get real
here.

>>> Please reduce the cross-post. Followups set to a group I read.
>>
>>Isn't this a rather egotistical statement?
>
> No, it follows Usenet tradition: post only to groups that you read.

No, that is *not* Usenet tradition. The tradition is to limit
cross-postings to on-topic newsgroups. Cross-posting is not expensive
(unless you have a dran-bamaged newsreader).

> But thanks for giving me the benefit of the doubt.

Cutting off your audience, particularly those who *you* have responded to
is rude. Sorry if I've ruffled your feathers!

--
Keith

Guest · Dec 9, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

David Schwartz wrote:
> The scaling advantage comes largely from the architecture of a single
> processor. The memory controller is on the chip. The main reason this
> matters is that it means that local memory accesses don't have to contend
> with any other inter-CPU or I/O traffic.

That's only partly true. The Opterons still talk to each other even on local
accesses (coherency tokens only, no real data transfer). This both takes
time and adds to the traffic, since such a token needs to get everywhere.

What's missing here is an "exclusive" bit in the page table, for non-coherent
pages. The OS pretty well knows (or can know) which core is accessing a
page, and for a page that's not shared, the coherency token is not
necessary.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Guest · Dec 14, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

In comp.arch David Schwartz <davids@webmaster.com> wrote:
> In typical Opteron setups (2-8 CPUs, using the Opteron's built-in
> SMP hardware), the latency difference between local and remote
> memory accesses is so small that the benefits of treating it as NUMA
> are typically outweighed by the costs.

SPECweb99_SSL is probably atypical then. (Yes, one of my favorite
benchmarks.) The evolution of the tunes for Opteron systems on that
benchmark shows the size of the Zeus tunable "cache_small_file"
increasing to 90000 bytes. That brings many more of the URLs into the
"malloc" cache of Zeus, where they are replicated per Zeus instance and
in this case then per-CPU (things being bound to CPUs). "Normal"
practice is to have cache_small_file be "NBPG"/numCPU to optimize the
memory consumption.

It all depends, of course. Maybe that wasn't done for latency but to
cut down the bandwidth consumed. Who knows, although I am interested
in trying to find out.

> Generally, you just distribute the memory evenly and interleaved on
> the nodes (if you can) to avoid overloading one memory controller
> channel.

FWIW, I've noticed that Node interleave is (or seems to be, it was set
that way on the first one I saw and had no indication from the source
that it had been altered) disabled by default on the Sun V20z's.
Anyone have data on how Node interleave defaults on other
Opteron-based systems?

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...

feel free to post, OR email to raj in cup.hp.com but NOT BOTH...

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

Rick Jones <foo@bar.baz.invalid> writes:

>
>FWIW, I've noticed that Node interleave is (or seems to be, it was set
>that way on the first one I saw and had no indication from the source
>that it had been altered) disabled by default on the Sun V20z's.
>Anyone have data on how Node interleave defaults on other
>Opteron-based systems?

It defaults to "off" on Penguin systems, too.

scott

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

Rick Jones <foo@bar.baz.invalid> writes:

> FWIW, I've noticed that Node interleave is (or seems to be, it was set
> that way on the first one I saw and had no indication from the source
> that it had been altered) disabled by default on the Sun V20z's.
> Anyone have data on how Node interleave defaults on other
> Opteron-based systems?

As far as I know it's disabled by default on most shipping Opteron
servers. Only a few build-it-yourself dual motherboards have it
enabled by default.

For Linux use I would recommend to always disable it. The modern
kernel can do page interleaving on demand (with numactl or libnuma),
which is nearly as good, and most programs seem to just prefer
good memory latency.
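
For readers who have not used numactl, a minimal sketch of what "interleaving on demand" might look like when launching a program from a script. It only builds the command line and does not execute it; `./solver` and `input.dat` are hypothetical placeholders, and the sketch assumes numactl is installed and that its `--interleave=all` policy option is the one wanted.

```python
import shlex


def interleaved_command(program_argv, nodes="all"):
    """Prefix a command with numactl so its pages are interleaved across NUMA nodes.

    program_argv is the command to run as a list of strings;
    nodes is passed straight to numactl's --interleave option.
    """
    return ["numactl", f"--interleave={nodes}", *program_argv]


# Hypothetical program and input file, for illustration only.
cmd = interleaved_command(["./solver", "input.dat"])
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would start the program with that policy. Leaving the numactl prefix off keeps the kernel's default local allocation, which is the low-latency case described above.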

-Andi

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips

lindahl@pbm.com (Greg Lindahl) writes:

> benchmark 1 3.71 3.03 + 22 %
> benchmark 2 3.76 3.29 + 14 %
> benchmark 3 3.78 3.26 + 16 %
> benchmark 4 3.79 3.45 + 10 %
> benchmark 5 3.92 3.89 + 1 %
> benchmark 6 3.88 3.71 + 5 %
>
> These benchmarks were run with the best Opteron compiler, so this
> scaling improvement was very good to see. And it's bigger than
> "usually less than 10%".

Averages out to 11 % .

Sounds like "usually less than 10%" may be right when talking about non scientific workloads.
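
The 11 % figure can be reproduced from the quoted table. This is a minimal sketch; which column is the improved run and which the baseline is not stated in the quote, so treating the first column as the improved result is an assumption, as are the variable names.

```python
from statistics import geometric_mean, mean

# (first column, second column) from the quoted table, benchmarks 1-6.
# Assumption: first column is the improved run, second is the baseline.
results = [
    (3.71, 3.03),
    (3.76, 3.29),
    (3.78, 3.26),
    (3.79, 3.45),
    (3.92, 3.89),
    (3.88, 3.71),
]

# Per-benchmark gain in percent.
gains = [(new / base - 1) * 100 for new, base in results]

# Arithmetic mean of the percentage gains (the figure quoted in the post).
arithmetic_gain = mean(gains)

# Geometric mean of the ratios, often preferred when averaging speedups.
geometric_gain = (geometric_mean([new / base for new, base in results]) - 1) * 100

print([round(g) for g in gains])
print(f"arithmetic mean: {arithmetic_gain:.1f} %, geometric mean: {geometric_gain:.1f} %")
```

Both averages land a little above 11 %, so the choice of mean does not change the conclusion. Four of the six individual results sit at or above 10 %.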

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:

> As the posting in question was a text posting, this means that the
> newsreader would have to guess at what constituted an URL, as well, with
> no doubt occasional hilarious results.

Sorry, you don't make sense.
You really should get a decent newsreader.

keith · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Wed, 15 Dec 2004 03:35:43 +0000, Israel T wrote:

> jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
>
>> As the posting in question was a text posting, this means that the
>> newsreader would have to guess at what constituted an URL, as well, with
>> no doubt occasional hilarious results.
>
> Sorry, you don't make sense.
> You really should get a decent newsreader.

Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
headers? ...oh, another emacs bigot.

--
Keith

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

keith <krw@att.bizzzz> writes:

> Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
> headers?

I used Agent for some years until its limitations became irritating.

>...oh, another emacs bigot.

It is a matter of using the right tool for the job.
Emacs's mail/news subsystem, Gnus, is superb.

Guest · Dec 15, 2004

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64

On Tue, 14 Dec 2004 23:41:04 -0500, keith <krw@att.bizzzz> wrote:

>On Wed, 15 Dec 2004 03:35:43 +0000, Israel T wrote:
>
>> jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
>>
>>> As the posting in question was a text posting, this means that the
>>> newsreader would have to guess at what constituted an URL, as well, with
>>> no doubt occasional hilarious results.
>>
>> Sorry, you don't make sense.
>> You really should get a decent newsreader.
>
>Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
>headers? ...oh, another emacs bigot.

Well jsavard is using an *old* version of Free Agent but even the 1.93 I'm
using doesn't have a right click and "Save Link Target As.." I dunno what
the big deal is on either side here - copy/paste of a URL is always coming
up as a nuisance for file downloads, especially with the Adobe reader 6.0
being so damned slow to get started - the plugin has to load its err,
plugins to get started and then you also have to have it configured to turn
off "fast web view" to get the whole document without paging through the
ah heck... all a royal PITA.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
