Pretty good explanation of x86-64 by HP

Guest
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

I found this whitepaper from HP to be pretty good; it is surprisingly
candid, considering HP was the coinventor of the Itanium. It does a
pretty good job of explaining and summarizing the similarities and
differences between AMD64 and EM64T, and their comparison to the
Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
compatible", but IA64 is a different animal altogether.

Yousuf Khan

http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:

>I found this whitepaper from HP to be pretty good, it is surprisingly
>candid, considering HP was the coinventor of the Itanium. It does a
>pretty good job of explaining and summarizing the similarities and
>differences between AMD64 and EM64T, and their comparison to the
>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>compatible", but IA64 is a different animal altogether.
>
> Yousuf Khan
>
>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf

Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Yousuf Khan wrote:
> I found this whitepaper from HP to be pretty good, it is surprisingly
> candid, considering HP was the coinventor of the Itanium. It does a
> pretty good job of explaining and summarizing the similarities and
> differences between AMD64 and EM64T, and their comparison to the
> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
> compatible", but IA64 is a different animal altogether.
>
> Yousuf Khan
>
> http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf

When did the non-Xeon Prescott P4s start offering EM64T as listed in
the paper? News to me. Does HP know something the rest of the world
doesn't?

Bill
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 06:44:31 GMT, Bill Bradley
<senator2@NOSPAMearthlink.net> wrote:

>Yousuf Khan wrote:
>> I found this whitepaper from HP to be pretty good, it is surprisingly
>> candid, considering HP was the coinventor of the Itanium. It does a
>> pretty good job of explaining and summarizing the similarities and
>> differences between AMD64 and EM64T, and their comparison to the
>> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>> compatible", but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>> http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
>
> When did the non-Xeon Prescott P4s start offering EM64T as listed in
>the paper? News to me. Does HP know something the rest of the world
>doesn't?

Not that they know something the rest of the world doesn't, just that
they have access to processors that most of us do not. IBM sells them
as well, but for the time being Intel will ONLY sell them for use in
servers. Why? I really don't know. Maybe it's just a bit too much
crow for them to eat after saying (only a bit over a year ago) that
64-bit wouldn't be useful for the desktop until the end of the year?

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Bill Bradley wrote:
> When did the non-Xeon Prescott P4s start offering EM64T as listed in
> the paper? News to me. Does HP know something the rest of the world
> doesn't?

It must have been at least two or three months ago now; I posted a message
about it in one of these newsgroups.

Google Search: g:thl403337196d
http://groups.google.ca/groups?q=g:thl403337196d&dq=&hl=en&lr=&selm=cGVPc.1412825%24Ar.705528%40twister01.bloor.is.net.cable.rogers.com

or,

http://tinyurl.com/6tnjy

Yousuf Khan
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> Hmm and the following quote: "However, the latency difference between local
> and remote accesses is actually very small because the memory controller is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.

Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme SUMO (Sufficiently Uniform Memory Organization)
rather than NUMA. What they're basically saying is that such small
differences in latency aren't worth a headache.

Yousuf Khan
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>
>
>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>candid, considering HP was the coinventor of the Itanium. It does a
>>pretty good job of explaining and summarizing the similarities and
>>differences between AMD64 and EM64T, and their comparison to the
>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>compatible", but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
>
>
> Hmm and the following quote: "However, the latency difference between local
> and remote accesses is actually very small because the memory controller is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.
>

Not sure if this is exactly what you are looking for in the
way of a "firm answer", but the latencies in an Opteron system are:

0 hops   80 ns  uniprocessor (local access)
        100 ns  multiprocessor (local access, with cache snooping on other processors)
1 hop   115 ns
2 hops  150 ns
3 hops  190 ns

I couldn't find my original source for those numbers, and
the two and three hop numbers above are a little higher
than I remembered them as being. This time around I got
them from this thread:
http://www.aceshardware.com/forum?read=80030960

That thread refers to this article:
http://www.digit-life.com/articles2/amd-hammer-family/
which gives slightly different numbers for a 2 GHz Opteron
with DDR333:
Uni-processor system: 45 ns
Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.


I don't know if any of the numbers above are for cache misses
or if they are averages that include both hits and misses.
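
To put the hop numbers in relative terms, here is a small Python sketch that expresses each remote access as a percentage over the local (0-hop) multiprocessor latency. The figures are simply the ones quoted in this post, not new measurements, and the names `ACES_THREAD`, `DIGIT_LIFE_4P` and `remote_penalty` are made up for the illustration.

```python
# Latencies in ns, copied from the figures quoted in this post.
# They are not re-measured; treat them as the posters' numbers.
ACES_THREAD = {0: 100, 1: 115, 2: 150, 3: 190}  # multiprocessor, keyed by hop count
DIGIT_LIFE_4P = {0: 100, 1: 118, 2: 136}        # four-processor, 2 GHz Opteron, DDR333


def remote_penalty(latencies_ns):
    """Return {hops: percent slower than the 0-hop (local) access}."""
    local = latencies_ns[0]
    return {
        hops: round(100.0 * (ns - local) / local)
        for hops, ns in latencies_ns.items()
        if hops > 0
    }


if __name__ == "__main__":
    print("aceshardware thread:", remote_penalty(ACES_THREAD))
    print("digit-life, 4P:     ", remote_penalty(DIGIT_LIFE_4P))
```

Whether a penalty of that size matters depends on how often a workload actually misses cache and goes to remote memory, which these numbers alone do not tell you.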
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"Bill Bradley" <senator2@NOSPAMearthlink.net> wrote in message
news:j7ysd.1769$yr1.125@newsread3.news.pas.earthlink.net...
> Yousuf Khan wrote:
>> I found this whitepaper from HP to be pretty good, it is surprisingly
>> candid, considering HP was the coinventor of the Itanium. It does a
>> pretty good job of explaining and summarizing the similarities and
>> differences between AMD64 and EM64T, and their comparison to the
>> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly compatible",
>> but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>> http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
>
> When did the non-Xeon Prescott P4s start offering EM64T as listed in the
> paper? News to me. Does HP know something the rest of the world
> doesn't?
>
> Bill

www.overclockers.co.uk had some a few weeks back, and they sold very quickly.
I think there are a few more in now.

hamman
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message
news:hmr5r05drs3hird56j69qs2nbu5mth1b95@4ax.com...

> Hmm and the following quote: "However, the latency difference between
> local
> and remote accesses is actually very small because the memory controller
> is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.

In typical Opteron setups (2-8 CPUs, using the Opteron's built-in SMP
hardware), the latency difference between local and remote memory accesses
is so small that the benefits of treating it as NUMA are typically
outweighed by the costs. Generally, you just distribute the memory evenly
and interleaved on the nodes (if you can) to avoid overloading one memory
controller channel.
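
As a rough picture of what "interleaved on the nodes" does, here is a toy Python model that assigns physical pages to nodes round-robin and counts how many accesses each node's memory controller would see. It is only a sketch: the 4 KiB granularity and the names `PAGE_SIZE`, `home_node` and `controller_load` are assumptions for the illustration, and real hardware may interleave at a different granularity.

```python
PAGE_SIZE = 4096  # bytes; assumed interleave granularity, not taken from any datasheet


def home_node(phys_addr, num_nodes, page_size=PAGE_SIZE):
    """Node whose memory controller owns this address under round-robin interleave."""
    return (phys_addr // page_size) % num_nodes


def controller_load(addresses, num_nodes):
    """Count how many of the given accesses land on each node's controller."""
    counts = [0] * num_nodes
    for addr in addresses:
        counts[home_node(addr, num_nodes)] += 1
    return counts


if __name__ == "__main__":
    # Walk 1 MiB in 64-byte steps on a 4-node system.
    scan = range(0, 1 << 20, 64)
    print(controller_load(scan, 4))
```

With interleaving, a linear walk through memory spreads evenly over all four controllers instead of hammering one. The price is that most of those accesses are remote from any given CPU's point of view.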

DS
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Tony Hill <hilla_nospam_20@yahoo.ca> writes:

>Not that they know something the rest of the world doesn't, just that
>they have access to processors that most of us do not. IBM sells them
>as well, but for the time being Intel will ONLY sell them for use in
>servers. Why? I really don't know. Maybe it's just a bit too much
>crow for them to eat after saying (only a bit over a year ago) that
>64-bit wouldn't be useful for the desktop until the end of the year?

How much does Intel stockpile? Could it be that they have warehouses
full of already produced non-64-bit processors, and those want to be
sold at the projected prices, not thrown away?

best regards
Patrick
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

> Patrick Schaaf <mailer-daemon@bof.de> wrote:

> How much does Intel stockpile?

Well, according to the Reg,
<http://www.theregister.co.uk/2004/12/03/intel_eol_p2/>
they just finally announced EOL for the Pentium-II.

"The Register reveals that you'll be able to continue
ordering the part for a year, with the last trays
leaving the chip giant's Pentium II warehouse on
1 June 2006."

> Could it be that they have warehouses full of already
> produced non-64-bit processors, and those want to be
> sold at the projected prices, not thrown away?

Whether there is any connection between your hypothesis
and the Reg news, is left as an exercise for the reader 🙂

--
Regards, Bob Niland mailto:name@ispname.tld
http://www.access-one.com/rjn email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan wrote:

> George Macdonald wrote:
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>
> Yeah, but that's why I think AMD insists on calling their multiprocessor
> connection scheme as SUMO (Sufficiently Uniform Memory Organization),
> rather than NUMA. It's not worth headaching over such small differences
> in latency, is basically what they're saying.

I'd say that because in small systems (less than 8 CPUs), Opterons are
coherent in hardware thus sufficiently tightly coupled to be called UMA,
as far as the user is concerned.

--
Keith
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 19:47:30 +0000, Patrick Schaaf wrote:

> Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>>Not that they know something the rest of the world doesn't, just that
>>they have access to processors that most of us do not. IBM sells them
>>as well, but for the time being Intel will ONLY sell them for use in
>>servers. Why? I really don't know. Maybe it's just a bit too much
>>crow for them to eat after saying (only a bit over a year ago) that
>>64-bit wouldn't be useful for the desktop until the end of the year?
>
> How much does Intel stockpile? Could it be that they have warehouses
> full of already produced non-64-bit processors, and those want to be
> sold at the projected prices, not thrown away?

Unsold inventory is a very bad thing indeed. The tax man isn't happy.
Stockholders aren't happy. Executives shiver.

--
Keith
 
Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

On Sun, 05 Dec 2004 19:12:44 -0800, Greg Lindahl wrote:

> In article <pan.2004.12.06.02.44.34.997332@att.bizzzz>,
> keith <krw@att.bizzzz> wrote:
>
>>I'd say that because in small systems (less than 8 CPUs), Opterons are
>>coherent in hardware thus sufficiently tightly coupled to be called UMA,
>>as far as the user is concerned.
>
> However, it's not hard to show with benchmarks that paying attention
> to the NUMA nature of the Opteron is a significant win. So you can
> call it what you want, but...

Point well taken. So we have a dessert topping and a floor wax. ;-)

> Newsgroups trimmed.

..chips added back in.

--
Keith
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

In comp.arch Tony Hill <hilla_nospam_20@yahoo.ca> wrote:

> Not that they know something the rest of the world doesn't, just that
> they have access to processors that most of us do not. IBM sells them
> as well, but for the time being Intel will ONLY sell them for use in
> servers. Why? I really don't know.

FWIW, Dell are shipping EM64T-equipped non-Xeon P4 workstations (the
Precision 370).

-a
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:

>George Macdonald wrote:
>> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>>
>>
>>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>>candid, considering HP was the coinventor of the Itanium. It does a
>>>pretty good job of explaining and summarizing the similarities and
>>>differences between AMD64 and EM64T, and their comparison to the
>>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>>compatible", but IA64 is a different animal altogether.
>>>
>>> Yousuf Khan
>>>
>>>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
>>
>>
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>>
>
>Not sure if this is exactly what you are looking for in the
>way of a "firm answer", but the latencies in a Opteron system are:
>
>0 hops 80 ns uniprocessor (Local access)
> 100 ns multiprocessor (Local access, with cache snooping on other processors)
>1 hop 115 ns
>2 hops 150 ns
>3 hops 190 ns
>
>I couldn't find my original source for those numbers, and
>the two and three hop numbers above are a little higher
>than I remembered them as being. This time around I got
>them from this thread:
>http://www.aceshardware.com/forum?read=80030960
>
>That thread refers to this article:
> http://www.digit-life.com/articles2/amd-hammer-family/
>which gives slightly different numbers for a 2 GHz Opteron
>with DDR333:
> Uni-processor system: 45 ns
> Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
> Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.
>
>
>I don't know if any of the numbers above are for cache misses
>or if they are averages that include both hits and misses.

Thanks for the data, but no. I guess I should have highlighted better what I
was getting at: "the memory controller is integrated into and operates at
the core speed of the processor", which is what was being
discussed/disputed in another thread.

I haven't been able to find any hard data from AMD on where the clock
domain boundaries are in the Opteron/Athlon64, but if the memory controller
is not operating at "core speed", the claim that it does has now reached the
stage of Internet Folklore.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:

>George Macdonald wrote:
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>
>Yeah, but that's why I think AMD insists on calling their multiprocessor
> connection scheme as SUMO (Sufficiently Uniform Memory Organization),
>rather than NUMA. It's not worth headaching over such small differences
>in latency, is basically what they're saying.

See my reply to Rob Stow.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

In article <8c97r0hqh2sqf8sh89ut3153lpdmddfs76@4ax.com>,
George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:

>I haven't been able to find any hard data from AMD on where the clock
>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>is not operating at "core speed" it's now at the stage of Internet
>Folklore.

Note that the STREAM bandwidth and lmbench latency change with every
CPU speed bump. So clearly part of the memory controller is at the CPU
core frequency, or a related frequency, and not at the HT frequency
or the SDRAM external bus frequency.
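
One way to make that argument quantitative is to assume latency is a fixed part plus a part that takes a constant number of core clock cycles, and fit the two parts from measurements at two core speeds. The Python sketch below does that. The model and the name `split_latency` are assumptions for the illustration, and the numbers in the usage example are invented, since no measured values were posted here.

```python
def split_latency(f1_ghz, t1_ns, f2_ghz, t2_ns):
    """Fit t = fixed_ns + core_cycles / f_ghz through two (frequency, latency) points.

    Returns (fixed_ns, core_cycles). A core_cycles value near zero would mean
    the latency does not depend on the core clock at all.
    """
    core_cycles = (t1_ns - t2_ns) / (1.0 / f1_ghz - 1.0 / f2_ghz)
    fixed_ns = t1_ns - core_cycles / f1_ghz
    return fixed_ns, core_cycles


if __name__ == "__main__":
    # HYPOTHETICAL measurements: 70 ns at 2.0 GHz, 65 ns at 2.4 GHz.
    fixed_ns, core_cycles = split_latency(2.0, 70.0, 2.4, 65.0)
    print(f"fixed part: {fixed_ns:.1f} ns, core-clocked part: {core_cycles:.1f} cycles")
```

If the fitted core-clocked part stays clearly above zero across several speed grades, some of the memory path must be running in a clock domain tied to the core.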

Please reduce the cross-post. Followups set to a group I read.

-- greg
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

In article <pqe8r0hsb4vfiqv4uvpk6h2h7cn8gq5q37@4ax.com>,
Tony Hill <hilla_nospam_20@yahoo.ca> wrote:

>It does, but the difference is small, usually less than 10% and often
>much closer to 0%.

No, it's not. The Opteron builds the best 4-cpu SMP system out there
according to the SPECrate2000 cpu benchmark, but in order to get that
best result, you need to pin the individual processes to cpus and
memory using a utility. Without it, the performance is no longer the
best. So people really care about that last bit of performance.
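
For anyone who has not done this, here is a minimal Python sketch of the CPU half of that pinning on Linux. It is an illustration only, not the utility used for the results in this post. `pin_to_cpu` is a made-up helper name, and it relies on `os.sched_setaffinity`, which Python only provides on platforms that support it. It does not bind memory; placement is left to the operating system's default policy, and a tool such as numactl is the usual way to bind both CPU and memory.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the calling process) to a single CPU.

    Returns the resulting affinity set. Raises NotImplementedError on
    platforms where Python does not expose CPU affinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pin ourselves to the lowest-numbered CPU we are currently allowed to use.
    first_allowed = min(os.sched_getaffinity(0))
    print("now restricted to CPUs:", pin_to_cpu(first_allowed))
```

For a serial code you would start one such pinned process per CPU, so each one keeps running next to the memory it allocated.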

Now I don't have a direct comparison for that, but here's a
comparison on some benchmarks for a recent competitive bid. "Slow" is
a system without the processor binding and with "node interleave"
turned on. "Fast" is with processor binding and node interleave off,
which lets the processor binding have the best benefit. Note that it's
only a trivial amount of work to get this improvement for a serial
code, so this is a common situation, although these benchmarks are, of
course, particular to this scientific-computing customer. In these
results, the comparison is scaling for 4 processes on a 4 cpu machine.
4.0 would be a perfect score.

              fast   slow   difference
benchmark 1   3.71   3.03   + 22 %
benchmark 2   3.76   3.29   + 14 %
benchmark 3   3.78   3.26   + 16 %
benchmark 4   3.79   3.45   + 10 %
benchmark 5   3.92   3.89   +  1 %
benchmark 6   3.88   3.71   +  5 %
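
The difference column can be reproduced from the two scaling columns. The short Python sketch below takes it as (fast - slow) / slow, rounded to a whole percent, and also shows scaling as a fraction of the perfect score of 4.0. The names `RESULTS`, `improvement_percent` and `parallel_efficiency` are made up for the illustration; the numbers are the ones in the table.

```python
# (name, fast scaling, slow scaling), copied from the table above.
RESULTS = [
    ("benchmark 1", 3.71, 3.03),
    ("benchmark 2", 3.76, 3.29),
    ("benchmark 3", 3.78, 3.26),
    ("benchmark 4", 3.79, 3.45),
    ("benchmark 5", 3.92, 3.89),
    ("benchmark 6", 3.88, 3.71),
]


def improvement_percent(fast, slow):
    """Gain of the pinned ("fast") run over the interleaved ("slow") run, in whole percent."""
    return round(100.0 * (fast - slow) / slow)


def parallel_efficiency(scaling, processes=4):
    """Fraction of perfect scaling achieved (1.0 would be a score of 4.0 on 4 CPUs)."""
    return scaling / processes


if __name__ == "__main__":
    for name, fast, slow in RESULTS:
        print(f"{name}: +{improvement_percent(fast, slow)} %, "
              f"efficiency {parallel_efficiency(fast):.2f} vs {parallel_efficiency(slow):.2f}")
```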

These benchmarks were run with the best Opteron compiler, so this
scaling improvement was very good to see. And it's bigger than
"usually less than 10%".

> When well over 90% of your memory access is coming
> from cache anyway and (assuming a totally random distribution in a
> strictly UMA setup) 50% of your memory access is going to be local,
> most of the performance difference is lost in the noise.

Handwaving is a bad way to evaluate effects like this.

>I've said it before and I'll say it again: Hardware is cheap,
>software is expensive. It would be a true disservice to your
>customers to tell them to spend thousands upon thousands of dollars
>changing all their software for the small improvement in performance
>equal to a few hundred dollars of hardware costs.

Customers know what 10% or 20% more performance means, as do vendors
who are doing competitive bidding. The fact that I care a lot about
this should give you a clue. And in some cases, such as serial codes,
the benefits are easy to achieve. It took only a moderate amount of
work in our OpenMP compiler and runtime to get these benefits for some
parallel programs, too. Well worth it to our customers.

-- greg
speaking for myself, not PathScale
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On 05 Dec 2004 19:47:30 GMT, mailer-daemon@bof.de (Patrick Schaaf)
wrote:

>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>>Not that they know something the rest of the world doesn't, just that
>>they have access to processors that most of us do not. IBM sells them
>>as well, but for the time being Intel will ONLY sell them for use in
>>servers. Why? I really don't know. Maybe it's just a bit too much
>>crow for them to eat after saying (only a bit over a year ago) that
>>64-bit wouldn't be useful for the desktop until the end of the year?
>
>How much does Intel stockpile? Could it be that they have warehouses
>full of already produced non-64-bit processors, and those want to be
>sold at the projected prices, not thrown away?

ALL of the "Prescott" and "Nocona" cores are 64-bit capable excluding
those that would pass a validation as 32-bit chips but fail as 64-bit
chips, but such chips would be rather few and far between. It could
be that Intel still has a reasonable amount of inventory of their old
"Northwood" P4 chips and they want to clear those out first, but that
certainly doesn't seem to be the case looking at Intel's pricing
structure and what is being sold by the major OEMs (Intel seems to be
pushing Prescott VERY hard here).

Long story short, I'm not quite sure what the actual answer is, but
excessive inventory of 32-bit chips doesn't seem to make sense from
what I've seen.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On 6 Dec 2004 00:30:48 GMT, ammonton@cc.full.stop.helsinki.fi wrote:

>In comp.arch Tony Hill <hilla_nospam_20@yahoo.ca> wrote:
>
>> Not that they know something the rest of the world doesn't, just that
>> they have access to processors that most of us do not. IBM sells them
>> as well, but for the time being Intel will ONLY sell them for use in
>> servers. Why? I really don't know.
>
>FWIW, Dell are shipping EM64T-equipped non-Xeon P4 workstations (the
>Precision 370).

Ahh, thanks. When I first wrote the above I had actually included
Dell's name as well, but then removed it when I couldn't find any
EM64T P4 processors in any of their servers (didn't think to check
workstations first). I figured that if anyone was selling 64-bit P4s
it would be Dell!

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

"Greg Lindahl" <lindahl@pbm.com> wrote in message
news:41b45512$1@news.meer.net...
> ...
> These benchmarks were run with the best Opteron compiler
> ...

Visual C?

🙂

Thanks,
Eugene
 
Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

Greg Lindahl wrote:

> These benchmarks were run with the best Opteron compiler [...]

Which compiler would that be? PathScale?

🙂
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

keith wrote:
> I'd say that because in small systems (less than 8 CPUs), Opterons are
> coherent in hardware thus sufficiently tightly coupled to be called UMA,
> as far as the user is concerned.

Yes, exactly my point, it's more or less UMA in the up to 8 processor
range. After that, you can start thinking of it as NUMA. But having
up to 8 processors being treated as UMA is quite a lot.

Yousuf Khan
 
Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
>
>>George Macdonald wrote:
>>
>>>On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>>>
>>>
>>>
>>>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>>>candid, considering HP was the coinventor of the Itanium. It does a
>>>>pretty good job of explaining and summarizing the similarities and
>>>>differences between AMD64 and EM64T, and their comparison to the
>>>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>>>compatible", but IA64 is a different animal altogether.
>>>>
>>>> Yousuf Khan
>>>>
>>>>http://h200001.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
>>>
>>>
>>>Hmm and the following quote: "However, the latency difference between local
>>>and remote accesses is actually very small because the memory controller is
>>>integrated into and operates at the core speed of the processor, and
>>>because of the fast interconnect between processors." is relevant to
>>>another discussion here. I wish we could get a firm answer on this one.
>>>
>>
>>Not sure if this is exactly what you are looking for in the
>>way of a "firm answer", but the latencies in a Opteron system are:
>>
>>0 hops 80 ns uniprocessor (Local access)
>> 100 ns multiprocessor (Local access, with cache snooping on other processors)
>>1 hop 115 ns
>>2 hops 150 ns
>>3 hops 190 ns
>>
>>I couldn't find my original source for those numbers, and
>>the two and three hop numbers above are a little higher
>>than I remembered them as being. This time around I got
>>them from this thread:
>>http://www.aceshardware.com/forum?read=80030960
>>
>>That thread refers to this article:
>> http://www.digit-life.com/articles2/amd-hammer-family/
>>which gives slightly different numbers for a 2 GHz Opteron
>>with DDR333:
>> Uni-processor system: 45 ns
>> Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
>> Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.
>>
>>
>>I don't know if any of the numbers above are for cache misses
>>or if they are averages that include both hits and misses.
>
>
> Thanks for the data but no I guess I should have highlighted better what I
> was getting at: "the memory controller is integrated into and operates at
> the core speed of the processor", which is what was being
> discussed/disputed in another thread.
>
> I haven't been able to find any hard data from AMD on where the clock
> domain boundaries are in the Opteron/Athlon64 but if the memory controller
> is not operating at "core speed" it's now at the stage of Internet
> Folklore.

Ah, that one is much easier to answer. ;-)

Straight from the horse's mouth:
http://www.amd.com/us-en/Processors/ProductInformation/0%2C%2C30_118_4699_7981%5E7983%2C00.html

"By running at the processor’s core frequency, an integrated
memory controller greatly increases bandwidth directly available
to the processor at significantly reduced latencies."