External cache on an AMD64

Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

I've recently had a thought and figured I would post here to see if it's
mad or not.

My understanding is that the AMD chip has 3 HyperTransport links which can
connect RAM, other slots, or even other processors to it. Now I was wondering
if it would be possible for a motherboard manufacturer to build a board which
used one of these links to connect to some external but VERY high frequency
RAM, like back in the Pentium days.

I would guess this would need an Opteron, as they have extra HT links, but
the idea of, say, a 32-64 MB L3 cache has got to be useful to someone and
would mean Opterons could go up against the Xeons with massive cache.
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

epaton wrote:
> I've recently had a thought and figured I would post here to see if it's
> mad or not.

Yup, mad.

> My understanding is that the AMD chip has 3 HyperTransport links which can
> connect RAM, other slots, or even other processors to it. Now I was wondering
> if it would be possible for a motherboard manufacturer to build a board which
> used one of these links to connect to some external but VERY high frequency
> RAM, like back in the Pentium days.

HyperTransport is *not* used to connect to RAM; it is only used to
connect to peripherals or other Opterons. AMD uses an internal memory
controller to connect to RAM, not HT. It has often been misrepresented as
if HyperTransport and the memory controller were the same thing.

That's why AMD came up with the term Direct Connect Architecture (DCA)
as an all-encompassing term to categorize both the Hypertransport and
the internal memory controller without making it look like HT and mem
controller are the same. It's best to think of HT as being used mostly for
I/O transactions, while the memory controller is used for memory transactions.

However, you will note that HT is used to connect to other Opterons. The
other Opterons might make their own local pool of memory available to
other Opterons, and in that case the memory transactions *are* going
over the HT bus. In a properly designed OS, the amount of RAM used
outside of the local pool would be minimal though.
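
To make that last point concrete, here is a minimal sketch, not from this
thread, of how an application on Linux can keep its allocations in the local
pool. It assumes libnuma is installed (compile with -lnuma); the 64 MB size
and the use of numa_preferred() are just illustrative choices:

/* Allocate a buffer from the local NUMA node's memory pool, so accesses
 * go through the local memory controller rather than across an HT link. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Bail out if the kernel or hardware has no NUMA support. */
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    int node = numa_preferred();      /* node the current process prefers */
    size_t size = 64 * 1024 * 1024;   /* 64 MB buffer */

    /* Ask for memory from that node's local pool. */
    char *buf = numa_alloc_onnode(size, node);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node %d failed\n", node);
        return 1;
    }

    buf[0] = 42;                      /* touch a page so it is faulted in locally */
    printf("Allocated %zu bytes on node %d\n", size, node);

    numa_free(buf, size);
    return 0;
}

On a well-behaved OS most allocations end up local like this anyway, which
is why the remote-pool traffic going over HT stays small.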

Yousuf Khan
 

mygarbage2000

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 01:15:53 GMT, epaton <epaton@null.com> wrote:

>I've recently had a thought and figured I would post here to see if it's
>mad or not.
>
>My understanding is that the AMD chip has 3 HyperTransport links which can
>connect RAM, other slots, or even other processors to it. Now I was wondering
>if it would be possible for a motherboard manufacturer to build a board which
>used one of these links to connect to some external but VERY high frequency
>RAM, like back in the Pentium days.
>
>I would guess this would need an Opteron, as they have extra HT links, but
>the idea of, say, a 32-64 MB L3 cache has got to be useful to someone and
>would mean Opterons could go up against the Xeons with massive cache.

Another child left behind... Did anybody ever tell you that proper
capitalization makes texts easier to read? That's to begin with.
Then check the GED requirements and see what else you need to learn to
qualify for it. That should really help you to find your place under
the sun.
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 01:15:53 GMT, epaton <epaton@null.com> wrote:

>I've recently had a thought and figured I would post here to see if it's
>mad or not.
>
>My understanding is that the AMD chip has 3 HyperTransport links which can
>connect RAM, other slots, or even other processors to it. Now I was wondering
>if it would be possible for a motherboard manufacturer to build a board which
>used one of these links to connect to some external but VERY high frequency
>RAM, like back in the Pentium days.
>
>I would guess this would need an Opteron, as they have extra HT links, but
>the idea of, say, a 32-64 MB L3 cache has got to be useful to someone and
>would mean Opterons could go up against the Xeons with massive cache.

I thought it was the other way around: Intel puts big caches on Xeons to
try to keep up with Opterons.:) They use bandwidth to err, hide latency.

--
Rgds, George Macdonald
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 02:49:51 -0400, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:

>
>I thought it was the other way around: Intel puts big caches on Xeons to
>try to keep up with Opterons.:) They use bandwidth to err, hide latency.

And a damn fine job they do of it, too. ;-).

RM
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

"nobody@nowhere.net" <mygarbage2000@hotmail.com> wrote in message
news:eek:31h51l5fpb8lq2ul8ifjigik08n9trocg@4ax.com...
> On Sun, 10 Apr 2005 01:15:53 GMT, epaton <epaton@null.com> wrote:

> Another child left behind... Did anybody ever tell you that proper
> capitalization makes texts easier to read? That's to begin with.
> Then check the GED requirements and see what else you need to learn to
> qualify for it. That should really help you to find your place under
> the sun.

What a total jackass you are, nowhere man. Time to killfile once again.
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 19:57:11 -0400, Robert Myers <rmyers1400@comcast.net>
wrote:

>On Sun, 10 Apr 2005 02:49:51 -0400, George Macdonald
><fammacd=!SPAM^nothanks@tellurian.com> wrote:
>
>>
>>I thought it was the other way around: Intel puts big caches on Xeons to
>>try to keep up with Opterons.:) They use bandwidth to err, hide latency.
>
>And a damn fine job they do of it, too. ;-).

So should Intel say: "thank you IBM"???... for rendering the big caches
unnecessary?

--
Rgds, George Macdonald
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 23:57:18 -0400, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:

>On Sun, 10 Apr 2005 19:57:11 -0400, Robert Myers <rmyers1400@comcast.net>
>wrote:
>
>>On Sun, 10 Apr 2005 02:49:51 -0400, George Macdonald
>><fammacd=!SPAM^nothanks@tellurian.com> wrote:
>>
>>>
>>>I thought it was the other way around: Intel puts big caches on Xeons to
>>>try to keep up with Opterons.:) They use bandwidth to err, hide latency.
>>
>>And a damn fine job they do of it, too. ;-).
>
>So should Intel say: "thank you IBM"???... for rendering the big caches
>unnecessary?

I think Intel would be much more grateful to IBM if those chips were
delivering bytes to Itanium.

Here, and even more so in comp.arch, large amounts of time are spent
talking about minute details of microarchitecture. Some of the details
that matter most (the prefetch algorithm, the workings of the frontside
bus, and the actual operation of the chipset) are held proprietary. As
neat as it is to know _what_ IBM did, it would be even more interesting
to know _how_. Then we could pursue questions like the present one in
less of a vacuum of speculation.

RM
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Mon, 11 Apr 2005 07:11:11 -0400, Robert Myers <rmyers1400@comcast.net>
wrote:

>On Sun, 10 Apr 2005 23:57:18 -0400, George Macdonald
><fammacd=!SPAM^nothanks@tellurian.com> wrote:
>
>>On Sun, 10 Apr 2005 19:57:11 -0400, Robert Myers <rmyers1400@comcast.net>
>>wrote:
>>
>>>On Sun, 10 Apr 2005 02:49:51 -0400, George Macdonald
>>><fammacd=!SPAM^nothanks@tellurian.com> wrote:
>>>
>>>>
>>>>I thought it was the other way around: Intel puts big caches on Xeons to
>>>>try to keep up with Opterons.:) They use bandwidth to err, hide latency.
>>>
>>>And a damn fine job they do of it, too. ;-).
>>
>>So should Intel say: "thank you IBM"???... for rendering the big caches
>>unnecessary?
>
>I think Intel would be much more grateful to IBM if those chips were
>delivering bytes to Itanium.

Save them from the embarrassment of having pissed all those $$ down the
drain?:)

>Here, and even more in comp.arch, large amounts of time are spent
>talking about minute details of microarchitecture. Some of the
>details that matter the most: the prefetch algorithm, the details of
>the frontside bus, and the details of the actual operation of the
>chipset are held proprietary. As neat as it is to know _what_ IBM
>did, it would be even more interesting to know about the _how_. Then
>we could pursue questions like the present one in less of a vacuum of
>speculation.

I believe that IBM still has a few marketing cards to play with this before
we ever get to know much about the "how"... in particular wrt building
NUMA-like, high-CPU-count systems. Unisys is going to get clobbered, so maybe
*they*'ll err, pay more attention to Itania.

--
Rgds, George Macdonald
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

On Sun, 10 Apr 2005 01:15:53 GMT, epaton <epaton@null.com> wrote:

>I've recently had a thought and figured I would post here to see if it's
>mad or not.
>
>My understanding is that the AMD chip has 3 HyperTransport links which can
>connect RAM, other slots, or even other processors to it.

Hypertransport was designed to connect to other ICs, either processors
or I/O chips. It wasn't really designed to ever connect to RAM, since
Opterons have memory controllers built in.

> Now I was wondering if it
>would be possible for a motherboard manufacturer to build a board which
>used one of these links to connect to some external but VERY high
>frequency RAM, like back in the Pentium days.

Theoretically yes.

The Opteron does have a method of snooping over its HyperTransport
links. This is used to access data stored in the caches of other
processors. I think it's not entirely out of the realm of possibility
that this could be adapted to connect to an IC which is little more
than a cache controller with a bunch of SRAM.

> I would guess this would need an Opteron, as they have extra HT links, but
> the idea of, say, a 32-64 MB L3 cache has got to be useful to someone and
> would mean Opterons could go up against the Xeons with massive cache.

It would definitely only be for an Opteron, as the Athlon64 has only a
single HyperTransport connector (err, I suppose you could daisy-chain
this in between your processor and your I/O chips, but doing so would
probably be just a dumb idea). However, the real question you would
have to ask is whether or not this would be remotely useful.

The reason Xeons have such massive caches is that they don't have
integrated memory controllers, and therefore they NEED to do
everything in their power to minimize main memory accesses if they want
to keep up with the Opteron. With its built-in memory controller the
Opteron is always going to have lower memory latency, and therefore it is
much less dependent on large caches.

Given that HyperTransport offers less memory bandwidth (only 3.2 GB/s
in each direction, vs. 6.4 GB/s for the memory interface), that the
latency of accessing main memory is already very low, and that snooping
over an HT link adds latency overhead of its own, it would be REALLY
tough to get much extra performance out of such a setup. With caches
you quickly run into a situation of diminishing returns as the size
increases. Just 64 KB of cache gets you a hit rate well in excess of 90%
on most applications. Going up to 1 MB of cache will often push your hit
rate up to around 96-98%. If a 32 MB cache only moves that hit rate up
by 1 or 2%, then you're spending a LOT of transistors for only a small
percentage of your memory accesses. If those accesses end up being only
twice as fast as going to main memory (my very rough guesstimate for this
setup), then you're only going to see a 0.5-1% improvement in
performance. For comparison, with the Xeon's built-in L3 cache you are
often looking at reducing latency by 75% vs. main memory, so even a
small increase in cache hit rate can help out a fair bit.
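
A quick back-of-the-envelope check of that arithmetic, using only the rough
figures above (the 100 ns main-memory latency is an arbitrary assumption for
illustration, not a measurement):

/* Average access time for the subset of accesses the hypothetical
 * HT-attached 32 MB L3 would catch: an extra 1-2% hit rate, served
 * only twice as fast as main memory. */
#include <stdio.h>

int main(void)
{
    const double mem_ns = 100.0;            /* assumed main-memory latency (illustrative) */
    const double l3_ns  = mem_ns / 2.0;     /* "only twice as fast as main memory"        */
    const double rates[] = { 0.01, 0.02 };  /* extra hit rate gained from the 32 MB cache */

    for (int i = 0; i < 2; i++) {
        double extra   = rates[i];
        double base    = mem_ns;  /* without the cache, those accesses all go to RAM */
        double with_l3 = extra * l3_ns + (1.0 - extra) * mem_ns;

        printf("extra hit rate %.0f%%: %.1f ns -> %.1f ns (%.2f%% faster)\n",
               extra * 100.0, base, with_l3,
               100.0 * (base - with_l3) / base);
    }
    return 0;
}

Plugging in 1-2% extra hits at half the main-memory latency gives back the
0.5-1% figure quoted above.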

In short, my guess is that this just wouldn't be worthwhile except in
very rare situations.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 
Guest

Archived from groups: comp.sys.ibm.pc.hardware.chips

Tony Hill <hilla_nospam_20@yahoo.ca> writes:
....
>The reason Xeons have such massive caches is that they don't have
>integrated memory controllers, and therefore they NEED to do
>everything in their power to minimize main memory accesses if they want
>to keep up with the Opteron. With its built-in memory controller the
>Opteron is always going to have lower memory latency, and therefore it is
>much less dependent on large caches.
....
>For comparison, with the Xeon's built-in L3 cache you are often looking
>at reducing latency by 75% vs. main memory, so even a small increase in
>cache hit rate can help out a fair bit.

I've wondered why processor designers haven't integrated some optimal
amount of main memory onto the CPU package. It would seem that this
would shorten the traces to the memory and could perhaps make wider
paths to memory more independent of the motherboard chipset. The CPU
vendor could then mark up the price of the combined part to what the
customer would otherwise be paying to two different vendors, and with
those volumes the CPU vendor could start asking for particular memory
architectures, etc. There might even have been a way to avoid the
whole Rambus dogfight.

It would seem that at any particular point in time, and for any given
processor performance level or market segment, there would be some
fairly clear amount of memory to put on the part. Extra memory would
be treated just like another cache miss and accessed off-module through
the chipset. It might even be plausible to scale on-module memory the
same way performance is marketed: higher price = more memory. There
would be no memory sitting inches away from where you need it; you
would just buy the processor module with the memory you need/want.

I'm not necessarily saying this would be on the same die as the CPU,
given the failure-rate-per-transistor issue, but something going in the
direction of the dual-chip module that I think I remember some of the
Pentium Pros were supposed to be.

Are there some good reasons not to pull memory into the processor
module that outweigh any possible benefits?

thanks