Confused about AMD's Hypertransport...

oolceeoo

Jan 25, 2004
How does the AMD64's HyperTransport work? I see on Newegg that the FSB is listed as "integrated into chip".

Now, I know that they have an integrated memory controller, which eliminates much of the latency to system memory, but the CPU guide says that AMD64s have an 800MHz FSB HT link with a DDR400 memory controller built in.

Coming from Pentium 4 systems, I know that the CPU's internal speed is determined by multiplying the external data bus frequency by the CPU multiplier, and that the FSB makes 4 data transfers per clock cycle.

For example, on a P4C: (4 data transfers per clock cycle) * (200MHz external data bus frequency) = 800MHz effective FSB speed.

Someone said that because the A64s have an 800MHz FSB, and it is DDR, the effective speed of the FSB is 1600MHz. Is he right? What is the speed of the external data bus on an A64, or does HyperTransport eliminate this? Is the multiplier on A64s only something like 2.7x?

P4C 3.0ghz
Asus P4C800E-D
GeForceFX 5900 Ultra 256MB
2x512 MB corsair dual channel pc3200
200GB WDJB HDD
Nothing OC'd
Edited by oolceeoo on 08/15/04 07:38 PM.
 
The big difference with the Athlon 64's HyperTransport is that it's full duplex, not half duplex.
It can transfer in both directions at the same time, unlike a normal FSB.

A64 HT = 16bit @ 800MHz (4x200MHz, double data rate) UP AND 16bit @ 800MHz (4x200MHz, double data rate) DOWN
P4 FSB = 64bit @ 800MT/s (4x200MHz, quad-pumped) UP OR 64bit @ 800MT/s (4x200MHz, quad-pumped) DOWN
The theoretical bandwidth numbers are the same (6.4GB/s in total), but 2 things:
2 ways at once vs 1 way at a time makes it more efficient
and
not having memory traffic on the link frees up tons of bandwidth.
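To make the comparison concrete, here is a minimal Python sketch of the peak-bandwidth arithmetic. The widths and transfer rates are assumptions based on commonly quoted figures (16 bits per direction at 1600 MT/s for the HT link, 64 bits at 800 MT/s for the P4 FSB), and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(width_bits, mega_transfers_per_s, directions=1):
    """Theoretical peak in GB/s: bytes per transfer * MT/s * simultaneous directions."""
    return (width_bits / 8) * mega_transfers_per_s * directions / 1000

# Assumed figures, not all stated in the thread:
# HT link: 16 bits per direction, 800 MHz clock, two transfers per clock = 1600 MT/s.
ht_one_way = peak_bandwidth_gb_s(16, 1600)
ht_both_ways = peak_bandwidth_gb_s(16, 1600, directions=2)

# P4 FSB: 64 bits wide, 200 MHz clock, four transfers per clock = 800 MT/s,
# but it can only move data in one direction at a time.
p4_fsb = peak_bandwidth_gb_s(64, 800)

print(f"HT, one direction:   {ht_one_way:.1f} GB/s")
print(f"HT, both directions: {ht_both_ways:.1f} GB/s")
print(f"P4 FSB (shared):     {p4_fsb:.1f} GB/s")
```

The totals match on paper, but the FSB figure also has to cover memory traffic, while the HT figure does not.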
 
Well... Much confusion arises from the new HT link and FSB speed. Here's my take on it:

Athlon 64s communicate with memory through a dedicated channel, controlled by on-die circuitry (in the P4's case, the memory controller sits in the northbridge). This communication channel is capable of 6.4GB/s of bandwidth (dual-channel DDR400) and will not be upgraded soon.

The A64s also communicate with the rest of the system through what is called a HyperTransport link. This link is also capable of 6.4GB/s, and is responsible for traffic with hard drives, video card, sound, PCI cards, and so on. This bus is full duplex, as previously mentioned, which makes it more efficient than the traditional FSB.

Now, many advertisements and even CPU guides list the A64 as having 12.8GB/s of system bandwidth. Personally, I think that is inaccurate at best: the CPU cannot access any single component at 12.8GB/s. The only real point is that memory access doesn't suffer from bus contention, because it has its own separate communication channel, so to speak. It's like adding up the bandwidth of all 32 PCIe lanes and saying, "oh, the new PCIe systems have 16GB/s of system bandwidth available", which is true on paper: each lane does 250MB/s, or 500MB/s counting both directions, and with about 32 lanes per motherboard you get 16GB/s. But can you use all of that with one single component? No. Can I, say, access a powerful RAID array at a theoretical 16GB/s max? No.
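The difference between a summed "system bandwidth" and what any one path can deliver can be sketched in a few lines of Python. The figures are the ones quoted in the post; the dictionary and function names are hypothetical.

```python
# Peak figures quoted in the post above, in GB/s.
a64_paths = {
    "memory (dual-channel DDR400)": 6.4,
    "HyperTransport link": 6.4,
}

def advertised_total(paths):
    """What the ads do: add every path together."""
    return sum(paths.values())

def best_single_path(paths):
    """What one component can actually see: the fastest single path."""
    return max(paths.values())

def pcie_aggregate_gb_s(lanes=32, lane_mb_s=250, directions=2):
    """Same trick for PCIe: lanes * per-lane rate * both directions."""
    return lanes * lane_mb_s * directions / 1000

print(advertised_total(a64_paths))   # the marketing number
print(best_single_path(a64_paths))   # the ceiling for any one path
print(pcie_aggregate_gb_s())         # the PCIe analogy from the post
```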

And saying it's 1600MHz is a little misleading too, though not as bad as saying 12.8GB/s system bandwidth. I mean, point-to-point buses like these are much less relevant in a single-CPU system than they are in, say, multi-CPU Opteron systems. They're superior to a shared 6.4GB/s bus, but whether they beat a shared 12.8GB/s bus is quite doubtful. So it's hardly comparable to a conventional (i.e. shared) 12.8GB/s bus, even if it's more sophisticated. It's a different architecture. Saying 12.8GB/s system bandwidth is like saying "well, the truth is complicated, but it's damned fast alright"... :frown:

Also, the A64's multiplier works from a 200MHz base clock (which is why A64s come in 200MHz speed bumps: 2.0, 2.2, 2.4GHz and so on). The HT link has its own multiplier (4x for 800MHz or 5x for 1GHz) and the CPU has another (10x, 11x and 12x for 2.0, 2.2 and 2.4GHz respectively).
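The two independent multipliers can be illustrated with a short Python sketch. It only restates the numbers from the post; the constant and function names are invented for the example.

```python
BASE_CLOCK_MHZ = 200  # base clock quoted in the post

def derived_clock_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Any clock derived from the base clock is just base * multiplier."""
    return multiplier * base_clock_mhz

# The CPU core and the HT link each apply their own multiplier to the same base.
cpu_clocks = {m: derived_clock_mhz(m) for m in (10, 11, 12)}
ht_clocks = {m: derived_clock_mhz(m) for m in (4, 5)}

for mult, mhz in cpu_clocks.items():
    print(f"CPU {mult}x -> {mhz} MHz")
for mult, mhz in ht_clocks.items():
    print(f"HT  {mult}x -> {mhz} MHz")
```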

Feel free to disagree, anyone, these are just my opinions...



Edited by Mephistopheles on 08/16/04 00:58 AM.
 
So basically the difference is that there are now two buses, one to the memory and one to the northbridge, and the one to the northbridge is full duplex?

Does it have lower latency than a traditional FSB, aside from the fact that it's full duplex?
-------
<A HREF="http://www.albinoblacksheep.com/flash/you.html" target="_new">please dont click here! </A>

Brand name whores are stupid!
 
>so basically the difference is theres now two busses, one
>to the memory, and one to the northbridge,

Hmm, well, "bus" is not the word, and neither is northbridge. The RAM is connected directly to the CPU; there is no "bus" in between. The CPU talks to and controls the RAM by itself (through the integrated memory controller).

HT is used for all other I/O: connecting to AGP/PCI-E, the southbridge (IDE, network, ...), or other CPUs in the case of Opteron.

In a traditional system it looks like this:
<pre>
     ------
     |CPU |
     ------
        |
        | (frontside bus)
        |
     ------  ------
AGP--| NB |--| SB |-- IDE, NIC, ..
     ------  ------
        |
        |
       RAM
</pre>
The front side bus is the only I/O bus to the CPU, and as such carries all traffic, including (mostly) memory accesses, which have to go through an extra hop: the NB/memory controller.

In case of K8 it looks like this:
<pre>
     ------
RAM--|CPU |
     ------
        |
        | (Hyper transport)
        |
     ------
     |AGP |
     ------
        |
        | (Hyper transport)
        |
     ------
     | SB | - IDE, NIC, Sound,..
     ------
</pre>
HyperTransport is not used for memory (unless you have more than one CPU); it is dedicated to southbridge and AGP (PCI-E) traffic. Using Opterons with more than one HT link, you could even dedicate one HT link to SB traffic and another to PCI-E traffic, for instance (as well as one to talk to the other CPUs). Or you could connect several AGP/PCI-E devices, each directly to the CPU, or, if you need more I/O, attach several SB links, etc. The possibilities are endless, even though they rarely get used today.

Anyway, back to the topic: the A64. There are 2 major advantages here. The first is having the memory controller on-die. This reduces latency dramatically: first of all, the controller works at CPU speed, and secondly, data doesn't need to be sent over an extra "hop". Moreover, it can effectively increase bandwidth, as you no longer have to share the FSB with AGP and SB traffic. HT has ample bandwidth to handle those, and it's only used for that, unlike a traditional FSB, where in most cases the FSB has just enough bandwidth for the memory alone and anything else, like AGP traffic, has to fight for the same bandwidth. On top of that, HT is indeed also full duplex, unlike traditional FSBs, but consider that HT is plenty fast, to the point of being overkill in a single-CPU setup (though this bandwidth could be put to use when you connect several PCI-E video cards while using GB ethernet, a PCI-E/SATA RAID controller, etc.).
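The contention argument can be shown with a deliberately crude Python model. It ignores latency, protocol overhead and burstiness, the 2.0GB/s I/O load is a made-up example value, and the function names are hypothetical.

```python
def shared_fsb(fsb_gb_s, io_demand_gb_s):
    """One bus carries everything: I/O traffic eats into what memory can use."""
    io_served = min(io_demand_gb_s, fsb_gb_s)
    return {"io": io_served, "memory": fsb_gb_s - io_served}

def k8_split(memory_gb_s, ht_gb_s, io_demand_gb_s):
    """Memory has its own channel; I/O rides on the HT link and cannot steal from it."""
    io_served = min(io_demand_gb_s, ht_gb_s)
    return {"io": io_served, "memory": memory_gb_s}

io_load = 2.0  # GB/s of AGP + southbridge traffic, purely illustrative

p4_style = shared_fsb(6.4, io_load)
k8_style = k8_split(6.4, 6.4, io_load)

print("shared FSB:", p4_style)
print("K8 split:  ", k8_style)
```

In this toy model the shared bus leaves less room for memory as I/O load grows, while the split design keeps the full memory figure regardless.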

(drat, preview messes up my ASCII scheme completely, hope it will post correctly though)

= The views stated herein are my personal views, and not necessarily the views of my wife. =
Edited by P4man on 08/16/04 01:27 AM.
 
Those diagrams help.

Everything seems a lot closer to the CPU with the HyperTransport setup.
 
So in magazine computer advertisements, when they say '1600MHz system bus' on an A64 system, they are referring to the bus frequency of HyperTransport with its multiplier of 8x, and not the CPU FSB?

Can we even use the term FSB anymore with A64 systems? I always believed that FSB represented the clock frequency relationship between the CPU, L2 cache, and northbridge controller. Since the memory controller is on-die, is it even technically correct to mention an FSB on an A64 system?

So can it now be called the HyperTransport bus, with its own multiplier separate from the CPU's multiplier? Maybe I'm just slow, but I still have tons of questions about how this works.

 
>So in magazine computer advertisements when they say
>'1600mhz system bus' on an A64 system,

Yeah, and it's a small miracle they don't advertise the A64 as a 4GHz CPU, since it's 64 bit, which is twice 32 bit :)

>They are referring to the bus frequency of Hypertransport
>with its multiplier of 8x, and not the CPU FSB?

They are referring to the fact that it has an 800MHz HyperTransport link, which is bidirectional and so allows up to 1600 MT (megatransfers) per second (800 MT in each direction). Anyway, it's crap and incorrect, just an attempt to show the consumer that it is "faster" than a P4's 800MHz bus (which arguably is also incorrect, and should be referred to as a 200MHz or 800 MT bus).

>Can we even use the term FSB anymore with A64 systems?

Depends how you define FSB :) For sure, it's not like traditional FSBs. Here is what Webopedia says:

The bus that connects the CPU to main memory on the motherboard. I/O buses, which connect the CPU with the systems other components, branch off of the system bus.

The system bus is also called the frontside bus, memory bus, local bus, or host bus.

Using this definition, the K8 has no frontside bus. Or you could consider the memory interface the "front side bus"... but in fact, the definition can't be right either, since a P4, for instance, doesn't connect to main memory; it connects to a memory controller. If you wanted to call that the frontside bus, you could say the K8's crossbar is its FSB, and that one runs at core frequency :)

Oh well, it's simple really: the term FSB was always used to describe a single bus that was used for several things. The K8 splits this into different buses with different speeds/bandwidths, so basically you shouldn't call it an FSB anymore. But what does Joe Sixpack know or care? He needs numbers to compare: the P4 has "800 MHz", so what does the K8 have? Can you expect him to understand that it has a set of 200 MHz base clock / 400 MT DDR memory interface / 2GHz crossbar / 800 MT full duplex I/O bus instead? I'm surprised some marketing idiot has not yet tried adding them all up together 😀

Now imagine the fun we'll have once CPUs stop being governed by a static clock generator and we move to asynchronous CPUs. "How many GHz does it have?" "None, sir" :)

 
>He needs numbers to compare, P4 has "800 MHz", what does K8 have ? Can you expect him to understand it has a set of 200 DDR base clock/400 MT memory interface/2GHz cross bar/800 MT full duplex I/O bus instead ? I'm surprised some marketing idiot has not yet tried adding them all up together 😀

You're right there. :smile: I'm also getting annoyed with the 12.8GB/s, 1600MHz (or a full 2GHz! Wooow) crap they're trying to pull off.

Hell, even Alienware has ads with "1600MHz FSB" or something. Completely misleading.
 
Well, it's not like Intel doesn't use misleading marketing tactics.

 
The Athlon 64 doesn't have an FSB and it's not quad-pumped! It runs at 800MHz (true frequency, go read*) on the HTB (HyperTransport bus, also known as LDT, "Lightning Data Transport"), but it has dual signalling, which means separate up/downstream paths, and can transmit up to 1600 million transfers per second (1600MT/s).

* BSB (backside bus) = L2 cache to CPU, runs at CPU speed.
* FSB (frontside bus) = integrated memory controller to CPU, runs at CPU speed.
* HTB (HT bus) = CPU to system, runs at 800MHz with dual signalling (AMD has 1000MHz and higher with HT bus 2.0).
* MB (memory bus) = integrated memory controller to memory, runs at 200MHz DDR max for now (depending on AMD). Its clock is divided down from the CPU clock, much like a multiplier, which causes iterations when accessing memory (it is also called the "memory bus").
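The last bullet can be illustrated with a small Python sketch. It assumes that the memory clock is the CPU clock divided by a whole-number divider, chosen as the smallest divider that does not exceed the target memory speed. That rule is an assumption about how the K8 behaves, not something spelled out in this thread, and the function name is hypothetical.

```python
import math
from fractions import Fraction

def k8_memory_clock_mhz(cpu_clock_mhz, target_memory_mhz):
    """Return (divider, actual memory clock), assuming an integer divider of the CPU clock."""
    # Smallest whole divider that keeps the memory clock at or below the target.
    divider = math.ceil(Fraction(cpu_clock_mhz) / Fraction(target_memory_mhz))
    return divider, Fraction(cpu_clock_mhz) / divider

DDR400 = 200               # MHz memory clock
DDR333 = Fraction(500, 3)  # 166.67 MHz memory clock

for cpu in (2000, 2200, 2400):
    for name, target in (("DDR400", DDR400), ("DDR333", DDR333)):
        divider, actual = k8_memory_clock_mhz(cpu, target)
        print(f"{cpu} MHz CPU, {name}: divider {divider}, memory at {float(actual):.1f} MHz")
```

Under that assumption, DDR400 always lands on exactly 200MHz for CPUs that are whole multiples of 200MHz, while other memory speeds can end up slightly below their rating.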

--------------

(Unknown MAge)
 
Hm, while the bandwidth increase from a full duplex connection to the peripherals is impressive on paper, I wonder if it truly makes a big difference. I mean, you won't access, say, any RAID array at a speed greater than your one-directional bandwidth. So saying it's full duplex and doubling the number is not quite right. Saying "1600MHz HT link" is wrong. And saying it's 1600 "megatransfers a second" is not quite true either, is it? A full duplex link is much, <i>much</i> more relevant when dealing with more than one processor. In that case, it is conceivable that you'd need both directions of data transfer simultaneously.
 
I think it's fair to say that AMD's marketing of a 1600MHz FSB is wrong and misleading. While full duplex might offer an increase in performance, it's misleading to say 1600MHz, implying a doubling of performance. That having been said, I don't remember you getting upset when Intel introduced the quad-pumped design and marketed an 800MHz FSB, which implied a doubling in bandwidth over DDR400. The quad-pumped FSB did improve performance, but came nowhere near the doubling the marketing implied.

Looks to me like AMD is just playing the same game.

If I glanced at a spilt box of tooth picks on the floor, could I tell you how many are in the pile. Not a chance, But then again I don't have to buy my underware at Kmart.
 
I thought I said up to 1600 million transfers per second... that does not mean it runs @ 1600MHz??

 
Not all companies market it like that.

I believe that is the point.

 
>quad pumped fsb did improve performance but came no where near a <i>double </i>which the marketing implied.

You mean quadruple.

Also, I wasn't even following hardware news closely back when the P4 was introduced, so I couldn't have complained about anything. :smile:
 
Not trying to nitpick, but it's a doubling over DDR400, which in fact runs at 200MHz. It's all marketing. DDR did not double performance over SDRAM; maybe a 25% increase. DDR implied a doubling over standard SDRAM, and the quad-pumped design implied a doubling of performance over DDR. So it's an exaggeration of an exaggeration, and AMD just upped the game a notch by implying another doubling on top of a multiple exaggeration, or at least that's how some market it.

This industry really needs a set of benchmark ratings to show exactly what one can expect from a CPU and system.

Personally, I don't see this happening. Too bad, really. It's all very confusing.

 
Hertz is a measurement of cycles per second, and megahertz is millions of cycles per second, so 1600MHz would be 1,600,000,000 cycles per second. I think the CPU in the original IBM PC ran at 4,770,000 hertz. Big difference!

It seems like people have differing ideas on how HyperTransport works. If only AMD could explain it a little better and give some hard benchmarks on its performance, it would be much easier to understand.

Not just theoretical bandwidth numbers!

It's still confusing!!!

 
>It's still confusing!!!

What is confusing? It's quite clear how it works (at least on a basic level); the only thing one can argue about is whether or not such an architecture has an "FSB", or how you should rate its speed. The bottom line is, you can't directly compare it to the P4's bus; it's far too different, and in fact far superior.

 
>Hm, while the bandwidth increases of a full duplex
>connection with the peripherals are impressive on paper, I
>wonder if those truly make a big diference

Not for most things today, I guess. However, it will be nice when one day you attach 2 PCI-E video cards to it, as well as a gigabit ethernet controller and a fast RAID disk set. You should see some significant improvements under such circumstances over a (saturated) shared bus a la P4.

> So actually saying it's full duplex and doubling the
>number is actually not quite right

AFAIK, HyperTransport is NOT even full duplex. AMD just implemented two HT buses, one up and one down (both 16 bits wide, if I'm not mistaken). Things like that can be spun any way you want; comparing frequencies is nonsense if you don't take the bus width into account, and if you compare bandwidth figures to the P4 bus, you should add the memory bus for the K8 as well. Either way, it's pretty clear to me that this architecture is superior in just about every way (ease of implementation, power consumption, flexibility and performance).

>A Full duplex link is much, much more relevant when dealing
>with more than one processor. In that case, it is
>conceivable that you'd need the two directions of data
>transfer simultaneously

Having simultaneous up and down paths helps reduce latency (towards I/O and video, not memory latency). This is also why 1GHz HT helps somewhat: the additional bandwidth is pretty much wasted on single-CPU, single-GPU systems, as it's overkill, but latency drops further.
