What's the correct Opteron bus speed?

skankinred

Distinguished
Jan 23, 2006
24
0
18,510
I'm a little confused because on the AMD website it says:

Opteron HyperTransport™ Technology Speed 1000MHz
Athlon 64 FX: HT Speed 2000
Opteron Front Side Bus frequency 1.4 - 2.8 GHz†
† The front side bus (interface to memory) of the AMD Opteron™ processor runs at the speed of the processor

On THG, if you click CPU > CPU Charts, an advertisement on the right side also lists the processors and their HT speeds, showing the same data.

I'm confused about why two places (the AMD website and Tom's Hardware) say that the Opteron bus speed is 1000 MHz.
Why would they list the FX/64 at 2000 instead of 1000 MHz if all three are 2000 MHz in total (since it's doubled, like DDR)?
 

Fox_granit

Distinguished
Jan 21, 2006
209
0
18,680
HT doubles the output, but it still only runs at 1 GHz. In that respect it works similarly to DDR. If you also notice, they don't advertise the speeds the way they do with the Athlons either. These are the server chips, so they treat them like that: no frills, no nonsense. Opties also have 3 HT connections, whereas the Athlons only have 1.
 

MG37221

Distinguished
Mar 17, 2006
209
0
18,680
Only the socket 940 Opterons have 3. The 939s are essentially the same as their 939 Athlon64 counterparts. The HyperTransports for Opterons and Athlon64s rated at 1000/2000 are exactly the same.
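For anyone following along, the 1000 vs 2000 confusion can be put into numbers. This is only a rough sketch: the function name is made up for illustration, and it assumes the usual 16-bit link width and that data moves on both clock edges (the "double like DDR" part).

```python
def ht_link_rates(clock_mhz, width_bits=16):
    """Return (mega-transfers/sec, GB/sec per direction) for one HT link.

    Hypothetical helper for illustration; assumes a double-pumped link.
    """
    # Data moves on both clock edges, so the transfer rate is twice the clock.
    mega_transfers = clock_mhz * 2
    # Bandwidth in one direction: transfers/sec times link width in bytes.
    gb_each_way = mega_transfers * 1e6 * (width_bits / 8) / 1e9
    return mega_transfers, gb_each_way


mt, gb = ht_link_rates(1000)
print(f"1000 MHz clock -> {mt} MT/s, {gb} GB/s in each direction")
```

So "1000 MHz" and "2000" describe the same link: one is the clock, the other is the transfer rate.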
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
They don't really have a front side bus (FSB).

AMD never advised people how to answer the question officially, so people say what they think is right.

e.g. 1000 MHz x 16 bits wide, but in both directions. (So some people say 2000 MHz.)

Earlier Opterons used 800 MHz x 16 bits wide; it didn't make that much difference.

The speed to memory for Opterons and Athlon 64's is really just 400 MHz x 128 bits wide, for 6.4 GB/sec... but the memory interface is not routed via a northbridge, and is also not shared with any other I/O, so no waits need occur. (Although it can be anything from 200 MHz or 266 MHz, all the way to 500 MHz or higher if overclocking).
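The 6.4 GB/sec figure is just the effective memory rate times the bus width. A minimal sketch of that arithmetic (the function name is hypothetical, and the 64-bit case assumes one channel is half the dual-channel width):

```python
def peak_memory_bandwidth_gb(effective_mhz, bus_bits=128):
    """Theoretical peak in GB/sec: transfers/sec times bytes per transfer."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


# Effective (post-DDR) rates mentioned in the post above.
for rate in (200, 266, 333, 400, 500):
    dual = peak_memory_bandwidth_gb(rate)
    single = peak_memory_bandwidth_gb(rate, bus_bits=64)
    print(f"{rate} MHz: {single:.1f} GB/s single channel, {dual:.1f} GB/s dual channel")
```

These are theoretical peaks only; real benchmarks land below them.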

Giving PCIe x16, memory, and general system I/O all dedicated paths was a damn smart move by AMD, and it raises overall system performance.

Most of the speeds I've given above are 'including' (i.e. after) the DDR doubling.

Different Opterons have a different number of HT links; this is one way the 100, 200 and 800 series differ. The links are used to aggregate memory and system performance (assuming each socket has DIMM slots).

It is also one reason why you can't install 2 x Opteron 100 series in a mainboard with 2 CPU sockets and expect it to work.

http://multicore.amd.com
http://www.amd.com - Opteron section..... all of it.
 

MadModMike

Distinguished
Feb 1, 2006
2,034
1
19,780
Well, even according to AMD, all Opteron 64's (1xx, 2xx and 8xx) have 3 HT links; it just depends on how many are coherent/disabled. Obviously an s939 1xx can't fit into a board w/ 940 pins, but an 8xx will work in a 2xx board and a 2xx will work in an 8xx board, although you can only use two 2xx in an 8xx board.

I take that "† The front side bus (interface to memory) of the AMD Opteron™ processor runs at the speed of the processor" to mean it has a dedicated link @ CPU speed and a link back @ RAM speed, which, if I'm not mistaken, is correct. People just look @ the HT link speed and say "that's how fast it is", but they forget there's a separate bus for memory, just like people forget that some Opteron 64 boards have 2 sets of DIMMs, which effectively gives them 2 memory controllers. That's another reason why Opties can bust out 11GB/s memory bandwidth @ DDR400, whereas a Xeon can't manage even half that. Not insulting you, but your knowledge of Opteron 64's may not be the best :?.

EDIT: You can find Opty 64 boards w/ a PCI-X Bridge integrated into the CPU (like the one my friend has), and AMD plans to integrate the PCI-E Tunnel into the CPU as well. Should be interesting, as AMD is planning some major integration involving Windows Vista.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
You didn't even mention NUMA in your reply Mike, so "your knowledge of Opteron 64's may not be the best :? ".

AMD do not integrate PCI-X bridges into their CPUs.

If such a 'special board' existed (you failed to link to a manufacturer website to prove it does), would all it do be to enable said 'PCI-X tunnel' that, as you claim, just sits 'integrated' in the processor doing jack-all on every other board?

The Opteron memory controller works at (DDR) 400 MHz x 128 bit wide (well up to 500 MHz, and it can go lower than 333 MHz), it does not 'read' faster than it 'writes' as you claim above.

The coherent HT links are just used to aggregate memory throughput across each NUMA node. (This applies when a system has more than one memory controller. As each processor has one, a 2-socket system will have two memory controllers, and a 4-socket system will have four. Hopefully each socket will actually have 2 x DIMMs installed to provide up to four dual-channel memory controllers, aka 4 NUMA nodes, each with a peak performance of 6.4 GB/sec, to aggregate memory performance with.)

Now most people here know SANDRA will indicate, when very small reads are used, that the figures are closer to 5.5 GB/sec on a typical single socket Athlon 64, FX, X2, or Opteron 100 series system. (Dependent on memory speed and timings, and memory controller timings.)

So with 2 x NUMA nodes one can aggregate around 11 GB/sec (as stated by you, but using the AMD DCA + server chipset explanation).
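To make the aggregation arithmetic explicit, here is a small sketch. The constants are the figures quoted in this post (6.4 GB/sec theoretical peak per node, roughly 5.5 GB/sec as reported by SANDRA); the function name is made up, and it assumes the best case where every node streams from its own local memory at the same time.

```python
PEAK_PER_NODE_GB = 6.4      # DDR400 x 128 bit, theoretical peak per socket
MEASURED_PER_NODE_GB = 5.5  # approximate SANDRA figure quoted in the post


def aggregate_bandwidth_gb(nodes, per_node_gb):
    """Best case: each NUMA node contributes its full local bandwidth."""
    return nodes * per_node_gb


for nodes in (1, 2, 4):
    peak = aggregate_bandwidth_gb(nodes, PEAK_PER_NODE_GB)
    measured = aggregate_bandwidth_gb(nodes, MEASURED_PER_NODE_GB)
    print(f"{nodes} node(s): ~{measured:.1f} GB/s measured, {peak:.1f} GB/s peak")
```

A single socket never gets past its own 6.4 GB/sec peak in this model; the ~11 GB/sec number only appears once a second node is added.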

[image pending]
 

MadModMike

Distinguished
Feb 1, 2006
2,034
1
19,780
You of all people should know that AMD integrates the PCI-X bridge, since it's done on the K8WE that YOU have...

http://www.tyan.sk/image/s2895_bd.png

I think what you interpreted was that the PCI-X is INSIDE the CPU, and that's not what I meant. I meant that the CPU goes directly to the bridge w/o Northbridge interference. I didn't mention NUMA because you're obsessed with NUMA and you already mentioned it.

You: "it does not 'read' faster than it 'writes' as you claim above." - The people that MADE the CPU seem to say otherwise...hmm....

You: "The coherent HT links are just used to aggregate memory throughput across each NUMA node" - They're used for exactly what you said as well...COHERENCY...meaning the 2 CPU's communicating to each other :O!!!!! Not just to share Memory Bandwidth.

I still state, you don't know a lot about Opteron 64's or Athlon 64's for that matter. I know you're going to research Google.com for a few hours to try and find some small thing wrong with my posts or to find some small piece of information to "one-up" me, and you'll probably find something, but that doesn't bother me. Enjoy your time, Google has a new logo for St. Patrick's Day, tell me how you like it.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
Let's just say we are both in agreement, and that differences in wording are causing the issues: 8)

- AMD do not integrate the PCI-X bridges or tunnels in their processors (We both agree on this)
- http://www.nvidia.com/page/nforce_pro.html (nForce 2200 + 2050 chipset links)
- AMD most likely don't care what the high speed system I/O links are used for, so long as it gains them market share. "The bandwidth is there, use it for what you may and make us look good" is more accurate.

In this case (as per your image) the links just so happen to end up here:
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_9004,00.html
- The AMD-8000 series PCI-X tunnel
- I figure most people know what a PCI-X tunnel connects to :p
- (Also if others are reading this PCI-X is not PCIe / PCI Express)

The Tyan K8WE manual(s) makes this very clear: (I've listed them all):
- http://www.tyan.com/products/html/thunderk8we.html
- ftp://ftp.tyan.com/manuals/m_s2895_101.pdf
- http://www.tyan.com/support/html/manuals.html#other
- ftp://ftp.tyan.com/manuals/a_s2895_100.pdf
- ftp://ftp.tyan.com/manuals/m_s2895_101.exe

When you state, and I quote your edited post:

"EDIT: You can find Opty 64 boards w/ a PCI-X Bridge integrated into the CPU (Like the one my friend has) and AMD plans to integrate the PCI-E Tunnel into the CPU as well. Should be interesting as AMD is planning some major integration involved w/ Windows Vista."

- How do you think people are going to interpret that ?

AMD (the people who make the processors) do not claim they read the memory at, say, 2 GHz and write to it at 400 MHz. This is what your statement is 'implying'.

They read and write to memory at 400 MHz (DDR) x 128 bit (or 64 bit if single channel).

The HyperTransport links in a 1-way, non-NUMA system, the kind that you often recommend to people, are not used to keep cache contents in sync (as the systems you recommend only have 1 processor, with 2 cores within it, which thus communicate directly w/o using an external bus, be it a FSB, a HTT bus, etc.).

To keep it short: The AMD systems you pimp / build / recommend to other people lack coherent HyperTransport links.

"I still state, you don't know a lot about Opteron 64's or Athlon 64's for that matter" - This is a personal attack on my character, and slanderous at the very least. I won't waste a mod's time reporting it, as enough people are laughing at the "Mike stated it, so it must be true" spin you've put on yourself. :p

You are 100% correct in that the coherent HTT links are not just used to aggregate memory performance, they also keep CPU caches in check.

e.g. If a CPU in socket A changes data in cache/memory, then the CPUs in other sockets need to know, or they might try to fetch old / invalid data or code from memory. However, with AMD's design this is impossible. Look at it: the only way another CPU can access memory owned by a CPU in another socket is via that CPU (the one acting as a NorthBridge) anyway, so that CPU just presents back to the bus what it already has in L1/L2 cache without reading memory. This is one of the 'secrets' as to why AMD 4-way Opterons scale in an almost linear manner.

I suggest you e-mail AMD and ask who provides the best AMD Opteron system builders courses, or training, in (and I am guessing here) the good old United States of America, in your local region.

Some of the stuff you write I do enjoy reading, but I think a lot of people misinterpret what the underlying 'meaning' of your posts truly is (myself included).

I mean first you say the PCI-X tunnel is integrated into an AMD CPU, then you take it back and say it isn't, or it wasn't worded very well, etc.

Heck, just for this reply I've used the [Preview] button twice.

The trick to AMD ccNUMA is that a processor cannot get to another socket's memory without going through the processor that owns it, and because of this, if said (2nd) processor has changed something in memory (a write), when it is asked for that data/code by the 1st processor it can just feed it what it already has in cache.

This keeps cache sync overheads, and latency (memory accesses are avoided in a very smart way) very low, and permits the memory performance to be aggregated more effectively than other designs on the market...... this is the secret to the AMD ccNUMA design.
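A toy model of the idea described above, for readers who find code easier than prose. It is only a sketch of the description in this post (remote reads go through the owning CPU, which may answer from its cache); the class and method names are invented, and real hardware coherence protocols are considerably more involved than this.

```python
class Node:
    """One socket: a CPU cache plus the memory attached to that socket."""

    def __init__(self, name):
        self.name = name
        self.memory = {}   # address -> value held in this node's DIMMs
        self.cache = {}    # address -> newer value not yet written back

    def write_local(self, addr, value):
        # The owning CPU updates its cache; the copy in memory is now stale.
        self.cache[addr] = value

    def serve_read(self, addr):
        # Every request for this node's memory passes through this CPU,
        # so it can answer from cache without touching the DIMMs.
        if addr in self.cache:
            return self.cache[addr], "cache"
        return self.memory[addr], "memory"


node_b = Node("B")
node_b.memory[0x1000] = "old"
node_b.write_local(0x1000, "new")

# A CPU in socket A asks socket B for address 0x1000.
value, source = node_b.serve_read(0x1000)
print(value, source)
```

The requester never sees the stale value in memory, because the owner sits in the path of every access to that address.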

This is right from AMD's mouth, techdocs, whitepapers, & staff ... and in contrast to what your posts 'imply' and often 'present as text' until they are edited.

I respect you are very supportive of AMD, but as you don't appear to understand the basics of NUMA (around well before the Opterons) I fail to see how you can 'proclaim' yourself to have a better understanding (while correcting posts) about a system, and associated core logic chipsets, that you do not even own.

Many other people on the forums share my opinion; however, we both support AMD (for now I do anyway), so can we not just all live in peace without making crude comments about things that no-one really cares about?

.... and just for digging yourself into a hole:
Otherwise feel free to provide me a link to an AMD Opteron 100 series (any 100 series available today btw, S939 or S940) system reading memory at 11 GB/sec. As far as I know, no Opteron 100 series has NUMA, and what you 'now claim' about the 11 GB/sec reads is impossible. Any AMD engineer will agree with me on that. (Until they move to Dual-Channel DDR2-800 that is :p, but you appear to be referring to past systems you've pimped or built, and neither of us can predict the future with 100% accuracy).

You've also stated DDR(1)-400 PC3200 DDR-SDRAM, in a 128 bit configuration can do 11 GB/sec reads using 'magic' on a single CPU socket system.

Please back those claims up now.... (this is what forums are for, discussion and backing up claims with hard proof).


To reiterate, I still respect you as one dude respects another dude, and many of your previous threads / posts and discussions have caught my eye in a positive light. I hope many of my own have done the same for you. The rare exception is replies like your last two, and if you go and edit them so they have timestamps after this reply, I'll lose any respect I did have for you. 8)



One brother to another, keep it real, and keep the systems pimped damn well.

PS: Browsing Google for accurate technical information is sad (we both agree), as you'll get hits from Google which are technically very far from the truth. When dealing with IT hardware go straight to the source (in this case AMD and nVidia) for the technical documents and whitepapers on their hardware. Do not pass Go[ogle], Do not steal my donuts.

I also suggest people open their minds and read a few from Sun Microsystems, as dealing with purely x86/x64 architectures can limit one's future potential... even impairing it.

http://www.sun.com - Huge number of TechDocs'n'WhitePapers.

We ain't all that different Mike, we just deal with different systems frequently. Ones without NUMA, and ones with NUMA. (Hey you said I brag about it, so.... why not brag some more - hehehe :lol: )
 

MadModMike

Distinguished
Feb 1, 2006
2,034
1
19,780
d00d, you just got badly pwn3d!
As in stamping 'pwn3d' on your forehead. :lol:

I fail to see how I got pwned. I see text that can be easily reprinted from www.amd.com, which in fact is where he says he gets all his information from...so hmm..he just pwned himself...yea...

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 

MadModMike

Distinguished
Feb 1, 2006
2,034
1
19,780
You should also note there chief, that I never stated anything about the 1xx Opteron 64's other than them having 3 HTT Links, and they do. Everything after that, including the 11GB/s, was all about 2xx and above Opteron 64's. I apologize if I don't research my posts for 2 hours beforehand so I have 100% word-for-word from a tech document from www.amd.com, because I actually know the material and word it in a casual manner instead of trying to sound impressive on a forum board. But to each his own, right chap?

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 

bigbadwolf

Distinguished
Feb 17, 2006
87
0
18,630
You two guys really love to chat. I'm gonna go get a cup of tea and lie down for a bit and think about cooling my head after reading this. Seriously, you need to chill out; I feel I have just witnessed a war. Didn't you see that bit-tech article about how most internet arguments are caused by misinterpretations? I'm sure that in a room together you would get on great. Just back off man, cool down and relax.
 

MadModMike

Distinguished
Feb 1, 2006
2,034
1
19,780
d00d, you just got badly pwn3d!
As in stamping 'pwn3d' on your forehead. :lol:

OK...I'm a noob, what does PWN and pwn3d mean??

Pwn and Pwn3d are spinoffs from OWN which are a level up from OWNAGE which is annihilation that are normally in FPS Games. The levels are:

OWN: Whooped his @$$
PWN: Raped his @$$
LWN: You pwned him so much, you loan him out to be pwned
OML: Oh My Lord....this one speaks for itself.

~~Mad Mod Mike, pimpin' the world 1 rig at a time
 

gOJDO

Distinguished
Mar 16, 2006
2,309
1
19,780
...w8, I'll go get some beer and popcorn supplies, and then you can continue arguing. :)

BTW I don't understand this: How can the Opterons 2xx or 8xx achieve 11GB/s of memory bandwidth?
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
The following links might help:
http://multicore.amd.com/Products/CompetitiveComparisons/4P_Server_Comparison.pdf
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/PID30291H_2P_server_competitive_comp.pdf

We all know the Opteron and Athlon 64 lines have integrated dual-channel memory controllers right ? (400 MHz x 128 bit = 6.4 GB/sec - Well 144 bit, but 16 of those are ECC or ChipKill(tm) ECC :p).

But what happens when you have 2 sockets, each with their own CPU, and each with their own memory controller capable of 6.4 GB/sec (each) ?

As you have multiple memory controllers, and each CPU is connected to other CPUs (each with their own memory controllers) via the 'extra' HyperTransport connections present in the Opteron 200 series, and 800 series, they can aggregate performance of memory to 12.8 GB/sec (peak in a 2 socket system), or 25.6 GB/sec (peak in a 4 socket system).

AMD 'borrowed' the design from Sun Microsystems UltraSPARC based servers: http://www.sun.com ; as they have been doing similar things for years.

As each CPU can access the memory, or effectively the cache(s), of each other CPU, and if the Operating System supports NUMA (Windows XP has for a while now, so does Linux, etc.), memory access can be spread across each 'node' to improve performance.
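If you want to see what the OS thinks the NUMA layout is, here is a rough sketch for Linux only. It assumes the kernel exposes its node list under `/sys/devices/system/node` (the usual sysfs location); the function name is made up, and on a machine without that directory it simply returns an empty list.

```python
import glob
import os


def list_numa_nodes(base="/sys/devices/system/node"):
    """Return (node_name, cpulist) pairs, or [] if the sysfs tree is absent."""
    nodes = []
    for path in sorted(glob.glob(os.path.join(base, "node[0-9]*"))):
        cpulist_file = os.path.join(path, "cpulist")
        try:
            with open(cpulist_file) as f:
                cpus = f.read().strip()
        except OSError:
            # Node directory exists but the CPU list could not be read.
            cpus = "unknown"
        nodes.append((os.path.basename(path), cpus))
    return nodes


for name, cpus in list_numa_nodes():
    print(f"{name}: CPUs {cpus}")
```

A single-socket box would typically show one node; a 2-socket Opteron board with DIMMs on both sockets should show two.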

Why settle for just multi-core systems, when you can have multiple memory controllers working together as well?

SourceForge has had a NUMA FAQ running for a while; it is fairly basic though:
Linux Scalability Effort Homepage: (aka: LSE)
http://lse.sourceforge.net/
http://lse.sourceforge.net/numa/
http://lse.sourceforge.net/numa/faq/

I hope that clears a few things up.

The AMD 'Direct Connect Architecture' really injects more life into x86/x64 platforms. Intel are still working on their CSI (the replacement for their FSB), but the FSB isn't really a major bottleneck currently (although in a 4+ way system it could start to impact performance badly), so x86/x64 is safe until 2008 - 2012, when they hit another bump*.

* (Advanced / Complex Out of Order execution will take too much space per core to keep viable, they may just toss it in favour of more cores per chip though, as by then performance without it will be acceptable.)

I'll post some benchmarks / images tomorrow afternoon.
Thinking of starting a NUMA thread on TomsHardware actually, if I do I'll link there and put images there, linked to my 'unlimited' webspace.
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
The new Opterons will be using Socket F (1207 pins, unknown which type of memory now, but suspect will be Registered DDR2 or FB-DIMMs, not XDR - Although I would like to see them use XDR myself as the performance per pin is excellent, and they have far more pins at their disposal)

The new Athlon 64's will be using Socket AM2 (940 pins still, but keyed differently, with support for DDR2)

They'll also have a new Socket S1 with 638-pins (I suspect) for mobile platforms (Turion 64 X2), which will also support DDR2 and be going dual-channel for mobile systems. (They'll have 4x or more sustained memory throughput, which will help in onboard 3D video solutions that most laptops have).
 

endyen

Splendid
The one I like is
Opteron Front Side Bus frequency 1.4 - 2.8 GHz†
Since the memory controller is on die, of course it is connected at chip speed. I am really surprised we haven't seen more ads stating this.
Since this "fsb" is an exclusive memory bus, I'm also amazed that no ads have pointed out how much better a dedicated "memory fsb" is than one that shares bandwidth with the rest of the I/O. After all, every bit of data for the PCI or PCI Express bus cuts into available memory bandwidth.
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
Check out the difference in performance, because of the above mostly, between a K7 at 2.2 GHz, and a K8 at 2.2 GHz, both with only 3.2 GB/sec (Socket A vs Socket 754) interface to memory:
http://www23.tomshardware.com/cpu.html?modelx=33&model1=259&model2=269&chart=68

Sure there were other changes, but not sharing the memory bus with PCIe and general system I/O helps scale performance very nicely, even with only 3.2 GB/sec available to the K8 processor.

K8 = Athlon 64 3200 Newcastle core (in this case - Not the 6.4 GB/sec platform)
K7 = Athlon XP 3200 Barton core (in this case)

172.4 vs 134.3, both with the same GPU. That is a +28% gain.
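For the record, the percentage above comes from a plain relative-difference calculation on the two chart scores. A tiny sketch (the helper name is just for illustration):

```python
def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


k8_score, k7_score = 172.4, 134.3  # chart scores quoted above
print(f"K8 over K7: +{percent_gain(k8_score, k7_score):.1f}%")
```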
 

gOJDO

Distinguished
Mar 16, 2006
2,309
1
19,780
@tabris 10x 4 the info about the bandwidth of Opteron server systems:).

Consider that the Athlon64 3200+ s754 has SSE2, which handles memory operations much more efficiently than SSE and therefore dramatically boosts performance in all multimedia apps and games. Another benefit is the available northbridge chipsets for the Athlon64 3200+, which are much better than those available for the AthlonXP 3200+ (considering the nForce2 as the best).
So, the integrated memory controller is not the only difference between these systems. Note that in Sandra the Newcastle is only 0% to 3% better than the Barton, while PCMark scores +30% for the Newcastle.
http://www23.tomshardware.com/cpu.html?modelx=33&model1=259&model2=269&chart=56
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
Most observant, I must say. (Although it is obvious they differ by much more than the memory sub-system alone :p)

I've thought about it a lot, but don't really want to build a Socket 754 system (although there is my Turion 64 laptop @ 1.8 GHz).

We've also forgotten to mention that the Athlon 64 (Newcastle) has twice the L2 cache of the Athlon XP Barton. (Well, until now).

But it must also be said that when the SANDRA memory benchmark is run, very little other I/O is occurring, whereas in other software (e.g. games) that hits the general system I/O, the PCIe / AGP bus(es), and the memory subsystem all at once, the Athlon 64 / Opteron will scale better.

It is also interesting to see that PCMark (which must be more 'real-life' application performance than I'd thought previously) scaled roughly the same as Far Cry :lol: , whereas the synthetic SANDRA memory benchmark demonstrated similar results (as one would expect it to do).