slvr_phoenix

Splendid
Dec 31, 2007
I was just thinking about how DDR memory doesn't seem to give much (if any) performance gain over SDR. It still baffles me, because it supposedly has twice the bandwidth, so it should have much more of an impact on system performance than that.

Then I thought about it.

DDR works twice as fast because it sends and receives data on both the rising and the falling edges of the clock cycle.

But is the rest of the system actually sending and receiving on the rise and fall of the clock cycle? I don't think that it is.

The CPUs are still externally using a 133MHz FSB. To my knowledge, Athlons only use double-pumping internally. So they're still only doing I/O once per cycle of the FSB.

And possibly even the north bridge is designed to only do I/O once per cycle.

So even though the memory could theoretically work twice as fast, the CPU and/or the northbridge are completely ignoring the ability to transfer data twice per clock cycle.

So, theoretically, if you had a CPU and motherboard running at a 266MHz FSB, then PC2100 would be synced perfectly with the rest of the system.
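To put numbers on the bandwidth side of this, here is a minimal Python sketch of the usual peak-bandwidth arithmetic: clock × transfers per clock × bus width. It assumes a 64-bit (8-byte) module data path and the rounded 100/133 MHz clocks; the function and table names are made up for illustration, and the results are theoretical peaks, not measured throughput.

```python
# Theoretical peak bandwidth of one memory module.
# Assumption: 64-bit (8-byte) data path, rounded 100/133 MHz clocks.
BUS_WIDTH_BYTES = 8

def peak_mb_per_s(clock_mhz: int, transfers_per_clock: int) -> int:
    """Peak MB/s = clock (MHz) x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * BUS_WIDTH_BYTES

# Hypothetical table: name -> (clock in MHz, transfers per clock)
modules = {
    "PC133 SDR": (133, 1),   # one transfer per clock (rising edge only)
    "PC1600 DDR": (100, 2),  # rising and falling edge
    "PC2100 DDR": (133, 2),
}

for name, (mhz, tpc) in modules.items():
    print(f"{name}: {peak_mb_per_s(mhz, tpc)} MB/s")
```

With the exact 133.33 MHz clock the DDR figure comes out near 2133 MB/s, which is rounded to the PC2100 name.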

However, because nothing uses an FSB that fast, the second I/O that the memory is capable of is completely ignored. So it's the CPU that is failing the concept of DDR SDRAM, not the memory.

Does this make any sense, or am I smoking some funky stuff here?

If the opposite of pro is con, what is the opposite of productivity? Ground first.
 

kurokaze

Distinguished
Mar 12, 2001
I want some of what you're smoking... hehe.

Actually, I have no idea what you just said... I just came back from a really big lunch and I'm very, very groggy... :lol:

Intel Components, AMD Components... all made in Taiwan!
 

74merc

Distinguished
Dec 31, 2007
The Athlon and Duron were designed with DDR RAM in mind; I doubt they overlooked the northbridge and motherboard.
The FSB is DDR on the motherboard, not the processor. If it weren't for the motherboard, they would be capable of a higher FSB, but depending on the motherboard you may hit a brick wall at 118MHz, or you could go as high as 145MHz (the highest I've heard).
The FSB is double pumped.
As far as the failures of RDRAM and DDR on the PIII go, your idea holds water: 3.2GBps bottlenecked by a 133MHz FSB is stupid. It's like expecting your ATA100 hard drive to transfer at 100MBps on an ISA controller.
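The bottleneck point can be sketched in a few lines of Python: the usable bandwidth along a path is the minimum of its links. The 3.2 GB/s figure is assumed to mean dual-channel PC800 RDRAM (2 × 1,600 MB/s), the FSBs are assumed to be 64 bits wide, and the Athlon line is a hypothetical pairing used only to show the same memory behind a double-pumped bus. All numbers are theoretical peaks.

```python
# Sketch: data can't reach the CPU faster than the slowest link allows.
# Assumptions: 64-bit FSBs, theoretical peaks only.
def bus_mb_per_s(clock_mhz: int, transfers_per_clock: int, width_bytes: int = 8) -> int:
    return clock_mhz * transfers_per_clock * width_bytes

def usable_mb_per_s(*links: int) -> int:
    """Effective throughput of a path is capped by its narrowest link."""
    return min(links)

memory = 2 * 1600                    # assumed: dual-channel PC800 RDRAM
p3_fsb = bus_mb_per_s(133, 1)        # single-pumped 133 MHz FSB
athlon_fsb = bus_mb_per_s(133, 2)    # double-pumped 133 MHz FSB (hypothetical pairing)

print("P3:    ", usable_mb_per_s(memory, p3_fsb), "MB/s")
print("Athlon:", usable_mb_per_s(memory, athlon_fsb), "MB/s")
```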

----------------------
Independent thought is good.
It won't hurt for long.
 

slvr_phoenix

Splendid
Dec 31, 2007
Which all makes sense, except for one massive failing: AMD chips don't get any more performance out of DDR memory than Intel chips do. If AMD chips were doing I/O with the chipset at DDR, and if the chipset was working at DDR, then AMD DDR systems should be seeing more of a benefit than the benchmarks show. Somewhere, something just isn't making sense to me.

And if systems get so little gain from DDR, what makes anyone think that they'll gain anything from QDR?

 

Sojourn

Distinguished
Dec 31, 2007
You're way off base here. Firstly, AMD CPUs do gain more than the P3 from DDR SDRAM. The benchmarks show it. This is because AMD has a 133MHz double-pumped front side bus, which syncs perfectly with PC2100 DDR SDRAM (133MHz x 2). This is reflected in all the benchmarks. Compare a PC133 SDR SDRAM AMD system to a PC2100 DDR SDRAM AMD system and you will see the performance difference. Perhaps you're confused because system performance isn't doubling with the available bandwidth? Non-synthetic benchmarks test the system as a whole, so doubling memory bandwidth will not result in double the performance, since there are many other factors to consider, which is why it's a 'system'. You should go back and read Tom's DDR articles, as he explains all of this.
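The "it's a system" point can be made concrete with Amdahl's law (a standard model, not something taken from this thread). A minimal Python sketch, assuming hypothetically that some fraction of run time is purely bandwidth-bound and scales perfectly with bandwidth while the rest is unaffected; the fractions below are illustrative, not measured.

```python
# Amdahl's-law sketch: only the memory-bound share of run time speeds up.
# Assumption (hypothetical): `mem_fraction` of run time scales perfectly with
# memory bandwidth; the remainder does not change at all.
def system_speedup(mem_fraction: float, bandwidth_factor: float) -> float:
    return 1.0 / ((1.0 - mem_fraction) + mem_fraction / bandwidth_factor)

for f in (0.1, 0.3, 0.5):
    print(f"{f:.0%} memory-bound -> {system_speedup(f, 2.0):.2f}x with 2x bandwidth")
```

Under this model, even a workload that is half bandwidth-bound only gets about a third faster when bandwidth doubles.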

This doesn't mean that DDR chipsets for AMD CPUs are living up to their potential. The performance gains made by SiS are showing what kind of performance increases are possible. nVidia's crossbar technology should give DDR an even greater edge.

QDR SDRAM is not even on the horizon as system memory yet, so I wouldn't worry about it. I'd expect DDR SDRAM to scale to 200MHz, which would saturate the Athlon's maximum bus speed, before we see QDR SDRAM as system memory.

-= This is our wading pool.
Stop pissing in it. =-
 

mpjesse

Splendid
I think that you're thinking that the FSB has everything to do with the processor. It doesn't. Theoretically, any Athlon processor could have an FSB running at the same speed as its core clock. The FSB has everything to do with the chipset. So I think you mean that the northbridge and memory should be perfectly in sync. But they aren't, because of RAM latency. Believe it or not, it can take up to 45 cycles to send 1 or 2 bits of data with SDRAM (depending, of course, on whether it's DDR). 45 cycles! That's how bad latency is. Now, RDRAM takes the concept of dual channeling, which essentially splits that 45-cycle latency in half; on top of that, RDRAM's high clock speed also brings latency down.

So, while it would make logical sense that the northbridge and memory are in perfect sync, they're not. =)

-MP Jesse

"Signatures Still Suck"
 

slvr_phoenix

Splendid
Dec 31, 2007
I found it mildly condescending for you to just assume that I haven't read anything on the subject (especially from THG), and that you seem to think I expect an exact doubling of performance. I've never said any such nonsense, and I believe my points have been educated enough to show that I'm not totally ignorant of the subject. However, I'm willing to overlook this in hopes of a good debate. After all, we all make mistakes.

It's funny how you can miss the flaws in your own reasoning, though. You say, "Compare a PC133 SDR SDRAM AMD system to a PC2100 DDR SDRAM AMD system and you will see the performance difference," and then say, "since there are many other factors to consider, which is why it's a 'system'."

Perhaps you need to re-evaluate your statements. First of all, the performance gain is more likely to come from just a better-designed chipset/motherboard, especially when the performance gains are so minimal. The double-rate bus isn't the only change made between old SDR motherboards and new DDR motherboards.

Your statement, "Firstly, AMD CPUs do gain more than the P3 from DDR SDRAM," is a perfect example of you ignoring this. DDR SDRAM on the P3 is done through VIA, the same company whose SDR motherboards lag badly in speed compared to Intel's. Is it any wonder that AMD gains more from DDR when you compare a VIA chipset for Intel to an AMD chipset for AMD? If Intel were to make a DDR chipset, I'm 100% positive that you would see Intel CPUs getting the same performance gain.

Second, you need to look at some of the benchmarks that THG has done, such as the ALi Magik 1 review. This graphic from THG's benchmark clearly shows SDR performing at the same speed as DDR in Sysmark2000:
http://www6.tomshardware.com/mainboard/01q2/010509/images/image013.gif
How can SDR possibly compete at all with DDR if the memory response time is halved in DDR systems?

Third, the small performance gains that we are seeing from DDR over SDR can easily be explained by synchronization alone. It's just the same as seeing a performance gain from having a memory clock of 133MHz with a 100MHz FSB CPU.

I have yet to see any evidence of memory bandwidth actually being doubled in a DDR system. Every single performance gain is perfectly explainable without a double-rate CPU. So I find it very hard to believe that the current Athlon has a double-rate I/O. Sure, internally it may be a double-pumped chip, and that's pretty cool in and of itself. Obviously Intel liked the idea, considering what they did with their P4. However, I have seen no evidence to make me believe that the Athlon's communications with the chipset/memory are anything other than single-rate. And we all know that the P3 certainly will never be anything more than single-rate.

So my proposal still stands that perhaps the reason no one is seeing DDR give the performance that it should be giving is simply because the CPUs themselves, or possibly still the chipsets, are only working at single-rate for I/O.

 

slvr_phoenix

Splendid
Dec 31, 2007
Don't get me wrong. I'm not saying that I firmly believe my idea is right. I want people to prove me wrong. (Or prove me right.) I was just trying to figure out why DDR sucks so badly compared to what it could do for computers. I mean, it works quite well for graphics cards. Why doesn't it have the same effect on PCs?

 

74merc

Distinguished
Dec 31, 2007
This is quite possibly my last response on this subject; it seems you've made up your mind regardless of what the facts are or our insight into them.
You say you find it condescending for someone to assume you didn't read up on it, then you make assumptions about our comparisons.
Compare the VIA KT266 vs the 266Pro or whatever the hell it's called.
The VIA DDR P3 chipset BARELY hangs with an 815, while there is a 10-20% improvement on the Athlon when comparing a VIA KT133A to a VIA KT266.
As far as it being simply synchronous memory and FSB, why then don't the KT133As exactly match the KT266?
Has it ever occurred to you that the Athlon has a good L2 cache and doesn't require all this bandwidth?
Also, I'm not sure if the system latency/memory latency on DDR is counted in actual cycles or DDR cycles.
Is it 2.5 real clocks of latency, or 2.5 DDR clocks?
If it's real clocks, that equals a latency of 5 DDR clocks; if it's DDR clocks, then its real-world latency should be in the 1.25 area, and performance should blow the top off of it.

 

slvr_phoenix

Splendid
Dec 31, 2007
Thank you for your post. This one is a much better argument. I would appreciate a few clarifications though:

When you say 45 cycles, are you referring to CPU cycles or FSB cycles? The reason that I ask is that it contradicts what I've read about how computers work if you mean FSB cycles.

My understanding of how the CPU, the chipset, and the memory work goes like this:

Say that the CPU has an FSB of 133 and a multiplier of 10. This means that the CPU is able to perform 10 'CPU cycles' in each 'bus cycle'. The CPU only communicates with the chipset on each 'bus cycle'. When each 'bus cycle' comes, the CPU takes data from the chipset, gives processed data to the chipset, and queries the chipset for new data. Then the CPU will have 10 'CPU cycles' to crunch that data before it accesses the chipset again. This is why cache memory is so important: the more data the CPU holds internally, the less reliant it is on waiting for the next 'bus cycle' to come around.

The chipset is responsible for communicating with everything. For example, when the CPU asks it for data, it asks the corresponding hardware (such as memory), and when it gets a response, it gives that information back to the CPU. I'm not even sure that the chipset works on a concept of an FSB, because that might just slow it down. Instead, it probably just works by sending and receiving data between components as fast as possible. However, if it does work on an FSB itself, then that FSB is probably synced to the CPU's 'bus cycle'.

The memory, not being a big powerful chip like a CPU, can only perform one 'process' per cycle. So the memory's speed is both its FSB and its total MHz. Each I/O request made of the memory is handled within that one cycle of the memory's clock.

DDR/dual-channel memory works by actually performing more than one operation per memory clock cycle, thus in effect responding virtually twice as fast.

A CPU that has a dual-pumped I/O works the same, being able to access the chipset twice per 'bus cycle'. The P4's quad-pumped bus can actually access the chipset four times per 'bus cycle'.

Also, a CPU that is internally double or quad pumped can perform that many more 'cpu cycles' per 'bus cycle'. Such as in the case with a multiplier of 10, it can perform 10 'cpu cycles' between the middle and the crest of the wave of the 'bus cycle', and then another 10 'cpu cycles' between the middle and the trough of the wave of the 'bus cycle'. So multi-pumping the CPU means that you are multiplying the number of cpu operations performed per 'bus cycle'.

So a double-pumped CPU with an only internally-pumped bus <i>only</i> performs twice as many CPU operations per bus cycle.

While a double-pumped CPU that is both internally and externally pumped not only performs twice as many CPU operations per bus cycle, but also performs two I/O communications with the chipset per 'bus cycle'.

And so yes, 45 'cpu cycles' can easily be spent between when the CPU asks for data and when the chipset hands the data to the CPU. In a double-pumped system such as an Athlon 1.33GHz with PC2100 memory, in a worst-case scenario where the 'memory cycle' occurs a fraction of time off from a 'bus cycle', the CPU can run through two bus cycles. With a multiplier of 10, that makes ten 'cpu cycles' per pump, and with a double-pump that makes 20 'cpu cycles' per 'bus cycle'. And with two 'bus cycles' between when the CPU asked for data and when it finally got it, 40 'cpu cycles' could easily have occurred.

I'd be very scared to hear, though, that 45 'bus cycles' could occur between when the CPU asks for memory and when it gets it. That would mean 900 'cpu cycles' wasted waiting for new data.
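For comparison, here is a small Python sketch of the conventional clock arithmetic: the CPU clock is the base bus clock times the multiplier, so a stall measured in nanoseconds converts to CPU cycles by multiplying by the CPU frequency. It assumes the multiplier applies to the un-pumped 133 MHz bus clock (the way Athlon speeds are usually quoted); under that convention, double-pumping changes how much data crosses the bus per clock, not how many CPU cycles fit in one. The helper names are hypothetical.

```python
# Sketch of conventional clock arithmetic (assumption: the multiplier
# applies to the base, un-pumped bus clock).
def cpu_mhz(bus_mhz: float, multiplier: float) -> float:
    return bus_mhz * multiplier

def bus_cycles_to_ns(bus_cycles: float, bus_mhz: float) -> float:
    # One bus clock lasts 1000 / MHz nanoseconds.
    return bus_cycles * 1000.0 / bus_mhz

def stall_cpu_cycles(stall_ns: float, cpu_clock_mhz: float) -> float:
    # MHz is cycles per microsecond, so cycles per ns = MHz / 1000.
    return stall_ns * cpu_clock_mhz / 1000.0

f_cpu = cpu_mhz(133, 10)  # the 1.33 GHz Athlon example
for bus_cycles in (2, 45):
    ns = bus_cycles_to_ns(bus_cycles, 133)
    print(f"{bus_cycles} bus cycles = {ns:.1f} ns = "
          f"{round(stall_cpu_cycles(ns, f_cpu))} CPU cycles")
```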

We know that P3s aren't pumped at all. And it's my theory that Athlons are only internally pumped, not internally and externally. It's also my theory that as soon as we see an externally pumped CPU (which the P4 may be) then we'll see DDR SDRAM improve performance considerably.

 

slvr_phoenix

Splendid
Dec 31, 2007
I'm quite sorry if you feel my mind is already made up. It certainly isn't. I am merely theorizing based on what I've seen from benchmark results. And I have yet to see any 'facts' with any proof to back them up from anyone. So far mpjesse is the first person to come up with any 'facts' that sound halfway legit. Everything else has been (for lack of a better phrase) meaningless marketing hype. Please forgive me if I don't hang on every word that comes out of marketing's mouth like some people do.

You accuse me of making assumptions about other people's comparisons. I apologize if I have, though I do not believe that I have. Please point any out to me if you can, for I have really tried not to.

You want me to compare a VIA KT133A to a VIA KT266. Sure. And I acknowledge that in some applications there is a small performance gain. But look at how bad VIA has been at making hardware compared to companies like AMD and Intel. And look at how old the KT133 is. The only difference between the KT133 and the KT133A is support for a 133MHz FSB in the CPU. There were no other changes to that chipset. So how can you not count the majority of the performance gained from going from the KT133A to the KT266 as simply VIA having learned how to make a more efficient chipset? The KT266 was a completely new product. The KT133A was a minor modification to an ancient chipset. The real test would be to see how the KT266 runs with single-rate memory compared to how it runs with double-rate memory. We've seen how ALi's chipset runs in such a comparison. The performance improvement is rather small.

I don't want a comparison of two dissimilar chipsets. I want a comparison of the same chipset running single- and double-rate memory. And if in such a comparison we see that DDR performs significantly better, especially in memory-intensive applications, then I'll believe that the Athlon is externally double-pumped.

No one goes into a scientific experiment and compares the rate of mold A to mold B when mold A is on an apple and mold B is on an orange. They compare mold A to mold B when they're both on an apple and both on an orange.

I have yet to hear much useful debate on this point alone. So if it seems like I refuse to acknowledge anyone's points, it's probably because those points are based on reasoning that is useless by the standards of scientific analysis. Not because my theories might be in jeopardy.

Heck, I'd love for someone to flat out prove me totally wrong because I like AMD and I like DDR memory and every little extra point that they can score with me is all the better.

My theories aren't to say that any CPUs suck or that DDR SDRAM sucks. My theory is to simply try to understand why so far DDR has been giving less-than-desirable performance boosts to applications that are incredibly memory intensive.

If we were talking about a P3 Xeon with a meg of cache, I could see why memory bandwidth doesn't make much of a difference. We aren't though. We're talking about 256KB of cache in applications that are using megs of memory. DDR should be making more than just 10% of a performance gain. It should be at least a 20% performance gain, if not 30%. Especially when MMX, SSE, SSE2, and 3DNow! are able to execute so many more operations per clock cycle, making memory access speed that much more important.

 

Ncogneto

Distinguished
Dec 31, 2007
I can't recall very many applications (real-world or synthetic) that max out the overall bandwidth; most fetches are well under the total available bandwidth but are bothered more by latency hits. Hence, you will only notice substantial gains where the memory bandwidth of the SDRAM system was a bottleneck. Also, when this is the case, the percentage gained will only reflect how much of a bottleneck actually existed to begin with. While bandwidth is indeed important, it definitely is not the only factor to consider, hence merely doubling your bandwidth certainly will not increase performance twofold, especially in applications that were not bandwidth-limited to begin with. In other bandwidth-heavy apps, you will see performance increases directly related to how much of a bottleneck existed to begin with. While the bandwidth issue was addressed with DDR, latency penalties are still a major factor. This is one of the things the new Athlon MP and the nForce chipset will address, making them better suited to DDR RAM. Also, your link to the ALi chipset is not correct, but I think you may be referring to the gif in which they were using PC1600 DDR RAM, with which the ALi chipset has always had performance issues. These issues all but disappear when opting to use PC2100 DDR RAM.

A little bit of knowledge is a dangerous thing!
 

74merc

Distinguished
Dec 31, 2007
http://www6.tomshardware.com/mainboard/01q2/010416/images/image004.gif

Here. This is about the only memory-intensive benchmark I could find; I'll admit I didn't look that hard.
The difference between the KT133A and KT266 is roughly one processor speed grade. To expect much more isn't realistic, man; it still has latency issues, which are going to keep the performance low. If it were quad-pumped it might overcome them better, but it isn't. DDR isn't the holy grail, and to be honest, I'm not impressed, but it is real.

 

Ncogneto

Distinguished
Dec 31, 2007
The whole point of the matter is that people think that because DDR is effectively running at twice the speed, it should be twice as fast. However, while the theoretical bandwidth is doubled, the latency is not cut in half. Latency continues to be the limiting factor. Latency was also a factor with the P4/Rambus solution, but to alleviate this they went to a dual memory controller.
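A simplified Python model shows why doubling bandwidth without cutting latency gives well under 2× on a single cache-line fill. The numbers are assumptions for illustration: a 64-byte line, an 8-byte bus, a 133 MHz clock, and a lumped 40 ns of fixed access latency (CAS, RAS-to-CAS, chipset overhead) assumed identical for SDR and DDR. Real timings vary by chipset and module.

```python
# Simplified model: time to fill one cache line = fixed latency + burst time.
# All constants are assumptions for illustration, not measured values.
LINE_BYTES = 64      # assumed cache-line size
BUS_BYTES = 8        # 64-bit memory bus
CLOCK_MHZ = 133
LATENCY_NS = 40.0    # hypothetical lumped access latency, same for SDR and DDR

def fill_ns(transfers_per_clock: int) -> float:
    clock_ns = 1000.0 / CLOCK_MHZ
    beats = LINE_BYTES // BUS_BYTES              # 8 transfers per line
    burst_ns = beats / transfers_per_clock * clock_ns
    return LATENCY_NS + burst_ns

sdr, ddr = fill_ns(1), fill_ns(2)
print(f"SDR {sdr:.1f} ns, DDR {ddr:.1f} ns, ratio {sdr / ddr:.2f}")
```

With these assumed numbers the DDR fill comes out roughly 1.4× faster, not 2×, because only the burst portion shrinks.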

Edited by ncogneto on 06/15/01 05:33 PM.
 

Blessedman

Distinguished
May 29, 2001
I don't understand why we are only now seeing dual-channel memory... Why has it taken so long for this to happen? nVidia's crossbar seems an easier solution; I say split it further. Have 4 banks of memory requiring the same size DIMMs/RIMMs/SIMMs or what have you, with multiple memory controllers. Like a RAID array for memory. nForce seems like it's on that path. Another quick question: why haven't we seen a new PowerPC chip? I wanna see how bad a 1.5GHz PowerPC kicks the sh!t out of all x86 platforms.
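The "RAID array for memory" idea is essentially channel interleaving. A hypothetical Python sketch, assuming 64-byte lines spread round-robin over four channels by a simple modulo mapping, so a streaming read keeps all four channels busy at once. Real controllers use their own, often more complicated, address mappings.

```python
# Hypothetical sketch of "RAID 0 for memory": consecutive cache lines are
# assigned round-robin to independent channels (simple modulo mapping).
LINE_BYTES = 64
CHANNELS = 4

def channel_for(address: int) -> int:
    """Channel that serves the cache line containing `address`."""
    return (address // LINE_BYTES) % CHANNELS

addresses = range(0, 8 * LINE_BYTES, LINE_BYTES)   # eight consecutive lines
print([channel_for(a) for a in addresses])
```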
 

74merc

Distinguished
Dec 31, 2007
Exactly what I was saying, or trying to say...
DDR has effectively double the bandwidth, and apparently double the latency.
Same old [-peep-].

 

74merc

Distinguished
Dec 31, 2007
There are already server boards that do that; why would no one buy it?
If it were commercially available at a decent price, I'd probably have one already...

 

Ncogneto

Distinguished
Dec 31, 2007
No, not double the latency, just the same (CL2) or slightly higher (CL2.5). Look at it this way: it is like taking a two-lane freeway and widening it to four lanes, but the speed limit is still the same. It can now handle more traffic during rush hour, but you still have to drive 70 mph.

 

74merc

Distinguished
Dec 31, 2007
Is the CAS latency rated in clock cycles or DDR cycles?

 

Ncogneto

Distinguished
Dec 31, 2007
CAS latency is always rated in clock cycles,
i.e.:

CAS 2 has a wait state of 2 clock cycles
CAS 3 has a wait state of 3 clock cycles

DDR is capable of transmitting data on the rising and falling edges of the clock cycle but is still bound by the same wait states as SDR RAM.

The only ways to improve this are hardware prefetch, larger on-die caches, or increasing the FSB; the latter would still have the same number of wait states, but they would be shorter in terms of time.
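The point about wait states being the same count but shorter at a higher clock is simple arithmetic: latency in ns = CAS cycles × (1000 / clock in MHz). A small Python sketch, assuming the base (not "effective" DDR) clock is used, e.g. 133 MHz for both PC133 and PC2100; the 166 MHz case is a hypothetical faster bus for comparison.

```python
def cas_ns(cas_cycles: float, clock_mhz: float) -> float:
    """Real time spent in the CAS wait: cycles x clock period (ns)."""
    return cas_cycles * 1000.0 / clock_mhz

cases = [
    ("PC133 SDR, CAS 2", 2, 133),
    ("PC2100 DDR, CAS 2", 2, 133),             # same base clock, same wait
    ("PC2100 DDR, CAS 2.5", 2.5, 133),
    ("hypothetical 166 MHz, CAS 2", 2, 166),   # faster clock, shorter wait
]
for label, cycles, mhz in cases:
    print(f"{label}: {cas_ns(cycles, mhz):.1f} ns")
```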


Edited by ncogneto on 06/16/01 02:07 PM.
 

tlaughrey

Distinguished
May 9, 2001
From what I've read, the AMD760 outperforms the ALi MAGiK because it does a better job of reducing latency with DDR memory. Is that correct, or am I confused? I think it was Anandtech's article that did a pretty thorough comparison of the two.

<i>Real knowledge is to know the extent of one's ignorance.</i>
 

74merc

Distinguished
Dec 31, 2007
OK, again, that's what I was saying.
CAS 2 x DDR = effective CAS of 4.
The actual clock latency is the same or a tad worse (2.5), not that big of a deal, but its effect on performance is doubled because of the way DDR works. SDR RAM loses 2 cycles of memory; DDR loses 4.
Your analogy needs a little correcting: the pathway hasn't increased, the speed limit has. Now you can go 70 instead of 35, only you wait twice as long at each red light.

 

Ncogneto

Distinguished
Dec 31, 2007
Perhaps to some extent. I'm not looking to get into all the differences between DDR chipsets and their efficiency or lack thereof, as they all have their pluses and minuses.

 

Ncogneto

Distinguished
Dec 31, 2007
"OK, again, that's what I was saying.
CAS 2 x DDR = effective CAS of 4."
No, a CAS rating of 4 would mean it is twice as slow, whereas it is the same. Try to look at it this way: CAS latency is measured in wait cycles, but in actuality what is important is the actual time that elapses during those waits (measured in ns). Therefore the actual time spent in the wait state when comparing DDR to SDR (both at 133MHz, CAS 2) is the same. If I follow you correctly, you are trying to say that DDR spends twice as much time waiting compared to SDR. What I am saying is that it spends the same time waiting. The misconception is that it spends half the time waiting.

"Your analogy needs a little correcting: the pathway hasn't increased, the speed limit has. Now you can go 70 instead of 35, only you wait twice as long at each red light."
My analogy is admittedly not the best, but it is the pathway that has increased (bandwidth) while the speed limit (latency) has not.
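The two positions in this exchange describe the same wait in different units. A minimal Python sketch, assuming a 133 MHz clock, CAS 2 and an 8-byte bus: the wait is the same number of nanoseconds on SDR and DDR, but on DDR it spans twice as many data-transfer slots, so twice as much potential data goes unmoved during it. The helper name is hypothetical.

```python
# Assumptions: 133 MHz base clock, CAS 2, 64-bit (8-byte) bus.
CLOCK_MHZ = 133
CAS_CYCLES = 2
BUS_BYTES = 8

# The CAS wait in real time depends only on the clock, not on DDR vs SDR.
wait_ns = CAS_CYCLES * 1000.0 / CLOCK_MHZ

def idle_slots(transfers_per_clock: int) -> int:
    """Data-transfer slots that pass unused during the CAS wait."""
    return CAS_CYCLES * transfers_per_clock

for name, tpc in (("SDR", 1), ("DDR", 2)):
    slots = idle_slots(tpc)
    print(f"{name}: {wait_ns:.1f} ns wait = {slots} transfer slots "
          f"= {slots * BUS_BYTES} bytes not moved")
```

In slot terms the wait is 4 on DDR versus 2 on SDR; in time it is about 15 ns for both.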

 
