Memory - Part III - "Evaluation and selection"



other content:
Part I - "What memory is"
Part II - "What memory does"
Part IV - "Tweaking and tuning"

The "Engineering Objective"!

People who have never been exposed to it have a 'skewed view' of engineering. They think that an engineer is trying to make the "ultimate": the biggest, the fastest, the strongest, the most powerful... whatever! This is almost never the case (the older you get, I think, the more careful you become about ever using an unqualified 'never' or 'always'...)! The fact is that one is almost always 'looking at' a multi-dimensional array of interacting characteristics, trying to optimize some at the 'cost of' others. Nowhere is this so evident as in high-performance or racing cars:

Let's look at the 'tub' (the main structural 'body', as well as the driver 'containment') of an F1 racer... Carbon fiber composites, pound for pound, have extremely high modulus (stiffness) - but they have one hell of a failure mode! Carbon fiber will load up, load up, load up - taking incredible strains - but then it fractures, releasing all its stored energy in one ungodly 'crack'! Aramid (think 'Kevlar'), on the other hand, has nowhere near the intrinsic stiffness of carbon, so a part of equal rigidity comes out heavier - but it is cheaper, and it has an infinitely better failure mode - it strains, strains, strains, then it 'ruptures' or tears, absorbing fracture energy into its structure, and releasing the remainder gradually. So - now we decide to use a mix - first problem - what 'substrate' or 'matrix' material (epoxy, polyester, what-have-you) can we find that will bond properly to both carbon and aramid? What does it cost? Will it take autoclave temperatures? Once we've found one, we'll then want to do some testing, to find out what ratio of carbon to aramid gives us the characteristics we're looking for. And on, and on it goes - no 'best answer' - always a 'best compromise'!

Unless you've won the lottery (or happen to be involved in a 'cost-plus' government project), one of these 'multi-dimensional axes' of analysis is alway$ - co$t! If I saved a hundred dollars on this (outrageously) fast RAM, would I get more 'bang-for-my-buck' going from a 57xx vidcard to a 58xx? Could I afford better cooling? Another hard drive to put my OS on a RAID0? As well as: what will this (outrageously) fast RAM cost me in terms of stability; excess stress on, and heat in, the memory controller; time spent 'tweaking'; and, do I have the skill-set required to actually get it properly set up?

I can easily see the utility of DDR3-1600; many who intend to OC a 'core' CPU will do it by taking the Bclk from the stock 133 to the 'near neighborhood' of 200, for a roughly 4GHz CPU; it so happens that a memory multiplier of eight at a 200 Bclk gives you a nice even 1600 - and it's only a 20% 'bump' from Intel's top officially supported (see below...) 1333 - past that, I start to have a lot of que$tion$, as do, apparently, many, many people who've bought the $tuff!!
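The Bclk arithmetic in the paragraph above can be checked in a few lines. This is only a sketch: the function names are made up for illustration, and the numbers are the ones quoted in the text.

```python
def memory_speed(bclk, multiplier):
    """Memory speed rating = base clock (Bclk) times the memory multiplier."""
    return bclk * multiplier


def percent_over(speed, reference):
    """How far 'speed' sits above 'reference', in percent."""
    return (speed / reference - 1.0) * 100.0


overclocked = memory_speed(200, 8)        # 200 Bclk, multiplier of eight
bump = percent_over(overclocked, 1333)    # versus Intel's top supported 1333

print(overclocked)   # 1600
print(round(bump))   # 20
```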

The difference between "Supported Speed", and "Supported Speed"!

The board makers say "DDR3-XXXX Supported!!", meaning the circuitry on the board can be coaxed, one way or another, eventually, into making at least one piece of someone's XXXX speed memory function. They know that less than 1% of their customers have even a vague idea of what's actually involved, but more than 90% will be mightily impressed by 'BIG NUMBERS'! They simply can't pass by the marketing advantage to those big numbers...

Then, there is the processor. Intel plainly states: DDR3-800, DDR3-1066, and, on some processors, DDR3-1333 are supported - and that's IT! I have pointed this out more than once:

"Intel® Core™ i7 Processor Extreme Edition and Intel® Core™ i7 Processor Datasheet, Volume 2"
2.14 Integrated Memory Controller Miscellaneous Registers

2.14.1 MC_DIMM_CLK_RATIO_STATUS This register contains status information about DIMM clock ratio
Device:3 Function:4 Offset:50h Access as Dword
Bit 28:24 MAX_RATIO. Maximum ratio allowed by the part.
Value = Qclk
00000 = RSVD
00110 = 800MHz
01000 = 1066MHz
01010 = 1333MHz

Bit 4:0 QCLK_RATIO. Current ratio of Qclk.
Value = Qclk
00000 = RSVD
00110 = 800MHz
01000 = 1066MHz
01010 = 1333MHz

2.14.2 MC_DIMM_CLK_RATIO This register is Requested DIMM clock ratio (Qclk), the data rate going to the DIMM. The clock sent to the DIMM is 1/2 of QCLK rate
Device:3 Function:4 Offset:54h Access as Dword
QCLK_RATIO. Requested ratio of Qclk/Bclk.
00000 = RSVD
00110 = 800MHz
01000 = 1066MHz
01010 = 1333MHz
As Porky Pig used to say, at the end of every cartoon, "Th-Th-Th-That's all, folks!" Everything else falls under the broad label of 'undocumented' - like fifteenth century maps marked "here be dragons!" I'm not saying it can't work; it obviously does work, sometimes... Somehow, the BIOS and the board hardware are being manipulated to 'fool' the CPU into clocking the memory faster than spec - but it's one of those "pay no attention to that little man behind the curtain" things... AND: If you 'rob Peter to pay Paul' long enough, you wind up with a sore peter! :lol:
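To make the register layout quoted above a little more concrete, here is a rough sketch of how the two bit fields of MC_DIMM_CLK_RATIO_STATUS could be pulled out of a 32-bit value. The bit positions and the three ratio codes come straight from the datasheet excerpt; the function names and the example register value are invented for illustration, and nothing here actually reads the hardware.

```python
# Ratio encodings exactly as listed in the quoted datasheet excerpt.
QCLK_RATIOS = {0b00110: 800, 0b01000: 1066, 0b01010: 1333}


def field(value, high_bit, low_bit):
    """Extract bits high_bit..low_bit (inclusive) from a 32-bit value."""
    width = high_bit - low_bit + 1
    return (value >> low_bit) & ((1 << width) - 1)


def decode_clk_ratio_status(dword):
    """Decode MAX_RATIO (bits 28:24) and QCLK_RATIO (bits 4:0)."""
    max_ratio = field(dword, 28, 24)
    current = field(dword, 4, 0)
    return {
        "max": QCLK_RATIOS.get(max_ratio, "reserved/undocumented"),
        "current": QCLK_RATIOS.get(current, "reserved/undocumented"),
    }


# Hypothetical register contents: MAX_RATIO = 01010, QCLK_RATIO = 01000.
example = (0b01010 << 24) | 0b01000
print(decode_clk_ratio_status(example))   # {'max': 1333, 'current': 1066}
```

Any code outside the three documented ones comes back as "reserved/undocumented", which is exactly where the faster memory speeds live.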

Regarding "Intel Supported", there are major advantages to staying within these specifications. The i3/i5/i7 have 'moved' the memory controller onto the processor die. One of the reasons (among many) for the existence of the above mentioned memory configuration registers is that the memory controller contains 'training' functions: much like the process of the BIOS 'waking up' the machine, and 'discovering' or 'polling' what devices are available, and how they are 'hooked up', the memory controller turns on, looks at its memory configuration registers, and attempts to 'hook up' to the physically attached RAM. The first thing it must do is determine the actual layout of the memory - the 'organization' by rows, columns, ranks, and sides that we discussed in part I. Then, it will attempt to 'adjust its circuitry' to the physical characteristics of the RAM itself. It needs to 'measure' the impedance characteristics of the on-DIMM RAM controller chip, and the attached DRAM itself; in other words, the combined effects of resistance and capacitance (as well as any 'stray' inductance - a bad thing!) [from part I, again...], that will affect its physical transactions/speed...

As the CPU has not got access to a multimeter, oscilloscope, or logic analyzer, it can only do this 'measuring' by 'looking at' two domains: voltage, and time. It sends a 'pulse' or command from here, and watches there, for a return; it says "Ah-ha! It took so long, to reach such voltage - I must adjust myself thusly!" And, hopefully, your memory channels are as 'tuned' as they're going to get...

Now, Intel specifies everything, and guarantees nothing! If your memory is constructed exactly to JEDEC spec, and the 'physical hookup' is done correctly, and the planets are in the proper alignment, 'training' will work... In numerous places, Intel's documents contain the 'electronic engineering equivalent' of "your mileage may vary!" This is where your "sore peter" comes in - if you're ridiculously outside Intel's physical specs (and, let's face it - 2133 memory is twice the 1066 supported by all i3/i5/i7 CPUs, meaning it requires the memory controller to perform its functions in half the time - which, patently, falls into the 'realm of the ridiculous'), you sacrifice any benefit of these 'built-in' accommodations!

I consider myself a fair-to-middlin' amateur philosopher; and the great cognitive philosopher Daniel Dennett has written "one of the proper jobs of philosophers is definition mongering" - you can plainly see here that one person's definition of 'supported' (the board maker's and memory manufacturer's) varies wildly from another's (the processor manufacturer's)!!

The 'Memory Support List', and how to use it:

The support list is done when the board design is finalized to production, and almost never updated thereafter. When this is done, somebody with some degree of engineering talent, and knowledge of the hardware involved, sits down with a collection of RAM they have 'lying about', mostly samples provided by manufacturers who have a vested interest in getting their products on the list. He tosses aside the candidates he knows won't work, for one reason or another, on that particular platform, and goes to work setting up and testing the remainder. If he can get it working - it goes on the list; if not, not! I imagine he stays at it until he reaches some arbitrary number, or until his boss says "you got other, important work to do - GIT!"

This leaves a large number of issues for the user:

Being 'on the list' does not guarantee 'instant' compatibility for your use; the list provides no detail regarding "did it just come up and run with a 'Load Optimized'?", "did he have to enable XMP and it worked?", or, "did he (with 'inside' knowledge of the MOBO and BIOS) have to 'diddle around' a half-hour to set it up?"

Not being on the list certainly does not imply it won't work! I have built a little Excel 'tool' for evaluation and comparison of RAM; I went to update its contents just to reflect what's available in 2G x 3Channel from NewEgg, and, if memory serves me, wound up with eighty-some odd part numbers! Considering the ungodly amount of MOBOs made, this would require a full-time staff of ten, even assuming the samples were consistently available - else another ten could work all day every day 'hunting down' samples! Your RAM part's absence may simply reflect that it was released after the board...

Many parts you'll see on the list are from unfamiliar makers - they may be available in every quick-service gas-station in Taiwan, but simply aren't available in your market...

The main advantage I see in 'sticking with' items on the memory support list (the QVL, or Qualified Vendor List) is just that - the position it puts you in, vis-a-vis support! If your memory can pass MemTesting a single stick at a time, and it is on the QVL, you have support pretty much 'over a barrel' - they have to help get the stuff running - they're the ones who said it would!

In any other situation, you're pretty much trapped in what, unfortunately, has become an industry 'standard operating procedure' of 'passing the blame' - kind of like the Scarecrow in the Wizard of Oz crossing his arms across his chest, pointing in both directions, and saying "They usually go thataway!" The memory manufacturer says: "well, it must be your board, because that memory works at rated speed on 'ABC' board"; the board manufacturer says: "it must be your memory, because 'DEF' memory works on our board at that speed"; the CPU maker says: "it's your problem, because our CPU is rated to run RAM at 'GHI' speeds"; the software guys say: "it's obviously a hardware problem, because it works on 'UVW' platform"; the hardware guys say: "it's obviously a software problem, because 'XYZ' program, which does the same thing, runs fine on our platform"; meanwhile, you are 'stuck in the middle', saying "'%$#&' these people, why can't somebody tell me how to make it work?!?"

Evaluating memory from a price/performance standpoint:

As discussed in part I, latencies are 'rated' in 'counts' of memory cycles, faster (smaller numbers) being better; and are, per part II, the most important factor in practical memory performance - this brings us to another (complicating) issue: Q - quick, which is better? A CAS latency of seven at DDR3-1066, or a CAS latency of nine at DDR3-1333? A - who the hell knows!?!

No matter how math-phobic we are, the actual answers are going to require some (not too serious) calculating. Bear in mind while trying to follow this: "Figures don't lie, but liars (and marketing guys!) figure." - Samuel Clemens (alias Mark Twain). One of the loosely kept 'secrets' in charging the 'big bucks' for (ostensibly and nominally) 'fast' memory is that almost no one knows the answers (or, more importantly - how to get them!) requisite to make these comparisons... Don't be put off by the math - I'm not going to expect you to do the math (heh-heh-heh, have a 'secret' still 'up my sleeve'), I just want to walk through the math, so you get an idea of the concepts involved in not wa$ting the money $pent on RAM!

Again, to re-iterate: the latencies are physical periods of time, necessary to perform physical functions - functions that simply cannot be 'speeded up'! So, how do we go about finding the answer to the earlier question regarding comparison of latency numbers at differing speeds? Simply put, the latencies are some number of counts multiplied by some period of time, per count, at whatever given speed. The number of counts is easy - it's 'given' in the memory specifications. The period of time per count is 'close to easy'; the period is the reciprocal of the frequency! (...don' sound easy put that way, does it?) Think of this: we have a sewing needle in a sewing machine that makes five strokes per second; how long does each stroke take? Obviously - one-fifth of a second, that is 1/5th second! Much easier, no? 'Period' is always equal to one (the time unit - usually a second for 'electronics junk'), divided by the frequency (the number of occurrences per unit of time)...
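For anyone who would rather see the 'reciprocal' idea than read about it, here is a tiny sketch. The function name is just an illustrative choice, and the memory speeds are treated as bare numbers, the same way the rest of this part treats them.

```python
def period(frequency):
    """Time taken by one cycle: one time unit divided by the frequency."""
    return 1.0 / frequency


print(period(5))               # sewing machine, five strokes per second: 0.2
print(f"{period(1066):.2e}")   # one 'tick' at 1066: 9.38e-04
print(f"{period(1333):.2e}")   # one 'tick' at 1333: 7.50e-04
```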

When we are simply trying to get comparisons, the units of time really don't matter to us. You might remember, from science class (if you weren't 'sleeping through' that particular day - I couldn't sleep, was in Catholic school, with 'Sister Mary Attila' hovering around!), that 'scientific math numbers' are often presented in 'scientific notation', thusly: "1.36459 x 10 to the minus ninth" ("1.36459e-9") - because it's easier than trying to keep track of: ".00000000136459"! At computer frequencies, time periods usually wind up being in microseconds (a millionth of a second, or ten to the minus sixth seconds - for really sloowww stuff), nanoseconds (billionths of a second, or ten to the minus ninth seconds - which most latencies are given in), or picoseconds (trillionths [one millionth of a millionth] of a second, or ten to the minus twelfth seconds - which might be the 'rise time' of a particular individual signal). So long as you are doing the same math the same way for a couple of closely-related frequencies, all you really care about is the "1.364" part of the "1.36459 x 10 to the minus ninth" answer!

So, for calculation purposes, we're looking at a number of 'ticks' of the memory clock, times the length of each tick. For the earlier example, seven ticks at 1066 gives us: 7 x (1/1066), or 7 x 9.38e-4, which is 6.57e-3; nine ticks at 1333 gives us: 9 X (1/1333), or 9 X 7.50e-4, which is 6.75e-3; we can 'toss out' the units (powers of ten), and just look at the 'bare' numbers. 6.57 'somethingths' of a second is smaller than 6.75 'somethingths', so seven latency at 1066 is a teensy bit ('bout three percent) faster than nine latency at 1333!
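The same comparison, as a small sketch (the function name is invented for illustration). Like the text, it treats the speed rating as the tick rate and ignores the units; per the datasheet excerpt earlier, the clock actually sent to the DIMM is half that rate, but that scales both results by the same factor and so should not change which one wins.

```python
def relative_latency(counts, speed):
    """Latency as counts times the length of one tick (1 / speed)."""
    return counts * (1.0 / speed)


cas7_at_1066 = relative_latency(7, 1066)
cas9_at_1333 = relative_latency(9, 1333)

print(f"{cas7_at_1066:.2e}")   # 6.57e-03
print(f"{cas9_at_1333:.2e}")   # 6.75e-03

# How much faster the CAS 7 part is, as a percentage of the slower one.
advantage = (cas9_at_1333 - cas7_at_1066) / cas9_at_1333 * 100.0
print(f"{advantage:.1f}%")     # 2.7%
```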

Another consideration you need to be aware of is the 'integer rounding' effect. As I've pointed out repeatedly, the latencies are actually physical time periods, for physical processes, usually given in nanoseconds. Integer rounding affects your memory thus: the nanosecond latencies are actually 'counted out' in 'memory clock ticks'; if you have a seven nanosecond latency, and your memory clock is 1066, the actual latency calculates to 7.46 'clocks', but you can't set 7.46 - you must always 'round up' to the next higher integer, in this case eight. Eight clocks at 1066 is seven and a half nanoseconds, which means you have to 'waste' a half nanosecond (or roughly 7%) each time the latency is activated - but that's simply how it works! This is an advantage for high speed memory; statistically, you are going to 'lose' a half-cycle on average, every transaction - faster (higher frequency) RAM has 'shorter' clock cycles, so that 'half-cycle' waste gets relatively smaller with each step's increase in clock speed...
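Here is a sketch of the rounding effect using the numbers from the paragraph above. The names are made up for illustration, and the speed is again taken as the tick rate in MHz, following the text's own convention.

```python
import math


def clocks_needed(latency_ns, speed_mhz):
    """Return (exact clocks, whole clocks you must set, resulting latency in ns)."""
    exact = latency_ns * speed_mhz / 1000.0   # ns times MHz, divided by 1000
    whole = math.ceil(exact)                  # you can only set whole clocks
    actual_ns = whole * 1000.0 / speed_mhz
    return exact, whole, actual_ns


exact, whole, actual_ns = clocks_needed(7.0, 1066)
wasted_ns = actual_ns - 7.0

print(f"{exact:.2f} clocks needed, {whole} clocks set")   # 7.46 clocks needed, 8 clocks set
print(f"{actual_ns:.2f} ns actual, {wasted_ns:.2f} ns wasted")

# The average 'half-cycle' loss shrinks as the clock gets faster.
for speed in (1066, 1333, 1600):
    print(speed, f"{0.5 * 1000.0 / speed:.3f} ns")
```

One caution on the sketch itself: with floating point, a latency that should land exactly on a whole number of clocks can come out a hair over and get rounded up one too many, so a real tool would want a small tolerance there.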

Integer rounding causes another issue - 'specification uncertainty'. Say we're looking at the above 9 CAS 1333 memory; we know that the actual physical timing is greater than eight counts (8 X 7.50e-4 = 6.0e-3), or, trust me, the marketing guys would have called it "CAS 8"; and we know that nine counts (9 X 7.50e-4 = 6.75e-3) should allow it to work - and that's all we know! So, knowing that we're somewhere in the range ('domain', for the 'techies'!) of 6.0 to 6.75, if we want to do any comparisons at another speed, (unless we have access to the actual, 'raw' SPD data, in nanoseconds - see Timings, 'sub-timings', and "why is this crap a secret?" in part IV...) integer rounding forces us to 'assume the worst', and always work with the largest, slowest number!
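A rough sketch of that 'assume the worst' reasoning follows. The function names are hypothetical, and the conversion to another speed is my reading of the paragraph above rather than a formula the text spells out: take the slowest timing the rating allows, then round up to whole clocks at the new speed.

```python
def latency_bounds(rated_counts, speed):
    """Range the true timing must fall in: above (counts - 1) ticks, at most counts ticks."""
    lower = (rated_counts - 1) / speed   # any faster and it would carry a lower rating
    upper = rated_counts / speed         # the value we are forced to assume
    return lower, upper


def worst_case_counts_at(rated_counts, rated_speed, target_speed):
    """Whole clocks needed at target_speed, assuming the slowest possible timing."""
    # Integer ceiling division avoids floating-point surprises at exact boundaries.
    return -(-(rated_counts * target_speed) // rated_speed)


lower, upper = latency_bounds(9, 1333)
print(f"{lower * 1000:.2f} to {upper * 1000:.2f}")   # 6.00 to 6.75
print(worst_case_counts_at(9, 1333, 1066))           # 8
```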

Go figure...

Now, you're probably saying "this guy is nuts, and he's gonna try to drive me crazy, too, if he thinks I'm gonna struggle through all this to pick out a couple lousy sticks of memory!" The 'secret' up my sleeve is:

The DDR3 Memory Comparison Tool:

This tool was done in Office 10's Excel, saved for earlier versions, and, if you don't happen to have a version of Office installed, should open with OpenOffice's free 'Calc'.

All you need do to use the tool is 'plug in' the values highlighted in blue for whatever DIMMs you are trying to get a handle on, and it will 'pop up' with a value in the LV (latency value - circled in red) column that will give you a simple, 'bang for your buck', easily compared single number! How it works: it takes the values you 'plug into' the latency and speed (frequency) columns, and converts them to a 'lowest common denominator' latency at 1066 in the "Adj'd" cells (I chose 1066 as it is supported, by Intel, on all 1156/1366 CPUs - some do not support 1333 'natively'), taking into account 'specification uncertainty' and 'integer rounding' effects, and averages the resultant 1066 latencies. It then 'looks at' the price you entered, and 'weights' them, according to a couple rules: if two DIMMs have the same latency, the one with the lower price will give a higher LV number; if two DIMMs have the same price, the one with the lower (faster) latency will yield the higher LV; and that's it! Ignore the green circled area at the lower right - it can be used to average the NewEgg 'ratings' for the items - but I never advise putting too much credence in these reviews: first, everyone who can't make something work will complain, whereas many people who did their homework, and knew what they were doing, won't be bothered to 'review'; second, there appears to be an unusually large contingent of people reviewing who are 'dumber than driveway gravel'!
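For those without a spreadsheet handy, here is a loose sketch of a score that obeys the two rules just described. To be loud about it: this is NOT the spreadsheet's actual formula, which isn't given here. The weighting (a constant divided by average adjusted latency times price), the choice of which timings to average, and the sample DIMMs and prices are all assumptions made up for illustration.

```python
def adjusted_at_1066(timing, speed):
    """Worst-case whole clocks at 1066 for a timing rated at 'speed' (integer ceiling)."""
    return -(-(timing * 1066) // speed)


def average_adjusted(timings, speed):
    """Average of the 1066-adjusted timings for one DIMM."""
    adjusted = [adjusted_at_1066(t, speed) for t in timings]
    return sum(adjusted) / len(adjusted)


def latency_value(timings, speed, price):
    """ASSUMED scoring: higher is better, falls as latency or price rises."""
    return 100000.0 / (average_adjusted(timings, speed) * price)


# Hypothetical parts, for illustration only.
parts = {
    "1333 7-7-7-20 @ $140": ((7, 7, 7, 20), 1333, 140),
    "1333 9-9-9-24 @ $140": ((9, 9, 9, 24), 1333, 140),
    "1333 7-7-7-20 @ $125": ((7, 7, 7, 20), 1333, 125),
}
for name, (timings, speed, price) in parts.items():
    print(name, round(latency_value(timings, speed, price), 1))
```

Any score that drops when either latency or price goes up would satisfy the two rules; the reciprocal of their product is just about the simplest one.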

Now, I'd like to talk a little about the memory in the spreadsheet, so it'll probably help if you either download and open it, or click on the image above to enlarge it in a separate tab... It happens to all be G.Skill, so you can guess where my sentiments lie, but the range of products represented is pretty typical of the 'market variety'. First let's look at a couple of 'ranges'; specifically, the range of prices ($125 - $260) versus the range of average 'adjusted' latency (6.0 - 10.2); this means for more than doubling the price (108%, to be exact), you are getting a 41% lower overall, 'comparative' latency part! Again, looking at 'ranges', half the difference in actual latency (the midpoint, [10.2 + 6.0] / 2, or 8.1) has already been achieved by the fourth part, at $140, or a mere increase of 12% in price! This is what we call, in price/performance analysis, the 'sweet spot'... Another item to note (again, my 'caveat' regarding reviews applies - but the inference is interesting), is that the lowest review average, 2.5, 'belongs to' the highest frequency (2133, double the 1066 supported by all Intel 1156/1366 processors) part - accident? - methinks not! 'Dropping in' a set of DIMMs this fast is not like the difference between 'dropping in' a piece of white bread versus rye into your toaster. You need to know what you're doing, as well as how to accomplish it - and it may take you a couple of days of 'tweaking', if you can manage it at all - so be aware, before you click on that 'confirm order' button!
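The percentages quoted above are easy to re-check. This sketch uses only the figures given in the text; the helper name is an illustrative choice.

```python
def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


price_increase = percent_change(125, 260)     # cheapest to dearest part
latency_drop = -percent_change(10.2, 6.0)     # slowest to fastest adjusted latency
midpoint = (10.2 + 6.0) / 2                   # halfway through the latency range
sweet_spot_premium = percent_change(125, 140)

print(round(price_increase))       # 108
print(round(latency_drop))         # 41
print(round(midpoint, 1))          # 8.1
print(round(sweet_spot_premium))   # 12
```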

Part IV - "Tweaking and tuning"
