Question DRAM Rank-Level Allocations

Sep 3, 2020
I want to make allocations on a specific DRAM rank. The smallest allocation unit in an OS such as Linux is a page, typically 4KB, so I need to be able to place at least one complete page entirely within a single DRAM rank.

But based on the DRAMDig paper [1] and page 20 of Datasheet, volume 1 (M- and H-processor lines) (i.e., my processor chipset family), my Haswell processor uses channel interleaving when DIMMs are placed in slots belonging to different channels (my system has two DIMMs, a 4GB and an 8GB one). In other words, for performance reasons, successive 64-byte blocks (which are, unfortunately, much smaller than 4KB) are mapped to different channels and, consequently, to different ranks: for example, physical address 0 maps to rank 0 of DIMM 0 on channel 0, while physical address 64 maps to rank 0 of DIMM 1 on channel 1. My questions are:

1) Is it possible to set the channel interleaving bit at a higher order index so as to make 4KB single-rank allocations possible?

2) Otherwise, is it possible to disable channel interleaving purely in software?

3) Otherwise, is placing both DIMMs in slots of the same channel the only remaining approach? That would amount to saying that channel interleaving makes single-rank allocations impossible.

Any help, suggestions, and guesses are appreciated. I am stuck, and the documentation is rather vague.

----------
[1] Table 2 on page 5 shows the mapping functions (from physical addresses to DRAM components). In all dual-channel configurations, a low-order bit (a bit in the range 6-8) is used as the channel-interleave bit, whereas it would have to be at least bit 12 to make 4KB single-rank allocations possible.
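To make [1] concrete, here is a minimal sketch (Python). The single-bit channel function and the bit positions are assumptions based on the DRAMDig-style mappings above; real Haswell functions may XOR several address bits. It shows why any 4KB page straddles both channels when the interleave bit is below bit 12:

```python
# Toy model only: assumes channel = one bit of the physical address, as in the
# dual-channel mappings reported by DRAMDig; real Haswell functions may XOR
# several address bits together.
PAGE_SIZE = 4096
CHANNEL_BIT = 6  # assumption: low-order interleave bit from [1]

def channel_of(phys_addr: int) -> int:
    """Channel a physical address maps to under the toy single-bit model."""
    return (phys_addr >> CHANNEL_BIT) & 1

def channels_touched_by_page(page_base: int) -> set:
    """Channels covered by the 64-byte blocks of one 4KB page."""
    return {channel_of(page_base + off) for off in range(0, PAGE_SIZE, 64)}

print(channels_touched_by_page(0x10000))  # {0, 1}: the page straddles both channels

CHANNEL_BIT = 12                           # if the interleave bit were bit 12...
print(channels_touched_by_page(0x10000))  # {0}: the whole page stays on one channel
```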
 
It looks like you already got some good answers over at Stack Exchange, but I would add that running two completely different DIMMs (a 4GB and an 8GB) is going to tank your performance by far more than any theoretical gain from carving up your memory ranks (if that were even possible, considering that DRAM chips are physically connected to chip selects in groups).

Remember, memory channels (determined by which slots the DIMMs occupy) are different from memory ranks (determined by the IC organization on each memory stick).
 
Sep 3, 2020
It looks like you already got some good answers over at Stack Exchange, but I would add that running two completely different DIMMs (a 4GB and an 8GB) is going to tank your performance by far more than any theoretical gain from carving up your memory ranks (if that were even possible, considering that DRAM chips are physically connected to chip selects in groups).
Could you elaborate more on the first sentence where you are talking about sizing memory ranks?
 
A group of memory ICs is electrically connected to a chip select; that is one rank. You can't break up a memory rank any more than you can cut a CPU in half and expect both halves to keep working. MAYBE it's possible to program groups of ranks, but even if it is, it won't net you any performance gain in your configuration. You would need to be running server hardware with several memory ranks at your disposal.

Start with a matching pair of RAM sticks to enable dual-channel mode in your system and net an overall 7-10% memory performance gain.
 
Sep 3, 2020
I agree with you. I want exclusive control of ranks purely for energy reduction (DRAM power management is applied at rank granularity). I cannot use multiple channels, because channel interleaving also distributes physical addresses across ranks at a fine granularity. Therefore, I have to accept the ~10% performance loss in order to gain full control over a complete rank.
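For what it's worth, one way to check where an allocation actually lands is to resolve virtual addresses to physical ones through /proc/self/pagemap and apply the mapping function offline. A rough sketch, assuming the toy single-bit channel function from earlier (the real mapping would have to come from DRAMDig-style reverse engineering); reading PFNs from pagemap requires CAP_SYS_ADMIN on recent kernels:

```python
import ctypes, mmap, os, struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
CHANNEL_BIT = 6  # assumption: the low-order interleave bit reported in [1]

def phys_addr_of(vaddr: int) -> int:
    """Translate a virtual address to a physical address via /proc/self/pagemap.
    Requires CAP_SYS_ADMIN on recent kernels, otherwise the PFN reads back as 0."""
    with open("/proc/self/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)          # one 64-bit entry per page
        entry = struct.unpack("<Q", f.read(8))[0]
    assert entry & (1 << 63), "page not present"
    pfn = entry & ((1 << 55) - 1)                  # bits 0-54 hold the PFN
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)

# Allocate one page and touch it so it is actually backed by DRAM.
buf = mmap.mmap(-1, PAGE_SIZE)
buf[0] = 1
vaddr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

phys = phys_addr_of(vaddr)
print(f"phys = {phys:#x}, channel under the toy model = {(phys >> CHANNEL_BIT) & 1}")
```

From there one could, in principle, over-allocate and keep only the pages that fall on the target rank, though that is working around the allocator rather than controlling it.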
 
Sep 3, 2020
Energy reduction?
DDR3 consumes maybe 3 watts. 4x DDR3 would be 12.
Running 24/7...maybe $1/month.

Or is there some other consideration?
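(Side note: the quoted back-of-envelope does check out. A minimal sketch, where the electricity price is my assumption and not from the thread:

```python
# Rough cost of ~12 W of DRAM running 24/7.
# The $/kWh rate below is an assumed figure, not taken from this thread.
watts = 12
hours_per_month = 24 * 30
kwh_per_month = watts * hours_per_month / 1000    # 8.64 kWh
price_per_kwh = 0.12                              # assumed electricity rate
print(f"{kwh_per_month:.2f} kWh/month -> ${kwh_per_month * price_per_kwh:.2f}/month")
```
)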
Well, firstly, DRAM systems incorporate low-power modes that ought to be used somehow.😁
All jokes aside, there are papers stating that DRAM power consumption can be a major contributor to total system power. For example, IAMEM: Interaction-Aware Memory Energy Management targets DRAM power reduction and reports that in a datacenter the memory subsystem can consume up to 40% of total system energy. The same holds in mobile devices, though the ratios are lower. Professor Onur Mutlu of ETH Zurich also has a number of papers on DRAM energy reduction: Prof. Mutlu Publications.
 
Sep 3, 2020
Please condense the salient point out of that 22-minute slide show.

"40% of the total system energy" ? Including CPU, storage devices, fans, room HVAC?
40%? Really?
Yes, here, in figure 1, the results of the SPEC CPU 2006 benchmark show that on average 23% of the total energy is consumed by the DRAM subsystem (two Intel Xeon Nehalem processors with a total of 8 cores and 12 dual-rank 4GB DIMMs, as shown in table 4).
One reason is that power-reduction efforts have historically focused on processors, so other components of the system now account for a larger share of total power consumption.
 

USAFRet
Yes, here, in figure 1, the results of the SPEC CPU 2006 benchmark show that on average 23% of the total energy is consumed by the DRAM subsystem (two Intel Xeon Nehalem processors with a total of 8 cores and 12 dual-rank 4GB DIMMs, as shown in table 4).
One reason is that power-reduction efforts have historically focused on processors, so other components of the system now account for a larger share of total power consumption.

Abstract:
"...yielding 2.4%average (5.2% max) whole-system energy improvement."
 
