Thank you! I frequently run a program under Linux that uses only around 50 MB of RAM, but each run takes about 45 minutes, and I run several hundred of them sequentially. I have a 2x16 GB 2133 MHz CL14 dual-rank HyperX Fury kit in my computer and was curious whether faster RAM would matter. Based on your suggestion, I tested a couple of faster single-rank 2x8 GB kits (Patriot 3600 CL17 and 4000 CL19) and also a single-rank 4x4 GB kit (HyperX 3200 CL16). In synthetic benchmarks, relative performance ranged from 97.4% (two sticks of the 3200 CL16 kit, i.e. 2x4 GB) up to 111.2% (3600 CL17). In the running time of this specific program, however, relative performance (calculated from averaged running times, with very low variance) was only in the range of 99.4% to 100.6%. The full 4x4 GB 3200 MHz CL16 "quasi dual-rank" setup (two DIMMs per channel) was only 0.2% faster at this job than the 2x4 GB single-rank setup, even though it was around 10% faster than 2x4 GB in synthetic benchmarks.
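A minimal sketch of how the percentages above can be computed, assuming "relative performance" means the averaged baseline running time divided by the averaged running time with the new kit, times 100 (so values above 100% mean the job finished faster). The function name is hypothetical.

```python
from statistics import mean

def relative_performance(baseline_times, new_times):
    """Relative performance in percent: 100 * mean(baseline) / mean(new).

    Both arguments are lists of running times in the same unit (e.g. minutes).
    A result above 100 means the new configuration ran the job faster.
    """
    return 100.0 * mean(baseline_times) / mean(new_times)
```

Averaging several runs per configuration before taking the ratio keeps run-to-run noise from dominating differences that are only a fraction of a percent.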
I was quite disappointed until I tried the G.Skill F4-3400C16D-16GVK Samsung D-die single-rank kit. It consistently delivered 104% of the performance of the original 2133 MHz CL14 dual-rank kit, i.e. about 4% faster. That may seem like a small difference, but it saves about 3 hours on a 3-day job.
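To check the "about 3 hours" figure, here is a small sketch assuming a 3-day job means 72 hours of running time on the baseline kit and that the relative-performance figure applies uniformly across the whole batch. The helper name is hypothetical.

```python
def hours_saved(job_hours, rel_perf_percent):
    """Wall-clock hours saved when a job taking job_hours on the baseline
    runs at rel_perf_percent of baseline speed (100 = unchanged)."""
    new_hours = job_hours * 100.0 / rel_perf_percent
    return job_hours - new_hours

# A 3-day (72-hour) batch at 104% relative performance:
print(f"{hours_saved(72, 104):.1f} hours saved")  # prints "2.8 hours saved"
```

Note that 104% performance shortens the run time by 4/104 (about 3.8%), not by a full 4%, which is why the saving comes out slightly under 3 hours.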
I also tried two other G.Skill kits, the F4-3200C14D-16GVK and F4-3600C16D-16GVK, both Samsung B-die. All three G.Skill kits are single rank, and the 3400 MHz kit is the "slowest" on paper, with the worst latency. Of course, in synthetic benchmarks the faster kits were faster. But both faster B-die kits delivered worse performance on this job (around 100.9% compared to the 2133 MHz result). I obtained the second-best result, 101.2%, using the 3600C16D-16GVK kit with its XMP settings but with the "base frequency" changed from 133 MHz to 100 MHz. I rechecked it, and the numbers are reproducible. I even tried setting the latencies manually on the theoretically fastest kit, the 3200 MHz CL14 one: I dug into advanced settings such as tWR, tRFC, tRRD_SL, ... and tried to copy all the settings of the 3400 CL16 kit, but could not get better than 101%. Do you have any idea what could be causing this?